Compare commits

...

2546 Commits

Author SHA1 Message Date
36449ea931 (torch/elastic) add fqdn hostname to error printout (#66182) (#66662)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66182

closes https://github.com/pytorch/pytorch/issues/63174

Does a few things:

1. adds hostname to the error report
2. moves the "root cause" section to the end (presumably since the logs are being "tailed" we want the root cause to appear at the end)
3. moves redundant error info logging to debug
4. makes the border at most 60 characters long and left-justifies the header

NOTE: YOU HAVE TO annotate your main function with torch.distributed.elastic.multiprocessing.errors.record, otherwise no traceback is printed (this is because python exception propagation does NOT work out of the box for IPC - hence the extra record annotation).
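
For reference, a minimal sketch of the required annotation (the script body here is just a placeholder):

```
from torch.distributed.elastic.multiprocessing.errors import record

@record
def main():
    # placeholder body; a real script would parse args and train
    raise RuntimeError("foobar")

if __name__ == "__main__":
    main()
```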

Test Plan:
Sample

```
============================================================
run_script_path FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2021-10-05_17:37:22
  host      : devvm4955.prn0.facebook.com
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 3296201)
  error_file: /home/kiuk/tmp/elastic/none_3_lsytqe/attempt_0/0/error.json
  traceback :
  Traceback (most recent call last):
    File "/tmp/jetter.xr3_x6qq/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 372, in wrapper
      return f(*args, **kwargs)
    File "main.py", line 28, in main
      raise RuntimeError(args.throws)
  RuntimeError: foobar

============================================================
```

Reviewed By: cbalioglu, aivanou

Differential Revision: D31416492

fbshipit-source-id: 0aeaf6e634e23ce0ea7f6a03b12c8a9ac57246e9
2021-10-14 18:35:23 -07:00
b544cbddfa Handle shared memory cases in MathBitFallback (#66667)
* Handle shared memory cases in MathBitFallback (#63602)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63602

This PR fixes the case when a read and write is performed on a memory shared between mutable and (or) non-mutable arguments. Example:
```
a = torch.tensor([1 + 1j])
b = a.conj()
b.add_(a)  # should return tensor([2]) but returns tensor([2-2j])
```

The issue here is that in the conjugate fallback, we resolve the conjugation in-place for mutable arguments which can be a problem as shown above in the case when other input arguments share memory with the mutable argument(s).
This PR fixes this issue by:
1. first scanning through the operator input arguments and creating a vector of mutable arguments that have the conj bit set to `True` (and accordingly setting the flag `check_for_alias_with_mut_arg` to `True` or `False`).
2. Iterating through all the arguments. At this time we only look at the non-mutable arguments. If `check_for_alias_with_mut_arg` is set to `True`, then we iterate through `mutable_inputs` to check whether the current arg tensor aliases any of the entries in `mutable_inputs`. If it does, we clone the non-mutable tensor arg; otherwise we resolve the conjugation as before.
3. Now we look through the mutable_inputs vector (which contains only mutable input tensors with conj bit set to `True`). We in-place conjugate each of the entries in the vector.
4. Do the computation.
5. Re-conjugate the mutable argument tensors.

NOTE: `TensorLists` are not fully handled in ConjugateFallback. Please see the in-line comment for more details.

Fixes https://github.com/pytorch/pytorch/issues/59943

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D30466905

Pulled By: anjali411

fbshipit-source-id: 58058e5e6481da04a12d03f743c1491942a6cc9b

* fix lint (#66572)

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66572

Test Plan: Imported from OSS

Reviewed By: seemethere

Differential Revision: D31624043

Pulled By: suo

fbshipit-source-id: 9db9cee3140d78c2a2f0c937be84755206fee1dd

Co-authored-by: anjali411 <chourdiaanjali123@gmail.com>
Co-authored-by: Michael Suo <suo@fb.com>
2021-10-14 18:34:13 -07:00
ddf3092581 Disable .numpy() and .tolist() for tensor subclasses and f… (#66642)
* Disable .numpy() and .tolist() for tensor subclasses and fix .tolist() for conjugated and negated tensors (#66082)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66082

Fixes https://github.com/pytorch/pytorch/issues/66024 #65779

cc ezyang anjali411 dylanbespalko mruberry Lezcano nikitaved albanD

Test Plan: Imported from OSS

Reviewed By: Gamrix, albanD

Differential Revision: D31615588

Pulled By: anjali411

fbshipit-source-id: c3e65ef0fe301630eb76732ccd7819683c09aa19

* Apply suggestions from code review

Co-authored-by: Nikita Shulga <nikita.shulga@gmail.com>
Co-authored-by: Nikita Shulga <nshulga@fb.com>
2021-10-14 16:00:56 -07:00
cc360fa38f Delete extraneous whitespaces 2021-10-14 15:57:16 -07:00
3c134b8b1e Disable .numpy() and .tolist() for tensor subclasses and fix .tolist() for conjugated and negated tensors (#66082) (#66576)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66082

Fixes https://github.com/pytorch/pytorch/issues/66024 #65779

cc ezyang anjali411 dylanbespalko mruberry Lezcano nikitaved albanD

Test Plan: Imported from OSS

Reviewed By: Gamrix, albanD

Differential Revision: D31615588

Pulled By: anjali411

fbshipit-source-id: c3e65ef0fe301630eb76732ccd7819683c09aa19
2021-10-14 13:16:03 -07:00
4a514dd81e Call PyArray_Check only if NumPy is available (#66433) (#66629)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/66353

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66433

Reviewed By: seemethere, janeyx99

Differential Revision: D31548290

Pulled By: malfet

fbshipit-source-id: 3b094bc8195d0392338e0bdc6df2f39587b85bb3
2021-10-14 09:46:41 -07:00
c3ea586e32 fix normal with empty std (#66524) 2021-10-14 09:42:41 -07:00
9509e8a3d6 Fix cosine similarity dim checks (#66214)
* fix cosine similarity dimensionality check

* fix shapes in the doc
2021-10-08 07:22:40 -07:00
1774a6a2f4 [ONNX] Deprecate various args (#65962)
* [ONNX] Remove argument _retain_param_name from torch.onnx.export() function. (#61702) (#64370)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64370

As of now, the "_retain_param_name" parameter has no description in PyTorch docs website. According to code, this argument determines if we keep the original parameter names of PyTorch model in the final ONNX graph. If this is False, those original parameter names will be replaced with a series of integers starting from 1.

Since setting numbers as parameter names make no sense to users, we remove this argument from the torch.onnx.export() function to increase user experience of calling this function.

This PR will still keep it in torch.onnx.export() function for backward support while all backend logic has been changed to work as _retain_param_name is set to True.
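
A minimal sketch of the resulting default behavior (model and file name are illustrative):

```
import torch

model = torch.nn.Linear(4, 2)
# After this change the exporter always keeps the original parameter names
# ("weight", "bias") instead of replacing them with integers.
torch.onnx.export(model, torch.randn(1, 4), "linear.onnx")
```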

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D30905270

Pulled By: malfet

fbshipit-source-id: ca60757ca17daaff937e9f08da42596086795f4a

Co-authored-by: fatcat-z <zhang-ji@outlook.com>

* [ONNX] Remove strip_doc_string param from torch.onnx.export() function. (#61712) (#64371)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64371

As of now, the "strip_doc_string" parameter was described as below:

strip_doc_string (bool, default True): do not include the field
doc_string``` from the exported model. Otherwise the field will mention the source code locations for model``.

This is usually useless to users who want to transform a PyTorch model to ONNX one. Only when the user wants to debug the export process, these source code locations could provide benefits.

To make the export() function more friendly by providing less parameters, we combined "strip_doc_string" into "verbose" parameter. If a user set verbose to True, it means the users need some log information for debugging the export process and this is similar with the purpose of strip_doc_string parameter.

But the usage of these 2 arguments are opposite: setting verbose to True means we want to print log information to help debug, which means strip_doc_string should be False. And this is how we replace strip_doc_string with verbose argument in this PR.

This PR will still keep it in torch.onnx.export() function for backward support while the usage of it has been combined with verbose argument.
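
A minimal sketch of the replacement, assuming a toy model (names are placeholders):

```
import torch

model = torch.nn.Linear(4, 2)
dummy_input = torch.randn(1, 4)
# Before: torch.onnx.export(model, dummy_input, "model.onnx", strip_doc_string=False)
# After: verbose=True keeps the doc_string fields (source code locations) for debugging.
torch.onnx.export(model, dummy_input, "model.onnx", verbose=True)
```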

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D30905268

Pulled By: malfet

fbshipit-source-id: 2f06eb805c01fe15ff7a1b4f6595c937ba716d60

Co-authored-by: fatcat-z <zhang-ji@outlook.com>

* [ONNX] minor doc improvements and cleanup (#62514) (#64373)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64373

* Fix some bad formatting and clarify things in onnx.rst.
* In `export_to_pretty_string`:
    * Add documentation for previously undocumented args.
    * Document that `f` arg is ignored and mark it deprecated.
    * Update tests to stop setting `f`.
    * Warn if `_retain_param_name` is set.
* Use double quotes for string literals in test_operators.py.

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D30905271

Pulled By: malfet

fbshipit-source-id: 3627eeabf40b9516c4a83cfab424ce537b36e4b3

* [ONNX] Deprecated the example_outputs param from torch.onnx.export() function. (#62815) (#64380)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64380

* `example_outputs` used to determine the type and shape of the outputs without tracing the execution of the model. It had to be provided when exporting a ScriptModule or ScriptFunction with the export() function.

* Since we can work out `example_outputs` internally instead of having it provided by the user, we deprecated this argument in the export() function to improve the user experience of calling this function.
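
A minimal sketch of the new calling convention (toy model; the old call is shown in a comment):

```
import torch

scripted = torch.jit.script(torch.nn.Linear(4, 2))
x = torch.randn(1, 4)
# Previously required: torch.onnx.export(scripted, x, "model.onnx", example_outputs=scripted(x))
torch.onnx.export(scripted, x, "model.onnx")  # outputs are now worked out internally
```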

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D30905266

Pulled By: malfet

fbshipit-source-id: d00b00d7d02b365d165028288ad915678caa51f2

Co-authored-by: hwangdeyu <dejack953@outlook.com>

* [ONNX] Deprecate use_external_data_format param from torch.onnx.export() function. (#62257) (#64382)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64382

* The `use_external_data_format` parameter is used for large models that cannot be exported because of the 2GB protobuf limit.

* When `use_external_data_format` is set to True, the model is exported in the ONNX external data format, in which case some of the model parameters are stored in external binary files and not in the ONNX model file itself.

* This PR marks this parameter as DEPRECATED and checks the model proto size in code instead of relying on the user: if the size is larger than 2GB, then `use_external_data_format = True` is applied automatically (see the sketch below).
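
A minimal sketch of the resulting behavior (the model here is a stand-in; the switch only matters past the 2GB limit):

```
import torch

model = torch.nn.Linear(4, 2)  # stand-in; the behavior only matters for models over 2GB
# The exporter now checks the proto size itself and switches to the ONNX
# external-data format automatically when the 2GB protobuf limit is exceeded.
torch.onnx.export(model, torch.randn(1, 4), "model.onnx")
```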

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D30905265

Pulled By: malfet

fbshipit-source-id: 82b4e17bfa6a8de2bfd700a5282c12f6835603cb

Co-authored-by: hwangdeyu <dejack953@outlook.com>

* fix clang-tidy error introduced by #64382 (#65977)

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65977

Reviewed By: ngimel

Differential Revision: D31423174

Pulled By: malfet

fbshipit-source-id: 0ea560b9a6ddd6431f70bd3ac10ace68e26ab352

Co-authored-by: BowenBao <bowbao@microsoft.com>
Co-authored-by: fatcat-z <zhang-ji@outlook.com>
Co-authored-by: hwangdeyu <dejack953@outlook.com>
2021-10-08 07:21:29 -07:00
a27906c250 Convert Sampler back to lazy construction (#63646) (#65926)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63646

Fixes #63609

Test Plan: Imported from OSS

Reviewed By: NivekT

Differential Revision: D30451774

Pulled By: ejguan

fbshipit-source-id: 550d77494326446d1a42b5da0559e0d384c47413
2021-10-08 07:20:03 -07:00
49f52b6c07 Revert "Added option to update parameters using state_dict in AveragedModel (#65495) (#65755)" (#66308)
This reverts commit 5f1a434599b46afd99607839d15892e09269a1c4.
2021-10-08 07:17:47 -07:00
5f1a434599 Added option to update parameters using state_dict in AveragedModel (#65495) (#65755)
* Added option to update parameters using state_dict in AveragedModel (#65495)

Summary:
While implementing [EMA](https://github.com/pytorch/vision/pull/4381) (which extends AveragedModel) in torchvision, update_parameters() from AveragedModel could not be used as it did not handle state_dict(), so a custom update_parameters() needed to be defined in the [EMA class](https://github.com/pytorch/vision/pull/4406). This PR handles that scenario, removing the need for the custom update_parameters() implementation.

Discussion: https://github.com/pytorch/vision/pull/4406#pullrequestreview-753734102

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65495

Reviewed By: datumbox

Differential Revision: D31176742

Pulled By: prabhat00155

fbshipit-source-id: 326d14876018f21cf602bab5eaba344678dbabe2
(cherry picked from commit 2ea724b1fd543304e3be7bd223cac451cd093e16)

* Added validation of mode parameter in AveragedModel (#65921)

Summary:
Discussion: https://github.com/pytorch/pytorch/pull/65495#issuecomment-930460469

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65921

Reviewed By: albanD

Differential Revision: D31310105

Pulled By: prabhat00155

fbshipit-source-id: 417691832a7c793744830c11e0ce53e3972d21a3
(cherry picked from commit c7748fc172553da66368fd0b7fea3fe5661e2dc1)
2021-10-06 11:13:31 -07:00
ecbf5a7439 Tweak file_diff_from_base for release/1.10 branch (#66202) 2021-10-06 08:34:46 -07:00
4e3ebebcff [DataPipe] DataPipe Fix and Deprecation Warnings for Release 1.10 (#65932)
* Unify the output pathname of archive reader and extractor (#65424)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65424

This PR is re-implementation for https://github.com/facebookexternal/torchdata/pull/93
Same PR has landed into torchdata https://github.com/facebookexternal/torchdata/pull/157

Test Plan: Imported from OSS

Reviewed By: soulitzer

Differential Revision: D31090447

Pulled By: ejguan

fbshipit-source-id: 45af1ad9b24310bebfd6e010f41cff398946ba65

* [DatePipe] add deprecation warnings for DataPipes that will solely exist in TorchData (#65827)

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65827

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D31272794

Pulled By: NivekT

fbshipit-source-id: 8da8266184b4df050422904cbc5fca6d7c3d2e02

* [DataPipe] Fixes an issue where TarArchiveReader closes stream when read into a buffer (#65877)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65877

Fixes #65808

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D31296041

Pulled By: NivekT

fbshipit-source-id: cdcad3a333ae9781d6063678a122a128955b0ff4

Co-authored-by: Erjia Guan <erjia@fb.com>
2021-10-05 20:54:40 -07:00
2b46c95e7c [iOS][CI] Update dev certs (#66004) (#66188)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/65988

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66004

Reviewed By: xta0

Differential Revision: D31340893

Pulled By: malfet

fbshipit-source-id: 3bf0be266e9686a73d62e86c5cf0bebeb0416260

Co-authored-by: Tao Xu <taox@fb.com>
2021-10-05 20:12:40 -07:00
5f3eee1ca5 Fix backward compatibility tests (#66186)
Compare operator list against RC1 build rather than against nightly
2021-10-05 20:12:13 -07:00
4731f33d02 Fix Windows ninja builds when MAX_JOBS is specified (#65444) (#66155)
Summary:
Reported by cloudhan in https://github.com/pytorch/pytorch/pull/64733#issuecomment-924545463

Fixes regression introduced by 047e68235f

cc malfet seemethere

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65444

Reviewed By: dagitses, seemethere

Differential Revision: D31103260

Pulled By: malfet

fbshipit-source-id: 9d5454a64cb8a0b96264119cf16582cc5afed284
2021-10-05 12:03:27 -07:00
ecfcb8ff5a Binary building without python fix (#66031) (#66117)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/66030

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66031

Reviewed By: VitalyFedyunin

Differential Revision: D31356243

Pulled By: malfet

fbshipit-source-id: d1537bc65bbba5d6497ecb8db7160a397eca81fd
2021-10-05 12:02:51 -07:00
6aadfda9e2 [ci] try installing libgnutls to fix cert error (#65934) (#65979)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65934

see: https://github.com/pytorch/pytorch/issues/65931, this was a
suggested remediation on the linked issue

Test Plan: Imported from OSS

Reviewed By: malfet, zhouzhuojie

Differential Revision: D31313040

Pulled By: suo

fbshipit-source-id: a9e2b82a1e879962af768ed3049c73ab77394738

Co-authored-by: Michael Suo <suo@fb.com>
2021-09-30 18:55:44 -07:00
13666d20fd [DataPipe] Fix deepcopy filehandle for Mapper and in-place modification for IterableWrapper (#65220) (#65924)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65220

Fixes #65221

- Remove deepcopy from Mapper to support file handles
- Convert `IterableWrapper` to deep-copy the iterable instance within each iterator, preventing in-place modification from changing the data seen in later epochs (see the sketch below)
- Convert `IDP` to `IterableWrapper` in test_datapipe.py
- Refine the variable names (prevent using `dp` that is module reference)
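
A minimal sketch of the `IterableWrapper` behavior after this fix (data values are illustrative):

```
from torch.utils.data.datapipes.iter import IterableWrapper

dp = IterableWrapper([[1], [2]])
for item in dp:
    item.append(0)   # in-place mutation during the first epoch...
print(list(dp))      # ...does not leak into the next epoch: still [[1], [2]]
```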

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D31021886

Pulled By: ejguan

fbshipit-source-id: 72a9eee66c758e2717d591cd0942892bddedc223
2021-09-30 18:36:49 -07:00
1fa17a20fc Fix the slowdown of _object_to_tensor since 1.9 (#65721) (#65835)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65721

Closes: https://github.com/pytorch/pytorch/issues/65696

The bug was introduced in https://github.com/pytorch/pytorch/pull/55861, and it causes a 100X slowdown since 1.9.
ghstack-source-id: 139128267

Test Plan:
Performance test:
```
import time

from torch.distributed.distributed_c10d import _object_to_tensor

start = time.time()
_object_to_tensor("x" * 50_000_000)
print("Time:", time.time() - start)
```

Reviewed By: rohan-varma

Differential Revision: D31219794

fbshipit-source-id: 1abec38f9d51361c1eab6ad5efd87b589322e208

Co-authored-by: Yi Wang <wayi@fb.com>
2021-09-29 14:38:54 -07:00
c05547fa6c Fix test reporting git merge-base (#65787) 2021-09-28 15:48:32 -07:00
0e857bf109 [1.10] Remove torch.vmap (#65496)
torch.vmap is a prototype feature and should not be in the stable
binary. This PR:
- Removes the torch.vmap API
- Removes the documentation entry for torch.vmap
- Changes the vmap tests to use an internal API instead of torch.vmap.

Test Plan:
- Tested locally (test_torch, test_autograd, test_type_hints, test_vmap),
but also wait for CI.
2021-09-24 10:29:08 -07:00
ad22804b95 [release/1.10] Pin builder and xla repo (#65433)
Pin builder to https://github.com/pytorch/builder/commits/release/1.10
Pin xla to https://github.com/pytorch/xla/tree/r1.10
2021-09-21 16:16:22 -07:00
eb4fb1ed81 THCTensor cleanup (#65369)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65369

Reviewed By: bhosmer

Differential Revision: D31071406

Pulled By: ngimel

fbshipit-source-id: bbc3f2781003333641524aeb692b944fd3ad8d7a
2021-09-21 10:28:19 -07:00
600df80296 [PT/ShardedTensor]Allow zero size local shard (#65007)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65007

Relax the shard size check in ShardMetadata to allow zero-size local shards.

When sharding a tensor on N ranks, some ranks may have an empty shard allocated. As we are assuming SPMD, the ranks w/ an empty shard still need to participate in all collectives, and we need to allow this in ShardMetadata.
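
A hedged sketch of what is now accepted; the module path and field names follow the 1.10-era sharding prototype and should be treated as assumptions:

```
from torch.distributed._sharding_spec import ShardMetadata

# A rank whose slice of the tensor is empty still gets a (zero-sized) shard:
empty_shard = ShardMetadata(
    shard_offsets=[4, 0],
    shard_lengths=[0, 4],  # zero rows on this rank is now accepted
    placement="rank:3/cuda:3",
)
```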

Test Plan: Unit tests and CLI

Reviewed By: jiaqizhai, wanchaol

Differential Revision: D30926566

fbshipit-source-id: afa562c94ffa8f8d91d65ddb4c348156d871dc36
2021-09-21 09:58:54 -07:00
7f6580a868 OpInfo: nn.functional.conv2d (#65233)
Summary:
Reland : https://github.com/pytorch/pytorch/issues/63517
Reference: https://github.com/pytorch/pytorch/issues/54261

Reference: facebookresearch/functorch#78

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65233

Reviewed By: malfet

Differential Revision: D31025538

Pulled By: zou3519

fbshipit-source-id: b1cd38c22f4cb8eedd3f958e02dd7410dcbb8d8d
2021-09-21 09:26:23 -07:00
9324181d0a [JIT] Re-land "Add aten::slice optimization" (#65341)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65341

The changes in D30231044 (babd449978) were removed due to a downstream issue in glow. Now that the issue has been fixed by D30849396, we can safely re-introduce the changes.

Test Plan:
`buck test //caffe2/test:jit -- TestPeephole`

Glow test:
* `buck test //glow/fb/torch_glow/tests:unfuse_glow_ops_test`
* qxy11 confirmed that the problematic glow model now loads correctly with these changes

Reviewed By: eellison

Differential Revision: D31056878

fbshipit-source-id: 049903ee04ba88885cc9d1a91427af0f1f44f681
2021-09-21 07:29:51 -07:00
9c23f6eb7d [nn] TripletMarginLoss and PairwiseDistance : no batch dim (#64882)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/60585

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64882

Reviewed By: malfet

Differential Revision: D31055577

Pulled By: jbschlosser

fbshipit-source-id: 2f0a5a08619b672026b48a78bc7d83a6dccba0bf
2021-09-21 07:29:48 -07:00
d35ee431d8 correlate forward and backward op (#62553)
Summary:
Use startThreadId+seqNumber of the forward op and fwdThreadId+seqNumber of the backward op to correlate pairs of them.
third_party/kineto should be updated accordingly: https://github.com/pytorch/kineto/pull/372

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62553

Reviewed By: malfet

Differential Revision: D30125728

Pulled By: gdankel

fbshipit-source-id: 9877a54392ba043d0eac56ce5b7bbf244277fa7e
2021-09-21 07:28:29 -07:00
f0ada4bd54 [docs] Remove .data from some docs (#65358)
Summary:
Related to https://github.com/pytorch/pytorch/issues/30987. Fix the following task:

- [ ] Remove the use of `.data` in all our internal code:
  - [ ] ...
  - [x] `docs/source/scripts/build_activation_images.py` and `docs/source/notes/extending.rst`

In `docs/source/scripts/build_activation_images.py`, I used `nn.init` because the snippet already assumes `nn` is available (the class inherits from `nn.Module`).

cc albanD

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65358

Reviewed By: malfet

Differential Revision: D31061790

Pulled By: albanD

fbshipit-source-id: be936c2035f0bdd49986351026fe3e932a5b4032
2021-09-21 06:32:31 -07:00
daa50f1e9f Adds keyword only args to gradcheck (#65290)
Summary:
Changes the call signature of gradcheck so that its kwargs are keyword-only.

Also modifies the return call from gradgradcheck to reflect these changes.

Fixes https://github.com/pytorch/pytorch/issues/65165
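
A minimal sketch of the new calling convention:

```
import torch
from torch.autograd import gradcheck

x = torch.randn(3, dtype=torch.double, requires_grad=True)
gradcheck(torch.sin, (x,), eps=1e-6, atol=1e-5)  # OK: options passed by name
# gradcheck(torch.sin, (x,), 1e-6)               # TypeError after this change
```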

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65290

Reviewed By: soulitzer

Differential Revision: D31061316

Pulled By: albanD

fbshipit-source-id: 3505569a33a497a8be4347bdd425bb2b8e536999
2021-09-21 06:31:07 -07:00
880098a7e3 [PyTorch Edge] Backport function for defaults args with out args, flag on (#63651)
Summary:
1. Enable support for operators with default args and out args. For `torch.add(x, h, out=x)`, the number of specified arguments will be 3 instead of 4.
2. Bump bytecode version from 6 to 7
3. Implement the backport_v7_to_v6 function. Also slightly refactor the local_thread to allow re-emitting operators.
4. unittest to cover backport function
5. Update expect result from 4 to 3 in unit test DefaultArgsWithOutArg to cover the number of specified arguments.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63651

ghstack-source-id: 138539912

Test Plan:
```
caffe2/test/cpp/jit:jit - LiteInterpreterTest.DefaultArgsWithOutArg
caffe2/test/cpp/jit:jit - LiteInterpreterTest.DefaultArgsPinvWithOutArg
caffe2/test/cpp/jit:jit - LiteInterpreterTest.BackPortByteCodeModelAllVersions
```

Reviewed By: raziel, tugsbayasgalan

Differential Revision: D30454080

fbshipit-source-id: 357c50b96682430675142d20d688d1f64e1de307
2021-09-20 22:50:30 -07:00
5826d207ad [JIT] Delete obsolete message: or if you absolutely have to, use c10::impl::GenericDict(c10::impl::deprecatedUntypedDict()) (#65164)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65164

Looks like it was forgotten in https://github.com/pytorch/pytorch/pull/25439

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D31072625

Pulled By: pbelevich

fbshipit-source-id: a5ffcfb0836f962ab6952a187ba7717c4d4a6e33
2021-09-20 22:50:28 -07:00
19a1063888 [JIT] Support device as Dict key (#65079)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65079

This is required to use RPC DeviceMap aka Dict[torch.device, torch.device] in torchscript

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D31072626

Pulled By: pbelevich

fbshipit-source-id: 51cfa5653db86de73b624e9157d68d1b319bfc64
2021-09-20 22:49:15 -07:00
512834b61d Reduce PyTorch warnings: Cast fix xplat/caffe2/aten/src/ATen/core/DeprecatedTypeProperties.h (#65031)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65031

Test Plan:
```
buck build --show-output //caffe2/torch/fb/sparsenn:sparsenn_operators

buck test caffe2/torch/fb/sparsenn:test
```

Reviewed By: r-barnes

Differential Revision: D30948791

fbshipit-source-id: 13046e1d0ce2c24864ad38f318ca5e34b1bb9552
2021-09-20 20:29:58 -07:00
0dc98728bc Basic implementation of ShardedLinear using ShardedTensor. (#64128)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64128

This PR implements a sharded nn.Linear layer using ShardedTensors with
the following limitations:

1) Works only for ChunkShardingSpec.
2) The implementation is only aimed at demonstrating functionality and is most likely
not performant at all.

The PR also introduces a `shard_parameter` API to easily shard parameters of
`nn.Modules`. This also has the following limitations:

1) Works only for ChunkShardingSpec.
2) It is not performant, since it uses broadcast instead of scatter because
ProcessGroupNCCL doesn't yet support scatter.

Overall user API for running a sharded linear would be something like this:

```
# SPMD programming paradigm running same code on all nodes.
fc = nn.Linear(10, 10)

# Setup sharding.
sharding_spec=ChunkShardingSpec(...)
shard_parameter(fc, 'weight', sharding_spec, src_rank=0)

# Run as a normal linear layer.
inp = torch.rand(10, 10)
output = fc(inp)
```
ghstack-source-id: 138500985

Test Plan:
1) unit tests.
2) waitforbuildbot

Reviewed By: wanchaol, bowangbj

Differential Revision: D30621215

fbshipit-source-id: 1aa7478568c18a4572f6c3462fdf24a4cbde01d6
2021-09-20 18:31:11 -07:00
257a18d951 Track peak memory usage (#65157)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65157

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D31029049

Pulled By: driazati

fbshipit-source-id: 3e87e94e4872d118ad191aef2b77b8cefe90aeb6
2021-09-20 17:25:16 -07:00
58909395ab Fix logic to determine master vs PR (#65155)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65155

This was bugged before on empty strings, which caused the hook to write on any job, not just `master`, regardless of the `only_on_master` flag.

Test Plan: see `[scribe] Skipping RDS write on PR` in the logs for `linux-xenial-cuda11.3-py3.6-gcc7`

Reviewed By: malfet

Differential Revision: D31029048

Pulled By: driazati

fbshipit-source-id: 77c4a60e443d8fc19990755a3a346576afee86d8
2021-09-20 17:25:14 -07:00
60915eb810 [quant] Add fp32/fp16 zero_point support for CPU fakeQuant (#65055)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65055

Test Plan: Imported from OSS

Reviewed By: jingsh, supriyar

Differential Revision: D30975238

Pulled By: b-koopman

fbshipit-source-id: 2000660ffe71cb85d00fdabaf8fc3ba7323f9a1e
2021-09-20 17:25:12 -07:00
ce101fed02 [PyPer] copy-free freeze_module (#65118)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65118

Cloning the module can increase memory use. By freezing the module directly without cloning it first, we can avoid this memory usage increase.

Reviewed By: eellison, movefast1990

Differential Revision: D30955053

fbshipit-source-id: 2feb738eddcf66aa68c92bf695cc05b57bd990f0
2021-09-20 17:25:10 -07:00
ca649851c6 Reduce PyTorch warnings: Cast fix xplat/caffe2/c10/core/TensorOptions.h (#65030)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65030

Test Plan:
```
buck build --show-output //caffe2/torch/fb/sparsenn:sparsenn_operators

buck test caffe2/torch/fb/sparsenn:test
```

Reviewed By: r-barnes

Differential Revision: D30948721

fbshipit-source-id: 16fe42daab35709c56a4d3ccc276ea635a3510c1
2021-09-20 17:23:58 -07:00
2465a103b8 [iOS] Zero out NSError to avoid heap corruptions for the OSS builds (#65355)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65355

I've been seeing heap corruptions in the CMake builds due to the NSError* not being initialized with `nil`. However, I haven't seen this issue in the BUCK builds.
ghstack-source-id: 138502708

Test Plan:
1. Test the OSS builds to make sure the heap corruption has gone.
2. Test the Buck build in the playground app
3. Circle CI

Reviewed By: hanton

Differential Revision: D31048010

fbshipit-source-id: cfd8d614f3f91f09caee4aab61237007ec080481
2021-09-20 16:31:23 -07:00
b7adb3350a Add crow_/col_indices to view types (#63176)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/61103

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63176

Reviewed By: malfet, albanD

Differential Revision: D30315882

Pulled By: cpuhrsch

fbshipit-source-id: eedae5265a757ed68fd69e4f9d07070b05de4bd8
2021-09-20 14:35:58 -07:00
31f61122da Creating a helper function to generate a unique name for an attr in a module (#64970)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64970

Add a helper function to create a unique name for an attr.
This can be used when we want to add a weight to a module.

Test Plan: run CI.

Reviewed By: jfix71

Differential Revision: D30921497

fbshipit-source-id: 598569d107df8b516ff12920a4bef3a42577e987
2021-09-20 14:35:56 -07:00
b45ec16310 Add support to lower acc_ops.transpose (#65036)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65036

Reviewed By: jfix71, 842974287

Differential Revision: D30934503

fbshipit-source-id: 51880d3d36492f5206f77c9d1a994d8532597b62
2021-09-20 14:35:54 -07:00
e33a1fa680 [fx] give warning instead of fatal the program when submod not found during adding get_attr (#65225)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65225

Currently, when creating a get_attr node, if the attribute is in a submodule, we'll first find the submodule. If the submodule isn't in the owning module, we throw an exception.

However, if the attribute can't be found, we give a warning but still allow the get_attr node to be created. To align with this behavior, we change the reaction when the submodule is not found from a fatal error to a warning.

Test Plan: CI

Reviewed By: jamesr66a, jfix71

Differential Revision: D31021535

fbshipit-source-id: 4c0b471448c09cc927d0f47b5bf56594f25a8863
2021-09-20 14:35:52 -07:00
8fb253757d Remove @balioglu from PyTorch Distributed code owners (#65239)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65239

Due to too much noise caused by the GitHub notifications, going forward I prefer to track PRs manually.
ghstack-source-id: 138386041

Test Plan: N/A

Reviewed By: mrshenli

Differential Revision: D31027792

fbshipit-source-id: 6578e41d4ab53ad2c64a41584716f4903298cd6b
2021-09-20 14:34:37 -07:00
e3210ca184 [CUDA graphs] Beta, not prototype (#65247)
Summary:
Powers have decided this API should be listed as beta.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65247

Reviewed By: malfet

Differential Revision: D31057940

Pulled By: ngimel

fbshipit-source-id: 137b63cbd2c7409fecdc161a22135619bfc96bfa
2021-09-20 13:32:36 -07:00
b71f01f70d Fix full backward hook when grad is disabled (#65335)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/59901. See discussion in the issue.
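
A hedged repro sketch (the exact failing pattern from the linked issue is an assumption):

```
import torch

m = torch.nn.Linear(2, 2)
m.register_full_backward_hook(lambda mod, grad_in, grad_out: None)
with torch.no_grad():
    m(torch.randn(1, 2))  # previously misbehaved with grad disabled; now runs cleanly
```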

cc albanD soulitzer

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65335

Reviewed By: malfet

Differential Revision: D31055865

Pulled By: albanD

fbshipit-source-id: 53605df62bc73c99d8908248087ab400b81ac495
2021-09-20 13:31:19 -07:00
2abf3594d5 Fix unassigned ciflow trigger (#65354)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/65250#issuecomment-923120764

This is a limitation of GitHub Action triggers: it's hard to introduce a condition before the workflow runs, which is why we intentionally picked the rare event ("unassigned"). The fix, I think, for people who didn't opt into ciflow and manually unassign is to run all the CI (otherwise we'd introduce a new condition for this, which isn't worth making things even more complex).

The `unassigned` event payload looks like this, just to make sure `github.event.assignee.login` points to the right location.

```
  {
    "action": "unassigned",
    "assignee": {
      "avatar_url": "https://avatars.githubusercontent.com/u/658840?v=4",
      "events_url": "https://api.github.com/users/zhouzhuojie/events{/privacy}",
      "followers_url": "https://api.github.com/users/zhouzhuojie/followers",
      "following_url": "https://api.github.com/users/zhouzhuojie/following{/other_user}",
      "gists_url": "https://api.github.com/users/zhouzhuojie/gists{/gist_id}",
      "gravatar_id": "",
      "html_url": "https://github.com/zhouzhuojie",
      "id": 658840,
      "login": "zhouzhuojie",
      "node_id": "MDQ6VXNlcjY1ODg0MA==",
      "organizations_url": "https://api.github.com/users/zhouzhuojie/orgs",
      "received_events_url": "https://api.github.com/users/zhouzhuojie/received_events",
      "repos_url": "https://api.github.com/users/zhouzhuojie/repos",
      "site_admin": false,
      "starred_url": "https://api.github.com/users/zhouzhuojie/starred{/owner}{/repo}",
      "subscriptions_url": "https://api.github.com/users/zhouzhuojie/subscriptions",
      "type": "User",
      "url": "https://api.github.com/users/zhouzhuojie"
    },
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65354

Reviewed By: malfet, seemethere, janeyx99

Differential Revision: D31060212

Pulled By: zhouzhuojie

fbshipit-source-id: ce815cc96e8a00016646d6f02f0917169fa652dc
2021-09-20 12:33:23 -07:00
378949b83c fix typo missing f string (#65226)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65226

Reviewed By: malfet

Differential Revision: D31055793

Pulled By: albanD

fbshipit-source-id: fafac53e75223c4f599bd2162095aacad7b690df
2021-09-20 12:31:54 -07:00
0430d1da12 [iOS] Fix the TestApp (#65319)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65319

Test Plan: Imported from OSS

Reviewed By: hanton

Differential Revision: D31049543

Pulled By: xta0

fbshipit-source-id: ff0d0baac30682c63b2a28254ee0a5d8d9b8ca6f
2021-09-20 11:28:40 -07:00
3e64c9e176 [Pipe] Add a WithDevice wrapper to specify device execution for a module. (#65190)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65190

As described in https://github.com/pytorch/pytorch/issues/65093, there
could be modules which don't have any parameters/buffers. In this case, Pipe
determines that the module should be executed on CPU. However this might result
in unnecessary GPU to CPU transfers whereas the user expected the module to be
executed on the GPU itself by keeping its inputs and outputs on GPU.

For this use case, we introduce a `WithDevice` wrapper which can be used to
override which device a particular module should be executed on as part of the
pipeline.
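
A minimal sketch of the wrapper (device placement and layer sizes are illustrative; `Pipe` additionally requires the RPC framework to be initialized):

```
import torch.nn as nn
from torch.distributed.pipeline.sync import Pipe
from torch.distributed.pipeline.sync.pipe import WithDevice

dropout = nn.Dropout()  # no parameters/buffers, so Pipe alone would place it on CPU
model = nn.Sequential(
    nn.Linear(16, 16).cuda(0),
    WithDevice(dropout, 'cuda:0'),  # keep execution on cuda:0, avoiding a GPU->CPU hop
    nn.Linear(16, 16).cuda(1),
)
pipe = Pipe(model, chunks=8)
```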

Closes: https://github.com/pytorch/pytorch/issues/65093
ghstack-source-id: 138376272

Test Plan:
1) waitforbuildbot
2) unit tests

Reviewed By: SciPioneer

Differential Revision: D31010027

fbshipit-source-id: 4c1c61d3c6feeef341e002e5f7e83dd33ff3a516
2021-09-20 11:27:27 -07:00
0a3cf8886a Torchhub: More robust assumption regarding main or master branch (#64364)
Summary:
Closes https://github.com/pytorch/pytorch/issues/63753

This PR changes the assumption regarding the default branch of a repo to the following:

> If main exists then use main, otherwise use master

This will make torchhub more robust w.r.t. the ongoing changes where repos use `main` instead of `master` as the development / default branch.

cc nairbv NicolasHug

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64364

Reviewed By: saketh-are

Differential Revision: D30731551

Pulled By: NicolasHug

fbshipit-source-id: 7232a30e956dcccca21933a29de5eddd711aa99b
2021-09-20 10:36:13 -07:00
99e4ab5d44 [Static Runtime] Implement and enable variadic tuple unpack (#64934)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64934

Add a new op `static_runtime::VarTupleUnpack` and a graph pass transforming graph sequences from:
```
%0, %1 = prim::TupleUnpack(%a)
%2, %3 = prim::TupleUnpack(%b)
```
into:
```
%0, %1, %2, %3 = static_runtime::VarTupleUnpack(%a, %b)
```

The pass is only applied to contiguous blocks of `TupleUnpack` nodes. This is the most straightforward way to guarantee correctness, and it is sufficient for the models we care about.

Test Plan: New unit tests: `buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest -- VarTupleUnpack`

Reviewed By: d1jang

Differential Revision: D30872109

fbshipit-source-id: 1ed4a7e201c532da28f703a3a50241c392a6c7e9
2021-09-20 10:36:11 -07:00
14347d0dd5 [quant][fx][graphmode] Fix a bug for sub (#65109)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65109

Previously, for sub we set the dtype with the qconfig since it's matched with a QuantizeHandler. However, this is incorrect: the dtype for sub is decided by whether its output is quantized or not, so we added an is_output_quantized check when deciding the dtype for the output of sub.

Later: is_output_quantized now depends on is_reference, which is pretty confusing and may cause problems down the road; we should remove this dependency in the future.

Test Plan:
python test/test_quantization.py TestQuantizeFx.test_sub_scalar

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D30977826

fbshipit-source-id: 551fd63bd61b43b3c3415944ff73174e3a21cc8a
2021-09-20 10:36:09 -07:00
c562ebca23 Revert "Revert D30558877: Ported std/var to ReductionOpInfo (#65262)
Summary:
Reland of https://github.com/pytorch/pytorch/issues/63978

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65262

Reviewed By: mruberry

Differential Revision: D31037360

Pulled By: ngimel

fbshipit-source-id: 1c60f40c547229767cba3bbe7e11ca0fbbc8f95f
2021-09-20 10:36:06 -07:00
fb1e6835cc simplify torch.meshgrid's shape computation (#62905)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62905

Reviewed By: mruberry

Differential Revision: D31021274

Pulled By: dagitses

fbshipit-source-id: c219389bdc543e9592f7b1c707acfbf752ee6f34
2021-09-20 10:34:45 -07:00
cf60d24028 [DataPipe] Unlimited buffer for Forker and Demultiplexer (#64994)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64994

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D30934362

Pulled By: ejguan

fbshipit-source-id: d3b774d7e28c0b9659e999511e5a68c3929857d4
2021-09-20 09:30:39 -07:00
88032d8943 Automated submodule update: FBGEMM (#64640)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).

New submodule commit: d1ecc7dbe2

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64640

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: jspark1105

Differential Revision: D30805660

fbshipit-source-id: 9f783862e89fe3974badd5194ef793db55e7d275
2021-09-18 16:29:30 -07:00
d8189db80f [quant][fx2trt] Generate engine graph for explicit quant/implicit quant and fp16 graph (#65289)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65289

Turn on VERBOSE logging and use engine visualizer to generate the graph.

Runtime:
```
explicit quant result diff max tensor(0.0771)
implicit quant result diff max tensor(0.1909)
trt fp16 time (ms/iter) 1.0740923881530762
trt int8 time (ms/iter) 0.5288887023925781
trt implicit int8 time (ms/iter) 0.6334662437438965
PyTorch time (CUDA) (ms/iter) 4.448361396789551
PyTorch time (CPU) (ms/iter) 45.13296604156494
```

Generated Graphs:
```
explicit int8: https://www.internalfb.com/intern/graphviz/?paste=P458669571
implicit int8: https://www.internalfb.com/intern/graphviz/?paste=P458669656
fp16: https://www.internalfb.com/intern/graphviz/?paste=P458669708
```

Test Plan:
```
buck run mode/opt -c python.package_style=inplace caffe2:fx2trt_quantized_resnet_test 2>log
buck run //deeplearning/trt/fx2trt/tools:engine_layer_visualize -- --log_file log
```

Reviewed By: 842974287

Differential Revision: D30955035

fbshipit-source-id: 24949458ad9823fb026d56d78a6ee1c6874b6034
2021-09-18 13:30:37 -07:00
7f8d622d70 [Static Runtime] Add perf metrics for number of managed tensors & unmanaged values (#64992)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64992

This change lets Static Runtime print out the number of managed tensors & unmanaged values as performance metrics during profile runs.

We will use/enhance these metrics to guide the effort of managing output tensors.

Test Plan:
Confirmed that a profile run prints out the added metric values on inline_cvr nets:
```
(inline_cvr/local)
...
Total number of managed tensors: 2754
Total number of unmanaged values: 3240
...
(inline_cvr/local_ro)
Total number of managed tensors: 1554
Total number of unmanaged values: 2966
...
(inline_cvr/remote_ro)
Total number of managed tensors: 1439
Total number of unmanaged values: 28
...
```

Reviewed By: hlu1

Differential Revision: D30926617

fbshipit-source-id: b86e071003ac941b9663db103eaa7c614466b4e0
2021-09-18 11:26:37 -07:00
4a128ed811 Remove incorrect stride assert in Reduce.cuh (#65227)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/37583

Per discussion with ngimel, the condition asserted here may not always hold after TensorIterator's dimension coalescing and reordering. However, the reduction output should still be correct when `sub_iter.strides(0)[0]` is non-zero.

I've verified correctness empirically by:
1. Lowering the threshold ([configured here](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/TensorIterator.cpp#L1127)) at which iterators are split into sub-iterators, making it easier to trigger.
2. Generating many tensors with random dimensions and randint elements which produce a non-zero `sub_iter.strides(0)[0]` in the CUDA kernel.
3. Verifying that the reduction `t.sum(dim=0)` produces the same results for those tensors on CPU and on CUDA (see the sketch below).
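
A minimal sketch of that check, assuming integer-valued doubles so the sums are exact:

```
import torch

# integer-valued doubles sum exactly, so CPU and CUDA must agree bit-for-bit
t = torch.randint(0, 10, (4096, 7)).double()
assert torch.equal(t.cuda().sum(dim=0).cpu(), t.sum(dim=0))
```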

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65227

Reviewed By: ngimel

Differential Revision: D31031406

Pulled By: saketh-are

fbshipit-source-id: 5cbf2001224454c74f6db42455c507365ad1f2b1
2021-09-18 10:29:13 -07:00
543185a0fd support using gradients named for outputs in derivatives (#63947)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63947

Fixes #62196

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D30541485

Pulled By: dagitses

fbshipit-source-id: ea1dd0edd1a51936a295631e52b85e9c022a9c87
2021-09-18 07:31:45 -07:00
926a3d2e85 clarify implementation of check_grad_usage (#64439)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64439

1) remove unused fully_implemented
2) rename used_grad to uses_grad and make it a boolean
3) rename used_grads to num_grads_uses
4) add comments explaining what some of the checks mean

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D30733904

Pulled By: dagitses

fbshipit-source-id: dccbbef8a4be8713215ef91aa97a34124f06a7a1
2021-09-18 07:30:30 -07:00
d3e36fade2 [quant][fx2trt] Enable comparison with implicit quant mode (#65043)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65043

Currently got following result, will take a look at the executed graph again:
```
trt fp16 time (ms/iter) 1.0483217239379883
trt int8 time (ms/iter) 0.5329632759094238
trt implicit int8 time (ms/iter) 0.6769704818725586
PyTorch time (ms/iter) 6.453146934509277
```

Test Plan:
```
python torch/fx/experimental/fx2trt/example/quantized_resnet_test.py
```

Imported from OSS

Reviewed By: 842974287

Differential Revision: D30954871

fbshipit-source-id: 8d7ff82b8f5d0b7946fbd38a7cddede7d40b28aa
2021-09-17 23:29:35 -07:00
4150b672aa [Codemod][FBSourceBlackLinter] Daily arc lint --take BLACK
Reviewed By: zertosh

Differential Revision: D31039372

fbshipit-source-id: a5e54a9b1d2ef97e9bc206b9e2a82124e5a22a7a
2021-09-17 20:33:12 -07:00
6707dfeefb Remove 9.2 related macros for CONSTEXPR (#65066)
Summary:
Removes C10_HOST_CONSTEXPR_EXCEPT_CUDA92 references in the code

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65066

Reviewed By: driazati

Differential Revision: D31022520

Pulled By: janeyx99

fbshipit-source-id: f02cdc6caba5b48405575242921f5845ff18f729
2021-09-17 17:31:20 -07:00
1cd9018b6f Make github.com in noproxy list (#65256)
Summary:
An attempt to solve some rate-limiting issues we saw when calling GitHub APIs

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65256

Reviewed By: seemethere

Differential Revision: D31035115

Pulled By: zhouzhuojie

fbshipit-source-id: 7efd5d5af7d91805e4bf27b86847791e991b741e
2021-09-17 17:31:18 -07:00
50c29fef3e remove utils.cpp (#65184)
Summary:
Dead code

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65184

Reviewed By: mruberry

Differential Revision: D31031777

Pulled By: ngimel

fbshipit-source-id: 13633888229a7af8cfd8ea7e55ea2880b2e47273
2021-09-17 17:31:15 -07:00
19471c54a6 [fx const fold] fix a case when some inputs are unused (#65223)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65223

If there are unused inputs, they won't appear in `submod_1`. We need to add all the unused inputs so that the model after const fold has the same inputs as the original model.

Reviewed By: jfix71

Differential Revision: D31021217

fbshipit-source-id: b7452c90d133b747e0699936a81d3fee14af9cc9
2021-09-17 17:29:55 -07:00
992dad1855 [Profiler] Update kineto submodule (#65236)
Summary:
Update to latest kineto revision. See Kineto repo for change log.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65236

Reviewed By: malfet

Differential Revision: D31031638

Pulled By: gdankel

fbshipit-source-id: 681655b2e8e151895afa91445ced0fd57a11fa93
2021-09-17 16:26:30 -07:00
4408b755bc [fx2trt] re-enable profiler and some miscs for TRTModule (#65072)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65072

Previously disabled attaching trt profiler to exec context in TRTModule because https://fburl.com/mc33n880 states that `enqueue()` doesn't support profiling. Seems to be a lie though. Re-enable attaching profiler in this diff.

Also added a bunch of checks for dtype and shape, and fixed saving state_dict and loading back.

Test Plan: buck run mode/opt -c python.package_style=inplace -j 40 deeplearning/trt/fx2trt:acc2trt_test

Reviewed By: yinghai

Differential Revision: D30962757

fbshipit-source-id: 9c664b0500a8169b7952f6f912239a5a05772aea
2021-09-17 16:26:28 -07:00
afa25c77f1 [package] Make it possible to re-save a PackageImporter module (#65101)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65101

As title. Previously this was guarded against for implementation
simplicity, as we didn't really think there was a use case for saving a
mangled module name directly.

But people started doing stuff like:
```
exporter.save_module(my_imported_obj.__module__)
```
which implicitly passes along the mangled module name.

This PR makes it so that given `PackageImporter` instance can always
import modules that it created, and changes `PackageExporter` to
properly demangle the resulting module name when writing the package to
the export archive.

Differential Revision: D30975712

Test Plan: Imported from OSS

Pulled By: suo

fbshipit-source-id: d9e849bf651713890e72dccdcef74fa52d377149
2021-09-17 16:25:11 -07:00
487c771593 [FX] Fix tracing of bitwise and/or (#65196)
Summary:
Previously resulted in `AttributeError: module 'operator' has no attribute 'and'`

and/or are python keywords, so they are renamed to `operator.and_` and `operator.or_`
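
A minimal repro sketch of the fixed behavior (module and assertion are illustrative):

```
import operator
import torch
import torch.fx as fx

class M(torch.nn.Module):
    def forward(self, a, b):
        return a & b  # bitwise and: must map to operator.and_, not the keyword "and"

gm = fx.symbolic_trace(M())
assert any(node.target is operator.and_ for node in gm.graph.nodes)
```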

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65196

Reviewed By: Chillee

Differential Revision: D31020336

Pulled By: jansel

fbshipit-source-id: 51d888151fe78c0c1197ecaf161976b219c59694
2021-09-17 14:33:02 -07:00
6596173811 Revert D30731191: [pytorch][PR] Torchhub: rewrite commit hash check to avoid using unnecessary GitHub API credits
Test Plan: revert-hammer

Differential Revision:
D30731191 (f9bf144a0c)

Original commit changeset: d1ee7c2ef259

fbshipit-source-id: 5c7207f66c5354ce7b9ac2594e4f5b8307619b0c
2021-09-17 14:33:00 -07:00
3d32dec5ba [ONNX] Deprecate enable_onnx_checker argument in torch.onnx.export() (#61708) (#64369)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64369

As of now, the "enable_onnx_checker" parameter was described as below:

enable_onnx_checker (bool, default True): If True the ONNX model checker will be run to ensure the exported model is a valid ONNX model.

An invalid ONNX graph is useless to users, so such a check should be done for each call.

In this PR, we still write the model to an ONNX file even if it is invalid, and the exception is thrown after the ONNX file has been created. This lets the user output an invalid ONNX graph for debugging.

This PR still keeps the argument in torch.onnx.export() for backward compatibility, while all backend logic has been changed to behave as if enable_onnx_checker were set to True.

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D30905267

Pulled By: malfet

fbshipit-source-id: 3ad3f68e77fcec012cc7ef674cc9a61755eebc9e

Co-authored-by: fatcat-z <zhang-ji@outlook.com>
2021-09-17 14:31:41 -07:00
ae00075ac7 [Static Runtime] Move MemoryPlanner out into memory_planner.cpp (#65123)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65123

This change re-reverts D30883290 (0e11454d19). D30883290 (0e11454d19) broke the OSS build because it implicitly removed the default move constructor of `StaticRuntime`.

```
ep 15 15:39:57 /var/lib/jenkins/workspace/benchmarks/static_runtime/deep_wide_pt_bench.cc:95:10: error: call to implicitly-deleted copy constructor of 'torch::jit::StaticRuntime'
Sep 15 15:39:57   return torch::jit::StaticRuntime(*smod);
Sep 15 15:39:57          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Sep 15 15:39:57 /var/lib/jenkins/workspace/torch/csrc/jit/runtime/static/impl.h:321:34: note: copy constructor of 'StaticRuntime' is implicitly deleted because field 'planner_' has a deleted copy constructor
Sep 15 15:39:57   std::unique_ptr<MemoryPlanner> planner_;
Sep 15 15:39:57                                  ^
Sep 15 15:39:57 /usr/bin/../lib/gcc/x86_64-linux-gnu/5.4.0/../../../../include/c++/5.4.0/bits/unique_ptr.h:356:7: note: 'unique_ptr' has been explicitly marked deleted here
Sep 15 15:39:57       unique_ptr(const unique_ptr&) = delete;
Sep 15 15:39:57       ^
Sep 15 15:39:57 /var/lib/jenkins/workspace/benchmarks/static_runtime/deep_wide_pt_bench.cc:99:9: error: call to implicitly-deleted copy constructor of 'torch::jit::StaticRuntime'
Sep 15 15:39:57    auto sr = getStaticRuntime();
Sep 15 15:39:57         ^    ~~~~~~~~~~~~~~~~~~
Sep 15 15:39:57 /var/lib/jenkins/workspace/torch/csrc/jit/runtime/static/impl.h:321:34: note: copy constructor of 'StaticRuntime' is implicitly deleted because field 'planner_' has a deleted copy constructor
Sep 15 15:39:57   std::unique_ptr<MemoryPlanner> planner_;
Sep 15 15:39:57                                  ^
Sep 15 15:39:57 /usr/bin/../lib/gcc/x86_64-linux-gnu/5.4.0/../../../../include/c++/5.4.0/bits/unique_ptr.h:356:7: note: 'unique_ptr' has been explicitly marked deleted here
Sep 15 15:39:57       unique_ptr(const unique_ptr&) = delete;
Sep 15 15:39:57       ^
Sep 15 15:39:57 2 errors generated.
```

This change fixes the issue by explicitly defining the default move constructor (courtesy of mikeiovine).

Original Summary:

This change moves `MemoryPlanner` out of impl.cpp into memory_planner.cpp.

`MemoryPlanner` performs an independent sub-task of static analysis of a graph, and creating memory planning, and allocating/deallocating managed Tensors.

This change will reduce merge conflicts as I work on MemoryPlanner more actively for output Tensor support.

Test Plan: - Confirm that OSS build went well (See External Tests section).

Reviewed By: mikeiovine

Differential Revision: D30983292

fbshipit-source-id: a59f407fa1123527824157268111144a1bf58116
2021-09-17 13:32:01 -07:00
eaf85fad62 [PyTorch] Extract parseOperator() into a standalone source file (#65179)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65179

This follows up on this PR: https://github.com/pytorch/pytorch/pull/61862. The purpose is to modularize operator parsing so that it can be used as needed without pulling the whole `import.cpp` into the build.

Test Plan: Added a unit test in `test_lite_predictor.cpp` called `ParseOperators`, similar to `ParseBytecode`.

Reviewed By: iseeyuan

Differential Revision: D31006555

fbshipit-source-id: c38e221800af4cf72963a353c452c5437f56a0ac
2021-09-17 13:31:59 -07:00
35084ee451 [PyTorch] Improve OperatorEntry::getKernelForDispatchKey (#64838)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64838

The returned pointer, if present, could never be nullptr, so there is no reason to wrap it in an optional rather than just using the nullptr state. The repeated calls to kernels_.at() were not getting optimized away, so just use the perfectly good iterator that find() already gave us.
ghstack-source-id: 138304429

Test Plan: CI

Reviewed By: bdhirsh

Differential Revision: D30875748

fbshipit-source-id: 9cbb875715b7a582380c7402155fdbe21944dc85
2021-09-17 13:31:56 -07:00
fcaf526815 avoid moving Argument in infer_schema (#64822)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64822

Turns out the suppressed lint message was trying to tell us something: we can construct our Argument in-place rather than creating a temporary and moving it into the argument vector.
ghstack-source-id: 138304423

Test Plan: CI, profile op registration and observe reduced Argument move ctor and dtor costs

Reviewed By: smessmer

Differential Revision: D30860718

fbshipit-source-id: c8da45ab7e61b5df9fa1273301896309bca108b5
2021-09-17 13:31:54 -07:00
79cbcd3e7c [PyTorch] Fix missing move in Argument ctor (#64821)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64821

Not moving adds excess refcounting overhead.
ghstack-source-id: 138304432

Test Plan: CI

Reviewed By: dhruvbird

Differential Revision: D30860720

fbshipit-source-id: de695e5cdfb1fa314b53a8bcb291343ae4eb87a5
2021-09-17 13:31:51 -07:00
5a3475df21 [PyTorch] shrink Argument (#64820)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64820

Putting boolean fields next to each other avoids wasting space for padding.
ghstack-source-id: 138304433

Test Plan: CI

Reviewed By: dhruvbird

Differential Revision: D30860717

fbshipit-source-id: ad45c37574a7c857958978aad42fd1333c6b29ee
2021-09-17 13:31:48 -07:00
132d65ed25 [PyTorch] Compare pointers before calling expensive Type comparison (#64784)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64784

See code comment for explanation.
ghstack-source-id: 138304431

Test Plan: Reduced overhead in findSchemaDifferences while profiling registration at startup in a case where I forced duplicates to be registered (by looping in RegisterDispatchKey.cpp).

Reviewed By: dhruvbird

Differential Revision: D30854036

fbshipit-source-id: 568733c3cf449697cdeb74cf57fed0926729fa68
2021-09-17 13:31:46 -07:00
cf5c00f155 CI: Consolidate Build and Test naming for better stats collection (#65232)
Summary:
All PyTorch build steps should now be named "Build" and all test steps "Test" for workflows that test PyTorch on Linux and Windows.

I left the binary stuff alone as that build is different.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65232

Reviewed By: driazati, seemethere

Differential Revision: D31024232

Pulled By: janeyx99

fbshipit-source-id: 24b1a1e2b1b25aba70b7adc41603ec8fa4ce7dd6
2021-09-17 13:30:31 -07:00
45bd0f6181 Back out "Revert D30745960: [DDP] Remove SPMD from self.modules_buffers" (#64778)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64778

Original commit changeset: d3f3fb813c45
ghstack-source-id: 138326910

Test Plan: ci

Reviewed By: H-Huang

Differential Revision: D30849443

fbshipit-source-id: 15dab8a959a29d2e2fefac6ad52b8d8168eacc02
2021-09-17 12:28:36 -07:00
70f286c1e2 Back out "Revert D30745961: [DDP] Remove self.modules_params" (#64777)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64777

Original commit changeset: 59f7cc50d369
ghstack-source-id: 138326909

Test Plan: ci

Reviewed By: H-Huang

Differential Revision: D30849442

fbshipit-source-id: bb87ba83935374d8a3ebbc29365df1417dd4f26f
2021-09-17 12:28:34 -07:00
61dfcbf4bc Back out "Revert D30745921: [DDP] Fix when buffers are reassigned in module" (#64776)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64776

Original commit changeset: 343ead86bf1e
ghstack-source-id: 138326914

Test Plan: ci

Reviewed By: H-Huang

Differential Revision: D30849444

fbshipit-source-id: 9a72805416fe7d6c68e51bdcdb88f6e1fecb614d
2021-09-17 12:28:32 -07:00
cce5381238 [xplat][pytorch]: Disabling excessive logging. (#65170)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65170

Disabling excessive logging. These are per-frame log statements
that output lots of logs to the Skylight command line.

Test Plan:
```
cd ~/fbsource
buck build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 //xplat/caffe2:pt_vulkan_api_test_binAndroid\#android-arm64 --show-output
adb push buck-out/gen/xplat/caffe2/pt_vulkan_api_test_binAndroid\#android-arm64 /data/local/tmp/vulkan_api_test
adb shell "/data/local/tmp/vulkan_api_test"
cd -
```

Reviewed By: SS-JIA

Differential Revision: D30778852

fbshipit-source-id: bcf75ec417dfe3e9ce3df92a1894352772bd663d
2021-09-17 12:28:30 -07:00
047e68235f delegate parallelism to Ninja when possible (#64733)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64733

The previous implementation was wrong when CPU scheduling affinity is
set. In fact, it is still wrong if Ninja is not being used.

When there is CPU scheduling affinity set, the number of processors
available on the system likely exceeds the number of processors that
are usable to the build. We ought to use
`len(os.sched_getaffinity(0))` to determine the effective parallelism.

This change is more minimal and instead just delegates to Ninja (which
handles this correctly) when it is used.
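
For reference, a small sketch of the affinity-aware count mentioned above (standard library only; `os.sched_getaffinity` is Linux-specific):

```python
import os

def effective_parallelism() -> int:
    # CPUs this process may actually run on (respects taskset/cgroup
    # affinity), rather than the total CPUs present on the machine.
    try:
        return len(os.sched_getaffinity(0))
    except AttributeError:
        # sched_getaffinity is not available on all platforms
        return os.cpu_count() or 1
```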

Test Plan:
I verified this worked correctly using Ninja on a 96-core machine
with 24 cores available for scheduling by checking:
 * the cmake command did not specify "-j"
 * the number of top-level jobs in top/pstree never exceeded 26 (24 +
   2)

And I verified we get the legacy behavior by specifying USE_NINJA=0 on
the build.

Reviewed By: jbschlosser, driazati

Differential Revision: D30968796

Pulled By: dagitses

fbshipit-source-id: 29547dd378fea793957bcc2f7d52d5def1ecace2
2021-09-17 12:28:28 -07:00
b936a10074 add test for number of jobs when building (#65162)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65162

Test Plan: Imported from OSS

Reviewed By: driazati

Differential Revision: D30998006

Pulled By: dagitses

fbshipit-source-id: 8b8d45668acf0e6c0f16df0f705a1af8c6d4f22d
2021-09-17 12:28:25 -07:00
1ee66a5278 Remove CUDA 9.2 references conditionals and workarounds (#65070)
Summary:
Title says it all

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65070

Reviewed By: malfet

Differential Revision: D30966464

Pulled By: janeyx99

fbshipit-source-id: e454906fd5d7d321d390939ba5d237e1d9b150f8
2021-09-17 12:28:23 -07:00
51e12f0071 fix torch.distributed.elastic event docs (#64974)
Summary:
the example code wasn't working for me.

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang cbalioglu gcramer23

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64974

Reviewed By: kiukchung, cbalioglu

Differential Revision: D30926481

Pulled By: edward-io

fbshipit-source-id: f5e32cc2b948b6ee30d84a8247856f39fc786f67
2021-09-17 12:27:09 -07:00
bbe25af0df [nnc] Updated inlining to handle cases when producer indices are constants after eval (#65044)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65044

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D30954655

Pulled By: navahgar

fbshipit-source-id: dfaedb5af710b2625ceec3a443a6c4e34158ab16
2021-09-17 11:28:48 -07:00
03fc636d5c [nnc] Updated inliner to remove assertions and exception (#64719)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64719

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D30828583

Pulled By: navahgar

fbshipit-source-id: 9826a59085a210e44d101a843ff2cae440dfd633
2021-09-17 11:28:46 -07:00
340531f2e0 [ONNX] Do not use numpy in ONNX opsets (#65188)
Summary:
Replace `torch.tensor([numpy.arange(a, b, c)])` with `torch.arange(a, b, c).unsqueeze(0)`
Replace `tuple(numpy.add(a, b))` with `tuple(x + y for (x, y) in zip(a, b))`

As `numpy` is an optional dependency, it shouldn't be used in PyTorch core by default
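
The replacements are straightforward to check by hand (a sketch; dtypes can differ between the numpy and pure-torch spellings in edge cases):

```python
import torch

a, b, c = 0, 10, 2
t = torch.arange(a, b, c).unsqueeze(0)        # numpy-free torch.tensor([np.arange(a, b, c)])

lhs, rhs = (1, 2, 3), (10, 20, 30)
s = tuple(x + y for (x, y) in zip(lhs, rhs))  # numpy-free tuple(np.add(lhs, rhs))
print(t, s)
```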

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65188

Reviewed By: mruberry

Differential Revision: D31009490

Pulled By: malfet

fbshipit-source-id: 528e48f055bf9ac1de1fd7e94c0be41915df9a0b
2021-09-17 11:28:44 -07:00
7ced25eee3 [CoreML][OSS] Include Core ML in iOS/MacOS nightlies (#65075)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65075

Need to drop one line at - https://github.com/pytorch/builder/blob/master/conda/pytorch-nightly/meta.yaml#L65
ghstack-source-id: 138324213

Test Plan:
- Check the iOS nightly builds
  - `pod install LibTorch-Lite-Nightly`

Reviewed By: hanton

Differential Revision: D30912269

fbshipit-source-id: b07679b75ecf89beae2975c37cf17d2449a3304f
2021-09-17 11:27:20 -07:00
f9c0a39ad9 add a test case for const fold (#65224)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65224

Add a test case for the fix D30996277 (8c38d141df).

Test Plan: buck test mode/opt -c python.package_style=inplace -c fbcode.nvcc_arch=v100,a100 -c fbcode.enable_gpu_sections=true -j 40 caffe2/test:fx_const_fold -- test_const_fold_module_attr

Reviewed By: jfix71

Differential Revision: D31000386

fbshipit-source-id: f444361839decc583bf93ac946cfe2049376719e
2021-09-17 10:32:07 -07:00
3c003aa6ae [PyTorchEdge] promote prim ops by using ops table for mobile runtime (#64816)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64816

## Context:
Promoting prim ops:
Certain prim ops are more frequent than others (like tupleIndex, raiseException, ...). These ops are so frequent that they have been chosen for promotion to first-class instructions. Promoting them requires multiple steps and support from the TS team, as it changes how the bytecode is serialized and deserialized. So, to prevent multiple bytecode version bumps and to provide stability while these changes happen, an interim iterative process is proposed which uses a table to look up a "promoted" op's function. This allows us to rapidly update the ops list and test on production models without having to change the bytecode. In case of failure, we can quickly revert this change.

## Observation
The ops are chosen based on the notebook N1135657 which examines the top frequent ops.

## Fix
An interim solution: a static table which, given a prim op name, returns the function to be applied on the stack. This lets `function.cpp` look up the "promoted" op. As a fallback, the "promoted" op still resides in `register_prim_ops.cpp` so that the prim op's function is never missed.
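
A minimal Python sketch of the table idea (the real table is C++ in `function.cpp`; the names here are illustrative only):

```python
def tuple_index(stack):
    # Pops (tuple, index) off the interpreter stack and pushes tuple[index].
    index = stack.pop()
    tup = stack.pop()
    stack.append(tup[index])

PROMOTED_PRIM_OPS = {"prim::TupleIndex": tuple_index}

def lookup_op(name, fallback_registry):
    # Check the promoted table first; fall back to the regular registry
    # so the op's function is never missed.
    return PROMOTED_PRIM_OPS.get(name) or fallback_registry[name]
```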

ghstack-source-id: 138261338

Test Plan:
```
[pavithran@67109.od ~/fbsource/fbcode (eddab7da6)]$ buck test caffe2/test/cpp/jit:jit -- BackendTest.TestComposite
Building: finished in 5.4 sec (100%) 7284/7284 jobs, 0/7284 updated
  Total time: 5.8 sec
More details at https://www.internalfb.com/intern/buck/build/480191aa-a1ba-42ca-99e9-ee4bf2b06d65
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: 867382eb-327f-43d7-a45c-875b7f484b15
Trace available for this run at /tmp/tpx-20210914-100224.283682/trace.log
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/844425134506115
    ✓ ListingSuccess: caffe2/test/cpp/jit:jit - main (12.159)
    ✓ Pass: caffe2/test/cpp/jit:jit - BackendTest.TestCompositeWithSetStates (0.797)
    ✓ Pass: caffe2/test/cpp/jit:jit - BackendTest.TestComposite (0.779)
Summary
  Pass: 2
  ListingSuccess: 1
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/844425134506115
```

{F663491347}

Reviewed By: iseeyuan

Differential Revision: D30819926

fbshipit-source-id: 4cbe05d5761bdc9d62ef08e18172dcf64cb49526
2021-09-17 10:32:05 -07:00
ecfc784e67 Revert D30993855: [pytorch][PR] OpInfo: nn.functional.conv2d
Test Plan: revert-hammer

Differential Revision:
D30993855 (873255c6d9)

Original commit changeset: 7402f99addb4

fbshipit-source-id: b0539daa195dc6a3739bce5c264cb2177b7721ff
2021-09-17 10:32:02 -07:00
18fa58c4e9 [CoreML][OSS] Integrate with CMake (#64523)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64523

- Build PyTorch with the CoreML delegate - `USE_PYTORCH_METAL=ON python setup.py install --cmake`
- Build iOS static libs - `IOS_PLATFORM=SIMULATOR USE_COREML_DELEGATE=1  ./scripts/build_ios.sh`
ghstack-source-id: 138324216

Test Plan:
- Test the Helloword example

{F657778559}

Reviewed By: iseeyuan

Differential Revision: D30594041

fbshipit-source-id: 8cece0b2d4b3ef82d3ef4da8c1054919148beb16
2021-09-17 10:32:00 -07:00
c1415a0a72 [Reland] [Model Averaging] Simplify PostLocalSGD Optimizer API (#65197)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65197

1. The constructor accepts a local optimizer instance instead of the inputs of local optimizer constructor and the class type.
2. The parameters are read from local optimizer's param_groups instead of a separate input.

Proposal: https://github.com/pytorch/pytorch/issues/59699
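
A rough sketch of the simplified construction described in (1) and (2); the import paths and names follow the proposal and may not match the final API exactly:

```python
import torch
import torch.nn as nn
# assumed module paths; treat as illustrative
from torch.distributed.optim import PostLocalSGDOptimizer
from torch.distributed.algorithms.model_averaging.averagers import PeriodicModelAverager

model = nn.Linear(8, 2)
local_opt = torch.optim.SGD(model.parameters(), lr=0.1)
averager = PeriodicModelAverager(period=4, warmup_steps=100)

# Pass a constructed local optimizer instance; parameters are read from
# its param_groups rather than supplied as a separate argument.
opt = PostLocalSGDOptimizer(optim=local_opt, averager=averager)
```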
ghstack-source-id: 138307226

Test Plan: buck test mode/dev-nosan //caffe2/test/distributed:distributed_nccl_spawn -- test_post_localSGD_optimizer_parity

Reviewed By: rohan-varma

Differential Revision: D31007439

fbshipit-source-id: bbb0526e6763ef76775b85088571506b3942c722
2021-09-17 10:31:58 -07:00
752a820230 Bf16 matmul (#64619)
Summary:
Re-create PR to fix https://github.com/pytorch/pytorch/pull/61891.

Drop the support for addbmm.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64619

Reviewed By: jbschlosser

Differential Revision: D30902995

Pulled By: VitalyFedyunin

fbshipit-source-id: dc318d73adff8f6974c9752d0d097e69276f8206
2021-09-17 10:31:56 -07:00
f9bf144a0c Torchhub: rewrite commit hash check to avoid using unnecessary GitHub API credits (#64362)
Summary:
This PR adds more detailed error messages to torchhub if the commit hash validation goes wrong, providing suggestions to the users on how to resolve the issue.

It also documents why such validation is important.

EDIT: it also avoids validating some stuff when we know "stuff" isn't a commit, since there's no risk in this case

CC malfet mthrok

cc nairbv NicolasHug

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64362

Reviewed By: gchanan, malfet

Differential Revision: D30731191

Pulled By: NicolasHug

fbshipit-source-id: d1ee7c2ef2591dd7a5291977af1635ada2552d1b
2021-09-17 10:30:39 -07:00
0559cb37cd [FX] Ensure BC coverage for all of torch.fx.passes (#65081)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65081

Test Plan: Imported from OSS

Reviewed By: jbschlosser, khabinov

Differential Revision: D30967428

Pulled By: jamesr66a

fbshipit-source-id: 2ff83da728dc469f086cf504e71b43396db612d8
2021-09-17 09:32:43 -07:00
cf7409e184 [FX] Move graph_manipulation and param_fetch out of experimental and into passes (#65183)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65183

ghstack-source-id: 138309655

Test Plan: waitforsadcastle

Reviewed By: protonu

Differential Revision: D31007630

fbshipit-source-id: 77d14b284737aabbe2b9e6394177a0c2e40aafba
2021-09-17 09:32:40 -07:00
6aa04b6843 [fx2trt] make gpu trace better (#65168)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65168

Add record_function to TRTModule and EngineHolder so each part appears on the GPU trace.

Test Plan: CI

Reviewed By: wushirong

Differential Revision: D30997968

fbshipit-source-id: b90662f20a8c0d321846c222f3e8c8eb7e010eba
2021-09-17 09:32:37 -07:00
a8d7b885c5 [CoreML][iOS/MacOS] Add the CoreML executor (#64522)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64522

The `PTMCoreMLExecutor` serves as a bridge between the delegate APIs and Core ML runtime.
ghstack-source-id: 138324217

allow-large-files

Test Plan:
iOS:
Run the CoreML tests in the playground app

MacOS:

```
buck test pp-macos

PASS     633ms  1 Passed   0 Skipped   0 Failed   CoreMLTests
```

{F657776101}

Reviewed By: raziel, iseeyuan

Differential Revision: D30594042

fbshipit-source-id: a42a5307a24c2f364333829f3a84f7b9a51e1b3e
2021-09-17 09:32:34 -07:00
aafeea3a6c Allow extra unused arguments in symbolic shape function (#65095)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65095

The reason I didn't do this initially was that I was worried that matching one schema to another schema with an extra argument might change semantics, e.g. Add(Tensor, Tensor) to Add(Tensor, Tensor, Tensor) might be different. However, we don't actually need to worry about this because the graph schema isn't used for node matching, unlike in symbolic_script.cpp.

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D30972081

Pulled By: eellison

fbshipit-source-id: d4089e8feafc330df2ca158866fe779a7da0b073
2021-09-17 09:31:02 -07:00
6eafe7f15e Actually deprecate __torch_function__ as plain methods (#64843)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64843

Fix for https://github.com/pytorch/pytorch/issues/63767

Test Plan: Imported from OSS

Reviewed By: heitorschueroff

Differential Revision: D30991425

Pulled By: albanD

fbshipit-source-id: 1214143b8aea87e6ff406c7fc13096bd15d1a768
2021-09-17 08:32:53 -07:00
1ed9c33d08 Update fx proxy to use classmethod for __torch_function__ (#64842)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64842

Change `__torch_function__` to follow the best-practice guideline of using classmethods.
I am not sure how to handle the case where multiple tracer objects are given as input, but given that we were previously getting an arbitrary tracer based on the "self" that was arbitrarily chosen by the torch_function caller, the new implementation is no worse.
Let me know what you think!
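
For context, the classmethod form this moves toward looks roughly like this (a minimal sketch for a `torch.Tensor` subclass):

```python
import torch

class LoggingSubclass(torch.Tensor):
    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        # Dispatch is keyed on the class, not on an arbitrary "self" instance.
        print(f"called: {func}")
        return super().__torch_function__(func, types, args, kwargs or {})

x = torch.randn(3).as_subclass(LoggingSubclass)
y = x + 1  # prints the intercepted function before computing
```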

Test Plan: Imported from OSS

Reviewed By: heitorschueroff

Differential Revision: D30991423

Pulled By: albanD

fbshipit-source-id: d28940df230b543952b278a0eb2d61cf7ae123ce
2021-09-17 08:32:51 -07:00
473e55d5b2 Use classmethods for overrides (#64841)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64841

Test Plan: Imported from OSS

Reviewed By: heitorschueroff

Differential Revision: D30991424

Pulled By: albanD

fbshipit-source-id: 551e2119768f3a4292713f3bfa83930f5506adbd
2021-09-17 08:32:49 -07:00
a95fabfecb Fix port allocation race condition for elastic test (#65149)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65149

Fixes #64789

There is a race condition between when the free port is acquired and when it is used to create the store, during which the port may have been taken. Since this test only tests that the timeout is triggered for TCPStore, we can bind to any port on TCPStore creation.

This only affects the test on the server (since that is where the port is used), but I changed both tests for clarity.
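
The general pattern for avoiding this class of race (independent of TCPStore specifics) is to bind to port 0 and let the OS choose atomically:

```python
import socket

# Binding to port 0 makes the OS assign a free port atomically, closing the
# window between "find a free port" and "bind it" in which another process
# could grab the port.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.bind(("localhost", 0))
port = sock.getsockname()[1]  # the port that was actually assigned
```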

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang cbalioglu gcramer23

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D30993166

Pulled By: H-Huang

fbshipit-source-id: eac4f28d641ac87c4ebee89df83f90955144f2f1
2021-09-17 08:32:47 -07:00
f101070587 Small improvements to compare_models_torch binary (#65171)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65171

Add the model comparison binary to BUCK, and also add some quality of life features such as controlling the input range.

Test Plan:
```
# Build the binary
cd ~/fbsource
buck build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 //xplat/caffe2:ptmobile_compareAndroid\#android-arm64 --show-output
# Push it to the device
adb push buck-out/gen/xplat/caffe2/ptmobile_compareAndroid\#android-arm64 /data/local/tmp/compare_models

# Run the benchmark binary
BENCH_CMD="/data/local/tmp/compare_models"
BENCH_CMD+=" --model=$PATH_TO_MODEL"
BENCH_CMD+=" --refmodel=$PATH_TO_REFERENCE_MODEL"
BENCH_CMD+=" --input_type=float --input_dims=$MODEL_INPUT_SIZE"
BENCH_CMD+=" --iter=100"
BENCH_CMD+=" --tolerance 1e-5"

```

Reviewed By: beback4u

Differential Revision: D30371322

fbshipit-source-id: 5e520aaf119c90985a1d5a135f76e4057148333b
2021-09-17 08:32:45 -07:00
9601deb1b3 Disable autograd fallback tests on Windows (#65147)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65147

I think they trigger an MSVC bug per https://github.com/pytorch/pytorch/issues/48763
ghstack-source-id: 138247203

Test Plan: breakpointed https://www.internalfb.com/intern/sandcastle/job/9007199738584981/ and sush'ed into the host and ran `buck build arvr/mode/win/opt //xplat/caffe2:autograd_libtorch_test_ovrsource` in `/cygdrive/d/ovrsource-null-hg`

Reviewed By: soulitzer

Differential Revision: D30992685

fbshipit-source-id: 06c6fb2c18d55490f89fc91ee5b7a4c5a7faf1c6
2021-09-17 08:32:43 -07:00
aaffcfe9cd implement "xy" indexing for torch.meshgrid (#62724)
Summary:
This is step 4/7 of https://github.com/pytorch/pytorch/issues/50276. This allows the use of `"xy"` indexing but doesn't change any defaults.
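
A quick illustration of the two modes with 1-D inputs of lengths 3 and 2:

```python
import torch

x = torch.tensor([1, 2, 3])
y = torch.tensor([4, 5])

ii, jj = torch.meshgrid(x, y, indexing="ij")  # matrix indexing: shapes (3, 2)
xx, yy = torch.meshgrid(x, y, indexing="xy")  # Cartesian indexing: shapes (2, 3)
```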

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62724

Reviewed By: heitorschueroff

Differential Revision: D30995290

Pulled By: dagitses

fbshipit-source-id: 08a6a6144b20bc019f68bc3c52e3bbf967976d8f
2021-09-17 08:31:17 -07:00
d37c02be08 Allow parametrization to be nested (#65167)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/65163

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65167

Reviewed By: jbschlosser

Differential Revision: D31002318

Pulled By: albanD

fbshipit-source-id: b1f1c6c9efa9e83af9789ed13efc133f777f418e
2021-09-17 07:29:01 -07:00
9157a2889f Pass GITHUB_TOKEN to linux CI jobs and avoid skipping torchhub tests (#64807)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/64760

This should hopefully put the torchhub tests back.

This also avoids skipping the torchhub tests: currently the tests are skipped if they fail, which pretty much defeats the purpose of having a test in the first place since we're never notified when they do fail.

cc ezyang seemethere malfet lg20987 pytorch/pytorch-dev-infra nairbv NicolasHug

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64807

Reviewed By: seemethere

Differential Revision: D30994585

Pulled By: NicolasHug

fbshipit-source-id: 561782c22462b5cfec99cca153eb59623db5660a
2021-09-17 03:30:56 -07:00
7dc3858deb [CoreML][fbcode] Add the preprocess python APIs (#64521)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64521

Add the preprocess part for the CoreML delegate. Check out `example.py` for usage.
ghstack-source-id: 138324214

Test Plan:
```
(base) [taox@devvm2780.vll0 ~/fbsource/fbcode/caffe2/fb]  buck run coreml:example -- --model="/home/taox/mobilenetv2/mobilenetv2.pt" --out="/home/taox/mobilenetv2/mobilenetv2_coreml.pt"
Parsing buck files: finished in 0.5 sec
Downloaded 0/1 artifacts, 0.00 bytes, 100.0% cache miss (for updated rules)
Building: finished in 10.6 sec (100%) 12611/57623 jobs, 1/57623 updated
  Total time: 11.1 sec
Converting Frontend ==> MIL Ops: 100%|██████████████████████████████████████████▉| 382/383 [00:00<00:00, 692.58 ops/s]
Running MIL optimization passes: 100%|███████████████████████████████████████████| 18/18 [00:00<00:00, 45.55 passes/s]
Translating MIL ==> MLModel Ops: 100%|███████████████████████████████████████████| 704/704 [00:01<00:00, 468.56 ops/s]
input {
  name: "input_0"
  type {
    multiArrayType {
      shape: 1
      shape: 3
      shape: 224
      shape: 224
      dataType: FLOAT32
    }
  }
}
output {
  name: "645"
  type {
    multiArrayType {
      dataType: FLOAT32
    }
  }
}
metadata {
  userDefined {
    key: "com.github.apple.coremltools.source"
    value: "torch==1.10.0a0+fb"
  }
  userDefined {
    key: "com.github.apple.coremltools.version"
    value: "4.1"
  }
}

{'inputs': '[["input_0", "0", "[1, 3, 224, 224]"]]', 'outputs': '[["645", "0", "[1, 1000]"]]', 'config': '{"spec_ver": "4", "backend": "cpu", "allow_low_precision": "True"}', 'metadata': '{"coremltool_ver": "4.1", "torch_ver": "torch==1.10.0a0+fb"}'}
WARNING: Logging before InitGoogleLogging() is written to STDERR
W0826 13:27:12.690302 2477051 backend_detail.cpp:376] Warning: Backend [coreml] is not available. Execution of this Module is still possible by saving and loading on a device where the backend is available. (function codegen_backend_module)
graph(%self.1 : torch.jit.LoweredModule.coreml.__torch__.torchvision.models.mobilenetv2.MobileNetV2,
      %x.1 : Tensor):
  %51 : str = prim::Constant[value="Exception: Backend is not available."]()
  %50 : str = prim::Constant[value="AssertionError: "]()
  %14 : str = prim::Constant[value="forward"]() # <string>:5:62
  %48 : Tensor = prim::Uninitialized()
  %44 : Tensor = prim::Uninitialized()
  %typed_inputs.1 : Any[] = prim::ListConstruct(%x.1)
  %__backend.3 : __torch__.torch.classes.__backends__.coreml = prim::GetAttr[name="__backend"](%self.1)
  %8 : bool = prim::CallMethod[name="is_available"](%__backend.3) # <string>:4:19
  %49 : Tensor = prim::If(%8) # <string>:4:16
    block0():
      %__backend : __torch__.torch.classes.__backends__.coreml = prim::GetAttr[name="__backend"](%self.1)
      %__handles : Dict(str, Any) = prim::GetAttr[name="__handles"](%self.1)
      %15 : Any = aten::__getitem__(%__handles, %14) # <string>:5:47
      %17 : Any[] = prim::CallMethod[name="execute"](%__backend, %15, %typed_inputs.1) # <string>:5:24
      %18 : Any = prim::ListUnpack(%17)
      %20 : bool = prim::isinstance[types=[Tensor]](%18)
      %39 : Tensor = prim::If(%20) # <string>:6:18
        block0():
          %22 : Tensor = prim::unchecked_cast(%18)
          -> (%22)
        block1():
           = prim::RaiseException(%50) # <string>:6:18
          -> (%44)
      -> (%39)
    block1():
       = prim::RaiseException(%51) # <string>:9:18
      -> (%48)
  return (%49)

```

Reviewed By: raziel

Differential Revision: D30585154

fbshipit-source-id: 66c7d2e931be6eaa3c43a0ee131ea8046452449d
2021-09-17 00:25:14 -07:00
8241193d76 [Static Runtime] Introduce static_runtime::dict_unpack (#64771)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64771

Test Plan:
- Added `StaticRuntime.RemoveImmutableInputDictLookupsWithImmutableInputDict`
- Added `StaticRuntime.RemoveImmutableInputDictLookupsWithMutableInputDict`
- TBD: Perf impact measurement

Reviewed By: mikeiovine

Differential Revision: D30685083

fbshipit-source-id: 050a92ef3b3ed0fdc0ab7a13a4b5dbfede9342a9
2021-09-16 23:25:13 -07:00
e6c39a521b [ONNX] Update submodule to 1.10.1 (#63716) (#64576)
Summary:
Stack from [ghstack](https://github.com/ezyang/ghstack):
* **https://github.com/pytorch/pytorch/issues/64576 [ONNX] Update submodule to 1.10.1 (https://github.com/pytorch/pytorch/issues/63716)**

* [ONNX] Update IR version to 7

* [ONNX] update submodule to 1.10.1

* Disable some tests in caffe2 that fail b/c caffe2 doesn't support the
  new ops.
* Update Bazel file.

* Update expect files for new ONNX IR version

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64576

Reviewed By: jansel

Differential Revision: D31006896

Pulled By: msaroufim

fbshipit-source-id: f3bf97709f23a5a2cd49c708e7363231f2c1961a
2021-09-16 22:29:54 -07:00
9117eed6ed [FX] Add torch.ops.profiler._record_function_{enter,exit} as stateful ops for DCE (#65180)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65180

Test Plan: Imported from OSS

Reviewed By: jansel

Differential Revision: D31007115

Pulled By: jamesr66a

fbshipit-source-id: 823b15db712a382a4f2a4fd409983d47bc067150
2021-09-16 21:31:54 -07:00
02dec91212 [quant] AO migration of the torch/quantization/utils.py (phase 1) (#64919)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64919

AO Team is migrating the existing torch.quantization into torch.ao.quantization. We are doing it one file at a time to make sure that the internal callsites are updated properly. This migrates the quantization utilities.
ghstack-source-id: 138303325

Test Plan: `buck test mode/dev //caffe2/test:quantization`

Reviewed By: jerryzh168

Differential Revision: D30899082

fbshipit-source-id: 85eb38c419e417147e71758b682cd095308dd0c9
2021-09-16 21:30:18 -07:00
64641eaee6 [acc_utils] Add print_model_info (#65045)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65045

This is a useful tool for printing out all of the ops that are found in a model after acc_tracer. It assumes the provided model has no `call_module` or `call_method` nodes, which is generally reasonable for a model that has been successfully traced by the acc_tracer.
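
In spirit, the utility amounts to counting node kinds in the traced `torch.fx.GraphModule`; a hypothetical re-implementation sketch:

```python
from collections import Counter

import torch.fx

def print_model_info(gm: torch.fx.GraphModule) -> None:
    # Hypothetical sketch: tally each op kind seen in the traced graph.
    counts = Counter()
    for node in gm.graph.nodes:
        if node.op in ("placeholder", "get_attr", "output"):
            counts[node.op] += 1
        else:
            # after acc_tracer these are call_function nodes, e.g. acc_ops.linear
            counts[f"{node.target.__module__}.{node.target.__name__}"] += 1
    print("Model Info:")
    for name, n in sorted(counts.items()):
        print(f"> {name}: {n}")
```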

Test Plan:
Tested locally. Sample output:
```
Model Info:
> placeholder: 1184
> get_attr: 655
> output: 2
> torch.fx.experimental.fx_acc.acc_ops.add: 2
> torch.fx.experimental.fx_acc.acc_ops.cat: 23
> torch.fx.experimental.fx_acc.acc_ops.embedding_bag: 576
> torch.fx.experimental.fx_acc.acc_ops.layer_norm: 15
> torch.fx.experimental.fx_acc.acc_ops.linear: 27
> torch.fx.experimental.fx_acc.acc_ops.matmul: 3
> torch.fx.experimental.fx_acc.acc_ops.mul: 17
> torch.fx.experimental.fx_acc.acc_ops.permute: 2
> torch.fx.experimental.fx_acc.acc_ops.reshape: 419
> torch.fx.experimental.fx_acc.acc_ops.sigmoid: 16
> torch.fx.experimental.fx_acc.acc_ops.slice_tensor: 630
> torch.fx.experimental.fx_acc.acc_ops.sum: 4
> torch.fx.experimental.fx_acc.acc_ops.tanh: 315
```

Reviewed By: 842974287

Differential Revision: D30954829

fbshipit-source-id: 5c4f0770667b72859b74099d9f4575284fc48bd2
2021-09-16 20:29:22 -07:00
8c38d141df Add back the owning_module fix (#65159)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65159

This was a legit fix originally introduced in D30905949 (446d95a7f6). But we hesitated and removed it for some reason. Putting it back.

Reviewed By: 842974287

Differential Revision: D30996277

fbshipit-source-id: 3f5eede11dba2072e7cd5ae6ca7ac81d55fb75fa
2021-09-16 19:29:56 -07:00
c886406ce0 Add dropout shape inference as no-op in acc_tracer (#65113)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65113

Register dropout as no-op in acc_tracer & Add shape inference for no-op

Test Plan:
buck test glow/fb/fx/acc_tracer:test_acc_shape_inference -- test_unary_15_dropout_no_op
buck test glow/fb/fx/oss_acc_tracer:test_acc_tracer -- test_dropout

Reviewed By: jfix71

Differential Revision: D30880679

fbshipit-source-id: 592fe50e17137c94c12727658191dedf08daf8cf
2021-09-16 18:26:55 -07:00
6f120ada50 Pin SciPy to 1.6.2 on Windows (#65017)
Summary:
Re-enable previously disabled test_distributions

Note: conda does not have SciPy-1.6.3, only 1.6.2

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65017

Reviewed By: seemethere

Differential Revision: D31003199

Pulled By: malfet

fbshipit-source-id: 96b9d2a833f703008bb1f4df9361db8ec6f8ccc6
2021-09-16 18:25:43 -07:00
0a5149019f Added logging for the Reducer's non-member functions. (#65023)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65023

Added an optional logger parameter for the non-member functions `compute_bucket_assignment_by_size` and `verify_replica0_across_processes`. If a logger is provided, `TORCH_CHECK` assertions are replaced with a wrapper that logs the error to the DDP reducer's logger before calling `TORCH_CHECK`. If a logger is not provided, `TORCH_CHECK` is still called.

Modified python-side calls to `_compute_bucket_assignment_by_size` and `_verify_model_across_ranks` to include a logger whenever possible. A notable exception is when these non-member functions are called in DDP's constructor - we cannot pass in a logger there, as it may not have been initialized yet.

We also added 4 new tests: `test_compute_bucket_assignment_by_size_sparse_error_{with, without}_logger`, which test the `_compute_bucket_assignment_by_size` function to ensure that sparse tensors are rejected and the errors are logged, and `test_verify_model_across_rank_{with, without}_logger`, which call `_verify_model_across_ranks` to ensure that ill-formed models (where a rank has a different number of parameters compared to rank 0) are rejected and the errors are logged. The test `test_ddp_model_diff_across_ranks` remains unchanged - while it does construct an ill-formed DDP instance which triggers the error in `_verify_model_across_ranks`, we cannot check the logger because this error appears in the constructor.

Lastly, did some cleanup of the `test_ddp_model_diff_across_ranks` function to make the logic of choosing which context manager and error message to use cleaner.
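
The log-then-check pattern described above, sketched generically in Python (the real code wraps C++ `TORCH_CHECK`; the logger method name here is hypothetical):

```python
def check_with_optional_logger(condition: bool, msg: str, logger=None) -> None:
    # Log through the reducer's logger when one is available, then fail the
    # check either way; with no logger this degenerates to a plain check.
    if not condition:
        if logger is not None:
            logger.set_error_and_log(msg)  # hypothetical method name
        raise RuntimeError(msg)
```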

Test Plan:
**Build commands**
`buck build mode/dev-nosan //caffe2/test/distributed:distributed_nccl_spawn --keep-going`

`buck build mode/dev-nosan //caffe2/test/distributed:distributed_gloo_spawn --keep-going`

**Test commands**
Test for `_compute_bucket_assignment_by_size` (Python)/ `compute_bucket_assignment_by_size` (C++)
`BACKEND={nccl, gloo} WORLD_SIZE=2 ../buck-out/dev/gen/caffe2/test/distributed/distributed_{nccl, gloo}_spawn#binary.par -r test_compute_bucket_assignment_by_size_sparse_error_{with, without}_logger`

Test for `_verify_model_across_ranks` (Python)/`verify_replicas0_across_process` (C++)
`BACKEND={nccl, gloo} WORLD_SIZE=2 ../buck-out/dev/gen/caffe2/test/distributed/distributed_{nccl, gloo}_spawn#binary.par -r test_verify_model_across_ranks_{with, without}_logger`

Test that constructs an ill-formed DDP instance. Only did cleanup of this function.
`BACKEND={nccl, gloo} WORLD_SIZE=2 ../buck-out/dev/gen/caffe2/test/distributed/distributed_{nccl, gloo}_spawn#binary.par -r test_ddp_model_diff_across_ranks`

Reviewed By: rohan-varma

Differential Revision: D30924790

fbshipit-source-id: dae6fa82485a204a6a4b022f2d073417d07ebb2f
2021-09-16 16:39:39 -07:00
873255c6d9 OpInfo: nn.functional.conv2d (#63517)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/54261

Reference: https://github.com/facebookresearch/functorch/issues/78

Mostly inspired from https://github.com/pytorch/pytorch/issues/62882

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63517

Reviewed By: heitorschueroff

Differential Revision: D30993855

Pulled By: zou3519

fbshipit-source-id: 7402f99addb4ef8f19c2ce1a09ed9006e737cc7e
2021-09-16 14:27:36 -07:00
4c4c03124b Remove old references to 9.2 in documentation (#65059)
Summary:
Removes references in .rst and README.md and comments in the Dockerfile

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65059

Reviewed By: malfet

Differential Revision: D30961110

Pulled By: janeyx99

fbshipit-source-id: 702a9a81bf08125ec4ac38bc656fc2c128c30018
2021-09-16 13:24:05 -07:00
4c15f8e8b4 Provide function interface for remove_duplicate_output_args (#65134)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65134

So that its implementation can be abstracted and replaced

Test Plan: Run linter, CI

Reviewed By: 842974287

Differential Revision: D30966916

fbshipit-source-id: 92ec78c7410d0be14faecb0ba1eafdc74bab5a5d
2021-09-16 13:17:37 -07:00
f9c341fdf2 Add type annotation for TRTInterpreter.run (#65135)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65135

Opportunistically adding type annotations as I work through the fx2trt code base.

Test Plan: run linter and CI

Reviewed By: houseroad, 842974287

Differential Revision: D30903185

fbshipit-source-id: 3f700b57f4433f2d312c1ff2e6b99948e3c8845c
2021-09-16 13:16:06 -07:00
8a094e3270 [quant]ao migration for quantization mappings and fuser method mappings hg mv (#64985)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64985

moving quantization_mappings.py and fuser_method_mappings.py to the ao folder while retaining backwards compatibility

also added a dict test

ghstack-source-id: 138215312

Test Plan:
buck test mode/dev //caffe2/test:quantization

https://www.internalfb.com/intern/testinfra/testrun/7036874471986444

buck test mode/dev //caffe2/test:quantization -- TestAOMigrationQuantization

https://www.internalfb.com/intern/testinfra/testrun/5348024625792701

Reviewed By: z-a-f

Differential Revision: D30982551

fbshipit-source-id: 00f53bd44009d6012a7de852000aad6885131edb
2021-09-16 12:59:20 -07:00
9af6fe991c Remove CUDA 9.2 and older references from our cmake (#65065)
Summary:
Removes old CUDA references in our cuda.cmake

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65065

Reviewed By: malfet

Differential Revision: D30992673

Pulled By: janeyx99

fbshipit-source-id: 85b524089ed57e5acbc71720267cf05e24a8c20a
2021-09-16 12:54:49 -07:00
67570a60ba Disable ParallelTBB (#65092)
Summary:
As ParallelTBB's `at::get_thread_num` is not compatible with the general model used by OpenMP and ParallelNative (where it is a contiguous thread index within a parallel loop), see https://github.com/pytorch/pytorch/issues/64571#issuecomment-914691883

More examples of similar regressions: https://github.com/pytorch/pytorch/runs/3612142217

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65092

Reviewed By: zhouzhuojie

Differential Revision: D30995936

Pulled By: malfet

fbshipit-source-id: db145b6a850d794f2c954f59f30249b291473e36
2021-09-16 12:38:45 -07:00
96cb05b49a Introduce TensorRT as a builtin module for torch::deploy. (#63818)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63818

ghstack-source-id: 138156957

Test Plan: next diff

Reviewed By: wconstab

Differential Revision: D30499309

fbshipit-source-id: 4ab1bc9896243c0c1503afb18fbfb196fc37404e
2021-09-16 11:27:51 -07:00
8eb21488fd [JIT] Improve BatchMM mutability handling (#65097)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65097

Previously, BatchMM would skip any block containing any mutable
operators. Now it will avoid batching any operation whose inputs or
outputs are ever mutated. Specifically: consider a tree of ADD, T,
and MM nodes rooted at an ADD node.  If any input or output to any
node in the tree is ever mutated, then the entire tree will be ignored
by BatchMM.
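
Illustratively, the kind of tree BatchMM targets, and a mutation that now disqualifies it (a TorchScript sketch):

```python
import torch

@torch.jit.script
def eligible(a, b, c, d):
    # An ADD-rooted tree of mm results: a candidate for batching.
    return torch.mm(a, b) + torch.mm(c, d)

@torch.jit.script
def ineligible(a, b, c, d):
    out = torch.mm(a, b) + torch.mm(c, d)
    a.add_(1.0)  # 'a' feeds the tree and is mutated, so the tree is skipped
    return out
```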

Test Plan: python test/test_jit.py TestBatchMM

Reviewed By: eellison

Differential Revision: D30973515

Pulled By: davidberard98

fbshipit-source-id: 9d836faa1ef0c9e3fefe0ffc0bd265f275471f48
2021-09-16 10:46:14 -07:00
f309f8fbd4 [quant] ao migration of observer and qconfig (#64982)
Summary:
(Had to recreate this diff so it wasn't dependent on the stack)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64982

migration of qconfig.py and observer.py to torch/ao/quantization using the new test format
ghstack-source-id: 138215256

Test Plan:
buck test mode/opt //caffe2/test:quantization

https://www.internalfb.com/intern/testinfra/testconsole/testrun/8444249354294701/

buck test mode/dev //caffe2/test:quantization -- TestAOMigrationQuantization

https://www.internalfb.com/intern/testinfra/testrun/3940649742829796

Reviewed By: z-a-f

Differential Revision: D30982534

fbshipit-source-id: 48d08969b1984311ceb036eac0877c811cd6add9
2021-09-16 10:33:16 -07:00
97e86cf319 [Fix] Raise error when empty index tensor is passed (gather) (#65006)
Summary:
See https://github.com/pytorch/pytorch/pull/63312#issuecomment-919330081 for context.

cc: ezyang ysiraichi

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65006

Reviewed By: mruberry

Differential Revision: D30937730

Pulled By: ezyang

fbshipit-source-id: a8f77b1f40d07e7e3bef6caaafa119685f297638
2021-09-16 10:14:26 -07:00
874f9bd509 [FX] Gate FXGraphDrawer on whether pydot is installed (#65088)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65088

Test Plan: Imported from OSS

Reviewed By: khabinov

Differential Revision: D30967951

Pulled By: jamesr66a

fbshipit-source-id: dba2f13a47889b3d4187de925b4fe74ee90b7f79
2021-09-16 10:04:33 -07:00
2c57bbf521 add support for indexing to meshgrid (#62722)
Summary:
This is step 3/7 of https://github.com/pytorch/pytorch/issues/50276. It only adds support for the argument but doesn't implement new indexing modes yet.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62722

Test Plan:
Verified this is not FC breaking by adding logging to both meshgrid
overloads and then calling meshgrid twice:

`meshgrid(*tensors)`
  and
`meshgrid(*tensors, indexing='ij')`

This confirmed that the former signature triggered the original native
function and the latter signature triggered the new native function.

Reviewed By: H-Huang

Differential Revision: D30394313

Pulled By: dagitses

fbshipit-source-id: e265cb114d8caae414ee2305dc463b34fdb57fa6
2021-09-16 09:59:49 -07:00
67bd2a31b5 [Reland] Add python mode (#64360)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64360

This PR adds a (private) enable_python_mode context manager.
(see torch/utils/_python_dispatch.py).
enable_python_mode accepts the type of a __torch_dispatch__ object
as its argument. Whenever an operator gets called inside of the
context manager, it dispatches to the __torch_dispatch__ of
the passed-in type.

Example usage:
```
with enable_python_mode(LoggingTensor):
    z = torch.empty([])
    assert isinstance(z, LoggingTensor)
```

There are quite a few changes that were made to support this.

First, we added TorchDispatchTypeObject, a C++ struct that represents the
type of a `__torch_dispatch__` object (e.g. LoggingTensor).
It holds both the PyObject* representing the class and a PyInterpreter*
so we know which Python interpreter it came from.

Next, we updated the concrete_dispatch_fn in python_variable.cpp to accept
a `const std::shared_ptr<TorchDispatchTypeObject>&` argument. When this
is null, dispatching happens as usual. When it is non-null, we prepend
the TorchDispatchTypeObject's PyObject* to the overloaded args list so that
it is considered first for dispatch.

To get that to work, we changed how `handle_torch_dispatch_no_python_arg_parser`
works. The "overloaded args list" previously only consisted of Tensor PyObjects,
but now it can have types in addition to Tensors!
- We renamed `append_overloaded_arg` to `append_overloaded_tensor`
- We added a new `append_overloaded_type` that appends a type to
overloaded_args
- We added special handling in `handle_torch_dispatch_no_python_arg_parser`
and `append_overloaded_arg` to handle types in addition to Tensors.

Then, there is PythonMode and PythonModeTLS.
- We reuse the DispatchKey::Python dispatch key as a mode key
- We use PythonMode::enter and PythonMode::exit to enable/disable
DispatchKey::Python and set the PythonModeTLS.
- PythonModeTLS stores a TorchDispatchTypeObject as metadata.
- PythonMode is in libtorch_python, and PythonModeTLS is in ATen.
This split is due to the libtorch_python library boundary (because we need
to save TLS in ATen/ThreadLocalState)
- We modify the PythonFallbackKernel to look up
the relevant TorchDispatchTypeObject (if Python Mode is active) and
dispatch using it.

There are two more miscellaneous changes:
- internal_new_from_data (torch/csrc/utils/tensor_new.cpp) gets an
exclude guard. enable_python_mode currently does not handle
torch.tensor and the exclude guard is to prevent a bug.

Future:
- This PR does not allow for the nesting of Python modes. In the future we
should be able to enable this with a more sane no_dispatch API and by changing
the TLS to a stack. For now I did not need this for CompositeImplicitAutograd testing.

Test Plan: - new tests

Reviewed By: ezyang

Differential Revision: D30698082

Pulled By: zou3519

fbshipit-source-id: 7094a90eee6aa51f8b71bc4d91cfb6f49e9691f8
2021-09-16 09:02:30 -07:00
8800a8b428 Revert D30888794: [Model Averaging] Simplify PostLocalSGD Optimizer API
Test Plan: revert-hammer

Differential Revision:
D30888794 (3d312b3b8e)

Original commit changeset: 21261b480f6b

fbshipit-source-id: 87abb7e8cd9ecaac909ec6c3ee053fa7c4ae1975
2021-09-16 06:39:57 -07:00
83878e19ff Improve LSTM documentation for proj_size > 0 (#65102)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/65053. Although the documentation states that:

fe0f9d1daf/torch/nn/modules/rnn.py (L500-L506)

It seems that the definition of `weight_ih_l[k]` could be improved by specifying what happens when `k > 0` and `proj_size > 0`. As `proj_size` is only used in LSTM, no changes are needed for the other RNNs.
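
Concretely, with `proj_size > 0` the hidden state passed between layers has size `proj_size`, so for `k > 0` the input-hidden weights shrink accordingly (a sketch based on my reading of the docs; verify shapes locally):

```python
import torch.nn as nn

lstm = nn.LSTM(input_size=10, hidden_size=20, num_layers=2, proj_size=5)
print(lstm.weight_ih_l0.shape)  # torch.Size([80, 10]): 4*hidden_size x input_size
print(lstm.weight_ih_l1.shape)  # torch.Size([80, 5]):  layer 1 consumes the projected hidden state
print(lstm.weight_hh_l0.shape)  # torch.Size([80, 5]):  recurrent weights also see proj_size
```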

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65102

Reviewed By: supriyar

Differential Revision: D30975781

Pulled By: jbschlosser

fbshipit-source-id: 12df06e5e6a8d5de0ad10fb15e33c3e6311c11d3
2021-09-16 06:35:27 -07:00
f69cf3cf2f [Static Runtime] Use FastSet instead of std::set everywhere (#65114)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65114

There doesn't seem to be any reason to use std::set for sets of pointers, right?
ghstack-source-id: 138198504

Reviewed By: hlu1

Differential Revision: D30978450

fbshipit-source-id: 4599c6249fda3a89959f839d3bf6400c5891f82c
2021-09-15 21:44:54 -07:00
0bda7476cf Reduce PyToch Warnings - Cast fixes from D26624430 (#65015)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65015

Split out the existing fixes into a diff we can land separately.

Test Plan:
pooled_embeddings_modules_test

Parsing buck files: finished in 8.3 sec
Creating action graph: finished in 38.3 sec
[RE] Metadata: Session ID=[https://fburl.com/b/reSessionID-9bea421c-875e-4168-9e00-7d67479b1a9f]
[RE] Waiting on 46 remote actions. Completed 905 actions remotely, action cache hit rate: 5.08%.
Downloaded 7002/8869 artifacts, 560.00 Mbytes, 11.6% cache miss (for updated rules)
Building: finished in 13:12.4 min (100%) 31964/31964 jobs, 17344/31964 updated
  Total time: 13:59.1 min
More details at https://www.internalfb.com/intern/buck/build/b9a58bba-e0aa-4c2b-8824-a0c4074b0954
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: 28cbe2b1-6fbc-450c-91c9-c06a7ff1d53b
Trace available for this run at /tmp/tpx-20210914-114921.005504/trace.log
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/1407375088325000
    ✓ ListingSuccess: caffe2/torch/fb/sparsenn:pooled_embeddings_modules_test - main (23.849)
    ✂ Omit: caffe2/torch/fb/sparsenn:pooled_embeddings_modules_test - test_permutation (caffe2.torch.fb.sparsenn.tests.pooled_embeddings_modules_test.PooledEmbeddingModulesTest_1_cuda)
Test output:
> This test was disabled.
To run this test locally, add the command line flag --run-disabled to your test command (prefix with -- if using buck).
To view why this is disabled or re-enable this test in the test console, visit https://our.intern.facebook.com/intern/testinfra/testdetail/562949981577936
    ↻ Skip: caffe2/torch/fb/sparsenn:pooled_embeddings_modules_test - test_permutation (caffe2.torch.fb.sparsenn.tests.pooled_embeddings_modules_test.PooledEmbeddingModulesTest_0_cpu) (13.201)
Test output:
> Repro command : $(cat "/tmp/tpx-20210914-114921.005504/dc174692-8d92-4459-8b8f-201643c6ab7d/execution_command")
Skipped: CUDA is not available or no GPUs detected
stdout:

stderr:

    ↻ Skip: caffe2/torch/fb/sparsenn:pooled_embeddings_modules_test - test_permutation_autograd (caffe2.torch.fb.sparsenn.tests.pooled_embeddings_modules_test.PooledEmbeddingModulesTest_1_cuda) (13.201)
Test output:
> Repro command : $(cat "/tmp/tpx-20210914-114921.005504/dc174692-8d92-4459-8b8f-201643c6ab7d/execution_command")
Skipped: CUDA is not available or no GPUs detected
stdout:

stderr:

    ✓ Pass: caffe2/torch/fb/sparsenn:pooled_embeddings_modules_test - test_compatibility (caffe2.torch.fb.sparsenn.tests.pooled_embeddings_modules_test.PooledEmbeddingModulesTest_1_cuda) (13.201)
    ↻ Skip: caffe2/torch/fb/sparsenn:pooled_embeddings_modules_test - test_permutation_autograd (caffe2.torch.fb.sparsenn.tests.pooled_embeddings_modules_test.PooledEmbeddingModulesTest_0_cpu) (13.201)
Test output:
> Repro command : $(cat "/tmp/tpx-20210914-114921.005504/dc174692-8d92-4459-8b8f-201643c6ab7d/execution_command")
Skipped: CUDA is not available or no GPUs detected
stdout:

stderr:

    ✓ Pass: caffe2/torch/fb/sparsenn:pooled_embeddings_modules_test - test_compatibility (caffe2.torch.fb.sparsenn.tests.pooled_embeddings_modules_test.PooledEmbeddingModulesTest_0_cpu) (13.201)
    ✓ Pass: caffe2/torch/fb/sparsenn:pooled_embeddings_modules_test - main (13.201)
Summary
  Pass: 3
  Skip: 3
    ↻ caffe2/torch/fb/sparsenn:pooled_embeddings_modules_test - test_permutation (caffe2.torch.fb.sparsenn.tests.pooled_embeddings_modules_test.PooledEmbeddingModulesTest_0_cpu)
    ↻ caffe2/torch/fb/sparsenn:pooled_embeddings_modules_test - test_permutation_autograd (caffe2.torch.fb.sparsenn.tests.pooled_embeddings_modules_test.PooledEmbeddingModulesTest_1_cuda)
    ↻ caffe2/torch/fb/sparsenn:pooled_embeddings_modules_test - test_permutation_autograd (caffe2.torch.fb.sparsenn.tests.pooled_embeddings_modules_test.PooledEmbeddingModulesTest_0_cpu)
  Omit: 1
    ✂ caffe2/torch/fb/sparsenn:pooled_embeddings_modules_test - test_permutation (caffe2.torch.fb.sparsenn.tests.pooled_embeddings_modules_test.PooledEmbeddingModulesTest_1_cuda)
  ListingSuccess: 1

shape_inference_mode_test

[amrelshennawy@devvm855.ftw0 /data/users/amrelshennawy/fbsource/fbcode] buck test caffe2/torch/fb/sparsenn:shape_inference_mode_test
Downloaded 6/18 artifacts, 11.69 Kbytes, 53.8% cache miss (for updated rules)
Building: finished in 1.6 sec (100%) 110/110 jobs, 26/110 updated
  Total time: 1.8 sec
More details at https://www.internalfb.com/intern/buck/build/0e5f45b2-5777-49e9-a3b0-09bd05687b2b
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: 99509108-5ff3-4b1a-b7b3-2f43c4036209
Trace available for this run at /tmp/tpx-20210914-120119.723607/trace.log
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/6192449502564504
    ✓ ListingSuccess: caffe2/torch/fb/sparsenn:shape_inference_mode_test - main (0.374)
    ✓ Pass: caffe2/torch/fb/sparsenn:shape_inference_mode_test - test_set_upper_bound_mode (torch.python.fb.shape_inference_mode_test.TestShapeInferenceMode) (0.249)
    ✓ Pass: caffe2/torch/fb/sparsenn:shape_inference_mode_test - test_set_upper_bound_settings (torch.python.fb.shape_inference_mode_test.TestShapeInferenceMode) (0.253)
Summary
  Pass: 2
  ListingSuccess: 1

test
[amrelshennawy@devvm855.ftw0 /data/users/amrelshennawy/fbsource/fbcode] buck test caffe2/torch/fb/sparsenn:test
Parsing buck files: finished in 1.1 sec
Creating action graph: finished in 38.6 sec
Downloaded 6/30 artifacts, 11.29 Kbytes, 66.7% cache miss (for updated rules)
Building: finished in 41.6 sec (100%) 26783/26783 jobs, 43/26783 updated
  Total time: 01:21.4 min
More details at https://www.internalfb.com/intern/buck/build/8f794eb0-3d3c-4ee3-9aec-5ec5cec1b0f4
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: a06164b5-d7d7-444c-a4ff-e312cb9970d9
Trace available for this run at /tmp/tpx-20210914-120428.464799/trace.log
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/3377699789132066
    ✓ ListingSuccess: caffe2/torch/fb/sparsenn:test - main (16.637)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_dense_mlp_quantize_ops (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (17.870)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_clip_ranges_shape_inference_mode (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (17.922)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_gather_ranges_to_dense_caffe2 (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (18.348)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_self_binning_histogram_quantile_simple (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (18.370)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_recat_embedding_grad_output_mixed_D_batch (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (18.516)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_xl_embedding_bag_byte_rowwise_offsets (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (18.515)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_offsets_to_ranges (caffe2.torch.fb.sparsenn.tests.sigrid_transforms_test.SigridTransformsOpsTest) (18.861)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_xl_embedding_bags (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (18.873)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_offsets_to_ranges_out (caffe2.torch.fb.sparsenn.tests.sigrid_transforms_test.SigridTransformsOpsTest) (18.969)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_pack_segments_pad_minf (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (19.104)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_deprecated_multiple_runs (caffe2.torch.fb.sparsenn.tests.sigrid_transforms_test.SigridTransformsOpsTest) (19.342)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_deprecated_sigrid_transform (caffe2.torch.fb.sparsenn.tests.sigrid_transforms_test.SigridTransformsOpsTest) (19.664)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_offsets_to_ranges_out_empty_batch (caffe2.torch.fb.sparsenn.tests.sigrid_transforms_test.SigridTransformsOpsTest) (19.745)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_clip_lengths (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (19.771)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_multiple_runs_torch_bind (caffe2.torch.fb.sparsenn.tests.sigrid_transforms_test.SigridTransformsOpsTest) (19.944)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_offsets_to_ranges_empty_batch (caffe2.torch.fb.sparsenn.tests.sigrid_transforms_test.SigridTransformsOpsTest) (19.944)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_gather_ranges_shape_inference_mode (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (20.245)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_prior_correction_calibration_prediction_nonbinary (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (20.328)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_8bitfakefused (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (20.501)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_deprecated_ranges (caffe2.torch.fb.sparsenn.tests.sigrid_transforms_test.SigridTransformsOpsTest) (20.608)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_clip_lengths_inference_tests (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (22.403)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_broadcast_cat_out (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (23.025)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_clip_lengths_negatives_tests (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (23.956)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_broadcast_cat (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (24.100)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_transform_torch_bind (caffe2.torch.fb.sparsenn.tests.sigrid_transforms_test.SigridTransformsOpsTest) (17.384)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_expand_values_scores_tensor (caffe2.torch.fb.sparsenn.tests.sigrid_transforms_test.SigridTransformsOpsTest) (18.672)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_expand_empty_values_scores_tensor (caffe2.torch.fb.sparsenn.tests.sigrid_transforms_test.SigridTransformsOpsTest) (18.679)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_pack_segments (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (17.726)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_expand_ranges_tensor (caffe2.torch.fb.sparsenn.tests.sigrid_transforms_test.SigridTransformsOpsTest) (17.567)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_batch_box_cox_all_zeros (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (19.036)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_rowwise_prune_op_32bit_indices (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (18.430)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_transform_torch_bind_upper_bound (caffe2.torch.fb.sparsenn.tests.sigrid_transforms_test.SigridTransformsOpsTest) (18.176)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_expand_dense_feature_tensor (caffe2.torch.fb.sparsenn.tests.sigrid_transforms_test.SigridTransformsOpsTest) (19.006)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_clip_ranges_gather (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (18.555)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_xl_int_nbit_split_embedding_codegen_lookup_function (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (18.791)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_pack_segments_smaller_max_len (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (18.737)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_self_binning_histogram_quantile_pos (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (20.212)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_xl_embedding_bag_2bit_rowwise_offsets (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (18.612)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_prior_correction_calibration_prediction_binary (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (20.858)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_tracing_torch_bind_upper_bound (caffe2.torch.fb.sparsenn.tests.sigrid_transforms_test.SigridTransformsOpsTest) (19.002)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_deprecated_tracing (caffe2.torch.fb.sparsenn.tests.sigrid_transforms_test.SigridTransformsOpsTest) (20.824)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_self_binning_histogram_quantile_1d_counts (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (18.976)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_recat_embedding_grad_output_mixed_D (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (19.832)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_batch_one_hot_lengths (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (19.844)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_clip_ranges (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (18.558)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_batch_box_cox_non_zeros (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (19.418)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_prior_correction_calibration_accumulate (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (19.222)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_unsqueeze_vector (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (19.327)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_xl_embedding_bag_4bit_rowwise_offsets (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (17.772)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_self_binning_histogram_quantile (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (18.425)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_broadcast_cat_backward (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (17.956)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_expand_offsets_tensor (caffe2.torch.fb.sparsenn.tests.sigrid_transforms_test.SigridTransformsOpsTest) (19.320)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_gather_ranges (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (17.923)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_batch_one_hot (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (18.549)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_deprecated_sigrid_transforms_create (caffe2.torch.fb.sparsenn.tests.sigrid_transforms_test.SigridTransformsOpsTest) (18.932)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_clip_ranges_gather_lengths_to_offsets (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (18.807)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_length_to_row_idx (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (17.738)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_tracing_torch_bind (caffe2.torch.fb.sparsenn.tests.sigrid_transforms_test.SigridTransformsOpsTest) (20.175)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_batch_box_cox_mixed (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (18.116)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_self_binning_histogram_quantile_1d_bins (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (18.671)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_permute_out (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (19.002)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_create_sigrid_transforms_torch_bind (caffe2.torch.fb.sparsenn.tests.sigrid_transforms_test.SigridTransformsOpsTest) (18.151)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_ranges_torch_bind (caffe2.torch.fb.sparsenn.tests.sigrid_transforms_test.SigridTransformsOpsTest) (16.780)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_self_binning_histogram_quantile_no_bins (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (19.185)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_cumsum (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (19.242)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_self_binning_histogram_quantile_le_one (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (19.876)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_pack_and_unpack_segments (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (19.222)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_self_binning_histogram_quantile_dims (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (20.007)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_sigrid_hash_op (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (18.959)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_rowwise_prune_op_64bit_indices (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (18.601)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_ranges_torch_bind_upper_bound (caffe2.torch.fb.sparsenn.tests.sigrid_transforms_test.SigridTransformsOpsTest) (17.977)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_broadcast_stack (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (22.588)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_multiple_runs_torch_bind_upper_bound (caffe2.torch.fb.sparsenn.tests.sigrid_transforms_test.SigridTransformsOpsTest) (15.342)
Summary
  Pass: 73
  ListingSuccess: 1
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/3377699789132066

Did not run (no GPU on my devserver):
gpu_test
cpp_gpu_test

Reviewed By: r-barnes

Differential Revision: D30940399

fbshipit-source-id: d867ca646723340775a49c1b983cdab64f2d67d8
2021-09-15 21:20:41 -07:00
db601434ef Bug fix (#65105)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65105

Using buildErrorMessage in external_functions.cpp was breaking the build target nnc_cpu_backend_lib, because buildErrorMessage is defined in tensorexpr/kernel.cpp, which is not included in mobile builds and which we don't want to include in mobile builds.
Also, buildErrorMessage wraps error messages for the fuser, whereas nnc_aten_conv2d is now only used in the AOT workflow and is not called by the fuser. Wrapping assertion failures with a fuser error message would therefore be misleading for the AOT workflow.

Test Plan:
Before fix:
```
+ buck build //xplat/caffe2/fb/lite_predictor:lite_predictor_nnc
Downloading... 3/3 artifacts, 24.81 Kbytes, 0.0% cache miss (for updated rules)
Building... 1.7 sec (99%) 4639/4641 jobs, 3/4641 updated
     - //xplat/caffe2/fb/lite_predictor:lite_predictor_nnc#binary... 0.7 sec (running c++ link[0.6 sec])
Command failed with exit code 1.

command: [/data/users/priyaramani/fbsource/buck-out/cells/fbcode/gen/aab7ed39/tools/build/buck/wrappers/__ld__/ld.sh, --ld=/data/users/priyaramani/fbsource/fbcode/third-party-buck/platform009/build/llvm-fb/9.0.0/bin/clang++, --cc=/data/users/priyaramani/fbsource/buck-out/cells/fbcode/gen/aab7ed39/tools/build/buck/wrappers/__fbc...
<truncated>
...

stderr: clang-9: warning: argument unused during compilation: '-pthread' [-Wunused-command-line-argument]
ld.lld: error: undefined symbol: torch::jit::tensorexpr::buildErrorMessage(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
>>> referenced by external_functions.cpp:69 (xplat/caffe2/torch/csrc/jit/tensorexpr/external_functions.cpp:69)
>>>               ../nnc_cpu_backend_lib#compile-external_functions.cpp.o50e02bc2,platform009-clang/torch/csrc/jit/tensorexpr/external_functions.cpp.o:(nnc_aten_conv2d) in archive /data/users/priyaramani/fbsource/buck-out/gen/aab7ed39/xplat/caffe2/nnc_cpu_backend_lib#platform009-clang,static/libnnc_cpu_backend_lib.a
clang-9: error: linker command failed with exit code 1 (use -v to see invocation)

    When running <c++ link>.
    When building rule //xplat/caffe2/fb/lite_predictor:lite_predictor_nnc#binary (ovr_config//platform/linux:x86_64-fbcode).
clang-9: warning: argument unused during compilation: '-pthread' [-Wunused-command-line-argument]
ld.lld: error: undefined symbol: torch::jit::tensorexpr::buildErrorMessage(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
>>> referenced by external_functions.cpp:69 (xplat/caffe2/torch/csrc/jit/tensorexpr/external_functions.cpp:69)
>>>               ../nnc_cpu_backend_lib#compile-external_functions.cpp.o50e02bc2,platform009-clang/torch/csrc/jit/tensorexpr/external_functions.cpp.o:(nnc_aten_conv2d) in archive /data/users/priyaramani/fbsource/buck-out/gen/aab7ed39/xplat/caffe2/nnc_cpu_backend_lib#platform009-clang,static/libnnc_cpu_backend_lib.a
clang-9: error: linker command failed with exit code 1 (use -v to see invocation)

Command failed with exit code 1.

command: [/data/users/priyaramani/fbsource/buck-out/cells/fbcode/gen/aab7ed39/tools/build/buck/wrappers/__ld__/ld.sh, --ld=/data/users/priyaramani/fbsource/fbcode/third-party-buck/platform009/build/llvm-fb/9.0.0[DEBUG kernel.cpp:2766]       }
```

After fix:
```
+ buck build //xplat/caffe2/fb/lite_predictor:lite_predictor_nnc
Action graph will be rebuilt because files have been added or removed.
clang-9: warning: argument unused during compilation: '-pthread' [-Wunused-command-line-argument]

Downloaded 11/15 artifacts, 78.37 Kbytes, 15.4% cache miss (for updated rules)
Building: finished in 7.4 sec (100%) 4718/4718 jobs, 46/4718 updated
  Total time: 7.5 sec
More details at https://www.internalfb.com/intern/buck/build/b87be016-340c-49f8-b832-0c1de70aae9e
```

Reviewed By: ZolotukhinM

Differential Revision: D30975952

fbshipit-source-id: 85c028cc6af63c03b505b51302f5158c23e1a047
2021-09-15 20:11:30 -07:00
2bb898e039 [acc_ops] Add support for torch variants of squeeze and mul (#65037)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65037

As titled.

Test Plan: updated unit tests

Reviewed By: yuhc

Differential Revision: D30952224

fbshipit-source-id: aaf75b27b4fc6c0436ba7bfcf324f761b900171b
2021-09-15 19:41:04 -07:00
206646d6ed Add NNC AOT Compiler executable (#63994)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63994

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D30582149

Pulled By: priyaramani

fbshipit-source-id: 3bbf085428824c3cb308e006c18bb0a57f50fef6
2021-09-15 19:18:24 -07:00
e0ecd09011 [quant] AO migration of the _correct_bias.py, _equalize.py, and _learnable_fake_quantize.py (#64917)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64917

AO Team is migrating the existing torch.quantization into torch.ao.quantization. We are doing it one file at a time to make sure that the internal callsites are updated properly.
This migrates the following files from torch.quantization to torch.ao.quantization:
- `_correct_bias.py`
- `_equalize.py`
- `_learnable_fake_quantize.py`

**Note:** These files are migrated completely without any warning. The old location is thus silently deprecated.

Test Plan: `buck test mode/dev //caffe2/test:quantization -- TestBiasCorrection`

Reviewed By: vkuzo

Differential Revision: D30898565

fbshipit-source-id: 1d39be2539dd1adfcb42e16bdcc0daf5c8316bbd
2021-09-15 18:15:39 -07:00
3ceecebed0 .circleci/.jenkins: Remove 9.2 references in CI (#65024)
Summary:
Removes 9.2 references in CI scripts and configs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65024

Reviewed By: driazati

Differential Revision: D30945948

Pulled By: janeyx99

fbshipit-source-id: 77890a00520c61500a934a90a74e3fcca84c09b5
2021-09-15 18:06:57 -07:00
d9d8250e3f .github: GHA add retry for docker run in chown workspace step (#65104)
Summary:
This should help prevent further errors in GHA workflows during the Chown Workspace step such as https://github.com/pytorch/pytorch/runs/3614067053

I did not add retries to other steps with docker run
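
The idea, sketched in Python (the real step is shell inside the GHA workflow, and the docker command below is a stand-in, not the workflow's actual invocation):

```python
# Illustrative retry-with-backoff around a flaky subprocess, of the kind added
# to the chown-workspace docker step; the real workflow implements this in shell.
import subprocess
import time

def run_with_retries(cmd, attempts=3):
    for attempt in range(attempts):
        if subprocess.run(cmd).returncode == 0:
            return
        time.sleep(2 ** attempt)  # simple exponential backoff between attempts
    raise RuntimeError(f"{cmd} failed after {attempts} attempts")

# Stand-in for the actual docker run invocation used by the workflow:
run_with_retries(["docker", "run", "--rm", "alpine", "true"])
```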

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65104

Reviewed By: seemethere

Differential Revision: D30976330

Pulled By: janeyx99

fbshipit-source-id: e403008548aa01c9a0a4ccebe56df0e889dd045c
2021-09-15 18:02:07 -07:00
03389dc851 Revert D30752939: [pytorch][PR] nvfuser update
Test Plan: revert-hammer

Differential Revision:
D30752939 (cfaecaf40b)

Original commit changeset: ce122e80f01b

fbshipit-source-id: 57685df8f9946032a06eff1de8a3d1498500d2d2
2021-09-15 17:38:47 -07:00
c151d62f45 [quant] AO migration of the quant_types.py (phase 1) (#64916)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64916

AO Team is migrating the existing torch.quantization into torch.ao.quantization. We are doing it one file at a time to make sure that the internal callsites are updated properly.
This migrates quant_type.py from torch.quantization to torch.ao.quantization.
At this point both locations will be supported. Eventually the torch.quantization will be deprecated.
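
A sketch of what "both locations will be supported" means in practice, assuming (as with the other phase-1 migrations) that the old module re-exports the new one; `QuantType` is the enum defined in this file:

```python
# Both import paths are expected to resolve to the same definitions in phase 1.
from torch.ao.quantization.quant_type import QuantType as NewQuantType  # new canonical location
from torch.quantization.quant_type import QuantType as OldQuantType     # old location, kept working

assert NewQuantType is OldQuantType  # same class, reachable via two paths
```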

Test Plan: `buck test mode/dev //caffe2/test:quantization -- TestAOMigrationQuantization`

Reviewed By: vkuzo

Differential Revision: D30898422

fbshipit-source-id: 3e6126b49f0565a4136d6928cea9eb25368927ff
2021-09-15 17:30:00 -07:00
a42996f16e [quant] AO migration of the fuse_modules.py (phase 1) (#64913)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64913

AO Team is migrating the existing torch.quantization into torch.ao.quantization. We are doing it one file at a time to make sure that the internal callsites are updated properly.
This migrates fuse_modules.py from torch.quantization to torch.ao.quantization.
At this point both locations will be supported. Eventually the torch.quantization will be deprecated.

Test Plan: `buck test mode/dev //caffe2/test:quantization`

Reviewed By: vkuzo

Differential Revision: D30882819

fbshipit-source-id: 1926ad6aa49136aceb5b625dcef4bfde3a2860d4
2021-09-15 17:28:47 -07:00
7e9c599784 [TensorExpr] Add a method for sanitizing Var and Buf names in Stmt. (#65010)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65010

This pass ensures all names are legal and not duplicated.

Fixes #52727.
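
A rough sketch of what such a sanitization pass does (in Python for brevity; the real pass is C++ over `Var`/`Buf` names, and the exact renaming scheme here is illustrative):

```python
import re

def sanitize_names(names):
    """Make every name a legal identifier and resolve duplicates with suffixes."""
    used = set()
    out = []
    for name in names:
        name = re.sub(r"\W", "_", name) or "v"  # replace illegal characters
        if name[0].isdigit():
            name = "_" + name                   # identifiers cannot start with a digit
        candidate, i = name, 1
        while candidate in used:                # deduplicate: x, x_1, x_2, ...
            candidate = f"{name}_{i}"
            i += 1
        used.add(candidate)
        out.append(candidate)
    return out

print(sanitize_names(["x", "x", "1y", "a.b"]))  # ['x', 'x_1', '_1y', 'a_b']
```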

Test Plan: Imported from OSS

Reviewed By: bertmaher, navahgar

Differential Revision: D30939717

Pulled By: ZolotukhinM

fbshipit-source-id: 7dbe7f937de41f22ad49137a5e067d698443ed63
2021-09-15 17:15:06 -07:00
3d5923366d .github: Enable only specific workflows for canary (#65099)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65099

Utilizes ciflow to enable only specific workflows for
pytorch/pytorch-canary to reduce noise on that specific repository

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D30973691

Pulled By: seemethere

fbshipit-source-id: 371765535b42a00bd72c2551c4faebf733d759f0
2021-09-15 16:53:12 -07:00
59c486f2f3 ci: Disable jit legacy on circleci, enable on gha (#65106)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65106

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

cc ezyang seemethere malfet lg20987 pytorch/pytorch-dev-infra

Test Plan: Imported from OSS

Reviewed By: malfet, janeyx99

Differential Revision: D30976186

Pulled By: seemethere

fbshipit-source-id: 8958f821eab9aa284496c57915894ed70f6b2fff
2021-09-15 16:11:38 -07:00
b75d3cae4c CI: Upgrade windows 10.1 jobs to 10.2 (#65080)
Summary:
These are the first 2 steps in the following task:
1. Upgrade 10.1 to 10.2
2. Migrate force_on_cpu job to GHA

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65080

Test Plan: https://github.com/pytorch/pytorch/pull/65086

Reviewed By: seemethere

Differential Revision: D30973655

Pulled By: janeyx99

fbshipit-source-id: 67ab69ea99ff9e0336400a7173efef6d7daac07c
2021-09-15 16:04:50 -07:00
3f27c1ae78 Replace windows 10.2 smoke tests on PRs to be 11.3 (#65090)
Summary:
As we default to Linux CUDA 11.3 on PRs, we should do the same thing with Windows (instead of having 10.2 be the default). This means that 10.2 will now be master only, and 11.3 Windows smoke tests will run on every PR.

This also copies over the "run smoke tests only" config; removing that will be in a separate PR once there's a firmer decision.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65090

Reviewed By: seemethere

Differential Revision: D30968382

Pulled By: janeyx99

fbshipit-source-id: c73f9a2cc800b678909365c4d80627d29fc09f94
2021-09-15 16:01:07 -07:00
ec1af11c2e Revert D30883290: [Static Runtime] Move MemoryPlanner out into memory_planner.cpp
Test Plan: revert-hammer

Differential Revision:
D30883290 (0e11454d19)

Original commit changeset: a37570f8d943

fbshipit-source-id: 65c57a2b0d2e3c7006765195dd519e8cf2472f72
2021-09-15 15:40:34 -07:00
37bcefa248 [quant] Removing hardcoded "torch.quantization.observer" for migration (#64981)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64981

This would have caused errors when observer.py was moved to ao.

see: D30391189
ghstack-source-id: 138118430

Test Plan:
buck test mode/opt //caffe2/test:quantization -- --exact 'caffe2/test:quantization - test_dynamic_quant_multi_uses (quantization.jit.test_quantize_jit.TestQuantizeDynamicJitPasses)'

buck test mode/opt //caffe2/test:quantization -- --exact 'caffe2/test:quantization - test_save_load_state_dict_script (quantization.core.test_workflow_module.TestObserver)'

Reviewed By: supriyar

Differential Revision: D30432008

fbshipit-source-id: 754727a89c78f6ceada6f8ff92c304f3953f38fc
2021-09-15 15:22:19 -07:00
fe0f9d1daf [Caffe2][easy] Avoid spurious vector copy in TransposeOp (#64403)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64403

No need to copy to the heap here.
ghstack-source-id: 138033019

Test Plan: CI

Reviewed By: smacke

Differential Revision: D30712506

fbshipit-source-id: 5f4131b2569ebb1f5092262aaddb17215dea88f1
2021-09-15 15:15:51 -07:00
208cf051d4 [Caffe2] Don't pass vector by value in SqueezeOp (#64400)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64400

There appears to be no need to copy this vector.
ghstack-source-id: 138033020

Test Plan: CI

Reviewed By: smacke

Differential Revision: D30711014

fbshipit-source-id: b9fcf3d496a663b8478aa22d52b2c41f8f85e90f
2021-09-15 15:14:30 -07:00
177ebea4c5 Use RDS for build size tracking (#64303)
Summary:
This adds 2 utilities: `register_rds_table` and `rds_write`. `register_rds_table` needs to be called once with the schema for the data that `rds_write` will write. These go to a lambda called `rds-proxy`, which will write to/read from the DB as necessary. This data can then be arbitrarily queried via `rds-proxy` (for use in CI) or on metrics.pytorch.org (for analysis).

It also hooks these up for build size tracking (which previously was not working on GHA)
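
For illustration only: the two function names come from this commit, but the argument shapes below are guesses rather than the actual API, and the import path is omitted because it isn't shown here.

```python
# Hypothetical usage sketch; signatures and values are assumptions.
# register_rds_table is called once to declare the schema...
register_rds_table(
    "binary_size",                                         # assumed table name
    schema={"build_name": "string", "size_bytes": "int"},  # assumed schema format
)
# ...after which rds_write can send rows through the rds-proxy lambda.
rds_write(
    "binary_size",
    [{"build_name": "linux-xenial-py3.6-gcc7", "size_bytes": 123456789}],
)
```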

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64303

Reviewed By: mruberry

Differential Revision: D30941182

Pulled By: driazati

fbshipit-source-id: 12c5575ddd29902477464fc989ad76a052306b9b
2021-09-15 14:47:37 -07:00
cfaecaf40b nvfuser update (#63745)
Summary:
Syncing the nvfuser code base from the devel branch. Listing a few of our developments since the last sync:

- Extends support to normalization and reduction kernels.
- Multiple kernel launches for a single `CudaFusionGroup`. The hierarchical caching system has been updated to cache graph segmentation.
- profile_ivalue is enabled to convert dynamic scalars into compile-time constants required by the codegen (e.g. reduction axes).

To keep this PR simple and relatively review-free, we stripped most external changes and submitted them as separate PRs, so this gigantic PR is easier to handle.

Internal updates are files located in:
1. updates in nvfuser codegen `torch/csrc/jit/codegen/cuda`
2. added nvfuser specific benchmarks `benchmarks/cpp/nvfuser`
3. nvfuser jit cpp tests `test/cpp/jit/test_gpu.cpp` `test/cpp/jit/test_gpu_shift.cpp` `test/cpp/jit/test_gpu_validator.h`

Updates affecting integration:

1. profile_ivalue enabled for nvfuser; related changes are in `torch/csrc/jit/runtime/*`
2. exposed a few more symbols in `aten/src/ATen/core/*` used by codegen

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63745

Reviewed By: saketh-are

Differential Revision: D30752939

Pulled By: malfet

fbshipit-source-id: ce122e80f01bcd3865f5bd3c4dfde660665fd84c
2021-09-15 14:42:55 -07:00
59988f81bd Add embedding shape analysis (#64323)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64323

Test Plan: Imported from OSS

Reviewed By: driazati

Differential Revision: D30738145

Pulled By: eellison

fbshipit-source-id: be12408330d671bc65cf645aa2c20fafd954e6a9
2021-09-15 13:45:48 -07:00
29514bfcdb Max Pool with indices (#64121)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64121

Add support for aten operators which return multiple outputs
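
For example, pooling with indices is one such multi-output op; from Python it can be reached like this:

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 1, 4, 4)
# With return_indices=True a single aten op produces two outputs:
# the pooled values and the flat indices of the selected elements.
values, indices = F.max_pool2d(x, kernel_size=2, return_indices=True)
print(values.shape, indices.shape)  # torch.Size([1, 1, 2, 2]) for both
```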

Test Plan: Imported from OSS

Reviewed By: driazati

Differential Revision: D30738142

Pulled By: eellison

fbshipit-source-id: 0d7e51187bd5e3e9b43f0fdb5178366a97aec943
2021-09-15 13:45:46 -07:00
2626cd3ba4 Add Maxpool to shape analysis / Opinfo (#63530)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63530

How to review: check that the generated inputs are a good representation of the op semantics; that should be sufficient for correctness. As a bonus, you can also double-check the op size semantics by going to https://codebrowser.bddppq.com/pytorch/pytorch/, typing in native::{op_name}, and looking at the op implementation.

Test Plan: Imported from OSS

Reviewed By: driazati

Differential Revision: D30738147

Pulled By: eellison

fbshipit-source-id: cf52339e572ee04e0d6167fd95d8a82d58ea7706
2021-09-15 13:44:33 -07:00
425f173f9d [quant][refactor] Change the structure of the ao migration tests (#64912)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64912

The test naming was confusing and ambiguous. The file was changed to reflect the framework that is being migrated ("quantization" instead of "quantize"). Also, the common testing class was extracted out.
ghstack-source-id: 138157450

Test Plan: `buck test mode/dev //caffe2/test:quantization -- TestAOMigrationQuantization`

Reviewed By: vkuzo

Differential Revision: D30898214

fbshipit-source-id: 017f95995271d35bcdf6ff6a1b3974b837543e84
2021-09-15 13:15:43 -07:00
2967a48b78 Add retries to ECR login step (#65013)
Summary:
Switch retry mode from `legacy` to `standard` (https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-retries.html#cli-usage-retries-configure) and up the number of retries.
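
For reference, the same two knobs expressed through botocore's `Config` object (the workflow itself configures the AWS CLI via its environment/config file, and the attempt count below is an assumption since the commit doesn't quote the number):

```python
import boto3
from botocore.config import Config

# "standard" retry mode with a raised attempt cap (10 here is illustrative).
config = Config(retries={"mode": "standard", "max_attempts": 10})
ecr = boto3.client("ecr", config=config)  # e.g. the client behind ECR logins
```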

Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65013

Reviewed By: zhouzhuojie, mruberry

Differential Revision: D30943292

Pulled By: driazati

fbshipit-source-id: 0a21e9b4eacbb77e6aca22f9256d94cd591b23cd
2021-09-15 13:12:57 -07:00
df3d649380 To add state dict and load_dict for Chained Scheduler (#65034)
Summary:
Adding state_dict() and load_state_dict() methods for Chained Scheduler
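
A minimal usage sketch of the new methods, assuming the stock `torch.optim.lr_scheduler` classes from this release:

```python
import torch

model = torch.nn.Linear(2, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
sched = torch.optim.lr_scheduler.ChainedScheduler([
    torch.optim.lr_scheduler.ConstantLR(opt, factor=0.5, total_iters=2),
    torch.optim.lr_scheduler.ExponentialLR(opt, gamma=0.9),
])

state = sched.state_dict()    # new: snapshots all chained schedulers
sched.load_state_dict(state)  # new: restores them, e.g. after checkpointing
```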

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65034

Reviewed By: prabhat00155, nateanl

Differential Revision: D30958207

Pulled By: datumbox

fbshipit-source-id: 1a587a330d34e0548e891a39f8fb5a3d251b71fa
2021-09-15 13:11:41 -07:00
6512838fab [ONNX] Enhance shape (two changes merged) (#64585)
Summary:
Enhanced shape inference by introducing typeReliableMap.
[ONNX] exporter changes for torch hub models (https://github.com/pytorch/pytorch/issues/62856)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64585

Reviewed By: ezyang

Differential Revision: D30870418

Pulled By: msaroufim

fbshipit-source-id: 87a294799cb87d649d1d13b6114a5cfbac9be15c

Co-authored-by: jiafatom <jiafa@microsoft.com>
2021-09-15 13:02:19 -07:00
0e11454d19 [Static Runtime] Move MemoryPlanner out into memory_planner.cpp (#65011)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65011

This change moves `MemoryPlanner` out of impl.cpp into memory_planner.cpp.

`MemoryPlanner` performs an independent sub-task: statically analyzing a graph, creating a memory plan, and allocating/deallocating managed Tensors.

This change will reduce merge conflicts as I work on MemoryPlanner more actively for output Tensor support.

Test Plan: N/A

Reviewed By: mikeiovine

Differential Revision: D30883290

fbshipit-source-id: a37570f8d9430224a6987d2190bcf81cf875043d
2021-09-15 12:57:39 -07:00
db134a6843 (torch.distributed.elastic) properly format traceback on error (#65041)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65041

Fixes a bug introduced in https://github.com/pytorch/pytorch/pull/64036 where the traceback of the error handler is printed out rather than the traceback of the actual exception.

Fixes https://github.com/pytorch/pytorch/issues/60910
Closes https://github.com/pytorch/pytorch/issues/60910

BEFORE (note that the `py_callstack` is NOT the traceback of the RuntimeError):
```
**************************************************************************************************************************************************************************************************************************************************
                                                                                                              run_script_path FAILED
==================================================================================================================================================================================================================================================
Root Cause:
[0]:
  time: 2021-09-14_22:01:06
  rank: 0 (local_rank: 0)
  exitcode: 1 (pid: 1092727)
  error_file: /tmp/torchelastic_aeyvjbpe/none_8zuih7tj/attempt_0/0/error.json
  msg:
    {
      "message": "RuntimeError: rasing error since --throw was specified",
      "extraInfo": {
        "py_callstack": [
          "  File \"<string>\", line 1, in <module>\n",
          "  File \"/usr/local/fbcode/platform009/lib/python3.8/multiprocessing/spawn.py\", line 116, in spawn_main\n    exitcode = _main(fd, parent_sentinel)\n",
          "  File \"/usr/local/fbcode/platform009/lib/python3.8/multiprocessing/spawn.py\", line 129, in _main\n    return self._bootstrap(parent_sentinel)\n",
          "  File \"/usr/local/fbcode/platform009/lib/python3.8/multiprocessing/process.py\", line 315, in _bootstrap\n    self.run()\n",
          "  File \"/usr/local/fbcode/platform009/lib/python3.8/multiprocessing/process.py\", line 108, in run\n    self._target(*self._args, **self._kwargs)\n",
          "  File \"/data/users/kiuk/fbsource/fbcode/buck-out/dev/gen/caffe2/run#link-tree/torch/multiprocessing/spawn.py\", line 59, in _wrap\n    fn(i, *args)\n",
          "  File \"/data/users/kiuk/fbsource/fbcode/buck-out/dev/gen/caffe2/run#link-tree/torch/distributed/elastic/multiprocessing/api.py\", line 382, in _wrap\n    ret = record(fn)(*args_)\n",
          "  File \"/data/users/kiuk/fbsource/fbcode/buck-out/dev/gen/caffe2/run#link-tree/torch/distributed/elastic/multiprocessing/errors/__init__.py\", line 373, in wrapper\n    error_handler.record_exception(e)\n",
          "  File \"/data/users/kiuk/fbsource/fbcode/buck-out/dev/gen/caffe2/run#link-tree/torch/distributed/elastic/multiprocessing/errors/error_handler.py\", line 86, in record_exception\n    _write_error(e, self._get_error_file_path())\n",
          "  File \"/data/users/kiuk/fbsource/fbcode/buck-out/dev/gen/caffe2/run#link-tree/torch/distributed/elastic/multiprocessing/errors/error_handler.py\", line 26, in _write_error\n    \"py_callstack\": traceback.format_stack(),\n"
        ],
        "timestamp": "1631682066"
      }
    }

==================================================================================================================================================================================================================================================
Other Failures:
  <NO_OTHER_FAILURES>
**************************************************************************************************************************************************************************************************************************************************
```

AFTER (note the traceback is the traceback of the RuntimeError):
```
********************************************************************************
                             run_script_path FAILED
================================================================================
Root Cause:
[0]:
  time: 2021-09-14_21:49:25
  rank: 0 (local_rank: 0)
  exitcode: 1 (pid: 1014681)
  error_file: /tmp/torchelastic_q0zods2c/none_qwmz5dgj/attempt_0/0/error.json
  msg: Traceback (most recent call last):
    File "/data/users/kiuk/fbsource/fbcode/buck-out/dev/gen/caffe2/run#link-tree/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 361, in wrapper
      return f(*args, **kwargs)
    File "/data/users/kiuk/fbsource/fbcode/buck-out/dev/gen/caffe2/run#link-tree/torch/distributed/run.py", line 671, in run_script_path
      runpy.run_path(sys.argv[0], run_name="__main__")
    File "/usr/local/fbcode/platform009/lib/python3.8/runpy.py", line 265, in run_path
      return _run_module_code(code, init_globals, run_name,
    File "/usr/local/fbcode/platform009/lib/python3.8/runpy.py", line 97, in _run_module_code
      _run_code(code, mod_globals, init_globals,
    File "/usr/local/fbcode/platform009/lib/python3.8/runpy.py", line 87, in _run_code
      exec(code, run_globals)
    File "/home/kiuk/tmp/test.py", line 55, in <module>
      main()
    File "/data/users/kiuk/fbsource/fbcode/buck-out/dev/gen/caffe2/run#link-tree/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 361, in wrapper
      return f(*args, **kwargs)
    File "/home/kiuk/tmp/test.py", line 25, in main
      raise RuntimeError("rasing error since --throw was specified")
  RuntimeError: rasing error since --throw was specified

================================================================================
Other Failures:
  <NO_OTHER_FAILURES>
********************************************************************************
```

Test Plan:
(see summary for before and after)

`test.py` contents:
```
import argparse
import os
import sys

import torch
import torch.distributed as dist
import torch.nn.functional as F

from torch.distributed.elastic.multiprocessing.errors import record

def parse_args(argv):
    parser = argparse.ArgumentParser(description="test script")
    parser.add_argument("--init_method", type=str, default="env://")
    parser.add_argument("--backend", type=str, default="gloo")
    parser.add_argument("--throw", action="store_true", default=False)
    parser.add_argument("--exit", action="store_true", default=False)
    return parser.parse_args()

@record
def main():
    args = parse_args(sys.argv[1:])

    if args.throw:
        raise RuntimeError("rasing error since --throw was specified")

    if args.exit:
        sys.exit(1)

    init_method=args.init_method
    backend=args.backend

    world_size = int(os.environ["WORLD_SIZE"])
    rank = int(os.environ["RANK"])

    print(f"initializing `{backend}` process group with rank={rank}, world_size={world_size} at {init_method}")

    dist.init_process_group(
        backend=backend,
        init_method=init_method,
        world_size=world_size,
        rank=rank)

    print(f"successfully initialized process group with rank={dist.get_rank()}, world_size={dist.get_world_size()}")

    t = F.one_hot(torch.tensor(rank), num_classes=world_size)
    dist.all_reduce(t)
    derived_world_size = torch.sum(t).item()
    if derived_world_size != world_size:
        raise RuntimeError(f"derived world size: {derived_world_size} != actual world size: {world_size}")
    else:
        print(f"sucessfully derived world size: {derived_world_size} (expected: {world_size}). Exiting")

if __name__ == "__main__":
    main()
```

run it as:

```
$ python -m torch.distributed.run --nproc_per_node 2 test.py --throw
```

Reviewed By: cbalioglu

Differential Revision: D30953731

fbshipit-source-id: bbea04c59c2aec58969cf44d8e3723d5f8abe8a8
2021-09-15 12:50:21 -07:00
4bf7959de2 Remove run_functional_checks from test_autograd and create necessary OpInfos (#64993)
Summary:
OpInfo tracker: https://github.com/pytorch/pytorch/issues/54261

 - Eliminate duplicated testing logic in test_autograd
 - Moved tests that rely on this testing logic to use OpInfos
   - `cat` already has OpInfo (no action needed)
   - Created OpInfo for `block_diag` and `broadcast_tensors`

Running into some FX errors. Added op to skip-list and created an issue here: https://github.com/pytorch/pytorch/issues/64997
Both `block_diag` and `broadcast_tensors` are variadic, so skipping `test_variant_consistency_jit` (from comments on other OpInfos, it looks like JIT does not support variadic tensors)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64993

Reviewed By: jbschlosser

Differential Revision: D30961736

Pulled By: soulitzer

fbshipit-source-id: e169305384a683acae1178c4e12e9e214a67226a
2021-09-15 12:45:38 -07:00
21017ad1a1 Dispatch.h: Avoid including ivalue (#64165)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64165

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D30728587

Pulled By: ezyang

fbshipit-source-id: d0d2e97491d9d5e2d2fc2d6e51420a4467c1bba4
2021-09-15 12:16:44 -07:00
211ad231dc To add state_dict and load_state_dict to SequentialLR (#65035)
Summary:
To add state_dict() and load_state_dict() methods to SequentialLR
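
A short sketch mirroring the ChainedScheduler change above (stock scheduler classes; values are illustrative):

```python
import torch

model = torch.nn.Linear(2, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
sched = torch.optim.lr_scheduler.SequentialLR(
    opt,
    schedulers=[
        torch.optim.lr_scheduler.ConstantLR(opt, factor=0.1, total_iters=2),
        torch.optim.lr_scheduler.ExponentialLR(opt, gamma=0.9),
    ],
    milestones=[2],  # switch from the first scheduler to the second after epoch 2
)

state = sched.state_dict()    # new: serializable scheduler state
sched.load_state_dict(state)  # new: restore from a checkpoint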

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65035

Reviewed By: prabhat00155, nateanl

Differential Revision: D30958204

Pulled By: datumbox

fbshipit-source-id: 65114e1b07146526ae2680233f5cd42b2534d67a
2021-09-15 12:01:51 -07:00
8a652e0e91 [CircleCI] Disable pytorch_linux_xenial_cuda10_2 test jobs (#65071)
Summary:
As all of them have been migrated to GHA:
- pytorch_linux_pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_distributed_test -> "linux-xenial-cuda11.3-py3.6-gcc7 / test (distributed, 1, 1, linux.8xlarge.nvidia.gpu)"
- pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_test1 -> "linux-xenial-cuda10.2-py3.6-gcc7 / test (default, 1, 2, linux.8xlarge.nvidia.gpu)"
- pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_test2 -> "linux-xenial-cuda10.2-py3.6-gcc7 / test (default, 2, 2, linux.8xlarge.nvidia.gpu)"
- pytorch_linux_xenial_cuda10_2_cudnn7_py3_multigpu_test -> "linux-xenial-cuda10.2-py3.6-gcc7 / test (multigpu, 1, 1, linux.16xlarge.nvidia.gpu)"
- pytorch_linux_xenial_cuda10_2_cudnn7_py3_nogpu_NO_AVX2_test -> "linux-xenial-cuda10.2-py3.6-gcc7 / test (nogpu_NO_AVX2, 1, 1, linux.2xlarge)"
- pytorch_linux_xenial_cuda10_2_cudnn7_py3_nogpu_NO_AVX_test -> "linux-xenial-cuda10.2-py3.6-gcc7 / test (nogpu_NO_AVX, 1, 1, linux.2xlarge)"
- pytorch_linux_xenial_cuda10_2_cudnn7_py3_slow_test -> "linux-xenial-cuda10.2-py3.6-gcc7 / test (slow, 1, 1, linux.8xlarge.nvidia.gpu)"

"pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_build" is still a holdout due to slow gradchecks

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65071

Reviewed By: driazati, seemethere, janeyx99

Differential Revision: D30963413

Pulled By: malfet

fbshipit-source-id: d9a5188ce7eb2f60547b91b854a5db83af2b10e7
2021-09-15 11:59:40 -07:00
f1ce64a58e Starter Task 1 (#64927)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64927

Mypy error corrections

Test Plan: Corrected mypy errors to make the code less prone to bugs by modifying types or adding lines that avoid special undesired cases, e.g. asserting that a variable is not None.

Reviewed By: wushirong

Differential Revision: D30901654

fbshipit-source-id: daae8692603b8b38203a98f673c455749c2fb855
2021-09-15 11:55:07 -07:00
dab6496dbe [ROCm] Update CI images for ROCm 4.3.1 (#64610)
Summary:
Signed-off-by: Kyle Chen <kylechen@amd.com>

reference:
https://github.com/pytorch/pytorch/issues/58017

jithunnair-amd
jeffdaily
arindamroy-eng

cc jeffdaily sunway513 jithunnair-amd ROCmSupport

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64610

Reviewed By: seemethere

Differential Revision: D30964582

Pulled By: malfet

fbshipit-source-id: a8335d3d32d7f1557d3cf6cb055ad0f9c49ef7aa
2021-09-15 11:49:54 -07:00
54d060a8c9 Port all and any full reductions to structured kernels. (#64642)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64642

Tracking issue: #55070

This PR creates out overloads for both `all` and `any` kernels (full reduction overload),
and ports them to structured kernels.
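
From the Python side, the new overloads make calls like the following work (a small sketch):

```python
import torch

x = torch.tensor([True, False, True])
out = torch.empty((), dtype=torch.bool)  # 0-dim result of a full reduction

torch.all(x, out=out)  # out= variant of the full reduction
print(out)             # tensor(False)
torch.any(x, out=out)
print(out)             # tensor(True)
```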

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D30867354

Pulled By: ezyang

fbshipit-source-id: 46bccaf6c94a09ed77cc6c724d1183c82f801751
2021-09-15 11:06:47 -07:00
54cdf651fd [PyTorch] remove string_view::operator[] bounds check (#64670)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64670

Bounds checking is not required for `std::string_view`, and the checking hoses performance for the following performance prototype diff.
ghstack-source-id: 138037531

Test Plan: CI

Reviewed By: ezyang, bhosmer

Differential Revision: D30747515

fbshipit-source-id: 1f4374415a82dfdccce76ea2c6885c13cb93d369
2021-09-15 09:57:58 -07:00
57420a6063 [PyTorch][easy] Add cbegin/cend to SmallVector (#64682)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64682

Looks like it was forked from llvm before cbegin and cend existed.
ghstack-source-id: 138036981

Test Plan: CI

Reviewed By: dhruvbird

Differential Revision: D30814434

fbshipit-source-id: 9740fa8d3df1c90b77298a95ab9f1d0cf8c90320
2021-09-15 09:57:56 -07:00
bdbc622988 [PyTorch] Avoid extra std::vector in parseSchemaOrName (#64678)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64678

We know we only want one declaration, so let's not create an excess std::vector (and thus a heap allocation) for that.
ghstack-source-id: 138036978

Test Plan: CI

Reviewed By: dhruvbird, tugsbayasgalan

Differential Revision: D30813785

fbshipit-source-id: c67e0100cdef5d894282939fb6d39a57309bc240
2021-09-15 09:56:41 -07:00
0f1bccb692 [quant] Removing unnecessary import from torch/quantization/quantize.py (#64910)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64910

This bled through from the original location. Removing it is not just refactoring, but also prevents potential recursive imports.
ghstack-source-id: 138112663

Test Plan: `buck test mode/dev //caffe2/test:quantization`

Reviewed By: vkuzo

Differential Revision: D30882924

fbshipit-source-id: 8652a334a5186c635761ea5e50f978d1f1078c12
2021-09-15 09:39:04 -07:00
3fb33b38b9 [Static Runtime] Check if outputs of a node do not overlap with each other (#63013)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63013

This change enhances the current memory overlap check to include outputs: it enforces a constraint that the outputs of a node must NOT overlap with each other, since the node is supposed to update all of them at the same time.

This check will detect a problem like T97393697 immediately in debug mode.
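
A conceptual Python sketch of the pairwise constraint (the actual check is C++ inside Static Runtime and is more careful about storages and strides; the byte-range test below is exact only for contiguous tensors):

```python
def byte_range(t):
    # Conservative byte extent on the underlying storage.
    start = t.data_ptr()
    return start, start + t.numel() * t.element_size()

def outputs_overlap(outputs):
    # Enforce: no two outputs of one node may share memory.
    for i in range(len(outputs)):
        for j in range(i + 1, len(outputs)):
            (a0, a1), (b0, b1) = byte_range(outputs[i]), byte_range(outputs[j])
            if a0 < b1 and b0 < a1:
                return True
    return False
```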

Test Plan:
- Added a unittest `ProcessedNode.VerifyMemoryOverlapWithOverlappingOutputs`

- Ran `inline_cvr` on ./buck-out/opt/gen/caffe2/caffe2/fb/predictor/ptvsc2_predictor_bench with this diff and confirmed that the checking condition holds true during the run.

Reviewed By: hlu1

Differential Revision: D30211705

fbshipit-source-id: 994d8dace2422e2498e504eb61452a55739238c0
2021-09-15 08:38:05 -07:00
26e43fe9f3 Forward fix SkipInfo missing mypy (#65063)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65063

Reviewed By: malfet

Differential Revision: D30961556

Pulled By: janeyx99

fbshipit-source-id: 9618e12ba873fb48fe5c846a48d4560ad521eb3e
2021-09-15 08:30:38 -07:00
fb8bdb8039 When test set_affinity, don't hardcode the CPU ID (#65042)
Summary:
The set_affinity test always fails when the number of CPUs is smaller than 3. Changed the test to pick CPU IDs dynamically based on the number of CPUs on the system.
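
The shape of that fix, sketched with the standard (Linux-only) `os` affinity calls; the actual test code may differ:

```python
import os

# Instead of hardcoding CPU IDs, derive them from what actually exists.
available = sorted(os.sched_getaffinity(0))  # CPUs this process may run on
target = {available[-1]}                     # e.g. pin to the last available CPU
os.sched_setaffinity(0, target)
assert os.sched_getaffinity(0) == target
```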

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65042

Reviewed By: jbschlosser

Differential Revision: D30960554

Pulled By: ejguan

fbshipit-source-id: 55ac12714b4b0964b48c3617b79a7a345d40ebce
2021-09-15 08:10:59 -07:00
c625f971d3 [DataPipe] Make TarArchiveReader and ZipArchiveReader accepts FileSream with attempt to close and additional warning (#64788)
Summary:
ghstack is not working for the second commit so I'm manually creating this PR for now. Please only look at changes related to the second commit in this PR (there is a PR for the first commit).

This PR removes TarArchiveReader's dependency on the FileLoader DataPipe by allowing it to use an IterDataPipe of path names as input rather than a tuple of path name and a stream.

It also adds tests to ensure that the DataPipe functions properly when it is read multiple times or reset halfway through reading.

The whole stack fixes https://github.com/pytorch/pytorch/issues/64281 - issues related to unclosed buffer stream.

Stack:
* __->__ https://github.com/pytorch/pytorch/issues/64788
* https://github.com/pytorch/pytorch/issues/64786

cc VitalyFedyunin ejguan

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64788

Reviewed By: jbschlosser, ejguan

Differential Revision: D30901176

Pulled By: NivekT

fbshipit-source-id: 59746a8d0144fc6d3ce0feb2d76445b82e6d414e
2021-09-15 07:34:29 -07:00
32c5da8cd2 add OpInfo for torch.nn.functional.dropout (#62315)
Summary:
Addresses facebookresearch/functorch#78.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62315

Reviewed By: mruberry

Differential Revision: D30932765

Pulled By: zou3519

fbshipit-source-id: 481c67b59a966b4d640973d252b3e392d8db728e
2021-09-15 07:18:04 -07:00
d6d286f651 [dnnlowp] reduce num of test cases to avoid time out (#64935)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64935

As title

Test Plan: CI

Reviewed By: dskhudia

Differential Revision: D30889157

fbshipit-source-id: 316c808806b084bd2e44c56e1cdb61adf2369a9d
2021-09-14 21:32:12 -07:00
b7ec7d760d Generic test parametrization functionality (#60753)
Summary:
This PR plays around with implementation & usage of a `parametrize` decorator for test parametrization similar to `pytest.mark.parametrize`, based on previous work introducing a `_TestParametrizer` class. It works with the internal `DeviceTest` hierarchy & composes with `dtype`, `skip*`, and other decorators. Basic usage is demonstrated in `test/test_blah.py`:

```python
import unittest
from itertools import product
from torch.testing._internal.common_device_type import (
    instantiate_device_type_tests, deviceCountAtLeast, ops)
from torch.testing._internal.common_methods_invocations import op_db
from torch.testing._internal.common_utils import (
    TestCase, run_tests, parametrize, instantiate_parametrized_tests, subtest)

class TestBlah(TestCase):
    parametrize("x", range(5))
    def test_default_names(self, x):
        print('Passed in:', x)

    # Use default names but add an expected failure.
    parametrize("x", [subtest(0, decorators=[unittest.expectedFailure]),
                       *range(1, 5)])
    def test_default_names_expected_failure(self, x):
        if x == 0:
            raise RuntimeError('Boom')
        print('Passed in:', x)

    parametrize("bias", [False, True], name_fn=lambda b: 'bias' if b else 'no_bias')
    def test_custom_names(self, bias):
        print('Passed in:', bias)

    parametrize("bias", [subtest(True, name='bias'),
                          subtest(False, name='no_bias')])
    def test_custom_names_alternate(self, bias):
        print('Passed in:', bias)

    parametrize("x,y", [(1, 2), (1, 3), (1, 4)])
    def test_two_things_default_names(self, x, y):
        print('Passed in:', x, y)

    parametrize("x", [1, 2, 3])
    parametrize("y", [4, 5, 6])
    def test_two_things_composition(self, x, y):
        print('Passed in:', x, y)

    parametrize("x", [subtest(0, decorators=[unittest.expectedFailure]),
                       *range(1, 3)])
    parametrize("y", [4, 5, subtest(6, decorators=[unittest.expectedFailure])])
    def test_two_things_composition_expected_failure(self, x, y):
        if x == 0 or y == 6:
            raise RuntimeError('Boom')
        print('Passed in:', x, y)

    parametrize("x", [1, 2])
    parametrize("y", [3, 4])
    parametrize("z", [5, 6])
    def test_three_things_composition(self, x, y, z):
        print('Passed in:', x, y, z)

    parametrize("x", [1, 2], name_fn=str)
    parametrize("y", [3, 4], name_fn=str)
    parametrize("z", [5, 6], name_fn=str)
    def test_three_things_composition_custom_names(self, x, y, z):
        print('Passed in:', x, y, z)

    parametrize("x,y", product(range(2), range(3)))
    def test_two_things_product(self, x, y):
        print('Passed in:', x, y)

    parametrize("x,y", [subtest((1, 2), name='double'),
                         subtest((1, 3), name='triple'),
                         subtest((1, 4), name='quadruple')])
    def test_two_things_custom_names(self, x, y):
        print('Passed in:', x, y)

    parametrize("x,y", [(1, 2), (1, 3), (1, 4)], name_fn=lambda x, y: '{}_{}'.format(x, y))
    def test_two_things_custom_names_alternate(self, x, y):
        print('Passed in:', x, y)

class TestDeviceBlah(TestCase):
    parametrize("x", range(10))
    def test_default_names(self, device, x):
        print('Passed in:', device, x)

    parametrize("x,y", [(1, 2), (3, 4), (5, 6)])
    def test_two_things(self, device, x, y):
        print('Passed in:', device, x, y)

    @deviceCountAtLeast(1)
    def test_multiple_devices(self, devices):
        print('Passed in:', devices)

    @ops(op_db)
    @parametrize("flag", [False, True], lambda f: 'flag_enabled' if f else 'flag_disabled')
    def test_op_parametrized(self, device, dtype, op, flag):
        print('Passed in:', device, dtype, op, flag)

instantiate_parametrized_tests(TestBlah)
instantiate_device_type_tests(TestDeviceBlah, globals())

if __name__ == '__main__':
    run_tests()
```

Generated tests:
```
TestBlah.test_custom_names_alternate_bias
TestBlah.test_custom_names_alternate_no_bias
TestBlah.test_custom_names_bias
TestBlah.test_custom_names_no_bias
TestBlah.test_default_names_expected_failure_x_0
TestBlah.test_default_names_expected_failure_x_1
TestBlah.test_default_names_expected_failure_x_2
TestBlah.test_default_names_expected_failure_x_3
TestBlah.test_default_names_expected_failure_x_4
TestBlah.test_default_names_x_0
TestBlah.test_default_names_x_1
TestBlah.test_default_names_x_2
TestBlah.test_default_names_x_3
TestBlah.test_default_names_x_4
TestBlah.test_three_things_composition_custom_names_1_3_5
TestBlah.test_three_things_composition_custom_names_1_3_6
TestBlah.test_three_things_composition_custom_names_1_4_5
TestBlah.test_three_things_composition_custom_names_1_4_6
TestBlah.test_three_things_composition_custom_names_2_3_5
TestBlah.test_three_things_composition_custom_names_2_3_6
TestBlah.test_three_things_composition_custom_names_2_4_5
TestBlah.test_three_things_composition_custom_names_2_4_6
TestBlah.test_three_things_composition_x_1_y_3_z_5
TestBlah.test_three_things_composition_x_1_y_3_z_6
TestBlah.test_three_things_composition_x_1_y_4_z_5
TestBlah.test_three_things_composition_x_1_y_4_z_6
TestBlah.test_three_things_composition_x_2_y_3_z_5
TestBlah.test_three_things_composition_x_2_y_3_z_6
TestBlah.test_three_things_composition_x_2_y_4_z_5
TestBlah.test_three_things_composition_x_2_y_4_z_6
TestBlah.test_two_things_composition_expected_failure_x_0_y_4
TestBlah.test_two_things_composition_expected_failure_x_0_y_5
TestBlah.test_two_things_composition_expected_failure_x_0_y_6
TestBlah.test_two_things_composition_expected_failure_x_1_y_4
TestBlah.test_two_things_composition_expected_failure_x_1_y_5
TestBlah.test_two_things_composition_expected_failure_x_1_y_6
TestBlah.test_two_things_composition_expected_failure_x_2_y_4
TestBlah.test_two_things_composition_expected_failure_x_2_y_5
TestBlah.test_two_things_composition_expected_failure_x_2_y_6
TestBlah.test_two_things_composition_x_1_y_4
TestBlah.test_two_things_composition_x_1_y_5
TestBlah.test_two_things_composition_x_1_y_6
TestBlah.test_two_things_composition_x_2_y_4
TestBlah.test_two_things_composition_x_2_y_5
TestBlah.test_two_things_composition_x_2_y_6
TestBlah.test_two_things_composition_x_3_y_4
TestBlah.test_two_things_composition_x_3_y_5
TestBlah.test_two_things_composition_x_3_y_6
TestBlah.test_two_things_custom_names_alternate_1_2
TestBlah.test_two_things_custom_names_alternate_1_3
TestBlah.test_two_things_custom_names_alternate_1_4
TestBlah.test_two_things_custom_names_double
TestBlah.test_two_things_custom_names_quadruple
TestBlah.test_two_things_custom_names_triple
TestBlah.test_two_things_default_names_x_1_y_2
TestBlah.test_two_things_default_names_x_1_y_3
TestBlah.test_two_things_default_names_x_1_y_4
TestBlah.test_two_things_product_x_0_y_0
TestBlah.test_two_things_product_x_0_y_1
TestBlah.test_two_things_product_x_0_y_2
TestBlah.test_two_things_product_x_1_y_0
TestBlah.test_two_things_product_x_1_y_1
TestBlah.test_two_things_product_x_1_y_2
TestDeviceBlahCPU.test_default_names_x_0_cpu
TestDeviceBlahCPU.test_default_names_x_1_cpu
TestDeviceBlahCPU.test_default_names_x_2_cpu
TestDeviceBlahCPU.test_default_names_x_3_cpu
TestDeviceBlahCPU.test_default_names_x_4_cpu
TestDeviceBlahCPU.test_default_names_x_5_cpu
TestDeviceBlahCPU.test_default_names_x_6_cpu
TestDeviceBlahCPU.test_default_names_x_7_cpu
TestDeviceBlahCPU.test_default_names_x_8_cpu
TestDeviceBlahCPU.test_default_names_x_9_cpu
TestDeviceBlahCPU.test_multiple_devices_cpu
TestDeviceBlahCPU.test_op_parametrized_<opname>_<variant>_cpu_uint8_flag_enabled_cpu
TestDeviceBlahCPU.test_two_things_x_1_y_2_cpu
TestDeviceBlahCPU.test_two_things_x_3_y_4_cpu
TestDeviceBlahCPU.test_two_things_x_5_y_6_cpu
TestDeviceBlahMETA.test_default_names_x_0_meta
TestDeviceBlahMETA.test_default_names_x_1_meta
TestDeviceBlahMETA.test_default_names_x_2_meta
TestDeviceBlahMETA.test_default_names_x_3_meta
TestDeviceBlahMETA.test_default_names_x_4_meta
TestDeviceBlahMETA.test_default_names_x_5_meta
TestDeviceBlahMETA.test_default_names_x_6_meta
TestDeviceBlahMETA.test_default_names_x_7_meta
TestDeviceBlahMETA.test_default_names_x_8_meta
TestDeviceBlahMETA.test_default_names_x_9_meta
TestDeviceBlahMETA.test_multiple_devices_meta
TestDeviceBlahMETA.test_op_parametrized_<opname>_<variant>_meta_uint8_flag_enabled_meta
TestDeviceBlahMETA.test_two_things_x_1_y_2_meta
TestDeviceBlahMETA.test_two_things_x_3_y_4_meta
TestDeviceBlahMETA.test_two_things_x_5_y_6_meta
```

Caveats:
* `parametrize` decorators cannot be "stacked" yet; each one overwrites the previous. This will change to either:
  * Allow stacking of multiple decorators
  * Error out with a nice error message if multiple decorators are specified

The PR introduces `instantiate_parametrized_tests()` in addition to `instantiate_device_type_tests()`. The former should be used for non-device-specific tests, and the latter should be used for device-specific tests, as usual. Both of these support the `parametrize` decorator. Only the latter supports the `ops` decorator (no change here- this was already the case).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60753

Reviewed By: saketh-are

Differential Revision: D30606615

Pulled By: jbschlosser

fbshipit-source-id: a34f36d643f68a6e221f419d9bb3e1ae1d84dd65
2021-09-14 19:52:59 -07:00
6ab97fbc28 [vulkan] Use volk to load vulkan libraries and fix Windows build errors (#64988)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64988

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64968

The current wrapper (provided by [Vulkan-Tools](https://github.com/KhronosGroup/Vulkan-Tools/tree/master/common)) can't handle dynamically loading Vulkan on Windows/Mac. Therefore, we can bring in [volk](https://github.com/zeux/volk) to load the vulkan libraries for other platforms.

1. Use `volk` with `link_style="static"` on Windows only; use `vulkan_wrapper` for all other platforms (temporary solution)
2. Make DotSlash work on Windows when resolving glslc path

Test Plan:
For Android:

```
cd ~/fbsource
buck build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 //xplat/caffe2:pt_vulkan_api_test_binAndroid\#android-arm64 --show-output
adb push buck-out/gen/xplat/caffe2/pt_vulkan_api_test_binAndroid\#android-arm64 /data/local/tmp/vulkan_api_test
adb shell "/data/local/tmp/vulkan_api_test"
cd -
```

For Mac:
```
buck build //xplat/caffe2:pt_vulkan_api_test_binAppleMac
./buck-out/gen/xplat/caffe2/pt_vulkan_api_test_binAppleMac\#macosx-x86_64
```

On Local OSS repo with `pr/64988` branch:

The build and test are fine. Note that `VulkanAPITest.log_softmax()` has been broken for the past month. Ivan will take a look when he is available.

Build: `BUILD_TEST=1 USE_VULKAN=1 USE_VULKAN_SHADERC_RUNTIME=1 USE_VULKAN_WRAPPER=0 MACOSX_DEPLOYMENT_TARGET=10.9 CC=clang CXX=clang++ python setup.py install`

Test: `$PYTORCH_ROOT/build/bin/vulkan_api_test /data/local/tmp`

```
Running main() from ../third_party/googletest/googletest/src/gtest_main.cc
[==========] Running 69 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 69 tests from VulkanAPITest
[ RUN      ] VulkanAPITest.adaptive_avg_pool2d
[       OK ] VulkanAPITest.adaptive_avg_pool2d (228 ms)
[ RUN      ] VulkanAPITest.add
[       OK ] VulkanAPITest.add (51 ms)
[ RUN      ] VulkanAPITest.add_broadcast0
[       OK ] VulkanAPITest.add_broadcast0 (13 ms)
[ RUN      ] VulkanAPITest.add_broadcast1
[       OK ] VulkanAPITest.add_broadcast1 (9 ms)
[ RUN      ] VulkanAPITest.add_broadcast2
[       OK ] VulkanAPITest.add_broadcast2 (9 ms)
[ RUN      ] VulkanAPITest.add_
[       OK ] VulkanAPITest.add_ (60 ms)
[ RUN      ] VulkanAPITest.add_broadcast0_
[       OK ] VulkanAPITest.add_broadcast0_ (10 ms)
[ RUN      ] VulkanAPITest.add_broadcast1_
[       OK ] VulkanAPITest.add_broadcast1_ (1 ms)
[ RUN      ] VulkanAPITest.add_scalar
[       OK ] VulkanAPITest.add_scalar (24 ms)
[ RUN      ] VulkanAPITest.add_scalar_
[       OK ] VulkanAPITest.add_scalar_ (8 ms)
[ RUN      ] VulkanAPITest.addmm
[       OK ] VulkanAPITest.addmm (22 ms)
[ RUN      ] VulkanAPITest.addmm_expand
[       OK ] VulkanAPITest.addmm_expand (12 ms)
[ RUN      ] VulkanAPITest.avg_pool2d
[       OK ] VulkanAPITest.avg_pool2d (9 ms)
[ RUN      ] VulkanAPITest.clamp
[       OK ] VulkanAPITest.clamp (92 ms)
[ RUN      ] VulkanAPITest.clamp_
[       OK ] VulkanAPITest.clamp_ (60 ms)
[ RUN      ] VulkanAPITest.conv2d
[       OK ] VulkanAPITest.conv2d (15 ms)
[ RUN      ] VulkanAPITest.conv2d_dw
[       OK ] VulkanAPITest.conv2d_dw (15 ms)
[ RUN      ] VulkanAPITest.conv2d_pw
[       OK ] VulkanAPITest.conv2d_pw (34 ms)
[ RUN      ] VulkanAPITest.conv2d_winograd
[       OK ] VulkanAPITest.conv2d_winograd (10 ms)
[ RUN      ] VulkanAPITest.copy
[       OK ] VulkanAPITest.copy (1 ms)
[ RUN      ] VulkanAPITest.div
[       OK ] VulkanAPITest.div (32 ms)
[ RUN      ] VulkanAPITest.div_broadcast0
[       OK ] VulkanAPITest.div_broadcast0 (11 ms)
[ RUN      ] VulkanAPITest.div_broadcast1
[       OK ] VulkanAPITest.div_broadcast1 (9 ms)
[ RUN      ] VulkanAPITest.div_broadcast2
[       OK ] VulkanAPITest.div_broadcast2 (7 ms)
[ RUN      ] VulkanAPITest.div_
[       OK ] VulkanAPITest.div_ (46 ms)
[ RUN      ] VulkanAPITest.div_broadcast0_
[       OK ] VulkanAPITest.div_broadcast0_ (9 ms)
[ RUN      ] VulkanAPITest.div_broadcast1_
[       OK ] VulkanAPITest.div_broadcast1_ (2 ms)
[ RUN      ] VulkanAPITest.div_scalar
[       OK ] VulkanAPITest.div_scalar (95 ms)
[ RUN      ] VulkanAPITest.div_scalar_
[       OK ] VulkanAPITest.div_scalar_ (18 ms)
[ RUN      ] VulkanAPITest.empty
[       OK ] VulkanAPITest.empty (0 ms)
[ RUN      ] VulkanAPITest.hardsigmoid
[       OK ] VulkanAPITest.hardsigmoid (76 ms)
[ RUN      ] VulkanAPITest.hardsigmoid_
[       OK ] VulkanAPITest.hardsigmoid_ (80 ms)
[ RUN      ] VulkanAPITest.hardshrink
[       OK ] VulkanAPITest.hardshrink (630 ms)
[ RUN      ] VulkanAPITest.hardshrink_
[       OK ] VulkanAPITest.hardshrink_ (573 ms)
[ RUN      ] VulkanAPITest.leaky_relu
[       OK ] VulkanAPITest.leaky_relu (271 ms)
[ RUN      ] VulkanAPITest.leaky_relu_
[       OK ] VulkanAPITest.leaky_relu_ (254 ms)
[ RUN      ] VulkanAPITest.hardswish
[       OK ] VulkanAPITest.hardswish (83 ms)
[ RUN      ] VulkanAPITest.hardswish_
[       OK ] VulkanAPITest.hardswish_ (72 ms)
[ RUN      ] VulkanAPITest.max_pool2d
[       OK ] VulkanAPITest.max_pool2d (16 ms)
[ RUN      ] VulkanAPITest.mean
[       OK ] VulkanAPITest.mean (17 ms)
[ RUN      ] VulkanAPITest.mean2d
[       OK ] VulkanAPITest.mean2d (20 ms)
[ RUN      ] VulkanAPITest.mm
[       OK ] VulkanAPITest.mm (12 ms)
[ RUN      ] VulkanAPITest.mul
[       OK ] VulkanAPITest.mul (28 ms)
[ RUN      ] VulkanAPITest.mul_broadcast0
[       OK ] VulkanAPITest.mul_broadcast0 (9 ms)
[ RUN      ] VulkanAPITest.mul_broadcast1
[       OK ] VulkanAPITest.mul_broadcast1 (9 ms)
[ RUN      ] VulkanAPITest.mul_broadcast2
[       OK ] VulkanAPITest.mul_broadcast2 (9 ms)
[ RUN      ] VulkanAPITest.mul_
[       OK ] VulkanAPITest.mul_ (43 ms)
[ RUN      ] VulkanAPITest.mul_broadcast0_
[       OK ] VulkanAPITest.mul_broadcast0_ (8 ms)
[ RUN      ] VulkanAPITest.mul_broadcast1_
[       OK ] VulkanAPITest.mul_broadcast1_ (1 ms)
[ RUN      ] VulkanAPITest.mul_scalar
[       OK ] VulkanAPITest.mul_scalar (64 ms)
[ RUN      ] VulkanAPITest.mul_scalar_
[       OK ] VulkanAPITest.mul_scalar_ (17 ms)
[ RUN      ] VulkanAPITest.reflection_pad2d
[       OK ] VulkanAPITest.reflection_pad2d (7 ms)
[ RUN      ] VulkanAPITest.reshape
[       OK ] VulkanAPITest.reshape (73 ms)
[ RUN      ] VulkanAPITest.reshape_
[       OK ] VulkanAPITest.reshape_ (41 ms)
[ RUN      ] VulkanAPITest.sigmoid
[       OK ] VulkanAPITest.sigmoid (81 ms)
[ RUN      ] VulkanAPITest.sigmoid_
[       OK ] VulkanAPITest.sigmoid_ (68 ms)
[ RUN      ] VulkanAPITest.softmax
[       OK ] VulkanAPITest.softmax (28 ms)
[ RUN      ] VulkanAPITest.log_softmax
Max Diff allowed: 5.87862e-05
../aten/src/ATen/test/vulkan_api_test.cpp:1470: Failure
Value of: check
  Actual: false
Expected: true
[  FAILED  ] VulkanAPITest.log_softmax (19 ms)
[ RUN      ] VulkanAPITest.tanh
[       OK ] VulkanAPITest.tanh (63 ms)
[ RUN      ] VulkanAPITest.tanh_
[       OK ] VulkanAPITest.tanh_ (68 ms)
[ RUN      ] VulkanAPITest.sub
[       OK ] VulkanAPITest.sub (28 ms)
[ RUN      ] VulkanAPITest.sub_broadcast0
[       OK ] VulkanAPITest.sub_broadcast0 (9 ms)
[ RUN      ] VulkanAPITest.sub_broadcast1
[       OK ] VulkanAPITest.sub_broadcast1 (9 ms)
[ RUN      ] VulkanAPITest.sub_broadcast2
[       OK ] VulkanAPITest.sub_broadcast2 (8 ms)
[ RUN      ] VulkanAPITest.sub_
[       OK ] VulkanAPITest.sub_ (43 ms)
[ RUN      ] VulkanAPITest.sub_broadcast0_
[       OK ] VulkanAPITest.sub_broadcast0_ (10 ms)
[ RUN      ] VulkanAPITest.sub_broadcast1_
[       OK ] VulkanAPITest.sub_broadcast1_ (2 ms)
[ RUN      ] VulkanAPITest.upsample_nearest2d
[       OK ] VulkanAPITest.upsample_nearest2d (5 ms)
[ RUN      ] VulkanAPITest.mobilenetv2
[       OK ] VulkanAPITest.mobilenetv2 (82 ms)
[----------] 69 tests from VulkanAPITest (3885 ms total)

[----------] Global test environment tear-down
[==========] 69 tests from 1 test suite ran. (3885 ms total)
[  PASSED  ] 68 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] VulkanAPITest.log_softmax

 1 FAILED TEST
```

Differential Revision: D30925995

fbshipit-source-id: 1b1b7f7f22090064424a5379d2f0559d0da7846a
2021-09-14 19:35:05 -07:00
ff6b475d4a [fix] don't expose unique_dim in torch (#63080)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/62793

This is mostly a quick fix. I think the more correct fix could be renaming `unique_dim` to `_unique_dim`, which could be BC-breaking for C++ users (maybe). Maybe something else I am missing.

~~Not sure how to add a test for it.~~ Have tested it locally.

We can add a test like the following. Tested this locally; it fails currently but passes with the fix.
```python
def test_wildcard_import(self):
    exec('from torch import *')
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63080

Reviewed By: gchanan

Differential Revision: D30738711

Pulled By: zou3519

fbshipit-source-id: b86d0190e45ba0b49fd2cffdcfd2e3a75cc2a35e
2021-09-14 18:19:17 -07:00
36cac2be4d [CUDA graphs] moves memory sharing intro paragraph (#64996)
Summary:
Puts memory sharing intro under Sharing memory... header, where it should have been all along.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64996

Reviewed By: mruberry

Differential Revision: D30948619

Pulled By: ngimel

fbshipit-source-id: 5d9dd267b34e9d3fc499d4738377b58a22da1dc2
2021-09-14 17:53:43 -07:00
36a0d97281 Revert D30558877: Ported std/var to ReductionOpInfo and minimum/maximum to BinaryUfuncInfo
Test Plan: revert-hammer

Differential Revision:
D30558877 (382e008fbf)

Original commit changeset: 3e62ff24a935

fbshipit-source-id: 3b9f03c1f43c6d5f2738ed139d0236f2ded78dbf
2021-09-14 17:33:38 -07:00
3d312b3b8e [Model Averaging] Simplify PostLocalSGD Optimizer API (#64885)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64885

1) The constructor accepts a local optimizer instance instead of the inputs of the local optimizer's constructor and the class type.
2) The parameters are read from the local optimizer's `param_groups` instead of a separate input (a usage sketch follows).
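A hedged usage sketch of the simplified API. The class and averager names below follow the proposal linked underneath; treat the exact signatures as assumptions rather than the definitive API:

```python
import torch
from torch.distributed.optim import PostLocalSGDOptimizer
import torch.distributed.algorithms.model_averaging.averagers as averagers

# In real use `model` would be a DDP-wrapped module inside an initialized
# process group; a bare module is used here only to show the constructor shape.
model = torch.nn.Linear(4, 4)
local_opt = torch.optim.SGD(model.parameters(), lr=0.1)  # any constructed optimizer

opt = PostLocalSGDOptimizer(
    optim=local_opt,  # pass the instance, not (class, constructor inputs)
    averager=averagers.PeriodicModelAverager(period=4, warmup_steps=100),
)
# Parameters are read from local_opt.param_groups; no separate params input.
```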

Proposal: https://github.com/pytorch/pytorch/issues/59699
ghstack-source-id: 137865867

Test Plan: buck test mode/dev-nosan //caffe2/test/distributed:distributed_nccl_spawn -- test_post_localSGD_optimizer_parity

Reviewed By: rohan-varma

Differential Revision: D30888794

fbshipit-source-id: 21261b480f6bbb9b2333426020e3f350da3f73c2
2021-09-14 16:37:14 -07:00
382e008fbf Ported std/var to ReductionOpInfo and minimum/maximum to BinaryUfuncInfo (#63978)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63978

Test Plan: Imported from OSS

Reviewed By: saketh-are

Differential Revision: D30558877

Pulled By: heitorschueroff

fbshipit-source-id: 3e62ff24a935784fc93a76a0f46a1deb060ba680
2021-09-14 16:18:09 -07:00
c65128679b [DataPipe] Improve Mapper to accept input/output index when apply fn (#64951)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64951

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D30910035

Pulled By: ejguan

fbshipit-source-id: d687fe10939920a3617a60552fe743e8526438a0
2021-09-14 15:46:42 -07:00
670853295a [quant][tensorrt] Add tensorrt backend config (#64623)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64623

The config api will change, but we'll add configs gradually for TensorRT to unblock experimentation

Test Plan:
python torch/fx/experimental/fx2trt/example/unittests.py

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D30800474

fbshipit-source-id: 3c4640de1205a0f19b62943ab84f386d80394ec2
2021-09-14 15:27:33 -07:00
85222c050f [PyTorch] Add c10::hash<c10::ArrayRef<T>> (#64277)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64277

Just moved the vector implementation to ArrayRef and re-implemented the former using the latter.
ghstack-source-id: 137978947

Test Plan: existing CI

Reviewed By: dhruvbird

Differential Revision: D30647666

fbshipit-source-id: c0f4f06c348d36882ec0db802be44d8c7749562f
2021-09-14 14:22:12 -07:00
5d4efed83e [PyTorch] Add OpCode cache in ByteCodeDeserializer (#64110)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64110

As the code comment says, we can exploit pickler string interning to accelerate OpCode parsing. No more strcmp!
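A minimal Python sketch of the interning trick (the real implementation is in the C++ deserializer; the opcode table below is hypothetical):

```python
# The unpickler interns repeated strings, so identical opcode names arrive
# as the *same* object. Caching on object identity then skips string
# comparison entirely after the first lookup. (Safe here only because the
# interned strings stay alive for the duration of the parse.)
OPCODES = {"OP": 0, "OPN": 1}       # hypothetical opcode -> handler table

_cache = {}

def parse_opcode(name):
    key = id(name)                   # identity check instead of strcmp
    if key not in _cache:
        _cache[key] = OPCODES[name]  # slow path runs once per interned string
    return _cache[key]

op = "OP"                            # stands in for an interned string
assert parse_opcode(op) == 0 and parse_opcode(op) == 0
```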
ghstack-source-id: 137978946

Test Plan:
Pixel 3 before: https://www.internalfb.com/intern/aibench/details/591414145082422
Pixel 3 after: https://www.internalfb.com/intern/aibench/details/484557404703261

new mean is 292 ms, down from 302 ms.

Reviewed By: dhruvbird

Differential Revision: D30615052

fbshipit-source-id: 9707625e778388a7920ab72704d71ad57ddaac17
2021-09-14 14:22:10 -07:00
a9121df09c [PyTorch] Remove implicit conversion from Tuple to vector reference (#63993)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63993

This seems to be unused, and it's pretty scary.
ghstack-source-id: 137978949

Test Plan: CI

Reviewed By: lw

Differential Revision: D30560441

fbshipit-source-id: 08b7ce971fd1e2dbeddbf37b02413fef513b4753
2021-09-14 14:22:08 -07:00
452402b984 [PyTorch] Fix SourceRangeDeserializer vector copy (#64031)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64031

More copies of tuple elements.
ghstack-source-id: 137978948

Test Plan:
Pixel 3 before: https://our.intern.facebook.com/intern/aibench/details/724509739115867
Pixel 3 after: https://our.intern.facebook.com/intern/aibench/details/232361457767293

Top-line number doesn't seem to have moved, but we can see that the vector copy disappeared in the flame graph.

Reviewed By: raziel

Differential Revision: D30559545

fbshipit-source-id: e5343abae96b8e80e0ccec482ad316884ae231ea
2021-09-14 14:20:45 -07:00
57eda69219 [fx2trt] fix elementwise op converter with one operand being a literal and has different type (#65004)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65004

If we have some code like `torch.add(x, 1)` and `x` is a float tensor, then things fall apart during conversion, because currently we add a constant layer of int32 dtype for `1` when we actually need float dtype.

This diff adds an arg to `get_trt_tensor` which specifies the dtype of the constant layer we create.

Also, starts adding docstrings for functions.
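A small eager-mode illustration of the dtype mismatch; the `get_trt_tensor` call in the comment is paraphrased from the description above, not an exact signature:

```python
import torch

x = torch.randn(2, 3)   # float32 tensor
y = torch.add(x, 1)     # eager mode promotes the int literal
print(y.dtype)          # torch.float32

# A converter that materialized `1` as an int32 constant layer would clash
# with x's float32 dtype at the elementwise layer. The fix passes the
# intended dtype when creating the constant, roughly:
#   get_trt_tensor(network, 1, name, dtype=torch.float32)   # paraphrased
```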

Reviewed By: yinghai

Differential Revision: D30852156

fbshipit-source-id: 650ce72d2794093a4616e640ea503dcc1c6b2bc4
2021-09-14 12:27:37 -07:00
3727baea6f [PyTorch Edge][Model Loading] Operator Call De-dup at TorchScript Serialization Level [2/2] (#64269)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64269

Revert changes in D29826210 (693d8f2f07) (we don't need operator lambda caching since there aren't duplicate operators anymore)

This diff stack results in an additional approx 12% speedup in model loading time (from 229ms to 200ms) when run against an 87MB speech model that jiatongzhou provided.
ghstack-source-id: 138014904

Test Plan:
**Speech Transducer v25 model (as in D29826210 (693d8f2f07))**

|| Before | After |
|Load Time|[229ms](https://www.internalfb.com/intern/aibench/details/160889436133243)|[200ms](https://www.internalfb.com/intern/aibench/details/837884532607514)|
|Save File Size|[86.23 MB](https://lookaside.facebook.com/intern/diff/file/data/?number=658544950)|[86.1 MB](https://lookaside.facebook.com/intern/diff/file/data/?number=658554403)|

The "after" flamegraph shows significantly less time is spent on ```append_operator``` than before.

Steps
- Check out desired commit in devserver (base branch or this diff)
- ```buck build bento/kernels:bento_kernel_pytorch```
- Use N1094068 with pytorch_local kernel to save model for lite interpreter
- Edit ```aibench/specifications/models/pytorch/speech_transducer/v25.json ``` to have new model location and md5
- ```buck run aibench:run_bench -- -b aibench/specifications/models/pytorch/speech_transducer/v25.json --framework pytorch --platform android/arm64 --devices "S8US" --force_profile --remote ```

**Test that saving a model with de-dup ops doesn't change its output**
https://www.internalfb.com/intern/anp/view/?id=1137434

Reviewed By: iseeyuan

Differential Revision: D30615710

fbshipit-source-id: bb4052f0f16eccab386585e94411056f94bce43c
2021-09-14 12:12:46 -07:00
86e6bed0d4 [PyTorch Edge][Model Loading] Operator Call De-dup at TorchScript Serialization Level [1/2] (#64268)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64268

If the same pair of operator name and num inputs has been used to add an instruction to the operator table previously (and the operator's schema is not vararg), use the same index as that instruction rather than creating a new one.
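A minimal Python sketch of the de-dup rule, assuming a simple list-backed operator table (the real change lives in the TorchScript serializer):

```python
op_table = []    # serialized operator table
op_index = {}    # (operator name, num inputs) -> existing table index

def add_operator(name, num_inputs, is_vararg=False):
    key = (name, num_inputs)
    if not is_vararg and key in op_index:
        return op_index[key]   # reuse the existing instruction's index
    idx = len(op_table)
    op_table.append(key)
    if not is_vararg:
        op_index[key] = idx    # vararg schemas are never de-duped
    return idx

assert add_operator("aten::add.Tensor", 2) == add_operator("aten::add.Tensor", 2)
```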
ghstack-source-id: 138014905

Test Plan: Phabricator tests, and test performance changes in next diff

Reviewed By: iseeyuan, tugsbayasgalan

Differential Revision: D30615434

fbshipit-source-id: f442f557f12412693a73004ce44733ccef063b82
2021-09-14 12:11:32 -07:00
97df69eac6 .github: Add render test results step (#64937)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64937

Adds CLI output for rendered test results to go alongside test execution; users should be able to quickly diagnose test failures like so:
![fdsfdsfdsfdsf](https://user-images.githubusercontent.com/1700823/133156245-ba939cbf-8aa2-47a7-b1fb-7cc876ca75c4.png)

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

cc ezyang seemethere malfet lg20987 pytorch/pytorch-dev-infra

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D30917897

Pulled By: seemethere

fbshipit-source-id: f51ea499462e3cfd64496cb711b84a93971c91bd
2021-09-14 11:25:14 -07:00
d188204323 remove SkipInfo class (#64972)
Summary:
per title

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64972

Reviewed By: mruberry

Differential Revision: D30924598

Pulled By: ngimel

fbshipit-source-id: 1ac1ec8fd50ca27e3cd36c12a588d334e7466899
2021-09-14 11:23:54 -07:00
eedc234e33 [PyTorch] Don't store multiple kernels per key on mobile (#64447)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64447

As the code comment says, we needn't worry about Jupyter notebooks on mobile.
ghstack-source-id: 137951718

Test Plan: Profiled startup of //caffe2/caffe2/fb/high_perf_models/pytorch/benchmark_framework_overheads:cpp_benchmark on devserver with -niter 0 -nrep 0 and `C10_DISPATCHER_ONE_KERNEL_PER_DISPATCH_KEY` defined. Time spent in sherwood_v3_table lookups went way down.

Reviewed By: ezyang, bhosmer

Differential Revision: D30736094

fbshipit-source-id: bcc22cd0d9adceba259a03898c992759d501fe89
2021-09-14 10:36:43 -07:00
446d95a7f6 [fx const fold] fix some cases with deep model hierarchy (#64945)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64945

In the const folding pass, we try to create `get_attr` nodes in submod_1 for `get_attr` nodes that are in the main graph. But we don't have the real attributes in submod_1. To fix this, we assign the main module as the owning module of submod_1's graph.

The fix above would cause a problem for `call_module` nodes in submod_1, because during the split, modules get inlined into submod_1 (target changed from "mod.a.b" to "mod_a_b"). Changing the owning module would make those `call_module` nodes unable to find the referenced module. To fix this, we set the target module to the main module.
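A short torch.fx example of why `get_attr` resolution depends on the owning module (a standalone illustration, not the const-folding pass itself):

```python
import torch
import torch.fx as fx

class Sub(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.w = torch.nn.Parameter(torch.randn(3))
    def forward(self, x):
        return x + self.w

class Main(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.sub = Sub()
    def forward(self, x):
        return self.sub(x)

gm = fx.symbolic_trace(Main())
# get_attr targets are qualified paths ("sub.w") that only resolve against
# a module that really owns the attribute -- hence re-pointing the split
# graph's owning module to the main module in the fix above.
print([n.target for n in gm.graph.nodes if n.op == "get_attr"])  # ['sub.w']
```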

Reviewed By: jfix71

Differential Revision: D30905949

fbshipit-source-id: cd67bc8fe4b8ad4344ae97b8e36753fdce3ece6d
2021-09-14 09:45:44 -07:00
00e6e0c593 [Model Averaging] Revert #63895 (#64903)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64903

Fix the accuracy regression caused by https://github.com/pytorch/pytorch/pull/63895.

Test Plan:
buck test mode/dev-nosan //caffe2/test/distributed:distributed_nccl_spawn -- test_periodic_model_averager
buck test mode/dev-nosan //caffe2/test/distributed:distributed_nccl_spawn -- test_post_localSGD_optimizer_parity

Reviewed By: rohan-varma

Differential Revision: D30894688

fbshipit-source-id: fe00b8b23b860d9f806f87c1b6caba1d0b807485
2021-09-14 09:45:42 -07:00
882b67dff4 Drop incremental linking on Windows with REL_WITH_DEB_INFO=1. (#64892)
Summary:
The library will no longer link properly on VS 2019 (14.29.30133). To
ensure that engineers building on Windows can use and debug with this
build type, incremental linking needs to be turned off for this build
flag.

Verified that this build type successfully builds, links, and provides
debuggable Python modules on Windows.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64892

Reviewed By: jbschlosser

Differential Revision: D30902565

Pulled By: malfet

fbshipit-source-id: e5286a4c6f45c7cbe4cdc1b98560129bd386970b
2021-09-14 09:44:18 -07:00
01cfea9485 Disable target determination for now (#64921)
Summary:
There were several reports of target determinator incorrectly skipping
tests, most recent one is https://github.com/pytorch/pytorch/issues/64902

Let's disable it until it could be further stabilized

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64921

Reviewed By: seemethere, janeyx99

Differential Revision: D30901186

Pulled By: malfet

fbshipit-source-id: 531afd2d390c6b51f727330d5dd1882d70b6fdde
2021-09-14 09:40:13 -07:00
4e225da363 print_test_stats.py: dedup test report upload name with TEST_CONFIG (#64948)
Summary:
Connected with issue https://github.com/pytorch/pytorch/issues/64845, takeover of https://github.com/pytorch/pytorch/issues/64091

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64948

Reviewed By: malfet, seemethere

Differential Revision: D30908592

Pulled By: janeyx99

fbshipit-source-id: dc31b0bbc9f4e35d23412aa14acbbab7422b4146
2021-09-14 09:01:06 -07:00
e884554008 Make {select,slice,diagonal}_backward primitives wrt autograd (#64933)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64933

Fixes https://github.com/facebookresearch/functorch/issues/108

This is a short-term fix. A longer-term fix would be to either:
1. have proper {select,slice,diagonal}_embed functions
2. have efficient {select,slice,diagonal}_scatter functions (and
efficient zero tensors).

NB: I didn't use diag_embed because diag_embed is slightly different
from diagonal_backward.

There are no BC concerns because TorchScript (luckily) does not
serialize the backwards graph.

Test Plan:
- run tests
- run benchmarks.
https://gist.github.com/zou3519/e7c0774d1ac97f32aa02ec44d81e60e1.
Surprisingly the instruction count goes down. This is probably because
we create fewer autograd nodes now.

Reviewed By: ezyang

Differential Revision: D30909333

Pulled By: zou3519

fbshipit-source-id: 3b33e13010ba13b4d487b346aa9bee8a0e8c378c
2021-09-14 08:10:59 -07:00
2853c7da22 Replace composite dispatch with CompositeExplicitAutograd (#64641)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64641

`sum`, `mean`, and `norm` were ported to structured kernels in #61642, #61643, and #62711,
respectively. Those PRs changed related overloads into composite kernels. However, their
dispatch section remained the same, when they really should be marked as
`CompositeExplicitAutograd`. This PR fixes this issue.

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D30867122

Pulled By: ezyang

fbshipit-source-id: b951aee41a3cab9ca546df826a285d60013e3b3a
2021-09-14 07:56:54 -07:00
09d221e8d4 Revert D30711934: [pytorch][PR] Use RDS for build size tracking
Test Plan: revert-hammer

Differential Revision:
D30711934 (1cd0252eed)

Original commit changeset: 0af808ddf528

fbshipit-source-id: 6f67ed5cbaf333cc55729be2a23e385772e31b10
2021-09-14 06:10:03 -07:00
f23f21dafe [TensorExpr] Remove 'Placeholder' class. (#64887)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64887

BufHandle has exactly the same functionality and should be used instead.

Differential Revision: D30889483

Test Plan: Imported from OSS

Reviewed By: navahgar

Pulled By: ZolotukhinM

fbshipit-source-id: 365fe8e396731b88920535a3de96bd3301aaa3f3
2021-09-14 00:22:44 -07:00
199031c48e [TensorExpr] PyBinds: improve QoL of pybind users. (#64886)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64886

Bind methods for implicit conversions and constructors to avoid
boilerplate code.

Differential Revision: D30889193

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Pulled By: ZolotukhinM

fbshipit-source-id: 137c0c98f7f1576e1bb97c8de8a900b28407a30e
2021-09-14 00:21:28 -07:00
caaa6efc1a Fix use of deprecated tensor.type() in SegmentReduce.cpp (#64151)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64151

Reviewed By: mruberry

Differential Revision: D30917268

Pulled By: ngimel

fbshipit-source-id: 63427372b651ac495d48ef552eba5fbf0e4378e9
2021-09-13 23:16:47 -07:00
d4b4d83521 [quant] handle empty input in fused_moving_avg_obs_fake_quant op (#64829)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64829

If an empty input is passed in, the aminmax operator fails with a runtime error like
```
RuntimeError: aminmax(): cannot compute aminmax over an empty dimension as the operation has no identity.
```

To avoid this during training, we just return the input if we find it to be empty.
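A minimal sketch of the guard in eager Python (the real change is inside the fused observer op):

```python
import torch

# The failure being guarded against:
try:
    torch.aminmax(torch.empty(0))
except RuntimeError as e:
    print(e)  # aminmax(): cannot compute aminmax over an empty dimension ...

# The guard: pass empty inputs through untouched instead of observing them.
def observe(x):
    if x.numel() == 0:
        return x
    lo, hi = torch.aminmax(x)  # safe now; a real observer would update
    return x                   # its running min/max from (lo, hi)
```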

Test Plan:
python test/test_quantization.py TestFusedObsFakeQuant

Imported from OSS

Reviewed By: jingsh

Differential Revision: D30870879

fbshipit-source-id: 0cb4b187449a45a37150a77510d2292f93a7d1cd
2021-09-13 22:22:31 -07:00
0aef44cb3d Add forward AD for torch.linalg.eigh (#62163)
Summary:
This PR adds forward mode differentiation for `torch.linalg.eigh` and a few other functions required for tests to pass.

For some reason running tests for `torch.linalg.eigvalsh` and complex `torch.linalg.eigh` hangs. These tests are skipped for now.

cc ezyang albanD zou3519 gqchen pearu nikitaved soulitzer Lezcano Varal7 jianyuh mruberry heitorschueroff walterddr IvanYashchuk xwang233

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62163

Reviewed By: jbschlosser

Differential Revision: D30903988

Pulled By: albanD

fbshipit-source-id: d6a74adb9e6d2f4be8ac707848ecabf06d629823
2021-09-13 21:15:38 -07:00
35c82dbf5c [THC] remove TensorTypeUtils and TensorInfo (#64965)
Summary:
per title

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64965

Reviewed By: mruberry

Differential Revision: D30916754

Pulled By: ngimel

fbshipit-source-id: b24020d6a7ce8a05a5ab6c579d176dd94dd3b1d7
2021-09-13 20:36:28 -07:00
816048e7e6 EmbeddingBag sort thrust->cub (#64498)
Summary:
Partially fixes https://github.com/pytorch/pytorch/issues/57505

Also fixes a warning I found when compiling:
```
/home/gaoxiang/pytorch-cub/torch/csrc/distributed/c10d/quantization/quantization_gpu.cu(7): warning: inline qualifier ignored for "__global__" function
```
I also updated the bfloat16 guard to CUDA 11.5

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64498

Reviewed By: mruberry

Differential Revision: D30917077

Pulled By: ngimel

fbshipit-source-id: fb9df08fd469038478a563014b5af7452b4b28c0
2021-09-13 19:51:12 -07:00
ed30afd480 Speed up torch.unique_consecutive() (#64835)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/62690

Following the approach of `unique_consecutive_cpu_template`, this PR reimplements `_unique_dim_cpu_impl` to get better performance.
Also, because the overhead of `unique_dim_consecutive_cpu` is quite large, we directly call `unique_consecutive_cpu_template` when we know the given input is a 1-d array.

## Benchmark
### Script
```python
import torch
import time

torch.manual_seed(0)
t = torch.randint(500, (10000000, ))
t = torch.sort(t)[0]

start = time.time()
uniques, inverse, counts = torch.unique_consecutive(t, dim=0, return_inverse=True, return_counts=True)
end = time.time()
print("torch.unique_consecutive(dim=0) time:", end - start)

start = time.time()
uniques2, inverse2, counts2 = torch.unique_consecutive(t, return_inverse=True, return_counts=True)
end = time.time()
print("torch.unique_consecutive() time:", end - start)

t = torch.randint(500, (10000000, 2))
t = torch.sort(t)[0]

start = time.time()
uniques, inverse, counts = torch.unique_consecutive(t, dim=0, return_inverse=True, return_counts=True)
end = time.time()
print("torch.unique_consecutive(dim=0) time:", end - start)

start = time.time()
uniques, inverse, counts = torch.unique_consecutive(t, dim=1, return_inverse=True, return_counts=True)
end = time.time()
print("torch.unique_consecutive(dim=1) time:", end - start)
```

### Before
```
torch.unique_consecutive(dim=0) time: 78.64345622062683
torch.unique_consecutive() time: 0.029544353485107422
torch.unique_consecutive(dim=0) time: 91.49796152114868
torch.unique_consecutive(dim=1) time: 0.30872368812561035
```

### After
```
torch.unique_consecutive(dim=0) time: 0.08256125450134277
torch.unique_consecutive() time: 0.08162403106689453
torch.unique_consecutive(dim=0) time: 35.58408498764038
torch.unique_consecutive(dim=1) time: 1.6258199214935303
```

## System Information
```
Collecting environment information...
PyTorch version: 1.10.0a0+git7f1932e
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.3 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: 10.0.0-4ubuntu1
CMake version: version 3.16.3
Libc version: glibc-2.31

Python version: 3.8.10 (default, Jun  2 2021, 10:49:15)  [GCC 9.4.0] (64-bit runtime)
Python platform: Linux-5.11.0-34-generic-x86_64-with-glibc2.29
Is CUDA available: False
CUDA runtime version: No CUDA
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.21.2
[pip3] torch==1.10.0a0+gitbe09195
[conda] Could not collect
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64835

Reviewed By: jbschlosser

Differential Revision: D30894906

Pulled By: ngimel

fbshipit-source-id: 42ab76d638391ce6c4e589d9c71bdf7579310ad9
2021-09-13 19:00:36 -07:00
ab5e1c69a7 [WIP] Example of DataPipes and DataFrames integration (#60840)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60840

Test Plan: Imported from OSS

Reviewed By: wenleix, ejguan

Differential Revision: D29461080

Pulled By: VitalyFedyunin

fbshipit-source-id: 4909394dcd39e97ee49b699fda542b311b7e0d82
2021-09-13 18:50:15 -07:00
ee554e2e96 Re-land Fix test report uploading (#64958)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64958

This is a re-do of #64846 which was missing a path prefix for windows test reports

Test Plan: Imported from OSS

Reviewed By: seemethere

Differential Revision: D30915253

Pulled By: driazati

fbshipit-source-id: d14d0a64d2f8aabc335db9c4d0d2b63512887c66
2021-09-13 18:36:26 -07:00
f159f12fee [iOS][OSS][BE] Add Simulator tests for full JIT (#64851)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64851

ghstack-source-id: 137970229

Test Plan: CircleCI

Reviewed By: hanton, cccclai

Differential Revision: D30877963

fbshipit-source-id: 7bb8ade1959b85c3902ba9dc0660cdac8f558d64
2021-09-13 18:16:08 -07:00
fd09e564d6 add acc_ops.max, acc_ops.maximum, consolidate acc_ops.min and acc_ops.minimum
Summary:
This diff adds `acc_ops.max` and `acc_ops.maximum` support.
It further consolidates the logic for `acc_ops.min` and `acc_ops.minimum` to match the logic for max.

torch.max has three behaviors:
```
1. max(input)
2. max(input, dim, keepdim=False, *, out=None)
3. max(input, other, *, out=None)
```

Likewise, `torch.min` has three identical behaviors.

I've chosen to implement each as an acc_op, then map to the appropriate one.

The third max overload is effectively `torch.maximum`, so I've implemented it as that.
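The three behaviors in eager PyTorch, for reference:

```python
import torch

x = torch.randn(2, 3)
y = torch.randn(2, 3)

m = torch.max(x)                 # 1. single max over all elements
vals, idx = torch.max(x, dim=1)  # 2. per-dim reduction -> (values, indices)
ew = torch.max(x, y)             # 3. elementwise max, same as torch.maximum
assert torch.equal(ew, torch.maximum(x, y))
```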

Reviewed By: yinghai, jfix71, 842974287

Differential Revision: D30551464

fbshipit-source-id: 0a2eec10e5185cbf7d9984eec3fd399b23528b2a
2021-09-13 18:04:33 -07:00
3855c24639 Add BFloat16 support for cross, tril, triu, tril_indices, triu_indices and cumsum operators on CPU (#62454)
Summary:
Add BFloat16 support for cross, tril, triu, tril_indices, triu_indices and cumsum operators on CPU.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62454

Reviewed By: albanD

Differential Revision: D30845805

Pulled By: heitorschueroff

fbshipit-source-id: f83836862e38109ec929e83567133e9e88096b8b
2021-09-13 17:59:43 -07:00
1cd0252eed Use RDS for build size tracking (#64303)
Summary:
This adds 2 utilities: `register_rds_table` and `rds_write`. `register_rds_table` needs to be called once with the schema for the data that `rds_write` will write. These go to a lambda called `rds-proxy`, which will write to/read from the DB as necessary. This data can then be arbitrarily queried via `rds-proxy` (for use in CI) or on metrics.pytorch.org (for analysis).

It also hooks these up for build size tracking (which previously was not working on GHA)

TODO:
* verify output in logs + clean up prints

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64303

Reviewed By: malfet, seemethere

Differential Revision: D30711934

Pulled By: driazati

fbshipit-source-id: 0af808ddf528a24875a378caeb1aa9cb0693f802
2021-09-13 17:48:44 -07:00
c4073af61d Add skipIfTBB decorator (#64942)
Summary:
And replace two existing usages in the codebase with it

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64942

Reviewed By: jbschlosser

Differential Revision: D30906382

Pulled By: malfet

fbshipit-source-id: e7f20f53aff734b0379eded361255543dab4fa4b
2021-09-13 17:11:51 -07:00
8131bc85d0 Raise TypeError on assigned grad with wrong type (#64876)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/64813

Raises a TypeError when the value assigned to a grad is not a Tensor or
None.

Adds tests.
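A quick illustration of the new behavior (the output text is indicative, not verbatim):

```python
import torch

x = torch.randn(3, requires_grad=True)
x.grad = torch.zeros(3)   # OK: a Tensor
x.grad = None             # OK: None
try:
    x.grad = 5            # neither -> TypeError after this change
except TypeError as e:
    print(e)
```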

cc ezyang gchanan

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64876

Reviewed By: anjali411

Differential Revision: D30901678

Pulled By: soulitzer

fbshipit-source-id: dbb3cb5fd0bbac6918e0b2e2f51d340daa43dee0
2021-09-13 16:41:45 -07:00
1e25a84993 kill SkipInfo (#64878)
Summary:
Per offline discussion, replaces SkipInfo with DecorateInfo. SkipInfo class itself is not removed yet to give functorch time to replace its SkipInfos.
cc zou3519

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64878

Reviewed By: mruberry

Differential Revision: D30908052

Pulled By: ngimel

fbshipit-source-id: 5124180b25c6e32517722883b9f3a2b488e3fe20
2021-09-13 16:32:36 -07:00
3710edc86b Fix TRTOperatorSupport (#64873)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64873

Fix TRTOperatorSupport's key naming to match the key generated by torch.fx.passes.tools_common.get_node_target. get_node_target is used by splitter_base to determine, by name, whether an operator is supported.

Test Plan:
print out the supported operator dict and check name.
Run TRTSplitter with lrm_split_model_generator and verify split result is correct with all supported operators printed.
current split result:
```
Supported node types in the model:
acc_ops.size: ((), {'input': torch.float32})
acc_ops.getitem: ((), {'input': torch.float32})
acc_ops.getitem: ((), {'input': None})
acc_ops.reshape: ((), {'input': torch.float32})
acc_ops.unsqueeze: ((), {'input': torch.float32})
acc_ops.linear: ((), {'input': torch.float32, 'weight': torch.float32})
acc_ops.linear: ((), {'input': torch.float32, 'weight': torch.float32, 'bias': torch.float32})
acc_ops.mul: ((), {'input': torch.float32, 'other': torch.float32})
acc_ops.cat: ((), {})
acc_ops.add: ((), {'input': torch.float32, 'other': torch.float32})
acc_ops.add: ((), {'input': torch.float32})
acc_ops.tanh: ((), {'input': torch.float32})
acc_ops.transpose: ((), {'input': torch.float32})
acc_ops.matmul: ((), {'input': torch.float32, 'other': torch.float32})
acc_ops.div: ((), {'input': torch.float32, 'other': torch.float32})
acc_ops.squeeze: ((), {'input': torch.float32})
acc_ops.noop: ((), {'input': torch.float32})
acc_ops.layer_norm: ((), {'input': torch.float32, 'weight': torch.float32, 'bias': torch.float32})
acc_ops.permute: ((), {'input': torch.float32})
acc_ops.sigmoid: ((), {'input': torch.float32})
acc_ops.flatten: ((), {'input': torch.float32})
acc_ops.softmax: ((), {'input': torch.float32})
acc_ops.sum: ((), {'input': torch.float32})

Unsupported node types in the model:
torch.ops.fb.pad_sequence_embeddings: ((), {'embeddings': torch.float32, 'offsets': torch.int32})
acc_ops.linalg_norm: ((), {'input': torch
```

Reviewed By: yinghai

Differential Revision: D30884463

fbshipit-source-id: 22442aa6a69cd148ce9bc8be8f62157dd6d19954
2021-09-13 15:55:15 -07:00
914e3a861a Revert D30878101: [pytorch][PR] Fix test report uploading
Test Plan: revert-hammer

Differential Revision:
D30878101 (fba40bfc1a)

Original commit changeset: 0730f17fa3f4

fbshipit-source-id: dad89e68b4daf656dd0b592bc9b2758f00af38c6
2021-09-13 15:24:44 -07:00
6101cbcedb torch.ao migration: fake_quantize.py, phase 1 (#64814)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64814

1. move the file
```
hg mv caffe2/torch/quantization/fake_quantize.py caffe2/torch/ao/quantization/
```

2. create a new file in the old location and copy the imports
3. fix all callsites inside `torch`

Test Plan:
```
buck test mode/dev //caffe2/test:quantization
```

Reviewed By: z-a-f

Differential Revision: D30866792

fbshipit-source-id: 7a221cb46c0ab01f1c5de9be061f09ecc83ce23e
2021-09-13 15:22:28 -07:00
e4314dac57 [PyTorch] Reduce heap allocations in OperatorName::setNamespaceIfNotSet (#64673)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64673

We are now guaranteed to allocate at most one time in this function.
ghstack-source-id: 137786392

Test Plan: Previous diff adds test coverage for this function.

Reviewed By: dhruvbird

Differential Revision: D30813014

fbshipit-source-id: 17d844a1cc8c30574afcc6b0b41b219e62c0b723
2021-09-13 14:33:55 -07:00
000f3310d7 [PyTorch] Add test for operator_name (#64672)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64672

Just a small struct missing test coverage. Next diff changes it.
ghstack-source-id: 137786388

Test Plan: CI

Reviewed By: dhruvbird

Differential Revision: D30813013

fbshipit-source-id: 05f39494bb9512a71a928bfe6fcfa710016bdf61
2021-09-13 14:32:50 -07:00
c99277e177 handle the case in acc_ops.sum when dim == 0, differentiating it from the case when dim is None (#64869)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64869

handle the case in acc_ops.sum when dim == 0, differentiating it from the case when dim is None

Reviewed By: 842974287

Differential Revision: D30872739

fbshipit-source-id: 2755d3230804a16ef1c9289f804138c6dd7766b3
2021-09-13 14:24:16 -07:00
0561e104d9 fix build error when system cmake3 version >=3.5 but <=3.10 (#64914)
Summary:
For a PyTorch source build using conda, an error is raised at 8535418a06/CMakeLists.txt (L1) when the CMake version is < 3.10. It can be fixed by upgrading CMake in the conda env, but CentOS also ships cmake3, and PyTorch first checks whether cmake3's version is >= 3.5; so if the user's system cmake3 is >= 3.5 (but < 3.10), PyTorch will use the system's cmake3, which hits a build error like:
```
CMake Error at CMakeLists.txt:1 (cmake_minimum_required):
  CMake 3.10 or higher is required.  You are running version 3.6.3

-- Configuring incomplete, errors occurred!
```

We need to check that cmake3 is also >= 3.10; if not, then check conda's CMake version.
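A minimal Python sketch of the corrected selection logic, with hypothetical helper and return conventions (the real check lives in the build tooling):

```python
def choose_cmake(cmake3_version, cmake_version, minimum=(3, 10)):
    # Prefer system cmake3 only if it actually meets the real minimum,
    # not merely the old 3.5 threshold.
    if cmake3_version is not None and cmake3_version >= minimum:
        return "cmake3"
    if cmake_version is not None and cmake_version >= minimum:
        return "cmake"
    raise RuntimeError("CMake >= 3.10 is required")

# An old system cmake3 (3.6.3) is now skipped in favor of conda's cmake.
assert choose_cmake((3, 6, 3), (3, 16, 3)) == "cmake"
```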

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64914

Reviewed By: jbschlosser

Differential Revision: D30901673

Pulled By: ezyang

fbshipit-source-id: 064e2c5bc0b9331d6ecd65cd700e5a42c3403790
2021-09-13 13:26:06 -07:00
fba40bfc1a Fix test report uploading (#64846)
Summary:
Previously we just weren't uploading Windows test report XML files to S3, only to GitHub Actions. This was different from Linux, where we use both (though maybe we can kill the GHA upload in a follow-up PR, since I don't think it's very useful anymore). This factors it all out into a macro so they both do the same thing. This also fixes the naming of uploaded files to include info about the job name (the full config, so they can be matched to the job visually or by the included job id).

See https://hud.pytorch.org/pr/64846 for results

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64846

Reviewed By: seemethere

Differential Revision: D30878101

Pulled By: driazati

fbshipit-source-id: 0730f17fa3f46a32c131f52669084c3103b0e616
2021-09-13 13:22:54 -07:00
af984c78a9 Pin SciPy to 1.6.3 on Mac (take 2) (#64922)
Summary:
It's already pinned by via docker install on Linux

`scipy.stats.`[`poisson`|`geom`|`binom`] returns quite different results between 1.6.x and 1.7+ versions of SciPy, which results in several distribution tests failing accuracy thresholds

Reland of https://github.com/pytorch/pytorch/pull/64844 but limited to just the Mac platform
A follow-up PR for Windows is coming as well

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64922

Reviewed By: janeyx99

Differential Revision: D30901257

Pulled By: malfet

fbshipit-source-id: 0543e7bae9d3bbeb8b6be7b3ecf605880f97665f
2021-09-13 12:48:11 -07:00
1bea49c716 [Deploy] Avoid use-after-free during autograd shutdown (#64620)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64620

`autograd` extension module's shutdown logic destructs `PyThreadState` by `pybind11::gil_scoped_acquire` using the RAII pattern.

The problem is that torch.deploy also destructs `PyThreadState` as part of its shutdown process (https://www.internalfb.com/phabricator/paste/view/P456363738), causing double destruction and a use-after-free.

This change adds `defined(USE_DEPLOY)` as a special case to avoid destruction of `PyThreadState` to the existing special treatment for  `IS_PYTHON_3_9_PLUS`.

Test Plan: Added `TorchpyTest.Autograd` unittest to ensure that torch.deploy can create multiple instances that use autograd without causing a crash.

Reviewed By: albanD

Differential Revision: D30779080

fbshipit-source-id: 4de3283cc2d394acc9b8141c17cacbfab5eea052
2021-09-13 12:43:10 -07:00
fd716fcda2 [Pytorch Edge] Quantized Ops Dtype Selective (#63680)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63680

Quantized ops were not covered by DType selectivity. Add the check, and adjust call sites to be constexpr-friendly.

Test Plan: CI (this covers all model unit tests), verified that segmentation (a model that uses some of these quant ops) still works on instagram.

Reviewed By: dhruvbird, raymondethan

Differential Revision: D30457626

fbshipit-source-id: 5ba850d2b53a18558dfbb1cfaa78d8f53b5dbad8
2021-09-13 11:04:07 -07:00
4ca40aeb83 Disable more of the pragma warning stuff (#64899)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64899

ghstack-source-id: 137882055

Test Plan: sandcastle, ossci

Reviewed By: malfet, ngimel

Differential Revision: D30893691

fbshipit-source-id: 67ec8cc9f212aa16a201771603236e429944b561
2021-09-13 10:58:31 -07:00
8cfc74400a [PyTorch] Gate tls_local_dispatch_key_set off on iOS too (#64753)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64753

This may possibly be causing problems on iOS. (Maybe we should just revert inlining access to this thing? Really don't understand what's wrong with it, though.)
ghstack-source-id: 137830520

Test Plan: CI

Reviewed By: iseeyuan

Differential Revision: D30826897

fbshipit-source-id: 0438dee9d49e7601c26cdca0e8540229c777eddb
2021-09-13 10:54:28 -07:00
d4b031b31e typo fix (#64615)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64615

Reviewed By: jbschlosser

Differential Revision: D30884298

Pulled By: ngimel

fbshipit-source-id: 230f9d06aa85abcdd69828a1ea0a83f36cbfcb17
2021-09-13 10:50:01 -07:00
01e92f2a56 [nn] no batch dim support: CosineEmbeddingLoss (#64590)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/60585

TODO
* [x] Add tests

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64590

Reviewed By: H-Huang

Differential Revision: D30900775

Pulled By: jbschlosser

fbshipit-source-id: d24e72787017e79afbf8f04a94901a290485b81a
2021-09-13 10:45:33 -07:00
2ae938e15e Fixes failure in test_dataloader.py that occurs on jetson boards (#64757)
Summary:
CUDA IPC is not supported for jetsons

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64757

Reviewed By: jbschlosser

Differential Revision: D30900593

Pulled By: ejguan

fbshipit-source-id: c6b2e8a9746276fdb4a009b6412e47cc8aac69f2
2021-09-13 10:11:04 -07:00
8e63199c7c .github: Always run chown workspace (#64854)
Summary:
In some workflow runs, like https://github.com/pytorch/pytorch/runs/3568714658, the chown workspace step is duplicated.

Is that intentional? Unfortunately it is pretty necessary since (w/ docker) the folder can sometimes be in a broken permission state before and after we run jobs.

So this PR makes the second chown workspace run always because that's the true intention of the step.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64854

Reviewed By: jbschlosser, seemethere

Differential Revision: D30879289

Pulled By: janeyx99

fbshipit-source-id: 4157ff826c86e8c912deb1ba0cb5c47ea7596529
2021-09-13 10:06:31 -07:00
70e64feda7 Reland .circleci: Skip cuda /cudnn install if existing (#64880)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64880

This reverts commit 5836a116d0de214d6d759e70671f23150a5deaba.

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D30885675

Pulled By: seemethere

fbshipit-source-id: 8c96584d5a632170e29f91c5daf0206680a78661
2021-09-13 09:52:16 -07:00
3d976d9ceb torch.ao migration: quantize_jit.py phase1 (#64860)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64860

ghstack-source-id: 137885395

Test Plan: buck test mode/dev //caffe2/test:quantization

Reviewed By: jerryzh168

Differential Revision: D30880574

fbshipit-source-id: 9629027dd3b00bb8d45633e1564fc03a866f8c31
2021-09-13 08:41:48 -07:00
9d52651d4e torch.ao migration: stubs.py phase 1 (#64861)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64861

1. move the file
  ```
  hg mv caffe2/torch/quantization/stubs.py caffe2/torch/ao/quantization/
  ```

  2. create a new file in the old location and copy the imports
  3. fix all call sites inside `torch`
ghstack-source-id: 137885365

Test Plan: buck test mode/dev //caffe2/test:quantization

Reviewed By: jerryzh168

Differential Revision: D30879678

fbshipit-source-id: a2d24f25d01064212aca15e94e8c78240ba48953
2021-09-13 08:40:29 -07:00
c08b2491cc add BFloat16 operators on CPU: cummax, cummin (#63307)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63307

Reviewed By: nikithamalgifb

Differential Revision: D30342002

Pulled By: anjali411

fbshipit-source-id: eee6e640da996ef0e983960119608d9c12405336
2021-09-13 08:00:17 -07:00
d932ddd24b fix quantization.rst doc (#64802)
Summary:
As titled.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64802

Reviewed By: jbschlosser

Differential Revision: D30887210

Pulled By: vkuzo

fbshipit-source-id: 0267883d3065d724ea654a28db78f5fe5702ef06
2021-09-13 07:19:54 -07:00
9c73a48ecf ND Embeddings benchmark - Standardize randomized inputs (#64707)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64707

Use torch.randn instead of torch.from_numpy to generate the tensor

Test Plan: buck run //caffe2/benchmarks/operator_benchmark/pt:qembedding_pack_test

Reviewed By: jingsh

Differential Revision: D30817302

fbshipit-source-id: 924c05517812b4b9f7df05a8999f9236cfe7b672
2021-09-13 06:47:35 -07:00
b37503e452 Initial implementation of nanmean (#62671)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62671

Very crude first implementation of `torch.nanmean`. The current reduction kernels do not have good support for implementing nan* variants. Rather than implementing new kernels for each nan* operator, I will work on new reduction kernels with support for a `nan_policy` flag and then I will port `nanmean` to use that.
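A reference sketch of the intended semantics in eager Python (not the kernel): average over the non-NaN elements only.

```python
import torch

def nanmean_ref(x):
    mask = ~x.isnan()
    return x.nan_to_num(0.0).sum() / mask.sum()

t = torch.tensor([1.0, float("nan"), 3.0])
print(nanmean_ref(t))   # tensor(2.)
```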

**TODO**

- [x] Fix autograd issue

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D30515181

Pulled By: heitorschueroff

fbshipit-source-id: 303004ebd7ac9cf963dc4f8e2553eaded5f013f0
2021-09-13 05:53:58 -07:00
8535418a06 [Reland] Added reference tests to ReductionOpInfo (#64273)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64273

Reintroduced sample_inputs_prod and constrained the range of values for large reference tests.

This reverts commit e4fd2ab59ce8645f5ae9477c7724b6af82124b3b.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D30672097

Pulled By: heitorschueroff

fbshipit-source-id: b44ed8dfd5eb0c74c194164dafc3242f6728a78f
2021-09-12 20:05:43 -07:00
1cb3507ed3 Adds DLPack support (#57110)
Summary:
Partially Fixes https://github.com/pytorch/pytorch/issues/55090
Depends on https://github.com/pytorch/pytorch/issues/55365

Inspired by https://github.com/dmlc/dlpack/issues/57#issuecomment-774482973

Question: in PyTorch we can't create streams or easily synchronize them from just an integer. Should we add an [`ExternalStream`](https://docs.cupy.dev/en/stable/reference/generated/cupy.cuda.ExternalStream.html) object like the one we have in CuPy?

TODO: Add tests

Would like some feedback as this design needs quite a few iterations
rgommers leofang
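For context, a small sketch of the DLPack exchange utilities that already exist in torch (the new `__dlpack__` protocol work is in the PR itself):

```python
import torch
from torch.utils.dlpack import to_dlpack, from_dlpack

t = torch.arange(4)
capsule = to_dlpack(t)    # export as a DLPack capsule
u = from_dlpack(capsule)  # import; shares the same memory (zero-copy)
u[0] = 42
print(t[0])               # tensor(42)
```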

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57110

Reviewed By: saketh-are

Differential Revision: D30761481

Pulled By: mruberry

fbshipit-source-id: e85d78df3c1f8defc2a698878da89cd843cb1209
2021-09-12 19:47:15 -07:00
d46ea03871 [fix] fix test_python_dispatch with pytest (#64574)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/62501

Another approach for fixing the same issue

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64574

Reviewed By: ngimel

Differential Revision: D30867237

Pulled By: ezyang

fbshipit-source-id: c632a1e0b241effdc21ae929abe42fccec88aa24
2021-09-12 17:06:55 -07:00
be79da3303 Revert D30876591: [pytorch][PR] Pin scipy to 1.6.3 on Windows and Mac
Test Plan: revert-hammer

Differential Revision:
D30876591 (39f2b9de2a)

Original commit changeset: 4946e0922063

fbshipit-source-id: b8beff3d973b21fe09c158baef25344030f8fb08
2021-09-12 15:56:40 -07:00
1577c106dc torch.ao migration: numeric suite, eager and fx (#64817)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64817

This migrates `torch.quantization._numeric_suite` to `torch.ao.ns._numeric_suite`, and `torch.quantization._numeric_suite_fx` to `torch.ao.ns._numeric_suite_fx`.

1. move the files
```
HG: move eager mode
hg mv caffe2/torch/quantization/_numeric_suite.py caffe2/torch/ao/ns/
HG: move fx
hg mv caffe2/torch/quantization/_numeric_suite_fx.py caffe2/torch/ao/ns/
hg mv caffe2/torch/quantization/ns/* caffe2/torch/ao/ns/fx/
```

2. create new versions of `_numeric_suite.py` and `_numeric_suite_fx.py` with
imports

3. update all FB callsites

Test Plan: buck test mode/dev //caffe2/test:quantization

Reviewed By: z-a-f

Differential Revision: D30867538

fbshipit-source-id: 120ee830434ca490c1183a187a518eebcbbaf22c
2021-09-12 12:00:45 -07:00
39f2b9de2a Pin scipy to 1.6.3 on Windows and Mac (#64844)
Summary:
It's already pinned by via docker install on Linux

As `scipy.stats.`[`poisson`|`geom`|`binom`] returns quite different results in 1.7+ versions of SciPy

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64844

Reviewed By: driazati

Differential Revision: D30876591

Pulled By: malfet

fbshipit-source-id: 4946e0922063e9ac320c218a0b089f73486466f7
2021-09-12 10:53:48 -07:00
47144de473 Revert D30867266: [pytorch][PR] TST Adds gradcheck and gradgradcheck to module info
Test Plan: revert-hammer

Differential Revision:
D30867266 (67ebde5645)

Original commit changeset: cbc073326151

fbshipit-source-id: 00234e01eafc45fb999f7c83a397f9d6b3e01e46
2021-09-12 10:30:28 -07:00
30a7c768d7 [RFC] Modularize functions of parsing bytecode (#61862)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61862

Modularize the bytecode-table parsing functions so that they can be used as needed in situations other than the mobile lite interpreter.
* The decoupled functions are re-used by current lite interpreter loader.
* The bytecode can be serialized/deserialized from other formats.
* The decoupled functions have minimum dependencies on other PyTorch components.

Next:
Build a driver binary to include the parser and interpreter, but only has necessary dependency on other PyTorch components.
ghstack-source-id: 137867287

Test Plan:
As an example, a simple bytecode is parsed to a mobile function, and directly run in the added unit test, `RunTimeTest:ParseBytecode`. It contains basic control flow (if, else) and basic data orchestration (list construction).
CI

Reviewed By: larryliu0820

Differential Revision: D29798382

Pulled By: iseeyuan

fbshipit-source-id: 1c173a5f5d37097e3a97baec3f3e48e1eea1400f
2021-09-11 22:24:05 -07:00
dd2d48df07 Revert D30875977: [caffe2] [aten] Remove loose (unpaired) #pragma warning ( pop ) in TensorBase.h
Test Plan: revert-hammer

Differential Revision:
D30875977 (1f35d20a89)

Original commit changeset: bd593feb5a75

fbshipit-source-id: 4c82dbc857fdb28e0240dacc1a0e607a76552bb4
2021-09-11 17:18:37 -07:00
d13e0c9c39 [iOS][OSS][BE] Update XCode to use 12.5.1 (#64850)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64850

ghstack-source-id: 137827895

Test Plan: CircleCI

Reviewed By: hanton

Differential Revision: D30877964

fbshipit-source-id: 803f2506a755b3815024704e6177c7826bc42de8
2021-09-11 11:24:06 -07:00
c9eb312ce9 [iOS][OSS][BE] Remove unused files (#64849)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64849

ghstack-source-id: 137827893

Test Plan: CircleCI

Reviewed By: hanton

Differential Revision: D30877962

fbshipit-source-id: a76f7fe888b990ba6cad650f72be7f4a1e58a2f1
2021-09-11 11:22:55 -07:00
82ac3f108d [TensorExpr] Move 2 graph passes from kernel.cpp to graph_opt.cpp (#64828)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64828

Also, make `removeUnusedSelfArgument` more consistent with other passes
by mutating the graph in-place rather than returning a copy.

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D30870776

Pulled By: ZolotukhinM

fbshipit-source-id: 4873f01b013921143a5aa43746d655a2d8d620c9
2021-09-11 10:23:15 -07:00
ff65f637df [TensorExpr] Add debug logging (store/load tracing) to IREval. (#64848)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64848

Test Plan: Imported from OSS

Reviewed By: Chillee

Differential Revision: D30878278

Pulled By: ZolotukhinM

fbshipit-source-id: bd946075336ba2e9786602161c236a0ff8a5a011
2021-09-11 09:25:55 -07:00
180e4fbfae [TensorExpr] LLVMCodegen: fix lowering for UInt->Float casts. (#64862)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64862

Previously we erroneously were looking at dst signedness. This was
discovered when we tried to implement quantize/dequantize ops.

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D30881696

Pulled By: ZolotukhinM

fbshipit-source-id: 34af842e5e52a3b6b5d2e70c4ef32f910a20341f
2021-09-11 09:24:36 -07:00
1f35d20a89 [caffe2] [aten] Remove loose (unpaired) #pragma warning ( pop ) in TensorBase.h (#64870)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64870

Remove loose (unpaired) #pragma warning ( pop ) in TensorBase.h
Issue started with D30728580 (d701357d92), was fixed with D30846958 (40098f48a1), and brought back again with the reversion of D30846958 (40098f48a1).

Reviewed By: H-Huang

Differential Revision: D30875977

fbshipit-source-id: bd593feb5a75245470e43ad568ebdd3f1738da7c
2021-09-11 00:43:19 -07:00
d4a86c1f3b [quant][fx2trt] Add lowering support for reference linear/conv modules (#64368)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64368

Test Plan:
python torch/fx/experimental/fx2trt/example/quantized_resnet_test.py

Imported from OSS

Reviewed By: 842974287

Differential Revision: D30708738

fbshipit-source-id: 88142b7ce43ed96093597112dab03a2d277de993
2021-09-10 22:25:27 -07:00
4481c87ac4 [tensorexpr] Simplify x/100 -> 0 if x is a non-negative integer less than 100. (#64763)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64763

Simplification pattern:
  x/N -> 0; N is a constant positive integer and x is a for-loop index whose range is a subset of [0, N).
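A quick check of the arithmetic fact behind the pattern:

```python
# For a positive integer N and a loop index i with 0 <= i < N,
# integer division i // N is always 0, so x/N folds to the constant 0.
N = 100
assert all(i // N == 0 for i in range(N))
```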

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D30845854

Pulled By: huiguoo

fbshipit-source-id: 814d69ed4be05e57405c222183cc1c6c526721cd
2021-09-10 20:33:02 -07:00
5836a116d0 Revert D30869803: .circleci: Skip cuda /cudnn install if existing
Test Plan: revert-hammer

Differential Revision:
D30869803 (717d267e19)

Original commit changeset: 9eb3bd20875d

fbshipit-source-id: bef8d0c693696307a3be7abd5331b7fa813d754a
2021-09-10 18:56:50 -07:00
67ebde5645 TST Adds gradcheck and gradgradcheck to module info (#64444)
Summary:
Follow up to https://github.com/pytorch/pytorch/issues/61935

cc albanD mruberry jbschlosser walterddr

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64444

Reviewed By: ngimel

Differential Revision: D30867266

Pulled By: jbschlosser

fbshipit-source-id: cbc0733261517dbfcdd3415d969b9e802b62b7ac
2021-09-10 16:53:11 -07:00
c60075d4b5 Preserve types during empty container assignment (#58911)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58911

Stack from [ghstack](https://github.com/ezyang/ghstack):
* __->__ #58911

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D30785623

Pulled By: ansley

fbshipit-source-id: 4e05d6369318974290fea02ad2bc148293c25090
2021-09-10 16:49:21 -07:00
b4855619d1 Always upload stats to S3 (#64853)
Summary:
It's not very useful that stats are only uploaded when the tests all pass.

Like for this failing run, the stats were not uploaded to Scribe or S3: https://github.com/pytorch/pytorch/runs/3568714658

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64853

Reviewed By: seemethere

Differential Revision: D30878361

Pulled By: janeyx99

fbshipit-source-id: 19a4c520efdd5575785a3ffbc60e6c09456b9e0d
2021-09-10 16:49:19 -07:00
f3f410880a [DataPipe] Remove ZipArchiveReader's dependency on FileLoader (#64786)
Summary:
Stack from [ghstack](https://github.com/ezyang/ghstack):
* https://github.com/pytorch/pytorch/issues/64788
* __->__ https://github.com/pytorch/pytorch/issues/64786

This PR removes ZipArchiveReader's dependency on the FileLoader DataPipe, by allowing it to use an IterDataPipe of path names as input rather than a tuple of path name and a stream.

It also adds additional tests to ensure that the DataPipe is functioning properly when it is read multiple times or reset half way through reading.

The whole stack fixes issues related to unclosed buffer stream (see https://github.com/pytorch/pytorch/issues/64281).

cc VitalyFedyunin ejguan

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64786

Reviewed By: ngimel

Differential Revision: D30870968

Pulled By: NivekT

fbshipit-source-id: 64b04d1697b99772f2fa20fc141668e6b8e18c41
2021-09-10 16:49:17 -07:00
717d267e19 .circleci: Skip cuda /cudnn install if existing (#64825)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64825

Rewrites this script to only install the CUDA tools if they are not already
pre-installed

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D30869803

Pulled By: seemethere

fbshipit-source-id: 9eb3bd20875df0f2b18f5314ac825dbdf91637b5
2021-09-10 16:49:14 -07:00
dafa0a5a3b [doc][hackathon] To add Adadelta Optimizer to the documentation (#63255)
Summary:
It has been discussed before that adding descriptions of optimization algorithms to the PyTorch core documentation may result in a nice optimization research tutorial. In the following tracking issue we list all the necessary algorithms and links to the originally published papers: https://github.com/pytorch/pytorch/issues/63236.

In this PR we are adding a description of the AdaDelta algorithm to the documentation. For more details, we refer to the paper here: https://arxiv.org/abs/1212.5701

<img width="654" alt="AdaDeltaalg" src="https://user-images.githubusercontent.com/73658284/132770544-82ccf90a-1d54-4ad5-8fc4-51c8dec63a12.png">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63255

Reviewed By: ngimel

Differential Revision: D30867589

Pulled By: iramazanli

fbshipit-source-id: 5ba602c20c724a4486bdd38b73e1b64c0e767bdc
2021-09-10 16:49:12 -07:00
d8ae3cc318 Add more error checking in subclass creation (#64746)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64746

This extracts the error checking that used to be in the PR above.
We are not going to land the proposed fix there, but I think we want this error checking in right now as these would lead to respectively a memory leak and arbitrary memory read/write.

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D30867569

Pulled By: albanD

fbshipit-source-id: bf468033fb8b49fcb26eed423f5fad82b4a46c56
2021-09-10 16:49:10 -07:00
89f94fc15f Move THPVariable_NewWithVar around (#64550)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64550

Just moves a function around to make the next PR easier to read.

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D30867570

Pulled By: albanD

fbshipit-source-id: 99ae925568ed29ca7fdea059762c21d430d4a204
2021-09-10 16:49:08 -07:00
2cc9778495 [MicroBench] Added a log_vml version of the signed log1p kernel (#64205)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64205

The log_vml version of the micro-bench is over **2x** faster than the log1p version. Here are the perf numbers:

```
---------------------------------------------------------------------------------------------
Benchmark                                   Time             CPU   Iterations UserCounters...
---------------------------------------------------------------------------------------------
SignedLog1pBench/ATen/10/1467           45915 ns        45908 ns        14506 GB/s=2.5564G/s
SignedLog1pBench/NNC/10/1467            40469 ns        40466 ns        17367 GB/s=2.9002G/s
SignedLog1pBench/NNCLogVml/10/1467      19560 ns        19559 ns        35902 GB/s=6.00016G/s
```

Thanks to bertmaher for pointing this out.
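
For reference, a minimal Python sketch of the two computations being benchmarked, assuming the usual definition of signed log1p (sign(x) * log1p(|x|)); the function names are illustrative, not the kernel's actual API:

```python
import torch

def signed_log1p(x: torch.Tensor) -> torch.Tensor:
    # sign(x) * log1p(|x|): symmetric around zero, defined for all reals
    return torch.sign(x) * torch.log1p(torch.abs(x))

def signed_log1p_via_log(x: torch.Tensor) -> torch.Tensor:
    # log1p(|x|) == log(1 + |x|), which lets the backend use a vectorized log
    return torch.sign(x) * torch.log(1 + torch.abs(x))

x = torch.randn(1024)
assert torch.allclose(signed_log1p(x), signed_log1p_via_log(x), atol=1e-6)
```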

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D30644716

Pulled By: navahgar

fbshipit-source-id: ba2b32c79d4265cd48a2886b0c62d0e89ff69c19
2021-09-10 16:49:06 -07:00
cad7a4b0ea [nnc] Added an implementation of sign op (#64033)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64033

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D30579197

Pulled By: navahgar

fbshipit-source-id: f9f7fa7f2ffa109cf4e441eb1af821b8b891d4d3
2021-09-10 16:49:04 -07:00
3fbb49e75d Extend 2Dim embedding bag benchmarking to include 3Dim benchmarks (#64647)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64647

Add support for benchmarking 8-bit quantization of N-D batched embeddings. Currently this only works for 3-dim embeddings and still requires thought on ramping up from 3 dims to N dims.

Test Plan: ```buck run //caffe2/benchmarks/operator_benchmark/pt:qembedding_pack_test```

Reviewed By: jingsh

Differential Revision: D30770085

fbshipit-source-id: 26659020f3458991592065a05366bde0f060494e
2021-09-10 16:49:02 -07:00
227aafd1d9 Revert D30846958: [caffe2/aten] Remove loose #pragma warning ( pop ) in TensorBase.h
Test Plan: revert-hammer

Differential Revision:
D30846958 (40098f48a1)

Original commit changeset: 52a3fb66e426

fbshipit-source-id: 1d749f6981756f2169d6867538555a945cbb8ca6
2021-09-10 16:47:08 -07:00
5060b69d62 [DataPipe] fixing tests related fork() to remove warnings (#64827)
Summary:
There are two warnings produced by `test_fork_datapipe`. This PR addresses the issues raised by those warnings without impacting the test cases.

cc VitalyFedyunin ejguan

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64827

Reviewed By: ejguan

Differential Revision: D30870528

Pulled By: NivekT

fbshipit-source-id: 580a001c6fa3ff6f8b04a7e5183e58861938204b
2021-09-10 11:01:42 -07:00
ade4bf3e82 [tensorexpr] Add 'pre_alloc' argument in python API of tensorexpr kernel (#64718)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64718

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D30826582

Pulled By: huiguoo

fbshipit-source-id: 6c173c8964f2643039273cdc83e64fb02bb5f381
2021-09-10 10:03:00 -07:00
92cd5ab1cb Skip conjugate and negate fallback for view ops and their in-place versions (#64392)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64392

cc ezyang anjali411 dylanbespalko mruberry Lezcano nikitaved

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D30866330

Pulled By: anjali411

fbshipit-source-id: 7b2f51486bf1d610ad2b1472306bab608ee69c37
2021-09-10 09:57:27 -07:00
54b72a99ef To add Rprop documentation (#63866)
Summary:
It has been discussed before that adding descriptions of optimization algorithms to the PyTorch Core documentation may result in a nice optimization research tutorial. The following tracking issue lists all the necessary algorithms and links to the originally published papers: https://github.com/pytorch/pytorch/issues/63236.

In this PR we are adding a description of Rprop to the documentation. For more details, we refer to the paper: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.21.1417

<img width="657" alt="Rpropalg" src="https://user-images.githubusercontent.com/73658284/132750009-a5ec059e-6d53-4c67-917b-57174c8ca27b.png">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63866

Reviewed By: ngimel

Differential Revision: D30867590

Pulled By: iramazanli

fbshipit-source-id: 0d2d4ffc6c4d939290bbbaa84d2c6e901ed8b54a
2021-09-10 09:49:10 -07:00
c7b03e2b83 [ROCm] define C10_WARP_SIZE to warpSize HIP constant (#64302)
Summary:
warpSize is defined as a constexpr in HIP headers. It is incorrect to assume a warpSize of 64. This change fixes the C10_WARP_SIZE definition in torch sources similar to [how it was done in caffe2](https://github.com/pytorch/pytorch/blob/master/caffe2/utils/GpuDefs.cuh#L10-L14).

cc jeffdaily sunway513 jithunnair-amd ROCmSupport

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64302

Reviewed By: mrshenli

Differential Revision: D30785975

Pulled By: malfet

fbshipit-source-id: 68f8333182ad4d02bd0c8d02f1751a50bc5bafa7
2021-09-10 09:43:47 -07:00
db3fcf0af3 fix typo in torch/onnx/utils.py (#63396)
Summary:
fixes minor typo

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63396

Reviewed By: pbelevich

Differential Revision: D30644295

Pulled By: SplitInfinity

fbshipit-source-id: c506f67383909aa2c0c7c533038446b4b3d76a3a
2021-09-10 09:37:44 -07:00
c12df2dc23 build: bump bazel to 4.2.1 (#64455)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64455

Reviewed By: saketh-are

Differential Revision: D30752580

Pulled By: malfet

fbshipit-source-id: 4f5cc6f820396348181c09463f7e5628b5f69471
2021-09-10 08:30:10 -07:00
63b180beed ROCm MIOpen NHWC Convolution support (#63617)
Summary:
- Added 2D-Convolution NHWC support
  - on ROCm 4.3, with the `PYTORCH_MIOPEN_SUGGEST_NHWC=1` flag
  - May need to force MIOpen to search for solutions (see examples below for flags)

**PYTORCH_MIOPEN_SUGGEST_NHWC Environment Flag**
MIOpen does not officially support NHWC yet, although convolution support has been added to tip-of-tree of MIOpen. This flag is intended to be a short-lived flag to explicitly turn on NHWC support until ROCm officially supports NHWC and performance is verified.

**Examples**
1. Example usage 1 : Run test on ROCm4.3
`PYTORCH_TEST_WITH_ROCM=1 PYTORCH_MIOPEN_SUGGEST_NHWC=1 MIOPEN_FIND_ENFORCE=4 MIOPEN_DEBUG_CONV_GEMM=0 MIOPEN_FIND_MODE=1 pytest test_nn.py -v -k "test_conv_cudnn_nhwc" `
2. Example usage 2: Run the following with `PYTORCH_MIOPEN_SUGGEST_NHWC=1` on ROCm4.3.
```
#!/usr/bin/env python3
import torch
model = torch.nn.Conv2d(8, 4, 3).cuda().half()
model = model.to(memory_format=torch.channels_last)
input = torch.randint(1, 10, (2, 8, 4, 4), dtype=torch.float32, requires_grad=True)
input = input.to(device="cuda", memory_format=torch.channels_last, dtype=torch.float16)

# should print True for is_contiguous(channels_last), and strides must match NHWC format
print(input.is_contiguous(memory_format=torch.channels_last), input.shape, input.stride() )

out = model(input)

# should print True for is_contiguous(channels_last), and strides must match NHWC format
print("Contiguous channel last :", out.is_contiguous(memory_format=torch.channels_last), " out shape :",  out.shape, "out stride :", out.stride() )
```

See https://pytorch.org/tutorials/intermediate/memory_format_tutorial.html for more examples.

cc jeffdaily sunway513 jithunnair-amd ROCmSupport

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63617

Reviewed By: saketh-are

Differential Revision: D30730800

Pulled By: ezyang

fbshipit-source-id: 61906a0f30be8299e6547d312ae6ac91cc7c3238
2021-09-10 08:06:32 -07:00
2a81e8b8f1 Let all_reduce_coalesced and all_gather_coalesced return Future objects (#64722)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64722

`all_reduce_coalesced` and `all_gather_coalesced` were never publicly
documented in our API docs, so I would assume the blast radius is small.

The motivation for this change is to allow implementing
`all_reduce_coalesced` and `all_gather_coalesced` by re-using the `allreduce`
and `allgather` C++ cores and performing the flatten and copy only on the Python
side. With that, we can then remove `all_reduce_coalesced` and
`all_gather_coalesced` from C++ ProcessGroup APIs. For the async mode,
the copy-back logic after the communication will need to be chained
as a callback on the returned Future and use the chained child Future
as the return value (otherwise, we will need to wrap the child Future
into another work handle). This PR tries to test if we can directly
return a Future without breaking tests and internal use cases. If yes,
it will make the consolidation a lot easier.
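
A rough Python sketch of the direction described above (flatten on the Python side, reuse the core allreduce, chain the copy-back as a callback on the returned Future). The helper is hypothetical and is not this PR's actual implementation:

```python
import torch
import torch.distributed as dist
from torch._utils import _flatten_dense_tensors, _unflatten_dense_tensors

def all_reduce_coalesced_sketch(tensors, group=None):
    flat = _flatten_dense_tensors(tensors)  # one buffer, one collective
    work = dist.all_reduce(flat, group=group, async_op=True)
    fut = work.get_future()

    def copy_back(f):
        reduced = f.value()[0]  # the reduced flat buffer
        for t, r in zip(tensors, _unflatten_dense_tensors(reduced, tensors)):
            t.copy_(r)
        return tensors

    # the chained child future is what callers would wait on
    return fut.then(copy_back)
```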

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse agolynski SciPioneer H-Huang mrzzd cbalioglu gcramer23

Test Plan: Imported from OSS

Reviewed By: rohan-varma

Differential Revision: D30830994

Pulled By: mrshenli

fbshipit-source-id: dcde0ed9245e9e8fee357b3588b07d540a4b6318
2021-09-10 07:45:25 -07:00
88fff22023 torch.lu: forward AD support (#64742)
Summary:
As per title.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64742

Reviewed By: H-Huang

Differential Revision: D30841227

Pulled By: albanD

fbshipit-source-id: dc4d043ab94358594adb110fbbbb60750c98262a
2021-09-10 07:19:11 -07:00
be091950d0 [const_fold] Keep around node.meta for replaced folded ops (#64782)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64782

Previously, get_attrs that were added to the graph did not retain node.meta after folding. Add such support, and improve coverage in general here.

Test Plan: Added test coverage.

Reviewed By: protonu

Differential Revision: D30852704

fbshipit-source-id: ece87a61c69b2e68982964c6adc4dde14dae12c7
2021-09-09 23:52:39 -07:00
40098f48a1 [caffe2/aten] Remove loose #pragma warning ( pop ) in TensorBase.h (#64773)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64773

Remove loose `#pragma warning ( pop )` in TensorBase.h.

Reviewed By: ezyang

Differential Revision: D30846958

fbshipit-source-id: 52a3fb66e426bc16ef7bde2a13e26e8293969026
2021-09-09 23:45:45 -07:00
95d98dfeec Add TRTSplitter (#64762)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64762

Extract and format TRTSplitter from the fx2trt_example code. The current implementation is tentative and subject to change based on the feeds model lowering progress.

Test Plan:
Manual print of supported operators:
`{<class 'torch.nn.modules.activation.ReLU'>: None, <function relu at 0x7f9b1abd0790>: None, <class 'torch.nn.modules.activation.Sigmoid'>: None, <class 'torch.nn.modules.pooling.AdaptiveAvgPool2d'>: None, <built-in method add of type object at 0x7f9b7f402498>: None, <built-in function add>: None, <built-in method add of PyCapsule object at 0x7f9b1a3dc690>: None, <built-in method add_relu of PyCapsule object at 0x7f9b1a34cf90>: None, <class 'torch.nn.modules.batchnorm.BatchNorm2d'>: None, <class 'torch.nn.quantized.modules.batchnorm.BatchNorm2d'>: None, <class 'torch.nn.modules.conv.Conv2d'>: None, <class 'torch.nn.quantized.modules.conv.Conv2d'>: None, <class 'torch.nn.intrinsic.quantized.modules.conv_relu.ConvReLU2d'>: None, <class 'torch.nn.modules.linear.Linear'>: None, <class 'torch.nn.quantized.modules.linear.Linear'>: None, <class 'torch.nn.modules.pooling.MaxPool2d'>: None, <built-in function mul>: None, <built-in method mul of type object at 0x7f9b7f402498>: None, <built-in method mul of PyCapsule object at 0x7f9b1a3dc6c0>: None, <built-in method flatten of type object at 0x7f9b7f402498>: None, <class 'torch.nn.quantized.modules.DeQuantize'>: None, <built-in method dequantize of type object at 0x7f9b7f402498>: None, 'dequantize': None, <class 'torch.nn.quantized.modules.Quantize'>: None, <built-in method quantize_per_tensor of type object at 0x7f9b7f402498>: None, <class 'torch.nn.modules.linear.Identity'>: None, <function conv2d at 0x7f9b1a1fe9d0>: None, <function flatten at 0x7f9b1a1f5ca0>: None, <function size at 0x7f9b1a1f5b80>: None, <function batch_norm at 0x7f9b1a1feaf0>: None, <function layer_norm at 0x7f9b1a1feb80>: None, <function softmax at 0x7f9b1a1f9550>: None, <function relu at 0x7f9b1a1fe040>: None, <function sin at 0x7f9b1a2030d0>: None, <function cos at 0x7f9b1a203160>: None, <function tan at 0x7f9b1a2031f0>: None, <function sinh at 0x7f9b1a1fe160>: None, <function cosh at 0x7f9b1a1fe280>: None, <function tanh at 0x7f9b1a1fe310>: None, <function asin at 0x7f9b1a1fe3a0>: None, <function acos at 0x7f9b1a1fe430>: None, <function atan at 0x7f9b1a1fe4c0>: None, <function exp at 0x7f9b1a1fe550>: None, <function log at 0x7f9b1a1fe5e0>: None, <function sqrt at 0x7f9b1a1fe670>: None, <function reciprocal at 0x7f9b1a1fe700>: None, <function abs at 0x7f9b1a1fe790>: None, <function neg at 0x7f9b1a1fe820>: None, <function floor at 0x7f9b1a1fe8b0>: None, <function ceil at 0x7f9b1a1fe940>: None, <function sum at 0x7f9b1a1f9c10>: None, <function max_pool2d at 0x7f9b1a1f5d30>: None, <function squeeze at 0x7f9b1a1f5c10>: None, <function add at 0x7f9b1a1f91f0>: None, <function sub at 0x7f9b1a1f9ca0>: None, <function div at 0x7f9b1a1f9dc0>: None, <function mul at 0x7f9b1a1f9d30>: None, <function pow at 0x7f9b1a1f9e50>: None, <function min_two_tensors_input at 0x7f9b1a1f9940>: None, <function unsqueeze at 0x7f9b1a1f9280>: None, <function topk at 0x7f9b1a203280>: None, <function adaptive_avg_pool2d at 0x7f9b1a1f5dc0>: None, <function avg_pool2d at 0x7f9b1a1f5e50>: None, <function reshape at 0x7f9b1a203550>: None, <function slice_tensor at 0x7f9b1a1fee50>: None, <function split at 0x7f9b1a1fec10>: None, <function linear at 0x7f9b1a1f51f0>: None, <function clamp at 0x7f9b1a1f93a0>: None, <function tuple_construct at 0x7f9b1a1fed30>: None, <function contiguous at 0x7f9b1a1f9430>: None, <function getitem at 0x7f9b1a203310>: None, <function cat at 0x7f9b1a1f9310>: None, <function transpose at 0x7f9b1a1f94c0>: None, <function matmul at 0x7f9b1a1f98b0>: None, <function sigmoid at 
0x7f9b1a1fe1f0>: None, <function permute at 0x7f9b1a1f9670>: None, <function quantize_per_tensor at 0x7f9b1a1f9b80>: None, <function dequantize at 0x7f9b1a1f99d0>: None, <function sign at 0x7f9b1a1f5ee0>: None}`

Reviewed By: 842974287

Differential Revision: D30798047

fbshipit-source-id: 69076a550874425b7186fbbf2ecf03da4a99b42f
2021-09-09 21:08:57 -07:00
88c0ea9131 [PyTorch] Fix missing move in torch::jit::Lexer::next (#64653)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64653

Saves shared_ptr refcount inc/dec in SourceRange.
ghstack-source-id: 137608457

Test Plan: Profiled startup of framework overheads benchmark from high_per_models; self time spent in next() is way down.

Reviewed By: dhruvbird

Differential Revision: D30739240

fbshipit-source-id: ac455678c9d46e657b111d3788d4369983028674
2021-09-09 19:01:07 -07:00
b7b4f63bbc [PyTorch] Use std::find in the JIT lexer (#64652)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64652

If nothing else, it is slightly clearer code.
ghstack-source-id: 137608456

Test Plan: CI

Reviewed By: dhruvbird

Differential Revision: D30739239

fbshipit-source-id: bc7917b59883ca4a33fc6916b4e422bad79cf04b
2021-09-09 18:59:27 -07:00
a17d6c7f80 [TensorExpr] Simplify TE IR before applying any transformations. (#64717)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64717

This also exposed several bugs, which are fixed in this PR.

Differential Revision: D30826408

Test Plan: Imported from OSS

Reviewed By: navahgar

Pulled By: ZolotukhinM

fbshipit-source-id: a67ec5739aceed9ffdf0d24f77eb3787cefe4560
2021-09-09 18:50:51 -07:00
ef2c9d7d8a [quant][fix] Fix quantization for sub_scalar (#64603)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64603

We'll insert an observer only when both the operator and the dtype are supported

Test Plan:
python test/test_quantization.py TestQuantizeFx.test_sub_scalar

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D30797025

fbshipit-source-id: a77c21e2749405534fc245374cf33a0657a3d2c8
2021-09-09 17:18:31 -07:00
1b5b210f2c [Android] print type name for IValues (#64602)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64602

Print the type name in the error message for easier debugging.

Test Plan:
Example:
java.lang.IllegalStateException: Expected IValue type Tensor, actual type TensorList

Reviewed By: beback4u

Differential Revision: D30782318

fbshipit-source-id: 60d88a659e7b4bb2b574b12c7652a28f0d5ad0d2
2021-09-09 17:06:15 -07:00
11ef68938c [caffe2][tiny] Add logging to report what the current lengths are when mismatched lengths are detected (#64768)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64768

as title

Differential Revision: D30846637

fbshipit-source-id: 266768c81b315fdebba854135ea2db1faf67fd6a
2021-09-09 16:46:55 -07:00
d4b09dbab3 [doc][hackathon] To add Adagrad Optimizer to the documentation (#63254)
Summary:
It has been discussed before that adding descriptions of optimization algorithms to the PyTorch Core documentation may result in a nice optimization research tutorial. The following tracking issue lists all the necessary algorithms and links to the originally published papers: https://github.com/pytorch/pytorch/issues/63236.

In this PR we are adding a description of Adagrad to the documentation. For more details, we refer to the paper: http://jmlr.org/papers/v12/duchi11a.html

<img width="658" alt="AdaGradAlgo" src="https://user-images.githubusercontent.com/73658284/132743276-a52ea3fb-70a5-4788-94b7-f99367907a26.png">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63254

Reviewed By: albanD

Differential Revision: D30852139

Pulled By: iramazanli

fbshipit-source-id: 9e496560a97e92be8386585b01d9bd3bba4b0c66
2021-09-09 15:41:29 -07:00
9ad75281f6 [Static Runtime] Fix resize_output_check warning coming from prim::VarConcat (#64765)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64765

Test Plan: Tested the fix with BR v1 model predictor-replayer setup.

Reviewed By: ajyu

Differential Revision: D30846506

fbshipit-source-id: 3ef3c93f11285c7cd1e2b188ca298a7ab4fba579
2021-09-09 14:38:50 -07:00
7f1932e1b9 Rename profiler metadata key (#63743)
Summary:
Rename the metadata key to match the variable name

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63743

Reviewed By: albanD

Differential Revision: D30839501

Pulled By: gdankel

fbshipit-source-id: b9b4e670dcc9557b8d8d0730baea0ad39a1a0ca4
2021-09-09 13:06:16 -07:00
6cc8cc6e56 Add support for lowering info during serialize_module, and add padding/partial to it (#5810)
Summary:
Pull Request resolved: https://github.com/pytorch/glow/pull/5810

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64725

- Any info added to the dict in node.meta["lowering_info"] will be added to the node_rep during serialization.
- Use this to add annotations on placeholders that allow partial inputs and require padding.
- Check for these annotations and set them in the NNPICompiledFunction as expected

Test Plan: Validated working on inline_cvr in stack. Additionally existing fx_glow end to end tests should still pass.

Reviewed By: 842974287

Differential Revision: D30824192

fbshipit-source-id: def64ef097aa35c337abb494415f7d437c6c7fa9
2021-09-09 13:01:28 -07:00
d43fb75a21 cat_shape_check: Fixes dimension in the error message for CUDA cat shape check and removes unnecessary offending index information (#64556)
Summary:
Fixes: https://github.com/pytorch/pytorch/issues/64207

Thank you, SsnL for providing the reproducing script.

cc ngimel

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64556

Reviewed By: albanD

Differential Revision: D30843859

Pulled By: ngimel

fbshipit-source-id: 457ebe80eaef793d9f5d35ee962b6697e5de1907
2021-09-09 12:51:11 -07:00
2c243ed112 Enable the on-demand performance PR testing to run on a specified TB branch (#64701)
Summary:
This is to enable performance testing of experimental features such as LazyTensor.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64701

Test Plan:
TorchBench CI

RUN_TORCHBENCH: BERT_pytorch, mobilenet_v3_large
TORCHBENCH_BRANCH: v1.0

Reviewed By: seemethere

Differential Revision: D30847389

Pulled By: xuzhao9

fbshipit-source-id: 6853b368fa6f1ba8ffde517805c74bf318dcb35b
2021-09-09 12:41:21 -07:00
517033916c .github: Remove add_annotations workflow (#64449)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64449

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: suo, janeyx99

Differential Revision: D30738460

Pulled By: seemethere

fbshipit-source-id: f1259fcba2f0c14a9bcfbe811ec0a4bf61106619
2021-09-09 12:22:12 -07:00
9797a32faf [Dist/CI] Remove dist from target determinator (#64721)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64721

There are a couple of PRs where distributed CI did not run and we expected
it to. Examples:

https://github.com/pytorch/pytorch/pull/64513/checks?check_run_id=3539190960,
https://github.com/pytorch/pytorch/pull/64113. All distributed tests should've
been run on these PRs, but we can see they were not:

```
Determination is skipping distributed/test_c10d_common
Determination is skipping distributed/test_c10d_gloo
Determination is skipping distributed/test_c10d_nccl
Determination is skipping distributed/test_c10d_spawn_gloo
Determination is skipping distributed/test_c10d_spawn_nccl
Running distributed/test_data_parallel without determination
Determination is skipping distributed/test_distributed_spawn
Determination is skipping distributed/test_jit_c10d
```

Since it is important to run distributed tests on PRs that touch distributed,
exclude distributed from target_det_list for now.
ghstack-source-id: 137654015

Test Plan: CI

Reviewed By: driazati, mrshenli

Differential Revision: D30830455

fbshipit-source-id: 8b0fdf5b57c2c647b0d82c48e2bb8e2bdbe4d307
2021-09-09 12:07:43 -07:00
46c886e8a6 fix acc topk's handling of the case when dim=0, fix tests as well (#64727)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64727

The acc ops converter for topk has a subtle bug (I found this while trying to introduce max/min):
the code does not differentiate between dim == None and dim == 0, but these are different computations.
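
A quick eager-mode illustration of the difference (when dim is omitted, topk runs over the last dimension, which is not the same as dim=0 for anything above 1-D):

```python
import torch

x = torch.tensor([[1., 9.],
                  [8., 2.]])

# dim not given: topk runs over the last dimension (per-row maxima here)
print(torch.topk(x, 1).values)         # tensor([[9.], [8.]])

# dim=0: topk runs over the first dimension (per-column maxima here)
print(torch.topk(x, 1, dim=0).values)  # tensor([[8., 9.]])
```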

Reviewed By: jfix71, 842974287

Differential Revision: D30833621

fbshipit-source-id: 6cd84e6ca4e95bb1a6d6465e61830b76808a9c78
2021-09-09 10:35:23 -07:00
3d3ff4a9e7 Fix a shadowed variable (#64695)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64695

Resolves this warning:
```
caffe2/aten/src/ATen/ParallelOpenMP.h:109:63: warning: declaration of 'int64_t begin' shadows a parameter [-Wshadow=compatible-local]
  109 |   internal::invoke_parallel(begin, end, grain_size, [&](int64_t begin, int64_t end) {
      |                                                       ~~~~~~~~^~~~~
caffe2/aten/src/ATen/ParallelOpenMP.h:86:1: note: shadowed declaration is here
   85 | inline scalar_t parallel_reduce(
      |                 ~~~~~~~~~~~~~~~~
   86 |     const int64_t begin,
      | ^   ~
```

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D30816128

fbshipit-source-id: 3adff6d94eea9fbd65885e88283cae10b87dba18
2021-09-09 10:34:01 -07:00
8deaa476ac Added more version comparison operations (#63848)
Summary:
Currently the [TorchVersion](1022443168/torch/torch_version.py (L13)) only supports 'greater than' and 'equal to' operations for comparing torch versions, so something like `TorchVersion('1.5.0') < (1,5,1)` or `TorchVersion('1.5.0') >= (1,5)` will throw an error.

I have added 'less than' (`__lt__()`), 'greater than or equal to' (`__ge__()`) and 'less than or equal to' (`__le__()`) operations, so that the TorchVersion object can be useful for a wider range of version comparisons.
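
A small sketch of the comparisons this enables, assuming the `torch.torch_version` module path from the linked file:

```python
from torch.torch_version import TorchVersion

v = TorchVersion('1.5.0')
print(v < (1, 5, 1))   # True, via the new __lt__
print(v >= (1, 5))     # True, via the new __ge__
print(v <= '1.5.0')    # True, via the new __le__
print(v > (1, 4))      # True, already supported before this PR
```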

cc seemethere zsol

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63848

Reviewed By: fmassa, heitorschueroff

Differential Revision: D30526996

Pulled By: seemethere

fbshipit-source-id: 1db6bee555043e0719fd541cec27810852590940
2021-09-09 10:30:20 -07:00
cfa6162e5e Reverts cat and stack warning when out= is not the expected shape (#64714)
Summary:
These warnings are being thrown too aggressively at the moment. See https://github.com/pytorch/pytorch/issues/64709 for a follow-up to reenable them once internal call sites are reviewed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64714

Reviewed By: ngimel

Differential Revision: D30822965

Pulled By: mruberry

fbshipit-source-id: 3ad7c92d381d42ac6187ed84afab477c579a8f35
2021-09-09 10:03:22 -07:00
2b41bf40c5 To add SequentialLR to PyTorch Core Schedulers (#64037)
Summary:
Partially resolves https://github.com/pytorch/vision/issues/4281

In this PR we are proposing a new scheduler, SequentialLR, which calls a list of different schedulers during different phases of the training process.

The main motivation for this scheduler is the recently gained popularity of a warm-up phase at the start of training. It has been shown that taking small steps in the initial stages of training can speed up convergence.

SequentialLR makes it possible to start with a small constant (or linearly increasing) learning rate and then hand off to the actual target learning rate scheduler.

```python
scheduler1 = ConstantLR(optimizer, factor=0.1, total_iters=2)
scheduler2 = ExponentialLR(optimizer, gamma=0.9)
scheduler = SequentialLR(optimizer, schedulers=[scheduler1, scheduler2], milestones=[5])

for epoch in range(100):
    train(...)
    validate(...)
    scheduler.step()
```

This code snippet will call `ConstantLR` for the first 5 epochs and follow up with `ExponentialLR` in the remaining epochs.

This scheduler can be used to chain any group of schedulers one after another. The main consideration is that every time we switch to a new scheduler, we assume that the new scheduler starts from the beginning (the zeroth epoch).

We also add the Chained Scheduler to the `optim.rst` and `lr_scheduler.pyi` files here.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64037

Reviewed By: albanD

Differential Revision: D30841099

Pulled By: iramazanli

fbshipit-source-id: 94f7d352066ee108eef8cda5f0dcb07f4d371751
2021-09-09 09:36:32 -07:00
c3203efe80 [pytorch] Make qlinear weight packing thread safe (#63804)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63804

Adding a lock around weight packing section of qlinear + qlinear_dynamic

Test Plan: automated tests

Reviewed By: kimishpatel

Differential Revision: D30340957

fbshipit-source-id: 1c9faf796c4ffbc74345396188a6f1154a76bea6
2021-09-09 09:31:48 -07:00
dc53546655 torch.lu_solve: forward AD support (#64646)
Summary:
As per title.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64646

Reviewed By: VitalyFedyunin

Differential Revision: D30807898

Pulled By: albanD

fbshipit-source-id: 1f943c22357dd1b3662cfe0d2a26af68e3a2df4c
2021-09-09 08:58:00 -07:00
b7c86365d1 [nnc] Handled cast in index expression during inlining (#64716)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64716

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D30826388

Pulled By: navahgar

fbshipit-source-id: 7e446602f650527e0d954e437f0370602019e040
2021-09-09 08:30:52 -07:00
652a8bf7d0 [nnc] Updated indices during broadcast to use int64_t (#64627)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64627

This fixes the root cause of S242719

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D30801686

Pulled By: navahgar

fbshipit-source-id: b6d3ebdc7eb57116eaced53c2f35c7798bb17e80
2021-09-09 08:29:37 -07:00
459653a0f6 Revert D30745921: [DDP] Fix when buffers are reassigned in module
Test Plan: revert-hammer

Differential Revision:
D30745921 (d59ecc02df)

Original commit changeset: 25eb1edbf445

fbshipit-source-id: 343ead86bf1e2d0b2d4124be331ea2fa437303ad
2021-09-09 08:23:16 -07:00
5bc53ac5ef Revert D30745961: [DDP] Remove self.modules_params
Test Plan: revert-hammer

Differential Revision:
D30745961 (8c09510294)

Original commit changeset: 32d102502570

fbshipit-source-id: 59f7cc50d369b6cc2856cf4ebd0f58b96202336d
2021-09-09 08:23:14 -07:00
f1aaf8afcd Revert D30745960: [DDP] Remove SPMD from self.modules_buffers
Test Plan: revert-hammer

Differential Revision:
D30745960 (1553259520)

Original commit changeset: 66a8f9847e9f

fbshipit-source-id: d3f3fb813c45ac1b0ff15c6154b2e99e5dbab433
2021-09-09 08:22:12 -07:00
3bf93d769c [JIT] Add gradient check in constants (#64613)
Summary:
fixes internal issue

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64613

Reviewed By: Gamrix

Differential Revision: D30799016

Pulled By: eellison

fbshipit-source-id: 48ef52d1cac627919e6cd232216d24878a2a8b58
2021-09-09 08:13:57 -07:00
d4b1016850 Filter out _disabled_torch_function_impl from handle_torch_function (#64689)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64689

This brings it in line with the C++ implementation.

Fixes https://github.com/pytorch/pytorch/issues/64687

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D30816215

Pulled By: ezyang

fbshipit-source-id: ed36af6c35467ae678d9548197efd97c36d38dec
2021-09-09 07:29:09 -07:00
239366c9c2 To add Rectified Adam Description to Documentation (#63772)
Summary:
It has been discussed before that adding descriptions of optimization algorithms to the PyTorch Core documentation may result in a nice optimization research tutorial. The following tracking issue lists all the necessary algorithms and links to the originally published papers: https://github.com/pytorch/pytorch/issues/63236.

In this PR we are adding a description of the Rectified Adam algorithm to the documentation. For more details, we refer to the paper: https://arxiv.org/abs/1908.03265

<img width="446" alt="RadamAlgo" src="https://user-images.githubusercontent.com/73658284/132587815-4764b642-df53-4e41-975f-72e0f40fdc48.png">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63772

Reviewed By: datumbox

Differential Revision: D30839694

Pulled By: iramazanli

fbshipit-source-id: 6f5629ce56e10c66a451433334b587b99eda1610
2021-09-09 07:10:36 -07:00
5b21f172a4 [doc][hackathon] To add AdamW Optimizer to the documentation (#63252)
Summary:
It has been discussed before that adding descriptions of optimization algorithms to the PyTorch Core documentation may result in a nice optimization research tutorial. The following tracking issue lists all the necessary algorithms and links to the originally published papers: https://github.com/pytorch/pytorch/issues/63236.

In this PR we are adding a description of the AdamW algorithm to the documentation. For more details, we refer to the paper: https://arxiv.org/abs/1711.05101

<img width="442" alt="AdamWalgo" src="https://user-images.githubusercontent.com/73658284/132589957-6d381e96-cb62-40d0-990f-82a32ec455be.png">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63252

Reviewed By: datumbox

Differential Revision: D30839685

Pulled By: iramazanli

fbshipit-source-id: 1a426c874ab86408d286a34f41aefcf5b21167c0
2021-09-09 07:05:31 -07:00
39ce801d1f To add Adamax algorithm to documentation (#63903)
Summary:
It has been discussed before that adding descriptions of optimization algorithms to the PyTorch Core documentation may result in a nice optimization research tutorial. The following tracking issue lists all the necessary algorithms and links to the originally published papers: https://github.com/pytorch/pytorch/issues/63236.

In this PR we are adding a description of the Adamax algorithm to the documentation. For more details, we refer to the paper: https://arxiv.org/abs/1412.6980

<img width="447" alt="Adamx" src="https://user-images.githubusercontent.com/73658284/132577306-878ce64c-627a-4086-808c-d0482868d4a1.png">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63903

Reviewed By: albanD

Differential Revision: D30819055

Pulled By: iramazanli

fbshipit-source-id: 37f748cbea9f93bf37193ee30fc295fb1a1e9ffd
2021-09-09 06:42:33 -07:00
15c21fa45f [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D30835585

fbshipit-source-id: a7d35319fd3ae3eddd29b69d299d842f68d587f6
2021-09-09 04:23:50 -07:00
233e3e5bb4 Fix lop1p lowering bug (#64724)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64724

`1` will introduce an int tensor instead of a float tensor, which doesn't work well with downstream (elementwise) operators. The error would look like
```
[TensorRT] WARNING: IElementWiseLayer with inputs (Unnamed Layer* 1) [Unary]_output and (Unnamed Layer* 2) [Constant]_output: first input has type Float but second input has type Int32.
```
Changing the constant to a float type fixes this.
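
A minimal eager-mode sketch of the dtype issue (illustrative only; the actual fix lives in the lowering code):

```python
import torch

x = torch.rand(4)              # float32, positive values

int_one = torch.tensor(1)      # int64 constant: mixing it into an elementwise
                               # add is what trips TensorRT (Float vs Int32)
float_one = torch.tensor(1.0)  # float32 constant matching x's dtype

# log1p(x) decomposed as log(x + 1) with a float constant keeps both
# elementwise inputs at the same dtype
decomposed = torch.log(x + float_one)
assert torch.allclose(decomposed, torch.log1p(x), atol=1e-6)
```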

Reviewed By: 842974287

Differential Revision: D30796959

fbshipit-source-id: 0538e4dd960df9ce87a2d4cafe8f1a0c061b6bad
2021-09-09 00:59:44 -07:00
d0b207e68b Migrate uses of THCReduceApplyUtils to cuda_utils::BlockReduce (#64713)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64713

Resubmit of #64442

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D30825646

Pulled By: ngimel

fbshipit-source-id: 66b06bd0b30b401833e337920681d19d96b11f9d
2021-09-08 22:09:01 -07:00
1553259520 [DDP] Remove SPMD from self.modules_buffers (#64474)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64474

No need for a nested list here.
ghstack-source-id: 137526312

Test Plan: CI

Reviewed By: zhaojuanmao

Differential Revision: D30745960

fbshipit-source-id: 66a8f9847e9fe1e02c51b79647e93bf7665cf4d9
2021-09-08 19:16:15 -07:00
8c09510294 [DDP] Remove self.modules_params (#64473)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64473

Unused after SPMD deprecated.
ghstack-source-id: 137526305

Test Plan: CI

Reviewed By: zhaojuanmao

Differential Revision: D30745961

fbshipit-source-id: 32d102502570291e01579e5b47a6d74dc71013bb
2021-09-08 19:16:13 -07:00
d59ecc02df [DDP] Fix when buffers are reassigned in module (#64472)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64472

Sometimes, a user module can reassign a tensor buffer, as in:

```
self.buffer = torch.randn(1, 2) # in init
self.buffer += 1 # in forward
```

In this case, `self.modules_buffers` will become outdated and we should
repopulate `self.modules_buffers` if we need to sync module buffers.

See https://github.com/pytorch/pytorch/issues/63916 for full description of the
issue.
ghstack-source-id: 137526309

Test Plan: CI

Reviewed By: zhaojuanmao

Differential Revision: D30745921

fbshipit-source-id: 25eb1edbf445703a481802e07f3058d38ea6fc64
2021-09-08 19:14:55 -07:00
b6544ef815 [PyTorch] Fix MobileDebugInfo vector copy (#64030)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64030

ghstack-source-id: 137566816

Test Plan:
Pixel 3 before:  https://our.intern.facebook.com/intern/aibench/details/320277034999340
Pixel 3 after: https://our.intern.facebook.com/intern/aibench/details/724509739115867

We can see the vector copy disappear in the flame graph. Overall mean decreased from 354 ms to 348 ms (though I'm not sure this is outside the usual noise).

Reviewed By: raziel

Differential Revision: D30559032

fbshipit-source-id: 6d8bb5396d3449cc63023ee7acf694b5d146ddc1
2021-09-08 18:32:50 -07:00
0d0d2f2ac5 [PyTorch] move from input ivalues in ByteCodeDeserializer (#64029)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64029

This should save us a separate pass over the data structure to destroy it.
ghstack-source-id: 137566821

Test Plan:
Pixel3
before:
https://www.internalfb.com/intern/aibench/details/503337445067962
after:
https://our.intern.facebook.com/intern/aibench/details/320277034999340

Overall mean time decreased from 373 ms to 358 ms. In the flame graph, we
can see that some time spent destroying a vector of IValues was moved
into parseMethods, and the new parseMethods time is less than the old
time plus the recursive destruction time.

Reviewed By: dhruvbird

Differential Revision: D30559530

fbshipit-source-id: d080295a846745ea03ac50f08f4f6c95f4eaf3d8
2021-09-08 18:32:48 -07:00
f5e76b4e38 [PyTorch] Copy vectors less in Function::append_operator (#63977)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63977

Doesn't seem to be any reason to copy these argument vectors.
ghstack-source-id: 137566815

Test Plan: CI

Reviewed By: dhruvbird, raziel

Differential Revision: D30550301

fbshipit-source-id: 33c199f975e4fb62c50a8210dc08aa9bb7a3e2f2
2021-09-08 18:31:38 -07:00
0ef32625a8 [FX] make visualizer produce different formatted output (#64699)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64699

Previously we just hardcoded the svg format. We should give folks a choice of what format they want to see. If we're given a weird extension like .abc, this will error out, which we expect to be the right behavior.

Reviewed By: houseroad

Differential Revision: D30718883

fbshipit-source-id: fe8827262f94ea6887999bb225de763d1909eef8
2021-09-08 18:22:12 -07:00
86e3b2727e Re-enable nightly doc pushes (#64708)
Summary:
That were accidentally disabled by https://github.com/pytorch/pytorch/pull/64222

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64708

Reviewed By: seemethere

Differential Revision: D30822089

Pulled By: malfet

fbshipit-source-id: 056b5c006f236c78ffe8afa4a5eab2f35e1bce89
2021-09-08 18:07:54 -07:00
9a6c2a75b8 [acc_tracer] Enable check_mutable_operations (#64456)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64456

att

Test Plan: CI

Reviewed By: protonu

Differential Revision: D30679174

fbshipit-source-id: 73f3a07d58380cd44fb3481aa97d463c0a964de8
2021-09-08 16:11:15 -07:00
5c27a580ec [tensorexpr] Allocate intermediate buffers at compile time (#64227)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64227

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D30652220

Pulled By: huiguoo

fbshipit-source-id: cd75005cdfa42751318de7174b44e14a3a01634e
2021-09-08 15:34:44 -07:00
527348a6fe [tensorexpr] Add 'is_allocated' flag for buffers and use it to insert 'Alloc/Free' stmts (#64226)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64226

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D30652221

Pulled By: huiguoo

fbshipit-source-id: ef9bb0e3db2c444b476e5fc23956bc34ae0f0111
2021-09-08 15:34:42 -07:00
f90153cda3 [acc_normalizer] Improve error when kwarg normalization fails (#64408)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64408

att

Test Plan: NFC

Reviewed By: protonu

Differential Revision: D30716392

fbshipit-source-id: e1c3bb1afcd5363a9d502549d8a46b90226be40c
2021-09-08 15:33:32 -07:00
4533e76e7c Update breakpad to an existing commit: 7d188f6 (#64666)
Summary:
Fixes issue https://github.com/pytorch/pytorch/issues/64561

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64666

Reviewed By: driazati

Differential Revision: D30814127

Pulled By: hyuen

fbshipit-source-id: 511a30fc26153569b1cd39f34e4a1a6bb99cc5e4
2021-09-08 15:29:10 -07:00
149f1114fe To add Stochastic Gradient Descent to Documentation (#63805)
Summary:
It has been discussed before that adding descriptions of optimization algorithms to the PyTorch Core documentation may result in a nice optimization research tutorial. The following tracking issue lists all the necessary algorithms and links to the originally published papers: https://github.com/pytorch/pytorch/issues/63236.

In this PR we are adding a description of Stochastic Gradient Descent to the documentation.

<img width="466" alt="SGDalgo" src="https://user-images.githubusercontent.com/73658284/132585881-b351a6d4-ece0-4825-b9c0-126d7303ed53.png">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63805

Reviewed By: albanD

Differential Revision: D30818947

Pulled By: iramazanli

fbshipit-source-id: 3812028e322c8a64f4343552b0c8c4582ea382f3
2021-09-08 15:22:30 -07:00
ff18195df9 .github: Upgrade windows CUDA 10.1 -> 10.2 (#64658)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64658

We don't release 10.1 anymore so let's bump to 10.2

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: malfet, janeyx99

Differential Revision: D30811178

Pulled By: seemethere

fbshipit-source-id: c504ebf7f0d4c0d6229319d774f808b4ba0facd9
2021-09-08 14:43:33 -07:00
cc0565326c Add plugin for linalg norm operation (#64611)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64611

Add a plugin for torch.linalg.norm. This plugin only correctly supports norm operations that do not change batch_size, so vector inputs, or matrix inputs whose dims include '0', are not supported by this plugin.

Test Plan: Unit test

Reviewed By: 842974287

Differential Revision: D30525958

fbshipit-source-id: 0d66b60a390bb6235166e5a80390090d0acf691a
2021-09-08 14:33:20 -07:00
a97015f22c Revert D30735341: Migrate uses of THCReduceApplyUtils to cuda_utils::BlockReduce
Test Plan: revert-hammer

Differential Revision:
D30735341 (a5ad08ec70)

Original commit changeset: 3cb58bed8f1f

fbshipit-source-id: 874dd0f93b24a99694db42a15714834069d402bc
2021-09-08 14:27:40 -07:00
b12150608e [fx] make const fold code more pythonic (#64451)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64451

No functional change.

Test Plan:
```
buck test caffe2/test:fx_const_fold
```

Reviewed By: jfix71, RoshanPAN, houseroad

Differential Revision: D30718255

fbshipit-source-id: 95f98561c7f33fcc6c839db68683c85eb152c949
2021-09-08 13:55:10 -07:00
24e1315d4b [quant] Enable jit tracing on quantizable LSTM (resubmission) (#64638)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64638

The quantizable LSTM didn't support jit tracing because it had several non-traceable paths. We sacrifice some of the user experience to enable the tracing.
The main UX feature removed is a user-friendly message when trying to access the backwards path in a bidirectional LSTM: when the bidirectional flag is False, we used to throw a nice error message when the user tried accessing backwards weights. Now the message is the default one (removed properties).

Test Plan: `buck test mode/dev //caffe2/test:quantization -- test_custom_module_lstm`

Reviewed By: HDCharles

Differential Revision: D30803753

fbshipit-source-id: a639955a96cee22538d9436f1c952a5d121f50f9
2021-09-08 13:34:18 -07:00
d701357d92 Factor out TensorBase that doesn't depend on native operators (#63612)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63612

This makes Tensor inherit from a new class TensorBase that provides a subset of Tensor that doesn't
directly depend on native_functions.yaml. Code that only includes TensorBase.h will thus not need to
be rebuilt every time someone changes an operator signature.

Making `Tensor` inherit from this class means that `const TensorBase&` parameters will be callable
with an ordinary `Tensor`. I've also made `Tensor` constructible and assignable from `TensorBase` to
minimize friction in code mixing the two types.

To help enforce that `Tensor.h` and `Functions.h` aren't accidentally included, I've added an error
into `Operators.h` if `TORCH_ASSERT_NO_OPERATORS` is defined. We can either set this in the build
system for certain folders, or just define it at the top of any file.

I've also included an example of manually special-casing the commonly used `contiguous` operator.
The inline function's slow path defers to `TensorBase::__dispatch_contiguous` which is defined in
`Tensor.cpp`. I've made it so `OptionalTensorRef` is constructible from `TensorBase`, so I can
materialize a `Tensor` for use in dispatch without actually increasing its refcount.

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D30728580

Pulled By: ezyang

fbshipit-source-id: 2cbc8eee08043382ee6904ea8e743b1286921c03
2021-09-08 13:28:54 -07:00
92318a9116 Make doc previews use its own S3 bucket (#64594)
Summary:
We had been using the gha-artifacts bucket (which previously only stored workflow artifacts) to keep the docs around. This makes it hard to see how our storage for artifacts vs docs is trending.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64594

Reviewed By: seemethere

Differential Revision: D30794328

Pulled By: driazati

fbshipit-source-id: 6b2721a3d76e8a273bde055783d56551f8409edd
2021-09-08 11:36:50 -07:00
43c0f033fc TST Adds inplace checks to module_info (#63739)
Summary:
Follow up to https://github.com/pytorch/pytorch/pull/61935

This PR adds inplace checks to `test_modules`. This version checks the constructor for `inplace` and performs the check automatically.
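
A minimal sketch of what such an automatic inplace check verifies, shown here with ReLU (illustrative; the real test hooks into module_info entries):

```python
import torch

# an inplace variant must match its out-of-place counterpart
m_out = torch.nn.ReLU(inplace=False)
m_in = torch.nn.ReLU(inplace=True)

x = torch.randn(8)
expected = m_out(x)
actual = m_in(x.clone())  # clone so the inplace op doesn't clobber x
assert torch.equal(expected, actual)
```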

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63739

Reviewed By: saketh-are

Differential Revision: D30737774

Pulled By: jbschlosser

fbshipit-source-id: 8813534511e9296c8424d1ca878412726ddd4043
2021-09-08 11:08:12 -07:00
a5ad08ec70 Migrate uses of THCReduceApplyUtils to cuda_utils::BlockReduce (#64442)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64442

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D30735341

Pulled By: ngimel

fbshipit-source-id: 3cb58bed8f1f5aa32fd49fd37b10c8490bcc645a
2021-09-08 11:02:12 -07:00
deb9775c07 .github: Run docker containers in detach mode (#64459)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64459

Should allow users to exec into the docker container if using with-ssh,
even if the build / test command has finished executing

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D30742797

Pulled By: seemethere

fbshipit-source-id: 969ed8799216c6051439c7d41ab709b2d40938ac
2021-09-08 11:01:08 -07:00
18d24bb537 [NNC] Add Softplus operator (#64589)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64589

Adding softplus operator lowering for NNC, enabling element-wise fusion as well.

Test Plan: Added a test in test_jit_fuser.py

Reviewed By: bertmaher

Differential Revision: D30736449

fbshipit-source-id: 6c5fc3bceb5cef2322ecd4449f827e4af018ea93
2021-09-08 10:49:58 -07:00
35413a16f7 Add __matmul__ to the magic methods for FX tracing (#64512)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/64483
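
With `__matmul__` among the traced magic methods, the `@` operator becomes traceable; a quick check:

```python
import torch
from torch.fx import symbolic_trace

def f(x, y):
    return x @ y  # dispatches through __matmul__

gm = symbolic_trace(f)
print(gm.graph)  # contains a call_function node for operator.matmul
gm(torch.randn(2, 3), torch.randn(3, 4))  # the traced module still runs
```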

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64512

Reviewed By: mrshenli

Differential Revision: D30797265

Pulled By: Chillee

fbshipit-source-id: 7630e048a960e0b27c4309d04d85301abe325189
2021-09-08 10:03:48 -07:00
195cb4efa8 update scatter formula (#64546)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/63430

Already tested by the OpInfo gradient tests
544c8e6a5d/torch/testing/_internal/common_methods_invocations.py (L8575-L8577)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64546

Reviewed By: saketh-are

Differential Revision: D30768759

Pulled By: albanD

fbshipit-source-id: 27d144971c51a956a232fc7d02df5c9d2706d565
2021-09-08 10:02:35 -07:00
1409492fdb fixing trapezoid() comments for clarity (#64592)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64592

cc mruberry rgommers heitorschueroff

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D30785663

Pulled By: NivekT

fbshipit-source-id: e968687fbb83a59bb46ce6858c6caafa5aa04412
2021-09-08 09:45:46 -07:00
dd8f6ac597 Add forward mode differentiation for torch.linalg.cholesky and transpose (#62159)
Summary:
This PR adds forward mode differentiation for `torch.linalg.cholesky`, `torch.linalg.cholesky_ex`, and `transpose` functions.
Complex tests for Cholesky fail because, for some reason, gradcheck sends matrices full of zeros to the `cholesky_jvp` function.
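
A small sketch of what forward mode differentiation through Cholesky looks like with the public forward-AD API, assuming a build that includes this PR:

```python
import torch
import torch.autograd.forward_ad as fwAD

A = torch.randn(3, 3, dtype=torch.float64)
A = A @ A.T + 3 * torch.eye(3, dtype=torch.float64)  # symmetric positive definite
dA = torch.randn(3, 3, dtype=torch.float64)
dA = (dA + dA.T) / 2                                  # symmetric perturbation

with fwAD.dual_level():
    dual_A = fwAD.make_dual(A, dA)
    L = torch.linalg.cholesky(dual_A)
    primal, tangent = fwAD.unpack_dual(L)  # tangent is the JVP of cholesky at A
```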

cc ezyang albanD zou3519 gqchen pearu nikitaved soulitzer Lezcano Varal7 jianyuh mruberry heitorschueroff walterddr IvanYashchuk xwang233

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62159

Reviewed By: mrshenli

Differential Revision: D30776829

Pulled By: albanD

fbshipit-source-id: 32e5539ed6423eed8c18cce16271330ab0ea8d5e
2021-09-08 09:44:30 -07:00
a2934b38f8 Fix typo embedding_renorm_cuda_ (#64542)
Summary:
Fixes #{issue number}

cc ezyang albanD zou3519 gqchen pearu nikitaved soulitzer Lezcano Varal7 ngimel

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64542

Reviewed By: mrshenli

Differential Revision: D30792842

Pulled By: ngimel

fbshipit-source-id: c9a548256d02b3ce6fb77dd9fb058084f2c91608
2021-09-08 09:36:24 -07:00
e0e832c2ba [c10d] Provide failure reason from ProcessGroup when aborting NCCL comm (#64241)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64241

When things go wrong, PG NCCL aborts nccl communicators via `ncclCommAbort`, but one issue is that the error is often set to `ncclSystemError` (see https://github.com/pytorch/pytorch/blob/master/torch/csrc/distributed/c10d/NCCLUtils.hpp#L176) when that might not be the true cause; the actual issue may be that some prior work timed out, the communicator was aborted on another rank, etc.

This results in a lot of confusion when debugging jobs with a large no. of processes as the current message for ncclSystemError is not very informative: https://github.com/pytorch/pytorch/blob/master/torch/csrc/distributed/c10d/NCCLUtils.hpp#L22

The fix here is to pass in a string exception message from PG NCCL down to `NCCLUtils` which will aim to raise that as the actual issue and not the confusing `ncclSystemError` message.

Test Plan: CI

Reviewed By: pallab-zz, cbalioglu

Differential Revision: D30658855

fbshipit-source-id: 17661dbe0a1bb8cc5b87b637c47634b1f52f54e1
2021-09-08 09:19:24 -07:00
7205ca0210 Change MaxUnpool to accept tensors with 0-dim batch sizes. (#64082)
Summary:
Part of the fix for https://github.com/pytorch/pytorch/issues/38115.

Changes the `MaxUnpool` module to work with zero-sized batch dimensions.
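
A quick sketch of the behavior this enables, assuming the pooling side already accepts empty batches:

```python
import torch
import torch.nn as nn

pool = nn.MaxPool2d(2, stride=2, return_indices=True)
unpool = nn.MaxUnpool2d(2, stride=2)

x = torch.randn(0, 3, 8, 8)        # zero-sized batch dimension
out, indices = pool(x)             # shapes (0, 3, 4, 4)
restored = unpool(out, indices)    # previously rejected; now (0, 3, 8, 8)
print(restored.shape)
```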

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64082

Reviewed By: mrshenli

Differential Revision: D30793907

Pulled By: jbschlosser

fbshipit-source-id: d21aa665be5aa18f592b39ef7b4e3cbc632e21ed
2021-09-08 08:41:09 -07:00
ba8c1fc648 Add Half conversion of bit cast for SYCL kernel (#64340)
Summary:
## Motivation
Enhance the performance of Half/float conversion in SYCL kernels.

## Solution
Add the native SYCL half type to help convert the half from/to float in the kernel code.

## Additional Context
`__SYCL_DEVICE_ONLY__` is a MACRO only valid when compiling the kernel code for SYCL backend.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64340

Reviewed By: gchanan

Differential Revision: D30720823

Pulled By: ezyang

fbshipit-source-id: e7e770d02df5b2d45da61d2fed3ba59383b3dc3a
2021-09-08 08:25:47 -07:00
7f0feafa55 [nnc] Provide helpful error messages about turning off the fuser (#64516)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64516

If fuser compilation fails due to a bug (which should be highly
unlikely at this point) we want to direct the user on how to unblock themselves by
disabling fusion, in addition to requesting that they report a bug.
ghstack-source-id: 137398537

Test Plan: existing tests

Reviewed By: ZolotukhinM

Differential Revision: D30758051

fbshipit-source-id: 98be89f1b1d4fb3bc816f5b2634c618b9297930e
2021-09-08 08:10:22 -07:00
768014b3e6 Allow disabling cache in autocast (automatic mixed precision) (#63552)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63552

In this PR, we want to exclude these 2 cases from the `Autocast` weight cache usage:

- Using `torch.jit.trace` under `Autocast`
As reported in https://github.com/pytorch/pytorch/issues/50231 and several other discussions, when using `torch.jit.trace` under `Autocast`, the trace process hits Autocast's weight cache and fails. So we should disable the weight cache during tracing.
- Using `Autocast` with `Grad mode`

  - Usually we use `Grad mode` for training. Since the weights change at every step in the training phase, we don't need to cache them.
  - For the recommended `Autocast` training case in the [doc](https://pytorch.org/docs/stable/amp.html), `Autocast` clears the cache every time it leaves the context. We should disable the cache to save these clear operations.
    ```
    model = Net().cuda()
    optimizer = optim.SGD(model.parameters(), ...)

    for input, target in data:
        optimizer.zero_grad()
        with autocast():
            output = model(input)
            loss = loss_fn(output, target)
        loss.backward()
        optimizer.step()
    ```

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D30644913

Pulled By: ezyang

fbshipit-source-id: ad7bc87372e554e7aa1aa0795e9676871b3974e7
2021-09-08 07:47:18 -07:00
b616132403 Adding support for lowering 4Bit EmbeddingBag Operator (#5806)
Summary:
Pull Request resolved: https://github.com/pytorch/glow/pull/5806

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64001

Add 4 bit embeddingbag operator in  acc_ops.

Test Plan: Let CI run.

Reviewed By: jfix71

Differential Revision: D30532824

fbshipit-source-id: bf476c9710477792aae202dacf64e23539c33bd9
2021-09-08 07:13:16 -07:00
2223737da9 restore test_inplace_comparison_ops_require_inputs_have_same_dtype Expected behavior (#64267)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64267

This test expects every operation to throw a runtime error.

It also reinserts the in-place operation test and fixes a bug in the comparison operations.

fix: #64018
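
A sketch of the pattern the test exercises; per this commit, each in-place comparison with mismatched input dtypes is expected to raise:

```python
import torch

a = torch.zeros(3, dtype=torch.float32)
b = torch.zeros(3, dtype=torch.int32)

try:
    a.lt_(b)  # in-place comparison with mismatched dtypes
except RuntimeError as e:
    print("raised, as the test expects:", e)
```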

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D30720915

Pulled By: ezyang

fbshipit-source-id: 215a6556d20770f70f4ced1c1f9a9753933f1d37
2021-09-08 06:42:12 -07:00
9cc44aad21 [quant] AO migration of the quantize.py (resubmission) (#64445)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64445

AO Team is migrating the existing torch.quantization into torch.ao.quantization. We are doing it one file at a time to make sure that the internal callsites are updated properly.
This migrates quantize.py from torch.quantization to torch.ao.quantization.
At this point both locations will be supported. Eventually torch.quantization will be deprecated.
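
During the migration window both import locations are expected to resolve; a sketch:

```python
# legacy location, kept working during the migration
from torch.quantization.quantize import prepare, convert

# new location after the AO migration; new code should prefer this
from torch.ao.quantization.quantize import prepare, convert
```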

Test Plan: `buck test mode/dev //caffe2/test:quantization`

Reviewed By: HDCharles

Differential Revision: D30734870

fbshipit-source-id: dc204f3cc46bff2cc81c95159eab9d333b43bb4b
2021-09-08 04:58:47 -07:00
72274e2a2f [TensorExpr] Don't rely on exceptions in Vectorizer. (#64609)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64609

We've been using exceptions to indicate whether vectorization succeeded
or not, but that posed some problems (e.g. we spent too much time
symbolicating these exceptions). This change converts this mechanism to
a standard error return code.

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D30795342

Pulled By: ZolotukhinM

fbshipit-source-id: 16e38b37bcdd78ceb438ac814cc377f35b058e17
2021-09-08 00:25:34 -07:00
2341ec9ef1 [fx_const_fold] Fix constant folding for attrs in submodule hierarchies (#64342)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64342

Previously we weren't handling the case where an attribute was in a module that wasn't the root.

Test Plan: Added unit test coverage.

Reviewed By: yinghai

Differential Revision: D30691730

fbshipit-source-id: b39b5cf748c4c882f315a4f32b51ad88cc7a43ed
2021-09-07 22:44:39 -07:00
5721205417 Add __ge__ to TorchVersion (#64565)
Summary:
This PR adds a greater-or-equal comparison so that the base class's (str) comparison method is not used.
This is necessary for a correct comparison with a version string.

Previously the following was the case:
```py
>>> torch.__version__
'1.10.0.dev20210830+cpu'
>>> torch.__version__>"1.9"
True
>>> torch.__version__>="1.9"
False  # Wrong output since the base class (str) was used for __ge__ comparison
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64565

Reviewed By: raghuramank100

Differential Revision: D30790463

Pulled By: mrshenli

fbshipit-source-id: 79c680f8b448001b34d3e5d5332124a78bea4e34
2021-09-07 20:16:09 -07:00
81fe2c5e49 add out variant of linear (#61801)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61801

resubmitting because the last one was unrecoverable due to making changes incorrectly in the stack

Test Plan: Imported from OSS

Reviewed By: desertfire

Differential Revision: D29812510

Pulled By: makslevental

fbshipit-source-id: ba9685dc81b6699724104d5ff3211db5852370a6
2021-09-07 19:58:52 -07:00
71ba76b1b5 Fix building docs instructions (#64508)
Summary:
Fixes #64507

Removed a duplicate instruction and linted the file a bit (consistent spacing around code blocks/headers, adding code types to code blocks, removing `$` from bash code blocks where unnecessary).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64508

Reviewed By: raghuramank100

Differential Revision: D30791164

Pulled By: mrshenli

fbshipit-source-id: a00db32dcfdd1ecc194c836f31174c806062eb6d
2021-09-07 19:01:52 -07:00
4e98304eb9 Fix quicklint (#64612)
Summary:
Fixes land-race introduced by a22c936b63

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64612

Reviewed By: ngimel

Differential Revision: D30798648

Pulled By: malfet

fbshipit-source-id: ca546f68141d44493deba7bbf840e5f9662e8558
2021-09-07 18:52:22 -07:00
e777e1b01c Revert D29998114: [pytorch][PR] enable bf16 mkldnn path for gemm
Test Plan: revert-hammer

Differential Revision:
D29998114 (acc9f9afc8)

Original commit changeset: 459dc5874c63

fbshipit-source-id: 1994623a3afc22a94bd0cf5de766b023185f5238
2021-09-07 18:45:13 -07:00
1a033b45dd [JIT] Fix a bug of rejecting ops with AliasAnalysisKind::CONSERVATIVE (#64336)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64336

Currently AliasDB rejects any user-defined op with `AliasAnalysisKind::CONSERVATIVE` if it does not have special treatment for alias analysis. For example, the following alias schema gets rejected:

```
  m.def(torch::schema(
      "namescope::my_op(...) -> ...",
      c10::AliasAnalysisKind::CONSERVATIVE));
```

This rejection condition is contradictory: AliasDB can handle ops with `CONSERVATIVE` in a general way, without any special casing, at https://fburl.com/diffusion/op5u72sk (calling https://fburl.com/diffusion/h3aws5dd), which seems entirely appropriate for conservative alias analysis.

This change corrects the rejection condition so that it triggers only for ops that *do* have special casing yet are marked `CONSERVATIVE`, since the two cannot be used simultaneously.

Test Plan:
Confirmed that
```
  m.def(torch::schema(
      "namescope::my_op(...) -> ...",
      c10::AliasAnalysisKind::CONSERVATIVE));
```
gets accepted and all of `my_op`'s inputs and outputs are set to point to the wildcard (*) by AliasDB.

Reviewed By: eellison

Differential Revision: D30690121

fbshipit-source-id: 431cc1a84edd5227f52b44a0fd85d5eb16f3c288
2021-09-07 18:26:31 -07:00
8e1fdd4cd3 Add symbolic shape comparison optimization (#64300)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64300

Test Plan: Imported from OSS

Reviewed By: driazati

Differential Revision: D30738146

Pulled By: eellison

fbshipit-source-id: 96287798535b367f23d3e9430d70fc02c59744ab
2021-09-07 18:22:32 -07:00
474a51b6bf Refactor to use shape arguments (#64299)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64299

Test Plan: Imported from OSS

Reviewed By: driazati

Differential Revision: D30738141

Pulled By: eellison

fbshipit-source-id: 37ca30de81349ecf23d8656291863737b6ad6d96
2021-09-07 18:22:30 -07:00
bccbe310ef Add view with negative dim (#63516)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63516

how to review: check that the generated inputs are a good representation of the op semantics; that should be sufficient for correctness. As a bonus, you can also double-check the op size semantics by going to https://codebrowser.bddppq.com/pytorch/pytorch/, typing in native::{op_name}, and looking at the op implementation.

Test Plan: Imported from OSS

Reviewed By: driazati

Differential Revision: D30738143

Pulled By: eellison

fbshipit-source-id: c7cd01cb2c8a13cb2664415f3d98aedec19a8e07
2021-09-07 18:22:28 -07:00
5a1f8b8573 Generalize expand logic (#63615)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63615

how to review: check that the generated inputs are a good representation of the op semantics; that should be sufficient for correctness. As a bonus, you can also double-check the op size semantics by going to https://codebrowser.bddppq.com/pytorch/pytorch/, typing in native::{op_name}, and looking at the op implementation.

Test Plan: Imported from OSS

Reviewed By: driazati

Differential Revision: D30738148

Pulled By: eellison

fbshipit-source-id: 4ef74a9c9b39c0beb73949e63aa844c46ab637eb
2021-09-07 18:22:26 -07:00
5eb8cec663 Add permute, arange (#63407)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63407

Test Plan: Imported from OSS

Reviewed By: driazati

Differential Revision: D30738149

Pulled By: eellison

fbshipit-source-id: 36d572488408d38b0643aa93cb08aab5c45218ad
2021-09-07 18:22:24 -07:00
cf2d15bf84 Add support for slice, select with int, index_select (#63365)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63365

Test Plan: Imported from OSS

Reviewed By: driazati

Differential Revision: D30738144

Pulled By: eellison

fbshipit-source-id: 7e0c572209bdc6e62ecb4fd1f06f80291de69803
2021-09-07 18:22:22 -07:00
c8a608b197 Add squeeze, unsqueeze, transpose shape functions (#63099)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63099

These are checked by OpInfos, which represent all of the inputs and semantics of the operators, so this should be an easy stamp.

Test Plan: Imported from OSS

Reviewed By: desertfire, astaff

Differential Revision: D30347514

Pulled By: eellison

fbshipit-source-id: 37b4c9ecd8c222cc12bf39166181464b43218830
2021-09-07 18:22:19 -07:00
a39f3c68b7 Add batch of unary functions (#63050)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63050

Test Plan: Imported from OSS

Reviewed By: priyaramani, astaff

Differential Revision: D30347513

Pulled By: eellison

fbshipit-source-id: abaf641778671d17df87a2b7b47bad7501a91b5a
2021-09-07 18:21:04 -07:00
c1b701bc3e Back out "update rpc tensorpipe logic for sparse tensors" (#64575)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64575

Original commit changeset: daee9a567645

Test Plan: unit test

Reviewed By: gcramer23

Differential Revision: D30778736

fbshipit-source-id: 8d9386158fb6a3d025c149cdc37558d57c615e9f
2021-09-07 18:00:39 -07:00
566ee1217f Use trsm for triangular_solve in CPU (#63567)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63567

The current implementation called trtrs on CPU and trsm on CUDA.
See https://github.com/pytorch/pytorch/issues/56326#issuecomment-825496115 for a discussion on the differences between
these two functions and why we prefer trsm over trtrs on CUDA.

This PR also exposes the `side` argument of this function, which is used
in the second PR of this stack to optimise the number of copies one needs to make
when preparing the arguments to be sent to the backends.

It also changes the use of `bool`s to a common enum type that represents
whether a matrix is transposed / conj transposed, etc. This makes the API
consistent; before, the behaviour of these functions with `transpose=True`
and `conjugate_transpose=True` was not well defined.
Functions to transform this type into the specific types / chars for the different
libraries are provided under the names `to_blas`, `to_lapack`, `to_magma`, etc.

This is the first of a stack of PRs that aim to improve the performance of
`linalg.solve_triangular`. `trsm` has an extra parameter (`side`), which allows
eliding the copy of the triangular matrix in many cases.
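
For context, a minimal usage sketch of the solver this stack targets, using the `torch.triangular_solve` API available at the time (the `side` argument itself stays internal to the backends):
```py
import torch

A = torch.randn(3, 3).triu() + 3 * torch.eye(3)  # well-conditioned upper-triangular matrix
b = torch.randn(3, 2)
x, _ = torch.triangular_solve(b, A, upper=True)
assert torch.allclose(A @ x, b, atol=1e-5)
```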

Fixes https://github.com/pytorch/pytorch/issues/56326

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D30566479

Pulled By: mruberry

fbshipit-source-id: 3831af9b51e09fbfe272c17c88c21ecf45413212
2021-09-07 17:26:17 -07:00
52ff9bc639 [iOS][Metal] Add aten:hardswish (#64588)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64588

Add `aten::hardswish` to run the mobilenetv3 model from torchvision.
ghstack-source-id: 137479323

Test Plan:
- buck test pp-macos
- circleCI

Reviewed By: beback4u

Differential Revision: D30781008

fbshipit-source-id: 83454869195ef4ab50570ea9b3bf2a55f32a3e86
2021-09-07 15:41:29 -07:00
2c351c76e0 [special] Alias igamma, igammac to special.gammaninc, special.gammaincc (#61902)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/50345

Also added relevant OpInfo

TODO:
* [x] Check rendered docs gammainc : https://docs-preview.pytorch.org/61902/special.html#torch.special.gammainc
* [x] Check rendered docs gammaincc: https://docs-preview.pytorch.org/61902/special.html#torch.special.gammaincc

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61902

Reviewed By: ngimel

Differential Revision: D30761428

Pulled By: mruberry

fbshipit-source-id: 06a16432873357958d53364f12a4e91c29779d26
2021-09-07 15:31:26 -07:00
b01d2d1d3e Disables four failing distributions tests on windows (#64596)
Summary:
Per title. Unblocks CI. See https://github.com/pytorch/pytorch/issues/64595.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64596

Reviewed By: mrshenli

Differential Revision: D30787296

Pulled By: mruberry

fbshipit-source-id: 84b90cb25c0185f1851db02425ea40aa13d3e598
2021-09-07 15:29:13 -07:00
a22c936b63 Add lint to ensure .github/ pypi dependencies are pinned (#64463)
Summary:
Example failing run: https://github.com/pytorch/pytorch/pull/64463/checks?check_run_id=3501249102

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64463

Reviewed By: janeyx99

Differential Revision: D30744930

Pulled By: driazati

fbshipit-source-id: 4dd97054db1d4c776a4512bc3d664987cd7b6d23
2021-09-07 15:28:11 -07:00
7e88d0b370 Update explicit_ci_jobs to work with GHA (#64598)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64598

This adds a filter option rather than an all-or-nothing so it's easier to iterate on a specific job.

```bash
python tools/testing/explicit_ci_jobs.py --filter-gha '*generated-linux-*gcc5.4*'
```

See #64600 for an example usage

NB: If you regenerate the workflows you will need to re-run that command to re-delete everything.

Test Plan: Imported from OSS

Reviewed By: janeyx99

Differential Revision: D30788850

Pulled By: driazati

fbshipit-source-id: a32c266bbd876c396665bceef9a0a961b4586564
2021-09-07 15:21:12 -07:00
a48d83a575 Move ParallelTBB to GHA (take 2) (#64193)
Summary:
2nd attempt to do the same
Skip failing `TestTensorCreationCPU.test_trilu_indices_cpu`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64193

Reviewed By: mrshenli

Differential Revision: D30779469

Pulled By: malfet

fbshipit-source-id: 5c51fcbb383d0823d0e953d7af181b5f22eda9ab
2021-09-07 15:11:00 -07:00
369db8924f [Static Runtime] Add first iter metric (#64457)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64457

The first iteration is special since it initializes the memory planner. This change logs and reports first iteration time during benchmarking. It also generates a FAI-PEP output when `generate_ai_pep_output` is set.

Test Plan:
Run any benchmark, and observe:
```
I0902 15:19:32.528977 2492358 impl.cpp:948] PyTorchObserver {"value":6.415958881378174,"unit":"ms","metric":"latency","type":"static_runtime_first_iter"}
...
First iter time: 6.41596 ms
```

Note that this metric is likely to have significantly more noise than the others since we don't have as many data points.

Unit tests: `buck test //caffe2/test:static_runtime`

Reviewed By: d1jang

Differential Revision: D30740619

fbshipit-source-id: 4dcfccd5629f4fa34254fd355073ef19e151245a
2021-09-07 15:00:30 -07:00
3bd69d3020 add bundle input into AIBench (#64557)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64557

MaskRCNN speed depends on how many people are detected in the detection stage. A random input from the dataloader doesn't satisfy this. In order to standardize the benchmarking, we use two standard images, containing two and three people respectively.

Test Plan: AIBench result: https://www.internalfb.com/intern/aibench/details/945883114818980

Reviewed By: axitkhurana

Differential Revision: D30446049

fbshipit-source-id: a2826fdb69e9f840c0afc566c4cbbcde1c2fba89
2021-09-07 14:46:23 -07:00
3c87f55752 Automated submodule update: FBGEMM (#64582)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).

New submodule commit: 3ce04fc664

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64582

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: mrshenli

Differential Revision: D30779695

fbshipit-source-id: 22460a4047e2462e672eb4931e44648ae6bde627
2021-09-07 14:16:22 -07:00
acc9f9afc8 enable bf16 mkldnn path for gemm (#61891)
Summary:
# Goal: Integrate mkldnn bf16 gemm into PyTorch

## BF16 support for mm, addmm, bmm, addbmm, baddbmm, mv, addmv, dot (with the mkldnn matmul primitive):
https://oneapi-src.github.io/oneDNN/group__dnnl__api__matmul.html
For gemm-related ops, we keep all inputs in plain format, so we do not introduce opaque tensors for these ops and thereby save memory copies.

![mkldnn bf16 gemm integration](https://user-images.githubusercontent.com/54701539/126263077-4b5134e1-52a7-4fad-94fb-19e13a0377f6.png)

The minimal integration would be to dispatch to mkldnn only in addmm, but for gemm with 3-D input (with an additional dim for "batch") this would call mkldnn gemm "batch" times. Since mkldnn matmul supports inputs with multiple dims, we directly dispatch to mkldnn gemm in {bmm, addbmm, baddbmm} to reduce the time spent creating mkldnn memory descriptors, primitives, etc.

Because the definition of "bias" differs between mkldnn (where it must have shape (1, N)) and PyTorch (where it can have the same shape as the gemm result (M, N)), we use a fused sum to handle it.

## User Case:
The user-facing API is exactly the same as before because no opaque tensors are introduced. Since PyTorch already supported the bf16 data type for CPU tensors, we can leverage the existing bf16 gemm unit tests.
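
For illustration, bf16 gemm keeps the usual eager API (a minimal sketch; on a supported CPU this may now hit the mkldnn bf16 kernel):
```py
import torch

a = torch.randn(128, 64, dtype=torch.bfloat16)
b = torch.randn(64, 32, dtype=torch.bfloat16)
bias = torch.randn(128, 32, dtype=torch.bfloat16)
out = torch.addmm(bias, a, b)  # same user-facing call as fp32
```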

## Gemm performance gain on CPX 28Cores/Socket:
Note: data is collected using PyTorch operator benchmarks: https://github.com/pytorch/pytorch/tree/master/benchmarks/operator_benchmark (with adding bfloat16 dtype)

### use 1 thread on 1 core
### torch.addmm (M, N) * (N, K) + (M, K)
| impl |16x16x16|32x32x32| 64x64x64 | 128x128x128| 256x256x256| 512x512x512|1024x1024x1024|
|:---:|:---:| :---: | :---: | :---: | :---: | :---: | :---: |
| aten-fp32| 4.115us|4.583us|8.230us|26.972us|211.857us|1.458ms|11.258ms|
| aten-bf16 | 15.812us| 105.087us|801.787us|3.767ms|20.274ms|122.440ms|836.453ms|
| mkldnn-bf16 |20.561us |22.510us|24.551us|37.709us|143.571us|0.835ms|5.76ms|

We can see that mkldnn-bf16 is better than aten-bf16, but for smaller shapes mkldnn-bf16 is not better than aten-fp32. This is due to oneDNN overhead, which behaves like a "constant" cost and becomes negligible as the problem size grows. We are also continuing to optimize kernel efficiency and to reduce this overhead.

More shapes
| impl |1x2048x2048|2048x1x2048| 2048x2048x1 |
|:---:|:---:| :---: | :---: |
| aten-fp32| 0.640ms|3.794ms|0.641ms|
| aten-bf16 | 2.924ms| 3.868ms|23.413ms|
| mkldnn-bf16 |0.335ms |4.490ms|0.368ms|

### use 1 socket (28 thread, 28 core)
| impl | 256x256x256| 512x512x512|1024x1024x1024| 2048x2048x2048|4096x4096x4096|
|:---:| :---: | :---: | :---: | :---: | :---: |
| aten-fp32| 35.943us |140.315us|643.510us|5.827ms|41.761ms|
| mkldnn-bf16 |53.432us|114.716us|421.858us|2.863ms|23.029ms|

More shapes
| impl |128x2048x2048|2048x128x2048| 2048x2048x128 |
|:---:|:---:| :---: | :---: |
| aten-fp32| 0.561ms|0.458ms|0.406ms|
| mkldnn-bf16 |0.369ms |0.331ms|0.239ms|

We do not show aten-bf16 for this case since aten-bf16 always computes on a single thread and its performance is extremely poor. The trend for this case is similar to that of 1 thread on 1 core.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61891

Reviewed By: iramazanli

Differential Revision: D29998114

Pulled By: VitalyFedyunin

fbshipit-source-id: 459dc5874c638d62f290c96684ca0a694ded4b5a
2021-09-07 13:00:37 -07:00
337c71be05 Array API: Add torch.linalg.matmul alias to torch.matmul (#63227)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/62811

Add `torch.linalg.matmul` alias to `torch.matmul`. Note that the `linalg.matmul` doesn't have a `method` variant.
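
A quick sanity check of the alias (minimal sketch):
```py
import torch

a, b = torch.randn(2, 3), torch.randn(3, 4)
assert torch.equal(torch.linalg.matmul(a, b), torch.matmul(a, b))
```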

Also cleaning up `torch/_torch_docs.py` when formatting is not needed.

cc IvanYashchuk Lezcano mruberry rgommers

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63227

Reviewed By: mrshenli

Differential Revision: D30770235

Pulled By: mruberry

fbshipit-source-id: bfba77dfcbb61fcd44f22ba41bd8d84c21132403
2021-09-07 12:35:32 -07:00
8407ce7e38 [small BE] .github: refactor concurrency into a common macro (#64587)
Summary:
By using a macro for these concurrency groups, we can edit just one place for the linux and windows workflows (vs 2).

I wanted to loop all the other workflow files in as well, but since those aren't generated, the macros won't work the same way.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64587

Reviewed By: mrshenli

Differential Revision: D30783224

Pulled By: janeyx99

fbshipit-source-id: ae16ebb12d2d63a563d28f0ce88e280f68ed4b9b
2021-09-07 12:31:55 -07:00
7e4ebe06ca Fixes issue related torch.trapezoid broadcasting behavior and documentation (#64054)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64054

Fixes #63608

cc mruberry rgommers heitorschueroff

Test Plan: Imported from OSS

Reviewed By: saketh-are

Differential Revision: D30617078

Pulled By: NivekT

fbshipit-source-id: 815896ec56d447562790df4d662e94fd13457e2a
2021-09-07 11:41:55 -07:00
c9d6ca4c54 Add space in Feature Request issue template (#64563)
Summary:
Add space between emoji and text in Feature Request issue template

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64563

Reviewed By: janeyx99

Differential Revision: D30779429

Pulled By: seemethere

fbshipit-source-id: 3625299923a7022fa66473633524a6620d58188b
2021-09-07 11:36:06 -07:00
85eeb4d682 Clean up op BC check list (#64584)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64584

It has been a while since the last cleanup, and the list has grown really long.

Test Plan: ci

Reviewed By: hl475

Differential Revision: D30779350

fbshipit-source-id: 908b47d0b9a16b784aad6a34c5c87f923500c247
2021-09-07 11:25:40 -07:00
43248d9112 [doc][hackathon] To add Adam Optimizer to the documentation (#63251)
Summary:
It has been discussed before that adding descriptions of optimization algorithms to the PyTorch core documentation may result in a nice optimization research tutorial. The following tracking issue lists all the necessary algorithms with links to the originally published papers: https://github.com/pytorch/pytorch/issues/63236.

In this PR we are adding a description of the Adam algorithm to the documentation. For more details, we refer to the paper https://arxiv.org/abs/1412.6980.

<img width="442" alt="Screen Shot 2021-08-27 at 6 37 54 PM" src="https://user-images.githubusercontent.com/73658284/131195297-35fce613-3691-4fed-b42d-db234d4fcd7c.png">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63251

Reviewed By: albanD

Differential Revision: D30779163

Pulled By: iramazanli

fbshipit-source-id: 319a80fc3952793b0d064d0e641ddc1de3c05a86
2021-09-07 11:03:35 -07:00
adb85b32d3 minor fix for elastic doc (#64531)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64531

fix #64530

Test Plan: unit test

Reviewed By: mrshenli

Differential Revision: D30760879

fbshipit-source-id: 94ed1476e886513427d928a36f5be6b9bfff0826
2021-09-07 09:31:01 -07:00
26b7ff5aea deprecate dtype getters from torch.testing namespace (#63554)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63554

Following https://github.com/pytorch/pytorch/pull/61840#issuecomment-884087809, this deprecates all the dtype getters publicly exposed in the `torch.testing` namespace. The reason for this is twofold:

1. If someone is not familiar with the C++ dispatch macros PyTorch uses, the names are misleading. For example `torch.testing.floating_types()` will only give you `float32` and `float64` skipping `float16` and `bfloat16`.
2. The dtype getters provide very minimal functionality that can be easily emulated by downstream libraries.

We thought about [providing a replacement](https://gist.github.com/pmeier/3dfd2e105842ad0de4505068a1a0270a), but ultimately decided against it. The major problem is BC: if we keep it, either the namespace gets messy again whenever a new dtype is added, or we need to somehow version the return values of the getters.
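
For example, a downstream library can emulate a getter in a couple of lines (a minimal sketch; the function name and the chosen tuple are assumptions, not a drop-in for the deprecated getter):
```py
import torch

def all_floating_types():
    # Unlike the deprecated torch.testing.floating_types(), also cover
    # the low-precision floating dtypes.
    return (torch.float16, torch.bfloat16, torch.float32, torch.float64)
```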

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D30662206

Pulled By: mruberry

fbshipit-source-id: a2bdb10ab02ae665df1b5b76e8afa9af043bbf56
2021-09-07 08:58:51 -07:00
f767cf6683 To change WarmUp Scheduler with ConstantLR and LinearLR (#64395)
Summary:
Partially unblocks https://github.com/pytorch/vision/issues/4281

Previously we added warm-up schedulers to PyTorch core in PR https://github.com/pytorch/pytorch/pull/60836; they had two modes of execution - linear and constant - depending on the warm-up function.

In this PR we are changing this interface to a more direct form, separating the linear and constant modes into separate schedulers. In particular,

```Python
scheduler1 = WarmUpLR(optimizer, warmup_factor=0.1, warmup_iters=5, warmup_method="constant")
scheduler2 = WarmUpLR(optimizer, warmup_factor=0.1, warmup_iters=5, warmup_method="linear")
```

will look like

```Python
scheduler1 = ConstantLR(optimizer, warmup_factor=0.1, warmup_iters=5)
scheduler2 = LinearLR(optimizer, warmup_factor=0.1, warmup_iters=5)
```

correspondingly.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64395

Reviewed By: datumbox

Differential Revision: D30753688

Pulled By: iramazanli

fbshipit-source-id: e47f86d12033f80982ddf1faf5b46873adb4f324
2021-09-07 08:42:31 -07:00
75b9e4a128 [JIT] Freeze unrolls constant loops (#63614)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63614

There are a number of optimizations (`RemoveListMutation` in particular) that are tied to loop unrolling in `runOptimizations`. However, these were not invoked from `freeze_module` since the freezing pass should be idempotent.

This diff makes `runOptimizations` run `UnrollConstantLoops` instead of `UnrollLoops`. `freeze_module` is then able to run these optimizations.
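
A minimal sketch of the kind of pattern this lets `freeze` clean up (a constant-trip-count loop containing list mutation):
```py
import torch

class M(torch.nn.Module):
    def forward(self, x):
        outs = []
        for i in range(3):      # constant trip count -> unrollable
            outs.append(x + i)  # list mutation removable after unrolling
        return torch.stack(outs)

frozen = torch.jit.freeze(torch.jit.script(M()).eval())
```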

Test Plan: Observed that `freeze_module` applies `RemoveListMutation`

Reviewed By: eellison

Differential Revision: D30437356

fbshipit-source-id: cba04bd958a48ad51b151aa3264f3d5bbb1fc2a4
2021-09-07 08:06:47 -07:00
adbcc819cd Fix fx2trt SplitterBase non_tensor_input logic (#64286)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64286

During graph splitting, `_SplitterBase` supports taking into consideration whether the subnet boundary nodes
produce "supported" outputs that will cross the acc/non-acc boundary. Specifically, if the backend only
supports Tensor-based data passing across the boundary, then we cannot split the graph at a place where the node
output is a non-Tensor type (e.g., `Tuple[Tensor]`).

There's currently a bug in this logic: it does not correctly detect the output type of a Node. Instead of
using `Node.meta['tensor_meta']`, we should check `Node.meta['type']`.

`Node.meta['tensor_meta']` is not appropriate because this key will exist if the node output is an iterable
and one of the elements is of type `Tensor`. So `Tuple[Tensor]` would be wrongly considered "supported".
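
A minimal sketch of the corrected check (the helper name is hypothetical):
```py
import torch
from torch.fx import Node

def produces_supported_output(node: Node) -> bool:
    # Rely on the recorded Python type; 'tensor_meta' can also be present
    # for containers like Tuple[Tensor], which are not supported here.
    return node.meta.get('type', None) is torch.Tensor
```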

Test Plan:
arc lint
run CI tests

Reviewed By: yinghai, 842974287

Differential Revision: D30617147

fbshipit-source-id: e8ba70dfaddc05cafb8037d58fca73b7ccbb1a49
2021-09-07 04:02:29 -07:00
32fbeb170d Update error messages that use LAPACK error codes (#63864)
Summary:
This PR updates the` batchCheckErrors` and `singleCheckErrors` functions so that the error messages are defined only once.
`batchCheckErrors` function reuses `singleCheckErrors` now.

Fixes https://github.com/pytorch/pytorch/issues/63220, fixes https://github.com/pytorch/pytorch/issues/59779

cc jianyuh nikitaved pearu mruberry heitorschueroff walterddr IvanYashchuk xwang233 Lezcano

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63864

Reviewed By: ngimel

Differential Revision: D30672933

Pulled By: mruberry

fbshipit-source-id: 0ba37ff98ef278efdb12c3890aa07d687047da7a
2021-09-07 00:05:46 -07:00
1a1fb31cfa Support torch.concat alias, add cat OpInfo & remove OpInfo test_out skips {cat, stack, hstack, vtack, dstack} (#62560)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/61767

## Changes

- [x] Add `torch.concat` alias to `torch.cat`
- [x] Add OpInfo for `cat`/`concat`
- [x] Fix `test_out` skips (Use `at::native::resize_output` or `at::native::resize_output_check`)
  - [x] `cat`/`concat`
  - [x] `stack`
  - [x] `hstack`
  - [x] `dstack`
  - [x] `vstack`/`row_stack`
- [x] Remove redundant tests for `cat`/`stack`

~I've not added `cat`/`concat` to OpInfo `op_db` yet, since cat is a little more tricky than other OpInfos (should have a lot of tests) and currently there are no OpInfos for that. I can try to add that in a subsequent PR or maybe here itself, whatever is suggested.~
**Edit**: cat/concat OpInfo has been added.

**Note**: I've added the named tensor support for `concat` alias as well, maybe that's out of spec in `array-api` but it is still useful for consistency in PyTorch.
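
A quick sanity check of the new alias (minimal sketch):
```py
import torch

t = torch.ones(2, 2)
assert torch.equal(torch.concat([t, t], dim=0), torch.cat([t, t], dim=0))
```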

Thanks to krshrimali for guidance on my first PR :))

cc mruberry rgommers pmeier asmeurer leofang AnirudhDagar asi1024 emcastillo kmaehashi heitorschueroff krshrimali

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62560

Reviewed By: saketh-are

Differential Revision: D30762069

Pulled By: mruberry

fbshipit-source-id: 6985159d1d9756238890488a0ab3ae7699d94337
2021-09-06 23:57:18 -07:00
0a1aaff0de Remove dead code from THC (THCApply.cuh) (#64559)
Summary:
cc peterbell10

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64559

Reviewed By: mruberry

Differential Revision: D30769526

Pulled By: ngimel

fbshipit-source-id: 034a5c778a2b902cffa57b76511fa0dcdea26825
2021-09-06 21:26:08 -07:00
571a2becf3 Move ParallelNative and PureTorch to GHA (#64452)
Summary:
The ParallelTBB move is separated out into https://github.com/pytorch/pytorch/pull/64193 as it requires some further investigation.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64452

Reviewed By: seemethere, janeyx99

Differential Revision: D30738337

Pulled By: malfet

fbshipit-source-id: 81c46423e903058bd1a3e8553e8a10ce978eeefd
2021-09-06 11:40:44 -07:00
544c8e6a5d Mark functions in backend header as inline to suppress warning (#64098)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64098

Reviewed By: kimishpatel, iseeyuan

Differential Revision: D30593104

fbshipit-source-id: 328196b9bc4a89a28ad89bede7e337107976c303
2021-09-05 16:45:23 -07:00
bcc7e82371 Revert D30745610: [nnc] Make our exceptions c10::Errors, get C++ stacktraces
Test Plan: revert-hammer

Differential Revision:
D30745610 (18b2751ea1)

Original commit changeset: a1cfaa7364ef

fbshipit-source-id: 9b716053b96a65745240ddef1c456c44d5d09671
2021-09-05 16:08:09 -07:00
49fe829cae [Vulkan] Code Quality: Remove duplicate code for hardshrink and leaky_relu functions (#64405)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64405

Code quality improvement: removed duplicate code for hardshrink and leaky_relu functions.
ghstack-source-id: 137319378

Test Plan:
```
buck build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 //xplat/caffe2:pt_vulkan_api_test_binAndroid\#android-arm64 --show-output
adb push buck-out/gen/xplat/caffe2/pt_vulkan_api_test_binAndroid\#android-arm64 /data/local/tmp/vulkan_api_test
adb shell "/data/local/tmp/vulkan_api_test"
```

Reviewed By: SS-JIA

Differential Revision: D30690251

fbshipit-source-id: 5729d1f32946e42f41df77756a8313f297dd822f
2021-09-05 12:53:58 -07:00
1901c675e1 Back out "nn.functional.linear OpInfo" (#64517)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64517

Original commit changeset: ca41dbd98176

Test Plan: PyTorch CI

Reviewed By: ngimel

Differential Revision: D30758201

fbshipit-source-id: 2d3274293d340373b8af86083336607818019619
2021-09-05 02:25:00 -07:00
008bf6689b Back out "D30740897 Add fusion enabled apis" (#64500)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64500

D30740897 (39aeb3bf63) broke caffe2/torch/fb/module_factory/optimizers/tests:test_full_sync_optimizer_needed_coverage (https://fburl.com/test/mb46jxon) and blocked training_platform_unit_tests

{F660271297}

multsect results confirms

```
multisect --config FBCODE_TEST bisect 844424966128796 --workers 16 revisions --begin 09629edc --end fc86b434
D30740897 (39aeb3bf63)

```

{F660271232}

Test Plan:
```
buck test mode/opt //caffe2/torch/fb/module_factory/optimizers/tests:test_full_sync_optimizer_needed_coverage

Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/4785074671474181
    ✓ Pass: caffe2/torch/fb/module_factory/optimizers/tests:test_full_sync_optimizer_needed_coverage - main (3.729)
Summary
  Pass: 1

```

Differential Revision: D30753916

fbshipit-source-id: 302fd4113ef1f3069846be03edc2300d82b66719
2021-09-04 20:55:58 -07:00
18b2751ea1 [nnc] Make our exceptions c10::Errors, get C++ stacktraces (#64332)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64332

With this diff, if a compiler bug occurs (unlikely, I know!) we'll be able to get a c++ stacktrace leading to the exception, rather than just a terse message.  E.g.,
```
RuntimeError: UNSUPPORTED DTYPE
Exception raised from compilation_error at ../torch/csrc/jit/tensorexpr/exceptions.h:32 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x6b (0x7f966659b2eb in /fsx/users/bertrand/c\
onda/envs/pytorch/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x376f099 (0x7f966a195099 in /fsx/users/bertrand/conda/envs/pytorch/lib/python3.8/site-packages/torch/lib/libtorch_cuda.so)
frame #2: <unknown function> + 0x3763bf5 (0x7f966a189bf5 in /fsx/users/bertrand/conda/envs/pytorch/lib/python3.8/site-packages/torch/lib/libtorch_cuda.so)
frame #3: torch::jit::tensorexpr::CudaCodeGen::Initialize() + 0xdd8 (0x7f966a193368 in /fsx/users/bertrand/conda/envs/pytorch/lib/python3.8/site-packages/torch/lib/libtorch_cuda\
.so)
```

Test Plan: Imported from OSS

Reviewed By: huiguoo

Differential Revision: D30745610

Pulled By: bertmaher

fbshipit-source-id: a1cfaa7364ef4120de834e9cbe57ced1d082ab4e
2021-09-04 20:31:54 -07:00
6cac7ca980 Ensure num_threads is initialized in get_num_threads (#64486)
Summary:
Possible source of the recent layernorm CI failures. `lazy_init_num_threads` appears at the top of `parallel_for` and can change the number of threads set. So, we need to ensure `num_threads` is initialized during `get_num_threads` calls as well. It's already done this way for OpenMP, but is missing from other parallel backends.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64486

Reviewed By: mruberry

Differential Revision: D30752615

Pulled By: ngimel

fbshipit-source-id: 085873ce312edbee1254c0aaae30dec7fcfe2c57
2021-09-04 12:38:09 -07:00
604e885925 Automated submodule update: FBGEMM (#64338)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).

New submodule commit: 9ccb2714a9

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64338

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: jspark1105

Differential Revision: D30690319

fbshipit-source-id: 884d1f950cd1f7d2a77b79affb9215f285d5d0da
2021-09-04 00:44:28 -07:00
a91a278d60 Fix copy_transpose_valid condition for copy_same_type_transpose_ (#64425)
Summary:
Thanks to ngimel for the hint where the problem might be (https://github.com/pytorch/pytorch/issues/64358#issuecomment-910868849)!

I added a test that fails on master to verify the fix. The shape `(60, 60)` was chosen because of `MIN_SZ = 60 * 60` in `copy_transpose_valid`.

Fixes https://github.com/pytorch/pytorch/issues/64358

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64425

Reviewed By: mruberry

Differential Revision: D30752725

Pulled By: ngimel

fbshipit-source-id: f40370ea8365c94e30f8e8a3dcab5f3b3462464a
2021-09-03 18:50:33 -07:00
e4ff14ad59 [CUDA graphs] Error if attempting to capture uncapturable nccl (#64440)
Summary:
NCCL < 2.9.6 is not capturable. Attempting to capture it can cause nasty behavior (for example, I've seen capture succeed, but replay silently hang). PyTorch should preempt this with a friendlier error.

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse agolynski SciPioneer H-Huang mrzzd cbalioglu gcramer23

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64440

Reviewed By: mruberry

Differential Revision: D30733884

Pulled By: ngimel

fbshipit-source-id: 5f2df3cf5cc0e5e68f49bf22a80d9f58064dc7ec
2021-09-03 13:23:07 -07:00
0e3b45eaef Fix logical typo in _compare_trilu_indices (#64468)
Summary:
I'm pretty sure that repeating the same call twice is meaningless; the intent was to call `tril`/`tril_indices` in the first case and `triu`/`triu_indices` in the other.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64468

Reviewed By: mruberry

Differential Revision: D30744978

Pulled By: malfet

fbshipit-source-id: 7cd36789a7ebf1cc263fb2d875e479c05e7588a4
2021-09-03 10:22:49 -07:00
6831d8e379 Support Union in TorchScript (#64234)
Summary:
This PR replaces the https://github.com/pytorch/pytorch/pull/53180 PR stack, which has all the review discussions. A replacement was needed due to a messy Sandcastle issue.
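
A minimal sketch of the newly supported annotation:
```py
from typing import Union

import torch

@torch.jit.script
def to_tensor(x: Union[int, torch.Tensor]) -> torch.Tensor:
    if isinstance(x, int):
        return torch.tensor(x)
    return x
```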

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64234

Reviewed By: gmagogsfm

Differential Revision: D30656444

Pulled By: ansley

fbshipit-source-id: 77536c8bcc88162e2c72636026ca3c16891d669a
2021-09-03 06:12:24 -07:00
91b926fab3 Add fx2trt pass for removing duplicate output args (#64461)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64461

Fx2TRT does not support duplicate nodes in the output args tuple.

This pass removes duplicate output args from the target subnets and fixes their uses in the top level module where the subnets are called. This pass must be called after acc split on the top-level net and subsequent calls to the acc trace on the subnets.

This pass will change both the subnets and top level module.
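
A minimal FX-level sketch of the subnet side of such a pass, assuming the subnet returns a tuple (the fix-up of `getitem` uses in the top-level module is omitted):
```py
import torch.fx as fx

def dedupe_output_args(gm: fx.GraphModule) -> fx.GraphModule:
    output_node = next(n for n in gm.graph.nodes if n.op == 'output')
    args = list(output_node.args[0])  # entries of the output tuple
    uniq = []
    for a in args:
        if a not in uniq:             # drop repeated output entries
            uniq.append(a)
    output_node.args = (tuple(uniq),)
    gm.recompile()
    return gm
```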

Test Plan:
Run:

```
buck run mode/opt -c python.package_style=inplace //caffe2/torch/fb/fx2trt/tests/passes/:test_remove_duplicate_output_args

```

Reviewed By: yinghai

Differential Revision: D30740499

fbshipit-source-id: 98459f7677980b21c7bffda918158001285572db
2021-09-02 23:04:12 -07:00
39aeb3bf63 Add fusion enabled apis (#64429)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64429

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D30740897

Pulled By: eellison

fbshipit-source-id: 446aa63b5d763f1cfffea62547db7294368e3438
2021-09-02 22:19:09 -07:00
7031fbdc63 update optimize_for_inference docs (#64428)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64428

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D30740898

Pulled By: eellison

fbshipit-source-id: b94d2c3deb661a6ba048f19e8c1d5e1799667eeb
2021-09-02 22:17:58 -07:00
e1c3e5f830 [resubmit][FX] Prototype for guarding against mutable operations in tracing (#64467)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64467

Test Plan: Imported from OSS

Reviewed By: driazati

Differential Revision: D30744870

Pulled By: jamesr66a

fbshipit-source-id: fc652f8b17748f90dbeb83fabf3bd5bb57d6ff1a
2021-09-02 21:13:21 -07:00
cd82bc1af9 Skips layer norm OpInfo on tbb platform (#64469)
Summary:
The OpInfo tests appear to be discovering a layer norm x tbb issue that requires investigation. Skipping tests on that platform for now to restore CI signal.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64469

Reviewed By: ngimel

Differential Revision: D30745746

Pulled By: mruberry

fbshipit-source-id: 282484cc00b867fac85b7df61430d64277da6421
2021-09-02 20:53:01 -07:00
c19bd05e84 THC: Cleanup dead code (#64441)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64441

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D30735342

Pulled By: ngimel

fbshipit-source-id: 84ab36f7aec6b8cd7f1f34c19a58a382c06ad68d
2021-09-02 17:45:16 -07:00
db692ec0b3 Regenerate generated github workflows (#64465)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64465

These were out of date and causing master failures

Test Plan: Imported from OSS

Reviewed By: zhouzhuojie

Differential Revision: D30744594

Pulled By: driazati

fbshipit-source-id: 09a21c3c5d9bc83b368d66cabbafd1ba83302dd3
2021-09-02 17:31:29 -07:00
e161872aab Revert D30732630: [quant] Enable jit tracing on quantizable LSTM
Test Plan: revert-hammer

Differential Revision:
D30732630 (116142143c)

Original commit changeset: 443e351ebb0e

fbshipit-source-id: 49001392f01366f3b1ccc31139f824c80b86cd40
2021-09-02 17:08:26 -07:00
046ed57a4d Revert D30055886: [quant] AO migration of the quantize.py
Test Plan: revert-hammer

Differential Revision:
D30055886 (44e3ed88c9)

Original commit changeset: 8ef7470f9fa6

fbshipit-source-id: c5bd3ead43a2d44b9e56872ec5bd7a195bdac725
2021-09-02 16:59:59 -07:00
4968d0b34f [POC] .github: Add event name to concurrency (#64402)
Summary:
This would ensure that manually/API triggered workflows would not cancel other triggered workflows. For example, the manually triggered periodic 11.1 linux job cancelled the scheduled one here, which we may not want:
![image](https://user-images.githubusercontent.com/31798555/131752175-1c99d56e-d344-46e1-b8ac-9c12bba0569a.png).

This would be helpful later as we use more dispatched workflows (e.g., for bisect functionality)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64402

Reviewed By: malfet

Differential Revision: D30734860

Pulled By: janeyx99

fbshipit-source-id: 220016716094666e9af836fcd716dd529cf23d8a
2021-09-02 16:24:05 -07:00
b12f34e8c2 update rpc tensorpipe logic for sparse tensors (#62960)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62960

A bug was filed a few years ago about sending sparse tensors over rpc (#30807).

This PR updates the rpc/tensorpipe logic for CUDA sparse tensors. During serialization, the pickler.cpp implementation breaks a sparse tensor down into two tensors plus metadata. torch/csrc/distributed/rpc/tensorpipe_agent.cpp needs to be updated because it has no logic for sparse tensors: it pushes a single device for a sparse tensor, which is wrong because after serialization there are two tensors, and the second tensor would have no device and could therefore end up on the wrong target device. tensorpipe_utils.cpp needs to be updated because deserialization happens after the data is received on the target pipe; it takes the two tensors and the metadata that were sent and rebuilds the sparse tensor. There are two tpDescriptors but only one tensor after deserialization, so the logic is updated to verify that the sparse tensor is on the correct device using the first tpDescriptor.
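
For reference, the two dense tensors backing a sparse COO tensor (a minimal sketch):
```py
import torch

i = torch.tensor([[0, 1], [1, 0]])
v = torch.tensor([3.0, 4.0])
s = torch.sparse_coo_tensor(i, v, (2, 2)).coalesce()
print(s.indices())  # dense int64 tensor of coordinates
print(s.values())   # dense tensor of the corresponding values
```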

This PR also updates ivalue.cpp and ivalue.h to support more paths for sparse COO tensors.

I tested these changes by adding sparse tests to rpc_test.py and dist_autograd_test.py.

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D30717285

Pulled By: gcramer23

fbshipit-source-id: daee9a56764550f56b131f9dd8e74e23113d6714
2021-09-02 16:16:19 -07:00
32a93c2424 Revert D30675780: [FX] Prototype for guarding against mutable operations in tracing
Test Plan: revert-hammer

Differential Revision:
D30675780 (795387477f)

Original commit changeset: b2116b51dcc8

fbshipit-source-id: d4f1173f4989556ea54974f4c2739ef85a705fae
2021-09-02 16:07:29 -07:00
116142143c [quant] Enable jit tracing on quantizable LSTM (#64438)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64438

The quantizable LSTM didn't support jit tracing because it had several non-traceable paths. We sacrifice some of the user experience to enable tracing.

The main UX feature removed is a user-friendly message when trying to access the backwards path when the bidirectional flag is `False`: we used to throw a nice error message when the user tried accessing backwards weights. Now the message is the default one (the properties were removed).

Test Plan: `buck test mode/dev //caffe2/test:quantization -- test_custom_module_lstm`

Reviewed By: mtl67

Differential Revision: D30732630

fbshipit-source-id: 443e351ebb0e2b636c86dea9691b9bf42ffe618f
2021-09-02 15:59:20 -07:00
795387477f [FX] Prototype for guarding against mutable operations in tracing (#64295)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64295

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D30675780

Pulled By: jamesr66a

fbshipit-source-id: b2116b51dcc87357f0c84192c4c336680875e27a
2021-09-02 15:17:04 -07:00
3c79e0b314 .github: Migrate pytorch_linux_bionic_py_3_6_clang9 to GHA (#64218)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64218

Relies on https://github.com/fairinternal/pytorch-gha-infra/pull/11

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

cc ezyang seemethere malfet walterddr lg20987 pytorch/pytorch-dev-infra bdhirsh

Test Plan: Imported from OSS

Reviewed By: malfet, H-Huang, janeyx99

Differential Revision: D30651516

Pulled By: seemethere

fbshipit-source-id: e5843dfe84f096f2872d88f2e53e9408ad2fe399
2021-09-02 14:51:00 -07:00
257623da39 Switch Shuffler to use iter-local buffer (#64195)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64195

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D30642947

Pulled By: ejguan

fbshipit-source-id: d4b52479b4ae37ad693388b9cdb8eed83a136474
2021-09-02 13:40:28 -07:00
f555348aaa Disable CircleCI ROCm build (#64434)
Summary:
Per jithunnair-amd's suggestion

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64434

Reviewed By: seemethere, janeyx99

Differential Revision: D30732289

Pulled By: malfet

fbshipit-source-id: 1932d0a7d1e648006f8030c8237b187d0709f688
2021-09-02 13:32:02 -07:00
4ce9c530d6 [DataPipe] removing filter's inheritance from map (#64404)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64404

This PR removes `filter`'s inheritance from `map`. This allows `filter` to not have a `__len__` function, which is the behavior we want.

cc VitalyFedyunin ejguan

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D30713120

Pulled By: NivekT

fbshipit-source-id: 4d5d07555297ee2bd4b49842c0d26cdc00638f6c
2021-09-02 13:09:47 -07:00
4f43480186 [DataPipe] adding/removing __len__ for different DataPipe (#64398)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64398

cc VitalyFedyunin ejguan

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D30710437

Pulled By: NivekT

fbshipit-source-id: 524eda43a2faa0db0c1a662bf9bb4283f0ade83c
2021-09-02 13:08:32 -07:00
3cd0a4ac15 Fix test_ind_worker_queue by setting max_num_worker based on system resource (#63779)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63779

Fixes #63657

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D30494185

Pulled By: ejguan

fbshipit-source-id: d1bd24299b25d589889604aaf18ad347bdff4df4
2021-09-02 12:36:56 -07:00
7d010539c9 ENH Adds test and docs for modules that already support no batch dims (#62729)
Summary:
Towards https://github.com/pytorch/pytorch/issues/60585

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62729

Reviewed By: H-Huang

Differential Revision: D30669546

Pulled By: jbschlosser

fbshipit-source-id: c771c98c1fd9d28fa984b72893585c738c736505
2021-09-02 12:36:54 -07:00
d0cb26ba57 [DDP] Fix logging iterations (#64411)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64411

These are not actually the training iterations; they are offset by how
frequently DDP stats collection actually runs (the default being
kDDPRuntimeLoggingSampleRate = 100). So with this change, they are
logged to scuba at iterations 10, 10 * 100, 40 * 100, etc.

Test Plan: CI

Reviewed By: zhaojuanmao

Differential Revision: D30718274

fbshipit-source-id: 146bd2428753c93363bee37e487f40104fce3c18
2021-09-02 12:35:01 -07:00
22f3bcd164 .github: Move squid vars to common vars (#64436)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64436

Moves the squid variables to our common jinja template so that when we
have to update them they're all in the same place.

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

cc ezyang seemethere malfet lg20987 pytorch/pytorch-dev-infra

Test Plan: Imported from OSS

Reviewed By: malfet, zhouzhuojie

Differential Revision: D30732776

Pulled By: seemethere

fbshipit-source-id: 22e3757c4eec775baa8abbaac2ba2a0c69c2b2a9
2021-09-02 11:31:54 -07:00
c932afe39b .github: Move upload-artifact-s3 to common var (#64435)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64435

Move upload-artifact-s3 to a common variable to be used amongst our
jinja templates; this should make it easier to update these images
in the future.

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

cc ezyang seemethere malfet lg20987 pytorch/pytorch-dev-infra

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D30732777

Pulled By: seemethere

fbshipit-source-id: 51cd485f5abae134c3c49dfa878e6303ba8e5f25
2021-09-02 11:31:52 -07:00
1519b6084f nn.functional.linear OpInfo (#61971)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61971

Test Plan: - wait for tests

Reviewed By: heitorschueroff

Differential Revision: D30013750

Pulled By: zou3519

fbshipit-source-id: ca41dbd98176c12e50ad1410a658f4b06fe99a1e
2021-09-02 11:31:50 -07:00
c0cdbb1cc5 Revert D30468409: Add fx2trt pass for removing duplicate output args
Test Plan: revert-hammer

Differential Revision:
D30468409 (6da7552a8e)

Original commit changeset: b4d91b76ab5d

fbshipit-source-id: e138dc425fe55ffe3585ea5fac4db476931bafed
2021-09-02 11:31:49 -07:00
9214450b7f [tensorexpr] Wrap error msgs with buildErrorMessages for internal asserts (#64409)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64409

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D30717786

Pulled By: huiguoo

fbshipit-source-id: a3b147d339ff4927f14efa24407cd3b63d80001d
2021-09-02 11:30:34 -07:00
6da7552a8e Add fx2trt pass for removing duplicate output args (#64433)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64433

Fx2TRT does not support duplicate nodes in the output args tuple.

This pass removes duplicate output args from the target subnets and fixes their uses in the top level module where the subnets are called. This pass must be called after acc split on the top-level net and subsequent calls to the acc trace on the subnets.

This pass will change both the subnets and top level module.

Test Plan:
Run:

```
buck run mode/opt -c python.package_style=inplace //caffe2/torch/fb/fx2trt/tests/passes/:test_remove_duplicate_output_args

```

Reviewed By: 842974287

Differential Revision: D30468409

fbshipit-source-id: b4d91b76ab5d8a5275d68dd48d1327a44c22568e
2021-09-02 10:40:37 -07:00
aeafcde087 CI: Enable using labels to control GHA workflows (#64314)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/62852

Sets a global environment variable containing a list of PR labels. For this PR, the PR_LABELS variable looks like:
```
[
  "cla signed",
  "ciflow/default"
]
```
confirmed in a run: https://github.com/pytorch/pytorch/runs/3490072161?check_suite_focus=true

This information can be used in other workflow steps to control the logic. For example, if I want to force a build, I can label my PR with "force-build" and do something like the following in my build script:
```
if [[ "${PR_LABELS}" = *force-build* ]]; then
   python setup.py install
else
   #use cached wheel or something
fi
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64314

Reviewed By: driazati

Differential Revision: D30714570

Pulled By: janeyx99

fbshipit-source-id: 80b060ee32643ddd22eb7b8ec548579c7ccf6441
2021-09-02 10:34:44 -07:00
66ddc6ef9e Fixes and details to torchhub docs (#63783)
Summary:
This PR:

- adds a few details regarding the newly added `skip_validation` parameter https://github.com/pytorch/pytorch/pull/62139
- uses double-backticks instead of single-backticks since this is rst, not markdown.
- adds a few minor doc nits here and there

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63783

Reviewed By: zou3519

Differential Revision: D30696658

Pulled By: NicolasHug

fbshipit-source-id: 6f01c7eb3cfcd7e17e4c33c09d193054fa18ad36
2021-09-02 09:32:57 -07:00
50067c020a TST Adds __repr__ and str to module info (#63737)
Summary:
Follow up to https://github.com/pytorch/pytorch/pull/61935

This PR adds `test_repr` to `test_modules`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63737

Reviewed By: gchanan

Differential Revision: D30729642

Pulled By: jbschlosser

fbshipit-source-id: c11a28bc0739abd3ed40727389dd28ed4069edad
2021-09-02 09:32:55 -07:00
2c258d91cc Fix torch.istft length mismatch and window runtime error (#63469)
Summary:
The PR fixes two issues:
- See https://github.com/pytorch/pytorch/issues/62747 and https://github.com/pytorch/audio/issues/1409: there is a length mismatch when the given ``length`` parameter is longer than expected. This adds padding logic consistent with librosa.
- See https://github.com/pytorch/pytorch/issues/62323: the current implementation checks that the min value of window_envelop.abs() is greater than zero. In librosa, the signal is normalized only at the non-zero values via indexing, like:
```
approx_nonzero_indices = ifft_window_sum > util.tiny(ifft_window_sum)
y[approx_nonzero_indices] /= ifft_window_sum[approx_nonzero_indices]
```
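
For reference, a minimal round-trip where the fixed ``length`` handling matters (a sketch, not code from the PR):
```py
import torch

x = torch.randn(1000)
window = torch.hann_window(256)
spec = torch.stft(x, n_fft=256, window=window, return_complex=True)
y = torch.istft(spec, n_fft=256, window=window, length=1000)
assert y.shape == x.shape  # output is padded/trimmed to the requested length
```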

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63469

Reviewed By: fmassa

Differential Revision: D30695827

Pulled By: nateanl

fbshipit-source-id: d034e53f0d65b3fd1dbd150c9c5acf3faf25a164
2021-09-02 09:31:47 -07:00
616fd9219d [Static Runtime] Add sign/abs/lop1p/mul fusion pass (#64209)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64209

Add a new fusion pass that transforms the following pattern:
```
graph(%input):
    %0 : Tensor = aten::sign(%input)
    %1 : Tensor = aten::abs(%input)
    %2 : Tensor = aten::log1p(%1)
    %res : Tensor = aten::mul(%0, %2)
    return (%res)
```
Into a single op:
```
graph(%input):
    %res : Tensor = static_runtime::signed_log1p(%input)
    return (%res)
```
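
For reference, the fused op computes the following (eager-mode sketch):
```py
import torch

def signed_log1p(x: torch.Tensor) -> torch.Tensor:
    # Equivalent to the four-op pattern above: sign(x) * log1p(|x|).
    return torch.sign(x) * torch.log1p(torch.abs(x))
```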

The intent is to reduce the number of passes over the tensor. However, enabling this pass actually causes a performance regression, probably due to a lack of vectorization in the fused implementation. Because of this issue, this diff **does not** enable this pass.

Followup: navahgar will add an NNC kernel which is faster than the unfused version and enable this pass. We still need this version as a fallback since the NNC kernel will not support all dtypes.

Test Plan:
`buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest -- SignedLog1p`

Test passed with new graph pass disabled and enabled.

Reviewed By: hlu1

Differential Revision: D30559929

fbshipit-source-id: e4e080cb2e6a705cfdde1fc98bee92b723f8132a
2021-09-02 08:31:40 -07:00
cd3be4675f [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D30710635

fbshipit-source-id: e8dae05a7e3a19d656067a4f102aab4a3c93ac42
2021-09-02 08:31:37 -07:00
f04e6594ed Fix broken caffe2 test: PlanExecutorTest.BlockingErrorPlan (#64401)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64401

PlanExecutorTest.BlockingErrorPlan uses `ASSERT_DEATH` which internally performs a `fork()`. This can cause problems under certain configurations that use threads. This change updates this test to use the "threadsafe" style for GTest death tests in order to improve its quality in multithreaded environments.

Test Plan:
I confirmed that this change fixes the issue on my devvm with the following command:
```
buck test mode/dev //caffe2/caffe2:caffe2_test_cpu -- PlanExecutorTest.BlockingErrorPlan
```

Reviewed By: praihan

Differential Revision: D30709447

fbshipit-source-id: 12ffd9ad0371e2e5b43a9873c80568e5ab02d246
2021-09-02 08:30:29 -07:00
b737629ff0 simplify op name determination into a single forward pass (#64261)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64261

Note that this does not preserve byte-for-byte compatibility with
existing names.

Test Plan:
* Rely on CI to catch gross errors.
* Merge after release cut to catch subtle issues.

Reviewed By: albanD

Differential Revision: D30700647

Pulled By: dagitses

fbshipit-source-id: 7b02f34b8fae3041240cc78fbc6bcae498c3acd4
2021-09-02 07:32:11 -07:00
b2c7c1dfcf fix copy.deepcopy on LinearPackedParams (#64367)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64367

This is the same thing as https://github.com/pytorch/pytorch/pull/56154
but for quantized linear. It fixes the behavior of `copy.deepcopy` on
these modules. Before this PR, copied instances of `LinearPackedParams`
were not properly initialized, and inspecting them raised errors about
missing `_modules`. After this PR, inspecting and using the copies
works.
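
A minimal sketch of the now-working behavior:
```py
import copy

import torch.nn.quantized as nnq

m = nnq.Linear(4, 4)
m2 = copy.deepcopy(m)  # previously produced a broken, partially-initialized copy
print(m2)              # inspecting the copy no longer raises
```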

Test Plan:
```
python test/test_quantization.py TestStaticQuantizedModule.test_linear_api
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D30702667

fbshipit-source-id: 38c26d1e72663416eeb989985b77ffc2052c12b9
2021-09-02 06:30:42 -07:00
99b064fac4 [jit] shape propagation for prepack (#63585)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63585

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D30428905

Pulled By: IvanKobzarev

fbshipit-source-id: c18f6605a69b2e000bdf14a23e637c5a1c2ec64c
2021-09-02 05:30:38 -07:00
cdb46f4c6e extract TestAutogradComplex into its own test file (#63400)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63400

This is the first step to break up test_autograd.py for #63205.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D30541499

Pulled By: dagitses

fbshipit-source-id: 8d9d32007938b9eade0e88f95a6a3190e7e2ef01
2021-09-02 04:34:35 -07:00
be5b05c1dc require that TARGET_DET_LIST is sorted (and sort it here) (#64102)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64102

We sort this list so that we may add comments to indicate the absence
of a file right where that file would need to be put. This makes it
difficult to wrongly add such a file.

The sorting itself was done programmatically to ensure that no entries
were inadvertently removed.

I printed the sorted list with:

```
  for p in sorted(TARGET_DET_LIST):
    print(f'    "{p}",')
```

Then copied it back into the file.

Test Plan: Imported from OSS

Reviewed By: driazati

Differential Revision: D30625076

Pulled By: dagitses

fbshipit-source-id: cf36fcb3e53e274b76d1f4aae83da1f53c03f9ed
2021-09-02 04:34:33 -07:00
aedd70fcfe Fix list() and help() torchhub functions for Windows (#63773)
Summary:
This PR fixes the help() and list() torchhub functions, which were probably failing on Windows since the `/` OS separator was hardcoded.

Before merging this I need to double check whether the CI actually runs the corresponding tests on Windows or not

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63773

Reviewed By: zou3519

Differential Revision: D30695664

Pulled By: NicolasHug

fbshipit-source-id: fac328163fd05db804a8186ae28f22b3cc3a6404
2021-09-02 04:34:31 -07:00
030154e241 Remove outdated comment in hub.py (#63757)
Summary:
This PR removes an outdated comment about Python2 that was originally introduced in https://github.com/pytorch/pytorch/pull/25083/files. The code has changed since then, but the comment wasn't removed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63757

Reviewed By: zou3519

Differential Revision: D30695656

Pulled By: NicolasHug

fbshipit-source-id: 431cf414588b9e5a1ad6acdae724ff5af1b16971
2021-09-02 04:34:29 -07:00
1c735768ed Update hub.load() signature to avoid polluting kwargs param (#63755)
Summary:
This PR addresses an old comment about Python2 EOL, directly putting some parameters in the function signature instead of in a `**kwargs` dict.

I believe the changes are fully backward compatible.
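
A hedged sketch of the shape of the change (the parameter list here is illustrative, not the exact final signature):

```python
# Before: common options were popped out of **kwargs inside the function.
# After: they become explicit keyword arguments in the signature.
def load(repo_or_dir, model, *args, source="github", force_reload=False,
         verbose=True, **kwargs):
    ...
```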

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63755

Reviewed By: zou3519

Differential Revision: D30695634

Pulled By: NicolasHug

fbshipit-source-id: 398f347c5a04bfb58e77e46773a869cb9d0eb225
2021-09-02 04:32:22 -07:00
6db8f7a709 Fix TRTModule not adding outputs in order (#64418)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64418

In T99368564, we found that when running a TRT lowered module, the output tensors are out-of-order compared to the output from the original, non-lowered module. It turns out that in `TRTModule.forward()`, we cannot rely on the `ICudaEngine` bindings' natural order indices to create the output tensors; rather, we should explicitly construct the output tensors from the bindings' names, in an order that we supply.
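
A minimal sketch of the idea, with assumed names (not the actual `TRTModule` code):

```python
def gather_outputs(engine, bindings, output_names):
    # Look bindings up by name, in the order we supply, instead of
    # trusting the engine's natural binding-index order.
    outputs = []
    for name in output_names:
        idx = engine.get_binding_index(name)
        outputs.append(bindings[idx])
    return outputs
```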

Test Plan:
* Arc lint
* Run CI/sandcastle tests
* Run GPU lowering using commands and code changes in D30171741 and ensure we don't observe out-of-order outputs

Reviewed By: yinghai

Differential Revision: D30693545

fbshipit-source-id: 32a894ceeb148fcf4e8d279be3835c7d1f1aa2ba
2021-09-02 01:36:23 -07:00
76e187aa08 Port gather to structured kernel (#63312)
Summary:
Will add a description once this is ready for review.

cc: ysiraichi ezyang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63312

Reviewed By: iramazanli

Differential Revision: D30597447

Pulled By: ezyang

fbshipit-source-id: d36e59835c2f4b38e286032dd2a1111a7e16b7e5
2021-09-02 01:36:21 -07:00
ee8a6c1d14 Replace std::unordered_map<c10::Device, c10::Device> with DeviceMap (#64393)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64393

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse agolynski SciPioneer H-Huang mrzzd cbalioglu gcramer23

Test Plan: Imported from OSS

Reviewed By: rohan-varma

Differential Revision: D30708384

Pulled By: pbelevich

fbshipit-source-id: 1c565727e4f09cd9e560874dd90aa403470b4a97
2021-09-02 01:36:19 -07:00
8d5b95019d [PyTorch Edge] Support default args with out arg, flag off (#63540)
Summary:
1. Allow consuming operators with default arguments and out arguments. The flag is off to keep the same behavior as v6; PR #63651 turns the flag on.
2. Add two unittests to cover this type of operators.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63540

ghstack-source-id: 137211562

Test Plan:
```
caffe2/test/cpp/jit:jit - LiteInterpreterTest.DefaultArgsWithOutArg
caffe2/test/cpp/jit:jit - LiteInterpreterTest.DefaultArgsPinvWithOutArg
```

Reviewed By: raziel, iseeyuan, tugsbayasgalan

Differential Revision: D30414156

fbshipit-source-id: 0f3a219a22aee10ac53184cbd95940726c459d1f
2021-09-02 01:36:16 -07:00
0addd75be9 Remove unnecessary resize_output (#64272)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64272

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: H-Huang, bdhirsh

Differential Revision: D30686941

Pulled By: ezyang

fbshipit-source-id: de60e6f1115648f8cf7daaa1e652594fe8b06742
2021-09-02 01:34:17 -07:00
69e1207084 Move graph util to fx2trt (#64064)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64064

Move the original util from torch2trt to the fx2trt dir, since torch2trt is going to be deprecated. This is a follow-up diff for D30379124.

Test Plan: manual

Reviewed By: yinghai, mikekgfb

Differential Revision: D30591687

fbshipit-source-id: ae0e59dfbc2d2e2aa4f3ccea7cff2291c7deb388
2021-09-01 22:34:11 -07:00
71e149834b Add a warning about DataLoader num_workers > 0 "memory leak" (#64337)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64337

See https://github.com/pytorch/pytorch/issues/13246

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D30690320

Pulled By: ezyang

fbshipit-source-id: 2751aca05a94e63d25162599f458855988516fad
2021-09-01 21:49:41 -07:00
d067f15622 [Dist CI] Move rest of distributed tests to their own CI job (#64253)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64253

Follow up to D30496178 (f4aff3a346) to move the rest of distributed tests to their own jobs for Linux GHA.
ghstack-source-id: 137233785

Test Plan: CI

Reviewed By: walterddr

Differential Revision: D30662999

fbshipit-source-id: f7cfbc0d1223aca52120f17f9da987d70fda8de6
2021-09-01 21:43:41 -07:00
4d6314a16e [DDP] Log num threads (#64072)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64072

Log gloo threads to DDP logging.
ghstack-source-id: 137119480

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D30596083

fbshipit-source-id: 2b4f6e762cb5d850be6056bcc5922029a1af3c91
2021-09-01 18:36:15 -07:00
59c6ceb6a8 add documentation to shape inference algorithm (#64312)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64312

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D30709254

Pulled By: migeed-z

fbshipit-source-id: 3297d26fe6727c5b9ca176625b1683d787f59659
2021-09-01 18:34:17 -07:00
778af56504 [DDP Comm Hook] Add debugging communication hooks to ddp_comm_hooks.rst (#64352)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64352

as title
ghstack-source-id: 137246253

Test Plan: N/A

Reviewed By: rohan-varma

Differential Revision: D30694089

fbshipit-source-id: a78110b11d59bb0718f43c99ede23f2fd8ab21d0
2021-09-01 17:37:19 -07:00
bf9d66586c [DDP Comm Hook] Create a noop hook for performance debugging (#64344)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64344

As title.

Additionally, avoid using numpy array in test_ddp_hooks.py.
ghstack-source-id: 137170449

Test Plan: buck test mode/dev-nosan caffe2/test/distributed/algorithms/ddp_comm_hooks:test_ddp_hooks -- test_ddp_comm_hook_noop_hook

Reviewed By: rohan-varma

Differential Revision: D30693220

fbshipit-source-id: e17f0d1c6198863cf20a53566f586a6bff602522
2021-09-01 17:36:22 -07:00
baceea4426 [DDP] Add more logging iterations (#64071)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64071

Adding more logging iterations to get additional data.
ghstack-source-id: 137119476

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D30579367

fbshipit-source-id: 57195266ada5e5926f0d8eaf4fb4e01dc98924d7
2021-09-01 17:32:17 -07:00
59fcbd172b Fix incorrect DDP test (#64074)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64074

Previous PR https://github.com/pytorch/pytorch/pull/63831 did not actually test the error in https://github.com/pytorch/pytorch/issues/63812. Introduce a test
directly from the repro that simulates it.
ghstack-source-id: 137171460

Test Plan: CI

Reviewed By: SciPioneer

Differential Revision: D30569719

fbshipit-source-id: fd61250ef6d291c093607663d91d6d2cb5574eb7
2021-09-01 16:34:06 -07:00
9b8f9d5a25 [c10d] Prefer use of torch_check (#63928)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63928

`throw std::invalid_argument` results in not getting stack traces with
TORCH_SHOW_CPP_STACKTRACES=1, so instead prefer `TORCH_CHECK` here.
ghstack-source-id: 137135328

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D30533955

fbshipit-source-id: 33e5bf4f449e3043dec68da93f8022f6624d9675
2021-09-01 16:34:05 -07:00
5d80a48cef Add fast path for addmm when the inputs are conjugate (#59380)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59380

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D28898374

Pulled By: anjali411

fbshipit-source-id: eab0e64d37bb57c18b54cabb8e5c00666338ba04
2021-09-01 16:34:02 -07:00
a8f9aab840 [DDP Comm Hook] Add bf16 gradient compression to ddp_comm_hooks.rst (#64346)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64346

as title
ghstack-source-id: 137170288

Test Plan: N/A

Reviewed By: rohan-varma

Differential Revision: D30693513

fbshipit-source-id: 8c64b8404ff3b0322e1bbbd93f6ef051ea91307d
2021-09-01 16:34:00 -07:00
ed89937d2c [quant][graphmode][fx] Add fbgemm backend_config_dict (#64288)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64288

This is just to set up the file structure and unblock experimentation.
The format for backend_config_dict will change in the future.

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: zou3519

Differential Revision: D30699457

fbshipit-source-id: 28211a4def05d34757850c045a36e311f54760fe
2021-09-01 16:32:43 -07:00
69f4401b7b Make datasets in ConcatDataset not need to be sized (#64114)
Summary:
`datasets` needs to be iterable, but it also has to be sized because its length is checked, even though it is converted to a list immediately afterwards. By swapping the order of these two lines, it no longer needs to be sized.
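
A simplified sketch of the reordering (not the full class):

```python
class ConcatDataset:
    def __init__(self, datasets):
        # Materialize first: the input now only needs to be iterable...
        self.datasets = list(datasets)
        # ...and the length check happens on the resulting list.
        assert len(self.datasets) > 0, "datasets should not be an empty iterable"
```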

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64114

Reviewed By: H-Huang

Differential Revision: D30641480

Pulled By: ejguan

fbshipit-source-id: 7e16548c2123afa65b83845f9929271fa07fe1e8
2021-09-01 15:32:50 -07:00
535526b95c Restore LayerNorm numerics test (#64385)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64385

It was deleted in https://github.com/pytorch/pytorch/pull/63276.

The numerics test was meant to check LayerNorm behavior on large inputs,
but we deleted it without realizing that.

Test Plan: - wait for tests.

Reviewed By: ngimel

Differential Revision: D30702950

Pulled By: zou3519

fbshipit-source-id: a480e26c45ec38fb628938b70416cdb22d976a46
2021-09-01 15:32:49 -07:00
7ffcf15503 [quant][graphmode][api] Add backend_config_dict to prepare_fx api (#64135)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64135

We want to start aligning the api with the design in https://github.com/pytorch/pytorch/wiki/Extending-PyTorch-Quantization-to-Custom-Backends

We plan to gradually move things from `prepare_custom_config_dict` and `convert_custom_config_dict`
to `backend_config_dict` and allow custom backend developers to define their own way of quantizing operators.
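
A hedged usage sketch of the new argument (the dict's schema was explicitly still in flux at this point):

```python
import torch
from torch.quantization import get_default_qconfig
from torch.quantization.quantize_fx import prepare_fx

model = torch.nn.Sequential(torch.nn.Linear(4, 4)).eval()
qconfig_dict = {"": get_default_qconfig("fbgemm")}
# Passing None keeps the default behavior; backends supply their own dict.
prepared = prepare_fx(model, qconfig_dict, backend_config_dict=None)
```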

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: zou3519

Differential Revision: D30699456

fbshipit-source-id: e3c068da8d3da2270f57719f7159cc71cafa8598
2021-09-01 15:32:47 -07:00
93bc03622e Silent rm error for sccache log file (#64388)
Summary:
Sample reporting from dr.ci

![image](https://user-images.githubusercontent.com/658840/131724645-75afa04f-7554-4674-8e7c-cf139c84d994.png)

The `rm` command is not actually running into problems; we just need to silence the console output.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64388

Reviewed By: walterddr, malfet, seemethere

Differential Revision: D30704439

Pulled By: zhouzhuojie

fbshipit-source-id: ecd35531decf05b75cef30d08d46635f81112f67
2021-09-01 15:32:45 -07:00
9495674905 [xplat][metal] Add getters and setters for ivars in Conv2dOpContext (#57395)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57395

As title
ghstack-source-id: 137223806

(Note: this ignores all push blocking failures!)

Test Plan:
### Lib Build
- `buck build caffe2:aten_metal_prepack`

### Integration Test
- `arc focus2 pp-ops -a ModelRunner`
- Click "Test Person/Hair Segmentation Model"

{F612831435}

- Image Classification Demo

{F614144868}

Reviewed By: xta0

Differential Revision: D28132020

fbshipit-source-id: 73560263a9d14e9ecfa39c69deb158a2ed8cb179
2021-09-01 15:31:12 -07:00
968d7ee46a [structured] Preserve computed elements from meta func to impl (#61746)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61746

**Summary**
This commit introduces a new feature for structured kernels that allows
kernels to declare quantities as "precomputed" in
`native_functions.yaml`, compute them once in the `meta` function and
reuse them again in the `impl`. The names and types of these quantities
are used to generate code for a struct containing them that the `meta`
function must return. In the case of a handful of surveyed kernels
(`all,`, `any`, `avg_pool2d`), these quantities that are used both in
the `meta` and `impl` have the same meaning as certain kernel arguments
and in fact supersede them. Accordingly, the correspondence between a
kernel argument and the precomputed elements that supersede it is also
captured in `native_functions.yaml`. This information is used to unpack
the struct returned by `meta` and pass its contents correctly to the
`impl` function.

The primary goal is to avoid recompute and enhance developer experience
(e.g. sometimes people can forget to compute these elements while
porting a kernel).

Test Plan: Imported from OSS

Reviewed By: tugsbayasgalan

Differential Revision: D30407831

Pulled By: SplitInfinity

fbshipit-source-id: 00975525ea373721fe52d06f75cd4ac91f3dc556
2021-09-01 14:34:25 -07:00
4aad366111 [Static Runtime] Make per-op latency readable by FAI-PEP (#64315)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64315

Add a new flag `generate_ai_pep_output` to `StaticRuntime::benchmark`. If set, produces per-op-kind average total latency in milliseconds in a JSON format recognized by [Facebook AI performance evaluation platform (FAI-PEP)](https://github.com/facebook/FAI-PEP).

This is useful for observing the impact of changes that make a big difference for a specific op, but do not affect the overall SR latency by more than a few percent.
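
For illustration, FAI-PEP scrapes stdout for JSON records along these lines (the field names and values here are assumptions, not the exact schema this diff emits):

```python
import json

record = {"type": "aten::sigmoid", "metric": "latency", "unit": "ms", "value": "0.05"}
print("PyTorchObserver " + json.dumps(record))  # the prefix marks lines FAI-PEP parses
```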

Reviewed By: hlu1

Differential Revision: D30679352

fbshipit-source-id: c847fa6ea20774aaf1e7949b11db4421d1f70b7e
2021-09-01 14:34:22 -07:00
86c9654291 Update optimize_for_mobile to preserve node's debug information (#63106)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63106

Propagate debug info to the re-written nodes in the graph.

Test Plan:
- Clone open source repo and build
- ``` python3 test/test_jit.py TestOptimizeForMobilePreserveDebugInfo ```
- Tests pass

Reviewed By: kimishpatel

Differential Revision: D28654659

fbshipit-source-id: 2d7c87f2fb95a3be53246375f35639bbd97c237e
2021-09-01 14:34:20 -07:00
15ff25d1fc Break up "@generated" string so Phabricator shows changes
Summary: Created from CodeHub with https://fburl.com/edit-in-codehub

Test Plan:
CI

Sandcastle run

Reviewed By: larryliu0820

Differential Revision: D30701781

fbshipit-source-id: 3acab8b65a327c4ec7da90bc855ecf02f801c40a
2021-09-01 14:34:18 -07:00
e322547fe6 Add forward AD support for custom Functions (#64061)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64061

Test Plan: Imported from OSS

Reviewed By: soulitzer

Differential Revision: D30640868

Pulled By: albanD

fbshipit-source-id: b0e6610430a879074d6d5306443772fc154b431f
2021-09-01 14:33:09 -07:00
25e2578967 Fix bytes_written and bytes_read (#64244)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64244

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64040

In operator cost inference functions, in many places we are using sizeof(x.data_type()). Since data_type() returns a 32 bit integer from [this enum](https://www.internalfb.com/code/fbsource/[15e7ffe4073cf08c61077c7c24a4839504b964a2]/fbcode/caffe2/caffe2/proto/caffe2.proto?lines=20), we are basically always getting 4 for sizeof(x.data_type()) no matter what actual data type x has. Big thanks to Jack Langman for specifically pointing to this bug.

We would instead use the size in bytes based on actual data type.

Test Plan:
Added unit tests BatchMatMulMemCostTest:

buck test //caffe2/caffe2/fb/fbgemm:batch_matmul_op_test -- BatchMatMulMemCostTest

Extended existing unit test test_columnwise_concat for different data types:

buck test //caffe2/caffe2/python/operator_test:concat_op_cost_test -- test_columnwise_concat

Reviewed By: CrazySherman

Differential Revision: D30656698

fbshipit-source-id: d42c0c9a0c5b0ddc5dba39e4994f1f85a5e618bf
2021-09-01 13:35:41 -07:00
03a58a2ba0 [Caffe2] Create fewer strings during argument fetching (#64285)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64285

With C++14 heterogeneous ordered container lookup, it is no longer necessary to create a `std::string` in order to look up elements of a `CaffeMap` keyed by std::string. Accordingly, this diff reworks the argument-getting operator functions to avoid that in favor of `c10::string_view`.
ghstack-source-id: 137139818
ghstack-source-id: 137139818

Test Plan: buildsizebot iOS apps -- code size win. less strings is probably marginally good for perf but this only happens at setup time anyway.

Reviewed By: dzhulgakov

Differential Revision: D26826676

fbshipit-source-id: ee653b14dc2c528bae8c90f0fc6a7a419cbca1d6
2021-09-01 13:30:54 -07:00
468001600c Back out "Revert D30327514: [Pytorch lite predictor] Use KinetoEdgeCPUProfiler for operator profiling." (#64307)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64307

Original commit changeset: 0b2aa7c57d08

Restores original changes.
This diff changes the way operator profiling is done in the lite predictor
benchmarking binary.
Instead of using custom callbacks, it uses KinetoEdgeCPUProfiler to profile
events and then generates operator-level metrics from them.
Since KinetoEvents do not contain CPU clock time, we now report only wallclock
time.
This unifies the various profiling efforts we have for benchmarking purposes. In
production we will still use the observer-based mechanism, but the advantage of
using the Kineto profiler is that we get a few other things for free, such as:
- chrome trace generation.
- operator level memory profiling (to be added)
- flop counts (to be added)

Furthermore, we could potentially use a Python post-processing script to parse the
Chrome trace and generate output similar to torch.profiler. (To be done)

Furthermore, this removes some tests from test_lite_interpreter.cpp which were testing module hierarchy in debug info; they should be covered by test_mobile_profiler.cpp.

Test Plan:
aibench run
Model without debug info:
https://www.internalfb.com/intern/aibench/details/219598441154763
Model with debug info and --print_module_info true (see Operator summary has now module hierarchy information).
https://www.internalfb.com/intern/aibench/details/617154236292985

Reviewed By: raziel

Differential Revision: D30680354

fbshipit-source-id: b6ba0d59c510c13d13d9935b1d8051cc82ffa4e9
2021-09-01 13:29:35 -07:00
421d8f86b6 Add a record scope around autograd::engine::evaluate_function (#63619)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63619

Adds a RECORD_FUNCTION with the function that is being evaluated as part
of backward execution. This has been useful in picking up some operations
in the backward pass that otherwise would not show up, for example custom
autograd functions that run custom C++ code.
ghstack-source-id: 137041723

Test Plan:
CI

benchmark:
buck run mode/opt //scripts/rvarm1/ddp:bench

Reviewed By: albanD

Differential Revision: D30439492

fbshipit-source-id: 955917770cdf2a2edb0303223ace710b668ba388
2021-09-01 12:32:30 -07:00
0b48d96895 [Bootcamp] Include both python unittest and parser parameters in --help and -h flag (#64297)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/45945

Creates a new thread to run -h or --help with unittest.main if the help flag is present, and keeps the add_help default for parameters.

Includes both the python unittest and parser parameters in the --help and -h flags, and will remain up to date since both messages are displayed.
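
A rough sketch of the approach (assumed structure, not the exact patch):

```python
import sys
import threading
import unittest

# Running unittest.main in a separate thread lets its --help output print
# without its SystemExit terminating the process, so the custom parser can
# still print its own help afterwards.
if "-h" in sys.argv or "--help" in sys.argv:
    t = threading.Thread(target=unittest.main, kwargs={"module": None})
    t.start()
    t.join()
```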

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64297

Test Plan:
Imported from GitHub

`python test/test_spectral_ops.py --help`

Output:
```
% python test/test_spectral_ops.py --help
usage: test_spectral_ops.py [-h] [-v] [-q] [--locals] [-f] [-c] [-b] [-k TESTNAMEPATTERNS] [tests [tests ...]]

positional arguments:
  tests                a list of any number of test modules, classes and test methods.

optional arguments:
  -h, --help           show this help message and exit
  -v, --verbose        Verbose output
  -q, --quiet          Quiet output
  --locals             Show local variables in tracebacks
  -f, --failfast       Stop on first fail or error
  -c, --catch          Catch Ctrl-C and display results so far
  -b, --buffer         Buffer stdout and stderr during tests
  -k TESTNAMEPATTERNS  Only run tests which match the given substring

Examples:
  test_spectral_ops.py                           - run default set of tests
  test_spectral_ops.py MyTestSuite               - run suite 'MyTestSuite'
  test_spectral_ops.py MyTestCase.testSomething  - run MyTestCase.testSomething
  test_spectral_ops.py MyTestCase                - run all 'test*' test methods
                                       in MyTestCase

usage: test_spectral_ops.py [-h] [--subprocess] [--seed SEED] [--accept] [--jit_executor JIT_EXECUTOR] [--repeat REPEAT]
                            [--test_bailouts] [--save-xml [SAVE_XML]] [--discover-tests] [--log-suffix LOG_SUFFIX]
                            [--run-parallel RUN_PARALLEL] [--import-slow-tests [IMPORT_SLOW_TESTS]]
                            [--import-disabled-tests [IMPORT_DISABLED_TESTS]]

optional arguments:
  -h, --help            show this help message and exit
  --subprocess          whether to run each test in a subprocess
  --seed SEED
  --accept
  --jit_executor JIT_EXECUTOR
  --repeat REPEAT
  --test_bailouts
  --save-xml [SAVE_XML]
  --discover-tests
  --log-suffix LOG_SUFFIX
  --run-parallel RUN_PARALLEL
  --import-slow-tests [IMPORT_SLOW_TESTS]
  --import-disabled-tests [IMPORT_DISABLED_TESTS]
  ```

Also ran some other tests to make sure tests still worked, and other tests with --help or -h flag

Reviewed By: seemethere

Differential Revision: D30677776

Pulled By: PatrickKan

fbshipit-source-id: eb3d6e3fa677137ec703ec3a23808efb99acc896
2021-09-01 12:30:47 -07:00
c6505cc383 [FX] Fix python code generation for wrapped getattr() with default value (#64271)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64271

Closes #60417

Modified emit_node() in fx/graph.py to generate getattr() call with default value when len(node.args) != 2 instead of accessing the attribute.
Added test_torch_fx_getattr() in test/test_fx.py.
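
A simplified sketch of the decision (names assumed; the real code lives in `emit_node()`):

```python
def emit_getattr(obj_repr, name, default=None, has_default=False):
    if not has_default:
        return f"{obj_repr}.{name}"                      # plain attribute access
    return f"getattr({obj_repr}, {name!r}, {default!r})"  # preserve the default
```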

Test Plan:
pytest test/test_fx.py

Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D30671265

fbshipit-source-id: f2db9ea47e0cb247547e200684f715aab006c374
2021-09-01 11:30:27 -07:00
87d8ab6e50 [nnc] Updated generic error message with info about turning off the fuser (#64316)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64316

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D30683942

Pulled By: navahgar

fbshipit-source-id: d86607563672213f99a1436dcf4f5dc28053b713
2021-09-01 10:31:50 -07:00
c4f3f6e62d Fixes reduction launch config (#64304)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/48573
See also https://github.com/pytorch/pytorch/pull/64194

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64304

Reviewed By: janeyx99

Differential Revision: D30689600

Pulled By: ngimel

fbshipit-source-id: bf2103ca177fd3b6e27bc0324b81925234483a29
2021-09-01 10:30:40 -07:00
d5bfdd3dac OpInfo for nn.functional.layer_norm (#63276)
Summary:
Please see https://github.com/facebookresearch/functorch/issues/78 and https://github.com/pytorch/pytorch/issues/54261.

Note:

* This PR also adds a reference test inspired by existing tests in `test_nn.py`.

cc: mruberry zou3519

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63276

Reviewed By: ejguan

Differential Revision: D30452483

Pulled By: zou3519

fbshipit-source-id: 2578d01ca34e031668a41bd284db60c31ae1fba8
2021-09-01 09:31:45 -07:00
d1f3d85fd8 fix GradBucket.is_last() logic (#63768)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63768

Passed the number of buckets to the GradBucket constructor, so that the .is_last() function can check whether the index equals num_buckets - 1.
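
A minimal sketch of the corrected logic (field names assumed; the real class is C++ with Python bindings):

```python
class GradBucket:
    def __init__(self, index, num_buckets):
        self.index = index
        self.num_buckets = num_buckets  # newly passed into the constructor

    def is_last(self):
        return self.index == self.num_buckets - 1
```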

Test Plan:
buck test mode/dev-nosan //caffe2/test/distributed/algorithms/ddp_comm_hooks:test_ddp_hooks

test output: https://www.internalfb.com/intern/testinfra/testconsole/testrun/8162774375985873/

Reviewed By: SciPioneer, mrshenli

Differential Revision: D30455913

fbshipit-source-id: 8c67ca69cbf191d6e189e09248407eb167bb24b6
2021-09-01 09:29:13 -07:00
92b31b59af Revert D29699456: [pytorch][PR] Enable Half, BFloat16, and Complex dtypes for coo-coo sparse matmul [CUDA]
Test Plan: revert-hammer

Differential Revision:
D29699456 (ad4848565e)

Original commit changeset: 407ae53392ac

fbshipit-source-id: b6c70ba8bb28c0c38de47857030b69792a8470de
2021-09-01 07:32:24 -07:00
0c4e4e588e [FX] Rename reduce functions back to their old, public names (#64324)
Summary:
Unfortunately, pickle serializes the names of these functions. This change also puts them under backward-compatibility enforcement.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64324

Test Plan: Local repro https://fb.workplace.com/groups/3440841732711443/permalink/4018921611570116/

Reviewed By: SplitInfinity, TailofJune

Differential Revision: D30684185

Pulled By: jamesr66a

fbshipit-source-id: 900701220155d15115cd0c07cf7774a2891bd04f
2021-08-31 22:36:11 -07:00
05ecaefbbf [Metal][GPU] Enable metal for simulators and fix test failures if possible (#64322)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64322

As title
ghstack-source-id: 137143877

Test Plan:
- `aibench-cli mobile`
- Select iOS -> `y` -> `1` -> `n` -> "--metal_op_test"
- Select all iPhone 6 + iPhone 7 + iPhone 8 and an iPhone X or 11 or 12
```
Benchmark Submitted. Find more details at: https://our.intern.facebook.com/intern/aibench/details/318120612514604
Benchmark Status:
        D10AP-12.0.1: DONE
        N71mAP-14.3: DONE
DUMMY latency:
        D10AP-12.0.1: 4319.3
        N71mAP-14.3: 8868.51
I0831 16:06:27.210558 605277 ClientSingletonManager.cpp:99] Shutting down Manifold ClientSingletonManager
```

Reviewed By: xta0

Differential Revision: D30147163

fbshipit-source-id: 2de6bbd9bd525e32ca92b2845eb435800855edcc
2021-08-31 22:36:09 -07:00
24e50b8453 [CUDA graphs] hotfix for test_graph_ (#64339)
Summary:
Graphed workloads that try to capture a full backward pass must do warmup on a non-default stream. If warmup happens on the default stream, AccumulateGrad functions might tag themselves to run on the default stream, and therefore won't be capturable.

ngimel and I suspect some test_cuda.py tests run with the default stream as the ambient stream, which breaks `test_graph_grad_scaling` because `test_graph_grad_scaling` does warmup on the ambient stream _assuming_ the ambient stream is a non-default stream.

This PR explicitly sets a side stream for the warmup in `test_graph_grad_scaling`, which is what I should have done all along because it's what the new documentation recommends.
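
A minimal sketch of warmup on a side stream (the model and input names are placeholders):

```python
import torch

model = torch.nn.Linear(8, 8).cuda()
inputs = torch.randn(4, 8, device="cuda")

s = torch.cuda.Stream()
s.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(s):  # warmup must happen off the default stream
    for _ in range(3):
        model(inputs).sum().backward()
torch.cuda.current_stream().wait_stream(s)
```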

I pushed the PR branch straight to the main pytorch repo because we need to run ci-all on it, and I'm not sure what the requirements are these days.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64339

Reviewed By: mruberry

Differential Revision: D30690711

Pulled By: ngimel

fbshipit-source-id: 91ad75f46a11f311e25bc468ea184e22acdcc25a
2021-08-31 22:34:10 -07:00
479fc4e412 Remove outdated warning about RecursiveScriptModule not being copiable (#64085)
Summary:
RecursiveScriptModule has its customized `__copy__` and `__deepcopy__` defined. The warning/error that says it is not copiable is outdated.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64085

Reviewed By: rohan-varma

Differential Revision: D30598623

Pulled By: gmagogsfm

fbshipit-source-id: 0701d8617f42d818bc7b88244caee4cd47fbe976
2021-08-31 21:31:32 -07:00
8337a3fb3f [TensorExpr] Wrap error messages with buildErrorMessage call. (#64330)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64330

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D30687226

Pulled By: ZolotukhinM

fbshipit-source-id: ade1be2ad6847c6afbba60307ef854696821b4e3
2021-08-31 20:31:16 -07:00
a87808de93 Fix bug in ShardedTensorMetadata serde. (#63902)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63902

The 'memory_format' field was not being serialized correctly and used
the same encoding for different fields.
ghstack-source-id: 137142406

Test Plan: waitforbuildbot

Reviewed By: bowangbj

Differential Revision: D30527324

fbshipit-source-id: f0f223e2d660ef6e4abae9649d9992acc36e1278
2021-08-31 20:31:14 -07:00
fa5676a41b Delete some dead code from RRefMessageBase (#64298)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64298

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse agolynski SciPioneer H-Huang mrzzd cbalioglu gcramer23

Test Plan: Imported from OSS

Reviewed By: rohan-varma

Differential Revision: D30676702

Pulled By: pbelevich

fbshipit-source-id: 77dbc0f8064c3518376454ff573d45ed0274956b
2021-08-31 20:30:04 -07:00
6bb4b5d150 disallow empty named dims list to flatten(names, name) (#61953)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/61137 by raising an error if an empty tuple is passed in for the names:
```
>>> torch.empty((2, 3), names=['a', 'b']).flatten((), 'abc')
RuntimeError: flatten(tensor, dims, out_dim): dims cannot be empty
```

or from the original issue:
```
>>> torch.empty((2, 3)).flatten((), 'abc')
RuntimeError: flatten(tensor, dims, out_dim): dims cannot be empty
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61953

Reviewed By: iramazanli

Differential Revision: D30574571

Pulled By: malfet

fbshipit-source-id: e606e84458a8dd66e5da6d0eb1a260f37b4ce91b
2021-08-31 19:32:30 -07:00
c59970db6b [caffe2][easy] Save heap allocation in ConcatOp (#63529)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63529

Output() takes an IntArrayRef, so we can just use a std::initializer_list (stack-allocated array) instead of std::vector here.
ghstack-source-id: 137085908

Test Plan: existing CI

Reviewed By: mruberry

Differential Revision: D29687400

fbshipit-source-id: 9f2a7c6679f2552c098bb1bf7befaca18e0e5d4d
2021-08-31 18:33:32 -07:00
b23e4f6086 Convert mul to use opmath_gpu_kernel_with_scalars (#64019)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64019

Note that previously the functor operated on scalar_t and
this modifies it to operate on opmath_t, but this is not
a problem as half precision was implemented by performing the
compute in float anyway.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D30575282

Pulled By: ezyang

fbshipit-source-id: cc6900ef996e755740afe48f9cb4d0366858dd47
2021-08-31 18:33:30 -07:00
0733582087 Use the correct overloaded name to skip boxed autograd not implemented kernel registration (#64182)
Summary:
Some internal use_count tests are failing for `dequantize_self` because we only compare the skip list with the base name `dequantize`, when we should be comparing with the full name, including the overload.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64182

Reviewed By: albanD

Differential Revision: D30639909

Pulled By: soulitzer

fbshipit-source-id: d4d22dd1a5c8f7180251ce7739830764cce6f151
2021-08-31 18:33:28 -07:00
09e610e36d [Static Runtime] Out version for softmax (#64243)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64243

Test Plan:
```
> buck run //caffe2/benchmarks/static_runtime:static_runtime_cpptest -- --v=1
...
V0830 16:35:22.524479 613839 impl.cpp:1410] Switch to out variant for node: %5 : Tensor = aten::softmax(%a.1, %dim.1, %dtype.1)
...
[       OK ] StaticRuntime.IndividualOps_Softmax (803 ms)
```

Reviewed By: hlu1

Differential Revision: D30656149

fbshipit-source-id: 115b7b4a75448fd6a5c526808080ca9a4251302c
2021-08-31 18:33:26 -07:00
0b9cdeb295 .circleci: Remove already migrated CUDA configs (#64231)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64231

This removes the CUDA 11.1 and CUDA 10.2 configs that we had
previously migrated to GHA

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

cc ezyang seemethere malfet walterddr lg20987 pytorch/pytorch-dev-infra

Test Plan: Imported from OSS

Reviewed By: zhouzhuojie

Differential Revision: D30683811

Pulled By: seemethere

fbshipit-source-id: 71b0761461557d871c26eb02f665a2e4d9b1d9fb
2021-08-31 18:33:24 -07:00
23da90ab84 .github: Consolidate linux setup / teardown (#64229)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64229

Consolidates linux setup / teardown into easy to use jinja2 macros

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

cc ezyang seemethere malfet walterddr lg20987 pytorch/pytorch-dev-infra

Test Plan: Imported from OSS

Reviewed By: zhouzhuojie, driazati

Differential Revision: D30683810

Pulled By: seemethere

fbshipit-source-id: 2578630df3e212fb79392a699090553baef44cc2
2021-08-31 18:31:48 -07:00
5ecb966e0f Add ciflow-tracking issue to pytorch-probot (#64125)
Summary:
Doesn't do anything yet...

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64125

Reviewed By: zhouzhuojie

Differential Revision: D30620283

Pulled By: malfet

fbshipit-source-id: 91869d35c1b70a55e32261d2c32fb0136ec33960
2021-08-31 17:38:34 -07:00
9e25634833 [TensorExpr] Move declaration of buildErrorMessage to exception.h (#64301)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64301

Test Plan: Imported from OSS

Reviewed By: navahgar, huiguoo

Differential Revision: D30678215

Pulled By: ZolotukhinM

fbshipit-source-id: 599c83b3890450a0fb6526815f037eec9563661c
2021-08-31 17:37:29 -07:00
44fcb00a56 Fix redundant class definition in GraphModule singleton constructor (#64274)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/63883

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64274

Reviewed By: jamesr66a

Differential Revision: D30675970

Pulled By: jayleverett

fbshipit-source-id: e74ef2a28013f0fa7c58d14f38e66cfe48d26b74
2021-08-31 17:34:14 -07:00
c2da103fe6 Discover new tests in run_tests.py (#64246)
Summary:
Introduce a `discover_tests` function that globs for all Python files
starting with `test_` in the test folder, excluding subfolders which are
executed differently
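
A rough sketch of the globbing approach (assumed signature, not the exact implementation):

```python
from pathlib import Path

def discover_tests(test_dir="test"):
    # Non-recursive glob, so subfolders that are executed differently are skipped.
    return sorted(p.stem for p in Path(test_dir).glob("test_*.py"))
```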

Fixes https://github.com/pytorch/pytorch/issues/64178

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64246

Reviewed By: walterddr, seemethere

Differential Revision: D30661652

Pulled By: malfet

fbshipit-source-id: a52e78ec717b6846add267579dd8d9ae75326bf9
2021-08-31 17:32:55 -07:00
0457a85d45 Revert D30543236: Add python mode
Test Plan: revert-hammer

Differential Revision:
D30543236 (4bd03b0242)

Original commit changeset: ef5444d96a5a

fbshipit-source-id: b0042ac2c22765fa11d6d00bf751f6a4489eb6d8
2021-08-31 15:28:33 -07:00
6c8cb9bd76 [DataPipe] export fork, mux, demux for public usage (#64279)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64279

cc VitalyFedyunin ejguan

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D30671971

Pulled By: NivekT

fbshipit-source-id: 056ac12ef7183b254d1eec341145594639e47ef6
2021-08-31 14:34:30 -07:00
491bf7cb74 [DataPipe] adding description, __len__, tests for mux() (#64224)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64224

cc VitalyFedyunin ejguan

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D30651551

Pulled By: NivekT

fbshipit-source-id: f8af98ba71a592900b992a8077432062ec57bb48
2021-08-31 14:34:28 -07:00
9a0456939b Try the forked checkout action with retry (#64120)
Summary:
Fixes #{issue number}

The main difference is:
ffc6f93ad4

We can test multiple times in this PR to see if it works, and will make the `retry` number configurable if it's usable.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64120

Reviewed By: malfet

Differential Revision: D30656099

Pulled By: zhouzhuojie

fbshipit-source-id: a89932196bb0c44e412a34664ed6a061b02ef92e
2021-08-31 14:34:26 -07:00
13484084a6 fix syntax error in bfloat16 PR (#64122)
Summary:
Fixes a prior syntax error from a previous PR.

cc ngimel

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64122

Reviewed By: H-Huang

Differential Revision: D30643596

Pulled By: ngimel

fbshipit-source-id: 0a2d5a40fb6dc7339cd03112e57ef0e1bf8a000e
2021-08-31 14:33:12 -07:00
8d08b103be [CUDA graphs] Prototype API and documentation (#63269)
Summary:
RFC: https://github.com/pytorch/pytorch/issues/61880

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63269

Reviewed By: mruberry

Differential Revision: D30596643

Pulled By: ngimel

fbshipit-source-id: b1f8061406364b667e2c2d4d30fbce1f0d8456be
2021-08-31 13:34:23 -07:00
1c2b5e59ae Remove ref to test_distributed_fork (#64197)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64197

Removes this line as test is gone.
ghstack-source-id: 136986275

Test Plan: CI

Reviewed By: walterddr

Differential Revision: D30642929

fbshipit-source-id: a0c7dfdfb35a4a7f7ec1b881dbea53d85136012c
2021-08-31 13:31:27 -07:00
555171a273 .circleci: Remove migrated jobs, move docs builds (#64222)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64222

Removes both backwards_compat as well as docs_test from the general
gcc5.4 config and moves the docs build from being run on every PR to
only being run on master.

We can remove docs builds when we migrate the docs push job (including
all secrets associated with that)

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

cc ezyang seemethere malfet walterddr lg20987 pytorch/pytorch-dev-infra

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D30650953

Pulled By: seemethere

fbshipit-source-id: ac11da6a551a6c81f3dc1d47fd81846cbfe9975a
2021-08-31 13:30:13 -07:00
347ef69529 [ao][docs] Clarify operator support for quantization (#63270)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63270

Add table to quantization main page showing supported modules
for static and dynamic quantization.
ghstack-source-id: 137087204

Test Plan: Imported from OSS

Reviewed By: HDCharles

Differential Revision: D30658654

fbshipit-source-id: a82c998e1db6370596d5b0ca4c7cc96c1c90f30e
2021-08-31 12:32:47 -07:00
3a46edb8d8 ns for fx: make layer types more readable (#64270)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64270

Before this PR, layer types were populated by doing
`str(module_instance)` and `str(function)`. This resulted
in moderately readable strings for modules, and poorly readable
strings for functions.

This PR switches the logic to use `torch.typename` utility instead.
The results are significantly more readable.

Example function type:

```
# before
'<built-in method linear of PyCapsule object at 0x7fe9b20ce7b0>'

# after
'torch._ops.quantized.PyCapsule.linear'
```

Example module type:

```
# before
"<class 'torch.nn.quantized.modules.conv.Conv2d'>"

# after
'torch.nn.quantized.modules.conv.Conv2d'
```

Test Plan:
Manually inspect NS results for modules and functions, verify they are
more readable.

Imported from OSS

Differential Revision: D30669545

Reviewed By: jerryzh168

Pulled By: vkuzo

fbshipit-source-id: 60959e5cafa0a4992b083bf99f5d8260f9acdac0
2021-08-31 12:31:34 -07:00
845bc89811 [fx2trt] Add acc_ops.sign and converter for it (#63876)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63876

Add `acc_ops.sign` which maps from `torch.sign`.

Add a plugin (not support dynamic shape currently) for `acc_ops.sign`. The plugin calls `at::sign` directly.

Test Plan: buck test mode/opt -c python.package_style=inplace -c fbcode.nvcc_arch=a100 caffe2/torch/fb/fx2trt:test_unary_ops

Reviewed By: yinghai

Differential Revision: D30518081

fbshipit-source-id: a0b9e6c30deac0b04b8cb09a162579e229985330
2021-08-31 11:31:53 -07:00
83e28a7d28 Use stacklevel for floordiv deprecation warnings (#64034)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/60548

`Tensor.__floordiv__` was indirectly deprecated by deprecation of `torch.floor_divide` (see https://github.com/pytorch/pytorch/issues/43874). Deprecating it directly provides clearer feedback.

Repro:
```
import torch
x = torch.tensor(0)
x // 1
```

Before this change, a deprecation warning was triggered within the C++ implementation of floor_divide:
```
UserWarning: floor_divide is deprecated, and will be removed in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values.
To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). (Triggered internally at  ../aten/src/ATen/native/BinaryOps.cpp:571.)
  return torch.floor_divide(self, other)
```

After this change, the warning instead cites the user's offending line of Python code:
```
UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values.
To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  x // 1
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64034

Reviewed By: mruberry

Differential Revision: D30658010

Pulled By: saketh-are

fbshipit-source-id: b0e6c5008d741897509d102f4a89efb47de4aa2a
2021-08-31 11:27:56 -07:00
b9275a4003 [ao][docs] Add description of qconfig and qengine to quantization page (#63582)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63582

Current quantization docs do not define qconfig and qengine. Added text to define these concepts before they are used.
ghstack-source-id: 137051719
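
For illustration, the two concepts as they appear in user code:

```python
import torch

torch.backends.quantized.engine = "fbgemm"                  # qengine: which kernels run
qconfig = torch.quantization.get_default_qconfig("fbgemm")  # qconfig: how to observe/quantize
```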

Test Plan: Imported from OSS

Reviewed By: HDCharles

Differential Revision: D30658656

fbshipit-source-id: a45a0fcdf685ca1c3f5c3506337246a430f8f506
2021-08-31 10:33:07 -07:00
ca8dd296ee Add OpInfo for nn.functional.cosine_similarity (#62959)
Summary:
Please see https://github.com/facebookresearch/functorch/issues/78 and https://github.com/pytorch/pytorch/issues/54261.

Notes:

* Some redundant tests from `test_nn.py` have been removed. I'm unsure whether the precision checks can be removed as well.
* Broadcasting is also checked in the OpInfo for `cosine_similarity`.

cc: mruberry zou3519 Chillee

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62959

Reviewed By: heitorschueroff

Differential Revision: D30520176

Pulled By: zou3519

fbshipit-source-id: 14e902eb4bcce875edab28a1669a2ea021052b9b
2021-08-31 10:31:36 -07:00
0ef8760bf6 [DataPipe] implementing __len__ for fork (no valid length for demux) (#64215)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64215

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D30648672

Pulled By: NivekT

fbshipit-source-id: 4780f2f6a79ae15a4009092475e7d92f96dd09a2
2021-08-31 08:32:31 -07:00
0deb7a0bc0 [DataPipe] implementing demux() (#63650)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63650

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D30493944

Pulled By: NivekT

fbshipit-source-id: 0aa06dee8c7fb1744975b8f6a0694b90c11ef80d
2021-08-31 08:32:29 -07:00
eee054e6ea [DataPipe] implementing fork() (#63649)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63649

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D30493945

Pulled By: NivekT

fbshipit-source-id: 40db7d4134facd266d86bc0dc2edf2729c4e5842
2021-08-31 08:32:27 -07:00
67cb131458 Revert D30327514: [Pytorch lite predictor] Use KinetoEdgeCPUProfiler for operator profiling.
Test Plan: revert-hammer

Differential Revision:
D30327514 (bc9277dca3)

Original commit changeset: 3bb2f2daaaed

fbshipit-source-id: 0b2aa7c57d08de77c9aaa75e546a7d0938610f64
2021-08-31 08:30:36 -07:00
3c15822f5f [Static Runtime] Implement aten::nonzero out variant (#64126)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64126

Test Plan:
Confirm out variant is called:

```
> buck run //caffe2/benchmarks/static_runtime:static_runtime_cpptest -- --v=1
```

Reviewed By: mikeiovine

Differential Revision: D30617729

fbshipit-source-id: 752749638c8f467815efa57021cb3de5c728ab1b
2021-08-31 00:51:15 -07:00
a3d6dae319 Automated submodule update: FBGEMM (#64213)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).

New submodule commit: 9d69998df6

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64213

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: jspark1105

Differential Revision: D30647878

fbshipit-source-id: b903b39441b4e28dda7eab226ac874e2227e750a
2021-08-30 21:33:17 -07:00
bc9277dca3 [Pytorch lite predictor] Use KinetoEdgeCPUProfiler for operator profiling. (#63367)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63367

This diff changes the way operator profiling is done in the lite predictor
benchmarking binary.
Instead of using custom callbacks, it uses KinetoEdgeCPUProfiler to profile
events and then generates operator-level metrics from them.
Since KinetoEvents do not contain CPU clock time, we now report only wallclock
time.
This unifies the various profiling efforts we have for benchmarking purposes. In
production we will still use the observer-based mechanism, but the advantage of
using the Kineto profiler is that we get a few other things for free, such as:
- chrome trace generation.
- operator level memory profiling (to be added)
- flop counts (to be added)

Furthermore, we could potentially use a Python post-processing script to parse the
Chrome trace and generate output similar to torch.profiler. (To be done)

Test Plan:
aibench run
Model without debug info:
https://www.internalfb.com/intern/aibench/details/219598441154763
Model with debug info and `--print_module_info true` (see Operator summary has now module hierarchy information).
https://www.internalfb.com/intern/aibench/details/617154236292985

Reviewed By: raziel

Differential Revision: D30327514

fbshipit-source-id: 3bb2f2daaaedfb04bd6f5d9c91292783f9c4344f
2021-08-30 20:54:51 -07:00
7ca4728e6d Compile BatchLinearAlgebra without nvcc (#64146)
Summary:
These files only use cuda libraries interfaces, so don't actually need to be compiled with nvcc.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64146

Reviewed By: ezyang

Differential Revision: D30633189

Pulled By: ngimel

fbshipit-source-id: c9d0ae5259a10cb49332d31f0da89ad758736ea8
2021-08-30 20:18:21 -07:00
e7fb35021a [nnc] Enable fusion of bfloat16 ops (#64196)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64196

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D30643864

Pulled By: bertmaher

fbshipit-source-id: e95edeaf7089464d713ea1d1f951743d3e5f61c5
2021-08-30 20:09:36 -07:00
538647fe1f [WIP][FX] BC guarantees for 1.10 (#63888)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63888

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D30523133

Pulled By: jamesr66a

fbshipit-source-id: b04cc0d842a74862f42ecba98b757310cd2ec7b0
2021-08-30 19:56:46 -07:00
09dfaa0339 add operation list for AutocastCPU (#63534)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63534

In this PR:
* We have changed the default dtype of `AutocastCPU` from `float16` to `bfloat16`, as discussed in https://github.com/pytorch/pytorch/pull/61002 (see the sketch after this list).
* We also update the operation list which needs casting to `lower_precision_fp` or `float32`.
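
A hedged usage sketch of the new default:

```python
import torch

a = torch.randn(4, 4)
b = torch.randn(4, 4)
with torch.cpu.amp.autocast():  # dtype now defaults to torch.bfloat16
    c = torch.mm(a, b)
```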

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D30644914

Pulled By: ezyang

fbshipit-source-id: 8b93485ba452b3759611e3f0ac88e920fe495ac1
2021-08-30 19:30:33 -07:00
93f1090267 Update contribution_guide.rst (#64142)
Summary:
Grammatical update.

Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64142

Reviewed By: mruberry

Differential Revision: D30639394

Pulled By: ezyang

fbshipit-source-id: cf1a4dfbd8e34b0772f1b09f5d820278e8ef8574
2021-08-30 19:26:59 -07:00
6b85c99ce5 Avoid an unnecessary list creation in DataChunk (#64111)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64111

Reviewed By: mruberry

Differential Revision: D30639383

Pulled By: ezyang

fbshipit-source-id: 96b243307413c99a67d55d862a71937e1ef210f4
2021-08-30 19:25:42 -07:00
c7c711bfb8 Add optional tensor arguments to (#63967)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/63435

Adds optional tensor arguments to the torch function handling checks. The only one I didn't do this for in the functional file was `multi_head_attention_forward`, since that already took care of some optional tensor arguments but not others, so it seemed like the arguments were specifically chosen.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63967

Reviewed By: albanD

Differential Revision: D30640441

Pulled By: ezyang

fbshipit-source-id: 5ef9554d2fb6c14779f8f45542ab435fb49e5d0f
2021-08-30 19:21:26 -07:00
cb7cf823b3 add BFloat16 support for fold and unfold on CPU (#62880)
Summary:
Add BFloat16 support for fold and unfold operators on CPU.
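
A quick usage check of the newly supported dtype:

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 8, 8).to(torch.bfloat16)
patches = F.unfold(x, kernel_size=2)                    # im2col
y = F.fold(patches, output_size=(8, 8), kernel_size=2)  # col2im
```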

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62880

Reviewed By: iramazanli

Differential Revision: D30576387

Pulled By: zou3519

fbshipit-source-id: c48f6e56702bfea34448db1b3a1634c49c5d8ec8
2021-08-30 19:14:10 -07:00
ffc2612087 Add acc_gpu_kernel_with_scalars and port add to use it (#63884)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63884

See https://dev-discuss.pytorch.org/t/cuda-loops-case-study-code-generation-vs-templates/302
for explanation of what's going on here.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D30545296

Pulled By: ezyang

fbshipit-source-id: f0da52153ae63599fe1d57e90e73f50ca2116939
2021-08-30 19:10:16 -07:00
a49907f984 Modify inline doc for DataPipe (#64221)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64221

List of tasks in this PR
- [x]  Add inline doc for DataPipe
- [x] Improve the inline doc
- [x] Expose DataPipe to `datapipes.iter` (`UnBatcher`) Note: `Forker`, `Demux`, `Mux` are exposed in another PR authored by Kevin
- [x] Add correct typing to DataPipe
- [x] Unify the argument to `datapipe` rather than `source_datapipe`

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D30650541

Pulled By: ejguan

fbshipit-source-id: c09d1b9742b8097d8e645c15947cef80c876877b
2021-08-30 18:45:46 -07:00
af85bc5ffd Replace group_by_key by group_by IterDataPipe (#64220)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64220

Remove `ByKeyGrouperIterDataPipe` due to duplicated functionality.
Fix a bug in `GrouperIterDataPipe` using the existing test.

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D30650542

Pulled By: ejguan

fbshipit-source-id: 666b4d28282fb4f49f3ff101b8d08be16a50d836
2021-08-30 18:45:44 -07:00
4bd03b0242 Add python mode (#63496)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63496

This PR adds a (private) enable_python_mode context manager.
(see torch/utils/_python_dispatch.py).
enable_python_mode accepts the type of a __torch_dispatch__ object
as its argument. Whenever an operator gets called inside of the
context manager, it dispatches to the __torch_dispatch__ of
the passed-in type.

Example usage:
```
with enable_python_mode(LoggingTensor):
    z = torch.empty([])
    assert isinstance(z, LoggingTensor)
```

There are quite a few changes that were made to support this.

First, we added TorchDispatchTypeObject, a C++ struct that represents the
type of a `__torch_dispatch__` object (e.g. LoggingTensor).
It holds both the PyObject* representing the class and a PyInterpreter*
so we know which Python interpreter it came from.

Next, we updated the concrete_dispatch_fn in python_variable.cpp to accept
a `const std::shared_ptr<TorchDispatchTypeObject>&` argument. When this
is null, dispatching happens as usual. When it is non-null, we prepend
the TorchDispatchTypeObject's PyObject* to the overloaded args list so that
it is considered first for dispatch.

To get that to work, we changed how `handle_torch_dispatch_no_python_arg_parser`
works. The "overloaded args list" previously only consisted of Tensor PyObjects,
but now it can have types in addition to Tensors!
- We renamed `append_overloaded_arg` to `append_overloaded_tensor`
- We added a new `append_overloaded_type` that appends a type to
overloaded_args
- We added special handling in `handle_torch_dispatch_no_python_arg_parser`
and `append_overloaded_arg` to handle types in addition to Tensors.

Then, there is PythonMode and PythonModeTLS.
- We reuse the DispatchKey::Python dispatch key as a mode key
- We use PythonMode::enter and PythonMode::exit to enable/disable
DispatchKey::Python and set the PythonModeTLS.
- PythonModeTLS stores a TorchDispatchTypeObject as metadata.
- PythonMode is in libtorch_python, and PythonModeTLS is in ATen.
This split is due to the libtorch_python library boundary (because we need
to save TLS in ATen/ThreadLocalState)
- We modify the PythonFallbackKernel to look up
the relevant TorchDispatchTypeObject (if Python Mode is active) and
dispatch using it.

There are two more miscellaneous changes:
- internal_new_from_data (torch/csrc/utils/tensor_new.cpp) gets an
exclude guard. enable_python_mode currently does not handle
torch.tensor and the exclude guard is to prevent a bug.

Future:
- This PR does not allow for the nesting of Python modes. In the future we
should be able to enable this with a more sane no_dispatch API and by changing
the TLS to a stack. For now I did not need this for CompositeImplicitAutograd testing.

Test Plan: - new tests

Reviewed By: malfet, albanD

Differential Revision: D30543236

Pulled By: zou3519

fbshipit-source-id: ef5444d96a5a957d1657b7e37dce80f9a497d452
2021-08-30 18:44:35 -07:00
ebc0aacf83 [nnc] Fix half2float conversion and re-enable float16 (#64199)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64199

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D30643865

Pulled By: bertmaher

fbshipit-source-id: 9de6adca53bd08839328cbaf6364f7de9550264b
2021-08-30 18:37:55 -07:00
1f16c22dc8 [Static Runtime] Implement aten::cumsum out variant (#64159)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64159

Test Plan:
Confirm out variant is called for both versions:

```
> buck run //caffe2/benchmarks/static_runtime:static_runtime_cpptest -- --v=1
```

Reviewed By: mikeiovine

Differential Revision: D30622819

fbshipit-source-id: a2c8c7f969dae5f507718fb3d513e1fb4f026736
2021-08-30 16:18:22 -07:00
5401159b8f OpInfo for nn.functional.interpolate (#61956)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61956

Each mode goes through a different implementation so they are listed as
different variants.

Test Plan: - run tests

Reviewed By: malfet

Differential Revision: D30013751

Pulled By: zou3519

fbshipit-source-id: 4253b40b55667d7486ef2d98b441c13d807ab292
2021-08-30 16:00:43 -07:00
a7ae73a238 BUG Fixes regression for nllloss gradcheck (#64203)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/64163

This PR includes the fix and the opinfo from https://github.com/pytorch/pytorch/pull/63854/ for non-regression testing.

cc albanD mruberry jbschlosser

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64203

Reviewed By: albanD

Differential Revision: D30647522

Pulled By: jbschlosser

fbshipit-source-id: 2974d299763505908fa93532aca2bd5d5b71f2e9
2021-08-30 15:13:09 -07:00
ad4848565e Enable Half, BFloat16, and Complex dtypes for coo-coo sparse matmul [CUDA] (#59980)
Summary:
This PR enables Half, BFloat16, ComplexFloat, and ComplexDouble support for matrix-matrix multiplication of COO sparse matrices.
The change is applied only to CUDA 11+ builds.

`cusparseSpGEMM` also supports `CUDA_C_16F` (complex float16) and `CUDA_C_16BF` (complex bfloat16). PyTorch also supports the complex float16 dtype (`ScalarType::ComplexHalf`), but there is no convenient dispatch, so this dtype is omitted in this PR.
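
A minimal sketch of the newly enabled path (assumes a CUDA 11+ build and a CUDA device; shapes and values are arbitrary):

```python
import torch

# sparse @ sparse matmul in half precision, enabled by this PR on CUDA 11+
a = torch.randn(4, 8, dtype=torch.half, device="cuda").to_sparse()
b = torch.randn(8, 2, dtype=torch.half, device="cuda").to_sparse()
c = torch.sparse.mm(a, b)  # returns a sparse COO tensor
```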

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59980

Reviewed By: ngimel

Differential Revision: D29699456

Pulled By: cpuhrsch

fbshipit-source-id: 407ae53392acb2f92396a62a57cbaeb0fe6e950b
2021-08-30 15:06:25 -07:00
c3464e78a4 Revert D30561459: Fix bytes_written and bytes_read
Test Plan: revert-hammer

Differential Revision:
D30561459 (e98173ff34)

Original commit changeset: 976fa5167097

fbshipit-source-id: 43f4c234ca400820fe6db5b4f37a25e14dc4b0dd
2021-08-30 14:59:54 -07:00
e4fd2ab59c Back out "Added reference tests to ReductionOpInfo" (#64183)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64183

Original commit changeset: 6a1f82ac2819

Test Plan: CI

Reviewed By: soulitzer

Differential Revision: D30639835

fbshipit-source-id: e238043c6fbd0453317a9ed219e348298f98aaca
2021-08-30 14:48:10 -07:00
8f88f797db [quant][graphmode][fx] Add reference quantized conv module (#63828)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63828

Added a reference quantized conv module for the custom backend flow; the reference quantized module will
have the following code:
```
        w(float) -- quant - dequant \
        x(float) ------------- F.conv2d ---
```
In the full model, we will see
```
        w(float) -- quant - *dequant \
        x -- quant --- *dequant --  *F.conv2d --- *quant - dequant
```
and the backend should be able to fuse the ops with `*` into a quantized conv2d
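
A minimal Python sketch of this reference pattern (the module name, scale, and zero_point below are made up for illustration; this is not the actual module added by the PR):

```python
import torch
import torch.nn.functional as F

class RefQuantConv2d(torch.nn.Module):
    """Reference pattern: the weight goes through quant -> dequant, compute stays in float."""
    def __init__(self, weight, scale=0.1, zero_point=0):
        super().__init__()
        self.weight = weight
        self.scale, self.zero_point = scale, zero_point

    def forward(self, x):
        w_q = torch.quantize_per_tensor(self.weight, self.scale, self.zero_point, torch.qint8)
        # a backend can pattern-match the quant/dequant/conv sequence and fuse it
        return F.conv2d(x, w_q.dequantize())

m = RefQuantConv2d(torch.randn(8, 3, 3, 3))
y = m(torch.randn(1, 3, 8, 8))
```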

Test Plan:
python test/test_quantization.py TestQuantizeFx.test_conv_linear_reference

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D30504749

fbshipit-source-id: e1d8c43a0e0d6d9ea2375b8ca59a9c0f455514fb
2021-08-30 14:23:17 -07:00
65050ec924 Back out "[JIT] Add aten::slice optimization"
Summary:
Original commit changeset: d12ee39f6828
build-break
overriding_review_checks_triggers_an_audit_and_retroactive_review
Oncall Short Name: dskhudia

Test Plan: Local run succeeds

Differential Revision: D30633990

fbshipit-source-id: 91cf7cc0ad7e47d919347c2a1527688e062e0c62
2021-08-30 14:05:04 -07:00
09e53c0cfe .github: Adding configuration for backwards_compat (#64204)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64204

Adds backwards_compat to our existing test matrix for github actions

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

cc ezyang seemethere malfet walterddr lg20987 pytorch/pytorch-dev-infra

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D30646764

Pulled By: seemethere

fbshipit-source-id: f0da6027e29fab03aff058cb13466fae5dcf3678
2021-08-30 13:59:00 -07:00
9035a1cb4d .github: Adding configuration for docs_test (#64201)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64201

Adds docs_test to our existing test matrix for github actions

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

cc ezyang seemethere malfet walterddr lg20987 pytorch/pytorch-dev-infra

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D30646765

Pulled By: seemethere

fbshipit-source-id: 946adae01ff1f1f7ebe626e408e161b77b19a011
2021-08-30 13:57:20 -07:00
85df73658c Make name() part of IMethod interface (#63995)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63995

JIT methods already have name() in their interface, and Py methods have names in their implementation.  I'm adding this for a particular case where someone tried to use name() on a JIT method that we're replacing with an IMethod.

Test Plan: add case to imethod API test

Reviewed By: suo

Differential Revision: D30559401

fbshipit-source-id: 76236721f5cd9a9d9d488ddba12bfdd01d679a2c
2021-08-30 13:31:55 -07:00
b9933f08b9 Fix type annotation in tools/nightly.py (#64202)
Summary:
`tempfile.TemporaryDirectory` is generic only in Python 3.9 and above.

Work around this by wrapping the type annotation in quotes.
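
A minimal sketch of the quoting workaround (the function name is hypothetical; only the quoted annotation matters):

```python
import tempfile

# The quoted annotation stays a string at definition time, so this parses on
# Python < 3.9, where tempfile.TemporaryDirectory is not subscriptable.
def make_staging_dir() -> "tempfile.TemporaryDirectory[str]":
    return tempfile.TemporaryDirectory(prefix="nightly-")
```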

Fixes https://github.com/pytorch/pytorch/issues/64017

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64202

Reviewed By: janeyx99

Differential Revision: D30644215

Pulled By: malfet

fbshipit-source-id: 3c16240b9fa899bd4d572c1732a7d87d3dd0fbd5
2021-08-30 13:27:43 -07:00
f3e329cbec Implements the orthogonal parametrization (#62089)
Summary:
Implements an orthogonal / unitary parametrisation.

It passes the tests, and I have trained a couple of models with this implementation, so I believe it should be correct. The implementation is quite subtle. I'm tagging nikitaved and IvanYashchuk as reviewers in case they have comments or see room for optimisation of the code, in particular of the `forward` function.

Fixes https://github.com/pytorch/pytorch/issues/42243
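
A minimal usage sketch, assuming the parametrisation is exposed as `torch.nn.utils.parametrizations.orthogonal`:

```python
import torch
from torch.nn.utils.parametrizations import orthogonal  # import path assumed

linear = orthogonal(torch.nn.Linear(5, 5))
w = linear.weight  # recomputed through the parametrisation on access
print(torch.allclose(w @ w.T, torch.eye(5), atol=1e-5))  # should print True
```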

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62089

Reviewed By: ezyang

Differential Revision: D30639063

Pulled By: albanD

fbshipit-source-id: 988664f333ac7a75ce71ba44c8d77b986dff2fe6
2021-08-30 13:12:07 -07:00
e98173ff34 Fix bytes_written and bytes_read (#64040)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64040

In operator cost inference functions, in many places we are using sizeof(x.data_type()). Since data_type() returns a 32 bit integer from [this enum](https://www.internalfb.com/code/fbsource/[15e7ffe4073cf08c61077c7c24a4839504b964a2]/fbcode/caffe2/caffe2/proto/caffe2.proto?lines=20), we are basically always getting 4 for sizeof(x.data_type()) no matter what actual data type x has. Big thanks to Jack Langman for specifically pointing to this bug.

We instead use the size in bytes of the actual data type.

Test Plan:
Added unit tests BatchMatMulMemCostTest:

buck test //caffe2/caffe2/fb/fbgemm:batch_matmul_op_test -- BatchMatMulMemCostTest

Extended existing unit test test_columnwise_concat for different data types:

buck test //caffe2/caffe2/python/operator_test:concat_op_cost_test -- test_columnwise_concat

Differential Revision: D30561459

fbshipit-source-id: 976fa5167097a35af548498480001aafd7851d93
2021-08-30 12:57:31 -07:00
eafe33c995 remove componentwise comparison of complex values in torch.testing.assert_close (#63841)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63841

Closes #61906.

cc ezyang gchanan

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D30633526

Pulled By: mruberry

fbshipit-source-id: ddb5d61838cd1e12d19d0093799e827344382cdc
2021-08-30 12:38:44 -07:00
401bbb2aa0 remove componentwise comparison of complex values in TestCase.assertEqual (#63572)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63572

Addresses #61906. Issue will be fixed later in the stack when `torch.testing.assert_close` got the same treatment.

cc ezyang gchanan

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D30633527

Pulled By: mruberry

fbshipit-source-id: c2002a4998a7a75cb2ab83f87190bde43a9d4f7c
2021-08-30 12:36:45 -07:00
a8ffe81b2c Bring back old algorithm for sorting on small number of segments (#64127)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/63456
The code was copy-pasted from the previous commit without modification.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64127

Reviewed By: mruberry

Differential Revision: D30632090

Pulled By: ngimel

fbshipit-source-id: 58bbdd9b0423f01d4e65e2ec925ad9a3f88efc9b
2021-08-30 12:30:50 -07:00
d37636901e [Doc] make_tensor to torch.testing module (#63925)
Summary:
This PR aims to add `make_tensor` to the `torch.testing` module in PyTorch docs.
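
For instance, the kind of usage example the docs entry would carry (a minimal sketch; arguments are arbitrary):

```python
import torch
from torch.testing import make_tensor

# a 2x3 float tensor with values drawn from [-1, 1)
t = make_tensor((2, 3), device="cpu", dtype=torch.float32, low=-1.0, high=1.0)
```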

TODOs:

* [x] Add examples

cc: pmeier mruberry brianjo

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63925

Reviewed By: ngimel

Differential Revision: D30633487

Pulled By: mruberry

fbshipit-source-id: 8e5a1f880c6ece5925b4039fee8122bd739538af
2021-08-30 12:25:40 -07:00
5b0dfd0f8a Fix bad use of channels last kernel in sync batch norm backward (#64100)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/64039

There are two distinct problems here.
1. If `grad_output` is channels last but the input is not, then the input would be read as if it were channels last, i.e. the wrong values would be read.
2. `use_channels_last_kernels` doesn't guarantee that `suggest_memory_format` will actually return channels last, so use `empty_like` instead so the strides always match.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64100

Reviewed By: mruberry

Differential Revision: D30622127

Pulled By: ngimel

fbshipit-source-id: e28cc57215596817f1432fcdd6c49d69acfedcf2
2021-08-30 12:16:30 -07:00
ac99d63f83 [jit] Make operation call accept Stack& instead Stack* (#63414)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63414

Misuse of a raw pointer here, since the stack is never null.
ghstack-source-id: 136938318

Test Plan:
compiles.

Imported from OSS

Reviewed By: ejguan

Differential Revision: D30375410

fbshipit-source-id: 9d65b620bb76d90d886c800f54308520095d58ee
2021-08-30 11:49:20 -07:00
93d2e5090f Improve performance of index_select by avoiding item (#63008)
Summary:
Partially fixes https://github.com/pytorch/pytorch/issues/61788

From a CUDA perspective: item already pulls all Tensor content onto the host (albeit one-by-one), which incurs very expensive memory transfers. This way we'll do it all at once.
From a CPU perspective: item has a lot of overhead as a native function in comparison to simply using a pointer.

Overall there are still many performance gains to be had, but this is a small change that should take us into a more usable landscape. This doesn't land a separate benchmark; I postulate one isn't necessary to decide on the benefit of this change (we'll also see if it shows up indirectly), though adding one is still a good follow-up item.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63008

Reviewed By: zou3519

Differential Revision: D30211160

Pulled By: cpuhrsch

fbshipit-source-id: 70b752be5df51afc66b5aa1c77135d1205520cdd
2021-08-30 09:50:41 -07:00
e24c3644d8 [Static Runtime] aten::cat out version when it is not being replaced by prim::VarConcat (#64157)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64157

The UseVariadicCat optimization is not applied to aten::cat if the list input to the op cannot be moved to a position before the op (https://fburl.com/diffusion/l6kweimu). For these cases we need an out version for SR.

Test Plan:
Confirm out variant is called:
```
> buck run //caffe2/benchmarks/static_runtime:static_runtime_cpptest -- --v=1
```

Reviewed By: d1jang

Differential Revision: D30598574

fbshipit-source-id: 74cfa8291dc8b5df4aef58adfb1ab2a16f10d90a
2021-08-30 09:42:38 -07:00
16ecdbbaa2 [PyTorch] Fix missing move in unpickler (#63974)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63974

Saw some time spent in this for model loading, no reason not to move here.
ghstack-source-id: 136760979

Test Plan: Re-profile model loading on devserver; IValue copy ctor time has gone down

Reviewed By: dhruvbird

Differential Revision: D30548923

fbshipit-source-id: 42000f2e18582762b43353cca10ae094833de3b3
2021-08-30 09:38:55 -07:00
9777887f0e [PyTorch] Reduce copies/refcount bumps in BytecodeDeserializer::parseMethods (#63961)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63961

Saw a report that this function was slow and was doing unexplained vector copies. First pass to remove a bunch of copying.
ghstack-source-id: 136760976

Test Plan:
Pixel 3
before: https://our.intern.facebook.com/intern/aibench/details/461850118893980
after: https://www.internalfb.com/intern/aibench/details/48965886029524

MilanBoard failed to return data from simpleperf

Reviewed By: dhruvbird

Differential Revision: D30544551

fbshipit-source-id: 0e2b5471a10c0803d52c923e6fb5625f5542b99d
2021-08-30 09:37:10 -07:00
dc4fd3bdda [MicroBench] Added a micro benchmark for a signed log1p kernel. (#64032)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64032

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D30579198

Pulled By: navahgar

fbshipit-source-id: a53d68225fba768b26491d14b535f8f2dcf50c0e
2021-08-30 09:27:51 -07:00
f79df24859 Automated submodule update: FBGEMM (#64149)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).

New submodule commit: f6dfed87a1

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64149

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: jspark1105

Differential Revision: D30632209

fbshipit-source-id: aa1cebaf50169c3a93dbcb994fa47e29d6b6a0d7
2021-08-30 08:30:57 -07:00
82174330d0 [DataLoader2] Adding Messages, Protocols, Loop wrappers (#63882)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63882

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D30627452

Pulled By: VitalyFedyunin

fbshipit-source-id: 561ea2df07f3572e04401171946154024126387b
2021-08-30 07:57:20 -07:00
7701ea48be remove one more distributed test (#64108)
Summary:
Follow-up on https://github.com/pytorch/pytorch/issues/62896: one more place where we should remove the distributed test.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64108

Reviewed By: janeyx99, soulitzer

Differential Revision: D30614062

Pulled By: walterddr

fbshipit-source-id: 6576415dc2d481d65419da19c5aa0afc37a86cff
2021-08-30 07:51:11 -07:00
093a12aaa9 [nnc] Updated internal asserts to include more detailed error messages (#64118)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64118

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D30616944

Pulled By: navahgar

fbshipit-source-id: 35289696cc0e7faa01599304243b86f0febc6daf
2021-08-30 04:40:51 -07:00
a836d83957 [nnc] Fixed warning due to implicit parameter conversion (#64117)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64117

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D30616945

Pulled By: navahgar

fbshipit-source-id: eaf69232ac4a684ab5f97a54a514971655f86ef3
2021-08-30 04:39:34 -07:00
d3bcba5f85 ENH Adds label_smoothing to cross entropy loss (#63122)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/7455

Partially resolves pytorch/vision#4281
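
A minimal usage sketch of the new argument:

```python
import torch

loss = torch.nn.CrossEntropyLoss(label_smoothing=0.1)
logits = torch.randn(8, 5)
target = torch.randint(5, (8,))
# targets become a mixture of the one-hot label and a uniform distribution
out = loss(logits, target)
```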

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63122

Reviewed By: iramazanli

Differential Revision: D30586076

Pulled By: jbschlosser

fbshipit-source-id: 06afc3aa1f8b9edb07fe9ed68c58968ad1926924
2021-08-29 23:33:04 -07:00
8af1407eab [Static Runtime] Out version for torch.linalg.norm (#64070)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64070

Test Plan:
Confirm out variant is called for both versions:

```
> buck run //caffe2/benchmarks/static_runtime:static_runtime_cpptest -- --v=1
```

Reviewed By: d1jang

Differential Revision: D30595816

fbshipit-source-id: e88d88d4fc698774e83a98efce66b8fa4e281563
2021-08-29 21:00:11 -07:00
44e3ed88c9 [quant] AO migration of the quantize.py (#64086)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64086

AO Team is migrating the existing torch.quantization into torch.ao.quantization. We are doing it one file at a time to make sure that the internal callsites are updated properly.

This migrates the `quantize.py` from torch.quantization to `torch.ao.quantization`.

At this point both locations will be supported. Eventually torch.quantization will be deprecated.
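
During the migration window both import paths resolve to the same functionality:

```python
# legacy location (to be deprecated eventually)
from torch.quantization import quantize
# new location
from torch.ao.quantization import quantize
```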

Test Plan: `buck test mode/opt //caffe2/test:quantization`

Reviewed By: jerryzh168, raghuramank100

Differential Revision: D30055886

fbshipit-source-id: 8ef7470f9fa640c0042bef5bb843e7a05ecd0b9f
2021-08-29 20:30:01 -07:00
29ad84f252 Removes beta warning from the special module documentation (#64148)
Summary:
Updates documentation per feature review. torch.special is now stable.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64148

Reviewed By: ngimel

Differential Revision: D30632049

Pulled By: mruberry

fbshipit-source-id: 8f6148ec7737e7b3a90644eeca23eb217eda513d
2021-08-29 19:38:46 -07:00
c5ed31e4a7 add channel last support for MaxUnpool2d (#49984)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49984

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D26007051

Pulled By: VitalyFedyunin

fbshipit-source-id: 6c54751ade4092e03c1651aaa60380f7d6e92f6b
2021-08-29 18:37:10 -07:00
9db56531f7 Revert D30620966: [pytorch][PR] Move Parallel[Native|TBB] to GHA
Test Plan: revert-hammer

Differential Revision:
D30620966 (223f886032)

Original commit changeset: 9a23e4b3e168

fbshipit-source-id: b9248d377b9a7b850dfb3f10f3350fbc9855acfe
2021-08-29 15:51:27 -07:00
710a2e933f [DOC] Add doc for maybe_wrap_dim (#63161)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63161

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D30629451

Pulled By: tugsbayasgalan

fbshipit-source-id: b03f030f197e10393a8ff223b240d23c30858028
2021-08-29 14:19:28 -07:00
7ebdbf82dc add support for sending cpu sparse tensors over rpc (#62794)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62794

This PR updates JIT serialization to support pickling sparse COO tensors.
It also updates message.cpp to support sparse COO tensors.
A bug about this was filed a few years ago: https://github.com/pytorch/pytorch/issues/30807.

I tested the fix by adding sparse tensor tests to rpc_test.py and dist_autograd_test.py.
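
A minimal single-process sketch of the now-supported path (the address/port values are placeholders):

```python
import os
import torch
import torch.distributed.rpc as rpc

os.environ["MASTER_ADDR"] = "localhost"
os.environ["MASTER_PORT"] = "29500"
rpc.init_rpc("worker0", rank=0, world_size=1)

s = torch.sparse_coo_tensor([[0, 1]], [1.0, 2.0], (4,))
# the sparse CPU tensor is pickled across the RPC boundary and back
out = rpc.rpc_sync("worker0", torch.add, args=(s, s))
rpc.shutdown()
```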

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse agolynski SciPioneer H-Huang mrzzd cbalioglu gcramer23 gmagogsfm

Test Plan: Imported from OSS

Reviewed By: soulitzer

Differential Revision: D30608848

Pulled By: gcramer23

fbshipit-source-id: 629ba8e4a3d8365875a709c9b87447c7a71204fb
2021-08-29 11:35:00 -07:00
52d7dd7398 [DOC] improve docstring for Optimizer.state_dict (#63153)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63153

Fixes: https://github.com/pytorch/pytorch/issues/60121

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D30629462

Pulled By: tugsbayasgalan

fbshipit-source-id: a9160e02ac53bb1a6219879747d73aae9ebe4d2f
2021-08-29 10:20:58 -07:00
371c6612b3 Automated submodule update: FBGEMM (#64141)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).

New submodule commit: 9939bac9de

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64141

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: jspark1105

Differential Revision: D30629417

fbshipit-source-id: 1b1ad3d4caff925f798b86b358ab193554c9b8e0
2021-08-29 09:58:04 -07:00
2e6221a232 [nnc] Make 64-bit dimensions work (#64077)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64077

We were assuming kernel dimensions fit in 32 bits (the old fuser made
this assumption too), but we should be able to support 64.
ghstack-source-id: 136933272

Test Plan: unit tests; new IR level test with huge sizes

Reviewed By: ZolotukhinM

Differential Revision: D30596689

fbshipit-source-id: 23b7e393a2ebaecb0c391a6b1f0c4b05a98bcc94
2021-08-28 19:59:47 -07:00
405c15516c Parse int64 sizes/strides (#64076)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64076

We were parsing sizes into int32s, so if you had a tensor with more
than 2^32 elements, you couldn't represent it.
ghstack-source-id: 136933273

Test Plan: parseIR with size of 4e9

Reviewed By: ZolotukhinM

Differential Revision: D30521116

fbshipit-source-id: 1e28e462cba52d648e0e2acb4e234d86aae25a3e
2021-08-28 19:58:34 -07:00
4f969db325 [nnc] Fix batchnorm implementation (#64112)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64112

Fixes #64062

Test Plan: Imported from OSS

Reviewed By: zhxchen17

Differential Revision: D30622897

Pulled By: bertmaher

fbshipit-source-id: 7d7c6131aa786e61fa1d0a517288396a0bdb1d22
2021-08-28 19:20:35 -07:00
aefa2f3e64 To add RMSProp algorithm documentation (#63721)
Summary:
It has been discussed before that adding description of Optimization algorithms to PyTorch Core documentation may result in a nice Optimization research tutorial. In the following tracking issue we mentioned about all the necessary algorithms and links to the originally published paper  https://github.com/pytorch/pytorch/issues/63236.

In this PR we are adding description of RMSProp to the documentation.  For more details, we refer to the paper   https://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf

<img width="464" alt="RMSProp" src="https://user-images.githubusercontent.com/73658284/131179226-3fb6fe5a-5301-4948-afbe-f38bf57f24ff.png">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63721

Reviewed By: albanD

Differential Revision: D30612426

Pulled By: iramazanli

fbshipit-source-id: c3ac630a9658d1282866b53c86023ac10cf95398
2021-08-28 15:55:56 -07:00
8b6266fe4f Automated submodule update: FBGEMM (#64129)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).

New submodule commit: f14e794814

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64129

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: jspark1105

Differential Revision: D30621549

fbshipit-source-id: 34c109e75c96a261bf370f7a06dbb8b9004860ab
2021-08-28 11:56:17 -07:00
223f886032 Move Parallel[Native|TBB] to GHA (#64123)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64123

Reviewed By: driazati

Differential Revision: D30620966

Pulled By: malfet

fbshipit-source-id: 9a23e4b3e16870f77bf18df4370cd468603d592d
2021-08-28 11:50:30 -07:00
d0c63e857d Enhancement for smart serialization for out schemas (#63096)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63096

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D30415255

Pulled By: tugsbayasgalan

fbshipit-source-id: eb40440a3b46258394d035479f5fc4a4baa12bcc
2021-08-28 11:46:27 -07:00
f4496528e3 [Light] Fix error message (#64010)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64010

Fixing typos in an error message

Test Plan:
Error message before fix:
Lite Interpreter verson number does not match. The model version must be between 3 and 5But the model version is 6

Error message after fix:
Lite Interpreter version number does not match. The model version must be between 3 and 5 but the model version is 6

Reviewed By: larryliu0820

Differential Revision: D30568367

fbshipit-source-id: 205f3278ee8dcf38579dbb828580a9e986ccacc1
2021-08-27 22:54:38 -07:00
0d0605eaa9 [quant][graphmode][fx] Add reference quantized linear module (#63627)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63627

Added a reference quantized linear module for the custom backend flow; the reference quantized module will
have the following code:
```
        w(float) -- quant - dequant \
        x(float) ------------- F.linear ---
```
In the full model, we will see
```
        w(float) -- quant - *dequant \
        x -- quant --- *dequant --  *F.linear --- *quant - dequant
```
and the backend should be able to fuse the ops with `*` into a quantized linear

Test Plan:
python test/test_quantization.py TestQuantizeFx.test_conv_linear_reference

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D30504750

fbshipit-source-id: 5729921745c2b6a0fb344efc3689f3b170e89500
2021-08-27 22:53:24 -07:00
a3a7a67048 [iOS][GPU] Consolidate array and non-array kernel for hardswish (#63369)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63369

ghstack-source-id: 136918152

(Note: this ignores all push blocking failures!)

Test Plan:
- `buck test pp-macos`
- Op tests in PyTorchPlayground app
- Run mobilenetv3 test

https://pxl.cl/1Ncls

Reviewed By: xta0

Differential Revision: D30354454

fbshipit-source-id: 88bf4f8b5871e63170161b3f3e44f99b8a3086c6
2021-08-27 19:31:08 -07:00
9ccb9299e0 To add Nesterov Adam algorithm description to documentation (#63793)
Summary:
It has been discussed before that adding description of Optimization algorithms to PyTorch Core documentation may result in a nice Optimization research tutorial. In the following tracking issue we mentioned about all the necessary algorithms and links to the originally published paper  https://github.com/pytorch/pytorch/issues/63236.

In this PR we are adding description of Nesterov Adam Algorithm to the documentation.  For more details, we refer to the paper  https://openreview.net/forum?id=OM0jvwB8jIp57ZJjtNEZ

<img width="439" alt="NAdam" src="https://user-images.githubusercontent.com/73658284/131185124-e81b2edf-33d9-4a9d-a7bf-f7e5eea47d7c.png">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63793

Reviewed By: NivekT

Differential Revision: D30617057

Pulled By: iramazanli

fbshipit-source-id: cd2054b0d9b6883878be74576e86e307f32f1435
2021-08-27 19:29:34 -07:00
07c5cb8c48 [Static Runtime] Optimize memory planner initialization (#64101)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64101

Checking `getOutOfPlaceOperation(n)` is a very expensive operation, especially in multithreaded environments, due to a lock acquisition when the NNC cache is queried. This slows down the memory planner initialization time, and by extension, the latency for the first static runtime inference.

There are two optimizations in this diff:
* Cache the result of `p_node->has_out_variant()` to avoid the call to `getOutOfPlaceOperation`. This speeds up calls to `canReuseInputOutputs`, which in turn speeds up `isOptimizableContainerType`
* Precompute all `isOptimizableContainerType` during static runtime initialization to avoid a pass over all of each node's inputs.
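
A language-agnostic sketch of the first optimization, written in Python with names that only loosely mirror the real ones: pay the expensive, lock-guarded lookup once per node at initialization, then answer queries from the cached flag.

```python
class ProcessedNode:
    def __init__(self, node, get_out_of_place_operation):
        # the expensive, lock-acquiring lookup happens exactly once, at init time
        self._has_out_variant = get_out_of_place_operation(node) is not None

    def has_out_variant(self):
        return self._has_out_variant  # O(1), no lock acquisition

p = ProcessedNode(object(), lambda n: None)  # lookup returns None -> no out variant
assert not p.has_out_variant()
```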

Test Plan: All unit tests pass: `buck test caffe2/benchmarks/static_runtime/...`

Reviewed By: movefast1990

Differential Revision: D30595579

fbshipit-source-id: 70aaa7af9589c739c672788bf662f711731864f2
2021-08-27 17:40:43 -07:00
2d75ab0c8f [TensorExpr] Update tutorial. (#64109)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64109

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D30614050

Pulled By: ZolotukhinM

fbshipit-source-id: e8f9bd9ef2483e6eafbc0bd5394d311cd694c7b2
2021-08-27 16:19:29 -07:00
3abbcf079d .github: Add cpp_docs job to current gcc5 workflow (#64044)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64044

Adds the cpp_docs job to the current workflow, and also modifies the scripts
around building docs so that they can be driven through environment
variables with sane defaults rather than requiring explicitly passed
arguments.

Ideally should not break current jobs running in circleci but those
should eventually be turned off anyways.

Coincides with work from:
* https://github.com/seemethere/upload-artifact-s3/pull/1
* https://github.com/seemethere/upload-artifact-s3/pull/2

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

cc ezyang seemethere malfet walterddr lg20987 pytorch/pytorch-dev-infra

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D30610010

Pulled By: seemethere

fbshipit-source-id: f67adeb1bd422bb9e24e0f1ec0098cf9c648f283
2021-08-27 16:06:12 -07:00
6ccb74b837 Update codegen to use boxed kernel (#63459)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63459

 - Replaces the usual registration when "requires_derivative" is True (i.e., we still need a grad_fn) but `fn.info` is `None` (TODO: maybe also ensure differentiable inputs > 0, to match requires_derivative).
 - Adds some (temporary?) fixes to some sparse functions. See: https://github.com/pytorch/pytorch/issues/63549
 - Leaves in place the codegen that generates the NotImplemented node (though removing it should only be one line), because some ops listed under `RESET_GRAD_ACCUMULATOR` have an extra function call. We would need to make this list of ops available to C++, which would mean either codegen-ing a list of strings or moving RESET_GRAD_ACCUMULATOR to C++ land. We could do this in a future PR if necessary.

Test Plan: Imported from OSS

Reviewed By: heitorschueroff

Differential Revision: D30518571

Pulled By: soulitzer

fbshipit-source-id: 99a35cbced46292d1b4e51594ae4d534c2caf8b6
2021-08-27 15:01:50 -07:00
90a6498a12 Add autograd not implemented boxed fallback (#63458)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63458

See description and discussion from https://github.com/pytorch/pytorch/pull/62450

Test Plan: Imported from OSS

Reviewed By: heitorschueroff

Differential Revision: D30518572

Pulled By: soulitzer

fbshipit-source-id: 3b1504d49abb84560ae17077f0dec335749c9882
2021-08-27 15:00:28 -07:00
8406dba65a Removing references to ProcessGroupAgent in comments (#64051)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64051

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse agolynski SciPioneer H-Huang mrzzd cbalioglu gcramer23

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D30587076

Pulled By: jaceyca

fbshipit-source-id: 414cb95faad0b4da0eaf2956c0668af057f93574
2021-08-27 14:47:37 -07:00
bdde898d9c Add README to datapipes (#63982)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63982

Add a readme to `datapipes` for developers. This can be a replacement for https://github.com/pytorch/pytorch/blob/master/torch/utils/data/datapipes_tutorial_dev_loaders.ipynb

After this PR is landed, the README.md will be added to PyTorch Wiki

Test Plan: Imported from OSS

Reviewed By: soulitzer

Differential Revision: D30554198

Pulled By: ejguan

fbshipit-source-id: 6091aae8ef915c7c1f00fbf45619c86c9558d308
2021-08-27 14:17:08 -07:00
358c46f99e Implement leaky relu op
Summary: Implemented leaky relu op as per: https://www.internalfb.com/tasks/?t=97492679

Test Plan:
buck build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 //xplat/caffe2:pt_vulkan_api_test_binAndroid\#android-arm64 --show-output
adb push buck-out/gen/xplat/caffe2/pt_vulkan_api_test_binAndroid\#android-arm64 /data/local/tmp/vulkan_api_test
adb shell "/data/local/tmp/vulkan_api_test"

all tests pass, including new ones

Reviewed By: SS-JIA

Differential Revision: D30186225

fbshipit-source-id: fdb1f8f7b3a28b5504581822185c0475dcd53a3e
2021-08-27 13:52:49 -07:00
18cb3fc910 [FX] Validate data type of target on Node Construction (#64050)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64050

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D30585535

Pulled By: yqhu

fbshipit-source-id: 96778a87e75f510b4ef42f0e5cf76b35b7b2f331
2021-08-27 13:40:57 -07:00
ff4569ae29 Sparse CUDA: rename files *.cu -> *.cpp (#63894)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63894

This PR introduces a few code structure changes. There is no need to use
the .cu extension for pure C++ code that does not use CUDA. Moved
`s_addmm_out_csr_sparse_dense_cuda_worker` from the .cu file to a separate
.cpp file.

cc nikitaved pearu cpuhrsch IvanYashchuk ngimel

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D30548771

Pulled By: cpuhrsch

fbshipit-source-id: 6f12d36e7e506d2fdbd57ef33eb73192177cd904
2021-08-27 13:22:54 -07:00
8fc1064b7f [PyTorch] Reduce code size of register_prim_ops.cpp (#61494)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61494

Creating a constexpr array and then looping over it is much cheaper than emitting a function call per item.
ghstack-source-id: 136639302

Test Plan:
fitsships

Buildsizebot some mobile apps to check size impact.

Reviewed By: dhruvbird, iseeyuan

Differential Revision: D29646977

fbshipit-source-id: 6144999f6acfc4e5dcd659845859702051344d88
2021-08-27 12:56:35 -07:00
6a76ee04de Adding alltoall_single collective to collective quantization API (#63154)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63154

The collective quantization API now supports alltoall, alltoall_single, and allscatter. The test is also included.
ghstack-source-id: 136856877

Test Plan: buck test mode/dev-nosan //caffe2/test/distributed/algorithms/quantization:DistQuantizationTests_nccl -- test_all_to_all_single_bfp16

Reviewed By: wanchaol

Differential Revision: D30255251

fbshipit-source-id: 856f4fa12de104689a03a0c8dc9e3ecfd41cad29
2021-08-27 12:46:31 -07:00
04108592a3 New TLS to disable forward mode AD (#63117)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63117

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D30388097

Pulled By: albanD

fbshipit-source-id: f1bc777064645db1ff848bdd64af95bffb530984
2021-08-27 11:59:24 -07:00
6257f5b168 [pruner] add README to repo (#64099)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64099

adding readme to pruner in OSS
ghstack-source-id: 136867516

Test Plan: should not affect behavior

Reviewed By: z-a-f

Differential Revision: D30608045

fbshipit-source-id: 3e9899a853395b2e91e8a69a5d2ca5f3c2acc646
2021-08-27 11:52:59 -07:00
101a626330 Improve distributed.get_rank() API docstring (#63296)
Summary:
See discussion in https://pytorch.slack.com/archives/CBHSWPNM7/p1628792389008600

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63296

Reviewed By: cbalioglu

Differential Revision: D30332042

Pulled By: mrshenli

fbshipit-source-id: 3a642fda2e106fd35b67709ed2adb60e408854c2
2021-08-27 11:34:55 -07:00
196fd3ee7a Modules note v2 (#63963)
Summary:
This PR expands the [note on modules](https://pytorch.org/docs/stable/notes/modules.html) with additional info for 1.10.

It adds the following:
* Examples of using hooks
* Examples of using apply()
* Examples for ParameterList / ParameterDict
* register_parameter() / register_buffer() usage
* Discussion of train() / eval() modes
* Distributed training overview / links
* TorchScript overview / links
* Quantization overview / links
* FX overview / links
* Parametrization overview / link to tutorial

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63963

Reviewed By: albanD

Differential Revision: D30606604

Pulled By: jbschlosser

fbshipit-source-id: c1030b19162bcb5fe7364bcdc981a2eb6d6e89b4
2021-08-27 11:30:18 -07:00
19c1b45f25 Detect out argument in the schema (#62755)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62755

After this change, an out argument can be checked by calling `is_out()`.

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D30415256

Pulled By: tugsbayasgalan

fbshipit-source-id: b2e1fa46bab7c813aaede1f44149081ef2df566d
2021-08-27 11:20:33 -07:00
9f1f22b9bc [Static Runtime] Add out variant of quantized::embedding_bag_byte_prepack (#64081)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64081

This change add an out variant of `quantized::embedding_bag_byte_prepack`.

Test Plan:
- Added `ShapeInferenceTest.QEmbeddingBagByteUnpack`.

- Observed

```
V0824 13:38:49.723708 1322143 impl.cpp:1394] Switch to out variant for node: %2 : Tensor = quantized::embedding_bag_byte_prepack(%input)
```

Reviewed By: hlu1

Differential Revision: D30504216

fbshipit-source-id: 1d9d428e77a15bcc7da373d65e7ffabaf9c6caf2
2021-08-27 10:53:23 -07:00
6ab3a21098 fix resize bug (#61166)
Summary:
I think the original intention here is for this code to take effect only in the align_corners case (because output_size = 1 and the divisor will be 0), but it affects the non-align_corners case too. For example:

```python
import numpy as np
import torch

# 2x2 float input downsampled by a factor of 0.5 to a 1x1 output
# (bilinear interpolation requires a floating-point dtype)
input = torch.tensor(
    np.arange(1, 5, dtype=np.float32).reshape((1, 1, 2, 2)))
m = torch.nn.Upsample(scale_factor=0.5, mode="bilinear")
out = m(input)
```

The result we expect is [[[[2.5]]]], but PyTorch returns [[[[1.0]]]], which differs from OpenCV and PIL; this PR fixes that.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61166

Reviewed By: malfet

Differential Revision: D30543178

Pulled By: heitorschueroff

fbshipit-source-id: 21a4035483981986b0ae4a401ef0efbc565ccaf1
2021-08-27 10:49:31 -07:00
538c30a713 [caffe2] fixes to allow stricter compilation flag (#64016)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64016

In order to increase the strictness of compilation for some targets depending on caffe2, we need to fix some errors uncovered when raising such flags.

This change introduces the required override tokens for virtual destructors

Test Plan: CI. Moreover, targets depending on caffe2 that use clang strict warnings now compile.

Reviewed By: kalman5

Differential Revision: D30541714

fbshipit-source-id: 564af31b4a9df3536d7d6f43ad29e1d0c7040551
2021-08-27 10:38:37 -07:00
eca87f729d Added reference tests to ReductionOpInfo (#62900)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62900

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D30408815

Pulled By: heitorschueroff

fbshipit-source-id: 6a1f82ac281920ff7405a42f46ccd796e60af9d6
2021-08-27 10:32:16 -07:00
babd449978 [JIT] Add aten::slice optimization (#63049)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63049

Given a graph produced from a function like this:
```
def foo():
    li = [1, 2, 3, 4, 5, 6]
    return li[0:2]
```
This pass produces a graph like this:
```
def foo():
    li = [1, 2]
    return li
```

These changes are mostly adapted from https://github.com/pytorch/pytorch/pull/62297/

Test Plan: `buck test //caffe2/test:jit -- TestPeephole`

Reviewed By: eellison

Differential Revision: D30231044

fbshipit-source-id: d12ee39f68289a574f533041a5adb38b2f000dd5
2021-08-27 10:12:45 -07:00
3abb606091 Add doc for nn.MultiMarginLoss (shape, example) (#63760)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/63747

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63760

Reviewed By: malfet

Differential Revision: D30541581

Pulled By: jbschlosser

fbshipit-source-id: 99560641e614296645eb0e51999513f57dfcfa98
2021-08-27 09:51:05 -07:00
a9983ac09c Refactor structured set_output in Register{DispatchKey}.cpp (#62188)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62188

These parts of the `set_output` code are identical for all operators in the
kernel registration files. So, this moves them from being copied into every
class to two helper functions at the top of the file.

Test Plan: Imported from OSS

Reviewed By: soulitzer

Differential Revision: D29962045

Pulled By: albanD

fbshipit-source-id: 753b8aac755f3c91b77ffa2c30a89ac91a84b7c4
2021-08-27 09:38:27 -07:00
f922b58b5f [bazel] GPU-support: add @local_config_cuda and @cuda (#63604)
Summary:
## Context

We take the first step toward GPU bazel support by adding bazel external workspaces `local_config_cuda` and `cuda`, where the first one has some hardcoded values and lists of files, and the second one provides a nicer, high-level wrapper that maps onto the bazel targets already expected by pytorch, which are guarded with the `if_cuda` macro.

The prefix `local_config_` signifies the fact that we are breaking the bazel hermeticity philosophy by explicitly relying on the CUDA installation that is present on the machine.

## Testing

Notice an important scenario that is unlocked by this change: compilation of cpp code that depends on cuda libraries (i.e. cuda.h and so on).

Before:
```
sergei.vorobev@cs-sv7xn77uoy-gpu-1628706590:~/src/pytorch4$ bazelisk build --define=cuda=true //:c10
ERROR: /home/sergei.vorobev/src/pytorch4/tools/config/BUILD:12:1: no such package 'tools/toolchain': BUILD file not found in any of the following directories. Add a BUILD file to a directory to mark it as a package.
 - /home/sergei.vorobev/src/pytorch4/tools/toolchain and referenced by '//tools/config:cuda_enabled_and_capable'
ERROR: While resolving configuration keys for //:c10: Analysis failed
ERROR: Analysis of target '//:c10' failed; build aborted: Analysis failed
INFO: Elapsed time: 0.259s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (2 packages loaded, 2 targets configured)
```

After:
```
sergei.vorobev@cs-sv7xn77uoy-gpu-1628706590:~/src/pytorch4$ bazelisk build --define=cuda=true //:c10
INFO: Analyzed target //:c10 (6 packages loaded, 246 targets configured).
INFO: Found 1 target...
Target //:c10 up-to-date:
  bazel-bin/libc10.lo
  bazel-bin/libc10.so
INFO: Elapsed time: 0.617s, Critical Path: 0.04s
INFO: 0 processes.
INFO: Build completed successfully, 1 total action
```

The `//:c10` target is a good testing one for this, because it has such cases where the [glob is different](075024b9a3/BUILD.bazel (L76-L81)), based on do we compile for CUDA or not.

## What is out of scope of this PR

This PR is a first in a series of providing the comprehensive GPU bazel build support. Namely, we don't tackle the [cu_library](11a40ad915/tools/rules/cu.bzl (L2)) implementation here. This would be a separate large chunk of work.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63604

Reviewed By: soulitzer

Differential Revision: D30442083

Pulled By: malfet

fbshipit-source-id: b2a8e4f7e5a25a69b960a82d9e36ba568eb64595
2021-08-27 09:33:42 -07:00
22d38bd10d [OSS] Enable Metal in PyTorch MacOS nightly builds (#63718)
Summary:
Build on https://github.com/pytorch/pytorch/pull/63825

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63718

Test Plan:
1. Add the `ci/binaries` label to the PR, so the CI will build those nightly builds

2. Make sure the following CI jobs build with the `USE_PYTORCH_METAL_EXPORT` option set to `ON`:
```
ci/circleci: binary_macos_arm64_conda_3_8_cpu_nightly_build
ci/circleci: binary_macos_arm64_conda_3_9_cpu_nightly_build
ci/circleci: binary_macos_arm64_wheel_3_8_cpu_nightly_build
ci/circleci: binary_macos_arm64_wheel_3_9_cpu_nightly_build
ci/circleci: binary_macos_conda_3_6_cpu_nightly_build
ci/circleci: binary_macos_conda_3_7_cpu_nightly_build
ci/circleci: binary_macos_conda_3_8_cpu_nightly_build
ci/circleci: binary_macos_conda_3_9_cpu_nightly_build
ci/circleci: binary_macos_libtorch_3_7_cpu_nightly_build
ci/circleci: binary_macos_wheel_3_6_cpu_nightly_build
ci/circleci: binary_macos_wheel_3_7_cpu_nightly_build
ci/circleci: binary_macos_wheel_3_8_cpu_nightly_build
ci/circleci: binary_macos_wheel_3_9_cpu_nightly_build
```

3. Test `conda` and `wheel` builds locally on the [HelloWorld-Metal](https://github.com/pytorch/ios-demo-app/tree/master/HelloWorld-Metal) demo with [(Prototype) Use iOS GPU in PyTorch](https://pytorch.org/tutorials/prototype/ios_gpu_workflow.html)

(1) conda
```
conda install https://15667941-65600975-gh.circle-artifacts.com/0/Users/distiller/project/final_pkgs/pytorch-1.10.0.dev20210826-py3.8_0.tar.bz2
```
(2) wheel
```
pip3 install https://15598647-65600975-gh.circle-artifacts.com/0/Users/distiller/project/final_pkgs/torch-1.10.0.dev20210824-cp38-none-macosx_10_9_x86_64.whl
```

Reviewed By: xta0

Differential Revision: D30593167

Pulled By: hanton

fbshipit-source-id: 471da204e94b29c11301c857c50501307a5f0785
2021-08-27 09:25:05 -07:00
a43e7a51d7 Adds return type annotation for fork_rng function (#63724)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/63723

Since it's a generator function, the type annotation should be `Generator`.
![image](https://user-images.githubusercontent.com/47299190/130318830-29ef9529-0daa-463c-90b2-1b11f63ade8a.png)
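
For context, a minimal use of the generator-backed context manager:

```python
import torch

with torch.random.fork_rng():
    torch.manual_seed(0)
    x = torch.randn(2)  # RNG state outside the block is left untouched
```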

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63724

Reviewed By: iramazanli

Differential Revision: D30543098

Pulled By: heitorschueroff

fbshipit-source-id: ebdd34749defe1e26c899146786a0357ab4b4b9b
2021-08-27 09:03:40 -07:00
ad8eddbd80 More robust check of whether a class is defined in torch (#64083)
Summary:
This would prevent bugs for classes that
1) are defined in a module that happens to start with `torch`, say `torchvision`
2) are defined in torch but imported with an alias like `import torch as th`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64083

Reviewed By: soulitzer

Differential Revision: D30598369

Pulled By: gmagogsfm

fbshipit-source-id: 9d3a7135737b2339c9bd32195e4e69a9c07549d4
2021-08-27 08:55:35 -07:00
f2c47cf4db [Static Runtime] Out version for fmod (#64046)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64046

Test Plan:
Confirm out variant is used:
```
> //caffe2/benchmarks/static_runtime:static_runtime_cpptest -- --v=1

V0826 23:31:30.321382 193428 impl.cpp:1395] Switch to out variant for node: %4 : Tensor = aten::fmod(%a.1, %b.1)
```

Reviewed By: mikeiovine

Differential Revision: D30581228

fbshipit-source-id: dfab9a16ff8afd40b29338037769f938f154bf74
2021-08-27 03:05:06 -07:00
c90b3cb1da [Static Runtime] Manage temporary Tensors for aten::layer_norm (#64078)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64078

This change converts `aten::layer_norm -> output Tensor` to `static_runtime::layer_norm -> (output Tensor, tmp1 Tensor, tmp2 Tensor)` so that the static runtime manages the `tmp1` and `tmp2` Tensors.

Currently the out-variant of `aten::layer_norm` creates two temporary Tensors inside it:
```
    at::Tensor mean = create_empty_from({M}, *X);
    at::Tensor rstd = create_empty_from({M}, *X);
```
that the static runtime misses an opportunity to manage.

This change puts them into (unused) output Tensors of a new placeholder op `static_runtime::layer_norm` so that the static runtime can manage them, since the static runtime currently chooses to manage only output tensors.

Test Plan:
- Enhanced `StaticRuntime.LayerNorm` to ensure that `static_runtime::layer_norm` gets activated.

- Confirmed that the new op gets activated during testing:

```
V0825 12:51:50.017890 2265227 impl.cpp:1396] Switch to out variant for node: %8 : Tensor, %9 : Tensor, %10 : Tensor = static_runtime::layer_norm(%input.1, %normalized_shape.1, %4, %4, %5, %3)

```

Reviewed By: hlu1

Differential Revision: D30486475

fbshipit-source-id: 5121c44ab58c2d8a954aa0bbd9dfeb7468347a2d
2021-08-27 02:44:43 -07:00
3c3bba4169 [Static Runtime] Use F14FastMap/F14FastSet (#63999)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63999

Use folly::F14FastMap/F14FastSet instead of std::unordered_map/unordered_set in the Static Runtime code base. folly::F14FastMap/F14FastSet implements the same APIs as std::unordered_map/unordered_set but faster. For details see https://github.com/facebook/folly/blob/master/folly/container/F14.md

Reviewed By: d1jang

Differential Revision: D30566149

fbshipit-source-id: 20a7fa2519e4dde96fb3fc61ef6c92bf6d759383
2021-08-27 01:40:41 -07:00
3f1c809470 [static runtime] port c2 argmin kernel (#63632)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63632

Local benchmarking with 1 input repeated for 10k iterations on the 290331537_4 local net. Reduces argmin runtime by about 80% and local net execution time by about 0.71-0.77 ms.

Before:
```
I0826 17:25:53.972786 1104614 PyTorchPredictorBenchLib.cpp:313] PyTorch run finished. Milliseconds per iter: 7.37599. Iters per second: 135.57
```
```
Static runtime ms per iter: 8.22086. Iters per second: 121.642
Time per node type:
        4.13527 ms.    50.9157%. fb::sigrid_transforms_torch_bind (1 nodes, out variant)
       0.868506 ms.    10.6935%. aten::argmin (1 nodes, out variant)
...
```

After:
```
I0826 17:17:54.165174 1064079 PyTorchPredictorBenchLib.cpp:313] PyTorch run finished. Milliseconds per iter: 6.66724. Iters per second: 149.987
```
```
Static runtime ms per iter: 7.68172. Iters per second: 130.179
Time per node type:
         4.1452 ms.    54.0612%. fb::sigrid_transforms_torch_bind (1 nodes, out variant)
       0.656778 ms.    8.56562%. fb::quantized_linear (8 nodes)
       0.488229 ms.    6.36741%. static_runtime::to_copy (827 nodes, out variant)
       0.372678 ms.    4.86042%. aten::argmin (1 nodes, out variant)
...Time per node type:
        3.39387 ms.    53.5467%. fb::sigrid_transforms_torch_bind (1 nodes, out variant)
       0.636216 ms.    10.0379%. fb::quantized_linear (8 nodes, out variant)
       0.410535 ms.    6.47721%. fb::clip_ranges_to_gather_to_offsets (304 nodes, out variant)
       0.212721 ms.     3.3562%. fb::clip_ranges_gather_sigrid_hash_precompute_v3 (157 nodes, out variant)
       0.173736 ms.    2.74111%. aten::matmul (1 nodes, out variant)
       0.150514 ms.    2.37474%. aten::argmin (1 nodes, out variant)
```
P447422384

Test Plan:
Test with local replayer sending traffic to `ansha_perf_test_0819.test`, and compare outputs to jit interpreter.

Start compute tier:
```
RUN_UUID=ansha_perf_test_0819.test.storage JOB_EXPIRE_TIME=864000 MODEL_ID=290331537_4 PREDICTOR_TAG= PREDICTOR_VERSION=405 PREDICTOR_TYPE=CPU ADDITIONAL_FLAGS="--enable_disagg_file_split=true --enable_adx=false --load_remote_file_locally=true --pytorch_predictor_static_runtime_whitelist_by_id=290331537" GFLAGS_CONFIG_PATH=sigrid/predictor/gflags/predictor_gflags_ads_perf_cpu_pyper SMC_TIER_NAME=sigrid.predictor.perf.ansha_per_test_0819.test.storage CLUSTER=tsp_rva ENTITLEMENT_NAME=ads_ranking_infra_test_t6 PREDICTOR_LOCAL_DIRECTORY= ICET_CONFIG_PATH= NNPI_COMPILATION_CONFIG_FILE= NUM_TASKS=1 NNPI_NUM_WORKERS=0 tw job start /data/users/ansha/fbsource/fbcode/tupperware/config/admarket/sigrid/predictor/predictor_perf_canary.tw
```

Start nnpi tier:
```
RUN_UUID=ansha_perf_test_0819.test JOB_EXPIRE_TIME=247200 MODEL_ID=290331537_4 PREDICTOR_TAG= PREDICTOR_VERSION=343 PREDICTOR_TYPE=NNPI_TWSHARED ADDITIONAL_FLAGS="--torch_glow_min_fusion_group_size=30 --pytorch_storage_tier_replayer_sr_connection_options=overall_timeout:1000000,processing_timeout:1000000 --predictor_storage_smc_tier=sigrid.predictor.perf.ansha_perf_test_0819.test.storage --pytorch_predictor_static_runtime_whitelist_by_id=290331537" GFLAGS_CONFIG_PATH=sigrid/predictor/gflags/predictor_gflags_ads_perf_glow_nnpi_pyper_v1 SMC_TIER_NAME=sigrid.predictor.perf.ansha_perf_test_0819.test CLUSTER=tsp_rva ENTITLEMENT_NAME=ads_ranking_infra_test_t17 PREDICTOR_LOCAL_DIRECTORY= ICET_CONFIG_PATH= NNPI_COMPILATION_CONFIG_FILE= NUM_TASKS=1 NNPI_NUM_WORKERS=0 tw job start /data/users/ansha/fbsource/fbcode/tupperware/config/admarket/sigrid/predictor/predictor_perf_canary.tw
```

```buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest -- StaticRuntime.IndividualOps_Argmin --print-passing-details```

Compared outputs to jit interpreter to check for no differences greater than 1e-3 (with nnc on) https://www.internalfb.com/intern/diff/view-version/136824794/

Reviewed By: hlu1

Differential Revision: D30445635

fbshipit-source-id: 048de8867ac72f764132295d1ebfa843cde2fa27
2021-08-26 23:19:19 -07:00
294db0603f [quant] Add support for linear_relu fusion for FP16 dynamic quant (#63826)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63826

Support the conversion of the intrinsic LinearReLU module to the dynamically quantized LinearReLU module.
Verify that the support works for both linear-module and functional-linear fusion.

Test Plan:
python test/test_quantization.py test_dynamic_with_fusion

Imported from OSS

Reviewed By: iramazanli

Differential Revision: D30503513

fbshipit-source-id: 70446797e9670dfef7341cba2047183d6f88b70f
2021-08-26 21:12:06 -07:00
cec44aa574 [quant] Add op support for linear_relu_dynamic_fp16 (#63824)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63824

Add a fused operator implementation that will work with the quantization fusion APIs.
Once the FBGEMM FP16 kernel supports relu fusion natively, we can remove the addition from the PT operator.

Test Plan:
python test/test_quantization.py

Imported from OSS

Reviewed By: heitorschueroff

Differential Revision: D30503514

fbshipit-source-id: 6bf3bd53f47ffaa3f1d178eaad8cc980a7f5258a
2021-08-26 21:12:04 -07:00
975f4ccad6 [quant] support linear_relu_dynamic for qnnpack backend (#63820)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63820

Adds support directly in the operator to call the relu operator if relu fusion is enabled.
Once QNNPACK natively supports relu fusion in linear_dynamic, this can be removed.

Test Plan:
python test/test_quantization.py TestDynamicQuantizedLinear.test_qlinear

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D30502813

fbshipit-source-id: 3352ee5f73e482b6d1941f389d720a461b84ba23
2021-08-26 21:12:02 -07:00
c7027f19ef [quant][fx] Add support for dynamic linear + relu fusion (INT8) (#63799)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63799

Add a new module that can be used for module swap with the nni.LinearReLU module in the convert function.
Supports INT8 currently (since the FP16 op doesn't have relu fusion yet).

Fixes #55393

Test Plan:
python test/test_quantization.py test_dynamic_fusion

Imported from OSS

Reviewed By: heitorschueroff

Differential Revision: D30502812

fbshipit-source-id: 3668e4f001a0626d469e17ac323acf582ee28a51
2021-08-26 21:10:46 -07:00
63c90ec3bf [torch/deploy] add torch.distributed to build (#63918)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63918

Previously we were building with `USE_DISTRIBUTED` off, because c10d was built as a separate library for historical reasons. Since then, lw has merged the c10d build into libtorch, so this is fairly easy to turn on.

Differential Revision:
D30492442

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.intern.facebook.com/intern/diff/D30492442/)!

Test Plan: added a unit test

Reviewed By: wconstab

Pulled By: suo

fbshipit-source-id: 843b8fcf349a72a7f6fcbd1fcc8961268690fb8c
2021-08-26 20:58:44 -07:00
65e6194aeb Introduce the torchrun entrypoint (#64049)
Summary:
This PR introduces a new `torchrun` entrypoint that simply "points" to `python -m torch.distributed.run`. It is shorter and less error-prone to type and gives a nicer syntax than a rather cryptic `python -m ...` command line. Along with the new entrypoint the documentation is also updated and places where `torch.distributed.run` are mentioned are replaced with `torchrun`.
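
For example, the two invocations below are equivalent after this change (the script name and flag value are illustrative):

```
python -m torch.distributed.run --nproc_per_node=2 train.py
torchrun --nproc_per_node=2 train.py
```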

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse agolynski SciPioneer H-Huang mrzzd cbalioglu gcramer23

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64049

Reviewed By: cbalioglu

Differential Revision: D30584041

Pulled By: kiukchung

fbshipit-source-id: d99db3b5d12e7bf9676bab70e680d4b88031ae2d
2021-08-26 20:17:48 -07:00
510d2ece81 Merge script and _script_pdt API (#62420)
Summary:
Merge the `torch.jit.script` and `torch.jit._script_pdt` APIs. This PR merges profile-directed typing with the script API.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62420

Reviewed By: iramazanli

Differential Revision: D30579015

Pulled By: nikithamalgifb

fbshipit-source-id: 99ba6839d235d61b2dd0144b466b2063a53ccece
2021-08-26 18:58:19 -07:00
0e8c3c51d9 port glu to use structured kernel approach (#61800)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61800

resubmitting because the [last one](https://github.com/pytorch/pytorch/pull/61433) was unrecoverable due to making changes incorrectly in the stack

Test Plan: Imported from OSS

Reviewed By: iramazanli

Differential Revision: D29812492

Pulled By: makslevental

fbshipit-source-id: c3dfeacd1e00a526e24fbaab02dad48069d690ef
2021-08-26 18:01:28 -07:00
a5f35ac7cd Run through failures on trunk (#64063)
Summary:
This PR runs all the tests on trunk instead of stopping on first failure.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64063

Reviewed By: malfet, seemethere

Differential Revision: D30592020

Pulled By: janeyx99

fbshipit-source-id: 318b225cdf918a98f73e752d1cc0227d9227f36c
2021-08-26 17:38:19 -07:00
0c9dce90ed [pytorch] add per_sample_weights support for embedding_bag_4bit_rowwise_offsets (#63605)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63605

Reviewed By: houseroad

Differential Revision: D30434664

fbshipit-source-id: eb4cbae3c705f9dec5c073a56f0f23daee353bc1
2021-08-26 17:31:45 -07:00
81764d1153 document that torch.triangular_solve has optional out= parameter (#63253)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63253

Fixes #57955
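
A small sketch of the now-documented parameter (shapes illustrative):

```python
import torch

A = torch.randn(3, 3).triu_()  # upper-triangular coefficient matrix
b = torch.randn(3, 2)
solution = torch.empty(3, 2)
cloned_A = torch.empty(3, 3)
# out= takes a tuple matching the (solution, cloned_coefficient) outputs
torch.triangular_solve(b, A, out=(solution, cloned_A))
```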

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D30312134

Pulled By: dagitses

fbshipit-source-id: 4f484620f5754f4324a99bbac1ff783c64cee6b8
2021-08-26 17:28:17 -07:00
ed573a8e08 Enable test_api IMethodTest in OSS (#63345)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63345

This diff did the following few things to enable the tests:
1. Exposed IMethod as TORCH_API.
2. Linked torch_deploy to test_api if USE_DEPLOY == 1.
3. Generated torch::deploy examples when building torch_deploy library.

Test Plan: ./build/bin/test_api --gtest_filter=IMethodTest.*

Reviewed By: ngimel

Differential Revision: D30346257

Pulled By: alanwaketan

fbshipit-source-id: 932ae7d45790dfb6e00c51893933a054a0fad86d
2021-08-26 16:50:52 -07:00
0bd8d0951d [Static Runtime] Remove unnecessary fb::equally_split nodes (#64022)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64022

Test Plan: - Added unittest `StaticRuntime.RemoveEquallySplitListUnpack`.

Reviewed By: hlu1

Differential Revision: D30472189

fbshipit-source-id: 36040b0146f4be9d0d0fda293f7205f43aad0b87
2021-08-26 16:29:43 -07:00
dfa35ab3e7 [pytorch][quant][oss] Support 2-bit embedding_bag op "embedding_bag_2bit_rowwise_offsets" (#63658)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63658

Support 2-bit embedding_bag op "embedding_bag_2bit_rowwise_offsets"

Reviewed By: jingsh, supriyar

Differential Revision: D30454994

fbshipit-source-id: 7aa7bfe405c2ffff639d5658a35181036e162dc9
2021-08-26 16:09:35 -07:00
92a154aa29 Move variabletype functions around (#63330)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63330

 - This is in preparation for templated/boxed autograd-not-implemented fallback
 - Make sure VariableTypeUtils does not depend on generated code
 - Lift `isFwGradDefined` into `autograd/functions/utils.cpp` so it's available to mobile builds
 - Removes `using namespace at` from VariableTypeUtils; previously we needed this for the templated version. It's not strictly necessary now, but it is still a good change to avoid name conflicts if this header is included elsewhere in the future.

Test Plan: Imported from OSS

Reviewed By: heitorschueroff

Differential Revision: D30518573

Pulled By: soulitzer

fbshipit-source-id: a0fb904baafc9713de609fffec4b813f6cfcc000
2021-08-26 16:02:39 -07:00
49353e319c More sharded_tensor creation ops: sharded_tensor.zeros, sharded_tensor.full, sharded_tensor.rand (#63732)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63732

Test Plan:
$ python test/distributed/_sharded_tensor/test_sharded_tensor.py  --v

$ python test/distributed/_sharded_tensor/test_sharded_tensor.py TestCreateTensorFromParams --v
$ python test/distributed/_sharded_tensor/test_sharded_tensor.py TestShardedTensorChunked --v

Imported from OSS

Differential Revision:
D30472621

Reviewed By: pritamdamania87

Pulled By: bowangbj

fbshipit-source-id: fd8ebf9b815fdc292ad1aad521f9f4f454163d0e
2021-08-26 16:01:38 -07:00
49b782b2cb Add shard number to print_test_stats.py upload name (#64055)
Summary:
Now that the render test results job is gone, each shard on GHA is uploading a JSON test stats report. To ensure differentiation, this PR includes the shard number in the report name.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64055

Reviewed By: iramazanli

Differential Revision: D30586869

Pulled By: janeyx99

fbshipit-source-id: fd19f347131deec51486bb0795e4e13ac19bc71a
2021-08-26 15:43:29 -07:00
085278f8b1 Derivatives of relu (#63027) (#63089)
Summary:
Optimizes the relu and leaky_relu derivatives to reduce the VRAM needed for the backward passes.

Fixes https://github.com/pytorch/pytorch/issues/63027
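
The underlying identity, as I understand the optimization: since `relu(x) > 0` exactly where `x > 0`, the backward mask can be computed from the saved *output* rather than the input, so the input tensor no longer has to be kept alive for the backward pass:

```python
import torch

x = torch.randn(5, requires_grad=True)
y = torch.relu(x)
# The mask derived from the output equals the mask derived from the input:
assert torch.equal(y > 0, x > 0)
grad_in = torch.ones_like(y) * (y > 0)  # d relu(x)/dx applied to an incoming grad of ones
```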

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63089

Reviewed By: iramazanli

Differential Revision: D30582049

Pulled By: albanD

fbshipit-source-id: a9481fe8c10cbfe2db485e28ce80cabfef501eb8
2021-08-26 15:33:25 -07:00
7861dba7f6 Automated submodule update: FBGEMM (#62879)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).

New submodule commit: ce54703857

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62879

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: jspark1105

Differential Revision: D30154801

fbshipit-source-id: b2ce185da6f6cadf5128f82b15097d9e13e9e6a0
2021-08-26 15:20:06 -07:00
aeec177833 [JIT] UseVariadicOp takes list_idx parameter (#63915)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63915

Previously, this function only worked for variadic op substitutions of the form `op(list, args) -> variadic_op(list_1, ..., list_n, args)`. This change allows for transformations of the form `op(args_0, list, args_1) -> variadic_op(args_0, list_1, ..., list_n, args_1)`.

Test Plan:
`buck test caffe2/test/cpp/jit:jit -- Stack Concat`

(tests exercising `list_idx != 0` will be added further up in this diff stack)

Reviewed By: navahgar

Differential Revision: D30529729

fbshipit-source-id: 568080679c3b40bdaedee56bef2e8a5ce7985d2f
2021-08-26 14:10:35 -07:00
d8d8e4902a [torch/elastic] Pretty print the failure message captured by @record (#64036)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64036

This PR slightly revises the implementation of the internal `_format_failure()` method in order to pretty print the error message captured in a subprocess by the `record` annotation.

With this PR a failure log is formatted as below:

```
Root Cause:
[0]:
  time: 2021-08-26_17:12:07
  rank: 0 (local_rank: 0)
  exitcode: 1 (pid: 8045)
  error_file: /tmp/torchelastic_6cj9eppm/6d9d844a-6ce4-4838-93ed-1639a9525b00_rec9kuv3/attempt_0/0/error.json
  msg:
    {
      "message": "ValueError: Test",
      "extraInfo": {
        "py_callstack": [
          "  File \"/data/home/balioglu/fail.py\", line 7, in <module>\n    main()\n",
          "  File \"/fsx/users/balioglu/repos/pytorch/torch/distributed/elastic/multiprocessing/errors/__init__.py\", line 373, in wrapper\n    error_handler.record_exception(e)\n",
          "  File \"/fsx/users/balioglu/repos/pytorch/torch/distributed/elastic/multiprocessing/errors/error_handler.py\", line 86, in record_exception\n    _write_error(e, self._get_error_file_path())\n",
          "  File \"/fsx/users/balioglu/repos/pytorch/torch/distributed/elastic/multiprocessing/errors/error_handler.py\", line 26, in _write_error\n    \"py_callstack\": traceback.format_stack(),\n"
        ],
        "timestamp": "1629997927"
      }
    }
```

in contrast to the old formatting:

```
Root Cause:
[0]:
  time: 2021-08-26_17:15:50
  rank: 0 (local_rank: 0)
  exitcode: 1 (pid: 9417)
  error_file: /tmp/torchelastic_22pwarnq/19f22638-848c-4b8f-8379-677f34fc44e7_u43o9vs7/attempt_0/0/error.json
  msg: "{'message': 'ValueError: Test', 'extraInfo': {'py_callstack': 'Traceback (most recent call last):\n  File "/fsx/users/balioglu/repos/pytorch/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 351, in wrapper\n    return f(*args, **kwargs)\n  File "/data/home/balioglu/fail.py", line 5, in main\n    raise ValueError("BALIOGLU")\nValueError: BALIOGLU\n', 'timestamp': '1629998150'}}"
```
ghstack-source-id: 136761768

Test Plan: Run the existing unit tests.

Reviewed By: kiukchung

Differential Revision: D30579025

fbshipit-source-id: 37df0b7c7ec9b620355766122986c2c77e8495ae
2021-08-26 13:56:46 -07:00
5a12cb611f To add Chained Scheduler to the list of PyTorch schedulers. (#63491)
Summary:
In this PR we are introducing ChainedScheduler, which was initially proposed in the discussion https://github.com/pytorch/pytorch/pull/26423#discussion_r329976246 .

The idea is to provide a user-friendly chaining method for schedulers, especially for cases where many of them are involved and we want a clean, easy-to-read interface for them. This method will be even more crucial once CompositeSchedulers and schedulers for different types of parameters are involved.

The immediate application of ChainedScheduler is expected to be in the TorchVision library, to combine WarmUpLR and MultiStepLR https://github.com/pytorch/vision/blob/master/references/video_classification/scheduler.py#L5 . However, this method can be expected to apply to many other use cases as well.

### Example
The usage is as simple as below:

```python
sched=ChainedScheduler([ExponentialLR(self.opt, gamma=0.9),
                        WarmUpLR(self.opt, warmup_factor=0.2, warmup_iters=4, warmup_method="constant"),
                        StepLR(self.opt, gamma=0.1, step_size=3)])
```

Then calling
```python
sched.step()
```
triggers the step function of all three schedulers consecutively.

Partially resolves https://github.com/pytorch/vision/issues/4281

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63491

Reviewed By: datumbox, mruberry

Differential Revision: D30576180

Pulled By: iramazanli

fbshipit-source-id: b43f0749f55faab25079641b7d91c21a891a87e4
2021-08-26 13:30:21 -07:00
7cfbc85821 [fx_acc] [fx2trt] add acc op mapper for argmin and converter for topk (#63823)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63823

Add mapper for `torch.argmin` which maps it to `acc_ops.flatten` (optional) + `acc_ops.topk` + `acc_ops.getitem` + `acc_ops.squeeze` (optional). This diff doesn't allow mapping if `dim=None && keepdim=True` in `torch.argmin`.

Add fx2trt converter for `acc_ops.topk`.

Test Plan:
buck test mode/opt glow/fb/fx/oss_acc_tracer:test_acc_tracer -- test_argmin
buck run mode/opt caffe2/torch/fb/fx2trt:test_topk

Reviewed By: jfix71

Differential Revision: D30501771

fbshipit-source-id: 0babc45e69bac5e61ff0b9b4dfb98940398e3e57
2021-08-26 13:16:22 -07:00
cbfec02007 [Static Runtime] Add native op for aten::expand_as (#64024)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64024

`aten::expand_as` creates a view of the input tensor. This change adds its native op implementation for the static runtime.

Test Plan: - Added `StaticRuntime.IndividualOps_ExpandAs`

Reviewed By: hlu1

Differential Revision: D30546851

fbshipit-source-id: e53483048af890bc41b6192a1ab0c5ba0ee2bdc0
2021-08-26 13:05:53 -07:00
95d0b3199b Back out "[ONNX] Fix an issue that optimizations might adjust graph inputs unexpectedly. (#61280)" (#64004)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64004

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63904

Fixes T98808160

Test Plan: T98808160

Reviewed By: msaroufim

Differential Revision: D30527450

fbshipit-source-id: 6262901a78ca929cecda1cf740893139aa26f1b4
2021-08-26 12:49:42 -07:00
c5cc185b6d Allow uncompiled strings as input to checkScriptRaisesRegex (#63901)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63901

cc gmagogsfm

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D30579472

Pulled By: ansley

fbshipit-source-id: 59ee09c1f25278d4f6e51f626588251bd095c6ea
2021-08-26 12:17:07 -07:00
48c57b9b2e Leverage TensorPipe's automatic SHM address selection (#63028)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63028

TensorPipe until now required PyTorch to come up with a unique identifier to use as the address for the UNIX domain socket used in the SHM transport. However, the Linux kernel can automatically assign an available address (like it does with IP ports), and TensorPipe now supports this, so we can remove that now-unneeded PyTorch logic.

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D30220352

fbshipit-source-id: 78e8a6ef5916b2a72df26cdc9cd367b9d083e821
2021-08-26 12:15:53 -07:00
ad47fb8858 Rename IterableAsDataPipe to IterableWrapper (#63981)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63981

Rename `IterableAsDataPipe` to `IterableWrapper` based on our naming convention `Op-er`
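
Usage is unchanged apart from the name; a quick sketch:

```python
from torch.utils.data.datapipes.iter import IterableWrapper

dp = IterableWrapper(range(5))  # wraps any iterable as an IterDataPipe
print(list(dp))  # [0, 1, 2, 3, 4]
```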

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D30554197

Pulled By: ejguan

fbshipit-source-id: c2eacb20df5645d83ca165d6a1591f7e4791990f
2021-08-26 10:23:25 -07:00
0f6b524665 [NNC] Add C++ codegen backend to NNC (#62869)
Summary:
Adds a C++ codegen backend to NNC to generate C++ for CPU instead of generating LLVM IR.
Tensors are represented as blobs of float. Vector operations are devectorized/unrolled.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62869

Test Plan:
The branch https://github.com/pytorch/pytorch/tree/mvz-nnc-aot-prototype makes it possible to AOT-compile the whole MobileNetV3 model into binary code through LLVM codegen in NNC.

I forked that branch to https://github.com/cheng-chang/pytorch/tree/cc-aot-cpp, merged this PR into it, and modified `fancy_compile` to compile MobileNetV3 into C++ through

```
import torch

m = torch.jit.load('mobnet.pt')
m.eval()
f = torch.jit.freeze(m)
torch._C._fancy_compile(f.graph, [1, 3, 224, 224])
```

The generated C++ file `mobnet.cc` can be found at https://gist.github.com/cheng-chang/e2830cc6920b39204ebf368035b2bcec.

I manually compiled the generated C++ through `g++ -o mobnet -std=c++14 -L./build/lib -ltorch_cpu -ltorch mobnet.cc`, and it succeeded.

Reviewed By: ZolotukhinM

Differential Revision: D30149482

Pulled By: cheng-chang

fbshipit-source-id: e77b189f0353e37cd309423a48a513e668d07675
2021-08-26 09:56:37 -07:00
6d31ba6ddc [nnc] Sanitized the names of constants in the input graph. (#63990)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/63923

The input graph can contain constants whose names contain special characters. So, all names of constants in the input graph need to be sanitized.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63990

Reviewed By: ZolotukhinM

Differential Revision: D30558432

Pulled By: navahgar

fbshipit-source-id: de5b0c23d50ee8997f40f2c0fc605dda3719186f
2021-08-26 09:52:02 -07:00
ba5f1b1076 [nnc] Fix dtype promotion involving scalars (#64002)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64002

Fixes https://github.com/pytorch/vision/issues/4315

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D30566979

Pulled By: bertmaher

fbshipit-source-id: eaa98b9534a926be7fcd337d46c5a0acb3243179
2021-08-26 09:43:15 -07:00
1354ee417a run_test.py: add option to run only core tests (#63976)
Summary:
This is in response to a feature request from some folks in the core team to have a local command that would only run relevant "core" tests. The idea is to give developers a smoke-test option to run locally before making a PR, in order to verify that their changes did not break core functionality. These smoke tests are not meant to be short, but rather relevant.

This PR enables that by allowing developers to run `python test/run_test.py --core` or `python test/run_test.py -core` in order to run the CORE_TEST_LIST, which is currently test_nn.py, test_torch.py, and test_ops.py.

I am not the best person to judge what should be considered "core", so please comment which tests should be included and/or excluded from the CORE_TEST_LIST!

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63976

Test Plan:
```
(pytorch) janeyx@janeyx-mbp test % python run_test.py --core -v
Selected tests: test_nn, test_ops, test_torch
Running test_nn ... [2021-08-25 14:48:28.865078]
Executing ['/Users/janeyx/miniconda3/envs/pytorch/bin/python', 'test_nn.py', '-v'] ... [2021-08-25 14:48:28.865123]
test_to (__main__.PackedSequenceTest) ... ok
test_to_memory_format (__main__.PackedSequenceTest) ... ok
```

Reviewed By: walterddr

Differential Revision: D30575560

Pulled By: janeyx99

fbshipit-source-id: 3f151982c1e315e50e60cb0d818adaea34556a04
2021-08-26 09:29:57 -07:00
fbe7133b58 [Static Runtime] Disable out variant of aten::clone (#63980)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63980

The out variant implementation of `aten::clone` causes a crash, which needs further investigation. This change disables it until the problem gets fixed.

Note that `inline_cvr` doesn't use `aten::clone` as of now, so no perf implication: https://www.internalfb.com/phabricator/paste/view/P446858755?lines=121

Test Plan: N/A

Reviewed By: hlu1

Differential Revision: D30544149

fbshipit-source-id: facb334d67473f622b36862fbdb2633358556fdf
2021-08-26 08:10:13 -07:00
7ccc4b5cc8 [CI] move distributed test into its own CI job (#62896)
Summary:
Moving distributed to its own job.

- [x] ensure there should be a distributed test job for every default test job matrix (on GHA)
- [x] ensure that circleci jobs works for distributed as well
- [x] waiting for test distributed to have its own run_test.py launch options, see https://github.com/pytorch/pytorch/issues/63147

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62896

Reviewed By: seemethere

Differential Revision: D30230856

Pulled By: walterddr

fbshipit-source-id: 0cad620f6cd9e56c727c105458d76539a5ae976f
2021-08-26 08:02:20 -07:00
733755f72c remove special grad_mode tls handling (#63116)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63116

This PR removes the special flag to disable grad mode tracking on the ThreadLocalState and replaces it with an explicit setter that users can use.
This reduces the complexity of ThreadLocalState.

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D30388098

Pulled By: albanD

fbshipit-source-id: 85641b3d711179fb78ff6a41ed077548dc821a2f
2021-08-26 07:51:30 -07:00
950f7c0237 Added API tests to ReductionOpInfo and ported amax/amin/nansum tests (#62899)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62899

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D30408816

Pulled By: heitorschueroff

fbshipit-source-id: 6cb0aa7fa7edba93549ef873baa2fb8a003bd91d
2021-08-26 07:18:43 -07:00
10da1fc3f8 Deify opmath_t into its own header, align with accscalar_t (#63986)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63986

Fixes #63985

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D30555996

Pulled By: ezyang

fbshipit-source-id: b6e4d56a5658ed028ffc105cc4b479faa6882b65
2021-08-26 06:59:46 -07:00
774ae0851d [OpInfo] Added ReductionOpInfo subclass of OpInfo and ported sum test (#62737)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62737

ReductionOpInfo is a specialization of OpInfo for reduction operators. For now, it is designed to work with reductions that return a single tensor and that reduce all elements along one or more dimensions to a single value. In particular this excludes operators such as `max` and `min` that return multiple tensors and `quantile` that can return multiple values.

fixes https://github.com/pytorch/pytorch/issues/49746

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D30406568

Pulled By: heitorschueroff

fbshipit-source-id: 218b1da1902f67bcf4c3681e2a0f0029a25d51f1
2021-08-26 06:06:38 -07:00
c02eda8166 Update TensorPipe submodule
Summary: The bot failed to do it.

Test Plan: D30542677

Reviewed By: beauby

Differential Revision: D30573500

fbshipit-source-id: 50abd6fc415cead0a6b6d9290fa0e5f97d0e4989
2021-08-26 05:44:38 -07:00
61d88cdd1c use const auto& as type for grad alias (#63949)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63949

This is an extension of the discussion in
https://github.com/pytorch/pytorch/pull/63040#discussion_r687793027.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D30546789

Pulled By: dagitses

fbshipit-source-id: 3046aff4f129d5492d73dfb67717a824e16ffee8
2021-08-26 04:44:03 -07:00
5757d03145 Add logging for _MinimizerBase
Summary: Add logging so we know which nodes are currently being visited

Test Plan: lint & SC tests

Reviewed By: 842974287

Differential Revision: D30509865

fbshipit-source-id: 09e77e44c97c825242e0b24f90463b50f3ca19c6
2021-08-26 00:52:58 -07:00
a6f767ed3d Fix issue re: DDP and create_graph=True (#63831)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63831

Closes https://github.com/pytorch/pytorch/issues/63812

`at::mul_out` is not supported when `grad` itself requires grad, which is useful for computing higher order derivatives.

In this case, fall back to a mul + copy instead of mul_out.
ghstack-source-id: 136614644

Test Plan: UT

Reviewed By: SciPioneer

Differential Revision: D30505573

fbshipit-source-id: 83532b6207b3d80116fcc4dff0e5520d73b3454f
2021-08-25 23:50:25 -07:00
3b284ab024 Adding BFP16 quantization/dequantization support to OSS (#63059)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63059

Adds support for the BFP16 quantization method to OSS. Currently only CPU is supported.
ghstack-source-id: 136639528

Test Plan: Imported from OSS

Reviewed By: wanchaol

Differential Revision: D30194538

fbshipit-source-id: ac248567ad8028457c2a91b77ef2ce81709fce53
2021-08-25 23:41:34 -07:00
9d95d48567 (torch.distributed) Add torch.distributed.is_torchelastic_launched() util method + make init_method=tcp:// compatible with torchelastic (#63910)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63910

Addresses the current issue that `init_method=tcp://` is not compatible with `torch.distributed.run` and `torch.distributed.launch`. When running with a training script that initializes the process group with `init_method=tcp://localhost:$port` as such:

```
$ python -u -m torch.distributed.run --max_restarts 0 --nproc_per_node 1 --nnodes 1 --master_addr $(hostname) --master_port 6000 ~/tmp/test.py
```

An `Address in use` error is raised since the training script tries to create a TCPStore on port 6000, which is already taken since the elastic agent is already running a TCPStore on that port.

For details see: https://github.com/pytorch/pytorch/issues/63874.

This change does a couple of things:

1. Adds `is_torchelastic_launched()` check function that users can use in the training scripts to see whether the script is launched via torchelastic.
1. Update the `torch.distributed` docs page to include the new `is_torchelastic_launched()` function.
1. Makes `init_method=tcp://` torchelastic compatible by modifying `_tcp_rendezvous_handler` in `torch.distributed.rendezvous` (this is NOT the elastic rendezvous, it is the old rendezvous module which is slotted for deprecation in future releases) to check `is_torchelastic_launched()` AND `torchelastic_use_agent_store()` and if so, only create TCPStore clients (no daemons, not even for rank 0).
1. Adds a bunch of unittests to cover the different code paths

NOTE: the issue mentions that we should fail-fast with an assertion on `init_method!=env://` when `is_torchelastic_launched()` is `True`. There are three registered init_methods in pytorch: env://, tcp://, file://. Since this diff makes tcp:// compatible with torchelastic and I've validated that file is compatible with torchelastic. There is no need to add assertions. I did update the docs to point out that env:// is the RECOMMENDED init_method. We should probably deprecate the other init_methods in the future but this is out of scope for this issue.
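
A minimal sketch of how a training script might use the new check (backend and addresses are illustrative):

```python
import torch.distributed as dist

if dist.is_torchelastic_launched():
    # Launched via torchrun / torch.distributed.run: the agent already set up
    # the env vars and store, so the recommended env:// rendezvous just works.
    dist.init_process_group(backend="gloo", init_method="env://")
else:
    dist.init_process_group(
        backend="gloo", init_method="tcp://localhost:29500", rank=0, world_size=1
    )
```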

Test Plan: Unittests.

Reviewed By: cbalioglu

Differential Revision: D30529984

fbshipit-source-id: 267aea6d4dad73eb14a2680ac921f210ff547cc5
2021-08-25 22:57:43 -07:00
b629ea4620 Update persons_of_interest.rst (#63907)
Summary:

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63907

Reviewed By: jspisak

Differential Revision: D30534972

Pulled By: dzhulgakov

fbshipit-source-id: ba726fc53e292a362c387cc8b5f7776ca2a2544c
2021-08-25 22:50:54 -07:00
b1154cc774 enable equal_nan for complex values in isclose (#63571)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63571
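
A quick sketch of the new behavior (my reading of the change; `equal_nan=True` was previously not usable with complex inputs):

```python
import torch

nan = float("nan")
a = torch.tensor([complex(nan, nan)])
print(torch.isclose(a, a))                  # tensor([False]): NaN != NaN
print(torch.isclose(a, a, equal_nan=True))  # tensor([True])
```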

Test Plan: Imported from OSS

Reviewed By: malfet, ngimel

Differential Revision: D30560127

Pulled By: mruberry

fbshipit-source-id: 8958121ca24e7c139d869607903aebbe87bc0740
2021-08-25 22:05:49 -07:00
49c8fbc92f Clean up related to type refinements (#62444)
Summary:
Creates a helper function to refine the types into a TorchScript-compatible format in the MonkeyType config for profile-directed typing

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62444

Reviewed By: malfet

Differential Revision: D30548159

Pulled By: nikithamalgifb

fbshipit-source-id: 7c09ce5f5e043d069313b87112837d7e226ade1f
2021-08-25 21:53:00 -07:00
80a61142e4 inference for algebraic expressions (#63822)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63822

Infers algebraic expressions and adds them to our symbolic shape inferencer. Works for Conv2d and can be extended to other operations.
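
For context, the kind of algebraic expression involved for Conv2d is the standard output-size formula (from the `torch.nn.Conv2d` docs), which the inferencer can carry symbolically over an unknown input size; a concrete sketch:

```python
import math

def conv2d_out_dim(in_dim, kernel_size, stride=1, padding=0, dilation=1):
    # Standard Conv2d output-size formula from the torch.nn.Conv2d docs.
    return math.floor(
        (in_dim + 2 * padding - dilation * (kernel_size - 1) - 1) / stride + 1
    )

assert conv2d_out_dim(224, kernel_size=3, stride=2, padding=1) == 112
```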

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D30518469

Pulled By: migeed-z

fbshipit-source-id: b92dfa40b2d834a535177da42b851701b8f7178c
2021-08-25 20:47:23 -07:00
124ae597fb [quant] Fixing the conversion of the quantizable RNN (#63879)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63879

Quantizable RNN had a bug, where the `from_observed` was an instance method, instead of a class method. This caused the `tq.convert` to fail. This fixes the issue by making the `from_observed` a classmethod.

The tests were passing before because the unittests were not using the custom module path, but a conventional `from_float`, which is also supported.

Test Plan:
`buck test mode/dev //caffe2/test:quantization -- test_custom_module_lstm`

```
buck test mode/dev //caffe2/test:quantization -- test_custom_module_lstm
Parsing buck files: finished in 0.5 sec
Downloaded 0/2 artifacts, 0.00 bytes, 100.0% cache miss (for updated rules)
Building: finished in 9.2 sec (100%) 12622/12622 jobs, 2/12622 updated
  Total time: 9.7 sec
More details at https://www.internalfb.com/intern/buck/build/0d87b987-649f-4d06-b0e2-97b5077
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: cb99305f-65c9-438b-a99f-a0a2a3089778
Trace available for this run at /tmp/tpx-20210824-115652.540356/trace.log
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/5066549645030046
    ✓ ListingSuccess: caffe2/test:quantization - main (12.550)
    ✓ Pass: caffe2/test:quantization - test_custom_module_lstm (quantization.core.test_quantized_op.TestQuantizedOps) (174.867)
Summary
  Pass: 1
  ListingSuccess: 1
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/5066549645030046
```

Reviewed By: jerryzh168, mtl67

Differential Revision: D30520473

fbshipit-source-id: bc5d0b5bb079fd146e2614dd42526fc7d4d4f3c6
2021-08-25 20:39:02 -07:00
2ea2711501 Make frozen symbol name customizable in torch deploy. (#63817)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63817

ghstack-source-id: 136699671

Test Plan: eyes

Reviewed By: wconstab

Differential Revision: D29571559

fbshipit-source-id: 8e3caa4932ef8d7c8559f264f0e9bb5474ad2237
2021-08-25 20:10:35 -07:00
f4bc28990f Compute cuda reduction buffer size in elements (#63969)
Summary:
Resubmit of https://github.com/pytorch/pytorch/issues/63885

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63969

Reviewed By: mruberry

Differential Revision: D30549423

Pulled By: ngimel

fbshipit-source-id: b16d25030d44ced789c125a333d72b02a8f45067
2021-08-25 18:18:37 -07:00
01b8162d00 Back out "Revert D30384746: [fx2trt] Add a test for quantized resnet18" (#63973)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63973

Original commit changeset: b93235323e22

Test Plan: buck run mode/opt -c python.package_style=inplace caffe2:fx2trt_quantized_resnet_test

Reviewed By: 842974287

Differential Revision: D30546036

fbshipit-source-id: 2c8302456f072d04da00cf9ad97aa8304bc5e43e
2021-08-25 17:52:22 -07:00
57d4c6cf42 replace self.assertTrue(torch.allclose(..)) with self.assertEqual(…) (#63637)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/63565
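
The mechanical replacement looks like this (a sketch using the internal test base class):

```python
import torch
from torch.testing._internal.common_utils import TestCase, run_tests

class MyTest(TestCase):
    def test_close(self):
        actual = torch.ones(3) + 1e-8
        expected = torch.ones(3)
        # before: self.assertTrue(torch.allclose(actual, expected))
        # after: tolerance-aware comparison with a useful failure message
        self.assertEqual(actual, expected)

if __name__ == "__main__":
    run_tests()
```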

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63637

Reviewed By: malfet

Differential Revision: D30541266

Pulled By: mruberry

fbshipit-source-id: ab461949782c6908a589ea098fcfcf5c3e081ee6
2021-08-25 16:47:40 -07:00
1be1c901aa Remove render_test_results job (#63877)
Summary:
This removes the `render_test_results` job we had before, which had been causing some confusion among devs when it failed and isn't really necessary now that we can render test results on the PR HUD.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63877

Reviewed By: walterddr, janeyx99

Differential Revision: D30546705

Pulled By: driazati

fbshipit-source-id: 55fdafdb6f80924d941ffc15ee10787cb54f34a1
2021-08-25 15:55:55 -07:00
ba0e6a1e03 [EASY] Update the clang-tidy error message (#63370)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63370

As shown by this CI run, the actual thing that is incorrect is the prompt.
https://github.com/pytorch/pytorch/actions/runs/1137298261

The CI runs the below command instead of the original command.
The original command errors out when importing another file on line 1.
Trying to fix the code to work with the original command causes the CI to error out.

We should actually ask the user to run
`python3 -m tools.linter.install.clang_tidy`

Test Plan: Imported from OSS

Reviewed By: janeyx99, heitorschueroff

Differential Revision: D30530216

Pulled By: Gamrix

fbshipit-source-id: 2a2b8d539dcc2839e4000c13e82c207fa89bfc9f
2021-08-25 15:30:13 -07:00
44ede71751 Shard python_torch_functions.cpp (#62187)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62187

This file can take 3 minutes on its own to compile and, after
python_functions.cpp, is the second-biggest limiting factor for the compile time
of `libtorch_python` on a 32-core Threadripper. This splits it into 3 files that
take around 1 minute each to compile.

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D29962048

Pulled By: albanD

fbshipit-source-id: 99016d75912bff483fe21b130cef43a6882f8c0e
2021-08-25 15:10:43 -07:00
730ce29baf Add note on ifdefing based on CUDA_VERSION for ROCm path (#62850)
Summary:
CUDA_VERSION and HIP_VERSION follow very unrelated versioning schemes, so it does not make sense to use CUDA_VERSION to determine the ROCm path. This note explicitly addresses it.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62850

Reviewed By: mruberry

Differential Revision: D30547562

Pulled By: malfet

fbshipit-source-id: 02990fa66a88466c2330ab85f446b25b78545150
2021-08-25 15:02:03 -07:00
b5b9ce146f Small fixes to the Contributing.txt (#63385)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63385

Correcting a mistake for the pytorch uninstall, and
adding an extra note for Darwin.

Test Plan: Imported from OSS

Reviewed By: janeyx99, heitorschueroff

Differential Revision: D30530234

fbshipit-source-id: e0f88a1725eeadabfb4b28c1da11e369ee878ab4
2021-08-25 14:50:37 -07:00
52ebe7e14e Back out "Temporary fix for remote gpu execution issue" (#63983)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63983

Tests the fixes in D30545351. It should resolve the issue of the remote-execution flag being populated incorrectly.

Test Plan: CI

Reviewed By: malfet, seemethere

Differential Revision: D30549443

fbshipit-source-id: b3895909f5cd654ba163b77950872b332fbad3fe
2021-08-25 14:37:01 -07:00
5b548f6f64 Shape Propagation Pass: Fix AdaptiveAveragePooling2d (#63629)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63629

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D30461727

Pulled By: priyaramani

fbshipit-source-id: 3873d1d636f79185680b82de06174d8de288c941
2021-08-25 13:13:41 -07:00
ab5cf5a1eb Move existing target determinator to tools (#63809)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63809

This moves out the modulefinder determinator to `tools/testing` since it is supposed to be CI-only. This also simplifies run_test.py a little bit.

Test Plan: Imported from OSS

Reviewed By: malfet, seemethere, janeyx99

Differential Revision: D30497438

Pulled By: driazati

fbshipit-source-id: 1d203037af5af6a20c1e7812da935e7cbb5cd82f
2021-08-25 13:03:53 -07:00
7edeead796 Add a comment on the potential implicit type up-casting (#63905)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63905

as title
ghstack-source-id: 136590703

Test Plan: N/A

Reviewed By: mrshenli

Differential Revision: D30527929

fbshipit-source-id: 69402bbfa87cfd8fc166ce313cde9736ee072589
2021-08-25 12:47:45 -07:00
b0782f0f32 add BFloat16 support for bernoulli and Dropout on CPU (#56372)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56372

Test Plan: Imported from OSS

Reviewed By: heitorschueroff

Differential Revision: D28836792

Pulled By: VitalyFedyunin

fbshipit-source-id: ede951d172a59276e11383fd767778ab959b5a6b
2021-08-25 12:01:27 -07:00
7299565768 Update torch.distributed.run OMP_NUM_THREADS message to log.warning (#63953)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63953

Closes #61138

Test:
`python -m torch.distributed.run --nproc_per_node 2 test.py`
Still outputs message

`LOGLEVEL=ERROR python -m torch.distributed.run --nproc_per_node 2 test.py`
Does not output message anymore

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse agolynski SciPioneer H-Huang mrzzd cbalioglu gcramer23

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D30542997

Pulled By: H-Huang

fbshipit-source-id: e7da30dcda51516abf4e56f1f510132e44397027
2021-08-25 11:55:06 -07:00
3d4aabfc48 Fix ciflow/all label generation (#63954)
Summary:
The `ciflow/all` label is added automatically, but it needs to be added before we call `gen_root_job_condition`.

- fix the order of adding `ciflow/all`
- refactor all the string into global constants

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63954

Reviewed By: malfet

Differential Revision: D30545596

Pulled By: zhouzhuojie

fbshipit-source-id: 83ab668f0234488afb855a72e3ebd4503f7f1a78
2021-08-25 11:32:32 -07:00
67d8e7b659 Reformat run_test.py (#63808)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63808

`black run_test.py`

Test Plan: Imported from OSS

Reviewed By: seemethere

Differential Revision: D30497437

Pulled By: driazati

fbshipit-source-id: 41b29b73f41fa4bb15fce5eaa69f8efe614e02f7
2021-08-25 11:27:18 -07:00
64d605bab8 [Static Runtime] Added caching for the NNC code generated for Logit. (#63840)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63840

Added NNC generated code for Logit to the cache.

```
Logit NNC Benchmark     w/o cache (ns)   w/ cache (ns)
logit_nnc_sleef/64                 543            536
logit_nnc_sleef/512               3517           3465
logit_nnc_sleef/8192             88483          85881
logit_nnc_sleef/32768           337016         323090
logit_nnc_fast/64                  167            163
logit_nnc_fast/512                 866            817
logit_nnc_fast/8192              13069          12801
logit_nnc_fast/32768             53429          52530
logit_nnc_vml/64                   164            151
logit_nnc_vml/512                  783            769
logit_nnc_vml/8192               11563          11674
logit_nnc_vml/32768              46720          46452
```

Test Plan: Unit tests and inline_cvr model.

Reviewed By: hlu1

Differential Revision: D30405424

fbshipit-source-id: 938b1b74758e2612ae151bac890c5f8ebbc42d50
2021-08-25 11:19:58 -07:00
dde07cad6f [Static Runtime] Added a variable for clamp in the NNC code for Logit. (#63839)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63839

Replaced the use of a constant for clamp in the NNC code for Logit
with a variable. This makes it easier to enable caching for Logit.

There is no performance difference with this change, as shown in the micro-benchmarks below.

```
Logit NNC Benchmark     const-clamp (ns)   var-clamp (ns)
logit_nnc_sleef/64                   550             543
logit_nnc_sleef/512                 3514            3517
logit_nnc_sleef/8192               85537           82900
logit_nnc_sleef/32768             347635          337016
logit_nnc_fast/64                    173             167
logit_nnc_fast/512                   829             866
logit_nnc_fast/8192                13286           13069
logit_nnc_fast/32768               51116           53429
logit_nnc_vml/64                     146             164
logit_nnc_vml/512                    773             783
logit_nnc_vml/8192                 11556           11563
logit_nnc_vml/32768                44815           46720
```

Test Plan: SR unit tests and the inline_cvr model.

Reviewed By: bertmaher

Differential Revision: D30405466

fbshipit-source-id: adb891fdae5746439931ce5f43165291fec08f52
2021-08-25 11:19:55 -07:00
a2399a76e1 [Static Runtime] Moved NNC operator definitions to separate files. (#63838)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63838

Refactored NNC operator definitions code into separate files.

Made `TEWrapper` a class with a fixed set of methods and added separate definitions for them based on `TORCH_ENABLE_LLVM` to keep the same functionality as before.

Test Plan: Build and ran Static Runtime tests.

Reviewed By: hlu1

Differential Revision: D30405467

fbshipit-source-id: 606ef852bb820d5e23a0f8af1bf5dc122e90bceb
2021-08-25 11:18:32 -07:00
8a22d4fa5c [Reland] Replacing the p.data access in utils with tensor.set_ . Passes both test_post_localSGD_optimizer_parity and test_periodic_model_averager tests (#63895)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63895

When updating the model parameter, updating `parameter.data` is no longer recommended, because this `data` field will be deprecated in the future.

The replacement is `tensor.set_`.
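
A minimal sketch of the pattern (tensor names illustrative):

```python
import torch

param = torch.nn.Parameter(torch.zeros(3))
averaged = torch.ones(3)
with torch.no_grad():
    # instead of: param.data = averaged
    param.set_(averaged)
print(param)  # tensor([1., 1., 1.], requires_grad=True)
```
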
ghstack-source-id: 136593433

Test Plan:
buck test mode/dev-nosan //caffe2/test/distributed:distributed_nccl_spawn -- test_periodic_model_averager
buck test mode/dev-nosan //caffe2/test/distributed:distributed_nccl_spawn -- test_post_localSGD_optimizer_parity

Reviewed By: SciPioneer

Differential Revision: D30526178

fbshipit-source-id: a1ac0ec3665d8623edd5bf94f01c1132daff5c00
2021-08-25 11:12:55 -07:00
ab954cb0d1 clean up engine.cpp thread state (#63115)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63115

This actually changes:
- callbacks now run with proper grad mode even in worker threads
- graphtask's Future callbacks now run with proper TLS when erroring
  out from a worker thread

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D30388100

Pulled By: albanD

fbshipit-source-id: 7ae9c461c2f0040548dd9e1e314f25e8da0c2e67
2021-08-25 11:08:43 -07:00
c06dfd7c26 [fx2trt] Check input device in TRTModule (#63893)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63893

Add a check to ensure all the inputs are on cuda device.

Test Plan: CI

Reviewed By: kflu, houseroad

Differential Revision: D30525265

fbshipit-source-id: 6e50b70fd535defc1f802d51e8bb991b2dd73741
2021-08-25 10:25:34 -07:00
6324d98e9e bf16 Error message cleanup as well as addition of is_bf16_supported (#63798)
Summary:
ngimel
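
The new helper can be used to gate bf16 usage; a minimal sketch:

```python
import torch

if torch.cuda.is_available() and torch.cuda.is_bf16_supported():
    x = torch.ones(4, device="cuda", dtype=torch.bfloat16)
else:
    x = torch.ones(4, dtype=torch.bfloat16)  # bf16 on CPU as a fallback
```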

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63798

Reviewed By: heitorschueroff

Differential Revision: D30526187

Pulled By: ngimel

fbshipit-source-id: c484aec14638097c96c720095d3491249b6b2d14
2021-08-25 09:59:59 -07:00
eebac46282 [pruner] add getter for pruned outputs in base pruner (#63520)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63520

Rather than having to call `module.parametrizations.weight[0].pruned_outputs` each time we need to access the set of pruned indices, we add a getter `get_module_pruned_outputs` which takes the module as an argument and returns the set.

This is used for testing.
ghstack-source-id: 136561130

Test Plan:
` buck test mode/dev-nosan //caffe2/test:ao -- TestBasePruner`

https://pxl.cl/1N4gK

Reviewed By: z-a-f

Differential Revision: D30374558

fbshipit-source-id: e38dfee0879cadde52b942e899a3d8d7151ee493
2021-08-25 09:57:29 -07:00
83b132b112 [pruner] add support for pruning BatchNorm2d (#63519)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63519

If the pruner is pruning biases along with weights, and the model has BatchNorm2d following pruned Conv2d layers, then the corresponding channels of the BatchNorm must also be pruned.

Specifically, they need to be zeroed out rather than fully removed, since in eager mode the dimensions between layers need to be preserved.

To do this, we add a pruning parametrization called `ZeroesParametrization`, which zeroes out pruned channels rather than removing them.

The user must provide, in the config, a tuple of the Conv2d and BatchNorm layers that go together. The `prepare` method will add the tuple to the `module_groups`; then it will add a PruningParametrization to the Conv2d layer and a ZeroesParametrization to the BatchNorm, and set their pruned sets to be the same set. That way, during `step`, both masks are updated with the same pruned indices.

ghstack-source-id: 136562278

Test Plan:
`buck test mode/dev-nosan //caffe2/test:ao -- TestBasePruner`

https://pxl.cl/1N1P6

Reviewed By: z-a-f

Differential Revision: D30349855

fbshipit-source-id: 3199d3688d5a70963f9b32d7a8fdac3962ae6a65
2021-08-25 09:56:19 -07:00
c1dfd58715 Minor OptionalTensorRef updates (#63611)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63611

A few minor updates to `OptionalTensorRef`:
1. use `Tensor`'s `unsafe_borrow_t` constructor, which avoids an unnecessary `nullptr` check.
2. copy constructor cannot defer to the `const Tensor&` constructor because it checks the tensor is
defined, and so would fail for disengaged optionals.
3. use copy-swap idiom to avoid issues with self-assignment. `x = x` should be a no-op, but the old
version would clear `x`.
4. Add pointer-like access for consistency with `optional` and `MaybeOwned`

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D30484704

Pulled By: ezyang

fbshipit-source-id: 738f4bd22359eaecd0a519a04e89a4b44d92da5b
2021-08-25 09:37:02 -07:00
5ab356ffe6 Update CMake minimum version to 3.10 (#63660)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63660

Test Plan: Imported from OSS

Reviewed By: janeyx99, mruberry

Differential Revision: D30543878

fbshipit-source-id: a7d938807653f39727f2cc7d7ca167200567b6a0
2021-08-25 09:25:43 -07:00
34ed16ffef Temporary fix for remote gpu execution issue (#63899)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63899

See: T99020845

Test Plan: sandcastle

Reviewed By: heitorschueroff

Differential Revision: D30527384

fbshipit-source-id: ce9933e5e181322c02d4ed17f3fdaabe4c5ba29e
2021-08-25 09:14:03 -07:00
01c35115d8 Fix bug in check_empty_containers (#63492)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63492

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D30402749

Pulled By: ansley

fbshipit-source-id: 7de533355fe91ca4f45b2bafc3bfb205a028c1ed
2021-08-25 09:05:08 -07:00
8c897d254d Swap CUDA 11.1 and 11.3 in CI to make 11.1 periodic (#63900)
Summary:
Preparing for supporting 11.3 in the next release.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63900

Reviewed By: malfet

Differential Revision: D30541437

Pulled By: janeyx99

fbshipit-source-id: a7297da7f7818a4291b1c321d62d76fc2c0f1f90
2021-08-25 09:01:26 -07:00
3926fdbaa4 [skip ci] Add generated comment to ruleset json (#63896)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63896

Reviewed By: heitorschueroff

Differential Revision: D30529820

Pulled By: zhouzhuojie

fbshipit-source-id: 7529803af23ea36a7bcb673cd399da80da8e3feb
2021-08-25 08:53:33 -07:00
87a661c79f Revert D30526034: [pytorch][PR] compute reduction intermediate buffer size in elements
Test Plan: revert-hammer

Differential Revision:
D30526034 (e69a1398cb)

Original commit changeset: 0aca7f887974

fbshipit-source-id: a22472723818d6fe0c11a6e134080df1ac408038
2021-08-25 07:17:22 -07:00
839eaa2e91 Revert D30384746: [fx2trt] Add a test for quantized resnet18
Test Plan: revert-hammer

Differential Revision:
D30384746 (10dfa58eba)

Original commit changeset: 1a8638777116

fbshipit-source-id: b93235323e229b391f5456f6e3543988062dd0d4
2021-08-25 00:43:06 -07:00
10dfa58eba [fx2trt] Add a test for quantized resnet18 (#63446)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63446

Add a test for quantized resnet18 running in TensorRT

Test Plan: buck run mode/opt -c python.package_style=inplace caffe2:fx2trt_quantized_resnet_test

Reviewed By: 842974287

Differential Revision: D30384746

fbshipit-source-id: 1a863877711618cd23d887694269ed9e44ee606c
2021-08-24 21:34:23 -07:00
0301c3bc01 [quant][graphmode][fx] Make maxpool and flatten produce the reference pattern (#63501)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63501

Currently some of the ops are considered to work with both float and quantized input,
so we may produce patterns like "quant - some_op - dequant". This might not work well with the backend;
we may consider changing everything to produce "quant - dequant - some_op - quant - dequant" instead
in the future. This PR fixes it for maxpool and flatten only, to unblock resnet benchmarking on TensorRT.

Test Plan:
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: mruberry

Differential Revision: D30402788

fbshipit-source-id: 892c5ff6552775070e2c1453f65846590fb12735
2021-08-24 21:31:01 -07:00
d388a1a5df [TensorExpr] LLVMCodegen: Use addFnAttr instead of addAttribute which was deleted. (#63886)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63886

cc gmagogsfm

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D30523135

Pulled By: ZolotukhinM

fbshipit-source-id: 62e125f917b2a0153eb30879d93cf956587a05e0
2021-08-24 21:23:06 -07:00
c8527bc398 [qunat][graphmode][fx] Add a separate lower_to_native_backend function for relu (#62861)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62861

This PR adds a lower_to_native_backend function to lower a quantized reference model
to a model that uses fbgemm/qnnpack ops. We'll gradually add support and remove
the fbgemm/qnnpack specific handling in quantization_patterns.py

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D30165828

fbshipit-source-id: de1149cd7e7c1840c17c251cd4d35004afd015b7
2021-08-24 21:07:03 -07:00
e69a1398cb compute reduction intermediate buffer size in elements (#63885)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/63869
`iter` strides are in bytes, and we are additionally multiplying the size computed from those strides by `sizeof(arg_t)`. Computing `output_memory_size` in elements should be enough.
This doesn't fix the still-real problem of allocating a large intermediate tensor, but it makes that tensor smaller, typically by a factor of 4.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63885

Reviewed By: mruberry

Differential Revision: D30526034

Pulled By: ngimel

fbshipit-source-id: 0aca7f887974b7776e380463bbd82d32a5786ee8
2021-08-24 19:39:21 -07:00
ba126df614 TST Adds more modules into common module tests (#62999)
Summary:
This PR moves some modules into `common_modules` to see what it looks like.

While migrating some no-batch modules into `common_modules`, I noticed that `desc` is not used for the name. This means we cannot use `-k` to filter tests. This PR moves the sample generation into `_parametrize_test` and passes the already-generated `module_input` into users of `modules(modules_db)`.

I can see this is a little different from opsinfo and would be happy to revert to the original implementation of `modules`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62999

Reviewed By: heitorschueroff

Differential Revision: D30522737

Pulled By: jbschlosser

fbshipit-source-id: 7ed1aeb3753fc97a4ad6f1a3c789727c78e1bc73
2021-08-24 19:16:32 -07:00
544af391b5 Allow arbitrary objects in state_dicts (#62976)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/62094

Introduces functionality for adding arbitrary objects to module state_dicts. To take advantage of this, the following functions can be defined on a module:
* `get_extra_state(self) -> dict` - Returns a dict defining any extra state this module wants to save
* `set_extra_state(self, state)` - Subsumes the given state within the module

In the details, a sub-dictionary is stored in the state_dict under the key `_extra_state` for each module that requires extra state.
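
A minimal sketch of the hooks (the `version` field is an invented example):

```python
import torch.nn as nn

class MyModule(nn.Module):
    def __init__(self):
        super().__init__()
        self.version = 3  # non-tensor state we want round-tripped

    def get_extra_state(self):
        return {"version": self.version}

    def set_extra_state(self, state):
        self.version = state["version"]

m = MyModule()
m.version = 7
m.load_state_dict(MyModule().state_dict())
assert m.version == 3  # restored via the "_extra_state" entry
```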

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62976

Reviewed By: heitorschueroff

Differential Revision: D30518657

Pulled By: jbschlosser

fbshipit-source-id: 5fb35ab8e3d36f35e3e96dcd4498f8c917d1f386
2021-08-24 19:06:14 -07:00
58ef99bd5a TST Adds pickle testing for ModuleInfo (#63736)
Summary:
Follow up to https://github.com/pytorch/pytorch/pull/61935

This PR adds `test_pickle` to `test_modules`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63736

Reviewed By: heitorschueroff

Differential Revision: D30522462

Pulled By: jbschlosser

fbshipit-source-id: a03b66ea0d81c6d0845c4fddf0ddc3714bbf0ab1
2021-08-24 19:04:46 -07:00
8dda299d96 Re-apply: [nnc] Support thread level parallelism in fused kernels (#63776)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63776

I reverted this out of an abundance of caution because some test
failures occurred, but they were all due to precision issues fixed lower in
this stack.  Let's try again.

I've rolled the elimination of the allow-parallelism-in-fusions toggle into
this diff since they're pretty tightly coupled.
ghstack-source-id: 136529847

Test Plan: CI

Reviewed By: huiguoo

Differential Revision: D30484555

fbshipit-source-id: 38fd33520f710585d1130c365a8c60c9ce794a59
2021-08-24 18:56:55 -07:00
1787b905c4 Don't switch executors mid test (#63830)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63830

It's really not safe to change the executor out from under models that may have
already been partially compiled.
ghstack-source-id: 136526228

Test Plan:
```
DEBUG=1 CFLAGS="-fsanitize=address" CXXFLAGS="-fsanitize=address" USE_LLVM=$(realpath ../llvm-project/install) CMAKE_PREFIX_PATH=$CONDA_PREFIX python setup.py install
LD_PRELOAD=/lib64/libasan.so.5 numactl -C3 pytest -v --cov --cov-report xml:test/coverage.xml --cov-append onnx/test_pytorch_onnx_onnxruntime.py::TestONNXRuntime_opset11 -s
```

Reviewed By: desertfire

Differential Revision: D30504489

fbshipit-source-id: 188581cb53f0cf5bd3442d1e9d46e8c0c7e124f8
2021-08-24 18:56:53 -07:00
543130511a [nnc] Disable erf and erfc (#63775)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63775

These introduce small accuracy differences that cause some internal
tests to fail, and it's not worth fixing the tests right now because they're
slower than the ATen ops anyways.
ghstack-source-id: 136526229

Test Plan:
```
buck test mode/dev //aml/eccv/mcm/training:tests -- --exact 'aml/eccv/mcm/training:tests - test_build_torch_script_model (aml.eccv.mcm.training.tests.publish_helper_tests.TransformerPredictorPublishHelperTests)'
```

Reviewed By: navahgar

Differential Revision: D30484557

fbshipit-source-id: 095a9c810539a499105b76e1d96843dbc61b0079
2021-08-24 18:55:45 -07:00
d454c9e76e Migrate THCTensor_copyIgnoringOverlaps to ATen (#63505)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63505

This isn't a public operator, just a helper function used in CUDA_tensor_apply.

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D30441305

Pulled By: ngimel

fbshipit-source-id: 84fabc701cbd8479e02d80f373a3dd62d70df2ce
2021-08-24 18:50:28 -07:00
5b28e3c183 [quant][graphmode][fx] Add reference option support for binary ops (#62698)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62698

We also removed the special handling in match_utils for binary ops

Test Plan:
python test/test_quantize.py TestQuantizeFx
python test/test_quantize.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D30093781

fbshipit-source-id: 58cc972de8211a80dd4d111e25dc4ad36057933f
2021-08-24 18:22:11 -07:00
6fa646ad54 [StaticRuntime] Fix bug in HasInplaceOp (#63842)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63842

Reviewed By: mikeiovine

Differential Revision: D30506914

fbshipit-source-id: b2e358cfb991dacdb295b61bbc37beb36b73b852
2021-08-24 17:07:45 -07:00
956c8fa01e Microbenchmarking matrix mult (einsum, torch.mul, torch.mm) (#63654)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63654

Test Plan:
```
> buck run mode/opt caffe2/benchmarks/operator_benchmark/pt:matrix_mult_test

# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: einsum_bmm
# Mode: Eager
# Name: einsum_bmm_B4_M5_N3_K2_cpu
# Input: B: 4, M: 5, N: 3, K: 2, device: cpu
Forward Execution Time (us) : 27.970

# Benchmarking PyTorch: einsum_bmm
# Mode: Eager
# Name: einsum_bmm_B32_M25_N20_K30_cpu
# Input: B: 32, M: 25, N: 20, K: 30, device: cpu
Forward Execution Time (us) : 41.830

# Benchmarking PyTorch: einsum_bmm
# Mode: Eager
# Name: einsum_bmm_B128_M100_N120_K110_cpu
# Input: B: 128, M: 100, N: 120, K: 110, device: cpu
Forward Execution Time (us) : 499.114

# Benchmarking PyTorch: bmm
# Mode: Eager
# Name: bmm_B4_M5_N3_K2_cpu
# Input: B: 4, M: 5, N: 3, K: 2, device: cpu
Forward Execution Time (us) : 6.268

# Benchmarking PyTorch: bmm
# Mode: Eager
# Name: bmm_B32_M25_N20_K30_cpu
# Input: B: 32, M: 25, N: 20, K: 30, device: cpu
Forward Execution Time (us) : 12.676

# Benchmarking PyTorch: bmm
# Mode: Eager
# Name: bmm_B128_M100_N120_K110_cpu
# Input: B: 128, M: 100, N: 120, K: 110, device: cpu
Forward Execution Time (us) : 438.219

# Benchmarking PyTorch: einsum_elementwise
# Mode: Eager
# Name: einsum_elementwise_B4_M5_N3_cpu
# Input: B: 4, M: 5, N: 3, device: cpu
Forward Execution Time (us) : 7.657

# Benchmarking PyTorch: einsum_elementwise
# Mode: Eager
# Name: einsum_elementwise_B32_M25_N20_cpu
# Input: B: 32, M: 25, N: 20, device: cpu
Forward Execution Time (us) : 18.523

# Benchmarking PyTorch: einsum_elementwise
# Mode: Eager
# Name: einsum_elementwise_B100_M90_N110_cpu
# Input: B: 100, M: 90, N: 110, device: cpu
Forward Execution Time (us) : 55.103

# Benchmarking PyTorch: mul
# Mode: Eager
# Name: mul_B4_M5_N3_cpu
# Input: B: 4, M: 5, N: 3, device: cpu
Forward Execution Time (us) : 2.501

# Benchmarking PyTorch: mul
# Mode: Eager
# Name: mul_B32_M25_N20_cpu
# Input: B: 32, M: 25, N: 20, device: cpu
Forward Execution Time (us) : 10.589

# Benchmarking PyTorch: mul
# Mode: Eager
# Name: mul_B100_M90_N110_cpu
# Input: B: 100, M: 90, N: 110, device: cpu
Forward Execution Time (us) : 50.102
```

Reviewed By: ajyu

Differential Revision: D30455179

fbshipit-source-id: 9f2d92b2d2b860f41a8e59be2cc086d75b587f7b
2021-08-24 16:26:26 -07:00
6d58c83007 Turn off layer norm in jit symbolic differentiation (#63816)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63816

Test Plan:
Confirmed this can rescue the NE:

https://www.internalfb.com/mast/job/torchx_xdwang-SparseNNApplication_72cf593d

Reviewed By: ngimel

Differential Revision: D30498746

fbshipit-source-id: 4a387f32ee2f70685de6104459c7f21bfbddc187
2021-08-24 15:47:13 -07:00
41ffec07ce Add a common autograd TLS state (#63860)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63860

Test Plan: Imported from OSS

Reviewed By: heitorschueroff

Differential Revision: D30513253

Pulled By: albanD

fbshipit-source-id: 97d76ed54dfbdf4ba3fc7051ce3b9bb636cefb4b
2021-08-24 15:34:06 -07:00
865d127a66 .github: Enable with-ssh for Windows (#63440)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63440

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: janeyx99

Differential Revision: D30521460

Pulled By: seemethere

fbshipit-source-id: e987e170e73fb4f9d9f024bed0e58404ed206848
2021-08-24 14:14:27 -07:00
4e37a015c7 [FX] Fix _replicate_for_data_parallel (#63821)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63821

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D30502115

Pulled By: jamesr66a

fbshipit-source-id: 0f004f95def6e1ba21ccbeab40cb0a739a0ad20c
2021-08-24 13:48:15 -07:00
5be17ec1fc Do not modify saved variables in-place for spectral norm during power iteration (#62293)
Summary:
Interestingly enough, the original code did have a mechanism that aims to prevent this very issue,
but it performs a clone AFTER modifying u and v in-place.
This doesn't work because we can later use the cloned u and v in operations that save tensors for backward, and the next time we execute forward, we modify the same cloned u and v in-place.
So if the goal is to avoid modifying a saved variable in-place, we should clone it BEFORE the in-place operation.
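
For intuition, here is a minimal sketch of the aliasing hazard (illustrative only, not the actual spectral-norm code):
```
import torch

# An in-place op mutates every alias of a tensor, so anything "saved"
# before the mutation silently sees the new values.
u = torch.randn(3)
saved = u                      # pretend autograd saved this for backward
u.add_(1.0)                    # in-place power-iteration style update
print(torch.equal(saved, u))   # True: `saved` was corrupted

# Cloning BEFORE the in-place op protects whatever was saved earlier.
v = torch.randn(3)
saved = v
v = v.clone()                  # detach from the saved storage first...
v.add_(1.0)                    # ...then mutate freely
print(torch.equal(saved, v))   # False: `saved` is intact
```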

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62293

Reviewed By: bdhirsh

Differential Revision: D30489750

Pulled By: soulitzer

fbshipit-source-id: cbe8dea885aef97adda8481f7a822e5bd91f7889
2021-08-24 13:08:59 -07:00
4a0776100e Migrate legacy lstsq from THC to ATen (CUDA) (#63504)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63504

Closes gh-24592

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D30441304

Pulled By: ngimel

fbshipit-source-id: ec176596f54bc084af48a73d1dbb0dcb82fec593
2021-08-24 12:47:16 -07:00
699c764d2e Revert D30513613: Removing tensor.data usage in utils with tensor set_ method
Test Plan: revert-hammer

Differential Revision:
D30513613 (d08a36f831)

Original commit changeset: 402efb9c30fa

fbshipit-source-id: 911c66a9852de77dc5274b5fb373258c0c97739a
2021-08-24 12:20:37 -07:00
835dac0869 Merge common fields from TensorInitParams and ShardedTensorMetadata into TensorProperties (#63731)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63731
1) Follow-up to the [last comment on PR #63378](https://github.com/pytorch/pytorch/pull/63378#discussion_r693143053)
2) Also updated the caller side (usage of ShardedTensorMetadata) in fbcode

Ref: [landing workflow 3](https://www.internalfb.com/intern/wiki/PyTorch/PyTorchDev/Workflow/Landing/#landing-your-prs-from-gi-1)

Test Plan:
Imported from OSS

OSS: (pytorch).. $ python test/distributed/_sharded_tensor/test_sharded_tensor.py --v
FB:  fbcode $ buck test mode/dev //aiplatform/modelstore/checkpointing/pyper/tests:checkpoint_utils_test

Reviewed By: wanchaol, heitorschueroff

Differential Revision: D30472281

fbshipit-source-id: 727fb0e7f10eab4eb7a10476194e9008f2ac1fb5
2021-08-24 11:49:06 -07:00
d08a36f831 Removing tensor.data usage in utils with tensor set_ method (#63867)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63867

When updating a model parameter, writing to `parameter.data` is no longer recommended, because the `data` field will be deprecated in the future.

The replacement is `tensor.set_`.
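
For illustration, a minimal sketch of the new pattern (hypothetical parameter update, not the exact code in this diff):
```
import torch

param = torch.nn.Parameter(torch.zeros(3))
averaged = torch.ones(3)       # e.g. the result of model averaging

# Old style (discouraged): param.data = averaged
# New style: swap in the new values with set_ under no_grad.
with torch.no_grad():
    param.set_(averaged)

print(param)                   # now holds [1., 1., 1.]
```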

ghstack-source-id: 136531233

Test Plan: buck test mode/dev-nosan //caffe2/test/distributed:distributed_nccl_spawn -- test_periodic_model_averager

Reviewed By: SciPioneer

Differential Revision: D30513613

fbshipit-source-id: 402efb9c30fafc3f285bebc631639f656ceae585
2021-08-24 11:20:44 -07:00
73431449b3 update readme and contributing.md (#63843)
Summary:
1. In fact, Visual Studio isn't supported as a CMake generator
2. I was asked many times why there's an error like 'Could NOT find OpenMP'
3. Add the newly added Best Practices link to CONTRIBUTING.md

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63843

Reviewed By: seemethere, heitorschueroff

Differential Revision: D30514095

Pulled By: janeyx99

fbshipit-source-id: 76715a1d8c049122546e5a7778cafe54e4dfd5d6
2021-08-24 10:52:11 -07:00
e6dc7bc61b Subprocess encoding fixes for cpp extension (#63756)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/63584

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63756

Reviewed By: bdhirsh

Differential Revision: D30485046

Pulled By: ezyang

fbshipit-source-id: 4f0ac383da4e8843e2a602dceae85f389d7434ee
2021-08-24 10:46:11 -07:00
14d4723abd add bf16 support for bucketize (#55588)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55588

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D28836796

Pulled By: VitalyFedyunin

fbshipit-source-id: c9ae5b969c30a45473533be5f29bb497f8da5143
2021-08-24 10:31:42 -07:00
1256dcd509 [pruner] modify base pruner to prune bias by default (#63202)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63202

By default, the pruner will also prune biases, such that the whole output channel is removed. The user can manually set `also_prune_bias` to False when calling `prepare` if they don't want the bias to be pruned.
ghstack-source-id: 136466671

Test Plan:
`buck test mode/dev-nosan //caffe2/test:ao -- TestBasePruner`

https://pxl.cl/1MV32

modify `fusion_tests` according to API change
`buck test mode/opt //scripts/kazhou:fusion_tests`

https://pxl.cl/1NbKz

Reviewed By: z-a-f

Differential Revision: D30294494

fbshipit-source-id: c84655648bee0035559195ca855b98fb7edaa134
2021-08-24 10:25:45 -07:00
16ba20507a [pruner] amend base pruner API to match base sparsifier (#63178)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63178

Update base pruner API to match base sparsifier API as defined in D28970960 / PR58955

Changes include:
- `enable_mask_update = True` in `__init__`
- `prepare` takes model and config instead of constructor
- convert functionality renamed to `squash_mask`, `convert` method call now raises Error
- `activation_handles` and `bias_handles` initialized in `_prepare` instead of the constructor
ghstack-source-id: 136467595

Test Plan:
Function names updates according to changes

`buck test mode/dev-nosan //caffe2/test:ao -- TestBasePruner`

https://pxl.cl/1MTgH

TODO will need to modify `fbcode/scripts/kazhou/fusion_tests.py` to use new API

Reviewed By: z-a-f

Differential Revision: D30287179

fbshipit-source-id: d4727bea1873b500f2d4bb784db26d532bf26cce
2021-08-24 10:25:43 -07:00
5dee15401c [pruner] refactor ActivationReconstruction forward hooks (#63158)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63158

Combined functionality for `ActivationReconstruction` for both Linear and Conv2d in one class. The only difference between the old classes was the size and indexing of the reconstructed tensor -- that logic can be generalized by iterating over the size of `output`.
ghstack-source-id: 136467465

Test Plan:
`buck test mode/dev-nosan //caffe2/test:ao -- TestBasePruner`

https://pxl.cl/1MSSv

Reviewed By: raghuramank100

Differential Revision: D30282765

fbshipit-source-id: 08a1e4e0650511019fff85cf52b41dd818b0c7f8
2021-08-24 10:24:29 -07:00
7774a4e95b [Static Runtime] Implement prim::VarStack out variant (#63579)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63579

Provide a static runtime out variant implementation for the new op introduced in D30426232 (1385f9fb12).

Test Plan: `buck test //caffe2/benchmarks/static_runtime:static_runtime_cpptest -- IndividualOps_VarStack`

Reviewed By: navahgar

Differential Revision: D30410525

fbshipit-source-id: bc59a3d8ad23e3d94561ec2dca9cc20687dbadf8
2021-08-24 09:44:29 -07:00
227cb268bc [Reland] Embedding thrust->cub migration (#63806)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/63427

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63806

Reviewed By: bdhirsh

Differential Revision: D30498255

Pulled By: ngimel

fbshipit-source-id: 78b7085a92a168cf0163f53dcb712bac922f5235
2021-08-24 09:30:32 -07:00
94d621584a optimize BFloat16 elemwise operators CPU: sigmoid, sigmoid_backward, tanh_backward, addcmul, addcdiv (#55221)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55221

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D28836797

Pulled By: VitalyFedyunin

fbshipit-source-id: 6b79098c902ffe65d228668118ef36fb49bab800
2021-08-24 08:56:17 -07:00
33a163d886 Enable BFloat16 LeakyReLU and RReLU in CPU path (#61514)
Summary:
Enable and optimize BFloat16 LeakyReLU and RReLU in CPU path.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61514

Reviewed By: ejguan

Differential Revision: D30257612

Pulled By: VitalyFedyunin

fbshipit-source-id: 8cc0d1faacd02dcc9827af724a86d95b6952748f
2021-08-24 08:34:56 -07:00
2ca2761f3c ENH Adds no_batch_dim for NLLLoss (#62651)
Summary:
Towards https://github.com/pytorch/pytorch/issues/60585

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62651

Reviewed By: VitalyFedyunin

Differential Revision: D30303340

Pulled By: jbschlosser

fbshipit-source-id: 7ab478cf63bf6cd1f850cad5fd101e74a2cfe3f5
2021-08-24 08:27:27 -07:00
d3be02d100 fix batchnorm2d issue when input is non contiguous (#63392)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63392

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D30476317

Pulled By: VitalyFedyunin

fbshipit-source-id: 03055a0aec21cf2c029b6f32315da2b09cb722d0
2021-08-24 08:24:01 -07:00
1385f9fb12 [JIT] Add variadic stack op (#63578)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63578

Added a new op `prim::VarStack` and a pass that transforms instances of `aten::stack(list, dim)` into `prim::VarStack(list[0], ..., list[n], dim)`. Also provided a JIT interpreter implementation.

Most of the implementation/tests are the same as `prim::VarConcat`.
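
In eager terms the two forms are equivalent; the variadic op just lets the optimizer drop the intermediate list construction (sketch):
```
import torch

a, b = torch.randn(2, 3), torch.randn(2, 3)

# aten::stack consumes a list the graph must first build via ListConstruct;
# prim::VarStack conceptually takes the unpacked tensors plus dim directly,
# i.e. VarStack(a, b, 0), producing the same result:
out = torch.stack([a, b], dim=0)
assert out.shape == (2, 2, 3)
```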

Test Plan: `buck test caffe2/test/cpp/jit:jit -- TestStackOpt`

Reviewed By: navahgar

Differential Revision: D30426232

fbshipit-source-id: 9829a7db6e0a5038c9b7528c43c25b0c221aa2ce
2021-08-24 08:20:54 -07:00
f4aff3a346 [BE] add distributed run_test options (#63147)
Summary:
Currently, distributed tests are mixed in with test_python.
We would like to run the distributed tests as their own batch, so we need to split them out.

This adds an option to include/exclude distributed tests via CUSTOM_HANDLERS.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63147

Test Plan:
- locally run with the additional run_test.py options.
- CI

Dependency: found a bug in mpiexec test and need https://github.com/pytorch/pytorch/issues/63580 to fix it first.

Reviewed By: bdhirsh

Differential Revision: D30496178

Pulled By: walterddr

fbshipit-source-id: 7903a57b619f2425028028f944211938823918a6
2021-08-24 08:03:01 -07:00
688f06cac3 Revert D30388099: Add a common autograd TLS state
Test Plan: revert-hammer

Differential Revision:
D30388099 (83d9bad44a)

Original commit changeset: 8e03f940150f

fbshipit-source-id: f6d60fec66e8292f5268335bb8a3e7e1a662f23b
2021-08-24 07:22:39 -07:00
9914fb6615 ENH Adds no_batch_dim tests/docs for LPPool1d and Identity (#62190)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/60585

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62190

Reviewed By: ejguan

Differential Revision: D29942385

Pulled By: jbschlosser

fbshipit-source-id: 00df6f6f01ad039631bb8679f8de94863aac7650
2021-08-24 06:59:41 -07:00
83d9bad44a Add a common autograd TLS state (#63114)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63114

This PR collapses the GradMode and InferenceMode thread-local booleans into a single thread-local uint8.
This helps reduce the number of thread-local variable accesses done when we propagate ThreadLocalStates.

Note that this is even more beneficial as we will add a forward-mode AD TLS (similar to GradMode) higher in this stack, and this new structure should reduce the perf impact of adding that new TLS.

Here is the full benchmark result between master and the top of this stack: https://gist.github.com/albanD/e421101e9ed344e94999bef3a54bf0f3
tl;dr: it gives a benefit in most cases and is never detrimental.
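
The real implementation is C++ thread-local state, but the bit-packing idea looks roughly like this (illustrative sketch; the mask names are made up):
```
# Two booleans packed into one byte: a single TLS read fetches both modes.
GRAD_MODE_MASK = 0x01
INFERENCE_MODE_MASK = 0x02

state = 0x00  # stands in for the single thread-local uint8

def set_flag(state, mask, enabled):
    return state | mask if enabled else state & ~mask

state = set_flag(state, GRAD_MODE_MASK, True)
assert state & GRAD_MODE_MASK            # GradMode on
assert not state & INFERENCE_MODE_MASK   # InferenceMode off
```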

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D30388099

Pulled By: albanD

fbshipit-source-id: 8e03f940150ff063c2edd792733663413ae2f486
2021-08-24 06:54:02 -07:00
c545b099aa Separating quantization test from distributed_test (#63058)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63058

Dedicating separate tests for different quantization methods. Currently supporting FP16 method.
ghstack-source-id: 136499767

Test Plan: buck test mode/dev //caffe2/test/distributed/algorithms/quantization:quantization_gloo_fork -- name_of_the_test

Reviewed By: wanchaol

Differential Revision: D30142580

fbshipit-source-id: 3aacec1a231a662067d2b48c001f0c69fefcdd60
2021-08-24 01:44:55 -07:00
f0d274294d [TensorExpr] Nuke KernelArena and KernelScope. (#63587)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63587

Now that there are no classes using KernelArena for memory management, we
can remove it.

Differential Revision: D30429115

Test Plan: Imported from OSS

Reviewed By: navahgar

Pulled By: ZolotukhinM

fbshipit-source-id: 375f6f9294d27790645eeb7cb5a8e87047a57544
2021-08-24 00:32:16 -07:00
62d02f2b57 [TensorExpr] Make 'Tensor' a value type. (#63586)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63586

This is another commit in transition from KernelArena memory management.
Tensor is essentially just a pair of <BufPtr, StmtPtr> and we don't need
to dynamically allocate it at all - it's cheap to pass it by value, and
that's what we're switching to in this commit.

After this change nothing uses KernelScope/KernelArena and they can be
safely removed.

Differential Revision: D30429114

Test Plan: Imported from OSS

Reviewed By: navahgar

Pulled By: ZolotukhinM

fbshipit-source-id: f90b859cfe863692b7beffbe9bd0e4143df1e819
2021-08-24 00:32:13 -07:00
4e15a6f495 [TensorExpr] Switch Exprs and Stmt from kernel-arena to shared_ptr. (#63216)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63216

Currently there are three classes managed by KernelArena: Expr, Stmt, and Tensor (and derived classes). KernelArena has been a long-standing pain point for NNC devs, and we're moving away from that memory-management model to a ref-count-based model (using shared_ptr). This commit switches Expr and Stmt to shared_ptr and is the biggest change in this transition. Later commits will detach Tensor from KernelArena and kill the arena + scope altogether.

Differential Revision: D30353195

Test Plan: Imported from OSS

Reviewed By: navahgar

Pulled By: ZolotukhinM

fbshipit-source-id: 9575225ada3d0fb65087ae40435f3dfea4792cae
2021-08-24 00:32:11 -07:00
dd96c26066 [TensorExpr] More NFC changes like Expr* -> ExprPtr. (#63778)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63778

This is a preparation for a switch from raw pointers to shared pointers
as a memory model for TE expressions and statements.

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D30487425

Pulled By: ZolotukhinM

fbshipit-source-id: 9cbe817b7d4e5fc2f150b29bb9b3bf578868f20c
2021-08-24 00:30:49 -07:00
5b7cdc5a3d add channels last for GroupNorm (#49821)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49821

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D26007053

Pulled By: VitalyFedyunin

fbshipit-source-id: 34a48d5d3b66a159febf3c3d96748fbaba1b9e31
2021-08-23 22:54:59 -07:00
f5d585391d Add ROCm as a platform for which tests can be disabled (#63813)
Summary:
Realized we were missing ROCm as a platform on which one could disable a flaky test. (like how this issue specifies windows https://github.com/pytorch/pytorch/issues/61655)

cc jeffdaily sunway513 jithunnair-amd ROCmSupport

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63813

Reviewed By: seemethere

Differential Revision: D30498478

Pulled By: janeyx99

fbshipit-source-id: f1abe8677e1ddd01de3291e1618272ad8e287dc4
2021-08-23 18:50:04 -07:00
d96ef8c1b1 [Static Runtime] SR clones graph input (#63704)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63704

Previously SR did not clone the graph. This was leading to subtle bugs in `testStaticRuntime`; static runtime would modify its graph, and the graph used by the JIT interpreter would change as well. The JIT interpreter would then crash if SR-only ops were added!

Cloning the graph is more consistent with the behavior of the `Module` ctor.

Test Plan: `buck test caffe2/benchmarks/static_runtime/...`

Reviewed By: hlu1

Differential Revision: D30463294

fbshipit-source-id: b771551a1f55f95fde79373b23babcf3e5ddf726
2021-08-23 18:45:41 -07:00
195c60d844 [fx2trt] Add acc op and converter for torch.pow (#63795)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63795

att

Test Plan: buck run mode/opt caffe2/torch/fb/fx2trt:test_binary_ops

Reviewed By: jackm321, wushirong

Differential Revision: D30492488

fbshipit-source-id: 6d615770567b13720316f06fd2f866ea2fdc2995
2021-08-23 18:18:31 -07:00
e1bdebf685 Adding DataLoader2 class as future replacement of DataLoader (#63742)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63742

Supports sharding and batching on the loader level.

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D30494506

Pulled By: VitalyFedyunin

fbshipit-source-id: 6648e09d955055ac38e3a4e3973f701acefca762
2021-08-23 18:09:07 -07:00
fc07489ec5 [BE] Enable PostLocalSGD tests on windows (#63463)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63463

Now that `torch.distributed.optim` gates DistributedOptimizer on RPC availability, local sgd optimizer can be used on windows.
ghstack-source-id: 136437632

Test Plan: Ci

Reviewed By: SciPioneer

Differential Revision: D30358922

fbshipit-source-id: 9b56aebf1075f026637296d338805ad8851c9d40
2021-08-23 17:49:03 -07:00
16a4434422 [BE] Enable functional optim tests for windows (#63462)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63462

Now that `torch.distributed.optim` gates DistributedOptimizer on RPC availability, these tests can be run on windows.
ghstack-source-id: 136437635

Test Plan: CI

Reviewed By: SciPioneer

Differential Revision: D30358923

fbshipit-source-id: 36739bdfe7214789f17de652d30c62c2bc124c73
2021-08-23 17:49:01 -07:00
630ec2e190 [fx_acc] Add mapper for torch.log1p (#63792)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63792

Map `torch.log1p` to `acc_ops.add` + `acc_ops.log`.
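
In eager terms the decomposition is simply the following (sketch); note that `torch.log1p` itself is more accurate for very small x, so the mapping trades a little precision for converter coverage:
```
import torch

x = torch.rand(4)
decomposed = torch.log(torch.add(x, 1.0))   # acc_ops.add + acc_ops.log
assert torch.allclose(torch.log1p(x), decomposed)
```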

Test Plan: buck test mode/opt glow/fb/fx/oss_acc_tracer:test_acc_tracer -- test_log1p

Reviewed By: wushirong

Differential Revision: D30491706

fbshipit-source-id: bcbeddf06131113185d2019cfd7cf5e9193a8a78
2021-08-23 17:48:59 -07:00
e4f44bec27 Fix pocketfft include path in mobile build (#63714)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63714

PocketFFT was disabled for CMake < 3.9, but CMake 3.11 is the first version to support `INCLUDE_DIRECTORIES` as a target property. So updating to CMake 3.10 causes the mobile builds to fail. Instead of limiting the CMake support, this just adds the include directory to the entire target.

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D30498369

Pulled By: malfet

fbshipit-source-id: 83372e29c477c97e7015763b7c29d6d7e456bcef
2021-08-23 17:48:57 -07:00
fc47497905 Simplify ccache instructions in CONTRIBUTING.md (#62549)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62549

When building CUDA files with native CMake support, it will respect the
`CMAKE_CUDA_COMPILER_LAUNCHER` setting. So, there's no need for symlinks.

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D30498488

Pulled By: malfet

fbshipit-source-id: 71c2ae9d4570cfac2a64d777bc95cda3764332a0
2021-08-23 17:47:38 -07:00
d9231dc3df Skip archiving useless build artifacts (#63785)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63785

We currently zip up everything in `build/` which includes a lot of cruft (`.o` files, random things copied in from dependencies, etc). This makes the artifact bigger (slower upload/download times, and takes about 1.5 minutes to archive). This change makes archiving instead take ~15 seconds and removes the 50 second upload to GitHub step that isn't as useful now that we have the HUD PR page that lists out all artifacts.

Test Plan: Imported from OSS

Reviewed By: seemethere, janeyx99

Differential Revision: D30494444

Pulled By: driazati

fbshipit-source-id: 93202dba7387daeb4859a938110b02ff2dc2ccc4
2021-08-23 17:40:01 -07:00
172e5c76ab Fix some memory bugs in onnx passes (#63754)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63754

Running onnx tests with ASAN uncovers several memory errors.  These two are caused by: (1) iterating the uses list of a node after mutation, and (2) accessing the `blocks` attribute of a possibly deleted node.

To reproduce (this is on a CentOS 7 box):
```
DEBUG=1 CFLAGS="-fsanitize=address" CXXFLAGS="-fsanitize=address" USE_LLVM=$(realpath ../llvm-project/install) CMAKE_PREFIX_PATH=$CONDA_PREFIX python setup.py install
LD_PRELOAD=$(realpath /lib64/libasan.so.5) numactl -C3 pytest -v --cov --cov-report xml:test/coverage.xml --cov-append onnx/test_pytorch_onnx_onnxruntime.py::TestONNXRuntime_opset11 -s
```

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D30493939

Pulled By: bertmaher

fbshipit-source-id: e16e19dc9b4c9896e102ca8bf04c8bedfdde87af
2021-08-23 17:31:45 -07:00
fc6dd0bc00 [JIT] Move UseVariadicCat internals (#63577)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63577

Since other variadic ops will have an almost identical implementation, we can generalize the `UseVariadicCat` implementation and put it in a common folder.

Also moved some test utilities that other variadic op tests will likely need.

Test Plan: `buck test caffe2/test/cpp/jit:jit -- ConcatOptTest`

Reviewed By: navahgar

Differential Revision: D30409937

fbshipit-source-id: 925c11c27b58ce98cb8368d2a205e26ba66d3db9
2021-08-23 17:30:36 -07:00
130549d61b Fix typo in NNAPI tests (#63797)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63797

nnapi memory format test has a typo

Test Plan:
pytest test/test_nnapi.py::TestNNAPI

Imported from OSS

Reviewed By: Amyh11325

Differential Revision: D30495473

fbshipit-source-id: 8edad7c01a080847a64a2797e077ec4d6077552a
2021-08-23 16:34:24 -07:00
84890aae35 [Static Runtime] Add an out variant op for aten::abs (#63675)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63675

This change adds an out variant implementation for `aten::abs`.

Test Plan:
- Observed `V0820 14:14:08.880342 101788 impl.cpp:1394] Switch to out variant for node: %3 : Tensor = aten::abs(%a.1)`

- Perf impact: TBD

Reviewed By: hlu1

Differential Revision: D30461317

fbshipit-source-id: 0c0230bd40afe463ae1ccb222c2a1207ebcf4191
2021-08-23 16:25:10 -07:00
55f8f95ad4 fix git diff issue (#63408)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/60111, ideally we should merge this before https://github.com/pytorch/pytorch/issues/63360 but we can also test this with https://github.com/pytorch/pytorch/issues/63360 easily.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63408

Test Plan:
- This is confirmed working with a local test.sh run by setting PR_NUMBER
- should be validated by GHA CI as well

Concern:
- currently GHA CI is consistently running into a proxy 403 rate-limit-exceeded issue. However, the worst case is not generating any git diff files, which is exactly the same as the current behavior.
- depends on https://github.com/pytorch/pytorch/issues/63770.

Reviewed By: driazati, janeyx99

Differential Revision: D30489355

Pulled By: walterddr

fbshipit-source-id: a638b7ae5820f29a7aca6cc40ff390ab253cb174
2021-08-23 15:38:18 -07:00
49be16d50a .github: Add ec2 information as a step (#63784)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63784

Also creates the common.yml.j2 file as a place to store common code
amongst the templates

Should look like:
![image](https://user-images.githubusercontent.com/1700823/130495226-f18b8c0f-1ea7-4097-8bbb-e998fabb71f2.png)

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: malfet, driazati

Differential Revision: D30490682

Pulled By: seemethere

fbshipit-source-id: 18028b4acff938ef54cd6e4877561b2d830a11cf
2021-08-23 15:04:04 -07:00
7946f8a9f6 Rename DataPipe to Op-er (#63325)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63325

Rename each DataPipe to an operation name ending with er. Functional API should remain `verb` such as `read_from_tar` , `shuffle`, ... (Discussed in [here](https://github.com/facebookexternal/torchdata/pull/97#discussion_r688553905))
- Batch -> Batcher
- Collate -> Collator
- Concat -> Concater
- GroupByKey -> ByKeyGrouper (?)
- ListDirFiles -> FileLister
- LoadFilesFromDisk -> FileLoader
- Map -> Mapper
- ReadFilesFromTar -> TarArchiveReader
- ReadFilesFromZip -> ZipArchiveReader
- ReadLinesFromFile -> LineReader
- Shuffle -> Shuffler
- ToBytes -> StreamReader
- Transforms -> Transformer
- Zip -> Zipper

Let me know if you have a better name for any of these DataPipes

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D30466950

Pulled By: ejguan

fbshipit-source-id: 72909dca7b3964ab83b965891f96cc1ecf62d049
2021-08-23 14:36:10 -07:00
a781340bf7 Add equality constraints for some acc opeartions for symbolic inference (#63689)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63689

Test Plan:
buck run mode/opt-clang caffe2/torch/fb/model_transform/experimental:fx_ir_lower_inline_cvr -- \
    --action=lower_and_run \
    --filename=inline_cvr_7x_dec_2020.model \
    --print_glow_glog=True

Reviewed By: jamesr66a

Differential Revision: D30462113

fbshipit-source-id: 0b2a1ce9770561248527d47c07b80112491dc949
2021-08-23 14:11:08 -07:00
0bc7fef406 [Static Runtime] Remove unused fusion patterns (#63636)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63636

Reviewed By: d1jang

Differential Revision: D30446573

fbshipit-source-id: 3abb7f697380f3b4e865b98c594de359b5e26b96
2021-08-23 12:55:09 -07:00
a709ab34a8 [nnc] Re-enable CPU fusion" (#63665)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63665

This reverts commit 125e2d02e575612eb427104e7c67f1c28f090db8.

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D30471646

Pulled By: bertmaher

fbshipit-source-id: 4189869566f03b5f9ada78d78830f6a34946eed6
2021-08-23 12:42:42 -07:00
560cd88195 Kill THCUNN (#63429)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63429

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D30441308

Pulled By: ngimel

fbshipit-source-id: 3ae342a2f8d5c7f8827b637c4055c5d1b0a1be26
2021-08-23 12:07:16 -07:00
db1b27fa8d fix mpi ssh runtime error (#63580)
Summary:
should fix https://github.com/pytorch/pytorch/issues/60756.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63580

Test Plan:
- this CI.
- validated by running on the bionic_cuda container: https://app.circleci.com/pipelines/github/pytorch/pytorch/366632/workflows/478602fb-698f-4210-ac09-d9c61af5c62b/jobs/15472104

Reviewed By: malfet

Differential Revision: D30486472

Pulled By: walterddr

fbshipit-source-id: d83ab88d163d4a468f03961a13d891b658668a7f
2021-08-23 09:45:33 -07:00
98449f5bba hotfix clone issue (#63770)
Summary:
This was discovered during https://github.com/pytorch/pytorch/issues/63408. For some reason, only this checkout action does not correctly set fetch-depth

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63770

Reviewed By: malfet, janeyx99

Differential Revision: D30486110

Pulled By: walterddr

fbshipit-source-id: a67395cca2487407ed0d49c8c89587935ca5f212
2021-08-23 09:30:48 -07:00
f1d865346f [ONNX] add test images to repo (#63717)
Summary:
This is better than the status quo:
* Test doesn't download files from the internet -> faster and more
  reliable.
* Test doesn't leave the git working directory dirty.

Rather than using the original images, I've copied some images from
the pytorch/vision repo. This will keep the tests in the two repos
in sync, while avoiding adding new assets to the vision repo.

See https://github.com/pytorch/vision/pull/4176.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63717

Reviewed By: janeyx99

Differential Revision: D30466016

Pulled By: malfet

fbshipit-source-id: 2c56d4c11b5c74db1764576bf1c95ce4ae714574
2021-08-23 07:43:21 -07:00
bafd875f74 Allow implementing either backward or vjp for Function (#63434)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63434

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D30431968

Pulled By: albanD

fbshipit-source-id: 0bb88664283486a9fd3364e6c3d79442a44625c2
2021-08-23 07:07:11 -07:00
726fd26b3e Update ROCm PyTorch persons of interest (#55206)
Summary:
cc jeffdaily sunway513

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55206

Reviewed By: VitalyFedyunin

Differential Revision: D30296584

Pulled By: dzhulgakov

fbshipit-source-id: 6e5c610cc6b7c7fd58b80fa3f9de31f269341a88
2021-08-22 22:31:09 -07:00
d6133b2fe6 Remove _fork_processes from common_distributed.py (#63711)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63711

This removes `_fork_process` from common_distributed.py and fixes all
other callpoints to use `spawn_process` instead.
ghstack-source-id: 136395719

Test Plan: waitforbuildbot

Reviewed By: xush6528

Differential Revision: D30463834

fbshipit-source-id: 0c09e8a996d0e5b912c8cdd45488a39951bac4db
2021-08-22 18:57:12 -07:00
2289a12f21 Made FuncTorchBatched decompose CompositeImplicitAutograd (#63616)
Summary:
See https://github.com/facebookresearch/functorch/issues/56

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63616

Reviewed By: zou3519

Differential Revision: D30438316

Pulled By: Chillee

fbshipit-source-id: e84446d9f68b87daa0cfff75b3b8a972f36ec85a
2021-08-21 17:14:39 -07:00
e926f75b0b BatchNorm autodiff re-enabled (#57321)
Summary:
Turns on BN in autodiff:

1. outputs an empty tensor for running stats to bypass the autodiff issue with None;
2. fixes BN inference backward in cuDNN & MIOpen, where backward now falls back to the native batchnorm kernel instead;

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57321

Reviewed By: albanD, ngimel

Differential Revision: D30250419

Pulled By: jansel

fbshipit-source-id: a62553789c20fb50a820003a056f40d9d642dfaa
2021-08-21 09:07:31 -07:00
37d60c08e5 Revert D30360382: [nnc] Support thread level parallelism in fused kernels
Test Plan: revert-hammer

Differential Revision:
D30360382 (d6d86efb1c)

Original commit changeset: 29acf4e932c6

fbshipit-source-id: e0531113135d30eabb172dc1537d5dd6d65dc438
2021-08-21 03:46:43 -07:00
76da46ccdc Revert D30417127: Remove flag to toggle CPU fusion in the presence of parallelism
Test Plan: revert-hammer

Differential Revision:
D30417127 (6600bc9651)

Original commit changeset: b77d7c68364f

fbshipit-source-id: 6b52fb83a84fe241945e3cb3eeb71050d1d9c8f1
2021-08-21 03:38:07 -07:00
8871ff29b7 [sharded_tensor] add readonly tensor properties (#63679)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63679

This PR adds read-only tensor properties to ShardedTensor, to match torch.Tensor behavior.

Test Plan: test_sharded_tensor_metadata

Reviewed By: pritamdamania87

Differential Revision: D30459343

fbshipit-source-id: 9aec8ecfe76479eed25f3b843495e5719ed2956d
2021-08-20 22:17:11 -07:00
b2a601ffe5 [Static Runtime] Implement out variant for fb::quantized_linear (#63635)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63635

Reviewed By: ajyu

Differential Revision: D30446234

fbshipit-source-id: 1ef014186ff725930a97d0159626f9233ee74030
2021-08-20 21:42:22 -07:00
2d58f3f56d NNAPI: Support const values in binary ops
Summary:
The NNAPI converter previously failed when a binary op received one const value and one tensor.
Code suggestions from dreiss

Test Plan:
pytest test/test_nnapi.py::TestNNAPI::test_pointwise_binary

Imported from OSS

Reviewed By: anshuljain1

Differential Revision: D28893881

fbshipit-source-id: 59240373fb03c6fdafa4cb2fa4d8408dd20092f6
2021-08-20 21:10:26 -07:00
b4f5809db8 Migrate thnn_conv2d from THC to ATen (#63428)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63428

Closes gh-24644, closes gh-24645

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D30441307

Pulled By: ngimel

fbshipit-source-id: 9c3dec469c0525831ae398df261cf41b7df7e373
2021-08-20 18:29:02 -07:00
3ee1f81dce Extend _sharded_tensor constructor to support other ops like torch.ones (#63378)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63378

a) Introduce InitCommonParams to wrap tensor creation params
b) Factor local tensor initiation into common_params so that tensor value is not hard specified in ShardedTensor constructor
c) Add _sharded_tensor.ones(...) to exemplify - Note memory_format arg is not provided to be consistent as torch.ones
d) Follow-up: more ops like torch.full, torch.zeros, torch.rand, etc.

Test:
$ python test/distributed/_sharded_tensor/test_sharded_tensor.py TestCreateTensorFromParams --v
$ python test/distributed/_sharded_tensor/test_sharded_tensor.py TestShardedTensorChunked.test_create_sharded_tensor_with_ones --v
$ python test/distributed/_sharded_tensor/test_sharded_tensor.py TestShardedTensorEnumerable.test_create_sharded_tensor_with_ones --v

Test Plan: Imported from OSS

Reviewed By: pritamdamania87, wanchaol

Differential Revision: D30359245

Pulled By: bowangbj

fbshipit-source-id: 85768fcb36e9d9d40213036884b1266930a91701
2021-08-20 17:11:34 -07:00
7c0f5b9aa4 [clang-tidy] Enable more folders (#63380)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63380

Crosses off some more of #62011, see the test in the stacked PR #63381

Test Plan: Imported from OSS

Reviewed By: malfet, seemethere

Differential Revision: D30455843

Pulled By: driazati

fbshipit-source-id: d473545d05ffa0b2476968f0b1c55f3a16a2c755
2021-08-20 16:40:42 -07:00
e0fe5699c4 enable increment build for build_libtorch (#63074)
Summary:
Since issue https://github.com/pytorch/pytorch/issues/59859 is resolved, rerun_cmake in build_libtorch should no longer be hardcoded.

build_libtorch is necessary to generate a debug version of libtorch.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63074

Reviewed By: VitalyFedyunin, seemethere

Differential Revision: D30306705

Pulled By: malfet

fbshipit-source-id: f2077d334191f4973da0681560937bc8bab730c1
2021-08-20 16:30:34 -07:00
efe01c59e3 [Doc] Deprecation notice for only_inputs argument (#63631)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/63544.

Changed docstring accordingly. I'm new here, not sure if the style is okay. Please check.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63631

Reviewed By: ejguan

Differential Revision: D30459439

Pulled By: soulitzer

fbshipit-source-id: 8df3c509d1dd39764815b099ab47229550126cbe
2021-08-20 15:49:49 -07:00
bcf8e2f57e Remove breakpad from docker image (#63598)
Summary:
As of https://github.com/pytorch/pytorch/issues/63186 we're doing this properly via a third_party cmake build, so we don't need it here anymore.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63598

Reviewed By: walterddr, malfet

Differential Revision: D30432250

Pulled By: driazati

fbshipit-source-id: d0d5db14355cf574e42c0d0ed786bb26230180bd
2021-08-20 15:48:39 -07:00
da0820e553 add BFloat16 operators on CPU: range, sinh, cosh, frexp, nan_to_num (#61826)
Summary:
Added BFloat16 support for range, sinh, cosh, frexp, and nan_to_num on CPU, and collected benchmark data for these ops for the BFloat16 and Float32 data types using PyTorch's operator_benchmark tool on an Intel(R) Xeon(R) Platinum 8180 CPU @ 2.50GHz.

Number of cores: 1 core, 28 cores(1 socket)
[cosh_sinh_benchmark.txt](https://github.com/pytorch/pytorch/files/6974313/cosh_sinh_benchmark.txt)
[frexp_benchmark.txt](https://github.com/pytorch/pytorch/files/6974315/frexp_benchmark.txt)
[nan_to_num_benchmark.txt](https://github.com/pytorch/pytorch/files/6974317/nan_to_num_benchmark.txt)
[range_benchmark.txt](https://github.com/pytorch/pytorch/files/6974318/range_benchmark.txt)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61826

Reviewed By: saketh-are

Differential Revision: D30257259

Pulled By: VitalyFedyunin

fbshipit-source-id: 394cd713e6394050a8c90b2160633beb675d71dd
2021-08-20 14:56:52 -07:00
a8de0d83fe empty caching allocator before test_avg_pool2d large subtest (#63528)
Summary:
Otherwise, unrecoverable OOM occurs on MI25.  Fixes broken ROCm CI test1.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63528

Reviewed By: malfet, zhouzhuojie

Differential Revision: D30459151

Pulled By: walterddr

fbshipit-source-id: 63e205c4f486fcbdd514cfb0ed8e38584f894585
2021-08-20 14:01:45 -07:00
b008bb4443 Include iostream in ProcessGroupMPI.cpp (#63656)
Summary:
As it uses `std::cerr`, which in turn triggers the compilation regression introduced by https://github.com/pytorch/pytorch/pull/61500
Fixes https://github.com/pytorch/pytorch/issues/63653

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63656

Reviewed By: ejguan

Differential Revision: D30455824

Pulled By: malfet

fbshipit-source-id: 29f316e7f7fd8e7dcbee2666e7a985f25bf56515
2021-08-20 13:15:40 -07:00
07e41cf2d7 [easy]Unbreak caffe2benchmarking build (#63655)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63655

ghstack-source-id: 136324310

Test Plan: buck build //fbobjc/Apps/Internal/Caffe2Benchmarking:Caffe2Benchmarking fbobjc/mode/iphonesimulator

Reviewed By: hl475, JacobSzwejbka

Differential Revision: D30455659

fbshipit-source-id: b6da6be4f89b6e84753ef0849ffedea04785034a
2021-08-20 12:57:27 -07:00
1dd648f1c4 [ONNX] Support torch.dot and torch.nn.utils.spectral_norm (#62596) (#62765)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62765

Fixes #27723

Test Plan: Imported from OSS

Reviewed By: SplitInfinity

Differential Revision: D30375181

Pulled By: msaroufim

fbshipit-source-id: 715f4745899757ec405877980cd20c826028eb2c

Co-authored-by: BowenBao <bowbao@microsoft.com>
2021-08-20 12:46:56 -07:00
db0771b05d [ONNX] Update repeat_interleave for dynamic repeats (#59979) (#62764)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62764

Fixes #58733

- Support dynamic interleave for cases where the repeat values are only known at runtime (see the eager-mode sketch below)
- Moved the repeat_interleave symbolic from opset 11 to opset 13, as sequence output types for loop outputs are needed for this change
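
For reference, the dynamic-repeats case in eager mode:
```
import torch

x = torch.tensor([1, 2, 3])
repeats = torch.tensor([2, 1, 3])   # per-element counts, only known at runtime
print(torch.repeat_interleave(x, repeats))   # tensor([1, 1, 2, 3, 3, 3])
```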

Test Plan: Imported from OSS

Reviewed By: SplitInfinity

Differential Revision: D30375179

Pulled By: msaroufim

fbshipit-source-id: 787f96bf91d124fd0483761088c5f4ae930d96a9

Co-authored-by: Shubham Bhokare <shubhambhokare@gmail.com>
2021-08-20 12:46:54 -07:00
8760254911 [ONNX] Fix an issue that optimizations might adjust graph inputs unexpectedly. (#61280) (#62763)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62763

This PR fixes the issue that the graph inputs might be updated when we export the model in inference mode.

When a model is exported in inference mode, some optimizations are made. One side effect of these optimizations is that the graph inputs might be adjusted. Such optimizations include:

	1. Conv and BatchNorm op fusion.
	2. Constant folding.

If the user sets export_params=False, or sets keep_initializers_as_inputs=True, it's highly likely that the user wants to provide the corresponding parameters or initializers as the inputs of the graph.
In such a situation, no matter whether the model is exported in inference mode or training mode, the exporter needs to prevent the above optimizations from adjusting the graph inputs. This way, the graph inputs match the inputs that users provided.

The changes in this PR add an additional common check of whether the above optimizations should be done: from the values of the export_params and keep_initializers_as_inputs arguments, infer whether the graph inputs are allowed to be adjusted.
If not, these optimizations are skipped, even if the other requirements are met.

Besides these code changes, the documentation of the parameters below has been updated so that users can better decide how to leverage them for different purposes:

	1. export_params
	2. training
	3. do_constant_folding
	4. keep_initializers_as_inputs

Test Plan: Imported from OSS

Reviewed By: SplitInfinity

Differential Revision: D30375183

Pulled By: msaroufim

fbshipit-source-id: 4db8b9695649eb32a3a0fefa950ee2e5651bdba0

Co-authored-by: fatcat-z <jiz@microsoft.com>
2021-08-20 12:46:52 -07:00
a65d1ae7cc [ONNX] Fix controlflow shape inference with contrib op (#60707) (#62762)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62762

`ONNXShapeTypeInference` for node `n` is skipped if `n` is non ONNX namespace, or if `n` contains any non ONNX namespace nodes. This prevents controlflow nodes containing contrib ops from running `SpecialPostProcess`, which sets up correct node output shape/type information in rare cases.

This PR depends on opset 14 export https://github.com/pytorch/pytorch/pull/59486

Test Plan: Imported from OSS

Reviewed By: SplitInfinity

Differential Revision: D30375180

Pulled By: msaroufim

fbshipit-source-id: 5deacec39f091deb4d75ddd9e660e12fca7f16c5

Co-authored-by: BowenBao <bowbao@microsoft.com>
2021-08-20 12:45:53 -07:00
125e2d02e5 Revert D30417370: [nnc] Enable CPU fusion
Test Plan: revert-hammer

Differential Revision:
D30417370 (b9fc656cf2)

Original commit changeset: 84ce7a578a36

fbshipit-source-id: cd23774cdc3273fd72f8a05f1900eaf36f373e6b
2021-08-20 12:30:21 -07:00
2d671ca41b [8/N] Remove c10d/ddp fork tests. (#63454)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63454

Continuation of https://github.com/pytorch/pytorch/pull/63443, this
PR removes all fork tests from torch.distributed.
ghstack-source-id: 136285511

Test Plan: waitforbuildbot

Reviewed By: SciPioneer

Differential Revision: D30387872

fbshipit-source-id: f6d6313db126ae7b95b86f78a1e0726887c5c513
2021-08-20 12:23:18 -07:00
71da114412 Revert D30426527: Adding DataLoader2 class as future replacement of DataLoader
Test Plan: revert-hammer

Differential Revision:
D30426527 (5a7133b87f)

Original commit changeset: e5905d3364c4

fbshipit-source-id: 794d8a4e9256ccff8cf894aee10eff6adc30d502
2021-08-20 12:06:52 -07:00
70a3210eca Add BinaryUfuncOpInfo and broadcasting tests (#61964)
Summary:
As proof of concept, this PR uses the new `BinaryUfuncOpInfo` in broadcasting tests for `add`, `sub`, `mul`, `div`, `floor_div`, and `true_div`.
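
As a reminder, the broadcasting behavior these tests exercise (eager sketch):
```
import torch

a = torch.randn(3, 1)
b = torch.randn(1, 4)
out = torch.add(a, b)        # shapes broadcast elementwise to (3, 4)
assert out.shape == (3, 4)
```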

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61964

Reviewed By: ngimel

Differential Revision: D30407734

Pulled By: mruberry

fbshipit-source-id: ada28994f43b0635f279f45a02ecba18bc8ee033
2021-08-20 11:44:15 -07:00
b9fc656cf2 [nnc] Enable CPU fusion (#63545)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63545

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D30417370

Pulled By: bertmaher

fbshipit-source-id: 84ce7a578a3678d5562bab99d1dc00330c4f72d1
2021-08-20 11:18:21 -07:00
6600bc9651 Remove flag to toggle CPU fusion in the presence of parallelism (#63514)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63514

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D30417127

Pulled By: bertmaher

fbshipit-source-id: b77d7c68364f2af73570740540f3b1152313016e
2021-08-20 11:18:19 -07:00
d6d86efb1c [nnc] Support thread level parallelism in fused kernels (#63386)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63386

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D30360382

Pulled By: bertmaher

fbshipit-source-id: 29acf4e932c669ce0f35823faea9099bcd8119b6
2021-08-20 11:18:17 -07:00
c78ab28441 Add support for the ONNX Runtime Eager Mode backend (#58248)
Summary:
This PR implements the necessary hooks/stubs/enums/etc for complete ONNX Runtime (ORT) Eager Mode integration. The actual extension will live out of tree at https://github.com/pytorch/ort.

We have been [working on this at Microsoft](https://github.com/microsoft/onnxruntime-pytorch/tree/eager-ort/torch_onnxruntime) for the last few months, and are finally ready to contribute the PyTorch core changes upstream (nothing major or exciting, just the usual boilerplate for adding new backends).

The ORT backend will allow us to ferry [almost] all torch ops into granular ONNX kernels that ORT will eagerly execute against any devices it supports (therefore, we only need a single ORT backend from a PyTorch perspective).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58248

Reviewed By: astaff

Differential Revision: D30344992

Pulled By: albanD

fbshipit-source-id: 69082b32121246340d686e16653626114b7714b2
2021-08-20 11:17:13 -07:00
b95ce1591d Add docs describing saved tensor hooks (#62362)
Summary:
Add section to the Autograd mechanics docs to describe the recently
exposed saved tensors (https://github.com/pytorch/pytorch/issues/52451), how to register packing / unpacking
hooks (https://github.com/pytorch/pytorch/issues/60975) and how to use default hooks (https://github.com/pytorch/pytorch/issues/61834)
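
A minimal sketch of the hook API those docs cover (identity hooks that just log; `pack` could instead offload or compress the saved tensor):
```
import torch

def pack(x):
    print("packing saved tensor of shape", tuple(x.shape))
    return x                 # must return something unpack can invert

def unpack(x):
    return x

with torch.autograd.graph.saved_tensors_hooks(pack, unpack):
    a = torch.randn(3, requires_grad=True)
    (a * a).sum().backward() # `a` is saved for backward, so `pack` fires
```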

Sister PR: https://github.com/pytorch/pytorch/issues/62361 (will add a link from autograd.rst to notes/autograd in whatever PR does not land first)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62362

Reviewed By: soulitzer

Differential Revision: D30453177

Pulled By: Varal7

fbshipit-source-id: f5759977b069ff0ef36a47b08856d297691a6caa
2021-08-20 11:10:51 -07:00
03cc46a0ac [fx2trt] Add layernorm plugin for dynamic shape (#63620)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63620

Added a layernorm dynamic plugin so that it works when an explicit batch dim is required. Needed for the IG model.

Changed how we create a plugin layer: instead of instantiating the plugin directly, we now use the plugin creator with `PluginFieldCollection`.

Follow ups:
Another way to convert layernorm is by breaking it down to supported trt layers. T97398182

Test Plan: layernorm unittest

Reviewed By: yinghai

Differential Revision: D30138205

fbshipit-source-id: aebe021d8de818e20376634f30e84579b9807f9b
2021-08-20 10:52:42 -07:00
5f997a7d2f [PyTorch][Edge] Improve InflatableArgs for Bundled Inputs (#62368)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62368

# Context
The bundled-inputs mechanism accepts an expression, in the form of the string InflatableArg.fmt, that is applied to the inputs to inflate them. InflatableArg.fmt provides the flexibility to apply a custom transformation during inflation. When the input arguments to a function are not of Tensor type, TorchScript casts the inputs from type T to Optional[T] and expects the function to handle the nullable (None) case as well. This becomes tricky to handle in one-line code or lambda functions.

We propose an alternative that allows an InflatableArg to include the text of a TorchScript function, which is defined on the module as a helper and then used in the inflation expression. This is provided via InflatableArg.fmt_fn. Please refer to pytorch/test/test_bundled_inputs.py for an example of how to use it.

Also see JacobSzwejbka's comment on this [here](https://github.com/pytorch/pytorch/pull/62368#issuecomment-892012812)

# Mitigation
Allow InflatableArg to include the text of a TorchScript function that is defined on the module as a helper and then used in its inflation expression.
ghstack-source-id: 135158680

Test Plan:
To run `test_dict_args`

```
(base) [pavithran@devvm1803.vll0 /data/users/pavithran/fbsource/fbcode] buck test //caffe2/test:test_bundled_inputs -- test_dict_args
Action graph will be rebuilt because files have been added or removed.
Building: finished in 5.4 sec (100%) 12180/12180 jobs, 0/12180 updated
  Total time: 5.8 sec
More details at https://www.internalfb.com/intern/buck/build/fafcf277-1095-4cba-978d-6022f0d391ad
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: 5ef9de71-c1b1-406b-a6c0-3321c2368b8d
Trace available for this run at /tmp/tpx-20210727-163946.454212/trace.log
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/7036874465805934
    ✓ ListingSuccess: caffe2/test:test_bundled_inputs - main (11.365)
    ✓ Pass: caffe2/test:test_bundled_inputs - test_dict_args (test_bundled_inputs.TestBundledInputs) (12.307)
Summary
  Pass: 1
  ListingSuccess: 1
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/7036874465805934
```

To check the py code of TS module:
P433043973

Reviewed By: dreiss

Differential Revision: D29950421

fbshipit-source-id: c819ec5c94429b7fbf6c4beb0259457f169b08ec
2021-08-20 09:36:08 -07:00
5a7133b87f Adding DataLoader2 class as future replacement of DataLoader (#63523)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63523

Supports sharding and batching on the loader level.
* #63522 Adding IterableAsDataPipe IterDataPipe, useful for tests and simple cases

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D30426527

Pulled By: VitalyFedyunin

fbshipit-source-id: e5905d3364c4880e720dd62fb066f08881c71a6e
2021-08-20 09:01:55 -07:00
99e28baeba Small custom function refactor which doesn't change anything (#63433)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63433

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D30431970

Pulled By: albanD

fbshipit-source-id: 905fa4d2ddeca18005b1bcb13dd6f8a080327e7c
2021-08-20 08:44:23 -07:00
0f2c60f0e3 Adding IterableAsDataPipe IterDataPipe (#63522)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63522

Supports sharding and batching on the loader level.
* #63522 Adding IterableAsDataPipe IterDataPipe, useful for tests and simple cases

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D30426528

Pulled By: VitalyFedyunin

fbshipit-source-id: 535b5cc1505bb58731fcca8170541ac5ee7bd417
2021-08-20 08:38:23 -07:00
ae901e372e [Static Runtime] Enable RemoveListMutation (#63536)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63536

Enable a pass that transforms sequences like this:
```
li = []
li.append(1)
li.append(2)
```
into this:
```
li = [1, 2]
```
Initially I implemented this pass myself (D30387213), but I discovered that there is an existing pass that does the same thing.

Reviewed By: hlu1

Differential Revision: D30412970

fbshipit-source-id: 0810ef03480878d5039bd800a40f5fd31c2652ec
2021-08-20 06:15:41 -07:00
913c1f83f4 [Static Runtime] Add native op for aten::detach (#63625)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63625

This change adds a static runtime's native op implementation for `aten::detach` op.

See the standard  `aten::detach`'s implementation (https://codebrowser.bddppq.com/pytorch/pytorch/aten/src/ATen/native/TensorShape.cpp.html#_ZN2at6native6detachERKNS_6TensorE ) for comparison.
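
For reference, the eager semantics the native op reproduces:
```
import torch

x = torch.randn(3, requires_grad=True)
y = x.detach()                       # new Tensor sharing the same storage
assert y.data_ptr() == x.data_ptr()  # no copy is made
assert not y.requires_grad           # detached from the autograd graph
```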

Test Plan:
- Added `StaticRuntime.IndividualOps_Detach`.

- Observed

```
V0819 18:55:33.181188 3092034 impl.cpp:1398] Switch to native impl for node: %a.1 : Tensor = aten::detach(%input.1)
```

Reviewed By: hlu1

Differential Revision: D30443187

fbshipit-source-id: d6e0eadb1b817e0a126c4fc97526abc276ee8a17
2021-08-20 00:46:27 -07:00
bec75daa77 Update protobuf to 3.13.1 (#62571)
Summary:
Update bazel to 4.10.0

Update ASAN_SYMBOLIZER_PATH to llvm-7
Suppress `vptr` ubsan violations in `test_jit`
Fix ProtoBuf patching for ONNX, which caused Windows builds to crash while attempting to free a `std::string` allocated on the stack

Fixes https://github.com/pytorch/pytorch/issues/62569

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62571

Reviewed By: walterddr

Differential Revision: D30048685

Pulled By: malfet

fbshipit-source-id: 6462c1bef9c42318551d2cf906bbab41e1d4e1cd
2021-08-19 23:43:55 -07:00
d82667f7e2 [nnc] Updated sliceTail to do inplace mutation (#63532)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63532

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D30412184

Pulled By: navahgar

fbshipit-source-id: e7669d3b9d24e14501f3feb6505c88d1d42030c6
2021-08-19 22:55:30 -07:00
5e31a3b904 [nnc] Updated sliceHead to do inplace mutation (#63531)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63531

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D30412183

Pulled By: navahgar

fbshipit-source-id: 47ee9482a36e606788d28d22eee4edaca45ffa50
2021-08-19 22:54:05 -07:00
0a66d5b325 [PyTorch] Remove unnecessary iostream includes in headers (#61500)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61500

libstdc++ defines a static variable called `std::__ioinit` in iostream that adds global-constructor size overhead to each translation unit that includes iostream. To reduce the size overhead from that, we can often include ostream instead.
ghstack-source-id: 136163529

Test Plan: buildsizebot some mobile apps

Reviewed By: dhruvbird

Differential Revision: D29648016

fbshipit-source-id: 9c3139712c71248513cc5032d21e77f3ecbae8fe
2021-08-19 18:54:51 -07:00
b99a299c60 [PyTorch] Remove unused dump() methods in vec headers (#63533)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63533

These methods don't seem to be used, and they use std::cout, which incurs a small code size overhead on platforms using libstdc++ due to std::__ioinit (see #61500). Seems like we can just delete them?
ghstack-source-id: 136163409

Test Plan:
CI

Reviewers: #sentinel, dhruvbird

Reviewed By: dskhudia

Differential Revision: D30412269

fbshipit-source-id: 380b9aa2f9aabc4107188b6b209d2afc1769c0ee
2021-08-19 18:53:49 -07:00
0b6cc8daf2 [PyTorch][Edge] Support backtrace symbolication for Android builds (#63339)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63339

# Context
https://fb.workplace.com/groups/pytorch.dev/permalink/900474523864362/?comment_id=901125403799274&reply_comment_id=905023386742809

##### WHAT IS A STACK TRACE?
A stack trace (also called stack backtrace or stack traceback) is a report of the active stack frames at a certain point in time during the execution of a program.

Typically when an exception is thrown, one would expect to see the code (file:line) that threw the exception, and every intermediate frame up to and including the main function.

We are enabling android stack trace to help debugging on android devices.

Test Plan:
## Steps to test
```
buck build fbsource//xplat/caffe2/mode/aibench_pytorch_android -c pt.enable_qpl=0 -c pt.has_backtraces=1 fbsource//xplat/caffe2/fb/lite_predictor:lite_predictorAndroid#android-x86_64

one_world android emulator android-28

adb push ~/fbsource/buck-out/gen/xplat/caffe2/fb/lite_predictor/lite_predictorAndroid#android-x86_64 /data/local/tmp

cd /data/local/tmp
./lite_predictorAndroid#android-x86_64

./lite_predictorAndroid#android-x86_64 --model ./detect.bc --input_dims "1,3,192,192" --input_type float --warmup 20 --iter 5 --report_pep true
```

## See how model file is not found stack traces is:

### before
```
./lite_predictorAndroid#android-x86_64 --model ./detect.bc --input_dims "1,3,192,192" --input_type float --warmup 20 --iter 5 --report_pep true

Run with 2 threads
Run with 2 threads
Loading model...
terminating with uncaught exception of type c10::Error: open file failed, file path: ./detect.bc
Exception raised from RAIIFile at xplat/caffe2/caffe2/serialize/file_adapter.cc:13 (most recent call first):
(no backtrace available)
Aborted
```

### after
```
134|generic_x86_64:/data/local/tmp $ ./lite_predictorAndroid#android-x86_64 --model ./detect.bc --input_dims "1,3,192,192" --input_type float --warmup 20 --iter 5 --report_pep true
Run with 2 threads
Run with 2 threads
Loading model...
terminating with uncaught exception of type c10::Error: open file failed, file path: ./detect.bc
Exception raised from RAIIFile at xplat/caffe2/caffe2/serialize/file_adapter.cc:13 (most recent call first):
 frame #0       c10::get_backtrace(unsigned long, unsigned long, bool)[0x59494274f10e]
 frame #1       [0x5949427b1eee]
 frame #2       [0x5949427b1eb2]
 frame #3       [0x5949427b1cdc]
 frame #4       std::__ndk1::function<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > ()>::operator()() const[0x5949427afc34]
 frame #5       c10::Error::Error(c10::SourceLocation, std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> >)[0x5949427b05b1]
 frame #6       c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const&)[0x5949427aca5f]
 frame #7       caffe2::serialize::FileAdapter::RAIIFile::RAIIFile(std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const&)[0x5949426b37b2]
 frame #8       caffe2::serialize::FileAdapter::FileAdapter(std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const&)[0x5949426b3903]
 frame #9       torch::jit::_load_for_mobile(std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const&, c10::optional<c10::Device>, std::__ndk1::unordered_map<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> >, std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> >, std::__ndk1::hash<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > >, std::__ndk1::equal_to<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > >, std::__ndk1::allocator<std::__ndk1::pair<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const, std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > > > >&)[0x5949422737bd]
 frame #10      torch::jit::_load_for_mobile(std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const&, c10::optional<c10::Device>)[0x594942273769]
 frame #11      benchmark(std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const&, int, std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const&, std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const&, std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const&, bool, int, int, int, bool, int, bool, int, double, bool, bool, bool, std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const&)[0x59494189b21d]
 frame #12      main[0x594941882aff]
 frame #13      __libc_init[0x7b699d08578d]
```

### What we get on Linux
```
(base) [pavithran@devvm1803.vll0 /data/users/pavithran/fbsource] ./buck-out/gen/xplat/caffe2/fb/lite_predictor/lite_predictor --model ./detect.bc --input_dims "1,3,192,192" --input_type float --warmup 20 --iter 5 --report_pep true
Run with 24 threads
Run with 24 threads
Loading model...
terminate called after throwing an instance of 'c10::Error'
  what():  open file failed, file path: ./detect.bc
Exception raised from RAIIFile at xplat/caffe2/caffe2/serialize/file_adapter.cc:13 (most recent call first):
frame #0: ./buck-out/gen/xplat/caffe2/fb/lite_predictor/lite_predictor() [0x20cb7fe]
frame #1: ./buck-out/gen/xplat/caffe2/fb/lite_predictor/lite_predictor() [0x20cb6c6]
frame #2: std::function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > ()>::operator()() const + 0x54 (0x20ca4e4 in ./buck-out/gen/xplat/caffe2/fb/lite_predictor/lite_predictor)
frame #3: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x57 (0x20ca9a7 in ./buck-out/gen/xplat/caffe2/fb/lite_predictor/lite_predictor)
frame #4: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x7a (0x20c823a in ./buck-out/gen/xplat/caffe2/fb/lite_predictor/lite_predictor)
frame #5: caffe2::serialize::FileAdapter::RAIIFile::RAIIFile(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x96 (0x206f3d6 in ./buck-out/gen/xplat/caffe2/fb/lite_predictor/lite_predictor)
frame #6: caffe2::serialize::FileAdapter::FileAdapter(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x42 (0x206f502 in ./buck-out/gen/xplat/caffe2/fb/lite_predictor/lite_predictor)
frame #7: torch::jit::_load_for_mobile(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, c10::optional<c10::Device>, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >&) + 0x30 (0x1be826c in ./buck-out/gen/xplat/caffe2/fb/lite_predictor/lite_predictor)
frame #8: torch::jit::_load_for_mobile(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, c10::optional<c10::Device>) + 0x35 (0x1be8214 in ./buck-out/gen/xplat/caffe2/fb/lite_predictor/lite_predictor)
frame #9: benchmark(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool, int, int, int, bool, int, bool, int, double, bool, bool, bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x16d (0x12093ad in ./buck-out/gen/xplat/caffe2/fb/lite_predictor/lite_predictor)
frame #10: main + 0x25c (0x11f933c in ./buck-out/gen/xplat/caffe2/fb/lite_predictor/lite_predictor)
frame #11: __libc_start_main + 0x105 (0x7fc7b9f2ed95 in /usr/local/fbcode/platform009/lib/libc.so.6)
frame #12: _start + 0x2a (0x11f902a in ./buck-out/gen/xplat/caffe2/fb/lite_predictor/lite_predictor)

Aborted (core dumped)
```

Reviewed By: dhruvbird

Differential Revision: D30135947

fbshipit-source-id: f50c634ef4545843305cad4b4a14a8776b1aec76
2021-08-19 18:41:29 -07:00
f2bf0f229f Revert D30359218: [pytorch][PR] [doc] pre-commit fix instructions
Test Plan: revert-hammer

Differential Revision:
D30359218 (4e1d84ae8f)

Original commit changeset: 61771babeac4

fbshipit-source-id: c2ac0a4a7463fafa03ad0b20bfb0701a8c1476c4
2021-08-19 16:48:04 -07:00
d0d27f6971 Add concurrency group for more workflows (#63606)
Summary:
Fixes unnecessary duplicated workflows runs

![image](https://user-images.githubusercontent.com/658840/130146332-ecf54e49-3538-49c1-88de-b099f1c1e41f.png)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63606

Reviewed By: malfet, mruberry

Differential Revision: D30436889

Pulled By: zhouzhuojie

fbshipit-source-id: aafbad1edc45e3ab9bceb00e8f3b4204f18e43d0
2021-08-19 15:39:28 -07:00
71ab48ed3b acc type inference (#63119)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63119

Test Plan:
buck run mode/opt-clang caffe2/torch/fb/model_transform/experimental:fx_ir_lower_inline_cvr -- \
    --action=lower_and_run \
    --filename=inline_cvr_7x_dec_2020.model \
    --print_glow_glog=True

Reviewed By: jamesr66a, jfix71, ansley

Differential Revision: D30235895

fbshipit-source-id: dab7f96e1799b99eeae0ee519cf0ddd636fddf2e
2021-08-19 15:23:56 -07:00
ccca66597a Replace hardcoded values in IndexKernel.cu (#63372)
Summary:
This is a small change that helps maintain the Cruise PyTorch fork, since we use a different hardcoded value.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63372

Reviewed By: mruberry

Differential Revision: D30396171

Pulled By: ejguan

fbshipit-source-id: cc0023f58b5922d3d98c7283495e6dc8d35049b6
2021-08-19 15:02:28 -07:00
e5ab0d1013 DataLoader: allow non-integer Samplers (#63500)
Summary:
Not entirely sure how to use `TypeVar`, but if someone could give me a hint it would be appreciated. Also let me know if you want me to add tests so we can make sure non-integer samplers actually work; `test/test_dataloader.py` seems like the correct location, but that's a big file.

Fixes https://github.com/pytorch/pytorch/issues/63483
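
A minimal sketch of what this enables: a map-style dataset indexed by strings, with a sampler that yields those keys (all names here are hypothetical):

```python
from torch.utils.data import DataLoader, Dataset, Sampler

class StringKeyedDataset(Dataset):
    """Map-style dataset indexed by string keys instead of integers."""
    def __init__(self, data):
        self.data = data
    def __getitem__(self, key):
        return self.data[key]
    def __len__(self):
        return len(self.data)

class KeySampler(Sampler):
    """Sampler that yields string keys rather than integer indices."""
    def __init__(self, keys):
        self.keys = keys
    def __iter__(self):
        return iter(self.keys)
    def __len__(self):
        return len(self.keys)

ds = StringKeyedDataset({"a": 1, "b": 2})
loader = DataLoader(ds, sampler=KeySampler(["a", "b"]))
print([batch for batch in loader])  # [tensor([1]), tensor([2])]
```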

ejguan

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63500

Reviewed By: mruberry

Differential Revision: D30403689

Pulled By: ejguan

fbshipit-source-id: 464e09e5aad3215b94a29cc5e21cb4b10ec136e3
2021-08-19 14:55:46 -07:00
11a40ad915 [Pytorch] Fix callstack pointer serialization bug (#63576)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63576

We serialize the function name associated with an InlinedCallStackPtr. This is
derived by querying the Function* stored in the InlinedCallStack. However,
this is a raw pointer that is not guaranteed to be valid when serialization
happens. On the other hand, we also store the function name separately when
constructing the InlinedCallStack anyway, so this change uniformly relies on
function_name instead of Function*.

Test Plan: Internal build's asan failure + CI

Reviewed By: larryliu0820

Differential Revision: D30427029

fbshipit-source-id: de9617482404785920ed2e67b72f38461590fba3
2021-08-19 13:35:52 -07:00
6c3ebccc00 Updating the names of these functions (#63513)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63513

Updating these names per Jerry's nits in the previous PR.

Test Plan: Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D30406710

fbshipit-source-id: a9f1577a2b8c4a93f5005e0f6278b7d7348d8b66
2021-08-19 13:34:34 -07:00
ce6fe50158 Revert embedding thrust->cub migration (#63451)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/63427

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63451

Reviewed By: mruberry

Differential Revision: D30398482

Pulled By: ngimel

fbshipit-source-id: e153786d204215555a6571688eabae712facad7e
2021-08-19 13:03:33 -07:00
99203580a9 Updates internal assert_allclose callsites in favor of assert_close (#61841)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61841

Redo of #60863.

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D30408145

Pulled By: mruberry

fbshipit-source-id: 0b34ebc7f23ba38ecd89640b61d8aca59b7eab58
2021-08-19 12:50:41 -07:00
efd70b7ce6 Modernizes add and mul documentation (#63309)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/39329.

The documentation for torch.add and torch.mul was sorely out of date and even included deprecated references. This PR modernizes their descriptions consistent with torch.sub.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63309

Reviewed By: ngimel

Differential Revision: D30338004

Pulled By: mruberry

fbshipit-source-id: ee1c2a8106af8341253cafb0003b06e8f652624d
2021-08-19 12:49:30 -07:00
d986d4bf63 [special] use __all__ to hide internal imports (#63135)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/50345

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63135

Reviewed By: ngimel

Differential Revision: D30364287

Pulled By: mruberry

fbshipit-source-id: 20078668943fafa45ce09610634b1d2c424b1922
2021-08-19 12:45:43 -07:00
0c3904d180 [BF16] Add a missing thread local specifier to autocast_gpu_dtype (#63416)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63416

Fix a missing thread_local specifier introduced by a recent PR:

https://github.com/pytorch/pytorch/pull/61002

Test Plan: Unit Tests

Reviewed By: ngimel

Differential Revision: D30376154

fbshipit-source-id: c70d37ec85c3eba88eb87f766f1c4e7aeff8eaf9
2021-08-19 12:39:27 -07:00
535d44141b [7/N] Remove fork tests for RPC. (#63443)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63443

After https://github.com/pytorch/pytorch/pull/63442, all distributed
tests can run with opt-asan. As a result, we can now remove all of our fork
based tests.

This is the first PR in a stack, which first removes fork based tests from RPC.
ghstack-source-id: 136177744

Test Plan: waitforbuildbot

Reviewed By: lw

Differential Revision: D30384905

fbshipit-source-id: 86d438aebaa6cb02ae2a966fea244849849a1889
2021-08-19 11:22:40 -07:00
bd8608cd5c Use CMake for breakpad (#63186)
Summary:
We currently build breakpad from [this fork](https://github.com/driazati/breakpad) to include extra logic to restore signal handlers that were previously present. With some [new additions](https://github.com/google/breakpad/compare/main...driazati:main) this fork now includes a CMake based build, so we can add breakpad as a proper dependency rather than rely on including it in Docker images as a system library which is error prone (we have a bunch of images) and hard to extend to MacOS / Windows. This also includes some changes to the crash handling code to support MacOS / Windows in a similar way to Linux.

```python
import torch

# On Windows this writes crashes to C:\Users\<user>\AppData\pytorch_crashes
# On MacOS/Linux this writes crashes to /tmp/pytorch_crashes
torch.utils._crash_handler.enable_minidumps()

# Easy way to cause a segfault and trigger the handler
torch.bincount(input=torch.tensor([9223372036854775807]))
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63186

Reviewed By: malfet, seemethere

Differential Revision: D30318404

Pulled By: driazati

fbshipit-source-id: 0d7daf3701cfaba5451cc529a0730272ab1eb1dc
2021-08-19 10:42:01 -07:00
e030b81356 [easy] Fix missing move in TupleType::createNamed (#61572)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61572

ghstack-source-id: 136161829

Test Plan: CI

Reviewed By: SplitInfinity

Differential Revision: D29672872

fbshipit-source-id: d8ba2d54f7914dbeb3fc52aa21dd77025951c4b5
2021-08-19 10:38:52 -07:00
3aa4521fe8 [hpc] use fx2trt for exploration track (#63535)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63535

Reviewed By: yinghai, jianyuh

Differential Revision: D30272810

fbshipit-source-id: 61f3edf2a2282cd8c268a92acf92feb05a6ae3e1
2021-08-19 10:18:56 -07:00
885e312ce0 Add permute021 fx2trt converter (#63238)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63238

Reviewed By: yinghai

Differential Revision: D30295373

fbshipit-source-id: 2a189fe485edaa978fd03e4b8d8582edb34ec648
2021-08-19 10:17:48 -07:00
e7831fe5de [PyTorch] Test IValue move/copy/assign/swap more (#54717)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54717

Hit more tags in these tests
ghstack-source-id: 136140508

Test Plan: buck test //caffe2/aten:ivalue_test

Reviewed By: anjali411

Differential Revision: D27339736

fbshipit-source-id: 610c8e92846bb70ba725ab117440326ab50af5ce
2021-08-19 09:50:40 -07:00
79693bb86a Use linecache.lazycache to cache generated code. (#63453)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63453

Instead of patching linecache.getlines, use linecache.lazycache and
parts of the loader protocol described in PEP-302
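
Roughly how the lazy lookup works, as a standalone sketch (not the actual FX code; the loader and filename are made up):

```python
import linecache

SOURCE = "def f():\n    return 42\n"

class FakeLoader:
    # Only get_source from the PEP-302 loader protocol is needed by linecache.
    def get_source(self, name):
        return SOURCE

# lazycache declines filenames fully wrapped in angle brackets, hence the ".0".
filename = "<generated>.0"
linecache.lazycache(filename, {"__name__": "generated", "__loader__": FakeLoader()})

# The source is only fetched (via get_source) when the lines are requested.
print(linecache.getlines(filename))  # ['def f():\n', '    return 42\n']
```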

Test Plan:
python3 test/test_fx.py

Imported from OSS

Reviewed By: suo

Differential Revision: D30388176

fbshipit-source-id: 92933711ecf3a21a07e1d6b0d1185ab0efd8341c
2021-08-19 09:17:01 -07:00
e1334512a3 Add fastpath for dot and vdot when the inputs have conj bit set to True (#62915)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62915

Up to 45% and 20% perf improvement on CUDA and CPU, respectively.
Consistent improvement in perf for all cases -- see the perf numbers in the comments below.
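
A small illustration of the affected calls (my example, not from the PR):

```python
import torch

a = torch.randn(4, dtype=torch.complex64)
b = torch.randn(4, dtype=torch.complex64)

# a.conj() only flips the lazy conjugate bit; the fast path lets dot consume
# it directly instead of materializing a conjugated copy first.
print(torch.dot(a.conj(), b))
print(torch.vdot(a, b))  # vdot conjugates its first argument: same result
```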

Test Plan: Imported from OSS

Reviewed By: heitorschueroff

Differential Revision: D30404006

Pulled By: anjali411

fbshipit-source-id: 565940da28c7761d993cf43346932c24292e8a4d
2021-08-19 08:42:24 -07:00
f596aa8b77 Poisson zero rate (#61511)
Summary:
This PR fixes https://github.com/pytorch/pytorch/issues/53485 by allowing zero rates for the Poisson distribution. This implementation is consistent with `scipy.stats.poisson` which admits zero rates. In addition to addressing the aforementioned issue, this PR makes two supporting changes:

1. add a `nonnegative` constraint to enforce non-negative rates for the Poisson distribution.
2. adjust the evaluation of the gradient of `xlogy` such that it is well defined for `x == 0 and y == 0` (a short sketch of the resulting behavior follows this list).
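
A minimal sketch of the resulting behavior (my example, assuming the changes above):

```python
import torch
from torch.distributions import Poisson

# A zero rate is now accepted; all probability mass sits at count 0.
dist = Poisson(torch.tensor([0.0, 1.0]))
value = torch.tensor([0.0, 0.0])
print(dist.log_prob(value))  # tensor([ 0., -1.]): log P(X=0) is 0 when rate=0
```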

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61511

Reviewed By: ejguan

Differential Revision: D30352917

Pulled By: albanD

fbshipit-source-id: f3d33da58360e80d75eb83519f199b93232a2a2d
2021-08-19 08:30:28 -07:00
be9be9bfdd add distributed/_sharded_tensor/test_sharded_tensor to ROCM_BLOCKLIST (#63508)
Summary:
Fixes the current ROCm CI test2 breakage until tensorpipe is fully supported by ROCm.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63508

Reviewed By: ejguan

Differential Revision: D30406450

Pulled By: walterddr

fbshipit-source-id: c07509271d5d33901f3eaf7ffb916dc3626e1f9a
2021-08-19 07:50:55 -07:00
e7c4988b52 To fix the chainability at epoch zero for some schedulers (#63457)
Summary:
It has been discussed in the https://github.com/pytorch/pytorch/pull/60836#issuecomment-899084092 that we have observed an obstacle to chain some type of learning rate schedulers. In particular we observed

* some of the learning rate schedulers return their initial learning rates at epoch 0 as
```
       return self.base_lrs
```

* This can be a problem when two schedulers are chained as

```
     scheduler1.step()
     scheduler2.step()
```

In particular, the effect of scheduler1 at epoch 0 is completely ignored. This would not be an issue if scheduler1 were ineffective at epoch 0, as is the case for many schedulers; however, for schedulers such as warm-up schedulers, whose multiplicative value at epoch 0 is smaller than 1, this can lead to undesired behavior.

The following code snippet illustrates the problem better

## Reproducing the bug

```python
import torch
from torch.nn import Parameter
from torch.optim import SGD
from torch.optim.lr_scheduler import WarmUpLR, ExponentialLR

model = [Parameter(torch.randn(2, 2, requires_grad=True))]
optimizer = SGD(model, 1.0)
scheduler1 = WarmUpLR(optimizer, warmup_factor=0.1, warmup_iters=5, warmup_method="constant")
scheduler2 = ExponentialLR(optimizer, gamma=0.9)

for epoch in range(10):
     print(epoch, scheduler2.get_last_lr()[0])
     optimizer.step()
     scheduler1.step()
     scheduler2.step()
```

### Current Result

```
0 1.0
1 0.9
2 0.81
3 0.7290000000000001
4 0.6561000000000001
5 5.904900000000001
6 5.314410000000001
7 4.782969000000001
8 4.304672100000001
9 3.874204890000001
```

### Expected Result

```
0 1.0
1 0.9
2 0.81
3 0.7290000000000001
4 0.6561000000000001
5 0.5904900000000001
6 0.5314410000000001
7 0.4782969000000001
8 0.4304672100000001
9 0.3874204890000001
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63457

Reviewed By: datumbox

Differential Revision: D30424160

Pulled By: iramazanli

fbshipit-source-id: 3e15af8d278c872cd6f53406b55f4d3ce5002867
2021-08-19 07:17:03 -07:00
2d5b19f62b Update full backward hook doc with not-same-object note (#63245)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/61446

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63245

Reviewed By: ejguan

Differential Revision: D30352656

Pulled By: albanD

fbshipit-source-id: 7000ecb54a80f2da968ec7600b98574b608578ae
2021-08-19 06:50:56 -07:00
47a9e8ff32 [Static Runtime] Support __getitem__ for lists (#63398)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63398

This change provides a native `__getitem__` implementation for lists to avoid overhead associated with falling back to the JIT interpreter.

Test Plan: Unit tests: `buck test //caffe2/benchmarks/static_runtime:static_runtime_cpptest`

Reviewed By: hlu1

Differential Revision: D30368464

fbshipit-source-id: e0e0971508cd5d9bcf6025606993dc24ecbf6764
2021-08-19 06:38:51 -07:00
ce61100923 Revert D29399533: Hoisting common expressions out of If blocks
Test Plan: revert-hammer

Differential Revision:
D29399533 (9477211e7d)

Original commit changeset: 9336b9dc48c0

fbshipit-source-id: f081c7280203f40328bcbb0c03a7c6a007acedb7
2021-08-19 06:20:40 -07:00
6bb68ba507 Fix interpreter debug logging message (#63499)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63499

https://github.com/pytorch/pytorch/pull/62418 combined the instruction and debug handle. This change fixes the debugging message.
ghstack-source-id: 136184053

Test Plan: Uncomment and it works

Reviewed By: kimishpatel, raziel

Differential Revision: D30390699

fbshipit-source-id: e32b7b297ad3b7d8bffebd025d15519083a244c4
2021-08-19 02:14:13 -07:00
5254e3adb8 layernorm inplace (#63437)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63437

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D30388824

Pulled By: Krovatkin

fbshipit-source-id: 852d19bf238544c5de177ed5854dcd01c7ae5572
2021-08-18 23:07:25 -07:00
531262fe2e layernorm (#63436)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63436

use MKLDNN layernorm

use mkldnn version 2

address Elias feedback

fix build CI errors

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D30388825

Pulled By: Krovatkin

fbshipit-source-id: fb909bfbf53cb8567a43aac40f51c491daeec908
2021-08-18 23:05:39 -07:00
6e00b31b15 [TensorExpr] Make CacheReplacer and IndexFlattener mutate stmts/exprs inplace. (#63527)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63527

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D30411411

Pulled By: ZolotukhinM

fbshipit-source-id: efb14ee57b36537fa4fefa89bdd6bafe7151c012
2021-08-18 22:59:31 -07:00
1d62fb8a63 [TensorExpr] Speedup ExternalCall.ComputeInterop test by reducing tensor sizes. (#63526)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63526

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D30411410

Pulled By: ZolotukhinM

fbshipit-source-id: d9a99afac14d2238b5100c98ae9ed4467f9f05ea
2021-08-18 22:58:25 -07:00
773c8b6440 support optional comparisons with different but comparable types (#62890)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/62565

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62890

Reviewed By: ejguan

Differential Revision: D30396008

Pulled By: dagitses

fbshipit-source-id: fca02207509f882973d54484f89c4d116505fc66
2021-08-18 21:40:38 -07:00
2544664e54 Beef up comment in AccumulateType (#63503)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63503

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D30403160

Pulled By: ezyang

fbshipit-source-id: 6cb24418152d9fb146f86b6f973ec50f1a397a58
2021-08-18 20:59:37 -07:00
0d437fe6d0 BF16 allreduce hook (#63260)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63260

Add BF16 all-reduce communication hook. Skip if CUDA version < 11 or NCCL version < 2.9.7.

Reviewed By: SciPioneer

Differential Revision: D30238317

fbshipit-source-id: bad35bf7d43f10f1c40997a282b831b61ef592bb
2021-08-18 20:53:49 -07:00
9477211e7d Hoisting common expressions out of If blocks (#59492)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59492

Adding code to find common expressions from the two subblocks of an if
operation and hoist them before the if block.
This also allows Dead Code Elimination to
then eliminate some if blocks.

Also eliminated some dead code in the codebase.
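
A toy example of the kind of code the pass targets (my illustration; `x + 1` appears in both branches and can be hoisted above the If node):

```python
import torch

@torch.jit.script
def f(x: torch.Tensor, cond: bool):
    if cond:
        y = x + 1  # common to both branches: eligible for hoisting
        z = y * 2
    else:
        y = x + 1
        z = y * 3
    return z

# The unoptimized IR still has x + 1 in both arms of prim::If; the new pass
# hoists it above the If during optimization.
print(f.graph)
```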

Test Plan:
python test_jit.py TestIfHoisting

Imported from OSS

Reviewed By: ngimel

Differential Revision: D29399533

fbshipit-source-id: 9336b9dc48c02c38862f98f98cd72fc1767a1802
2021-08-18 16:29:30 -07:00
d9547b9bb2 Nnapi Delegation: Quick improvements (#63489)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63489

A few quick improvements to the Android NNAPI Delegate, some of which were discussed here https://github.com/pytorch/pytorch/pull/62272:
1) `throw std::exception` replaced with `TORCH_CHECK` to reduce runtime
size (nnapi_backend_lib.cpp)
2) weights processing moved from compile to preprocess step, since it can
be done AOT (nnapi_backend_lib.cpp & nnapi_backend_preprocess.cpp)
3) `ser_model_` and `shape_compute_module_` member variables removed, since they are never used after
`init()`, so they are not needed (nnapi_backend_lib.cpp)

Test Plan:
Unit tests: `python test/test_jit.py TestNnapiBackend`
Run SparkAR segmentation with delegated NNAPI as done here D30259033 (can use `jf download GAekdAwsyGKXhggFALN4LnSBTzcubsIXAAAz --file "v303-nnd-mod.ptl"` to get a preprocessed model from these changes)

Imported from OSS

Reviewed By: raziel, iseeyuan

Differential Revision: D30398880

fbshipit-source-id: b6872e1e9ccd583622b80659da00c83fdd82580e
2021-08-18 16:25:01 -07:00
4dcc2197ce [fix] tensor_split : non-contiguous indices tensor (#63390)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/63281

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63390

Reviewed By: ejguan

Differential Revision: D30362649

Pulled By: mruberry

fbshipit-source-id: 3ea3ad02199e4345beb0b580d056babd56112309
2021-08-18 16:10:17 -07:00
1f4e019d8e [Vulkan] Fix incorrect input range for Hardshrink tests (#63515)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63515

Fixed an inappropriate input range for the Hardshrink tests:
The range -10 to +10 for input tensors is more appropriate when we use the test set of lambda values {-4.2, -1.0, -0.42, 0.0, 0.42, 1.0, 4.2, 42.42}.
ghstack-source-id: 136141416

Test Plan:
```
buck build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 //xplat/caffe2:pt_vulkan_api_test_binAndroid\#android-arm64 --show-output
adb push buck-out/gen/xplat/caffe2/pt_vulkan_api_test_binAndroid\#android-arm64 /data/local/tmp/vulkan_api_test
adb shell "/data/local/tmp/vulkan_api_test"
```
Note that the test can fail sporadically due to the precision loss by FP16(Vulkan)/FP32(CPU). This issue will be handled separately after some design discussions.

Reviewed By: SS-JIA

Differential Revision: D30389646

fbshipit-source-id: 7224bd8ba4e4972f5fc147df8a0cb84808f8c62e
2021-08-18 15:52:12 -07:00
15eec8e1d1 using PR number instead of IN_PULL_REQUEST (#63360)
Summary:
PR numbers should be available on GHA after this.

This fixes an issue where the target determinator was not working, discovered when manually running https://github.com/pytorch/pytorch/issues/63412.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63360

Reviewed By: malfet, zhouzhuojie, seemethere

Differential Revision: D30374615

Pulled By: walterddr

fbshipit-source-id: eee8d8bb7aa4308a6a50cfdcd4423a96d846777f
2021-08-18 15:05:10 -07:00
779a3d47b0 [Static Runtime] Benchmark reports native nodes (#63346)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63346

We have seen that we can get significant perf wins essentially for free by implementing native ops for ops that we cannot write out variants for (e.g. TupleUnpack D30306955 (078b8004a6), append D30326461 (9d9e7a8d72)). Therefore, whether or not SR is using a native implementation is valuable information. By capturing this in the benchmarking suite, we can hopefully avoid wasting time profiling/manually inspecting `native_ops.cpp`

Reviewed By: hlu1

Differential Revision: D30346752

fbshipit-source-id: 205b090513b6a5a6ce4cb92f75ab0395b15d08f9
2021-08-18 15:05:08 -07:00
139413078f [FX] make ASTRewriter patch wrapped functions properly (#62987)
Summary:
Reference the same global namespace (instead of copying it) in ASTRewriter so that wrapped functions are patched properly.

Fixes #62071

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62987

Test Plan:
To test it you may write this snippet and ensure the results are as shown in the comments:

```
import torch
import torch.fx

@torch.fx.wrap
def to_be_wrapped(x):
    return torch.relu(x)

class Foo(torch.nn.Module):
    def forward(self, x):
        return to_be_wrapped(x)

traced = torch.fx.symbolic_trace(Foo())
print(traced.graph)
"""
graph():
    %x : [#users=1] = placeholder[target=x]
    %to_be_wrapped : [#users=1] = call_function[target=__main__.to_be_wrapped](args = (%x,), kwargs = {})
    return to_be_wrapped
"""

from torch.fx.experimental.rewriter import RewritingTracer

rt = RewritingTracer()
graph = rt.trace(Foo())
print(graph)
"""
### AFTER FIX (CORRECT):
graph():
    %x : [#users=1] = placeholder[target=x]
    %to_be_wrapped : [#users=1] = call_function[target=__main__.to_be_wrapped](args = (%x,), kwargs = {})
    return to_be_wrapped

### BEFORE FIX (WRONG):
graph():
    %x : [#users=1] = placeholder[target=x]
    %relu : [#users=1] = call_function[target=torch.relu](args = (%x,), kwargs = {})
    return relu
"""
```

Reviewed By: ansley

Differential Revision: D30396176

Pulled By: mostafaelhoushi

fbshipit-source-id: f61eddf32e9ef42b5f5c3ce21d559945214ee833
2021-08-18 15:03:57 -07:00
9bbf80969e [PyTorch] Avoid using std::regex for device string parsing in Device.cpp (#63464)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63464

This was previously committed as D30281388 (4d6f98ecad), but was reverted due to t98478641. jnkwok1 confirmed that this change was not the root cause, so trying to land it again.

Currently, `std::regex` is used for parsing device strings. This is undesirable for a few reasons.

1. Increases binary size
2. Slows down model loading
3. Potentially uses more memory at runtime
4. Takes marginally longer to build code that uses std::regex than code that does not

This change avoids the use of `std::regex` for parsing the device string since we don't need to.
ghstack-source-id: 136006963
ghstack-source-id: 136081898

Test Plan:
### AI Bench Runs

**Before this change:**
1. Model Load time: [252ms](https://www.internalfb.com/intern/aibench/details/332471502816548)
2. Model unload time: 3.5ms

**After this change:**
1. Model Load time: [240ms](https://www.internalfb.com/intern/aibench/details/652195589031318), which is an approx 5% reduction for the current model. I suspect percentage wise, it will be larger for smaller models since this is a fixed cost reduction.
2. Model unload time: 3.3ms (probably too small to be meaningfully impactful to an end user).

### BSB Results

```
D30281388 (4d6f98ecad)-V1 (https://www.internalfb.com/intern/diff/D30281388 (4d6f98ecad)/?dest_number=135713848)

messenger-pika-optimized-device: Succeeded
Change in Download Size for arm64 + 3x assets variation: -7.1 KiB
Change in Uncompressed Size for arm64 + 3x assets variation: -17.6 KiB

Mbex Comparison: https://our.intern.facebook.com/intern/mbex/bsb:551399955987465@base/bsb:551399955987465@diff/
```

Reviewed By: raziel, pavithranrao

Differential Revision: D30388269

fbshipit-source-id: 10942e7aa56f9ea47aa479a8f50187f2ce2899bf
2021-08-18 14:55:12 -07:00
7fdba4564a [TensorExpr] IRSimplifier: sort terms in polynomials, terms, minterms, maxterms. (#63197)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63197

This solves non-determinism from using hash values in sort methods.
Changes in tests are mostly mechanical.

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D30292776

Pulled By: ZolotukhinM

fbshipit-source-id: 74f57b53c3afc9d4be45715fd74781271373e055
2021-08-18 14:49:27 -07:00
8bdd542417 [TensorExpr] Add debug logging to LoopNest::computeInline. (#63196)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63196

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D30292778

Pulled By: ZolotukhinM

fbshipit-source-id: d8a111b75466a9354f6d048119cc6f814c9d5abb
2021-08-18 14:48:05 -07:00
feba6806c9 clarify that torch.finfo.tiny is the smallest normal number (#63241)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63241

This is a common source of confusion, but it matches the NumPy
behavior.

Fixes #44010
Fixes #59526
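
For example (behavior as documented; this snippet is mine, not part of the patch):

```python
import torch

fi = torch.finfo(torch.float32)
print(fi.tiny)  # 1.1754943508222875e-38: the smallest *normal* float32

# Subnormal values below `tiny` still exist; they just trade away precision.
print(torch.tensor(2.0 ** -149))  # tensor(1.4013e-45), the smallest subnormal
```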

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D30307646

Pulled By: dagitses

fbshipit-source-id: d848140ba267560387d83f3e7acba8c3cdc53d82
2021-08-18 13:44:52 -07:00
9253dc1e58 Fix segmentation fault due to access to destroyed CudaIPCGlobalEntities instance (#56141)
Summary:
There is an instance of the static destruction order fiasco where cuda_ipc_global_entities may be accessed after it is destroyed. See https://github.com/pytorch/pytorch/issues/51961

This change uses a flag and avoids accesses to the destroyed class when it is set to false.

Fixes https://github.com/pytorch/pytorch/issues/51961

This removes the function to clear shared_blocks introduced by https://github.com/pytorch/pytorch/issues/53080 which had multiple issues: Unprotected access to a shared structure and modification of the vector which is being cleared by the destructors of the objects contained.
I.e. what happened was:

- `CudaIPCSentDataLimbo_.clear_shared_blocks();` is called from the destructor of CudaIPCGlobalEntities as of your PR
- This deletes instances of `CudaIPCSentData` which hold `at::DataPtr` created by `GetNewRefCountedSentData`
- This means `CudaIPCSentDataDelete` is called with still active pointers
- Hence `CudaIPCSentDataLimbo_.add` is called adding a new value to `shared_blocks_`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56141

Reviewed By: ejguan

Differential Revision: D30397279

Pulled By: VitalyFedyunin

fbshipit-source-id: ce4b8b90fa1c90d275e5eca93ba84321cbc6140a
2021-08-18 13:38:55 -07:00
877e6f2be3 Bugfix for fuse qconfig comparison (#63384)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63384

In some cases, changes to the qconfig on a module would cause the
fusions to fail. This bugfix solves that problem by adding a
qconfig function comparison that compares the functions within the
qconfig rather than the modules the qconfigs are on. The comparison
looks at the partial object within QConfig.activation/weight.p and
compares args, keywords, and func. This has to be done manually
because partial doesn't implement __eq__, so == falls back to identity (is).
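
The underlying Python behavior, in isolation:

```python
from functools import partial

a = partial(min, 0)
b = partial(min, 0)

print(a == b)  # False: partial has no __eq__, so == compares identity
print((a.func, a.args, a.keywords) == (b.func, b.args, b.keywords))  # True
```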

Test Plan:
python test/test_quantization.py
TestFuseFx.test_problematic_fuse_example

Imported from OSS

Reviewed By: supriyar, ejguan

Differential Revision: D30386264

fbshipit-source-id: 51e358c021c39d6f48dc12ad2a82b2838677b9de
2021-08-18 13:31:56 -07:00
2aa19f33c6 [ONNX] Fix for batchnorm training op mode (#52758) (#62760)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62760

* Rebase

# Conflicts:
#	torch/csrc/jit/passes/onnx/eval_peephole.cpp

# Conflicts:
#	test/onnx/test_utility_funs.py
#	torch/onnx/symbolic_opset9.py

* Update symbolic_opset12.py

* Update test.sh
# Conflicts:
#	.jenkins/caffe2/test.sh

* Merge

* Fix utility tests

# Conflicts:
#	test/onnx/test_pytorch_onnx_onnxruntime.py
#	test/onnx/test_utility_funs.py

* Fix for comment

* Enable BN tests

* Fix for test

* Update test_pytorch_onnx_onnxruntime.py

* Update test_pytorch_onnx_onnxruntime.py

* Update test_utility_funs.py

* Update test_pytorch_onnx_onnxruntime.py

Test Plan: Imported from OSS

Reviewed By: SplitInfinity

Differential Revision: D30349060

Pulled By: msaroufim

fbshipit-source-id: 93312c17607974731c17099ae181acb6e4c1c409
2021-08-18 13:29:07 -07:00
e182401062 [ONNX] Remove aten parameter (#61652) (#62759)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62759

* remove aten argument in export()

* add export_to_pretty_string default value OperatorExportTypes.ONNX

* add DPYTORCH_ONNX_CAFFE2_BUNDLE description

Test Plan: Imported from OSS

Reviewed By: SplitInfinity

Differential Revision: D30349062

Pulled By: msaroufim

fbshipit-source-id: d9738f3aa8b80eac54548d0b9494f9f1e544f20f

Co-authored-by: Gary Miguel <garymiguel@microsoft.com>
2021-08-18 13:29:04 -07:00
3a7bbf5fb7 [ONNX] Add support for opset14 in PT-ONNX exporter (#59486) (#62758)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62758

* Add initial changes for opset14

* Fixed flake

* Add onnx submodule changes and removed utility func tests

* Add updated batchNorm symbolic

* Add triu/tril symbolics

* Fix lint

* Fixed test failures

* Add reshape with allowzero

* Added tests/refactored opset versioning

* Bump onnxruntime version

* Fix clang/lint failures

* Add reshape shape inference for opset 14

* Changes for allowzero

* Fix lint/clang and test failures

* Updated PR

* Flake fixes

* Fix flake

* Remove new_jit_api tests

* Add opset14 models

* Update allowzero

* Fix test failures

Test Plan: Imported from OSS

Reviewed By: SplitInfinity

Differential Revision: D30349063

Pulled By: msaroufim

fbshipit-source-id: 54724246149b01a2f627c43d7396253a7e9c9eb9

Co-authored-by: Shubham Bhokare <sbhokare@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2021-08-18 13:29:01 -07:00
99b154b8be [ONNX] Support lstm_cell symbolic (#61476) (#62757)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62757

Support lstm_cell symbolic

Test Plan: Imported from OSS

Reviewed By: SplitInfinity

Differential Revision: D30349061

Pulled By: msaroufim

fbshipit-source-id: f236177e3e5c62a30b7e4d91a623bcaef21b5eb1

Co-authored-by: jiafatom <jiafa@microsoft.com>
2021-08-18 13:27:46 -07:00
d661e646ad [FX] Fix GraphModule deepcopy to use deepcopied graph (#63090)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63090

Test Plan: Imported from OSS

Reviewed By: ansley

Differential Revision: D30252471

Pulled By: jamesr66a

fbshipit-source-id: cafd7d7917935a5ea6ffa2a7fe9e9b2a9578b3e3
2021-08-18 13:17:14 -07:00
11fbd3958c MaybeOwned page for dev wiki (#63450)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63450

Brief guide to understanding `MaybeOwned<Tensor>`, aimed at C++ PT devs who are obliged to interact with existing uses of it, rather than encouraging new usage.

For reviewers: I haven't yet added a link to this page from anywhere. I'm thinking the right place is the [dev wiki main page C++ section](https://github.com/pytorch/pytorch/wiki#c) but happy to put it wherever makes sense, suggestions welcome.

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D30402313

Pulled By: bhosmer

fbshipit-source-id: 69b15909ecafcd8d88e44f664f88c3ad4eb26d84
2021-08-18 12:08:58 -07:00
9bb1371cc2 Disable RDYNAMIC check with MSVC (#62949)
Summary:
When testing with clang-cl, the flag is added though it is unsupported and that generates a few warnings. Tried a few alternatives like https://cmake.org/cmake/help/latest/module/CheckLinkerFlag.html, but they just don't work.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62949

Reviewed By: zhouzhuojie, driazati

Differential Revision: D30359206

Pulled By: malfet

fbshipit-source-id: 1bd27ad5772fe6757fa8c3a4bddf904f88d70b7b
2021-08-18 11:51:23 -07:00
d4593d9d08 document why wrappers exist in torch.functional (#62847)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/62844.

These wrappers are not super obvious, but ultimately stem from the lack of support for functions with variadic args in native_functions.yaml. https://github.com/pytorch/pytorch/issues/62845 tracks that issue.
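
A sketch of the pattern (the wrapper name is hypothetical; `torch.meshgrid` is a real example of such a wrapper):

```python
import torch

def variadic_meshgrid(*tensors):
    # native_functions.yaml cannot express *args, so the backend op takes a
    # single TensorList; the Python layer supplies the variadic surface.
    return torch.meshgrid(list(tensors))

x, y = variadic_meshgrid(torch.arange(3), torch.arange(2))
print(x.shape, y.shape)  # torch.Size([3, 2]) torch.Size([3, 2])
```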

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62847

Reviewed By: VitalyFedyunin

Differential Revision: D30305016

Pulled By: dagitses

fbshipit-source-id: 716fcecb0417b770bc92cfd8c54f7ead89070896
2021-08-18 11:51:21 -07:00
f0f5cffde9 [DDP] Add a debug check in cpp fp16 compress (#63379)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63379

This codepath has been prone to bugs, as seen in the diff below; this check
will help guard against changes/refactors that touch it, as a basic sanity
check. It is enabled only in debug builds so as not to affect perf.
ghstack-source-id: 136056093

Test Plan: CI

Reviewed By: SciPioneer

Differential Revision: D30358440

fbshipit-source-id: e1b3893a223722c2593ceed8696a09c7d07d47c1
2021-08-18 11:51:19 -07:00
ac1ece054b [DDP][Grad compression] Fix fp16 cpp hook (#63375)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63375

I think tensor.copy_(tensor.to(torch::kFloat16)); will keep it as
float32.
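
The same pitfall in Python terms, for intuition (my illustration):

```python
import torch

t = torch.randn(3)            # float32 gradient bucket
t.copy_(t.to(torch.float16))  # copy_ casts back to the destination's dtype
print(t.dtype)                # torch.float32 -- nothing was actually compressed

half = t.to(torch.float16)    # keeping the half-precision tensor itself works
print(half.dtype)             # torch.float16
```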

Tested by adding the following line:

```
LOG(INFO) << "Type is: " << compressed_tensor.scalar_type();
```

before:

```
I0816 17:03:09.823688 364141 default_comm_hooks.cpp:21] Type is: Float
```
after:

```
I0816 17:01:16.779052 353924 default_comm_hooks.cpp:21] Type is: Half
```
ghstack-source-id: 136056092

Test Plan: ci

Reviewed By: SciPioneer

Differential Revision: D30356256

fbshipit-source-id: 8208a705acd7628541cd43c8bf61d007dfdd2435
2021-08-18 11:49:35 -07:00
4e1d84ae8f [doc] pre-commit fix instructions (#61717)
Summary:
fix invalid instruction

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61717

Reviewed By: zhouzhuojie, driazati

Differential Revision: D30359218

Pulled By: malfet

fbshipit-source-id: 61771babeac4d34425a61ce49f38a7099b521eec
2021-08-18 11:42:25 -07:00
50a3b6a6a8 Make SkipInfo with expected_failure an XFAIL (#63481)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63481

This PR changes the SkipInfo decorators to use unittest.expectedFailure so that the test reports as XFAIL as opposed to PASSED.

Note that changing the expectedFailure here 30e1c74dc1/torch/testing/_internal/common_device_type.py (L879) to an XFAIL is not possible because the decision of whether to decorate is delayed until the wrapper function is called.

fixes https://github.com/pytorch/pytorch/issues/63363
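
For reference, the reporting difference in plain unittest terms:

```python
import unittest

class Example(unittest.TestCase):
    @unittest.expectedFailure
    def test_known_bug(self):
        self.assertEqual(1, 2)  # reported as an expected failure (x), not PASSED

if __name__ == "__main__":
    unittest.main()
```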

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D30397154

Pulled By: heitorschueroff

fbshipit-source-id: c5e4911969ad8667763eec4203dbbc6a51178592
2021-08-18 11:36:18 -07:00
2f615f6313 Improve custom function docs (#60312)
Summary:
- Adds some code examples for `ctx` methods and make requirements of arguments more clear
- Type annotations for `save_for_backward`, `mark_dirty`, `mark_non_differentiable`, and `set_materialize_grads` (BC-breaking?)
- Refactor `torch.autograd.Function` doc

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60312

Reviewed By: VitalyFedyunin

Differential Revision: D30314961

Pulled By: soulitzer

fbshipit-source-id: a284314b65662e26390417bd2b6b12cd85e68dc8
2021-08-18 11:31:31 -07:00
d565a7bd68 [6/N] Enable opt-asan for elastic and launcher tests. (#63442)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63442

Continuation of https://github.com/pytorch/pytorch/pull/62051, I've
enabled elastic and launcher tests to run in opt-asan mode which is supported
with spawn multiprocessing.

This allows us to completely get rid of fork based tests from torch.distributed
and have all tests run in spawn mode.
ghstack-source-id: 136057123

Test Plan: waitforbuildbot

Reviewed By: cbalioglu

Differential Revision: D30384267

fbshipit-source-id: ad3447cfb9d6e31e7ec8332d64c8ff1054858dcb
2021-08-18 10:48:49 -07:00
af3cbfed95 Add validation check in fx2trt interpreter (#63424)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63424

Add a validation check in fx2trt for missing converter operators. If any op is missing, interpreter init will report the missing operators.

Test Plan:
For call_function and call_method:
manually tested with the feeds benchmark and verified that init failed with the expected message.
{F642390780}

For call_module:
Specify a module as a leaf node so that acc_tracer traces it as a node; then, in fx2trt.py, make the CONVERTER initialization stage skip recording all modules; initialize the interpreter and call the validator function; verify that the output includes the missing module name (the return value is printed in the screenshot below).

{F643458718}

Reviewed By: 842974287

Differential Revision: D30294832

fbshipit-source-id: 243dca3fdfc6a174ded65248938e2a234aec19c6
2021-08-18 10:41:10 -07:00
7df2324120 [pytorch] Make qconv forward() thread safe (#63432)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63432

There's a race condition in quantized models when multiple threads call forward(), due to qnnpack packing the weights the first time the operator is called. This change locks the entire apply_impl function.

Test Plan:
https://github.com/pytorch/pytorch/issues/58055

Ran the script before and after, original crashes went away

Reviewed By: kimishpatel

Differential Revision: D30229520

fbshipit-source-id: d06cabe24199a80325cd57f24a7fd60624be2cf7
2021-08-18 10:37:13 -07:00
565578cdab Use fastAtomicAdd in EmbeddingBag (mode "max") backward (#63298)
Summary:
Rel: https://github.com/pytorch/pytorch/issues/62695

### This PR
|   n_tokens |   num_embeddings |   embedding_dim | mode   |    bwd_fp32 |    bwd_fp16 |
|-----------:|-----------------:|----------------:|:-------|------------:|------------:|
|       4096 |             4096 |            4096 | max    | 0.000326228 | 0.000181448 |
|       4096 |             4096 |           16384 | max    | 0.00102805  | 0.000618136 |
|       4096 |            16384 |            4096 | max    | 0.000907326 | 0.000530422 |
|       4096 |            16384 |           16384 | max    | 0.00334988  | 0.00264645  |
|      16384 |             4096 |            4096 | max    | 0.000366449 | 0.000320232 |
|      16384 |             4096 |           16384 | max    | 0.00126421  | 0.00104183  |
|      16384 |            16384 |            4096 | max    | 0.00087738  | 0.00065068  |
|      16384 |            16384 |           16384 | max    | 0.00379229  | 0.00298201  |

### Original
|   n_tokens |   num_embeddings |   embedding_dim | mode   |    bwd_fp32 |    bwd_fp16 |
|-----------:|-----------------:|----------------:|:-------|------------:|------------:|
|       4096 |             4096 |            4096 | max    | 0.00032407  | 0.000188231 |
|       4096 |             4096 |           16384 | max    | 0.00104356  | 0.000624001 |
|       4096 |            16384 |            4096 | max    | 0.000902069 | 0.000527382 |
|       4096 |            16384 |           16384 | max    | 0.00302202  | 0.00255153  |
|      16384 |             4096 |            4096 | max    | 0.000384343 | 0.000403249 |
|      16384 |             4096 |           16384 | max    | 0.00126445  | 0.00135069  |
|      16384 |            16384 |            4096 | max    | 0.000880814 | 0.000825679 |
|      16384 |            16384 |           16384 | max    | 0.00337611  | 0.00319515  |

cc xwang233 ptrblck ngimel

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63298

Reviewed By: mruberry

Differential Revision: D30383583

Pulled By: ngimel

fbshipit-source-id: 14dd9d67002c53a153721812709033c198f68c1e
2021-08-18 10:14:40 -07:00
e2ddaec5cf Reverting launch bounds change in topK that induced a regression in perf (#63431)
Summary:
[topkwsyncs.zip](https://github.com/pytorch/pytorch/files/7003077/topkwsyncs.zip)

Running this script on NVIDIA containers 21.08 vs. 21.07, we see the following perf drops:
topk(input=(dtype=torch.float16,shape=[60, 201600]), k=2000, dim=1, sorted=True) - 0.63

topk(input=(dtype=torch.float32,shape=[120000]), k=12000, dim=0, sorted=False) - 0.55

topk(input=(dtype=torch.float16,shape=[5, 201600]), k=2000, dim=1, sorted=True) - 0.55

topk(input=(dtype=torch.float32,shape=[1, 10000]), k=1000, dim=1, sorted=False) - 0.33

The relative perf drop is reported as (21.08_time - 21.07_time) / 21.07_time

I narrowed down the source of the regression to this commit: https://github.com/pytorch/pytorch/pull/60314
which reduced launch bounds from 1024 to 512.

The perf did not seem to regress in the original evidence provided for changing 1024 to 512 because the input shapes in that benchmark were much smaller than the input shapes of the tensors in which I am seeing the regression. I suggest reverting to 1024: with 512 there was no considerable perf improvement for small inputs and a major perf regression for large tensors.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63431

Reviewed By: mruberry

Differential Revision: D30384087

Pulled By: ngimel

fbshipit-source-id: 11eecbba82a069b1d4579d674c3f644ab8060ad2
2021-08-18 09:44:07 -07:00
383a33a0eb Make DataChunk support list in-place ops (#63422)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63422

Fixes #63095

Make `DataChunk` delegate to the corresponding list methods. It will then support the following in-place operations (see the sketch after this list):
- `sort`
- `reverse`
- `append`
- `extend`
- `random.shuffle`
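
A minimal sketch of the resulting behavior (the import path is an assumption; adjust to wherever `DataChunk` lives in your build):

```python
import random

from torch.utils.data import DataChunk  # import path assumed

chunk = DataChunk([3, 1, 2])
chunk.sort()           # in-place list methods now mutate the wrapped data
chunk.append(4)
random.shuffle(chunk)  # works because DataChunk behaves like a mutable list
print(list(chunk))
```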

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D30379027

Pulled By: ejguan

fbshipit-source-id: d176bd0cc8b89b915c7bb184ff243ab1f605616d
2021-08-18 08:48:47 -07:00
cyy
93582e3bba A tiny fix in MT19937RNGEngine (#63219)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63219

Reviewed By: VitalyFedyunin

Differential Revision: D30341484

Pulled By: ezyang

fbshipit-source-id: 0ff4499d0f4a3dfeb991c0f10fe3248c6ca1c992
2021-08-18 08:05:23 -07:00
c508433617 Implement subclass priority for __torch_dispatch__ (#63411)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63411

In order to get this behavior, you have to use append_overloaded,
which I forgot to use in the previous implementation.  I exposed
an internal helper function which is more appropriate for dispatch
to Python where we know that an argument is definitely a Tensor (and
this test no longer needs to be done).

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D30374489

Pulled By: ezyang

fbshipit-source-id: 43b08c00d1958c9b26d82a025d19f0b67bb85590
2021-08-18 07:49:03 -07:00
061b36e2f5 [fx2trt] Add dequantize support (#63448)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63448

Only available after TensorRT 8.0

Test Plan: buck run mode/opt caffe2/torch/fb/fx2trt:test_dequantize

Reviewed By: 842974287

Differential Revision: D30296863

fbshipit-source-id: 44b9630ef0d210e7f20e650dc81c519f7e41f5f3
2021-08-18 07:44:17 -07:00
a00d587849 add OpInfo for torch.linalg.tensorinv (#62326)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/53739.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62326

Reviewed By: H-Huang

Differential Revision: D30136376

Pulled By: zou3519

fbshipit-source-id: 04ec9450e8866667649af401c7559b96ddc91491
2021-08-18 07:37:34 -07:00
30e1c74dc1 Update cuda amp to also check xla device (#63413)
Summary:
Fixes https://github.com/pytorch/xla/issues/3086. PyTorch/XLA:GPU also uses CUDA AMP. I verified the pt/xla `test_autocast` with this fix and all tests passed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63413

Reviewed By: ngimel

Differential Revision: D30380785

Pulled By: bdhirsh

fbshipit-source-id: fd1a1de7d224c616fc3fa90b80a688a21f6b1ecc
2021-08-18 06:44:10 -07:00
4a390a56c4 [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D30391472

fbshipit-source-id: d4eb1e7debea8905e7fee5f026c082bee65e78f3
2021-08-18 04:20:05 -07:00
2b303f3f31 enhance comparison tests for c10::optional (#62887)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62887

Reviewed By: VitalyFedyunin

Differential Revision: D30305044

Pulled By: dagitses

fbshipit-source-id: d0a3a9e4ea186915ef087543aaf81a606f943380
2021-08-18 04:08:05 -07:00
0f2f6a79cb clarify the documentation of torch.meshgrid (#62977)
Summary:
Also warn about the behavior differences from `numpy.meshgrid`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62977

Reviewed By: mruberry, ngimel

Differential Revision: D30220930

Pulled By: dagitses

fbshipit-source-id: ae6587b41792721cae2135376c58121b4634e296
2021-08-18 04:01:22 -07:00
f8a84a80cd [5/N] Run opt-asan with detect_leaks=0 (#63361)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63361

Python multiprocessing doesn't support LSAN and causes false positives
instead. As a result, disabling LSAN for these tests so that we can still run
with opt-asan
ghstack-source-id: 135962489

Test Plan: waitforbuildbot

Reviewed By: rohan-varma

Differential Revision: D30352269

fbshipit-source-id: f6ab5abce7bdef00cd5e1f5977424d2b151174af
2021-08-18 01:59:56 -07:00
d431c77d76 [sharded_tensor] fix typing issue for placement (#63426)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63426

placement should be either a string or a _remote_device; this fixes the type annotation to match the behavior
ghstack-source-id: 136041125

Reviewed By: pritamdamania87

Differential Revision: D30379702

fbshipit-source-id: 34e226494240923b433e3a39cc08c84d42cdad6b
2021-08-17 23:11:48 -07:00
2fd14735d6 [easy][PyTorchEdge] print error message when failing to load model file (#63404)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63404

# Context
Loading a model file using `fopen` might error out for multiple reasons. Repro'ing the error on devices takes some time and effort. Logging the errno will help in debugging and fixing the error quickly.

# Mitigation
Print out the error message from `fopen` to help users debug the issue.

Test Plan:
```
(base) [pavithran@devvm1803.vll0 /data/users/pavithran/fbsource] buck run xplat/caffe2/fb/lite_predictor:lite_predictor -- --model=/home/pavithran/models/prod/GAaNhAoTIV6cIvgJAHn30m8NR1QgbmQwAAAA.ptl --use_bundled_input=0
Building: finished in 0.5 sec (100%) 354/354 jobs, 0/354 updated
  Total time: 0.6 sec
Run with 24 threads
Run with 24 threads
Loading model...
terminate called after throwing an instance of 'c10::Error'
  what():  open file failed because of errno 2 on fopen: No such file or directory, file path: /home/pavithran/models/prod/GAaNhAoTIV6cIvgJAHn30m8NR1QgbmQwAAAA.ptl
Exception raised from RAIIFile at xplat/caffe2/caffe2/serialize/file_adapter.cc:15 (most recent call first):
(no backtrace available)
```

Reviewed By: dhruvbird

Differential Revision: D30372308

fbshipit-source-id: 5346e828f53f6bc5d871b403586566a3332a389a
2021-08-17 22:27:49 -07:00
15144ade25 [fx2trt] Add quantize_per_tensor support (#63447)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63447

Only available in TRT 8.0 and above

Test Plan: buck run mode/opt caffe2/torch/fb/fx2trt:test_quantize_per_tensor

Reviewed By: 842974287

Differential Revision: D30322844

fbshipit-source-id: dfd925e3432de128f2925b1aa55d6125e63359af
2021-08-17 21:37:26 -07:00
3fd8e09102 Fix RPC Python User Function Error Handling (#63406)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63406

The `RemoteException` will be thrown on the caller side when converting
the response message to IValue. Since it is a Python error, the error
message needs to be extracted explicitly and clear the `PyErr`.

Test Plan: Imported from OSS

Reviewed By: rohan-varma, ngimel

Differential Revision: D30372741

Pulled By: mrshenli

fbshipit-source-id: 1f72a7ee0c39cc2ef070f99884c142f7b3e0543d
2021-08-17 20:14:03 -07:00
f12f667e12 [torch] Set default log level for torch elastic (#63214)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63214

The default log level in fb and oss is different: in oss we use WARNING and in fb we use INFO.

Test Plan: unittests, f291441502

Reviewed By: cbalioglu

Differential Revision: D30296298

fbshipit-source-id: 89067352be767255fbc66e790ec333582de64c6c
2021-08-17 19:58:13 -07:00
dcf90b797c [BE] remove _SUPPORTED_OPTIM_MAP from tests (#63383)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63383

Per title
ghstack-source-id: 135966157

Test Plan: CI

Reviewed By: SciPioneer

Differential Revision: D30358921

fbshipit-source-id: 965e054e525194b1ee55980340df275bab355c9b
2021-08-17 17:17:25 -07:00
5b8862abf1 [DDP] Support step_param for AdamW (#63382)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63382

Per title
ghstack-source-id: 135966156

Test Plan: CI

Reviewed By: SciPioneer

Differential Revision: D30255446

fbshipit-source-id: e6ffbf339db0bc5b4702d02b74a462309df07c75
2021-08-17 17:16:11 -07:00
cd5e9dcc1d [quant][graphmode][fx][fix] Fix quantization for tuple arguments (#63376)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63376

Previously, when a tuple was an argument to a quantizable op, it would mistakenly be transformed into a list;
this PR fixes that.

Test Plan:
python test/test_quantization.py TestQuantizeFx.test_preserve_tuple

Imported from OSS

Reviewed By: raghuramank100

Differential Revision: D30357642

fbshipit-source-id: 82d10805d9c00c003cc99983dca68b6455ff7b2e
2021-08-17 17:01:24 -07:00
975542c314 Add more ciflow labels for more workflows (#63410)
Summary:
- Add more ciflow labels and enable them for more workflows.
- Only the 'ciflow/default' workflows run by default at pull_request time.
- Other workflows can be triggered manually (by adding the labels and unassigning pytorchbot), or by waiting for pytorchbot's comment opt-in rollout.
- The label design is a logical `OR`, i.e. adding 'ciflow/cuda' + 'ciflow/win' will trigger the union of the two sets. (Design feedback is needed here.)

Typical default workflows for normal PRs.

<details>
<summary>Generated label rules</summary>

![image](https://user-images.githubusercontent.com/658840/129779905-eb5e56dd-a696-4040-9eb6-71ecb6487dc1.png)

```
{
  "label_rules": {
    "ciflow/all": [
      "libtorch-linux-xenial-cuda10.2-py3.6-gcc7",
      "libtorch-linux-xenial-cuda11.1-py3.6-gcc7",
      "linux-bionic-cuda10.2-py3.9-gcc7",
      "linux-bionic-py3.8-gcc9-coverage",
      "linux-xenial-cuda10.2-py3.6-gcc7",
      "linux-xenial-cuda11.1-py3.6-gcc7",
      "linux-xenial-py3.6-gcc5.4",
      "linux-xenial-py3.6-gcc7-bazel-test",
      "periodic-libtorch-linux-xenial-cuda11.3-py3.6-gcc7",
      "periodic-linux-xenial-cuda11.3-py3.6-gcc7",
      "periodic-win-vs2019-cuda11.3-py3",
      "win-vs2019-cpu-py3",
      "win-vs2019-cuda10.1-py3",
      "win-vs2019-cuda11.1-py3"
    ],
    "ciflow/bazel": [
      "linux-xenial-py3.6-gcc7-bazel-test"
    ],
    "ciflow/coverage": [
      "linux-bionic-py3.8-gcc9-coverage"
    ],
    "ciflow/cpu": [
      "linux-bionic-py3.8-gcc9-coverage",
      "linux-xenial-py3.6-gcc5.4",
      "linux-xenial-py3.6-gcc7-bazel-test",
      "win-vs2019-cpu-py3"
    ],
    "ciflow/cuda": [
      "libtorch-linux-xenial-cuda10.2-py3.6-gcc7",
      "libtorch-linux-xenial-cuda11.1-py3.6-gcc7",
      "linux-bionic-cuda10.2-py3.9-gcc7",
      "linux-xenial-cuda10.2-py3.6-gcc7",
      "linux-xenial-cuda11.1-py3.6-gcc7",
      "periodic-libtorch-linux-xenial-cuda11.3-py3.6-gcc7",
      "periodic-linux-xenial-cuda11.3-py3.6-gcc7",
      "periodic-win-vs2019-cuda11.3-py3",
      "win-vs2019-cuda10.1-py3",
      "win-vs2019-cuda11.1-py3"
    ],
    "ciflow/default": [
      "linux-bionic-py3.8-gcc9-coverage",
      "linux-xenial-cuda11.1-py3.6-gcc7",
      "linux-xenial-py3.6-gcc5.4",
      "linux-xenial-py3.6-gcc7-bazel-test",
      "win-vs2019-cpu-py3",
      "win-vs2019-cuda10.1-py3"
    ],
    "ciflow/libtorch": [
      "libtorch-linux-xenial-cuda10.2-py3.6-gcc7",
      "libtorch-linux-xenial-cuda11.1-py3.6-gcc7",
      "periodic-libtorch-linux-xenial-cuda11.3-py3.6-gcc7"
    ],
    "ciflow/linux": [
      "libtorch-linux-xenial-cuda10.2-py3.6-gcc7",
      "libtorch-linux-xenial-cuda11.1-py3.6-gcc7",
      "linux-bionic-cuda10.2-py3.9-gcc7",
      "linux-bionic-py3.8-gcc9-coverage",
      "linux-xenial-cuda10.2-py3.6-gcc7",
      "linux-xenial-cuda11.1-py3.6-gcc7",
      "linux-xenial-py3.6-gcc5.4",
      "linux-xenial-py3.6-gcc7-bazel-test",
      "periodic-libtorch-linux-xenial-cuda11.3-py3.6-gcc7",
      "periodic-linux-xenial-cuda11.3-py3.6-gcc7"
    ],
    "ciflow/scheduled": [
      "periodic-libtorch-linux-xenial-cuda11.3-py3.6-gcc7",
      "periodic-linux-xenial-cuda11.3-py3.6-gcc7",
      "periodic-win-vs2019-cuda11.3-py3"
    ],
    "ciflow/slow": [
      "linux-bionic-cuda10.2-py3.9-gcc7",
      "linux-xenial-cuda10.2-py3.6-gcc7"
    ],
    "ciflow/win": [
      "periodic-win-vs2019-cuda11.3-py3",
      "win-vs2019-cpu-py3",
      "win-vs2019-cuda10.1-py3",
      "win-vs2019-cuda11.1-py3"
    ]
  },
  "version": "v1"
}
```
</details>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63410

Reviewed By: ngimel

Differential Revision: D30378553

Pulled By: zhouzhuojie

fbshipit-source-id: 4e0953740793e5e72b95018f8ab2ce4a6a364c38
2021-08-17 17:00:09 -07:00
da87d648b3 F.avg_pool3 CUDA backward: gpuAtomicAddNoReturn -> fastAtomicAdd (#63387)
Summary:
Rel: https://github.com/pytorch/pytorch/issues/62695

In the following two tables, I set `kernel_size` to 3 and `stride` to 2.
In benchmark, input tensors have the shape of (N, C, n_features, n_features, n_features).
Tested on RTX3080 w/ CUDA11.4 Update 1.

## This PR

|   N |   C |   n_features | dtype         |        time |
|----:|----:|-------------:|:--------------|------------:|
|  32 |   3 |            8 | torch.float16 | 7.46846e-05 |
|  32 |   3 |            8 | torch.float32 | 8.18968e-05 |
|  32 |   3 |           32 | torch.float16 | 0.000156748 |
|  32 |   3 |           32 | torch.float32 | 0.000165236 |
|  32 |   3 |          128 | torch.float16 | 0.00549854  |
|  32 |   3 |          128 | torch.float32 | 0.008926    |

## master (6acd87f)

|   N |   C |   n_features | dtype         |        time |
|----:|----:|-------------:|:--------------|------------:|
|  32 |   3 |            8 | torch.float16 | 7.60436e-05 |
|  32 |   3 |            8 | torch.float32 | 7.55072e-05 |
|  32 |   3 |           32 | torch.float16 | 0.000189292 |
|  32 |   3 |           32 | torch.float32 | 0.000168645 |
|  32 |   3 |          128 | torch.float16 | 0.00699538  |
|  32 |   3 |          128 | torch.float32 | 0.00890226  |

master's time divided by PR's time is as follows:

| N | C | n_features | master / PR |
|---:|---:|---------------:|----------------:|
| 32 | 3 | 8 | 1.018 |
| 32 | 3 | 32 | 1.208 |
| 32 | 3 | 128 | 1.272 |

cc: xwang233 ptrblck ngimel

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63387

Reviewed By: mruberry

Differential Revision: D30381434

Pulled By: ngimel

fbshipit-source-id: 3b97aee4b0d457a0277a0d31ac56d4151134c099
2021-08-17 16:53:13 -07:00
6e5d065b2b Add pocketfft as submodule (#62841)
Summary:
Using https://github.com/mreineck/pocketfft

Also delete explicit installation of pocketfft during the build as it will be available via submodule

Limit PocketFFT support to cmake-3.10 or newer, as `set_source_files_properties` does not seem to work as expected with cmake-3.5

Partially addresses https://github.com/pytorch/pytorch/issues/62821

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62841

Reviewed By: seemethere

Differential Revision: D30140441

Pulled By: malfet

fbshipit-source-id: d1a1cf1b43375321f5ec5b3d0b538f58082f7825
2021-08-17 15:29:56 -07:00
078dcc4e97 [wip] Move smallest bucket to end after rebuild buckets (#62279)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62279

Before rebuilding buckets, `kDefaultFirstBucketBytes` is misleading because we reverse the parameter indices when initializing the reducer, so it is actually the size of the last bucket.

Currently, rebuilding buckets sets this to be the first bucket size; this change tests whether keeping it as the last bucket size can help perf.

This is currently experimental only, and we don't plan to land it unless experiments show a clear win.
ghstack-source-id: 135966897

Test Plan: CI

Reviewed By: SciPioneer

Differential Revision: D29927931

fbshipit-source-id: 55b949986fa2c3bade6fcb4bf5b513461bf0f490
2021-08-17 15:04:50 -07:00
e0e2796fa9 adding a note to the documentation of polar (#63259)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63259

Fix #52919

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D30342536

Pulled By: NivekT

fbshipit-source-id: 4c61a86f96a6370cc64652bf652c4ae25c9f4601
2021-08-17 14:48:32 -07:00
bcddc71f26 [quant][graphmode][fx][bc-breaking] Support for reference pattern for fixqparam ops in eval mode (#62608)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62608

Insert an extra fixed-qparam fake quant at the output of fixed-qparam ops in fbgemm (e.g. sigmoid)
so that we can produce reference patterns for these ops.

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: iramazanli

Differential Revision: D30053978

fbshipit-source-id: c527944b6e791bb4d45ebe96265af52794203695
2021-08-17 14:42:40 -07:00
9cd24e12a1 Revert D30281388: [PyTorch] Avoid using std::regex for device string parsing in Device.cpp
Test Plan: revert-hammer

Differential Revision:
D30281388 (4d6f98ecad)

Original commit changeset: 4d998e9f313e

fbshipit-source-id: 11134b3400cc3e851155c9c1b6fb59308ff1567b
2021-08-17 14:40:27 -07:00
495e7e4815 Fix zero-dim handling in torch.matmul (#63359)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63359

Fixes #63352. The problem was that in e.g. `torch.matmul(A, B)` with A,
B having shapes [3, 2, 0] and [0, 2], the code attempts to call
`A.view(-1, 0)` which fails due to "-1 being ambiguous". The solution is
to manually compute what we want the shape of the view to be.
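
A minimal repro sketch, using the shapes from the description (illustrative only):

```python
import torch

A = torch.randn(3, 2, 0)
B = torch.randn(0, 2)
# Previously this raised an error from the internal A.view(-1, 0) call;
# after the fix it returns an all-zero tensor, since the contraction dim is 0.
out = torch.matmul(A, B)
print(out.shape)  # torch.Size([3, 2, 2])
```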

Test Plan: - new tests

Reviewed By: ngimel

Differential Revision: D30351583

Pulled By: zou3519

fbshipit-source-id: 7625691fe8b85d96a4073409596a932c303e3e8c
2021-08-17 13:44:47 -07:00
1dc2b52764 [TensorExpr] Add a wrapper for all expr and stmt pointers. (#63195)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63195

This helps us to later switch from using KernelArena with raw pointers
to shared pointers without having to change all our source files at
once.

The changes are mechanical and should not affect any functionality.

With this PR, we're changing the following:
 * `Add*` --> `AddPtr`
 * `new Add(...)` --> `alloc<Add>(...)`
 * `dynamic_cast<Add*>` --> `to<Add>`
 * `static_cast<Add*>` --> `static_to<Add>`

Due to some complications with args forwarding, some places became more
verbose, e.g.:
 * `new Block({})` --> `new Block(std::vector<ExprPtr>())`

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D30292779

Pulled By: ZolotukhinM

fbshipit-source-id: 150301c7d2df56b608b035827b6a9a87f5e2d9e9
2021-08-17 13:44:45 -07:00
a2db5d34a5 OpInfo fix: conv_transpose2d (#63389)
Summary:
Addresses comment: https://github.com/pytorch/pytorch/pull/62882#issuecomment-899679606.

cc: mruberry ngimel

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63389

Reviewed By: mruberry

Differential Revision: D30377481

Pulled By: ngimel

fbshipit-source-id: 0fa21acc3503c259c9b27463e8555247c43d9e2e
2021-08-17 13:42:36 -07:00
9d9e7a8d72 [Static Runtime] Implement aten::append (#63350)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63350

Add a native implementation for `aten::append`, the list append op.

Test Plan: New unit test: `buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest -- Append`

Reviewed By: hlu1

Differential Revision: D30326461

fbshipit-source-id: 0dbdf6cc82e78c7c36db39583256f6b87385e3d3
2021-08-17 13:40:18 -07:00
6621df9a6a [vulkan] Add log_softmax (#63193)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63193

Test Plan: Imported from OSS

Reviewed By: SS-JIA

Differential Revision: D30291987

fbshipit-source-id: 89c6560274e5a841e5af249f6963b67ef6826f4c
2021-08-17 13:36:02 -07:00
b0396e39f4 [quant][fx] Ensure qconfig works for QAT with multiple modules (#63343)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63343

The previous implementation had a bug where we were trying to modify an ordered dict value while iterating through it.
This fixes it by creating a copy before modifying it.
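
A minimal sketch of the bug pattern (hypothetical names, not the actual fx code):

```python
qconfig_dict = {"conv": "qconfig_a", "linear": "qconfig_b"}

# Buggy: raises "RuntimeError: dictionary changed size during iteration"
# for name in qconfig_dict:
#     qconfig_dict[name + ".fused"] = qconfig_dict[name]

# Fixed: iterate over a copy of the keys before mutating the dict
for name in list(qconfig_dict):
    qconfig_dict[name + ".fused"] = qconfig_dict[name]
```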

Test Plan:
python test/test_quantization.py TestQuantizeFx.test_qconfig_qat_module_type

Imported from OSS

Reviewed By: raghuramank100

Differential Revision: D30346116

fbshipit-source-id: 0e33dad1163e8bff3fd363bfd04de8f7114d7a3a
2021-08-17 11:40:51 -07:00
e000dfcf97 Add return type hint and improve the docstring of consume_prefix_in_state_dict_if_present method (#63388)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63388

Context: https://discuss.pytorch.org/t/how-to-use-the-helper-function-consume-prefix-in-state-dict-if-present/129505/3

Make it clear that this method strips the prefix in place rather than returning a new value.

Additional reformatting is also applied.
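
A short usage sketch (assuming the helper's location in `torch.nn.modules.utils`):

```python
import torch
from torch.nn.modules.utils import consume_prefix_in_state_dict_if_present

state_dict = {"module.weight": torch.zeros(2), "module.bias": torch.zeros(2)}
# Strips the prefix in place and returns None:
consume_prefix_in_state_dict_if_present(state_dict, "module.")
print(list(state_dict))  # ['weight', 'bias']
```
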
ghstack-source-id: 135973393

Test Plan: waitforbuildbot

Reviewed By: rohan-varma

Differential Revision: D30360931

fbshipit-source-id: 1a0c7967a4c86f729e3c810686c21dec43d1dd7a
2021-08-17 11:30:27 -07:00
fcc840eae0 Add handling of ifs to shape propagation (#62914)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62914

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D30196945

Pulled By: eellison

fbshipit-source-id: 1c0c7f938c4547330fd1dba8ab7dd0b99a79b6a9
2021-08-17 11:26:42 -07:00
3975c08e5d Small shape analysis changes (#62911)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62911

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D30196946

Pulled By: eellison

fbshipit-source-id: 2562bab323088d9c1440ae0431e533f9bcc513d3
2021-08-17 11:26:40 -07:00
e2227e86e4 Add a few peepholes (#62910)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62910

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D30196947

Pulled By: eellison

fbshipit-source-id: d88c92616d4de4f47ff4fcf5c1994e629ca20395
2021-08-17 11:26:38 -07:00
9a60759453 Propagate symbolic dimensions through idioms like x.view(y.size()) (#61975)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61975

Propagate symbolic dimensions through size calls. We did this by associating SymbolicSizes with integer inputs by looking through their constructors for `x.size(1)` or `x.size()` nodes.

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D30196948

Pulled By: eellison

fbshipit-source-id: 377fc1d2f6d396c52dc0e87fa814b15720f1414e
2021-08-17 11:25:22 -07:00
60cadd0bd1 [fx2trt] Refactor linear op to use mm + add
Summary:
Previously, linear was translated to fully_connected, which only works when the weight is a constant.
This diff changes that to mm + add so that the weight can be an ITensor, allowing the weight - quantize - dequantize
pattern in the produced TensorRT network.
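
Conceptually, the converter now lowers linear the way this plain-PyTorch sketch does (for illustration only; this is not the TRT converter code):

```python
import torch

def linear_via_mm_add(x, weight, bias):
    # F.linear(x, weight, bias) == x @ weight.t() + bias
    return torch.mm(x, weight.t()) + bias

x = torch.randn(4, 8)
w = torch.randn(16, 8)
b = torch.randn(16)
assert torch.allclose(linear_via_mm_add(x, w, b),
                      torch.nn.functional.linear(x, w, b), atol=1e-6)
```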

Test Plan: buck run mode/opt caffe2/torch/fb/fx2trt:test_linear

Reviewed By: 842974287

Differential Revision: D30294751

fbshipit-source-id: 596fbd4c81caef8df41a002a2e14fbf22d9d2a80
2021-08-17 10:52:28 -07:00
517aa8965a Updates set_default_dtype documentation (#63233)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/60560.

The description of set_default_dtype is updated to clarify that it affects the interpretation of Python numbers as either float32 (complex64) or float64 (complex128) and that default (floating) dtypes other than float32 or float64 are unsupported.
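
A short illustration of the clarified behavior:

```python
import torch

torch.set_default_dtype(torch.float64)
print(torch.tensor(1.5).dtype)   # torch.float64
print(torch.tensor(1.5j).dtype)  # torch.complex128

torch.set_default_dtype(torch.float32)
print(torch.tensor(1.5).dtype)   # torch.float32
print(torch.tensor(1.5j).dtype)  # torch.complex64
```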

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63233

Reviewed By: VitalyFedyunin

Differential Revision: D30306396

Pulled By: mruberry

fbshipit-source-id: bbee62f323c773b23b2fa45cb99122bc28197432
2021-08-17 10:41:03 -07:00
63554cfb3d Remove backend_debug from torch_core srcs and replace with library dependency (#63111)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63111

### Problem:
Buck contains at least two libraries which have `backend_debug_info.cpp` as a source, `torch_core` and `backend_interface_lib`. `backend_debug_info.cpp` registers BackendDebugInfo as a class. If targets contain both libraries (e.g. sparkAR debug build with NNAPI delegation), then BackendDebugInfo is registered twice, causing a runtime error.
### Solution:
These changes remove `backend_debug_info.cpp` and `backend_interface.cpp` as a source in `torch_core` and adds backend_interface_lib as a dependency instead.

**build_variables.bzl:**
- Added a list that excludes `backend_debug_info.cpp` and `backend_interface.cpp` ( both srcs already included by `backend_interface_lib`)

**buck:**
- torch_core: Removed `backend_debug_info.cpp` from srcs and added `backend_interface_lib` deps
- backend_interface_lib: Replaced `torch_mobile_core` dep with more specific deps
  - to avoid an indirect dep between `torch_core` and `torch_mobile_core`

ghstack-source-id: 135981061

Test Plan:
### Test Plan:
Build and run SparkAR internally with Android NNAPI Delegation (`buck build --show-output arstudioplayer_arm64_debug`)
and internal tests.

Reviewed By: iseeyuan

Differential Revision: D30259034

fbshipit-source-id: 0c14c827732f07fb9b9bd25a999828b51793cdcc
2021-08-17 10:33:35 -07:00
3aecec609f Move Android Nnapi srcs from aten_native_cpu to aten_cpu (#62919)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62919

Move Android NNAPI srcs (nnapi_bind.cpp, nnapi_wrapper.cpp, nnapi_model_loader.cpp) from aten_native_cpu to aten_cpu, so that later the NNAPI delegate's execution library can depend on it.

aten_native_cpu is built selectively per app, but the srcs have no selective components and are required for the NNAPI delegate library in D30259033.

See Buck Dependencies: https://docs.google.com/document/d/17RuWkqWKCO6sc5fKzIDkGeNhhvMk7BvJOqeSnGsHZ8o/edit?usp=sharing
ghstack-source-id: 135981062

Test Plan: `buck build --show-output arstudioplayer_arm64_debug` and internal tests

Reviewed By: iseeyuan

Differential Revision: D30164867

fbshipit-source-id: 0beff481ff250e75664ce8393beabbeb9db66770
2021-08-17 10:32:30 -07:00
c982f13a80 [android][vulkan] Fix model loading for Vulkan backend (#63402)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63402

Test Plan: Imported from OSS

Reviewed By: SS-JIA

Differential Revision: D30370692

Pulled By: IvanKobzarev

fbshipit-source-id: 73311b9b767fe9ed3ae390db59d6aa2c4a98f06d
2021-08-17 10:20:32 -07:00
f70b9ee5de Advertise USE_PRECOMPILED_HEADERS in CONTRIBUTING.md (#62827)
Summary:
This option was added in https://github.com/pytorch/pytorch/issues/61940 and fits with this section's theme of improving build times.

I've also changed it to a `cmake_dependent_option` instead of `FATAL_ERROR`ing for older CMake versions.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62827

Reviewed By: astaff

Differential Revision: D30342102

Pulled By: malfet

fbshipit-source-id: 3095b44b7085aee8a884ec95cba9f8998d4442e7
2021-08-17 10:14:40 -07:00
011fdc3b7e [fx] persist tracer_cls on fx.Graph when deep copying (#63353)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63353

Custom deepcopy method copies all nodes but does not copy the tracer_cls attribute

Reviewed By: houseroad

Differential Revision: D30349424

fbshipit-source-id: 3e98bdac8a8a992eb0b4ec67fe80bb2e5cf3884d
2021-08-17 09:57:48 -07:00
4d6f98ecad [PyTorch] Avoid using std::regex for device string parsing in Device.cpp (#63204)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63204

Currently, `std::regex` is used for parsing device strings. This is undesirable for a few reasons.

1. Increases binary size
2. Slows down model loading
3. Potentially uses more memory at runtime
4. Takes marginally longer time to build code that uses std::regex v/s not using std::regex

This change avoids the use of `std::regex` for parsing the device string since we don't need to.
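
For reference, the device strings the hand-written parser needs to accept are simple, e.g.:

```python
import torch

# Typical strings handled by c10::Device parsing:
torch.device("cpu")
torch.device("cuda")
torch.device("cuda:1")
```
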
ghstack-source-id: 136006963

Test Plan:
### AI Bench Runs

**Before this change:**
1. Model Load time: [252ms](https://www.internalfb.com/intern/aibench/details/332471502816548)
2. Model unload time: 3.5ms

**After this change:**
1. Model Load time: [240ms](https://www.internalfb.com/intern/aibench/details/652195589031318), which is an approx 5% reduction for the current model. I suspect percentage wise, it will be larger for smaller models since this is a fixed cost reduction.
2. Model unload time: 3.3ms (probably too small to be meaningfully impactful to an end user).

### BSB Results

```
D30281388-V1 (https://www.internalfb.com/intern/diff/D30281388/?dest_number=135713848)

messenger-pika-optimized-device: Succeeded
Change in Download Size for arm64 + 3x assets variation: -7.1 KiB
Change in Uncompressed Size for arm64 + 3x assets variation: -17.6 KiB

Mbex Comparison: https://our.intern.facebook.com/intern/mbex/bsb:551399955987465@base/bsb:551399955987465@diff/
```

Reviewed By: raziel

Differential Revision: D30281388

fbshipit-source-id: 4d998e9f313e6366d9d89a6a73cd090ddfb059fc
2021-08-17 09:23:48 -07:00
013a42bdb1 [PyTorch] Add Device_test.cpp (#63203)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63203

Currently, `c10::Device` isn't being tested - i.e. there's no test to ensure that the device string parsing works as expected. This diff adds very basic tests to assert that the stuff we expect to work works, and the stuff that we don't expect to work doesn't work.

ghstack-source-id: 136006962

Test Plan:
New test. Ran as:

```
cd fbsource/fbcode/
buck test //caffe2/c10:c10_test_0 -- -r '.*DeviceTest.*'
```

Reviewed By: dreiss, raziel

Differential Revision: D30286910

fbshipit-source-id: b5699068dcbba89d5d224dbaf74b175f3f785a00
2021-08-17 09:22:35 -07:00
336aa9cd85 change with_callable_args to return a fresh _PartialWrapper (#63374)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/63326

Currently `get_callable_args` has the side effect of mutating the input _PartialWrapper. When that input is one of the global defaults, there are all sorts of lifetime issues that crop up. (Details in the linked issue.) So far as I can tell, we only need to make a constructor which is module (and by extension device) aware, so making a fresh one should have the same effect without leaking the last call's module.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63374

Test Plan: the repro in https://github.com/pytorch/pytorch/issues/63326 now reports no leaked Tensors, and all quantization tests pass locally.

Reviewed By: HDCharles

Differential Revision: D30359360

Pulled By: robieta

fbshipit-source-id: aef33261ac49952d8d90da868a57ab063dfc456e
2021-08-17 09:11:38 -07:00
7bad9ac78a Fix flaky test for dp saved tensor hooks (#63324)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63324

Fix for https://www.internalfb.com/tasks/?t=98258963
`catch_warnings` seems to trigger only once in certain cases where it
should trigger twice.
This test is only meant to check whether hooks are triggered or not,
so changing it to `self.assertGreater` is OK.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D30340833

Pulled By: Varal7

fbshipit-source-id: 1bfb9437befe9e8ab8f95efe5f513337fa9bdc5c
2021-08-17 08:56:58 -07:00
2992d92b5a Add mode to TarArchiveReader (#63332)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63332

Add a corresponding PR from [torchdata](https://github.com/facebookexternal/torchdata/pull/101)

Test Plan: Imported from OSS

Reviewed By: astaff

Differential Revision: D30350151

Pulled By: ejguan

fbshipit-source-id: bced4a1ee1ce89d4e91e678327342e1c095dbb9e
2021-08-17 07:28:37 -07:00
cae5ddc427 add torch.meshgrid() OpInfo (#62720)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/62719

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62720

Reviewed By: astaff

Differential Revision: D30344574

Pulled By: dagitses

fbshipit-source-id: ed42d9fe20741df98018efb08e640fca370583fb
2021-08-17 04:04:24 -07:00
22f78144c7 Extends warning on norm docs (#63310)
Summary:
torch.norm has a couple of documentation issues, like https://github.com/pytorch/pytorch/issues/44552 and https://github.com/pytorch/pytorch/issues/38595, but since it's deprecated, this PR simply clarifies that the documentation (and implementation) of torch.norm may be incorrect. This should be additional encouragement for users to migrate to torch.linalg.vector_norm and torch.linalg.matrix_norm.
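
A sketch of the suggested migration (assuming the torch.linalg functions named above are available in your build):

```python
import torch

x = torch.randn(3, 4)

# Instead of the deprecated torch.norm(...):
vec_norm = torch.linalg.vector_norm(x.flatten(), ord=2)
mat_norm = torch.linalg.matrix_norm(x, ord="fro")
```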

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63310

Reviewed By: ngimel

Differential Revision: D30337997

Pulled By: mruberry

fbshipit-source-id: 0fdcc438f36e4ab29e21e0a64709e4f35a2467ba
2021-08-16 22:23:45 -07:00
ad94248b57 Cleanup dead code (#63328)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63328

This code supported the old `at::_fft_with_size` operator which no longer exists.

Test Plan: Imported from OSS

Reviewed By: astaff

Differential Revision: D30343557

Pulled By: mruberry

fbshipit-source-id: 7a71585e013acb46c98f14fd40e15bdfbf026bac
2021-08-16 22:13:08 -07:00
877b649bc3 Workaround for cuFFT bug (#63327)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63327

Fixes #63152

Test Plan: Imported from OSS

Reviewed By: astaff

Differential Revision: D30343558

Pulled By: mruberry

fbshipit-source-id: 68e17a07650f65f397e26efc417e97e2ab302f82
2021-08-16 22:11:52 -07:00
794b04c6c8 Add step to report code coverage from GHA (#63373)
Summary:
Similar to the logic provided in b2069e7d01/.circleci/verbatim-sources/job-specs/pytorch-job-specs.yml (L197-L201)

Fixes https://github.com/pytorch/pytorch/issues/63366

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63373

Reviewed By: walterddr

Differential Revision: D30357737

Pulled By: malfet

fbshipit-source-id: 20b115eb4d6412bd9895680308a9097742d2ae7b
2021-08-16 20:42:38 -07:00
548c717cbd [TensorExpr] Remove test_train from tensorexpr tests. (#63194)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63194

This test implements functionality that is used nowhere, and its author no
longer works on it. This PR also adds test_approx to CMakeLists, where
it had been missing.

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D30292777

Pulled By: ZolotukhinM

fbshipit-source-id: ab6d98e729320a16f1b02ea0c69734f5e7fb2554
2021-08-16 20:36:31 -07:00
e7724bb100 [JIT] Set future's error to current exception as is when --torch_jit_enable_rethrow_caught_exception=true (#63348)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63348

This change addresses singlaiiit's comment on D30241792 (61b49c8e41), and makes the JIT interpreter's behavior consistent between the cases where `future` is set and where it is not.

Test Plan: Enhanced `EnableRethrowCaughtExceptionTest.EnableRethrowCaughtExceptionTestRethrowsCaughtException` to cover the modified code path.

Reviewed By: singlaiiit

Differential Revision: D30347782

fbshipit-source-id: 79ce57283154ca4372e5341217d942398db21ac8
2021-08-16 17:32:13 -07:00
075024b9a3 [Static Runtime] Fix a bug that assigns multiple outputs to single storage (#63012)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63012

This change fixes a bug where the static runtime's memory optimizer assigned multiple outputs of a node to the same storage. Fixing this bug enables the static runtime to run `inline_cvr` with its memory optimizer enabled.

A problematic line from `inline_cvr` was as follows:
```
  %7767 : Tensor, %getitem_6419.1 : Tensor = fb::gather_ranges(%tensor74.1, %7764)
```
where enabling the memory optimizer assigned `%7767` and `%getitem_6419.1` to the same storage, which corrupted their data during the 2nd iteration.

This change fixes the aforementioned bug by marking all inputs & outputs of a node as `alive` during our liveness analysis. By doing that, no inputs / outputs will collide with each other. I believe this is a fair assumption that most ops' implementations already rely on, but one that was missing from our analysis before this change.

Test Plan: - Added a unittest `StaticRuntime.ValuesShareSameStorageDoesNotContainOutputsFromSameNode` to cover the new code.

Reviewed By: hlu1

Differential Revision: D30202018

fbshipit-source-id: 10287a1bee9e86be16a5201e9a7cd7c7f046bab9
2021-08-16 16:52:02 -07:00
068d6fec5c [Model Averaging] Add a few member methods of PostLocalSGDOptimizer (#63340)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63340

Some member methods are needed, such as those for accessing optimizer states. These are necessary for integration with PyTorch Lightning.

Proposal: https://github.com/pytorch/pytorch/issues/59699
ghstack-source-id: 135912246

Test Plan: buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_ddp_hook_parity_post_localSGD

Reviewed By: rohan-varma

Differential Revision: D30328794

fbshipit-source-id: e585b874313bd266fdc7c79936e2af98700c7bad
2021-08-16 16:39:01 -07:00
aa63c0d9df [PyPer] Skip printing out per node time when do_profile is on (#63256)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63256

This suppresses printing the per-node time, which is very long when the net has many ops. It can easily be turned back on by setting `--pt_sr_print_per_node_time=1`.

Reviewed By: ajyu, mikeiovine

Differential Revision: D30298331

fbshipit-source-id: 32b3f93b3fe19d335654168311fda93331a1e706
2021-08-16 16:32:19 -07:00
b2069e7d01 Refactor NnapiCompilation registration into its own file (#63183)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63183

Move registration of NnapiCompilation into its own file, so that `nnapi_bind.cpp` (which contains the implementation of NnapiCompilation) can be moved to `aten_cpu`, while maintaining the selectiveness of registration.

`nnapi_bind.cpp` is moved to `aten_cpu` in https://github.com/pytorch/pytorch/pull/62919. See the PR for more details on why it's needed.

ghstack-source-id: 135900318

Test Plan: Nnapi unit tests: `python test/test_nnapi.py`

Reviewed By: iseeyuan

Differential Revision: D30288708

fbshipit-source-id: 6ed5967fa6bd018075469d18e68f844d413cf265
2021-08-16 15:45:26 -07:00
da36bbcd35 Add section to CONTRIBUTING.md explaining developer docs (#63228)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63228

It is a quick summary and links to a page on the Developer Wiki that has
more detail.

Test Plan: Imported from OSS

Reviewed By: SplitInfinity

Differential Revision: D30347109

Pulled By: zou3519

fbshipit-source-id: a6242986d275e5279ca3f61ade2294a132d268c4
2021-08-16 15:44:10 -07:00
4982fc4ecf test: Add ability to set CONTINUE_THROUGH_ERROR (#63357)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63357

Adds the ability to set CONTINUE_THROUGH_ERROR as an environment
variable so that we can easily set it without having to add the flag
directly

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: astaff

Differential Revision: D30351108

Pulled By: seemethere

fbshipit-source-id: 767fa9bd24e1399f359eb24d16f6cc985a2d7173
2021-08-16 15:35:40 -07:00
6acd87fe6a Add driver function to run test_sharded_tensor.py and test_sharding_spec.py (#63189)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63189

Add a main --> run_tests function to each test file, which is needed to launch the real test cases in the OSS flow.
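
A minimal sketch of the pattern being added (hypothetical file contents):

```python
from torch.testing._internal.common_utils import run_tests

# ... TestShardingSpec / TestShardedTensor test classes defined above ...

if __name__ == "__main__":
    run_tests()  # launches the test cases when the file is run directly
```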

Test Plan:
Before:
$ python test/distributed/_sharding_spec/test_sharding_spec.py --v   ==> nothing happened
$ python test/distributed/_sharded_tensor/test_sharded_tensor.py --v ==> nothing happened

After:

$ python test/distributed/_sharding_spec/test_sharding_spec.py --v   ==>

test_chunked_sharding_spec (__main__.TestShardingSpec) ... ok
test_device_placement (__main__.TestShardingSpec) ... ok
test_enumerable_sharding_spec (__main__.TestShardingSpec) ... ok

$ python test/distributed/_sharded_tensor/test_sharded_tensor.py --v

test_complete_world_size (__main__.TestShardedTensorChunked) ... ok
test_insufficient_sharding_dims (__main__.TestShardedTensorChunked) ... ok
test_invalid_pg_rpc_ranks (__main__.TestShardedTensorChunked) ... [W tensorpipe_agent.cpp:699] RPC agent for worker2 encountered error when reading incoming request from worker0: eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259)
ok
test_invalid_sharding (__main__.TestShardedTensorChunked) ... ok
test_load_state_dict_errors (__main__.TestShardedTensorChunked) ... ok
test_multiple_local_shards (__main__.TestShardedTensorChunked) ... ok
test_new_group (__main__.TestShardedTensorChunked) ... ok
test_partial_world_size (__main__.TestShardedTensorChunked) ... ok
test_sharded_tensor_metadata (__main__.TestShardedTensorChunked) ... ok
test_sharded_tensor_sizes (__main__.TestShardedTensorChunked) ... ok
test_sharding_columns (__main__.TestShardedTensorChunked) ... ok
test_state_dict (__main__.TestShardedTensorChunked) ... ok
test_state_dict_new_group (__main__.TestShardedTensorChunked) ... ok
test_state_dict_no_sharded_tensors (__main__.TestShardedTensorChunked) ... ok
test_grid_sharding (__main__.TestShardedTensorEnumerable) ... ok
test_multiple_local_shards (__main__.TestShardedTensorEnumerable) ... ok
test_new_group (__main__.TestShardedTensorEnumerable) ... ok
test_partial_world_size (__main__.TestShardedTensorEnumerable) ... ok
test_sharded_tensor_metadata (__main__.TestShardedTensorEnumerable) ... ok
test_uneven_shards (__main__.TestShardedTensorEnumerable) ... ok
test_with_rpc_names (__main__.TestShardedTensorEnumerable) ... ok
test_init_from_local_shards (__main__.TestShardedTensorFromLocalShards) ... ok
test_init_from_local_shards_invalid_shards (__main__.TestShardedTensorFromLocalShards) ... ok
test_init_from_local_shards_invalid_shards_gaps (__main__.TestShardedTensorFromLocalShards) ...

Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D30294094

fbshipit-source-id: 08f0431a12ea854abe00dc920205b10ba43ae6b6
2021-08-16 15:25:32 -07:00
f4f2c1231a [fx2trt] add unsqueeze converter (#63355)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63355

Added converter for acc_ops.unsqueeze. Needed for ig model.

Didn't add support for inputs that have more than one dynamic dim; this is not needed right now, and I expect it would be a rare case.

Test Plan: unit test

Reviewed By: yinghai

Differential Revision: D30138293

fbshipit-source-id: 899fe8eb68387de83195a2f6e199618d96f09a9e
2021-08-16 15:18:43 -07:00
078b8004a6 [Static Runtime] Implement prim::TupleUnpack (#63243)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63243

Add `prim::TupleUnpack` native op to static runtime.

Test Plan: Unit test: `buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest`

Reviewed By: hlu1

Differential Revision: D30306955

fbshipit-source-id: 21923d6cbd5545c144ac051b3d48b37ec6e610cf
2021-08-16 14:56:30 -07:00
a12b371f7c [fx2trt] Factor out add_matrix_multiply_layer
Summary: Factor out the function so that it can be reused in future diffs

Test Plan: buck run mode/opt caffe2/torch/fb/fx2trt:test_matmul

Reviewed By: 842974287

Differential Revision: D30322823

fbshipit-source-id: 069b945de2c744cdbcca1618b62827692dfb4174
2021-08-16 14:13:37 -07:00
dc5ce22a1a A re-open PR: Avoid re-creating the random number generator in RandomSampler (#63026)
Summary:
More details can be found in the old pr: https://github.com/pytorch/pytorch/pull/53085

ejguan  Thanks for your guidance. I tried to reopen this PR following your instructions.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63026

Reviewed By: anjali411

Differential Revision: D30224920

Pulled By: ejguan

fbshipit-source-id: 2fa83bd4a2661485e553447fe3e57ce723f2716d
2021-08-16 14:08:37 -07:00
3f06f29577 Improve pip package determination (#63321)
Summary:
Invoking `pip` or `pip3` yields the list of packages for whichever `pip` alias is on the path, rather than for the Python interpreter currently being executed. Changed `get_pip_packages` to use `sys.executable + '-mpip'` instead.
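
A sketch of the approach (not the exact collect_env code):

```python
import subprocess
import sys

# Query the pip that belongs to the running interpreter, rather than
# the first `pip` alias found on PATH:
out = subprocess.check_output([sys.executable, "-mpip", "list", "--format=freeze"])
print(out.decode())
```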

Also, add mypy to the list of packages of interest

Discovered while looking at https://github.com/pytorch/pytorch/issues/63279

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63321

Reviewed By: walterddr

Differential Revision: D30342099

Pulled By: malfet

fbshipit-source-id: fc8d17cf2ddcf18236cfde5c1b9edb4e72804ee0
2021-08-16 13:54:39 -07:00
4a59f0b9d9 [Profiler] Change FLOP/s to Total FLOPs (#62779)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62779

Change from floating point operations per second to total floating point operations. This requires removing the division by execution time from the Kineto-computed FLOPs and updating the necessary documentation.

Test Plan:
Running the following script:

```
import torch
from torch.profiler import profile
import torchvision.models as models

model = models.resnet18().eval()
inputs = torch.randn(5, 3, 224, 224)
with torch.no_grad():
    with profile(record_shapes=True, with_flops=True) as prof:
        model(inputs)
print(prof.key_averages().table(sort_by="cpu_time_total"))
```

Before diff results in:

{F636640118}

And after diff should be about `(27.78 * 10^9) FLOP/s * .652838 seconds =18135839640 FLOP = 18.136 GFLOP`.  Running the script again yields this answer:

{F636655686}

------------------------------------

Reviewed By: gdankel

Differential Revision: D29972997

fbshipit-source-id: 0f8d9f264b7d9f8f6bb3f10ab7c2c9794291e28b
2021-08-16 13:43:32 -07:00
d2e8359971 Fix triage workflow when the card already exists in project (#63347)
Summary:
Fixes issues like https://github.com/pytorch/pytorch/runs/3336787242

```
RequestError [HttpError]: Validation Failed: {"resource":"ProjectCard","code":"unprocessable","field":"data","message":"Project already has the associated issue"}
Error: Unhandled error: HttpError: Validation Failed: {"resource":"ProjectCard","code":"unprocessable","field":"data","message":"Project already has the associated issue"}
    at /home/runner/work/_actions/actions/github-script/v2/dist/index.js:7531:23
    at processTicksAndRejections (internal/process/task_queues.js:93:5)
    at async eval (eval at callAsyncFunction (/home/runner/work/_actions/actions/github-script/v2/dist/index.js:7985:56), <anonymous>:63:1)
    at async main (/home/runner/work/_actions/actions/github-script/v2/dist/index.js:8011:20) {
  name: 'HttpError',
  status: 422,

...
```

The card may already exist, so a `422` status code is expected and is simply ignored; any other error is re-thrown.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63347

Reviewed By: malfet

Differential Revision: D30348529

Pulled By: zhouzhuojie

fbshipit-source-id: 36647837bfccad43ce01eb5dfe6642e685615037
2021-08-16 13:33:58 -07:00
3ce67efea2 [opinfo] nn.functional.pad (#62814)
Summary:
Reference: https://github.com/facebookresearch/functorch/issues/78

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62814

Reviewed By: VitalyFedyunin

Differential Revision: D30307492

Pulled By: zou3519

fbshipit-source-id: 4f6062eb4a3c91ed1795df1f82846afa0abafcdc
2021-08-16 13:29:34 -07:00
1e8de64c66 Add expecttest to requirements.txt (#63320)
Summary:
This PR closes the developer environment gap left by https://github.com/pytorch/pytorch/issues/60658 by adding [expecttest](https://github.com/ezyang/expecttest) to `requirements.txt`. Thus it provides a solution to one of the short-term problems that https://github.com/pytorch/pytorch/issues/60697 tries to solve, but does not provide a long-term solution to https://github.com/pytorch/pytorch/issues/61375.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63320

Reviewed By: malfet

Differential Revision: D30340654

Pulled By: samestep

fbshipit-source-id: 26c8f8c9889cce4a94fafb1bf2f0d6df4c70503f
2021-08-16 13:22:43 -07:00
e75ed4a4b5 add comma to prevent syntax errors (#62492)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62492

Reviewed By: VitalyFedyunin

Differential Revision: D30304684

Pulled By: ezyang

fbshipit-source-id: db08ca39bcecbfd79ea50df18536bf4e87f51e15
2021-08-16 12:27:31 -07:00
0074a099a8 Retry apt-get during setup_ci_workspace (#63319)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63319

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D30346067

Pulled By: bertmaher

fbshipit-source-id: 2aafa97e78f9297553d772b2524d6f1c0ebaa46e
2021-08-16 12:20:51 -07:00
dbcfd7739f Make torch.lu differentiable for wide/tall inputs + jit (#61564)
Summary:
As per title.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61564

Reviewed By: astaff

Differential Revision: D30338136

Pulled By: mruberry

fbshipit-source-id: f01436fc90980544cdfa270feee16bb3dda21b93
2021-08-16 11:40:57 -07:00
979180cd01 [Model Averaging] Allow subgroup to be None in PostLocalSGDState (#63277)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63277

`PostLocalSGDState` requires a subgroup. To initialize this subgroup, a global process group must be initialized. However, this imposes the restriction that a hook state can only be provided after distributed environment initialization, which is incompatible with the Lightning DDP plugin setup, where the hook state should be provided before distributed environment initialization.
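
A sketch of what this enables (the constructor arguments here follow the summary and are assumptions, not the exact API):

```python
from torch.distributed.algorithms.ddp_comm_hooks.post_localSGD_hook import (
    PostLocalSGDState,
)

# Can now be constructed before the distributed environment is initialized,
# since subgroup=None is accepted:
state = PostLocalSGDState(
    process_group=None, subgroup=None, start_localSGD_iter=100
)
```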

Proposal: https://github.com/pytorch/pytorch/issues/59699
ghstack-source-id: 135848575

Test Plan: buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_ddp_hook_parity_post_localSGD

Reviewed By: cbalioglu

Differential Revision: D30325041

fbshipit-source-id: 7b870166d096d306c3f2f7c69816a705cec0bebd
2021-08-16 10:07:41 -07:00
d5d5f42ea9 Revert "[docs] Update docs for NegativeBinomial (#45693)" (#63192)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63192

**Summary**
This reverts commit 402caaeba513929dcfe12df183c764b0ef43f688. As per the
dicussion in #62178, this commit was not needed.

**Test Plan**
Continuous integration.

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D30293202

Pulled By: SplitInfinity

fbshipit-source-id: 91ee7ad0523a9880605d83fe9712c39df67384a8
2021-08-16 09:14:44 -07:00
d1cbee7b2b Refactor BucketBatch (#63185)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63185

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D30288893

Pulled By: ejguan

fbshipit-source-id: b88b792d12a83c99d8ea9e516e3b4c54a82100f6
2021-08-16 06:42:56 -07:00
56d609d93e Replace str by repr for DataChunk (#63184)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63184

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D30288892

Pulled By: ejguan

fbshipit-source-id: 45c88fdd3987e234f2c22ebbbfd8d5044983c34c
2021-08-16 06:41:38 -07:00
e50e8b07d8 [nnc] Updated IRMutator and IRSimplifier to perform in-place mutations. (#63246)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63246

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D30309636

Pulled By: navahgar

fbshipit-source-id: 409ea8d6982888cfee9127e6248044dd2ed9d8d4
2021-08-16 00:09:22 -07:00
a421cba325 [docs][ao] Add overload information for fake_quantize_per_tensor_affine (#63258)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63258

This function supports scalar and tensor qparams

Test Plan:
CI

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D30316432

fbshipit-source-id: 8b2f5582e7e095fdda22c17d178abcbc89a2d1fc
2021-08-15 22:47:05 -07:00
0831b59cf5 [docs][ao] Add missing docstrings for quantized_max_pool1d and quantized_max_pool2d (#63242)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63242

These functions are part of the native functions namespace as well as the quantized namespace

Test Plan:
CI

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D30316430

fbshipit-source-id: cd9c839e5c1a961e3c6944e514c16fbc256a2f0c
2021-08-15 22:47:03 -07:00
a090073fe4 [docs][ao] Add missing documentation for torch.quantized_batch_norm (#63240)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63240

Op is exposed via torch.quantized_batch_norm to the end user without any existing documentation

Test Plan:
CI

Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D30316431

fbshipit-source-id: bf2dc8b7b6f497cf73528eaa2bedef9f65029d84
2021-08-15 22:45:56 -07:00
50fc8e8250 [OpInfo] Add expected_failure kwarg to SkipInfo (#62963)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62963

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D30327199

Pulled By: heitorschueroff

fbshipit-source-id: 45231eca11d1697a4449d79849fb17264d128a6b
2021-08-15 18:09:20 -07:00
8987726cc6 Small refactor for OpInfo decorators (#62713)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62713

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D30327200

Pulled By: heitorschueroff

fbshipit-source-id: 1899293990c8c0a66da88646714b38f1aae9179d
2021-08-15 18:08:12 -07:00
3ca3349555 [Pytorch Edge] Fix broken test post changes in error reporting format. (#63287)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63287

Recent changes in https://github.com/pytorch/pytorch/pull/62419 changed
the way module hierarchy is reported. Now it includes information about
function names as well.

Test Plan:
python test/mobile/test_lite_script_module.py
TestLiteScriptModule.test_save_mobile_module_with_debug_info_with_trace

Imported from OSS

Reviewed By: iseeyuan

Differential Revision: D30328512

fbshipit-source-id: ddd6b11b9ab01cc725f4568a35eff7a92f17204b
2021-08-15 16:14:11 -07:00
cec08e7032 To add warm-up scheduler to optim (#60836)
Summary:
Warm-up of learning rate scheduling was initially discussed by Priya et al. in the paper: https://arxiv.org/pdf/1706.02677.pdf .

In section 2.2 of the paper, they discuss and propose the idea of warming up learning rate schedules in order to prevent large variance / noise in the learning rate. The idea has been further discussed in the following papers:
  * Akilesh Gotmare et al. https://arxiv.org/abs/1810.13243
  * Bernstein et al http://proceedings.mlr.press/v80/bernstein18a/bernstein18a.pdf
  * Liyuan Liu et al: https://arxiv.org/pdf/1908.03265.pdf

There are two popularly used types of learning rate warm-up:
  * Constant warmup (start with a very small constant learning rate)
  * Linear warmup (start with a small learning rate and gradually increase it)

In this PR we add warm-up as a learning rate scheduler. Note that learning rate schedulers are chainable, which means that we can combine the warm-up scheduler with any other learning rate scheduler to build a more sophisticated schedule.

## Linear Warmup

Linear warmup multiplies the learning rate by a pre-defined constant, warmup_factor, in the first epoch (epoch 0), and then increases this multiplier linearly until it reaches one after warmup_iters epochs. Hence the multiplier at the i-th step is:

                    warmup_factor + (1 - warmup_factor) * i / warmup_iters

Moreover, the ratio of this quantity at step i to step i-1 gives

           1 + (1 - warmup_factor) / [warmup_iters * warmup_factor + (i-1) * (1 - warmup_factor)]

which is used in the get_lr() method of our implementation. Below we provide an example of how to use the linear warmup scheduler and show how it works.

```python
import torch
from torch.nn import Parameter
from torch.optim import SGD
from torch.optim.lr_scheduler import WarmUpLR

model = [Parameter(torch.randn(2, 2, requires_grad=True))]
optimizer = SGD(model, 0.1)
scheduler = WarmUpLR(optimizer, warmup_factor=0.1, warmup_iters=10, warmup_method="linear")

for epoch in range(15):

    print(epoch, scheduler.get_last_lr()[0])

    optimizer.step()
    scheduler.step()
```

```
0 0.010000000000000002
1 0.019000000000000003
2 0.028000000000000008
3 0.03700000000000001
4 0.04600000000000001
5 0.055000000000000014
6 0.06400000000000002
7 0.07300000000000002
8 0.08200000000000003
9 0.09100000000000004
10 0.10000000000000005
11 0.10000000000000005
12 0.10000000000000005
13 0.10000000000000005
14 0.10000000000000005
```

## Constant Warmup

Constant warmup has a straightforward idea: multiply the learning rate by warmup_factor until we reach epoch warmup_iters, then do nothing for the following epochs.

```python
import torch
from torch.nn import Parameter
from torch.optim import SGD
from torch.optim.lr_scheduler import WarmUpLR

model = [Parameter(torch.randn(2, 2, requires_grad=True))]
optimizer = SGD(model, 0.1)
scheduler = WarmUpLR(optimizer, warmup_factor=0.1, warmup_iters=5, warmup_method="constant")

for epoch in range(10):

    print(epoch, scheduler.get_last_lr()[0])

    optimizer.step()
    scheduler.step()
```

```
0 0.010000000000000002
1 0.010000000000000002
2 0.010000000000000002
3 0.010000000000000002
4 0.010000000000000002
5 0.10000000000000002
6 0.10000000000000002
7 0.10000000000000002
8 0.10000000000000002
9 0.10000000000000002
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60836

Reviewed By: saketh-are

Differential Revision: D29537615

Pulled By: iramazanli

fbshipit-source-id: d910946027acc52663b301f9c56ade686e62cb69
2021-08-15 12:31:45 -07:00
8e0998ca70 Move fx2trt and oss_acc_tracer to oss (#63101)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63101

Move internal fx2trt to torch/fx/experimental/fx2trt and merge the two TRT interpreters we have right now. cc: mortzur, as this might affect the uru exporting script.

Move oss_acc_tracer to torch/fx/experimental/fx_acc.

Test Plan: CI

Reviewed By: jerryzh168

Differential Revision: D30257909

fbshipit-source-id: 4e374965fbf88d72e91844d9e9b6ff9b98f467d1
2021-08-15 11:53:36 -07:00
0ce4d30c44 Hide all symbols in llvm namespace (#63272)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63272

Test Plan: Imported from OSS

Reviewed By: nikithamalgifb

Differential Revision: D30331695

Pulled By: bertmaher

fbshipit-source-id: d35130c96f7e2a31fa86d9d80de59002e96301df
2021-08-15 11:29:43 -07:00
045c4cb82f Add copy button to code snippets in docs (#63149)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63149

Test Plan: Imported from OSS

Reviewed By: navahgar, albanD

Differential Revision: D30308891

Pulled By: anjali411

fbshipit-source-id: ad51180ab2f27c4525682b2603bbf753bb8f1ce9
2021-08-15 06:25:32 -07:00
38c185189c [Pytorch Edge] Enable kineto profiler on mobile via EdgeKinetoProfiler (#62419)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62419

This diff adds support for the CPU-only Kineto profiler on mobile, thus
enabling Chrome trace generation on mobile. This brings the C++ API for
mobile profiling on par with TorchScript.
This is done via:
1. Utilizing debug handle annotations in KinetoEvent.
2. Adding post-processing capability, via callbacks, to
KinetoThreadLocalState.
3. Creating a new RAII-style profiler, KinetoEdgeCPUProfiler, which can be
used in the surrounding scope of model execution. This will write the Chrome
trace to the location specified in the profiler constructor.

Test Plan:
MobileProfiler.ModuleHierarchy

Imported from OSS

Reviewed By: raziel

Differential Revision: D29993660

fbshipit-source-id: 0b44f52f9e9c5f5aff81ebbd9273c254c3c03299
2021-08-13 21:40:19 -07:00
77a6436cac [Pytorch Mobile] Combining instructions and debug handles in a single struct (#62418)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62418

Debug handles have a one-to-one correspondence with instructions, so just
combine them into one struct.

Test Plan:
CI

Imported from OSS

Reviewed By: raziel

Differential Revision: D29993661

fbshipit-source-id: 125c7163174cf66624dd95f110fdc8208fea8a07
2021-08-13 21:40:17 -07:00
1b04d99f55 [Pytorch Profiler] Introduce scopes to enableProfiler (#62417)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62417

This diff adds an option to make enableProfiler enable callbacks only
for certain RecordScopes.
Why?
Profiling has some overhead when we repeatedly execute callbacks for
all scopes. On the mobile side, where we often have small quantized models,
this overhead can be large. We observed that by profiling only the top-level
op and skipping profiling of the other ATen ops called within it, we can limit
this overhead. For example, we profile at::conv2d but skip the nested
at::convolution -> at::convolution_ calls, as well as any further ops like
transpose that they invoke. Of course this limits visibility, but at
least this way we get a choice.

Test Plan: Imported from OSS

Reviewed By: ilia-cher

Differential Revision: D29993659

fbshipit-source-id: 852d3ae7822f0d94dc6e507bd4019b60d488ef69
2021-08-13 21:40:15 -07:00
b00afe135d [Pytorch Profiler] Add debug_handles to KinetoEvent (#62228)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62228

This diff adds debug handles to events and provides a way to use
RECORD_FUNCTIONs that pass debug_handles down to the profiler, which
records them in the events.

Why add debug_handles?
For PyTorch Mobile, with the lite interpreter, we generate debug handles
that can be used to lazily symbolicate exception traces into model-level
stack traces, similar to the model-level stack traces you get with
TorchScript models. The debug_handles also enable getting the module
hierarchy for a lite interpreter model, support for which was added to
the Kineto profiler in previous diffs.

Follow-up plan:
1. Enable scope callbacks so that the lite interpreter can use them to
profile only top-level ops.
2. Enable post-processing callbacks that take KinetoEvents and populate
the module hierarchy using debug handles.

This will let us use the Kineto profiler for lite interpreter use cases on
mobile. The aim is to use an RAII guard to similarly generate Chrome traces
for mobile use cases as well, although only for top-level ops.

Test Plan:
test_misc : RecordDebugHandles.Basic

Imported from OSS

Reviewed By: ilia-cher

Differential Revision: D29935899

fbshipit-source-id: 4f06dc411b6b5fe0ffaebdd26d3274c96f8f389b
2021-08-13 21:40:14 -07:00
44b12ba862 [Pytorch Profiler] Move start timestamp to end of start callback (#62191)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62191

This moves the start timestamping to the end of the callback. This way we don't
account for callstack/module-hierarchy-related overhead in op runtime.

Test Plan:
CI

Imported from OSS

Reviewed By: ilia-cher

Differential Revision: D29910519

fbshipit-source-id: f462031a81ae12b3db7993cf482e5ad93a35e096
2021-08-13 21:40:12 -07:00
54f2eb6e7e [Pytorch Profiler] Add support for adding module hierarchy to KinetoEvent (#61792)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61792

This PR adds module hierarchy information to events.
What is the module hierarchy information attached to events?
While profiling a TorchScript module, when events are added, we ask the JIT
for the module hierarchy associated with the node being
executed. At the time that node executes, there might be multiple
frames in the interpreter's stack. For each frame, we find the
corresponding node and query its module hierarchy.
The module hierarchy corresponding to a node is associated with the node's
InlinedCallStack, which tracks the path via which the
node was inlined. Thus, during the inlining process we annotate the
module information corresponding to the CallMethod nodes being inlined.

With this PR, chrome trace will contain additional metadata:
"Module Hierarchy". This can look like this:
TOP(ResNet)::forward.SELF(ResNet)::_forward_impl.layer1(Sequential)::forward.0(BasicBlock)::forward.conv1(Conv2d)::forward.SELF(Conv2d)::_conv_forward
It contains the module instance, type name, and method name at each level
of the call stack.

Test Plan:
test_profiler

Imported from OSS

Reviewed By: raziel, ilia-cher

Differential Revision: D29745442

fbshipit-source-id: dc8dfaf7c5b8ab256ff0b2ef1e5ec265ca366528
2021-08-13 21:39:10 -07:00
385b082854 add substract of max and testcase (#63132)
Summary:
As discussed in https://github.com/pytorch/pytorch/pull/62897, the BF16/non-last-dim Softmax path was missing the subtraction of the max value, which causes an overflow in the `exp()` calculation when the input tensor's values are large, such as `1000.0`.
To avoid this issue, we add the max-value subtraction and the corresponding test cases in this PR.
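
A minimal sketch of the max-subtraction trick in eager-mode Python (the PR itself changes the C++ CPU kernel, so this is illustrative only):
```python
import torch

def stable_softmax(x, dim):
    # softmax is shift-invariant: subtracting the per-slice max keeps exp()
    # from overflowing without changing the result
    x_max = x.max(dim=dim, keepdim=True).values
    e = (x - x_max).exp()
    return e / e.sum(dim=dim, keepdim=True)

x = torch.full((2, 4), 1000.0)
print(stable_softmax(x, dim=0))  # finite, uniform probabilities
print(torch.softmax(x, dim=0))   # matches
```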

Note: without the max-value subtraction (e.g. due to accidental reverts or changes), we get the following error message from the test case:
```
AssertionError: False is not true : Tensors failed to compare as equal!With rtol=0.05 and atol=0.05, found 103984 element(s) (out of 126720) whose difference(s) exceeded the margin of error (including 103984 nan comparisons). The greatest difference was nan (0.0 vs. nan), which occurred at index (0, 0, 0, 1).
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63132

Reviewed By: VitalyFedyunin

Differential Revision: D30280792

Pulled By: cpuhrsch

fbshipit-source-id: 722821debf983bbb4fec878975fa8a4da0d1d866
2021-08-13 20:50:49 -07:00
baedb559e3 OpInfo: nn.functional.conv_transpose2d (#62882)
Summary:
See https://github.com/facebookresearch/functorch/issues/78 and https://github.com/pytorch/pytorch/issues/54261.

cc: mruberry zou3519 Chillee

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62882

Reviewed By: bdhirsh

Differential Revision: D30280804

Pulled By: zou3519

fbshipit-source-id: e40cdf43e98c1f11e45df6b8bc13110b4d29c45f
2021-08-13 17:11:23 -07:00
f8e217a17e refactor fx2trt example script so it can be imported as a library (#63262)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63262

Just create a `__main__` guard.

Test Plan: run linter, sandcastle tests

Reviewed By: 842974287

Differential Revision: D30263617

fbshipit-source-id: 8044ce5d815b043c3778591384cb13d9a89d0048
2021-08-13 16:59:29 -07:00
3f43a8b9a3 [iOS] Add LibTorch-Lite-Nightly pod (#63239)
Summary:
D30090760 (e182b459d9) was reverted by D30303292 because of a lint issue in `LibTorch-Lite-Nightly.podspec.template`. Resubmit the diff after fixing the issue.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63239

Test Plan: Imported from OSS

Reviewed By: xta0

Differential Revision: D30315690

Pulled By: hanton

fbshipit-source-id: f0fa719ffc3b8181ab28c123584ae5c1da8992c0
2021-08-13 16:21:41 -07:00
809e1e7457 Allow TransformerEncoder and TransformerDecoder to accept 0-dim batch sized tensors. (#62800)
Summary:
This PR fixes a part of https://github.com/pytorch/pytorch/issues/12013, which is summarized concretely in https://github.com/pytorch/pytorch/issues/38115.

It allows TransformerEncoder and TransformerDecoder (along with the inner `Layer` classes) to accept inputs with 0-dimensional batch sizes.
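
A quick sketch of the newly supported case (module sizes are arbitrary, for illustration):
```python
import torch
import torch.nn as nn

layer = nn.TransformerEncoderLayer(d_model=8, nhead=2)
encoder = nn.TransformerEncoder(layer, num_layers=2)

src = torch.randn(5, 0, 8)  # (seq_len, batch=0, d_model) -- previously errored
out = encoder(src)
print(out.shape)  # torch.Size([5, 0, 8])
```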

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62800

Reviewed By: VitalyFedyunin

Differential Revision: D30303240

Pulled By: jbschlosser

fbshipit-source-id: 8f8082a6f2a9f9d7ce0b22a942d286d5db62bd12
2021-08-13 16:11:57 -07:00
ab7a472980 [ROCm] Update HIP_VERSION to TORCH_HIP_VERSION (#62786)
Summary:
- HIP_VERSION semantic versioning will change in ROCm 4.3. These changes essentially remove the dependency on the HIP_VERSION provided in the HIP header, to keep the code compatible with both older and newer versions of ROCm.
- TORCH_HIP_VERSION is derived from HIP_VERSION_MAJOR and HIP_VERSION_MINOR

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62786

Reviewed By: bdhirsh

Differential Revision: D30281682

Pulled By: seemethere

fbshipit-source-id: e41e69fb9e13de5ddd1af99ba5bbdcbb7b64b673
2021-08-13 15:00:43 -07:00
e711b5ce6c Respect user-set CMAKE_PREFIX_PATH (#61904)
Summary:
Fixes the case where the `CMAKE_PREFIX_PATH` variable gets silently overwritten by a user specified environment variable.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61904

Reviewed By: walterddr, malfet

Differential Revision: D29792014

Pulled By: cbalioglu

fbshipit-source-id: babacc8d5a1490bff1e14247850cc00c6ba9e6be
2021-08-13 13:49:05 -07:00
90a96e0642 Remove left-over print in test_diff_graph_inline_threshold (#63231)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63231

Reviewed By: VitalyFedyunin

Differential Revision: D30305851

Pulled By: gmagogsfm

fbshipit-source-id: 43da3b5f49ad4a6a2d6d174acf792f3ccf41a463
2021-08-13 13:11:27 -07:00
cc6b023cba Add CostInferenceFunction for SplitOp (#63133)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63133

SplitOp is costly but was missing a cost inference function, which hurts cost-based balancing. The changes are:
(1) Addition of CostInferenceFunction for SplitOp
(2) Small fix in CostInferenceFunction for ConcatOp

Test Plan:
Added unit tests:

buck test //caffe2/caffe2/python/operator_test:split_op_cost_test

buck test //caffe2/caffe2/python/operator_test:concat_op_cost_test

Reviewed By: smacke

Differential Revision: D30247360

fbshipit-source-id: 989e962f3a981acc85b73aac3fb23e603b7d1591
2021-08-13 12:28:15 -07:00
acdad8bc63 [docs] Merge note block in torch.lu documentation (#63156)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63156

**Summary**
This commit merges the four successive `Note` blocks that appear in the
documentation for `torch.lu`. Each one only has one line in it, so all
of them have been merged into one block with a bulleted list that
contains the original items.

**Test Plan**
Continuous integration.

*Before*
<img width="888" alt="Captura de Pantalla 2021-08-12 a la(s) 10 48 39 a  m" src="https://user-images.githubusercontent.com/4392003/129244443-b7d1594e-8833-4c20-a911-e1bf7ca88a8d.png">

*After*
<img width="932" alt="Captura de Pantalla 2021-08-12 a la(s) 10 48 46 a  m" src="https://user-images.githubusercontent.com/4392003/129244462-1f39dcdb-90e0-4fd9-a95f-343b0b6be1f1.png">

**Fixes**
This commit fixes #62339.

Test Plan: Imported from OSS

Reviewed By: navahgar, pbelevich

Differential Revision: D30292633

Pulled By: SplitInfinity

fbshipit-source-id: cb9071165629bfe7316b1d2fe952e4354c75d48f
2021-08-13 12:11:35 -07:00
e5c32cdde7 [docs] Remove input parameter from Tensor.flatten docs (#63180)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63180

**Summary**
This commit removes the `input` parameter from the signature for
`Tensor.flatten` shown in its documentation. This parameter is accepted
by `torch.flatten` but not `Tensor.flatten` (since the input is the
`Tensor` on which `flatten` is invoked).

**Test Plan**
Continuous integration.

**Fixes**
This commit fixes #57478.

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D30293156

Pulled By: SplitInfinity

fbshipit-source-id: 4ad70d638af009fb6bdeb703433b306904d39a76
2021-08-13 12:10:16 -07:00
548fe682e2 [docs] Add cross references to torch.transpose and torch.t (#63177)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63177

**Summary**
This commit adds a link in the documentation for `torch.transpose` that
directs to `torch.t` and vice versa. These two functions are related and
it is useful for users of one to know about the other.

**Test Plan**
Continuous integration.

**Fixes**
This commit fixes #56267.

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D30292654

Pulled By: SplitInfinity

fbshipit-source-id: 8e60cd7a598ff8b4756cb30141399dfe8e118338
2021-08-13 11:51:55 -07:00
7107c367b5 [docs] Mention vsplit, hsplit and tensor_split in Tensor views doc (#63191)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63191

**Summary**
This commit adds `vsplit`, `hsplit` and `tensor_split` to the list of
view ops on the Tensor Views documentation page.

**Test Plan**
Continuous integration.

*Before*
<img width="195" alt="Captura de Pantalla 2021-08-12 a la(s) 2 55 07 p  m" src="https://user-images.githubusercontent.com/4392003/129275921-c1cfdf6c-9f1f-45f3-98b6-1de7a0f0cc84.png">

*After*
<img width="197" alt="Captura de Pantalla 2021-08-12 a la(s) 2 55 15 p  m" src="https://user-images.githubusercontent.com/4392003/129275936-de4afde7-0143-4e1d-b38f-c86256f4896c.png">

**Fixes**
This commit fixes #62727.

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D30293181

Pulled By: SplitInfinity

fbshipit-source-id: 283783a4ccc3ebc50cb0a427e55c7a6cb618ffd7
2021-08-13 11:44:38 -07:00
38a825c648 Allow Average Pooling modules to accept tensors with 0-dim batch sizes. (#62025)
Summary:
This PR fixes a part of https://github.com/pytorch/pytorch/issues/12013, which is summarized concretely in https://github.com/pytorch/pytorch/issues/38115.

It introduces changes and tests that allow the Average Pooling layers to accept tensors with 0-sized batch dimensions and return meaningful results.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62025

Reviewed By: VitalyFedyunin

Differential Revision: D30303256

Pulled By: jbschlosser

fbshipit-source-id: 5f727e62a7c58d2b8bb49fcc3bd7688474917ba5
2021-08-13 11:31:17 -07:00
de7ae9e9b6 [skip ci] fix workflow code generation (#63235)
Summary:
Fixes a failing clean-git check for workflow code generation introduced by https://github.com/pytorch/pytorch/pull/63148.

`generated-win-vs2019-cuda10-py3.yml` was renamed to `generated-win-vs2019-cuda10.1-py3.yml`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63235

Reviewed By: VitalyFedyunin

Differential Revision: D30306474

Pulled By: zhouzhuojie

fbshipit-source-id: cbae1ace064e360e8ca0c0e997116bdb20d54d46
2021-08-13 10:38:30 -07:00
000e3a0881 [Static Runtime] Add pass to eliminate __getitem__/DictConstruct calls (#62429)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62429

Introduce a new pass to eliminate calls to `prim::DictConstruct/aten::__getitem__`. Given a graph like this:
```
%2 : Dict = prim::DictConstruct(%key, %value)
%3 : Tensor = aten::__getitem__(%2, %key)
%4 : Tensor = op(%3)
```
This pass produces a graph like this (after dead code elimination):
```
%4 : Tensor = op(%value)
```

This optimization is applied in the static runtime.

Test Plan:
`buck test //caffe2/test:jit -- TestPeephole`

**local.forward performance summary**
About 3% runtime benefit. All `DictConstruct` calls optimized out, `__getitem__` calls reduced significantly (~50% of them are cut out)
P438354810

**local_request_only.forward performance summary**
About 14% runtime benefit. Again, all `DictConstruct` calls optimized out, 50% `__getitem__` calls removed.
P438359742

There is some variance in the runtime measurements, so take these numbers with a grain of salt. Also note that the benefit does not exist in the shrunk model, since there are no `DictConstruct` calls there.

Reviewed By: hlu1

Differential Revision: D29995087

fbshipit-source-id: f376376a46ff808115afd2d60446e5db8f6f752f
2021-08-13 10:21:16 -07:00
fcc1f87b6a Fixing user inputs for low, high in make_tensor (#61108)
Summary:
**TODOs:**

* [x] Do not clamp inputs for low and high when given and valid.
* [x] Devise rules for modifying `low` and `high` when extremals/invalid values passed.
* [x] Testing with `test_references_numerics_hard` with the revised changes. _(I've tested locally, the changes will take place in a separate PR though after offline discussion with mruberry)_
* [x] Revise comments/documentation for `make_tensor`

See https://github.com/pytorch/pytorch/issues/61758 for tracker issue.

cc: mruberry pmeier

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61108

Reviewed By: VitalyFedyunin

Differential Revision: D30296167

Pulled By: mruberry

fbshipit-source-id: 67e8d15b173209a9c97ca013231494a5fa99f8c7
2021-08-13 10:13:12 -07:00
720a7a0d81 [hackathon] fix benchmarking script in CONTRIBUTING (#63199)
Summary:
[skip ci]
Per title

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63199

Reviewed By: mruberry

Differential Revision: D30305487

Pulled By: ngimel

fbshipit-source-id: 2704c4f08ab976a55c9f8c2fe54cd4f3f39412cf
2021-08-13 09:50:48 -07:00
bd9fad25c2 [codemod][lint][caffe2] Extend BLACK coverage
Test Plan: Sandcastle

Reviewed By: zsol

Differential Revision: D30302716

fbshipit-source-id: f9724d4f4d1b8950f581cc2c6c77eedf19b4b6fc
2021-08-13 09:28:10 -07:00
c5f3ab6982 ENH Adds no_batch_dim to FractionalMaxPool2d (#62490)
Summary:
Towards https://github.com/pytorch/pytorch/issues/60585

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62490

Reviewed By: bdhirsh

Differential Revision: D30287143

Pulled By: jbschlosser

fbshipit-source-id: 1b9dd932157f571adf3aa2c98c3c6b56ece8fa6e
2021-08-13 08:48:40 -07:00
61b49c8e41 [JIT] Add a flag to rethrow caught exception in jit interpreter (#63073)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63073

It turned out to be less than ideal to print a verbose stacktrace in exception messages in high-QPS services (see the related task) with a non-negligible failure rate: long stacktraces get truncated, which loses the original exception message thrown from native code. In such a use case it is actually desirable to retain only the message of the original exception thrown directly from native code.

This change adds a new flag `torch_jit_disable_exception_stacktrace` to the pytorch jit interpreter to suppress stacktrace in the messages of exception thrown from the interpreter.

Reviewed By: Krovatkin

Differential Revision: D30241792

fbshipit-source-id: c340225c69286663cbd857bd31ba6f1736b1ac4c
2021-08-13 08:44:24 -07:00
32b6104f37 Port norm kernel to structured kernels. (#62711)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62711

Tracking issue: #55070

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D30109866

Pulled By: ezyang

fbshipit-source-id: 894c9496894d059c7690a174b75bbd4db7ed6016
2021-08-13 08:27:48 -07:00
07bb6e4fd0 Port prod kernel to structured kernels. (#62024)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62024

Tracking issue: #55070

In this PR, I also broke down the meta functions of other reduction kernels (e.g. `all`,
`argmax`, `sum`) into the composition of common patterns.

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D29847122

Pulled By: ezyang

fbshipit-source-id: a6680a6cf6e59bb46b8ffe7bf2a3a611d6e0fd14
2021-08-13 08:27:46 -07:00
1280363bad Port mean kernel to structured kernels. (#61643)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61643

Tracking issue: #55070

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D29783866

Pulled By: ezyang

fbshipit-source-id: dc95baf593096c03fb5f292ee6c36de3cc7f2b35
2021-08-13 08:26:01 -07:00
2d75703c6a Remove req to call step() in training loop (#63164)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63164

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D30284616

Pulled By: andwgu

fbshipit-source-id: afdb677fb08851b139178a9f6d782196f26773e1
2021-08-13 08:22:44 -07:00
28f9e108b1 Pass _allow_empty_param_list into func opt ctor (#63163)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63163

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D30284615

Pulled By: andwgu

fbshipit-source-id: 4857f5b618ec5b007648737ab532ce605e5d70dc
2021-08-13 08:22:42 -07:00
bd81c9178a Simplify data structures, add uniform approximation, fix mem leak (#63162)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63162

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D30284617

Pulled By: andwgu

fbshipit-source-id: 9bd9e5f89abcc0d3dac56b85d55cc88e843baa9f
2021-08-13 08:20:59 -07:00
75f198d48d [docs][ao] update quantize_per_tensor to mention overloads (#63165)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63165

Add details about the overloads for
* list of tensors input
* supporting tensor scale/zero-point inputs

Test Plan:
CI

Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D30291045

fbshipit-source-id: 9fc6418792c5e3a35417eeb8d31de4a4bfcbb7a5
2021-08-13 08:00:10 -07:00
5abeac3ef7 Make saved tensors default hooks thread local (#62909)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62909

This PR makes saved tensors default hooks thread local.
This allows using default hooks in a multithreaded context.
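
A rough sketch of the use case this unlocks, assuming the `torch.autograd.graph.saved_tensors_hooks` context manager (which installs the default hooks) is available:
```python
import threading
import torch

def pack(t):
    return t.clone()  # stand-in for a real pack hook (e.g. offload to CPU)

def unpack(t):
    return t

def worker():
    # with thread-local defaults, this registration no longer leaks into
    # autograd graphs built concurrently by other threads
    with torch.autograd.graph.saved_tensors_hooks(pack, unpack):
        x = torch.randn(4, requires_grad=True)
        (x * x).sum().backward()

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```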

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D30165416

Pulled By: Varal7

fbshipit-source-id: 10a7d580661d3d94bdaf398c4e076b7bea11c16b
2021-08-13 07:49:20 -07:00
cb23976f9f Allow 0-dim batch sizes for AdaptiveMaxPool and MaxPool. (#62088)
Summary:
This PR fixes a part of https://github.com/pytorch/pytorch/issues/12013, which is summarized concretely in https://github.com/pytorch/pytorch/issues/38115.

It allows `MaxPool` and `AdaptiveMaxPool` to accept tensors whose batch size is 0. Some changes have been made to modernize the tests so that they show the name of the C++ function that throws an error.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62088

Reviewed By: bdhirsh

Differential Revision: D30281285

Pulled By: jbschlosser

fbshipit-source-id: 52bffc67bfe45a78e11e4706b62cce1469eba1b9
2021-08-13 07:33:17 -07:00
72bc6dc8c3 DOC Improve documentation for LayerNorm (#63144)
Summary:
As noted in this [commit](7026995f3c) and [issue](https://github.com/pytorch/pytorch/pull/59178#issuecomment-897485295), [Line 134](47e286d024/torch/nn/modules/normalization.py (L134)) overwrites the "embedding" variable, which causes an error when instantiating `nn.LayerNorm`.

I suggest renaming the "embedding" in [Line 133](47e286d024/torch/nn/modules/normalization.py (L133)) to "embedding_dim".

The final example is:
```
batch, sentence_length, embedding_dim = 20, 5, 10
embedding = torch.randn(batch, sentence_length, embedding_dim)
layer_norm = nn.LayerNorm(embedding_dim)
```

Fixes #59178.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63144

Reviewed By: bdhirsh

Differential Revision: D30288778

Pulled By: jbschlosser

fbshipit-source-id: e74b11430e302dae5661bf6e830ee5ac6c1838c4
2021-08-13 07:04:40 -07:00
aa665e1ab8 Revert D30090760: [iOS] Add podspec for libTorch-lite nightly build
Test Plan: revert-hammer

Differential Revision:
D30090760 (e182b459d9)

Original commit changeset: 361aa2ed24a1

fbshipit-source-id: 9c0dfee80a80eb012b142d3928204d6eb8025b0a
2021-08-13 06:45:43 -07:00
dcb5eb8d9b OpInfo for torch.nn.functional.normalize (#62635)
Summary:
See https://github.com/facebookresearch/functorch/issues/78 and https://github.com/pytorch/pytorch/issues/54261

cc: mruberry zou3519 Chillee

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62635

Reviewed By: H-Huang

Differential Revision: D30136503

Pulled By: zou3519

fbshipit-source-id: 258c069f30d9c2a51ed27dadf94f3703b9432a4a
2021-08-13 06:36:50 -07:00
741accb11e Implements backward for torch.lu_solve (#61681)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/22620
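
A small sketch of what this enables (double precision chosen so gradient checks are meaningful):
```python
import torch

A = torch.randn(3, 3, dtype=torch.float64)
b = torch.randn(3, 1, dtype=torch.float64, requires_grad=True)

LU, pivots = torch.lu(A)
x = torch.lu_solve(b, LU, pivots)  # solves A @ x = b from the LU factorization
x.sum().backward()                 # differentiating through lu_solve now works
print(b.grad.shape)                # torch.Size([3, 1])
```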

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61681

Reviewed By: ngimel

Differential Revision: D30063116

Pulled By: mruberry

fbshipit-source-id: e095b0cadfb7c8b37a7ef91bae5b5dc170d8ef1c
2021-08-12 21:17:11 -07:00
126ff6222e Moving getattr_from_fqn to torch.quantization.utils (#63107)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63107

Moving this function because the functionality would be useful outside of NS (Numeric Suite).
ghstack-source-id: 135727260

Test Plan: buck test //caffe2/test:quantization_fx mode/dev-nosan --keep-going --config client.id=nuclide --show-full-output -- suite

Reviewed By: supriyar

Differential Revision: D30260735

fbshipit-source-id: 58deabdd0f3b03b0ee7ee92be0548a0945084d65
2021-08-12 20:59:01 -07:00
07b00fc324 ENH Migrate nll_loss2d from THC to ATen (#62826)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/24608
Fixes https://github.com/pytorch/pytorch/issues/24607

With the following benchmark, the backward pass runs a little slower. This is strange since the implementation should be exactly the same.

<details>
 <summary>Benchmark script</summary>

```python
from itertools import product

import torch
import torch.nn as nn
import torch.nn.functional as F
import time

torch.manual_seed(0)
MS_PER_SECOND = 1000

def _time():
    torch.cuda.synchronize()
    return time.perf_counter() * MS_PER_SECOND

device = "cuda"
C = 3
n_runs = 30
reductions = ["none", "sum", "mean"]
Ns = [128, 256, 512]
Hs = [128, 256, 512]

for reduction, N, H in product(reductions, Ns, Hs):
    total_fwd_time = 0
    total_back_time = 0
    if reduction == "none":
        grad_out = torch.randn(N, H, H, device=device)
    else:
        grad_out = torch.randn(1)[0]

    for _ in range(n_runs):
        input = torch.randn(N, C, H, H, device=device, requires_grad=True)
        target = torch.rand(N, H, H, device=device).mul(3).floor().long()

        # forward
        start = _time()
        result = F.nll_loss(input, target, reduction=reduction)
        total_fwd_time += _time() - start

    result = F.nll_loss(input, target, reduction=reduction)
    for _ in range(n_runs):
        # backward
        start = _time()
        result.backward(grad_out, retain_graph=True)
        total_back_time += _time() - start

    fwd_avg = total_fwd_time / n_runs
    bwd_avg = total_back_time / n_runs
    print(
        f"input size({N}, {C}, {H}, {H}), reduction: {reduction}, fwd: {fwd_avg:.2f} (ms), back: {bwd_avg:.2f} (ms)"
    )

```

</details>

<details>
 <summary>master results</summary>

```
input size(128, 3, 128, 128), reduction: none, fwd: 0.34 (ms), back: 0.57 (ms)
input size(128, 3, 256, 256), reduction: none, fwd: 2.56 (ms), back: 3.85 (ms)
input size(128, 3, 512, 512), reduction: none, fwd: 14.54 (ms), back: 16.62 (ms)
input size(256, 3, 128, 128), reduction: none, fwd: 1.26 (ms), back: 1.78 (ms)
input size(256, 3, 256, 256), reduction: none, fwd: 7.07 (ms), back: 8.22 (ms)
input size(256, 3, 512, 512), reduction: none, fwd: 29.38 (ms), back: 33.29 (ms)
input size(512, 3, 128, 128), reduction: none, fwd: 3.41 (ms), back: 4.05 (ms)
input size(512, 3, 256, 256), reduction: none, fwd: 14.32 (ms), back: 16.46 (ms)
input size(512, 3, 512, 512), reduction: none, fwd: 59.20 (ms), back: 66.68 (ms)
input size(128, 3, 128, 128), reduction: sum, fwd: 0.08 (ms), back: 0.21 (ms)
input size(128, 3, 256, 256), reduction: sum, fwd: 0.21 (ms), back: 0.73 (ms)
input size(128, 3, 512, 512), reduction: sum, fwd: 0.82 (ms), back: 2.86 (ms)
input size(256, 3, 128, 128), reduction: sum, fwd: 0.12 (ms), back: 0.39 (ms)
input size(256, 3, 256, 256), reduction: sum, fwd: 0.42 (ms), back: 1.45 (ms)
input size(256, 3, 512, 512), reduction: sum, fwd: 1.53 (ms), back: 5.66 (ms)
input size(512, 3, 128, 128), reduction: sum, fwd: 0.21 (ms), back: 0.74 (ms)
input size(512, 3, 256, 256), reduction: sum, fwd: 0.78 (ms), back: 2.86 (ms)
input size(512, 3, 512, 512), reduction: sum, fwd: 2.98 (ms), back: 11.23 (ms)
input size(128, 3, 128, 128), reduction: mean, fwd: 0.07 (ms), back: 0.21 (ms)
input size(128, 3, 256, 256), reduction: mean, fwd: 0.21 (ms), back: 0.73 (ms)
input size(128, 3, 512, 512), reduction: mean, fwd: 0.82 (ms), back: 2.86 (ms)
input size(256, 3, 128, 128), reduction: mean, fwd: 0.13 (ms), back: 0.39 (ms)
input size(256, 3, 256, 256), reduction: mean, fwd: 0.42 (ms), back: 1.45 (ms)
input size(256, 3, 512, 512), reduction: mean, fwd: 1.54 (ms), back: 5.65 (ms)
input size(512, 3, 128, 128), reduction: mean, fwd: 0.22 (ms), back: 0.74 (ms)
input size(512, 3, 256, 256), reduction: mean, fwd: 0.78 (ms), back: 2.87 (ms)
input size(512, 3, 512, 512), reduction: mean, fwd: 2.98 (ms), back: 11.23 (ms)
```

</details>

<details>
 <summary>PR results</summary>

```
input size(128, 3, 128, 128), reduction: none, fwd: 0.33 (ms), back: 0.59 (ms)
input size(128, 3, 256, 256), reduction: none, fwd: 2.51 (ms), back: 3.92 (ms)
input size(128, 3, 512, 512), reduction: none, fwd: 14.52 (ms), back: 17.05 (ms)
input size(256, 3, 128, 128), reduction: none, fwd: 1.23 (ms), back: 1.85 (ms)
input size(256, 3, 256, 256), reduction: none, fwd: 7.07 (ms), back: 8.45 (ms)
input size(256, 3, 512, 512), reduction: none, fwd: 29.39 (ms), back: 34.21 (ms)
input size(512, 3, 128, 128), reduction: none, fwd: 3.40 (ms), back: 4.18 (ms)
input size(512, 3, 256, 256), reduction: none, fwd: 14.33 (ms), back: 16.90 (ms)
input size(512, 3, 512, 512), reduction: none, fwd: 59.04 (ms), back: 68.36 (ms)
input size(128, 3, 128, 128), reduction: sum, fwd: 0.07 (ms), back: 0.25 (ms)
input size(128, 3, 256, 256), reduction: sum, fwd: 0.21 (ms), back: 0.86 (ms)
input size(128, 3, 512, 512), reduction: sum, fwd: 0.82 (ms), back: 3.33 (ms)
input size(256, 3, 128, 128), reduction: sum, fwd: 0.12 (ms), back: 0.46 (ms)
input size(256, 3, 256, 256), reduction: sum, fwd: 0.42 (ms), back: 1.70 (ms)
input size(256, 3, 512, 512), reduction: sum, fwd: 1.53 (ms), back: 6.58 (ms)
input size(512, 3, 128, 128), reduction: sum, fwd: 0.21 (ms), back: 0.87 (ms)
input size(512, 3, 256, 256), reduction: sum, fwd: 0.78 (ms), back: 3.34 (ms)
input size(512, 3, 512, 512), reduction: sum, fwd: 2.98 (ms), back: 13.07 (ms)
input size(128, 3, 128, 128), reduction: mean, fwd: 0.07 (ms), back: 0.26 (ms)
input size(128, 3, 256, 256), reduction: mean, fwd: 0.21 (ms), back: 0.86 (ms)
input size(128, 3, 512, 512), reduction: mean, fwd: 0.82 (ms), back: 3.34 (ms)
input size(256, 3, 128, 128), reduction: mean, fwd: 0.12 (ms), back: 0.46 (ms)
input size(256, 3, 256, 256), reduction: mean, fwd: 0.42 (ms), back: 1.72 (ms)
input size(256, 3, 512, 512), reduction: mean, fwd: 1.53 (ms), back: 6.60 (ms)
input size(512, 3, 128, 128), reduction: mean, fwd: 0.21 (ms), back: 0.87 (ms)
input size(512, 3, 256, 256), reduction: mean, fwd: 0.78 (ms), back: 3.33 (ms)
input size(512, 3, 512, 512), reduction: mean, fwd: 2.98 (ms), back: 13.07 (ms)
```

</details>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62826

Reviewed By: bdhirsh

Differential Revision: D30282279

Pulled By: ngimel

fbshipit-source-id: 4aa0ff3f8af0632957417931d332ec486a12b52d
2021-08-12 18:07:15 -07:00
219ba6575b add autowrap_functions kwarg to fx.Tracer (#62106)
Summary:
Implements feature request https://github.com/pytorch/pytorch/issues/62021

Test it out with

```python
from torch import fx
from torch import nn

def fx_int(x):
    return int(x)

class MyModule(nn.Module):
    def forward(self, x):
        return fx_int(x.shape[0] / 2)

tracer = fx.Tracer(autowrap_functions=(fx_int,))  # or remove kwarg to demonstrate symbolic trace error
tracer.trace(MyModule())
```

First time contributor, so please advise if I could have done anything to make lives easier for next time.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62106

Reviewed By: SplitInfinity, driazati

Differential Revision: D30080834

Pulled By: jamesr66a

fbshipit-source-id: 68fadf8c881ea7930e7afd62b642874010fe4903
2021-08-12 17:38:25 -07:00
7a1ab9f5d7 [fx] store Tracer class on Graph and GraphModule for package deserialization [v2, the re-do] (#63121)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63121

Re-introducing this diff with a small change: skip setting Tracer classes on GraphModules when the Tracer class is not defined at module level (which prevents pickling).

Previous (reverted) pull request: https://github.com/pytorch/pytorch/pull/62497

Reviewed By: houseroad

Differential Revision: D30252776

fbshipit-source-id: 42d2bc846e4b32d00563419c38c02b63cd0986e6
2021-08-12 17:28:50 -07:00
988ef190e3 Show warning in eager mode for empty containers (#62978)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/54873

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62978

Reviewed By: navahgar

Differential Revision: D30278343

Pulled By: ansley

fbshipit-source-id: ebb19f7b8a10720f2612b99a2668d1ebbc1f2d16
2021-08-12 16:11:27 -07:00
e182b459d9 [iOS] Add podspec for libTorch-lite nightly build (#62691)
Summary:
The nightly pod version will be aligned with the [PyTorch nightly build version](https://github.com/pytorch/pytorch/blob/master/.circleci/scripts/binary_populate_env.sh#L88) and the [CocoaPods version specification](https://guides.cocoapods.org/using/the-podfile.html#specifying-pod-versions); the version format of the podspec is `PyTorch version + nightly build date`, e.g. `1.10.0.20210812`.

Usage:
1. Add `pod 'LibTorch-Lite-Nightly'` to `Podfile`
2. Run `pod install` to install the nightly built lib
3. Run `pod update` to update the lib to the latest version

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62691

Test Plan:
* Test on [TestApp](https://github.com/pytorch/pytorch/tree/master/ios/TestApp) and [HelloWorld](https://github.com/pytorch/ios-demo-app):
Podfile: `pod 'LibTorch-Lite-Nightly'`

* Test on Private Pod:
{F642106928}

Reviewed By: xta0

Differential Revision: D30090760

Pulled By: hanton

fbshipit-source-id: 361aa2ed24a11d6aced8374cb45f70f49bd5da52
2021-08-12 15:35:14 -07:00
0b89e69e7c [BE] delete GHA generated workflow files before regen (#63148)
Summary:
Unlike CircleCI, where all workflows go in one file, legacy generated GHA files will silently remain in one's PR, e.g. when we change the build_environment name. That's not ideal.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63148

Reviewed By: bdhirsh

Differential Revision: D30283382

Pulled By: walterddr

fbshipit-source-id: ffdd5bf9561dd38499052855a12ee5cf838a20b0
2021-08-12 14:43:00 -07:00
ba25527ffc [iOS][GPU] Fix the clamp shader function for x86_64 (#63062)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63062

Previously, due to the need to support iOS 10.0, we used an fp16 version of the clamp kernel on Metal, which didn't work well on x86_64. Since we no longer need to support 10.0, we can use the fp32 version, which works on both arm64 and x86_64.
ghstack-source-id: 135536785

Test Plan:
- `buck test pp-macos`
- Op tests in the playground app

{F641013793}

Reviewed By: husthyc

Differential Revision: D30239931

fbshipit-source-id: 6ad1bf71422b537e052fbd7b7465ba8deb7ca0cf
2021-08-12 13:20:27 -07:00
ed7ece389d Forbid inplace modification of a saved tensor's pack_hook input (#62717)
Summary:
When using saved tensors hooks (especially default hooks),
if the user defines a `pack_hook` that modifies its input,
it can cause some surprising behavior.

The goal of this PR is to prevent future user headache by catching
inplace modifications of the input of `pack_hook` and raising an error if
applicable.
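
A sketch of the kind of hook this now rejects, assuming the `torch.autograd.graph.saved_tensors_hooks` API:
```python
import torch

def bad_pack(t):
    return t.mul_(2)  # mutates the tensor autograd just saved

def unpack(t):
    return t

a = torch.randn(3, requires_grad=True).clone()  # non-leaf, so mul_ is legal
with torch.autograd.graph.saved_tensors_hooks(bad_pack, unpack):
    b = a * a  # saving `a` invokes bad_pack; expected to raise an error
               # about in-place modification of a saved tensor
```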

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62717

Reviewed By: albanD

Differential Revision: D30255243

Pulled By: Varal7

fbshipit-source-id: 8d73f1e1b50b697a59a2849b5e21cf0aa7493b76
2021-08-12 12:40:10 -07:00
aa5141f204 Update CONTRIBUTING.md to remove ProcessGroupAgent (#63160)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63160

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D30284439

Pulled By: H-Huang

fbshipit-source-id: 53c31b6917ef5e2125e146fb0ed73ae3d76a8cf9
2021-08-12 12:26:12 -07:00
96fb1a56ea add use_strict_trace to tensorboard add_graph method (#63120)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63120

FAIM returns dictionaries as the model output, which throws an error when trying to trace with add_graph. We pass `strict` through to the tracer to make this user-configurable.

User post: https://fb.workplace.com/groups/pytorchLightning/permalink/1510194972650369/?comment_id=1510252919311241&reply_comment_id=1510281112641755
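
A minimal sketch of the new knob (requires the tensorboard package; the model and shapes are illustrative):
```python
import torch
from torch.utils.tensorboard import SummaryWriter

class DictOutputModel(torch.nn.Module):
    def forward(self, x):
        return {"out": x * 2}  # dict outputs fail under strict tracing

writer = SummaryWriter()
# relax the tracer so dict-returning models can still be graphed
writer.add_graph(DictOutputModel(), torch.randn(1, 3), use_strict_trace=False)
writer.close()
```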

Test Plan: unit test

Reviewed By: Reubend

Differential Revision: D30265890

fbshipit-source-id: 58b25d9500b875a29a664aa9ef4c1e7f13631fa1
2021-08-12 12:12:12 -07:00
1022443168 Revert D30279364: [codemod][lint][fbcode/c*] Enable BLACK by default
Test Plan: revert-hammer

Differential Revision:
D30279364 (b004307252)

Original commit changeset: c1ed77dfe43a

fbshipit-source-id: eab50857675c51e0088391af06ec0ecb14e2347e
2021-08-12 11:45:01 -07:00
ed0b8a3e83 LayerNorm Support in autodiff: (#50467)
Summary:
1. extend autodiff by adding entry for layer_norm in symbolic script, we now use native_layer_norm_backward
2. added backward function `layernorm_double_backward` for `native_layer_norm_backward`, preserves double backward support for LayerNorm in autodiff/ScriptModule
3. added python test to verify autodiff on layer_norm with various configuration of optional tensors; (verify the fix in https://github.com/pytorch/pytorch/issues/49430)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50467

Reviewed By: eellison

Differential Revision: D30232864

Pulled By: jansel

fbshipit-source-id: b9c33075386aff96afff7415df9f94388bfb474a

Co-authored-by: Ryan Spring <rspring@nvidia.com>
Co-authored-by: Jie <jiej@nvidia.com>
2021-08-12 11:05:53 -07:00
b004307252 [codemod][lint][fbcode/c*] Enable BLACK by default
Test Plan: manual inspection & sandcastle

Reviewed By: zertosh

Differential Revision: D30279364

fbshipit-source-id: c1ed77dfe43a3bde358f92737cd5535ae5d13c9a
2021-08-12 10:58:35 -07:00
aac3c7bd06 [reland] OpInfo: adaptive_avg_pool2d (#62935)
Summary:
This PR is an attempt to reland https://github.com/pytorch/pytorch/pull/62704.

**What has changed?**

The op has non-deterministic behavior, hence an appropriate `gradcheck` wrapper had to be added.

cc: mruberry zou3519 heitorschueroff kshitij12345

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62935

Reviewed By: anjali411

Differential Revision: D30225095

Pulled By: zou3519

fbshipit-source-id: 644873cc21d44b19c8b68f9edff691913778de0e
2021-08-12 09:46:38 -07:00
daba551922 [BE] shorten CI name part2 (#63030)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/62357
There's no need to specify the cuDNN version, since it is already implied by the CUDA version.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63030

Reviewed By: zhouzhuojie, driazati

Differential Revision: D30226354

Pulled By: walterddr

fbshipit-source-id: 7e2dc577810e0ce80ee27569c25a814566250ab1
2021-08-12 08:14:22 -07:00
eea52b7d47 Skip zero test on windows (#63087)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63087

Test failed unexpectedly on Windows; see
https://github.com/pytorch/pytorch/issues/63086. Skip for now while we
investigate.
ghstack-source-id: 135631811

Test Plan: CI

Reviewed By: ngimel

Differential Revision: D30251300

fbshipit-source-id: 8acb1ea8863c654c171fe989ac24446c321c085d
2021-08-12 00:38:42 -07:00
4d7a12f68b BatchNorm: Use resize_output and empty, instead of empty_like (#63084)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/62967

This lets each of the three implementations choose which memory format
to use for the output, meaning channels_last can be used in more cases.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63084

Reviewed By: saketh-are

Differential Revision: D30255740

Pulled By: ngimel

fbshipit-source-id: 48d42850952ec910b29521a1c4e530eb2b29df5e
2021-08-11 23:47:24 -07:00
d5a7579597 [quant] Make version 1 the default for get_default_qat_qconfig (#63043)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63043

In version 1 we use the fused module/operator during QAT. Making this the default for all QAT runs going forward.

Older models saved after prepare_qat_fx can still load their state_dict into a model prepared using version 1.
The state_dict will still have the same attribute for the observer/fake_quant modules.

There may be some numerics difference between the old observer code in observer.py and the new fused module that was
re-written in C++/CUDA to perform observe + fake_quantize.

This PR also updates the test to check for the new module instead of the default FakeQuantize module.
Note: there are also some changes to make the operator work for multi-dim per-channel quantization + updated the test for that.
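
A short sketch of selecting between the versions (assuming the existing `version` argument on `get_default_qat_qconfig`):
```python
import torch

# version 1 (now the default) uses the fused observer + fake-quant module
qconfig_new = torch.quantization.get_default_qat_qconfig("fbgemm")
# the previous behavior remains reachable explicitly
qconfig_old = torch.quantization.get_default_qat_qconfig("fbgemm", version=0)
```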

Test Plan:
python test/test_quantization.py TestSerialization.test_default_qat_qconfig

Imported from OSS

Reviewed By: raghuramank100

Differential Revision: D30232222

fbshipit-source-id: f3553a1926ab7c663bbeed6d574e30a7e90dfb5b
2021-08-11 22:06:44 -07:00
91525d42d9 Fix sharded tensor tests. (#63054)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63054

1) Ensure these tests are skipped in environments without any GPUs.
2) Add the test to run_test.py
ghstack-source-id: 135595698

Test Plan: waitforbuildbot

Reviewed By: wanchaol

Differential Revision: D30239159

fbshipit-source-id: 21b543ba72e8d10182bc77e7ae1fd34fd4096509
2021-08-11 21:46:45 -07:00
bf7d03ff1f Port log_softmax_backward_data to structured kernel (#62372)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62372

Test Plan: Imported from OSS

Reviewed By: saketh-are

Differential Revision: D30240242

Pulled By: SplitInfinity

fbshipit-source-id: 67d5e4b1543c2e43675e905ce18ca49c11e33748
2021-08-11 21:03:59 -07:00
ba603594fd Port log_softmax to structured kernel (#57374)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57374

Test Plan: Imported from OSS

Reviewed By: saketh-are

Differential Revision: D30240243

Pulled By: SplitInfinity

fbshipit-source-id: de6617c75d16e26d607a884c25b8752b7b561737
2021-08-11 21:02:48 -07:00
d2eda7f2f3 Add ciflow_ruleset.json generator along with gha ci (#63097)
Summary:
- Add `.github/generated-ciflow-ruleset.json` for ciflow-bot (so that we can generate better comments)
- The lint job also checks git dirty to make sure that the file is always in sync with ciflow configs

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63097

Reviewed By: saketh-are

Differential Revision: D30263278

Pulled By: zhouzhuojie

fbshipit-source-id: bad68105a228e892ba071b29ecfdf433e1038054
2021-08-11 17:14:40 -07:00
04caef8e1d Improve IMethod::getArgumentNames to deal with empty argument names list (#62947)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62947

This diff improves IMethod::getArgumentNames to deal with an empty argument names list.

Test Plan:
buck test mode/dev //caffe2/caffe2/fb/predictor:pytorch_predictor_test -- PyTorchDeployPredictor.GetEmptyArgumentNamesValidationMode
buck test mode/dev //caffe2/caffe2/fb/predictor:pytorch_predictor_test -- PyTorchDeployPredictor.GetEmptyArgumentNamesRealMode

Reviewed By: wconstab

Differential Revision: D30179974

fbshipit-source-id: c7aec35c360a73318867c5b77ebfec3affee47e3
2021-08-11 16:44:00 -07:00
5cf32c1d09 Fix Nnapi backend execute's dangling pointer (#63092)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63092

Bug discovered while testing NNAPI Delegate on SparkAR.
Using
```
c10::IntArrayRef order = {0, 2, 3, 1};
fixed_inputs.push_back(tensorInp.get(i).permute(order).contiguous());
```
results in a garbage value for `order` inside `permute()`: `c10::IntArrayRef` is a non-owning view, and here it points at a temporary initializer list that is destroyed at the end of the declaration statement.
Writing the braced list directly inside the call, i.e. `permute({0, 2, 3, 1})`, keeps the temporary alive for the whole expression and fixes this issue. The problem is seemingly related to https://github.com/pytorch/pytorch/issues/44409, but luckily the solution in this case is simple.

Bug wasn't caught earlier, since regular unit tests weren't affected by the dangling pointer, and address sanitizer NNAPI tests are turned off due to there being a different failure (T95764916).
ghstack-source-id: 135526129

Test Plan:
Run Unit tests: `python test/test_jit.py`

Build and run SparkAR on an Android phone at the top of this diff stack (D30173959): `buck build --show-output arstudioplayer_arm64_debug -c pt.enable_nnapi=1`

Reviewed By: raziel, iseeyuan

Differential Revision: D30237504

fbshipit-source-id: c946d81feefc453b43d9295d8d6f509cafdcec03
2021-08-11 14:26:48 -07:00
709ac6853a Fix warnings (#62930)
Summary:
- Add `-Wno-writable-strings` (which is clang's flavor of `-Wwrite-strings`) to the list of warnings ignored while compiling torch_python.
- Avoid unnecessary copies in range loops.
- Fix a number of signed-unsigned comparisons.

Found while building locally on M1

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62930

Reviewed By: albanD

Differential Revision: D30171981

Pulled By: malfet

fbshipit-source-id: 25bd43dab5675f927ca707e32737ed178b04651e
2021-08-11 14:07:10 -07:00
855e8f2b17 [iOS][GPU] Consolidate array and non-array kernel for upsampling_nearest2d (#63061)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63061

Cleanup the redundant shader code for the upsampling nearest kernel.
ghstack-source-id: 135524349

Test Plan:
- `buck test pp-macos`
- Op tests in PyTorchPlayground app

Reviewed By: husthyc

Differential Revision: D30236905

fbshipit-source-id: e1e001b446452b077e6db719b0519c9070f3300b
2021-08-11 13:29:39 -07:00
456364729e irange-ify 13b (#62476)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62476

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D30001445

fbshipit-source-id: 6f4525338c80e9f929695f47f36ca9c72d96a75d
2021-08-11 13:13:44 -07:00
31c1983603 Add BFloat16 support for unique and unique_consecutive on CPU (#62559)
Summary:
Add BFloat16 support for unique and unique_consecutive on CPU.
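
For example:
```python
import torch

x = torch.tensor([1.0, 1.0, 2.0, 1.0], dtype=torch.bfloat16)
print(torch.unique(x))              # tensor([1., 2.], dtype=torch.bfloat16)
print(torch.unique_consecutive(x))  # tensor([1., 2., 1.], dtype=torch.bfloat16)
```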

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62559

Reviewed By: saketh-are

Differential Revision: D30250675

Pulled By: ngimel

fbshipit-source-id: 26e48f971d87f3b86db237e8ad3a4b74eb3c1def
2021-08-11 12:54:46 -07:00
51a67d3168 Add Github action to upload full source releases (#63022)
Summary:
These release tarballs include the submodules.
The action runs on every tag and master-branch push but will not upload anything;
this makes sure nothing is broken when an actual release happens.

When a release is created, the action runs and uploads the tarball.

Fixes https://github.com/pytorch/pytorch/issues/62708

As I don't have access rights here and testing is obviously hard (as a new release needs to be published), I set up a test at https://github.com/Flamefire/pytorch/releases/tag/testtag
See also the run(s) at https://github.com/Flamefire/pytorch/actions/workflows/create_release.yml

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63022

Reviewed By: saketh-are

Differential Revision: D30256253

Pulled By: seemethere

fbshipit-source-id: ab5fe131452de14ae3768b91c221e68c536cb3aa
2021-08-11 12:47:17 -07:00
821c1edea9 Embedding thrust->cub: unique (#63042)
Summary:
Followup of https://github.com/pytorch/pytorch/pull/62495

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63042

Reviewed By: saketh-are

Differential Revision: D30231084

Pulled By: ngimel

fbshipit-source-id: 03b0a88107e8a2aee3570881d81bf2b676f525cd
2021-08-11 12:40:36 -07:00
fa22f6303f [PyTorch] Add flop count for addmm (#61895)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61895

* Add a FLOP count for addmm: it should be `2*m*n*k`, since each of the `m*n` output elements requires `k` multiplies and `k` adds.

Share the same code path for `addmm` and `mm`.
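
A sketch of eyeballing the count, assuming the autograd profiler's `with_flops` option:
```python
import torch

m, k, n = 8, 16, 32
bias = torch.randn(m, n)
a, b = torch.randn(m, k), torch.randn(k, n)

with torch.autograd.profiler.profile(with_flops=True) as prof:
    torch.addmm(bias, a, b)

# aten::addmm should report 2*m*n*k = 8192 FLOPs
print(prof.key_averages().table())
```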

Test Plan:
Imported from OSS

`python test/test_profiler.py`
Run a sample profile and check that FLOPS for `aten::addmm` is correct.

`[chowar@devbig053.frc2 ~/local/pytorch/build] ninja bin/test_jit`
`[chowar@devbig053.frc2 ~/local/pytorch/build] ./bin/test_jit --gtest_filter='ComputeFlopsTest*'`

Reviewed By: dskhudia

Differential Revision: D29785671

fbshipit-source-id: d1512036202d7234a981bda897af1f75808ccbfe
2021-08-11 12:33:43 -07:00
fb4ba9e664 XNNPack Input Pointer Caching Comment (#62818)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62818

Added a comment to explain why we no longer need to manually cache pointers/parameters for convolution, as removed in D29777605 (f5c6c3947e)

Test Plan: Sandcastle tests (no code changed)

Reviewed By: kimishpatel

Differential Revision: D30113489

fbshipit-source-id: d697f05816acbd367d59a4aced1925303c683d40
2021-08-11 11:55:42 -07:00
82123758ba _convert_coo_to_csr CPP and CUDA functionality (#61838)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/57381 and improves https://github.com/pytorch/pytorch/pull/61340 via dedicated `coo_to_csr` functionalities.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61838

Reviewed By: ezyang

Differential Revision: D30132736

Pulled By: cpuhrsch

fbshipit-source-id: a1fd074c0d70366a524d219a620b94f8bed71d7c
2021-08-11 11:37:20 -07:00
b8e6144e0a Add a _RemoteDevice structure for ShardedTensor/ShardingSpec. (#62927)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62927

As part of the ShardedTensor work, we realized we do need some sort of
_RemoteDevice structure that deals with our format of "workername/device" so
that users don't have to worry about parsing this string directly.

Right now this structure is just the bare minimum and is mostly a container for
describing a remote device. It is currently only used in ShardedTensor,
ShardingSpec and RemoteModule.

Once we actually have a consolidated remote device proposal, this class can be
extended appropriately if needed.
ghstack-source-id: 135534086

Test Plan:
1) unit tests
2) waitforbuildbot

Reviewed By: SciPioneer

Differential Revision: D30170689

fbshipit-source-id: 1ac2e81c7a597dc40bf3fbf2c1168c382c66649f
2021-08-11 11:27:32 -07:00
b746fed164 [Pytorch Edge] Move RuntimeCompatibilityInfo Factory Method (#63005)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63005

Realized I forgot to move the Runtime half of these functions to be within the struct.

Test Plan: ci

Reviewed By: pavithranrao

Differential Revision: D30205521

fbshipit-source-id: ccd87d7d78450dd0dd23ba493bbb9d87be4640a5
2021-08-11 11:15:57 -07:00
3d3ad0a52f [easy] add an inplace argument to MutableNetProto.to_net() and core.Net() constructor (#63068)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63068

The caffe2 core.Net constructor can accept a caffe2_pb2.NetDef proto, but it always creates a copy. This is wasteful when we can prove that the proto being passed to it will not be used anywhere else. So we add an "inplace" argument to the `core.Net` constructor that allows clients to give away ownership of the passed proto without copying. We default this argument to `False`, ensuring that behavior does not change unless explicitly requested.
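
A small sketch of the new argument (proto contents are illustrative):
```python
from caffe2.proto import caffe2_pb2
from caffe2.python import core

proto = caffe2_pb2.NetDef()
proto.name = "example_net"

copied = core.Net(proto)               # default: the proto is copied
owned = core.Net(proto, inplace=True)  # ownership given away, no copy;
                                       # `proto` must not be reused elsewhere
```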

Test Plan: Let CI run.

Differential Revision: D29976510

fbshipit-source-id: 26e13ca76f3431b8ef0de51f08bbf263491d323e
2021-08-11 11:10:52 -07:00
c090ae291e Fix gha render-test-result mixed failure passthrough (#63056)
Summary:
To fix something like https://github.com/pytorch/pytorch/actions/runs/1114555082

![image](https://user-images.githubusercontent.com/658840/128956528-86997457-5e18-4ae1-83cc-aa7d0ca03c0e.png)

Not sure why `needs.test.result` doesn't capture the `failure` case before, so changed it to `needs.test.result != 'skipped' || failure()`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63056

Reviewed By: walterddr, tktrungna

Differential Revision: D30240112

Pulled By: zhouzhuojie

fbshipit-source-id: d159cc3f79ed5d604ae12583736b37ac28e8d87c
2021-08-11 09:45:31 -07:00
4ea6a3aa74 Fix issues with printing certain torch modules (#62447)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/54420

When I tested on master with the testing code below, there were multiple objects tracked by the garbage collector that could not be printed.

Testing code:
```
import torch
import gc

print(torch.__version__)

a = torch.rand(10)

print(a)

# print every object currently tracked by the garbage collector
for obj in gc.get_objects():
    print(obj)
```

### 1
```
print(torch.classes)
```

As SplitInfinity mentioned in the GitHub issue, the solution here is to set `__file__` for `torch.classes` to something. Similar to [_ops.py](https://github.com/pytorch/pytorch/blob/master/torch/_ops.py#L69), where `__file__` is set to `_ops.py`, we could set `__file__` for `torch.classes` to `_classes.py`.

### 2
```
print(torch._ops.ops.quantized)
print(torch._ops.ops.atan)
```

When we try to print these two modules, it will call `_OpNamespace::__getattr__`, but the `op_name` is `__file__`. This becomes a problem when `torch._C._jit_get_operation(qualified_op_name)` [(link)](https://github.com/pytorch/pytorch/blob/master/torch/_ops.py#L60) tries to look for an actual op on the native C++ side.

Only when we get the attribute for an actual op, e.g. `print(torch._ops.ops.quantized.elu)`, does the `op_name` become proper (e.g. `elu`).

My current solution is to return a hardcoded string (i.e. “torch.ops”) if `op_name` is `"__file__"`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62447

Reviewed By: saketh-are

Differential Revision: D30234654

Pulled By: yidawang-oss

fbshipit-source-id: de43a8f599739c749fb3307eea015cc61f1da60e
2021-08-11 09:40:41 -07:00
5c00091f02 Shard python_functions.cpp (#62186)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62186

This file takes 6 minutes on its own to compile and is the limiting factor for
building `libtorch_python` on a 32-core threadripper. This splits the file into
5 shards which take around 50 seconds each to compile.

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D29962046

Pulled By: albanD

fbshipit-source-id: df13cfaebd54296f10609f67ae74a850c329bd37
2021-08-11 09:21:26 -07:00
c5de83adca Fix inconsisteny between Python and JIT power operation (#62842)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62842

Test Plan:
Wrote unit test TestAtenPow to test behavior of aten::pow when:
1. base is int, exponent is int
2. base is int, exponent is float
3. base is float, exponent is int
4. base is float, exponent is float

Specifically, we test that when the base is zero and the exponent is negative, we raise an error. In all other cases, we expect the behavior to match the result returned by Python.

Because the C++ code relies on overloading, we need to make sure all combinations of types give us the expected result.
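
A sketch of the aligned behavior in a scripted function:
```python
import torch

@torch.jit.script
def power(base: int, exp: int):
    return base ** exp

print(power(2, -2))  # 0.25, matching Python
power(0, -1)         # now raises, mirroring Python's ZeroDivisionError
```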

Reviewed By: zhxchen17

Differential Revision: D30146115

Pulled By: szewaiyuen7

fbshipit-source-id: dc661897ad38da286ee454120fbe41314b7f2995
2021-08-11 08:41:46 -07:00
f446e835ee Fix CUDA_KERNEL_ASSERT ambiguous symbol in NDEBUG mode (#62527)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62527

If NDEBUG is applied inconsistently during compilation, we might get an 'ambiguous declaration' error. Let's make sure that the forward declaration matches glibc, including all specifiers.

Test Plan: sandcastle

Reviewed By: mdschatz

Differential Revision: D30030051

fbshipit-source-id: 9f4d5f1d4e74f0a4eaeeaaaad76b93ee485d8bcd
2021-08-11 01:10:09 -07:00
f7611b31aa [4/N] Enable opt-asan for distributed unit tests. (#62051)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62051

The goal here is to enable opt-asan for "spawn" based unit tests since
this works for "spawn" unlike "dev-asan". As a result, we can run ASAN for
"spawn" unit tests as well.

This means we can completely remove fork unit tests from the code base since
the only purpose for these tests was to run ASAN.
ghstack-source-id: 135523770

Test Plan: waitforbuildbot

Reviewed By: SciPioneer

Differential Revision: D29854514

fbshipit-source-id: 02a5bfcfae2afc21badecff77082c7a6ad83636b
2021-08-10 22:38:31 -07:00
847a7cfa10 Back out "[fx] store Tracer class on Graph and GraphModule for package deserialization" (#63053)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63053

Original commit changeset: eca09424ad30

The original diff - D30019214 (6286d33878) breaks the publish flow in model saving.

Test Plan: ci

Differential Revision: D30236517

fbshipit-source-id: 3e05db02fc1cbbc2ed262c83bf56d555277abb34
2021-08-10 21:58:08 -07:00
324673a537 rebase for autocast updates to include device_type and dtype flags (#61002)
Summary:
Fixes #55374 (https://github.com/pytorch/pytorch/issues/55374).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61002

Reviewed By: malfet, mruberry

Differential Revision: D30016812

Pulled By: ngimel

fbshipit-source-id: 6e09a29f539d28e9aea5cd9489b1e633cc588033
2021-08-10 20:03:12 -07:00
a55cae3d37 Fix missing element types and shapes when autograd.Function has multiple tensor outputs (#57966)
Summary:
When generating IR for autograd.Function, if the function has multiple outputs, a TupleUnpack may be inserted after the original function node, and PyTorch only assigns proper information (tensor element type and shape) to the TupleUnpack and forgets the original function node. In contrast, if autograd.Function only produces one output, the original function node may have the tensor element type and shape in its output schema.

Before this PR:
- (simplified) IR for autograd.Function with one output: input (tensor, dtype=float32, shape=[2, 3]) -> PythonOp -> output (tensor, dtype=float32, shape=[4, 5])
- (simplified) IR for autograd.Function with multiple outputs: input (tensor, dtype=float32, shape=[2, 3]) -> PythonOp -> output_0 **(tensor)**, output_1 **(tensor)** -> TupleUnpack output_2 (tensor, dtype=float32, shape=[4, 5]), output_3 (tensor, dtype=float32, shape=[6, 7])

After this PR:
- (simplified) IR for autograd.Function with one output: input (tensor, dtype=float32, shape=[2, 3]) -> PythonOp -> output (tensor, dtype=float32, shape=[4, 5])
- (simplified) IR for autograd.Function with multiple outputs: input (tensor, dtype=float32, shape=[2, 3]) -> PythonOp -> output_0 **(tensor, dtype=float32, shape=[4, 5])**, output_1 **(tensor, dtype=float32, shape=[6, 7])** -> TupleUnpack output_2 (tensor, dtype=float32, shape=[4, 5]), output_3 (tensor, dtype=float32, shape=[6, 7])

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57966

Reviewed By: zhxchen17

Differential Revision: D30208207

Pulled By: gmagogsfm

fbshipit-source-id: 42a3d1f9c0932133112a85df0c49cf4ea0afa175
2021-08-10 19:48:11 -07:00
390c0ac403 remove dead code (#63031)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63031

Reviewed By: mruberry

Differential Revision: D30225094

Pulled By: ngimel

fbshipit-source-id: 3666a0fa120bea85225cd3ee04f89d64952d2862
2021-08-10 18:41:13 -07:00
94c5309369 Revert D30199482: [pytorch][PR] Add BFloat16 support for unique and unique_consecutive on CPU
Test Plan: revert-hammer

Differential Revision:
D30199482 (fc0b8e6033)

Original commit changeset: 6f2d9cc1a528

fbshipit-source-id: 39e9f202bcbd978525f792173d4f97b5b329b5b1
2021-08-10 18:27:18 -07:00
d1f9c03cef Use const auto with irange (#62990)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62990

Test Plan: Sandcastle

Reviewed By: zhouzhuojie

Differential Revision: D30199748

fbshipit-source-id: 284b208ffa3c6c4749e5ac9b1fccb28914590f2c
2021-08-10 17:59:01 -07:00
d893b44cd8 change nccl version reporting (#62916)
Summary:
https://github.com/pytorch/pytorch/issues/62295

Previously the packing and unpacking of the NCCL version "integer" was done to have parity with the upstream NCCL version encoding. However, there doesn't seem to be any place where this integer is directly compared with a version integer sourced from upstream NCCL, and syncing the encoding seems to be error-prone (e.g., a recent change where a special case was added for minor versions >= 10; see `src/nccl.h.in` (L22) at commit 7e51592129).

This patch changes the reporting to return a tuple of version numbers instead (to preserve ease-of-use for comparisons) and tweaks the passing between C/Python to avoid the digit overflow problem.
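
A sketch of the resulting usage (the feature name in the comment is a placeholder):
```python
import torch

# now a tuple of ints, e.g. (2, 10, 3), instead of a packed integer
version = torch.cuda.nccl.version()
print(version)
if version >= (2, 4):
    print("NCCL is new enough for the feature we care about")
```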

CC ngimel mcarilli

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62916

Reviewed By: anjali411

Differential Revision: D30201069

Pulled By: mrshenli

fbshipit-source-id: 2e4e7c69f001c3f22bd04aa6df6a992e538bea45
2021-08-10 17:46:27 -07:00
f307120df4 Update test_torch_deploy (#62838)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62838

Fixes #62380

* update test functions to call the wheel install folder {sitepackages}/torch instead of the build/ folder
* add symbolic links for the shared libraries called by the tests (this is a bit hacky and should properly be fixed by setting the rpath before compiling -- similar to https://github.com/pytorch/pytorch/blob/master/.jenkins/pytorch/test.sh#L204-L208).

### Test plan
check if all ci workflows pass

Test Plan: Imported from OSS

Reviewed By: walterddr

Differential Revision: D30193141

Pulled By: tktrungna

fbshipit-source-id: 72c2bd3a740fca0f72e4803df505240193692c44
2021-08-10 16:29:50 -07:00
af6ed084b4 update test_libtorch (#62797)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62797

Fixes #62380

* update test functions to use the wheel install folder ({sitepackages}/torch) instead of the build/ folder
* add symbolic links for the shared libraries that the tests call (this is a bit hacky and should instead be fixed by setting the rpath before compiling -- similar to https://github.com/pytorch/pytorch/blob/master/.jenkins/pytorch/test.sh#L204-L208).

### Test plan
check if all ci workflows pass

Test Plan: Imported from OSS

Reviewed By: walterddr

Differential Revision: D30193140

Pulled By: tktrungna

fbshipit-source-id: d8e54c403f42abbbbe4556abf40c22a7955df737
2021-08-10 16:29:48 -07:00
2f5ac9c0ba update test distributed (#62796)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62796

Fixes #62380

* update test functions to use the wheel install folder ({sitepackages}/torch) instead of the build/ folder
* add symbolic links for the shared libraries that the tests call (this is a bit hacky and should instead be fixed by setting the rpath before compiling -- similar to https://github.com/pytorch/pytorch/blob/master/.jenkins/pytorch/test.sh#L204-L208).

### Test plan
check if all ci workflows pass

Test Plan: Imported from OSS

Reviewed By: driazati

Differential Revision: D30193142

Pulled By: tktrungna

fbshipit-source-id: 1247f9eda1c11c763c31c7383c77545b1ead1a60
2021-08-10 16:29:47 -07:00
dfe8445cd7 update test_vulkan (#62795)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62795

Test Plan: Imported from OSS

Reviewed By: driazati

Differential Revision: D30124421

Pulled By: tktrungna

fbshipit-source-id: 235ba166b02f7334e89cb2493024067851bf5b9b
2021-08-10 16:29:45 -07:00
25c3b9dc10 update test_rpc (#62781)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62781

Test Plan: Imported from OSS

Reviewed By: walterddr, zhouzhuojie

Differential Revision: D30124391

Pulled By: tktrungna

fbshipit-source-id: 99c275d6c9f23b4f274fd0ca19a16879ed27afd5
2021-08-10 16:28:35 -07:00
f807229fd4 [ONNX] add support for prim::Uninitialized in lower_tuples pass (#56912)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/56911

Code from issue generates this Torchscript:
```
graph(%self : __torch__.MyModule,
      %t.1 : Tensor):
  %12 : None = prim::Constant()
  %7 : str = prim::Constant[value="Negative input"]() # /mnt/nvdl/usr/msladek/notes/python_code/unitialized.py:11:28
  %3 : int = prim::Constant[value=0]() # /mnt/nvdl/usr/msladek/notes/python_code/unitialized.py:10:15
  %9 : int = prim::Constant[value=5]() # /mnt/nvdl/usr/msladek/notes/python_code/unitialized.py:13:31
  %33 : (Tensor, Tensor) = prim::Uninitialized()
  %4 : Tensor = aten::lt(%t.1, %3) # /mnt/nvdl/usr/msladek/notes/python_code/unitialized.py:10:11
  %6 : bool = aten::Bool(%4) # /mnt/nvdl/usr/msladek/notes/python_code/unitialized.py:10:11
  %34 : (Tensor, Tensor) = prim::If(%6) # /mnt/nvdl/usr/msladek/notes/python_code/unitialized.py:10:8
    block0():
       = prim::RaiseException(%7) # /mnt/nvdl/usr/msladek/notes/python_code/unitialized.py:11:12
      -> (%33)
    block1():
      %11 : int[] = prim::ListConstruct(%9)
      %16 : Tensor = aten::zeros(%11, %12, %12, %12, %12) # /mnt/nvdl/usr/msladek/notes/python_code/unitialized.py:13:19
      %18 : int[] = prim::ListConstruct(%9)
      %23 : Tensor = aten::zeros(%18, %12, %12, %12, %12) # /mnt/nvdl/usr/msladek/notes/python_code/unitialized.py:13:35
      %24 : (Tensor, Tensor) = prim::TupleConstruct(%16, %23)
      -> (%24)
  return (%34)
```

The problem is that the ONNX exporter's lower_tuples pass doesn't support forwarding tuples through prim::Uninitialized.
The solution is:
1. add prim::Uninitialized to supported_op in the lower_tuples pass
2. since prim::Uninitialized now has multiple outputs, call giveFreshAlias for every output

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56912

Reviewed By: nikithamalgifb

Differential Revision: D29837200

Pulled By: SplitInfinity

fbshipit-source-id: 321fae6fe52b1523df5653dbb9ea73b998ef1cda
2021-08-10 16:21:16 -07:00
4d0497034c Remove process_group_agent and faulty_process_group_agent files (#62985)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62985

Remove the process_group_agent and faulty_process_group_agent code now that PROCESS_GROUP backend has been deprecated for RPC (https://github.com/pytorch/pytorch/issues/55615). Discussed with xush6528 that it was okay to remove ProcessGroupAgentTest and ProcessGroupAgentBench which depended on process_group_agent.

Test Plan: CI tests

Reviewed By: pritamdamania87

Differential Revision: D30195576

fbshipit-source-id: 8b4381cffadb868b19d481198015d0a67b205811
2021-08-10 15:57:39 -07:00
790553811c fix sort and topk with discontiguous out (#63029)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/62645 and https://github.com/pytorch/pytorch/issues/62940. The root cause of those bugs is a bad interaction between `collapseDims` and setting the size of the sorting/topK dimension to 1. If all other dimensions happen to be 1, `collapseDims` thinks that the `1` dimension is collapsible (even though it was specifically marked to be preserved) and loses its stride information. If the dimension were really of size 1, the stride information would be unimportant, but since in reality that dimension is not 1 and was only set to 1 for convenience, the loss of stride information results in incorrect outputs.
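A repro sketch in the spirit of the linked issues (the shapes and strided `out` buffers are illustrative):

```python
import torch

x = torch.randn(1, 5, device="cuda")
# every other column of wider buffers -> discontiguous `out` tensors
values = torch.empty(1, 10, device="cuda")[:, ::2]
indices = torch.empty(1, 10, device="cuda", dtype=torch.long)[:, ::2]
torch.sort(x, dim=-1, out=(values, indices))
# before this fix, the strided outputs could contain incorrect values
assert torch.equal(values, x.sort(dim=-1).values)
```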

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63029

Reviewed By: heitorschueroff

Differential Revision: D30224925

Pulled By: ngimel

fbshipit-source-id: 269dd375c5cd57c6007fe91f729f8c60a2e7a264
2021-08-10 15:45:28 -07:00
500b24e303 [iOS] enable Metal in the nightly build (#62855)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62855

Test Plan: Test on Private Pod with the [HelloWorld](https://fburl.com/3hiwkkhm) demo

Reviewed By: xta0

Differential Revision: D30174151

Pulled By: hanton

fbshipit-source-id: 22cd8663ac239811bf8ed1c3b6301460d798dbfa
2021-08-10 15:18:58 -07:00
3beb65d45d test_cudnn_convolution_relu skipCUDAIfRocm
Summary: skip rocm test for test_cudnn_convolution_relu

Test Plan: This skips a test

Reviewed By: ngimel

Differential Revision: D30233620

fbshipit-source-id: 31eab8b03c3f15674e0d262a8f55965c1aa6b809
2021-08-10 15:15:23 -07:00
557047eb4c Add docstring for saved tensors default hooks (#62361)
Summary:
Add documentation for the saved tensors default hooks introduced in https://github.com/pytorch/pytorch/issues/61834 / https://github.com/pytorch/pytorch/issues/62563

Sister PR: https://github.com/pytorch/pytorch/issues/62362 (will add a link from autograd.rst to notes/autograd in whatever PR does not land first)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62361

Reviewed By: zou3519

Differential Revision: D30081997

Pulled By: Varal7

fbshipit-source-id: cb923e943e1d96db9669c1d863d693af30910c62
2021-08-10 14:59:38 -07:00
dbb7be2e79 [iOS][CI] Store every version of nightlies in S3 (#63039)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63039

Test Plan: Imported from OSS

Reviewed By: hanton

Differential Revision: D30229385

Pulled By: xta0

fbshipit-source-id: 15b438a6326159258803ab97e67dc9ec5db50d59
2021-08-10 14:33:36 -07:00
990c2190d1 [quant][graphmode] Reference pattern support for elu (#62607)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62607

Removing the quantize handler for elu since it can be covered by DefaultNodeQuantizeHandler

Test Plan:
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: iramazanli

Differential Revision: D30053977

fbshipit-source-id: 426789443e928bb01a88907de616cbda5866f621
2021-08-10 14:00:39 -07:00
f836c4f8bd [fix] TestMultiThreadAutograd: propagate exception from child thread to main thread (#63018)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/62895

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63018

Reviewed By: anjali411

Differential Revision: D30225856

Pulled By: Varal7

fbshipit-source-id: b5dd7999de5060e06f8958ea3ce49e0b74110971
2021-08-10 13:56:49 -07:00
bfa67264d1 [1/N] Nnapi backend execute and compile (#62272)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62272

Added Android NNAPI delegate implementation of runtime initialization (compilation) and execution.
The delegate's preprocess step was [previously implemented](https://github.com/pytorch/pytorch/pull/62225). Now the rest of the delegate, which implements client-side execution, is added.

**nnapi_backend_lib.cpp**:
Implementation of delegate's compile and execute.
`execute()` is essentially a C++ implementation of [`NnapiModule`](https://github.com/pytorch/pytorch/blob/master/torch/backends/_nnapi/prepare.py), which wraps an NNAPI Compilation and handles preparation of weights, inputs, and outputs.
- Any steps that can be done before execution are moved to `compile()`.
    - `init()` cannot be moved to `compile()` because it requires real inputs for dynamic shaping.
    - `shape_compute_module` cannot currently be deserialized in `compile()`, since mobile::Module has no IValue conversion.
- Processed arguments that are modified by `init()` must be kept as member variables. Any other processed arguments are passed through a dictionary, `handles`.

**nnapi_bind.cpp & nnapi_bind.h**:
Created a header file for `nnapi_bind.cpp`, so that its NnapiCompilation class can be used by `nnapi_backend_lib.cpp`.
**test_backend_nnapi.py**:
Enabled execution testing.
ghstack-source-id: 135432844

Test Plan:
Imported from OSS

Tested on devserver.
1. Load and unpack a special devserver build of NNAPI: `jf download GICWmAAzUR0eo20TAPasVts8ObhobsIXAAAz --file "nnapi-host-linux.tar.xz"`
2. `export LIBNEURALNETWORKS_PATH=/path/to/libneuralnetworks.so`
3. Run unittests: `python test/test_jit.py TestNnapiBackend` and `python test/test_nnapi.py`

TODO: test with lite interpreter runtime

Reviewed By: raziel, iseeyuan

Differential Revision: D29944873

fbshipit-source-id: 48967d873e79ef2cce9bcba2aeea3c52f7a18c07
2021-08-10 13:37:39 -07:00
fc0b8e6033 Add BFloat16 support for unique and unique_consecutive on CPU (#62559)
Summary:
Add BFloat16 support for unique and unique_consecutive on CPU.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62559

Reviewed By: anjali411

Differential Revision: D30199482

Pulled By: ngimel

fbshipit-source-id: 6f2d9cc1a528bea7c723139a4f1b14e4b2213601
2021-08-10 13:22:54 -07:00
cb7f35d47a [quant][refactor] Checking activation_dtype instead of activation_post_process (#62489)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62489

Addressing comment from previous PR: https://github.com/pytorch/pytorch/pull/62374#discussion_r679354145

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: iramazanli

Differential Revision: D30053980

fbshipit-source-id: 79c216410282eccd6f0a8f24e38c55c4d18ec0d0
2021-08-10 12:17:36 -07:00
6d21e36f21 LU solve uses cuBLAS and cuSOLVER for matrices with dim > 1024 (#61815)
Summary:
This PR builds off of https://github.com/pytorch/pytorch/issues/59148 and modifies the `lu_solve` routine to avoid MAGMA for `b` or `lu_data` matrices with any dimension > 1024, since MAGMA has a bug when dealing with such matrices (https://bitbucket.org/icl/magma/issues/19/dgesv_batched-dgetrs_batched-fails-for).
Fixes https://github.com/pytorch/pytorch/issues/36921
Fixes https://github.com/pytorch/pytorch/issues/61929
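A sketch of the now-working path (sizes chosen to exceed the 1024 threshold; the residual check is illustrative):

```python
import torch

A = torch.randn(2048, 2048, device="cuda", dtype=torch.float64)
b = torch.randn(2048, 3, device="cuda", dtype=torch.float64)
LU, pivots = torch.lu(A)           # LU factorization
x = torch.lu_solve(b, LU, pivots)  # dims > 1024 now avoid the MAGMA bug
print((A @ x - b).abs().max())     # small residual instead of garbage
```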

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61815

Reviewed By: anjali411

Differential Revision: D30199618

Pulled By: ngimel

fbshipit-source-id: 06870793f697e9c35aaaa8254b8a8b1a38bd3aa9
2021-08-10 11:07:16 -07:00
0c39cea3d2 [sharded_tensor] add default fields to ShardedTensorMetadata (#62867)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62867

This add default fields for ShardedTensorMetadata, to allow easy construction and modification afterwards.
ghstack-source-id: 135284133

Test Plan: ShardedTensorMetadata validity should be guarded with `init_from_local_shards` API and its tests.

Reviewed By: pritamdamania87

Differential Revision: D30148481

fbshipit-source-id: 0d99f41f23dbeb4201a36109556ba23b9a6c6fb1
2021-08-10 11:00:01 -07:00
5fb79f61a8 [DDP] Dont set thread local state in reducer autograd hook. (#62996)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62996

No need to set this because the autograd engine already propagates TLS
states.
ghstack-source-id: 135438220

Test Plan: CI

Reviewed By: albanD

Differential Revision: D30202078

fbshipit-source-id: e5e917269a03afd7a6b8e61f28b45cdb71ac3e64
2021-08-10 10:50:16 -07:00
6915bc0781 [typing] suppress errors in fbcode/caffe2 - batch 2
Test Plan: Sandcastle

Differential Revision: D30222378

fbshipit-source-id: 6a0a5d210266f19de63273240a080365c9143eb0
2021-08-10 10:26:52 -07:00
ea808df25d Test shape analysis with opinfos (#59814)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59814

Using OpInfos to test shape analysis. By default, we just check that we don't give incorrect answers; if `assert_jit_shape_analysis` is true, we test that we correctly propagate the full shape. And it found a couple of bugs 😃

Test Plan: Imported from OSS

Reviewed By: Krovatkin

Differential Revision: D30200058

Pulled By: eellison

fbshipit-source-id: 6226be87f5390277cfa5a1fffaa1b072d4bc8803
2021-08-10 09:47:33 -07:00
7312bd953c add support for a few more opinfos in jit (#59812)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59812

This is sort of a half measure: we can successfully trace through OpInfos that are registered as lambdas, we just can't script them. This change tests whether the op is a lambda, in which case it bails... see the next PR to get resize_ to work; maybe this should be consolidated with that...

Test Plan: Imported from OSS

Reviewed By: pbelevich, zhxchen17

Differential Revision: D30200061

Pulled By: eellison

fbshipit-source-id: 7e3c9b0be746b16f0f57ece49f6fbe20bf6535ec
2021-08-10 09:47:32 -07:00
9cbdc90d73 Don't substitute in symbolic shapes to shape compute graph (#59811)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59811

We don't want to actually substitute in symbolic shapes, because it invalidates the partially evaluated graph for further use.

Test Plan: Imported from OSS

Reviewed By: pbelevich, zhxchen17

Differential Revision: D30200059

Pulled By: eellison

fbshipit-source-id: 267ed97d8421fe480dec494cdf0dec9cf9ed3ba2
2021-08-10 09:47:30 -07:00
7db0bcfb40 small cleanups (#59810)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59810

Rephrasings and cleanup of dead code

Test Plan: Imported from OSS

Reviewed By: pbelevich, zhxchen17

Differential Revision: D30200062

Pulled By: eellison

fbshipit-source-id: b03e5adb928aa46bee6685667cad43333b6e6016
2021-08-10 09:47:28 -07:00
9cd990de0d Only optimize after change (redo) (#59809)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59809

Somehow this didn't get landed previously due to a ghstack mixup

Test Plan: Imported from OSS

Reviewed By: pbelevich, zhxchen17

Differential Revision: D30200060

Pulled By: eellison

fbshipit-source-id: 47f256421a1fe1a005cd11fcc4d7f023b5990834
2021-08-10 09:46:13 -07:00
4c630773e8 [jit] warn if _check_overload_body fails to find source
Summary:
Under certain conditions (particularly if a module is frozen, like with
PyInstaller or torch::deploy), we will not have source code available for
functions. `import torch` should still work in this case, but this check is
currently causing it to raise an exception.

Since this is an initial check (if an overload is actually exercised there will
be hard failure), raise a warning and move on.

Test Plan: unit tests

Reviewed By: eellison

Differential Revision: D30214271

fbshipit-source-id: eb021503e416268e8585e0708d6271c1e7b91e95
2021-08-10 09:28:50 -07:00
aa89d5f7f6 [quant] Update get_default_qat_qconfig to return the fused observer+fake_quant module (#62702)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62702

Expose the qconfig to the user to speed up training by leveraging the fused module.
The module currently supports per-tensor/per-channel moving avg observer and fake-quantize.
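A minimal usage sketch, assuming the fused variant is selected through a `version` argument of `get_default_qat_qconfig` (the model below is a stand-in):

```python
import torch.nn as nn
from torch.quantization import get_default_qat_qconfig, prepare_qat

model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU()).train()
# assumption: version=1 selects the fused observer + fake-quant module
model.qconfig = get_default_qat_qconfig("fbgemm", version=1)
prepare_qat(model, inplace=True)
print(model)  # the conv's activation observer should be the fused module
```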

For details on perf benefits, refer to https://github.com/pytorch/pytorch/pull/61691

Test Plan: Imported from OSS

Reviewed By: raghuramank100

Differential Revision: D30093719

fbshipit-source-id: b78deb7810f5b597474b9b9a0395d361d04eb46a
2021-08-10 09:28:49 -07:00
08d1a12d69 [quant] add reduce_range option to FusedMovingAvgFakeQuantize module (#62863)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62863

To make this consistent with other observers, add a reduce_range option that can be used to update quant_min/quant_max

Test Plan:
python test/test_quantization.py test_fused_mod_reduce_range

Imported from OSS

Reviewed By: raghuramank100

Differential Revision: D30146602

fbshipit-source-id: a2015f095766f9c884611e9ab6942528bc9bc972
2021-08-10 09:27:01 -07:00
978490d7c7 Codegen: Fix operator::name on windows (#62278)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62278

In `Operators.h` we're using `str(BaseOperatorName)`, while in
`OperatorsEverything.cpp` we're using `str(OperatorName)`. e.g.
```
STATIC_CONSTEXPR_STR_INL_EXCEPT_WIN_CUDA(name, "aten::abs")
```
vs
```
STATIC_CONST_STR_OUT_OF_LINE_FOR_WIN_CUDA(abs_out, name, "aten::abs.out")
```

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D29962047

Pulled By: albanD

fbshipit-source-id: 5a05b898fc734a4751c2b0187e4eeea4efb0502b
2021-08-10 07:58:09 -07:00
cdf702b60c Reject kwonly arguments passed positionally in torch.ops (#62981)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62981

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
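
A sketch of the behavior change, using `aten::sum.dim_IntList` (whose `dtype` argument sits after `*` in the schema) as an example:

```python
import torch

t = torch.randn(2, 3)
# keyword-only arguments must now be passed by keyword through torch.ops
out = torch.ops.aten.sum(t, [0], False, dtype=torch.float64)  # OK
# torch.ops.aten.sum(t, [0], False, torch.float64)  # now raises instead
```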

Test Plan: Imported from OSS

Reviewed By: Chillee

Differential Revision: D30211030

Pulled By: ezyang

fbshipit-source-id: aae426592e92bf3a50076f470e153a4ae7d6f101
2021-08-10 07:16:00 -07:00
9e7b6bb69f Allow LocalResponseNorm to accept 0 dim batch sizes (#62801)
Summary:
This issue fixes a part of https://github.com/pytorch/pytorch/issues/12013, which is summarized concretely in  https://github.com/pytorch/pytorch/issues/38115.

This PR allows `LocalResponseNorm` to accept tensors with 0 dimensional batch size.
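A quick sketch of the newly accepted input (the shape is illustrative):

```python
import torch
import torch.nn as nn

m = nn.LocalResponseNorm(size=2)
x = torch.empty(0, 4, 8, 8)  # zero-sized batch dimension
print(m(x).shape)            # torch.Size([0, 4, 8, 8]) rather than an error
```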

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62801

Reviewed By: zou3519

Differential Revision: D30165282

Pulled By: jbschlosser

fbshipit-source-id: cce0b2d12dbf47dc8ed6247c267bf2f2305f858a
2021-08-10 06:54:52 -07:00
061062ae2a Update TensorPipe submodule
Test Plan: CI ran as part of https://github.com/pytorch/pytorch/pull/60938.

Reviewed By: beauby

Differential Revision: D30219343

fbshipit-source-id: 531338f912fee488d312d23da8bda63ceb862aa9
2021-08-10 05:46:12 -07:00
3df4870343 [Reland][DDP] Support not all outputs used in loss calculation (#61753)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61753

Reland of https://github.com/pytorch/pytorch/pull/57081.
Main difference is that the former diff moved `prepare_for_backward` check into `DDPSink` backward, but that resulted in issues due to potential autograd engine races. The original diff moved `prepare_for_backward` into `DDPSink` as part of a long-term plan to always call it within `DDPSink`.

In particular this doesn't work because `prepare_for_backward` sets `expect_autograd_hooks=true` which enables autograd hooks to fire, but there were several use cases internally where autograd hooks were called before DDPSink called `prepare_for_backward`, resulting in errors/regression.

We instead keep the call to `prepare_for_backward` in the forward pass, but still run outputs through `DDPSink` when find_unused_parameters=True. As a result, outputs that are not used when computing loss have `None` gradients, and we don't touch them if they are globally `None`. Note that the hooks still fire with an undefined gradient, which is how we avoid the Reducer erroring out with the message that some hooks did not fire.

Added the unittests that were part of the reverted diff.
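A sketch of the now-supported pattern, in the spirit of those unittests (a single-process gloo group is used purely for illustration; the grad-stays-None behavior follows the description above):

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

class Net(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.used = torch.nn.Linear(4, 4)
        self.skipped = torch.nn.Linear(4, 4)

    def forward(self, x):
        return self.used(x), self.skipped(x)

model = DDP(Net(), find_unused_parameters=True)
out, _ignored = model(torch.randn(2, 4))
out.sum().backward()                     # the loss ignores the second output
print(model.module.skipped.weight.grad)  # None, and no Reducer error
dist.destroy_process_group()
```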
ghstack-source-id: 135388925

Test Plan: CI

Reviewed By: zhaojuanmao

Differential Revision: D29726179

fbshipit-source-id: 54c8819e0aa72c61554104723a5b9c936501e719
2021-08-09 22:29:11 -07:00
5ed6e4429e To fix variance computation for complex Adam (#62946)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/59998

It has been discussed in the issue that the variance term of the Adam optimizer is currently not computed correctly for the complex domain. As stated in the Generalization to Complex numbers section of https://en.wikipedia.org/wiki/Variance, the variance of a complex random variable X is computed as E[(X - mu)(X - mu)*], where mu = E[X] and * stands for the conjugate.

However, the Adam implementation currently computes it via E[(X - mu)(X - mu)], which doesn't return the right variance value; in particular, it returns a complex number. Variance is defined to be a real number even when the underlying random variable is complex.

We fix this issue here, and test that the resulting variance is indeed a real number.
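A sketch of the corrected second-moment update (the variable names mirror the optimizer's conventions, but this is not the actual implementation):

```python
import torch

beta2 = 0.999
grad = torch.randn(3, dtype=torch.cfloat)
exp_avg_sq = torch.zeros(3)  # the running second moment stays real-valued

# E[(X - mu)(X - mu)*]: multiplying by the conjugate keeps the estimate real
exp_avg_sq = beta2 * exp_avg_sq + (1 - beta2) * (grad * grad.conj()).real
print(exp_avg_sq.dtype)  # torch.float32, not a complex dtype
```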

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62946

Reviewed By: albanD

Differential Revision: D30196038

Pulled By: iramazanli

fbshipit-source-id: ab0a6f31658aeb56bdcb211ff86eaa29f3f0d718
2021-08-09 17:54:43 -07:00
3c1d1170a4 [quant][graphmode][fx] Attach a weight qparam dict to linear and conv in reference quantized model (#62488)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62488

Instead of attaching weight observer/fake_quant to the float linear and conv, we can
compute the quantization parameters and attach that as a dictionary to these modules so
that we can reduce the model size and make the reference module clearer

TODO: the numerics for linear and conv in the reference quantized model are still not correct since
we did not quantize the weight; we may explore things like parameterization to implement this support

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D30053979

fbshipit-source-id: b5f8497cf6cf65eec924df2d8fb10a9e154b8cab
2021-08-09 16:55:14 -07:00
59ac451ba3 Simplify the logic of running ci workflow codegen (#62853)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62853

Wanted to simplify the logic in `__post_init__` and delegate the settings back to individual workflows. This gives us more flexibility in changing individual workflows, as well as reducing the complexity of understanding the mutation conditions.

Test Plan: Imported from OSS

Reviewed By: walterddr, seemethere

Differential Revision: D30149190

Pulled By: zhouzhuojie

fbshipit-source-id: 44df5b1e14184f3a81cb8004151525d0e0fb20d9
2021-08-09 16:47:46 -07:00
8720369a48 irange-ify 12b (#62484)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62484

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D30015528

fbshipit-source-id: c4e1a5425a73f100102a97dcec1579f1049c9c1d
2021-08-09 16:40:47 -07:00
93e0f3a330 Shard Operators.cpp (#62185)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62185

This file can take 5 minutes on its own to compile, and is the single limiting
factor for compile time of `libtorch_cpu` on a 32-core threadripper. Instead,
sharding into 5 files that take around 1 minute each cuts a full minute off the
overall build time.

This also factors out the `.findSchemaOrThrow(...).typed` step so the code can
be shared between `call` and `redispatch`.

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D29962049

Pulled By: albanD

fbshipit-source-id: be5df05fbea09ada0d825855f1618c25a11abbd8
2021-08-09 16:19:49 -07:00
4b9ca72c7c irange-ify 13d (#62477)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62477

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D30001499

fbshipit-source-id: 993eb2b39f332ff0ae6c663792bd04734cfc262b
2021-08-09 16:16:58 -07:00
d16587f84d Enable rebuilds for Ninja on Windows (#62948)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/59859.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62948

Reviewed By: seemethere, tktrungna

Differential Revision: D30192246

Pulled By: janeyx99

fbshipit-source-id: af25cc4bf0db67a1304d9971cfa0ff6831bb3b48
2021-08-09 16:15:45 -07:00
a82b9ef1ff BFP16 quantization/dequantization (#62974)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62974

Testing the functionality of the `tensor.to` approach.
Comparing the `tensor.to` and `torch.ops.fb.FloatToBfloat16Quantized` approaches and testing whether they match for 2D tensors.
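`torch.ops.fb.FloatToBfloat16Quantized` is an internal op, but the `tensor.to` side of the comparison can be sketched in OSS PyTorch:

```python
import torch

x = torch.randn(4, 8)
x_bf16 = x.to(torch.bfloat16)      # "quantize": 8 exponent / 7 mantissa bits
x_back = x_bf16.to(torch.float32)  # "dequantize"
print((x - x_back).abs().max())    # error bounded by bfloat16 rounding
```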

Test Plan: buck test //torchrec/fb/distributed/tests:test_quantized_comms

Reviewed By: wanchaol

Differential Revision: D30079121

fbshipit-source-id: 612e92baeb2245449637faa9bc31686353d67033
2021-08-09 15:47:07 -07:00
c4aeecac75 Migrate Embedding thrust sort to cub sort (#62495)
Summary:
This PR only migrates sort. Other thrust operations will be migrated in followup PRs

Benchmark `num_embeddings` pulled from https://github.com/huggingface/transformers/tree/master/examples by
```
grep -P 'vocab_size.*(=|:)\s*[0-9]+' -r transformers/examples/
grep -P 'hidden_size.*(=|:)\s*[0-9]+' -r transformers/examples/
```
to get `vocab_size = 119547, 50265, 32000, 8000, 3052` (similar size omitted) and `hidden_size = 512, 768`

Code:
```python
import torch
import itertools

num_embeddings = (119547, 50265, 32000, 8000, 3052)
num_tokens = (4096, 16384)
hidden_sizes = (512, 768)

for ne, nt, nh in itertools.product(num_embeddings, num_tokens, hidden_sizes):
    print(f"Embedding size: {ne}, Tokens: {nt}, Hidden size: {nh}")
    embedding = torch.nn.Embedding(ne, nh).cuda()
    input_ = torch.randint(ne, (nt,), device='cuda')
    out = embedding(input_)
    torch.cuda.synchronize()
    %timeit out.backward(out, retain_graph=True); torch.cuda.synchronize()
```

## On CUDA 11.3.1

Before:
```
Embedding size: 119547, Tokens: 4096, Hidden size: 512
1.43 ms ± 11.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 119547, Tokens: 4096, Hidden size: 768
2.07 ms ± 56.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Embedding size: 119547, Tokens: 16384, Hidden size: 512
1.61 ms ± 2.29 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 119547, Tokens: 16384, Hidden size: 768
2.32 ms ± 8.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Embedding size: 50265, Tokens: 4096, Hidden size: 512
738 µs ± 1.38 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 50265, Tokens: 4096, Hidden size: 768
1.02 ms ± 1.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 50265, Tokens: 16384, Hidden size: 512
913 µs ± 3.89 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 50265, Tokens: 16384, Hidden size: 768
1.27 ms ± 1.09 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 32000, Tokens: 4096, Hidden size: 512
559 µs ± 860 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 32000, Tokens: 4096, Hidden size: 768
743 µs ± 630 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 32000, Tokens: 16384, Hidden size: 512
713 µs ± 969 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 32000, Tokens: 16384, Hidden size: 768
977 µs ± 884 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 8000, Tokens: 4096, Hidden size: 512
301 µs ± 8.02 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 8000, Tokens: 4096, Hidden size: 768
383 µs ± 4.36 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 8000, Tokens: 16384, Hidden size: 512
409 µs ± 1.39 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 8000, Tokens: 16384, Hidden size: 768
515 µs ± 766 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 3052, Tokens: 4096, Hidden size: 512
215 µs ± 1.16 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 3052, Tokens: 4096, Hidden size: 768
250 µs ± 320 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 3052, Tokens: 16384, Hidden size: 512
271 µs ± 888 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 3052, Tokens: 16384, Hidden size: 768
325 µs ± 1.14 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```

After:
```
Embedding size: 119547, Tokens: 4096, Hidden size: 512
1.42 ms ± 1.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 119547, Tokens: 4096, Hidden size: 768
2.05 ms ± 9.93 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Embedding size: 119547, Tokens: 16384, Hidden size: 512
1.6 ms ± 3.19 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 119547, Tokens: 16384, Hidden size: 768
2.3 ms ± 3.67 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Embedding size: 50265, Tokens: 4096, Hidden size: 512
730 µs ± 811 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 50265, Tokens: 4096, Hidden size: 768
1.01 ms ± 2.71 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 50265, Tokens: 16384, Hidden size: 512
887 µs ± 1.08 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 50265, Tokens: 16384, Hidden size: 768
1.25 ms ± 2.74 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 32000, Tokens: 4096, Hidden size: 512
556 µs ± 1.86 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 32000, Tokens: 4096, Hidden size: 768
744 µs ± 4.44 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 32000, Tokens: 16384, Hidden size: 512
691 µs ± 570 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 32000, Tokens: 16384, Hidden size: 768
957 µs ± 2.02 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 8000, Tokens: 4096, Hidden size: 512
309 µs ± 2.84 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 8000, Tokens: 4096, Hidden size: 768
376 µs ± 2.18 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 8000, Tokens: 16384, Hidden size: 512
381 µs ± 1.49 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 8000, Tokens: 16384, Hidden size: 768
487 µs ± 2.42 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 3052, Tokens: 4096, Hidden size: 512
202 µs ± 383 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 3052, Tokens: 4096, Hidden size: 768
239 µs ± 1.05 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 3052, Tokens: 16384, Hidden size: 512
243 µs ± 1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 3052, Tokens: 16384, Hidden size: 768
340 µs ± 2.28 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```

## On CUDA 11.1

Before:
```
Embedding size: 119547, Tokens: 4096, Hidden size: 512
1.41 ms ± 14.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 119547, Tokens: 4096, Hidden size: 768
2.05 ms ± 7.61 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Embedding size: 119547, Tokens: 16384, Hidden size: 512
1.61 ms ± 1.95 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 119547, Tokens: 16384, Hidden size: 768
2.32 ms ± 2.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Embedding size: 50265, Tokens: 4096, Hidden size: 512
743 µs ± 1.03 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 50265, Tokens: 4096, Hidden size: 768
1.02 ms ± 2.16 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 50265, Tokens: 16384, Hidden size: 512
912 µs ± 5.91 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 50265, Tokens: 16384, Hidden size: 768
1.28 ms ± 6.17 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 32000, Tokens: 4096, Hidden size: 512
555 µs ± 2.61 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 32000, Tokens: 4096, Hidden size: 768
743 µs ± 655 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 32000, Tokens: 16384, Hidden size: 512
714 µs ± 1.89 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 32000, Tokens: 16384, Hidden size: 768
980 µs ± 1.52 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 8000, Tokens: 4096, Hidden size: 512
312 µs ± 396 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 8000, Tokens: 4096, Hidden size: 768
386 µs ± 2.32 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 8000, Tokens: 16384, Hidden size: 512
413 µs ± 3.19 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 8000, Tokens: 16384, Hidden size: 768
512 µs ± 1.03 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 3052, Tokens: 4096, Hidden size: 512
209 µs ± 585 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 3052, Tokens: 4096, Hidden size: 768
271 µs ± 776 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 3052, Tokens: 16384, Hidden size: 512
297 µs ± 1.11 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 3052, Tokens: 16384, Hidden size: 768
377 µs ± 3.87 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```

After:
```
Embedding size: 119547, Tokens: 4096, Hidden size: 512
1.46 ms ± 12 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 119547, Tokens: 4096, Hidden size: 768
2.09 ms ± 4.31 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Embedding size: 119547, Tokens: 16384, Hidden size: 512
1.64 ms ± 4.48 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 119547, Tokens: 16384, Hidden size: 768
2.35 ms ± 2.54 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Embedding size: 50265, Tokens: 4096, Hidden size: 512
782 µs ± 2.12 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 50265, Tokens: 4096, Hidden size: 768
1.06 ms ± 596 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 50265, Tokens: 16384, Hidden size: 512
945 µs ± 2.19 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 50265, Tokens: 16384, Hidden size: 768
1.31 ms ± 553 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 32000, Tokens: 4096, Hidden size: 512
603 µs ± 856 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 32000, Tokens: 4096, Hidden size: 768
789 µs ± 500 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 32000, Tokens: 16384, Hidden size: 512
752 µs ± 7.56 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 32000, Tokens: 16384, Hidden size: 768
1.01 ms ± 4.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 8000, Tokens: 4096, Hidden size: 512
323 µs ± 7.23 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 8000, Tokens: 4096, Hidden size: 768
398 µs ± 765 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 8000, Tokens: 16384, Hidden size: 512
412 µs ± 544 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 8000, Tokens: 16384, Hidden size: 768
519 µs ± 614 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 3052, Tokens: 4096, Hidden size: 512
229 µs ± 1.17 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 3052, Tokens: 4096, Hidden size: 768
263 µs ± 417 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 3052, Tokens: 16384, Hidden size: 512
274 µs ± 576 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 3052, Tokens: 16384, Hidden size: 768
354 µs ± 1.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62495

Reviewed By: gchanan

Differential Revision: D30176833

Pulled By: ngimel

fbshipit-source-id: 44148ebb53a0abfc1e5ab8b986865555bf326ad1
2021-08-09 15:31:55 -07:00
084e92bb76 Use output memory format based on input for cudnn_convolution_relu (#62482)
Summary:
Currently when cudnn_convolution_relu is passed a channels last Tensor it will return a contiguous Tensor. This PR changes this behavior and bases the output format on the input format.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62482

Reviewed By: ngimel

Differential Revision: D30049905

Pulled By: cpuhrsch

fbshipit-source-id: 98521d14ee03466e7128a1912b9f754ffe10b448
2021-08-09 15:31:53 -07:00
4fdb9579fa irange-ify 12 (#62120)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62120

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D29879713

fbshipit-source-id: 3084a5eacb722f7fb0a630d47bf694f4d6831136
2021-08-09 15:31:51 -07:00
da9958c899 irange-ify 1 (#62193)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62193

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D29879504

fbshipit-source-id: adc86adcd1e7dcdfa2d7adf4d576f081430d52ec
2021-08-09 15:30:43 -07:00
161fb31893 Fix render_test_results if condition on always() (#62997)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62997

Fixes #62979; changed the condition to listen on the previous
job's result being either 'success' or 'failure'.

Notice that 'skipped' will also skip this job, which is what
we want.

Test Plan: Imported from OSS

Reviewed By: driazati, seemethere

Differential Revision: D30202598

Pulled By: zhouzhuojie

fbshipit-source-id: f3c0f715c39a5c8119b528b66e45f594a54b49d1
2021-08-09 15:27:40 -07:00
39ec1da935 [reland] Gate DistributedOptimizers on RPC availability (#62937)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62937

Reland after a Windows + CUDA failure; fixed by running it on gloo on Windows even with CUDA.
ghstack-source-id: 135306176

Test Plan: ci

Reviewed By: mrshenli

Differential Revision: D30177734

fbshipit-source-id: 7625746984c8f858648c1b3632394b98bd4518d2
2021-08-09 14:41:06 -07:00
5b8389e536 irange-ify 8d (#62505)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62505

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D29971891

fbshipit-source-id: 7dcbe27221788695f320c7238f5fe81e32823802
2021-08-09 13:18:38 -07:00
6286d33878 [fx] store Tracer class on Graph and GraphModule for package deserialization (#62497)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62497

Previously named: add support for custom tracer in __reduce_package__

Stores a Tracer class on a Graph created by Tracer, and copies the Tracer class into the GraphModule's state so that when a GraphModule is packaged by torch package, it can be reconstructed with the same Tracer and GraphModule class name.

Reviewed By: suo

Differential Revision: D30019214

fbshipit-source-id: eca09424ad30feb93524d481268b066ea55b892a
2021-08-09 13:07:30 -07:00
f82d4b8957 Mark unused functions with C10_UNUSED (#62929)
Summary:
Which fixes number of warnings

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62929

Reviewed By: walterddr, albanD

Differential Revision: D30171953

Pulled By: malfet

fbshipit-source-id: f82475289ff4aebb0c97794114e94a24d00d2ff4
2021-08-09 13:00:33 -07:00
08f6bc1da6 Stop exporting symbols in anonymous namespaces (#62952)
Summary:
These cases were found by compiling with clang on Windows.
Without this change, those functions would still be exported, which wastes space in the symbol table.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62952

Reviewed By: gchanan

Differential Revision: D30191291

Pulled By: ezyang

fbshipit-source-id: 3319b0ec4f5fb02e0fe1b81dbbcedcf12a0c795e
2021-08-09 12:52:12 -07:00
3dcd785cac [Static Runtime] Add tests for all aten ops (#62347)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62347

This diff includes tests for all `aten` ops that did not already have test coverage.

Test Plan: `buck test //caffe2/benchmarks/static_runtime/static_runtime:static_runtime_cpptest`

Reviewed By: hlu1

Differential Revision: D29968280

fbshipit-source-id: 768655ca535f9e37422711673168dce193de45d2
2021-08-09 12:09:59 -07:00
a01f832329 handle get_attr opearations in typechecker (#62682)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62682

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D30107789

Pulled By: migeed-z

fbshipit-source-id: 0b21b2893e2dc7cfaf5b5f5990f662e051a981b4
2021-08-09 11:49:04 -07:00
3eeaffc7c5 Linker version script to hide LLVM symbols (#62906)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62906

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D30193893

Pulled By: bertmaher

fbshipit-source-id: 9b189bfd8d4c52e8dc4296a4bed517ff44994ba0
2021-08-09 11:26:02 -07:00
1b1f1e36b4 Add `allow_empty_param_list` to functional optimizers (#62522)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62522

Addresses https://github.com/pytorch/pytorch/issues/62481

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D30072074

Pulled By: andwgu

fbshipit-source-id: 1a5da21f9636b8d74a6b00c0f029427f0edff0e3
2021-08-09 11:18:56 -07:00
710c419f11 [Vulkan] Added Hardshrink op (#62870)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62870

Added Hardshrink operator for Vulkan
Added tests for Hardshrink op

Reference: [Hardshrink](https://pytorch.org/docs/stable/generated/torch.nn.Hardshrink.html#torch.nn.Hardshrink)
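
For reference, the operator's semantics on any backend (the values here are illustrative):

```python
import torch

m = torch.nn.Hardshrink(lambd=0.5)
x = torch.tensor([-1.0, -0.2, 0.0, 0.3, 2.0])
print(m(x))  # tensor([-1., 0., 0., 0., 2.]) -- entries with |x| <= 0.5 zeroed
```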

Test Plan: Imported from OSS

Reviewed By: SS-JIA

Differential Revision: D30174950

Pulled By: beback4u

fbshipit-source-id: 3e192390eb9f92abecae966e84bbfae356bfd7c8
2021-08-09 10:54:11 -07:00
922710f9b9 Change output node handling for typechecker to deal with tuples (#62582)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62582

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D30050004

Pulled By: migeed-z

fbshipit-source-id: 9b81b10d24e1e8165cdc18c820ea314349b463cb
2021-08-09 10:47:12 -07:00
e55f271859 __torch_dispatch__: Populate kwargs dictionary with keyword-only arguments (#62822)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62822

This is BC breaking for people who were using the old integration,
although only if you had been writing bindings for functions with
keyword-only arguments (that includes functorch).  Other than that,
the patch was pretty straightforward.
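
A minimal sketch of what lands in `kwargs` now (real wrapper subclasses typically unwrap with `tree_map`; the probe class below is only for illustration):

```python
import torch

class KwargsProbe(torch.Tensor):
    @staticmethod
    def __new__(cls, t):
        return torch.Tensor._make_subclass(cls, t)

    @classmethod
    def __torch_dispatch__(cls, func, types, args=(), kwargs=None):
        print(func, "kwargs:", kwargs)  # keyword-only args arrive here now
        # unwrap to plain tensors before redispatching to avoid recursion
        plain = [a.as_subclass(torch.Tensor) if isinstance(a, KwargsProbe)
                 else a for a in args]
        return func(*plain, **(kwargs or {}))

x = KwargsProbe(torch.randn(2, 3))
x.sum(dim=[0], dtype=torch.float64)  # dtype shows up in the kwargs dict
```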

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D30134552

Pulled By: ezyang

fbshipit-source-id: a47f536fb030994a07c9386069b8f800ac86d731
2021-08-09 10:02:54 -07:00
2b83007ae2 Modify GHA CI to use PYTORCH_IGNORE_DISABLED_ISSUES based on PR body (#62851)
Summary:
Another step forward in fixing https://github.com/pytorch/pytorch/issues/62359

Disclaimer: this only works with GHA for now, as circleci would require changes in probot.

The test plan can be seen in a previous description, where I modified the description to include linked issues. I've removed them now since the actual PR doesn't fix any of them.

It works! In the [periodic 11.3 test1](https://github.com/pytorch/pytorch/pull/62851/checks?check_run_id=3263109970), we get this in the logs and we see that PYTORCH_IGNORE_DISABLED_ISSUES is properly set:
```
  test_jit_cuda_extension (__main__.TestCppExtensionJIT) ... Using /var/lib/jenkins/.cache/torch_extensions/py36_cu113 as PyTorch extensions root...
Creating extension directory /var/lib/jenkins/.cache/torch_extensions/py36_cu113/torch_test_cuda_extension...
Detected CUDA files, patching ldflags
Emitting ninja build file /var/lib/jenkins/.cache/torch_extensions/py36_cu113/torch_test_cuda_extension/build.ninja...
Building extension module torch_test_cuda_extension...
Using envvar MAX_JOBS (30) as the number of workers...
[1/3] c++ -MMD -MF cuda_extension.o.d -DTORCH_EXTENSION_NAME=torch_test_cuda_extension -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /opt/conda/lib/python3.6/site-packages/torch/include -isystem /opt/conda/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/lib/python3.6/site-packages/torch/include/TH -isystem /opt/conda/lib/python3.6/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/conda/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=1 -fPIC -std=c++14 -c /var/lib/jenkins/workspace/test/cpp_extensions/cuda_extension.cpp -o cuda_extension.o
[2/3] /usr/local/cuda/bin/nvcc  -DTORCH_EXTENSION_NAME=torch_test_cuda_extension -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /opt/conda/lib/python3.6/site-packages/torch/include -isystem /opt/conda/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/lib/python3.6/site-packages/torch/include/TH -isystem /opt/conda/lib/python3.6/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/conda/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=1 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_52,code=compute_52 -gencode=arch=compute_52,code=sm_52 --compiler-options '-fPIC' -O2 -std=c++14 -c /var/lib/jenkins/workspace/test/cpp_extensions/cuda_extension.cu -o cuda_extension.cuda.o
nvcc warning : The 'compute_35', 'compute_37', 'compute_50', 'sm_35', 'sm_37' and 'sm_50' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[3/3] c++ cuda_extension.o cuda_extension.cuda.o -shared -L/opt/conda/lib/python3.6/site-packages/torch/lib -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda_cu -ltorch_cuda_cpp -ltorch -ltorch_python -L/usr/local/cuda/lib64 -lcudart -o torch_test_cuda_extension.so
Loading extension module torch_test_cuda_extension...
ok (26.161s)
```

whereas on the latest master periodic 11.1 windows [test](https://github.com/pytorch/pytorch/runs/3263762478?check_suite_focus=true), we see
```
test_jit_cuda_extension (__main__.TestCppExtensionJIT) ... skip (0.000s)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62851

Reviewed By: walterddr, tktrungna

Differential Revision: D30192029

Pulled By: janeyx99

fbshipit-source-id: fd2ecc59d2b2bb5c31522a630dd805070d59f584
2021-08-09 09:48:56 -07:00
8b54b14f92 [Static Runtime] Added a cache for NNC generated code across different calls to the same ops (#62921)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62921

Added a cache for NNC generated code across different calls to the same ops.

Before this diff:
```
ProcessedNode time 13402.9 ms
Static Module initialization took 30964.8 ms
```

After this diff:
```
ProcessedNode time 85.4195 ms
Static Module initialization took 4348.42 ms
```

There is one global cache for all the ops. It is guarded with a reader-writer lock. This is necessary because we could have multiple threads loading different models in parallel. Note that this locking does not guarantee that code will be generated exactly once for each op: more than one thread may generate code for the same op simultaneously, and all of them will update the cache in some order. But that should be a small number, bounded by the number of threads. There is also no correctness issue, since the generated code is always the same; the version generated by the last thread is retained in the cache and reused later while running the model.
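The actual cache is C++ guarded by a reader-writer lock; a Python analogue of the same "racy read, last writer wins" scheme described above:

```python
import threading

_kernel_cache = {}
_cache_lock = threading.Lock()  # stand-in; Python has no stdlib RW lock

def lookup_or_generate(op_key, generate):
    # fast path: a lock-free read is safe because entries never change
    kernel = _kernel_cache.get(op_key)
    if kernel is None:
        kernel = generate(op_key)  # several threads may generate concurrently
        with _cache_lock:
            _kernel_cache[op_key] = kernel  # last writer wins; all identical
    return kernel
```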

Test Plan: Tested inline_cvr model

Reviewed By: hlu1

Differential Revision: D30104017

fbshipit-source-id: 32e9af43d7e724ed54b661dfe58a73a14e443ff7
2021-08-09 09:30:07 -07:00
3782f3eced Enable upper for torch.linalg.cholesky (#62434)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/61988
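A quick sketch of the new keyword (the SPD construction is illustrative):

```python
import torch

a = torch.randn(3, 3, dtype=torch.float64)
a = a @ a.T + 3 * torch.eye(3, dtype=torch.float64)  # symmetric positive-definite
l = torch.linalg.cholesky(a)              # default: lower-triangular factor
u = torch.linalg.cholesky(a, upper=True)  # upper-triangular, enabled here
assert torch.allclose(l @ l.T, u.T @ u)
```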

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62434

Reviewed By: seemethere, tktrungna

Differential Revision: D30079806

Pulled By: walterddr

fbshipit-source-id: 044efb96525155c9bc7953ac4ad47c1b7c12fb20
2021-08-09 09:28:33 -07:00
e54ee9bac1 [nnc] Updated IR cloning to create clones of expressions in addition to statements (#62833)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62833

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D30135980

Pulled By: navahgar

fbshipit-source-id: e557eedec7ecf596a4045756276d25a485fa66fb
2021-08-09 09:13:03 -07:00
5deeaab36a minor fixes in c10d for Windows (#62953)
Summary:
Found out by triggering builds against clang on Windows.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62953

Reviewed By: gchanan

Differential Revision: D30191300

Pulled By: ezyang

fbshipit-source-id: d929119768298084c41d70dbc3a78aacd64fb715
2021-08-09 09:05:09 -07:00
fff83f3f66 Add handling of list write to remove mutation (#62904)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62904

Test Plan: Imported from OSS

Reviewed By: SplitInfinity

Differential Revision: D30168493

Pulled By: eellison

fbshipit-source-id: 3b25982b235938cc7439dd3a5236dfce68254c05
2021-08-09 08:56:06 -07:00
254148ec7d Add tensor-scalar op (#62903)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62903

Test Plan: Imported from OSS

Reviewed By: pbelevich, SplitInfinity

Differential Revision: D30168338

Pulled By: eellison

fbshipit-source-id: 7dcb34ddd76c6aad4108a4073d3c8a93d974d0ef
2021-08-09 08:54:47 -07:00
4c4c5b14e4 Port sum.dim_IntList kernel to structured kernels. (#61642)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61642

Tracking issue: #55070

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D29783865

Pulled By: ezyang

fbshipit-source-id: 375d4cd5f915812108367601a610a428762e606d
2021-08-09 08:46:16 -07:00
c7db642a72 Adding collective quantization API (#62142)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62142

Created a wrapper that takes the collective op and a quantization type as arguments. It quantizes the input, performs the collective op, and then dequantizes the result.

Test Plan:
Tested through distributed_gloo_fork.
e.g., buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_all_to_all_quantized

Reviewed By: wanchaol

Differential Revision: D29682812

fbshipit-source-id: 79c39105ff11270008caa9f566361452fe82a92e
2021-08-09 08:11:22 -07:00
6ccedc7c1f Set mkl thread locally (#62891)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62891

Fixes #60469

We want to land this PR before the next release, so we are adopting the idea from raven38 in https://github.com/pytorch/pytorch/pull/60471 and adding a corresponding test to verify the result.

- Before this PR using this test:
![image](https://user-images.githubusercontent.com/68879799/128542334-1b899be5-2b6e-4c03-8ac0-568fb15470b8.png)
- After this PR the test passed without Error.

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D30161483

Pulled By: ejguan

fbshipit-source-id: 800f7204e0e1a19c492b2e556c92a91115f1b69b
2021-08-09 07:37:18 -07:00
30214aef2d [BE] irangefy (#62928)
Summary:
Replace raw for loops with `irange`-based loops. Also fix some unused-variable warnings in range-loop cases.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62928

Reviewed By: driazati

Differential Revision: D30171904

Pulled By: malfet

fbshipit-source-id: 1b437a0f7e3515f4a2e324f3450e93312f1933ae
2021-08-07 13:34:13 -07:00
9f7aba737b Make IMethod cache mutable so getArgument works on const IMethod (#62834)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62834

Test Plan: existing unit tests

Reviewed By: alanwaketan

Differential Revision: D30135939

fbshipit-source-id: e19c0ac1af6996e065a18318351265b5c4a01e70
2021-08-06 22:58:21 -07:00
b80dffd911 [TensorExpr] Remove more 'const' from IRVisitor methods for *Imm types. (#62932)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62932

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D30172961

Pulled By: ZolotukhinM

fbshipit-source-id: 9b7f45880d356f823364135fe29fc08f6565f827
2021-08-06 22:44:09 -07:00
b45cf9b81b Revert D30117838: [WIP] Gate DistributedOptimizers on RPC availability
Test Plan: revert-hammer

Differential Revision:
D30117838 (3f09485d7e)

Original commit changeset: e6365a910a3d

fbshipit-source-id: f276b2b2bdf5f7bd27df473fca0eebaee9f7aef2
2021-08-06 22:10:41 -07:00
e6a3154519 Allow broadcasting along non-reduction dimension for cosine similarity (#62912)
Summary:
Checks introduced by https://github.com/pytorch/pytorch/issues/58559 are too strict and disable correctly working cases that people were relying on.
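One of the re-enabled cases, sketched (the shapes are illustrative):

```python
import torch
import torch.nn.functional as F

a = torch.randn(2, 3, 5)
b = torch.randn(1, 3, 5)  # broadcasts along dim 0, a non-reduction dim
print(F.cosine_similarity(a, b, dim=1).shape)  # torch.Size([2, 5])
```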

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62912

Reviewed By: jbschlosser

Differential Revision: D30165827

Pulled By: ngimel

fbshipit-source-id: f9229a9fc70142fe08a42fbf2d18dae12f679646
2021-08-06 19:17:04 -07:00
6630d98ae5 Refactor codegen file sharding (#62184)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62184

File sharding is currently implemented twice, once for VariableType and once for
TraceType. This refactors the implementation into `FileManager` and also changes
it so template substitution is only done once and shared between the sharded
file and the "Everything" file.

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D29962050

Pulled By: albanD

fbshipit-source-id: 7858c3ca9f6e674ad036febd2d1a4ed2323a2861
2021-08-06 19:13:42 -07:00
44fad84bca [DDP] Add host-side time to CUDATimer (#62770)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62770

Adding timing of forward, backward comp, backward comm, etc will help
detect desynchronization issues.
ghstack-source-id: 135195680

Test Plan: CI

Reviewed By: SciPioneer

Differential Revision: D30115585

fbshipit-source-id: 509bf341c5c92dcc63bdacd3c1e414da4eb4f321
2021-08-06 18:41:40 -07:00
22e3cc21e5 Back out "Enable test_api IMethodTest in OSS" (#62893)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62893

Original commit changeset: 50eb3689cf84

Test Plan: Confirm pytorch_linux_xenial_cuda11_1_cudnn8_py3_gcc7_test2 passes in OSS

Reviewed By: seemethere, alanwaketan

Differential Revision: D30159999

fbshipit-source-id: 74ff8975328409a3dc8222d3e2707a1bb0ab930c
2021-08-06 16:43:50 -07:00
bbe2c8e6d2 Fix reshape for the Lazy key (#62846)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62846

Test Plan: CI

Reviewed By: zou3519

Differential Revision: D30162185

Pulled By: asuhan

fbshipit-source-id: d582dcef35ce7e8bebf161a5c93e470339891e29
2021-08-06 15:29:56 -07:00
6e24ce7a46 Revert D30138788: [pytorch][PR] OpInfo for adaptive_avg_pool2d
Test Plan: revert-hammer

Differential Revision:
D30138788 (5c431981b5)

Original commit changeset: 66735ceaa85b

fbshipit-source-id: 75eb241ef82d32d6480db069c035df0abc6753fe
2021-08-06 15:17:05 -07:00
d9154b9b26 [quant] Input-Weight Equalization - allow logical evaluation (#61603)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61603

Test Plan: Imported from OSS

Reviewed By: supriyar

Differential Revision: D29686878

fbshipit-source-id: 67ca4cab98b3d592ff2bb8db86499789b85bd582
2021-08-06 15:10:32 -07:00
43b087791c .github: Make sure to deep clone on windows (#62907)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62907

Deep clones allow us to use git commands on historical commits so that
we can do things like collect test times correctly

Should fix empty `.pytorch-test-times.json` files that walterddr was observing

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: janeyx99

Differential Revision: D30166414

Pulled By: seemethere

fbshipit-source-id: 1f9904eeb5a8ebaf0a02d1aa7291fffe1aecd57b
2021-08-06 15:06:56 -07:00
e3944ab00e Revert D30038175: Improve IMethod::getArgumentNames to deal with empty argument names list
Test Plan: revert-hammer

Differential Revision:
D30038175 (64b3ab6407)

Original commit changeset: 46f08dda9418

fbshipit-source-id: 604735d2300487a0b75890b330d7ba5b3e7145b2
2021-08-06 14:58:43 -07:00
7a3f1386ae Add GradBucket::parameters() to ddp_comm_hooks.rst (#62877)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62877

as title
ghstack-source-id: 135214612

Test Plan: N/A

Reviewed By: rohan-varma

Differential Revision: D30153490

fbshipit-source-id: d4cec434a53ef6e65b60c065804884d1a114aa0d
2021-08-06 14:50:47 -07:00
6d24a075cb Check contiguous to dispatch to NHWC cuda template (#62839)
Summary:
follow up of https://github.com/pytorch/pytorch/issues/62773

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62839

Reviewed By: H-Huang

Differential Revision: D30142906

Pulled By: ngimel

fbshipit-source-id: 600a7ad240a4a1827352eab8c8cbc98240d693f0
2021-08-06 14:11:10 -07:00
e6e579ce74 [FX] Add torch.memory_format as a BaseArgumentType (#62593)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/62498

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62593

Reviewed By: H-Huang

Differential Revision: D30104091

Pulled By: cpuhrsch

fbshipit-source-id: 25b7a4b308219860c969db54d7b1867b1aa4180a
2021-08-06 14:03:41 -07:00
97dc43beeb use test environment for test phase (#62824)
Summary:
Currently all tests generated in the test matrix share the same `BUILD_ENVIRONMENT` variable. We should distinguish them because some test scripts use `BUILD_ENVIRONMENT` to differentiate what to run.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62824

Reviewed By: zhouzhuojie

Differential Revision: D30162250

Pulled By: walterddr

fbshipit-source-id: 3a99a21e91e02ed8638feed102e7966af01dd175
2021-08-06 11:52:41 -07:00
786934902c Adds JOB_BASE_NAME to steps of CircleCI mac workflows (#62892)
Summary:
Upon noticing that we had a job entry named "None" in our S3 stats, I set out to find which test report had a JOB_BASE_NAME that wasn't set.

It turns out that all workflows other than Windows and Linux did not set JOB_BASE_NAME and instead used CIRCLE_JOB. This remedies the current issue by explicitly setting JOB_BASE_NAME in Mac workflows, but doesn't touch anything else, as the other jobs (like android) do not report test stats.

This also adds back the CIRCLE_JOB dependency in print_test_stats to be backwards compatible, but the goal is to move off the CIRCLE_JOB dependency to more CI-platform-agnostic variable naming.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62892

Test Plan:
Imported from GitHub, without a `Test Plan:` line.

{F639556801}
The "None" entry is now identified as the macOS job!

Reviewed By: walterddr

Differential Revision: D30160234

Pulled By: janeyx99

fbshipit-source-id: df868dec5f9b289d3837e927d2bb95acb2d9185b
2021-08-06 11:34:17 -07:00
c9b5d79d40 [hotfix] fix BC checker direction (#62901)
Summary:
Fixes the https://github.com/pytorch/pytorch/issues/62687 error: the checker should allow-list entries whose datetime is newer than today.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62901

Reviewed By: zhouzhuojie

Differential Revision: D30163202

Pulled By: walterddr

fbshipit-source-id: b882975a231249137cb2d252f41e98e133b6f337
2021-08-06 11:29:28 -07:00
59d09b148c BUG Fixes bug in no_batch_dim tests (#62726)
Summary:
The way that Python captures variables for lambdas meant that only the last `input_fn`, etc. were captured. This PR makes sure each local variable is captured by its lambda; see the sketch below.

REF: https://docs.python.org/3/faq/programming.html#why-do-lambdas-defined-in-a-loop-with-different-values-all-return-the-same-result
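
For illustration, a minimal sketch of the pitfall and the fix, using a toy loop rather than the actual test harness:

```
callbacks = [lambda: i for i in range(3)]
print([f() for f in callbacks])  # [2, 2, 2] -- every lambda sees the final i

# Binding the loop variable as a default argument captures its value at
# each iteration, which is the kind of fix this PR applies:
callbacks = [lambda i=i: i for i in range(3)]
print([f() for f in callbacks])  # [0, 1, 2]
```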

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62726

Reviewed By: zou3519

Differential Revision: D30159478

Pulled By: jbschlosser

fbshipit-source-id: cfef3d9776d2676b2f5bb6d39d569b8ca07b0fe5
2021-08-06 11:11:25 -07:00
a03604c610 Set JOB_BASE_NAME consistently for bazel (#62886)
Summary:
It was manually set incorrectly before to pytorch-linux-xenial-py3.6-gcc7-bazel-test-test, which is inconsistent with the rest of our naming scheme.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62886

Reviewed By: driazati

Differential Revision: D30159860

Pulled By: janeyx99

fbshipit-source-id: 4984ec04ee2bcf68b9a57e241ca9f979bfe6398a
2021-08-06 11:07:03 -07:00
3f09485d7e [WIP] Gate DistributedOptimizers on RPC availability (#62774)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62774

Gates DistributedOptimizer, which relies on RRef, based on whether RPC is available. This should enable ZeRO to work on Windows, as Windows should not try to import the DistributedOptimizer. If this works as expected, we can enable the Windows tests for functional/local SGD optimizers as well.
ghstack-source-id: 135216642

Test Plan: CI

Reviewed By: pbelevich

Differential Revision: D30117838

fbshipit-source-id: e6365a910a3d1ca40d95fa6777a7019c561957db
2021-08-06 10:59:00 -07:00
1dba329d20 Enable step_param for Adam functional optimizer (#62611)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62611

Enables optimizer overlap with backwards in DDP for Adam. Additional optimizers, especially Adagrad will be done in follow up diffs.

1. Implement a `step_param` method based on `step` in `_FunctionalAdam` (perf permitting, we can later dedupe `step` to call `step_param`); see the sketch below.
2. Modify tests to test all current functional optimizers.
ghstack-source-id: 135207143
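
As a rough illustration, here is a minimal sketch of what a per-parameter Adam step looks like; the names and signature are hypothetical, not the actual `_FunctionalAdam` code:

```
import torch

def step_param(param, grad, exp_avg, exp_avg_sq, step,
               lr=1e-3, betas=(0.9, 0.999), eps=1e-8):
    # Update biased first and second moment estimates in place.
    beta1, beta2 = betas
    exp_avg.mul_(beta1).add_(grad, alpha=1 - beta1)
    exp_avg_sq.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)
    # Bias correction, then the parameter update.
    bias1 = 1 - beta1 ** step
    bias2 = 1 - beta2 ** step
    denom = (exp_avg_sq / bias2).sqrt_().add_(eps)
    param.addcdiv_(exp_avg, denom, value=-lr / bias1)
```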

Test Plan: CI

Reviewed By: SciPioneer

Differential Revision: D29891783

fbshipit-source-id: 321915982afd5cb0a9c2e43d27550f433bff00d1
2021-08-06 10:53:55 -07:00
836b2431dc [quant] Input-Weight Equalization - selective equalization (#61916)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61916

Functions used to run selective equalization based on the SQNR obtained from running the Numeric Suite. After running the Numeric Suite between the equalized and float model, we will get the SQNR between the two models and construct an equalization_qconfig_dict that specifies to only equalize the layers with the highest quantization errors.

How to run:
```
layer_to_sqnr_dict = get_layer_sqnr_dict(float_model, equalized_model, input)
eq_qconfig_dict = get_equalization_qconfig_dict(layer_to_sqnr_dict, equalized_model, num_layers_to_equalize)

prepared = prepare_fx(float_model, qconfig_dict, eq_qconfig_dict)
...
```

Test Plan:
`python test/test_quantization.py TestEqualizeFx.test_selective_equalization`

Imported from OSS

Reviewed By: supriyar

Differential Revision: D29796950

fbshipit-source-id: 91f0f8427d751beaea32d8ffc2f3b8aa8ef7ea95
2021-08-06 09:29:03 -07:00
e6ef87001c [BF16] Add BF16 support to _aminmax and _aminmax_all operators (#62767)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62767

Add BF16 support to _aminmax_all and _aminmax operators.

Test Plan:
Added unit test:
https://www.internalfb.com/intern/testinfra/testconsole/testrun/2533274857208373/

Reviewed By: anjali411

Differential Revision: D30073837

fbshipit-source-id: 9cb4991e644cfdb2f0674ccaff161d223c174150
2021-08-06 08:56:12 -07:00
56ff996386 [vulkan] Add _reshape_alias (#62858)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62858

D29792126 (adb73d3dcf) changed the behaviour of `reshape()` such that it calls `_reshape_alias()` instead of `view()` in order to avoid duplicating some work such as computing strides.

Vulkan has not yet implemented `_reshape_alias()` so `reshape()` would fail with

```
C++ exception with description "Could not run 'aten::_reshape_alias' with arguments from the 'Vulkan' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions.
```

For Vulkan there is no concept of strides so it's fine to just have `_reshape_alias()` point to `view()`.

Test Plan:
```
cd ~/fbsource
buck build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 //xplat/caffe2:pt_vulkan_api_test_binAndroid\#android-arm64 --show-output
adb push buck-out/gen/xplat/caffe2/pt_vulkan_api_test_binAndroid\#android-arm64 /data/local/tmp/vulkan_api_test
adb shell "/data/local/tmp/vulkan_api_test"
cd -
```

Reviewed By: kimishpatel

Differential Revision: D30054706

fbshipit-source-id: 770979fa3a0f99bcc2ddaefa4674e5bd79b17c03
2021-08-06 08:44:15 -07:00
5f4207eb91 [vulkan] Throw an exception if device does not support Vulkan (#62859)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62859

If the Vulkan instance cannot be initialized successfully (i.e. no `vkPhysicalDevice` could be found due to missing drivers) then Vulkan ops will not be able to execute. However, currently `api::context()` which is used to access the global Vulkan context simply returns a null pointer if there is a problem initializing the Vulkan instance.

This leads to Segmentation Faults later on because Vulkan ops assume that `api::context()` will not return a `nullptr`. For instance: [this line](https://www.internalfb.com/code/fbsource/xplat/caffe2/aten/src/ATen/native/vulkan/ops/Persistent.cpp?lines=14) will frequently cause a Segmentation Fault when drivers are not present.

Instead of having `api::context()` returning a nullptr when Vulkan cannot be initialized, it should just throw an exception since ops cannot be executed anyway. This results in a more graceful failure as these exceptions can be caught instead of crashing the app with a Seg Fault down the line.

Test Plan:
```
cd ~/fbsource
buck build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 //xplat/caffe2:pt_vulkan_api_test_binAndroid\#android-arm64 --show-output
adb push buck-out/gen/xplat/caffe2/pt_vulkan_api_test_binAndroid\#android-arm64 /data/local/tmp/vulkan_api_test
adb shell "/data/local/tmp/vulkan_api_test"
cd -
```

On an Omni model portal, I can also remove the vulkan drivers in order to test the functionality when Vulkan is not supported.

Reviewed By: kimishpatel

Differential Revision: D30139891

fbshipit-source-id: 47fcc8dcd219cb78ab9bec0b6a85b2aa7320ab50
2021-08-06 08:42:26 -07:00
d3bdf345cb Introducing DataChunk for DataPipes batching (#62768)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62768

This is part of TorchArrow DF support preparation, separating it to multiple PRs to simplify review process.

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D30149090

Pulled By: VitalyFedyunin

fbshipit-source-id: a36b5ff56e2ac6b06060014d4cd41b487754acb8
2021-08-06 08:38:33 -07:00
5e5de75f4d Add getPyInterpreter() API (#62659)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62659

It turns out that it is occasionally useful to be able to access the
PyInterpreter object from other Python bindings (see next diff in the
stack).  Make it publicly available.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D30074926

Pulled By: ezyang

fbshipit-source-id: 2f745ab7c7a672ed7215231fdf9eef6af9705511
2021-08-06 08:23:24 -07:00
27135f86fd fix docstring default value of last_epoch for SWALR in torch/optim/… (#62799)
Summary:
…swa_utils

Fixes https://github.com/pytorch/pytorch/issues/62633

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62799

Reviewed By: zou3519

Differential Revision: D30131929

Pulled By: H-Huang

fbshipit-source-id: 741c077073bbe398492dff0761836acdbba7be78
2021-08-06 08:15:10 -07:00
9573e7a644 rename namespace f4d to velox (#61)
Summary:
Pull Request resolved: https://github.com/facebookexternal/torchdata/pull/61

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62860

Pull Request resolved: https://github.com/facebookexternal/presto_cpp/pull/453

Moving all namespace definitions, declarations, and references from 'f4d' to 'velox'

Test Plan:
```
buck build //f4d/...
buck test //f4d/...
```
Also monitor the signals from sandcastle

Reviewed By: pedroerp

Differential Revision: D30140136

fbshipit-source-id: 5b53ac768bb7e5cd07c93a9b04dfd6363080eb52
2021-08-05 21:04:36 -07:00
e1f81c9321 [torchelastic][multiprocessing] Print warning message only when child processes are stuck (#62823)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62823

The diff makes sure that the warning message is printed only when the child processes are stuck after sending termination code.

Test Plan:
sandcastle

    buck build mode/dev-nosan //caffe2:run
    buck-out/gen/caffe2/run.par --nnodes 1 --nproc_per_node 1 main.py
P435691445

Differential Revision: D30046695

fbshipit-source-id: c59170b297f4a0e530906fa5069234303deee938
2021-08-05 19:57:31 -07:00
f6c7081a16 Allow FractionalMaxPool 2D and 3D layers to accept 0 dim batch size tensors. (#62083)
Summary:
This issue fixes a part of https://github.com/pytorch/pytorch/issues/12013, which is summarized concretely in  https://github.com/pytorch/pytorch/issues/38115.

Allow `FractionalMaxPool` 2D and 3D layers to accept 0 dim batch sizes. Also make some minor corrections to error messages to make them more informative.
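
A small sketch of the newly accepted case (the output shape shown assumes the zero-sized batch dimension simply propagates):

```
import torch

m = torch.nn.FractionalMaxPool2d(kernel_size=2, output_size=(4, 4))
x = torch.randn(0, 3, 16, 16)  # zero-sized batch dimension
print(m(x).shape)              # torch.Size([0, 3, 4, 4])
```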

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62083

Reviewed By: H-Huang

Differential Revision: D30134461

Pulled By: jbschlosser

fbshipit-source-id: 0ec50875d36c2083a7f06d9ca6a110fb3ec4f2e2
2021-08-05 17:40:10 -07:00
8aa12cbf86 Add tutorial link (#62785)
Summary:
Addresses: https://github.com/pytorch/pytorch/pull/62605#discussion_r681380364

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62785

Test Plan: I checked the render, and the link redirects as desired.

Reviewed By: mrshenli

Differential Revision: D30133229

Pulled By: andwgu

fbshipit-source-id: baefe0d1f1b78ece44bb42e67629bc130dbf8e9a
2021-08-05 17:28:02 -07:00
64c54f92ca [opinfo] nn.functional.unfold (#62705)
Summary:
Reference: facebookresearch/functorch#78

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62705

Reviewed By: H-Huang

Differential Revision: D30138807

Pulled By: zou3519

fbshipit-source-id: 1d0b0e58feb13aec7b231c9f632a6d1694b9d272
2021-08-05 17:12:25 -07:00
9ac56ef0fc [DDP] log gradient ready order and bucket indices (#62751)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62751

This will help us determine whether gradient ready order and bucket indices are aligned amongst all the ranks. This should always be true for rank 0, as we determine the rebuilt bucket order by the gradient ready order on rank 0, but we would be interested to see this on different workloads for other ranks.
ghstack-source-id: 135104369

Test Plan: CI

Reviewed By: SciPioneer, wanchaol

Differential Revision: D30111833

fbshipit-source-id: a0ab38413a45022d953da76384800bee53cbcf9f
2021-08-05 16:36:25 -07:00
80091cb0f7 [DDP] Allow tuning of first bucket (#62748)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62748

Previously, after buckets were rebuilt, the first bucket size always defaulted to 1MB; this diff allows the first bucket to be tuned like the rest of the bucket sizes.

Setting `dist._DEFAULT_FIRST_BUCKET_BYTES = 1` results in the following logs as
expected:
I0804 12:31:47.592272 246736 reducer.cpp:1694] 3 buckets rebuilt with size
limits: 1, 1048, 1048 bytes.
ghstack-source-id: 135074696
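
A hedged usage sketch (`_DEFAULT_FIRST_BUCKET_BYTES` is an internal knob, so treat this as illustrative; it also assumes a process group has already been initialized):

```
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

dist._DEFAULT_FIRST_BUCKET_BYTES = 1  # default is 1 MiB
model = DDP(torch.nn.Linear(8, 8))    # rebuilt buckets honor the override
```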

Test Plan: CI

Reviewed By: SciPioneer, wanchaol

Differential Revision: D30110041

fbshipit-source-id: 96f76bec012de129d1645e7f50e266d4b255ec66
2021-08-05 16:35:07 -07:00
5c431981b5 OpInfo for adaptive_avg_pool2d (#62704)
Summary:
Please see https://github.com/facebookresearch/functorch/issues/78 and https://github.com/pytorch/pytorch/issues/54261.

Note regarding sample inputs for this function:

* Checks added for all relevant/interesting cases for `output_size`: `(None, None), (None, width), (height, None), (height, width)`.

cc: mruberry zou3519 Chillee

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62704

Reviewed By: H-Huang

Differential Revision: D30138788

Pulled By: zou3519

fbshipit-source-id: 66735ceaa85b9e6050d4ec27749fc3a8108cf557
2021-08-05 16:11:31 -07:00
eaaceea8d4 Bump protobuf version in CircleCI docker images (#62441)
Summary:
Needed to update ONNX to 1.10 (https://github.com/pytorch/pytorch/issues/62039) because that introduces uses
of the "reserved" protobuf feature.

Also:
* Remove protobuf install code from scripts where it was unused.
* Add `-j` flag to make commands to speed things up.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62441

Reviewed By: soulitzer

Differential Revision: D30072381

Pulled By: malfet

fbshipit-source-id: f55a4597baf95e3ed8ed987d6874388cab3426b0
2021-08-05 15:46:12 -07:00
e62189ad69 [jit] Better checking for overload function declarations. (#59956)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59956

Issue #50175. Basically two things need to be checked and are lacking currently:
1. Overload declarations should always have a single `pass` statement as the body.
2. There should always be an implementation provided for decls which don't
   have the torch.jit._overload decorator. So in this case we need to check
   whether we are actually compiling a function body with a decorator ahead of it.

Test Plan:
python test/test_jit.py TestScript.test_function_overloads

Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D29106555

fbshipit-source-id: 2d9d7df2fb51ab6db0e1b726f9644e4cfbf733d6
2021-08-05 14:21:48 -07:00
63fa53d37a Add batched model to torchdeploy examples (#62836)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62836

Used for upcoming diff that adds support for batching to torchdeploy

Test Plan: Models are used by later diffs, but generation script is verified by CI now and locally.

Reviewed By: gunchu

Differential Revision: D30135938

fbshipit-source-id: 566a32a3ede56833e41712025e9d47191dfc5f39
2021-08-05 14:01:40 -07:00
c8eda919a4 test, fix sparse * dense exceptions and corner case (#61723)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/59916

This fixes two problems with sparse multiplication
- 0d-dense * sparse was creating a non-sparse output and failing.
- dense * sparse or sparse * dense is not supported, but would emit an unhelpful error message
<details>
<summary> unhelpful error message </summary>
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NotImplementedError: Could not run 'aten::_nnz' with arguments from the 'CPU' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::_nnz' is only available for these backends: [SparseCPU, SparseCUDA, SparseCsrCPU, SparseCsrCUDA, BackendSelect, Python, Named, Conjugate, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradXPU, AutogradMLC, AutogradHPU, AutogradNestedTensor, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, Tracer, UNKNOWN_TENSOR_TYPE_ID, Autocast, Batched, VmapMode].

SparseCPU: registered at aten/src/ATen/RegisterSparseCPU.cpp:961 [kernel]
SparseCUDA: registered at aten/src/ATen/RegisterSparseCUDA.cpp:1092 [kernel]
SparseCsrCPU: registered at aten/src/ATen/RegisterSparseCsrCPU.cpp:202 [kernel]
SparseCsrCUDA: registered at aten/src/ATen/RegisterSparseCsrCUDA.cpp:229 [kernel]
BackendSelect: fallthrough registered at ../aten/src/ATen/core/BackendSelectFallbackKernel.cpp:3 [backend fallback]
Python: registered at ../aten/src/ATen/core/PythonFallbackKernel.cpp:38 [backend fallback]
Named: registered at ../aten/src/ATen/core/NamedRegistrations.cpp:7 [backend fallback]
Conjugate: registered at ../aten/src/ATen/ConjugateFallback.cpp:118 [backend fallback]
ADInplaceOrView: fallthrough registered at ../aten/src/ATen/core/VariableFallbackKernel.cpp:60 [backend fallback]
AutogradOther: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:11202 [autograd kernel]
AutogradCPU: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:11202 [autograd kernel]
AutogradCUDA: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:11202 [autograd kernel]
AutogradXLA: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:11202 [autograd kernel]
AutogradXPU: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:11202 [autograd kernel]
AutogradMLC: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:11202 [autograd kernel]
AutogradHPU: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:11202 [autograd kernel]
AutogradNestedTensor: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:11202 [autograd kernel]
AutogradPrivateUse1: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:11202 [autograd kernel]
AutogradPrivateUse2: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:11202 [autograd kernel]
AutogradPrivateUse3: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:11202 [autograd kernel]
Tracer: registered at ../torch/csrc/autograd/generated/TraceType_2.cpp:10254 [kernel]
UNKNOWN_TENSOR_TYPE_ID: fallthrough registered at ../aten/src/ATen/autocast_mode.cpp:446 [backend fallback]
Autocast: fallthrough registered at ../aten/src/ATen/autocast_mode.cpp:285 [backend fallback]
Batched: registered at ../aten/src/ATen/BatchingRegistrations.cpp:1016 [backend fallback]
VmapMode: fallthrough registered at ../aten/src/ATen/VmapModeRegistrations.cpp:33 [backend fallback]
</details>

Also added tests.
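
A short repro sketch of the 0-dim corner case, with the post-fix behavior in the comments:

```
import torch

s = torch.tensor([[0., 1.], [2., 0.]]).to_sparse()
d = torch.tensor(3.)  # 0-dim dense tensor
out = s * d           # previously produced a non-sparse output and failed
print(out.is_sparse)  # True
```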

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61723

Reviewed By: ezyang

Differential Revision: D29962639

Pulled By: cpuhrsch

fbshipit-source-id: 5455680ddfa91d5cc9925174d0fd3107c40f5b06
2021-08-05 11:27:12 -07:00
8d7786ada6 Simplify hardswish ONNX export graph. (#60080)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/58301

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60080

Reviewed By: suo

Differential Revision: D30002939

Pulled By: SplitInfinity

fbshipit-source-id: 8b4ca6f62d51b72e9d86534592e3c82ed6608c9d
2021-08-05 11:15:14 -07:00
7630f407cc add OpInfo for torch.nn.functional.grid_sample (#62311)
Summary:
Addresses facebookresearch/functorch#78.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62311

Reviewed By: malfet

Differential Revision: D30013388

Pulled By: zou3519

fbshipit-source-id: 0887ae9935923d928bfeb59054afe1aab954b40b
2021-08-05 10:43:54 -07:00
5dbcd5638b OpInfo for nn.functional.avg_pool2d (#62455)
Summary:
Please see https://github.com/facebookresearch/functorch/issues/78

cc: mruberry zou3519

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62455

Reviewed By: soulitzer

Differential Revision: D30096146

Pulled By: heitorschueroff

fbshipit-source-id: ef09abee9baa5a9aab403201226d1d9db5af100a
2021-08-05 10:28:52 -07:00
878943c64f Preserve memory layout when aten batchnorm is used (#62773)
Summary:
https://github.com/pytorch/pytorch/issues/62594

CC cpuhrsch

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62773

Reviewed By: H-Huang

Differential Revision: D30118658

Pulled By: cpuhrsch

fbshipit-source-id: bce9e92f5f8710c876a33cccbd1625155496ddea
2021-08-05 10:21:44 -07:00
d45291613c [pruner] generalize bias hook for conv2d (#62430)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62430

The bias hook is a forward hook that is part of the pruning parametrization; it is attached after the activation reconstruction forward hook, so adding the bias occurs after zeros are reinserted into the pruned activation.

This diff/PR amends the bias hook to work for Conv2d layers, in addition to Linear layers. The reshaping of the ._bias parameter ensures that it is added to the right dimension of the output.
ghstack-source-id: 135097700
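
To illustrate the reshaping idea, a hypothetical sketch of such a hook (names are illustrative, not the actual parametrization code):

```
def bias_hook(module, inputs, output):
    # For a Conv2d output of shape (N, C, H, W), reshape the stored bias
    # so it broadcasts along the channel dimension; a Linear output of
    # shape (N, C) can take the bias as-is.
    bias = module._bias
    if output.dim() == 4:
        bias = bias.reshape(1, -1, 1, 1)
    return output + bias
```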

Test Plan:
Added tests for `Conv2dB()`, a model with Conv2d layers that have `bias=True`.

`buck test mode/dev-nosan //caffe2/test:ao -- TestBasePruner`

https://pxl.cl/1MfgL

Reviewed By: jerryzh168

Differential Revision: D29979571

fbshipit-source-id: c1a7e9fabc8b3c9d0050bd6b6c6a631ddfdf2a68
2021-08-05 09:27:17 -07:00
b524a1101a ns for fx: add ref_node_target_type (#62685)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62685

Adds a `ref_node_target_type` field to hold the string type
of the base node. This is needed because in some cases
the previous node does not match ref_node (if we have observers,
or if we are logging inputs), and it is useful to know the type
of ref_node.

Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs
```

Imported from OSS

Reviewed By: hx89

Differential Revision: D30082947

fbshipit-source-id: 98ded7b25a5d8d5ea820e0ef62c3799b65c3fc77
2021-08-05 09:26:10 -07:00
b96acb7591 Allow disabled tests to be re-enabled with IGNORE_DISABLED_ISSUES (#62686)
Summary:
Part 1 of fixing https://github.com/pytorch/pytorch/issues/62359

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62686

Test Plan:
1. Check out this PR and run `python setup.py install`.
2. The test we will be running requires CUDA. If you don't have CUDA, you can try this on another device or simply comment out the skipIf statement before the `test_jit_cuda_extension` test in `test_cpp_extensions_jit.py`
3. Run: `IN_CI=1 python test/run_test.py -i test_cpp_extensions_jit -- -k test_jit_cuda_extension` and notice that it should skip. If it doesn't skip, edit test/.pytorch-disabled-tests.json: modify the platforms list of the first issue (61655) to include whatever platform you are on (macos or linux), and just run `python test/test_cpp_extensions_jit.py -v -k test_jit_cuda_extension --import-disabled-tests` to make sure it skips.
4. Now `export PYTORCH_IGNORE_DISABLED_ISSUES=61655` or `export PYTORCH_IGNORE_DISABLED_ISSUES=34952,61655`.
5. `rm test/.pytorch-*` to clear the cached files.
6. Run the same command as in step 3 and note that it SHOULDN'T skip. It should run.

Reviewed By: walterddr, samestep

Differential Revision: D30108773

Pulled By: janeyx99

fbshipit-source-id: dbf015a266f57577dc9283b0cdff720083b5c0cb
2021-08-05 09:05:40 -07:00
24a2681358 Revert D30094460: [profiler] Re-enable test on Windows
Test Plan: revert-hammer

Differential Revision:
D30094460 (5a1017be97)

Original commit changeset: 80521f6bc136

fbshipit-source-id: 7c01493ad078be7df1bbb81c08be6364d6ffaa4d
2021-08-05 08:34:15 -07:00
0c8ed042f2 Revert D30095246: [pytorch][PR] Enable ncclAvg for reductions
Test Plan: revert-hammer

Differential Revision:
D30095246 (a749180e4e)

Original commit changeset: d3a3475345fa

fbshipit-source-id: 34b5100b925859461296cae5a717a70e5eca6af6
2021-08-05 07:56:40 -07:00
6d896cb545 Update faq.rst so OOM section mentions checkpoint (#62709)
Summary:
This FAQ has a section for CUDA OOMs that lists lots of don'ts, which limits modeling options. Deep nets can blow up memory due to output caching during training.
It's a known problem with a known solution: trade off compute for memory via checkpointing.
The FAQ should mention it.
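
For concreteness, a minimal sketch of the checkpointing trade-off:

```
import torch
from torch.utils.checkpoint import checkpoint

block = torch.nn.Sequential(torch.nn.Linear(512, 512), torch.nn.ReLU())
x = torch.randn(4, 512, requires_grad=True)
y = checkpoint(block, x)  # activations are recomputed during backward
y.sum().backward()
```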

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62709

Reviewed By: nairbv

Differential Revision: D30103326

Pulled By: ezyang

fbshipit-source-id: 3a8b465a7fbe19aae88f83cc50fe82ebafcb56c9
2021-08-05 07:40:08 -07:00
b84885cc8b Add support for boxed functors (#62658)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62658

Boxed functors, like their unboxed brethren, support operators which
aren't just a function pointer, but a function pointer with some
associated global state that is allocated at registration time.

The use case I have in mind with this implementation is "dispatcher
API from Python", where the extra state kernel registrations need is
the PyObject callable we will invoke to do the actual invocation.
See next PR in this stack.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: bhosmer

Differential Revision: D30074925

Pulled By: ezyang

fbshipit-source-id: ee040edbbec1e607486d338d1ea78bb5c6b2ece9
2021-08-05 07:26:09 -07:00
e6a227465b Add serialization support for slots and subclass getstate/setstate (#62745)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62745

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D30113112

Pulled By: albanD

fbshipit-source-id: 6c562d0c060fb0280e5e3d432bb42fb833e6d500
2021-08-05 06:49:44 -07:00
056b147e10 clean torch_function handling in serialization (#62744)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62744

The `Tensor._reduce_ex_internal` function can only be called via the `Tensor.__reduce_ex__` function.
And that second function already properly handles the `__torch_function__` overwrites. So no need to handle them again in `Tensor._reduce_ex_internal`.

This PR also updates `Tensor.__reduce_ex__` to use the specialized unary API for `__torch_function__` that makes it nicer to read.

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D30113113

Pulled By: albanD

fbshipit-source-id: c94f5d2597ee3afe799d9de991f75615c3c172d6
2021-08-05 06:48:26 -07:00
ee82e7a14e [DDP Communication Hook] Renaming C++ calls to match python API closer (#62735)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62735

Renamed the following
1. getTensor -> getBuffer
2. getTensorRef -> getBufferRef
3. setTensor -> setBuffer
and all associated private variables as well

Reviewed By: SciPioneer

Differential Revision: D30069124

fbshipit-source-id: fa8f1f8a7f3255e6242973bc37b3f7b2731af55d
2021-08-05 05:06:29 -07:00
64b3ab6407 Improve IMethod::getArgumentNames to deal with empty argument names list (#62782)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62782

This diff improved IMethod::getArgumentNames to deal with empty argument names list.

Test Plan:
buck test mode/dev caffe2/caffe2/fb/predictor:pytorch_predictor_test -- PyTorchDeployPredictor.GetEmptyArgumentNamesValidationMode
buck test mode/dev caffe2/caffe2/fb/predictor:pytorch_predictor_test -- PyTorchDeployPredictor.GetEmptyArgumentNamesRealMode

Reviewed By: wconstab

Differential Revision: D30038175

fbshipit-source-id: 46f08dda94187160b4d6ee87600d1b46fe934222
2021-08-05 01:32:00 -07:00
019048b3b6 [PyTorch Edge] Simplify Exception Handling (Take-2) (module.cpp) (#62634)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62634

Apply the same set of changes as in D27688352 (d728491fc1) to `module.cpp` as instructed by xcheng16.

Basically, this simplifies exception handling and allows propagation of the original message undisturbed to the caller so that we can figure out the lineage of the exception in crash tasks such as t96812652
ghstack-source-id: 134877012

Test Plan: Build/Sandcastle

Reviewed By: raziel

Differential Revision: D30038867

fbshipit-source-id: 8dfd415c510bcd0ab49814f4eb559ec6fc8f72e5
2021-08-04 23:25:30 -07:00
4b68801c69 Enable test_api IMethodTest in OSS (#62521)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62521

This diff did the following few things to enable the tests:
1. Exposed IMethod as TORCH_API.
2. Linked torch_deploy to test_api if USE_DEPLOY == 1.

Test Plan:
./build/bin/test_api --gtest_filter=IMethodTest.*

To be noted, one needs to run `python torch/csrc/deploy/example/generate_examples.py` before the above command.

Reviewed By: ezyang

Differential Revision: D30055372

Pulled By: alanwaketan

fbshipit-source-id: 50eb3689cf84ed0f48be58cd109afcf61ecca508
2021-08-04 21:14:20 -07:00
a749180e4e Enable ncclAvg for reductions (#62303)
Summary:
[ncclAvg](https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/api/types.html?highlight=ncclavg#c.ncclAvg) is a new `ncclRedOp_t` that fuses a divide-by-world-size with ncclAllReduce, ncclReduce, or ncclReduceScatter. This PR adds support for it.

This PR and https://github.com/pytorch/pytorch/pull/62140 lay the foundation for DDP to allreduce and average grad tensors in place with a single NCCL call, without additional memory passes to flatten, average, or unflatten. I'll write the necessary DDP changes once this PR and https://github.com/pytorch/pytorch/pull/62140 land.
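
For a sense of the end goal, a hedged sketch of what the fused average might look like from Python once plumbed through (the `ReduceOp.AVG` name is an assumption here, and this assumes an initialized NCCL process group):

```
import torch
import torch.distributed as dist

tensor = torch.ones(4, device="cuda")
dist.all_reduce(tensor, op=dist.ReduceOp.AVG)  # sum and divide in one NCCL call
```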

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62303

Reviewed By: soulitzer

Differential Revision: D30095246

Pulled By: rohan-varma

fbshipit-source-id: d3a3475345fafb0ab265c11d36db74d7c5613a0a
2021-08-04 19:43:50 -07:00
4bd54cebe0 Refinement types and unification for symbolic shape inference (#61776)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61776

Test Plan: Imported from OSS

Reviewed By: iramazanli

Differential Revision: D29772537

Pulled By: migeed-z

fbshipit-source-id: 3555d43152a213087c64faa326432f1628eb3bb1
2021-08-04 17:34:29 -07:00
a27a0b1ef5 [SR] Disable NNC temporarily (#62746)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62746

Disable NNC temporarily until a code cache is implemented to reduce the compilation time.

Reviewed By: ajyu

Differential Revision: D30080326

fbshipit-source-id: ef8bb3ac3a6947614f4a03a3d52774b6933d3ea8
2021-08-04 17:33:07 -07:00
afc1d1b3d6 Fix lint errors in cuda_ReportMemoryUsage tests (#62778)
Summary:
Introduced in 8bbcef5096

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62778

Reviewed By: chaekit, driazati

Differential Revision: D30120245

Pulled By: malfet

fbshipit-source-id: 2cb5755b870182dd147a6685c74f7defcc10030a
2021-08-04 17:26:23 -07:00
658540f43f remove deprecated is_deterministic and set_deterministic (#62158)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/58096

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62158

Reviewed By: mruberry

Differential Revision: D29909634

Pulled By: ezyang

fbshipit-source-id: ccffbcf8f378e39bd2c7fbeace7ed1cbbe003981
2021-08-04 16:45:23 -07:00
a705b8f08f OpInfo for nn.functional.relu (#62076)
Summary:
See https://github.com/facebookresearch/functorch/issues/78 and https://github.com/pytorch/pytorch/issues/54261.

cc: mruberry

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62076

Reviewed By: soulitzer

Differential Revision: D30013262

Pulled By: zou3519

fbshipit-source-id: 7df5e930d1588146e09cf58c53c8860392da7348
2021-08-04 15:50:18 -07:00
123be6b261 Port addcdiv to structured kernels. (#62319)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62319

Tracking issue: #55070

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D29961996

Pulled By: bdhirsh

fbshipit-source-id: d38141476b41dbfd4bf029d631f81a32aff82a5e
2021-08-04 15:35:25 -07:00
693b0af996 Port addcmul kernels to structured kernels. (#62318)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62318

Tracking issue: #55070

This PR introduces the method `TensorIteratorBase::build_ternary_op` for building a
`TensorIteratorBase` for 3-input, 1-output kernels.

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D29961997

Pulled By: bdhirsh

fbshipit-source-id: 2208d24823bad6e74c8d508f363716d8125b8619
2021-08-04 15:34:01 -07:00
8bbcef5096 Report more information for memory profiling (#61282)
Summary:
Report pointed memory size, total allocated memory, total reserved size all in one report.

`ptr` and `alloc_size` will be used for associating with op trace.
`allocated_size`, `reserved_size` will be used for memory trace.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61282

Reviewed By: ejguan

Differential Revision: D29796282

Pulled By: chaekit

fbshipit-source-id: 5314c867632d3af1fa9a3811b35eaa5e931a5d87
2021-08-04 15:03:14 -07:00
0aee9c0ef8 [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D30097148

fbshipit-source-id: 514c22ea52f048bb048a53fa6b5ea57f3ac12250
2021-08-04 14:58:29 -07:00
aed01a991d Add hasattr to torch::deploy interface and hasMethod to PredictorContainer (#62669)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62669

Useful to avoid having to implement null checking on the application side.

Test Plan: Add unit tests

Reviewed By: suo, houseroad

Differential Revision: D30074406

fbshipit-source-id: 881aec735953b43cb24786c1a2d79e8e724928b8
2021-08-04 14:48:34 -07:00
281737ea6f [DDP Communication Hook] Rename 4 Methods of GradBucket Class
Summary:
1. getPerParameterTensors -> getGradients
2. getModelParamsForBucket -> getParameters
3. isTheLastBucketToAllreduce -> IsLast

Test Plan:
Test results for "buck test mode/dev-nosan caffe2/test/distributed:c10d":
https://pxl.cl/1Mrq8

Test results for "buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork":
https://pxl.cl/1MrtP

Reviewed By: SciPioneer

Differential Revision: D30076436

fbshipit-source-id: 0bd1e410186a318ea6328f4c1e830ea5632f8a47
2021-08-04 14:37:23 -07:00
7f1b672b7a Revert D29952381: [Static Runtime] Ensure that unittests only use out variants or native ops
Test Plan: revert-hammer

Differential Revision:
D29952381 (8737e17af2)

Original commit changeset: e60e70b80ccf

fbshipit-source-id: 59dc2f920b7ceaf94ba8f5f36024e7cc710f6645
2021-08-04 14:25:11 -07:00
491d89da1b .github: Fix --no-build-suffix (#62739)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62739

The flag didn't initially work correctly, so this change makes it actually
output the right thing

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: janeyx99

Differential Revision: D30107694

Pulled By: seemethere

fbshipit-source-id: 5ff28d6820b9cf7145dbb617b86a941bf7686b5c
2021-08-04 14:19:38 -07:00
de94034328 Fixes #62636 (#62670)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/62636.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62670

Reviewed By: ezyang

Differential Revision: D30102179

Pulled By: soulitzer

fbshipit-source-id: 38480463ef354f2c12ed83e6678aed26b0b96efe
2021-08-04 13:58:21 -07:00
8e35df0bf3 det_backward: return svd path for double backward (so that all ci tests pass) (#62570)
Summary:
Potentially fixes https://github.com/pytorch/pytorch/issues/62327 and fixes https://github.com/pytorch/pytorch/issues/62328.
This PR replaces the double backward of det from eig to svd. The latter is slower but should be more stable.

CC anjali411

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62570

Reviewed By: pbelevich

Differential Revision: D30072876

Pulled By: anjali411

fbshipit-source-id: c91b507dbfd6a3ec47dc6d0b0dcfa5f8c8228c30
2021-08-04 13:43:51 -07:00
6f0abba04c [fix] manual_seed{_all}: mem leak (#62534)
Summary:
Fixes: https://github.com/pytorch/pytorch/issues/55768

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62534

Reviewed By: nairbv

Differential Revision: D30103294

Pulled By: ezyang

fbshipit-source-id: d871ae869314dfd2d27544a51107ab752abfe452
2021-08-04 13:03:12 -07:00
89f898ebb5 Fix wrong command in README.md (#62472)
Summary:
If it is `[15^,16^)`, 16.10 is not included.
https://github.com/Microsoft/vswhere/wiki/Examples

Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62472

Reviewed By: nairbv

Differential Revision: D30103199

Pulled By: ezyang

fbshipit-source-id: 82085627ca53cd5a4e666848d27d4ab062de8352
2021-08-04 12:55:18 -07:00
b454275f47 Support eager mode use of torch.jit.isinstance with multiple types (#60465)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/60095
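
A small example of the eager-mode behavior this enables:

```
import torch
from typing import List, Tuple

x = [1, 2, 3]
print(torch.jit.isinstance(x, List[int]))        # True
print(torch.jit.isinstance(x, Tuple[int, int]))  # False
```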

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60465

Reviewed By: soulitzer

Differential Revision: D30093110

Pulled By: ansley

fbshipit-source-id: ee9c654bdb031e9eff4837f9f1d489c81e47cc06
2021-08-04 12:45:24 -07:00
5a1017be97 [profiler] Re-enable test on Windows (#62703)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62703

Re-enable test on Windows

Test Plan: CI

Reviewed By: ezyang

Differential Revision: D30094460

Pulled By: ilia-cher

fbshipit-source-id: 80521f6bc1365d2c252f20b5d0485fc062c8d9c3
2021-08-04 12:32:24 -07:00
8737e17af2 [Static Runtime] Ensure that unittests only use out variants or native ops (#62335)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62335

This change ensures that unittests only use out variants or native ops.

- Our unittests currently assume that in a graph fed to the static runtime, each interpreter op has been correctly replaced by its corresponding out variant / native op, but this isn't actually checked by the unittest. This change ensures that it is.

- We relied on manual inspection of log messages to see if an out variant is used for a specific workload even for unittesting. This change frees us from doing that.

- `aten::add` is excluded from this check since it's only enabled for an internal workload. Also, some unittests are excluded by using `expect_interpreter_op = true` since they are written to use interpreter ops by design.

Test Plan: Ran `buck run //caffe2/benchmarks/static_runtime:static_runtime_cpptest` successfully.

Reviewed By: mikeiovine, hlu1

Differential Revision: D29952381

fbshipit-source-id: e60e70b80ccf45e91c6654b4ad53f92ffd5ab702
2021-08-04 11:37:15 -07:00
de77c6a0eb [BE] fix bc check (#62687)
Summary:
A bug was discovered in https://github.com/pytorch/pytorch/issues/62434: for some reason, comparing just the schema name didn't match the allow_list item. So:
1. remove the duplicate regex compile
2. use the full schema string instead of just the name

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62687

Reviewed By: ezyang

Differential Revision: D30102437

Pulled By: walterddr

fbshipit-source-id: 541b2ed77948f24daebb08623cadabb034a241e0
2021-08-04 11:00:22 -07:00
0a66416767 Rename master to main for test-infra references (#62728)
Summary:
Reacting to the master->main switch in test-infra

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62728

Reviewed By: samestep

Differential Revision: D30104777

Pulled By: janeyx99

fbshipit-source-id: a7af7dfc69fd6e02c30ad6c15808a5b32a68c587
2021-08-04 10:45:47 -07:00
90ba71f841 Automated submodule update: FBGEMM (#62688)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).

New submodule commit: 10ec0d3388

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62688

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: dskhudia

Differential Revision: D30088109

fbshipit-source-id: da8a1e6232e489eac0384faadb71c2dfac5927f7
2021-08-04 10:40:50 -07:00
8bcf01631a [ROCm] update magma (#62502)
Summary:
Update magma to point to magma_ctrl_launch_bounds branch.
When the upstream magma branch is used, cholesky tests in test_ops.py and test_linalg.py
fail due to "Intel MKL ERROR: Parameter 4 was incorrect on entry to DPOTRF."
Suspect commit: [35325212b15c5baadd7493d61b19b2db2635cb68](35325212b1) in magma master.

Signed-off-by: Jagadish Krishnamoorthy <jagdish.krishna@gmail.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62502

Reviewed By: malfet

Differential Revision: D30089171

Pulled By: seemethere

fbshipit-source-id: b07234ce66d48e3af113640995f923ee586b3cd9
2021-08-04 10:19:55 -07:00
dfdc3069e7 Revert D30072994: [pytorch][PR] [6/n Update test rpc path
Test Plan: revert-hammer

Differential Revision:
D30072994 (ad4e1f1132)

Original commit changeset: 3217e764bd85

fbshipit-source-id: cf89df78a4e04ef03b04ec3c253c5cbb1a1f5f63
2021-08-04 10:14:31 -07:00
34c9f5a8da [DDP Communication Hook] Update get_tensor and set_tensor to be cleaner naming conventions (buffer() and set_buffer()) (#62662)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62662

Replaced the methods set_tensor(.) and get_tensor() in the Python API exposed from the C++ logic with buffer() and set_buffer(.), for a cleaner interface.

Reviewed By: SciPioneer

Differential Revision: D30012869

fbshipit-source-id: bd8efab583dd89c96f9aeb3dd48a12073f0b1482
2021-08-04 09:27:31 -07:00
4b47ea9446 adding a skip for ROCm for a flaky test (#62664)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62664

Skipping a test for ROCm because of issue #62602

Test Plan: Imported from OSS

Reviewed By: soulitzer

Differential Revision: D30079534

Pulled By: NivekT

fbshipit-source-id: a9cf35e5d3a8d218edc9c5a704d1f9599d2f38a6
2021-08-04 07:29:06 -07:00
d1c85d2c06 Move ASAN tests to clang-7 (#62663)
Summary:
This should avoid following false positives:
```
[ RUN      ] ProtoTest.Basic
/var/lib/jenkins/workspace/build/third_party/onnx/onnx/onnx_onnx_torch-ml.pb.h:7060:15: runtime error: member call on address 0x7fffffffdd80 which does not point to an object of type 'google::protobuf::MessageLite'
0x7fffffffdd80: note: object is of type 'onnx_torch::ModelProto'
 00 00 00 00  b0 b9 05 ef ff 7f 00 00  00 00 00 00 00 00 00 00  01 00 00 00 00 00 00 00  00 00 00 00
              ^~~~~~~~~~~~~~~~~~~~~~~
              vptr for 'onnx_torch::ModelProto'
 UndefinedBehaviorSanitizer: undefined-behavior /var/lib/jenkins/workspace/build/third_party/onnx/onnx/onnx_onnx_torch-ml.pb.h:7060:15 in
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62663

Reviewed By: tktrungna

Differential Revision: D30076315

Pulled By: malfet

fbshipit-source-id: 7bfc2c4b417307195e3c3379e4874eaceb4f3134
2021-08-04 06:26:03 -07:00
773a8eede4 [profiler][refactor] Refactor the usage of legacy profiler implementation (#61931)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61931

This PR consolidates the profiling code around a new C++ implementation
(profiler_kineto.h/cpp) and uses it unconditionally from
torch.autograd.profiler/torch.profiler:
1. Always use profiler_kineto.h/cpp as the C++ implementation
2. Simplify profiler.py to remove unneeded parts depending on legacy
impl
3. Move some of the legacy logic into profiler_legacy.py (to be fully
deleted later)

Test Plan:
USE_KINETO=1 USE_CUDA=1 USE_MKLDNN=1 BLAS=MKL BUILD_BINARY=1 python setup.py develop install --cmake
python test/test_profiler.py -v
USE_KINETO=0 USE_CUDA=1 USE_MKLDNN=1 BLAS=MKL BUILD_BINARY=1 python setup.py develop install --cmake
python test/test_profiler.py -v

Imported from OSS

Reviewed By: gdankel

Differential Revision: D29801599

fbshipit-source-id: 9794d29f2af38dddbcd90dbce4481fc8575fa29e
2021-08-03 18:51:29 -07:00
5830f122f1 Add docstrings for save_on_cpu hooks (#62410)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62410

This PR adds docstrings for CPU hooks introduced in #61928.

Also uncomments the warning about pinned memory in CUDA semantics docs.

Depends on: #62361.

For now docstrings are an orphan page at https://docs-preview.pytorch.org/62410/generated/torch.autograd.graph.set_save_on_cpu_hooks.html#torch-autograd-graph-set-save-on-cpu-hooks

Test Plan: Imported from OSS

Reviewed By: soulitzer

Differential Revision: D29990129

Pulled By: Varal7

fbshipit-source-id: 7a98eeee6a0abb11e2c2d9169cd1aa35ad7ba3f4
2021-08-03 17:53:45 -07:00
5542d590d4 [EZ] Fix type of functional.pad default value (#62095)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62095

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D29879898

Pulled By: jamesr66a

fbshipit-source-id: 903d32eca0040f176c60ace17cadd36cd710345b
2021-08-03 17:47:20 -07:00
d7d399f3df Exposes _aminmax as aminmax and makes it structured (#62401)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62401

This PR exposes the `torch._aminmax` operator as `torch.aminmax`.

**TODO**

- [x] add examples to documentation
- [x] add minmax to rst docs

fixes https://github.com/pytorch/pytorch/issues/62164
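
A short usage sketch of the exposed operator:

```
import torch

t = torch.tensor([1.0, -3.0, 2.0])
mn, mx = torch.aminmax(t)
print(mn, mx)  # tensor(-3.) tensor(2.)
```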

Test Plan: Imported from OSS

Reviewed By: soulitzer

Differential Revision: D30072246

Pulled By: heitorschueroff

fbshipit-source-id: 557d30af7c28ca6c238c59122367104036429ecd
2021-08-03 16:10:43 -07:00
92f470da08 Revert D30070707: [pytorch][PR] [5/n] Update test distribute path
Test Plan: revert-hammer

Differential Revision:
D30070707 (d8849bdb03)

Original commit changeset: c45f07b7b548

fbshipit-source-id: 867019e95b2898346ba2d918fa7a7291c8125efd
2021-08-03 16:00:56 -07:00
18eeccc7e8 [mypy] Fix Optional type check (#62668)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62668

Test Plan: Imported from OSS

Reviewed By: malfet, 842974287

Differential Revision: D30077960

Pulled By: IvanKobzarev

fbshipit-source-id: 5e423bfb65a65974ed848caa177330d6e61452e6
2021-08-03 16:00:55 -07:00
5a49abfaf1 Revert "Revert D29940705: [fx2trt] Dynamic shape inference support" (#62667)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62667

This reverts commit 053e11f1b39b50fcd7aa7cdd272f7775c7a5e1ba.

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D30077961

Pulled By: IvanKobzarev

fbshipit-source-id: a7e76b2d2fa79e6c42a6a87f0a13f62642591fee
2021-08-03 15:59:40 -07:00
34f50c6e35 [Static Runtime] testStaticRuntime verifies that # of nodes is at least 2 (#62622)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62622

This allows us to catch cases where an out variant is being tested but the test author forgot to call `.clone()` in the test script. Having more than 2 ops does not guarantee that the memory planner is being exercised, but having fewer than 2 guarantees that it is not being used.

Reviewed By: hlu1

Differential Revision: D30058050

fbshipit-source-id: 5bc053736f1cc6fd1ffcf8254bf38874ac18c34b
2021-08-03 15:55:57 -07:00
2bddaf6149 Revert D30072859: [pytorch][PR] [4/n] Update vulkan test path
Test Plan: revert-hammer

Differential Revision:
D30072859 (1630b86dd6)

Original commit changeset: bf75faabf6b6

fbshipit-source-id: 3e2672bd19544ed3f1e2a951eb02d58f5c2f9d52
2021-08-03 15:28:04 -07:00
ad4e1f1132 [6/n Update test rpc path (#62526)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/62380

* update the `test_rpc` function to call the wheel install folder {sitepackages}/torch instead of the build/ folder
* add IN_WHEEL_TEST to limit the change to linux CI GHA only

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62526

Test Plan: check if all ci workflows pass

Reviewed By: walterddr, seemethere

Differential Revision: D30072994

Pulled By: tktrungna

fbshipit-source-id: 3217e764bd859dc2db597d24a1abb5ec1d0e8c9e
2021-08-03 15:26:54 -07:00
c48dfe0d9f .github: Enable SSH to linux runners (#62280)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62280

Enables SSH to linux GHA runners for FB employees while on the FB VPN

SSH keys will be added to runners when the label "with-ssh" is applied to
your pull request.

Depnds on https://github.com/fairinternal/pytorch-gha-infra/pull/8

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: janeyx99, soulitzer

Differential Revision: D29941681

Pulled By: seemethere

fbshipit-source-id: 9d291f4291eb1d814d4a3473f7daf7f6951ad724
2021-08-03 15:15:39 -07:00
9beb279d84 Add context manager to save tensors on CPU (#61928)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61928

Fix #57100.
Creates a function `torch.autograd.graph.set_save_on_cpu_hooks()` which can be used to register default hooks under which all tensors saved during the forward pass are actually copied* to cpu, then copied back to the appropriate device for the backward pass.

*If the tensor was already on cpu, the entire operation is a no op.

If the tensor is on GPU, we copy the tensor to `pin_memory` during packing so that the unpacking can be done asynchronously.

See [benchmark](https://github.com/pytorch/pytorch/pull/61928#issuecomment-885089279) and [note about training large models](https://github.com/pytorch/pytorch/pull/61928#issuecomment-887009448)
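
A sketch of the pack/unpack pair such hooks install under the design described above; this is illustrative, not the PR's exact code:

```
import torch

def pack_to_cpu(tensor):
    # Stage GPU tensors through pinned memory so the copy back in unpack
    # can be asynchronous. (The PR's version skips the copy entirely for
    # tensors that are already on CPU.)
    packed = torch.empty(tensor.size(), dtype=tensor.dtype,
                         pin_memory=tensor.is_cuda)
    packed.copy_(tensor)
    return tensor.device, packed

def unpack_from_cpu(data):
    device, tensor = data
    return tensor.to(device, non_blocking=True)
```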

Test Plan: Imported from OSS

Reviewed By: soulitzer

Differential Revision: D29848526

Pulled By: Varal7

fbshipit-source-id: 3d289cddd4fa377bd4884ba0d569fa47c777d9e5
2021-08-03 13:08:37 -07:00
91ef19309e [quant] Input-weight equalization - branch support (#62366)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62366

In the case of models with branches, we are unable to equalize the branching part in the graph.

For example, given this graph:
```
     conv2
    /     \
x -> conv1 -> add
```

After prepare, we will ignore the branched layers (conv1 and conv2) and will not insert the equalization observers. A warning message will also be printed with the layers that are unable to be equalized.
```
                        conv2 -> out_quant_obs2
                       /                       \
x -> input_quant_obs -> conv1 -> out_quant_obs1 -> add
```

Test Plan:
`python test/test_quantization.py TestEqualizeFx.test_input_weight_equalization_prepare`

Imported from OSS

Reviewed By: malfet, supriyar

Differential Revision: D29982585

fbshipit-source-id: 706297e7f1861975998dfa83e7ca59af09d80618
2021-08-03 12:45:25 -07:00
62a90c227f Make _Join, _Joinable, _JoinHook public (#62605)
Summary:
**Overview:**
This removes the preceding `_` from `_Join`, `_Joinable`, and `_JoinHook` in preparation for adding the generic join context manager tutorial (see [here](https://github.com/pytorch/tutorials/pull/1610)). This also adds a docs page, which can be linked from the tutorial. [Here](https://github.com/pytorch/pytorch/files/6919475/render.pdf) is a render of the docs page.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62605

Test Plan:
`DistributedDataParallel.join()`:
```
touch /tmp/barrier && TEMP_DIR="/tmp" BACKEND="nccl" WORLD_SIZE="2" gpurun python test/distributed/test_distributed_fork.py -- TestDistBackendWithFork.test_ddp_uneven_inputs TestDistBackendWithFork.test_ddp_uneven_inputs_stop_iteration_sync_bn TestDistBackendWithFork.test_ddp_grad_div_uneven_inputs TestDistBackendWithFork.test_ddp_uneven_input_join_disable TestDistBackendWithFork.test_ddp_uneven_input_exception
```

`ZeroRedundancyOptimizer`:
```
gpurun4 python test/distributed/optim/test_zero_redundancy_optimizer.py
```
NOTE: DDP overlap tests are failing due to a landing race. See https://github.com/pytorch/pytorch/pull/62592. Once the fix is landed, I will rebase, and tests should be passing.

`Join`:
```
gpurun4 python test/distributed/algorithms/test_join.py
```

Reviewed By: mrshenli

Differential Revision: D30055544

Pulled By: andwgu

fbshipit-source-id: a5ce1f1d9f1904de3bdd4edd0b31b0a612d87026
2021-08-03 12:20:11 -07:00
053e11f1b3 Revert D29940705: [fx2trt] Dynamic shape inference support
Test Plan: revert-hammer

Differential Revision:
D29940705 (6b02ad5f82)

Original commit changeset: 1eab53a8cfd5

fbshipit-source-id: 68150a193df6f11389b14a0e8224e1489b29ff0b
2021-08-03 12:03:42 -07:00
ff31389c21 Cast a few vars to void that are otherwise unused
Summary:
llvm-13 marks this as an error when a variable is set but not used.
Evidently this macro doesn't always expand to using the var.  Work around that
here with void casts.

Test Plan: nfc

Reviewed By: drodriguez

Differential Revision: D30062462

fbshipit-source-id: ff868ec74116da99afd539142996d2ffffd399fb
2021-08-03 11:57:57 -07:00
59dd12042e [nnc] Removed const from all fields in IR. (#62336)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62336

This PR was generated by removing `const` for all types of nodes in NNC IR, and fixing compilation errors that were the result of this change.

This is the first step in making all NNC mutations in-place.

Test Plan: Imported from OSS

Reviewed By: iramazanli

Differential Revision: D30049829

Pulled By: navahgar

fbshipit-source-id: ed14e2d2ca0559ffc0b92ac371f405579c85dd63
2021-08-03 11:44:36 -07:00
474d7ec43b [Pytorch Edge] Black Box Compatibility API (#61477)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61477

It would be nice if the compatibility API were simply plug and play, with no need to care about its internals at all. That's what this diff aims to provide.

The general usage would be something like
  < On the Client >
  RuntimeCompatibilityInfo runtime_info = get_runtime_compatibility_info();

  ...
  < On the Server >
  ModelCompatibilityInfo model_info = get_model_compatibility_info(<model_path>);
  bool compatible = is_compatible(runtime_info, model_info);

Currently RuntimeCompatibilityInfo and ModelCompatibilityInfo are exactly the same, but it seemed feasible to me that they may end up diverging as more information is added to the API (such as a min supported bytecode version being exposed from the runtime).

Test Plan: unit test and ci

Reviewed By: dhruvbird, raziel

Differential Revision: D29624080

fbshipit-source-id: 43c1ce15531f6f1a92f357f9cde4e6634e561700
2021-08-03 11:27:28 -07:00
b7391f44df cast return of cudaGetLastError() to void when discarding (#62518)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/62511.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62518

Reviewed By: walterddr, janeyx99

Differential Revision: D30029858

Pulled By: malfet

fbshipit-source-id: d47ce4e507ac800b4e5a5e0a8d9a6fabdfd28e6d
2021-08-03 11:17:22 -07:00
d6048ecd6b Enable bazel builds on ciflow/default (#62649)
Summary:
Add `regenerate.sh` convenience script
Remove "TODO: Reenable on PR" label from workflows which are enabled on PRs

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62649

Reviewed By: seemethere

Differential Revision: D30071905

Pulled By: malfet

fbshipit-source-id: c82134cb676b273d23e225be21166588996a31d3
2021-08-03 11:05:41 -07:00
4d5607bb25 [Reland][DDP] log bucket sizes (#62625)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62625

reland of https://github.com/pytorch/pytorch/pull/62232 which ran into a land race.

Test Plan: ci

Reviewed By: SciPioneer

Differential Revision: D30058217

fbshipit-source-id: 1454dd481e630f3de9ec6111b3f2e18cd8976091
2021-08-03 10:55:46 -07:00
1630b86dd6 [4/n] Update vulkan test path (#62519)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/62380

* update the `test_vulkan` function to call the wheel install folder `{sitepackages}/torch` instead of the `build/` folder
* add `IN_WHEEL_TEST` to limit the change to `pytorch_linux_test` only
* add symbolic links for the shared libraries called by the tests (this is a bit hacky and should instead be fixed by setting the rpath before compiling -- similar to https://github.com/pytorch/pytorch/blob/master/.jenkins/pytorch/test.sh#L204-L208).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62519

Test Plan: check if all ci workflows pass

Reviewed By: walterddr

Differential Revision: D30072859

Pulled By: tktrungna

fbshipit-source-id: bf75faabf6b6070c366571a74834a1f58b2549d3
2021-08-03 10:24:47 -07:00
ddd916c210 [quant][refactor] Return the models in checkGraphModeFxOp for further checking (#62487)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62487

checkGraphModeFxOp is our utility test function that quantizes a given model with FX Graph Mode Quantization
and checks whether the resulting model contains the expected ops. Previously it only returned the quantized
model's result on the sample data; this PR changes it to return the prepared, quantized, and quantized_reference
models together with the result for the quantized model.

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: iramazanli

Differential Revision: D30053981

fbshipit-source-id: 31fbce48d138261d0b00ba24e1427fd0c6208990
2021-08-03 10:12:16 -07:00
76c447a730 Remove CUDA10.2 + gcc 9 in CI (#62609)
Summary:
This is an invalid combination because CUDA 10.2 does not support gcc > 8.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62609

Reviewed By: iramazanli

Differential Revision: D30057292

Pulled By: seemethere

fbshipit-source-id: 7cb0fa8401e80297846b0fcb5e0ecaa435b101be
2021-08-03 10:05:16 -07:00
d8849bdb03 [5/n] Update test distribute path (#62520)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/62380

* update the `test_distributed` function to call the wheel install folder `{sitepackages}/torch` instead of the `build/` folder
* add `IN_WHEEL_TEST` to limit the change to linux CI GHA only

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62520

Test Plan: check if all ci workflows pass

Reviewed By: soulitzer

Differential Revision: D30070707

Pulled By: tktrungna

fbshipit-source-id: c45f07b7b54857dc8e78405714d6d5b864c30868
2021-08-03 09:52:48 -07:00
6b02ad5f82 [fx2trt] Dynamic shape inference support
Summary:
Add a field called `shape_range` to `inputTensorSpec` which allows the user to indicate the range of the input shape.

Make all current converters work with dynamic shape except `layer_norm`. We need to make the layer_norm plugin an `IPluginV2Ext`.

Some ops only have limited dynamic shape support for now:
- `linear`: only supports at most 1 dynamic dim. We added full support, but I'm thinking of breaking linear down into matmul + add.
- `adaptive_avgpool`: right now we lower it to TRT avgpool, which means we need to know the last two dims to calculate parameters like kernel_size, strides, etc. A follow-up would be to make a plugin for adaptive avgpool; TRTorch already has one that we can borrow.

Test Plan: Added unit tests for dynamic shape inference for converter tests.

Reviewed By: jackm321

Differential Revision: D29940705

fbshipit-source-id: 1eab53a8cfd5e8db0be57845062e9794578165d1
2021-08-03 09:44:26 -07:00
b7ac286d0e CMake: Add optional precompiled header support (#61940)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61940

This adds a `USE_PRECOMPILED_HEADERS` option to the CMake build which
precompiles `ATen.h` and also `CUDAContext.h` for the cuda library.
After making a change in `native_functions.yaml`, this speeds up compilation
time by around 15% on my machine.

Test Plan: Imported from OSS

Reviewed By: heitorschueroff

Differential Revision: D29988775

Pulled By: malfet

fbshipit-source-id: a23c468c958a8b74ebaef052a5b2e5fa3836c64b
2021-08-03 09:13:47 -07:00
2cf4d8128d add OpInfo for torch.nn.functional.mse_loss (#62254)
Summary:
Addresses facebookresearch/functorch#78.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62254

Reviewed By: malfet

Differential Revision: D30013331

Pulled By: zou3519

fbshipit-source-id: e3242cb97d1f061b932e3e0ed589f1ee6a291512
2021-08-03 09:01:09 -07:00
ab8af15545 [Static Runtime] Enabled building Static Runtime tests and benchmarks in OSS CI (#62226)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62226

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D29923800

Pulled By: navahgar

fbshipit-source-id: 33cfe0e92a34c7140ea762e5715301cfbf401434
2021-08-03 08:52:36 -07:00
43327cc197 Refactor commonalities between two approaches (#62624)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62624

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D30058543

Pulled By: andwgu

fbshipit-source-id: 73c794062b75e011868fae264f592549eed67482
2021-08-03 08:43:14 -07:00
e6a3967c2a Add invariant check (bucket indices: 0, 1, ..., k-1) (#62623)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62623

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D30058544

Pulled By: andwgu

fbshipit-source-id: a56910f294c6a40118751eebe255b62700f42be9
2021-08-03 08:13:52 -07:00
87465a6e68 adding operator cumulative_trapezoid (#61615)
Summary:
Stack from [ghstack](https://github.com/ezyang/ghstack):
* https://github.com/pytorch/pytorch/issues/61616
* **https://github.com/pytorch/pytorch/issues/61615**
* https://github.com/pytorch/pytorch/issues/61475

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61615

Reviewed By: malfet, mruberry

Differential Revision: D29975064

Pulled By: NivekT

fbshipit-source-id: 4d4e98f3efb720fdc44eb238ecbf0fa157ac13d7
2021-08-03 08:04:00 -07:00
b37578b3c0 Make bazel output less verbose in CI (#62601)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/62600

Adds `bazel --config=no-tty`, which is useful for less verbose output in environments that don't implement a full tty, such as CI.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62601

Reviewed By: soulitzer

Differential Revision: D30070154

Pulled By: malfet

fbshipit-source-id: 5b89af8441c3c6c7ca7e9a0ebdfddee00c9ab576
2021-08-03 07:59:01 -07:00
3bda4ea842 Avoid unnecessary copying data in Saved Variable (#61927)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61927

This is a refactor of `SavedVariable.cpp` to prevent ever defining the `data_` tensor if default hooks are set.

Before the refactor:

```c++
data_ = variable.tensor_data(); // this is wasteful if hooks are defined
register_hooks(Engine::get_default_engine().get_default_saved_variable_hooks());
```

After the refactor:
```c++
if (get_default_hooks_()) {
  save_metadata_(variable);
  register_hooks_(get_default_hooks_(), variable);
  return;
}
save_metadata_(variable);
data_ = variable.tensor_data(); // only needed if hooks are not defined
```

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D29848524

Pulled By: Varal7

fbshipit-source-id: abca1eee37a17b47841e28d8a576490913fce1ce
2021-08-03 07:09:47 -07:00
7edb4f8761 Port cumprod kernel to structured kernels. (#61899)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61899

Tracking issue: #55070

This PR also removes `at::_cumprod`, which was the "backend" for `at::cumprod`.

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D29939489

Pulled By: ezyang

fbshipit-source-id: d5e4a6dfa6c79e4b135508ea13c2d11bd0684f63
2021-08-03 06:58:13 -07:00
e52325655a Port cumprod kernel to structured kernels. (#61899)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61899

Tracking issue: #55070

This PR also removes `at::_cumprod`, which was the "backend" for `at::cumprod`.

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D29939152

Pulled By: ezyang

fbshipit-source-id: b3379033a1ffe3c7bc8216d16d089d388ea559ba
2021-08-03 06:57:09 -07:00
c7a7c2b62f Enable Gelu fp32/bf16 in CPU path using Mkldnn implementation (#58525)
Summary:
Enable Gelu bf16/fp32 in the CPU path using the MKL-DNN implementation. Users don't need to call `to_mkldnn()` explicitly. The new Gelu fp32 performs better than the original one.

Add Gelu backward for https://github.com/pytorch/pytorch/pull/53615.
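
A minimal usage sketch, assuming the MKL-DNN path is selected automatically for dense CPU tensors as described above:

```python
import torch
import torch.nn.functional as F

x = torch.randn(128, 1024, dtype=torch.bfloat16, requires_grad=True)
y = F.gelu(x)        # runs on a plain dense CPU tensor, no to_mkldnn() needed
y.sum().backward()   # exercises the new Gelu backward as well
```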

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58525

Reviewed By: ejguan

Differential Revision: D29940369

Pulled By: ezyang

fbshipit-source-id: df9598262ec50e5d7f6e96490562aa1b116948bf
2021-08-03 06:52:23 -07:00
fd8004b42e add bfloat16 impl for nextafter (#61829)
Summary:
Add `BFloat16` support for `nextafter`.

* [x] Add OpInfo
* [x] Add Implementation Test (C++ tests)
* [x] Add credit
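
A minimal sketch of the new support (the printed value follows from bfloat16's 2^-7 ULP at 1.0):

```python
import torch

a = torch.tensor([1.0], dtype=torch.bfloat16)
b = torch.tensor([2.0], dtype=torch.bfloat16)
torch.nextafter(a, b)  # tensor([1.0078], dtype=torch.bfloat16), i.e. 1 + 2**-7
```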

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61829

Reviewed By: ejguan

Differential Revision: D29932498

Pulled By: mruberry

fbshipit-source-id: 89524531a4800569ba1addd08a4ace330a6f72a4
2021-08-02 23:16:58 -07:00
2888b7fec5 Fix sign comparison (#62483)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62483

Test Plan: Sandcastle

Reviewed By: albanD

Differential Revision: D30015385

fbshipit-source-id: eefc3208fb8c42ff46b9f4d910eb93c32595fa28
2021-08-02 22:50:39 -07:00
a77be16538 TensorAccessor::bounds_check should be a CPU-only function (#62628)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62628

This fixes the following errors when the ROCm compiler is used
```
caffe2/aten/src/ATen/core/TensorAccessor.h:160:5: error: throw is prohibited in AMP-restricted functions
    TORCH_CHECK_INDEX(
    ^
```

Test Plan: CI

Reviewed By: zhouzhuojie, seemethere

Differential Revision: D30059737

fbshipit-source-id: d094ee608768db41fcc91d044c2c6d7d29f33fe4
2021-08-02 22:46:24 -07:00
e0364ccc33 [caffe2] break one circular dependency between Caffe2 and ATen-cpu (#62632)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62632

Update the caffe2/core/context.h to directly use `at::mt19937` instead of the
`at::CPUGeneratorImpl` wrapper class from the ATen-cpu library.

Using `at::CPUGeneratorImpl` causes circular dependencies between the ATen and
caffe2 code.  In particular the `at::CPUGeneratorImpl::get_state()` logic
depends on CPU Tensor functionality that currently depends on code from
caffe2.

Test Plan:
The RNG behavior should be identical to the previous code (perhaps even
faster, since we now avoid virtual function calls).

  buck test //caffe2/caffe2:caffe2_test_cpu \
    //caffe2/caffe2/python: //caffe2/caffe2/fb/operators:

Differential Revision: D29915701

fbshipit-source-id: f9b2eab8d3b21b2224d30bcf52be9c0e7eb7cd0a
2021-08-02 22:40:56 -07:00
88af4d8441 Initialize RRefs only when explicitly asked for. (#62618)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62618

ShardedTensor implicitly initialized RRefs to remote shards if the
RPC framework was initialized. However, there are use cases where the RPC
framework might be initialized for a different purpose, and users would
prefer that ShardedTensor not initialize RRefs in that case.

As a result, I've made RRef initialization explicit in the ShardedTensor APIs.
ghstack-source-id: 134889287

Test Plan:
1) waitforbuildbot
2) unit tests.

Reviewed By: wanchaol

Differential Revision: D30056833

fbshipit-source-id: 9b2433a38dafa1888589c5b72ed93b6f0ee51639
2021-08-02 22:17:17 -07:00
b58e04f156 Make sure FindLAPACK finds the same BLAS library (#49647)
Summary:
The BLAS library is found by cmake/Dependencies.cmake, and then the
LAPACK library is found by FindLAPACK.cmake, which in turn calls
FindBLAS.cmake. This means that we search for BLAS twice, and the two
searches might find different libraries. By setting a few variables,
this can be avoided.

cc seemethere

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49647

Reviewed By: seemethere, ejguan

Differential Revision: D29943680

Pulled By: malfet

fbshipit-source-id: 3cbc350ea645a1a28dd92c19e5ee7f9eecdeff59
2021-08-02 20:41:00 -07:00
2d038b5dc8 Cast a var to void that is unused
Summary: The comment above makes it seem intentional, so just ignore it.

Test Plan: NFC

Reviewed By: smeenai

Differential Revision: D30057632

fbshipit-source-id: 45929b4eeeefdf22f5c7c4dd603229635f9da31b
2021-08-02 19:56:41 -07:00
c4196bee93 Save some memory in scatter (#62516)
Summary:
Also removes some redundant parenthesis for clarity.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62516

Reviewed By: andwgu

Differential Revision: D30030546

Pulled By: SciPioneer

fbshipit-source-id: e106486f70b9590bf3dcffb76d23f5725737542f
2021-08-02 18:41:58 -07:00
10d3a2c13a [tensorexpr] Added logging info for SimplifierUnderContext (#62138)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62138

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D29891257

Pulled By: huiguoo

fbshipit-source-id: c36b3d615fa2fe971d022111bef61ee843a9dbea
2021-08-02 18:38:55 -07:00
3a592730d5 [nnc] Simplify i%100 to i if i is less than 100; fixed #52580 (#60693)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60693

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D29375938

Pulled By: huiguoo

fbshipit-source-id: 1388729c5b93805cb156efa53e8823d5462885bf
2021-08-02 18:38:54 -07:00
8f7ae77040 [nnc] Add context-sensitive simplification for div/mod (#60688)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60688

Test Plan: Imported from OSS

Reviewed By: navahgar, ZolotukhinM

Differential Revision: D29373313

Pulled By: huiguoo

fbshipit-source-id: 90d7f2fbfce583b0ea3b0f1c7899e22b0210bd62
2021-08-02 18:37:39 -07:00
c07a123b26 Support saving and loading ShardedTensor. (#62242)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62242

1) Add a state_dict hook to ensure ShardedTensors are
added to a state_dict.
2) Add a pre load state_dict hook to ensure ShardedTensor are added back to a
module at load time.
3) Add a `with_load_process_group` context manager for load time.
4) Added ser-de capability to ShardedTensor.
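
Putting the list above together, a hypothetical round trip (the import path is an assumption, and `model` and `process_group` are placeholders):

```python
import torch
from torch.distributed._sharded_tensor import with_load_process_group  # path assumed

# saving: the state_dict hook ensures ShardedTensors end up in the state_dict
torch.save(model.state_dict(), "ckpt.pt")

# loading: supply the process group to use at load time, then let the
# pre-load hook re-install ShardedTensors on the module
with with_load_process_group(process_group):
    state = torch.load("ckpt.pt")
model.load_state_dict(state)
```
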
ghstack-source-id: 134860967

Test Plan:
1) unit tests
2) waitforbuildbot

Reviewed By: wanchaol

Differential Revision: D29927881

fbshipit-source-id: b1ef8872ed91e9cb0e2d5dd17d2764678ab89f0c
2021-08-02 18:33:19 -07:00
dd23372aa5 .circleci: Prefix intermediate build image tags (#62610)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62610

Prefixes intermediate build image tags with build- so that ECR lifecycle
policies can automatically clean them up

Policy to automatically cleanup images prefixed with `build-`: b02dd818f9

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: walterddr

Differential Revision: D30055952

Pulled By: seemethere

fbshipit-source-id: 328b9c94ffc02877d088d0118a19c732f580838b
2021-08-02 18:17:14 -07:00
525fa2f0b6 [reland] Catch saved tensors default hooks race condition (#62564)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62564

If the user runs code that registers default saved tensor hooks from
multiple threads, it will fail with a nice error message most of the
time. This commit handles the very rare case where a race condition
would have made it fail silently.

Relanding previous PR #61957

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D30045406

Pulled By: Varal7

fbshipit-source-id: d04f74c99affbbf655e53cfc2acd42f7c5b4e6eb
2021-08-02 18:00:37 -07:00
f5cf24a224 Fix lint in test_deploy_from_python.py (#62626)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62626

Reviewed By: walterddr, zhouzhuojie, seemethere

Differential Revision: D30059119

Pulled By: malfet

fbshipit-source-id: 2aff44c1585091d864ab7e02d69046204e5b5d17
2021-08-02 17:55:24 -07:00
615ac8e573 Added logic for notifying PTE webapp for Nightly and PR builds (#62512)
Summary:
This PR adds the logic to notify the PTE webapp for DevOps PyTorch Nightly and PR builds.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62512

Reviewed By: iramazanli

Differential Revision: D30046165

Pulled By: malfet

fbshipit-source-id: ef7e4848d4db9f38536a647fcd2d8e26cf64b960
2021-08-02 16:44:35 -07:00
db071ef005 [Reland][DDP Communication Hook] Rename 4 Methods of GradBucket Class (#62592)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62592

Reland #62510

`GradBucket` is an important class defined in both C++ and Python, used for PyTorch Distributed Training. We need to rename the following methods for simplicity:
1) get_index -> index
2) is_the_last_bucket_to_allreduce -> is_last,
3) get_per_parameter_tensors -> gradients,
4) get_model_params_for_bucket -> parameters.
ghstack-source-id: 134848352

Test Plan: unit test

Reviewed By: andwgu

Differential Revision: D30049431

fbshipit-source-id: 1bcac331aa30e529b7230e3891bc811c531b0ea9
2021-08-02 16:38:09 -07:00
d228a8fc94 [Vulkan] Softmax Along Channel Dim (#62239)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62239

Added naive implementation of vulkan softmax (not using shared memory)

Based off of naive implementation of mean, found here:

2565a33c98/aten/src/ATen/native/vulkan/glsl/mean.glsl

Test Plan:
After building:

```
build/bin/vulkan_api_test
```

{F637001190}

```
[ RUN      ] VulkanAPITest.softmax
[       OK ] VulkanAPITest.softmax (180 ms)
```

Reviewed By: SS-JIA

Differential Revision: D29793150

fbshipit-source-id: 4f9d8e1dae8a43cbcb7063b095fa4726df06c929
2021-08-02 16:20:44 -07:00
940cbbce76 Add BFloat16 support to CPU nansum (#61083)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61083

It's already supported on CUDA, so it seems reasonable to support it on CPU as
well. This also changes `test_nansum` to compare against `torch.sum`, since numpy
doesn't support BFloat16. Note that `test_nansum_vs_numpy` checks against
NumPy as well, so that's still being tested.
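
A quick illustrative sketch of the newly supported path:

```python
import torch

x = torch.tensor([1.0, float("nan"), 2.0], dtype=torch.bfloat16)
torch.nansum(x)  # tensor(3., dtype=torch.bfloat16): NaNs are treated as zero
```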

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D30006227

Pulled By: heitorschueroff

fbshipit-source-id: 1449730e1936417e7de1f8b3cf8cdcc15518873c
2021-08-02 16:03:57 -07:00
27d3d3a7d7 deploy in python fix to work in @opt mode
Summary: if we let torch_deploy get put in libomnibus, it hides the symbols we need to link against

Test Plan: buck test //caffe2/torch/csrc/deploy:test_deploy_from_python -- --exact 'caffe2/torch/csrc/deploy:test_deploy_from_python - test_deploy_from_python (caffe2.torch.csrc.deploy.test_deploy_from_python.TestDeployFromPython)' --run-disabled

Reviewed By: wconstab

Differential Revision: D30031134

fbshipit-source-id: e5c2f740f17abafec7d01c57c664bd71a00b6f61
2021-08-02 14:47:49 -07:00
a4af91b2fe Cleanup CUDA 10.1 and 10.0 support on CI (#62597)
Summary:
10.1 is removed in https://github.com/pytorch/pytorch/pull/56056

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62597

Reviewed By: walterddr

Differential Revision: D30053902

Pulled By: seemethere

fbshipit-source-id: deb148e5e44c12b08c267a36fbd4a1afa138e6e4
2021-08-02 14:42:25 -07:00
305d5fcc05 [Pytorch Edge] get_model_bytecode int -> uint (#62201)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62201

Change int to uint to match the type used by the runtime's bytecode. This only affects C++, since Python doesn't have uints, iirc. Also changed the behavior of the functions from returning -1 with a warning to throwing an exception. Wasn't sure what the proper behavior here would be (returning UINT_MAX seemed gross), so feedback is appreciated.

Test Plan: ci

Reviewed By: raziel

Differential Revision: D29914072

fbshipit-source-id: 1bb08702fc301d7c7612b5ad7205a6dbe855c890
2021-08-02 14:17:44 -07:00
0c4c37b01e Disable libtorch testing on MacOS (#62599)
Summary:
Fixes regression introduced by https://github.com/pytorch/pytorch/issues/62402

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62599

Reviewed By: walterddr, driazati

Differential Revision: D30051914

Pulled By: malfet

fbshipit-source-id: a07184b21cc4b2d0ae31fe385bb58a0f665595c6
2021-08-02 13:41:18 -07:00
093495d3f0 [fx] prevent implicit submodule inlining when submodule is a GraphModule (#62436)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62436

## Problem

Given two modules and a tracer that indiscriminately marks all modules as a leaf:
```
class InnerModule(torch.nn.Module):

    def forward(self, t):
        return t + t

class MyModule(torch.nn.Module):
    def __init__(self, inner):
        super().__init__()
        self.inner = inner

    def forward(self, t):
        x = self.inner(t)
        y = self.inner(t)
        return x + y

class MyTracer(torch.fx.Tracer):
    def is_leaf_module(self, module, name):
        return True
```

One might generally expect the following behavior (note call_module nodes):
```
print(">> Outer GraphModule (with inner module as nn.Module):")
inner = InnerModule()
m = MyModule(inner)
gm = torch.fx.GraphModule(m, MyTracer().trace(m))
print(gm.graph.print_tabular())

>> Outer GraphModule (with inner module as nn.Module):
opcode         name     target                   args              kwargs
-------------  -------  -----------------------  ----------------  --------
placeholder    t        t                        ()                {}
call_module    inner    inner                    (t,)              {}
call_module    inner_1  inner                    (t,)              {}
call_function  add      <built-in function add>  (inner, inner_1)  {}
output         output   output                   (add,)            {}
None
```

However, when the inner module is first symbolically traced, the symbolic trace of the outer module ignores `is_leaf_module` entirely, and traces through the whole module (note call_function nodes).
```
print(">> Inner module as GraphModule:")
inner = InnerModule()
inner_gm = torch.fx.GraphModule(inner, MyTracer().trace(inner))
print(inner_gm.graph.print_tabular())

print(">> Outer GraphModule (with inner module as GraphModule):")
m = MyModule(inner_gm)
gm = torch.fx.GraphModule(m, MyTracer().trace(m))
print(gm.graph.print_tabular())

>> Inner module as GraphModule:
opcode         name    target                   args    kwargs
-------------  ------  -----------------------  ------  --------
placeholder    t       t                        ()      {}
call_function  add     <built-in function add>  (t, t)  {}
output         output  output                   (add,)  {}
None

>> Outer GraphModule (with inner module as GraphModule):
opcode         name    target                   args          kwargs
-------------  ------  -----------------------  ------------  --------
placeholder    t       t                        ()            {}
call_function  add     <built-in function add>  (t, t)        {}
call_function  add_1   <built-in function add>  (t, t)        {}
call_function  add_2   <built-in function add>  (add, add_1)  {}
output         output  output                   (add_2,)      {}
None
```

This is surprising behavior and at first glance violates the tracer's intent. As I understand it, `torch.fx.symbolic_trace.Tracer.trace` intends to patch `torch.nn.Module.__call__` with a `module_call_wrapper()` that records a `call_module` node if the module is a leaf, and otherwise executes `torch.fx.symbolic_trace._orig_module_call`, which is set to `torch.nn.Module.__call__` at module load time.

**Every submodule should be a leaf, but no `call_module` nodes are created when that submodule is a `GraphModule`. Why?**

Upon further inspection, I found:

- The constructor for GraphModule includes a path to `GraphModule.recompile()` via the setter for a `fx.Graph`:
```
inner_gm = torch.fx.GraphModule(inner, MyTracer().trace(inner))

File "/torch/fx/graph_module.py", line 252, in __init__
self.graph = graph

File "/torch/nn/modules/module.py", line 1183, in __setattr__
object.__setattr__(self, name, value)

File "/torch/fx/graph_module.py", line 277, in graph
self.recompile()
```
- `recompile()` wraps the `__call__` method by holding a reference to the `__call__` method at the time of recompilation:
```
cls = type(self)
cls_call = cls.__call__
...
def wrapped_call(self, *args, **kwargs):
    try:
        return cls_call(self, *args, **kwargs)
    except Exception as e:
        ...
cls.__call__ = wrapped_call
```
- Recompilation of the inner GraphModule happens on initialization, before creation or tracing of the outer module. Adding some old-fashioned print debug statements gives:
```
Inner Module:
_orig_module_call: <function Module._call_impl at 0x7faaebfee8b0>
recompile: cls.__call__ now wraps _orig_module_call, <function Module._call_impl at 0x7faaebfee8b0>

Outer Module:
_orig_module_call: <function Module._call_impl at 0x7faaebfee8b0>
tracing: patching method <class 'torch.nn.modules.module.Module'>.__call__ <function Module._call_impl at 0x7faaebfee8b0> with <function Module._call_impl at 0x7fa9d42bce50>

outer module MRO before tracing:
(0) <class '__main__.MyModule'>: <function Module._call_impl at 0x7faaebfee8b0>
(1) <class 'torch.nn.modules.module.Module'>: <function Module._call_impl at 0x7faaebfee8b0>
(2) <class 'object'>: <method-wrapper '__call__' of type object at 0x7fac3cd15f00>

outer module MRO during tracing:
(0) <class '__main__.MyModule'>: <function Module._call_impl at 0x7fa9d42bce50>
(1) <class 'torch.nn.modules.module.Module'>: <function Module._call_impl at 0x7fa9d42bce50>
(2) <class 'object'>: <method-wrapper '__call__' of type object at 0x7fac3cd15f00>

inner module MRO before tracing:
(0) <class 'torch.fx.graph_module.GraphModule.__new__.<locals>.GraphModuleImpl'>: <function x.y.z.wrapped_call at 0x7fa9d42a8670>
(1) <class 'torch.fx.graph_module.GraphModule'>: <function Module._call_impl at 0x7faaebfee8b0>
(2) <class 'torch.nn.modules.module.Module'>: <function Module._call_impl at 0x7faaebfee8b0>
(3) <class 'object'>: <method-wrapper '__call__' of type object at 0x7fac3cd15f00>

inner module MRO during tracing:
(0) <class 'torch.fx.graph_module.GraphModule.__new__.<locals>.GraphModuleImpl'>: <function x.y.z.wrapped_call at 0x7fa9d42a8670>
(1) <class 'torch.fx.graph_module.GraphModule'>: <function Module._call_impl at 0x7fa9d42bce50>
(2) <class 'torch.nn.modules.module.Module'>: <function Module._call_impl at 0x7fa9d42bce50>
(3) <class 'object'>: <method-wrapper '__call__' of type object at 0x7fac3cd15f00>
```

- The outer module is patched correctly, but the inner module's first element in its MRO is the `wrapped_call` from `recompile` that still invokes `<function Module._call_impl at 0x7faaebfee8b0>` directly. Therefore, no call_module nodes are created.

## In Practice

In practice, this behavior affects the ability of `torch.package` to package `GraphModules` whose submodules are `GraphModules`. In our case, the `GraphModule` submodules are not passed through a constructor, but created separately and installed on the root `GraphModule` via `setattr`. This means that prior to packaging, there appear to be no issues with the module, since the root's graph was created before any call_module targets were replaced with `GraphModules`.

When unpackaging such a model with `torch.package`, `torch.fx.graph_module._deserialize_graph_module` uses an inline `KeepModules` tracer that sets all submodules to leaves; the unpackaged module is implicitly and surprisingly inlined in the process.

## Potential Solution

This behavior was previously not understood by us, and so the current workaround is a gnarly process of wrapping every submodule in an `nn.Module` with a manually installed forward method.

Changing `wrapped_call` to `return super(type(self), self).__call__(*args, **kwargs)` whenever `__call__` is inherited at least appears to solve the issue. Does this seem like an acceptable approach?
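
A sketch of that change, reusing `cls` and `cls_call` from the `recompile()` excerpt above:

```python
def wrapped_call(self, *args, **kwargs):
    try:
        if "__call__" in vars(cls):
            # cls defines its own __call__: keep the captured reference
            return cls_call(self, *args, **kwargs)
        # __call__ is inherited: resolve through the MRO so that a tracer's
        # patch of torch.nn.Module.__call__ actually takes effect
        return super(type(self), self).__call__(*args, **kwargs)
    except Exception as e:
        ...
```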

## Other Thoughts
- Repeated calls to `recompile` create nested `wrapped_calls`, all for the purpose of error handling. This seems probably unnecessary ¯\\_(ツ)\_/¯
- If a root module with an overridden `__call__` method is symbolically traced, the override is ignored

Test Plan:
```
buck test:
    ✓ ListingSuccess: caffe2/test:fx - main (12.570)
    ✓ Pass: caffe2/test:fx - test_tracing_graphmodules_as_leaf_submodules (test_fx.TestFX) (11.982)
```

Reviewed By: ansley

Differential Revision: D29997935

fbshipit-source-id: 1988fbb025b14188da26a3e73e94fb789c3c1f74
2021-08-02 13:37:08 -07:00
dc1bd6acee Remove PROCESS GROUP rpc backend (#62411)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62411

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D29990408

Pulled By: H-Huang

fbshipit-source-id: 183d3b316767b12993cebbe32b73c2850fd1cc42
2021-08-02 12:26:22 -07:00
2ec4f69b48 [DDP Comm Hook] Do not expose hook_then_optimizer as a public method (#62532)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62532

This method is not stable at this time, so avoid releasing it when the DDP communication hook feature is released as a stable feature.
ghstack-source-id: 134787831

Test Plan:
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_ddp_hook_with_optimizer_parity
buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_hook_then_optimizer_nccl

Reviewed By: rohan-varma

Differential Revision: D30031222

fbshipit-source-id: e03a8e13fee5116a5ddd724eb76316ee98f2a676
2021-08-02 12:25:01 -07:00
b161ac541d [reland] Add default Saved Variable hooks (#62563)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62563

Expose a pair of functions to Python users: torch.autograd.graph.set_saved_tensors_default_hooks(pack, unpack) and torch.autograd.graph.reset_saved_tensors_default_hooks().
These functions control the hooks applied to saved tensors: all tensors saved in that context will be packed using the pack function, then unpacked accordingly when needed.

Currently, this works by simply calling register_hooks (cf #60975) directly at the end of the constructor of a SavedVariable. This could be optimized further by not performing the copy before registering default hooks, but this would require a small refactor. Edit: the refactor is done in #61927.

A current limitation is that if users create tensors in this context, they will not be able to register additional hooks on the saved tensor.

For instance, to perform something like #28997, one could define a pack function that saves to disk whenever the tensor size is too big and returns a filename, then unpack simply reads the content of the file and outputs a tensor, e.g.:

```
def pack(x):
    name = os.path.join(tmp_dir, str(uuid.uuid4()))
    torch.save(x, name)
    return name

def unpack(name):
    return torch.load(name)
```
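
Continuing that example, a minimal sketch of wiring the pack/unpack pair through the new functions (`model` and `x` are placeholders):

```python
torch.autograd.graph.set_saved_tensors_default_hooks(pack, unpack)
y = model(x)       # every tensor saved for backward is routed through pack()
torch.autograd.graph.reset_saved_tensors_default_hooks()
loss = y.sum()
loss.backward()    # saved tensors are reloaded on demand via unpack()
```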

Relanding previous PR: https://github.com/pytorch/pytorch/pull/61834

Original PR led to timeout error in: https://www.internalfb.com/mast/job/yuguo-release_canary_offline_training-inlinecvrp_a-canary_offline_train_28a7ecfc

Now passing: https://www.internalfb.com/mast/job/quach-release_canary_offline_training-inlinecvrp_a-canary_offline_train_9bb57e98

The difference with the new version is we don't need to acquire the GIL when calling `PyDefaultSavedVariableHooks::get_hooks`.

Test Plan: Imported from OSS

Reviewed By: iramazanli

Differential Revision: D30045405

Pulled By: Varal7

fbshipit-source-id: 7f6c07af3a56fe8835d5edcc815c15ea4fb4e332
2021-08-02 11:30:26 -07:00
6f95850127 Revert D30024161: [DDP Communication Hook] Rename 4 Methods of GradBucket Class
Test Plan: revert-hammer

Differential Revision:
D30024161 (29c8b1db57)

Original commit changeset: 07e6072a2f7b

fbshipit-source-id: d571c2caadaf7b71fe2aba3c0597bd8074d153de
2021-08-02 10:26:54 -07:00
2e4f566d30 add OpInfo for torch.nn.functional.softplus (#62317)
Summary:
Addresses facebookresearch/functorch#78.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62317

Reviewed By: malfet

Differential Revision: D30013322

Pulled By: zou3519

fbshipit-source-id: e80affd10b81534234694c9e4326cc68c7efc7fe
2021-08-02 09:46:13 -07:00
cb626da145 [fix] mark non-differentiable ops (#62529)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/62506
Fixes https://github.com/pytorch/pytorch/issues/62504

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62529

Reviewed By: albanD

Differential Revision: D30032665

Pulled By: malfet

fbshipit-source-id: 90254c50fb4a873e3eda59c8484626137e01cb31
2021-08-02 09:40:45 -07:00
562b555a2b [CUDA] Fix typo in Normalization.cu (#62515)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62515

**Summary**
This commit fixes an obvious typo in `Normalization.cu` I found while
working on #62452. Since that PR will not be landed anytime soon, I
thought it would be prudent to land this fix.

**Test Plan**
Continuous integration.

Test Plan: Imported from OSS

Reviewed By: makslevental

Differential Revision: D30027324

Pulled By: SplitInfinity

fbshipit-source-id: 9d368a54c13f8e2bf6f6956dfb2bee974226f726
2021-08-02 09:38:46 -07:00
29c8b1db57 [DDP Communication Hook] Rename 4 Methods of GradBucket Class (#62510)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62510

`GradBucket` is an important class defined in both C++ and Python, used for PyTorch Distributed Training. We need to rename the following methods for simplicity:
1) get_index -> index
2) is_the_last_bucket_to_allreduce -> is_last,
3) get_per_parameter_tensors -> gradients,
4) get_model_params_for_bucket -> parameters.

Test Plan:
Local run comprehensive test with following results:
https://pxl.cl/1Ml8b
For two timeout failure test cases, most likely environment related and fail in my devserver.

Reviewed By: SciPioneer

Differential Revision: D30024161

fbshipit-source-id: 07e6072a2f7b81f731425d9b71f8c8b60d383b0f
2021-08-02 09:33:32 -07:00
34cb2b5d04 Update SobolEngine docstring w/ correct behavior (#62548)
Summary:
Sobol was modified to not drop the first point. This update reflects that behavior in the docstring.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62548

Reviewed By: qingfeng10

Differential Revision: D30035627

Pulled By: Balandat

fbshipit-source-id: 64c659ea30c0c929778da3b58041875834e25e9a
2021-08-02 09:04:38 -07:00
2445d5c60a Removed the hypothesis tests for adaptive_avg_pool (#60886)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60886

Remove all the hypothesis tests from test_adaptive_avg_pool2d_nhwc, test_adaptive_avg_pool, and test_adaptive_avg_pool3d_ndhwc.

Test Plan: I tested it with buck test //caffe2/test:quantization and all three tests passed. The tests that failed are test_conv2d_api (test_quantized_functional.py), test_conv3d_api (test_quantized_functional.py),

Reviewed By: wanchaol, jerryzh168

Differential Revision: D29432184

fbshipit-source-id: 2a4c540d2c169aec69cf8d143d5a155394885745
2021-08-02 08:57:14 -07:00
3dc588d577 Fix: no enough space for cu102 debug nightly build (#62465)
Summary:
Fixes #{issue number}
![image](https://user-images.githubusercontent.com/16190118/127632173-783630b7-c644-4239-b1dd-fb12b6bacf83.png)

verification:
https://app.circleci.com/pipelines/github/pytorch/pytorch/358483/workflows/a34a0123-54fe-418f-9211-4af75ee56a70/jobs/15120463

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62465

Reviewed By: iramazanli

Differential Revision: D30045280

Pulled By: janeyx99

fbshipit-source-id: f40090eb02fd1d86033971611d492c7b107cc4bd
2021-08-02 08:44:16 -07:00
51f687fd4b Add overlap with DDP to ZeRO (two approaches) (#62157)
Summary:
**Overview:**
This adds two approaches to overlapping `DistributedDataParallel.backward()` with `ZeroRedundancyOptimizer.step()` by providing two hook constructors: `hook_with_zero_step()` and `hook_with_zero_step_interleaved()`. The former waits for all backward computation to finish before starting optimizer computation, while the latter launches a partial optimizer computation using the contents of a gradient bucket once that bucket's all-reduce completes. The two approaches each suffer from their own weaknesses, and which one to use depends on the specific hardware configuration.

Both approaches can share changes to `ZeroRedundancyOptimizer`. A user should pass `overlap_with_ddp=True` to `ZeroRedundancyOptimizer`, construct a DDP communication hook using either `hook_with_zero_step()` or `hook_with_zero_step_interleaved()`, and register that communication hook. `ZeroRedundancyOptimizer.step()` should still be called in the training loop, though the optimizer computation and communication will be offloaded to originate from the communication hook. Currently, the first two iterations are vacuous, meaning they do not result in parameter updates and the inputs are ignored. This is required to finalize the DDP bucket strategy and to then initialize the `ZeroRedundancyOptimizer`'s local optimizer based on that bucketing.
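
A minimal sketch of this wiring (module paths and signatures are assumptions based on the description; `model`, `rank`, and `inputs` are placeholders):

```python
import torch
from torch.distributed.algorithms.ddp_comm_hooks.default_hooks import allreduce_hook
from torch.distributed.algorithms.ddp_comm_hooks.ddp_zero_hook import hook_with_zero_step
from torch.distributed.optim import ZeroRedundancyOptimizer

ddp_model = torch.nn.parallel.DistributedDataParallel(model, device_ids=[rank])
zero = ZeroRedundancyOptimizer(
    ddp_model.parameters(),
    optimizer_class=torch.optim.Adam,
    overlap_with_ddp=True,
    lr=1e-3,
)
ddp_model.register_comm_hook(None, hook_with_zero_step(allreduce_hook, ddp_model, zero))

for inp in inputs:
    ddp_model(inp).sum().backward()
    zero.step()  # work is driven from the hook; the first two iterations are vacuous
```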

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62157

Test Plan:
The existing `ZeroRedundancyOptimizer` tests pass, and new unit tests for both hooks pass:
- ~~`test_ddp_with_zero_step_parity_cpu`~~ (removed for now due to flakiness in CI -- under investigation, could possibly be similar Gloo issue as with `hook_with_zero_step_interleaved()`)
- `test_ddp_with_zero_step_parity_gpu`
- `test_ddp_with_zero_step_interleaved_parity_gpu`

These were tested on the AI AWS cluster.

An analogous `test_ddp_with_zero_step_interleaved_parity_cpu` is missing due to existing bugs with Gloo. See https://github.com/pytorch/pytorch/pull/62302.

Both approaches have been verified using an internal accuracy benchmark.

Reviewed By: mrshenli

Differential Revision: D29971046

Pulled By: andwgu

fbshipit-source-id: a7234c23c7ea253f144a698fd7e3c0fe039de5e8
2021-08-02 08:33:34 -07:00
ee482edf0a Callable activation function support for Transformer modules (C++) (#62342)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/60747

Enhances the C++ versions of `Transformer`, `TransformerEncoderLayer`, and `TransformerDecoderLayer` to support callables as their activation functions. The old way of specifying activation function still works as well.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62342

Reviewed By: malfet

Differential Revision: D30022592

Pulled By: jbschlosser

fbshipit-source-id: d3c62410b84b1bd8c5ed3a1b3a3cce55608390c4
2021-08-02 08:06:39 -07:00
c9d5325c52 [BE] shorten the name part 1 (#62402)
Summary:
This should address part of https://github.com/pytorch/pytorch/issues/62357.

1. rename all files 'generated-*' to make it clear, filename will not be in CI workflow name
2. remove all 'pytorch-' in names
3. make sure the build test shell scripts are adopted to new name

Next change should reduce more device related naming

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62402

Reviewed By: malfet

Differential Revision: D30021959

Pulled By: walterddr

fbshipit-source-id: 64b21a2020e25a507101c09c010cb593d8ac4146
2021-08-02 07:56:55 -07:00
7565039ee9 Support system-provided Intel TBB (#61934)
Summary:
This PR: (1) enables the use of a system-provided Intel TBB for building PyTorch, (2) removes `tbb:task_scheduler_init` references since it has been removed from TBB a while ago (3) marks the implementation of `_internal_set_num_threads` with a TODO as it requires a revision that fixes its thread allocation logic.

Tested with `test/run_test`; no new tests are introduced since there are no behavioral changes (removal of `tbb::task_scheduler_init` has no impact on the runtime behavior).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61934

Reviewed By: malfet

Differential Revision: D29805416

Pulled By: cbalioglu

fbshipit-source-id: 22042b428b57b8fede9dfcc83878d679a19561dd
2021-08-02 07:39:00 -07:00
bbf6131159 Add factory kwargs test to test_modules (#62340)
Summary:
Adds a new `ModuleInfo`-based test to `test_modules.py`.

The test passes `device` and `dtype` to each module during instantiation, ensuring that the kwargs are applied to any newly-created parameters or buffers. Note that the `device` and `dtype` kwargs should only be present when a module creates parameters or buffers; the test uses some mock magic to identify this.

Originally lifted from `test/test_module_init.py`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62340

Reviewed By: malfet

Differential Revision: D30022543

Pulled By: jbschlosser

fbshipit-source-id: 77e5d46d6b11c16dc39d19a1c650ee48c26c54c1
2021-08-02 06:53:00 -07:00
46b18aa294 [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D30039182

fbshipit-source-id: 3b38fc89585853bb9a5483a0de9ebd6852154a8d
2021-08-02 04:17:10 -07:00
aa5e3ad705 [quant] Support PerChannel quantization in FusedMovingAvgObsFakeQuantize (#62346)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62346

Update the operator code to resize the min/max tensors if per-channel quant is selected. We need to do this because by default the observer creates empty tensors for min/max and scale/zero_point values when per-channel quantization is enabled

Test Plan:
python test/test_quantization.py test_fused_mod_per_channel

Imported from OSS

Reviewed By: HDCharles

Differential Revision: D30003835

fbshipit-source-id: b5ec80261cb50ee543f21191a887e979dcde4667
2021-08-01 21:45:11 -07:00
7adb78017a [contbuild][xplat/caffe2] contbuild with sanitizers (#61724)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61724

To improve the stability of xplat/caffe2 code, we are enabling sanitizers (asan, tsan, ubsan) on contbuild.
ghstack-source-id: 134339882

Test Plan:
```
buck test --show-output --flagfile fbsource//fbcode/mode/dev-asan --config fbsource.sanitizer=address fbsource//xplat/pytorch_models/build/pytorch_model_test/v13:body_tracking_v124_test
clang-9: warning: argument unused during compilation: '-pthread' [-Wunused-command-line-argument]

Downloaded 0/7 artifacts, 0.00 bytes, 100.0% cache miss
Building: finished in 14.5 sec (100%) 4538/4538 jobs, 4 updated
  Total time: 14.5 sec
Testing: finished in 1.1 sec (1 PASS/0 FAIL)
RESULTS FOR //xplat/pytorch_models/build/pytorch_model_test/v13:body_tracking_v124_test
PASS      1.0s  1 Passed   0 Skipped   0 Failed   //xplat/pytorch_models/build/pytorch_model_test/v13:body_tracking_v124_test
TESTS PASSED
```

```
buck test --show-output --flagfile fbsource//fbcode/mode/dev-tsan --config fbsource.sanitizer=thread fbsource//xplat/pytorch_models/build/ads_mai_test_train/v4:model_test
clang-9: warning: argument unused during compilation: '-pthread' [-Wunused-command-line-argument]

Downloaded 3/19 artifacts, 88.30 Kbytes, 66.7% cache miss
Building: finished in 24.0 sec (100%) 4609/4609 jobs, 9 updated
  Total time: 24.9 sec
Testing: finished in 0.9 sec (1 PASS/0 FAIL)
RESULTS FOR //xplat/pytorch_models/build/ads_mai_test_train/v4:model_test
PASS     808ms  1 Passed   0 Skipped   0 Failed   //xplat/pytorch_models/build/ads_mai_test_train/v4:model_test
TESTS PASSED
````

Reviewed By: dhruvbird, albanD

Differential Revision: D29348099

fbshipit-source-id: 3d3255bff0464129745d2ed729d443f3e7470313
2021-08-01 12:02:30 -07:00
32b37ba246 [DDP Communication Hook] Update the typing info of comm hook output as well as some docstring (#62457)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62457

Specify `Future[torch.Tensor]` as the DDP communication hook return type, which is now explicitly a single tensor. The previous API took a list containing a single tensor.

Note that the typing info no longer accepts the internal type `torch._C.Future`, which does not support TorchScript and hence cannot express `Future[torch.Tensor]`.
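
For reference, a minimal sketch of a hook matching the new typing (the bucket accessor name is an assumption):

```python
import torch
from torch.futures import Future

def passthrough_hook(state, bucket) -> Future[torch.Tensor]:
    # wrap the bucket's flattened gradients in an already-completed
    # Future[torch.Tensor], matching the new return type
    fut: Future[torch.Tensor] = Future()
    fut.set_value(bucket.get_tensor())  # accessor name is an assumption
    return fut
```
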
ghstack-source-id: 134771419

Test Plan:
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_default_ddp_comm_hooks_nccl
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_ddp_invalid_comm_hook_return_type

Reviewed By: rohan-varma

Differential Revision: D30007390

fbshipit-source-id: 246667c9b575b4c6e617b0a5b373151f1bd81e7f
2021-07-30 20:51:34 -07:00
72295da6c3 Reformat (#62456)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62456

as title
ghstack-source-id: 134771417

Test Plan: N/A

Reviewed By: rohan-varma

Differential Revision: D30006493

fbshipit-source-id: 1d1dc9cfff69a9b4fa31470177c1f4fa206a94ef
2021-07-30 20:50:19 -07:00
c506769f19 irange-ify 8 (#62422)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62422

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D29879655

fbshipit-source-id: 69fdf0196091932f866bfaba707ad7643790fdd8
2021-07-30 20:31:58 -07:00
bd9f35313a Revert D29922299: [DDP] log bucket sizes
Test Plan: revert-hammer

Differential Revision:
D29922299 (5429f68f00)

Original commit changeset: 538b331c96e7

fbshipit-source-id: 3595fe04e8dea38bc9d05e8c70f2dcd2ad496ced
2021-07-30 20:27:36 -07:00
9df7ac7a94 Port nll_loss_backward to structured (#62144)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62144

Test Plan: Imported from OSS

Reviewed By: heitorschueroff

Differential Revision: D29945279

Pulled By: SplitInfinity

fbshipit-source-id: 2fee60e8424fc590a81767c9b0a8226a0c2fd69c
2021-07-30 19:43:10 -07:00
5429f68f00 [DDP] log bucket sizes (#62232)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62232

Logs the bucket sizes in DDP logging so that we know which workflow ran with what bucket size config. Will be used to verify how changing bucket sizes in DDP affects perf.

Based on the test, we can see an inconsistency in where the "first" bucket size actually comes from (last before rebuilding buckets, first after).
ghstack-source-id: 134663867

Test Plan: CI

Reviewed By: SciPioneer

Differential Revision: D29922299

fbshipit-source-id: 538b331c96e77048164ad130b377433be100a761
2021-07-30 18:07:04 -07:00
63d3da7961 Fix sign comparison (#62194)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62194

Reviewed By: albanD

Differential Revision: D29885396

Pulled By: r-barnes

fbshipit-source-id: 8092f3002474a48fc6b349b9e369c8d59e832fcc
2021-07-30 17:18:05 -07:00
2006dc6316 [3/N] Remove unittest.skip from torch/testing/_internal distributed files. (#61991)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61991

Continuation of https://github.com/pytorch/pytorch/pull/61887 and
removing unittest.skip as much as possible.
ghstack-source-id: 134759368

Test Plan: waitforbuildbot

Reviewed By: SciPioneer

Differential Revision: D29831860

fbshipit-source-id: fe57a7d56d4423924a2dec10bb670137ace0c9a4
2021-07-30 16:40:43 -07:00
7521addede [deploy] loader cleanup (#62223)
Summary:
Some refactoring of the custom loader logic:

* Make sure we unregister frames when they are deleted so that future exceptions do not attempt to read unallocated memory
* rename linker -> loader to make its name more correct
* move the build of the loader into lib deploy since it can be shared across interpreters
* unify the logic for finding the library symbol across ops and fbcode

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62223

Test Plan: Imported from OSS

Reviewed By: wconstab

Differential Revision: D29922002

Pulled By: zdevito

fbshipit-source-id: b7f8ee5812e29a5d098fcf1bd9f4cea7d30ecb4c
2021-07-30 16:34:13 -07:00
174433267c [dte] fastpath implementation for broadcast utility function (4/x) (#62493)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62493

This diff adds a broadcast fastpath for the caffe2 broadcast utility function, which just copies the contents of a smaller tensor into a larger one. We also update the tests to exercise the new functionality.

Test Plan: unit tests + let CI run

Differential Revision: D29938285

fbshipit-source-id: 543ecc548500380e307be91902696033454964a2
2021-07-30 16:15:10 -07:00
08539ca047 Add non-context manager usage support for profiler (#61690)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/60238, https://github.com/pytorch/kineto/issues/329

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61690

Reviewed By: malfet

Differential Revision: D30016561

Pulled By: ngimel

fbshipit-source-id: 93a578ffbb556f4b584213ac9cfafcc5cf0a9270
2021-07-30 15:54:36 -07:00
6441caeaa7 Use multi-dimensional cuFFT transforms to improve FFT performance (#61203)
Summary:
Benchmark and numerical accuracy tests on A100 and V100 are available at https://github.com/xwang233/code-snippet/tree/master/fft-61203.

I've checked the FFT results for different shapes/dims and different `dim` arg for `rfftn` and `irfftn` before and after this PR, and they all numerically matched.

With this PR, about 10%~15% performance gain is expected on commonly used shapes and dims.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61203

Reviewed By: heitorschueroff

Differential Revision: D29996244

Pulled By: zou3519

fbshipit-source-id: 02c9862eaa1ad8f2ae9c7f7448aeb9e23bcda276
2021-07-30 14:54:04 -07:00
73c46092f1 [pytorch] sort the output of the model_dump util (#62485)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62485

Make it easier to browse the code section by sorting the files by name.

Test Plan: Imported from OSS

Reviewed By: dhruvbird, malfet

Differential Revision: D30016245

Pulled By: ljk53

fbshipit-source-id: c9cb3c1ad9bcaa5337a6ad5c575ac0c240751f6c
2021-07-30 14:40:07 -07:00
49060aa81a Revert D29999785: Reland D29943356: .github: Migrate ecr_gc to github actions
Test Plan: revert-hammer

Differential Revision:
D29999785 (49dc827712)

Original commit changeset: bb9285076551

fbshipit-source-id: c26b39fb2d3c361015ce7f205d3f5f4232845289
2021-07-30 14:33:12 -07:00
43d4fe68cd [Foreach] support implicit broadcasting in slow path (#62167)
Summary:
This PR has foreach functions support implicit broadcasting via slow path.

rel: https://github.com/pytorch/pytorch/issues/58833

cc: ptrblck  ngimel

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62167

Reviewed By: mruberry

Differential Revision: D30005109

Pulled By: ngimel

fbshipit-source-id: f48c0a13e304411763541ffcfcfc6154adb26bac
2021-07-30 13:29:56 -07:00
70f57bcb1e [PyTorch] Fix quantized Conv1d module parameters (#62356)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62356

In `torch/nn/quantized/module/conv.py`, Conv1d turns a scalar `kernel_size` into a tuple of size 2 by repeating the `kernel_size` value. This logic breaks `Conv1d` because internally the input with shape N, C, L is unsqueezed to N, C, 1, L in [`qconv.cpp`](06dfaadfc6/aten/src/ATen/native/quantized/cpu/qconv.cpp (L841)). Applying the aforementioned kernel to this input shape produces a negative output shape in [`ConvUtils.h`](203f7ff6e0/include/fbgemm/ConvUtils.h (L118-L119)) if kernel_size > 1.

Here I'm modifying the processing logic for `kernel_size` and a few other parameters to follow the pattern of [`torch/nn/module/conv.py`](aae2a3c95e/torch/nn/modules/conv.py (L284-L287)).
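
A minimal sketch of the corrected parameter handling, following the referenced float `Conv1d` pattern:

```python
from torch.nn.modules.utils import _single

# a scalar kernel_size becomes a 1-tuple rather than a repeated pair; the
# internal N, C, L -> N, C, 1, L unsqueeze then sees the intended kernel
kernel_size = _single(3)  # (3,) -- previously (3, 3), which broke Conv1d
stride = _single(1)       # (1,)
padding = _single(0)      # (0,)
```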

Test Plan: Rely on unit test

Reviewed By: kimishpatel

Differential Revision: D29957556

fbshipit-source-id: ae13f7ca892d60b82cfffdf972cce422ebfaae8e
2021-07-30 12:27:52 -07:00
eac288ea77 [Pytorch Backend Delegation] Annotate function args with type information (#62433)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62433

Without type information, default type is Tensor which may conflict at runtime.

Test Plan: CI

Reviewed By: raziel

Differential Revision: D29990902

fbshipit-source-id: 0a38843d7d0612a458bb38fad7c86bad08c7197b
2021-07-30 11:34:40 -07:00
f16c73b9f3 Improve error messages of torch.testing.assert_close for sparse inputs (#61583)
Summary:
This utilizes the feature introduced in https://github.com/pytorch/pytorch/issues/60091 to modify the header of the error message.

Before:

```python
AssertionError: Tensor-likes are not equal!

Mismatched elements: 1 / 2 (50.0%)
Greatest absolute difference: 1 at index 1
Greatest relative difference: 0.3333333432674408 at index 1

The failure occurred for the values.
```

After:

```python
AssertionError: Sparse COO values of tensor-likes are not equal!

Mismatched elements: 1 / 2 (50.0%)
Greatest absolute difference: 1 at index 1
Greatest relative difference: 0.3333333432674408 at index 1
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61583

Reviewed By: malfet

Differential Revision: D30014797

Pulled By: cpuhrsch

fbshipit-source-id: 66e30645e94de5c8c96510822082ff9aabef5329
2021-07-30 11:23:26 -07:00
8a9dfa52e9 Delete an unused variable
Summary: This was set twice but never used. Delete it.

Test Plan: NFC

Reviewed By: smeenai

Differential Revision: D30000794

fbshipit-source-id: 084d16da914febec58c4cb5f452c37027275cd08
2021-07-30 11:10:38 -07:00
73ba166e2a fix(elastic-docs): Fix elastic launch doc (#62378)
Summary:
The documentation link should be https://pytorch.org/docs/stable/elastic/run.html

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62378

Reviewed By: aivanou

Differential Revision: D30002830

Pulled By: kiukchung

fbshipit-source-id: 34b434acaa10222561df43f6397a2420eef02015
2021-07-30 10:58:13 -07:00
635e63c53d irange-ify 15 (#62123)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62123

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D29879765

fbshipit-source-id: eda8e641e9fd06e16ad71b8144332f253537955a
2021-07-30 10:41:33 -07:00
3c0c1c4ecb Fix incorrectly sized tensors for svd when full_matrices=False (#62022)
Summary:
Before this PR, for an m x n input matrix the returned matrices were always allocated as m x m and n x n and then narrowed.
This unnecessarily requires a lot of memory that is then discarded.
With this PR, when `compute_uv=True` and `full_matrices=False`, correctly sized tensors are allocated. Moreover, if `compute_uv=False`, the U and V matrices are not allocated at all since they are not needed. However, cuSOLVER's gesvdj routines fail when these matrices are not allocated, which is a bug, so the allocation is done separately in the cuSOLVER-specific code path.
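
An illustrative sketch of the resulting shapes (not the PR's test code):

```python
import torch

a = torch.randn(10000, 64)
U, S, Vh = torch.linalg.svd(a, full_matrices=False)
print(U.shape, S.shape, Vh.shape)  # (10000, 64), (64,), (64, 64)

# when only singular values are needed, U and V are no longer materialized
_, S2, _ = torch.svd(a, compute_uv=False)
```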

MAGMA doesn't work for this input because it tries to allocate a large matrix internally (ROCm doesn't work as it uses MAGMA). Example error:
```
CUBLAS error: memory mapping error (11) in magma_sgelqf at /opt/conda/conda-bld/magma-cuda110_1598416697386/work/src/sgelqf.cpp:161
CUBLAS error: out of memory (3) in magma_sgeqrf2_gpu at /opt/conda/conda-bld/magma-cuda110_1598416697386/work/src/sgeqrf2_gpu.cpp:145
CUBLAS error: not initialized (1) in magma_sgeqrf2_gpu at /opt/conda/conda-bld/magma-cuda110_1598416697386/work/src/sgeqrf2_gpu.cpp:145
MAGMA error: function-specific error, see documentation (1) in magma_sgeqrf2_gpu at /opt/conda/conda-bld/magma-cuda110_1598416697386/work/src/sgeqrf2_gpu.cpp:145
MAGMA error: function-specific error, see documentation (1) in magma_sgeqrf2_gpu at /opt/conda/conda-bld/magma-cuda110_1598416697386/work/src/sgeqrf2_gpu.cpp:145
python: /opt/conda/conda-bld/magma-cuda110_1598416697386/work/interface_cuda/interface.cpp:806: void magma_queue_create_internal(magma_device_t, magma_queue**, const char*, const char*, int): Assertion `queue->dAarray__ != __null' failed.
Aborted (core dumped)
```

Fixes https://github.com/pytorch/pytorch/issues/61949.
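
For illustration, a minimal sketch (not from this PR; assumes a CUDA device, and `torch.linalg.svdvals` for the values-only case) of the shapes involved:

```python
import torch

A = torch.randn(10000, 32, device="cuda")  # m x n with m >> n
U, S, Vh = torch.linalg.svd(A, full_matrices=False)
# reduced factors only: (m, k), (k,), (k, n) with k = min(m, n)
print(U.shape, S.shape, Vh.shape)  # (10000, 32), (32,), (32, 32)

# when U and V are not needed, svdvals avoids allocating them
S_only = torch.linalg.svdvals(A)
```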

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62022

Reviewed By: heitorschueroff

Differential Revision: D29994429

Pulled By: ngimel

fbshipit-source-id: c3f7744d7adc5fd6787f6cbb1ec41405f89a6d4c
2021-07-30 10:27:13 -07:00
26d2f4acb2 Quick fix to make torch.tensor work with functorch (#62423)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62423

Fixes https://github.com/facebookresearch/functorch/issues/7.

functorch uses FuncTorchDynamicLayerBackMode as a mode key to wrap all
tensors returned from operators in special TensorWrapper tensor
extension.

The problem with this is that TensorWrapper does not have storage, so
accessing the data_ptr (for recursive_store) triggers an internal assert.

As a quick hack, the added guard prevents functorch from wrapping the
empty tensor in a TensorWrapper; instead, when `tensor.to` is called later,
the tensor gets wrapped. This is effectively what Ed proposed in
https://github.com/facebookresearch/functorch/issues/7#issuecomment-847501020

In the long term we probably want some better way of extending
`internal_new_from_data` for cases like this (where there is a
mode-based dispatch key for a C++ tensor extension -- the Python case
may be different).

Test Plan: - Verified that this fixes functorch's problem

Reviewed By: malfet

Differential Revision: D29992607

Pulled By: zou3519

fbshipit-source-id: 82b713156a37d7470f8fc46e3803ee7353689a33
2021-07-30 10:15:23 -07:00
8c4d8c29e4 [2/n] add test ATen to wheel test (#62341)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/62380

* This PR introduces the env variable IN_WHEEL_TEST to control the dependency on the `build/` folder
* updates the `test_aten` function to use the wheel install folder `{sitepackages}/torch` instead of the `build/` folder

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62341

Test Plan: check if all ci workflows pass

Reviewed By: walterddr

Differential Revision: D30004259

Pulled By: tktrungna

fbshipit-source-id: ccebd513a3530f1e5c8c9177d5f2daf14de3e853
2021-07-30 10:09:09 -07:00
d08165dfdf [fx2trt] Add op converters for ads 23x dense arch
Summary:
Adding 4 converters for
1. torch.addmm
2. torch.mul
3. torch.t
4. torch.sigmoid

Test Plan:
fx2trt unittests

Able to lower dense arch with fx2trt locally.

Reviewed By: ajtulloch, yinghai

Differential Revision: D29563962

fbshipit-source-id: 114c4e871efb25379043224f5f0116829cd7dc50
2021-07-30 09:26:11 -07:00
d783617216 enable warnings on cuda synchronization (#62092)
Summary:
This creates a `torch.cuda.set_warn_on_synchronization()` function that warns or errors when a synchronizing operation is performed. We could wrap it in a context manager for ease of use, but that would be a lie, because it sets global, not thread-local, state. Since it's intended for debugging, maybe that's ok though.
As with all `torch.cuda.*` functions, it goes through CPython, not pybind, so the argument is converted to a long before being passed to the c10 function. I'll make the Python argument a Python enum class, but without pybind it will still have to go through the long conversion.

For a test script
```
import torch
torch.cuda.set_warn_on_synchronization(1)
x=torch.randn(10, device="cuda")
x.nonzero()
y=torch.randn((), device="cuda")

if y:
    print("something")
torch.multinomial(x.abs(), 10, replacement=False)
torch.randperm(20000, device="cuda")
ind = torch.randint(10, (3,), device="cuda")
mask = torch.randint(2, (10,), device="cuda", dtype=torch.bool)
val = torch.randn((), device="cuda")
x[mask]=1.
x[mask] = val
torch.cuda.synchronize()
```
the output is
```
/../playground/sync_warn_test.py:4: UserWarning: called a synchronizing operation (Triggered internally at  ../c10/cuda/CUDAFunctions.cpp:145.)
  x.nonzero()
/../playground/sync_warn_test.py:7: UserWarning: called a synchronizing operation (Triggered internally at  ../c10/cuda/CUDAFunctions.cpp:145.)
  if y:
something
/../playground/sync_warn_test.py:9: UserWarning: called a synchronizing operation (Triggered internally at  ../c10/cuda/CUDAFunctions.cpp:145.)
  torch.multinomial(x.abs(), 10, replacement=False)
/../playground/sync_warn_test.py:15: UserWarning: called a synchronizing operation (Triggered internally at  ../c10/cuda/CUDAFunctions.cpp:145.)
  x[mask] = val
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62092

Reviewed By: mruberry

Differential Revision: D29968792

Pulled By: ngimel

fbshipit-source-id: cc6f817212c164727ed99ecf6ab050dc29631b9e
2021-07-30 09:13:01 -07:00
273188549f pass through *EXITCODE *EXITCODE__TRYRUN_OUTPUT variables (#49646)
Summary:
This is needed to allow cross compiling to work

There are some `try_run` statements in the CMake files used for building pytorch and its dependencies. Since we are cross compiling, there's no way to run the compiled executables to get the output for the `try_run` function. CMake provides a solution to this by requiring the user to manually provide the exit code and the output of the executable, which should be given by `*EXITCODE` and `*EXITCODE__TRYRUN_OUTPUT` respectively.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49646

Reviewed By: heitorschueroff

Differential Revision: D29960301

Pulled By: malfet

fbshipit-source-id: b10ab9c182d1220f7e1911f922e7db261d521145
2021-07-30 08:22:33 -07:00
b3781f0244 Remove faulty process group agent logic (#62409)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62409

This a reland of #61907 because removing process_group_agent.h / cpp broke facebook specific tests. I will remove the files and update the internal test code in a separate PR.

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D29990001

Pulled By: H-Huang

fbshipit-source-id: 2ee333322247d8b72691152308c3297e8c0c006d
2021-07-30 08:12:48 -07:00
ee7d19ac29 add OpInfo for torch.nn.functional.one_hot (#62253)
Summary:
Addresses facebookresearch/functorch#78.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62253

Reviewed By: heitorschueroff

Differential Revision: D29992924

Pulled By: zou3519

fbshipit-source-id: 1fc81edf3c8ca0722c5db0b32929a4cb3285f634
2021-07-30 07:05:29 -07:00
09d10c4329 OpInfo for nn.functional.softmax (#62077)
Summary:
This PR:

* Adds OpInfo for `softmax` and `nn.functional.softmax` (alias).
* Skip removal for `test_jit_alias_remapping` test of `log_softmax`.

Please see https://github.com/facebookresearch/functorch/issues/78 and https://github.com/pytorch/pytorch/issues/54261.

cc: mruberry zou3519 pmeier

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62077

Reviewed By: heitorschueroff

Differential Revision: D29990019

Pulled By: zou3519

fbshipit-source-id: 67476990b54a5dd824eed9d10236e118564f2501
2021-07-30 06:56:03 -07:00
9fdf7ec6a2 [docs] Update sphinx to 3.5.4 (#61601)
Summary:
Sphinx 4.x is out, but it seems that requires many more changes to
adopt. So instead use the latest version of 3.x, which includes
several nice features.

* Add some noindex directives to deal with warnings that would otherwise
  be triggered by this change due to conflicts between the docstrings
  declaring a function and the autodoc extension declaring the
  same function.
* Update distributions.utils.lazy_property to make it look like a
  regular property when sphinx autodoc inspects classes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61601

Reviewed By: ejguan

Differential Revision: D29801876

Pulled By: albanD

fbshipit-source-id: 544d2434a15ceb77bff236e934dbd8e4dbd9d160
2021-07-30 06:23:10 -07:00
e352585f67 Clean up running smoke tests logic for Windows GHA (#62344)
Summary:
Followup to https://github.com/pytorch/pytorch/issues/62288

Front-loads the logic and also forces smoke tests to run on only one shard.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62344

Test Plan: Note that for the windows cuda10 run on PR, we get only 1 shard with the smoke tests running: https://github.com/pytorch/pytorch/pull/62344/checks?check_run_id=3194294041

Reviewed By: seemethere, heitorschueroff

Differential Revision: D29991573

Pulled By: janeyx99

fbshipit-source-id: 263d7de72c7a82a7205932914c32d39892294cad
2021-07-30 05:00:56 -07:00
329426c249 Fix cppdoc example syntax (#62385)
Summary:
a simple fix

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62385

Reviewed By: suo

Differential Revision: D30000982

Pulled By: heitorschueroff

fbshipit-source-id: e2e1c9efba3734b58d9b5f358c01d12c2c8c91ff
2021-07-30 04:36:55 -07:00
d57ce8cf89 [Linalg] Add cusolver syevjBatched path for torch.linalg.eigh when cuda >= 11.3 U1 (#62003)
Summary:
This PR adds the `cusolverDn<T>SyevjBatched` function to the backend of `torch.linalg.eigh` (the eigenvalue solver for Hermitian matrices). Using the heuristics from https://github.com/pytorch/pytorch/pull/53040#issuecomment-788264724 and my local tests, the `syevj_batched` path is only used when `batch_size > 1` and `matrix_size <= 32`. This would give us a huge performance boost in those cases.

Since there were known numerical issues with cusolver `syevj_batched` before cuda 11.3 update 1, this PR only enables the dispatch when the cuda version is at least that.

See also https://github.com/pytorch/pytorch/issues/42666 #47953 https://github.com/pytorch/pytorch/issues/53040
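
For illustration, a minimal sketch (not from this PR) of the regime the new path targets, per the heuristics above (batch_size > 1 and matrix_size <= 32):

```python
import torch

A = torch.randn(4096, 16, 16, device="cuda")
A = A + A.transpose(-2, -1)   # make each matrix in the batch symmetric
w, v = torch.linalg.eigh(A)   # batched eigenvalues / eigenvectors
```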

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62003

Reviewed By: heitorschueroff

Differential Revision: D30006316

Pulled By: ngimel

fbshipit-source-id: 3a65c5fc9adbbe776524f8957df5442c3d3aeb8e
2021-07-30 00:35:21 -07:00
956c22b1f9 [dte] fastpath implementations for mulgrad / divgrad (3/x) (#62437)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62437

In this diff we add a broadcast fastpath for MulGradient and DivGradient ops, whose tests we update to exercise the new functionality.

Test Plan: Added test cases to elementwise ops (which will exercise the new MulGradient / DivGradient broadcast fastpath functionality) that will be run by CI. It's worth noting there's still no code (outside of the new test cases) that takes the new code paths added -- the user must explicitly request  allow_broadcast_fastpath=True, and nothing outside of the added tests currently does so.

Differential Revision: D29938273

fbshipit-source-id: 281c1a109e38c25b9bf9ff8d832de60ac3c231a9
2021-07-30 00:05:34 -07:00
607d720be1 Remove an unused variable
Summary: This is set but never used

Test Plan: NFC

Reviewed By: smeenai

Differential Revision: D30000830

fbshipit-source-id: 702d6f7b844b52bfe696725a6b0a055d494b739a
2021-07-29 23:10:03 -07:00
cfd0f5ebc9 [quant] update per-channel observer min/max_val attribute names (#62345)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62345

This PR updates the attribute names from min_vals to min_val. The motivation is to keep the attribute names consistent with per-tensor observers so that dependencies (like FusedMovingAvgObsFakeQuantize) don't need to differentiate between the two observer types to access the attributes.

It also adds some BC tests to make sure that observers saved earlier with min_vals/max_vals can be loaded depending on the state_dict version.
Note: Scriptability of the observers isn't fully supported yet, so we aren't testing for that in this PR.
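
For illustration, a minimal sketch (not from this PR) of the now-consistent attribute access:

```python
import torch
from torch.quantization.observer import MinMaxObserver, PerChannelMinMaxObserver

per_tensor = MinMaxObserver()
per_channel = PerChannelMinMaxObserver()
x = torch.randn(4, 8)
per_tensor(x)
per_channel(x)
# both observer types now expose min_val / max_val
# (previously min_vals / max_vals for the per-channel variant)
print(per_tensor.min_val, per_tensor.max_val)
print(per_channel.min_val, per_channel.max_val)
```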

Test Plan:
python test/test_quantization.py TestSerialization

Imported from OSS

Reviewed By: HDCharles

Differential Revision: D30003700

fbshipit-source-id: 20e673f1bb15e2b209551b6b9d5f8f3be3f85c0a
2021-07-29 22:28:53 -07:00
d92301dd02 [sharded_tensor] add new init_from_local_shards API (#60479)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60479

This added `init_from_local_shards` API to construct a ShardedTensor from local_shards and global sharded_tensor_metadata. It also refactors the utils in ShardingSpec to be able to be used by sharded_tensor for sanity check purpose.

Test Plan:
test_init_from_local_shards
test_init_from_local_shards_invalid_sharding

Reviewed By: pritamdamania87

Differential Revision: D29276777

fbshipit-source-id: 011c1d70426bc560a59b8d858c68f1aa12db8481
2021-07-29 22:04:13 -07:00
bc787f2402 Fix setArgumentNames and make Script/Python consistent (#62442)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62442

For PythonMethodWrapper::setArgumentNames, make sure to use the correct method
specified by method_name_ rather than using the parent model_ obj which itself
_is_ callable, but that callable is not the right signature to extract.

For Python vs Script, unify the behavior to avoid the 'self' parameter, so we only
list the argument names to the unbound arguments which is what we need in practice.

Test Plan: update unit test and it passes

Reviewed By: alanwaketan

Differential Revision: D29965283

fbshipit-source-id: a4e6a1d0f393f2a41c3afac32285548832da3fb4
2021-07-29 21:29:06 -07:00
725d98bab6 [Prototype] [PyTorch Edge] Speed up model loading by 12% by directly calling the C file API from FileAdapter (#61997)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61997

After profiling the model loading latency on AI Bench (Android Galaxy S8 US), it seems like a significant amount of time was spent reading data using FileAdapter, which internally calls IStreamAdapter. However, IStreamAdapter uses `std::istream` under the hood, which is not that efficient. This change reduces the model loading time from [~293ms](https://www.internalfb.com/intern/aibench/details/600870874797229) to [~254ms](https://www.internalfb.com/intern/aibench/details/163731416457694), which is a reduction of ~12%.
ghstack-source-id: 134634610

Test Plan: See the AI Bench links above.

Reviewed By: raziel

Differential Revision: D29812191

fbshipit-source-id: 57810fdc1ac515305f5504f88ac5e9e4319e9d28
2021-07-29 20:14:49 -07:00
693d8f2f07 [PyTorch Edge] Cache operator lambda during model loading [7% faster model loading] (#61996)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61996

A recent post https://fb.workplace.com/groups/pytorch.edge.users/posts/2012215235600341/ about slow model loading with an accompanying perf report (report.html) caused me to look at the report and find hot spots during model loading. This suggested that we spend quite a bit of time looking up operators from the dispatcher. This means that we can probably just cache the operator handler functions (instead of computing them every time the operator name shows up, since it potentially shows up multiple times in a given model).

This diff results in an approx 7% speedup in model loading time (from [315ms](https://www.internalfb.com/intern/aibench/details/45077128343028) to [293ms](https://www.internalfb.com/intern/aibench/details/600870874797229)) when run against an 87MB speech model that jiatongzhou provided.

See https://fb.workplace.com/groups/pytorch.dev/posts/855724575006024/ for the previous post from jiatongzhou.
ghstack-source-id: 134634612

Test Plan:
Run using AI Bench.

### Speech Transducer v25 model (87MiB)

Followed up with jiatongzhou and he gave me his speech model. For posterity, here's how to fetch it (you don't need to since I uploaded it to NMLML and now has a permanent Everstore Handle):

```
cd /tmp/
mkdir speech_model
cd speech_model
fbpkg fetch speech.stella.neural_transducer.on_device.en_us:25
cp pytorchmodel.pt ~/speech_transducer_v25_pytorchmodel.ptl
```

Here's how to build and run the benchmark using AI Bench:

```
buck run aibench:run_bench -- -b aibench/specifications/models/pytorch/speech_transducer/v25.json --framework pytorch --platform android/arm64 --devices "S8US" --force_profile --remote
```

Reviewed By: raziel

Differential Revision: D29826210

fbshipit-source-id: 134b67eb466e73f0e43447b9b966278f13c4b56f
2021-07-29 20:14:47 -07:00
0b3f42fa4f [PyTorch Edge] Add test for lite interpreter operator caching (#62306)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62306

Test to see if caching of operators works as expected. When caching operators during model load we look up using the operator name. This test ensures that even if there are multiple operators with the same name (in the same model), the caching distinguishes between the ones that have a different number of arguments specified during the call in the serialized bytecode.

In this specific test, there's a model with 3 methods, 2 of which return a `float32` tensor and one of which returns a tensor with `int64` dtype. Please see the comments in the diff for details.

ghstack-source-id: 134634613

Test Plan:
Test command:

```
cd fbsource/fbcode/
buck test mode/dev //caffe2/test/cpp/jit:jit -- --exact 'caffe2/test/cpp/jit:jit - LiteInterpreterTest.OperatorCacheDifferentiatesDefaultArgs'
```

```
cd fbsource/
buck test xplat/caffe2:test_lite_interpreter
```

Reviewed By: raziel

Differential Revision: D29929116

fbshipit-source-id: 1d42bd3e6d33128631e970c477344564b0337325
2021-07-29 20:14:45 -07:00
0bbdf0e1e3 [PyTorch Edge] Add test_lite_interpreter to fbsource xplat BUCK files (#62305)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62305

Currently, it's super time consuming to run a lite interpreter test from fbcode since it takes > 10 minutes to build. Recently, I haven't been able to do that either due to low disk space.

Having this test available in fbsource/xplat/ is a great win for productivity since I can re-run it in ~2 minutes even after significant changes!

I've had to disarm some tests that can only run in OSS or fbcode builds (since they need functionality that we don't include for on-device FB builds). They are disarmed using the macro `FB_XPLAT_BUILD`.

ghstack-source-id: 134634611

Test Plan: New test!

Reviewed By: raziel, JacobSzwejbka, cccclai

Differential Revision: D29954943

fbshipit-source-id: e55eab14309472ef6bc9b0afe0af126c561dbdb1
2021-07-29 20:13:06 -07:00
90977e10ed Remove an unused variable
Summary: This is defined and then set once but never actually used. Kill it here.

Test Plan: NFC

Reviewed By: smeenai

Differential Revision: D29994983

fbshipit-source-id: 0cb7383b3ec95f1aeed5210974bc95060cf10be5
2021-07-29 18:04:01 -07:00
74291c8347 [quant][graphmode][fx] Fix the calls to load_arg in quantization_patterns.py (#62376)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62376

`load_arg(quantized=...)` accepts a dictionary from index to dtype, not a list of
dtypes; the call is just to make sure the inputs are quantized with the correct
dtype

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: heitorschueroff

Differential Revision: D29979711

fbshipit-source-id: 8499976ac5df8eb2019c3beae573dec6c9a56247
2021-07-29 17:28:07 -07:00
eef85f89b9 [dte] broadcast fastpath implementations for reduce utility functions (2/x) (#62428)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62428

In this diff we add a broadcast fastpath for reduce utility functions. These functions are used by various elementwise ops, whose tests we update to exercise the new functionality.

Test Plan: Added test cases to elementwise ops (which will exercise the new reducer functionality) that will be run by CI. It's worth noting there's still no code (outside of the new test cases) that takes the new code paths added -- the user must explicitly request  `allow_broadcast_fastpath=True`, and nothing outside of the added tests currently does so.

Differential Revision: D29938264

fbshipit-source-id: 5d5542bd93afb85fd9f7a4073f766adc07eb3b65
2021-07-29 17:27:39 -07:00
219917706e [quant][graphmode] Add support for reference pattern for default ops (#62375)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62375

Default ops are ops that have one quantized input and one quantized output,
e.g. gelu, silu, leaky_relu, etc.; we need to insert an observer for the output

Test Plan:
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D29979712

fbshipit-source-id: ed88210a9d6f1ab5cdb9397b4ff7f1628162ef22
2021-07-29 17:27:37 -07:00
acba9b3104 [DDP Communication Hook] Simplify the implementation of parseHookResult of PythonCommHook (#62389)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62389

Simplify the implementation of `parseHookResult` of `PythonCommHook`, by not directly accepting the output of allreduce, which is a tensor list.

Address the comment on https://github.com/pytorch/pytorch/pull/62074#discussion_r675303280

Additionally, formatter is also applied to `OptimizerHookState` and `hook_then_optimizer`.
ghstack-source-id: 134626246

Test Plan:
buck test mode/dev-nosan caffe2/test/distributed:c10d
buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork

Reviewed By: rohan-varma

Differential Revision: D29982485

fbshipit-source-id: 5b27cc5ef09d2f87c1ade4c0feef7eacc1af3a9a
2021-07-29 17:27:35 -07:00
554daef820 Reformat test_c10d_nccl.py and distributed_test.py (#62388)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62388

as title
ghstack-source-id: 134626247

Test Plan: N/A

Reviewed By: rohan-varma

Differential Revision: D29984086

fbshipit-source-id: 0960e5acc379ccdf08813387e11d3fb0a5f0e4b0
2021-07-29 17:27:33 -07:00
9fee176be3 [Model Averaging] Fix docstring of PeriodicModelAverager (#62392)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62392

The constructor of `PeriodicModelAverager` does not need to accept parameters.
ghstack-source-id: 134626245

Test Plan: buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork --  test_periodic_model_averager

Reviewed By: rohan-varma

Differential Revision: D29986446

fbshipit-source-id: 6a8b709e4383a3c44b9e60955fbb067cd2868e76
2021-07-29 17:26:27 -07:00
8f519c5e07 [quant][graphmode] Add support for reference pattern for torch.cat (#62374)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62374

Test Plan:
python test/test_quantization.py TestQuantizeFxOps.test_cat

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D29979713

fbshipit-source-id: 2d38991f96fcca783169ffd306bc2b0fb7debc69
2021-07-29 16:31:09 -07:00
502823c201 Change torch::Tensor to at::Tensor to fix build failure (#62425)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62425

Fixes https://github.com/pytorch/pytorch/issues/62416

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D30000948

Pulled By: heitorschueroff

fbshipit-source-id: 07dfc88a01b7718bc32be4342f43bb2cf2842b60
2021-07-29 16:31:08 -07:00
49dc827712 Reland D29943356: .github: Migrate ecr_gc to github actions (#62438)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62438

Switches out BASH_ENV for GITHUB_ENV

This reverts commit 1f1d01df3ec06046880d0a92b930fbd051d60606.

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: heitorschueroff

Differential Revision: D29999785

Pulled By: seemethere

fbshipit-source-id: bb92850765518005a3f530264643959e5038e681
2021-07-29 16:31:06 -07:00
dc8b5db5f8 [quant][graphmode] relax the constraint for supported_dtypes for reference option (Linear and Conv) (#62348)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62348

Originally we had a supported_dtypes check for linear and conv, but it's only valid for the non-reference option.
This PR removes the constraint when is_reference=True and enables producing reference patterns for the dtype
combinations that are not supported by fbgemm/qnnpack, for example qint8 activation dtypes

Test Plan:
python test/test_quantization.py TestQuantizeFx.test_linear_qint8_activation

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D29968675

fbshipit-source-id: 2abe37940eb62e16fcf0cbb700c174de49719223
2021-07-29 16:31:04 -07:00
9f9244aabe [dte] scaffolding for c2 operator broadcasting fastpath (1/x) (#62369)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62369

This diff is a big no-op that just sets up scaffolding for passing the "allow_broadcast_fastpath" flag from caffe2 operator protos created in Python down to C++. To facilitate this, we create helper template wrappers that pass a flag for "allow_broadcast_fastpath" down to elementwise functors. This flag determines whether to try to take the broadcast fastpath, which we will add in subsequent diffs.

Test Plan: sandcastle + let github CI run

Differential Revision: D28154475

fbshipit-source-id: 15750a0bcd2994fbc6a61fb5653d8cae6b0177dd
2021-07-29 16:31:02 -07:00
5c47038d12 Back out D29792193 "Add default Saved Variable hooks" (#62415)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62415

test error

Differential Revision: D29990361

fbshipit-source-id: 99c87dec6c5be6496c9db5c9205c3cb72a953dd9
2021-07-29 16:31:00 -07:00
dcfcefcd0b Back out D29848525 "Catch saved tensors default hooks race condition" (#62414)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62414

test error

Differential Revision: D29990348

fbshipit-source-id: 1a7c668153ad7ad9e847dd1a74db678e787b6b0e
2021-07-29 16:29:46 -07:00
389380ffcc [reland] Refactor Tensor::to to call a primitive that is not copy_. (#62262)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62262

Context
-------
functorch is unable to vmap(grad(f)) when f contains a .to
call. This is because .to (when it is not a no-op) decomposes
to .copy_ under grad and the .copy_ is not compatible with vmap.

Fix
 ---
The fix for this is to have all Tensor::to variants call a new operator,
`_to_copy`, that always copies and is a primitive w.r.t. autograd so
that autograd decomposes Tensor::to into a call to `_to_copy`.
(This is related to https://github.com/pytorch/pytorch/issues/60956,
please let me know if you want to bikeshed the naming).

In order to get this done I had to do a bit of refactoring. All of the
`::to` implementations now call `to_impl` which may call `_to_copy`.

Autograd codegen changes
------------------------

The second thing I had to do was modify the autograd codegen. Right now,
autograd assumes that every output is either statically known to be
differentiable or not differentiable at codegen time. `_to_copy` is a
little special because its differentiability depends on the output
dtype. e.g. `torch.randn(3, requires_grad=True).to(torch.long)` is non
differentiable. To get this to work:
- I changed how `output_differentiability` in derivatives.yaml work.
- output_differentiability can now accept "conditions" for each of the
output arguments. A "condition" is some C++ code.
- We currently only support `output_differentiability` with conditions
if there is a single output. This is for convenience and can be changed
in the future.
- I added a new `output_differentiability_conditions` field to
DifferentiabilityInfo. This gets populated in load_derivatives.yaml
- forward-mode and reverse-mode AD take
`output_differentiability_conditions` into account.

Here's how the generated code for `VariableType::_to_copy`
[looks
like](https://gist.github.com/zou3519/93462df4bda1837acee345205b7cc849)
No other autogenerated code gets modified by this PR.

Performance benchmarking
------------------------
- I benchmarked [three
cases that demonstrate overhead](https://gist.github.com/zou3519/5b6985e6906b80eec5a0dd94ed5b6a1a).
- Case A: No-op .to(). Instruction count went from 50223 to 25623. I
have no clue why but this is a good thing.
- Case B: not-no-op .to(). Instruction count went from 665291 to 671961.
This is expected; `_to_copy` adds an additional dispatch.
- Case C: not-no-op .to() forward pass and backward pass. Instruction count
went from 4022841 to 4030057. This PR adds
an additional dispatch to .to() (so there should be one additional
dispatch in the forward pass) so this number looks reasonable.

Test Plan
---------
- test_torch.py has a test_to
- test_cuda.py has test_to*
- test_autograd has tests (test_type_conversions) that exercise the
reverse-mode path
- test_ops.py has some tests (like log_softmax) that exercise the
reverse-mode and forward-mode AD path.
- test_quantization, test_namedtensor all exercise tensor.to as well.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D29934998

Pulled By: zou3519

fbshipit-source-id: 820069acd66fd5af97b98f42edfca68572c9eb1c
2021-07-29 10:49:32 -07:00
7b6d569a2b [jit] Renamed prim::Concat as prim::VarConcat (#61983)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61983

Trial #2. The previous PR (https://github.com/pytorch/pytorch/pull/61498) was reverted because this caused a failure in `pytorch_linux_backward_compatibility_check_test`. Fixed that now by adding to the exception list in `check_backward_compatibility.py`.

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D29828830

Pulled By: navahgar

fbshipit-source-id: 947a7b1622ff6e3e575c051b8f34a789e105bcee
2021-07-29 10:28:59 -07:00
5ede826178 Fix alpine ecr image pull (#62413)
Summary:
Fixes alpine ecr image pull in the render_test_result step

![image](https://user-images.githubusercontent.com/658840/127527503-e88f198d-a8d5-4d3b-a064-096dca07d713.png)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62413

Reviewed By: malfet

Differential Revision: D29990844

Pulled By: zhouzhuojie

fbshipit-source-id: ff420f57d5e4b80d0ebf73508001a127649e9eb2
2021-07-29 10:20:13 -07:00
a42345adee Support for target with class probs in CrossEntropyLoss (#61044)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/11959

Alternative approach to creating a new `CrossEntropyLossWithSoftLabels` class. This PR simply adds support for "soft targets" AKA class probabilities to the existing `CrossEntropyLoss` and `NLLLoss` classes.

The implementation is dumb and simple right now, but future work can add higher-performance kernels for this case.
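
For illustration, a minimal sketch (not from this PR) of the new target form; shapes and values are arbitrary:

```python
import torch
import torch.nn as nn

# the target may now be class probabilities of shape (N, C)
# instead of class indices of shape (N,)
loss_fn = nn.CrossEntropyLoss()
logits = torch.randn(4, 5, requires_grad=True)           # (N, C)
soft_targets = torch.softmax(torch.randn(4, 5), dim=1)   # rows sum to 1
loss = loss_fn(logits, soft_targets)
loss.backward()
```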

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61044

Reviewed By: zou3519

Differential Revision: D29876894

Pulled By: jbschlosser

fbshipit-source-id: 75629abd432284e10d4640173bc1b9be3c52af00
2021-07-29 10:04:41 -07:00
dd0ef23a85 Delete .clang-tidy-oss (#62373)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62373

Internal clang-tidy can handle all the options after  D29863426 was deployed

Test Plan: CI

Reviewed By: 1ntEgr8

Differential Revision: D29978471

fbshipit-source-id: ea531734ab4fc3e0a26552bd24846b22c2e5c745
2021-07-29 09:30:18 -07:00
7157ad44bc Fix windows ci squid env (#62353)
Summary:
This is a re-land of https://github.com/pytorch/pytorch/pull/62244; notable changes are

- Use jinja2 variables to DRY the settings
- Added no_proxy for common destinations that don't fit into proxy (e.g. the magic settings from [aws link](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/http_proxy_config.html#windows-proxy))
- Try to trigger windows GHA CI flows
- Also went through the actionlint for github action linting errors

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62353

Reviewed By: driazati

Differential Revision: D29970842

Pulled By: zhouzhuojie

fbshipit-source-id: b9c457b0005bb1a64850949a56679d68fbb281d6
2021-07-29 09:20:30 -07:00
80a662e773 ENH Updates docs and tests for classification modules that already support no batch dims (#61874)
Summary:
Towards https://github.com/pytorch/pytorch/issues/60585

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61874

Reviewed By: heitorschueroff

Differential Revision: D29979977

Pulled By: jbschlosser

fbshipit-source-id: 82c19151aa7220e564337b05d7677d52981e0aa2
2021-07-29 09:14:52 -07:00
b9f02778b2 Forward fix mypy for #61820 (#62398)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62398

Test Plan: Imported from OSS

Reviewed By: malfet, anjali411

Differential Revision: D29988610

Pulled By: ejguan

fbshipit-source-id: 700dfa5b1c415bc058390bbe5727a739c8419b0f
2021-07-29 07:43:12 -07:00
2d103025a5 Adding warning on isend about modifying after send (#61875)
Summary:
This is a standard limitation on communication collective libraries. For example:

https://www.open-mpi.org/doc/v4.0/man3/MPI_Isend.3.php
```
A nonblocking send call indicates that the system may start copying data out of the send buffer. The sender should not modify any part of the send buffer after a nonblocking send operation is called, until the send completes.
```

http://openucx.github.io/ucx/api/latest/html/group___u_c_p___c_o_m_m.html#ga8323878b60f426c630d4ff8996ede3cc
```
The user should not modify any part of the buffer after this operation is called, until the operation completes.
```
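
For illustration, a minimal sketch (not from this PR; assumes an initialized process group) of the safe usage pattern:

```python
import torch
import torch.distributed as dist

t = torch.ones(4)
if dist.get_rank() == 0:
    req = dist.isend(t, dst=1)
    req.wait()   # do not modify `t` before the send completes
    t += 1       # safe only after wait()
else:
    req = dist.irecv(t, src=0)
    req.wait()
```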

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61875

Reviewed By: suo

Differential Revision: D29783720

Pulled By: mrshenli

fbshipit-source-id: 78fd047c74449f77b906f3766a6c2bc29499847d
2021-07-29 07:37:18 -07:00
945d40dca6 Also disable inplace fw AD for acos on windows (#62360)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/62304

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62360

Reviewed By: malfet, bdhirsh

Differential Revision: D29973310

Pulled By: albanD

fbshipit-source-id: 3b033e779f557724602c5a87f497698f2262a12e
2021-07-29 06:42:25 -07:00
1b147a52f5 Allow FX tracer to trace control flow (if/while) statements when parameter shapes are in the conditionals (#61820)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/61733

Allow FX tracer to trace control flow (if/while) statements when parameter shapes are in the condition.
If the user specifies the new "param_shapes_constant" option when constructing a tracer,  the model's parameter shape attribute will be evaluated and the resulting constant will be emitted into the IR during tracing.
Also added a new test

`
python test/fx/test_fx_param_shape_control_flow.py
`
The test also performs somewhat whitebox-style testing to check the Python code generated from the IR.
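
For illustration, a minimal sketch (not from this PR) of the new option:

```python
import torch
import torch.fx

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.w = torch.nn.Parameter(torch.randn(8, 4))

    def forward(self, x):
        # control flow on a parameter shape: evaluated as a constant
        # when param_shapes_constant=True
        if self.w.shape[0] > 4:
            return x + 1
        return x - 1

tracer = torch.fx.Tracer(param_shapes_constant=True)
graph = tracer.trace(M())
```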

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61820

Reviewed By: bdhirsh

Differential Revision: D29969299

Pulled By: jerryzhenleicai

fbshipit-source-id: 99aae824bdfec880be69258de7ead5c8cd59eddc
2021-07-28 23:48:44 -07:00
4ed8858817 Exclude time of waiting in queue from gloo communication prof… (#61342)
Summary:
Background:
    The gloo communication implementation is as follows:
        1. Construct communication workers and push them into a queue.
        2. Initialize a thread pool; each thread runs a loop to get a worker from the queue and execute it.
Issue:
        The recorded profiling time span starts at worker construction and ends at finish, so it includes the time the worker spends waiting in the queue. This results in multiple gloo communication time spans overlapping with each other in the same thread in the timeline:
![image](https://user-images.githubusercontent.com/62738430/124867273-5bc95b80-dff0-11eb-8664-6e5d4166fc39.png)
This is because while the next work is waiting in the queue, the last work has not finished.

Solution:
     This PR delays the profiling start time of gloo communication from worker construction to when the worker is actually executed, so the profiling span no longer includes the time spent waiting in the queue. Implementation as follows:
             1. First, disable the original record function by passing 'nullptr' to the 'profilingTitle' argument of ProcessGroup::Work.
             2. Construct a 'recordFunctionBeforeCallback_' and 'recordFunctionEndCallback_' and save them as members of the worker.
             3. When the worker is executed, invoke 'recordFunctionBeforeCallback_'.
             4. 'recordFunctionEndCallback_' is invoked at finish, as before.
      After this modification, the gloo profiling spans in the timeline no longer overlap with each other:
![image](https://user-images.githubusercontent.com/62738430/124868716-bb286b00-dff2-11eb-9cf0-d0494a356d0c.png)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61342

Reviewed By: albanD

Differential Revision: D29811656

Pulled By: gdankel

fbshipit-source-id: ff07e8906d90f21a072049998400b4a48791e441
2021-07-28 22:24:26 -07:00
35307b131d Callable activation function support for Transformer modules (Python) (#61355)
Summary:
Fixes Python part of https://github.com/pytorch/pytorch/issues/60747

Enhances the Python versions of `Transformer`, `TransformerEncoderLayer`, and `TransformerDecoderLayer` to support callables as their activation functions. The old way of specifying the activation function still works as well.
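
For illustration, a minimal sketch (not from this PR) showing both forms:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# old style: a string name; new style: any callable
layer_str = nn.TransformerEncoderLayer(d_model=32, nhead=4, activation="gelu")
layer_fn = nn.TransformerEncoderLayer(d_model=32, nhead=4, activation=F.gelu)
out = layer_fn(torch.randn(10, 2, 32))  # (seq, batch, d_model)
```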

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61355

Reviewed By: bdhirsh

Differential Revision: D29967302

Pulled By: jbschlosser

fbshipit-source-id: 8ee6f20083d49dcd3ab432a18e6ad64fe1e05705
2021-07-28 21:42:56 -07:00
1f2b96e7c4 [DDP] Make compute_bucket_assignment_by_size return per bucket sizes (#62231)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62231

`compute_bucket_assignment_by_size` is responsible for setting per-bucket size limits; return this information from the function so that we are aware of the size limit for each bucket.

This is currently not being consumed, but will be in the next diff when we log bucket size limits to DDP logging. This will help us run experiments under different bucket size configs and analyze the impact.
ghstack-source-id: 134480575

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D29919056

fbshipit-source-id: dd5a096fa23d22e5d9dc1602899270a110db4a19
2021-07-28 20:21:01 -07:00
c76daa6de3 [DDP][ez] Remove misleading comment (#62230)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62230

We don't iterate over model replicas anymore.
ghstack-source-id: 134475834

Test Plan: CI

Reviewed By: SciPioneer

Differential Revision: D29918760

fbshipit-source-id: 84bde670b4e91667a49f94f1b597fad364498467
2021-07-28 20:20:59 -07:00
842228fd0d [DDP] Save bucket size limits (#62229)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62229

First of a stack of diffs to save and log the bucket size limits to help debug/discover discrepancies and analyze impact of bucket size tuning
ghstack-source-id: 134475835

Test Plan: CI

Reviewed By: SciPioneer

Differential Revision: D29918629

fbshipit-source-id: b9b3f9a5658340a4c7fd72874c2254664e3c52e9
2021-07-28 20:19:56 -07:00
cac4aa71ca Provide option to pass module instance to _load_state_dict_pre_hooks. (#62070)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62070

We have a custom Tensor:
https://github.com/pytorch/pytorch/blob/master/torch/distributed/_sharded_tensor/api.py#L67,
which doesn't show up in state_dict for the module. This was resolved by
using the _register_state_dict_hook:
https://github.com/pytorch/pytorch/blob/master/torch/nn/modules/module.py#L1196
to parse and add custom tensors to state_dict.

However, the problem is that during load time, _register_load_state_dict_pre_hook:
https://github.com/pytorch/pytorch/blob/master/torch/nn/modules/module.py#L1272,
does not pass in the module instance, and as a result a ShardedTensor in the
state_dict cannot be appropriately added to a module at load time.

To resolve this issue, in this PR I've enhanced this hook to support two
variations, one which passes in the module instance (for the problem described
above) and one is the previous version for BC reasons.
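
For illustration, a hedged sketch of the new variant (the `with_module` flag name follows this diff's API; treat it as an assumption):

```python
import torch.nn as nn

def hook(module, state_dict, prefix, local_metadata, strict,
         missing_keys, unexpected_keys, error_msgs):
    # `module` is now available, so custom tensors (e.g. ShardedTensor)
    # can be attached to the module at load time
    pass

m = nn.Linear(4, 4)
m._register_load_state_dict_pre_hook(hook, with_module=True)
```
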
ghstack-source-id: 134541391

Test Plan:
1) unit tests
2) waitforbuildbot

Reviewed By: jbschlosser

Differential Revision: D29867142

fbshipit-source-id: bcb136ff51eedd0b508cfb419e8b8a6b7d95539c
2021-07-28 19:22:47 -07:00
2eaf71d749 [Model Averaging] Update model averager API to avoid the redundant params arg needed by post-localSGD optimizer (#62132)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62132

as title

Proposal: https://github.com/pytorch/pytorch/issues/59699
ghstack-source-id: 134560541

Test Plan:
buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_post_localSGD_optimizer_parity

buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_periodic_model_averager

Reviewed By: rohan-varma

Differential Revision: D29887751

fbshipit-source-id: 60dadb04790d800fdcc7cb8a08d060e411718739
2021-07-28 18:43:09 -07:00
55bee44951 [Model Averaging] Post-localSGD optimizer (#62131)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62131

Wrap `PeriodicModelAverager` as an optimizer.

Currently both the optimizer and averager require an input `params` arg, where the latter actually can read params from the optimizer wrapper. Will update averager class API in a follow-up PR.

Proposal: https://github.com/pytorch/pytorch/issues/59699
ghstack-source-id: 134560248

Test Plan: buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_post_localSGD_optimizer_parity

Reviewed By: rohan-varma

Differential Revision: D29881465

fbshipit-source-id: b9634972f4d8bffd3b3eb94f5dbbb19db2bcd759
2021-07-28 18:42:06 -07:00
58d45d950b [DDP] Log unused param names under DETAIL debug mode. (#62209)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62209

When `TORCH_DISTRIBUTED_DEBUG=DETAIL` is set, log names and indices of unused parameters when searching for them.

The motivation is that we have occasionally seen issues where errors relate to a parameter possibly being marked as unused when it shouldn't be; this can help narrow down the root cause by explicitly logging the param names that are marked as unused.
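
For illustration, a minimal sketch (not from this PR) of enabling the mode:

```python
import os

# set before constructing DistributedDataParallel so that unused parameter
# names and indices are logged during the search
os.environ["TORCH_DISTRIBUTED_DEBUG"] = "DETAIL"
```
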
ghstack-source-id: 134541461

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D29916085

fbshipit-source-id: d84cf637cbbd811521e6264ffd6c50ca8a79595b
2021-07-28 18:10:32 -07:00
24ed6e6b16 Add actionlint (#62364)
Summary:
This adds a linter for our GitHub actions. When a GitHub Actions workflow has an invalid definition, GitHub doesn't queue the job and doesn't report it as failed, so these can be hard to detect with the usual tools. This adds an explicit job to check if our workflow YAMLs are valid using [https://github.com/rhysd/actionlint](https://github.com/rhysd/actionlint). We deployed a similar check in pytorch/test-infra [here](https://github.com/pytorch/test-infra/pull/89).

This PR enables the linter and fixes all the issues it complained about (it did already catch one bug where we were leaving `CIRCLE_BRANCH` blank when uploading binary size)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62364

Reviewed By: zhouzhuojie

Differential Revision: D29973928

Pulled By: driazati

fbshipit-source-id: 83b365e98fd6cbdcd75eeb44daf1be1c89056f8d
2021-07-28 17:10:20 -07:00
fcc7fbe15f Split zeta_kernel out of BinaryMiscOpsKernel.cu (#62261)
Summary:
`BinaryMiscOpsKernel.cu` takes 4 m 30 s to compile on my machine, which is the second slowest after `PowKernel.cu`. Moving the zeta kernel into its own file takes 3 m 30 s, and reduces `BinaryMiscOpsKernel.cu` compile time to 1 m.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62261

Reviewed By: bdhirsh

Differential Revision: D29969350

Pulled By: ngimel

fbshipit-source-id: 37cad5775088b2f7d22948414e4bf0427f88e07d
2021-07-28 16:07:15 -07:00
f6e137598d ns for fx: fix nit in default qlinear weight extraction function (#62334)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62334

Removes the assert for node type in default qlinear weight extraction
function. Without the assert, user defined functions can now use
this util function without failing this check.

Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs

// further tests will be in follow-up fb-only diffs
```

Imported from OSS

Reviewed By: hx89

Differential Revision: D29963501

fbshipit-source-id: a634eabb5165375bde186438318ec52fa29c970f
2021-07-28 16:07:13 -07:00
72c943a2ac ns for fx: fix bug for user function in weight extraction (#62333)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62333

We incorrectly ignored any custom relationships the user specified
in the `extract_weights` API.  Fixing this and adding a test case.

Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_user_defined_function
```

Imported from OSS

Reviewed By: hx89

Differential Revision: D29963502

fbshipit-source-id: 33ce3d4df1acb6298b6c7dcb6674015c8d14bdf4
2021-07-28 16:05:51 -07:00
d98b1c400d [pruner] add cuda tests for pruner (#61993)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61993

Repeating `test_pruner` unit tests for Linear and Conv2d models with device = 'cuda' to confirm pruner will work on GPU
- set device to cuda
- move model to device
- assert that module.weight.device is cuda
ghstack-source-id: 134554382

Test Plan:
`buck test mode/dev-nosan //caffe2/test:ao -- TestBasePruner`

https://pxl.cl/1Md9c

Reviewed By: jerryzh168

Differential Revision: D29829293

fbshipit-source-id: 1f7250e45695d0ad634d0bb7582a34fd1324e765
2021-07-28 14:45:04 -07:00
b39b28ced3 irange-ify 10 (#62122)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62122

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D29879694

fbshipit-source-id: 87cd8ab17061129c835d9f961b67587c84d181d1
2021-07-28 13:35:23 -07:00
88f8f2ab94 irange-ify 6 (#62115)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62115

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D29879576

fbshipit-source-id: 63cbf0ab5a52325fa2c3dec0e8239e2eac1ecf72
2021-07-28 13:32:11 -07:00
9e77113e85 irange-ify 11 (#62121)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62121

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D29879701

fbshipit-source-id: 5c51879c88fa6a5790db241c8b33ec0dc4b177ca
2021-07-28 13:32:09 -07:00
b5867a1b34 irange-ify 7 (#62117)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62117

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D29879640

fbshipit-source-id: 189578a57301747a3421742e145bbcdf2ad75c49
2021-07-28 13:30:39 -07:00
59bb4f2dab Revert D29928698: [pytorch][PR] Use private squid proxy
Test Plan: revert-hammer

Differential Revision:
D29928698 (6da4a25509)

Original commit changeset: 4ee78be0abe3

fbshipit-source-id: 44679a2b247ba8163f09895d9d36ecf5df4390b8
2021-07-28 12:35:55 -07:00
3a2603bc68 Port slow_conv_transpose2d to structured (#55503)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55503

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D29945028

Pulled By: SplitInfinity

fbshipit-source-id: 0b696d104938287444210f1bc926afc13f899991
2021-07-28 12:03:03 -07:00
05b802d4e0 [pytorch] Bring back RemoveInplaceOps() (#62200)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62200

This commit brings back the `RemoveInplaceOps` pass removed in D29523283 (dec5aa2260) that apparently had a bunch of internal users.

Test Plan: danthe3rd

Reviewed By: danthe3rd

Differential Revision: D29833316

fbshipit-source-id: 6cf13d463ab0a5e50ba3eb3243f79a9c51623809
2021-07-28 12:00:38 -07:00
b91a917616 [Static Runtime] Fixed another build failure in OSS due to test_utils.h (#62338)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62338

Test Plan: Imported from OSS

Reviewed By: d1jang

Differential Revision: D29965744

Pulled By: navahgar

fbshipit-source-id: cf3e54ac13432ea8afc4b718fac6c9768743d01b
2021-07-28 11:41:33 -07:00
7c588d5d00 ENH Adds no_batch_dim support for pad 2d and 3d (#62183)
Summary:
Towards https://github.com/pytorch/pytorch/issues/60585

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62183

Reviewed By: ejguan

Differential Revision: D29942250

Pulled By: jbschlosser

fbshipit-source-id: d1df4ddcb90969332dc1a2a7937e66ecf46f0443
2021-07-28 11:10:44 -07:00
6da4a25509 Use private squid proxy (#62244)
Summary:
This PR adds a **private** squid proxy (note that the internal ELB is only accessible from the private VPC subnets of GitHub Runners) that's deployed dedicated for PyTorch CI for GitHub runners.

```
dig $SQUID_PROXY

10.0.x.x
10.0.x.x
```

http_proxy and https_proxy are compatible with the following http clients:

- curl
- wget
- python

Existing cache policy:

refresh_pattern -i .(7z|deb|rpm|exe|zip|tar|tgz|gz|ram|rar|bin|tiff|bz2|run|csv|sh)$ 1440 80% 2880
It uses the standard squid refresh_pattern for cache requests. In our setup, we tried
to cache for at least 1440 minutes (1 day) and at most 2880 minutes (2 days), with
a last-modified factor of 80% (squid doc). Please refer to pytorch/test-infra for details.

Right now, it only applies to the build and test step, to limit the scope and make sure build and test are more reliable with egress cache.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62244

Test Plan:
```
# first time, cache miss (4min20s)
http_proxy=$SQUID_PROXY https_proxy=$SQUID_PROXY curl -v -L http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz --output /tmp/tmp_mnist.zip
100 9680k  100 9680k    0     0  37836      0  0:04:21  0:04:21 --:--:-- 29908

# second time, cache hit (0s)
http_proxy=$SQUID_PROXY https_proxy=$SQUID_PROXY curl -v -L http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz --output /tmp/tmp_mnist.zip
100 9680k  100 9680k    0     0   103M      0 --:--:-- --:--:-- --:--:--  103M
```

Load Test Plan:
```
# ab load test with `-n 100` requests
ab -X $SQUID_PROXY -n 100 http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz

Concurrency Level:      1
Time taken for tests:   9.044 seconds
Complete requests:      100
Failed requests:        0
Total transferred:      991326300 bytes
HTML transferred:       991242200 bytes
Requests per second:    11.06 [#/sec] (mean)
Time per request:       90.442 [ms] (mean)
Time per request:       90.442 [ms] (mean, across all concurrent requests)
Transfer rate:          107040.50 [Kbytes/sec] received
```

Reviewed By: malfet

Differential Revision: D29928698

Pulled By: zhouzhuojie

fbshipit-source-id: 4ee78be0abe35411666c6121991b0addded57106
2021-07-28 10:37:42 -07:00
2581dfc249 [Model Averaging] Create a base class for model averaging (#62111)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62111

This base class will be passed to the post-localSGD optimizer in the next PR. This way, the same post-localSGD optimizer can choose different model averaging algorithms.

Proposal: https://github.com/pytorch/pytorch/issues/59699
ghstack-source-id: 134489187

Test Plan: buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_periodic_model_averager

Reviewed By: rohan-varma

Differential Revision: D29884954

fbshipit-source-id: 1dc5e35c58895902991567f633afd621c7108938
2021-07-28 10:15:36 -07:00
a15fff0a7f Revert D29794666: Remove faulty process group code
Test Plan: revert-hammer

Differential Revision:
D29794666 (afe3644321)

Original commit changeset: 0b35191cc072

fbshipit-source-id: 6467bc5100f4115f2fdb385e205740cd68c89743
2021-07-28 10:15:34 -07:00
71a6ef17a5 ENH Adds no_batch_dim tests/docs for Maxpool1d & MaxUnpool1d (#62206)
Summary:
Towards https://github.com/pytorch/pytorch/issues/60585

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62206

Reviewed By: ejguan

Differential Revision: D29942341

Pulled By: jbschlosser

fbshipit-source-id: a3fad774cee30478f7d6cdd49d2eec31be3fc518
2021-07-28 10:15:32 -07:00
cdf85a82ed [quant][graphmode][fx] Add reference pattern support for BatchNorm (#62215)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62215

including batchnorm2d, batchnorm3d, batchnormrelu2d and batchnormrelu3d

Test Plan:
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D29917524

fbshipit-source-id: 3a9520ff659cb21e6e2fe614973b3d08aa0af923
2021-07-28 10:14:16 -07:00
7443c90f15 optimize non lastdim softmax bf16 (#60371)
Summary:
Here is the PR to enable softmax calculation with the `bfloat16` data type when not along the last dim.
* Use a bf16 specialization for the forward calculation to reduce the bf16/fp32 casts in the vec template.
* Remove the bf16 limitation for the backward calculation.
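
For illustration, a minimal sketch (not from this PR) of the case being optimized:

```python
import torch

x = torch.randn(128, 256).to(torch.bfloat16)
y = torch.softmax(x, dim=0)  # bf16 softmax along a non-last dim
```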

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60371

Reviewed By: ejguan

Differential Revision: D29563109

Pulled By: cpuhrsch

fbshipit-source-id: f6b439fa3850a6c633f35db65ea3d735b747863e
2021-07-28 10:06:51 -07:00
68efa186cc [static runtime] Implement aten::full (#62227)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62227

Test Plan: Added `StaticRuntime.IndividualOps_Full` to cover the newly added code path.

Reviewed By: hlu1

Differential Revision: D29923649

fbshipit-source-id: 722950137c35ae325590a670b97f03b395e8eac3
2021-07-28 09:50:27 -07:00
10c6811a6b [DDP] Run test_ddp_new_tensor_in_fwd with static graph (#61992)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61992

This test previously was not enabled for static graph, but to ensure
this feature is supported with DDPSink, enable it for static graph, which
currently passes outputs to DDPSink.
ghstack-source-id: 134471406

Test Plan: CI

Reviewed By: zhaojuanmao

Differential Revision: D29830887

fbshipit-source-id: 2d3f750d9eb4289558ed21acccd172d83d9b82cc
2021-07-28 09:49:12 -07:00
acf8907e94 These should be equivalent per the previous formula but breaks xla (#62329)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62329

Reviewed By: ejguan

Differential Revision: D29961527

Pulled By: albanD

fbshipit-source-id: 46e46726591f4c0c8faf6ec0d7136a2d4ca976ea
2021-07-28 09:23:51 -07:00
f4baa83eae [bc-breaking] reference option for conv produce a pattern instead of reference conv module (#61942)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61942

This PR changes is_reference=True for conv to produce a pattern consisting of dequant - float conv - quant instead of a reference conv module. This is useful for future transformations to custom backends, and it also helps simplify the implementation of
convert in the future.

Test Plan:
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D29810656

fbshipit-source-id: 549237a62bfda4341a2a7474c124f5e33350e267
2021-07-28 09:13:40 -07:00
52d1ffb789 Teach pytrees about namedtuple (#62292)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62292

This PR adds pytree support for namedtuples. The challenge about namedtuple
is that each namedtuple class is actually different. This PR does the
following:
- it adds a namedtuple flatten/unflatten. The flatten function returns
a context that is the actual type of the namedtuple subclass. The
unflatten function uses that type to reconstruct the namedtuple
- Special cases all pytree logic to consider all namedtuples the same.
This is done by creating a `_get_node_type(pytree)` helper function that
returns `namedtuple` if `pytree` is any namedtuple subclass. The effect
of this is that all namedtuple subclasses will go through the namedtuple
flatten/unflatten functions
- Adds a `_namedtuple_flatten_spec` function for FX pytrees. This function
flattens the namedtuple based on the spec and is equivalent to the
`_tuple_flatten_spec`.
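
For illustration, a minimal sketch (not from this PR) of flattening and unflattening a namedtuple:

```python
import collections
from torch.utils._pytree import tree_flatten, tree_unflatten

Point = collections.namedtuple("Point", ["x", "y"])

leaves, spec = tree_flatten(Point(x=1, y=(2, 3)))
print(leaves)                        # [1, 2, 3]
print(tree_unflatten(leaves, spec))  # Point(x=1, y=(2, 3))
```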

Test Plan
- new tests in test/test_pytree.py and test/test_fx.py

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D29947302

Pulled By: zou3519

fbshipit-source-id: 19c00665b13546642c315df0f243ad99b8e7ff7c
2021-07-28 06:27:44 -07:00
c06b6e445f Build M1 binaries with PocketFFT (#62222)
Summary:
As MKL is only available on the x86_64 platform, clone the header-only PocketFFT
library and use it as the FFT provider

Fixes https://github.com/pytorch/pytorch/issues/62107

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62222

Reviewed By: ejguan

Differential Revision: D29938718

Pulled By: malfet

fbshipit-source-id: ac0bd98b5090d6c8a26c36c4e34a4d6e1d9f1a92
2021-07-27 22:41:29 -07:00
cb2b5f06c9 Revert D29816592: [pytorch][PR] [fix] polygamma n>=1
Test Plan: revert-hammer

Differential Revision:
D29816592 (b73d759708)

Original commit changeset: 2c020a6e4c32

fbshipit-source-id: 310c93ade300966366ef04f206a5908fb27745db
2021-07-27 22:14:10 -07:00
73f1e2d1dc [8/N] Nnapi backend delegation preprocess: New refactored design (#62225)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62225

Rewrote the preprocess function for Android NNAPI delegate.
Previously, `preprocess()` called `convert_model_to_nnapi()` using Pybind and returned a NnapiModule that is serialized for mobile. Now, `preprocess()` calls a sub-function of `convert_model_to_nnapi()` and returns several preprocessed items (that were previously components of NnapiModule).

The dictionary returned contains:
```
"shape_compute_module": torch::jit::Module,
"ser_model": torch::Tensor,
"weights": List[torch.Tensor],
"inp_mem_fmts": List[int],
"out_mem_fmts": List[int]
```

**Purpose and Future:**
The purpose of these changes is to move more of the implementation from bytecode and TorchScript to the delegate API, since bytecode is less efficient.
Now, only the shape computation uses bytecode. In the future, shape computation will be moved out of Torchscript as well.

**nnapi_backend_preprocess.cpp:** preprocess implementation
**prepare.py**: refactored a portion of `convert_model_to_nnapi()` to `process_for_nnapi()`, so preprocess can get components of NnapiModule

**Test:**
Ran `python test/test_jit.py TestNnapiBackend` and `python test/test_nnapi.py` on OSS successfully
ghstack-source-id: 134444190

Test Plan: Ran `python test/test_jit.py TestNnapiBackend` and `python test/test_nnapi.py` on OSS successfully

Reviewed By: raziel

Differential Revision: D29922279

fbshipit-source-id: cadcf8908d8a745dc7abbe286e97d6ead937d4ab
2021-07-27 18:52:48 -07:00
7aabda6d5d Update nccl to v2.10.3-1 (#62276)
Summary:
Which, at the time of creating this PR, points to 7e51592129

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62276

Reviewed By: ngimel

Differential Revision: D29940950

Pulled By: malfet

fbshipit-source-id: 59c6fda76a9023af3adbfb5a96b83ca50950df6c
2021-07-27 18:32:53 -07:00
1f1d01df3e Revert D29943356: .github: Migrate ecr_gc to github actions
Test Plan: revert-hammer

Differential Revision:
D29943356 (8e0622abf1)

Original commit changeset: 493592baf2f7

fbshipit-source-id: f0e604aab2b828561adc3e8fabf0f39221e15615
2021-07-27 18:14:31 -07:00
af0f083d42 [dist_optim] fix the bug of none grads on functional optimizers (#62249)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62249

Parameters and grads passed to torch.optim.functional should always match; we should skip the parameters that have None gradients to avoid a size mismatch
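
A minimal sketch of the fix described above (the list names below are illustrative; the actual functional optimizer call is omitted):

```
import torch

params = [torch.randn(2, requires_grad=True) for _ in range(3)]
params[0].grad = torch.randn(2)
params[2].grad = torch.randn(2)   # params[1].grad stays None (e.g. unused)

params_with_grad, grads = [], []
for p in params:
    if p.grad is not None:        # skip parameters with None gradients
        params_with_grad.append(p)
        grads.append(p.grad)

assert len(params_with_grad) == len(grads) == 2   # lists stay matched
```
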
ghstack-source-id: 134452467

Test Plan: test_dist_optim_none_grads

Reviewed By: mrshenli

Differential Revision: D29929653

fbshipit-source-id: 4ca6167fecdfe1db422236655edee3aa59b8b044
2021-07-27 18:10:51 -07:00
c0b806694f Do not use deprecated data accessor in IndexKernel.cu (#62268)
Summary:
Fixes repeated warnings like:
```
/var/lib/jenkins/workspace/aten/src/ATen/native/cuda/IndexKernel.cu: In lambda function:
/var/lib/jenkins/workspace/aten/src/ATen/native/cuda/IndexKernel.cu:354:683: warning: 'T* at::Tensor::data() const [with T = c10::BFloat16]' is deprecated: Tensor.data<T>() is deprecated. Please use Tensor.data_ptr<T>() instead. [-Wdeprecated-declarations]
   AT_DISPATCH_ALL_TYPES_AND_COMPLEX_AND3(at::ScalarType::Half, at::ScalarType::Bool, at::ScalarType::BFloat16, iter.dtype(), "take_cuda", [&] {
                                          ^
/var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:559:1: note: declared here
   T * data() const {
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62268

Reviewed By: walterddr

Differential Revision: D29937267

Pulled By: malfet

fbshipit-source-id: 6413deb9762b973880f4a7db47652eacd013214f
2021-07-27 17:58:19 -07:00
e3be185069 [PyTorch] Add KWargs support to script module forward (#62224)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62224

The underlying operator allows both args and kwargs, but we only exposed args in this convenience method. This brings them in line while not changing any existing programs.
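
A minimal Python-level sketch of the args-plus-kwargs calling convention this brings to the convenience method (the change itself is in the C++ wrapper; this example only illustrates the calling style):

```
import torch

class MyModule(torch.nn.Module):
    def forward(self, x, scale: float = 1.0):
        return x * scale

scripted = torch.jit.script(MyModule())
out = scripted.forward(torch.ones(2), scale=2.0)  # kwargs alongside args
```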

Test Plan: CI

Reviewed By: gunchu

Differential Revision: D29920830

fbshipit-source-id: f4b2aa88d4a679e33595625b7ef355e4d14e54c4
2021-07-27 17:02:57 -07:00
9776e1ff2f Migrate thnn_conv_depthwise2d from THC to ATen (#62281)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62281

Closes gh-24646, Closes gh-24647

There is no `TensorIterator` equivalent to these kernels so this is just
migrating the existing kernels over to the ATen style.

I've benchmarked for contiguous tensors with this script:
```
import torch
shape = (10, 10, 100, 100)
x = torch.randn(*shape, device='cuda')
w = torch.randn((10, 1, 5, 5), device='cuda')

for _ in range(100):
    torch.nn.functional.conv2d(x, w, groups=10)
```

and similarly for backwards. I see these as the same to within measurement error.

|                   |     Master (us)     |     This PR (us)     |
|------------------:|:-------------------:|:--------------------:|
|           Forward |        133.5        |         133.6        |
|  Backward (input) |        1,102        |         1,119        |
| Backward (weight) |        2,220        |         2,217        |

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D29943062

Pulled By: ngimel

fbshipit-source-id: fc5d16496eb733743face7c5a14e532d7b8ee26a
2021-07-27 16:51:23 -07:00
ba9423aa93 Fix forward ad for matrix power land race (#62291)
Summary:
Fix land race from https://github.com/pytorch/pytorch/pull/59993

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62291

Reviewed By: driazati, seemethere

Differential Revision: D29946599

Pulled By: albanD

fbshipit-source-id: 16411e1a0c298fad12a6a6788ec2427923b0112a
2021-07-27 16:17:51 -07:00
171e13fde9 Rework PowKernel.cu (#62260)
Summary:
PowKernel.cu is the single slowest file to compile in all of pytorch, taking
7 m 34 s on my machine. After investigating, I discovered that the case with
complex inputs and a cpu scalar for the first argument takes more than half that
time just on its own.

Noting that [`thrust::pow`] for complex is just `exp(log(base) * exponent)`,
we can improve this kernel by precomputing `log(base)` on cpu and computing
only the `exp` on CUDA. This is faster in both runtime and compile time.
For 1 million elements, master takes 61.6 us vs 56.9 us with this PR.

I also noticed that the constant exponent case is implemented twice, once in
`gpu_kernel_with_scalars` and again in `pow_tensor_scalar_kernel`. Further, the
`Pow.cpp` code detects cpu-scalar exponents and redispatches to the `tensor_scalar`
overload, making the `gpu_kernel_with_scalars` version dead code. Now instead,
we unconditionally run `tensor_tensor` and it will call into `tensor_scalar` if appropriate.

With these changes, PowKernel.cu takes just 2 m 30 s to compile.

[`thrust::pow`]: 368266e80e/thrust/detail/complex/cpow.h (L33)
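
A minimal sketch of the identity the rework exploits for a cpu-scalar base `b` and a complex exponent tensor `e`, namely pow(b, e) == exp(log(b) * e):

```
import cmath
import torch

b = 2.0 + 1.0j                                   # cpu-scalar base
e = torch.randn(1000, dtype=torch.complex128)    # complex exponents
reference = torch.pow(torch.tensor(b), e)
reworked = torch.exp(cmath.log(b) * e)           # log(b) computed once on CPU
torch.testing.assert_close(reworked, reference)
```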

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62260

Reviewed By: ejguan

Differential Revision: D29938789

Pulled By: ngimel

fbshipit-source-id: 7ab7d81ececc92a9e6e62e60b0a4f2e6e3146df8
2021-07-27 16:16:20 -07:00
7507aeded5 [reland][bc-breaking] reference option for linear produce a pattern instead of reference linear module (#61892) (#62277)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62277

This PR changes is_reference=True for linear to produce a pattern consisting of dequant - float linear - quant instead of a reference linear module. This is useful for future transformations to custom backends, and it also helps simplify the implementation of convert in the future.

Test Plan:
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: ejguan

Differential Revision: D29941079

fbshipit-source-id: 84bdfc0bb872c34fc345875e545c8b323e77c41e
2021-07-27 15:46:44 -07:00
24d94f5102 Limit smoke tests on PRs to just one config (#62288)
Summary:
When coming across the short runtime of a periodic job on this PR, I realized the current smoke-tests-on-PRs setup was flawed. Previously, as an attempt at better future compatibility, our conditional ran smoke tests only when USE_CUDA=1 on Windows.

This is BAD and has unintended consequences, such as misleading results when a ci/scheduled workflow is triggered but fails to test the full test suite. e.g., with PR https://github.com/pytorch/pytorch/issues/62266 https://github.com/pytorch/pytorch/actions/runs/1071698069

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62288

Reviewed By: seemethere, ejguan

Differential Revision: D29945540

Pulled By: janeyx99

fbshipit-source-id: 3cc91511c151f7348872b039c94d7752b6ea4692
2021-07-27 15:33:37 -07:00
8e0622abf1 .github: Migrate ecr_gc to github actions (#62284)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62284

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: malfet, zhouzhuojie

Differential Revision: D29943356

Pulled By: seemethere

fbshipit-source-id: 493592baf2f7abe206e1fb17438bac4e908b1251
2021-07-27 15:11:01 -07:00
d0e5ef5eba .circleci: Remove conda-package-handling pin (#62290)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62290

No longer needed anymore.

Fixes nightly failures that we're observing as well:

```
Jul 27 07:33:02 Found conflicts! Looking for incompatible packages.
Jul 27 07:33:02 This can take several minutes.  Press CTRL-C to abort.
Jul 27 07:33:02 failed
Jul 27 07:33:02
Jul 27 07:33:02 UnsatisfiableError: The following specifications were found
Jul 27 07:33:02 to be incompatible with the existing python installation in your environment:
Jul 27 07:33:02
Jul 27 07:33:02 Specifications:
Jul 27 07:33:02
Jul 27 07:33:02   - conda-package-handling=1.6.0 -> python[version='>=2.7,<2.8.0a0|>=3.6,<3.7.0a0|>=3.7,<3.8.0a0|>=3.8,<3.9.0a0']
Jul 27 07:33:02
Jul 27 07:33:02 Your python: python=3.9
```

From: https://app.circleci.com/pipelines/github/pytorch/pytorch/356478/workflows/2102acf1-c92a-4a59-919c-61d32d3bcd71/jobs/15027876

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: driazati

Differential Revision: D29946501

Pulled By: seemethere

fbshipit-source-id: 3e9182f4cbcf2aab185dbbc21b7a6171746e2281
2021-07-27 14:59:41 -07:00
8fe32c9c13 fix test-report uploading uniqueness issue (#62217)
Summary:
Should fix: https://github.com/pytorch/pytorch/issues/61978.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62217

Reviewed By: seemethere, ejguan

Differential Revision: D29944444

Pulled By: walterddr

fbshipit-source-id: 4b737d1535fd5cbfafb24245fad9ef67285f1dc0
2021-07-27 14:17:50 -07:00
190cdcb08c remove print for status on scribe sending (#62285)
Summary:
Following up on https://github.com/pytorch/pytorch/issues/61768.

Currently the printout is extremely long because each test case returns an OK status code without an exception.
This logging should be avoided when no exception was raised from send_to_scribe.

Remove the log printing when the response contains no error.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62285

Reviewed By: zhouzhuojie

Differential Revision: D29944461

Pulled By: walterddr

fbshipit-source-id: fc3c2b88bba27c68521cef7079ca2b6197d2d58b
2021-07-27 14:16:32 -07:00
e1bee3eb30 [Static Runtime] Add missing unit tests for static runtime ops (#62238)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62238

Added tests for the following ops:

* `aten::mul`
* `aten::nan_to_num`
* `aten::stack`
* `aten::relu`
* `aten::tanh`

Reviewed By: hlu1

Differential Revision: D29914217

fbshipit-source-id: 6a6c39629310e7131127e24fdce7253ccdf80340
2021-07-27 14:12:21 -07:00
4a15f4a902 Allow 0-dim batch sizes in Bilinear NN layer. (#47106)
Summary:
Part of the fix for https://github.com/pytorch/pytorch/issues/12013

Checks whether the inputs and outputs are non-empty in order to allow the Bilinear layer to accept 0-dim batch sizes. The if-check covers both input and output dim sizes, since the `_trilinear` function is written to work with both forward and backward for Bilinear.
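
A minimal sketch of the behavior this enables (shapes assumed for illustration):

```
import torch

m = torch.nn.Bilinear(4, 5, 6)
x1 = torch.randn(0, 4)      # batch size 0
x2 = torch.randn(0, 5)
out = m(x1, x2)
assert out.shape == (0, 6)  # empty output instead of an error
```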

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47106

Reviewed By: ejguan

Differential Revision: D29935589

Pulled By: jbschlosser

fbshipit-source-id: 607d3352bd4f88e2528c64408f04999960be049d
2021-07-27 13:59:42 -07:00
ab0354b650 All remaining linear/element-wise formulas (#59993)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59993

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D29914594

Pulled By: albanD

fbshipit-source-id: 2ffc5993cb66586e1458d7016774a03dfe786863
2021-07-27 13:06:46 -07:00
4c3eea26bd Fix out= variant forward grad detection (#60499)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60499

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D29914595

Pulled By: albanD

fbshipit-source-id: c51bb3aed91ab1f6ebc57936143b249590a43bd5
2021-07-27 13:06:45 -07:00
4a36e2a223 Add forward AD inplace check and fix codegen (#60498)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60498

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D29914593

Pulled By: albanD

fbshipit-source-id: bde649d5a03639a240dfe5fe027c6a3f758428a4
2021-07-27 13:04:55 -07:00
df18d05429 Make bytes_read available for OperatorCost (#62059)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62059

GetOperatorCost in Workspace exposes flops and bytes_written only. Make an additional piece, bytes_read, available from OperatorSchema::Cost.

Test Plan:
Added the two additional pieces in the unit test testGetOperatorCost in workspace_test

buck test caffe2/caffe2/python:workspace_test -- testGetOperatorCost

buck test //aml/ml_foundation/exp_platform/large_scale_training/distributed_hogwild/auto_device_placement/tests/...

buck test //aiplatform/training/autotuning/tests/...

buck test //aiplatform/training/pipelining/tests/...

buck test //deeplearning/fblsim/tests/...

Flow tests:

ADP Greedy: f288078287
ADP MILP: f288079278

Reviewed By: CrazySherman, xtaofb

Differential Revision: D29860676

fbshipit-source-id: 8b3a9f2bf17c0dae48cfe2800e8821bf441e0b03
2021-07-27 12:48:36 -07:00
bba7800933 Add logical op symbol (#62063)
Summary:
This is for the XLA-side [pr](https://github.com/pytorch/xla/pull/3054) to add logical op lowering

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62063

Reviewed By: ejguan

Differential Revision: D29937449

Pulled By: bdhirsh

fbshipit-source-id: ba421f6c2dad67395a383b5ed0b81ad9d59abe86
2021-07-27 12:19:56 -07:00
3bdee2bbed [jit] Rewrote DFS graph iterator to remove unnecessary local state (#61326) (#61980)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61980

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D29917766

Pulled By: laurencer

fbshipit-source-id: 536c4806636fe9e709e8bffdefa9320127064dea
2021-07-27 11:50:20 -07:00
fa52b4b922 .github: chown workspace for render_test_results (#62207)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62207

The workspace was getting held back due to permission-denied errors; let's
ensure we have a chown'd, clean workspace for all render_test_results
runs

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: walterddr, janeyx99

Differential Revision: D29915232

Pulled By: seemethere

fbshipit-source-id: dd9fcc9c00d9665569bd8cfa57e5d2d8da965aac
2021-07-27 11:44:15 -07:00
acaac70f63 Revert D29883676: Migrate thnn_conv_depthwise2d from THC to ATen
Test Plan: revert-hammer

Differential Revision:
D29883676 (de3a4eb583)

Original commit changeset: 9b2ac62cdd8a

fbshipit-source-id: d211d3cb7723b5d2e73de6941a7e649e5f78864f
2021-07-27 11:28:52 -07:00
82d81455ae [2/N] Remove unittest.skip across all of torch.distributed. (#61887)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61887

1) Introduced a `sandcastle_skip_if` decorator that ensures these
tests simply pass on sandcastle (see the sketch below).
2) Fixed all test files under `test/distributed` to not use `unittest.skip`

The overall goal is to avoid using skips, since sandcastle tags these tests as
continuously skipping.
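
A hedged sketch of such a decorator (an assumed implementation, not the exact one landed here):

```
import functools

def sandcastle_skip_if(condition, reason):
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if condition:
                print(f"Skipping {func.__name__}: {reason}")
                return  # silently pass instead of raising unittest.SkipTest
            return func(*args, **kwargs)
        return wrapper
    return decorator
```
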
ghstack-source-id: 134382237

Test Plan: waitforbuildbot

Reviewed By: SciPioneer

Differential Revision: D29784152

fbshipit-source-id: 17b4df6c5a55ff1d1e8e1de128fa679c3dfbcb7d
2021-07-27 10:53:23 -07:00
7fc96db45d fix typo errors in quantization-support.rst Line320 (#44447)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/44379

change
"`torch.per_channel_symmetric` — per tensor, symmetric"
to
 "`torch.per_channel_symmetric` — per channel, symmetric"

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44447

Reviewed By: mruberry

Differential Revision: D29909645

Pulled By: ezyang

fbshipit-source-id: e1505d070ec2b335dd6503b528e6a2f3bda2f1e3
2021-07-27 10:42:29 -07:00
5f7f08f498 Reenable AMP on XLA (#61861)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61861

Fixes https://github.com/pytorch/pytorch/issues/61804

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D29881903

Pulled By: ezyang

fbshipit-source-id: 91530c10fa37715bec33f477285da119415a9da9
2021-07-27 10:32:01 -07:00
a0c1c7e5d4 Fixing the case when starter nodes depend on get_attr node (#62234)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62234

There was a typo that went uncaught until recently; this fixes it.

Reviewed By: 842974287

Differential Revision: D29924190

fbshipit-source-id: ee6259fcd41358aefe9680b419acc87c0c2821cb
2021-07-27 10:29:53 -07:00
8cdf16d1de Revert D29810657: [bc-breaking] reference option for linear produce a pattern instead of reference linear module
Test Plan: revert-hammer

Differential Revision:
D29810657 (9df605133e)

Original commit changeset: 949615bbc017

fbshipit-source-id: 54597d1f9636b0f94ae01c66018ff2592e5c39fc
2021-07-27 10:10:13 -07:00
d7ddae8e4f det_backward: correct, more robust and with complex support [clone] (#61905)
Summary:
Clone of https://github.com/pytorch/pytorch/pull/58195 to ease the import. Done by request from anjali411

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61905

Reviewed By: albanD

Differential Revision: D29937920

Pulled By: anjali411

fbshipit-source-id: 025892a8e6147790825b20458986730ad8c5bb0f
2021-07-27 10:08:26 -07:00
de3a4eb583 Migrate thnn_conv_depthwise2d from THC to ATen (#62006)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62006

Closes gh-24646, gh-24647

There is no `TensorIterator` equivalent to these kernels so this is just
migrating the existing kernels over to the ATen style.

I've benchmarked for contiguous tensors with this script:
```
import torch
shape = (10, 10, 100, 100)
x = torch.randn(*shape, device='cuda')
w = torch.randn((10, 1, 5, 5), device='cuda')

for _ in range(100):
    torch.nn.functional.conv2d(x, w, groups=10)
```

and similarly for backwards. I see these as the same to within measurement error.

|                   |     Master (us)     |     This PR (us)     |
|------------------:|:-------------------:|:--------------------:|
|           Forward |        133.5        |         133.6        |
|  Backward (input) |        1,102        |         1,119        |
| Backward (weight) |        2,220        |         2,217        |

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D29883676

Pulled By: ngimel

fbshipit-source-id: 9b2ac62cdd8a84e1a23ffcd66035b2b2fe2374d8
2021-07-27 10:00:25 -07:00
9df605133e [bc-breaking] reference option for linear produce a pattern instead of reference linear module (#61892)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61892

This PR changes is_reference=True for linear to produce a pattern consisting of dequant - float linear - quant instead of a reference linear module. This is useful for future transformations to custom backends, and it also helps simplify the implementation of convert in the future.

Test Plan:
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D29810657

fbshipit-source-id: 949615bbc017bc454d81c8a6b2bdec53badaab19
2021-07-27 09:49:20 -07:00
6c6a9c73f2 [7/N] Nnapi backend delegation preprocess: compile_spec sanity check (#62213)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62213

Added sanity checks in preprocess function for Android NNAPI delegate.
`preprocess()` requires some input metadata passed through its `method_compile_spec` function argument.

`preprocess()` now throws specific error messages, if it cannot find the correct input arguments.
Example error message:
```
RuntimeError: method_compile_spec does not contain the "forward" key.
method_compile_spec should contain a Tensor or Tensor List which bundles input parameters: shape, dtype, quantization, and dimorder.
For input shapes, use 0 for run/load time flexible input.
method_compile_spec must use the following format: {"forward": {"inputs": at::Tensor}} OR {"forward": {"inputs": c10::List<at::Tensor>}}
```

nnapi_backend_preprocess.cpp: contains sanity check implementation
test_backend_nnapi.py: sanity check unit tests

Test: Ran `python test/test_jit.py TestNnapiBackend` in OSS successfully.

TODO: Using Tensors to pass input parameters is a temporary hack. When a dedicated object is implemented, update the sanity check error message.
ghstack-source-id: 134339282

Test Plan: Ran `python test/test_jit.py TestNnapiBackend` in OSS successfully.

Reviewed By: raziel, iseeyuan

Differential Revision: D29917004

fbshipit-source-id: 0d5c6b35889c556cda905ffc29c25c5422ae9ee4
2021-07-27 09:31:35 -07:00
2cbc0ede7d [DDP] Log if graph is static at end of training (#61871)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61871

When set_static_graph=False, the only type of dynamism we really
support in DDP is a dynamic set of unused parameters, which must be explicitly
enabled with find_unused_parameters=True. However, some workflows have a static
set of unused parameters; it would be good to detect this and add it to logging to
identify workflows that are candidates for static graph optimization.
ghstack-source-id: 134371429

Test Plan: CI

Reviewed By: zhaojuanmao

Differential Revision: D29773962

fbshipit-source-id: 1f741984c6e6f8e3e55cf69ca719b1e25a485b13
2021-07-27 09:23:43 -07:00
79eb8bb299 [Static Runtime] Enforce proper output dtype for many ops (re-land) (#62267)
Summary:
Re-land of D29935444
We previously had lots of ops with implementations like this:
```
if (p_node->Output(0).isNone()) {
  p_node->Output(0) = create_empty_like(input_0);
}
...
auto& out = p_node->Output(0);
some_func_out(inputs, out);
```
This would make the output have the correct shape. But it would
also take the dtype of `input_0`, which is not always correct.

This change transforms these blocks to:
```
if (p_node->Output(0).isNone()) {
  p_node->Output(0) = some_func(inputs);
} else {
  ...
  auto& out = p_node->Output(0);
  some_func_out(inputs, out);
}
```
This gives the output the correct shape and dtype.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62267

Reviewed By: ejguan

Differential Revision: D29937253

Pulled By: malfet

fbshipit-source-id: d91ca5d5703490d7d349a1de2ad3bb09b0c33967
2021-07-27 08:54:09 -07:00
2eef1f27f8 Disable ccache for nccl builds (#62208)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62208

reverts
https://github.com/pytorch/pytorch/pull/55814
which removed a workaround for:
https://github.com/pytorch/pytorch/issues/13362

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D29935472

Pulled By: nairbv

fbshipit-source-id: 7ce9cde1408f17153632036fd128814032739746
2021-07-27 08:07:26 -07:00
dc55d511d9 Forward fix mypy (#62263)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62263

Fixes current HUD Error: https://github.com/pytorch/pytorch/runs/3170342799

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D29935265

Pulled By: ejguan

fbshipit-source-id: 6f247833d24bff7aea42f6287493a85d62d73b96
2021-07-27 07:52:31 -07:00
3cd12448b4 Add forward mode differentiation for inverse and solve (#62160)
Summary:
This PR adds forward mode differentiation for `torch.linalg.inv`, `torch.linalg.inv_ex`, and `torch.linalg.solve` functions.
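
A minimal sketch of exercising the new forward-mode support for `torch.linalg.solve` (tangent values chosen arbitrarily):

```
import torch
import torch.autograd.forward_ad as fwAD

A = torch.randn(3, 3)
tangent_A = torch.randn(3, 3)   # direction for the JVP
b = torch.randn(3)
with fwAD.dual_level():
    dual_A = fwAD.make_dual(A, tangent_A)
    x = torch.linalg.solve(dual_A, b)
    primal, tangent = fwAD.unpack_dual(x)  # tangent is the JVP w.r.t. A
```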

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62160

Reviewed By: mruberry

Differential Revision: D29917213

Pulled By: albanD

fbshipit-source-id: b08bbc830f77f342cc7ca5b823d7ea4380f2aaa8
2021-07-27 07:51:22 -07:00
a0309f89f4 Initial ModuleInfo implementation (#61935)
Summary:
This PR contains the initial version of `ModuleInfo` for use in testing modules. The design philosophy taken here is to start small and simple and build out / refactor as needed when more test coverage or `ModuleInfo` entries are added. As such, it's not intended for general usage yet. The PR contains the following:

* (new file) `torch/testing/_internal/common_modules.py`
  * `ModuleInfo` definition - metadata for each module to use in testing
  * `module_db` - the actual `ModuleInfo` database; currently contains entries for two modules
  * `ModuleInput` - analogous to `SampleInput` from OpInfo; contains `FunctionInput`s for both constructor and forward pass inputs (see the sketch after this list)
      * Constructor and forward pass inputs are tied together within a `ModuleInput` because they are likely correlated
  * `FunctionInput` - just contains args and kwargs to pass to a function (is there a nicer way to do this?)
  * `modules` decorator - analogous to `ops`; specifies a set of modules to run a test over
  * Some constants used to keep track of all modules under torch.nn:
      * `MODULE_NAMESPACES` - list of all namespaces containing modules
      * `MODULE_CLASSES` - list of all module class objects
      * `MODULE_CLASS_NAMES` - dict from module class object to nice name (e.g. torch.nn.Linear -> "nn.Linear")
* (new file) `test/test_modules.py`
    * Uses the above to define tests over modules
    * Currently, there is one test for demonstration, `test_forward`, which instantiates a module, runs its forward pass, and compares it to a reference, if one is defined
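
A hedged sketch of the input bundling described above; the class names come from this summary, but the exact fields are assumptions:

```
from dataclasses import dataclass, field

@dataclass
class FunctionInput:
    args: tuple = ()
    kwargs: dict = field(default_factory=dict)

@dataclass
class ModuleInput:
    constructor_input: FunctionInput  # args/kwargs for the module constructor
    forward_input: FunctionInput      # args/kwargs for the forward pass
```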

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61935

Reviewed By: mruberry

Differential Revision: D29881832

Pulled By: jbschlosser

fbshipit-source-id: cc05c7d85f190a3aa42d55d4c8b01847d1efd57f
2021-07-27 07:42:07 -07:00
afe3644321 Remove faulty process group code (#61907)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61907

Removing the code for faulty process group agent since it was replaced by faulty tensorpipe agent

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D29794666

Pulled By: H-Huang

fbshipit-source-id: 0b35191cc07220b6774ecacc8d004f25fd2e87f0
2021-07-27 07:37:40 -07:00
a3be2ecc3a Revert D29887367: [Static Runtime] Enforce proper output dtype for many ops
Test Plan: revert-hammer

Differential Revision:
D29887367 (f4136c5efc)

Original commit changeset: cef04bfa52ec

fbshipit-source-id: 32e89f2b6381930559dd746b535904c3e90fd52b
2021-07-27 07:29:09 -07:00
b599c1e794 Create linalg and parametrizations codeowners (#62086)
Summary:
Added myself nikitaved  and IvanYashchuk

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62086

Reviewed By: mruberry

Differential Revision: D29920798

Pulled By: albanD

fbshipit-source-id: dcbd57bb2a438a1f04d4651447710fced83264d3
2021-07-27 06:50:41 -07:00
228b50e053 [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D29930232

fbshipit-source-id: e36dbc59a25d7f36d3bb7a02ad76696f299712cf
2021-07-27 04:13:15 -07:00
2d7c1e3fa8 [bc-breaking] Produce quantization pattern for add_scalar and mul_scalar (#61859)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61859

BC-breaking note:
Previously we did not add an observer/fake_quant for the output of add/mul for tensor-scalar operations. In this PR we added the observer/fake_quant instance (the same as the input's) to correctly model the behavior of the quantized add_scalar and mul_scalar ops (since quantized add/mul scalar assumes the output quantized tensor has the same quantization parameters as the input).

Test Plan:
python test/test_quantization.py TestQuantizeFxOps.test_add
python test/test_quantization.py TestQuantizeFxOps.test_mul

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D29770859

fbshipit-source-id: f43fcbfecd04c392467770b22c481bbbdaf43c25
2021-07-27 02:46:00 -07:00
b176feec1e Add device and key for lazy tensors (#61621)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61621

Test Plan: CI

Reviewed By: mruberry

Differential Revision: D29912934

Pulled By: asuhan

fbshipit-source-id: 493c32063a3e756d93cbf1d876563a35eaafb537
2021-07-26 23:00:22 -07:00
2945a73d90 Add option to skip GH validation for torch.hub (#62139)
Summary:
Split from https://github.com/pytorch/pytorch/pull/62072
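
Hypothetical usage of the new option (the keyword name `skip_validation` is assumed from the linked PR):

```
import torch

# keyword name assumed; validation of the GitHub repo is skipped entirely
entrypoints = torch.hub.list("pytorch/vision", skip_validation=True)
```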

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62139

Reviewed By: mthrok

Differential Revision: D29891497

Pulled By: malfet

fbshipit-source-id: 5c0baf53a2acf8f95062bd001457e1f936011529
2021-07-26 22:44:12 -07:00
64283fe146 [DDP/Functional Optim] Support kwarg arguments (#62079)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62079

Adds support for kwarg arguments into functional optimizer running as
hook.
ghstack-source-id: 134330379

Test Plan: CI

Reviewed By: SciPioneer

Differential Revision: D29838127

fbshipit-source-id: 2ab051ef5f0dff19c145ebe2260668b927ba47b2
2021-07-26 22:12:50 -07:00
c0ebeca1a8 [Functional Optim] Test kwargs parity for SGD (#62078)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62078

Ensure that kwarg arguments such as momentum and weight decay maintain
parity between optimizer.step and step_param.
ghstack-source-id: 134330377

Test Plan: CI

Reviewed By: SciPioneer

Differential Revision: D29837942

fbshipit-source-id: 1ae39648fc26aebd8aaef1a7ac0e03b598a8ed60
2021-07-26 22:11:40 -07:00
478098aaac Revert D29801652: Refactor Tensor::to to call a primitive that is not copy_.
Test Plan: revert-hammer

Differential Revision:
D29801652 (29bb3f4647)

Original commit changeset: bb01eb1acf3d

fbshipit-source-id: 93693bad8068d47a3a4c16f34f300e03ea573897
2021-07-26 19:37:17 -07:00
69adb21940 Parity tests for functional optimizer step_param (#61756)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61756

DDP will support running the optimizer as a communication hook with
optimizers that support a per-parameter/gradient step function `step_param`.
Add parity tests as we implement more optimizers that support step_param,
to ensure parity with regular optimizers.
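
A hedged sketch of the `step_param` contract (an illustrative toy class, not the real functional optimizer):

```
import torch

class TinyFunctionalSGD:
    """Toy optimizer exposing the per-parameter step used by the DDP hook."""
    def __init__(self, lr=0.1):
        self.lr = lr

    def step_param(self, param, grad):
        # step a single parameter with its gradient
        with torch.no_grad():
            param.add_(grad, alpha=-self.lr)

p = torch.randn(4, requires_grad=True)
g = torch.randn(4)
TinyFunctionalSGD().step_param(p, g)
```
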
ghstack-source-id: 134330378

Test Plan: Ci

Reviewed By: SciPioneer

Differential Revision: D29727549

fbshipit-source-id: 18977c896f12b8e478298488b298fd107affcf5f
2021-07-26 19:03:22 -07:00
b6d10a3a27 Fix infinite loop in _validate_not_a_forked_repo() (#62072)
Summary:
Increase `page_idx` inside the loop rather than outside of it.
Break from the loop when receiving an empty response, as that means there are no more items to fetch via pagination requests.
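
A minimal sketch of the corrected pagination loop (`fetch_page` is a stand-in for the paginated GitHub API request):

```
def fetch_page(idx, _pages=([1, 2], [3])):
    return _pages[idx - 1] if idx <= len(_pages) else []

page_idx, items = 1, []
while True:
    page = fetch_page(page_idx)
    if not page:          # empty response: no more items to fetch
        break
    items.extend(page)
    page_idx += 1         # incremented inside the loop, not outside
assert items == [1, 2, 3]
```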

Also, add options to use provided github token (via `GITHUB_TOKEN` environment variable)

Fixes failure with "Rate Limit Exceeded" when doing something like `torch.hub.list("pytorch/test-infra:dsf")`

Fixes https://github.com/pytorch/pytorch/issues/61755

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62072

Reviewed By: jbschlosser

Differential Revision: D29868539

Pulled By: malfet

fbshipit-source-id: 206082a0ba1208e9b15ff6c9c6cb71d2da74f1c3
2021-07-26 17:54:07 -07:00
d0f430927b [PyTorch][Edge] Serializing sub modules with same names (#61933)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61933

### Issue:

Submodules with the same name are not serialized correctly in bytecode format while using `_save_for_mobile`. These submodules are not distinguished as different modules, even though they have different forward, setstate, etc., if they have the same name.

### Fix:
The mangler creates unique names so that modules and submodules that have the same name can be uniquely identified while saving the module. iseeyuan rightly pointed out the underlying issue: the mangler is not used in the process of saving bytecode, and hence unique references for the submodules are not created. Please refer to the notebook to repro the issue: N777224

### Diff:
The above fix is implemented. The mangled names are used in bytecode, so the files in the `code/` directory now have the right references in `bytecode.pkl`

Will this maintain backward compatibility?
iseeyuan please feel free to correct or update this.
Yes. This fix impacts only modules with same-name submodules, which were not serialized correctly before. Existing modules should have correct references, and `_load_for_mobile` must not see any change. To confirm this, the existing test cases need to pass for the diff to be approved and shipped.
ghstack-source-id: 134242696

Test Plan:
```
~/fbsource/fbcode > buck test caffe2/test/cpp/jit:jit -- BackendTest.TestCompositeWithSetStates
Downloaded 0/5 artifacts, 0.00 bytes, 100.0% cache miss (for updated rules)
Building: finished in 19.2 sec (100%) 17619/17619 jobs, 3/17619 updated
  Total time: 19.5 sec
More details at https://www.internalfb.com/intern/buck/build/91542d50-25f2-434d-9e1a-b93117f4efe1
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: de9e27cf-4c6c-4980-8bc5-b830b7c9c534
Trace available for this run at /tmp/tpx-20210719-161607.659665/trace.log
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/844425127206388
    ✓ ListingSuccess: caffe2/test/cpp/jit:jit - main (8.140)
    ✓ Pass: caffe2/test/cpp/jit:jit - BackendTest.TestCompositeWithSetStates (0.528)
Summary
  Pass: 1
  ListingSuccess: 1
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/844425127206388
```

```
~/fbsource/fbcode > buck test caffe2/test/cpp/jit:jit -- BackendTest.TestConsistencyOfCompositeWithSetStates
Building: finished in 4.7 sec (100%) 6787/6787 jobs, 0/6787 updated
  Total time: 5.0 sec
More details at https://www.internalfb.com/intern/buck/build/63d6d871-1dd9-4c72-a63b-ed91900c4dc9
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: 81023cd2-c1a2-498b-81b8-86383d73d23b
Trace available for this run at /tmp/tpx-20210722-160818.436635/trace.log
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/8725724325952153
    ✓ ListingSuccess: caffe2/test/cpp/jit:jit - main (7.867)
    ✓ Pass: caffe2/test/cpp/jit:jit - BackendTest.TestConsistencyOfCompositeWithSetStates (0.607)
Summary
  Pass: 1
  ListingSuccess: 1
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/8725724325952153
```

To check the `bytecode.pkl` using module inspector please check:
N1007089

Reviewed By: iseeyuan

Differential Revision: D29669831

fbshipit-source-id: 504dfcb5f7446be5e1c9bd31f0bd9c986ce1a647
2021-07-26 16:31:48 -07:00
a13f714b6d DOC: remove git stamp from release documentation version (#58486)
Summary:
CI built the documentation for the recent 1.9.0rc1 tag, but left the git version in the `version`, so (as of now) going to https://pytorch.org/docs/1.9.0/index.html and looking at the version in the upper-left corner shows "1.9.0a0+git5f0bbb3" not "1.9.0". This PR should change that to cut off everything after and including the "a".
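
A minimal sketch of the intended string handling, assuming the version format shown above:

```
version = "1.9.0a0+git5f0bbb3"
release = version.partition("a")[0]  # cut off everything from the "a" on
assert release == "1.9.0"
```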

It should be cherry-picked to the release/1.9 branch so that the next rc will override the current documentation with a "cleaner" version.

brianjo

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58486

Reviewed By: zou3519

Differential Revision: D28640476

Pulled By: malfet

fbshipit-source-id: 9fd1063f4a2bc90fa8c1d12666e8c0de3d324b5c
2021-07-26 16:28:59 -07:00
60070982d2 [Static Runtime] Fixed build failure in OSS due to test_utils (#62216)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62216

Test Plan: Imported from OSS

Reviewed By: hlu1

Differential Revision: D29917514

Pulled By: navahgar

fbshipit-source-id: 379863e6cd0b157de3bfa1482f5519b26654b3d2
2021-07-26 16:10:10 -07:00
962841b532 Fix subnet counting and re-enable check for multiple onnxifi ops in AOT (#62033)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62033

Count the number of onnxifi ops rather than just the number of subnets, since when the subnet size < min_ops, a subnet isn't turned into an onnxifi op.

Test Plan:
Runs which ran into the "Did not find a partition with an SLS node" error now report "multiple onnxifi ops found"
From https://fb.workplace.com/groups/527892364588452/permalink/807802049930814/:
```
buck run mode/opt-clang -c python.package_style=inplace sigrid/predictor/scripts:rerun_aot -- --manifold_url="https://manifold.facebook.net/v0/read/tree/2021-06-30/onnxifi_caffe2_net_aot_input_arguments_01-55-32_711d9476?bucketName=dper3_job_meta&apiKey=dper3_job_meta-key&timeoutMsec=5000&withPayload=1"

```
Reran some failures from last week which now pass AOT:
From https://fb.workplace.com/groups/527892364588452/permalink/807802049930814/,
https://fb.workplace.com/groups/243933520351820/permalink/572715897473579/

```
buck run mode/opt-clang -c python.package_style=inplace sigrid/predictor/scripts:rerun_aot -- --manifold_url="https://manifold.facebook.net/v0/read/tree/2021-07-09/onnxifi_caffe2_net_aot_input_arguments_05-31-08_ef5393a6?bucketName=dper3_job_meta&apiKey=dper3_job_meta-key&timeoutMsec=5000&withPayload=1"
```
```
buck run mode/opt-clang -c python.package_style=inplace sigrid/predictor/scripts:rerun_aot -- --manifold_url="https://manifold.facebook.net/v0/read/tree/2021-07-12/onnxifi_caffe2_net_aot_input_arguments_14-44-34_cfdf3053?bucketName=dper3_job_meta&apiKey=dper3_job_meta-key&timeoutMsec=5000&withPayload=1"
```
```
buck run mode/opt-clang -c python.package_style=inplace sigrid/predictor/scripts:rerun_aot -- --manifold_url="https://manifold.facebook.net/v0/read/tree/2021-07-13/onnxifi_caffe2_net_aot_input_arguments_04-03-30_162e7e53?bucketName=dper3_job_meta&apiKey=dper3_job_meta-key&timeoutMsec=5000&withPayload=1"
```

Reviewed By: khabinov

Differential Revision: D29796893

fbshipit-source-id: e9de7529ef86745207d41643d0fbe932fa166437
2021-07-26 16:08:51 -07:00
037c4aa1d1 [fx2trt] flatten converter (#62202)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62202

Add acc_ops.flatten converter. Also migrate to oss acc tacer for trt interpreter.

Test Plan: unit test

Reviewed By: khabinov

Differential Revision: D29861555

fbshipit-source-id: dac88a703fdbf386f3f7fb27674a67951f3add49
2021-07-26 15:49:01 -07:00
f883ed9095 irange-ify 8b (#62195)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62195

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D29887946

fbshipit-source-id: e3bd44721cf06a34ced47994810212be8460a2bb
2021-07-26 15:38:54 -07:00
f7743e92bf irange-ify 9 (#62118)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62118

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D29879670

fbshipit-source-id: 99b86ac7d65dfa2a47d0e6b7d65433200d18081e
2021-07-26 15:13:02 -07:00
026cfe85b4 Fix InlinedCallStack annotation to account for module calling its own (#61791)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61791

methods from forward

During inlining we attach an InlinedCallStack to nodes being inlined. In
the process we attach module information as well, such that if a
CallMethod is being inlined we know which class instance and class type
the method belongs to. However, CallMethod can be calling a method of
the same object to which the graph belongs, e.g.:

```
def forward(self, input):
  x = input + 10
  return self.forward_impl_(x, input)
```
Here forward_impl_ is a method defined on the same class in which forward
is defined. The existing module hierarchy annotation will mislabel this as
an unknown instance, since the method is not associated with the output of
a GetAttr node (it would be if we had called self.conv.forward_impl_, for
example).
The change in this PR reconciles this by creating a placeholder name "SELF"
for the module instance, indicating that you can traverse the InlinedCallStack
backwards to find the first node with name != SELF, which would be the name
of the object.
e.g.:
TOP(ResNet)::forward.SELF(ResNet)::_forward_impl.layer1(Sequential)::forward.0(BasicBlock)::forward.conv1(Conv2d)::forward.SELF(Conv2d)::_conv_forward

Test Plan:
Add test

Imported from OSS

Reviewed By: larryliu0820

Differential Revision: D29745443

fbshipit-source-id: 1525e41df53913341c4c36a56772454782a0ba93
2021-07-26 15:00:57 -07:00
f16102f72a Revert D29892919: Add squid proxy as egress cache
Test Plan: revert-hammer

Differential Revision:
D29892919 (e63160d735)

Original commit changeset: ac17227f2553

fbshipit-source-id: b78313147d60f26c1df68a25293e6b571ba66919
2021-07-26 14:42:28 -07:00
cf1f59452b Hacky support for meta tensor serialization. (#62192)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62192

This support is hacky because it doesn't preserve meta tensor storage
sharing (e.g., if you serialize a model with shared storage, such as a
tensor and a view on that tensor, the viewing relationship will be broken
on deserialization and they become just different tensors). The hack is
also durable, in the sense that we will be on the hook for supporting
`_rebuild_meta_tensor_no_storage` in perpetuity, even if we change our
mind about the serialization format.
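
A minimal sketch of the round trip this enables (view/storage sharing is not preserved, per the note above):

```
import io
import torch

t = torch.empty(3, 4, device="meta")
buf = io.BytesIO()
torch.save(t, buf)
buf.seek(0)
t2 = torch.load(buf)
assert t2.is_meta and t2.shape == (3, 4)
```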

This unblocks an FB production use case. I didn't add C++ support to minimize
blast area of this patch.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D29910535

Pulled By: ezyang

fbshipit-source-id: d98dcdd0108dfc3ae730a071d3c583b6d0281d21
2021-07-26 14:33:45 -07:00
f0140a8c5f Disable cppcoreguidelines-non-private-member-variables-in-classes (#62212)
Summary:
This PR disables the `cppcoreguidelines-non-private-member-variables-in-classes` check. PyTorch makes use of `protected` members throughout the codebase, so we do not want to run this clang-tidy check in CI; disabling it improves the signal-to-noise ratio.

Relevant failure: https://github.com/pytorch/pytorch/pull/61871/checks?check_run_id=3146453417

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62212

Reviewed By: driazati

Differential Revision: D29917882

Pulled By: 1ntEgr8

fbshipit-source-id: f607c3d050a122e95136f9915060c4cda6694c9d
2021-07-26 14:14:05 -07:00
1343eea037 Fix clang-tidy line filtering logic (#62210)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62210

Fixes #62204

Test Plan: #62211 clang-tidy should only error on the added lines (and not on context/removals)

Reviewed By: driazati

Differential Revision: D29917897

Pulled By: 1ntEgr8

fbshipit-source-id: de91dbf34c1ad8405507cad91ab3dd0d6c61d82e
2021-07-26 14:12:53 -07:00
2a83f24027 Enable macos clang-tidy installs (#62214)
Summary:
This PR enables installing our custom MacOS clang-tidy binaries. It also updates related documentation.

The binaries are produced by [this CI job](https://github.com/pytorch/test-infra/blob/master/.github/workflows/clang-tidy-macos.yml), and are published to S3.

This PR does not handle versioning of the downloaded binaries as this is being worked on separately. See https://github.com/pytorch/test-infra/issues/73

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62214

Test Plan:
On a MacOS machine, run
```bash
python3 -m tools.linter.install.clang_tidy
.clang-tidy-bin/clang-tidy --checks="*" --list-checks | grep "misc-max-tokens"
```

Reviewed By: janeyx99, mruberry

Differential Revision: D29917728

Pulled By: 1ntEgr8

fbshipit-source-id: 98d0d8b7a57bdebf0ebcdc83228ef391e8c6629e
2021-07-26 13:43:29 -07:00
f4136c5efc [Static Runtime] Enforce proper output dtype for many ops
Summary:
We previously had lots of ops with implementations like this:
```
if (p_node->Output(0).isNone()) {
  p_node->Output(0) = create_empty_like(input_0);
}
...
auto& out = p_node->Output(0);
some_func_out(inputs, out);
```
This would make the output have the correct shape. But it would
also take the dtype of `input_0`, which is not always correct.

This change transforms these blocks to:
```
if (p_node->Output(0).isNone()) {
  p_node->Output(0) = some_func(inputs);
} else {
  ...
  auto& out = p_node->Output(0);
  some_func_out(inputs, out);
}
```
This gives the output the correct shape and dtype.

Test Plan: `buck test //caffe2/benchmarks/static_runtime:static_runtime_cpptest`

Reviewed By: hlu1

Differential Revision: D29887367

fbshipit-source-id: cef04bfa52ec082ad3a9a32aa27c44e275c6b24c
2021-07-26 13:27:02 -07:00
29bb3f4647 Refactor Tensor::to to call a primitive that is not copy_. (#61458)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61458

Context
-------
functorch is unable to vmap(grad(f)) when f contains a .to
call. This is because .to (when it is not a no-op) decomposes
to .copy_ under grad and the .copy_ is not compatible with vmap.

Fix
---
The fix for this is to have all Tensor::to variants call a new operator,
`_to_copy`, that always copies and is a primitive w.r.t. autograd so
that autograd decomposes Tensor::to into a call to `_to_copy`.
(This is related to https://github.com/pytorch/pytorch/issues/60956,
please let me know if you want to bikeshed the naming).

In order to get this done I had to do a bit of refactoring. All of the
`::to` implementations now call `to_impl` which may call `_to_copy`.

Autograd codegen changes
------------------------

The second thing I had to do was modify the autograd codegen. Right now,
autograd assumes that every output is either statically known to be
differentiable or not differentiable at codegen time. `_to_copy` is a
little special because its differentiability depends on the output
dtype, e.g. `torch.randn(3, requires_grad=True).to(torch.long)` is
non-differentiable. To get this to work:
- I changed how `output_differentiability` in derivatives.yaml works.
- output_differentiability can now accept "conditions" for each of the
output arguments. A "condition" is some C++ code.
- We currently only support `output_differentiability` with conditions
if there is a single output. This is for convenience and can be changed
in the future.
- I added a new `output_differentiability_conditions` field to
DifferentiabilityInfo. This gets populated in load_derivatives.yaml
- forward-mode and reverse-mode AD take
`output_differentiability_conditions` into account.

Here's how the generated code for `VariableType::_to_copy`
[looks
like](https://gist.github.com/zou3519/93462df4bda1837acee345205b7cc849)
No other autogenerated code gets modified by this PR.

Performance benchmarking
------------------------
- I benchmarked [three
cases that demonstrate overhead](https://gist.github.com/zou3519/5b6985e6906b80eec5a0dd94ed5b6a1a).
- Case A: No-op .to(). Instruction count went from 50223 to 25623. I
have no clue why but this is a good thing.
- Case B: not-no-op .to(). Instruction count went from 665291 to 671961.
This is expected; `_to_copy` adds an additional dispatch.
- Case C: not-no-op .to() forward pass and backward pass. Instruction count
went from 4022841 to 4030057. This PR adds
an additional dispatch to .to() (so there should be one additional
dispatch in the forward pass) so this number looks reasonable.

Test Plan
---------
- test_torch.py has a test_to
- test_cuda.py has test_to*
- test_autograd has tests (test_type_conversions) that exercise the
reverse-mode path
- test_ops.py has some tests (like log_softmax) that exercise the
reverse-mode and forward-mode AD path.
- test_quantization, test_namedtensor all exercise tensor.to as well.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D29801652

Pulled By: zou3519

fbshipit-source-id: bb01eb1acf3d79d84f284150d1be4be3b4ace351
2021-07-26 13:02:39 -07:00
e63160d735 Add squid proxy as egress cache (#62103)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62103

This PR adds a squid proxy that's deployed dedicated for PyTorch CI. Initially we only roll out to GHA, and if things are ok we will extend this to circleci tests if necessary.

`http_proxy` and `https_proxy` are compatible with the following http clients:

- curl
- wget
- python

Existing cache policy:

```
refresh_pattern -i \.(7z|deb|rpm|exe|zip|tar|tgz|gz|ram|rar|bin|tiff|bz2|run|csv|sh)$ 1440 80% 2880
```

It uses the standard squid refresh_pattern to cache requests. In our setup, we
cache for at least 1440 minutes (1 day) and at most 2880 minutes (2 days), with
a last-modified factor of 80% ([squid doc](http://www.squid-cache.org/Doc/config/refresh_pattern/)). Please refer to [pytorch/test-infra](https://github.com/pytorch/test-infra/tree/master/aws/websites/squid-proxy) for details.

Right now, it only applies to the `build` and `test` step, to limit the scope and make sure build and test are more reliable with egress cache.

Test Plan: Imported from OSS

Reviewed By: jbschlosser, malfet, seemethere, janeyx99

Differential Revision: D29892919

Pulled By: zhouzhuojie

fbshipit-source-id: ac17227f2553ca62881711b3e9943488dfd8defd
2021-07-26 13:01:34 -07:00
d2594fa538 irange-ify 3 (#62112)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62112

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D29879513

fbshipit-source-id: c01d18d34bb19014bf28d92c4d04b07e50a2770a
2021-07-26 12:56:58 -07:00
f5c6c3947e Remove Input Pointer Caching for XNNPack (#61959)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61959

We no longer need to cache the input pointer, as XNNPACK has implemented a more robust approach where the indirection buffer does not need to be recalculated even if the activation tensor pointer changes, as long as the tensor dimensions stay the same.

This reverses the changes in https://github.com/pytorch/pytorch/pull/42840/files

Reviewed By: kimishpatel

Differential Revision: D29777605

fbshipit-source-id: c1750538c17bce34f885c6f1bbb1f7164ebba25b
2021-07-26 12:02:15 -07:00
7ec6d1e857 irange-ify 2 (#62113)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62113

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D29879507

fbshipit-source-id: 1fb114e44afe8c1407f648b705db7fd4edb9d6e3
2021-07-26 12:00:52 -07:00
6dc2c07304 [Reland] [DDP] Implement a hook which performs FunctionalSGD step. (#62177)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62177

Reland of https://github.com/pytorch/pytorch/pull/61678
Fix CI failure by gating including torchvision model on whether torchvision is available or not.
ghstack-source-id: 134282165

Test Plan: CI

Reviewed By: SciPioneer

Differential Revision: D29904101

fbshipit-source-id: 47e799eb4a90acbbda91c5857ea00de3045d49f5
2021-07-26 11:56:56 -07:00
1dfb687f3c Fixed off-by-one bug in Adam Smart Decay (#62135)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62135

The initial implementation of Adam with Smart Decay had an off-by-one error.  This was in the summation of the geometric series used to calculate how much built-up momentum would have been discharged in skipped minibatches.
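
Illustrative only (the exact series in the fix may differ): an off-by-one in a closed-form geometric sum is easy to check against the direct sum.

```
beta, k = 0.9, 5
direct = sum(beta**i for i in range(1, k + 1))  # beta^1 + ... + beta^k
closed = beta * (1 - beta**k) / (1 - beta)      # closed form over k terms
assert abs(direct - closed) < 1e-12             # summing k-1 or k+1 terms would not match
```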

The unit tests should have caught this, but the testing strategy missed it because k, the number of skipped minibatches, was always either 0 or so high that the impact of the bug was too small to detect. The impact of the bug was proportional to 1/k. The testing strategy has also been adjusted to cover this bug.

Differential Revision: D29889309

fbshipit-source-id: b086c0efed5c27f621061e726533c73658daffc6
2021-07-26 11:55:38 -07:00
dcb3eadc1f [quant][fix] Update quantization c++ tests to not run if CPU_STATIC_DISPATCH is specified (#62197)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62197

For build configs with ATEN_CPU_STATIC_DISPATCH defined, quantization tests will fail since they
require QuantizedCPU dispatch to be enabled.
This will fix some internal test failures like https://www.internalfb.com/intern/test/844424941811803?ref_report_id=0 which are run under the `caffe2_aten_cpu_inference` project

Test Plan:
buck test mode/dev //caffe2/aten:quantized_test

Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D29912742

fbshipit-source-id: b117eb9f4afb51e0d0dd52fbe9d5c5be7dfafe85
2021-07-26 11:39:45 -07:00
0ca5dc7f03 irange-ify 5 (#62114)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62114

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D29879534

fbshipit-source-id: 0b1d6d2c9062a2fd7a55b00cb9f3d59ec941bad3
2021-07-26 11:07:54 -07:00
8e71f48f0a Handle simple NNAPI flatten NHWC case (#61796)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61796

We can easily handle NNAPI conversion for NHWC inputs
that have 1 channel or whose H and W are both 1

Test Plan:
pytest test/test_nnapi.py::TestNNAPI::test_flatten

Imported from OSS

Reviewed By: saketh-are

Differential Revision: D29827735

fbshipit-source-id: 65dee4b42fceef1b032bf5dd1c4cc6e020d01e14
2021-07-26 10:59:04 -07:00
b73d759708 [fix] polygamma n>=1 (#61641)
Summary:
Fixes: https://github.com/pytorch/pytorch/issues/55357

TODO:
* [x] Use proper casting to avoid confusing the compiler

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61641

Reviewed By: albanD

Differential Revision: D29816592

Pulled By: mruberry

fbshipit-source-id: 2c020a6e4c325c1b5d15499a77fb39f9ba93dd79
2021-07-26 10:52:20 -07:00
ef7d572afa Ensure ShardedTensor handles list/tuple appropriately as size parameter. (#62109)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62109

The `size` parameter only worked correctly for an *args-like invocation
(10, 20 passed as separate arguments) and not for a list [10, 20] or a
tuple (10, 20). This PR ensures this works similarly to `torch.empty`.
ghstack-source-id: 134246166

Test Plan:
1) unit tests
2) waitforbuildbot

Reviewed By: SciPioneer

Differential Revision: D29884768

fbshipit-source-id: 7a4a3c5ed5d7c081344f6ead3170905b97fc652d
2021-07-26 10:31:32 -07:00
f9dce598a5 Add some missing cuda guards (#62100)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62100

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D29880330

fbshipit-source-id: 7089000ccbcaa70a13f0ab4531b032bd5326e539
2021-07-26 10:26:22 -07:00
200b6ccdc0 Catch saved tensors default hooks race condition (#61957)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61957

If the user runs code that registers default saved tensor hooks from
multiple threads, it will fail with a nice error message most of the
time. This commit handles the very rare case where a race condition
would have made it fail silently.

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D29848525

Pulled By: Varal7

fbshipit-source-id: eb9bdcfbeed857a988834651246390ea14eedd33
2021-07-26 09:48:47 -07:00
f2369f12f9 Add logging for dynamic rendezvous (#61822)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61822

Added scuba logging to the following files:
- dynamic_rendezvous.py
- c10d_rendezvous_backend.py

NOTE: This diff introduces the use of python's inspect module to easily allow for obtaining the calling method name and filename when logging. This module can mess with python's garbage collector, so special care was taken to never store references to results from inspect.stack() longer than absolutely needed.
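
A minimal sketch of the pattern (illustrative, not the code in this diff):

```
import inspect

def _caller_info():
    # Read the caller's function name and filename without holding on to
    # the frame object, which could otherwise interfere with garbage
    # collection.
    frame = inspect.currentframe().f_back
    try:
        return frame.f_code.co_name, frame.f_code.co_filename
    finally:
        del frame  # drop the frame reference as soon as possible
```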

Test Plan:
The following tests can be run.
```
buck run mode/dev-nosan //caffe2/test/distributed/elastic/rendezvous:c10d_rendezvous_backend_test
```
```
buck run mode/dev-nosan //caffe2/test/distributed/elastic/rendezvous:dynamic_rendezvous_test
```
```
buck run mode/dev-nosan //caffe2/test/distributed/elastic/events:lib_test
```

Reviewed By: aivanou

Differential Revision: D29643774

fbshipit-source-id: f10cd5ebf8f6860856267bc2483c0b85faacb0fd
2021-07-26 09:39:09 -07:00
6007ad3529 [Static Runtime] Refactor fb op tests to use testStaticRuntime (#62064)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62064

`testStaticRuntime` was previously only available in `test_static_runtime.cc`. It has been moved to a common library `test_utils` to facilitate code re-use. This also lets us test dynamic shapes in `test_fb_operators`

Reviewed By: hlu1

Differential Revision: D29858928

fbshipit-source-id: 68a94760166ddb745972b0f1fc24bed594937d1c
2021-07-26 08:25:10 -07:00
be17d6eadf Add default Saved Variable hooks (#61834)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61834

Expose a pair of functions to Python users: torch.autograd.graph.set_saved_tensors_default_hooks(pack, unpack) and torch.autograd.graph.reset_saved_tensors_default_hooks().
These functions control the hooks applied to saved tensors: all tensors saved in that context will be packed using the pack function, then unpacked accordingly when needed.

Currently, this works by simply calling register_hooks (cf #60975) directly at the end of the constructor of a SavedVariable. This could be optimized further by not performing the copy before registering default hooks, but this would require a small refactor. Edit: the refactor is done in #61927.

A current limitation is that if users create tensors in this context, they will not be able to register additional hooks on the saved tensor.

For instance, to perform something like #28997, one could define a pack function that saves to disk whenever the tensor size is too big and returns a filename, then unpack simply reads the content of the file and outputs a tensor, e.g.:

```
import os
import uuid

import torch

def pack(x):
    # Save the tensor to a uniquely named file (assumes `tmp_dir` is
    # defined) and return the file name as the packed value.
    name = os.path.join(tmp_dir, str(uuid.uuid4()))
    torch.save(x, name)
    return name

def unpack(name):
    # Load the tensor back from disk when the backward pass needs it.
    return torch.load(name)
```
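
A usage sketch with the new API (assuming `model` and `x` are defined elsewhere):

```
torch.autograd.graph.set_saved_tensors_default_hooks(pack, unpack)
loss = model(x).sum()   # tensors saved for backward go through pack()
loss.backward()         # ...and are restored through unpack()
torch.autograd.graph.reset_saved_tensors_default_hooks()
```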

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D29792193

Pulled By: Varal7

fbshipit-source-id: 33e931230ef59faa3ec8b5d11ef7c05539bce77c
2021-07-26 08:14:32 -07:00
89ca638c18 ENH Adds no batch dim support for AdaptiveMaxPool*D (#61847)
Summary:
Towards https://github.com/pytorch/pytorch/issues/60585

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61847

Reviewed By: suo

Differential Revision: D29883887

Pulled By: jbschlosser

fbshipit-source-id: de3fcf1cc3878b138ab766d2a50cc59c52ec5a60
2021-07-26 07:35:36 -07:00
394dd391dd [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D29904940

fbshipit-source-id: 16ce87cc328f2950ed95a12710b50c444e363c79
2021-07-26 03:41:55 -07:00
e6e8745bea [nnc] Add simplifierUnderContext for simplification that needs context info: currently added for-stmt index var bounds info as context (#60687)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60687

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D29373315

Pulled By: huiguoo

fbshipit-source-id: 8729af60dd6d9735187b2118e3e83c75ef21789d
2021-07-25 23:30:13 -07:00
2299d6a013 Revert D29701447: [DDP] Implement a hook which performs FunctionalSGD step.
Test Plan: revert-hammer

Differential Revision:
D29701447 (bd95cf4473)

Original commit changeset: 183954593b82

fbshipit-source-id: 714e6a2b698147db9533a67783aed2a65d9d5bfe
2021-07-25 22:23:30 -07:00
457a3fb6d1 [bc-breaking][quant][graphmode][fx] Produce dequant - fp_op - quant pattern for copy nodes (#61763)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61763

This PR changes the is_reference=True option for convert_fx to produce a dequant - fp_op - quant
pattern for copy nodes like maxpool op.

Before the PR:
```
def forward(self, x):
    maxpool2d_input_scale_0 = self.maxpool2d_input_scale_0
    maxpool2d_input_zero_point_0 = self.maxpool2d_input_zero_point_0
    quantize_per_tensor = torch.quantize_per_tensor(x, maxpool2d_input_scale_0, maxpool2d_input_zero_point_0, torch.quint8);  x = maxpool2d_input_scale_0 = maxpool2d_input_zero_point_0 = None
    maxpool2d = self.maxpool2d(quantize_per_tensor);  quantize_per_tensor = None
    dequantize = maxpool2d.dequantize();  maxpool2d = None
    return dequantize
```

After (we expand the maxpool2d that works with quantized input to a "dequant - maxpool2d - quant" pattern):
```
def forward(self, x):
    maxpool2d_input_scale_0 = self.maxpool2d_input_scale_0
    maxpool2d_input_zero_point_0 = self.maxpool2d_input_zero_point_0
    quantize_per_tensor = torch.quantize_per_tensor(x, maxpool2d_input_scale_0, maxpool2d_input_zero_point_0, torch.quint8);  x = maxpool2d_input_scale_0 = maxpool2d_input_zero_point_0 = None
    dequantize = quantize_per_tensor.dequantize();  quantize_per_tensor = None
    maxpool2d = self.maxpool2d(dequantize);  dequantize = None
    maxpool2d_output_scale_0 = self.maxpool2d_output_scale_0
    maxpool2d_output_zero_point_0 = self.maxpool2d_output_zero_point_0
    quantize_per_tensor_1 = torch.quantize_per_tensor(maxpool2d, maxpool2d_output_scale_0, maxpool2d_output_zero_point_0, torch.quint8);  maxpool2d = maxpool2d_output_scale_0 = maxpool2d_output_zero_point_0 = None
    dequantize_1 = quantize_per_tensor_1.dequantize();  quantize_per_tensor_1 = None
    return dequantize_1
```

note that the call to self.maxpool2d is expanded to
```
    dequantize = quantize_per_tensor.dequantize();  quantize_per_tensor = None
    maxpool2d = self.maxpool2d(dequantize);  dequantize = None
    maxpool2d_output_scale_0 = self.maxpool2d_output_scale_0
    maxpool2d_output_zero_point_0 = self.maxpool2d_output_zero_point_0
    quantize_per_tensor_1 = torch.quantize_per_tensor(maxpool2d, maxpool2d_output_scale_0, maxpool2d_output_zero_point_0, torch.quint8);  maxpool2d = maxpool2d_output_scale_0 = maxpool2d_output_zero_point_0 = None
```

Test Plan:
```
python test/test_quantization.py TestQuantizeFx.test_copy_node_has_shared_actpp_instance
```

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D29728900

fbshipit-source-id: cf2c7f1f6659e3ba97cbb7c6204dd13983da10bd
2021-07-25 19:49:13 -07:00
76d3cdf9df [quant] Add from_blob_quantized_per_channel API (#62049)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62049

Adds a new function that accepts qint data blobs as input and creates a per-channel quantized tensor using the pre-allocated data and the provided scale and zero_point inputs.
Addresses issue #61777

Test Plan:
./build/bin/quantized_test --gtest_filter='TestQTensor.FromBlobQuantizedPerChannel'

Imported from OSS

Reviewed By: kimishpatel

Differential Revision: D29854136

fbshipit-source-id: da6ecd3fb59a6f40ae88430fdd5d895f93d5411c
2021-07-25 14:09:38 -07:00
7195b78a59 [quant] Add from_blob_quantized_per_tensor API (#61986)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61986

Adds a new function that accepts qint data blobs as input and creates a quantized tensor using the pre-allocated data and the provided scale and zero_point inputs.
Addresses issue https://github.com/pytorch/pytorch/issues/61777

Test Plan:
./build/bin/quantized_test --gtest_filter='TestQTensor.FromBlobQuantizedPerTensor'

Imported from OSS

Reviewed By: kimishpatel

Differential Revision: D29831135

fbshipit-source-id: b08299bbe9e939fedff98a585e6b12c14d31f17e
2021-07-25 14:08:25 -07:00
bd95cf4473 [DDP] Implement a hook which performs FunctionalSGD step. (#61678)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61678

This diff makes the following changes:
- Add `step_param` method to `_FunctionalSGD` class, written similar to `step` but for a single param
- Implement a communication hook wrapper that runs a given comm. hook and then applies the functional SGD step
- Verify that this is equal to regular allreduce + SGD optimizer
ghstack-source-id: 133567598
ghstack-source-id: 134263399

Test Plan: CI

Reviewed By: SciPioneer

Differential Revision: D29701447

fbshipit-source-id: 183954593b82a092414623292f9b10e675fef96e
2021-07-25 13:36:47 -07:00
8152433de2 [1/n] Update testing lib*.so path (#61960)
Summary:
### Issue

Build PyTorch wheel packages during build stage for pull requests and install during test stage.

### Fix
Update all tests which call lib*.so (under the `./build` folder); change them to call lib*.so in `{ent}/pytorch/lib/python3.8/site-packages/torch`

### Diff
This diff starts by updating test_fx, test_backend and test_torchbind to check whether the current CI passes

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61960

Test Plan: check that all CI workflows pass

Reviewed By: malfet, saketh-are

Differential Revision: D29823235

Pulled By: tktrungna

fbshipit-source-id: e7f652def698e303d4843fbaedf4859f5eca2fd9
2021-07-24 05:16:35 -07:00
956f1c981e fix a typo (#61061)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61061

Reviewed By: navahgar, Gamrix

Differential Revision: D29495806

Pulled By: Krovatkin

fbshipit-source-id: 510de724e3108c52af1b25b8ab53ae3c895b55f9
2021-07-24 00:35:58 -07:00
ee44d73e59 Modernize override (#61744)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61744

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D29717320

fbshipit-source-id: 6eea4295ee2e5572ab337620be412376fcc2f3cc
2021-07-23 23:04:46 -07:00
d2e03dc484 [fx2trt] Add support for explicit batch dimension (#62110)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62110

Add an option to opt in to an explicit batch dimension. Extend unit tests to cover both scenarios (implicit and explicit). Fix some converters that didn't work with an explicit batch dimension before.

Add broadcast support and a generic function for adding elementwise binary ops.

Follow ups:
1. Add dynamic shape support in explicit batch dimension mode, to allow at least a varying batch dimension.
2. Extend the layer_norm plugin to `PluginV2Ext` to make it work with an explicit batch dimension.

Test Plan: unit tests

Reviewed By: jackm321

Differential Revision: D29798239

fbshipit-source-id: 91d47c6155d2473ed4a6f8d2816715a32c61b869
2021-07-23 22:54:07 -07:00
cc263ef795 [bc-breaking][quant][graphmode][fx] Add observer/fake_quant for copy nodes (#61687)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61687

Previously we do not insert observer/fake_quant for output copy nodes (e.g. maxpool).
But to produce reference patterns we need to insert observer/fake_quant for the output and later convert that to a quantize
node.

Model:
```
class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.maxpool2d = torch.nn.MaxPool2d(kernel_size=3)

    def forward(self, x):
        x = self.maxpool2d(x)
        return x
```
result of prepare:

Before:
```
def forward(self, x):
    x_activation_post_process_0 = self.x_activation_post_process_0(x);  x = None
    maxpool2d = self.maxpool2d(x_activation_post_process_0);  x_activation_post_process_0 = None
    return maxpool2d
```

After:
```
def forward(self, x):
    x_activation_post_process_0 = self.x_activation_post_process_0(x);  x = None
    maxpool2d = self.maxpool2d(x_activation_post_process_0);  x_activation_post_process_0 = None
    maxpool2d_activation_post_process_0 = self.maxpool2d_activation_post_process_0(maxpool2d);  maxpool2d = None
    return maxpool2d_activation_post_process_0
```

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D29715566

fbshipit-source-id: 817df9b2933a35cad5331d8d8ce1c5ba0752e9df
2021-07-23 21:29:37 -07:00
78f7d8ccfa [Static Runtime] Remove wrappers for aten::cat (#62067)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62067

The wrapper for aten::cat is no longer needed after the variadic cat change in D29565344 (ae58a4c45d) .
Also added a simple test to test dynamic shapes, i.e., input tensors in args2 are larger than in args1.

Reviewed By: navahgar, mikeiovine

Differential Revision: D29864600

fbshipit-source-id: 44a712c2e776815c09e0bf5631412149b81274b2
2021-07-23 20:33:41 -07:00
7c09de8384 [torch deploy] add support for Python C extension modules (#58117)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58117

Previously it was not possible to load C extension modules with deploy because extension
modules need to link against the Python.h API functions. Since
each libtorchdeploy_interpreter.so has its own copy of these functions, it is not possible
to tell dlopen to resolve symbols in a loaded SO from one of these libraries without exposing
its symbols globally.

This patch adds a custom ELF loader that attaches C extension libraries
to the Python API of the interpreter that loaded the shared library. Simple use of the numpy and regex modules appears to work.

This diff has some limitations:

* 64-bit Linux only. OSX and windows use different formats for shared libraries. 32-bit ELF files are not supported.
* debug info is not immediately available to debuggers. A script for lldb is provided which can be loaded
so that lldb knows about the libraries as they are loaded.
* shared libraries can directly use the Python API, but libraries they depend on
  (via DT_NEEDED entries in their dynamic segment) may not use Python. In the future, we can
  try to detect whether a sub-library uses the Python API and load it with our custom loader.
* TLS initialization and library initialization may occur in a different order than what would happen with dlopen,
  potentially leading to some issues running destructors in TLS segments. Use of these C++ features is relatively rare.

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D28435305

Pulled By: zdevito

fbshipit-source-id: 10f046053dd1d250e3c73f2cce8eb945eeba31b6
2021-07-23 19:58:54 -07:00
e856a45283 [Model Averaging] Refactor averagers to accept parameters instead of a module (#62105)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62105

This prepares for wrapping the averager as an optimizer, which can accept only parameters rather than a module.

Proposal: https://github.com/pytorch/pytorch/issues/59699
ghstack-source-id: 134213572

Test Plan:
buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_periodic_model_averager

buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_average_parameters

Reviewed By: rohan-varma

Differential Revision: D29883693

fbshipit-source-id: 474ba924a0b05068b12f163fb74582bccf314964
2021-07-23 18:39:45 -07:00
41f7a9dac0 [profiler][refactor] Avoid using legacy event in profiler (#61721)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61721

Remove dependency on LegacyEvent from the profiler

Test Plan:
python test/test_profiler.py -v

Imported from OSS

Reviewed By: kimishpatel, gdankel

Differential Revision: D29716769

fbshipit-source-id: 2c2b48f2ee096adcbde09821e0cc7c0fcb94d19f
2021-07-23 18:28:08 -07:00
06a3b23971 [android] Lite interpreter module to load from assets (#61609)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61609

Test Plan: Imported from OSS

Reviewed By: cccclai

Differential Revision: D29688641

Pulled By: IvanKobzarev

fbshipit-source-id: 7857bad51e91eae7c90a1218d463f3767f4fae15
2021-07-23 17:51:18 -07:00
643e58466e [nnc] Rename IRSimplifierBase with PolynomialBase (#60686)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60686

Test Plan: Imported from OSS

Reviewed By: navahgar, soulitzer

Differential Revision: D29373316

Pulled By: huiguoo

fbshipit-source-id: bd44bff60455076d1c5291273989e9939a428f9a
2021-07-23 17:18:41 -07:00
046272f3e5 [6/N] Nnapi Backend Delegate: Comprehensive OSS Tests (#61782)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61782

This PR depends on https://github.com/pytorch/pytorch/pull/61787

### Summary:
Added more comprehensive tests for Android NNAPI delegate.
Previously, there was only one basic test for lowering a PReLU module with the NNAPI delegate. Now, more tests are inherited from `test_nnapi.py`, the file for testing NNAPI conversion and execution without the delegate.

**test_backend_nnapi.py**
Test file for Android NNAPI delegate.
- `TestNnapiBackend` class inherits tests from `test_nnapi.py` and overrides the model conversion to use the delegate API.
- Includes an extra test for passing input arguments as Tensors and Tensor Lists.
- Has extra setup for loading the NNAPI delegate library and for changing the default dtype from float64 to float32 (dtype is typically float32 by default, but not in delegate backend unit tests)

**test_nnapi.py**
Test file for Android NNAPI without the delegate.
- Some code was refactored to allow override of only the NNAPI conversion call.
- An extra function was added to allow the NNAPI delegate unit test to turn off the model execution step. Once the NNAPI delegate's execution implementation is complete, this may no longer be necessary.

### Test Plan:
I ran `python test/test_jit.py TestNnapiBackend` and `python test/test_nnapi.py` to run both test files.

Test Plan: Imported from OSS

Reviewed By: raziel, iseeyuan

Differential Revision: D29772005

fbshipit-source-id: 5d14067a4f6081835699b87a2ece5bd6bed00c6b
2021-07-23 17:04:07 -07:00
f03e7170f0 ENH Updates docs and tests for regression modules that already support no-batch-dims (#61461)
Summary:
Towards https://github.com/pytorch/pytorch/issues/60585

This PR does not use `check_sum_reduction` because I wanted to test every reduction option.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61461

Reviewed By: suo

Differential Revision: D29883744

Pulled By: jbschlosser

fbshipit-source-id: cdad0effb41f0484938caad0d4c9d6d83e2aec07
2021-07-23 16:40:17 -07:00
1ec6205bd0 ENH Adds no_batch_dim support for maxpool and unpool for 2d and 3d (#61984)
Summary:
Towards https://github.com/pytorch/pytorch/issues/60585

(Interesting how the maxpool tests are currently in `test/test_nn.py`)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61984

Reviewed By: suo

Differential Revision: D29883846

Pulled By: jbschlosser

fbshipit-source-id: 1e0637c96f8fa442b4784a9865310c164cbf61c8
2021-07-23 16:14:10 -07:00
f4ffaf0cde Fix type promotion for cosine_similarity() (#62054)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/61454

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62054

Reviewed By: suo

Differential Revision: D29881755

Pulled By: jbschlosser

fbshipit-source-id: 10499766ac07b0ae3c0d2f4c426ea818d1e77db6
2021-07-23 15:20:48 -07:00
e408af083f Improve MHA docs (#61977)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/60831
Also clarifies the relationship between `embed_dim` and `num_heads` (see https://github.com/pytorch/pytorch/issues/60853 and https://github.com/pytorch/pytorch/issues/60445).
Formatting was overhauled to remove some redundancy between the input docs and shape docs; suggestions / comments welcome!

Link to rendered docs here: https://14912919-65600975-gh.circle-artifacts.com/0/docs/generated/torch.nn.MultiheadAttention.html

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61977

Reviewed By: bhosmer

Differential Revision: D29876884

Pulled By: jbschlosser

fbshipit-source-id: a3e82083219cc4f8245c021d309ad9d92bf39196
2021-07-23 15:19:34 -07:00
cf3cc01f1d [Static Runtime] Add is_frozen to StaticModule ctor (#62020)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62020

Add is_frozen to StaticModule ctor so we can skip freezing in StaticModule.

Reviewed By: ajyu, mikeiovine

Differential Revision: D29807431

fbshipit-source-id: 7742e9f5c5ae9f442a9e4007c870a14fd8b4af20
2021-07-23 15:12:35 -07:00
fa11103c6a [clang-tidy] Fix unknown GNU flag error (#62128)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62128

Test Plan: Imported from OSS

Reviewed By: driazati

Differential Revision: D29888297

Pulled By: 1ntEgr8

fbshipit-source-id: 0657d5baa72c014a83c9def4a39338c52f4ef8d1
2021-07-23 14:46:51 -07:00
9730d91abd MAINT Migrates multilabel_margin_loss from THC to ATen (CUDA) (#60708)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/24603
Fixes https://github.com/pytorch/pytorch/issues/24602

<s>The implementation should be exactly the same, so it is strange that the benchmarks show such a significant improvement in this PR.</s>

The benchmarks are now the same.

<details>
 <summary>Benchmark script</summary>

```python
from itertools import product

import torch
import torch.nn as nn
import torch.nn.functional as F
import time

torch.manual_seed(0)
MS_PER_SECOND = 1000

def _time():
    torch.cuda.synchronize()
    return time.perf_counter() * MS_PER_SECOND

device = "cuda"
C = 30
n_runs = 100
reductions = ["none", "sum", "mean"]
Ns = [1_000, 10_000, 100_000]

for reduction, N in product(reductions, Ns):
    total_fwd_time = 0
    total_back_time = 0
    grad_out = torch.randn(N, device=device)
    if reduction != "none":
        grad_out = grad_out[0]

    for _ in range(n_runs):
        input = torch.randn(N, C, device=device, requires_grad=True)
        target = torch.randint(0, C, size=input.size(), device=device)

        # forward
        start = _time()
        result = F.multilabel_margin_loss(input, target, reduction=reduction)
        total_fwd_time += _time() - start

    result = F.multilabel_margin_loss(input, target, reduction=reduction)
    for _ in range(n_runs):
        # backward
        start = _time()
        result.backward(grad_out, retain_graph=True)
        total_back_time += _time() - start

    fwd_avg = total_fwd_time / n_runs
    bwd_avg = total_back_time / n_runs
    print(
        f"input size({N}, {C}), reduction: {reduction}, fwd: {fwd_avg:.2f} (ms), back: {bwd_avg:.2f} (ms)"
    )
```

</details>

## master

```
input size(1000, 30), reduction: none, fwd: 0.14 (ms), back: 0.41 (ms)
input size(10000, 30), reduction: none, fwd: 1.26 (ms), back: 3.58 (ms)
input size(100000, 30), reduction: none, fwd: 13.15 (ms), back: 34.68 (ms)
input size(1000, 30), reduction: sum, fwd: 0.14 (ms), back: 0.38 (ms)
input size(10000, 30), reduction: sum, fwd: 1.16 (ms), back: 3.53 (ms)
input size(100000, 30), reduction: sum, fwd: 13.04 (ms), back: 34.53 (ms)
input size(1000, 30), reduction: mean, fwd: 0.14 (ms), back: 0.38 (ms)
input size(10000, 30), reduction: mean, fwd: 1.17 (ms), back: 3.52 (ms)
input size(100000, 30), reduction: mean, fwd: 13.12 (ms), back: 34.54 (ms)
```

## this PR

```
input size(1000, 30), reduction: none, fwd: 0.14 (ms), back: 0.35 (ms)
input size(10000, 30), reduction: none, fwd: 1.22 (ms), back: 2.98 (ms)
input size(100000, 30), reduction: none, fwd: 12.90 (ms), back: 29.32 (ms)
input size(1000, 30), reduction: sum, fwd: 0.14 (ms), back: 0.32 (ms)
input size(10000, 30), reduction: sum, fwd: 1.16 (ms), back: 2.97 (ms)
input size(100000, 30), reduction: sum, fwd: 13.00 (ms), back: 29.17 (ms)
input size(1000, 30), reduction: mean, fwd: 0.14 (ms), back: 0.32 (ms)
input size(10000, 30), reduction: mean, fwd: 1.17 (ms), back: 2.97 (ms)
input size(100000, 30), reduction: mean, fwd: 13.09 (ms), back: 28.91 (ms)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60708

Reviewed By: saketh-are

Differential Revision: D29856579

Pulled By: ngimel

fbshipit-source-id: b6bbf27a71e5a04f61779f6fef4ed1c98baa2607
2021-07-23 13:45:28 -07:00
a6c6fd923e [profiler] Nvtx support (#61634)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61634

The legacy profiler supported NVTX, which was used by emit_nvtx. This PR
adds support for NVTX in the new profiler, to prepare for the eventual
deprecation of the legacy profiler.

Test Plan:
Verified that the profiles produced with nvprof are the same
```
import torch
import torchvision.models as models
from torch.autograd.profiler import emit_nvtx, load_nvprof

model = models.resnet18().cuda()
inputs = torch.randn(5, 3, 224, 224).cuda()

with emit_nvtx(record_shapes=True):
  model(inputs)
```
/usr/local/cuda/bin/nvprof  -o test_trace2.prof -f  -- python test_emit_nvtx.py
```
evt = load_nvprof("/home/iliacher/local/pytorch/test_trace.prof")
```

Imported from OSS

Reviewed By: kimishpatel, gdankel

Differential Revision: D29691316

fbshipit-source-id: 1e186cc072368f3e3987a2da0bfd90ed328817c5
2021-07-23 13:37:09 -07:00
812bc1dde6 Smart Decay for Adam - DPER3 (#62058)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62058

This is the second diff in this stack.  This diff includes the changes to DPER3; the first diff includes the changes to Caffe2.

We want to decay learning parameters properly.  Previously this was not done when a parameter was absent from a minibatch.  We fix this by keeping track of missed minibatches and making the decay catch up accordingly.

The exponential moving averages (EMAs) for the first and second moments used in Adam are updated only for parameters seen in a minibatch.  In principle, for the parameters that are not seen, 0 should be added to the EMAs and the EMAs should then be decayed by multiplying by beta1 and beta2, respectively.

To avoid the computational overhead of touching every parameter for every minibatch, we:
* keep track of the last time a parameter is seen
* instead of decaying the EMAs by multiplying by beta1 and beta2, we multiply by beta1^k and beta2^k, where k is the number of minibatches since the parameter was last seen.

We hope this will significantly improve the inconsistent learning parameter issue we have seen with Adam.
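
A minimal sketch of the idea (illustrative only; names are hypothetical and bias correction is omitted):

```
import torch

class SmartDecayAdamSketch:
    def __init__(self, beta1=0.9, beta2=0.999, lr=1e-3, eps=1e-8):
        self.beta1, self.beta2, self.lr, self.eps = beta1, beta2, lr, eps
        self.last_seen = {}      # minibatch index at which each param was last seen
        self.m, self.v = {}, {}  # first/second moment EMAs per param

    def step_param(self, pid, param, grad, t):
        # k = number of minibatches since this param was last seen; decay
        # the EMAs by beta**k once, instead of touching every param on
        # every minibatch.
        k = t - self.last_seen.get(pid, t - 1)
        m = self.m.get(pid, torch.zeros_like(param)) * self.beta1 ** k
        v = self.v.get(pid, torch.zeros_like(param)) * self.beta2 ** k
        m = m + (1 - self.beta1) * grad
        v = v + (1 - self.beta2) * grad * grad
        param -= self.lr * m / (v.sqrt() + self.eps)
        self.m[pid], self.v[pid], self.last_seen[pid] = m, v, t
```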

Differential Revision: D29638897

fbshipit-source-id: 18d8e227d72c2e23010ca81e0f6eeb78872c8d3c
2021-07-23 13:26:30 -07:00
5224490ae9 Implement NumPy-like frombuffer tensor constructor. (#59077)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59077

Fixes #58549

`frombuffer` constructs a tensor object from an already allocated buffer through
CPython's buffer protocol. Besides the standard `dtype`, `count`, and `offset` parameters,
this function also accepts:

- `device`: where the buffer lives
- `requires_grad`: should autograd record operations on the new tensor

A new test file _test_buffer_protocol.py_ was created. Currently, only CPU tests were
implemented. That's because neither PyTorch nor Numba implements CPython's buffer
protocol. Therefore, there's no way to create a CUDA buffer with the existing
dependencies (could use PyCUDA for that, though).

At the moment, if `device` differs from the device the buffer actually lives, two things
may happen:

- `RuntimeError`, if `device='cuda'`
- Segmentation fault (not tested -- see above), if `device='cpu'`
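
A CPU usage sketch (written against the `torch.frombuffer` name the feature eventually landed under; `offset` is in bytes):

```
import array
import torch

buf = array.array('f', [1.0, 2.0, 3.0, 4.0])  # exposes the buffer protocol
t = torch.frombuffer(buf, dtype=torch.float32, count=2, offset=4)
print(t)       # tensor([2., 3.]) -- shares memory with buf
buf[1] = 7.0
print(t[0])    # tensor(7.) -- mutations are visible through the tensor
```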

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D29870914

Pulled By: mruberry

fbshipit-source-id: 9fa8611aeffedfe39c9af74558178157a11326bb
2021-07-23 13:17:48 -07:00
ec4e6181e6 [Static Runtime] Fix broken test_static_runtime build (#62098)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62098

The build was broken by D29821533 (1d2ea76afb). The `clamp` overloads used in `deep_wide.h`
are no longer available in the `at::native` namespace.

Use `at::cpu::clamp` and `at::cpu::clip_out` (which should be an alias for
clamp) instead.

Reviewed By: hlu1

Differential Revision: D29880187

fbshipit-source-id: 210b6d2be8a8142e7af1a0ba07e55a95b1a77d25
2021-07-23 12:35:09 -07:00
b820493cf1 [skip ci] Refactor CIFlow init logic (#62102)
Summary:
This PR refactors the CIWorkflow post_init step to best account for how CIFlow interacts with everything.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62102

Test Plan: This PR did NOT produce any workflow changes. I ran mypy and flake8 on the changed file locally with no issues.

Reviewed By: jbschlosser

Differential Revision: D29883275

Pulled By: janeyx99

fbshipit-source-id: 6c5c1fc1878159e0de1bf8d9bd0cb32aa47af49a
2021-07-23 12:29:04 -07:00
71cfbc45b4 Remove redundant torch.cuda.set_device(self.rank) (#62097)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62097

as title
ghstack-source-id: 134196740

Test Plan: buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_ddp_profiling_autograd_profiler

Reviewed By: rohan-varma

Differential Revision: D29880040

fbshipit-source-id: 6a06fb2d87e9a7dfa1d7c81bf0c3fe115c1a1abb
2021-07-23 11:59:16 -07:00
5ef667a8b8 Remove duplicated movedim implementation (#61939)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61939

Test Plan: Imported from OSS

Reviewed By: saketh-are

Differential Revision: D29850798

Pulled By: zou3519

fbshipit-source-id: e803b235d8535a204515ff9f5d46b8c4d191b73c
2021-07-23 11:52:07 -07:00
10ccc5a81c remove randn? from torch.testing namespace (#61840)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61840

Redo of #60859.

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D29871017

Pulled By: mruberry

fbshipit-source-id: 47afed1dc6aa0bb1e826af616ef5d5aaabb8e5bb
2021-07-23 11:51:03 -07:00
cb47d1f9c8 OpInfo Ref: fmod, remainder (#61527)
Summary:
See https://github.com/pytorch/pytorch/issues/54261 for OpInfo tracker.

This PR:

* [x] Adds references to both `fmod` and `remainder` for testing.
* [x] Updates `remainder` documentation to add a note on divergence with `std::remainder`. (something similar to NumPy's note: https://numpy.org/doc/1.20/reference/generated/numpy.remainder.html), see: https://github.com/pytorch/pytorch/pull/61527#discussion_r670238788 for further discussion.

cc: mruberry

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61527

Reviewed By: albanD

Differential Revision: D29841266

Pulled By: mruberry

fbshipit-source-id: be99851a94f53ea2fc07b64fd7c947775129658c
2021-07-23 11:44:32 -07:00
c9b71549f2 don't allow alias dispatch keys to go in the DispatchKeySet (#61771)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61771

Test Plan: Imported from OSS

Reviewed By: asuhan

Differential Revision: D29736432

Pulled By: bdhirsh

fbshipit-source-id: 54bb716db1e41565b00f4f01ea0096f834087577
2021-07-23 11:29:46 -07:00
143ef016ee Throw RuntimeError when numpy() is called on a tensor with conjugate or negative bit set (#61925)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61925

Resolves https://github.com/pytorch/pytorch/issues/59945 and https://github.com/pytorch/pytorch/issues/59946

bc-breaking note: Unlike before, complex_tensor.conj().numpy(), complex_float_tensor.conj().view(torch.float64), and complex_float_tensor.conj().imag.view(torch.int32) no longer return views but instead error out
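
For example (a sketch of the new behavior; `resolve_conj` materializes the conjugation):

```
import torch

a = torch.tensor([1 + 1j])
b = a.conj()                    # lazy: only sets the conjugate bit
# b.numpy() now raises a RuntimeError instead of silently sharing memory.
arr = b.resolve_conj().numpy()  # materialize first, then convert
```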

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D29819288

Pulled By: anjali411

fbshipit-source-id: 4bebec721eb535f44ef4b728bdc75fa444e05d16
2021-07-23 11:28:36 -07:00
943ca5f6f7 [special] alias for mvlgamma (#61633)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/50345

Have added `out` variant for consistency.

TODO:
* [x] Check docs https://docs-preview.pytorch.org/61633/special.html#torch.special.multigammaln

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61633

Reviewed By: albanD

Differential Revision: D29815514

Pulled By: mruberry

fbshipit-source-id: 003c7b6a5938ecc7a96727310e8a39da0b3d7aca
2021-07-23 11:24:27 -07:00
0c55f1bdec [torchelastic] Improve process termination logic (#61602)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61602

The diff introduces signal handlers and a SignalException that is raised when the agent process receives SIGTERM or SIGINT.

When either of these signals is received, the termination handler raises the `SignalException`, which is then processed by the main agent loop. `shutdown(signum)` is invoked, propagating the received signal to the child processes. A default 30-second timeout is introduced: if the child processes cannot terminate gracefully within it, the agent kills them via SIGKILL.
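
A minimal sketch of the handler mechanism (illustrative; names are assumptions, not the exact diff):

```
import os
import signal

class SignalException(Exception):
    """Raised inside the agent process when SIGTERM/SIGINT arrives."""
    def __init__(self, msg, sigval):
        super().__init__(msg)
        self.sigval = sigval

def _terminate_handler(signum, frame):
    raise SignalException(f"Process {os.getpid()} got signal: {signum}", sigval=signum)

signal.signal(signal.SIGTERM, _terminate_handler)
signal.signal(signal.SIGINT, _terminate_handler)
```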

Test Plan: unittests, sandcastle

Reviewed By: cbalioglu

Differential Revision: D29671783

fbshipit-source-id: 3dbca2125676dc18d417cc3e3bb0301fdd42737a
2021-07-23 11:00:15 -07:00
e42360d56f Remove default arguments before calling to __torch_dispatch__ (#61123)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61123

This applies the design pattern of removing explicit arguments when they
coincide with the default arguments.  This simplifies argument patterns
that dispatch kernels receive and makes it easier for us to maintain BC
(as addition of a new default argument isn't immediately BC-breaking
for dispatch implementors).

There is an important extra API which I haven't implemented here yet,
which is to take an incomplete sequence of arguments and fill out their
defaults (in case the user did want normalization).  I plan on adding
that in a future PR.
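
A standalone sketch of the design pattern (not the actual dispatcher code):

```
# Drop kwargs that merely restate the schema's defaults before handing
# them to the __torch_dispatch__ implementor.
def strip_defaults(kwargs, schema_defaults):
    return {k: v for k, v in kwargs.items()
            if k not in schema_defaults or v != schema_defaults[k]}

print(strip_defaults({"alpha": 1}, {"alpha": 1}))  # {} -- default elided
print(strip_defaults({"alpha": 2}, {"alpha": 1}))  # {'alpha': 2} -- kept
```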

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: saketh-are

Differential Revision: D29853616

Pulled By: ezyang

fbshipit-source-id: 71c672cb3a7d4d01f838a1c7fcdb75a8ce7d058e
2021-07-23 10:41:35 -07:00
32d0c3e8ee Support for reference convert_fx working on gpu
Summary:
This PR enables gpu only quantization, best used with is_reference since
there are not many gpu kernels for ops as of now.

This PR mainly changes how qconfigs and their observer constructors operate once they
are on a module's qconfig. The function add_module_to_qconfig_obs_ctr takes the observer constructors on the original
qconfig and configures them so that, when invoked, the created observer will
be on whatever device the module occupies. (Once observers are created,
module.to(device) is already set up so that it moves any observers.) To do this,
a new method and a few small changes were added to the _PartialWrapper class that
our observers already use to create constructors (without changing the
existing functionality). These changes work in
concert with changes to the prepare flow such that, when the qconfigs are
propagated to the modules (in quantize.py and qconfig_utils.py), they are configured using add_module_to_qconfig_obs_ctr.

Ideally this would work on other models, but the is_reference support for
a lot of modules isn't there yet; those tests should be added in a
future PR.

Test Plan:
python test/test_quantization.py TestQuantizeFxModels.test_static_gpu_convert_basic

python test/test_quantization.py TestQuantizeFxModels.test_switch_device_prepare_convert

python test/test_quantization.py TestQuantizeFxModels.test_prepare_serialize_switch_device_convert

python test/test_quantization.py TestQuantizeFx.test_qconfig_precedence

Reviewed By: vkuzo

Differential Revision: D29684114

fbshipit-source-id: 19fefb8e1998eaf212723e836276ccf39467f2e7
2021-07-23 10:30:38 -07:00
0df1679e5c BatchNorm: fix mixed precision usage with affine=False (#61962)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/61924

The fused backward kernel was using the weight dtype to detect mixed precision usage, but the weights can be None while the `running_mean` and `running_var` can still be mixed precision. So, I updated the check to look at those variables as well.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61962

Reviewed By: albanD

Differential Revision: D29825516

Pulled By: ngimel

fbshipit-source-id: d087fbf3bed1762770cac46c0dcec30c03a86fda
2021-07-23 09:55:52 -07:00
e318058ffe Ignore LNK4099 for debug binary libtorch builds (#62060)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/61979

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62060

Test Plan:
This CI shouldn't break
and https://github.com/pytorch/pytorch/pull/62061

Reviewed By: driazati

Differential Revision: D29877487

Pulled By: janeyx99

fbshipit-source-id: 497f84caab3f9ae609644fd397ad87a6dc8a2a77
2021-07-23 09:31:41 -07:00
04c95a0638 ns for fx: expose hook to define custom weight extraction functions (#62047)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62047

Adds a hook for the user to define a weight extraction function for a
custom type.

Example usage:
```
op_to_type_to_weight_extraction_fn = \
    get_op_to_type_to_weight_extraction_fn()
op_to_type_to_weight_extraction_fn['call_function'][_wrapped_linear] = \
    torch.quantization.ns.weight_utils.get_linear_fun_weight

results = extract_weights_impl(
    'a', m1, 'b', m2,
    op_to_type_to_weight_extraction_fn=op_to_type_to_weight_extraction_fn)
```

Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_user_defined_function
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D29853625

fbshipit-source-id: 183916ef54ba303bc818e0eba00b52e33c4633ad
2021-07-23 09:31:37 -07:00
07c6a12008 ns for fx: fix typing issue in weight extraction (#62041)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62041

Before this PR, weights of conv and linear modules were extracted
as lists, in order to match the signature of LSTM weights.

After this PR, weight extraction preserves the type of the weights,
so extracted weights of conv and linear have a different type
from LSTM weights.  The comparison util functions are updated to
handle the LSTM weight type of `List[tensor]`.

Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs
python test/test_quantization.py TestFXNumericSuiteCoreAPIsModels
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D29853626

fbshipit-source-id: 93da5b9b0b174679c61528d02b6b902cb064444e
2021-07-23 09:31:33 -07:00
eaba16d665 ns for fx: change weight extraction to direct mapping (#62038)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62038

Updates the logic to extract weights from nodes to use a
direct mapping from type to weight extraction function.

This is needed for a future PR which will allow users to
specify custom weight extraction functions for user defined
types.

Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs
python test/test_quantization.py TestFXNumericSuiteCoreAPIsModels
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D29853627

fbshipit-source-id: 3ef90ef4bd7b28f6316c0af215a2bd3ff8a2aeca
2021-07-23 09:30:08 -07:00
8a2c525d3b Fix some sign comparisons (#61849)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61849

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D29736180

fbshipit-source-id: 1391b11e73725ee985b9aa768566ca77f44d04ae
2021-07-23 09:03:33 -07:00
9d4056468e Migrate scheduled jobs debuggability to GHA (#62056)
Summary:
This removes the debuggable-ci workflow in Circle and enables the same idea in GHA, to allow contributors to run scheduled GHA workflows by:
1. assigning the PR to pytorchbot.
2. labeling the PR with ciflow/scheduled
3. unassigning the PR.

This PR also adds the trigger_action_only logic to windows_ci_template yaml, as it was present on the linux template and seemed to be left out by mistake.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62056

Test Plan: Note that this periodic job https://github.com/pytorch/pytorch/pull/62056/checks?check_run_id=3138504471 ran later than other jobs (like [this one](https://github.com/pytorch/pytorch/pull/62056/checks?check_run_id=3138226668)), and its time is close to when unassigning happens.

Reviewed By: seemethere

Differential Revision: D29859079

Pulled By: janeyx99

fbshipit-source-id: cd5c6be415cfa8090e3cac90625f92b49fd453a8
2021-07-23 08:48:22 -07:00
b03b45afd9 [DDP Comm Hook] Use a single tensor instead of a tensor list as the comm hook result (#62074)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62074

Since SPMD mode is retired, the comm hook result will always be a single tensor.

This can improve the comm hook developer experience, as there is no need to add an extra `[0]` to the precursor future result.
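
A hook sketch under the new contract (illustrative; `get_tensor` is an assumption about the GradBucket API of this era):

```
import torch
import torch.distributed as dist

def fp16_compress_hook(state, bucket):
    compressed = bucket.get_tensor().to(torch.float16)
    fut = dist.all_reduce(compressed, async_op=True).get_future()

    def decompress(fut):
        val = fut.value()
        tensor = val[0] if isinstance(val, list) else val
        # The hook's future now resolves to a single tensor, not a
        # one-element list.
        return tensor.to(torch.float32)

    return fut.then(decompress)
```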

#Closes: https://github.com/pytorch/pytorch/issues/61914
ghstack-source-id: 134164593

Test Plan:
buck test mode/dev-nosan caffe2/test/distributed:c10d
buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork

Reviewed By: rohan-varma

Differential Revision: D29864732

fbshipit-source-id: 59fe6dd78b66214b1788514ad4d236039d9bda31
2021-07-23 03:32:05 -07:00
1d2ea76afb clamp: port to structured kernel (#61361)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61361

This PR ports the `clamp` kernel to the structured format. In addition, it introduces `OptionalScalarRef` as a replacement for `c10::optional<Scalar>&`. The latter, although it is a reference type, can still involve copying the contained `Scalar` (e.g. if the actual parameter is a `Scalar` or if a `c10::optional<Scalar>` is constructed just to call a kernel). `OptionalScalarRef` contains only a `const Scalar&`, and stores flag about whether the instance contains something inside the `Scalar` itself using a new tag.

For more information, see #55070.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D29821533

Pulled By: SplitInfinity

fbshipit-source-id: 88d55df5a4b2c14b68a57e4905d90eea1b088d99
2021-07-23 02:02:07 -07:00
b106b958eb preserve residual in transformer norm_first (#61692)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61692
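
For context, a minimal sketch of the pre-norm ("norm_first") pattern this preserves (names are illustrative):

```
def encoder_layer_norm_first(x, norm1, norm2, sa_block, ff_block):
    # The LayerNorm is applied inside each branch, so the residual path
    # carries x through unchanged.
    x = x + sa_block(norm1(x))
    x = x + ff_block(norm2(x))
    return x
```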

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D29706830

Pulled By: bhosmer

fbshipit-source-id: d9c9e88fb589d46189955a96909c6ca76d587f72
2021-07-22 23:49:08 -07:00
53222c59f0 Reformat (#62073)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62073

as title
ghstack-source-id: 134159445

Test Plan: N/A

Reviewed By: rohan-varma

Differential Revision: D29869185

fbshipit-source-id: 17a32d56860e9469bd26c4eb4ca2d483827d946e
2021-07-22 23:36:22 -07:00
3687bbb1ed [pruner] add Conv2d support (#61778)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61778

Adding Conv2d as supported modules for the pruner. Previously the pruner only supported Linear layers. This addition includes:
- adding a Conv2d activation reconstruction forward hook to match Conv2d weight shapes
- in `prepare`, checking the type of the module and using the corresponding activation forward hook
ghstack-source-id: 134143557

Test Plan:
Added conv2d tests
`buck test mode/dev-nosan //caffe2/test:ao -- TestBasePruner`

https://pxl.cl/1LLf3

Reviewed By: jerryzh168

Differential Revision: D29719045

fbshipit-source-id: 6a9f91b96992c552fff32f0e5a6e22f16eb7077b
2021-07-22 23:00:31 -07:00
a9b0a921d5 Disable avoid-non-const-global-variables lint check (#62008)
Summary:
As the GoogleTest `TEST` macro is non-compliant with it, as is `DEFINE_DISPATCH`

All changes but the ones to `.clang-tidy` are generated using following script:
```
for i in `find . -type f -iname "*.c*" -or -iname "*.h"|xargs grep cppcoreguidelines-avoid-non-const-global-variables|cut -f1 -d:|sort|uniq`;  do sed -i "/\/\/ NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables)/d" $i; done
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62008

Reviewed By: driazati, r-barnes

Differential Revision: D29838584

Pulled By: malfet

fbshipit-source-id: 1b2f8602c945bd4ce50a9bfdd204755556e31d13
2021-07-22 18:04:40 -07:00
260198d42c Disable bazel in CircleCI (#62055)
Summary:
As it has been running in GHA for a while

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62055

Reviewed By: zhouzhuojie, seemethere

Differential Revision: D29856620

Pulled By: malfet

fbshipit-source-id: 754e392442f68d4eee15811e2bd2cf147326c42a
2021-07-22 16:28:12 -07:00
a91be24e2d Modernize make pointers (#61741)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61741

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D29717385

fbshipit-source-id: 4452b77981e49175f744bdaab12cd225bf75b90e
2021-07-22 15:54:37 -07:00
f98fa5ea13 [skip ci] minor typo link fix (#62042)
Summary:
This is not a functional change but a typo fix where I forgot to update the link to windows_smoke_tests.csv in test_python_first_shard. The windows_smoke_tests.csv is currently the same in pytorch/test-infra and my fork, janeyx99/test-infra, but that will not be the case in the future.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62042

Reviewed By: seemethere

Differential Revision: D29851984

Pulled By: janeyx99

fbshipit-source-id: 9bafdf0ba006b9128463e3cf132fdfcddd3d10f2
2021-07-22 15:34:41 -07:00
1a64a5c0ba .github: Only run workflows on pytorch/pytorch (#62044)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62044

Downstream users have reported that they're seeing github workflows pop
up in their downstream forks which is not ideal. Let's make it so that
all of these generated workflows actually get skipped.

Also includes workflows related to automating pytorch/pytorch repository
maintenance

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: malfet, janeyx99

Differential Revision: D29852199

Pulled By: seemethere

fbshipit-source-id: bbc1684c06a50bb3597f3112cb65fe9c1a4d7c1f
2021-07-22 15:08:31 -07:00
414537ac99 DOC Fixes link in register_module_backward_hook (#61999)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/61580

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61999

Reviewed By: saketh-are

Differential Revision: D29847397

Pulled By: albanD

fbshipit-source-id: 3d9e1a5abac82d658b4f1746ace73e2fecb41725
2021-07-22 14:29:40 -07:00
b522f3be4c Svd docfix (#62028)
Summary:
moving back the variable names to match the python variable and remove unicode exponents.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62028

Reviewed By: saketh-are, mruberry

Differential Revision: D29848591

Pulled By: albanD

fbshipit-source-id: f86b8666cb5f86e300e214a6d59638d069018c50
2021-07-22 14:11:52 -07:00
d6e776d961 Add build/.ninja_log to artifacts for Windows (#62035)
Summary:
Being able to download the .ninja_log allows for better debugging. There may be a follow-up PR to convert this to a better tracefile.

This PR only handles windows as it is already handled for linux here:
https://github.com/pytorch/pytorch/blob/master/.jenkins/pytorch/build.sh#L248-L252

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62035

Test Plan: Check the artifacts for a windows job and see if we see .ninja_log

Reviewed By: malfet

Differential Revision: D29852228

Pulled By: janeyx99

fbshipit-source-id: a3a87b709cd0c84f5b3cdc274ac4a623771c2b5c
2021-07-22 13:04:29 -07:00
0309c5780d ENH Adds no batch dim support for AvgPool1d (#61860)
Summary:
Towards https://github.com/pytorch/pytorch/issues/60585

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61860

Reviewed By: albanD

Differential Revision: D29826382

Pulled By: jbschlosser

fbshipit-source-id: 47e12073d866f0604310fc1ff270cde9907e516d
2021-07-22 12:46:48 -07:00
5a00152a3d Warn about poor performance creating Tensor from list of numpy.array's (#51680)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/13918

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51680

Reviewed By: saketh-are

Differential Revision: D29847229

Pulled By: ezyang

fbshipit-source-id: 0519aad27f9ca1d8c06be5b9e6de382374d8b72b
2021-07-22 12:02:50 -07:00
2b0eddb0aa [Static Runtime] Implement prim::isinstance and prim::TypeCheck (#61783)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61783

Implement two new prim operators for static runtime: `isinstance` and `TypeCheck`. `isinstance` is very straightforward, but there were a few wrinkles with implementing `TypeCheck`:

1. There is no way to directly generate `TypeCheck` nodes from TorchScript, they are generated by the JIT at runtime. This makes testing a little difficult. I had to make some modifications to `testStaticRuntime` to allow for the use of IR and TorchScript tests.
2. The behavior of `prim::TypeCheck` as implemented here does not match up 1:1 with the version implemented in the interpreter! This is because grad mode is disabled in static runtime. Here's an example.

IR is the same as the one included in this test, but with `requires_grad == 1`
```
graph(%a.1 : Tensor,
      %b.1 : Tensor):
  %t0 : Float(2, 2, strides=[2, 1], device=cpu, requires_grad=1), %t1 : Float(3, 3, strides=[3, 1]), %type_matched : bool = prim::TypeCheck[types=[Float(2, 2, strides=[2, 1], device=cpu, requires_grad=1), Float(3, 3, strides=[3, 1])]](%a.1, %b.1)
  return (%t0, %t1, %type_matched)
```

And in the test setup:
```
auto a = at::zeros({2, 2}, at::kFloat);
a.to(at::kCPU);
a.set_requires_grad(true);
auto b = at::ones({3, 3}, at::kFloat);

std::vector<IValue> args_correct = {a, b};

// prim::TypeCheck should be true with args_correct,
// but we get false when using static runtime!
```

Reviewed By: hlu1

Differential Revision: D29743862

fbshipit-source-id: db1788f0f5de42bab42602e8cc24eee04cbcc280
2021-07-22 10:23:35 -07:00
e6339ee336 optimize imports (#61908)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61908

Reviewed By: suo

Differential Revision: D29800269

Pulled By: ejguan

fbshipit-source-id: 74ce4414eb6d2a5608df9ec1efdc71e2112aef70
2021-07-22 09:58:44 -07:00
554e04090f Add 11.3 conda nightly binaries (#61873)
Summary:
Adds conda 11.3 cuda binaries to our nightly matrix.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61873

Test Plan:
Tested by https://github.com/pytorch/pytorch/pull/61867 --> testing complete, showing all passing binaries.

THIS CAN ONLY BE MERGED _AFTER_ pytorch/builder#806 and pytorch/builder#807 are merged, which they now are.

Reviewed By: saketh-are

Differential Revision: D29848267

Pulled By: janeyx99

fbshipit-source-id: db04899418bd0b4116315fbbe36b06f772020c2e
2021-07-22 09:50:13 -07:00
e858f6eed9 torch.nn.utils.clip_grad_norm_: remove device syncs (#61042)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/60691

### Changes

Per the discussion in the above issue, this PR makes 2 changes:
1. When `error_if_nonfinite=False`, the NaN/Inf checks are truly skipped, and no device synchronization occurs.
    - Additionally, when performing the checks, the 2 results are combined with `torch.logical_or` to incur only a single sync (instead of 2 in the happy/finite path).
2. The `clip_coef` conditional is removed, in favor of a call to `clamp(..., max=1.0)` and an unconditional multiplication.
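
A sketch of change 2 (assuming `total_norm` is a device tensor, so no host read is needed):

```
import torch

def scale_grads_sync_free(parameters, max_norm, total_norm):
    # Instead of `if clip_coef < 1:` (which reads a device-side scalar and
    # forces a sync), clamp the coefficient and always multiply.
    clip_coef = max_norm / (total_norm + 1e-6)
    clip_coef_clamped = torch.clamp(clip_coef, max=1.0)
    for p in parameters:
        p.grad.detach().mul_(clip_coef_clamped)
```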

### Testing

- The existing unit tests for `clip_grad_norm_` pass.
- I have manually profiled the example program from https://github.com/pytorch/pytorch/issues/60691, and verified that:
    - No synchronizations occur when using `error_if_nonfinite=False`.
    - A single synchronization occurs when using `error_if_nonfinite=True`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61042

Reviewed By: mrshenli

Differential Revision: D29764096

Pulled By: jbschlosser

fbshipit-source-id: db594b24608d16374b91bcbb9469046dfeeb152d
2021-07-22 08:53:40 -07:00
9e53c823b8 Add AVX512 support in ATen & remove AVX support (#61903)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61903

### Remaining Tasks

- [ ] Collate results of benchmarks on two Intel Xeon machines (with & without CUDA, to check if CPU throttling causes issues with GPUs) - make graphs, including Roofline model plots (Intel Advisor can't make them with libgomp, though, but with Intel OpenMP).

### Summary

1. This draft PR produces binaries with 3 types of ATen kernels - default, AVX2, AVX512. Using the environment variable `ATEN_AVX512_256=TRUE` also results in 3 types of kernels, but the compiler can use 32 ymm registers for AVX2, instead of the default 16. ATen kernels for `CPU_CAPABILITY_AVX` have been removed.

2. `nansum` is not using an AVX512 kernel right now, as it has poorer accuracy for Float16 than AVX2 or DEFAULT, whose respective accuracies aren't very good either (#59415).
It was more convenient to disable AVX512 dispatch for all dtypes of `nansum` for now.

3. On Windows , ATen Quantized AVX512 kernels are not being used, as quantization tests are flaky. If `--continue-through-failure` is used, then `test_compare_model_outputs_functional_static` fails. But if this test is skipped, `test_compare_model_outputs_conv_static` fails. If both these tests are skipped, then a third one fails. These are hard to debug right now due to not having access to a Windows machine with AVX512 support, so it was more convenient to disable AVX512 dispatch of all ATen Quantized kernels on Windows for now.

4. One test is currently being skipped -
[`test_lstm` in `quantization.bc`](https://github.com/pytorch/pytorch/issues/59098) - it fails only on Cascade Lake machines, irrespective of the `ATEN_CPU_CAPABILITY` used, because FBGEMM uses `AVX512_VNNI` on machines that support it. The value of `reduce_range` should be used as `False` on such machines.

The list of the changes is at https://gist.github.com/imaginary-person/4b4fda660534f0493bf9573d511a878d.

Credits to ezyang for proposing `AVX512_256` - these use AVX2 intrinsics but benefit from 32 registers, instead of the 16 ymm registers that AVX2 uses.
Credits to limo1996 for the initial proposal, and for optimizing `hsub_pd` & `hadd_pd`, which didn't have direct AVX512 equivalents, and are being used in some kernels. He also refactored `vec/functional.h` to remove duplicated code.
Credits to quickwritereader for helping fix 4 failing complex multiplication & division tests.

### Testing
1. `vec_test_all_types` was modified to test basic AVX512 support, as tests already existed for AVX2.
Only one test had to be modified, as it was hardcoded for AVX2.
2.  `pytorch_linux_bionic_py3_8_gcc9_coverage_test1` & `pytorch_linux_bionic_py3_8_gcc9_coverage_test2` are now using `linux.2xlarge` instances, as they support AVX512. They were used for testing AVX512 kernels, as AVX512 kernels are being used by default in both of the CI checks. Windows CI checks had already been using machines with AVX512 support.

### Would the downclocking caused by AVX512 pose an issue?

I think it's important to note that AVX2 causes downclocking as well, and the additional downclocking caused by AVX512 may not hamper performance on some Skylake machines & beyond, because of the double vector-size. I think that [this post with verifiable references is a must-read](https://community.intel.com/t5/Software-Tuning-Performance/Unexpected-power-vs-cores-profile-for-MKL-kernels-on-modern-Xeon/m-p/1133869/highlight/true#M6450). Also, AVX512 would _probably not_ hurt performance on a high-end machine, [but measurements are recommended](https://lemire.me/blog/2018/09/07/avx-512-when-and-how-to-use-these-new-instructions/). In case it does, `ATEN_AVX512_256=TRUE` can be used for building PyTorch, as AVX2 can then use 32 ymm registers instead of the default 16. [FBGEMM uses `AVX512_256` only on Xeon D processors](https://github.com/pytorch/FBGEMM/pull/209), which are said to have poor AVX512 performance.

This [official data](https://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/xeon-scalable-spec-update.pdf) is for the Intel Skylake family, and the first link helps understand its significance. Cascade Lake & Ice Lake SP Xeon processors are said to be even better when it comes to AVX512 performance.

Here is the corresponding data for [Cascade Lake](https://cdrdv2.intel.com/v1/dl/getContent/338848) -

![CASCADE LAKE AVX2](https://user-images.githubusercontent.com/76181208/120666172-ffec3f80-c451-11eb-8ea1-8933ccc12a1b.PNG)
![CASCADE LAKE AVX512](https://user-images.githubusercontent.com/76181208/120666190-04b0f380-c452-11eb-9faa-38d233c874c8.PNG)

The corresponding data isn't publicly available for Intel Xeon SP 3rd gen (Ice Lake SP), but [Intel mentioned that the 3rd gen has frequency improvements pertaining to AVX512](https://newsroom.intel.com/wp-content/uploads/sites/11/2021/04/3rd-Gen-Intel-Xeon-Scalable-Platform-Press-Presentation-281884.pdf). Ice Lake SP machines also have 48 KB L1D caches, so that's another reason for AVX512 performance to be better on them.

### Is PyTorch always faster with AVX512?

No, but then PyTorch is not always faster with AVX2 either. Please refer to #60202. The benefit from vectorization is apparent with small tensors that fit in caches, or in kernels that are more compute-heavy. For instance, AVX512 or AVX2 would yield no benefit for adding two 64 MB tensors, but adding two 1 MB tensors would do well with AVX2, and even more so with AVX512.

It seems that memory-bound computations, such as adding two 64 MB tensors, can be slow with vectorization (depending upon the number of threads used), as the effects of downclocking can then be observed.
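A hedged sketch of how one might measure this with `torch.utils.benchmark` (sizes are illustrative, and actual results depend on the machine):

```python
import torch
from torch.utils import benchmark

small = torch.randn(256 * 1024)         # ~1 MB of float32, cache-resident
large = torch.randn(16 * 1024 * 1024)   # ~64 MB, memory-bound

for name, t in [("1 MB add", small), ("64 MB add", large)]:
    timer = benchmark.Timer(stmt="x + x", globals={"x": t})
    print(name, timer.blocked_autorange())
```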

Original pull request: https://github.com/pytorch/pytorch/pull/56992

Reviewed By: soulitzer

Differential Revision: D29266289

Pulled By: ezyang

fbshipit-source-id: 2d5e8d1c2307252f22423bbc14f136c67c3e6184
2021-07-22 08:51:49 -07:00
cyy
59d6e07ada fix forward_idx check (#59911)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59911

Reviewed By: dzhulgakov

Differential Revision: D29829020

Pulled By: albanD

fbshipit-source-id: f685063061dab499368a272d6b94a44e89f9a143
2021-07-22 08:37:33 -07:00
b60d1b713e Revert D26007050: add channels last support for thnn_conv2d (non-dilated)
Test Plan: revert-hammer

Differential Revision:
D26007050 (8b88c24670)

Original commit changeset: 1289e0687c24

fbshipit-source-id: 88b679efbcae572fe604d50e2199861cadbc3d4a
2021-07-22 08:31:15 -07:00
171598f0e3 [Refactoring] Fix imports order for torch/utils/data/dataset.py (#61328)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61328

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D29588897

Pulled By: VitalyFedyunin

fbshipit-source-id: 63df653fb471532819c83ebcee4f9dc951500ffb
2021-07-22 08:30:08 -07:00
1b02641bb1 add BFloat16 operators on CPU: arange, acosh, asinh, atanh, exp2, digamma, trigamma, polygamma (#60444)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60444

Reviewed By: ejguan

Differential Revision: D29800899

Pulled By: ezyang

fbshipit-source-id: 26d2c2ac3e7d3a2d49679508aad8c8bf0232cad5
2021-07-22 08:13:22 -07:00
f3f7e92be5 Manually call lazyInitCUDA in structured CUDA calls (#61882)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61882

If you directly call the native implementation, it bypasses the
initialization, which is bad! This probably slows things down a little,
though...

Fixes a problem uncovered by #61642

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: bhosmer

Differential Revision: D29783856

Pulled By: ezyang

fbshipit-source-id: 16857569a049e09c6ebd96ef04b0025403b254af
2021-07-22 07:50:05 -07:00
196679d3aa [Refactoring] Reordering imports in torch/utils/data/datapipes/iter/__init__.py (#61325)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61325

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D29588896

Pulled By: VitalyFedyunin

fbshipit-source-id: 8c0f3580f82083c43a590a18ecddb3e04ae93ca9
2021-07-22 07:46:08 -07:00
25be031c6e Add missing docker build to slow gradcheck label-triggered build (#61941)
Summary:
Currently, when adding the label, it fails like: https://app.circleci.com/pipelines/github/pytorch/pytorch/352569/workflows/d213cbad-edd6-4fe0-a79c-d46f8c0aae85/jobs/14856158

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61941

Reviewed By: suo

Differential Revision: D29827084

Pulled By: albanD

fbshipit-source-id: 134828d36e51324e6b6539dd4bc5f1eebfb89a03
2021-07-22 07:37:21 -07:00
5186fa2831 Fix c10d -> dist in test_ddp_hooks.py (#61864)
Summary:
**Overview:**
The existing `test_ddp_hooks.py` test file uses a prefix `c10d`, which is not defined in the file, meaning the test errors if left as is. This renames each `c10d` prefix to `dist`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61864

Test Plan:
All four tests pass when run:
```
gpurun python test/distributed/algorithms/ddp_comm_hooks/test_ddp_hooks.py
```

Reviewed By: ejguan

Differential Revision: D29783860

Pulled By: andwgu

fbshipit-source-id: 16bdd2dfcb76192964246148f14851a74f8907c8
2021-07-22 07:20:41 -07:00
109bd5e78a OpInfo: bitwise_and (#61349)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61349

Also adds a type promotion test for bugs found by PR #60813

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D29592840

Pulled By: ezyang

fbshipit-source-id: ee013b20e31baf6c6ebf2edb881ae6d8e215c7a6
2021-07-22 07:04:17 -07:00
2f3300f25f [docs] Correct torch.permute (#61833)
Summary:
Noted while reviewing https://github.com/pytorch/pytorch/issues/61830

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61833

Reviewed By: albanD

Differential Revision: D29816661

Pulled By: mruberry

fbshipit-source-id: 895607d7ddcbd4319218ab7719a2f57cbde2283c
2021-07-22 00:27:23 -07:00
5801431c9b OpInfo Ref: addbmm (#61832)
Summary:
See https://github.com/pytorch/pytorch/issues/54261. This PR:

* Adds reference wrapper using NumPy for reference function of `addbmm`
* Refines sample inputs (makes it more readable and avoids redundancy)

cc: mruberry

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61832

Reviewed By: albanD

Differential Revision: D29816024

Pulled By: mruberry

fbshipit-source-id: e0fea6dc923504169a13bfaa258c61fbbc5fa9f4
2021-07-22 00:26:10 -07:00
31beef009d Fix IMethodTest.GetArgumentNames after D29648756 (#61985)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61985

Fix IMethodTest.GetArgumentNames after D29648756 (641f6ef8a7).
ghstack-source-id: 134054637

Test Plan: buck test mode/dev caffe2/test/cpp/api:imethod -- IMethodTest.GetArgumentNames

Reviewed By: suo

Differential Revision: D29828807

fbshipit-source-id: b1411745b91e1b8c0ea0fd9e9666e22125dde333
2021-07-22 00:21:59 -07:00
07a91f1cfd fix graph deepcopy to propagate output type (#61747)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61747

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D29737565

Pulled By: migeed-z

fbshipit-source-id: 8583f0c87f2db27695e062f59a15de77f3b00fd6
2021-07-21 23:53:03 -07:00
8a2063e58a Foreach Test Refactor: Pointwise, Min/Max-imum (#61327)
Summary:
- rewrite pointwise unittests using `ops` decorator
- rewrite minimum&maximum unittests using `ops` decorator
- enable minimum/maximum fastpath for BFloat16
- remove _test_data method

https://github.com/pytorch/pytorch/issues/58833

cc: ptrblck ngimel

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61327

Reviewed By: albanD

Differential Revision: D29830209

Pulled By: ngimel

fbshipit-source-id: fa7805262b86c40fc32750b16629d80ad48ea4b5
2021-07-21 21:59:57 -07:00
d6899fe492 [Refactoring] Reordering imports in utils/data/__init__.py (#61324)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61324

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D29588895

Pulled By: VitalyFedyunin

fbshipit-source-id: 5e719c80f9cb5630c65187ac89773831777f368d
2021-07-21 21:38:28 -07:00
06efced177 .github: Specify directory to pull reports from (#61990)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61990

This adds more specificity to where to pull test reports from, since I
believe that actions/upload-artifact doesn't actually respect the
working-directory default.

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: albanD, zhouzhuojie

Differential Revision: D29831719

Pulled By: seemethere

fbshipit-source-id: cee5609f97338d44a484d85baa77f0167d81ce55
2021-07-21 20:57:07 -07:00
cc18654d66 [fx_acc] Refactoring acc_tracer (#61963)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61963

Test Plan: CI

Reviewed By: jfix71

Differential Revision: D29772522

fbshipit-source-id: 4b117735147624f9428b933ea798495823423a0e
2021-07-21 20:09:15 -07:00
6284d2a82b wrap cudaStreamSynchronize calls (#61889)
Summary:
This is a first step towards creating a context manager that errors out on synchronizing calls.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61889

Reviewed By: albanD

Differential Revision: D29805280

Pulled By: ngimel

fbshipit-source-id: b66400fbe0941b7daa51e6b30abe27b9cccd4e8a
2021-07-21 19:30:52 -07:00
3d6aa3a2f6 Enable torch.isclose to support bool tensors (#61271)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/60533
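A minimal illustration of the behavior this enables (previously, bool inputs errored):

```python
import torch

a = torch.tensor([True, False, True])
b = torch.tensor([True, True, True])
print(torch.isclose(a, b))  # tensor([ True, False,  True])
```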

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61271

Reviewed By: zhxchen17

Differential Revision: D29737618

Pulled By: SplitInfinity

fbshipit-source-id: 45314bc7e0b9a28c10700455b1e6267c0db3eefc
2021-07-21 18:50:14 -07:00
243c7079a1 add 3d input and output shapes to maxpool documentation (#61310)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61310

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D29737516

Pulled By: migeed-z

fbshipit-source-id: eb6964f6808b8ae05d4d3852a5162dc66930cd64
2021-07-21 18:27:27 -07:00
d00bb45846 [typing] suppress errors in fbcode/caffe2 - batch 2
Test Plan: Sandcastle

Differential Revision: D29827809

fbshipit-source-id: 7ca7c2a33d691ac57392945b78a320d253c84ed4
2021-07-21 17:56:26 -07:00
a0e381641b Remove relative paths for clang-tidy annotations (#62004)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62004

Some of the files checked by clang-tidy are compiled from a sibling directory, so the files all start with something like `../torch`. This ends up messing with `translate_annotations.py`, which runs from the repo root. This fixes it by chopping off any relative path prefixes in the clang-tidy output.
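A hypothetical sketch of the normalization described above (the function name is illustrative, not the actual code in `translate_annotations.py`):

```python
import re

def strip_relative_prefix(path: str) -> str:
    # Chop leading "../" segments so paths resolve from the repo root.
    return re.sub(r"^(\.\./)+", "", path)

assert strip_relative_prefix("../torch/csrc/Module.cpp") == "torch/csrc/Module.cpp"
```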

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D29835446

Pulled By: driazati

fbshipit-source-id: 2bd279370e41ed0a321e30f88fe38434105c75e8
2021-07-21 17:52:31 -07:00
e731a63e63 Silence clang-tidy linter for TorchpyTest.FxModule test (#62001)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62001

This will fix [this linter error](https://github.com/pytorch/pytorch/runs/3120335141) introduced with D29690088 (810e19979d).

Test Plan: N/A (just looked at other examples and tidy doc https://clang.llvm.org/extra/clang-tidy/)

Reviewed By: suo

Differential Revision: D29832654

fbshipit-source-id: 8cf69cb5551f3b1bd384a2553dc5c827beb0a68f
2021-07-21 17:40:46 -07:00
b6ff0fa8dd Enable dynamically ciflow/slow so that we can run GHA slow tests on PR (#61987)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61987

This PR enables us to run slow GHA tests on PR.

Steps to do (~may only take effect after this PR is merged~ works on this PR)
- Add label `ciflow/slow`
- Assign/unassign pytorchbot
- The job should be running .github/workflows/pytorch-linux-xenial-cuda10.2-cudnn7-py3.6-gcc7.yml

The above steps are manual; once probot can do the dispatch work, ciflow will be automated.

Related meta RFC issue: #61888

Test Plan: Imported from OSS

Reviewed By: driazati

Differential Revision: D29832758

Pulled By: zhouzhuojie

fbshipit-source-id: 64d31ef572502e62b80e6b7ac480ffcfa9f4e38b
2021-07-21 16:56:54 -07:00
9d6cdf34a4 Annotate generated files in .gitattributes (#61995)
Summary:
Mark CI yaml files generated from templates as linguist-generated
Fixes https://github.com/pytorch/pytorch/issues/61994

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61995

Reviewed By: seemethere

Differential Revision: D29832199

Pulled By: malfet

fbshipit-source-id: 86ad3a16b4d3e4f94c35b8f766a8556a07632419
2021-07-21 16:49:07 -07:00
ae58a4c45d [Static Runtime] Added a variadic cat operator (#61302)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61302

Test Plan: Imported from OSS

Reviewed By: hlu1

Differential Revision: D29565344

Pulled By: navahgar

fbshipit-source-id: 96f5f4546ec0e61eb7f87e016e026e7b62576248
2021-07-21 15:58:20 -07:00
b145889192 Modernize use make_unique (#61739)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61739

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D29717133

fbshipit-source-id: 70e3d81a48f7ae90cca3ef3c9587174ca15d81f4
2021-07-21 15:28:26 -07:00
2c0ecfbb20 [PyTorch] Expose bias() and unpack() API of LinearPackedParamsBase to Python layer (#61855)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61855

Exposing `bias()` and `unpack()` for `LinearPackedParamsBase`. This is useful for inspecting linear op attributes.
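A hedged sketch of inspecting packed linear params from Python after this change (exact op availability depends on the quantization backend built in):

```python
import torch

w = torch.quantize_per_tensor(torch.randn(4, 8), 0.1, 0, torch.qint8)
b = torch.zeros(4)
packed = torch.ops.quantized.linear_prepack(w, b)

weight, bias = packed.unpack()  # newly exposed to the Python layer
print(packed.bias())            # likewise newly exposed
```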

Test Plan:
See unit test passing:

```
[ (6c61a5eb4) | devvm1625 ~/fbsource/fbcode] buck test //caffe2/test:quantization -- test_linear_bias_unpack
Parsing buck files: finished in 2.8 sec
Building: finished in 9.9 sec (100%) 11973/55220 jobs, 0/55220 updated
  Total time: 12.8 sec
More details at https://www.internalfb.com/intern/buck/build/2d0ee210-c8f3-4994-ac2b-1dccf4c3ca6c
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: b7c6ea1b-8eef-430e-b83a-dad4033ecc87
Trace available for this run at /tmp/tpx-20210720-115423.031745/trace.log
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/5348024618459562
    ✓ ListingSuccess: caffe2/test:quantization - main (10.806)
    ✓ Pass: caffe2/test:quantization - test_linear_bias_unpack (quantization.core.test_quantized_op.TestQuantizedOps) (10.913)
Summary
  Pass: 1
  ListingSuccess: 1
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/5348024618459562
```

Reviewed By: kimishpatel

Differential Revision: D29767704

fbshipit-source-id: 716f43b61814b92094c0b08d4e63e1dddc352aa7
2021-07-21 15:13:40 -07:00
a02ccd6080 [ONNX] add supplement for standardOps low precision cast (#60731) (#61561)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61561

Addresses Gary's reply and adds a supplement to https://github.com/pytorch/pytorch/pull/53813.

- add more details for LowPrecisionCastNodeForStandardOps to make it more comprehensible.

- remove unused gemm test

Test Plan: Imported from OSS

Reviewed By: nikithamalgifb

Differential Revision: D29767991

Pulled By: SplitInfinity

fbshipit-source-id: d00032e13699f5b02fc619e64aa8fdd39f3a66b8

Co-authored-by: hwangdeyu <dejack953@outlook.com>
2021-07-21 15:10:36 -07:00
6f08ddfc28 [ONNX] Enable aten:normal op and add tests for aten:uniform op. (#60441) (#61560)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61560

1. Add a new symbolic function broadcast_tensors() to support exporting the torch.broadcast_tensors() function. This is required for exporting the torch.distribution.normal() function.
2. Add a new symbolic function normal() to support exporting the torch.distribution.normal() function.
3. Add relevant tests for the normal and uniform ops as well.
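For context, a small sketch of the eager-mode behavior the new `broadcast_tensors` symbolic mirrors (shapes are illustrative):

```python
import torch

a = torch.zeros(3, 1)
b = torch.ones(1, 4)
x, y = torch.broadcast_tensors(a, b)
print(x.shape, y.shape)  # torch.Size([3, 4]) torch.Size([3, 4])
```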

Test Plan: Imported from OSS

Reviewed By: nikithamalgifb

Differential Revision: D29767995

Pulled By: SplitInfinity

fbshipit-source-id: acfe5e7801d00c0df8ca46966bbd6015fed0045e

Co-authored-by: Jay Zhang <jiz@microsoft.com>
2021-07-21 15:10:35 -07:00
f0054e1a6e [ONNX] Update expand_as for dynamic shape (#61084) (#61559)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61559

Update expand_as for dynamic shape

Test Plan: Imported from OSS

Reviewed By: nikithamalgifb

Differential Revision: D29767990

Pulled By: SplitInfinity

fbshipit-source-id: 3f1e3f68fd17c5ffbd4a50fccff224fd9d6c84fb

Co-authored-by: Negin Raoof <neginmr@utexas.edu>
2021-07-21 15:10:33 -07:00
34075e2c8b [ONNX] Fix the issue of converting empty list to sequence. (#58651) (#61558)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61558

When we construct an empty list via a Python list comprehension, we need to avoid converting the input-less node to onnx::Concat in shape_type_inference.cpp and peephole.cpp, because doing so would create an invalid Concat node with no inputs.

In addition, update the code to avoid passing a Sequence input to an onnx::Cast node which doesn't accept Sequence data type as an input.

Add tests for the validation as well.

Test Plan: Imported from OSS

Reviewed By: nikithamalgifb

Differential Revision: D29767989

Pulled By: SplitInfinity

fbshipit-source-id: f97f172ff20eebda4c3744c7a934df36716f12a2

Co-authored-by: fatcat-z <jiz@microsoft.com>
2021-07-21 15:10:31 -07:00
22e60d77e7 [ONNX] Support tensor list as module attribute (#59685) (#61557)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61557

* Support tensor list as module attribute.
* Support exporting `torch.set_`.

Test Plan: Imported from OSS

Reviewed By: nikithamalgifb

Differential Revision: D29767992

Pulled By: SplitInfinity

fbshipit-source-id: 5ac5a09600d4dbe86b2fe354d240e46f1d1084ef
2021-07-21 15:08:35 -07:00
a8f6b5a80a [1/N] Avoid skipping tests in sandcastle. (#61876)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61876

In the sandcastle environment, avoid skipping tests and instead just
"pass" these tests to avoid a large number of tasks being created which are not
actionable.
ghstack-source-id: 133846232

Test Plan: Test with `SANDCASTLE=1 TW_JOB_USER=sandcastle`

Reviewed By: rohan-varma

Differential Revision: D29779699

fbshipit-source-id: add71008830dfa6f456ce2365a2d70436b7b7a31
2021-07-21 14:31:17 -07:00
adb73d3dcf Removed overhead from reshape() call if tensor doesn't need to be changed (#61466)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61466

## Goal

Per #55126, the performance of `reshape` is worse than that of `alias` in cases where they perform the same operation (i.e. where reshape returns a view), because `reshape` delegates to `view` and duplicates some of the operations (specifically `infer_size_dv` and `computeStride`).

The goal of this pull-request is to reduce or remove the additional overhead that `reshape` has.

### Proposed Implementation

Instead of using `view`, we implement a private/internal operator (`_reshape_alias`) that `reshape` dispatches to, which skips the relevant checks. This is functionally equivalent to `as_strided`; however, it is a lot simpler because it's specialized to this use case, and, importantly, the `backward` implementation is a lot faster.

Note that we have to dispatch (`reshape` is a composite operator) because `reshape` can return either a view or a copy of the Tensor depending on the parameters, and this complicates implementing a derivative/backward for `reshape`.
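A quick sketch of why that dispatch is needed, checking `data_ptr()` to see whether `reshape` aliased or copied (shapes are illustrative):

```python
import torch

x = torch.arange(6.).reshape(2, 3)
v = x.reshape(-1)                     # contiguous input -> view, no copy
print(v.data_ptr() == x.data_ptr())   # True

c = x.t().reshape(-1)                 # non-contiguous input -> must copy
print(c.data_ptr() == x.data_ptr())   # False
```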

### Why not `as_strided`?

Using `as_strided` directly slows down autograd. If we use a custom function equivalent to `_reshape_alias` but with a simpler backward function, then `view` has the same performance as `reshape`. If we delegate to `as_strided`, it is about 56% slower (and this holds against our custom function).

This is also the reason we add an internal operator named `_reshape_alias` instead of exposing a new operator, since this should only be used in the `reshape` case and it is effectively a more limited version of `view`, `alias`, and `as_strided`.

## Benchmarks
In a micro-benchmark for `backward` running:

```cpp
// Setup
at::Tensor x=torch::empty({2,2}, torch::requires_grad(true));

// Benchmark loop
// `reshape(-1)` replaced with a call to view(-1) for view baseline
x.pow(4).reshape(-1).mean().backward();
```

I also benchmarked simple operations without gradients using:

```cpp
// Setup
at::Tensor x=torch::empty({2,2}, torch::requires_grad(true));

// Benchmark loop
x.reshape(-1) // replaced with a call to view(-1) for view baseline
```

Baselined to `view`:

* Original `reshape`: `+3.3%` (without gradients `+20.8%`)
* Using `as_strided`: `+55.1%` (without gradients `+1.0%`)
* Using custom `_reshape_alias`: `-1.0%` (without gradients `+6.2%`)

In absolute terms (note the percentages above were generated comparing between runs/tests rather than to a single baseline):

* Original `view`: `53.66 us` (without gradients `582.78 ns`)
* Original `reshape`: `55.46 us` (without gradients `704.24 ns`)
* Using `as_strided`: `83.24 us` (without gradients `576.49 ns`)
* Using custom `_reshape_alias`: `53.13 us` (without gradients `536.01 ns`)

Note that these benchmarks perform a backward operation as well. When compared without any gradient computation, the performance differences are more pronounced, as the reshape call itself then takes up more of the total time.

### Original performance

<details>
  <summary>Benchmark results</summary>

```
[<torch.utils.benchmark.utils.common.Measurement object at 0x7f0e4d393160>
x.pow(4).view(-1).mean().backward();
setup: at::Tensor x=torch::empty({2,2}, torch::requires_grad(true));
  Median: 53.66 us
  IQR:    2.70 us (52.54 to 55.24)
  884 measurements, 100 runs per measurement, 1 thread]

[<torch.utils.benchmark.utils.common.Measurement object at 0x7f0e2ebd4fa0>
x.pow(4).reshape(-1).mean().backward();
setup: at::Tensor x=torch::empty({2,2}, torch::requires_grad(true));
  Median: 55.46 us
  IQR:    2.61 us (54.39 to 57.01)
  889 measurements, 100 runs per measurement, 1 thread]

2276116
2286256

<torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.FunctionCounts object at 0x7f0e5b2e3e20>
   2640  ???:at::detail::computeStride(c10::ArrayRef<long>, c10::ArrayRef<long>, c10::SmallVector<long, 5u> const&)
   1920  ???:at::native::reshape(at::Tensor const&, c10::ArrayRef<long>)
   1520  ???:at::_ops::reshape::call(at::Tensor const&, c10::ArrayRef<long>)
   1040  ???:c10::SmallVectorImpl<long>::operator=(c10::SmallVectorImpl<long>&&)
    980  ???:void at::infer_size_impl<c10::SmallVector<long, 5u> >(c10::ArrayRef<long>, long, c10::SmallVector<long, 5u>&)
    720  ???:__tls_get_addr
    520  ???:at::shouldRunRecordFunction(bool*)
    520  ???:__memcpy_avx_unaligned_erms
    200  ???:c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10:: ... g>)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>)
    100  ???:c10::TensorImpl::strides() const
    100  ???:c10::TensorImpl::sizes() const
    100  ???:at::(anonymous namespace)::manager()
     77  /tmp/benchmark_utils_jit_build__1626465284__8a34e7ff-cd37-4a82-be28-7f19e081e771/timer_cpp_7815557938202456331/timer_src.cpp:main
     40  ???:c10::TensorImpl::numel() const
    -77  /tmp/benchmark_utils_jit_build__1626465284__8a34e7ff-cd37-4a82-be28-7f19e081e771/timer_cpp_8055217880649990171/timer_src.cpp:main
   -260  ???:at::native::view(at::Tensor const&, c10::ArrayRef<long>)

Total: 10140
```

```
[<torch.utils.benchmark.utils.common.Measurement object at 0x7f850dd66c10>
x.view(-1);
setup: at::Tensor x=torch::empty({2,2});
  Median: 582.78 ns
  IQR:    33.80 ns (573.80 to 607.61)
  833 measurements, 10000 runs per measurement, 1 thread]

[<torch.utils.benchmark.utils.common.Measurement object at 0x7f850de31e20>
x.reshape(-1);
setup: at::Tensor x=torch::empty({2,2});
  Median: 704.24 ns
  IQR:    24.42 ns (697.20 to 721.62)
  679 measurements, 10000 runs per measurement, 1 thread]

56896
67036

<torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.FunctionCounts object at 0x7f84e1930bb0>
   2640  ???:at::detail::computeStride(c10::ArrayRef<long>, c10::ArrayRef<long>, c10::SmallVector<long, 5u> const&)
   1920  ???:at::native::reshape(at::Tensor const&, c10::ArrayRef<long>)
   1520  ???:at::_ops::reshape::call(at::Tensor const&, c10::ArrayRef<long>)
   1040  ???:c10::SmallVectorImpl<long>::operator=(c10::SmallVectorImpl<long>&&)
    980  ???:void at::infer_size_impl<c10::SmallVector<long, 5u> >(c10::ArrayRef<long>, long, c10::SmallVector<long, 5u>&)
    720  ???:__tls_get_addr
    520  ???:at::shouldRunRecordFunction(bool*)
    520  ???:__memcpy_avx_unaligned_erms
    200  ???:c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10:: ... g>)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>)
    100  ???:c10::TensorImpl::strides() const
    100  ???:c10::TensorImpl::sizes() const
    100  ???:at::(anonymous namespace)::manager()
     76  /tmp/benchmark_utils_jit_build__1626466038__15fbbac0-2072-4459-8f8e-08121a905b99/timer_cpp_547407365342278353/timer_src.cpp:main
     40  ???:c10::TensorImpl::numel() const
    -76  /tmp/benchmark_utils_jit_build__1626466038__15fbbac0-2072-4459-8f8e-08121a905b99/timer_cpp_3457873755756181226/timer_src.cpp:main
   -260  ???:at::native::view(at::Tensor const&, c10::ArrayRef<long>)

Total: 10140
```

</details>

### Using `as_strided`

<details>
  <summary>Benchmark results</summary>

```
[<torch.utils.benchmark.utils.common.Measurement object at 0x7f8b13bb5b50>
x.pow(4).view(-1).mean().backward();
setup: at::Tensor x=torch::empty({2,2}, torch::requires_grad(true));
  Median: 53.37 us
  IQR:    3.15 us (51.73 to 54.88)
  936 measurements, 100 runs per measurement, 1 thread]

[<torch.utils.benchmark.utils.common.Measurement object at 0x7f8af55f8490>
x.pow(4).reshape(-1).mean().backward();
setup: at::Tensor x=torch::empty({2,2}, torch::requires_grad(true));
  Median: 83.24 us
  IQR:    4.05 us (81.20 to 85.25)
  609 measurements, 100 runs per measurement, 1 thread]

2267916
2525061

<torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.FunctionCounts object at 0x7f8af55f8e50>
   31930  ???:_int_free
   15940  ???:malloc
   11595  ???:_int_malloc
   10100  ???:torch::autograd::generated::details::as_strided_backward(at::Tensor, at::TensorGeometry, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::optional<long>)
    9360  ???:__tls_get_addr
    8280  ???:free
    8100  ???:torch::autograd::VariableType::(anonymous namespace)::as_strided(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::optional<long>)
    4520  ???:c10::intrusive_ptr<c10::TensorImpl, c10::UndefinedTensorImpl>::reset_()
    4080  ???:operator new(unsigned long)
     ...
    -780  ???:at::_ops::view::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>)
    -920  ???:c10::SmallVectorImpl<long>::operator=(c10::SmallVectorImpl<long> const&)
   -1220  ???:torch::autograd::generated::ViewBackward::apply(std::vector<at::Tensor, std::allocator<at::Tensor> >&&)
   -1520  ???:at::_ops::view::call(at::Tensor const&, c10::ArrayRef<long>)
   -1580  ???:torch::ADInplaceOrView::(anonymous namespace)::view(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>)
   -1680  ???:at::Tensor at::native::alias_with_sizes_and_strides<c10::SmallVector<long, 5u> >(at::Tensor const&, c10::SmallVector<long, 5u> const&, c10::SmallVector<long, 5u> const&)
   -2560  ???:at::detail::computeStride(c10::ArrayRef<long>, c10::ArrayRef<long>, c10::SmallVector<long, 5u> const&)
   -2640  ???:at::native::view(at::Tensor const&, c10::ArrayRef<long>)
   -4860  ???:torch::autograd::VariableType::(anonymous namespace)::view(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>)

Total: 257145
```

```

[<torch.utils.benchmark.utils.common.Measurement object at 0x7f93176a0160>
x.view(-1);
setup: at::Tensor x=torch::empty({2,2});
  Median: 570.55 ns
  IQR:    32.69 ns (552.87 to 585.56)
  874 measurements, 10000 runs per measurement, 1 thread]

[<torch.utils.benchmark.utils.common.Measurement object at 0x7f92f8f29490>
x.reshape(-1);
setup: at::Tensor x=torch::empty({2,2});
  Median: 576.49 ns
  IQR:    37.95 ns (559.51 to 597.46)
  861 measurements, 10000 runs per measurement, 1 thread]

56896
58556

<torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.FunctionCounts object at 0x7f932556ca60>
    2140  ???:at::native::reshape(at::Tensor const&, c10::ArrayRef<long>)
    1940  ???:torch::autograd::VariableType::(anonymous namespace)::as_strided(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::optional<long>)
    1880  ???:torch::ADInplaceOrView::(anonymous namespace)::as_strided(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::optional<long>)
    1720  ???:at::_ops::as_strided::call(at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::optional<long>)
    1520  ???:at::_ops::reshape::call(at::Tensor const&, c10::ArrayRef<long>)
    1400  ???:at::native::as_strided_tensorimpl(at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::optional<long>)
    1260  ???:at::_ops::as_strided::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::optional<long>)'2
    1260  ???:at::_ops::as_strided::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::optional<long>)
     980  ???:void at::infer_size_impl<c10::SmallVector<long, 5u> >(c10::ArrayRef<long>, long, c10::SmallVector<long, 5u>&)
     ...
    -620  ???:at::Tensor c10::Dispatcher::redispatch<at::Tensor, at::Tensor const&, c10::ArrayRef<long ... ::ArrayRef<long>)> const&, c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>) const
    -780  ???:at::_ops::view::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>)'2
    -780  ???:at::_ops::view::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>)
    -920  ???:c10::SmallVectorImpl<long>::operator=(c10::SmallVectorImpl<long> const&)
   -1520  ???:at::_ops::view::call(at::Tensor const&, c10::ArrayRef<long>)
   -1580  ???:torch::ADInplaceOrView::(anonymous namespace)::view(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>)
   -1680  ???:at::Tensor at::native::alias_with_sizes_and_strides<c10::SmallVector<long, 5u> >(at::Tensor const&, c10::SmallVector<long, 5u> const&, c10::SmallVector<long, 5u> const&)
   -1740  ???:torch::autograd::VariableType::(anonymous namespace)::view(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>)
   -2640  ???:at::native::view(at::Tensor const&, c10::ArrayRef<long>)

Total: 1660

```

</details>

### Using custom function (`_reshape_alias`)

<details>
  <summary>Benchmark results</summary>

```
[<torch.utils.benchmark.utils.common.Measurement object at 0x7f16861d6b50>
x.pow(4).view(-1).mean().backward();
setup: at::Tensor x=torch::empty({2,2}, torch::requires_grad(true));
  Median: 53.50 us
  IQR:    2.64 us (52.32 to 54.96)
  906 measurements, 100 runs per measurement, 1 thread]

[<torch.utils.benchmark.utils.common.Measurement object at 0x7f1667b2ed60>
x.pow(4).reshape(-1).mean().backward();
setup: at::Tensor x=torch::empty({2,2}, torch::requires_grad(true));
  Median: 53.13 us
  IQR:    3.40 us (51.72 to 55.13)
  914 measurements, 100 runs per measurement, 1 thread]

2269736
2273236

<torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.FunctionCounts object at 0x7f1693f8dc10>
    5060  ???:torch::autograd::VariableType::(anonymous namespace)::_reshape_alias(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>)
    2000  ???:at::native::reshape(at::Tensor const&, c10::ArrayRef<long>)
    1780  ???:torch::ADInplaceOrView::(anonymous namespace)::_reshape_alias(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>)
    1660  ???:at::_ops::_reshape_alias::call(at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>)
    1600  ???:at::Tensor at::native::alias_with_sizes_and_strides<c10::ArrayRef<long> >(at::Tensor const&, c10::ArrayRef<long> const&, c10::ArrayRef<long> const&)
    1520  ???:at::_ops::reshape::call(at::Tensor const&, c10::ArrayRef<long>)
    1240  ???:at::_ops::_reshape_alias::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>)'2
    1240  ???:at::_ops::_reshape_alias::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>)
    1220  ???:torch::autograd::generated::AliasToShapeBackward::apply(std::vector<at::Tensor, std::allocator<at::Tensor> >&&)
     ...
    -780  ???:at::_ops::view::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>)'2
    -780  ???:at::_ops::view::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>)
    -920  ???:c10::SmallVectorImpl<long>::operator=(c10::SmallVectorImpl<long> const&)
   -1220  ???:torch::autograd::generated::ViewBackward::apply(std::vector<at::Tensor, std::allocator<at::Tensor> >&&)
   -1520  ???:at::_ops::view::call(at::Tensor const&, c10::ArrayRef<long>)
   -1580  ???:torch::ADInplaceOrView::(anonymous namespace)::view(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>)
   -1680  ???:at::Tensor at::native::alias_with_sizes_and_strides<c10::SmallVector<long, 5u> >(at::Tensor const&, c10::SmallVector<long, 5u> const&, c10::SmallVector<long, 5u> const&)
   -2640  ???:at::native::view(at::Tensor const&, c10::ArrayRef<long>)
   -4860  ???:torch::autograd::VariableType::(anonymous namespace)::view(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>)

Total: 3500
```

```

[<torch.utils.benchmark.utils.common.Measurement object at 0x7f5287adfb20>
x.view(-1);
setup: at::Tensor x=torch::empty({2,2});
  Median: 505.10 ns
  IQR:    20.04 ns (500.41 to 520.45)
  944 measurements, 10000 runs per measurement, 1 thread]

[<torch.utils.benchmark.utils.common.Measurement object at 0x7f526951b430>
x.reshape(-1);
setup: at::Tensor x=torch::empty({2,2});
  Median: 536.01 ns
  IQR:    17.81 ns (531.34 to 549.16)
  916 measurements, 10000 runs per measurement, 1 thread]

56896
60376

<torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.FunctionCounts object at 0x7f5295896c10>
    2000  ???:at::native::reshape(at::Tensor const&, c10::ArrayRef<long>)
    1860  ???:torch::autograd::VariableType::(anonymous namespace)::_reshape_alias(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>)
    1780  ???:torch::ADInplaceOrView::(anonymous namespace)::_reshape_alias(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>)
    1660  ???:at::_ops::_reshape_alias::call(at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>)
    1600  ???:at::Tensor at::native::alias_with_sizes_and_strides<c10::ArrayRef<long> >(at::Tensor const&, c10::ArrayRef<long> const&, c10::ArrayRef<long> const&)
    1520  ???:at::_ops::reshape::call(at::Tensor const&, c10::ArrayRef<long>)
    1240  ???:at::_ops::_reshape_alias::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>)'2
    1240  ???:at::_ops::_reshape_alias::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>)
     980  ???:void at::infer_size_impl<c10::SmallVector<long, 5u> >(c10::ArrayRef<long>, long, c10::SmallVector<long, 5u>&)
     ...
    -620  ???:at::Tensor c10::Dispatcher::redispatch<at::Tensor, at::Tensor const&, c10::ArrayRef<long ... ::ArrayRef<long>)> const&, c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>) const
    -780  ???:at::_ops::view::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>)'2
    -780  ???:at::_ops::view::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>)
    -920  ???:c10::SmallVectorImpl<long>::operator=(c10::SmallVectorImpl<long> const&)
   -1520  ???:at::_ops::view::call(at::Tensor const&, c10::ArrayRef<long>)
   -1580  ???:torch::ADInplaceOrView::(anonymous namespace)::view(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>)
   -1680  ???:at::Tensor at::native::alias_with_sizes_and_strides<c10::SmallVector<long, 5u> >(at::Tensor const&, c10::SmallVector<long, 5u> const&, c10::SmallVector<long, 5u> const&)
   -1740  ???:torch::autograd::VariableType::(anonymous namespace)::view(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>)
   -2640  ???:at::native::view(at::Tensor const&, c10::ArrayRef<long>)

Total: 3480

```

</details>

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D29792126

Pulled By: laurencer

fbshipit-source-id: f0519b45b65f868aa3e8651679354558bd761dfd
2021-07-21 14:05:35 -07:00
a8d99a28d7 Modernize avoid a C array (#61740)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61740

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D29717118

fbshipit-source-id: 70e73346b75deb4fe6b6399e06bd576f3b6e2b91
2021-07-21 13:52:54 -07:00
d7b31fe95d Add ciflow config and change jinja2 templates (#61886)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61886

This PR is rolling out at the `1. Manual Phase` stage.

```
#       Rollout Strategy:
#       1. Manual Phase
#          step 1. Add 'ciflow/default' label to the PR
#          step 2. Once there's an [unassigned] event from PR, it should rerun
#          step 3. Remove 'ciflow/default' label
#          step 4. Trigger the [unassigned] event again, it should not rerun
#       2. Probot Phase 1 (manual on 1 workflow)
#          step 1. Probot automatically add labels based on the context
#          step 2. Manually let probot trigger [unassigned] event
#       3. Probot Phase 2 (auto on 1 workflow)
#          step 1. Modify the workflows so that they only listen on [unassigned] events
#          step 2. Probot automatically adds labels based on the context
#          step 3. Probot automatically triggers [unassigned] event
#       4. Probot Phase 3 (auto on many workflows)
#          step 1. Enable it for all workflows
```

Test Plan: Imported from OSS

Reviewed By: seemethere

Differential Revision: D29808366

Pulled By: zhouzhuojie

fbshipit-source-id: c7e5009d839239df58825dec093ff0f1fd281697
2021-07-21 13:32:09 -07:00
2dab368d26 Refactor generate_ci_workflows (#61879)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61879

Refactor generate_ci_workflows to support CI dispatcher. This is the first step to refactor the workflow into a dataclass with some validation and OOP.

Verified that the output is the same:

```
.github/scripts/generate_ci_workflows.py
git status
```

Test Plan: Imported from OSS

Reviewed By: seemethere

Differential Revision: D29808365

Pulled By: zhouzhuojie

fbshipit-source-id: b8c5fd43f4bd6e17e06f3925a1a509084b790d95
2021-07-21 13:30:36 -07:00
e2acce373f Run Windows smoke tests with gflags in test dir (#61967)
Summary:
Previous testing yielded the torch.version ModuleNotFound error when I ran the smoke tests from the pytorch root directory.

This PR simply reorders the commands to run the smoke tests within the test directory, which passes in this series of runs:
https://github.com/seemethere/test-repo/actions/runs/1050734298 (the failures are due to missing credentials during uploading stats, which we don't need here)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61967

Reviewed By: samestep

Differential Revision: D29820985

Pulled By: janeyx99

fbshipit-source-id: 363ef321c32cfaf4446ceeb6117ea26abc311816
2021-07-21 12:06:34 -07:00
a03466cb07 Back out "Revert D29687143: [3/N] Nnapi Backend Delegate Preprocess: Basic OSS Test" (#61878)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61878

CMakeLists.txt
Android NNAPI delegate library was moved from test/cpp/jit/CMakeLists.txt to torch/CMakeLists.txt. This resolves the issue the original PR had, where the NNAPI delegate library was added to builds without Python (when it depends on Python).
Original PR: https://github.com/pytorch/pytorch/pull/61594

There's an error where the library cannot be built on macOS. This problem existed in the original PR as well, but an issue has now been created: https://github.com/pytorch/pytorch/issues/61930

test_backend_nnapi.py
Also changed the skip conditions for the unit tests so that they're a little cleaner. Now the unit tests are skipped if the NNAPI delegate library file is not found. Previously, the skip was based on the platform (only allowing Linux).

Test Plan:
To run NNAPI delegate unit tests: `python test/test_jit.py TestNnapiBackend`

Imported from OSS

Reviewed By: iseeyuan

Differential Revision: D29799895

fbshipit-source-id: b69a767b5cde3814b0853cfbc84d61ab4155f619
2021-07-21 11:58:45 -07:00
4532b3c4a9 Fix _C public bindings test (#61088)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61088

The test was previously a no-op since it was comparing the bindings with themselves. This fixes that to use the hardcoded list and adds the items that changed in the meantime.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D29510525

Pulled By: driazati

fbshipit-source-id: 3497023e5c8b3cd6fdd1d07d48b4f2650b203ded
2021-07-21 11:50:37 -07:00
8880f3d450 [fx] introduce __fx_create_arg__ dunder method for controlling custom classes are handled as node args (#61780)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61780

These changes allow objects to control, from within their own source, how they are handled when they are an argument to a torch.fx call_module node. Previously, we had been using a custom Tracer with an overridden create_arg() method, branching based on class name to handle unusual args (data classes, etc.).
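A hedged sketch of the hook (the `Config` class and its chosen graph representation are hypothetical):

```python
import torch.fx

class Config:
    def __init__(self, scale: float):
        self.scale = scale

    # torch.fx calls this when an instance appears as a node argument,
    # so the class controls its own representation instead of a custom
    # Tracer.create_arg() branching on class names.
    def __fx_create_arg__(self, tracer: torch.fx.Tracer):
        return tracer.create_node(
            "call_function", Config, args=(self.scale,), kwargs={}
        )
```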

Reviewed By: suo, houseroad

Differential Revision: D27976120

fbshipit-source-id: 0c5249c5f8398368ca0fbec0ad8a07ccf99b7da4
2021-07-21 11:27:09 -07:00
3c7bfa632a reland D29801875: .github: Clone pytorch to separate directory (#61972)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61972

This reverts commit 716567504c8b4da8d764d9674595c2095b62080c.

Also includes change to add the TEST_CONFIG env variable so that test
reports get uploaded correctly.

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D29821858

Pulled By: seemethere

fbshipit-source-id: 23602706446e0a95db6bd7cedfa665e8c4145168
2021-07-21 11:15:52 -07:00
810e19979d Torch deploy for fx.graph_module with non-torch dependencies (#61680)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61680

This diff enables torch deploy for fx.graph_module with non-torch dependencies. Here are the issues currently preventing this, which are fixed in this change:
- Pickle is used as an internal format to transmit objects between interpreters. It needs to serialize Python code, but to get the source code for imports from python_code.globals it needs access to the PackageImporter. Currently a regular `__reduce__` function is used, which has no notion of a custom importer.
- When deserializing pickled objects on an interpreter, empty globals are passed to exec, so it cannot resolve non-torch imports located in the package. We need to be able to point exec to our custom PackageImporter.
- Subclasses extending fx.graph_module should be able to optionally provide their own Tracer (extending fx.Tracer).

As a solution, a new reducer (`__reduce_deploy__`) is introduced for the torch deploy workflow. The reducer is registered in _deploy.py (the entry point for the C++ torch deploy API) when saving an object to transmit it between interpreters. This allows us to pass a proper PackageImporter to each interpreter for pickling/unpickling fx.graph_module, and it also defines an API for passing a custom fx.Tracer when needed.
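A toy analogy of the approach (plain `copyreg`, not the actual torch::deploy code): register a custom reducer so that unpickling re-execs module source against globals we control, rather than empty globals:

```python
import copyreg
import pickle

class FakeGraphModule:
    def __init__(self, src: str):
        self.src = src

def _rebuild(src: str):
    globs = {}          # in torch::deploy these globals would come from
    exec(src, globs)    # the destination interpreter's PackageImporter
    return FakeGraphModule(src)

def _reduce(gm: FakeGraphModule):
    # Return (callable, args); the callable runs on the destination side.
    return _rebuild, (gm.src,)

copyreg.pickle(FakeGraphModule, _reduce)
gm2 = pickle.loads(pickle.dumps(FakeGraphModule("y = 2 * 21")))
print(gm2.src)  # 'y = 2 * 21'
```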

Test Plan:
Added UT to cover changes.
```
buck test //caffe2/torch/csrc/deploy:test_deploy
```
```
buck test caffe2/test:fx
```

Reviewed By: suo

Differential Revision: D29690088

fbshipit-source-id: 3a8dbe02d5d7e085534aa61b7773c86f0f8c19b0
2021-07-21 10:29:48 -07:00
f41d3341b1 [pytorch] Support embedding_bag_4bit_rowwise_offsets in cuda (#61728)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61728

Templatize existing embedding_bag_byte_rowwise_offsets_kernel to support both 4 bits per dimension and 8 bits per dimension. Test rigorously using fb internal random testing vs CPU ops.

Reviewed By: hyuen

Differential Revision: D29706346

fbshipit-source-id: c9f4591a2cc6205e4b7e57a363ba0a6306fdddd5
2021-07-21 10:23:30 -07:00
716567504c Revert D29801875: .github: Clone pytorch to separate directory
Test Plan: revert-hammer

Differential Revision:
D29801875 (a152c12d7b)

Original commit changeset: 71a3c7c949e5

fbshipit-source-id: 85175a9933d1e33117b1461d5a760e1a79f60047
2021-07-21 10:19:28 -07:00
ea8abcf76e [quant] Remove calls to .item() for fake_quant_on (#61921)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61921

For GPU training, the fake_quant_on tensors live on the GPU, and the .item() calls incur a GPU->CPU copy to access the tensor element.
This can prove expensive and hurt performance during training, as the `item()` and `local_scalar_dense()` calls take up 11% of the total CPU execution time.
The solution here is to access the tensor on the GPU without a copy.
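A hypothetical sketch of the pattern being fixed (shapes and ops are illustrative, not the actual kernel code):

```python
import torch

if torch.cuda.is_available():
    fake_quant_on = torch.ones(1, device="cuda")
    x = torch.randn(1024, device="cuda")

    # Before: .item() forces a device-to-host copy and a sync per call.
    if fake_quant_on.item():
        y = torch.fake_quantize_per_tensor_affine(x, 0.1, 0, 0, 255)

    # After: keep the decision on the GPU, so no host round-trip is needed.
    y = torch.where(fake_quant_on.bool(),
                    torch.fake_quantize_per_tensor_affine(x, 0.1, 0, 0, 255),
                    x)
```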

Individual op benchmarks show a 33% speedup just by removing the `.item()` calls

Profiler Before
```
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                  aten::fused_moving_avg_obs_fake_quant         5.61%       1.538ms       100.00%      27.421ms     548.425us     978.208us         3.42%      28.575ms     571.501us            50
                  aten::_fused_moving_avg_obs_fq_helper        27.63%       7.576ms        94.39%      25.883ms     517.668us       6.536ms        22.87%      27.597ms     551.937us            50
aten::_fake_quantize_per_tensor_affine_cachemask_ten...        11.07%       3.037ms        21.54%       5.905ms     118.103us       9.549ms        33.42%       9.549ms     190.978us            50
                                         aten::_aminmax        19.39%       5.317ms        27.44%       7.524ms     150.484us       8.683ms        30.38%       8.683ms     173.651us            50
                                             aten::item         4.49%       1.232ms        11.12%       3.051ms      61.011us       1.058ms         3.70%       2.829ms      56.579us            50
                              aten::_local_scalar_dense         6.63%       1.818ms         6.63%       1.818ms      36.363us       1.771ms         6.20%       1.771ms      35.419us            50
                                            aten::empty         5.76%       1.579ms         5.76%       1.579ms      15.792us       0.000us         0.00%       0.000us       0.000us           100
                                       aten::as_strided         2.29%     628.399us         2.29%     628.399us       6.284us       0.000us         0.00%       0.000us       0.000us           100
                                       aten::empty_like         7.56%       2.073ms        17.13%       4.696ms      31.310us       0.000us         0.00%       0.000us       0.000us           150
                                    aten::empty_strided         9.57%       2.623ms         9.57%       2.623ms      17.489us       0.000us         0.00%       0.000us       0.000us           150
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 27.421ms
Self CUDA time total: 28.575ms
```
After
```
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                  aten::fused_moving_avg_obs_fake_quant         6.59%       1.240ms       100.00%      18.820ms     376.396us     490.272us         2.36%      20.745ms     414.901us            50
                  aten::_fused_moving_avg_obs_fq_helper        26.12%       4.916ms        93.41%      17.580ms     351.597us       2.033ms         9.80%      20.255ms     405.096us            50
aten::_fake_quantize_per_tensor_affine_cachemask_ten...        14.55%       2.738ms        31.09%       5.850ms     117.005us       9.968ms        48.05%       9.968ms     199.363us            50
                                         aten::_aminmax        25.28%       4.758ms        36.21%       6.814ms     136.278us       8.253ms        39.79%       8.253ms     165.069us            50
                                            aten::empty         7.94%       1.494ms         7.94%       1.494ms      14.944us       0.000us         0.00%       0.000us       0.000us           100
                                       aten::as_strided         2.99%     561.785us         2.99%     561.785us       5.618us       0.000us         0.00%       0.000us       0.000us           100
                                       aten::empty_like         8.36%       1.573ms        16.53%       3.112ms      31.118us       0.000us         0.00%       0.000us       0.000us           100
                                    aten::empty_strided         8.17%       1.538ms         8.17%       1.538ms      15.384us       0.000us         0.00%       0.000us       0.000us           100
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 18.820ms
Self CUDA time total: 20.745ms
```

Test Plan:
python test/test_quantization.py

Imported from OSS

Reviewed By: jingsh

Differential Revision: D29796533

fbshipit-source-id: 10abb93abd61c6ac25b8e8c114aa57b9db891918
2021-07-21 10:13:06 -07:00
b8386f5d72 [quant] Create FusedMovingAvgObsFakeQuantize for QAT (#61691)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61691

Create a new module for QAT that does a fused MovingAvgMinMaxObserver and FakeQuantize operation.
The module currently only supports per-tensor quantization (affine/symmetric). A follow-up PR will add support for per-channel.
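A hedged usage sketch (the import path is an assumption for this era and may differ across releases; the defaults give per-tensor affine fake-quant):

```python
import torch
from torch.quantization import FusedMovingAvgObsFakeQuantize  # assumed path

fq = FusedMovingAvgObsFakeQuantize()  # fused observe + fake-quantize
x = torch.randn(4, 8)
y = fq(x)  # one fused op instead of separate observer and fake_quant passes
print(fq.scale, fq.zero_point)
```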

Results on running QAT with MobileNetV2 (Obs enabled/fake_quant enabled)
Original FQ module
PyTorchObserver {"type": "_", "metric": "qnnpack_fp_latency_ms", "unit": "ms", "value": "242.80261993408203"}
PyTorchObserver {"type": "_", "metric": "qnnpack_qat0_latency_ms", "unit": "ms", "value": "505.7964324951172"}
PyTorchObserver {"type": "_", "metric": "fbgemm_fp_latency_ms", "unit": "ms", "value": "235.80145835876465"}
PyTorchObserver {"type": "_", "metric": "fbgemm_qat0_latency_ms", "unit": "ms", "value": "543.8144207000732"}

Fused FakeQuant module (~50% improvement in latency)
PyTorchObserver {"type": "_", "metric": "qnnpack_fp_latency_ms", "unit": "ms", "value": "232.1624755859375"}
PyTorchObserver {"type": "_", "metric": "qnnpack_qat0_latency_ms", "unit": "ms", "value": "263.8866901397705"}
PyTorchObserver {"type": "_", "metric": "fbgemm_fp_latency_ms", "unit": "ms", "value": "236.9832992553711"}
PyTorchObserver {"type": "_", "metric": "fbgemm_qat0_latency_ms", "unit": "ms", "value": "292.1590805053711"}

Individual module benchmark result (>5x improvement in latency)
===> Baseline FakeQuantize module
```
---------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                               Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
---------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
              aten::fake_quantize_per_tensor_affine         0.77%       1.210ms         4.92%       7.730ms     154.596us     718.528us         0.45%       9.543ms     190.862us            50
    aten::fake_quantize_per_tensor_affine_cachemask         2.41%       3.792ms         4.15%       6.520ms     130.402us       8.825ms         5.58%       8.825ms     176.492us            50
                                     aten::_aminmax         3.25%       5.105ms         4.43%       6.955ms     139.102us       8.193ms         5.18%       8.193ms     163.868us            50
                                   aten::zeros_like         1.87%       2.939ms         6.95%      10.922ms     109.218us       5.992ms         3.79%      10.844ms     108.442us           100
                                        aten::zeros         0.97%       1.527ms         3.11%       4.885ms      97.702us       2.383ms         1.51%       4.800ms      96.010us            50
                                         aten::rsub         1.34%       2.106ms         2.94%       4.614ms      92.277us       2.063ms         1.30%       4.559ms      91.173us            50
                                        aten::clamp         2.79%       4.381ms         5.42%       8.519ms      85.190us       5.385ms         3.41%       8.438ms      84.381us           100
                                           aten::eq        11.70%      18.384ms        21.31%      33.479ms      83.280us      22.465ms        14.21%      33.310ms      82.861us           402
                                         aten::ones         1.05%       1.656ms         2.57%       4.038ms      80.751us       2.494ms         1.58%       3.951ms      79.028us            50
                                           aten::le         2.52%       3.955ms         4.84%       7.607ms      76.071us       4.998ms         3.16%       7.702ms      77.016us           100
                                          aten::min         0.69%       1.087ms         2.32%       3.641ms      72.827us       1.017ms         0.64%       3.603ms      72.055us            50
                                          aten::max         1.40%       2.195ms         4.62%       7.260ms      72.597us       2.008ms         1.27%       7.140ms      71.404us           100
                                   aten::is_nonzero         2.68%       4.207ms        11.35%      17.829ms      71.033us       4.062ms         2.57%      17.225ms      68.625us           251
                                       aten::detach         1.17%       1.831ms         3.65%       5.736ms      57.360us       1.680ms         1.06%       5.634ms      56.340us           100
                                          aten::mul         3.36%       5.278ms         3.36%       5.278ms      53.862us       5.215ms         3.30%       5.215ms      53.216us            98
                                          aten::div         3.42%       5.376ms         3.42%       5.376ms      53.759us       5.320ms         3.36%       5.320ms      53.196us           100
                                          aten::sub         6.79%      10.672ms         6.79%      10.672ms      53.901us      10.504ms         6.64%      10.504ms      53.050us           198
                                         aten::item         4.06%       6.380ms        12.02%      18.883ms      53.798us       6.127ms         3.87%      18.322ms      52.198us           351
                                          aten::add         3.28%       5.147ms         3.28%       5.147ms      52.518us       5.113ms         3.23%       5.113ms      52.171us            98
                                      aten::minimum         1.63%       2.555ms         1.63%       2.555ms      51.092us       2.585ms         1.64%       2.585ms      51.708us            50
                                      aten::maximum         3.22%       5.065ms         3.22%       5.065ms      50.646us       5.133ms         3.25%       5.133ms      51.329us           100
                                        aten::round         1.61%       2.529ms         1.61%       2.529ms      50.578us       2.528ms         1.60%       2.528ms      50.552us            50
                                        aten::zero_         1.99%       3.125ms         4.72%       7.422ms      49.481us       2.835ms         1.79%       7.269ms      48.462us           150
                                        aten::copy_         6.62%      10.394ms         6.62%      10.394ms      41.576us      10.252ms         6.48%      10.252ms      41.010us           250
                                             detach         2.49%       3.905ms         2.49%       3.905ms      39.049us       3.954ms         2.50%       3.954ms      39.539us           100
                                       aten::select         2.01%       3.154ms         2.47%       3.876ms      38.759us       3.866ms         2.44%       3.866ms      38.658us           100
                          aten::_local_scalar_dense         7.96%      12.503ms         7.96%      12.503ms      35.621us      12.195ms         7.71%      12.195ms      34.743us           351
                                           aten::to         2.31%       3.625ms         4.16%       6.530ms      32.650us       4.320ms         2.73%       6.270ms      31.348us           200
                                        aten::fill_         3.70%       5.808ms         3.70%       5.808ms      29.039us       5.892ms         3.73%       5.892ms      29.459us           200
                                   aten::as_strided         0.79%       1.244ms         0.79%       1.244ms       6.221us       0.000us         0.00%       0.000us       0.000us           200
                                        aten::empty         3.55%       5.579ms         3.55%       5.579ms      11.137us       0.000us         0.00%       0.000us       0.000us           501
                                      aten::resize_         2.36%       3.712ms         2.36%       3.712ms      12.332us       0.000us         0.00%       0.000us       0.000us           301
                                   aten::empty_like         1.45%       2.284ms         3.68%       5.776ms      28.878us       0.000us         0.00%       0.000us       0.000us           200
                                aten::empty_strided         2.80%       4.398ms         2.80%       4.398ms      17.592us       0.000us         0.00%       0.000us       0.000us           250
---------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 157.108ms
Self CUDA time total: 158.122ms
```

===> FusedFakeQuant
```
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                   fb::fused_fake_quant        23.42%       6.408ms       100.00%      27.361ms     547.215us       7.887ms        27.20%      28.996ms     579.925us            50
                  aten::fake_quantize_per_tensor_affine         4.25%       1.162ms        27.65%       7.565ms     151.298us     686.176us         2.37%      10.217ms     204.336us            50
aten::_fake_quantize_per_tensor_affine_cachemask_ten...        14.11%       3.860ms        23.40%       6.403ms     128.068us       9.531ms        32.87%       9.531ms     190.612us            50
                                         aten::_aminmax        20.57%       5.628ms        27.47%       7.515ms     150.305us       8.218ms        28.34%       8.218ms     164.367us            50
                                             aten::item         3.65%     999.522us        10.27%       2.810ms      56.202us     931.904us         3.21%       2.674ms      53.481us            50
                              aten::_local_scalar_dense         6.62%       1.811ms         6.62%       1.811ms      36.212us       1.742ms         6.01%       1.742ms      34.843us            50
                                            aten::empty        10.85%       2.969ms        10.85%       2.969ms      14.843us       0.000us         0.00%       0.000us       0.000us           200
                                       aten::as_strided         1.92%     524.365us         1.92%     524.365us       5.244us       0.000us         0.00%       0.000us       0.000us           100
                                       aten::empty_like         6.48%       1.774ms        14.62%       4.000ms      26.670us       0.000us         0.00%       0.000us       0.000us           150
                                    aten::empty_strided         8.14%       2.226ms         8.14%       2.226ms      14.842us       0.000us         0.00%       0.000us       0.000us           150
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 27.361ms
Self CUDA time total: 28.996ms
```

Test Plan:
python test/test_quantization.py TestFusedObsFakeQuantModule

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D29706889

fbshipit-source-id: ae3f9fb1fc559920459bf6e8663e8299bf7d21e1
2021-07-21 10:13:04 -07:00
afdca41bab [quant] Add a new fused MovingAvg Obs + FakeQuant operator (GPU) (#61589)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61589

A custom GPU implementation that performs the observer and calculate-qparams steps on the GPU.
It calls the aten fake_quant_per_tensor/channel functions to perform the fake-quant step.

Test Plan:
python test/test_quantization.py TestFusedObsFakeQuant

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D29682761

fbshipit-source-id: 373a50f88481b7e5b4d9e65d84a6c174bb277dd4
2021-07-21 10:13:02 -07:00
92d3391fb1 [quant] Add a new fused MovingAvg Obs + FakeQuant operator(CPU) (#61570)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61570

A fused operator that computes moving-average min/max values of the input tensor (in-place) and fake-quantizes it.
It expects the qmin/qmax values to reflect the range of the quantized tensor (instead of reduce_range).

The motivation for adding this operator is performance: moving the computation from Python to C++/CUDA can speed up QAT.
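
A minimal eager-mode sketch of the unfused computation this operator replaces (the function name, the running min/max being 0-d tensors, and the averaging constant are illustrative assumptions, not the PR's exact kernel):

```python
import torch

def moving_avg_obs_fake_quant(x, running_min, running_max,
                              averaging_const=0.01, quant_min=0, quant_max=255):
    # Observer step: update the running min/max with an exponential moving average.
    running_min += averaging_const * (x.min() - running_min)
    running_max += averaging_const * (x.max() - running_max)
    # Compute affine qparams directly from the observed range (no reduce_range).
    scale = (running_max - running_min) / float(quant_max - quant_min)
    zero_point = (quant_min - torch.round(running_min / scale)).clamp(quant_min, quant_max)
    # Fake-quantize: round-trip through the integer grid without changing dtype.
    return torch.fake_quantize_per_tensor_affine(
        x, scale.item(), int(zero_point.item()), quant_min, quant_max)
```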

Test Plan:
python test/test_quantization.py TestFusedObsFakeQuant

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D29682762

fbshipit-source-id: 28e4c50e77236d6976fe4b326c9a12103ed95840
2021-07-21 10:11:41 -07:00
403f59701c Changes default DDP behavior to divide sparse grad by world size before allreduce, not after (#61814)
Summary:
I appreciate https://github.com/pytorch/pytorch/pull/61379, which restores the fusion of div-by-world-size and copy-to-allreduce-buffer for dense gradients. But I noticed in the wake of https://github.com/pytorch/pytorch/pull/61379 there's misaligned treatment of dense and sparse gradients. Specifically, dense gradients are divided by world size before the allreduce, while sparse gradients are divided by world size after the allreduce. On paper you wouldn't expect that to matter, but for cluster-scale DDP training with amp gradient scaling and allreduces of FP16 grads, we've noticed several cases where post-dividing grads by world size caused nonconvergence while pre-dividing worked. I'm not aware of any cases where the reverse was true.

This PR changes the treatment of sparse gradients to match that of dense gradients (both are now divided by world size before the allreduce).
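
To see why pre-dividing can matter for FP16 gradients, here is a toy illustration (the allreduce sum is simulated by a multiply, and the numbers are chosen to sit near the fp16 limit):

```python
import torch

world_size = 64
# Per-rank gradient values near the top of the fp16 range (max ~65504).
grad = torch.full((4,), 1024.0, dtype=torch.float16)

# Post-divide: the simulated allreduce sum overflows fp16 before the division.
post = (grad * world_size) / world_size
# Pre-divide: each rank divides first, so the summed values stay in range.
pre = (grad / world_size) * world_size

print(post)  # tensor([inf, inf, inf, inf], dtype=torch.float16)
print(pre)   # tensor([1024., 1024., 1024., 1024.], dtype=torch.float16)
```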

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61814

Reviewed By: mrshenli

Differential Revision: D29772444

Pulled By: rohan-varma

fbshipit-source-id: 033a17d5c019511889d908876282c6624fb26a2d
2021-07-21 09:54:53 -07:00
17d743ff04 ENH Adds test and docs for dropout for no batch dims (#61911)
Summary:
Towards https://github.com/pytorch/pytorch/issues/60585

I think `Dropout` is already tested in `test_Dropout` for no batch dims.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61911

Reviewed By: albanD

Differential Revision: D29810928

Pulled By: jbschlosser

fbshipit-source-id: 7716a1a808e9e34aae43573f38706212552afbb4
2021-07-21 09:07:10 -07:00
06df33857b fix adaptive_avg_pool (#61851)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61851

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D29812559

Pulled By: makslevental

fbshipit-source-id: ac54166aaec63992748ea3299c3144ee107b24f4
2021-07-21 08:42:26 -07:00
33db828e52 Revert D29647586: [jit] Renamed prim::Concat as prim::VarConcat
Test Plan: revert-hammer

Differential Revision:
D29647586 (db11619901)

Original commit changeset: cdd34ea5a3c9

fbshipit-source-id: bab5ac4ed67a00ac151fe39463aa3fb56897d7f4
2021-07-21 08:28:26 -07:00
48af9de92f ENH Enables No-batch for *Pad1d Modules (#61060)
Summary:
Toward https://github.com/pytorch/pytorch/issues/60585

This PR adds a `single_batch_reference_fn` that uses the single-batch implementation to check the no-batch behavior, as sketched below.
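
A minimal sketch of the idea (the helper below is an assumption about the shape of such a reference function, not the actual test code):

```python
import torch

def single_batch_ref(module, inp):
    # Run the unbatched input through the module by temporarily adding a
    # batch dimension of size 1, then removing it again.
    return module(inp.unsqueeze(0)).squeeze(0)

pad = torch.nn.ConstantPad1d(2, 0.0)
x = torch.randn(3, 5)  # (C, W) input with no batch dimension
assert torch.allclose(pad(x), single_batch_ref(pad, x))
```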

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61060

Reviewed By: mrshenli

Differential Revision: D29739823

Pulled By: jbschlosser

fbshipit-source-id: d90d88a3671177a647171801cc6ec7aa3df35482
2021-07-21 07:12:41 -07:00
bdf439a958 Adds _LazyInstanceNorm and LazyInstanceNormXd (#60982)
Summary:
Signed-off-by: Calvin McCarter <calvin@lightmatter.co>

Fixes https://github.com/pytorch/pytorch/issues/60981
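
A quick usage sketch of the new lazy modules (assuming the usual lazy-module semantics, where the number of features is inferred on the first forward pass):

```python
import torch

norm = torch.nn.LazyInstanceNorm2d(affine=True)
x = torch.randn(2, 8, 16, 16)
out = norm(x)  # materializes the affine weight/bias for 8 channels
print(out.shape)  # torch.Size([2, 8, 16, 16])
```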

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60982

Reviewed By: albanD

Differential Revision: D29810547

Pulled By: jbschlosser

fbshipit-source-id: d933d4c7fe5cf7be9b09a5ab93f740b94cf08cc1
2021-07-21 06:45:45 -07:00
db11619901 [jit] Renamed prim::Concat as prim::VarConcat (#61498)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61498

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D29647586

Pulled By: navahgar

fbshipit-source-id: cdd34ea5a3c986350a813be17e7d428844ea4cbf
2021-07-20 19:30:00 -07:00
7fbdc86aec [jit] Removed a local function to check for dominators and used the one added to Node class (#60909)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60909

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D29441864

Pulled By: navahgar

fbshipit-source-id: 362bd462fa70256dd1f8b05756a76da0cb3d4b76
2021-07-20 19:29:58 -07:00
429908e540 [jit] Updated the concat common inputs elimination pass to use the variadic cat op instead of aten::cat (#60908)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60908

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D29441865

Pulled By: navahgar

fbshipit-source-id: 2ab08168102eff1f43667ca418bdd94bb2df562a
2021-07-20 19:29:57 -07:00
53668f8bf6 [jit] Added an API to remove list mutations and replace with variadic cat until fixed point (#60776)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60776

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D29406099

Pulled By: navahgar

fbshipit-source-id: e2e69eb6ebff3bc6e25d80f46ce118e52f557fb6
2021-07-20 19:29:55 -07:00
0cfcf68aa5 [jit] Added special handling for prim::ListConstruct while checking for may alias inputs (#60775)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60775

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D29406101

Pulled By: navahgar

fbshipit-source-id: 9b8a4050167750610400637e7de48ffa8727051a
2021-07-20 19:29:53 -07:00
4dd04a8bbe [jit] Handled cases when input list to cat is mutated after cat using AliasDb (#60774)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60774

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D29406100

Pulled By: navahgar

fbshipit-source-id: af6afca65881c18c51b482eb63898a0f1c94d591
2021-07-20 19:28:42 -07:00
604f503d30 Revert D29794958 + compilation fix (#61937)
Summary:
This PR un-reverts https://github.com/pytorch/pytorch/issues/61475 and fixes compilation with MSVC, which does not recognize alternative operator spellings (i.e., using `or` instead of `||`).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61937

Reviewed By: albanD

Differential Revision: D29805941

Pulled By: malfet

fbshipit-source-id: 01e5963c6717c1b44b260300d87ba0bf57f26ce9
2021-07-20 18:14:45 -07:00
a152c12d7b .github: Clone pytorch to separate directory (#61932)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61932

Clones pytorch to a separate directory for each run so that runs do not
overlap with each other

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: zhouzhuojie

Differential Revision: D29801875

Pulled By: seemethere

fbshipit-source-id: 71a3c7c949e5aeacf033ae1fc9aaef13b42833b6
2021-07-20 17:30:30 -07:00
7cbb7c6d2e [vulkan] Make vulkan ops selective (#58332)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58332

Test Plan: Imported from OSS

Reviewed By: iseeyuan

Differential Revision: D28454976

Pulled By: IvanKobzarev

fbshipit-source-id: 445c1f326be76e3530a4884aa5fe749d636e0ae5
2021-07-20 16:26:55 -07:00
73fbf43684 [vulkan] Fix asserts (#61495)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61495

Test Plan: Imported from OSS

Reviewed By: SS-JIA

Differential Revision: D29647357

Pulled By: IvanKobzarev

fbshipit-source-id: cb4ba15f28625ea6e667883c9a2d31eba48b6f37
2021-07-20 16:07:13 -07:00
22fff61f06 Revert D29794958: [pytorch][PR] changing trapz to trapezoid
Test Plan: revert-hammer

Differential Revision:
D29794958 (95cec8f4fa)

Original commit changeset: 60b9c07efd47

fbshipit-source-id: 2dcda2d62e01c2521a86ae5ed8246cfb686d3f64
2021-07-20 16:00:46 -07:00
e067960243 lint_setup should not require elevated privileges (#61798)
Summary:
- s/pip/pip3/ (because unversioned pip can reference either pip2 or pip3,
depending on setup)
- Always invoke `pip install` with the `--user` option (otherwise, unless one is
using a conda environment, it will try to install into a system folder, which
should not be writable by regular users)

- Do not install shellcheck in `/usr/bin`; instead rely on `~/.local/bin` and add it to the PATH

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61798

Reviewed By: zhouzhuojie, seemethere

Differential Revision: D29747286

Pulled By: malfet

fbshipit-source-id: 30cb51fe60b5096b758f430d1c51465205532a19
2021-07-20 15:53:12 -07:00
994434ad16 Adding complex number support for all_to_all/scatter (#61299)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61299

Modifies all_to_all and scatter to support complex numbers in addition to floating-point numbers.
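
A rough usage sketch (assumes a process group has already been initialized, e.g. with the gloo backend):

```python
import torch
import torch.distributed as dist

# Complex tensors are now accepted by the collective; with world size N,
# the input is split into N equal chunks, one per rank.
t = torch.randn(4, dtype=torch.complex64)
out = torch.empty_like(t)
dist.all_to_all_single(out, t)
```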

Test Plan: buck run //caffe2/test/distributed:distributed_gloo_fork -- test_name --print-passing-details --run-disabled

Reviewed By: wanchaol

Differential Revision: D29563938

fbshipit-source-id: 59e436b3fa1aee3d5195cbcffd39587e642c76b9
2021-07-20 15:45:34 -07:00
457a0b63bf use torch.bucketize in to_sparse_csr implementation (+ additional tests) (#61340)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/57381
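
The commit body doesn't spell out the approach, but as a rough sketch of how torch.bucketize can build CSR `crow_indices` from sorted COO row indices (an illustration of the idea, not necessarily the PR's exact code):

```python
import torch

nrows = 4
rows = torch.tensor([0, 0, 1, 3])  # sorted row index of each nonzero
# crow_indices[i] = number of nonzeros with row index < i
crow = torch.bucketize(torch.arange(nrows + 1), rows)
print(crow)  # tensor([0, 2, 3, 3, 4])
```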

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61340

Reviewed By: bhosmer

Differential Revision: D29601393

Pulled By: cpuhrsch

fbshipit-source-id: 4ca1f013d96e8716f0e658e0cd685d9aa0d98a5c
2021-07-20 15:44:25 -07:00
95cec8f4fa changing trapz to trapezoid (#61475)
Summary:
This PR resolves issue https://github.com/pytorch/pytorch/issues/52606 while also adding support for complex numbers.

Stack from [ghstack](https://github.com/ezyang/ghstack):
* https://github.com/pytorch/pytorch/issues/61616
* https://github.com/pytorch/pytorch/issues/61615
* **https://github.com/pytorch/pytorch/issues/61475**
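
A quick usage sketch of the renamed op (trapezoidal rule over possibly non-uniform sample points):

```python
import torch

y = torch.tensor([1.0, 2.0, 3.0])
x = torch.tensor([0.0, 1.0, 3.0])
# (1 + 2)/2 * 1 + (2 + 3)/2 * 2 = 1.5 + 5.0
print(torch.trapezoid(y, x))  # tensor(6.5000)
```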

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61475

Reviewed By: mruberry

Differential Revision: D29794958

Pulled By: NivekT

fbshipit-source-id: 60b9c07efd47fd85b9c8178768fc7828d7b57d29
2021-07-20 15:25:55 -07:00
86715623dd Adding super calls to JIT test case setUp and tearDown (#61922)
Summary:
This issue surfaced when https://github.com/pytorch/pytorch/issues/61655 did not manage to skip the appropriate test case.

I then investigated and realized the setUp code that does the test disabling was not called, because another defined setUp overrode the parent class' setUp (a sketch of the pattern follows below).

I am not sure if that was intentional; if so, we would have to adapt the child class' code to call the check_if_enable function in common_utils.
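
A minimal sketch of the pattern being fixed (class names are stand-ins, not the actual test classes):

```python
import unittest

class BaseTestCase(unittest.TestCase):  # stand-in for the common_utils base class
    def setUp(self):
        super().setUp()
        # ... the test-disabling check (check_if_enable) would run here ...

class TestSomething(BaseTestCase):
    def setUp(self):
        # Without this super() call, BaseTestCase.setUp (and the test-disabling
        # logic it performs) is silently skipped.
        super().setUp()
        # ... test-specific setup ...
```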

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61922

Reviewed By: ejguan

Differential Revision: D29798716

Pulled By: janeyx99

fbshipit-source-id: d31b664e48507d69de14574ff5e6ecf1d41ae24d
2021-07-20 15:08:44 -07:00
7acb8b71e1 Remove AVX detection code that duplicates FindAVX.cmake (#61748)
Summary:
This PR deletes some code in `MiscCheck.cmake` that performs the exact
same functionality as `FindAVX.cmake`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61748

Reviewed By: ejguan

Differential Revision: D29791282

Pulled By: malfet

fbshipit-source-id: 6595fd1b61c8ae12b821fad8c9a34892dd52d213
2021-07-20 14:34:36 -07:00
e8d2916b84 Add faulty tensorpipe implementation (#61421)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61421

This PR adds the faulty tensorpipe agent implementation and replaces all faulty process group agent tests with it. The faulty tensorpipe agent code is very similar to that of the faulty process group agent: it allows the user to fail or delay certain types of RPC messages, which is used in the faulty agent tests. These changes are needed to deprecate the process group RPC backend.

Summary of changes:
- Add faulty tensorpipe agent class
- Update the tensorpipe pipeWrite function so it can be overridden and can add a delay
- Update test backend registry and faulty agent tests to use the FAULTY_TENSORPIPE_AGENT backend.

This affects all faulty agent tests; here are a few of them as sample commands:
`pytest test/distributed/rpc/test_faulty_agent.py -vs -k test_verify_backend_options`
`pytest test/distributed/rpc/test_faulty_agent.py -vs -k test_no_faulty_messages`
`pytest test/distributed/rpc/test_faulty_agent.py -vs -k test_builtin_remote_message_dropped_timeout`

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D29773739

Pulled By: H-Huang

fbshipit-source-id: 6b2bc366735d70b79943d4207f454bc9555bbf5f
2021-07-20 13:54:30 -07:00
d856914c57 Fix missing braces (#61745)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61745

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D29717538

fbshipit-source-id: ed0ff4fb6a72b701bf6d36ebde343672356a916a
2021-07-20 13:32:38 -07:00
f78142b68d Modernize emplace (#61742)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61742

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D29717433

fbshipit-source-id: 93996388780862e90ab4e697508407091e8e763b
2021-07-20 13:31:19 -07:00
2c2a084012 approx 100x acceleration for parse_kineto_results (#60432)
Summary:
Fixes https://github.com/pytorch/kineto/issues/308, https://github.com/pytorch/pytorch/issues/58983 maybe related

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60432

Reviewed By: ilia-cher

Differential Revision: D29715257

Pulled By: gdankel

fbshipit-source-id: 7c94d1bb00b609f502db7aa9d9a447ab09645e6a
2021-07-20 13:21:49 -07:00
4567a50b2a Enable clang-tidy on master (#61689)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61689

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D29767984

Pulled By: 1ntEgr8

fbshipit-source-id: 658355da274ada41e01ed2772a03a701b90fbbab
2021-07-20 12:55:12 -07:00
8b88c24670 add channels last support for thnn_conv2d (non-dilated) (#49582)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49582

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D26007050

Pulled By: VitalyFedyunin

fbshipit-source-id: 1289e0687c2459dd4eb8e4ba2efc8266397cfe5f
2021-07-20 12:50:24 -07:00
91bc285084 Fix clang-tidy error in pre-commit script (#61918)
Summary:
Fixes a clang-tidy error in the git-pre-commit script. See log below for the error it fixes.

```
Running pre-commit flake8
Running pre-commit clang-tidy
usage: clang_tidy [-h] [-e CLANG_TIDY_EXE] [-g GLOB] [-x REGEX] [-c COMPILE_COMMANDS_DIR] [--diff-file DIFF_FILE] [-p PATHS [PATHS ...]] [-n] [-v] [-q] [--config-file CONFIG_FILE] [--print-include-paths] [-I INCLUDE_DIR] [-s]
                  [--disable-progress-bar]
                  [extra_args [extra_args ...]]
clang_tidy: error: unrecognized arguments: -j
```

It gets rid of the redundant binary check because `tools.linter.clang_tidy` already does this.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61918

Test Plan: Run `tools/git-pre-commit`. It should not show a clang-tidy error.

Reviewed By: driazati

Differential Revision: D29796383

Pulled By: 1ntEgr8

fbshipit-source-id: b804b0170747f04e84d21e03d1c4985748d78cf2
2021-07-20 12:40:56 -07:00
f6446802c7 Revert D29783943: [pytorch][PR] add BFloat16 operators on CPU: arange, acosh, asinh, atanh, exp2, digamma, trigamma, polygamma
Test Plan: revert-hammer

Differential Revision:
D29783943 (513c40cb1a)

Original commit changeset: 40cebe829720

fbshipit-source-id: 5276dea572f1286dad7b7caa69ecc2f369ec13ff
2021-07-20 12:33:52 -07:00
c2cc6a9396 Add generic join unit tests (#61786)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61786

This adds unit tests for the generic join context manager.

```
gpurun python test/distributed/algorithms/test_join.py
```

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D29746646

Pulled By: andwgu

fbshipit-source-id: 2933d85783c2225574c4b77bfb90064690c6e668
2021-07-20 12:13:05 -07:00
1c80b5220b nll_loss_forward: port to structured kernel (#61443)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61443

For more information, see #55070.

This PR also adds a new type, `OptionalTensorRef`, as a replacement for `c10::optional<Tensor>&` in order to avoid the reference count manipulations that are inevitable with the latter. I have confirmed using Godbolt/Compiler Explorer that this class does indeed avoid manipulating the reference count of the `intrusive_ptr` inside the `Tensor` it refers to:

1. [P429709479](https://www.internalfb.com/phabricator/paste/view/P429709479) - Given a `const Tensor&` in scope, an `OptionalTensorRef` can be constructed without bumping refcount.
2. [P429709883](https://www.internalfb.com/phabricator/paste/view/P429709883) - Given an `OptionalTensorRef`, a `const Tensor&` can be produced without bumping refcount.
3. [P429710335](https://www.internalfb.com/phabricator/paste/view/P429710335) - When `OptionalTensorRef` is destructed, the refcount should not be decremented.
4. [P429769525](https://www.internalfb.com/phabricator/paste/view/P429769525) - `OptionalTensorRef` can be assigned without refcount manipulation.
5. [P429769882](https://www.internalfb.com/phabricator/paste/view/P429769882) - `OptionalTensorRef` can be move assigned without refcount manipulation.

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D29780666

Pulled By: SplitInfinity

fbshipit-source-id: 7af157215300e9254d635433cbd583f7329fe064
2021-07-20 11:45:44 -07:00
f0df0207ec [jit] Arithmetic simplification for integers. (#61444)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61444

Add a mini pass to merge arithmetic nodes like (((x - 1) + 2) * 1) - 1.
Issue #60913
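
For example (a sketch; the printed graph depends on which passes have run):

```python
import torch

@torch.jit.script
def f(x: int):
    # Constant-arithmetic chain the peephole pass can simplify:
    # (((x - 1) + 2) * 1) - 1 == x
    return (((x - 1) + 2) * 1) - 1

print(f.graph)  # the unoptimized graph still shows the sub/add/mul chain
```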

Test Plan:
python test/test_jit.py TestPeephole.test_peephole_arith

Imported from OSS

Reviewed By: eellison

Differential Revision: D29630614

fbshipit-source-id: 08ac64cee39070401f9ff9163d309f20ff53c5ac
2021-07-20 11:35:42 -07:00
d2abfc547b Add ShardedTensorMetadata for ShardedTensor. (#61683)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61683

This PR adds a consolidated metadata field (ShardedTensorMetadata)
which has all the necessary global metadata for a ShardedTensor.
ghstack-source-id: 133847517

Test Plan:
1) unit tests
2) waitforbuildbot

Reviewed By: wanchaol

Differential Revision: D29703719

fbshipit-source-id: 567279e46c787a88ef3310e4dce6fd2ad7631c62
2021-07-20 11:28:13 -07:00
87334c40a7 Remove torch._bmm and remove torch.bmm deterministic arg documentation (#61629)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/61571

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61629

Reviewed By: mrshenli

Differential Revision: D29774486

Pulled By: albanD

fbshipit-source-id: bfc9119c478f0244d5be681bcf4954a3eb97e542
2021-07-20 10:55:43 -07:00
513c40cb1a add BFloat16 operators on CPU: arange, acosh, asinh, atanh, exp2, digamma, trigamma, polygamma (#60444)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60444

Reviewed By: ejguan

Differential Revision: D29783943

Pulled By: ezyang

fbshipit-source-id: 40cebe8297207669d1ca430ed1d1e81dda5a0c45
2021-07-20 10:30:04 -07:00
45751e0b34 Support integral target for the backward of nn.SmoothL1Loss (#61112)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/58816

- enhance the backward of `nn.SmoothL1Loss` to allow integral `target`
- add test cases in `test_nn.py` to check that `input.grad` matches between an integral target and its floating counterpart (sketched below).
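
A minimal sketch of the property the new tests check:

```python
import torch

x1 = torch.randn(4, requires_grad=True)
x2 = x1.detach().clone().requires_grad_(True)
target = torch.randint(0, 5, (4,))

torch.nn.SmoothL1Loss()(x1, target).backward()          # integral target
torch.nn.SmoothL1Loss()(x2, target.float()).backward()  # floating counterpart

assert torch.allclose(x1.grad, x2.grad)
```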

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61112

Reviewed By: mrshenli

Differential Revision: D29775660

Pulled By: albanD

fbshipit-source-id: 544eabb6ce1ea13e1e79f8f18c70f148e92be508
2021-07-20 10:24:03 -07:00
59a5312ce6 Modernize fix deprecated header (#61736)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61736

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D29716965

fbshipit-source-id: 314c2b557c240ac16bbfab114ab764beb189e78a
2021-07-20 10:06:11 -07:00
5a04bd8723 Modernize some loops in torch (#61737)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61737

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D29716813

fbshipit-source-id: 21f9716bead4e0e913406e681c55d1956327e6af
2021-07-20 10:04:54 -07:00
65616184bc [Docs] Bundle of errata and small corrections / improvements for torch.linalg docs (#61578)
Summary:
This PR bundles a number of errata detected in the linalg docs over the last few weeks.

- Simpler Cholesky deprecation rule
- Remove repeated consecutive words
- Correct cond with rcond in lstsq
- Correct examples of lstsq
- More concise examples
- Use the names of the inputs / outputs in the variables of the examples

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61578

Reviewed By: mrshenli

Differential Revision: D29757988

Pulled By: mruberry

fbshipit-source-id: a740a64826c065c1d7c1b8b498364d147008d76d
2021-07-20 09:58:09 -07:00
a0c9d70fba bitwise_and: Port to structured (#60813)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60813

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D29449374

Pulled By: ezyang

fbshipit-source-id: d7e236ad841dcb9d5914352d117a34b10894bb91
2021-07-20 09:01:41 -07:00
875d63ed04 bitwise_xor: Port to structured (#60812)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60812

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D29449372

Pulled By: ezyang

fbshipit-source-id: 016d2012f64486c2490ff319e753b0d054dccf2c
2021-07-20 09:01:40 -07:00
ce8aeefbf4 bitwise_or: Port to strucutred (#60811)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60811

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D29449370

Pulled By: ezyang

fbshipit-source-id: ac176985b0141a55807ba909d7342eb35b1dc28f
2021-07-20 09:00:20 -07:00
f59ac5abc8 Add thread local state guards in autograd engine hooks. (#60067)
Summary:
The thread-local state of the backward thread is not aligned with the GraphTask's `thread_local_` when calling the hooks in backward.

This is required for profiling the statistics c10d operation of `DistributedDataParallel` module.

Is there any concern about adding the thread-local state guard when calling the hooks in backward? ezyang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60067

Reviewed By: ezyang

Differential Revision: D29654599

Pulled By: albanD

fbshipit-source-id: 656c4f91017184fd40f1a184de24757a13387e37
2021-07-20 07:41:49 -07:00
641f6ef8a7 Implement IMethod::getArgumentNames() (#61856)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61856

This diff does the following:
1. It implements IMethod::getArgumentNames() for all of IMethod's subclasses.
2. It refactors PyTorchDeployPredictor to use IMethod for model execution.

Test Plan:
[... ~/fbsource/fbcode/caffe2] buck test mode/dev caffe2/fb/predictor:pytorch_predictor_test -- PyTorchDeployPredictor
[... ~/fbsource/fbcode/caffe2] buck test mode/dev caffe2/fb/predictor:pytorch_predictor_test -- PyTorchPredictor

Reviewed By: wconstab

Differential Revision: D29648756

fbshipit-source-id: e047345f26ce495a5d74d8063f7f8edc32a1b13c
2021-07-19 23:16:48 -07:00
42d6543c7b [bc-breaking] Dispatch index_put with boolean mask argument to masked_fill (#61612)
Summary:
https://github.com/pytorch/pytorch/issues/57515

Based on ngimel's branch, with a few tweaks to determine when to copy value tensors to device memory, plus additional tests.
bc-breaking note: Previously, if in `x[index]=value` `value` was a 0-d tensor with device different from `x`'s device, it resulted in a RuntimeError. Now this case is handled by copying `value` to the correct device.
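
Illustrating the bc-breaking note (a sketch that assumes a CUDA device is available):

```python
import torch

x = torch.zeros(4, device="cuda")
mask = torch.tensor([True, False, True, False], device="cuda")
value = torch.tensor(1.0)  # 0-d tensor on the CPU

x[mask] = value  # previously a RuntimeError; now `value` is copied to x's device
print(x)         # tensor([1., 0., 1., 0.], device='cuda:0')
```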

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61612

Reviewed By: mrshenli

Differential Revision: D29753491

Pulled By: ngimel

fbshipit-source-id: 3fba14f4c2b9b136b50af020f9c1eda88f7373b0
2021-07-19 22:53:14 -07:00
018dc4193e Factor vector intrinsics out of SumKernel.cpp (#61483)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61483

This will make it simpler to support AVX512 which is upcoming in #56992, see https://github.com/pytorch/pytorch/pull/56992#discussion_r667060280 for reference.

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D29753536

Pulled By: ngimel

fbshipit-source-id: 03ae66cdc01a3679c67214468e2bdf93b15c3bc2
2021-07-19 21:49:01 -07:00
c44d9d9f70 Use cascade-summation to improve nansum accuracy (#61082)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61082

Fixes #59415

This implements nansum as a new `LoadPolicy` for the existing sum functions.
So, it's using the more accurate cascade-sum algorithm.

I've also expanded `test_nansum` to cover the four special cases of the sum
algorithm (inner/outer reduction; vectorized or scalar).
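
The semantics of the op are unchanged; for reference, nansum simply treats NaN entries as zero:

```python
import torch

t = torch.tensor([1.0, float("nan"), 2.0, float("nan")])
print(torch.nansum(t))                       # tensor(3.)
print(torch.nansum(t.reshape(2, 2), dim=1))  # tensor([1., 2.])
```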

Nansum performance comparison
-----------------------------
For float sums, contiguous reductions are as much as 10x faster and discontiguous sums are ~1.8x faster (more for small shapes due to TensorIterator overheads).

|        Shape | Dim | Master Contiguous (us) | This PR Contiguous (us) | Master Discontiguous (us) | This PR Discontiguous (us) |
|-------------:|-----|:----------------------:|:-----------------------:|:-------------------------:|:--------------------------:|
|     10, 1000 | 0   |          74.9          |           2.02          |            75.6           |            6.41            |
|              | 1   |          8.24          |           1.8           |            8.28           |            5.24            |
|    100, 1000 | 0   |           134          |           7.55          |            130            |            43.2            |
|              | 1   |          70.5          |           7.01          |            71.5           |            40.6            |
|   1000, 1000 | 0   |           726          |           69.2          |            737            |             403            |
|              | 1   |           702          |           51.0          |            709            |             404            |
|  10000, 1000 | 0   |         15,300         |          2,470          |           18,200          |           10,400           |
|              | 1   |          7,200         |          1,160          |           7,470           |            4,440           |
| 100000, 1000 | 0   |         163,000        |          28,000         |          199,000          |           131,000          |
|              | 1   |         70,700         |          13,500         |           75,700          |           44,200           |

Sum performance comparison
-------------------------

For float sums, performance is unchanged to within measurement precision:
|        Shape | Dim | Master Contiguous (us) | This PR Contiguous (us) | Master Discontiguous (us) | This PR Discontiguous (us) |
|-------------:|-----|:----------------------:|:-----------------------:|:-------------------------:|:--------------------------:|
|     10, 1000 | 0   |          1.92          |           2.01          |            4.2            |            4.49            |
|              | 1   |          1.68          |           1.68          |            2.79           |            2.75            |
|    100, 1000 | 0   |          6.52          |           7.07          |            26.9           |            27.3            |
|              | 1   |          5.91          |           5.66          |            16.8           |            16.9            |
|   1000, 1000 | 0   |          55.6          |           58.6          |            256            |             254            |
|              | 1   |          41.0          |           41.2          |            150            |             147            |
|  10000, 1000 | 0   |          1,370         |          1,650          |           8,070           |            8,020           |
|              | 1   |           908          |           845           |           3,100           |            2,980           |
| 100000, 1000 | 0   |         24,700         |          24,700         |           90,900          |           91,000           |
|              | 1   |         12,500         |          12,100         |           31,500          |           31,800           |

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D29753523

Pulled By: ngimel

fbshipit-source-id: 28095ac39e4a07ff878775c98f7a7815d9a4e457
2021-07-19 21:47:43 -07:00
bf1c9aaa79 logit_backward: Port to structured (#60817)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60817

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D29449376

Pulled By: ezyang

fbshipit-source-id: e6f793300488370f50a97db58f0400c557ee64e5
2021-07-19 21:23:05 -07:00
b8686b42d8 tanh_backward: Port to structured (#60816)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60816

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D29449375

Pulled By: ezyang

fbshipit-source-id: 93b70341fc6a2a42056fef74d6e5d81ec34e9da2
2021-07-19 21:23:03 -07:00
8c42d7ad07 sigmoid_backward: Port to structured (#60815)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60815

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D29449371

Pulled By: ezyang

fbshipit-source-id: e68c05cc90446e86d50b67d8346f145bf13ed207
2021-07-19 21:23:01 -07:00
11cc179366 xlogy: Port to structured (#60814)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60814

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D29449373

Pulled By: ezyang

fbshipit-source-id: a37499cd4fabff80f848627def7dd500364b8a22
2021-07-19 21:21:54 -07:00
9fb6b40f3e Makes a streaming backward test try gradient stealing more directly (#60065)
Summary:
Closes https://github.com/pytorch/pytorch/issues/59846.

https://github.com/pytorch/pytorch/issues/59846 is likely paranoia, and some of the test_streaming_backward_* tests in test_cuda.py already use gradient stealing (i.e., they start with `.grad`s as None before backward). Regardless, this PR augments one of the tests to stress gradient stealing a bit more directly.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60065

Reviewed By: mrshenli

Differential Revision: D29779518

Pulled By: ngimel

fbshipit-source-id: ccbf278543c3adebe5f4ba0365b1dace9a14da9b
2021-07-19 20:39:55 -07:00
873cc7a46d Support 3 argument variant of the getattr() call where the third arg is the default return value (#61599)
Summary:
Issue: https://github.com/pytorch/pytorch/issues/56909

Note that the emitted code for such a call will be either a) a getattr() call with the first two args, if the
attribute name (which must be a string literal) is determined to be valid based on the hasAttr() result,
or b) just the AST node for the default value (the 3rd arg) alone, with no getattr call at all.

Test code:

```
import torch
import numpy as np

class Shape:
    def __init__(self):
        self.center = 1.0

def f(x):
    s = Shape()
    return getattr(s, "missing", [])

y = torch.jit.script(f)
print(y.graph)
```
Output:
```
graph(%x : Tensor):
  %s.1 : __torch__.Shape = prim::CreateObject()
  %2 : NoneType = prim::CallMethod[name="__init__"](%s.1) # ts.py:10:8
  %4 : Tensor[] = prim::ListConstruct()
  return (%4)
```

Another example:
```
import torch
from typing import List

class Shape:
    def __init__(self):
        self.center = 1.0

def f(x):
    s = Shape()
    y = getattr(s, "center")
    w: List[float] = [1.0]
    z = getattr(s, "missing", w)
    z.append(y)
    return z

y = torch.jit.script(f)
print(y.graph)
 --- output ---

graph(%x : Tensor):
  %5 : float = prim::Constant[value=1.]() # ts.py:12:23
  %s.1 : __torch__.Shape = prim::CreateObject()
  %2 : NoneType = prim::CallMethod[name="__init__"](%s.1) # ts.py:10:8
  %center : float = prim::GetAttr[name="center"](%s.1)
  %w.1 : float[] = prim::ListConstruct(%5)
  %11 : float[] = aten::append(%w.1, %center) # ts.py:14:4
  return (%w.1)
```
Fixes #56969

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61599

Reviewed By: ZolotukhinM

Differential Revision: D29776058

Pulled By: jerryzhenleicai

fbshipit-source-id: 76333bd54002e08a064677c1f287115a80cc7c8e
2021-07-19 20:04:21 -07:00
ffd2e602f4 [CUDA graphs] Make sure graph mempool cudaMalloc_count decrement pairs with cudaFree for all allocations (#61567)
Summary:
Graph mempools aren't deleted until all of their allocations are cudaFreed. `PrivatePool::cudaMalloc_count` tracks the number of outstanding (not-yet-cudaFreed) allocations.

https://github.com/pytorch/pytorch/pull/44742 moves cudaFree to [release_block](https://github.com/pytorch/pytorch/pull/44742/files#diff-acc6337586bf9cdcf0a684380779300ec171897d05b8569bf439820dc8c93bd5R1160), while the `cudaMalloc_count` decrement (if needed) remains in a caller ([release_blocks](https://github.com/pytorch/pytorch/pull/44742/files#diff-acc6337586bf9cdcf0a684380779300ec171897d05b8569bf439820dc8c93bd5R1177)). But I noticed there's also a path ([release_available_cached_blocks](https://github.com/pytorch/pytorch/pull/44742/files#diff-acc6337586bf9cdcf0a684380779300ec171897d05b8569bf439820dc8c93bd5R1094)) that calls `release_block` without calling `release_blocks`, in other words, it calls cudaFree but dodges any potential `cudaMalloc_count` decrement.

In practice, the way the code is currently organized, I don't _think_ this second path can cause the pool to become a zombie whose `cudaMalloc_count` will never reach zero (I think this could only happen if you call `release_available_cached_blocks` on a private pool, and the only way it would be called on a private pool is if capture is underway, and if capture is underway, the cudaFree call will hard error). Regardless, I feel much more comfortable keeping the cudaMalloc_count decrement right next to the cudaFree.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61567

Reviewed By: mrshenli

Differential Revision: D29765198

Pulled By: ezyang

fbshipit-source-id: bcbeed656c3e0d101112aa470d8a098c73a011b1
2021-07-19 19:22:18 -07:00
208d06ca8c Port other comparison ops: ne, lt, gt, le, ge to structured kernels. (#60942)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60942

Tracking Issue: #55070

This PR applies the same transformation used for `eq` to the other comparison ops: `ne`, `lt`,
`gt`, `le`, and `ge`. Macros for creating the meta and impl functions are used (since the
checks they perform are the same).

Test Plan: Imported from OSS

Reviewed By: soulitzer

Differential Revision: D29509868

Pulled By: ezyang

fbshipit-source-id: 6a1ed1d93d08884c9e09d3f419037533a235d68c
2021-07-19 19:14:12 -07:00
97327137ba Port eq kernel to structured kernels. (#60177)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60177

Tracking issue: #55070

Test Plan: Imported from OSS

Reviewed By: soulitzer

Differential Revision: D29509871

Pulled By: ezyang

fbshipit-source-id: ad81bb49c46edc81c705d12108b98c5ffaaddf92
2021-07-19 19:13:09 -07:00
64ac428889 [vulkan] Adaptive local work group size (#61170)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61170

Instead of using a fixed local work group size of {4,4,4}, adjust the size based on the global size in order to minimize the number of inactive invocations.

## Perf improvements from this change
On aloha portal devices, in conjunction with the below diff that tweaks the conv2d_pw shader to calculate a 4x4 output, benchmark latency of the xirp14b model was reduced from ~8.7 ms to ~6.6 ms.

Test Plan:
```
cd ~/fbsource
buck build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 //xplat/caffe2:pt_vulkan_api_test_binAndroid\#android-arm64 --show-output
adb push buck-out/gen/xplat/caffe2/pt_vulkan_api_test_binAndroid\#android-arm64 /data/local/tmp/vulkan_api_test
adb shell "/data/local/tmp/vulkan_api_test"
cd -
```

Reviewed By: IvanKobzarev

Differential Revision: D28724591

fbshipit-source-id: ede896300b2be1a9578e492cb870121012886aa7
2021-07-19 18:52:19 -07:00
f324421d34 [vulkan] Calculate a 4x4 output tile for each invocation in conv2d_pw (#60760)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60760

A simple optimization to the `conv2d_pw` shader that makes each invocation calculate a 4x4 output tile instead of a single output texel. This results in better memory reuse and subsequently a pretty significant performance win for models similar to the MobileNets.

## Perf improvements from this change
On aloha portal devices, in conjunction with the above diff that introduces adaptive work group sizes, benchmark latency of the xirp14b model was reduced from ~8.7 ms to ~6.6 ms.

Test Plan:
Test vulkan ops:

```
cd ~/fbsource
buck build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 //xplat/caffe2:pt_vulkan_api_test_binAndroid\#android-arm64 --show-output
adb push buck-out/gen/xplat/caffe2/pt_vulkan_api_test_binAndroid\#android-arm64 /data/local/tmp/vulkan_api_test
adb shell "/data/local/tmp/vulkan_api_test"
cd -
```

Reviewed By: IvanKobzarev

Differential Revision: D28724590

fbshipit-source-id: e742286b01bf566dc6378677be55409b7faa8cfb
2021-07-19 18:52:18 -07:00
a1b5025ecd [vulkan] Convolution op cleanup (#60759)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60759

Remove unused convolution implementations and refactor convolution op code to make this file easier to maintain.

Test Plan:
Test vulkan ops:

```
cd ~/fbsource
buck build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 //xplat/caffe2:pt_vulkan_api_test_binAndroid\#android-arm64 --show-output
adb push buck-out/gen/xplat/caffe2/pt_vulkan_api_test_binAndroid\#android-arm64 /data/local/tmp/vulkan_api_test
adb shell "/data/local/tmp/vulkan_api_test"
cd -
```

Reviewed By: IvanKobzarev

Differential Revision: D28724592

fbshipit-source-id: cb509fa1cd68089f78188bfb3c866aabc9b0cbdb
2021-07-19 18:52:16 -07:00
cacab7e9d6 [vulkan] Reduce submission rate to save CPU cycles (#60758)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60758

Further tweak the submission rate of ops. Previously, in D28293756 (bc0965ac85), the submission rate was set as high as possible in order to prioritize performance. However, in practice (i.e., when running the model in an app) the high submission rate increases CPU usage and GPU contention, which may regress FPS.

In the future it would be beneficial to devise a scheme to adaptively set the GPU submission rate.

## Perf Improvements
This change doesn't really affect benchmark latency. However, through systraces it can be observed that CPU usage is reduced without too much impact on FPS/model latency.

Test Plan:
Test vulkan ops:

```

cd ~/fbsource
buck build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 //xplat/caffe2:pt_vulkan_api_test_binAndroid\#android-arm64 --show-output
adb push buck-out/gen/xplat/caffe2/pt_vulkan_api_test_binAndroid\#android-arm64 /data/local/tmp/vulkan_api_test
adb shell "/data/local/tmp/vulkan_api_test"
cd -
```

Reviewed By: IvanKobzarev

Differential Revision: D29062836

fbshipit-source-id: 1a0f42b49fecb80baee08cb3f1048bb35a1b5d5c
2021-07-19 18:51:04 -07:00
554038c2a2 [package] merge test_torchscript into test_package_script (#61807)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61807

These shouldn't be separate files; they test the same thing.

Differential Revision: D29748967

Test Plan: Imported from OSS

Reviewed By: Lilyjjo

Pulled By: suo

fbshipit-source-id: 177f40fa460d00d064dfd1f33a0b6656b214a296
2021-07-19 18:23:45 -07:00
f02cfcc802 ban PyTorchStreamWriter from writing the same file twice (#61805)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61805

Similar in spirit to https://github.com/pytorch/pytorch/pull/61371.
While writing two files with the same name is allowed by the ZIP format,
most tools (including our own) handle this poorly. Previously I banned
this within `PackageExporter`, but that doesn't cover other uses of the
zip format like TorchScript.

Given that there are no valid use cases, and debugging issues caused by
multiple file writes is fiendishly difficult, this PR bans this behavior entirely.

Differential Revision: D29748968

Test Plan: Imported from OSS

Reviewed By: Lilyjjo

Pulled By: suo

fbshipit-source-id: 0afee1506c59c0f283ef41e4be562f9c22f21023
2021-07-19 18:23:43 -07:00
04043d681e [package] fix storage serialization collision (#61806)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61806

Currently, if you do `save_pickle` on a ScriptModule and then `save_pickle`
on a tensor, a `0.storage` entry would be written *twice* to the zip
archive. This caused weird bugs on the deserializing side (it presented
as an ASAN-detected heap buffer overflow, because we tried to read more
memory from a tensor than we actually had).

Turns out this was because when we did:
```
self.storage_context = self.script_module_serializer.storage_context()
```
it returned a new copy of the storage context, so we weren't actually
assigning unique names to tensors!!

This PR fixes the issue by making `(De)SerializationStorageContext`
non-copyable and fixing up the parts of the bindings that returned it by
copy.

Differential Revision: D29748969

Test Plan: Imported from OSS

Reviewed By: Lilyjjo

Pulled By: suo

fbshipit-source-id: c2f89ab270e07e7a111fb35c545b5e07b804dc3c
2021-07-19 18:22:36 -07:00
c30048fccf add BFloat16 support for topk on CPU (#59547)
Summary:
Added BFloat16 support for topk on CPU, and collected benchmark data for the BFloat16 and Float32 data types using PyTorch's operator_benchmark tool on an Intel(R) Xeon(R) Platinum 8180 CPU @ 2.50GHz.

Input: 512x512, 512x1024, 1024x512, 1024x1024
K: 5
Number of cores: 1 core, 28 cores (1 socket)
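
For reference, a minimal usage sketch of the newly supported dtype:

```python
import torch

t = torch.randn(512, 512).to(torch.bfloat16)
values, indices = torch.topk(t, k=5, dim=1)
print(values.dtype, values.shape)  # torch.bfloat16 torch.Size([512, 5])
```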

For 1 core:

 ----------------------------------------
 PyTorch/Caffe2 Operator Micro-benchmarks
 ----------------------------------------
 Tag : all

 Benchmarking PyTorch: topk
 Mode: Eager
 Name: topk_H512_W512_k5_dtypetorch.float32_cpu
 Input: H: 512, W: 512, k: 5, dtype: torch.float32, device: cpu
Forward Execution Time (us) : 911.401

 Benchmarking PyTorch: topk
 Mode: Eager
 Name: topk_H512_W512_k5_dtypetorch.bfloat16_cpu
 Input: H: 512, W: 512, k: 5, dtype: torch.bfloat16, device: cpu
Forward Execution Time (us) : 911.700

 Benchmarking PyTorch: topk
 Mode: Eager
 Name: topk_H512_W1024_k5_dtypetorch.float32_cpu
 Input: H: 512, W: 1024, k: 5, dtype: torch.float32, device: cpu
Forward Execution Time (us) : 1506.927

 Benchmarking PyTorch: topk
 Mode: Eager
 Name: topk_H512_W1024_k5_dtypetorch.bfloat16_cpu
 Input: H: 512, W: 1024, k: 5, dtype: torch.bfloat16, device: cpu
Forward Execution Time (us) : 1492.036

 Benchmarking PyTorch: topk
 Mode: Eager
 Name: topk_H1024_W512_k5_dtypetorch.float32_cpu
 Input: H: 1024, W: 512, k: 5, dtype: torch.float32, device: cpu
Forward Execution Time (us) : 1825.634

 Benchmarking PyTorch: topk
 Mode: Eager
 Name: topk_H1024_W512_k5_dtypetorch.bfloat16_cpu
 Input: H: 1024, W: 512, k: 5, dtype: torch.bfloat16, device: cpu
Forward Execution Time (us) : 1819.872

 Benchmarking PyTorch: topk
 Mode: Eager
 Name: topk_H1024_W1024_k5_dtypetorch.float32_cpu
 Input: H: 1024, W: 1024, k: 5, dtype: torch.float32, device: cpu
Forward Execution Time (us) : 3001.459

 Benchmarking PyTorch: topk
 Mode: Eager
 Name: topk_H1024_W1024_k5_dtypetorch.bfloat16_cpu
 Input: H: 1024, W: 1024, k: 5, dtype: torch.bfloat16, device: cpu
Forward Execution Time (us) : 2970.718

For 28 cores(1 socket):

 ----------------------------------------
 PyTorch/Caffe2 Operator Micro-benchmarks
 ----------------------------------------
 Tag : all

 Benchmarking PyTorch: topk
 Mode: Eager
 Name: topk_H512_W512_k5_dtypetorch.float32_cpu
 Input: H: 512, W: 512, k: 5, dtype: torch.float32, device: cpu
Forward Execution Time (us) : 146.995

 Benchmarking PyTorch: topk
 Mode: Eager
 Name: topk_H512_W512_k5_dtypetorch.bfloat16_cpu
 Input: H: 512, W: 512, k: 5, dtype: torch.bfloat16, device: cpu
Forward Execution Time (us) : 123.423

 Benchmarking PyTorch: topk
 Mode: Eager
 Name: topk_H512_W1024_k5_dtypetorch.float32_cpu
 Input: H: 512, W: 1024, k: 5, dtype: torch.float32, device: cpu
Forward Execution Time (us) : 105.967

 Benchmarking PyTorch: topk
 Mode: Eager
 Name: topk_H512_W1024_k5_dtypetorch.bfloat16_cpu
 Input: H: 512, W: 1024, k: 5, dtype: torch.bfloat16, device: cpu
Forward Execution Time (us) : 101.498

 Benchmarking PyTorch: topk
 Mode: Eager
 Name: topk_H1024_W512_k5_dtypetorch.float32_cpu
 Input: H: 1024, W: 512, k: 5, dtype: torch.float32, device: cpu
Forward Execution Time (us) : 128.023

 Benchmarking PyTorch: topk
 Mode: Eager
 Name: topk_H1024_W512_k5_dtypetorch.bfloat16_cpu
 Input: H: 1024, W: 512, k: 5, dtype: torch.bfloat16, device: cpu
Forward Execution Time (us) : 125.172

 Benchmarking PyTorch: topk
 Mode: Eager
 Name: topk_H1024_W1024_k5_dtypetorch.float32_cpu
 Input: H: 1024, W: 1024, k: 5, dtype: torch.float32, device: cpu
Forward Execution Time (us) : 129.855

 Benchmarking PyTorch: topk
 Mode: Eager
 Name: topk_H1024_W1024_k5_dtypetorch.bfloat16_cpu
 Input: H: 1024, W: 1024, k: 5, dtype: torch.bfloat16, device: cpu
Forward Execution Time (us) : 124.556

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59547

Reviewed By: mrshenli

Differential Revision: D29763916

Pulled By: ezyang

fbshipit-source-id: 706c7d4349ac9ebd5d63f4844fca70febcb67023
2021-07-19 16:06:24 -07:00
15210f3b82 ignore and clear not ready errors (#61554)
Summary:
Follow-up to https://github.com/pytorch/pytorch/issues/18584. This PR covers the remaining places where an event or stream query might result in not-ready errors.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61554

Reviewed By: mrshenli

Differential Revision: D29763973

Pulled By: ezyang

fbshipit-source-id: 41d988d1826b2309cc6b01a81144094b353abdf9
2021-07-19 16:03:04 -07:00
e68c016871 Regenerate libtorch workflow files that got lost in merge conflict (#61872)
Summary:
Forward-fixes a merge conflict on master: https://github.com/pytorch/pytorch/runs/3106027618

for PR https://github.com/pytorch/pytorch/issues/61774

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61872

Reviewed By: dzhulgakov

Differential Revision: D29775595

Pulled By: janeyx99

fbshipit-source-id: 8194dd123f166fd5f3fd1e77417e865c188f40c8
2021-07-19 15:30:13 -07:00
0a6d88244b Fix grammatical errors on the PyTorch Contribution Guide (#61818)
Summary:
## What does the PR do?
- Fix grammatical errors on the PyTorch Contribution Guide page.

## Changes [Screenshots]
> Note:
> 1. The changes are highlighted in each screenshot.
> 2. Could not load CSS while testing locally; hopefully that is not an issue, as all the changes are to the content.

1.
![Change1](https://user-images.githubusercontent.com/20442648/126077764-39fd8b78-524f-407d-bc39-c93167bd10a7.PNG)

2.
![Change2](https://user-images.githubusercontent.com/20442648/126077766-9dd7dc61-ef06-41d0-a7e5-cfd179ece0cd.PNG)

3.
![Change3](https://user-images.githubusercontent.com/20442648/126077767-2c2e05e4-09fc-403a-a18e-9b108651a5f8.PNG)

4.
![Change4](https://user-images.githubusercontent.com/20442648/126077769-ad755db6-3afa-457b-b95c-9f6c6281f828.PNG)

5.
![Change5](https://user-images.githubusercontent.com/20442648/126077770-a7759dee-7f90-4b9e-a07c-4dec4ca934d0.PNG)

6.
![Change6](https://user-images.githubusercontent.com/20442648/126077772-0474e58d-c0c8-4156-b56f-808d225c38e7.PNG)

7.
![Change7](https://user-images.githubusercontent.com/20442648/126077774-d48382a7-5379-49a4-a8d2-b478fabf0bf0.PNG)

8.
![Change8](https://user-images.githubusercontent.com/20442648/126077777-fd743825-8dd7-4cb9-a22c-233e5fa085a6.PNG)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61818

Reviewed By: dzhulgakov

Differential Revision: D29775606

Pulled By: mrshenli

fbshipit-source-id: 3f3bfdeede341f784b72dfe55da9ba8bdce1192a
2021-07-19 15:06:22 -07:00
43c5dc40c5 Port signbit to structured kernel (#57936)
Summary:
Port signbit to structured kernel
Related https://github.com/pytorch/pytorch/issues/55070

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57936

Reviewed By: mrshenli

Differential Revision: D29764904

Pulled By: ezyang

fbshipit-source-id: 758f5f085d0cc84af612726f667cde15d615053b
2021-07-19 15:03:10 -07:00
44d3267103 Remove whitespace introduced by #61438 (#61863)
Summary:
Since it's a one-character change, it feels faster to fix than to revert.

Verified with `(! git --no-pager grep -In '[[:blank:]]$' -- . ':(exclude)**/contrib/**' ':(exclude)third_party' || (echo "The above lines have trailing spaces; please remove them"; false))` from the lint check

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61863

Reviewed By: ZolotukhinM

Differential Revision: D29772353

Pulled By: dzhulgakov

fbshipit-source-id: 33cb887f25e344b420f645a8e4dc8d0d7462e9ef
2021-07-19 14:57:10 -07:00
26d17ddc9f Exclude wrapper tensors from functorch in the native::resize_output fastpath (#61846)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61846

Related to #61485.

native::resize_output has a fast path that avoids dispatching.
Unfortunately, we have a number of CompositeImplicitAutograd operations
that directly call out= variants of operators. These
CompositeImplicitAutograd operators (e.g. torch.linalg.norm) end up
calling native::resize_output. That function, combined with how
functorch uses a mode-dispatch key to wrap tensors, causes silently
incorrect behavior in functorch (more details are available in #61485).

The very easy short-term fix is to have `native::resize_output` always
dispatch on a Tensor (and skip the fast-path) if a Tensor is a functorch
wrapped Tensor. More long-term fixes are proposed in the issue.

Test Plan:
- I checked that this change fixes torch.linalg.norm and other operators
with this problem in functorch.
- We're not testing functorch in pytorch/pytorch CI but we probably will
in the near future.
- wait for PyTorch tests.

Reviewed By: ezyang

Differential Revision: D29764293

Pulled By: zou3519

fbshipit-source-id: c7afcb0bd3bc77d2ba716d5b11f62830d8bdf0a9
2021-07-19 13:50:37 -07:00
f912889726 Remove unnecessary Ubuntu version checks (#61738)
Summary:
PR https://github.com/pytorch/pytorch/issues/5401 missed another Ubuntu version check in `cmake/MiscCheck.cmake`.

The checks for available functions added by https://github.com/pytorch/pytorch/issues/5401 are already present below the code snippet that this PR deletes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61738

Reviewed By: mrshenli

Differential Revision: D29757525

Pulled By: ezyang

fbshipit-source-id: 7f5f9312284973481a8b8a2b9c51cc09774722e9
2021-07-19 13:04:24 -07:00
1b0a7f3887 Always use fast gradcheck for LayerNorm 3d_no_affine_large_feature (#61848)
Summary:
Due to the introduction of a test from https://github.com/pytorch/pytorch/pull/59987/files, slow gradcheck has been failing intermittently (timing out/getting killed).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61848

Reviewed By: mrshenli

Differential Revision: D29765773

Pulled By: soulitzer

fbshipit-source-id: d78bee758cab76f26ba9f54925c42d4825db9449
2021-07-19 12:33:55 -07:00
094abf5fd0 [BE] Include a unit test for Save Operator with db_options
Summary: A test case that triggers db_options with the save operator is missing.

Test Plan: buck test

Differential Revision: D29642719

fbshipit-source-id: 72b7374d40430398abac26dfe91538550525384d
2021-07-19 12:22:59 -07:00
e389650f10 Upgrade CPUFallback for loops (#61722)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61722

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D29715862

fbshipit-source-id: 21e12c71e28e542abc649890f72938801d9d7d7a
2021-07-19 11:27:26 -07:00
04bd9d7577 [DDP] Add API to get model parameters in hook (#61637)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61637

To support running optimizer as a communication hook, add API to
retrieve the model parameters.

The API returns a `dict[idx -> tensor]` where `idx` is the intra-bucket index of the gradient tensor, and thus the same index as in `perParameterTensors`. The API can be used as follows to retrieve the model parameters:

```
per_param_grad_tensors = bucket.get_per_parameter_tensors()
idx_to_model_params = bucket.get_grad_index_to_variable_mapping()
for grad_tensor_idx, model_param in idx_to_model_params.items():
    self.assertEqual(model_param.grad, per_param_grad_tensors[grad_tensor_idx])
```

This provides a way for comm hook developers to retrieve model parameters within a hook. In the next diffs, we will use this to run an optimizer as a DDP comm hook.
ghstack-source-id: 133768666

Test Plan: CI

Reviewed By: SciPioneer

Differential Revision: D29691418

fbshipit-source-id: 4bfa824768a5850f73ee330017e2bcc29ceb7edc
2021-07-19 11:24:54 -07:00
66c8d21d7b Update progress and error reporting in clang-tidy (#61672)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61672

This PR adds a progress bar to clang-tidy, and updates how it threads error codes (when run in parallel). The progress bar is disabled on GHA because backspace escape codes are not supported.

It also adds a `--quiet` flag to the script.

Screenshot of progress bar:
<img width="955" alt="Screen Shot 2021-07-14 at 3 17 11 PM" src="https://user-images.githubusercontent.com/40111357/125686114-a8a7c154-3e65-43a8-aa8f-c1fb14d51d27.png">

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D29763848

Pulled By: 1ntEgr8

fbshipit-source-id: cbd352593b279f279911bc3bb8d5ed54abd5f1d5
2021-07-19 11:19:06 -07:00
24a6eb3fda ENH Adds tests and docs for 2d & 3d modules that already support no batch (#61262)
Summary:
Toward https://github.com/pytorch/pytorch/issues/60585

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61262

Reviewed By: mrshenli

Differential Revision: D29660554

Pulled By: jbschlosser

fbshipit-source-id: d5e3dc7096fcf8621bce4a1063d521b84092e0ca
2021-07-19 11:12:28 -07:00
4f46943e3d enable check trace when tracing a mkldnn model (#61241)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/43039. When tracing an MKLDNN model with **check_trace=True** set, an error occurs: **RuntimeError: unsupported memory format option Preserve**. This PR solves that problem.
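
A hedged repro sketch of the scenario described above (the Linear module and conversion calls are illustrative, not taken from the original issue):

```
import torch
from torch.utils import mkldnn as mkldnn_utils

# Convert a small model and its input to the MKLDNN layout.
model = torch.nn.Linear(4, 4).eval()
mkldnn_model = mkldnn_utils.to_mkldnn(model)
x = torch.randn(2, 4).to_mkldnn()

# Before this fix, check_trace=True raised
# "RuntimeError: unsupported memory format option Preserve".
traced = torch.jit.trace(mkldnn_model, x, check_trace=True)
```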

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61241

Reviewed By: anjali411

Differential Revision: D29737365

Pulled By: suo

fbshipit-source-id: e8f7f124bc6256f10b9d29969e0c65d332514625
2021-07-19 11:03:53 -07:00
75b68def63 fmin has been ported to the structured kernel, removing the old implementation (#60810)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60810

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D29449377

Pulled By: ezyang

fbshipit-source-id: 0b43562d0dfe81dfa401268f1d12e0d2c3c9f420
2021-07-19 10:20:06 -07:00
b526080d89 fmod: Port to structured (#60809)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60809

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D29449378

Pulled By: ezyang

fbshipit-source-id: 70f6fa95988f753eec4aefa60a60dddb7f3d744e
2021-07-19 10:18:57 -07:00
b65ddef000 for shared-memory handles, use an atomic counter, instead of potentially colliding random numbers (#60978)
Summary:
These handles, used for shared-memory tensors, can collide.

E.g. see https://github.com/pytorch/pytorch/issues/60626#issuecomment-869919018
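
A minimal sketch of the idea, assuming a "/torch_<pid>_<number>" handle format (the actual naming scheme lives in the C++ shared-memory code):

```
import itertools
import os

# Process-local monotonic counter; next() on itertools.count is
# effectively atomic under CPython's GIL.
_counter = itertools.count()

def next_shm_handle() -> str:
    # The pid disambiguates across processes; the counter disambiguates
    # within a process, so handles can no longer collide.
    return f"/torch_{os.getpid()}_{next(_counter)}"

print(next_shm_handle())  # e.g. /torch_12345_0
print(next_shm_handle())  # e.g. /torch_12345_1
```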

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60978

Reviewed By: mruberry

Differential Revision: D29479291

Pulled By: ezyang

fbshipit-source-id: 408ef1817768f007ad4795b286482809ea43467c
2021-07-19 09:56:43 -07:00
ac5a40e068 Fix benchmark's import module and remove its usage of tools.stats.scribe (#61808)
Summary:
There are a few pieces of convoluted logic here to fix the `benchmarks` module imports for pytest.

- On one hand, if we want to use `tools.stats.scribe` from `benchmarks`, we will need to add `benchmarks/__init__.py`
- On the other hand, if we add `benchmarks/__init__.py`, it breaks how `pytest` searches for the system-installed `torch` instead of the local source module `../torch`
  - That's why we are seeing errors like

```
ImportError while loading conftest '/var/lib/jenkins/workspace/benchmarks/fastrnns/conftest.py'.
benchmarks/fastrnns/__init__.py:1: in <module>
    from .cells import *  # noqa: F403
benchmarks/fastrnns/cells.py:1: in <module>
    import torch
torch/__init__.py:29: in <module>
    from .torch_version import __version__ as __version__
torch/torch_version.py:9: in <module>
    from .version import __version__ as internal_version
E   ModuleNotFoundError: No module named 'torch.version'
```

Instead, this PR changes the usage of `upload_scribe.py` back to its original form using an HTTP request; for now only CircleCI will continue down this path via `python benchmarks/upload_scribe.py`, which is gated by `if [[ -z "${GITHUB_ACTIONS}" ]];`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61808

Reviewed By: seemethere

Differential Revision: D29750188

Pulled By: zhouzhuojie

fbshipit-source-id: 3b842b21978f2159001e9c6c1cdc96c5a0515f2e
2021-07-19 09:45:05 -07:00
9c3346c8aa reduce max_num_threads for complex double ops in reduce_kernel (#61438)
Summary:
reduce_kernel currently has an all-purpose MAX_NUM_THREADS of 512, which causes register spilling in various kernel instantiations for the ops that use it as a template (ReduceLogicKernel, ReduceMinMaxKernel, ReduceMomentKernel, ReduceNormKernel, and ReduceSumProdKernel). This is a coarse first attempt at mitigating spillage by reducing max_num_threads to 256 for all complex double ops, which are by far the most common and egregious offenders, while keeping it at 512 for the other normal ops, the large majority of which are fine. Besides complex double ops, the remaining kernels which exhibit lmem usage are ReduceMinMax double, long, and BFloat16; ReduceMomentKernel BFloat16, Half, float, and double; and ReduceNorm double.

The proposed fix manages to eliminate lmem usage and massively improve runtime (by 3-5x) for complex double ops. All other ops are unaffected and have the same runtime; if they used lmem before, they still do now. We would still strongly recommend further testing of input shapes and ops, as well as looking into whether there's a cleaner approach to doing this.
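
A hedged sketch of the dtype-dependent cap described above (the values come from this description; the actual logic lives in the C++ kernel templates):

```
import torch

def max_num_threads(dtype: torch.dtype) -> int:
    # Complex double values occupy 16 bytes and need roughly twice the
    # registers, so halve the thread cap to avoid register spilling.
    if dtype == torch.complex128:
        return 256
    return 512

print(max_num_threads(torch.complex128))  # 256
print(max_num_threads(torch.float32))     # 512
```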

We tested the following ops for both complex double instantiations, as well as testing torch.max and torch.argmax with doubles to make sure they didn't break. We didn't include the double instantiations in the timing data, since they remain unchanged post-fix vs pre-fix. Timing data for the complex double ops below (all done on Nvidia Titan-V GPU):

torch.mean:
![MeanTimingData](https://user-images.githubusercontent.com/22803332/125005623-0f424800-e011-11eb-864e-8419485a9c76.PNG)

torch.linalg.norm:
![NormTimingData](https://user-images.githubusercontent.com/22803332/125005649-179a8300-e011-11eb-96e1-54e18c85a336.PNG)

torch.sum:
![SumTimingData](https://user-images.githubusercontent.com/22803332/125005655-1b2e0a00-e011-11eb-928e-ee5941608fb2.PNG)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61438

Reviewed By: mrshenli

Differential Revision: D29756863

Pulled By: ngimel

fbshipit-source-id: 4c4635df58af9313966ff1df1095f7e15a39bb07
2021-07-19 09:38:22 -07:00
d565b3e9ea Migrate libtorch to GHA (#61774)
Summary:
Makes progress on https://github.com/pytorch/pytorch/issues/57686

Tested in https://github.com/pytorch/pytorch/pull/61775:

periodic 11.3 libtorch: https://github.com/pytorch/pytorch/pull/61775/checks?check_run_id=3088529584?check_suite_focus=True
10.2: https://github.com/pytorch/pytorch/pull/61775/checks?check_run_id=3089965441
11.1: https://github.com/pytorch/pytorch/pull/61775/checks?check_run_id=3089965697

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61774

Reviewed By: samestep

Differential Revision: D29745793

Pulled By: janeyx99

fbshipit-source-id: a17f561051b1e5eccf4918137a4b5df19308a716
2021-07-19 09:21:52 -07:00
3e3acf8a9a Minor documentation fixes (#61785)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61785

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D29746648

Pulled By: andwgu

fbshipit-source-id: 435bbd8894f2ae5c814b9acd562673affea1daf6
2021-07-19 09:01:29 -07:00
813b887dad Fix indent (#61784)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61784

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D29746647

Pulled By: andwgu

fbshipit-source-id: f42d3a0864a8291941d695a0cf575a5737cbb35c
2021-07-19 09:00:25 -07:00
a26a9f8b75 zero initialize some members and other fixes (#59915)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59915

Reviewed By: soulitzer

Differential Revision: D29106684

Pulled By: ezyang

fbshipit-source-id: 713cbdf10866017ee715ee89ec82acb592c769b6
2021-07-19 07:36:26 -07:00
0263865bfe [Docs] Fix docs for torch.chunk (#61097)
Summary:
torch.chunk may silently return fewer than the requested number of chunks if some undocumented division constraints are not met. The functionality that users expect is provided by another function: torch.tensor_split

This has led to confusion countless times, and who knows how many systems out there are fragile because of this.
My changes describe the discrepancy, show an example, and direct users to the usually preferred function.

Issues mentioning this problem:
https://github.com/pytorch/pytorch/issues/9382
https://github.com/torch/torch7/issues/617

I considered documenting the constraint for when an unexpected number of chunks may be returned (it is chunks*chunks > input.size[dim]), so that users could quickly tell whether their code may be affected. Please let me know if you think this should be in the docs or not.
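
A small illustration of the discrepancy (values chosen to trigger the constraint above):

```
import torch

t = torch.arange(5)

# chunk silently returns fewer pieces: 5 elements into 4 requested
# chunks yields only 3 chunks of ceil(5/4) = 2 elements each.
print([c.tolist() for c in t.chunk(4)])
# [[0, 1], [2, 3], [4]]

# tensor_split always returns exactly the requested number of pieces.
print([c.tolist() for c in torch.tensor_split(t, 4)])
# [[0, 1], [2], [3], [4]]
```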

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61097

Reviewed By: heitorschueroff

Differential Revision: D29660280

Pulled By: ezyang

fbshipit-source-id: 675086bc8a8882c1685a50a2c083ae8dd1854384
2021-07-19 06:13:04 -07:00
552eab7935 [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D29758833

fbshipit-source-id: e07673bb19f15865bf5810910224f3f37a759db7
2021-07-19 04:12:20 -07:00
593e8f41ca [jit] Fixed a bug in the pass that replaces cat with the variadic op (#61795)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61795

Test Plan: Imported from OSS

Reviewed By: nikithamalgifb

Differential Revision: D29748785

Pulled By: navahgar

fbshipit-source-id: df5b84c35f007718c92a21a0b44a231e6d346918
2021-07-18 21:38:30 -07:00
ff82394fc0 Apply saved tensor hooks (#60975)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60975

Fixes #58512

Test Plan: Imported from OSS

Reviewed By: soulitzer

Differential Revision: D29466227

fbshipit-source-id: c1498d52173aceb29638b5c4f521ac05356a5958
2021-07-18 08:42:51 -07:00
eefbff773b ns for fx: add utils for l2 error and cosine similarity (#61380)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61380

Adds convenience wrappers for l2 error and cosine similarity
to NS utils.
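
A hedged sketch of what such wrappers compute (the actual NS utils differ in names and signatures):

```
import torch
import torch.nn.functional as F

def compute_l2_error(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # L2 norm of the elementwise difference between two activations.
    return torch.norm(x - y)

def compute_cosine_similarity(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # Cosine similarity between the flattened activations.
    return F.cosine_similarity(x.flatten(), y.flatten(), dim=0)
```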

Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_extend_logger_results_with_comparison
```

Imported from OSS

Reviewed By: hx89

Differential Revision: D29600354

fbshipit-source-id: 670c44a44df7f345884cacf26ed3c885edbe9977
2021-07-17 20:53:43 -07:00
2a2bc1fc8a ns for fx: add fqn to results, when present (#61377)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61377

Both the quantization tracer and the NS tracer record
`_node_name_to_scope`, which contains the mapping from
node name to FQN.

This PR adds the FQN information to the NS results, so that it is
more convenient for users to attribute a NS result to the corresponding
module in their model.

Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_extract_weights_fqn
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_match_activations_fqn
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_shadow_activations_fqn
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D29600349

fbshipit-source-id: df489e03daff97dd380f59c83ffdc2b0012a0a53
2021-07-17 20:53:41 -07:00
7449f49a4c ns for fx: return results in execution order (#61360)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61360

By default, NS graph matching matches from the end of the graph
to the start.  This PR reverses the returned results so that
the outputs of the NS APIs are in the order of execution, making
it easier to analyze.

Test Plan:
```
python test/test_quantization.py TestFXGraphMatcher.test_results_order
```

Imported from OSS

Reviewed By: hx89

Differential Revision: D29600348

fbshipit-source-id: c9fa4a3748db27c1788eebf803f35221e6fc8701
2021-07-17 20:53:39 -07:00
2b2928c5ca ns for fx: improve error messages for graph matching (#61359)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61359

Makes the error messages from graph matching easier for users to read.

Test Plan:
```
// inspect the exceptions in the following two tests and verify
// that they are easier to read than before
python test/test_quantization.py TestFXGraphMatcher.test_matching_failure_node_count
python test/test_quantization.py TestFXGraphMatcher.test_matching_failure_node_type
```

Imported from OSS

Reviewed By: hx89

Differential Revision: D29600353

fbshipit-source-id: ec6640fe6cab7b62a697e4ee385be182f2918fd4
2021-07-17 20:53:38 -07:00
ddf6d6cc14 ns for fx: clean up override_qengines and copy TODO in tests (#61358)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61358

1. changes override_qengines to require fbgemm instead; these tests do not
exercise any qengine-specific logic, so it is better to just run them once
2. removes a TODO about copy.deepcopy which we do not plan to address

Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs
```

Imported from OSS

Reviewed By: hx89

Differential Revision: D29600352

fbshipit-source-id: 4db08f0080233ff46d7679928c83e41c5ba21ec8
2021-07-17 20:53:36 -07:00
cf6f5efb39 ns for fx: test case for comparing fp32 vs fp32_prepared shadow (#61357)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61357

Adds a test case for comparing fp32 vs fp32_prepared in a shadow model.

Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs
```

Imported from OSS

Reviewed By: hx89

Differential Revision: D29600350

fbshipit-source-id: ff7518ce8a789ab7469cb22044f1d7c697e2cd04
2021-07-17 20:53:34 -07:00
4acd14da02 ns for fx: preserve observers and fake_quants through passes (#61323)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61323

Before this PR, all observers and fake quants were silently removed
when adding loggers with NS. This is problematic for QAT models because
we need the fake quants to run in order to properly capture intermediate
outputs.

This PR fixes the issue by preserving the observers throughout
the passes which add loggers.  In detail:
* for each quantization module or fusion, add additional patterns with that fusion and an observer/fake_quant at the end
* remove the places in the logger model creation code which removed observers
* add unit testing that QAT numerics do not change after adding loggers

Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_loggers_preserve_qat_numerics
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_shadow_loggers_preserve_qat_numerics
```

Imported from OSS

Reviewed By: hx89

Differential Revision: D29600351

fbshipit-source-id: 5f25118b79eb47860c49bca882de6a8eae7a4456
2021-07-17 20:53:33 -07:00
a70505cdbd ns for fx: support comparing fp32 vs fp32_prepared, except shadowed (#61129)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61129

Adds support for comparing a fp32 model (without quantization) to a
fp32 model prepared with quantization. The main missing feature was
handling conv-bn fusion, since this fusion for PTQ happens outside
of quantization patterns.

Adds testing for this case for comparing weights and comparing
activations

Adds a TODO for also handling this for shadow activations; we need to
first stop removing observers in graph passes before we can add
this support, which will be in a future PR.

Test Plan:
```
python test/test_quantization.py TestFXGraphMatcherModels.test_mobilenet_v2
python test/test_quantization.py TestFXGraphMatcherModels.test_mobilenet_v2_qat
python test/test_quantization.py TestFXNumericSuiteCoreAPIsModels.test_compare_activations_conv
```

Imported from OSS

Reviewed By: raghuramank100

Differential Revision: D29520009

fbshipit-source-id: f63484a998f1424bd9cacf5d823b82b2edfea1ae
2021-07-17 20:52:23 -07:00
e117d94e21 Wrapped create_type_hint in try/except block so that NormalizeArgs doesn't fail if create_type_hint fails (#61524)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61524

Test Plan: Imported from OSS

Reviewed By: bhosmer

Differential Revision: D29746106

Pulled By: Chillee

fbshipit-source-id: d08c0030f40b504e8f7a61fc0ee432f1515a0e6d
2021-07-17 16:13:17 -07:00
59ca89dca8 Fix scribe logs again (#61768)
Summary:
Reverts the revert of 3624d75 with an additional fix in https://github.com/pytorch/pytorch/pull/61764

Got the correct logs sent to lambda

```
...
,"21721":"OK","21722":"OK","21723":"OK","21724":"OK","21725":"OK","21726":"OK","21727":"OK","21728":"OK","21729":"OK","21730":"OK","21731":"OK","21732":"OK","21733":"OK","21734":"OK","21735":"OK","21736":"OK","21737":"OK","21738":"OK","21739":"OK","21740":"OK","21741":"OK","21742":"OK","21743":"OK","21744":"OK","21745":"OK","21746":"OK","21747":"OK","21748":"OK","21749":"OK","21750":"OK","21751":"OK","21752":"OK","21753":"OK","21754":"OK","21755":"OK","21756":"OK","21757":"OK","21758":"OK","21759":"OK","21760":"OK","21761":"OK","21762":"OK","21763":"OK","21764":"OK","21765":"OK","21766":"OK","21767":"OK","21768":"OK","21769":"OK","21770":"OK","21771":"OK","21772":"OK","21773":"OK","21774":"OK","21775":"OK","21776":"OK","21777":"OK","21778":"OK","21779":"OK","21780":"OK","21781":"OK","21782":"OK","21783":"OK","21784":"OK","21785":"OK","21786":"OK","21787":"OK","21788":"OK","21789":"OK","21790":"OK","21791":"OK","21792":"OK","21793":"OK","21794":"OK","21795":"OK","21796":"OK","21797":"OK","21798":"OK","21799":"OK","21800":"OK","21801":"OK","21802":"OK","21803":"OK","21804":"OK","21805":"OK","21806":"OK","21807":"OK","21808":"OK","21809":"OK","21810":"OK","21811":"OK","21812":"OK","21813":"OK","21814":"OK","21815":"OK","21816":"OK","21817":"OK","21818":"OK","21819":"OK","21820":"OK","21821":"OK","21822":"OK","21823":"OK","21824":"OK","21825":"OK","21826":"OK"}}

class StartProcessesTest:
    tests: 14 failed: 0 skipped: 0 errored: 0
    run_time: 4.86 seconds
    avg_time: 0.35 seconds
    median_time: 0.01 seconds
    3 longest tests:
        test_function_large_ret_val time: 1.55 seconds
        test_pcontext_wait time: 1.11 seconds
        test_void_function time: 1.03 seconds

...
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61768

Reviewed By: janeyx99

Differential Revision: D29735781

Pulled By: zhouzhuojie

fbshipit-source-id: 6882e334f5108d20773ad66d5300cd37eb509ded
2021-07-16 17:56:16 -07:00
311f1f275a Update clang-tidy-linux64 (#61797)
Summary:
Update the clang-tidy Linux hash to match the one built for 7ae60a49ac by https://github.com/pytorch/test-infra/runs/3090057893

Fixes `The downloaded binary is not what was expected!`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61797

Reviewed By: zhouzhuojie

Differential Revision: D29746840

Pulled By: malfet

fbshipit-source-id: a7388952b04ba12f250003c32629d57b8d5ffed8
2021-07-16 17:23:21 -07:00
4337650c91 Fixing a bug in .to for qtensors so scale/zp move too (#61576)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61576

This also fixed an issue in the
empty_quantized_per_channel_affine function where specifying a device
that was different from the device of scale/zp would result in a
mismatched qtensor

Test Plan:
python test/test_quantization.py
testquantizedtensor.test_per_channel_to_device

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D29675461

fbshipit-source-id: 0e2ff20f0f581dae94ee01d3ceead2a620cd26b9
2021-07-16 17:16:24 -07:00
cb6841b263 Fix ConnectionError in download_mnist (#61789)
Summary:
Fixes issues like the following error. Note that `ConnectionResetError` is a subclass of `ConnectionError`.

```
+ python tools/download_mnist.py --quiet -d test/cpp/api/mnist
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz ...
Traceback (most recent call last):
  File "tools/download_mnist.py", line 93, in <module>
    main()
  File "tools/download_mnist.py", line 86, in main
    download(path, resource, options.quiet)
  File "tools/download_mnist.py", line 42, in download
    urlretrieve(url, destination_path, reporthook=hook)
  File "/opt/conda/lib/python3.6/urllib/request.py", line 277, in urlretrieve
    block = fp.read(bs)
  File "/opt/conda/lib/python3.6/http/client.py", line 463, in read
    n = self.readinto(b)
  File "/opt/conda/lib/python3.6/http/client.py", line 507, in readinto
    n = self.fp.readinto(b)
  File "/opt/conda/lib/python3.6/socket.py", line 586, in readinto
    return self._sock.recv_into(b)
ConnectionResetError: [Errno 104] Connection reset by peer
```
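
Since `ConnectionResetError` subclasses `ConnectionError`, catching the base class also covers resets. A minimal sketch of the pattern (the retry loop is illustrative, not the exact code in download_mnist.py):

```
from urllib.request import urlretrieve

def download_with_retry(url: str, destination: str, tries: int = 3) -> None:
    for attempt in range(tries):
        try:
            urlretrieve(url, destination)
            return
        except ConnectionError:
            # Also catches ConnectionResetError ([Errno 104]).
            if attempt == tries - 1:
                raise
```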

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61789

Reviewed By: dreiss

Differential Revision: D29745459

Pulled By: zhouzhuojie

fbshipit-source-id: 2deb668bd74478f32bd01704d4362e8a4d95087b
2021-07-16 17:02:13 -07:00
4e2fe9718d flatten operation (resnet50) (#61265)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61265

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D29626383

Pulled By: migeed-z

fbshipit-source-id: 107769fc14f1fad295a93a10e84235f25ae17357
2021-07-16 16:06:10 -07:00
4479aa8838 Remove all the code that constructs metadata.pkl file (#61760)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61760

Remove all code related to metadata.pkl creation, including creating metadata.pkl and converting data from extra/mobile_info.json and extra/producer_info.json to the metadata.pkl file.

Test Plan:
## Run buck commands:
  - `cd` into `fbcode` then `buck build //caffe2/caffe2/fb/init:init`
  - `cd` into `fbcode` then `buck build //caffe2/torch/fb/init:init`
  - `buck build //xplat/caffe2:torch_mobile_core`

## Export a PyTorch lite/mobile model
- Run: `flow-cli canary users.xcheng16.pytorch_trainer.TestWorkflow --run-as-secure-group ai_mobile_platform --buck-target //fblearner/flow/projects/users/xcheng16:workflow` under `fbcode` on devserver.
- Resulting model: metadata.pkl no longer exists
{F632063134}

Reviewed By: guangy10

Differential Revision: D29702943

fbshipit-source-id: ec7964f4aa3a8e09ccc20b1a7e2232f85931dd81
2021-07-16 15:39:07 -07:00
7ac8054d5a Use better defaults in the clang-tidy wrapper script (#61651)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61651

This PR sets some QOL defaults to the clang-tidy wrapper script and refactors how defaults are set.

- Runs in parallel
- Custom executable (prints an error message to users asking them to install our custom build)
- `generate_build_files` can now be run as a script

Test Plan: Imported from OSS

Reviewed By: malfet, zhouzhuojie

Differential Revision: D29743661

Pulled By: 1ntEgr8

fbshipit-source-id: 256617d006a03e4ab96091593f5bb80c9b31a2d1
2021-07-16 14:58:19 -07:00
dc0d1612e1 ENH Updates docs and tests for activation modules for no-batch dims (#61300)
Summary:
Towards https://github.com/pytorch/pytorch/issues/60585

This PR updates docs and tests for activation modules that already support no-batch dims.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61300

Reviewed By: heitorschueroff

Differential Revision: D29660543

Pulled By: jbschlosser

fbshipit-source-id: 5edad45f7e9995aca6c3403469668e6e1cbb94b6
2021-07-16 14:42:18 -07:00
6a085648d8 add aten symbols for amin and amax (#61550)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61550

Test Plan: Imported from OSS

Reviewed By: asuhan

Differential Revision: D29668123

Pulled By: bdhirsh

fbshipit-source-id: b111e1c6c6d2beddb220cad70d95954756a3ee9d
2021-07-16 14:06:00 -07:00
4e94e84f65 Type annotate torch.nn.Module ctor (#61334)
Summary:
Annotate generic types
Fix some type violations
Override `_modules` and `_parameters` in `Sequential`, `ModuleList`, `ModuleDict`, etc

Fixes https://github.com/pytorch/pytorch/issues/45497

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61334

Reviewed By: albanD

Differential Revision: D29579533

Pulled By: malfet

fbshipit-source-id: 5cd8ca918b260ca35cfdd873dee8851d39d17de2
2021-07-16 13:59:06 -07:00
ee2f2ec9a5 Revert D29687143: [3/N] Nnapi Backend Delegate Preprocess: Basic OSS Test
Test Plan: revert-hammer

Differential Revision:
D29687143 (5798a00aa4)

Original commit changeset: 9ba9e57f7f85

fbshipit-source-id: 6a672c76a04366b35c492698ae5b39fd4dd1785f
2021-07-16 13:32:51 -07:00
a07d3dc34c Pin macos mkl conda version to fix the cmake build (#61773)
Summary:
Fixes macos build error in master, recently mkl had a upgrade.

CircleCI error:
https://app.circleci.com/pipelines/github/pytorch/pytorch/351645/workflows/d22421c1-bb8f-48fd-9efd-7c0d77f0b083/jobs/14815607

```
Jul 16 11:43:05 CMake Error at /Users/distiller/workspace/miniconda3/lib/cmake/mkl/MKLConfig.cmake:456 (list):
Jul 16 11:43:05   list does not recognize sub-command PREPEND
Jul 16 11:43:05 Call Stack (most recent call first):
Jul 16 11:43:05   /Users/distiller/workspace/miniconda3/lib/python3.7/site-packages/torch/share/cmake/Caffe2/public/mkl.cmake:1 (find_package)
Jul 16 11:43:05   /Users/distiller/workspace/miniconda3/lib/python3.7/site-packages/torch/share/cmake/Caffe2/Caffe2Config.cmake:109 (include)
Jul 16 11:43:05   /Users/distiller/workspace/miniconda3/lib/python3.7/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:68 (find_package)
Jul 16 11:43:05   CMakeLists.txt:5 (find_package)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61773

Reviewed By: soulitzer

Differential Revision: D29736742

Pulled By: zhouzhuojie

fbshipit-source-id: 68c5244196f7f7562a6c202157c4ccdcfcb64337
2021-07-16 13:15:04 -07:00
8ad584823f add shortcircuit in isclose for zero tolerances (#61529)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/61412.

Large integers gave false positives because the comparison always takes place in floating point dtypes. This happens because the integer precision of a floating point dtype is lower than the range of an integer dtype with the same number of bits.

For non-extremal values, `isclose` is defined by the following equation:

```python
abs(a - b) <= atol + rtol * abs(b)
```

For `rtol == 0 and atol == 0`, this is equivalent to `a == b`. This PR goes for the low-hanging fruit and adds a shortcut for this case that falls back to an actual equality check.
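
A small demonstration of why the shortcut matters, assuming int64 inputs whose values cannot be distinguished in float64 (53-bit mantissa):

```
import torch

a = torch.tensor(2**62)
b = torch.tensor(2**62 + 1)

# Both values round to the same float64, so the floating point formula
# reports equality even with zero tolerances:
print(float(a) == float(b))  # True

# With the shortcut, zero tolerances fall back to exact equality:
print(torch.isclose(a, b, rtol=0, atol=0))  # tensor(False)
```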

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61529

Reviewed By: gchanan

Differential Revision: D29707534

Pulled By: mruberry

fbshipit-source-id: 71b8c4901e9cd4f366442437e52032b0d3002b4a
2021-07-16 12:48:16 -07:00
612632556d Fix torch.median crash on empty tensor (#61698)
Summary:
`torch.tensor([]).median()` now returns `nan`, which mimics the behavior of `np.median`.
Add test to `TestReductions.test_median_corner_cases`
Fixes https://github.com/pytorch/pytorch/issues/61656
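
A quick check of the fixed behavior:

```
import torch
import numpy as np

print(torch.tensor([]).median())  # tensor(nan) instead of crashing
print(np.median(np.array([])))    # nan (NumPy also warns about the empty slice)
```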

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61698

Reviewed By: heitorschueroff

Differential Revision: D29706912

Pulled By: malfet

fbshipit-source-id: ea5f58327fbff371f3fb8786b269430c7a10d05f
2021-07-16 12:36:18 -07:00
3fd9dcf934 Move non-libtorch scheduled linux CI to GHA (#61732)
Summary:
Move non-libtorch Linux 11.3 scheduled CI job to GHA.
Libtorch builds will be migrated here: https://github.com/pytorch/pytorch/pull/61774

Successful run: https://github.com/pytorch/pytorch/actions/runs/1035592487

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61732

Reviewed By: seemethere

Differential Revision: D29735637

Pulled By: janeyx99

fbshipit-source-id: dce13370b218ae7833483fdaa00137db95e27c98
2021-07-16 12:16:58 -07:00
287603f51c Revert D29698486: [pytorch][PR] Remove torch._bmm and remove torch.bmm deterministic arg documentation
Test Plan: revert-hammer

Differential Revision:
D29698486 (328606699f)

Original commit changeset: 5af2d3803ab1

fbshipit-source-id: ce954c13196b1fb8277d61a686ac351d3bf13903
2021-07-16 11:02:09 -07:00
5798a00aa4 [3/N] Nnapi Backend Delegate Preprocess: Basic OSS Test (#61594)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61594

### Summary:
Added a unit test for the Nnapi delegate's preprocess() function. The
function was previously tested locally, but now a basic test is
added for OSS.

See https://github.com/pytorch/pytorch/pull/61499 for preprocess
implementation. See D29647123 for local testing.

**TODO:**
Add more comprehensive tests.
Add tests for model execution, after the Nnapi delegate's initialization
and execution is implemented T91991928.

**CMakeLists.txt:**
Added a library for the Nnapi delegate
- Explicit linking of torch_python is necessary for the Nnapi delegate's use of pybind

**test_backends.py:**
Added a test for lowering to Nnapi
- Based off https://github.com/pytorch/pytorch/blob/master/test/test_nnapi.py
- Only differences are the loading of the nnapi backend library and the need to change dtype from float64 to float32

### Test Plan:
Running `python test/test_jit.py TestBackendsWithCompiler -v` succeeds. Also saved and examined the model file locally.

Test Plan: Imported from OSS

Reviewed By: iseeyuan

Differential Revision: D29687143

fbshipit-source-id: 9ba9e57f7f856e5ac15e13527f6178d613b32802
2021-07-16 11:00:38 -07:00
349f2f767c Modernize to default constructor and nullptr in torch (#61735)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61735

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D29716659

fbshipit-source-id: ec2a0a0b7e55d2e50b1d35f0b651bd40675ae7e8
2021-07-16 10:51:13 -07:00
736bb26746 use rand over empty in flaky test (#61710)
Summary:
Fixes https://github.com/pytorch/pytorch/pull/61694#issuecomment-880641635. cc krshrimali.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61710

Reviewed By: anjali411

Differential Revision: D29719660

Pulled By: mruberry

fbshipit-source-id: 589574a039ad431acc7d095d452f0b3e52260208
2021-07-16 10:50:05 -07:00
efeacc0779 [Static Runtime] Fixed visibility of ProcessedNode class and a newly added function (#61729)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61729

Test Plan: Imported from OSS

Reviewed By: hlu1

Differential Revision: D29719644

Pulled By: navahgar

fbshipit-source-id: 27a77b2a281d1a8a48e2a9df1c254f62c0e2e7ef
2021-07-16 10:42:02 -07:00
6fa80f7f9f Refactor embedded_interpreter registration to be friendly to python case (#59991)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59991

Add a registration mechanism whereby, on loading the embedded interpreter library, a registration function is called that links up the symbols it provides with torch::deploy.

Test Plan: local and CI deploy tests pass

Reviewed By: suo

Differential Revision: D28764436

fbshipit-source-id: 88416bd098be306f887cc9fd2d65d29199439bc4
2021-07-16 10:33:58 -07:00
6349bde572 [4/N] Nnapi backend delegation preprocess: List Tensors & Comment Updates (#61752)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61752

Updated Android NNAPI preprocess, so that it can accept both single Tensor inputs and Tensor List inputs.
- The inputs are not real data, but input parameters for shape, dtype, quantization, and dimorder that are bundled as a Tensor. Comments were updated to make this clearer.
- In the future, preprocess will also accept a dedicated NnapiArg object.

Compile_spec should have the following format:
{"forward": {"inputs": at::Tensor}} OR {"forward": {"inputs": c10::List< at::Tensor >}}
Example input Tensor:
torch.tensor([[1.0, -1.0, 2.0, -2.0]]).unsqueeze(-1).unsqueeze(-1)
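
Putting the format and the example tensor together, a compile spec might look like this (the second tensor in the list variant is a hypothetical extra input for illustration):

```
import torch

example_input = torch.tensor([[1.0, -1.0, 2.0, -2.0]]).unsqueeze(-1).unsqueeze(-1)

# Single-Tensor input variant:
compile_spec = {"forward": {"inputs": example_input}}

# Tensor List input variant:
compile_spec_list = {
    "forward": {
        "inputs": [example_input, torch.tensor([[0.5, -0.5]]).unsqueeze(-1).unsqueeze(-1)]
    }
}
```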

### Testing
OSS testing is blocked by https://github.com/pytorch/pytorch/pull/61594. Testing was done locally in D29726948
TODO: Add OSS tests for single Tensor and Tensor List inputs.
ghstack-source-id: 133683735

Test Plan:
OSS testing is blocked by https://github.com/pytorch/pytorch/pull/61594. Testing was done locally in D29726948.
TODO: Add OSS tests for single Tensor and Tensor List inputs.

Reviewed By: iseeyuan

Differential Revision: D29726432

fbshipit-source-id: 08de70578f37681bda365f9776a1c96030257e7a
2021-07-16 10:17:56 -07:00
328606699f Remove torch._bmm and remove torch.bmm deterministic arg documentation (#61629)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/61571

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61629

Reviewed By: zou3519

Differential Revision: D29698486

Pulled By: albanD

fbshipit-source-id: 5af2d3803ab1eb093616bcfc7e074d8b57ef6958
2021-07-16 09:18:34 -07:00
28150fd0c8 [static_runtime] Implement aten::linear (#61595)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61595

Add out variant wrapper for `aten::linear` in the static runtime

Test Plan: `buck test //caffe2/benchmarks/static_runtime:static_runtime_cpptest`

Reviewed By: hlu1

Differential Revision: D29684236

fbshipit-source-id: 94df6d7267b3f269b2cadf065f207648777147df
2021-07-16 08:55:43 -07:00
3624d75864 Revert D29703523: [pytorch][PR] Fix scribe logs
Test Plan: revert-hammer

Differential Revision:
D29703523 (eb5a56fb74)

Original commit changeset: 829ad3630d35

fbshipit-source-id: 2b2196d58791b995a008b6d810b3248ed27e7d94
2021-07-16 08:50:13 -07:00
b963607d50 [nnc] Insert alloc/free at global scope (#61725)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61725

Alloc/free inside a loop isn't really an optimization, and furthermore
it breaks some attempted optimization in the llvm backend: we use alloca for
small allocations, which is efficient since alloca is on the stack, but there's
no corresponding free, so we leak tons of stack.  I hit this while building an
rfactor buffer inside a very deeply nested loop.
ghstack-source-id: 133627310

Test Plan:
Unit test which simulates use of a temp buffer in a deeply nested
loop.

Reviewed By: navahgar

Differential Revision: D29533364

fbshipit-source-id: c321f4cb05304cfb9146afe32edc4567b623412e
2021-07-16 08:42:24 -07:00
4c3d9cfe03 [BE] Fix flaky test_ddp_model_diff_across_ranks test (#61546)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61546

Closes https://github.com/pytorch/pytorch/issues/60661

Fixes this flaky test by using blocking wait instead of async error handling, and performs a gloo-based barrier with a higher timeout at the end of the test, which avoids issues with Barrier.sync. This also allows us to remove this test from the `skip_return_code_checks` list.
ghstack-source-id: 133657107

Test Plan: CI

Reviewed By: zhaojuanmao

Differential Revision: D29663884

fbshipit-source-id: 9f0df085b1968f6a7e2c7ae2f06b6dcd4838a87e
2021-07-16 08:37:02 -07:00
f1114364ad [DDP] Enhance comm hook docs (#61677)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61677

1) Specify return type more clearly, 2) misc fixes
ghstack-source-id: 133657895

Test Plan: CI

Reviewed By: zhaojuanmao

Differential Revision: D29701384

fbshipit-source-id: 7f77b99065bd2977153f397745e07b75bbdd7a94
2021-07-16 08:35:49 -07:00
39ce29efe0 Refactor metadata_map with flattened key/value pair (#61731)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61731

In the previous diff, metadata_map contains mobile_info.json and producer_info.json, and we need to parse the JSON each time we log the required information. This diff flattens the content of those files into key/value pairs, which allows the logger to loop directly through metadata_map and log the information.

Test Plan:
Since 3D Photo is disabled in the current FB app, testing is only performed on the CC scanner.

# Test On CC Scanner
**Test content with LOG(WARNING)**
{P429123273}

**Scuba Logger Output**

1. MOBILE_MODULE_LOAD_STATS

{F631884673}

2.  MOBILE_MODULE_STATS

{F631884787}

Reviewed By: xcheng16

Differential Revision: D29690702

fbshipit-source-id: 1db5a1f5c25e98e5b2f1cc254fd880dfdfa025e2
2021-07-16 00:37:17 -07:00
00a7f55b6e Apply for MOBILE_MODULE_STATS Logging (#61600)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61600

This diff changes the module.h constructor and removes metadata_. It refactors all constructor call sites and creates a getter & setter for metadata_. MOBILE_MODULE_STATS reads the metadata from mobile::Module and passes it into the logger.

Test Plan:
Since 3D Photo is disabled in the current FB app, testing is only performed on the CC scanner.

# Test On CC Scanner
**Test content with LOG(WARNING)**
{P428930572}

**Scuba Logger Output**

{F631761194}

Reviewed By: xcheng16

Differential Revision: D29673184

fbshipit-source-id: 962e0d7b06a07caaa0c695a4ac58b885fd1505ea
2021-07-16 00:37:15 -07:00
fc710eecc0 Apply for MOBILE_MODULE_LOAD_STATS Logging (#61480)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61480

Append mobile_info.json and producer_info.json to extra_files and parse the JSONs from “model_info.json” in onExitLoadModel.
ghstack-source-id: 133327912

Test Plan:
# Test On CC Scanner
**Test content with LOG(WARNING)**
{P428339274}

**Scuba Logger Output**
{F631024095}

# Test On 3D Photo
**Test content with LOG(WARNING)**
{P428340927}

**Scuba Logger Output**

{F631026739}

Reviewed By: xcheng16, guangy10

Differential Revision: D29608014

fbshipit-source-id: abc39c44b947632fd4349de8a432649e84284a87
2021-07-16 00:36:09 -07:00
56d562e790 [DDP] fix test_ddp_inference (#61666)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61666

Closes https://github.com/pytorch/pytorch/issues/61481. Fixes this
test by removing the section that uses only torch.no_grad() and doesn't call
model.eval(). For SyncBN, we need to call model.eval(); otherwise SyncBN
assumes it is in training mode, which performs collective calls in the forward pass
and does not work for inference.
ghstack-source-id: 133657549

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D29699444

fbshipit-source-id: 03ccb296dd9cb56729cd23e91c7f50b72fcf3adf
2021-07-16 00:25:02 -07:00
7e1f01d4c0 Alias for polygamma (#59691)
Summary:
See https://github.com/pytorch/pytorch/issues/50345

cc: mruberry kshitij12345

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59691

Reviewed By: gchanan

Differential Revision: D29707514

Pulled By: mruberry

fbshipit-source-id: 40c15e1fda3d9f7013977b0f36a77b228dda6aa5
2021-07-16 00:06:27 -07:00
f008e8d32d Remove test_out, test_variant_consistency_eager skips for addmv; fixed before (#61579)
Summary:
This PR:

1. Removes `test_out` skip: it's not needed anymore after it was fixed in https://github.com/pytorch/pytorch/pull/55746. This should also close https://github.com/pytorch/pytorch/issues/55589.
2. Removes `test_variant_consistency_eager` skip, it was added by mistake in https://github.com/pytorch/pytorch/issues/55771.
3. Refines the `sample_inputs_addmv` function; the updated function should now be cleaner and easier to read.

cc: mruberry

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61579

Reviewed By: gchanan

Differential Revision: D29709674

Pulled By: mruberry

fbshipit-source-id: 9b975c024777efdd33c6b9444b0b36e0eab85c03
2021-07-15 22:35:03 -07:00
843c42ffd8 [nnc] Refactored test macros and updated compress buffer tests to use them (#61716)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61716

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D29715754

Pulled By: navahgar

fbshipit-source-id: c400a58b7f393c0f93e5a25f118403124f8834b0
2021-07-15 21:17:14 -07:00
d01837081d [nnc] Cleaned up compress buffer tests to use BufHandle instead of Buf (#61715)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61715

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D29715755

Pulled By: navahgar

fbshipit-source-id: 453adac8f5b13263c39d96b6b4086425a01bae54
2021-07-15 21:15:23 -07:00
eb5a56fb74 Fix scribe logs (#61675)
Summary:
Related to https://github.com/pytorch/pytorch/issues/61632

This PR adds
- refactoring of scribe related code to scribe.py
- changed the `render_test_results` job to always use the `linux.2xlarge` runner
- if SCRIBE_GRAPHQL_ACCESS_TOKEN is empty, try boto3 instead

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61675

Reviewed By: seemethere

Differential Revision: D29703523

Pulled By: zhouzhuojie

fbshipit-source-id: 829ad3630d3500a498b41aa458ce6539aaeae938
2021-07-15 19:27:58 -07:00
127562a0ed Fix some sign comparisons (#61618)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61618

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D29688193

fbshipit-source-id: ea7a6b6be8b25d4a0668e744688f96bbbb144dc7
2021-07-15 18:28:41 -07:00
e6860ba508 Fix some sign comparisons and a loop (#61663)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61663

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D29696766

fbshipit-source-id: eb5a77bd0cfafeb6209d274f121f10dca20d461a
2021-07-15 18:27:42 -07:00
9d955abcdb Fix test_reductions when no SciPy is installed (#61699)
Summary:
Also, use skipIfNoSciPy decorator instead of implicit `unittest.skipIf`

This fixes regression introduced by https://github.com/pytorch/pytorch/pull/52565

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61699

Reviewed By: seemethere

Differential Revision: D29706938

Pulled By: malfet

fbshipit-source-id: 0b63c3ddadfa7f68bed994b71cadf68976d3b396
2021-07-15 15:57:11 -07:00
968a01a94a [special] migrate xlogy (#60641)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/50345

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60641

Reviewed By: gchanan

Differential Revision: D29709306

Pulled By: mruberry

fbshipit-source-id: e8a5f64009a895a25618637de40b55cf36b8f794
2021-07-15 15:32:09 -07:00
1ce3281a6d Revert D29361872: [pytorch][PR] det_backward: more robust and with complex support
Test Plan: revert-hammer

Differential Revision:
D29361872 (fce85480b9)

Original commit changeset: b1f0fec7e3ac

fbshipit-source-id: feffa74ad65b0b294e0a9b0ee72d245393421f70
2021-07-15 15:26:00 -07:00
3a0801f960 [skip ci] Fix "arugment" typos (#61459)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/61455.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61459

Reviewed By: soulitzer

Differential Revision: D29636559

Pulled By: samestep

fbshipit-source-id: 9ad65265c0491d9e81bb303abe3a07c6843bfa4a
2021-07-15 15:20:18 -07:00
e5fcc903d6 torch: Make __version__ better with comparisons (#61556)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61556

Prior to 1.10.0, `torch.__version__` was stored as a str, and many users
compared against `torch.__version__` as if it were a str. In order not to
break them, we have TorchVersion, which masquerades as a str while also
having the ability to compare against both packaging.version.Version and
tuples of values, e.g. (1, 2, 1)

Examples:
  Comparing a TorchVersion object to a Version object
```
TorchVersion('1.10.0a') > Version('1.10.0a')
```
  Comparing a TorchVersion object to a Tuple object
```
TorchVersion('1.10.0a') > (1, 2)    # 1.2
TorchVersion('1.10.0a') > (1, 2, 1) # 1.2.1
```

  Comparing a TorchVersion object against a string
```
TorchVersion('1.10.0a') > '1.2'
TorchVersion('1.10.0a') > '1.2.1'
```

Resolves https://github.com/pytorch/pytorch/issues/61540

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D29671234

Pulled By: seemethere

fbshipit-source-id: 6044805918723b4aca60bbec4b5aafc1189eaad7
2021-07-15 15:12:09 -07:00
0ea29a6ccb Analysing time taken by gradgrad checks for Spectral Functions (#60435)
Summary:
**Description:** `SpectralFuncInfo` defines a decorator mentioning: "gradgrad is quite slow". This PR re-analyzes that statement, since things have changed with gradient tests.

**Test times:** https://github.com/pytorch/pytorch/pull/60435#issuecomment-865658177

**Follow-up** of https://github.com/pytorch/pytorch/pull/57802

cc: mruberry

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60435

Reviewed By: gchanan

Differential Revision: D29707444

Pulled By: mruberry

fbshipit-source-id: 444b4863bac8556c7e8fcc8ff58d81a91bd96a21
2021-07-15 14:02:03 -07:00
4ff121f58d Add complex64 dtype for OpInfo Reference testing (#61627)
Summary:
This PR adds `complex64` dtype testing, following conversation from: pytorch/xla#3019 ([comment](https://github.com/pytorch/xla/pull/3019#discussion_r666754943)). Original PR that added OpInfo reference testing: https://github.com/pytorch/pytorch/pull/59369.

cc: mruberry kshitij12345

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61627

Reviewed By: gchanan

Differential Revision: D29710560

Pulled By: mruberry

fbshipit-source-id: 55b2e5ff47f031069335a0c75a45d4f4885ef9ac
2021-07-15 13:40:37 -07:00
e2c3049e2a Delete stable-sort-only-works-on-cpu warning (#61685)
Summary:
Stable GPU sorting is implemented by https://github.com/pytorch/pytorch/pull/56821
Fixes https://github.com/pytorch/pytorch/issues/61682

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61685

Reviewed By: gchanan

Differential Revision: D29704864

Pulled By: malfet

fbshipit-source-id: 3a5aa24bf6507be63844fe6016fb9e3c682f4d84
2021-07-15 13:34:41 -07:00
e098e9000b Compare DDP static graph (C++ core) with legacy DDP forward and backward delay. (#61507)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61507

Benchmark Python-only DDP vs the production C++-based DistributedDataParallel.
- Implemented a pure Python DDP, PythonDDP, with support for SYNC and ASYNC reduction
- Added compare_ddp to measure the difference in the forward and backward steps

Kudos to Shen and Yi for the great idea.

Test Plan:
Test on DevGPUS with 2 CUDA devices.

$python compare_ddp.py

Python-only DDP has slightly better (-1%) forward performance and slightly slower (2%-20%) backward performance.
This suggests that we need to keep the C++ core, since the maximum latency increase can be 20%. See README.md for details.
Imported from OSS

Differential Revision:
D29685364

Reviewed By: mrshenli

Pulled By: bowangbj

fbshipit-source-id: 429e4473fac0ec4c70d6db12d946d2636dd6477a
2021-07-15 12:52:22 -07:00
7a3b05ea6d Fix hardswish inplace op for strided tensor with skipped elements (#61622)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61622

The hardswish inplace op would return incorrect results for strided tensor inputs that skip elements, such as a slice. Create a contiguous tensor and copy elements back to return the correct answer.
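
A hedged Python sketch of the workaround described above (the actual fix lives in the C++ op):

```
import torch
import torch.nn.functional as F

def hardswish_(t: torch.Tensor) -> torch.Tensor:
    if not t.is_contiguous():
        # For strided inputs that skip elements (e.g. a slice), compute on
        # a contiguous copy and write the results back.
        tmp = F.hardswish(t.contiguous(), inplace=True)
        return t.copy_(tmp)
    return F.hardswish(t, inplace=True)

x = torch.randn(4, 4)
hardswish_(x[:, ::2])  # strided view that previously produced wrong values
```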

Test Plan: Internal CI tests

Reviewed By: kimishpatel

Differential Revision: D29689745

fbshipit-source-id: 11618a8d865f550f6b70637345f9ebc3e5676f11
2021-07-15 11:50:27 -07:00
fce85480b9 det_backward: more robust and with complex support (#58195)
Summary:
As per title.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58195

Reviewed By: albanD

Differential Revision: D29361872

Pulled By: anjali411

fbshipit-source-id: b1f0fec7e3ac52acd1481bcc878cc0c1d07c1852
2021-07-15 11:04:42 -07:00
bd360ebe6f [nnc] Added a new API to distribute loop and all its parents (#61293)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61293

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D29560008

Pulled By: navahgar

fbshipit-source-id: e4e459184f20b1872bc242ba8626d0a6df29e810
2021-07-15 10:28:20 -07:00
76f097466e [nnc] Added a new API to compress all buffers in a given statement (#61087)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61087

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D29506677

Pulled By: navahgar

fbshipit-source-id: 63583fd5a0e42c0096ddf08d5b96bc680ea8a44e
2021-07-15 10:28:18 -07:00
2908d3eb45 [nnc] Modified the semantics of reorder in using permutation (#61085)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61085

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D29506679

Pulled By: navahgar

fbshipit-source-id: f674aedff8175b9947404fd2164a0b4f57a71e93
2021-07-15 10:28:16 -07:00
7177509380 Revert [DDP] Support not all outputs used in loss calculation (#61497)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61497

Reverts [DDP] Support not all outputs used in loss calculation
ghstack-source-id: 133589153

Test Plan: CI, ping authors to run their workflow on this diff

Reviewed By: zhaojuanmao

Differential Revision: D29642892

fbshipit-source-id: 81a15b9ab3329602f34d3758bb0799005a053d4f
2021-07-15 10:28:14 -07:00
25f9c35dd7 Revert [DDP] Support for multiple backwards (#61401)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61401

Reverts https://github.com/pytorch/pytorch/pull/59359, which is causing a few internal issues in DDP training. We will evaluate the internal use cases and reland it after reconsidering the design.

Also moves `prepare_for_backward` back into the forward pass instead of DDPSink for `find_unused_parameters`. This ensures that hooks will always fire in the backwards pass, which is behavior that internal training workloads rely on. Calling `prepare_for_backward` in the DDPSink autograd function is not the best solution, since other autograd threads may have been executing, which can cause races.

ghstack-source-id: 133589152

Test Plan: CI

Reviewed By: pritamdamania87

Differential Revision: D29608948

fbshipit-source-id: f060f41cd103573ddff8da50cdbb6c56768dab46
2021-07-15 10:28:13 -07:00
38ac9e69aa Back out "[DDP] Disable reducer hooks from running outside of DDP backwards." (#61399)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61399

Reverts https://github.com/pytorch/pytorch/pull/60921
Original commit changeset: fef76a0dd295
ghstack-source-id: 133581300

Test Plan: CI

Differential Revision: D29594262

fbshipit-source-id: a308d3f10dbbb2169d9a7f60f2f28f139185ed1f
2021-07-15 10:27:02 -07:00
a50a389ca6 Revert D29701479: [pytorch][PR] Remove _broadcast_object() from ZeroRedundancyOptimizer
Test Plan: revert-hammer

Differential Revision:
D29701479 (9b5d9b4049)

Original commit changeset: c8d5f9057b32

fbshipit-source-id: 35ab1f399513fb9d1c4e73b1fa906e559d2a6994
2021-07-15 10:03:08 -07:00
aa01a7a61c Fix for get_buffer(): check buffers by name instead of value (#61429)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/61242

The previous code wrongly checked whether a tensor is a buffer in a module by comparing values; the fix compares names instead.
Docs need some updating as well; the current plan is to bump that to a separate PR, but I'm happy to do it here as well if preferred.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61429

Reviewed By: gchanan

Differential Revision: D29712341

Pulled By: jbschlosser

fbshipit-source-id: 41f29ab746505e60f13de42a9053a6770a3aac22
2021-07-15 09:55:09 -07:00
5407108533 CopyBackward: Remove redundant src_device and unnecessary copy=True (#60025)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60025

`to` already copies unconditionally if `src.device() != options.device()` so
specifying the copy argument is unnecessary.

`src.device()` is also completely equivalent to `src.options().device()` so
storing both is redundant.

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D29698627

Pulled By: albanD

fbshipit-source-id: eb091d39b71db688e6bcbb33a227c01b94b432bb
2021-07-15 09:48:03 -07:00
da667e2d5f Add .github for CODEOWNERS (#61598)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61598

I'd like to be notified of changes to the GitHub Actions workflows; add
this so I can be notified.

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: janeyx99, samestep

Differential Revision: D29685783

Pulled By: seemethere

fbshipit-source-id: 865a1360a24633ef5074e43b8277838a0eef94f6
2021-07-15 09:39:12 -07:00
8afb65b6c5 changed launch bounds for upsample_linear1d fwd, bwd from 1024 to 512 (#61307)
Summary:
Changed launch bounds for upsample_linear1d_out_frame and upsample_linear1d_backward_out_frame from 1024 to 512. This shows a performance improvement, as seen below. It does not completely eliminate lmem usage (usage goes from 40-48 bytes to 8-16 bytes); we're not sure why.

Timing data (using Nvidia Titan-V GPU):
![UpsampleLinear1dTimingData](https://user-images.githubusercontent.com/22803332/124677708-e20d6280-de75-11eb-8187-fb50ec89dc50.PNG)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61307

Reviewed By: heitorschueroff

Differential Revision: D29662137

Pulled By: ngimel

fbshipit-source-id: 9653672ee17f25b75a02f295f388a78327091431
2021-07-15 09:19:16 -07:00
ee5a97de11 Register Saved Tensors hooks (#60663)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60663

Test Plan: Imported from OSS

Reviewed By: soulitzer

Differential Revision: D29466223

fbshipit-source-id: 65dc3a935c18a0e6b93a37e24543c696e6ae0321
2021-07-15 08:09:55 -07:00
94965212e5 [static runtime] Use at::allclose to test NNC sigmoid (#61566)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61566

This change uses `at::allclose` to compare results from the sigmoid implementations (CPU vs. NNC) instead of `Tensor::equal`, since small numerical differences between them make exact comparison flaky.
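
For illustration, a tiny example of the difference (the perturbation stands in for kernel-level numerical noise):

```python
import torch

a = torch.sigmoid(torch.tensor([0.5, 1.0]))
b = a + 1e-7  # stand-in for a tiny CPU-vs-NNC numerical difference

print(torch.equal(a, b))     # False: requires exactly identical values
print(torch.allclose(a, b))  # True: compares within rtol/atol tolerances
```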

Test Plan:
I confirmed that the flakiness of `StaticRuntime.Sigmoid` is gone with this change:

```
[djang@devvm1999.ftw0 ~/fbsource/fbcode] buck-out/gen/caffe2/benchmarks/static_runtime/static_runtime_cpptest -v 3 --gtest_filter=StaticRuntime.Sigmoid --gtest_repeat=100 &> output.txt
[djang@devvm1999.ftw0 ~/fbsource/fbcode] grep PASSED output.txt  | wc
    100     500    2100
```

Reviewed By: bertmaher

Differential Revision: D29671203

fbshipit-source-id: 99a7b16d18ea047c9aad444f36d8368f9d0b088d
2021-07-14 19:48:00 -07:00
9b5d9b4049 Remove _broadcast_object() from ZeroRedundancyOptimizer (#61539)
Summary:
Revised version of https://github.com/pytorch/pytorch/issues/60573.

**Overview:**
This makes two changes:
- It introduces a `map_location` argument to `broadcast_object_list()`. The argument specifies the device to load tensors contained in objects received from the broadcast. This change requires modifying the implementation of `_object_to_tensor()` and `_tensor_to_object()` to use `torch.save()` and `torch.load()` respectively.
- It removes all calls to `_broadcast_object()` in `ZeroRedundancyOptimizer` and the corresponding test file in favor of `broadcast_object_list()`.

The default value of `map_location` is `None`, in which case `_object_to_tensor()` and hence `broadcast_object_list()` preserve their original behavior. Namely, contained tensors are loaded to their original device.

In `consolidate_state_dict()`, I specify `map_location=torch.device("cpu")` instead of `self._default_device`. This slightly changes the behavior from before when using `_broadcast_object()`. The reason I do so is that it saves one GPU to CPU data transfer since the action immediately after receiving the broadcasted `local_state_dict` is to copy it to CPU.

Explicitly, if `map_location=self._default_device`, then the data transfer path assuming NCCL backend is as follows:
`source GPU --[before serialize]--> source CPU --[before broadcast]--> source GPU --[broadcast]--> destination GPU --[before deserialize]--> destination CPU --[deserialize]--> destination GPU --[copy]--> destination CPU`
Hence, by setting `map_location=torch.device("cpu")` instead, the suffix becomes:
`destination CPU --[deserialize]--> destination CPU --[copy]--> destination CPU`
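
A usage sketch of the new keyword as described above (process-group initialization is assumed and omitted):

```python
import torch
import torch.distributed as dist

# Sketch only: assumes dist.init_process_group("nccl", ...) has already run.
objs = [{"w": torch.randn(2, device="cuda")}] if dist.get_rank() == 0 else [None]

# On non-source ranks, tensors contained in the received objects are
# loaded to CPU instead of their original source-side device:
dist.broadcast_object_list(objs, src=0, map_location=torch.device("cpu"))
```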

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61539

Test Plan:
I added a test `test_broadcast_object_list_map_location()` that checks, for `map_location` as both CPU and GPU, that (1) tensors contained in broadcasted objects are appropriately loaded onto the specified device and (2) the contents of the tensors are correct.

The existing `ZeroRedundancyOptimizer` tests pass.
```
gpurun4 python test/distributed/optim/test_zero_redundancy_optimizer.py
```

The existing `broadcast_object_list()` test passes:
```
touch /tmp/barrier && TEMP_DIR="/tmp" BACKEND="nccl" WORLD_SIZE="2" gpurun python test/distributed/test_distributed_fork.py -- TestDistBackendWithFork.test_broadcast_object_list
```

Reviewed By: zou3519

Differential Revision: D29701479

Pulled By: andwgu

fbshipit-source-id: c8d5f9057b32e5e9f40e8edc5b2cc25fb21414a9
2021-07-14 17:36:30 -07:00
e3d5619ff0 [pytorch][profiler] Fix division by 0 in computeFlops (#61676)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61676

Reviewed By: ilia-cher

Differential Revision: D29646067

fbshipit-source-id: d872221bbde5384a9e397e68c1e5b0664d913b42
2021-07-14 16:38:19 -07:00
70e94bb1dd Avoid redefining __BYTE_ORDER (#60346)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60346

Introduction:
In order to support the Intel SGX platform, we have to avoid redefining __BYTE_ORDER.
Solution:
Check if the platform is SGX and avoid the redefinition.

Test Plan: Run the PyTorch tests.

Reviewed By: h397wang, malfet

Differential Revision: D29022626

fbshipit-source-id: 801c3a75c202d192a3808eb5d54b875094499996
2021-07-14 14:55:04 -07:00
a9c3580080 Grammatical update of tech docs (#61547)
Summary:
Added some minor grammatical updates to the 'Complex Numbers' docs.

![Screenshot (180)](https://user-images.githubusercontent.com/75036632/125342884-0b952500-e373-11eb-9e63-410ff31e6c21.png)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61547

Reviewed By: zou3519

Differential Revision: D29677361

Pulled By: H-Huang

fbshipit-source-id: 78222310a755911192905a8f52aa0ae325900006
2021-07-14 14:01:59 -07:00
5a5c7f563d add trainer hook functions (#60785)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60785

This pr adds hook functions for the trainers.

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D29697299

Pulled By: gcramer23

fbshipit-source-id: cc3b991aad0d32503fbfc5acd4fca8b404e74c0f
2021-07-14 13:19:17 -07:00
304c02ee44 refactor ps benchmark (#60784)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60784

This pr refactors the ps benchmark for modular trainers.

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D29697291

Pulled By: gcramer23

fbshipit-source-id: 64579a1f5326d3cd9f32936dcf53bc243d54b71d
2021-07-14 13:19:13 -07:00
7d2ea9a8f7 Release GIL as much as possible in dist_autograd pybind. (#61593)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61593

Following the pattern in https://github.com/pytorch/pytorch/pull/61588
to avoid deadlocks as much as possible.
ghstack-source-id: 133497897

Test Plan: waitforbuildbot

Reviewed By: mrshenli

Differential Revision: D29683451

fbshipit-source-id: 1951622eb964f57a551a9c0d46ad0ab24b66c458
2021-07-14 13:19:10 -07:00
5ebc7c9f97 Avoid holding GIL while calling retrieveContext. (#61588)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61588

As part of debugging https://github.com/pytorch/pytorch/issues/60290,
we discovered the following deadlock:

```
Thread 79 (Thread 0x7f52ff7fe700 (LWP 205437)):
#0  pthread_cond_timedwait@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225
#1  0x0000564880199152 in PyCOND_TIMEDWAIT (cond=0x564880346080 <gil_cond>, mut=0x564880346100 <gil_mutex>, us=5000) at /home/builder/ktietz/cos6/ci_cos6/python_1622833237666/work/Python/condvar.h:103
#2  take_gil (tstate=0x7f5254005ef0) at /home/builder/ktietz/cos6/ci_cos6/python_1622833237666/work/Python/ceval_gil.h:224
#3  0x0000564880217b62 in PyEval_AcquireThread (tstate=0x7f5254005ef0) at /home/builder/ktietz/cos6/ci_cos6/python_1622833237666/work/Python/ceval.c:278
#4  0x00007f557d54aabd in pybind11::gil_scoped_acquire::gil_scoped_acquire() () from /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_python.so
#5  0x00007f557da7792f in (anonymous namespace)::concrete_decref_fn(c10::impl::PyInterpreter const*, _object*) () from /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_python.so
#6  0x00007f5560dadba6 in c10::TensorImpl::release_resources() () from /opt/conda/lib/python3.6/site-packages/torch/lib/libc10.so
#7  0x00007f5574c885bc in std::_Sp_counted_ptr_inplace<torch::distributed::autograd::DistAutogradContext, std::allocator<torch::distributed::autograd::DistAutogradContext>, (__gnu_cxx::_Lock_policy)2>::_M_dispose() () from /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so
#8  0x00007f5574c815e9 in std::__detail::_Hashtable_alloc<std::allocator<std::__detail::_Hash_node<std::pair<long const, std::shared_ptr<torch::distributed::autograd::DistAutogradContext> >, false> > >::_M_deallocate_node(std::__detail::_Hash_node<std::pair<long const, std::shared_ptr<torch::distributed::autograd::DistAutogradContext> >, false>*) [clone .isra.325] () from /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so
#9  0x00007f5574c81bf1 in torch::distributed::autograd::DistAutogradContainer::eraseContextIdAndReset(torch::distributed::autograd::DistAutogradContainer::ContextsShard&, long) () from /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so
#10 0x00007f5574c86e83 in torch::distributed::autograd::DistAutogradContainer::releaseContextIfPresent(long) () from /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so
#11 0x00007f5574cc6395 in torch::distributed::rpc::RequestCallbackNoPython::processCleanupAutogradContextReq(torch::distributed::rpc::RpcCommandBase&) const () from /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so
#12 0x00007f5574cccf15 in torch::distributed::rpc::RequestCallbackNoPython::processRpc(torch::distributed::rpc::RpcCommandBase&, torch::distributed::rpc::MessageType const&, std::vector<c10::Stream, std::allocator<c10::Stream> >) const () from /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so

Thread 72 (Thread 0x7f53077fe700 (LWP 205412)):
#0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x00007f55bc62adbd in __GI___pthread_mutex_lock (mutex=0x564884396440) at ../nptl/pthread_mutex_lock.c:80
#2  0x00007f5574c82a2f in torch::distributed::autograd::DistAutogradContainer::retrieveContext(long) () from /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so
#3  0x00007f557de9bb2f in pybind11::cpp_function::initialize<torch::distributed::autograd::(anonymous namespace)::dist_autograd_init(_object*, _object*)::{lambda(long)#11}, pybind11::dict, long, pybind11::name, pybind11::scope, pybind11::sibling, char [931], pybind11::arg>(torch::distributed::autograd::(anonymous namespace)::dist_autograd_init(_object*, _object*)::{lambda(long)#11}&&, pybind11::dict (*)(long), pybind11::name const&, pybind11::scope const&, pybind11::sibling const&, char const (&) [931], pybind11::arg const&)::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call) () from /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_python.so

```

Basically Thread 72, holds GIL and tries to acquire the lock for
DistAutogradContainer to perform a lookup on a map. On the other hand,
Thread 79 holds the lock on DistAutogradContainer to remove a Tensor and as
part of TensorImpl destructor, concrete_decref_fn is called which waits for
GIL. As a result, we have a deadlock.

To fix this issue, I've ensured we release GIL when we call `retrieveContext`
and acquire it later when needed.
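
The bug and the fix follow the classic lock-ordering pattern; a minimal Python sketch for illustration, modeling the GIL and the container mutex as plain locks:

```python
import threading

gil = threading.Lock()        # stands in for the Python GIL
container = threading.Lock()  # stands in for DistAutogradContainer's mutex

def retrieve_context():
    # Deadlock-prone order: hold `gil`, then take `container`, while the
    # destructor path takes `container` first and then waits on `gil`.
    # Fixed pattern: drop the GIL before taking the container lock and
    # reacquire it only when needed afterwards.
    gil.release()
    with container:
        result = "context"
    gil.acquire()
    return result

gil.acquire()
print(retrieve_context())
gil.release()
```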
ghstack-source-id: 133493659

Test Plan: waitforbuildbot

Reviewed By: mrshenli

Differential Revision: D29682624

fbshipit-source-id: f68a1fb39040ca0447a26e456a97bce64af6b79c
2021-07-14 13:17:16 -07:00
f2adbff36e [Metal] Do not use read/write textures in concat shaders (#61074)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61074

`read_write` textures are not available on some devices, such as iPhone 7. This prevents the concat op from functioning on those devices.

This diff rewrites the concat shaders such that they do not depend on `read_write` textures.

Test Plan:
Test on device: run squeezenet and/or the operator tests
```
arc focus2 pp-ios
```

Test on Mac
```
buck test pp-macos
```

Test specifically on iPhone7, either device or simulator.

Reviewed By: xta0

Differential Revision: D29501656

fbshipit-source-id: de4a059953ab4b0abf38b6ecb3f665323dcdbea1
2021-07-14 13:03:48 -07:00
80bdfd64c5 Skip Bfloat16 support when building for VSX (#61630)
Summary:
Copy-paste ifdef guard from vec256/vec256.h
Probably fixes https://github.com/pytorch/pytorch/issues/61575

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61630

Reviewed By: janeyx99

Differential Revision: D29690676

Pulled By: malfet

fbshipit-source-id: f6d91eadab74bcbcb1dc9854ae1b98a0dccacd14
2021-07-14 13:02:29 -07:00
43a2f7c26a [TensorExpr] Do not fuse float16 values. (#61569)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61569

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D29672564

Pulled By: ZolotukhinM

fbshipit-source-id: fe64ec38209d43f8246bcb6c397b64a28cbd86fa
2021-07-14 12:53:59 -07:00
ab27399566 Make broadcast_object_list accept a device parameter. (#61305)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61305

Part I (this PR): Add a dist_device argument to the broadcast_object_list API.
Part II: andwgu@ will deprecate _broadcast_object with the newly introduced API, and also include the changes to _object_to_tensor()/_tensor_to_object() from PR 60573.

Context: https://github.com/pytorch/pytorch/issues/60062

Test Plan:
Run the following on DevGpus with two cuda devices

$python setup.py develop    --- run this build on DevGPU
$BACKEND='nccl' WORLD_SIZE=2 with-proxy  python test/distributed/test_distributed_fork.py  TestDistBackendWithFork.test_broadcast_object_list --v
$BACKEND='gloo' WORLD_SIZE=2 with-proxy  python test/distributed/test_distributed_fork.py  TestDistBackendWithFork.test_broadcast_object_list --v

Build with distributed on: USE_DISTRIBUTED=1 python setup.py develop
Test on CPU devvm:

$ with-proxy python test/distributed/optim/test_zero_redundancy_optimizer.py

Imported from OSS

Differential Revision:
D29566538
D29566538

Reviewed By: iramazanli, mrshenli

Pulled By: bowangbj

fbshipit-source-id: 0bea52442551c5194acba85eadda16ba2ec4b6ef
2021-07-14 11:43:17 -07:00
9b3cbeaf7d [pruner] fix activation handles logic (#61592)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61592

Add activation handles for each layer (stored in a list), so they can each be removed.

We don't remove them in the `convert` in eager mode because we aren't modifying output/input layer dimensions. We will need this in Fx mode though.
ghstack-source-id: 133497376

Test Plan:
Added some tests to make sure `model(x)` runs without error.

`buck test mode/dev-nosan //caffe2/test:ao --
TestBasePruner`

https://pxl.cl/1LBf4

Reviewed By: z-a-f

Differential Revision: D29682789

fbshipit-source-id: 9185702736e5f7f4320754ffef441610738ac154
2021-07-14 11:07:23 -07:00
343cb276b0 [pytorch] Add broadcasting support to add_relu kernel (#61584)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61584

add_relu does not work with broadcasting. This registers a scalar version of add_relu in native_functions that casts the scalar to a tensor before calling the regular function. TensorIterator then handles broadcasting, analogously to the existing add.
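
For reference, the broadcasting semantics being matched, shown via the equivalent unfused computation (the fused kernel itself is internal):

```python
import torch

x = torch.randn(4, 3)

out_scalar = torch.relu(x + 1.0)            # scalar other now works
out_bcast = torch.relu(x + torch.randn(3))  # (3,) broadcasts against (4, 3)
```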
ghstack-source-id: 133480068

Test Plan: python3 test/test_nn.py TestAddRelu

Reviewed By: kimishpatel

Differential Revision: D29641768

fbshipit-source-id: 1b0ecfdb7eaf44afed83c9e9e74160493c048cbc
2021-07-14 10:32:20 -07:00
c23db9327a Smart Decay for Adam - Caffe2 (#61548)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61548

We want to decay learning parameters properly. Previously this was not done when a parameter was absent from a minibatch. We fix this by keeping track of missed minibatches and making the decay catch up accordingly (see the sketch after the list below).

The exponential moving averages (EMA) for the first and second moments used in Adam are updated only for parameters seen in a minibatch. For the parameters that are missed, 0 should still be added to the EMAs, and the EMAs should then be decayed by multiplying by beta1 and beta2 respectively.

To avoid the computational overhead of touching every parameter for every minibatch, we:
* keep track of the last time a parameter is seen
* instead of decaying the EMAs by multiplying by beta1 and beta2, we multiply by beta1^k and beta2^k, where k is the number of minibatches since the parameter was last seen
* we calculate the amount of momentum that would have been discharged over the missed minibatches and update the weight accordingly.
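
A minimal sketch of the catch-up decay for a single EMA value (names are illustrative; the actual Caffe2 operator is not shown):

```python
def catch_up(ema, beta, last_seen_step, current_step):
    # Each missed minibatch contributes a 0 gradient, so the correct update
    # per missed step is ema = beta * ema + (1 - beta) * 0, and k missed
    # steps collapse into a single multiply by beta**k.
    k = current_step - last_seen_step
    return ema * beta ** k

# Parameter absent for 3 minibatches: one multiply replaces three updates.
assert abs(catch_up(1.0, 0.9, last_seen_step=7, current_step=10) - 0.9 ** 3) < 1e-12
```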

Differential Revision: D29654246

fbshipit-source-id: 7a6cd7966eb1f31116d99dfce79a78b2d3ee9e3e
2021-07-14 10:22:38 -07:00
58adaaba60 Enable C2 load rate limiter [2/n] (#61551)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61551

We aim to enable the rate limiter in C2 load, with a fixed bandwidth limit.
This diff updates LoadOp to pass down the Manifold DB options.

Test Plan:
```
buck test mode/opt caffe2/caffe2/python/operator_test:load_save_test
```

Differential Revision: D29639102

fbshipit-source-id: cf69549adadf4c7f12a8a2b7f3ca39092cab4b99
2021-07-14 08:27:05 -07:00
57feb35474 Refactor non-joined process computation (#61555)
Summary:
**Overview:**
This refactors the computation on non-joined processes relating to the join context manager. The concept was inspired by a comment from pritamdamania.

**Changes:**
This introduces a `_Joinable` abstract base class, which requires a `_join_hook()` method and `_join_device()` and `_join_process_group()` property methods. Any class that we want to be compatible with the generic join context manager should inherit from `_Joinable` and implement `_join_hook()`, `_join_device()`, and `_join_process_group()`. (The `device` and `process_group` information has been moved from `_JoinHook` to `_Joinable`.)

The generic join context manager now takes in a `List[_Joinable]` instead of `List[_JoinHook]`. The motivation for this is that previously, by passing the `_JoinHook`s into the context manager, the class providing a `_JoinHook` can modify the context manager's behavior, but the context manager cannot modify the class's behavior. This is solved by giving the context manager a reference to the class's instance.

This implementation reserves the field `_join_config` in every `_Joinable` to store a `_JoinConfig` instance, which holds all dynamic fields needed from the `_Joinable` for the join context manager: `enable`, `throw_on_early_termination`, and `is_first_joinable`. ("dynamic" here means that for a given `_Joinable` instance, the values for those fields may change across different join context usages.) In particular, these fields are needed to implement a method `notify_join_context()`, which encapsulates the computation performed on non-joined processes relating to the join context manager --- (1) the all-reduce to indicate that the process has not yet joined and (2) the all-reduce to check whether to throw an exception if `throw_on_uneven_inputs=True`. The idea is that every `_Joinable` class only needs to make a call to `notify_join_context()` before its per-iteration collective communications; it is a simple one-line addition.

Only the first `_Joinable` instance passed into the context manager actually performs the collective communications in `notify_join_context()`. In that case, the method returns an async work handle for the initial all-reduce indicating that the process not yet joined. Otherwise, the method returns `None`. This conditional logic is handled internally without additional input from the user.

**New API:**
Now, the example usage would look like:
```
ddp_model = DistributedDataParallel(...)
zero_optim = ZeroRedundancyOptimizer(ddp_model.parameters(), ...)
with _Join([ddp_model, zero_optim]):
    ...
```
Any arguments meant for a join hook (e.g. `divide_by_initial_world_size`) must be specified as keyword arguments. For example:
```
with _Join([ddp_model, zero_optim], divide_by_initial_world_size=False):
    ...
```
They will be forwarded to every `_join_hook()` function via `**kwargs`. This creates a clear separation between the variables needed by the context manager (`enable` and `throw_on_early_termination`) and those needed by the `_Joinable` class (e.g. `divide_by_initial_world_size`).

**Recap:**
After this change, the relevant information to use the generic join context manager looks like the following (omitting prefix `_` from names):
- Suppose we have a class `C` (e.g. `DistributedDataParallel`) that we want to be able to use the `Join` context.
- We make `C` inherit from `Joinable` and implement `join_hook() -> JoinHook`, `join_device()`, and `join_process_group()`.
- To implement `join_hook()`, we define a `CJoinHook` class inheriting from `JoinHook` and implement `main_hook()` and `post_hook()` as needed.
- We locate a place before `C`'s per-iteration collective communications and add a call to `Join.notify_join_context()`.
- We call `Joinable.__init__(self)` in `C`'s constructor.
- The `C.join_config` field will be used internally by the context manager. This does not affect `C`'s serializability.
- Run time arguments for `C`'s join hook can be passed in as keyword arguments to the context manager: `with Join([C()], arg1=..., arg2=...):`.
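
A minimal, self-contained sketch of this protocol follows. The stand-in base classes mirror the interface described above; real code would subclass torch's private `_JoinHook`/`_Joinable`, whose import path is not shown in this summary and is left out here.

```python
class JoinHook:  # stand-in for torch's _JoinHook
    def main_hook(self):
        """Shadow one training iteration's collectives on a joined process."""

    def post_hook(self, is_last_joiner: bool):
        """Run once on every process after all processes have joined."""


class Joinable:  # stand-in for torch's _Joinable
    def _join_hook(self, **kwargs) -> JoinHook:
        raise NotImplementedError


class AllReducer(Joinable):
    """Toy class made compatible with the generic join context manager."""

    def _join_hook(self, **kwargs) -> JoinHook:
        # kwargs carries hook-specific options forwarded by the context
        # manager (e.g. divide_by_initial_world_size for DDP).
        return JoinHook()

    @property
    def _join_device(self):
        return "cpu"  # device for the context manager's collective comms

    @property
    def _join_process_group(self):
        return None  # real code returns its ProcessGroup
```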

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61555

Test Plan:
I ran the existing DDP join tests:
```
touch /tmp/barrier && TEMP_DIR="/tmp" BACKEND="nccl" WORLD_SIZE="2" gpurun python test/distributed/test_distributed_fork.py -- TestDistBackendWithFork.test_ddp_uneven_inputs TestDistBackendWithFork.test_ddp_uneven_inputs_stop_iteration_sync_bn TestDistBackendWithFork.test_ddp_grad_div_uneven_inputs TestDistBackendWithFork.test_ddp_uneven_input_join_disable TestDistBackendWithFork.test_ddp_uneven_input_exception
```
I ran the ZeRO join tests:
```
gpurun4 python test/distributed/optim/test_zero_redundancy_optimizer.py TestZeroRedundancyOptimizerDistributed.test_zero_join_gpu TestZeroRedundancyOptimizerDistributed.test_zero_join_cpu
```

Reviewed By: zou3519

Differential Revision: D29690359

Pulled By: andwgu

fbshipit-source-id: 2950f78de755eb5fb13b95b803dd7c705879a9c7
2021-07-14 08:20:40 -07:00
03a79f43e3 adding support for index_select on quantized tensors (#61406)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61406

Only a few select functions really needed fixing so that they work for quantized tensors; primarily, creation and resizing of tensors required a branch for quantized tensors. This does not work for per-channel quantized tensors.
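
A usage sketch of the newly supported path (per-tensor quantization only, per the note above):

```python
import torch

x = torch.randn(4, 3)
qx = torch.quantize_per_tensor(x, scale=0.1, zero_point=0, dtype=torch.quint8)

# index_select along dim 0 of a per-tensor-quantized tensor:
out = torch.index_select(qx, 0, torch.tensor([0, 2]))
```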

Test Plan:
```python test/test_quantization.py TestQuantizedTensor.test_qtensor_index_select_cuda```

```python test/test_quantization.py TestQuantizedTensor.test_qtensor_index_select_cpu```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D29654446

fbshipit-source-id: 8fde9b2dd2c3e380cc330bbad71d6c4d2aeec0ab
2021-07-14 05:38:00 -07:00
a07b08136f [Static Runtime] Check unsupported up when enabling static runtime (#61613)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61613

Reviewed By: ajyu, movefast1990

Differential Revision: D29663466

fbshipit-source-id: d819903b7227f534c0a4fffa5eeea2b5c0c04750
2021-07-14 02:13:51 -07:00
ac64a41e8a [FX][docs] Add note about python set pitfall (#61597)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61597

Test Plan: Imported from OSS

Reviewed By: Chillee

Differential Revision: D29685735

Pulled By: jamesr66a

fbshipit-source-id: b5c5b53ff94fac1022f69b7c0ad4e4055b116029
2021-07-13 20:09:13 -07:00
9ade039593 fix test file not found issue (#61610)
Summary:
It should not error out if the file is not found.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61610

Reviewed By: samestep

Differential Revision: D29687958

Pulled By: walterddr

fbshipit-source-id: 17cacba8daa131df9bfb37fd58d6e4870ff75198
2021-07-13 17:50:50 -07:00
2ab8126e36 Add NewLib support (#60345)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60345

Add support for NewLib, an embedded libc variant, by re-using the existing Android library stubs plus a few NewLib-specific guards

Problem:
Newlib is a C standard library intended for embedded use, similar to how Android uses bionic. This causes some incompatibility with math functions that are present in glibc but not in Newlib (and some versions of bionic), and makes porting PyTorch to environments such as SGX hard.

Solution:
Subscribe Newlib to the same fixes present for older versions of Android, and add fixes specific to Newlib

Test Plan: Run the PyTorch tests.

Reviewed By: malfet

Differential Revision: D29022623

fbshipit-source-id: 028dd7ff9b3ee394371c275642c90c9ef108e639
2021-07-13 17:26:45 -07:00
8e6d8991b2 [torch/elastic] Fix the agent store key prefix used by workers (#61590)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61590

This PR fixes a bug where the state of the first run of a failed training job leaks into secondary runs due to a constant worker key prefix.
ghstack-source-id: 133494239

Test Plan: Run the existing integ tests.

Reviewed By: SciPioneer

Differential Revision: D29682743

fbshipit-source-id: d96ecadcfe5b6563225ee19f5d0776c7f935393a
2021-07-13 14:57:27 -07:00
523d6fe27c [PyTorch] Remove unnecessary std::string in Device.cpp (#61502)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61502

No reason not to use string literals here.
ghstack-source-id: 133449808

Test Plan: buildsizebot

Reviewed By: dhruvbird

Differential Revision: D29648079

fbshipit-source-id: 74ecf12283c2f196b4b3edb75c6bb1eeed51322e
2021-07-13 14:36:13 -07:00
72394aaf68 Bump addressable from 2.7.0 to 2.8.0 in /ios/TestApp (#61573)
Summary:
Bumps [addressable](https://github.com/sporkmonger/addressable) from 2.7.0 to 2.8.0.
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a href="https://github.com/sporkmonger/addressable/blob/main/CHANGELOG.md">addressable's changelog</a>.</em></p>
<blockquote>
<h1>Addressable 2.8.0</h1>
<ul>
<li>fixes ReDoS vulnerability in Addressable::Template#match</li>
<li>no longer replaces <code>+</code> with spaces in queries for non-http(s) schemes</li>
<li>fixed encoding ipv6 literals</li>
<li>the <code>:compacted</code> flag for <code>normalized_query</code> now dedupes parameters</li>
<li>fix broken <code>escape_component</code> alias</li>
<li>dropping support for Ruby 2.0 and 2.1</li>
<li>adding Ruby 3.0 compatibility for development tasks</li>
<li>drop support for <code>rack-mount</code> and remove Addressable::Template#generate</li>
<li>performance improvements</li>
<li>switch CI/CD to GitHub Actions</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a href="6469a232c0"><code>6469a23</code></a> Updating gemspec again</li>
<li><a href="24336385de"><code>2433638</code></a> Merge branch 'main' of github.com:sporkmonger/addressable into main</li>
<li><a href="e9c76b8897"><code>e9c76b8</code></a> Merge pull request <a href="https://github-redirect.dependabot.com/sporkmonger/addressable/issues/378">https://github.com/pytorch/pytorch/issues/378</a> from ashmaroli/flat-map</li>
<li><a href="56c5cf7ece"><code>56c5cf7</code></a> Update the gemspec</li>
<li><a href="c1fed1ca0a"><code>c1fed1c</code></a> Require a non-vulnerable rake</li>
<li><a href="0d8a3127e3"><code>0d8a312</code></a> Adding note about ReDoS vulnerability</li>
<li><a href="89c76130ce"><code>89c7613</code></a> Merge branch 'template-regexp' into main</li>
<li><a href="cf8884f815"><code>cf8884f</code></a> Note about alias fix</li>
<li><a href="bb03f7112e"><code>bb03f71</code></a> Merge pull request <a href="https://github-redirect.dependabot.com/sporkmonger/addressable/issues/371">https://github.com/pytorch/pytorch/issues/371</a> from charleystran/add_missing_encode_component_doc_entry</li>
<li><a href="6d1d8094a6"><code>6d1d809</code></a> Adding note about :compacted normalization</li>
<li>Additional commits viewable in <a href="https://github.com/sporkmonger/addressable/compare/addressable-2.7.0...addressable-2.8.0">compare view</a></li>
</ul>
</details>
<br />

[![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=addressable&package-manager=bundler&previous-version=2.7.0&new-version=2.8.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

 ---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `dependabot rebase` will rebase this PR
- `dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `dependabot merge` will merge this PR after your CI passes on it
- `dependabot squash and merge` will squash and merge this PR after your CI passes on it
- `dependabot cancel merge` will cancel a previously requested merge and block automerging
- `dependabot reopen` will reopen this PR if it is closed
- `dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- `dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- `dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- `dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
- `dependabot use these labels` will set the current labels as the default for future PRs for this repo and language
- `dependabot use these reviewers` will set the current reviewers as the default for future PRs for this repo and language
- `dependabot use these assignees` will set the current assignees as the default for future PRs for this repo and language
- `dependabot use this milestone` will set the current milestone as the default for future PRs for this repo and language

You can disable automated security fix PRs for this repo from the [Security Alerts page](https://github.com/pytorch/pytorch/network/alerts).

</details>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61573

Reviewed By: xta0

Differential Revision: D29685329

Pulled By: seemethere

fbshipit-source-id: a43008155144a358950dc3ed1934fcc470b73c02
2021-07-13 14:30:33 -07:00
0751a41ab1 [quant] Input-Weight Equalization - ConvReLU support (#61350)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61350

Applied changes in convert to allow for ConvReLU2d layers

Initial Model: `x -> conv1 -> relu`

After fusion: `x -> convRelu2d`

After prepare: `x -> input_quant_obs -> input_eq_obs1 -> convRelu2d -> output_quant_obs1`

After equalization functions: `x -> mul -> input_quant_obs (scaled) -> convRelu2d -> output_quant_obs`

After convert: `x -> mul -> quantize_per_tensor -> quantized::convRelu2d -> dequantize`

Test Plan:
`python test/test_quantization.py TestEqualizeFx`

Initial Model:
```
ConvReluModel(
  (fc): Conv2d(3, 5, kernel_size=(3, 3), stride=(1, 1))
  (relu): ReLU()
)
```

After prepare:
```
GraphModule(
  (x_activation_post_process_0): MinMaxObserver(min_val=5.960464477539063e-08, max_val=0.9999999403953552)
  (x_activation_post_process_0_equalization_process_0): _InputEqualizationObserver(
    (input_obs): PerChannelMinMaxObserver(min_val=tensor([1.1921e-07, 3.3379e-06, 5.9605e-08]), max_val=tensor([1.0000, 1.0000, 1.0000]))
  )
  (fc): ConvReLU2d(
    (0): Conv2d(3, 5, kernel_size=(3, 3), stride=(1, 1))
    (1): ReLU()
  )
  (fc_activation_post_process_0): MinMaxObserver(min_val=0.0, max_val=1.2341605424880981)
)

graph():
    %x : [#users=1] = placeholder[target=x]
    %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_0](args = (%x,), kwargs = {})
    %x_activation_post_process_0_equalization_process_0 : [#users=1] = call_module[target=x_activation_post_process_0_equalization_process_0](args = (%x_activation_post_process_0,), kwargs = {})
    %fc : [#users=1] = call_module[target=fc](args = (%x_activation_post_process_0_equalization_process_0,), kwargs = {})
    %fc_activation_post_process_0 : [#users=1] = call_module[target=fc_activation_post_process_0](args = (%fc,), kwargs = {})
    return fc_activation_post_process_0
```

After equalization functions:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_equalization_scale0 : [#users=1] = get_attr[target=x_equalization_scale0]
    %mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_equalization_scale0), kwargs = {})
    %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_0](args = (%mul,), kwargs = {})
    %fc : [#users=1] = call_module[target=fc](args = (%x_activation_post_process_0,), kwargs = {})
    %fc_activation_post_process_0 : [#users=1] = call_module[target=fc_activation_post_process_0](args = (%fc,), kwargs = {})
    return fc_activation_post_process_0
```

After convert:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_equalization_scale0 : [#users=1] = get_attr[target=x_equalization_scale0]
    %mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_equalization_scale0), kwargs = {})
    %fc_input_scale_0 : [#users=1] = get_attr[target=fc_input_scale_0]
    %fc_input_zero_point_0 : [#users=1] = get_attr[target=fc_input_zero_point_0]
    %quantize_per_tensor : [#users=1] = call_function[target=torch.quantize_per_tensor](args = (%mul, %fc_input_scale_0, %fc_input_zero_point_0, torch.quint8), kwargs = {})
    %fc : [#users=1] = call_module[target=fc](args = (%quantize_per_tensor,), kwargs = {})
    %dequantize : [#users=1] = call_method[target=dequantize](args = (%fc,), kwargs = {})
    return dequantize
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D29638275

fbshipit-source-id: 40d4666a4451e132612ea38fdfeaaec177a1defb
2021-07-13 14:00:40 -07:00
b3e4dab45a [quant] Input-Weight Equalization - Conv convert support (#61287)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61287

Modifications to functions during convert() to support equalization. Note that this implementation does not work for connected F.conv2d layers yet.

Initial:
```
      w
      |
x -> conv -> y
```

After prepare:
```
                                         w
                                         |
                                  weight_quant_obs
                                         |
                                    weight_eq_obs
                                         |
x -> input_quant_obs -> input_eq_obs -> conv -> out_quant_obs -> y
```

After convert:
```
                scale, zero_point             w (scaled)
                       |                           |
x -> mul -> quantize_per_tensor (scaled) -> quantized::conv -> dequant -> y
      |
   eq_scale
```

Test Plan:
`python test/test_quantization.py TestEqualizeFx`

Initial model:
```
ConvModel(
  (conv): Conv2d(3, 5, kernel_size=(3, 3), stride=(1, 1), bias=False)
)
```

After prepare:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_0](args = (%x,), kwargs = {})
    %x_activation_post_process_0_equalization_process_0 : [#users=1] = call_module[target=x_activation_post_process_0_equalization_process_0](args = (%x_activation_post_process_0,), kwargs = {})
    %conv : [#users=1] = call_module[target=conv](args = (%x_activation_post_process_0_equalization_process_0,), kwargs = {})
    %conv_activation_post_process_0 : [#users=1] = call_module[target=conv_activation_post_process_0](args = (%conv,), kwargs = {})
    return conv_activation_post_process_0
```

After equalization functions:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_equalization_scale0 : [#users=1] = get_attr[target=x_equalization_scale0]
    %mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_equalization_scale0), kwargs = {})
    %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_0](args = (%mul,), kwargs = {})
    %conv : [#users=1] = call_module[target=conv](args = (%x_activation_post_process_0,), kwargs = {})
    %conv_activation_post_process_0 : [#users=1] = call_module[target=conv_activation_post_process_0](args = (%conv,), kwargs = {})
    return conv_activation_post_process_0
```

After convert:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_equalization_scale0 : [#users=1] = get_attr[target=x_equalization_scale0]
    %mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_equalization_scale0), kwargs = {})
    %conv_input_scale_0 : [#users=1] = get_attr[target=conv_input_scale_0]
    %conv_input_zero_point_0 : [#users=1] = get_attr[target=conv_input_zero_point_0]
    %quantize_per_tensor : [#users=1] = call_function[target=torch.quantize_per_tensor](args = (%mul, %conv_input_scale_0, %conv_input_zero_point_0, torch.quint8), kwargs = {})
    %conv : [#users=1] = call_module[target=conv](args = (%quantize_per_tensor,), kwargs = {})
    %dequantize : [#users=1] = call_method[target=dequantize](args = (%conv,), kwargs = {})
    return dequantize
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D29557055

fbshipit-source-id: dc9f44182e31fa362c43ad2dfe224e6f4e4a730e
2021-07-13 14:00:38 -07:00
77d36b657a [quant] Input-Weight Equalization - Conv prepare support (#61286)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61286

Modifies the prepare step to support conv layers during input-weight equalization and adds tests to make sure that the results are as expected.

Initial:
```
      w
      |
x -> conv -> y
```

After prepare:

```
                                         w
                                         |
                                  weight_quant_obs
                                         |
                                    weight_eq_obs
                                         |
x -> input_quant_obs -> input_eq_obs -> conv -> out_quant_obs -> y
```

Test Plan:
`python test/test_quantization.py TestEqualizeFx.test_input_weight_equalization_prepare`

Initial:
```
ConvModel(
  (conv): Conv2d(3, 5, kernel_size=(3, 3), stride=(1, 1), bias=False)
)
```

After prepare:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_0](args = (%x,), kwargs = {})
    %x_activation_post_process_0_equalization_process_0 : [#users=1] = call_module[target=x_activation_post_process_0_equalization_process_0](args = (%x_activation_post_process_0,), kwargs = {})
    %conv : [#users=1] = call_module[target=conv](args = (%x_activation_post_process_0_equalization_process_0,), kwargs = {})
    %conv_activation_post_process_0 : [#users=1] = call_module[target=conv_activation_post_process_0](args = (%conv,), kwargs = {})
    return conv_activation_post_process_0
```

Imported from OSS

Reviewed By: supriyar

Differential Revision: D29557051

fbshipit-source-id: 25d1531645dfaf565f5c615e2ee850fcf96c7eb9
2021-07-13 14:00:36 -07:00
ce9cedd119 [quant] Input-Weight Equalization - Conv observer support (#61285)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61285

Modifies observers to support conv layers and tests to make sure that the observers are returning the expected values for conv inputs.

Test Plan:
`python test/test_quantization.py TestEqualizeFx.test_input_weight_eq_observer`

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D29557041

fbshipit-source-id: 5e43329f189ba352eb8b991f38bf37752eebb6e6
2021-07-13 13:59:23 -07:00
30e48bbeae Add neg bit (#56058)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56058

User facing changes:
1. Adds a negative bit and corresponding new API (`is_neg()`,`resolve_neg()`)
2. `tensor.conj().imag` now returns a floating point tensor with neg bit set to 1 instead of a tensor with no notion of negative bit. Note that imag is still a view and all the view properties still hold for imag.

Non user facing changes:
1. Added a new Negative dispatch key and a backend fallback to handle it
2. Updated copy kernel to handle negative bit
3. Merged conjugate and negative bit fallback kernel
4. fixed https://github.com/pytorch/pytorch/issues/60478 (caused due to https://github.com/pytorch/pytorch/pull/54987)

Testing:
1. Added a new OpInfo based test `test_neg_view` (verifies that out-of-place and in-place operations work correctly for all operations when the input is a neg view tensor by checking the result against an actually negated tensor, verifies that autograd returns the same output for both neg view and actually negated tensors as well as it works fine when grad_out is a neg view).
2. Added a new test class containing `test_conj_view`, `test_neg_view`.

Test Plan: Imported from OSS

Reviewed By: soulitzer

Differential Revision: D29636403

fbshipit-source-id: 12214c9dc4806c51850f4a72a109db9527c0ca63
2021-07-13 13:50:42 -07:00
60382de455 [torch] Set nproc_per_node to 1 (#61552)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61552

Set `nproc_per_node` to 1

Test Plan: unittests

Reviewed By: cbalioglu

Differential Revision: D29667056

fbshipit-source-id: 6601f66fec5e018c7737d909f8c71642451abb29
2021-07-13 13:35:25 -07:00
437e7d9fc9 codegen_backend_module() now passes correct type designators to isinstance in the generated script
Summary: For methods returning complex (i.e. container) types, the existing code attempted to pass type designators with unsupported syntax (e.g. `Tensor[]`) into `isinstance`. It now uses the correct syntax supported by TorchScript (i.e. `List[Tensor]`).

Test Plan:
Unfortunately, a backend supporting methods returning container types has not yet been identified so the functionality cannot be tested end-to-end.

Adding a printout of `method_ct.format(method_te)` before https://fburl.com/code/4619d12g lets one inspect the difference in the generated method body, e.g.:

```
assert isinstance(_0, List[Tensor])
```
vs
```
assert isinstance(_0, Tensor[])
```

Reviewed By: allwu

Differential Revision: D29537358

fbshipit-source-id: 3356f3c1477aa9304e1f070711f480441579414d
2021-07-13 12:18:17 -07:00
b42cc19c88 Fix broken assertion error test in NNAPI convertor (#61586)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61586

Error message was changed

Test Plan:
pytest test/test_nnapi.py:

Imported from OSS

Reviewed By: iseeyuan

Differential Revision: D29682319

fbshipit-source-id: 52a96d79633ee9aae1de2056c7583311edc92353
2021-07-13 11:46:32 -07:00
2ade4d2a92 .github: Ensure clean workspaces before checkout (#61565)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61565

I was noticing the checkout step failing a lot for me; this adds a
cleaning step to fully remove the GitHub workspace before attempting
the checkout

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: zhouzhuojie

Differential Revision: D29671074

Pulled By: seemethere

fbshipit-source-id: 43a8f9a9a272c6bdbfffa9c6263443aac37f4b89
2021-07-13 11:13:48 -07:00
d5204064dc [BE] Fix flaky ProcessGroupGloo tests (#61396)
Summary:
A hypothesis as to why tests such as https://github.com/pytorch/pytorch/issues/57469 may be flaky is that `c10d = ProcessGroupGloo(...)` is not actually guaranteed to be a synchronization point, so some ranks may create the PG, run all the error checking (which does not actually call into gloo APIs and so doesn't require synchronization), and then exit, all before other ranks have created the gloo PG.

This can result in the following error:
```
File "distributed/test_c10d_gloo.py", line 1037, in test_reduce_checks
May 03 06:42:34     pg = c10d.ProcessGroupGloo(store, self.rank, self.world_size, self.opts())
May 03 06:42:34 RuntimeError: [/var/lib/jenkins/workspace/third_party/gloo/gloo/transport/tcp/pair.cc:598] Connection closed by peer [127.0.0.1]:35521
```

which indicates that the remote end has hung up. Furthermore all the flaky tests in this file only do error checking and don't call into the gloo APIs, further indicating that this issue may be the root cause. Not 100% sure this PR will fix it because I haven't been able to actually repro the issue even after 10000+ runs, but it happens regularly in CI.

To fix this, we add a `dist.barrier(group=pg)` call after creating the pg to enforce synchronization (sketched below). It would be good to land this and observe whether it helps with the flakiness.
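
Sketched as a function, with `store`, `rank`, `world_size`, and `opts` coming from the test harness shown above:

```python
import torch.distributed as dist

def make_gloo_pg(store, rank, world_size, opts):
    # Constructing the PG is not guaranteed to synchronize ranks, so fast
    # ranks can finish error-checking-only tests and hang up before their
    # peers have connected. An explicit barrier forces the rendezvous.
    pg = dist.ProcessGroupGloo(store, rank, world_size, opts)
    dist.barrier(group=pg)
    return pg
```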

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61396

Reviewed By: mrshenli

Differential Revision: D29664189

Pulled By: rohan-varma

fbshipit-source-id: bc046d5d816fe6cb426522b85312383bfa3f90b7
2021-07-13 10:34:59 -07:00
3e5d2b539d Replace deprecated comment with C10_DEPRECATED in linalg.h (#60374)
Summary:
Replace // DEPRECATED comment with C10_DEPRECATED.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60374

Reviewed By: H-Huang

Differential Revision: D29661630

Pulled By: heitorschueroff

fbshipit-source-id: fc086276fd7d3ddfb8d17c67ade456377ef0e990
2021-07-13 08:21:22 -07:00
9679fa7f30 Update cpp_extension.py (#61484)
Summary:
By default, the majority of Python 3.[6-9] installations come with `pkg_resources.packaging` version 16.8 (or `setuptools` older than 49.6.0), which does not have major/minor properties on the Version class, as one can observe in https://github.com/pypa/setuptools/blob/v49.5.0/pkg_resources/_vendor/packaging/version.py.
On the other hand, the comparison operators do exist, so we use them to check for version equality instead (see the sketch below).
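
A sketch of the pattern (the version strings here are placeholders):

```python
from pkg_resources import packaging  # vendored; old copies lack Version.major/minor

installed = packaging.version.parse("11.1")
required = packaging.version.parse("11.3")

# .major / .minor are missing on the old vendored packaging (e.g. 16.8),
# but the comparison operators have always been available:
if installed != required:
    print(f"version mismatch: {installed} != {required}")
```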

Fixes https://github.com/pytorch/pytorch/issues/61036

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61484

Reviewed By: walterddr, seemethere

Differential Revision: D29643883

Pulled By: malfet

fbshipit-source-id: 3db9168c1b009ac3a278709083ea8c5b417471b8
2021-07-13 07:11:58 -07:00
0afbb9e81e PYTHON_LIBRARY may be set to empty or NOTFOUND. (#61230)
Summary:
Not sure why (maybe from dependencies?) but it can certainly break package lookup upon re-entry of cmake.
So instead of checking whether they are defined, we should check whether there is any meaningful value inside.

Fixes https://github.com/pytorch/pytorch/issues/59887

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61230

Reviewed By: H-Huang

Differential Revision: D29668766

Pulled By: malfet

fbshipit-source-id: 79a59578740c4434327aff4f9a22eba9c4bf48d1
2021-07-13 07:09:31 -07:00
ac6ec0efa1 [ROCM] fix bug in #60313 (#61073)
Summary:
This PR fixes a bug in https://github.com/pytorch/pytorch/issues/60313, where the tensors generated by `_generate_valid_rocfft_input` were on the CPU instead of the GPU. This was due to using numpy to generate tensors and converting them to PyTorch via `torch.from_numpy`, which leaves the generated tensors on the CPU. We now generate the tensors using PyTorch itself, which carries over the device of the input tensors to the generated tensor.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61073

Reviewed By: H-Huang

Differential Revision: D29668418

Pulled By: malfet

fbshipit-source-id: ce2025c26d079c15603a89b9bf7878f48d73155e
2021-07-13 07:08:17 -07:00
2e49c5dc37 Move GetArgumentNamesModule registration to InterpreterManager() (#61549)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61549

Move GetArgumentNamesModule registration to InterpreterManager() such that the module is a permanent part of the interpreters and can be used by InterpreterSession.global() freely.

Test Plan: [... ~/fbsource/fbcode/caffe2] buck test mode/dev caffe2/fb/predictor:pytorch_predictor_test -- PyTorchDeployPredictor.GetArgumentNames

Reviewed By: wconstab

Differential Revision: D29643460

fbshipit-source-id: cf132d4795cbb334ce164ac715d590a105535508
2021-07-13 00:57:01 -07:00
5144381b1d [pytorch][JIT] Widen exception caught by ScriptList casting (#61520)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61520

This commit widens the exception caught by the try-catch block that checks if
an object passed to a scripted function is a `ScriptList`. It turns out that
there are internal tests that do not throw a `py::cast_error` so catching only
that is not sufficient.

Test Plan: Ran the failing tests in T94889011.

Reviewed By: Chillee

Differential Revision: D29560815

fbshipit-source-id: 442258f8997146d833a9d5db923e1f6359f2bfdd
2021-07-12 23:20:58 -07:00
94840969e4 SGX can not read from /dev/urandom (#60368)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60368

Problem:
The SGX secure enclave does not support reading from /dev/urandom, as it is isolated from the OS for greater security. The SGX API provides a way to generate random numbers as a replacement.
Solution:
Conditionally enable the SGX API for random number generation when building for it.

Test Plan: Run the PyTorch tests

Reviewed By: malfet, LiJihang

Differential Revision: D29022616

fbshipit-source-id: 1c7115457a2abde682df4d55fa4a8446fc5f8613
2021-07-12 20:43:23 -07:00
8a2c7d902f [static runtime] Add DCHECK to ensure that outputs do not overlap with immutable inputs (#61301)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61301

This change adds a `DCHECK` to ensure that outputs do not overlap with immutable inputs.

Test Plan:
Added unittests as follows:

- `ProcessedNode.VerifyOutputsNotOverlappingWithImmutableInputsWithImmutableArguments`
- `ProcessedNode.VerifyOutputsNotOverlappingWithImmutableInputsWithMutableArguments`

Reviewed By: hlu1

Differential Revision: D29564158

fbshipit-source-id: bf14b4978ab544af79010cf724ed28202b4521cc
2021-07-12 18:04:05 -07:00
4ef640d6f6 Sort imports of test_datapipe.py (#61312)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61312

Sorting according to isort output. Alphabetically ordered, one-per-line imports help merging.

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D29588833

Pulled By: VitalyFedyunin

fbshipit-source-id: 4c80c3086132b50894e734ad6c5799d78d689e42
2021-07-12 15:33:20 -07:00
fd13e925ec Adding backward compatibility for sharding support in old DataLoader (#61237)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61237

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D29588832

Pulled By: VitalyFedyunin

fbshipit-source-id: 3bfa4417f6a04450f656ecf28fc95322d2cf076a
2021-07-12 14:53:45 -07:00
d3cb065b2f Implement usage of is_shardable and apply_sharding (#61236)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61236

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D29588835

Pulled By: VitalyFedyunin

fbshipit-source-id: 00c3042f96af498637b2dcf6e3f842c1fc05ddd8
2021-07-12 14:23:20 -07:00
4d842d909b Revert FC workaround for ReflectionPad3d (#61308)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/61248

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61308

Reviewed By: iramazanli

Differential Revision: D29566849

Pulled By: jbschlosser

fbshipit-source-id: 8ab443ffef7fd9840d64d71afc2f2d2b8a410ddb
2021-07-12 14:19:07 -07:00
2fd37a830e Revert D29642893: .github: Add force_on_cpu tests for windows
Test Plan: revert-hammer

Differential Revision:
D29642893 (a52de0dfec)

Original commit changeset: 2dd2b295c71d

fbshipit-source-id: c01c421689f6d01cdfb3fe60a8c6428253249c5f
2021-07-12 14:01:44 -07:00
7fdce39a4b Revert D29642891: .circleci: Remove force_on_cpu jobs from circleci
Test Plan: revert-hammer

Differential Revision:
D29642891 (2aedd17661)

Original commit changeset: d51bb859bc28

fbshipit-source-id: a39a2d57d6e68961d94d4137a57bdc280f9b1b5b
2021-07-12 13:59:39 -07:00
58df01c3b8 clarify default value of requires_grad for tensors (#61038)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61038

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D29491984

Pulled By: dagitses

fbshipit-source-id: 7e6b7f8e81d77f38c881b86a68c17d3cf5483dad
2021-07-12 12:57:37 -07:00
5897a60480 warn about SVD outputs not supporting backprop (#61037)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61037

* **#61037**

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D29491985

Pulled By: dagitses

fbshipit-source-id: 6322e7c86cade52671062ee97d2fcb8c15d8aa86
2021-07-12 12:55:37 -07:00
65ab861ec6 fix mm not correctly report TORCH_CHECK failure issue (#61394)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/61291.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61394

Reviewed By: zhouzhuojie, seemethere

Differential Revision: D29614208

Pulled By: walterddr

fbshipit-source-id: f49a15dde708e30b06059b47fae1cda7c2c3571c
2021-07-12 12:50:51 -07:00
68f9819df4 Typo fix (#41121)
Summary:
Description:
- Typo fix in the docstring

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41121

Reviewed By: heitorschueroff

Differential Revision: D29660228

Pulled By: ezyang

fbshipit-source-id: fc2b55683ec5263ff55c3b6652df3e6313e02be2
2021-07-12 12:43:47 -07:00
255a324258 add nesting_level as attribute to pickle for map datapipe (#61534)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61534

Currently, the attribute `nesting_level` on `MapIterDataPipe` is not pickled. This yields `AttributeError` exceptions when multiprocessing with `DataLoader`.

This diff adds it as an attribute to pickle.
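
A self-contained sketch of the failure mode; the `Mapper` class here is hypothetical, not the real datapipe:

```python
import pickle

class Mapper:
    def __init__(self, fn, nesting_level=0):
        self.fn = fn
        self.nesting_level = nesting_level

    def __getstate__(self):
        return {"fn": self.fn}  # bug: nesting_level omitted from pickled state

restored = pickle.loads(pickle.dumps(Mapper(len)))
try:
    restored.nesting_level
except AttributeError as err:
    print("unpickled object lost the attribute:", err)
```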

Test Plan: confirmed errors go away after change

Reviewed By: ejguan

Differential Revision: D29648655

fbshipit-source-id: 943b57eaff9712eb7ce92f43cb360acdb3111f2b
2021-07-12 11:41:01 -07:00
5144cc029e Bump docker image tag for clang-tidy (#61545)
Summary:
Fixes recent `clang-diagnostic-errors` on clang-tidy runs

See https://github.com/pytorch/test-infra/pull/59

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61545

Reviewed By: malfet, seemethere

Differential Revision: D29664061

Pulled By: 1ntEgr8

fbshipit-source-id: cca482a8774e34e61919f2298846ae0b479bf224
2021-07-12 11:32:39 -07:00
a5a10fe353 Move all downloading logic out of common_utils.py (#61479)
Summary:
and into tools/ folder

Currently run_tests.py invokes tools/test_selections.py, which
1. downloads and analyzes which test files to run, and
2. downloads and parses S3 stats and passes the info to local files.
3. common_utils.py then uses the downloaded S3 stats to determine which test cases to run.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61479

Reviewed By: janeyx99

Differential Revision: D29661986

Pulled By: walterddr

fbshipit-source-id: bebd8c474bcc2444e135bfd2fa4bdd1eefafe595
2021-07-12 11:23:22 -07:00
2aedd17661 .circleci: Remove force_on_cpu jobs from circleci (#61473)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61473

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: walterddr

Differential Revision: D29642891

Pulled By: seemethere

fbshipit-source-id: d51bb859bc28efe15618d1e65f1a1cee64d60508
2021-07-12 11:17:33 -07:00
a52de0dfec .github: Add force_on_cpu tests for windows (#61472)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61472

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: walterddr

Differential Revision: D29642893

Pulled By: seemethere

fbshipit-source-id: 2dd2b295c71d79593ad7f71d6160de4042c08b80
2021-07-12 11:16:17 -07:00
51d18369c3 [1/N] Nnapi backend delegation preprocess (#61499)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61499

Added a preprocess function for the delegate to the Nnapi backend (internal and external files).

In the past we had functions and classes for converting to the Nnapi backend. Now, these functions and classes will be wrapped by the delegate API.

### nnapi_backend_preprocess.cpp:

Contains the preprocess function, which uses Pybind to call an existing python function, `convert_model_to_nnapi()`.
- The model is wrapped by a `RecursiveScriptModule`, so that `convert_model_to_nnapi()` can run correctly, since when jumping from Python to C++ to Python, the model loses its original wrapper.
- A tensor, which includes shape, data type, and quantization information, is passed through preprocess's compile_spec to `convert_model_to_nnapi()`.
- Finally, the Nnapi model is serialized for mobile and returned as a string.
### nnapi_backend_lib.cpp:
Contains stub functions for compile and execute, and is necessary for the Nnapi backend to be registered correctly. These will be implemented in a future PR.

**TODO:** implement execute and compile for the delegate API; throw exceptions for an incorrect compile_spec; add OSS tests
**Testing:** Tests were done locally (see D29647123). A simple module was lowered to Nnapi, saved locally, and examined.

ghstack-source-id: 133415234

Test Plan:
Tests were done locally (see D29647123).
TODO: add test in OSS in test_backends.py after CMake is ready.
I ran buck run caffe2:nnapi_backend_example. The model files are saved as nnapi_model.ptl and mobile_model.ptl. I checked that both zip files have expected contents.

Reviewed By: iseeyuan

Differential Revision: D29563351

fbshipit-source-id: 642e349356e38aecc1b9973c285569650c02668c
2021-07-12 11:13:05 -07:00
3faf6a715d [special] migrate log_softmax (#60512)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/50345

Rendered Docs: https://14335157-65600975-gh.circle-artifacts.com/0/docs/special.html#torch.special.log_softmax

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60512

Reviewed By: iramazanli

Differential Revision: D29626262

Pulled By: mruberry

fbshipit-source-id: c42d4105531ffb004f11f1ba6ae50be19bc02c91
2021-07-12 11:01:25 -07:00
f2857883c4 Add DataPipes Graph Functions (#61235)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61235

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D29588834

Pulled By: VitalyFedyunin

fbshipit-source-id: e0331d6e1fc2a3f8b6211aac83965bcf13165161
2021-07-12 10:28:35 -07:00
25a705610f ENH Adds support for no-batch dim in AdaptiveAvgPool1d (#61264)
Summary:
Towards https://github.com/pytorch/pytorch/issues/60585

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61264

Reviewed By: iramazanli

Differential Revision: D29615292

Pulled By: jbschlosser

fbshipit-source-id: 826d1c87d67261a7211270e90e3a1022bbbe37bd
2021-07-12 10:24:37 -07:00
583b045fc3 Make .contiguous(memory_format) call .clone(memory_format) (#61456)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61456

functorch is unable to `vmap(grad(f))` when `f` contains a `.contiguous`
call. This is because `.contiguous` (when it is not a no-op) decomposes
to `.copy_` under grad and the `.copy_` is not compatible with vmap.

The fix for this is to have `.contiguous` call `.clone` instead of
`.copy_`. `clone` is a primitive w.r.t. to autograd, so `grad`
decomposes contiguous into clone.
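
A minimal sketch, assuming functorch is installed, of the composition this change unblocks (`f` contains a `.contiguous()` call that is not a no-op):
```python
import torch
from functorch import grad, vmap

def f(x):
    return x.t().contiguous().sum()  # .t() makes .contiguous() a real copy

x = torch.randn(2, 3, 4)
per_sample_grads = vmap(grad(f))(x)  # previously failed via the .copy_ decomposition
```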

Perf testing (forward pass)
- [script and
output](https://gist.github.com/zou3519/294f583b9c5d7bdf234d5295f97fb02e)
- The instruction count increased from 774479 to 781379. This is because
we're now calling .clone(), which does an additional dispatch. We could
optimize the implementation of clone() to not dispatch on .copy_() in
the future if we really care about this.

Perf testing (backward pass)
- [script and
output](https://gist.github.com/zou3519/6fbdb121de6342334192d55c8a72276a)
- The instruction count decreased from 5402648 to 5335977. This is
because the [backward for
.clone](9b908ab0d0/tools/autograd/derivatives.yaml (L383))
is a lot simpler than the [backward for
copy_](9b908ab0d0/torch/csrc/autograd/functions/tensor.cpp (L37-L41))
- The backward for .clone() and .copy_() end up doing the same thing for
contiguous (from reading the code above, they both do no-op copies).

Test Plan:
- wait for existing tests (test_view_ops have the tests)
- functorch isn't tested in PyTorch CI yet.
- Taking suggestions on how to write a test for this. I'm thinking we
could use LoggingTensor from #59760 (because it logs underneath
autograd) and test that clone is called instead of copy_ but I didn't
want to refactor it into a utility

Reviewed By: soulitzer

Differential Revision: D29636859

Pulled By: zou3519

fbshipit-source-id: 97eb56bfae1c4bb31612dc9d06536019f21d69a6
2021-07-12 10:19:33 -07:00
5a20c56ebc [static runtime] Remove hasOperation() check (#61496)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61496

glow::FusionGroup is a JitOnlyOperator that produces an Operation when passed a Node*: https://fburl.com/ybwfn3bl

hasOperation() doesn't return true in that case: https://fburl.com/19wd10aw

By removing the hasOperation() check, the Operation gets successfully materialized, and static runtime enables successfully and runs OK. Will check that the outputs match the JIT interpreter.

Test Plan:
Test with 281805158_2
```
./buck-out/gen/admarket/lib/ranking/prediction_replayer/replayer --model_inference_type_target=DISAGG_ACCELERATOR --prediction_replayer_force_model_type=inline_cvr_post_imp_model --prediction_replayer_force_model=281805158_2 --prediction_replayer_target_tier=127.0.0.1:7447 --prediction_replayer_input_stream_filename=/data/users/ansha/tmp/adfinder/filter_requests_inline_cvr_post_imp_model_1000_2021_04_29 --ignore_model_id_mismatch --check_performance --fully_remote_sr_connection_options="overall_timeout:10000000,processing_timeout:10000000" --use_new_encoding_for_ads_services --use_new_encoding_from_model_id_to_shard_id --sigrid_force_model_dir=/data/users/ansha/tmp/adfinder/281805158_2/ --sigrid_predictor_model_suffix=.predictor.disagg.local --use_new_encoding_from_model_id_to_shard_id=true --prediction_replayer_force_model_kind=19 --pytorch_predictor_static_runtime_enable=true --prediction_replayer_target_qps=1
```

```
NNPI_LOG_LEVEL=0 USE_INF_API=1 ./buck-out/gen/sigrid/predictor/sigrid_remote_predictor_glow_nnpi \
  --force_models=281805158_2 \
  --sigrid_predictor_model_suffix=.predictor.disagg.remote_other \
  --gflags_config_path=sigrid/predictor/gflags/predictor_gflags_ads_perf_glow_nnpi_pyper_v1 \
  --smc_server_port=7447 \
  --sigrid_predictor_tier_name=sigrid.predictor.perf.dianshi_staticruntime_debug_0604.test.storage \
  --predictor_storage_smc_tier=sigrid.predictor.perf.dianshi_staticruntime_debug_0604.test.storage \
  --predictor_storage_smc_tier_v2=sigrid.predictor.perf.dianshi_staticruntime_debug_0604.test.storage \
  --torch_glow_min_fusion_group_size=30 \
  --glow_enable_sanitize_inputs=100 \
  --sigrid_force_model_dir=/data/users/ansha/tmp/adfinder/281805158_2/ \
  --pytorch_predictor_static_runtime_enable=true \
  --pytorch_predictor_glow_enable=true \
  --pytorch_predictor_enable_loading_xl_format_on_cpu=false \
  --pytorch_disagg_acc_input_dump_path=/tmp/
```

Reviewed By: hlu1

Differential Revision: D29647043

fbshipit-source-id: 8ce6dc0f4f0464b65ca6a8c9d42e3d8bb392e66e
2021-07-12 10:09:33 -07:00
99959fe3f5 [DataLoader] Adding demux and mux DataPipe-s (#61234)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61234

* **#61234 [WIP] Adding demux and mux DataPipe API examples**

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D29588836

Pulled By: VitalyFedyunin

fbshipit-source-id: 523d12ea6be7507d706b4c6d8827ec1ac4ccabc3
2021-07-12 10:04:03 -07:00
d46689a201 OpInfo reference tests for add and sub (#61169)
Summary:
This PR adds OpInfo reference checks for `add, sub`. See https://github.com/pytorch/pytorch/issues/54261

cc: mruberry pmeier

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61169

Reviewed By: iramazanli

Differential Revision: D29625702

Pulled By: mruberry

fbshipit-source-id: c5e536ab52865890990353c5c862b44b5a16ed20
2021-07-12 09:27:22 -07:00
c18017190b Relax some linalg test tolerances (#61101)
Summary:
We are seeing some test failures on an A100 machine, though TF32 matmul is not involved in these cases.

I tried the `svd_lowrank` test. It passed when run on its own but failed when I ran the whole test suite; it's probably a random-seed issue. Relaxing the test tolerance is much easier to do.

Some SVD tests failed when comparing CPU float32 against GPU float32. Since linear algebra routines are somewhat unstable at single precision, comparing two single-precision results may give false positives, so we now compute the CPU reference in float64 or complex128, which is much more accurate.
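
A sketch of the double-precision reference pattern described above (the choice of op is illustrative):
```python
import torch

a = torch.randn(50, 50, device="cuda")
_, s_gpu, _ = torch.svd(a)                 # float32 result under test
_, s_ref, _ = torch.svd(a.cpu().double())  # float64 CPU reference
torch.testing.assert_allclose(s_gpu.cpu(), s_ref.float(), rtol=1e-4, atol=1e-5)
```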

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61101

Reviewed By: ngimel

Differential Revision: D29593483

Pulled By: mruberry

fbshipit-source-id: 3df651e3cca1b0effc1a4ae29d4f26b1cb4082ed
2021-07-12 09:17:59 -07:00
bacf8ecbd1 Make pin_memory/is_pinned use BackendSelect (#60547)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60547

These now dispatch on the optional Device argument, which specifies
what device you want to pin for.  We now directly register pinned
memory implementations for CUDA specifically, eliminating the need
for extra virtual methods.

This makes it possible for other backends to override the behavior
of pinned memory, c.f. https://github.com/pytorch/pytorch/pull/59291
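
Typical usage is unchanged; a quick sketch of pinning for the default (CUDA) backend:
```python
import torch

x = torch.empty(1024)
if torch.cuda.is_available():
    xp = x.pin_memory()  # pins host memory for CUDA by default
    assert xp.is_pinned()
    y = xp.to("cuda", non_blocking=True)  # pinned source allows an async copy
```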

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: albanD, bdhirsh

Differential Revision: D29331881

Pulled By: ezyang

fbshipit-source-id: db3b4e2c872ba1caa0243fecc60a4da65179ce28
2021-07-12 09:13:14 -07:00
7136a62b56 Add expecttest to CONTRIBUTING.md (#61163)
Summary:
expecttest is now an independent library, but `CONTRIBUTING.md` and `requirements.txt` do not mention that it is needed.

Related: https://github.com/pytorch/pytorch/pull/60658

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61163

Reviewed By: heitorschueroff

Differential Revision: D29660296

Pulled By: ezyang

fbshipit-source-id: e2e86d42526c83bec7cdf7221e19fe83d9686103
2021-07-12 09:11:12 -07:00
8754238410 torch._utils.ExceptionWrapper: fix for Exceptions with multiple args (#58131)
Summary:
Here's an example of what this PR should fix:
```
from torch._utils import ExceptionWrapper

class TwoArgException(Exception):
    def __init__(self, msg, count): ...

# If you need a "real world" exception with two args, here's one from the stdlib:
# import asyncio
# TwoArgException = asyncio.exceptions.LimitOverrunError
# or if on Python 3.7, try:
# TwoArgException = asyncio.streams.LimitOverrunError

try:
    raise TwoArgException("oh no", 0)
except Exception as e:
    data = ExceptionWrapper(where="in a test case")

data.reraise()
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58131

Reviewed By: heitorschueroff

Differential Revision: D29660248

Pulled By: ezyang

fbshipit-source-id: cbcecfee9cac183354542e147ee3d956038c8986
2021-07-12 09:04:36 -07:00
93d98ecef7 update the pytorch-gdb example so that it works on current master (#61175)
Summary:
As pointed out by https://github.com/pytorch/pytorch/pull/54339#issuecomment-872827580, the `pytorch-gdb` example is currently broken because the code has been refactored.

This PR updates the example so that it works again.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61175

Reviewed By: heitorschueroff

Differential Revision: D29660336

Pulled By: ezyang

fbshipit-source-id: 8bcd32fc583c0b28a705ef37203ce7ad4d636732
2021-07-12 08:57:18 -07:00
cyy
0de35fe039 fix return local reference (#59913)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59913

Reviewed By: soulitzer

Differential Revision: D29107110

Pulled By: ezyang

fbshipit-source-id: c0f9888867c7dfeb05f6a3b9d2067df35e1e3ffb
2021-07-12 08:29:32 -07:00
d4549ba5dc Add VS_VERSION to Circle (#61532)
Summary:
Fixes current HUD 10.1 failure https://app.circleci.com/pipelines/github/pytorch/pytorch/349359/workflows/ead2904b-3f37-4c9d-b271-a8e772046523/jobs/14713215

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61532

Test Plan: The new 10.1 CI run: https://app.circleci.com/pipelines/github/pytorch/pytorch/349677/workflows/b7143b56-e8e7-4f85-8bdf-0ce50788f3c0/jobs/14727686

Reviewed By: walterddr

Differential Revision: D29661179

Pulled By: janeyx99

fbshipit-source-id: 5023c41fe6ddce4113116b07d8f0fd7d66c864a8
2021-07-12 08:21:02 -07:00
cyy
00c4897c51 use make_unique (#61272)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61272

Reviewed By: pbelevich

Differential Revision: D29660354

Pulled By: ezyang

fbshipit-source-id: f0aba1ea6983aec415915ed9b7dbced2e2b3b171
2021-07-12 08:09:46 -07:00
ac086ca15b Update version.txt file path (#61177)
Summary:
The file version.txt is located one directory above generate_torch_version; some platforms are unable to find this file unless given an explicit path.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61177

Reviewed By: pbelevich

Differential Revision: D29660334

Pulled By: ezyang

fbshipit-source-id: f66105f782aaff031e373f96a69baabb13c89337
2021-07-12 07:30:10 -07:00
09679af260 Delete dead code in Tensor::to implementation (#61435)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61435

Deleted the following:
- I couldn't find the NOTE mentioned so I deleted the reference to it
- The memory_format check (because it always passes)
- The requires_grad check (because it always passes)

Test Plan: - run tests

Reviewed By: soulitzer

Differential Revision: D29636872

Pulled By: zou3519

fbshipit-source-id: 48a32c1821b72c512d337becf2398ce7f4cf01a2
2021-07-12 07:10:27 -07:00
60086ab39b Remove export PYTHONPATH hacks (#61487)
Summary:
Remove `export PYTHONPATH=$PWD` in favor of `-m`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61487

Test Plan: Let's see if CI passes

Reviewed By: 1ntEgr8

Differential Revision: D29645544

Pulled By: janeyx99

fbshipit-source-id: 841aea8ebed2cb1c7dbc68754b5fbdee932559c2
2021-07-12 06:59:50 -07:00
5c1505076b [Codemod][FBSourceBlackLinter] Daily arc lint --take BLACK
Reviewed By: zertosh

Differential Revision: D29656934

fbshipit-source-id: c40bbc8e4512b145050ee47db2c8dc781f3c36e9
2021-07-12 04:15:21 -07:00
666dff381d add AdaptiveAvgPooling2D (#61239)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61239

Test Plan: Imported from OSS

Reviewed By: iramazanli

Differential Revision: D29626359

Pulled By: migeed-z

fbshipit-source-id: b7cd4ce4176e2d6e7a853974443affd23a49d3d9
2021-07-10 20:07:14 -07:00
93ef40bd83 add linear operation and modify one of the tests (#61238)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61238

Test Plan: Imported from OSS

Reviewed By: iramazanli

Differential Revision: D29626333

Pulled By: migeed-z

fbshipit-source-id: d4303918e380d64ba8ab678f249db6674e89357a
2021-07-10 20:07:12 -07:00
292ee65261 add maxpool2D, add more tests, handle integer parameters for maxpool2D (#61188)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61188

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D29626303

Pulled By: migeed-z

fbshipit-source-id: 32309cd1eb1189beaba63017653b3aeccdf2761d
2021-07-10 20:06:07 -07:00
7a15576a65 [quant] update FakeQuant modules to use tensor qparams (#61318)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61318

Remove the `float()` and `int()` calls in the forward function so that we can directly use the tensor qparams in the fake_quantize operator.

Calling `float()/int()` internally calls `item()`, which can trigger a GPU -> CPU copy if the original tensors reside on the GPU.
Local benchmark P427668213
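
For illustration, the sync being avoided is the implicit device-to-host copy that `item()` performs:
```python
import torch

if torch.cuda.is_available():
    scale = torch.tensor([0.1], device="cuda")
    s = scale.item()  # forces a GPU -> CPU copy (and a device sync)
```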

Before this change
```
                                               Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
---------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                     aten::_aminmax         2.57%       1.507ms         3.10%       1.819ms      36.371us       2.872ms         4.81%       2.872ms      57.446us            50
              aten::fake_quantize_per_tensor_affine         1.04%     610.915us         3.60%       2.114ms      42.276us     472.896us         0.79%       2.698ms      53.962us            50
    aten::fake_quantize_per_tensor_affine_cachemask         1.69%     993.626us         2.56%       1.503ms      30.058us       2.225ms         3.73%       2.225ms      44.504us            50
                                   aten::is_nonzero         3.85%       2.258ms        19.68%      11.540ms      46.161us       2.168ms         3.63%      11.084ms      44.336us           250
                                   aten::zeros_like         1.82%       1.064ms         6.65%       3.901ms      39.007us       1.531ms         2.57%       3.905ms      39.045us           100
                                           aten::eq        13.80%       8.093ms        25.90%      15.189ms      37.972us       9.580ms        16.05%      15.566ms      38.914us           400
                                         aten::item         5.67%       3.323ms        21.50%      12.607ms      36.019us       3.233ms         5.42%      12.167ms      34.762us           350
                                        aten::zeros         0.94%     549.208us         2.93%       1.717ms      34.343us     688.928us         1.15%       1.695ms      33.894us            50
                                           aten::le         2.52%       1.478ms         4.50%       2.641ms      26.411us       1.753ms         2.94%       2.845ms      28.448us           100
                                         aten::rsub         1.04%     608.715us         2.44%       1.433ms      28.667us     532.000us         0.89%       1.418ms      28.353us            50
                                          aten::max         1.54%     905.401us         4.62%       2.711ms      27.106us     847.488us         1.42%       2.697ms      26.969us           100
                                         aten::ones         0.92%     542.159us         2.16%       1.266ms      25.324us     661.856us         1.11%       1.301ms      26.017us            50
                                          aten::min         0.82%     479.167us         2.15%       1.258ms      25.160us     407.808us         0.68%       1.276ms      25.530us            50
                          aten::_local_scalar_dense        15.83%       9.284ms        15.83%       9.284ms      26.526us       8.934ms        14.97%       8.934ms      25.524us           350
                                        aten::clamp         2.35%       1.378ms         4.21%       2.467ms      24.669us       1.546ms         2.59%       2.461ms      24.612us           100
                                        aten::zero_         2.53%       1.482ms         5.65%       3.316ms      22.108us       1.326ms         2.22%       3.380ms      22.531us           150
                                      aten::maximum         3.08%       1.805ms         3.08%       1.805ms      18.052us       1.849ms         3.10%       1.849ms      18.494us           100
                                      aten::minimum         1.33%     778.854us         1.33%     778.854us      15.577us     868.672us         1.46%     868.672us      17.373us            50
                                        aten::round         1.36%     799.910us         1.36%     799.910us      15.998us     809.568us         1.36%     809.568us      16.191us            50
                                        aten::copy_         6.61%       3.878ms         6.61%       3.878ms      15.513us       4.036ms         6.76%       4.036ms      16.143us           250
                                          aten::div         2.53%       1.483ms         2.53%       1.483ms      14.833us       1.535ms         2.57%       1.535ms      15.353us           100
                                          aten::mul         2.44%       1.431ms         2.44%       1.431ms      14.314us       1.478ms         2.48%       1.478ms      14.782us           100
                                       aten::detach         1.46%     855.670us         2.41%       1.411ms      14.110us     832.448us         1.39%       1.395ms      13.949us           100
                                          aten::add         2.22%       1.301ms         2.22%       1.301ms      13.008us       1.383ms         2.32%       1.383ms      13.828us           100
                                        aten::fill_         4.18%       2.452ms         4.18%       2.452ms      12.262us       2.693ms         4.51%       2.693ms      13.463us           200
                                          aten::sub         5.06%       2.967ms         5.06%       2.967ms      14.837us       2.675ms         4.48%       2.675ms      13.374us           200
                                           aten::to         2.10%       1.230ms         3.65%       2.140ms      10.701us       1.310ms         2.20%       2.062ms      10.310us           200
                                       aten::select         1.28%     749.144us         1.49%     874.227us       8.742us     863.232us         1.45%     863.232us       8.632us           100
                                             detach         0.95%     555.326us         0.95%     555.326us       5.553us     562.496us         0.94%     562.496us       5.625us           100
                                   aten::as_strided         0.40%     232.289us         0.40%     232.289us       1.161us       0.000us         0.00%       0.000us       0.000us           200
                                        aten::empty         2.93%       1.720ms         2.93%       1.720ms       3.439us       0.000us         0.00%       0.000us       0.000us           500
                                      aten::resize_         1.04%     611.313us         1.04%     611.313us       2.038us       0.000us         0.00%       0.000us       0.000us           300
                                   aten::empty_like         0.75%     438.585us         1.77%       1.036ms       5.180us       0.000us         0.00%       0.000us       0.000us           200
                                aten::empty_strided         1.36%     799.442us         1.36%     799.442us       3.198us       0.000us         0.00%       0.000us       0.000us           250
---------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 58.645ms
Self CUDA time total: 59.674ms
```

After this change
```

test_fake_quant_profiler (scripts.supriyar.benchmark.module_bench.ProfilerBench) ... -------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                  aten::fake_quantize_per_tensor_affine         0.98%     505.210us         4.38%       2.259ms      45.187us     419.424us         0.78%       3.218ms      64.367us            50
                                         aten::_aminmax         2.78%       1.434ms         3.42%       1.766ms      35.321us       2.825ms         5.27%       2.825ms      56.505us            50
aten::fake_quantize_per_tensor_affine_cachemask_tens...         2.38%       1.229ms         3.40%       1.754ms      35.083us       2.799ms         5.22%       2.799ms      55.979us            50
                                             aten::rsub         0.94%     485.040us         5.02%       2.590ms      51.793us     458.976us         0.86%       2.587ms      51.747us            50
                                       aten::is_nonzero         3.78%       1.952ms        23.64%      12.196ms      48.786us       2.055ms         3.83%      11.986ms      47.944us           250
                                             aten::item         6.92%       3.572ms        19.86%      10.244ms      40.977us       3.670ms         6.85%       9.931ms      39.724us           250
                                       aten::zeros_like         1.65%     848.874us         6.64%       3.426ms      34.260us       1.397ms         2.61%       3.572ms      35.717us           100
                                            aten::zeros         0.85%     436.691us         3.00%       1.549ms      30.984us     551.936us         1.03%       1.576ms      31.516us            50
                                               aten::eq        10.60%       5.467ms        20.26%      10.452ms      26.130us       7.018ms        13.09%      10.832ms      27.079us           400
                                               aten::le         2.58%       1.332ms         4.67%       2.407ms      24.074us       1.580ms         2.95%       2.614ms      26.144us           100
                              aten::_local_scalar_dense        12.93%       6.673ms        12.93%       6.673ms      26.691us       6.261ms        11.68%       6.261ms      25.046us           250
                                            aten::clamp         2.43%       1.253ms         4.37%       2.256ms      22.560us       1.431ms         2.67%       2.273ms      22.725us           100
                                             aten::ones         0.89%     460.133us         2.18%       1.123ms      22.467us     570.496us         1.06%       1.128ms      22.551us            50
                                              aten::min         0.74%     383.132us         2.06%       1.065ms      21.296us     377.536us         0.70%       1.091ms      21.824us            50
                                            aten::zero_         2.36%       1.219ms         5.87%       3.029ms      20.194us       1.261ms         2.35%       3.199ms      21.327us           150
                                              aten::max         1.51%     779.081us         4.06%       2.096ms      20.960us     791.680us         1.48%       2.130ms      21.295us           100
                                              aten::sub         7.97%       4.111ms         7.97%       4.111ms      20.556us       3.847ms         7.18%       3.847ms      19.234us           200
                                              aten::div         2.94%       1.516ms         2.94%       1.516ms      15.158us       1.580ms         2.95%       1.580ms      15.798us           100
                                            aten::round         1.45%     750.445us         1.45%     750.445us      15.009us     756.064us         1.41%     756.064us      15.121us            50
                                            aten::copy_         6.88%       3.548ms         6.88%       3.548ms      14.190us       3.701ms         6.90%       3.701ms      14.803us           250
                                          aten::minimum         1.32%     681.654us         1.32%     681.654us      13.633us     713.664us         1.33%     713.664us      14.273us            50
                                          aten::maximum         2.55%       1.317ms         2.55%       1.317ms      13.169us       1.338ms         2.50%       1.338ms      13.378us           100
                                              aten::mul         2.63%       1.358ms         2.63%       1.358ms      13.581us       1.328ms         2.48%       1.328ms      13.283us           100
                                           aten::detach         1.34%     688.820us         2.35%       1.211ms      12.110us     772.800us         1.44%       1.278ms      12.779us           100
                                            aten::fill_         4.53%       2.338ms         4.53%       2.338ms      11.692us       2.495ms         4.65%       2.495ms      12.473us           200
                                              aten::add         2.32%       1.197ms         2.32%       1.197ms      11.968us       1.240ms         2.31%       1.240ms      12.405us           100
                                               aten::to         2.07%       1.069ms         3.66%       1.889ms       9.443us       1.224ms         2.28%       1.975ms       9.874us           200
                                           aten::select         1.44%     743.042us         1.64%     848.207us       8.482us     641.600us         1.20%     641.600us       6.416us           100
                                                 detach         1.01%     522.155us         1.01%     522.155us       5.222us     505.088us         0.94%     505.088us       5.051us           100
                                       aten::as_strided         0.44%     227.884us         0.44%     227.884us       1.139us       0.000us         0.00%       0.000us       0.000us           200
                                            aten::empty         3.20%       1.652ms         3.20%       1.652ms       3.304us       0.000us         0.00%       0.000us       0.000us           500
                                          aten::resize_         1.25%     646.711us         1.25%     646.711us       2.156us       0.000us         0.00%       0.000us       0.000us           300
                                       aten::empty_like         0.79%     407.768us         2.07%       1.067ms       5.334us       0.000us         0.00%       0.000us       0.000us           200
                                    aten::empty_strided         1.52%     785.788us         1.52%     785.788us       3.143us       0.000us         0.00%       0.000us       0.000us           250
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 51.590ms
Self CUDA time total: 53.609ms
ghstack-source-id: 133370215

Test Plan: buck test mode/dev-nosan caffe2/test/:quantization

Reviewed By: raghuramank100

Differential Revision: D29566512

fbshipit-source-id: 1aefca51f99949da7334bcfe504848275c9f952c
2021-07-10 19:43:02 -07:00
99848c7269 [quant] Add tensor_qparam variant to fake_quantize_per_tensor (#61317)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61317

Add an overload to fake_quantize_per_tensor that accepts scale/zero_point as input. The reasons to do this are:

* required for the fused observer + fake_quant operator on GPU, where the scale/zero_point will be calculated by the observer on device. Passing tensor inputs enables us to directly access the scale/zero-point values in the CUDA kernel and avoid extra copies/mallocs
* enables us to pass in float as scale dtype and int32 as zero_point dtype (which is consistent with what the quantize call actually uses) https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/quantized/affine_quantizer_base.cpp#L52-L53
* overload consistent with `quantizer_per_tensor.tensor_qparams`
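
A sketch of calling the new overload with tensor qparams (float scale, int32 zero_point, as described above); the quant range here is an arbitrary uint8 choice:
```python
import torch

x = torch.randn(4)
scale = torch.tensor(0.1)
zero_point = torch.tensor(0, dtype=torch.int32)
y = torch.fake_quantize_per_tensor_affine(x, scale, zero_point, 0, 255)
```
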
ghstack-source-id: 133370216

Test Plan:
buck test mode/dev-nosan caffe2/test/:quantization -- test_backward_per_tensor_cachemask
buck test mode/dev-nosan caffe2/test/:quantization -- test_forward_per_tensor_cachemask

Reviewed By: raghuramank100

Differential Revision: D29552727

fbshipit-source-id: cbb9af40fc575ad27a29c646b760d5ee52cc923d
2021-07-10 19:41:55 -07:00
57676ce128 Migrate multi_margin_loss to ATen (CUDA) (#61426)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61426

Closes gh-24600, closes gh-24601

These operators use custom kernels that aren't well suited to the `TensorIterator` style, so this just migrates the CUDA code and cleans up the style.

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D29648015

Pulled By: ngimel

fbshipit-source-id: cadf1890cdc2199d57f4533370e554613efeb54a
2021-07-10 18:48:58 -07:00
5a17cb6f44 Add channels-last support for bilinear and nearest 2d interpolation on CUDA (#56322)
Summary:
Add channels-last support for bilinear and nearest 2d interpolation on CUDA

Benchmark (on 2070 Super) is available at

- nearest 2d: https://github.com/xwang233/code-snippet/tree/master/interpolate-channels-last/nearest-2d
- bilinear: https://github.com/xwang233/code-snippet/tree/master/interpolate-channels-last/bilinear

Some regressions are seen for tensors with small channel sizes. We may add a heuristic to dispatch between the contiguous and channels-last paths if needed.

Close https://github.com/pytorch/pytorch/issues/60137

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56322

Reviewed By: mruberry

Differential Revision: D29645980

Pulled By: ngimel

fbshipit-source-id: c36dff4ee4789bec9b01da4029f326d30067c6b7
2021-07-10 18:00:50 -07:00
df00c636d2 [Model Averaging] Skip model averaging for the first K steps (#61207)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61207

The model averager must now be combined with the post-localSGD DDP communication hook. It will skip model averaging for the first K steps, because the post-localSGD communication hook runs global gradient averaging during that phase.

Proposal: https://github.com/pytorch/pytorch/issues/59699
ghstack-source-id: 133371335

Test Plan: buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_periodic_model_averager

Reviewed By: pritamdamania87

Differential Revision: D29523738

fbshipit-source-id: 3fa9611046e1c0afa4bda78aa3ba200fa2a5fa4b
2021-07-10 17:12:16 -07:00
0f6876d721 [Model Averaging] Create a post-localSGD communication hook (#61206)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61206

Create a communication hook to run post-local SGD. This will be combined with model averager component to better support local SGD.

In contrast to the previous approach, which ran local gradient averaging + global model averaging at each of the first K steps, we now plan to run only global gradient averaging at each of the first K steps, just like normal DDP. This gives us two advantages (see the sketch after this list):
1) For some optimizers, model averaging can cause discrepancies in optimizer states. If we still do global gradient averaging for the first K steps, we can defer such discrepancies until we actually start local SGD.
2) Gradient averaging during the first K steps runs only one allreduce that overlaps with the backward pass, so it should also be more efficient.
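
A minimal sketch of wiring the hook together with the averager from the previous commit; the module paths and argument names follow later public releases and should be treated as assumptions here:
```python
import torch.distributed.algorithms.ddp_comm_hooks.post_localSGD_hook as post_localSGD
import torch.distributed.algorithms.model_averaging.averagers as averagers

# ddp_model is an existing torch.nn.parallel.DistributedDataParallel instance
state = post_localSGD.PostLocalSGDState(
    process_group=None, subgroup=None, start_localSGD_iter=100)
ddp_model.register_comm_hook(state, post_localSGD.post_localSGD_hook)
averager = averagers.PeriodicModelAverager(period=4, warmup_steps=100)
```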

Proposal: https://github.com/pytorch/pytorch/issues/59699
ghstack-source-id: 133371322

Test Plan: buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_ddp_hook_parity_post_localSGD

Reviewed By: pritamdamania87

Differential Revision: D29523292

fbshipit-source-id: 3f215f7150f2917c2781278fad759530c685ea2c
2021-07-10 17:11:10 -07:00
a46d4212bf Allow dims=0 in torch.tensordot call (#61331)
Summary:
In one of my previous PRs that rewrote the `tensordot` implementation, I mistakenly treated empty `dims_a` and `dims_b` as illegal values. That turns out not to be true: empty `dims_a` and `dims_b` are supported, and in fact common when `dims` is passed as an integer. This PR removes the unnecessary check.

Fixes https://github.com/pytorch/pytorch/issues/61096
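
A quick check of the now-allowed case; `dims=0` performs no contraction, i.e. an outer product:
```python
import torch

a, b = torch.randn(2, 3), torch.randn(4)
out = torch.tensordot(a, b, dims=0)
assert out.shape == (2, 3, 4)
```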

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61331

Reviewed By: eellison

Differential Revision: D29578910

Pulled By: gmagogsfm

fbshipit-source-id: 96e58164491a077ddc7a1d6aa6ccef8c0c9efda2
2021-07-10 17:05:20 -07:00
7d7b7abb3b [Static Runtime] Separate function for getting always_alive values (#61506)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61506

Separate out the logic of GetAlwaysAliveValues from GetLivenessMap to simplify the code structure. This also means you don't need to run GetLivenessMap if optimize_memory is turned off.

Reviewed By: ajyu

Differential Revision: D29423534

fbshipit-source-id: dbdeeb10f7bcad86a24aa12f741f7c9ab946bb3b
2021-07-10 16:59:29 -07:00
7fdc5f9e08 model_dump: Fix non-counting and double-counting bugs in tensor memory (#60702)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60702

- Instead of traversing and counting all tensor memory, collect a map
  from storage key to storage info while traversing.  Add up sizes at
  the end to avoid double counting.
- Count tensor memory from constants as well.
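
A sketch (hypothetical helper, not the actual model_dump code) of the dedup-by-storage idea: tensors sharing a storage are counted once.
```python
import torch

def total_tensor_bytes(tensors):
    storages = {}
    for t in tensors:
        s = t.storage()
        storages[s.data_ptr()] = s.size() * s.element_size()
    return sum(storages.values())

base = torch.zeros(1000)
print(total_tensor_bytes([base, base[:10], base.view(10, 100)]))  # 4000, counted once
```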

Test Plan: Ran webdriver test.

Reviewed By: dhruvbird

Differential Revision: D29380396

Pulled By: dreiss

fbshipit-source-id: 6d0fd66f677fe23c851aa218387aa4dc59502b1e
2021-07-10 15:16:34 -07:00
158d351517 model_dump: Add webdriver test (#60701)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60701

The unit test previously only tested that the dump could complete
successfully.  It was not able to verify that any JS worked properly.
Now we can test the JS as long as webdriver is installed.

Tweaked the implementation of Hider a bit to make it easier for tests to
find and open them.

I disabled the tests by default since I don't want to deal with
webdriver in CI.  Enable them with the environment variable
RUN_WEBDRIVER=1.

We could make the tests use headless mode, but it's kind of fun to watch
them run.

Add a test to verify that tensor memory computation is working for the
simple model.

Test Plan: Ran the test.

Reviewed By: dhruvbird

Differential Revision: D29380398

Pulled By: dreiss

fbshipit-source-id: f19d0b05d79ad5a8231e85422976f1889e021c89
2021-07-10 15:16:32 -07:00
cc78c463c0 model_dump: Render constants.pkl similar to data.pkl (#60700)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60700

Test Plan:
Dumped a model with a lot of constants (qconvs produced by optimizing).
Was able to see them rendered nicely.

Reviewed By: dhruvbird

Differential Revision: D29380400

Pulled By: dreiss

fbshipit-source-id: c951508b92bb2717591dd173282157e1a40a30bd
2021-07-10 15:16:31 -07:00
e292f34def model_dump: Make stdout argument for main a keyword-only argument (#60699)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60699

Also add a unit test for main, which brings the test coverage up to
~98%.  Also factor out the "needs importlib.resources" check into a
function for easier reuse.

Test Plan: CI

Reviewed By: dhruvbird

Differential Revision: D29380397

Pulled By: dreiss

fbshipit-source-id: bba16da85bf7bfb4370308e38c844694d01b47eb
2021-07-10 15:16:29 -07:00
2942e9aa80 model_dump: update maintainer comment (#60698)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60698

... to reflect that the Python command should be re-run when changing
the model.

Test Plan: CI

Reviewed By: dhruvbird

Differential Revision: D29380399

Pulled By: dreiss

fbshipit-source-id: 1ec464da4ebe6ddf400eb4a3b14da683369c0039
2021-07-10 15:15:15 -07:00
f5c10fdbd3 Allow for heterogenous List and Dict values + Improve container typing algorithm (#57137)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57137

This PR corrects and expands our typing algorithm for unannotated, non-empty dicts and lists. Previously, to verify type correctness for an unannotated, non-empty container, we had gotten the type of the first element in the container, then checked if each following element was a subtype of the first type. That's too restrictive--what if the first element were a subtype of the second element? Instead, we should type the container by getting the smallest common supertype of all the given elements.

We need slightly different rules for keys and values in dicts, though: because the set of key types is restricted, finding two key types that cannot be unified should cause an error. On the other hand, the set of value types is not restricted, so we should be able to use `Any` as a valid supertype. We need to keep the set of keys restricted since the keys are used to generate and match schemas.

This does not break backwards compatibility, because the default element type is the smallest supertype of all the given types. So, if someone creates an unannotated dict where the keys are all `str` and the values are all `torch.Tensor`, the dict will be inferred to `Dict[str, Tensor]` just like it was before. Empty lists are still typed as `List[torch.Tensor],` and empty dicts are still typed as `Dict[str, Tensor]`.
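
A sketch of the new rule in action; the exact inferred types are my reading of the algorithm above, not taken from the PR's tests:
```python
import torch

@torch.jit.script
def fn(x: torch.Tensor):
    vals = [None, x]      # unifies to List[Optional[Tensor]] under the new rule
    d = {"a": x, "b": 1}  # value types don't unify, so values fall back to Any
    return len(vals), len(d)
```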

This PR unblocks three engineers on an FB-internal team and improves FX-TorchScript compatibility.

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D28231839

Pulled By: ansley

fbshipit-source-id: 7297bf239749daa54895add708185c75e6ca5999
2021-07-10 14:29:05 -07:00
ccd0977060 [Static Runtime] Support prim::GetAttr/SetAttr (#61505)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61505

The handling of `self` in static runtime was previously incorrect. This diff fixes that issue, since `self` is essential to prim::GetAttr/SetAttr. After all, most of the time we're getting and setting attributes on `self`, the TorchScript module.

Reviewed By: ajyu

Differential Revision: D29350173

fbshipit-source-id: 6e62add4cda517ef8cd6c315d4cb0595e7d531fb
2021-07-10 14:06:06 -07:00
f291b1899f Revert D27978269: Smart Decay for Adam - Caffe2
Test Plan: revert-hammer

Differential Revision:
D27978269 (aaa1e07609)

Original commit changeset: e47524101ddf

fbshipit-source-id: 334824bbf9a6ed788e75af9c292754081f70a19b
2021-07-10 13:09:58 -07:00
8bcf24b37a [TCPStore] enhance connect timeout error message (#61390)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61390

Enhances this error message for better debuggability.
ghstack-source-id: 133185482

Test Plan: CI

Reviewed By: H-Huang

Differential Revision: D29601528

fbshipit-source-id: f7aaf4d67ac96e6ed0b535e0200f918dd01e42f9
2021-07-10 03:57:23 -07:00
336970c03e Add note on torch.distributed backends on ROCm (#58975)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58975

Reviewed By: soulitzer

Differential Revision: D29595510

Pulled By: rohan-varma

fbshipit-source-id: 384bb67fcd003d65b76e957a474406b2a38099b9
2021-07-10 03:51:19 -07:00
73b86c9f9c Add getMethod to PytorchPredictorContainer (#61052)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61052

Implement getMethod in the container in a similar way to getPredictor,
using either Deploy or Script functionality depending on how the container
was initialized and how the gflag deploy overrides are set.

Test Plan: Add new unit test

Reviewed By: houseroad

Differential Revision: D29346969

fbshipit-source-id: 08e95ee96d533f5a7cc9c8f9b1c53751715c9181
2021-07-09 22:27:40 -07:00
677313b670 ReLU (#61150)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61150

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D29625826

Pulled By: migeed-z

fbshipit-source-id: 10e0662e33ccd4342cedd51579a10651755b633f
2021-07-09 19:32:08 -07:00
a556c1c4dc [profiler] Update Kineto submodule (ci-all) (#61478)
Summary:
Update Kineto submodule

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61478

Test Plan:
CI

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61432

Reviewed By: gdankel

Differential Revision: D29646019

Pulled By: ilia-cher

fbshipit-source-id: 02ecb0a2a6b457f6537c7d6b3c475e1e0ace3b6f
2021-07-09 19:32:06 -07:00
06166a13e0 Remove VS install step unless necessary from GHA Windows workflows (#60791)
Summary:
~~This should only be merged after our AMI has been deployed after https://github.com/fairinternal/pytorch-gha-infra/pull/1. (And will likely fail our current windows jobs)~~

I have revised this PR to install VS only when it's not already installed.

This should save ~5min per Windows workflow.
![image](https://user-images.githubusercontent.com/31798555/125141598-7e886c80-e0e3-11eb-9fe0-bb9e6bcc14f1.png)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60791

Reviewed By: soulitzer

Differential Revision: D29643876

Pulled By: janeyx99

fbshipit-source-id: 4bcfaf5bcad9e5636a1624c3e799e7cc97a87660
2021-07-09 19:32:04 -07:00
9b2b45919a Revert D29639797: [package] error if we try to mock a module in 3.6
Test Plan: revert-hammer

Differential Revision:
D29639797

Original commit changeset: 775ed78638fb

fbshipit-source-id: 9d2f6dae7ee35c6b37338e36ec7ade9d9e2ccbc2
2021-07-09 19:31:04 -07:00
aaa1e07609 Smart Decay for Adam - Caffe2 (#61488)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61488

We want to decay learning parameters properly. Previously this was not done when a parameter was absent from a minibatch. We fix this by keeping track of missed minibatches and making the decay catch up accordingly.

The exponential moving averages (EMAs) for the first and second moments used in Adam are updated only for parameters seen in a minibatch. In fact, for parameters absent from a minibatch, 0 should be added to the EMAs and the EMAs should then be decayed by multiplying by beta1 and beta2 respectively.

To avoid the computational overhead of touching every parameter for every minibatch, we:
* keep track of the last time a parameter is seen
* instead of decaying the EMAs by multiplying by beta1 and beta2, we multiply by beta1^k and beta2^k, where k is the number of minibatches since the parameter was last seen.
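
A minimal sketch (standalone Python, not the Caffe2 kernel) of the catch-up decay the bullets above describe:
```python
def smart_decay_step(m, v, grad, step, last_seen, beta1=0.9, beta2=0.999):
    k = step - last_seen                     # minibatches since param was last seen
    m = beta1 ** k * m + (1 - beta1) * grad  # decay EMA by beta1^k, then update
    v = beta2 ** k * v + (1 - beta2) * grad * grad
    return m, v, step                        # caller records `step` as new last_seen
```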

Differential Revision: D27978269

fbshipit-source-id: e47524101ddfcb281c46c505b9b7a8f0835bc64a
2021-07-09 18:28:21 -07:00
b52909d861 [TensorExpr] Add python bindings for ArgValue class and TensorExprKernel constructor accepting custom lowerings. (#61385)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61385

The bindings coverage might not be full yet, but this already allows us
to register custom lowerings from Python.

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D29623487

Pulled By: ZolotukhinM

fbshipit-source-id: b97ee420a57fd887e204c021b9e098764b2ee232
2021-07-09 18:27:14 -07:00
dec5aa2260 [JIT] clean up (#60390)
Summary:
* Minor: spelling, grammar.
* Add calls to `GRAPH_DUMP()` where they were missing.
* Add or expand a few comments.
* Move a few comments to seemingly more appropriate spots.
* In canonicalize_graph_fuser_ops.cpp inline `runnableInputs()` since it
  was only called in one place and had a misleading comment and
  confusing name.
* In `PeepholeOptimizeImpl::optimizeBlock()`, set `changed = true;` when
  removing `aten::is_complex`. Pretty sure its absence was a bug.
* Delete unused `_jit_pass_remove_inplace_ops` and its
  implementation `RemoveInplaceOps()`.
* In `preprocessCaffe2Ops()`, remove redundant check for nested optional
  types. It was already checked in `checkONNXCompatibility()`.
* In `EncoderBase::AddAttribute`, log the unexpected attribute kind.
  I don't remember the repro case now but I did hit this error at some
  point and this additional logging made it easier to understand.
* In `fuseConvBatchNorm()` in eval_peephole.cpp, consistently use
  camelCase instead of snake_case for local variables.
* Add curly braces around the bodies of if and loops.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60390

Reviewed By: Krovatkin

Differential Revision: D29523283

Pulled By: SplitInfinity

fbshipit-source-id: 4e16c5648616f53da07d68dab7fdf252e06a0752
2021-07-09 16:28:27 -07:00
54ea7d33ba [package] error if we try to mock a module in 3.6 (#61469)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61469

This feature is not supported, error out early.

Differential Revision:
D29639797

Test Plan: Imported from OSS

Reviewed By: Lilyjjo

Pulled By: suo

fbshipit-source-id: 775ed78638fb6da8f830b632726b00c0533ed176
2021-07-09 16:26:38 -07:00
a3670ba377 Add option to specify custom NNAPI serializer (#61025)
Summary:
To add a serializer for custom ops, we can subclass the default serializer
and update ADDER_MAP.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61025

Test Plan:
* pytest test/test_nnapi.py::TestNNAPI for current serializer
* Custom serializers to be tested with custom ops

Imported from OSS

Reviewed By: anshuljain1

Differential Revision: D29480745

fbshipit-source-id: 37e3f8de3c97f6c8a486f9879ce11430ea89af34
2021-07-09 15:27:10 -07:00
cbb6ab6d88 [package] ignore dunder import errors (#61148)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61148

Changes `__import__` processing to silently skip cases where the `__import__` statement cannot be parsed. Adds failed imports to a list retrievable by `PackageExporter.failed_dunder_import_list()`.

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D29559680

Pulled By: Lilyjjo

fbshipit-source-id: 2513d0b9ef271c85cadc3f5a013fbd8c8de80b46
2021-07-09 15:27:08 -07:00
12772c8dd8 [package] PackageExporter visualization methods (#61147)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61147

Basic tooling to enable users to see what is inside of a PackageExporter. Added methods:
- `externed/interned/mocked/denied_list()`: returns list of modules which are currently in the specified category
- `relied_on_by(module_name)`: returns list of modules which rely on `module_name`
- `dependency_graph_str()`: returns string format of graph for users. Example of output:
```
digraph G {
rankdir = LR;
node [shape=box];
"<res.foo.pkl>" -> "foo";
"foo" -> "torch.package";
"foo" -> "time";
"foo" -> "sentencepiece";
"foo" -> "package_top";
}
```
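
A hypothetical usage sketch of the methods listed above (the packaged object and patterns are made up for illustration):
```python
from torch.package import PackageExporter

with PackageExporter("res.pkg") as pe:
    pe.extern("torch")  # keep torch outside the package
    pe.intern("**")     # intern everything else
    pe.save_pickle("res", "foo.pkl", {"answer": 42})
    print(pe.externed_list())         # modules currently matched as extern
    print(pe.dependency_graph_str())  # DOT-format string like the example above
```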

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D29559683

Pulled By: Lilyjjo

fbshipit-source-id: 5dff4d04af911a9c9fdd0d100420f1382eaef46e
2021-07-09 15:27:06 -07:00
b5f0576278 [package] Modify Digraph to track predecessors (#61146)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61146

Track predecessors of nodes in DiGraph in order to enable cleaner dependency visualization code.

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D29559682

Pulled By: Lilyjjo

fbshipit-source-id: 06f51b1108423aece5bdd72a7b82ab736e5e4f94
2021-07-09 15:27:04 -07:00
ae65f63971 Make nnapi flatten converter accept flex inputs (#61024)
Summary:
As title

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61024

Test Plan: pytest test/test_nnapi.py::TestNNAPI::test_flatten

Reviewed By: anshuljain1

Differential Revision: D29480748

fbshipit-source-id: c334b09600a64d3e552cec843d6da3de28e7d27c
2021-07-09 15:27:02 -07:00
028e438d6c [torchelastic] Make sure rdzv_configs[timeout] is not getting overwritten (#61471)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61471

Make sure `rdzv_configs[timeout]` is not getting overwritten

Test Plan: sandcastle

Differential Revision: D29638606

fbshipit-source-id: e164cdddaed77e7e35412ed58ac1ee312e9d489d
2021-07-09 15:27:00 -07:00
1f4bba77b6 [fx] fix subgraph API call_module warning about no owning module (#61463)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61463

This seems like a small oversight(?): the current test fails when warnings are recorded. Discovered this when calling `graph.call_module(existing_call_module_node.target)` and it raised a warning.

Test Plan: `buck test //caffe2/test:fx`

Reviewed By: ansley

Differential Revision: D29637799

fbshipit-source-id: 2305629863230235f76a926fe2e4de480cbf853c
2021-07-09 15:25:44 -07:00
76c0f223d3 Make nnapi cat converter accept flex inputs
Summary: As title

Test Plan: pytest test/test_nnapi.py::TestNNAPI::test_cat

Reviewed By: anshuljain1

Differential Revision: D29480747

fbshipit-source-id: 161803054ff1a4c2c750fc30a5f0fc6d8a24b2c9
2021-07-09 14:27:53 -07:00
9e81d3d869 Make NNAPI linear converter accept flex inputs (#61022)
Summary:
As title

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61022

Test Plan: pytest test/test_nnapi.py::TestNNAPI::test_linear

Reviewed By: anshuljain1

Differential Revision: D29480749

fbshipit-source-id: 35975861740298c9e16f866c939e7ee3c2151710
2021-07-09 14:27:51 -07:00
35b950ea98 [package] properly handle case where we are re-packaging mocked modules (#61434)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61434

Mocking is the only time we introduce a "special" module to a
torch.package of our own creation. This interacts poorly with
re-packaging, since if we treat `_mock` as a regular module and try to
package it normally we will produce a broken package.

This PR teaches PackageExporter to recognize `_mock` modules and treat
them specially during the dependency walking process, thus avoiding the
issue.

Test Plan: Imported from OSS

Reviewed By: jdonald, Lilyjjo

Differential Revision: D29638283

Pulled By: suo

fbshipit-source-id: 37a7ffa34da8bb665f679fbd72aa3d71154b2209
2021-07-09 14:27:49 -07:00
4f4beb8286 Add Model Parallel Support to ZeRO (#61370)
Summary:
**Overview:**
The existing `ZeroRedundancyOptimizer` implementation assumes that all model parameters are stored on the same device (due to the recent [refactor](https://github.com/pytorch/pytorch/pull/59834)). This change allows model parameters to be sharded across multiple devices, as in the DDP with Model Parallelism example [here](https://pytorch.org/tutorials/intermediate/ddp_tutorial.html).

The only logic affected is the bucketing strategy used when `parameters_as_bucket_view=True`. Let `n` denote the world size and `k` denote the number of devices per process.
- Previously, `k = 1`, and `self._buckets` was a `List[torch.Tensor]`, where `self._buckets[j]` is a tensor (i.e. bucket) containing the parameters assigned to rank `j` for `j = 0, ..., n - 1`.
- Now, `self._buckets` is a `List[List[torch.Tensor]]`, where `self._buckets[i][j]` is a tensor containing the parameters stored on device `i` assigned to rank `j` for `i = 0, ..., k - 1` and `j = 0, ..., n - 1`.

This bucket construction uses an auxiliary data structure `self._device_to_per_rank_params`, which is a `Dict[torch.device, List[List[torch.Tensor]]]`. It maps:
- `dev_0` to `[rank 0's assigned parameters on dev_0, rank 1's assigned parameters on dev_1, ...]`,
- `...`
- `dev_{k-1}` to `[rank 0's assigned parameters on dev_{k-1}, rank 1's assigned parameters on dev_{k-1}, ...]`

I removed the invariant checker `_verify_same_param_device()` and its corresponding test since it is no longer an invariant.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61370

Test Plan: I added a new test `test_zero_model_parallel()` that checks for parity between a DDP model with model parallelism using `ZeroRedundancyOptimizer` and a local model with the same architecture using a local optimizer. I also verified that the existing tests still pass.

Reviewed By: soulitzer

Differential Revision: D29637132

Pulled By: andwgu

fbshipit-source-id: 07112959fa4e94a3f40e67e88cbb58ce3cd1e033
2021-07-09 14:27:47 -07:00
fb7ed24f6e [PyTorch] Try using ExclusivelyOwned in LinearAlgebra (#59420)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59420

This is a sample of how we might use ExclusivelyOwned on an opt-in basis.
ghstack-source-id: 133089540

Test Plan:
1) CI to run regression tests
2) Spot-checked assembly for linalg_det_out. Rather than calling the intrusive_ptr dtor, we get the ExclusivelyOwned dtor inline. In particular, we do not get any atomic refcount decrement instructions emitted.
3) TODO: some kind of perf profiling; advice welcome

Reviewed By: ezyang

Differential Revision: D28885313

fbshipit-source-id: ae4b39ed738c41d0c4a4509a5199c040ba9aa63a
2021-07-09 14:27:45 -07:00
a5c5b56cf5 gen ExclusivelyOwned in structured kernels (#59827)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59827

ghstack-source-id: 133089541

Test Plan: existing CI

Reviewed By: ezyang, janeyx99

Differential Revision: D28965922

fbshipit-source-id: ffbc1d43e5d3ab3abfad3b0830b4da1ce899f505
2021-07-09 14:26:37 -07:00
711ded688d Add a script to codemod max_tokens_total pragmas to C/C++ files (#61369)
Summary:
This PR adds a new script: `max_tokens_pragmas.py`

This is a utility script that can add/remove `max_tokens_total` pragmas from the codebase.

- [x] Implement script and test manually
- [x] Write test script

Examples:
First, change directories
```bash
cd tools/linter/clang_tidy
```

Then run the following:
```bash
cat << EOF > test/test1.cpp
// File without any prior pragmas

int main() {
    for (int i = 0; i < 10; i++);
    return 0;
}
EOF

cat << EOF > test/test2.cpp
// File with prior pragmas

#pragma clang max_tokens_total 1

int main() {
    for (int i = 0; i < 10; i++);
    return 0;
}
EOF

cat << EOF > test/test3.cpp
// File with multiple prior pragmas

#pragma clang max_tokens_total 1

// Different pragma; script should ignore this
#pragma clang max_tokens_here 20

int main() {
    for (int i = 0; i < 10; i++);
    return 0;
}

#pragma clang max_tokens_total 1
EOF

# Add pragmas to some files
python3 max_tokens_pragma.py --num-max-tokens 42 test/*.cpp
grep "#pragma clang max_tokens_total 42" test/*.cpp

# Remove pragmas from files
python3 max_tokens_pragma.py --strip test/*.cpp
grep "#pragma clang max_tokens_total 42" test/*.cpp # should fail

# Ignore files
python3 max_tokens_pragma.py --num-max-tokens 42 test/*.cpp --ignores test/test2.cpp
grep "#pragma clang max_tokens_total 42" test/*.cpp # should not list `test/test2.cpp`
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61369

Test Plan: `tools/linter/clang_tidy/test/test_max_tokens_pragma.py`

Reviewed By: malfet

Differential Revision: D29604291

Pulled By: 1ntEgr8

fbshipit-source-id: 3efe52573583769041a07e6776161d4d5bbf16a7
2021-07-09 13:30:52 -07:00
3b004aed3a Enable local clang-tidy lint (#61121)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61121

This change enables the make target to run clang-tidy locally

Test Plan:
Run this command
```
make clang-tidy
```
This should run `clang-tidy` on the paths and filters specified in `tools/linter/clang_tidy/__main__.py`

Quicklint
```
make quicklint
```
This should report "No files detected" if no c/cpp files are altered.

Reviewed By: soulitzer

Differential Revision: D29598927

Pulled By: 1ntEgr8

fbshipit-source-id: aa443030494fed92c313da4b203a5450be09fa38
2021-07-09 13:30:50 -07:00
8296cb37c7 [torchelastic] Set the correct maximum border width
Summary: The diff sets the correct maximum width for the border delimiters between error sections

Test Plan: Example of the uncontrolled border: https://www.internalfb.com/intern/testinfra/diagnostics/7599824415964133.844424970500348.1625590344/

Reviewed By: kiukchung

Differential Revision: D29636814

fbshipit-source-id: 95465d3150066bff82dc7499bb1c63ea4f5ebc2d
2021-07-09 13:29:23 -07:00
6bb33d93ab disable the format library in C10 (#60052)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60052

Introduction:
We would like to use a minimal implementation of C10 for our SGX port of PyTorch. This includes disabling signal handlers and the fmt library.

Problem:
When C10_SUPPORTS_SIGNAL_HANDLER is disabled, there is no reason to have fmt enabled, as it is used only in stacktraceSignalHandler. The problem is that fmt/format.h is included regardless of whether C10_SUPPORTS_SIGNAL_HANDLER is defined.

Solution:
Move the #include <fmt/format.h> inside the #ifdef section of code where C10_SUPPORTS_SIGNAL_HANDLER is checked.

Test Plan: Run the pytorch unit tests.

Reviewed By: h397wang, LiJihang

Differential Revision: D29022628

fbshipit-source-id: 638cf98381585cd6059129d9c5a65d9e6a841575
2021-07-09 12:28:19 -07:00
b01329b164 [xplat] Update XNNPACK to github revision 79cd5f9 (#61400)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61400

allow-large-files Update XNNPACK to github revision 79cd5f9.

Test Plan:
Spark apps build works.

Hand tracking works:

https://pxl.cl/1L76g

Reviewed By: dreiss

Differential Revision: D29385882

fbshipit-source-id: 6be920a68b876faedf7e86e33df43f8b1db14a4d
2021-07-09 12:28:16 -07:00
86463a8d02 Save some little memory in default_collate (#61424)
Summary:
The memory savings can be significant when there are many workers and a large batch size.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61424

Reviewed By: soulitzer

Differential Revision: D29635477

Pulled By: ejguan

fbshipit-source-id: 1fc48b5964e873bd8833ad81bed9d51b0b6d137e
2021-07-09 12:27:07 -07:00
c830db0265 Raise error in CMake for CUDA <9.2 (#61462)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61462

Anything before CUDA 9.2 is not supported (see https://github.com/pytorch/pytorch/pull/36848), and perhaps not even that.
ghstack-source-id: 133312018

Test Plan: CI

Reviewed By: samestep

Differential Revision: D29637251

fbshipit-source-id: 4300169b7298274b2074649342902a34bd2220b5
2021-07-09 11:28:38 -07:00
b5c464d5ef Make Future store weak pointers to storages (#60943)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60943

In https://github.com/pytorch/pytorch/pull/60470 we made Future store Storages rather than store references to their DataPtrs (because these references could go stale...). However this meant that the Future could keep the Storage alive, and thus keep its memory allocated, even after the user was done with it. We fix it here by instead storing a weak ptr to that Storage (well, in fact to the StorageImpl, but it's the same).
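
A Python analogy of the semantics (the real code stores a weak pointer to the C++ `StorageImpl`; this sketch only illustrates the lifetime behavior):
```python
import weakref

class StorageStandIn:
    """Stand-in for a StorageImpl, just to have something weakly referable."""

s = StorageStandIn()
w = weakref.ref(s)   # roughly what the Future now holds
assert w() is s      # storage still alive: the Future can use it
del s                # the user is done with the storage
assert w() is None   # the Future no longer keeps the memory allocated
```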
ghstack-source-id: 133295799

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D29454104

fbshipit-source-id: d36dee00a4841c087bb7b3f5bc39e0459f209cdb
2021-07-09 11:28:36 -07:00
962c9fbf85 [pruner] add handles for hooks (#61425)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61425

Adding handles for the activation-reconstruction and bias forward hooks so they can be removed later
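
The underlying pattern, sketched on a toy module (not the pruner code itself): keep the `RemovableHandle` returned at registration so the hook can be detached later.
```python
import torch

m = torch.nn.Linear(2, 2)
handle = m.register_forward_hook(lambda mod, inp, out: out)  # returns a RemovableHandle
m(torch.rand(1, 2))  # hook fires on this forward pass
handle.remove()      # hook is detached and will not fire again
```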
ghstack-source-id: 133244536

Test Plan:
This change should not affect behavior yet, but to double check:

`buck test mode/dev-nosan //caffe2/test:ao -- TestBasePruner`

https://pxl.cl/1LpM9

Reviewed By: z-a-f

Differential Revision: D29619720

fbshipit-source-id: c7428d2d0325cd11ce7919e0b67321e8cc196041
2021-07-09 11:28:35 -07:00
682ebc1dd1 remove UsageError in favor of ValueError (#61031)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61031

See https://github.com/pytorch/pytorch/pull/58916#issuecomment-868519515.

Test Plan: Imported from OSS

Reviewed By: iramazanli

Differential Revision: D29626810

Pulled By: mruberry

fbshipit-source-id: 25ddf26815f9ef82b8234d7dac811a6a13a53c54
2021-07-09 11:28:33 -07:00
5401dd2f9a change language from array to tensor (#60639)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60639

Test Plan: Imported from OSS

Reviewed By: iramazanli

Differential Revision: D29626812

Pulled By: mruberry

fbshipit-source-id: 1b0e78426fd08d7b72d890adc9811d31afd805fe
2021-07-09 11:28:31 -07:00
09c90b3589 relax type equality constraint (#60638)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60638

Initial proposal in https://github.com/pytorch/pytorch/pull/58981#issuecomment-866690334. Opposed to the proposal, this PR only allows relaxing the type equality constraint to a common superclass constraint, for example `torch.Tensor` vs `torch.nn.Parameter`. Inputs that do not share a common superclass will still fail.
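
For instance (a minimal illustration of the relaxed rule, not the actual implementation):
```python
import torch

t = torch.rand(2)
p = torch.nn.Parameter(torch.rand(2))
assert isinstance(p, torch.Tensor)  # Parameter subclasses Tensor
# Under the relaxed constraint the pair (t, p) passes the type check because
# both are torch.Tensor instances; inputs with no common superclass still fail.
```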

Test Plan: Imported from OSS

Reviewed By: soulitzer

Differential Revision: D29626811

Pulled By: mruberry

fbshipit-source-id: 1916c3b710d38889de7ce57eb0770c76cbbb8166
2021-07-09 11:27:32 -07:00
24a8915534 Relax use-count check to allow for 0 (#61414)
Summary:
Previously we required the tensor use count to be exactly 1. We should actually allow the use count to be zero as well. The use count is zero when an undefined tensor is returned, and this is common in backward functions that have multiple outputs.

In this PR I also remove some entries from the skip list that should be covered by this change: they return multiple tensors AND are backward functions. Batch norm is also known to return undefined tensors when `training=False`.
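
At the Python level an undefined gradient surfaces as `None` (a minimal illustration, not the test itself):
```python
import torch

x = torch.rand(3, requires_grad=True)
w = torch.rand(3)  # requires_grad=False
(x * w).sum().backward()
print(x.grad)  # a defined tensor
print(w.grad)  # None: the backward produced an undefined tensor for w
```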

Related issue: https://github.com/pytorch/pytorch/issues/60426

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61414

Reviewed By: albanD

Differential Revision: D29614687

Pulled By: soulitzer

fbshipit-source-id: ab0892aed4bd1346b50b0a9552ffcc3287ac96af
2021-07-09 10:28:12 -07:00
9e533a62f6 Make conv2d nnapi converter accept flexible batch (#61021)
Summary:
Same as title

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61021

Test Plan: pytest test/test_nnapi.py::TestNNAPI

Reviewed By: anshuljain1

Differential Revision: D29480746

fbshipit-source-id: 7217c8f3a811db8c3c373f3e7ca31caf9502ef22
2021-07-09 10:28:10 -07:00
64d61901eb [ROCm] Skip test_masked_scatter_large_tensor_cuda (#61313)
Summary:
Refer https://github.com/pytorch/pytorch/issues/60190. Skipping unit test until hipcub issue is fixed.

Signed-off-by: Jagadish Krishnamoorthy <jagdish.krishna@gmail.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61313

Reviewed By: iramazanli

Differential Revision: D29626664

Pulled By: malfet

fbshipit-source-id: db2a390d2a3e28ec05a5032a50aa9a35c86b96ca
2021-07-09 10:27:08 -07:00
ee2dd35ef4 Resolving native dependency and try_run for cross compile (#59764)
Summary:
This is a PR on build system that provides support for cross compiling on Jetson platforms.

The major change is:

1. Disable try-runs for cross compiling in `COMPILER_WORKS`, `BLAS`, and `CUDA`, since try-runs cannot be performed in a cross-compile setup.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59764

Reviewed By: soulitzer

Differential Revision: D29524363

Pulled By: malfet

fbshipit-source-id: f06d1ad30b704c9a17d77db686c65c0754db07b8
2021-07-09 09:29:21 -07:00
8bd3e52e00 Add conv2d transpose NNAPI converter (#59529)
Summary:
* Conv2d transpose support
* Quantize WIP

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59529

Test Plan: pytest test/test_nnapi.py::TestNNAPI::test_conv2d_transpose

Reviewed By: anshuljain1

Differential Revision: D28926335

fbshipit-source-id: 8f90182f96cee0a13c4f38331d421e1e8ac618de
2021-07-09 09:29:20 -07:00
c19adfff54 [DataLoader] Introduce ConcatMapDataPipe functional datapipe (#61010)
Summary:
As part of https://github.com/pytorch/pytorch/issues/57031, this PR adds the ConcatMapDataPipe functional datapipe for the MapDataPipe class.

We may need to discuss how to treat datapipes with no valid length. For now, I just treat them as if they have infinite length, so `__getitem__` can never index past them into a subsequent datapipe.
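
A minimal sketch of that indexing semantics (a hypothetical class, not the actual `ConcatMapDataPipe` implementation):
```python
class ConcatMapSketch:
    def __init__(self, *dps):
        self.dps = dps

    def __getitem__(self, idx):
        for dp in self.dps:
            try:
                length = len(dp)
            except TypeError:
                return dp[idx]  # no valid length: treated as infinite
            if idx < length:
                return dp[idx]
            idx -= length  # move past this finite datapipe
        raise IndexError("index out of range")

    def __len__(self):
        return sum(len(dp) for dp in self.dps)

assert ConcatMapSketch([0, 1], [2, 3])[2] == 2  # second pipe's first element
```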

Thank you for your time reviewing this~

cc ejguan

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61010

Reviewed By: soulitzer

Differential Revision: D29587679

Pulled By: ejguan

fbshipit-source-id: 5eb97fa727209bec6c534520057c64a78000626e
2021-07-09 09:29:18 -07:00
2bbcc80de3 Enable disabling test cases on specific platforms (#61427)
Summary:
This adds functionality to our common_utils.py to allow disabling test cases for platforms Mac, Windows, and Linux.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61427

Test Plan:
CI should not change as no issues currently have the line "Platforms:..."

I tested locally by making sure `test_async_script` is skipped while running `python test/test_jit.py -k TestAsync.test_async_script` with a cached modified `.pytorch-disabled-tests.json`:
```
{
  "total_count": 32,
  "incomplete_results": false,
  "items": [
    {
      "url": "https://api.github.com/repos/pytorch/pytorch/issues/60652",
      "repository_url": "https://api.github.com/repos/pytorch/pytorch",
      "labels_url": "https://api.github.com/repos/pytorch/pytorch/issues/60652/labels{/name}",
      "comments_url": "https://api.github.com/repos/pytorch/pytorch/issues/60652/comments",
      "events_url": "https://api.github.com/repos/pytorch/pytorch/issues/60652/events",
      "html_url": "https://github.com/pytorch/pytorch/issues/60652",
      "id": 929288995,
      "node_id": "MDU6SXNzdWU5MjkyODg5OTU=",
      "number": 60652,
      "title": "DISABLED test_async_script (jit.test_async.TestAsync)",
      "user": {
        "login": "ezyang",
        "id": 13564,
        "node_id": "MDQ6VXNlcjEzNTY0",
        "avatar_url": "https://avatars.githubusercontent.com/u/13564?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/ezyang",
        "html_url": "https://github.com/ezyang",
        "followers_url": "https://api.github.com/users/ezyang/followers",
        "following_url": "https://api.github.com/users/ezyang/following{/other_user}",
        "gists_url": "https://api.github.com/users/ezyang/gists{/gist_id}",
        "starred_url": "https://api.github.com/users/ezyang/starred{/owner}{/repo}",
        "subscriptions_url": "https://api.github.com/users/ezyang/subscriptions",
        "organizations_url": "https://api.github.com/users/ezyang/orgs",
        "repos_url": "https://api.github.com/users/ezyang/repos",
        "events_url": "https://api.github.com/users/ezyang/events{/privacy}",
        "received_events_url": "https://api.github.com/users/ezyang/received_events",
        "type": "User",
        "site_admin": false
      },
      "labels": [
        {
          "id": 1301397902,
          "node_id": "MDU6TGFiZWwxMzAxMzk3OTAy",
          "url": "https://api.github.com/repos/pytorch/pytorch/labels/module:%20flaky-tests",
          "name": "module: flaky-tests",
          "color": "f7e101",
          "default": false,
          "description": "Problem is a flaky test in CI"
        },
        {
          "id": 679953883,
          "node_id": "MDU6TGFiZWw2Nzk5NTM4ODM=",
          "url": "https://api.github.com/repos/pytorch/pytorch/labels/oncall:%20distributed",
          "name": "oncall: distributed",
          "color": "f7e101",
          "default": false,
          "description": "Add this issue/PR to distributed oncall triage queue"
        }
      ],
      "state": "open",
      "locked": false,
      "assignee": {
        "login": "rohan-varma",
        "id": 8039770,
        "node_id": "MDQ6VXNlcjgwMzk3NzA=",
        "avatar_url": "https://avatars.githubusercontent.com/u/8039770?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/rohan-varma",
        "html_url": "https://github.com/rohan-varma",
        "followers_url": "https://api.github.com/users/rohan-varma/followers",
        "following_url": "https://api.github.com/users/rohan-varma/following{/other_user}",
        "gists_url": "https://api.github.com/users/rohan-varma/gists{/gist_id}",
        "starred_url": "https://api.github.com/users/rohan-varma/starred{/owner}{/repo}",
        "subscriptions_url": "https://api.github.com/users/rohan-varma/subscriptions",
        "organizations_url": "https://api.github.com/users/rohan-varma/orgs",
        "repos_url": "https://api.github.com/users/rohan-varma/repos",
        "events_url": "https://api.github.com/users/rohan-varma/events{/privacy}",
        "received_events_url": "https://api.github.com/users/rohan-varma/received_events",
        "type": "User",
        "site_admin": false
      },
      "assignees": [
        {
          "login": "rohan-varma",
          "id": 8039770,
          "node_id": "MDQ6VXNlcjgwMzk3NzA=",
          "avatar_url": "https://avatars.githubusercontent.com/u/8039770?v=4",
          "gravatar_id": "",
          "url": "https://api.github.com/users/rohan-varma",
          "html_url": "https://github.com/rohan-varma",
          "followers_url": "https://api.github.com/users/rohan-varma/followers",
          "following_url": "https://api.github.com/users/rohan-varma/following{/other_user}",
          "gists_url": "https://api.github.com/users/rohan-varma/gists{/gist_id}",
          "starred_url": "https://api.github.com/users/rohan-varma/starred{/owner}{/repo}",
          "subscriptions_url": "https://api.github.com/users/rohan-varma/subscriptions",
          "organizations_url": "https://api.github.com/users/rohan-varma/orgs",
          "repos_url": "https://api.github.com/users/rohan-varma/repos",
          "events_url": "https://api.github.com/users/rohan-varma/events{/privacy}",
          "received_events_url": "https://api.github.com/users/rohan-varma/received_events",
          "type": "User",
          "site_admin": false
        }
      ],
      "milestone": null,
      "comments": 0,
      "created_at": "2021-06-24T14:28:33Z",
      "updated_at": "2021-06-24T16:40:42Z",
      "closed_at": null,
      "author_association": "CONTRIBUTOR",
      "active_lock_reason": null,
      "body": "Platforms:Mac, windows, Linux\r\n```\r\nJun 24 00:59:14 ======================================================================\r\nJun 24 00:59:14 ERROR [0.477s]: test_async_script (__main__.ProcessGroupGlooWrapperTest)\r\nJun 24 00:59:14 ----------------------------------------------------------------------\r\nJun 24 00:59:14 Traceback (most recent call last):\r\nJun 24 00:59:14   File \"/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_distributed.py\", line 398, in wrapper\r\nJun 24 00:59:14     self._join_processes(fn)\r\nJun 24 00:59:14   File \"/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_distributed.py\", line 590, in _join_processes\r\nJun 24 00:59:14     self._check_return_codes(elapsed_time)\r\nJun 24 00:59:14   File \"/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_distributed.py\", line 633, in _check_return_codes\r\nJun 24 00:59:14     raise RuntimeError(error)\r\nJun 24 00:59:14 RuntimeError: Process 0 exited with error code 10 and exception:\r\nJun 24 00:59:14 RuntimeError: [/var/lib/jenkins/workspace/third_party/gloo/gloo/transport/tcp/pair.cc:598] Connection closed by peer [172.17.0.2]:21400\r\nJun 24 00:59:14 \r\nJun 24 00:59:14 During handling of the above exception, another exception occurred:\r\nJun 24 00:59:14 \r\nJun 24 00:59:14 Traceback (most recent call last):\r\nJun 24 00:59:14   File \"/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_distributed.py\", line 516, in run_test\r\nJun 24 00:59:14     getattr(self, test_name)()\r\nJun 24 00:59:14   File \"/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_distributed.py\", line 400, in wrapper\r\nJun 24 00:59:14     fn()\r\nJun 24 00:59:14   File \"distributed/test_pg_wrapper.py\", line 270, in test_collective_hang\r\nJun 24 00:59:14     self._test_collective_hang(pg)\r\nJun 24 00:59:14   File \"distributed/test_pg_wrapper.py\", line 52, in _test_collective_hang\r\nJun 24 00:59:14     wrapper_pg.allreduce([tensor])\r\nJun 24 00:59:14   File \"/opt/conda/lib/python3.6/unittest/case.py\", line 217, in __exit__\r\nJun 24 00:59:14     expected_regex.pattern, str(exc_value)))\r\nJun 24 00:59:14   File \"/opt/conda/lib/python3.6/unittest/case.py\", line 135, in _raiseFailure\r\nJun 24 00:59:14     raise self.test_case.failureException(msg)\r\nJun 24 00:59:14 AssertionError: \"Ranks 1 failed to pass monitoredBarrier\" does not match \"[/var/lib/jenkins/workspace/third_party/gloo/gloo/transport/tcp/pair.cc:598] Connection closed by peer [172.17.0.2]:21400\"\r\n```\r\n\r\nhttps://www.internalfb.com/intern/opensource/ci/job/log/225221175921058/\n\ncc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse agolynski SciPioneer H-Huang mrzzd cbalioglu gcramer23",
      "performed_via_github_app": null,
      "score": 0.0
    }
  ]
}
```

Reviewed By: iramazanli

Differential Revision: D29627799

Pulled By: janeyx99

fbshipit-source-id: 5ef79127cbe0055c4f41766048e66f98cf80d2c4
2021-07-09 09:29:16 -07:00
e9a40de1af Add other Linux GPU auxiliary test jobs (#61055)
Summary:
- [x] add the jobs to the matrix
  - [x] `jit_legacy`
  - [x] `nogpu_NO_AVX`
  - [x] `nogpu_NO_AVX2`
  - [x] `slow`
- [x] use the test config properly to enable the different test conditions
- [x] validate that it works
- [x] disable on pull requests before merging

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61055

Test Plan: CI. Example run: https://github.com/pytorch/pytorch/actions/runs/1013240987

Reviewed By: walterddr

Differential Revision: D29594080

Pulled By: samestep

fbshipit-source-id: 02c531ebc42feae81ecaea0785915f95e0f53ed7
2021-07-09 09:29:15 -07:00
c966ce6933 Fix several test_ops cuda dtypes tests (#60922)
Summary:
Close https://github.com/pytorch/pytorch/issues/60443

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60922

Reviewed By: jdonald, iramazanli

Differential Revision: D29630122

Pulled By: mruberry

fbshipit-source-id: 441f79828860282e5849a2565facf9e7f72912e8
2021-07-09 09:29:13 -07:00
5e9bcf9101 fix: support removing hook in the hook (#61250)
Summary:
Fixes: https://github.com/pytorch/pytorch/issues/58354

Problem:
Once a hook is called
05c1e5b655/torch/csrc/autograd/python_hook.cpp (L51-L54)

If the hook calls `handle.remove()` while executing, and there are no other references to the hook function object, then Python is free to garbage-collect it.

At the subsequent call to
05c1e5b655/torch/csrc/autograd/python_hook.cpp (L54)

we have `hook` pointing to invalid memory

Thus when we try to fetch the name for `hook` from `check_single_result` with
05c1e5b655/torch/csrc/autograd/python_hook.cpp (L175-L177)
we get a segfault.

Solution:
Temporarily extend the lifetime of the hook with `Py_INCREF` until we have verified the result.
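
The pattern this makes safe (a sketch of the scenario from the issue; runnable on builds with the fix):
```python
import torch

t = torch.ones(2, requires_grad=True)

def hook(grad):
    handle.remove()  # drops what may be the last strong reference to `hook`
    return grad * 2

handle = t.register_hook(hook)
del hook  # autograd now holds the only reference to the hook function
t.sum().backward()  # previously could segfault; the hook is now kept alive
```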

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61250

Reviewed By: iramazanli

Differential Revision: D29623826

Pulled By: soulitzer

fbshipit-source-id: c71322311f19066cafb7203980668868c59d4e5e
2021-07-09 09:27:58 -07:00
179249084b Refactor DDP join() API, adding hooks (#60757)
Summary:
Targets https://github.com/pytorch/pytorch/issues/54318.

**Overview:**
DDP offers a `join()` context manager to accommodate training on uneven inputs. This PR creates a new generic `_Join()` API permitting custom hooks, refactors DDP `join()` to call this generic `_Join()`, and implements a hook for ZeRO. (For now, the generic `_Join()` is implemented as private, but this may change after design discussions are settled.)

There are two classes introduced: `_JoinHook`, the class defining the customizable join hook, and `_Join`, the generic join context manager.

The `_JoinHook` provides two entry points: `main_hook()`, which is called repeatedly while there exists a non-joined process, and `post_hook()`, which is called once all processes have joined, with the additional `bool` argument `is_last_joiner`. The class also requires `process_group` and `device` information by defining corresponding abstract property methods. Thus, to implement a join hook, (1) inherit from `_JoinHook`, (2) override `main_hook()` and `post_hook()` as appropriate, and (3) override `process_group()` and `device()` to provide the process group and device information to be used by the join context manager implementation for collective communications.

The `_Join` constructor requires `join_hooks: List[_JoinHook]` and optionally `enable: bool = True` and `throw_on_early_termination: bool = False`. A training loop only needs to be wrapped with `with _Join(join_hooks):` (using the appropriate `join_hooks`) to be able to train on uneven inputs without hanging/erroring. The context manager requires a `dist.all_reduce(torch.ones(1))` to be called on every non-joined process each time before it performs its collective communications in order to indicate that the process has not yet joined. It also requires that all `process_group` attributes in the `_JoinHook` objects are the same.
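
A self-contained sketch of that interface, mirroring the description above (the real classes live in `torch.distributed` and are private):
```python
from abc import ABC, abstractmethod

class JoinHookSketch(ABC):
    @abstractmethod
    def main_hook(self):
        """Shadow the collective communication once per iteration while at
        least one other process has not yet joined."""

    @abstractmethod
    def post_hook(self, is_last_joiner: bool):
        """Run once after all processes have joined; `is_last_joiner` can
        identify an authoritative rank to synchronize from."""

    @property
    @abstractmethod
    def process_group(self):
        """Process group used for the context manager's collectives."""

    @property
    @abstractmethod
    def device(self):
        """Device used for the context manager's collectives."""
```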

**Notes:**
- The argument `is_last_joiner` to `post_hook()` may be useful for finding an authoritative rank when synchronizing.
- `enable` is a flag that can be set to `False` if the user knows the current training loop will not have uneven inputs. This may be used to disable join-related computation in the classes providing join hooks.
- `throw_on_early_termination` is a flag that can be set to `True` to notify processes to terminate upon detecting uneven inputs (i.e. upon the first process joining when there exists a non-joined process). Notably, the notification requires an all-reduce, so to prevent hanging/erroring, non-joined processes must participate in the all-reduce. The first-joining process raises a `RuntimeError`, and the other processes are expected (but not required) to do the same. This may be used to implement training on uneven inputs in cases that do not conform to the generic join context manager (e.g. `SyncBatchNorm`).
- Classes providing a join hook should do so via a `_join_hook()` method that returns a `_JoinHook` instance with the methods appropriately overridden.
- If there are multiple join hooks, the device specified by the first is used by the join context manager implementation to perform its collective communications.
- If there are multiple join hooks, both the main and post-hooks are iterated in the order in which the `_JoinHook` objects are passed into the context manager constructor.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60757

Test Plan:
The current implementation preserves backward compatibility by not changing the existing DDP `join()` API at all. To check this, I ran through the uneven input tests (`test_ddp_grad_div_uneven_inputs`, `test_ddp_uneven_inputs_stop_iteration_sync_bn`, `test_ddp_uneven_inputs`, `test_ddp_uneven_input_join_disable`, `test_ddp_uneven_input_exception`) on the AI AWS cluster:
```
touch /tmp/barrier && TEMP_DIR="/tmp" BACKEND="nccl" WORLD_SIZE="2" gpurun python test/distributed/test_distributed_fork.py --
```

Because the existing DDP join logic does not provide correct gradients to the joined processes if `gradient_as_bucket_view=False` and a joined process requires those gradients to correctly update its shard of the parameters in `ZeroRedundancyOptimizer.step()`, DDP and ZeRO are not fully compatible at the moment. To work around this and to test ZeRO's join hook separately, I added a test `_test_zero_join()` (with `test_zero_join_gpu()` and `test_zero_join_cpu()` flavors), which compares DDP with a local optimizer on uneven inputs against ZeRO on uneven inputs with the gradients set manually.

Reviewed By: iramazanli, mrshenli

Differential Revision: D29624636

Pulled By: andwgu

fbshipit-source-id: ec70a290e02518b0d8b683f9fed2126705b896c7
2021-07-09 08:29:20 -07:00
8423ab4f99 Fix CosineAnnealingWarmRestart annotation (#61106)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/44770.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61106

Reviewed By: 1ntEgr8

Differential Revision: D29635764

Pulled By: walterddr

fbshipit-source-id: ddc45a7f04532a76d033ae7774706da1fa8608f7
2021-07-09 08:28:18 -07:00
9b908ab0d0 [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D29631829

fbshipit-source-id: 6cef1a3a091bdf0e10838d05b2e82fc0760ebe48
2021-07-09 05:28:44 -07:00
819bac63ff [Codemod][FBSourceBlackLinter] Daily arc lint --take BLACK
Reviewed By: zertosh

Differential Revision: D29632524

fbshipit-source-id: 3eccc1804a7bf953480b9754f68ea56a2a8e3fd8
2021-07-09 05:27:29 -07:00
14f63763c1 Avoid using mp.Manager to report #GPUs needed in dist tests (#61409)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61409

We used a multiprocessing.Manager in order to share TEST_SKIPS between the parent and the child processes. TEST_SKIPS is a global variable that defines a unique error code for each "error type", so that the parent can figure out the reason a child exited. While originally this mapping was immutable, at some point we allowed children to modify the parent's value of that mapping so they could update the message for the `multi-gpu` error to make it reflect how many GPUs were really needed. This occurred in D23285790 (2a4d312027). Since then this Manager proved to be quite problematic, especially around thread safety, races, TSAN, ... (see D22753459 (f0c46878c6), D23641618 (567c51cce9), D28490129, D28794321 (0128eb9a85) and D29585862). This seems like an awful lot of trouble for such a small functionality. Here I propose we drop Manager and instead get the same result by using separate error codes for each number of GPUs. It should be much simpler and thus more robust.
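
A sketch of the simpler scheme (illustrative names and exit codes, not the actual test-suite values):
```python
# one distinct exit code per required GPU count, so the parent can decode the
# skip reason from the child's exit status without any shared state
MULTI_GPU_EXIT_CODES = {n: 80 + n for n in range(2, 9)}

def decode_child_exit(code):
    for ngpus, c in MULTI_GPU_EXIT_CODES.items():
        if c == code:
            return f"skipped: test requires {ngpus} GPUs"
    return None

assert decode_child_exit(84) == "skipped: test requires 4 GPUs"
```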
ghstack-source-id: 133236447

Test Plan: CI

Reviewed By: pritamdamania87

Differential Revision: D29612614

fbshipit-source-id: 8ad0fedcb7796e5832a0eb196f8fdc147e02b3df
2021-07-09 01:29:35 -07:00
905cd6733e [DDP Comm Hook] Re-enable the optimization of fusing copy and division when no comm hook is specified (#61379)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61379

The optimization was accidentally removed in https://github.com/pytorch/pytorch/pull/59574

This optimization can help save a scan over all the input parameters, by fusing copy and div operations.

Now the default temporary hook is allreduce by sum, and no extra division is done inside the hook.
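
A toy illustration of the fused scan (assumed shapes; the real kernel lives in the C++ reducer):
```python
import torch

world_size = 4
grads = [torch.ones(4), torch.ones(8)]
bucket = torch.empty(sum(g.numel() for g in grads))

offset = 0
for g in grads:
    n = g.numel()
    # divide while writing into the flat bucket, instead of copying everything
    # first and then making a second full pass to divide
    torch.div(g.reshape(-1), world_size, out=bucket[offset:offset + n])
    offset += n
assert torch.allclose(bucket, torch.full((12,), 0.25))
```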
ghstack-source-id: 133288529

Test Plan:
buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_accumulate_gradients_no_sync
buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_ddp_grad_div_uneven_inputs
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_fp16
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_fp16_grad_is_view
buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork --  test_DistributedDataParallel_non_default_stream

buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_sparse_gradient

buck test mode/dev-nosan caffe2/test/distributed:c10 -- test_ddp_checkpointing_once
buck test mode/dev-nosan caffe2/test/distributed:c10 -- test_ddp_checkpointing_twice

Reviewed By: rohan-varma

Differential Revision: D29597614

fbshipit-source-id: 2434e4fd4e6abad7871cfe47886fe97b6e4ba28f
2021-07-09 01:29:33 -07:00
8f61d94610 Fix a variable initialization (#60896)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60896

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D29431625

fbshipit-source-id: 076d5ed350507b3ab1f14c1a5c7700de0427eefc
2021-07-09 01:29:31 -07:00
15010bf223 Make some downcast issues explicit (#60412)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60412

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D29243195

fbshipit-source-id: c508b729d6a0e6f8a591521bce788e6cfd8531f8
2021-07-09 01:29:29 -07:00
6a3170dba1 [package] minor cleanups to internal APIs (#61428)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61428

I was reading this code again after a while and didn't understand as
quickly as I would have liked. Some of the function names are no longer
accurate, etc.

This PR renames these functions to use the same language of
"dependencies" that the rest of the API uses. I think the resulting
usage of the APIs is clearer than before.

Test Plan: Imported from OSS

Reviewed By: Chillee

Differential Revision: D29620946

Pulled By: suo

fbshipit-source-id: 7df640a7ffbd43998063b9ee3955c9dfcbc42cfb
2021-07-09 01:28:24 -07:00
d52ebf2b1b conv2d (#61093)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61093

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D29562478

Pulled By: migeed-z

fbshipit-source-id: d41f3a9526ee52a9571cb861be03bf9ae176a373
2021-07-08 20:29:32 -07:00
5fbc853c5f [package] PackageExporter remove verbose mode (#61145)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61145

Remove 'verbose' mode from PackageExporter as people have complained that it is not useful.

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D29559681

Pulled By: Lilyjjo

fbshipit-source-id: eadb1a3a25fadc64119334a09bf1fa4b355b1edd
2021-07-08 18:26:43 -07:00
a74516d699 [static runtime] implement aten::log (#61393)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61393

Test Plan:
Added `StaticRuntime.IndividualOps_Log`

```
...
[ RUN      ] StaticRuntime.IndividualOps_Log
V0701 12:10:50.829100 3708165 impl.cpp:455] StaticModuleOptions: cleanup_activations 1, enable_out_variant 1, optimize_memory1, optimize_graph_output_memory0
V0701 12:10:50.888468 3708165 impl.cpp:1279] Switch to out variant for node: %3 : Tensor = aten::log(%inp.1)
V0701 12:10:50.889098 3708165 impl.cpp:1279] Switch to out variant for node: %a.1 : Tensor = aten::clone(%3, %2)
```

Reviewed By: hlu1

Differential Revision: D29511622

fbshipit-source-id: 819fd7d90c084609a060efeadb3015e35acac517
2021-07-08 18:25:35 -07:00
06dfaadfc6 update internal function names that apply to both cpu and cuda (#59701)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59701

These functions have been updated to work for cpu and cuda, their names are now changed to reflect that

quantize_per_channel_cpu -> quantize_per_channel
dequantize_quantized_cpu -> dequantize_quantized

(Note: this ignores all push blocking failures!)

Test Plan:
python test/test_quantization.py TestQuantizedTensor

Imported from OSS

Reviewed By: raghuramank100

Differential Revision: D29018270

fbshipit-source-id: 3a0da8d5e3f357dcf19119bcdbc6172d41f2b0c1
2021-07-08 17:26:25 -07:00
8726f08e15 [ONNX] Update documentation (#58712) (#60249)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60249

* Add introductory paragraph explaining what ONNX is and what the
  torch.onnx module does.
* In "Tracing vs Scripting" and doc-string for torch.onnx.export(),
  clarify that exporting always happens on ScriptModules and that
  tracing and scripting are the two ways to produce a ScriptModule.
* Remove examples of using Caffe2 to run exported models.
  Caffe2's website says it's deprecated, so it's probably best not to
  encourage people to use it by including it in examples.
* Remove a lot of content that's redundant:
  * The example of how to mix tracing and scripting, and instead
    link to Introduction to TorchScript, which includes very similar
    content.
  * "Type annotations" section. Link to TorchScript docs which explain
    that in more detail.
  * "Using dictionaries to handle Named Arguments as model inputs"
    section. It's redundant with the description of the `args` argument
    to `export()`, which appears on the same page once the HTML
    is generated.
  * Remove the list of supported Tensor indexing patterns. If it's not
    in the list of unsupported patterns, users can assume it's
    supported, so having both is redundant.
  * Remove the list of supported operators and models.
    I think the list of supported operators is not very useful.
    A list of supported model architectures may be useful, but in
    reality it's already very out of date. We should add it back if
    / when we have a system for keeping it up to date.
  * "Operator Export Type" section. It's redundant with the description
    of the `operator_export_type` arg to to `export()`, which appears on
    the same page once the HTML is generated.
  * "Use external data format" section. It's redundant with the
    description of the `use_external_data_format` arg to `export()`.
  * "Training" section.  It's redundant with the
    description of the `training` arg to `export()`.
* Move the content about different operator implementations producing
  different results from the "Limitations" section into the doc for the
  `operator_export_type` arg.
* Document "quantized" -> "caffe2" behavior of
  OperatorExportTypes.ONNX_ATEN_FALLBACK.
* Combining the text about using torch.Tensor.item() and the text about
  using NumPy types into a section titled
  "Avoid NumPy and built-in Python types", since they're both
  fundamentally about the same issue.
* Rename "Write PyTorch model in Torch way" to "Avoiding Pitfalls".
* Lots of minor fixes: spelling, grammar, brevity, fixing links, adding
  links.
* Clarify limitation on input and output types. Phrasing it in terms of
  PyTorch types is much more accessible than in terms of TorchScript
  types. Also clarify what actually happens when dict and str are used
  as inputs and outputs.
* In Supported operators, use torch function and class names and link
  to them. This is more user friendly than using the internal aten
  op names.
* Remove references to VariableType.h, which doesn't appear to contain
  the information that it once did. Instead refer to the generated
  .pyi files.
* Remove the text in the FAQ about appending to lists within loops.
  I think this limitation is no longer present
  (perhaps since https://github.com/pytorch/pytorch/pull/51577).
* Minor fixes to some code I read along the way.
* Explain the current rationale for the weird ::prim_PythonOp op name.

Test Plan: Imported from OSS

Reviewed By: zou3519, ZolotukhinM

Differential Revision: D29494912

Pulled By: SplitInfinity

fbshipit-source-id: 7756c010b2320de0692369289604403d28877719

Co-authored-by: Gary Miguel <garymiguel@microsoft.com>
2021-07-08 16:29:32 -07:00
00b0d826a1 [ONNX] shape type inference fixes for control flow (#59319) (#60248)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60248

* ~~Allow shape inference to skip for blocks by checking unsupported cases recursively. Currently onnx::Identity would trigger a shape inference failure.~~ Fixed in onnx submodule 1.9.
* Remove previous special post process for if op, since that was for constant folding, and now it is handled elsewhere. Update new post process for control flow nodes to copy assign node shape from subblock output shape correctly.

Test Plan: Imported from OSS

Reviewed By: zou3519, ZolotukhinM

Differential Revision: D29494913

Pulled By: SplitInfinity

fbshipit-source-id: de274a388df86e86403981e1b89b8b4a0d1e26d1

Co-authored-by: BowenBao <bowbao@microsoft.com>
2021-07-08 16:29:30 -07:00
81f95cce59 [ONNX] Extend chunk for dynamic chunk values (#59644) (#60247)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60247

Related to #42785

Test Plan: Imported from OSS

Reviewed By: zou3519, ZolotukhinM

Differential Revision: D29494914

Pulled By: SplitInfinity

fbshipit-source-id: 51ddb876d00185e59cfe54a8af5a9c8dd073a09f

Co-authored-by: Shubham Bhokare <shubhambhokare@gmail.com>
2021-07-08 16:29:28 -07:00
d9dc94406f [ONNX] Add linspace symbolic (#58854) (#60246)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60246

* Adds support for linspace op
* Modifies the arange symbolic in opset 9 to replicate the dtype-determination behavior of opset 11, as documented at https://pytorch.org/docs/stable/generated/torch.arange.html
* Enables some arange unit tests which were disabled for opset 9

Test Plan: Imported from OSS

Reviewed By: zou3519, ZolotukhinM

Differential Revision: D29494911

Pulled By: SplitInfinity

fbshipit-source-id: bddff18a90f8a78121c8ecdd1dafc15c69962d66

Co-authored-by: Shubham Bhokare <shubhambhokare@gmail.com>
2021-07-08 16:29:26 -07:00
4ccfa3ffeb [ONNX] Fix sum export with attribute keepdims (#59316) (#60245)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60245

Fix after b9bdb07a0261ab5a0b1038f290fa03af6ce0415f. This improves the previous fix in two ways:
* Check all dimensions for zero, not only the first, when detecting an empty tensor.
* Do not assume an empty tensor when the shape is not accessible.

Test Plan: Imported from OSS

Reviewed By: zou3519, ZolotukhinM

Differential Revision: D29494917

Pulled By: SplitInfinity

fbshipit-source-id: 02587c3c3be0510312c1a1959f28cab12d81812d

Co-authored-by: BowenBao <bowbao@microsoft.com>
2021-07-08 16:29:24 -07:00
95a7f3ccfe [ONNX] Fix shape inference for large model (#59320) (#60244)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60244

Perform the 2GB size check for protocol buffer serialization at a later stage, to avoid false alarms for cases like shape inference where no serialization actually happens.

Test Plan: Imported from OSS

Reviewed By: zou3519, ZolotukhinM

Differential Revision: D29494910

Pulled By: SplitInfinity

fbshipit-source-id: 4c36d26de9a94e5d6cf78f332d4dffc46588ebf0

Co-authored-by: BowenBao <bowbao@microsoft.com>
2021-07-08 16:29:22 -07:00
9636c077c3 [ONNX] Handle onnx::Size in ComputeConstant folding (#59122) (#60243)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60243

Handle onnx::Size in ComputeConstant folding

Test Plan: Imported from OSS

Reviewed By: zou3519, ZolotukhinM

Differential Revision: D29494915

Pulled By: SplitInfinity

fbshipit-source-id: 9782e356f5e36ae1dd2819412f970010360e9cc0

Co-authored-by: jiafatom <jiafa@microsoft.com>
2021-07-08 16:29:21 -07:00
38c48e42c6 [Reland][BE] add test wall time report (#61389)
Summary:
This is a reland of https://github.com/pytorch/pytorch/issues/61322.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61389

Reviewed By: malfet

Differential Revision: D29601573

Pulled By: walterddr

fbshipit-source-id: dfb2bdc7d72d493c01b9dbac50ef9b79c1782054
2021-07-08 16:29:19 -07:00
7481c6fc02 Bump googletest version to v1.11.0 (#61395)
Summary:
This PR bumps the `googletest` version to v1.11.0.

To facilitate this change, `CAFFE2_ASAN_FLAG` and `CAFFE2_TSAN_FLAG` are divided into corresponding compiler and linker variants. This is required because `googletest v1.11.0` sets the `-Werror` flag. The `-pie` flag is a linker flag, and passing it to a compiler invocation results in a `-Wunused-command-line-argument` warning, which in turn will cause `googletest` to fail to build with ASAN.

Fixes https://github.com/pytorch/pytorch/issues/60865

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61395

Reviewed By: iramazanli

Differential Revision: D29620970

Pulled By: 1ntEgr8

fbshipit-source-id: cdb1d3d12e0fff834c2e62971e42c03f8c3fbf1b
2021-07-08 16:29:17 -07:00
13658b10bb [torch] Various improvements to torch.distributed.launch and torch.distributed.run (#61294)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61294

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60925

* Set the number of restarts for `torch.distributed.launch` to 0
* Remove the unnecessary `--use_env` warning and move the remaining `--use_env` warnings to `torch.distributed.launch`
* Make default log level WARNING
* Add new doc section around transitioning to `torch.distributed.run`
* Make `torch.distributed.launch` not use error-propagation
* Set default events handler to `null` that does not print events to console
* Add reference from `torch.distributed.launch` to `torch.distributed.run`
* Set correct preexec function that sends SIGTERM to child processes when parent dies

Issues resolved:

https://github.com/pytorch/pytorch/issues/60716
https://github.com/pytorch/pytorch/issues/60754

Test Plan:
sandcastle

    python -m torch.distributed.launch --nproc_per_node 2 main.py -> uses 0 restarts
    python -m torch.distributed.run --nproc_per_node 2 main.py -> uses default for torchelastic, 0 restarts

    python -m torch.distributed.launch --nproc_per_node=4  --use_env --no_python  main.py -> produces error
    python -m torch.distributed.launch --nproc_per_node=4  --use_env main.py -> no warning
    python -m torch.distributed.launch --nproc_per_node=4  --no_python  main.py ->warning

Output of running torch.distributed.launch without --use_env:

    $path/torch/distributed/launch.py:173: FutureWarning: The module torch.distributed.launch is deprecated
    and will be removed in future. Use torch.distributed.run.
    Note that --use_env is set by default in torch.distributed.run.
    If your script expects `--local_rank` argument to be set, please
    change it to read from `os.environ('LOCAL_RANK')` instead.

New section:

{F628923078}

{F628974089}

Reviewed By: cbalioglu

Differential Revision: D29559553

fbshipit-source-id: 03ed9ba638bf154354e1530ffc964688431edf6b
2021-07-08 16:28:06 -07:00
10f372601d Support RRefs that contain torch.cuda.Event (#61354)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61354

Test Plan: Imported from OSS

Reviewed By: iramazanli

Differential Revision: D29617155

Pulled By: pbelevich

fbshipit-source-id: 6e56b3fd0a0f93ecec048b58c90f2a47b4cba688
2021-07-08 15:33:08 -07:00
8bc2ba3fe3 detect missing kernels from external backends in codegen (#60737)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60737

Test Plan: Imported from OSS

Reviewed By: ezyang, jdonald

Differential Revision: D29392615

Pulled By: bdhirsh

fbshipit-source-id: d49d013243dbc8c8b55fbdb0b9b3eed38df52255
2021-07-08 15:33:04 -07:00
7318747a3b move all external kernels into a class for better compiler error messages (#59839)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59839

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D29047680

Pulled By: bdhirsh

fbshipit-source-id: 18cf4124be440a0a343b5983e1a4165db808e7c1
2021-07-08 15:31:02 -07:00
86eac5b456 [caffe2] Check for number of created subnets and optionally throw an error (#57366)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57366

We often get error messages such as
```
Model failed AOT (glow ahead-of-time compilation) with exception: Error during AOT optimization (non-provisioned addNetwork):
Non-recoverable device error when adding network:
Error code: PARTITIONER_ERROR
Error message: Did not find a partition with an SLS node

Error return stack:
--------------------------------------------------------------------------------
glow/glow/lib/Partitioner/Partitioner.cpp:1244
--------------------------------------------------------------------------------
glow/glow/lib/Runtime/HostManager/HostManager.cpp:375
--------------------------------------------------------------------------------
```
This makes the error message clearer by checking the number of `OnnxifiOp`s created before going into Glow. The check is enabled with the `verify_only_single_subnet` flag and is disabled by default.

Test Plan: Unit tests pass

Reviewed By: khabinov

Differential Revision: D28097674

fbshipit-source-id: 0eefd8f6ec1a82546b759be8e541256bf271a673
2021-07-08 14:29:03 -07:00
0fc110cdd1 [CUDA graphs] Don't sync between replays for cuda driver version 11.4+ (#61063)
Summary:
The bug in libcuda.so that required https://github.com/pytorch/pytorch/pull/57556 is fixed for libcuda.so versions >= 11.4.

This PR changes replay() to sync after each launch only if the process's in-use libcuda.so is < 11.4.

With all the "enhanced" and "forward" compatibility promises flying around, and the fact that "driver" sometimes means the kernel-mode driver and sometimes the user-mode driver (libcuda.so), I wasn't sure if this PR's check suffices to trigger the sync iff the in-use libcuda.so is < 11.4, but CUDA people say what I wrote is reasonable.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61063

Reviewed By: mruberry

Differential Revision: D29600907

Pulled By: ngimel

fbshipit-source-id: 71bf0bcbde43091e29f3812440abeb7a95d161e2
2021-07-08 13:26:07 -07:00
80797d03e0 Simplify lambda syntax in SegmentReduce.cpp (#61416)
Summary:
Fixes the Windows build by dismantling a combination of nested lambdas and preprocessor magic into explicit templates

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61416

Reviewed By: pbelevich

Differential Revision: D29616449

Pulled By: malfet

fbshipit-source-id: 687ef73b8b37bc272f82d44fc690448e403e3a0c
2021-07-08 12:30:35 -07:00
cdc027679b Add compare_set in distributed docs (#61351)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61351

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D29588206

Pulled By: H-Huang

fbshipit-source-id: 9db48e7b6de29503275f10616470ad2d66b075f9
2021-07-08 12:30:32 -07:00
f01a4e3b02 .github: Ensure build-results per job is unique (#61005)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61005

build-results have the potential to be tainted between jobs since runs
are not ephemeral

Signed-off-by: Eli Uriegas <seemethere101@gmail.com>

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D29526747

Pulled By: seemethere

fbshipit-source-id: f8c5bc5f647b771a059cbe380d694ce6dc535ae4
2021-07-08 12:30:28 -07:00
4beb5f9ad6 [DDP Comm Hook] Fix some comments (#61376)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61376

After SPMD is retired, the API of `get_tensors` becomes `get_tensor`. Fix some comments that refer to the obsolete API.

The `allreduce` hook example does not perform the division inside the hook, which is actually incorrect.
ghstack-source-id: 133174272

Test Plan: N/A

Reviewed By: rohan-varma

Differential Revision: D29596857

fbshipit-source-id: 2046b185225cd6d1d104907b5f9b4009b6e87c99
2021-07-08 12:30:24 -07:00
dfe25069a8 [ROCm] Skip test_*_stress_cuda test for ROCm (#60490)
Summary:
Skipping test_*_stress_cuda tests because they sometimes fail for ROCm

Signed-off-by: Kyle Chen <kylechen@amd.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60490

Reviewed By: SciPioneer

Differential Revision: D29595552

Pulled By: rohan-varma

fbshipit-source-id: fee18204775211747337985c472ab1084a71f2f1
2021-07-08 12:28:06 -07:00
9310f6bac1 Use our own statically stored vs_buildtools.exe (#61372)
Summary:
We might be getting rate-limited on our VS installer download requests, leading to HUD failures. This PR moves the download to curl from our own S3 bucket, so we won't get rate-limited.

This PR also upgrades our vs_install from 16.8.5 to 16.8.6, as moving to S3 alone didn't help, but moving to the newer installer did.

The CI passes the VS install now, but fails on a build error that I don't think is relevant: https://github.com/pytorch/pytorch/pull/61372/checks?check_run_id=3013140957

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61372

Reviewed By: iramazanli

Differential Revision: D29597204

Pulled By: janeyx99

fbshipit-source-id: 3eb52da308451271ea80120bbf2e511fb781b5dc
2021-07-08 11:27:02 -07:00
ac5b910600 clang-tidy patch (#60714)
Summary:
Three changes are made here:
1. Set `LANG=C.UTF-8` for clang-tidy so we can properly decode symbols in comments;
2. In case a file is removed, `end` could be null, and we should skip the chunk/file;
3. A tiny bug fix for the loop indentation.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60714

Reviewed By: iramazanli

Differential Revision: D29617171

Pulled By: 1ntEgr8

fbshipit-source-id: b1603929333529a174105baf51e18246d09c012e
2021-07-08 11:16:00 -07:00
074c776011 Force mypy colors in CI (#61391)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61391

Both the [GitHub Actions log viewer](https://github.community/t/ansi-color-output-in-webview/17621) and the HUD PR page log viewer support ANSI color codes so turn those on via this [secret env variable](https://github.com/python/mypy/issues/7771)

Test Plan: Imported from OSS

Reviewed By: walterddr

Differential Revision: D29602686

Pulled By: driazati

fbshipit-source-id: e8f4cd71572cc068927e6719534e64773cb16c7f
2021-07-08 11:08:38 -07:00
c76eba650a [bootcamp][pytorch][WIP] Support embedding_bag_byte_rowwise_offsets in cuda (#61075)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61075

Completed the implementation of `embedding_bag_byte_rowwise_offsets` and wrote a randomized test comparing GPU and CPU kernel outputs.

Test Plan:
```
buck build mode/opt --show-full-output  //caffe2/torch/fb/sparsenn:gpu_test
/data/users/johnsonpaul/fbsource/fbcode/buck-out/gen/caffe2/torch/fb/sparsenn/gpu_test#binary.par -r test_embedding_bag_byte_rowwise_offsets
```

Reviewed By: hyuen

Differential Revision: D29218597

fbshipit-source-id: 786260466ab4e8e3d89540496bd8a38be14c5c1b
2021-07-08 10:51:50 -07:00
9ef1c64907 [PyTorch][Edge] Tests for QuantizationFx API on lite modules (#60476)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60476

# Context
Add tests for Lite modules that are quantized using fx API

Read these posts for details about why we need a test bench for quantized Lite modules:
https://fb.workplace.com/groups/2322282031156145/permalink/4289792691071726/

https://github.com/pytorch/pytorch/pull/60226#discussion_r654615851

moved common code to `caffe2/torch/testing/_internal/common_quantization.py`

ghstack-source-id: 133144292

Test Plan:
```
~/fbsource/fbcode] buck test caffe2/test:fx_quantization_lite
Downloaded 0/2 artifacts, 0.00 bytes, 100.0% cache miss
Building: finished in 8.3 sec (100%) 11892/11892 jobs, 2 updated
  Total time: 8.6 sec
More details at https://www.internalfb.com/intern/buck/build/ffb7d517-d85e-4c8f-9531-5e5d9ca1d34c
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: d79a5713-bd29-4bbf-ae76-33a413869a09
Trace available for this run at /tmp/tpx-20210630-105547.675980/trace.log
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/3096224749578707
    ✓ ListingSuccess: caffe2/test:fx_quantization_lite - main (9.423)
    ✓ Pass: caffe2/test:fx_quantization_lite - test_embedding (mobile.test_quantize_fx_lite_script_module.TestFuseFx) (10.630)
    ✓ Pass: caffe2/test:fx_quantization_lite - test_submodule (mobile.test_quantize_fx_lite_script_module.TestFuseFx) (12.464)
    ✓ Pass: caffe2/test:fx_quantization_lite - test_conv2d (mobile.test_quantize_fx_lite_script_module.TestFuseFx) (12.728)
Summary
  Pass: 3
  ListingSuccess: 1
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/3096224749578707
```

Reviewed By: iseeyuan

Differential Revision: D29306402

fbshipit-source-id: aa481e0f696b7e9b04b9dcc6516e8a390f7dc1be
2021-07-08 10:40:08 -07:00
179b3ab88c [cuDNN] Enable cudnn_batchnorm_spatial_persistent for BatchNorm3d channels_last_3d (#59129)
Summary:
This PR enables the use of the cuDNN spatial persistent BatchNorm algorithm for BatchNorm3d (5-D tensors) in channels_last_3d format, aka NDHWC; a usage sketch follows the checklist below. Performance and numerical accuracy have been tested.

- [x] Performance check for common shapes.
- [x] Numerical accuracy check for (1 million) random shapes
    https://github.com/xwang233/code-snippet/tree/master/batchnorm3d-channels-last/A100
    https://github.com/xwang233/code-snippet/tree/master/batchnorm3d-channels-last/V100
- [ ] Convergence check for common 3D models
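
The configuration in question, as a usage illustration (requires a CUDA build; not the benchmark script):
```python
import torch

bn = torch.nn.BatchNorm3d(8).cuda()
x = torch.randn(2, 8, 4, 16, 16, device="cuda").contiguous(
    memory_format=torch.channels_last_3d)  # NDHWC layout
out = bn(x)  # now eligible for the cuDNN spatial persistent path
```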

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59129

Reviewed By: mruberry

Differential Revision: D29593309

Pulled By: ngimel

fbshipit-source-id: 2caf282c6cf2f426aa14a24f94e6bddada68ddac
2021-07-07 21:28:29 -07:00
0222291544 Fix docs for ShardMetadata. (#61388)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61388

The doc for the `placement` argument was outdated and is now fixed.
ghstack-source-id: 133184441

Test Plan: waitforbuildbot

Reviewed By: wanchaol

Differential Revision: D29601316

fbshipit-source-id: a0817f799382bf91a5192c54dfeea4d253eb0d56
2021-07-07 21:27:30 -07:00
7011513d23 Enable sparse_csr.to_dense() for bool, float16, bfloat16 and complex (#60657)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60657

Fixes https://github.com/pytorch/pytorch/issues/60648

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D29408102

Pulled By: cpuhrsch

fbshipit-source-id: 406505c1c52c0eada934833f9723f58fa67e9256
2021-07-07 19:29:19 -07:00
5054cb8934 fix torch.cat bug with boxed CPUFallback (#60993)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60993

Fixes https://github.com/pytorch/pytorch/issues/60902

The boxed fallback was written to assume that there was at least one tensor argument, which it used to figure out what device to move the cpu tensors to. That fails with an op like `torch.cat()`, which doesn't have any tensor arguments, but instead has a single `TensorList` argument.

I also added handling to gracefully deal with the case where you have an empty list of tensors - in that case we don't know what device to move everything to, but that doesn't matter because an empty list of tensors implies that we have no tensors to move anyway.
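
A Python sketch of the fixed device-inference logic (the actual fallback is C++; the helper name here is illustrative):
```python
import torch

def infer_target_device(args):
    for a in args:
        if isinstance(a, torch.Tensor):
            return a.device
        if isinstance(a, (list, tuple)):  # e.g. torch.cat's TensorList argument
            for t in a:
                if isinstance(t, torch.Tensor):
                    return t.device
    return None  # empty TensorList: nothing to move anyway

assert infer_target_device([[torch.ones(1)]]) == torch.device("cpu")
assert infer_target_device([[]]) is None
```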

I tested it out though and noticed that `torch.cat(())` doesn't handle empty lists well anyway (erroring out in the dispatcher). I'm not sure that it's a huge issue, and not even sure that we want to fix it (default to CPU? add an extra codegen'd check into every op that only takes TensorList args?) but I'll file a separate bug for that: https://github.com/pytorch/pytorch/issues/60997

I tested it by running the pytorch/xla suite after removing `cat` from `xla_native_functions.yaml`, and confirming that we don't segfault anymore.

Test Plan: Imported from OSS

Reviewed By: asuhan

Differential Revision: D29471577

Pulled By: bdhirsh

fbshipit-source-id: 58c96e8d48d993785b8d15cfa846ec745a34e623
2021-07-07 19:29:17 -07:00
141bfbef86 [iOS GPU] Add tanh and clamp to support GAN (#61383)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61383

Since we already support hardtanh, it's easy to add support for clamp. The GPU is ~40% faster.
ghstack-source-id: 133113272

Test Plan:
- CI
- buck test pp-macos

Reviewed By: dhruvbird

Differential Revision: D29572933

fbshipit-source-id: d22ec09e18d02456440f552067c9a8aea9a1a8ab
2021-07-07 19:29:16 -07:00
4937d9fd6f Fix Dispatching not considering List[Optional[Tensor]] for dispatch (#60787)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60787

Fixes #60461.

Previously, when one called `self.index(indices)` with a regular `self` Tensor and `BatchedTensor` indices, the dispatcher would not dispatch to the Batched key. This is because the dispatcher did not extract dispatch keys from `indices`.

Similar #58283 and #58296, this PR modifies the dispatcher to extract
dispatch keys from List[Optional[Tensor]] arguments. We do this for both
boxed and unboxed kernels.
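As a rough illustration, here is the kind of call whose `indices` argument is a `List[Optional[Tensor]]` (a sketch; the BatchedTensor path itself requires functorch):

```
import torch

x = torch.randn(3, 4)
idx = torch.tensor([0, 2])
# Advanced indexing lowers to aten::index.Tensor, whose `indices` argument
# is a List[Optional[Tensor]]; the dispatcher now extracts dispatch keys
# from the tensors inside that list as well.
y = x[idx]
```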

Test Plan:
- run the test case in
https://gist.github.com/zou3519/4421df7c5271376a0ef53ca857b18740
(requires functorch). After this PR, it raises `RuntimeError: Batching
rule not implemented for aten::index.Tensor. We could not generate a
fallback.`, which shows that dispatch happened on the Batched key.
- Taking suggestions for how to write a test for this in core

Reviewed By: jbschlosser

Differential Revision: D29438611

Pulled By: zou3519

fbshipit-source-id: 77e182f763e18aa3fa857eebafa8b7f83384db71
2021-07-07 19:28:07 -07:00
426c42ba45 [package] ensure we don't write files twice to the archive. (#61371)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61371

The ZIP format allows for writing multiple files with the same name. But
this is handled poorly by most tooling (including our own), so doing so
produces weird behavior depending on the implementation of the ZIP
reader.

Since we have no valid use case for writing multiple files with the same
name to a `torch.package`, just ban it.
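A minimal sketch of what is now rejected (the package and resource names are made up for illustration):

```
from torch.package import PackageExporter

with PackageExporter("out.package") as exporter:
    exporter.save_text("my_pkg", "notes.txt", "first")
    # Writing the same name twice now raises instead of silently producing
    # a duplicate entry in the ZIP archive:
    exporter.save_text("my_pkg", "notes.txt", "second")
```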

Differential Revision: D29595518

Test Plan: Imported from OSS

Reviewed By: Lilyjjo

Pulled By: suo

fbshipit-source-id: b9f5263ab47572abde233745c102af3d6143946e
2021-07-07 18:28:42 -07:00
1d1d5acbb0 [RPC] Ensure _wait_all_workers doesn't swallow exception. (#61094)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61094

`_wait_all_workers` was swallowing exceptions, so if any error occurred it would still continue with `rpc_agent.join()`, which would hang since something had already failed earlier.

To fix this, I've ensured that `_wait_all_workers` throws, and in that case we just proceed with an ungraceful shutdown without joining.
ghstack-source-id: 133160706

Test Plan:
1) Added unit test.
2) waitforbuildbot

Reviewed By: rohan-varma

Differential Revision: D29509286

fbshipit-source-id: 7c3f1c68d712ae2f63e10e0216580db8e9bcc29d
2021-07-07 18:28:41 -07:00
7b6ddb6793 [nnapi] add log_softmax (#61378)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61378

Test Plan: Imported from OSS

Reviewed By: axitkhurana

Differential Revision: D29597355

Pulled By: IvanKobzarev

fbshipit-source-id: 55124749f8eeffa2b2713f7cffd5ccf965561de1
2021-07-07 18:28:39 -07:00
eb82a88d85 Add a type for test fixture world_size (#61363)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61363

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D29561360

fbshipit-source-id: 821217e33adc483b1810585a2b91a2ee416513bd
2021-07-07 18:27:37 -07:00
d51b437b74 Cuda quantized tensors, support for quantize per channel (#58245)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58245

This adds support for per_channel quantization.

(Note: this ignores all push blocking failures!)
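A minimal sketch of what this enables, assuming a CUDA device is available:

```
import torch

x = torch.randn(2, 3, device="cuda")
scales = torch.tensor([0.1, 0.2, 0.3], device="cuda")
zero_points = torch.zeros(3, dtype=torch.int64, device="cuda")
# Per-channel quantization of a CUDA tensor, previously CPU-only:
q = torch.quantize_per_channel(x, scales, zero_points, axis=1, dtype=torch.qint8)
back = q.dequantize()
```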

Test Plan:
python test/test_quantization.py TestQuantizedTensors
python test/test_quantization.py TestQuantizedTensors.test_compare_quant_dequant_device_numerics
python test/test_quantization.py TestQuantizedTensors.test_qtensor_to_device

Imported from OSS

Reviewed By: raghuramank100

Differential Revision: D29018271

fbshipit-source-id: 4f59aed98f2f8ff607154250e4e3f85592e17854
2021-07-07 17:36:53 -07:00
b1dc9c3946 Skip _cudnn_rnn_backward in codegen check (#61386)
Summary:
Fixes a test failure encountered internally.

For context see: https://github.com/pytorch/pytorch/issues/60426

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61386

Reviewed By: malfet

Differential Revision: D29601031

Pulled By: soulitzer

fbshipit-source-id: 3592ca45a01e7bbaa804ab5404338191154f0fbc
2021-07-07 17:36:51 -07:00
b25c65b4f3 Revert D29589020: [pytorch][PR] adding a build_start_time_epoch to build meta info
Test Plan: revert-hammer

Differential Revision:
D29589020 (d33066ab3f)

Original commit changeset: 309fc3b01cbc

fbshipit-source-id: 9b50c1e8dd63e59ab4e593d250dfd5eeb623f0af
2021-07-07 17:35:29 -07:00
9dd1824741 Fix dispatch keys for eigh, lu_solve (#60945)
Summary:
I added a test to `test_ops.py` that verifies that the op can run correctly from different cuda devices. This test revealed that `linalg_eigh`, `linalg_eigvalsh`, `linalg_matrix_rank`, `linalg_pinv` were failing. `matrix_rank` and `pinv` are calling `eigh` internally.

`linalg_eigh` and `lu_solve` internally use dispatch stubs, so they should be registered with `CPU, CUDA` dispatch keys. The generated code includes device guards in this case and the problem is not present.

Implemented a better out variant for `eigvalsh` and registered it with `CPU, CUDA` dispatch keys.

~I added a device guard to `linalg_eigh_kernel` as a fix for `eigvalsh` function. This function needs to be registered as CompositeImplicitAutograd, because it calls `at::linalg_eigh` if `at::GradMode::is_enabled()`.~

Fixes https://github.com/pytorch/pytorch/issues/60892.
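A sketch of the scenario the new test covers (requires at least two CUDA devices):

```
import torch

a = torch.randn(4, 4, device="cuda:1")
a = a + a.transpose(-2, -1)  # make it symmetric
# Before this fix, running on a non-default device could fail because
# no device guard was emitted for the stub-based kernels:
w, v = torch.linalg.eigh(a)
```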

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60945

Reviewed By: mruberry

Differential Revision: D29589580

Pulled By: ngimel

fbshipit-source-id: 5851605958bdfc3a1a1768263934619449957168
2021-07-07 16:28:22 -07:00
fb00194030 Fix typo in common_utils.py (#61365)
Summary:
Missed this in review of https://github.com/pytorch/pytorch/pull/57953. I don't think this has affected much, though.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61365

Reviewed By: walterddr

Differential Revision: D29593764

Pulled By: janeyx99

fbshipit-source-id: 2c6f6aa961eabca0d8b8a7607aaae979667cca3b
2021-07-07 16:28:20 -07:00
6107cf3750 Add --jobs 0 for git submodule update (#61311)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61311

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61152

Some related docs about `submodule.fetchJobs`
https://git-scm.com/docs/git-config#Documentation/git-config.txt-submodulefetchJobs

```
time git submodule update --init --recursive
________________________________________________________
Executed in  243.20 secs    fish           external
   usr time   49.64 secs  213.00 micros   49.64 secs
   sys time   29.27 secs  795.00 micros   29.27 secs
```

```
time git submodule update --init --recursive --jobs 4
________________________________________________________
Executed in  143.04 secs    fish           external
   usr time   51.06 secs  246.00 micros   51.06 secs
   sys time   30.96 secs  742.00 micros   30.96 secs
```

```
time git submodule update --init --recursive --jobs 8
________________________________________________________
Executed in  124.64 secs    fish           external
   usr time   51.76 secs  264.00 micros   51.76 secs
   sys time   30.49 secs  739.00 micros   30.49 secs

```

```
time git submodule update --init --recursive --jobs 0 # use all online cpus
 ________________________________________________________
Executed in  129.75 secs    fish           external
   usr time   51.64 secs  181.00 micros   51.64 secs
   sys time   31.49 secs  781.00 micros   31.49 secs

```

Test Plan: Imported from OSS

Reviewed By: 1ntEgr8

Differential Revision: D29560875

Pulled By: zhouzhuojie

fbshipit-source-id: 556027dffe744c66428075a8a1bf64683930aaaf
2021-07-07 16:28:18 -07:00
d33066ab3f adding a build_start_time_epoch to build meta info (#61322)
Summary:
Adding a `build_start_time_epoch` as a normal field in scribe reporting.
This should fix https://github.com/pytorch/pytorch/issues/60591.

The decision was made because:
- we would like only one build (test CI job) start time to serve as the partition key string
  - the alternative is to report the duration on each test case individually, which would result in duplicate numeric value uploads.
- we can then easily calculate the wall-time of a test job from `MAX('time') - build_start_time_epoch` for all reporting messages with the same normal keys.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61322

Test Plan:
CI should report the extra normal field.

See: https://fburl.com/scuba/pytorch_test_times/pm6chz9w

Reviewed By: driazati

Differential Revision: D29589020

Pulled By: walterddr

fbshipit-source-id: 309fc3b01cbce76cd62f8ccd2eb0ecad27782b88
2021-07-07 16:27:13 -07:00
429436edbd Avoid complex-to-real cast warning in CopyBackward (#60021)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60021

Dropping the imaginary component is expected and gives the correct gradient
formula, so silencing the warning is appropriate.

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D29589371

Pulled By: mruberry

fbshipit-source-id: 73e1511cae69207dc9abe576e2769ee1d03f1bbd
2021-07-07 15:28:38 -07:00
10b2a24508 Migrate log_sigmoid (forward and backward) to ATen (CUDA) (#60881)
Summary:
Fixes gh-24591, fixes gh-24590, closes gh-39642

Benchmarks were run with nvprof using contiguous inputs; they show improvement across the board.

#### Forward benchmarks

| Num Elements | Master (us) | This PR (us) |
|:------------:|:-----------:|:------------:|
|     10^4     |    2.5840   |    2.5230    |
|     10^5     |    4.6410   |    3.9280    |
|     10^6     |    33.772   |    23.025    |
|     10^7     |    299.67   |    206.35    |
|     10^8     |    3001.9   |    2052.8    |

#### Backward benchmarks

| Num Elements | Master (us) | This PR (us) |
|:------------:|:-----------:|:------------:|
|     10^4     |    2.7750   |    2.7080    |
|     10^5     |    5.2430   |    3.9010    |
|     10^6     |    46.198   |    32.878    |
|     10^7     |    447.18   |    296.18    |
|     10^8     |    4393.2   |    2938.0    |
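The user-facing entry point is unchanged; a minimal sketch, assuming a CUDA device:

```
import torch
import torch.nn.functional as F

x = torch.randn(10**6, device="cuda", requires_grad=True)
y = F.logsigmoid(x)   # forward now runs the ATen CUDA kernel
y.sum().backward()    # backward likewise
```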

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60881

Reviewed By: mruberry

Differential Revision: D29589455

Pulled By: ngimel

fbshipit-source-id: 70cd5db244bf6292e9ca367462640530a1d85f7d
2021-07-07 15:28:36 -07:00
f86460a352 Add coverage files to .gitignore (#61144)
Summary:
Fixes failures when coverage is turned on: https://github.com/pytorch/pytorch/runs/2966295169 https://github.com/pytorch/pytorch/runs/2964409741

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61144

Test Plan:
```bash
$ echo hi > test/.coverage.jit.1625168654.4504092
$ git status
$
```

Reviewed By: zhouzhuojie

Differential Revision: D29530709

Pulled By: driazati

fbshipit-source-id: 0e6a1cb217c4d48f14c0c58a546f98393d2b0392
2021-07-07 15:28:35 -07:00
5e83fefdf8 [sparsity] sparsifier step tests (#60107)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60107

Unit tests for sparsifier `step`

Test Plan:
`buck test mode/dev-nosan //caffe2/test:ao -- TestWeightNormSparsifier`

https://pxl.cl/1LhQP

Reviewed By: z-a-f

Differential Revision: D29167029

fbshipit-source-id: 053027ca92701097406372ef0f81d79ef28380aa
2021-07-07 15:28:33 -07:00
8881b9d852 [sparsity] sparsifier convert tests (#60105)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60105

Unit tests for sparsifier `convert`

Test Plan:
`buck test mode/dev-nosan //caffe2/test:ao -- TestWeightNormSparsifier`

https://pxl.cl/1LhQ8

Reviewed By: z-a-f

Differential Revision: D29145450

fbshipit-source-id: b87b8f0d44751a7dae19d454a11b2d207a7286e2
2021-07-07 15:28:31 -07:00
ec200a60bd [sparsity] sparsifier prepare tests (#60042)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60042

Unit tests for sparsifier `prepare`

Test Plan:
`buck test mode/dev-nosan //caffe2/test:ao -- TestWeightNormSparsifier`

https://pxl.cl/1LhR1

Reviewed By: z-a-f

Differential Revision: D29140945

fbshipit-source-id: 73cbf27f278ce849e3930ba6caa82bb2f64f1321
2021-07-07 15:28:30 -07:00
21ad978d4f [sparsity] rename sparsity_pattern to sparse_block_shape (#59898)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59898

In `weight_norm_sparsifier`, the name of the argument `sparsity_pattern` is not intuitive for an argument describing the shape of the sparse block. It has been changed to `sparse_block_shape`.
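A one-line sketch of the rename (assuming the `torch.ao.sparsity` import path used in these diffs):

```
from torch.ao.sparsity import WeightNormSparsifier

# Previously: WeightNormSparsifier(sparsity_level=0.5, sparsity_pattern=(1, 4))
sparsifier = WeightNormSparsifier(sparsity_level=0.5, sparse_block_shape=(1, 4))
```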

Test Plan:
`buck test mode/dev-nosan //caffe2/test:ao -- TestWeightNormSparsifier`
https://pxl.cl/1LhRM

Reviewed By: z-a-f

Differential Revision: D29077045

fbshipit-source-id: 0cf9c5387d41ca8e839ee050d71f4fe477374143
2021-07-07 15:27:16 -07:00
aa6a8a6d21 [nnc] Add LoopNest::unsafe_fuseLoops to let users apply fusion on stmts that may violate our correctness checks (#60601)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60601

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D29346128

Pulled By: huiguoo

fbshipit-source-id: 0eb143e97dc57224adeedf99981036ad836e5a03
2021-07-07 14:27:18 -07:00
8fd90f7cfd Implementing transpose for PackedTensorAccessor (#61114)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61114

Matching the functionality of THCDeviceTensor::transpose. This
is the same as PR 60968 (https://github.com/pytorch/pytorch/pull/60968)
which was already approved; the state of the PR got messed up so
creating a fresh one.
ghstack-source-id: 133050553

Test Plan:
Unit tests at aten/src/ATen/test/packedtensoraccessor_test.cpp

Imported from OSS

Reviewed By: ezyang

Differential Revision: D29516530

fbshipit-source-id: 91d5bcc38381c00420825646b1c352c0d6bc8b3f
2021-07-07 14:27:16 -07:00
39a76fe73c BatchNorm2D (#61012)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61012

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D29562337

Pulled By: migeed-z

fbshipit-source-id: 2b848d0af607bd4f36cea2436ab2278ac4bc28d7
2021-07-07 14:26:07 -07:00
357c4d9cc4 Add a test case for findDanglingImpls (#61104)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61104

This patch added a new test case for findDanglingImpls. The test case introduces a C++ extension which has a dangling impl such that findDanglingImpls can find it and output its information.

Test Plan:
python test/test_dispatch.py TestDispatch.test_find_dangling_impls_ext

Imported from OSS

Reviewed By: ezyang

Differential Revision: D29512520

fbshipit-source-id: 6883fb8f065f2c0ae0e7a1adf6fd298591497e2b
2021-07-07 13:34:16 -07:00
4d9fd8958b Support __rand__, __ror__ and __rxor__ (#59240)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/58120.

This PR implements `torch.Tensor.{__rand__/__ror__/__rxor__}` for the compatibility with NumPy’s interface.
(cc: mruberry, rgommers, emcastillo, kmaehashi)
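A minimal sketch of the newly supported reflected operators:

```
import torch

t = torch.tensor([True, False])
print(True & t)  # Tensor.__rand__ -> tensor([ True, False])
print(True | t)  # Tensor.__ror__  -> tensor([ True,  True])
print(True ^ t)  # Tensor.__rxor__ -> tensor([False,  True])
```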

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59240

Reviewed By: ngimel

Differential Revision: D29482304

Pulled By: mruberry

fbshipit-source-id: 13789202c1d8dddf8658a45381aeedcc31e2f603
2021-07-07 13:34:14 -07:00
9547e57643 Create SECURITY.md (#61356)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61356

Reviewed By: samestep

Differential Revision: D29589904

Pulled By: malfet

fbshipit-source-id: 5d79d25e35af9cb258fd6843559955360dc0cc4e
2021-07-07 13:34:12 -07:00
f84a441718 [torch][segment_reduce] Update default values when initial value is not set (#61266)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61266

Same as title.
This mostly concludes the initially planned features for the op. The only missing functionality is reduction along an arbitrary axis (currently only axis 0 is supported).
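A minimal sketch of the op (it was exposed as `torch.segment_reduce` around this time and has since been made private as `torch._segment_reduce`):

```
import torch

data = torch.tensor([1.0, 2.0, 3.0, 4.0])
lengths = torch.tensor([2, 2])
# Reduce each length-2 segment along axis 0; with no `initial` given,
# the defaults updated in this diff are used:
out = torch.segment_reduce(data, "max", lengths=lengths, axis=0)
# -> tensor([2., 4.])
```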

Test Plan: Updated unit test.

Reviewed By: ngimel

Differential Revision: D29552037

fbshipit-source-id: 023c7cbf750a0671f76082708f14c05739dda07a
2021-07-07 13:34:10 -07:00
a78ad5dc4c [torch][segment_reduce] Add support for int lengths as well (#61141)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61141

Currently only long is supported. This diff adds support for other index types.

Next Steps:
- Update default, refactor unit test and test non_initial value as well
- Cleanup (more tests, benchmark, update documentation)

Test Plan: updated unit test. rely on CI.

Reviewed By: ngimel

Differential Revision: D29526308

fbshipit-source-id: b4043603483851ef7e0e93b0bb02ac7849c6449d
2021-07-07 13:34:09 -07:00
423523d8bb Alias for logsumexp to special namespace (#58838)
Summary:
See https://github.com/pytorch/pytorch/issues/50345

cc: kshitij12345 Lezcano mruberry
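A minimal sketch of the new alias:

```
import torch

x = torch.randn(3, 4)
assert torch.allclose(torch.special.logsumexp(x, dim=1),
                      torch.logsumexp(x, dim=1))
```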

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58838

Reviewed By: malfet

Differential Revision: D29565033

Pulled By: mruberry

fbshipit-source-id: 9b715ea00c78f47b6f183357ee3c7d4c3abe4d01
2021-07-07 13:32:15 -07:00
c03f99f3ef Remove pyproject.toml (#61367)
Summary:
This reverts https://github.com/pytorch/pytorch/issues/60408, since it doesn't really give much benefit, and it ended up breaking things:

- https://github.com/pytorch/pytorch/issues/60665
- https://github.com/pytorch/pytorch/pull/60408#issuecomment-873979383

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61367

Reviewed By: malfet, janeyx99

Differential Revision: D29593886

Pulled By: samestep

fbshipit-source-id: b1ba0ac7695e3eacf66a35e293080e8a1240efca
2021-07-07 12:47:45 -07:00
994ce7dbd9 Cuda quantized tensors, support for quantize per tensor (#59700)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59700

Implements quantized tensors in CUDA for per_tensor quantization, along with several necessary functions.

(Note: this ignores all push blocking failures!)
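A minimal sketch of what this enables, assuming a CUDA device is available:

```
import torch

x = torch.randn(4, device="cuda")
# Per-tensor quantization of a CUDA tensor, previously CPU-only:
q = torch.quantize_per_tensor(x, scale=0.1, zero_point=0, dtype=torch.qint8)
back = q.dequantize()
```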

Test Plan:
python test/test_quantization.py TestQuantizedTensors
python test/test_quantization.py
TestQuantizedTensors.test_compare_quant_dequant_device_numerics
python test/test_quantization.py
TestQuantizedTensors.test_qtensor_to_device

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D29018272

fbshipit-source-id: e07d19d6d67729c46324c2bb5946d959e6e6db8e
2021-07-07 12:40:51 -07:00
baa518e2f6 Add Int32 support for NNAPI (#59365)
Summary:
Support Int32 tensors in NNAPI converter

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59365

Test Plan: Local testing with FB prod models

Reviewed By: anshuljain1

Differential Revision: D28881040

fbshipit-source-id: 2dacceffd322a21d91bfefcf2fb2ea400d952d0d
2021-07-07 12:40:49 -07:00
cf285d8eea Add aten::slice NNAPI converter (#59364)
Summary:
Add support for aten::slice op in the NNAPI model converter

* If start = 0; end = max -> identity
* Flexible shapes can be passed through
* Flexible shapes can't be sliced over

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59364

Test Plan: pytest test/test_nnapi.py::TestNNAPI::test_slice

Reviewed By: anshuljain1

Differential Revision: D28881039

fbshipit-source-id: 3c1c630ff27b5bba6eda403d87570c61d43ae90e
2021-07-07 12:40:47 -07:00
d26372794a Add aten::detach NNAPI converter (#58543)
Summary:
* Add support for aten::detach op in the NNAPI model converter as a no-op
* Also add flexible op support for add_pointwise_simple_unary_op

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58543

Test Plan: pytest test/test_nnapi.py::TestNNAPI::test_detatch

Reviewed By: anshuljain1

Differential Revision: D28531942

fbshipit-source-id: 4387dbbbadd8ce6b690841f3a903e68a380b849d
2021-07-07 12:40:46 -07:00
0be228dd5f Add aten::flatten NNAPI converter (#60885)
Summary:
Add support for the aten::flatten op in the NNAPI model converter. Startup-time
variable size support isn't included, since shapes are passed as inputs to the NNAPI op.

Runtime variable size support is to be added soon.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60885

Test Plan: pytest test/test_nnapi.py::TestNNAPI::test_flatten

Reviewed By: anshuljain1

Differential Revision: D29451725

fbshipit-source-id: 8902745f7758c8cc88ad4b4ce02b8301ff894bd4
2021-07-07 12:40:44 -07:00
b297f65b66 Add aten::div NNAPI converter (#58541)
Summary:
Add support for aten::div op in the NNAPI model converter. Add variable
size input test as well.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58541

Test Plan: pytest test/test_nnapi.py::TestNNAPI::test_div

Reviewed By: anshuljain1

Differential Revision: D28531943

fbshipit-source-id: e96342146f6de216f7b88443618edfc54963747c
2021-07-07 12:40:42 -07:00
eab18a9a40 Add aten::to NNAPI converter (#58540)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58540

Add support for aten::to op in the NNAPI model converter for simple
cases like to("cpu"), to("gpu")

Test Plan: pytest test/test_nnapi.py::TestNNAPI::test_to

Reviewed By: anshuljain1

Differential Revision: D28531941

fbshipit-source-id: 0c934f7aceaff2669307c3426efe32046d8c44f3
2021-07-07 12:40:41 -07:00
14d604a13e Add aten::softmax NNAPI converter (#58539)
Summary:
Add support for aten::softmax op in the NNAPI model converter with
flexible size

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58539

Test Plan: pytest test/test_nnapi.py::TestNNAPI::test_softmax

Reviewed By: anshuljain1

Differential Revision: D28531946

fbshipit-source-id: 8633f3e3f7f52795f9866ff16ad0867ea36a19e8
2021-07-07 12:39:31 -07:00
45ce26c397 Port isposinf & isneginf kernel to structured kernels (#60633)
Summary:
Porting `torch.isposinf` & `torch.isneginf` to structured kernel
Related https://github.com/pytorch/pytorch/issues/55070
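User-visible behavior is unchanged; a minimal sketch, including the `out=` variant that the structured-kernel codegen provides:

```
import torch

x = torch.tensor([float("inf"), -float("inf"), 1.0])
print(torch.isposinf(x))    # tensor([ True, False, False])
out = torch.empty(3, dtype=torch.bool)
torch.isneginf(x, out=out)  # out -> tensor([False,  True, False])
```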

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60633

Reviewed By: saketh-are

Differential Revision: D29517528

Pulled By: bdhirsh

fbshipit-source-id: f8f62e4c203e0c54790437b5e512024bfabdddfc
2021-07-07 12:33:41 -07:00
c2b0af2560 [static runtime] Implement aten::sign (#61154)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61154

Test Plan:
Added `StaticRuntime.IndividualOps_Sign`

```
[djang@devvm861.prn0 ~/local/fbsource/fbcode/caffe2] buck run //caffe2/benchmarks/static_runtime:static_runtime_cpptest -- -v 1
...
[ RUN      ] StaticRuntime.IndividualOps_Sign
V0701 12:05:31.836099 3679080 impl.cpp:455] StaticModuleOptions: cleanup_activations 1, enable_out_variant 1, optimize_memory1, optimize_graph_output_memory0
V0701 12:05:31.898192 3679080 impl.cpp:1279] Switch to out variant for node: %3 : Tensor = aten::sign(%input.1)
V0701 12:05:31.898849 3679080 impl.cpp:1279] Switch to out variant for node: %4 : Tensor = aten::clone(%3, %2)
```

Reviewed By: hlu1

Differential Revision: D29518603

fbshipit-source-id: e47b96d037fea639c41052f3849c82bbfa5f482a
2021-07-07 12:29:25 -07:00
1262b2c4c6 fix torch.futures docstring examples (#61029)
Summary:
Trying to run the doctests for the complete documentation hangs when it reaches the examples of `torch.futures`. It turns out they are only syntax errors, which are normally just reported; my guess is that `doctest` doesn't work well for failures within async code.

Anyway, while debugging this, I fixed the syntax.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61029

Reviewed By: mruberry

Differential Revision: D29571923

Pulled By: mrshenli

fbshipit-source-id: bb8112be5302c6ec43151590b438b195a8f30a06
2021-07-07 11:47:55 -07:00
376dc500a9 Minor bug fix in the warning message (#61127)
Summary:
The current example code does not work. The correct one is like this: cb7d813275/torch/distributed/run.py (L266)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61127

Reviewed By: cbalioglu

Differential Revision: D29572003

Pulled By: mrshenli

fbshipit-source-id: 05b470230f3d70f8a6164edb5f92894a1112069f
2021-07-07 11:42:51 -07:00
90241d254f Automated submodule update: FBGEMM (#59968)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).

New submodule commit: a2257d9471

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59968

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: r-barnes

Differential Revision: D29109045

fbshipit-source-id: 386b28b28275e728ee229d4baf1ff192635d49c3
2021-07-07 11:33:57 -07:00
29ecb9f90b Don't check stride by default (#60637)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60637

We now have ~three out of three~  four out of four datapoints that `check_stride` will be `partial`'ed to `False`:

- `torch` test suite: https://github.com/pytorch/pytorch/pull/58981#discussion_r639514081
- `torchvision` test suite: https://github.com/pytorch/pytorch/issues/56544#issuecomment-845352605
- `kornia`: 9041c42b41/test/utils.py (L25)
- `torch.fft`: https://github.com/pytorch/pytorch/pull/60304#pullrequestreview-687882323

Given that strides are in most cases an implementation detail, IMO we should change the default to `False`. In cases where matching strides is a requirement for closeness / equality, it can always be set to `True` manually.
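A minimal sketch of the new default:

```
import torch

a = torch.randn(2, 3)
b = a.t().contiguous().t()  # same values, different strides
torch.testing.assert_close(a, b)                     # now passes by default
torch.testing.assert_close(a, b, check_stride=True)  # opt back in: raises
```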

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D29556355

Pulled By: mruberry

fbshipit-source-id: 0029a44280d8f4369fbdb537dce3202eeee4b1d9
2021-07-07 09:55:36 -07:00
e2a3f4b560 Use maximum of tolerances in case of mismatching dtypes (#60636)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60636

See https://github.com/pytorch/pytorch/pull/58981#issuecomment-866654600.
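A minimal sketch (the tolerance values are the documented per-dtype defaults; treat the exact numbers as an assumption):

```
import torch

# float16's looser default tolerances now apply to the whole comparison
# instead of the stricter float32 defaults:
torch.testing.assert_close(
    torch.tensor(1.0, dtype=torch.float16),
    torch.tensor(1.0005, dtype=torch.float32),
    check_dtype=False,
)
```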

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D29556352

Pulled By: mruberry

fbshipit-source-id: 36e97e0f338df5d17a94af078f172c668ef51ecb
2021-07-07 09:55:34 -07:00
5f18ba7075 upcast to most precise dtype within their category before the comparison (#60536)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60536

`torch.isclose` does not do this for bool tensors, which results in a test failure since subtraction (`abs(actual - expected)`) is not supported for them (see #58981). Since the `dtype` is already checked at this point, we can safely move the upcasting before `torch.isclose` is invoked.
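A minimal sketch of the previously failing case:

```
import torch

# Bool tensors are upcast before the closeness check, so the unsupported
# subtraction `abs(actual - expected)` is never attempted on them:
torch.testing.assert_close(torch.tensor([True, False]),
                           torch.tensor([True, False]))
```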

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D29556356

Pulled By: mruberry

fbshipit-source-id: 4c65fad4f06cf402d6aab9dde5b127235766d5e0
2021-07-07 09:55:32 -07:00
5ac87cde30 tests for diagnostics in callable msg in torch.testing.assert_close (#60254)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60254

Before we only tested that the correct error message is returned if `msg` is passed as callable. This adds tests that make sure that

- the inputs passed to the callable are the same inputs passed to `torch.assert_close` and
- the `diagnostics` namespace has the same attributes and types as documented.

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D29556354

Pulled By: mruberry

fbshipit-source-id: 9793c6d86fda842b6329381fc03b945eee878464
2021-07-07 09:55:30 -07:00
76d9e680d7 update docstring examples of torch.testing.assert_close (#60163)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60163

Changes to the default error message in case of mismatching values need to be reflected in the examples given in the docstring. Normally this should be enforced by a [`doctest`](https://docs.python.org/3/library/doctest.html). mruberry do you know why we don't have such a check?

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D29556353

Pulled By: mruberry

fbshipit-source-id: 8dbc3f566f429618811b542a059d9abde9a6530b
2021-07-07 09:55:29 -07:00
9979289037 Improve error messages of torch.testing.assert_close in case of mismatching values (#60091)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60091

Closes #58383. (1) and (2) are implemented. (3) was rejected. No consensus was reached on (4) and (5).

Improvements:

- Instead of calling everything "Tensors" we now use "Scalars" and "Tensor-likes" depending on the shape. Plus, we now internally have the option to adapt this identifier for example to report "Imaginary components of complex tensor-likes", which is even more expressive.
- The reported conditions "not close" and "not equal" are now determined based on `rtol` and `atol`.
- The number of mismatched elements and the offending indices are only reported in case the inputs are not scalar
- The allowed `rtol` and `atol` is only reported if `> 0`

**Example 1**

```python
torch.testing.assert_close(1, 3, rtol=0, atol=1)
```

Before:

```
AssertionError: Tensors are not close!

Mismatched elements: 1 / 1 (100.0%)
Greatest absolute difference: 2 at 0 (up to 1 allowed)
Greatest relative difference: 0.6666666865348816 at 0 (up to 0 allowed)
```

After:

```
AssertionError: Scalars are not close!

Absolute difference: 2 (up to 1 allowed)
Relative difference: 0.6666666865348816
```

**Example 2**

```python
torch.manual_seed(0)
t = torch.rand((2, 2), dtype=torch.complex64)
torch.testing.assert_close(t, t + complex(0, 1))
```

Before:

```
AssertionError: Tensors are not close!

Mismatched elements: 4 / 4 (100.0%)
Greatest absolute difference: 1.0000000596046448 at (0, 0) (up to 1e-05 allowed)
Greatest relative difference: 0.8833684352411922 at (0, 1) (up to 1.3e-06 allowed)

The failure occurred for the imaginary part.
```

After:

```
AssertionError: Imaginary components of tensor-likes are not close!

Mismatched elements: 4 / 4 (100.0%)
Greatest absolute difference: 1.0000000596046448 at index (0, 0) (up to 1e-05 allowed)
Greatest relative difference: 0.8833684352411922 at index (0, 1) (up to 1.3e-06 allowed)
```

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D29556357

Pulled By: mruberry

fbshipit-source-id: 559d4a19ad4fc069b2b4f8cb5fc2f6058621e33d
2021-07-07 09:54:09 -07:00
e1338016dd cuSOLVER path for LU factorization in CUDA. (#56887)
Summary:
This PR adds cuSOLVER path for `torch.lu`.

Performance comparison results: https://github.com/pytorch/pytorch/issues/53879#issuecomment-830635381

Code for reproducing performance results: https://github.com/pytorch/pytorch/pull/56887#issuecomment-843212868

The following heuristics are used for choosing cuSOLVER over MAGMA:
* If batch size == 1 OR (batch size <= 8 AND shape <= 16), choose cuSOLVER over MAGMA.
* For all other cases use MAGMA.

See also https://github.com/pytorch/pytorch/issues/47953.
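A sketch of calls that fall on either side of the heuristic, assuming a CUDA device:

```
import torch

small = torch.randn(4, 16, 16, device="cuda")     # batch <= 8, shape <= 16 -> cuSOLVER
large = torch.randn(64, 512, 512, device="cuda")  # everything else -> MAGMA
LU_s, piv_s = torch.lu(small)
LU_l, piv_l = torch.lu(large)
```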

Following are the performance results between the MASTER branch and the current changes:

<details>

```
[-------------------------- LU factorization (ATen) torch.float64 ---------------------------]
                                     |  lu_factorize CURRENT |  lu_factorize MASTER
1 threads: -----------------------------------------------------------------------------------
      torch.Size([1, 1, 1])          |              363.9          |             284.1
      torch.Size([2, 1, 1])          |              354.8          |             271.8
      torch.Size([4, 1, 1])          |              393.7          |             278.0
      torch.Size([8, 1, 1])          |              459.3          |             279.1
      torch.Size([16, 1, 1])         |              524.2          |             288.9
      torch.Size([32, 1, 1])         |              525.1          |             281.2
      torch.Size([64, 1, 1])         |              524.5          |             281.7
      torch.Size([128, 1, 1])        |              522.8          |             285.2
      torch.Size([1, 2, 2])          |              360.4          |             277.7
      torch.Size([2, 2, 2])          |              372.9          |             279.2
      torch.Size([4, 2, 2])          |              419.4          |             278.3
      torch.Size([8, 2, 2])          |              475.7          |             279.2
      torch.Size([16, 2, 2])         |              530.0          |             299.5
      torch.Size([32, 2, 2])         |              530.0          |             294.5
      torch.Size([64, 2, 2])         |              531.0          |             291.5
      torch.Size([128, 2, 2])        |              544.4          |             292.3
      torch.Size([1, 8, 8])          |              372.6          |             292.8
      torch.Size([2, 8, 8])          |              380.9          |             296.2
      torch.Size([4, 8, 8])          |              420.0          |             293.4
      torch.Size([8, 8, 8])          |              490.6          |             294.6
      torch.Size([16, 8, 8])         |              535.6          |             296.5
      torch.Size([32, 8, 8])         |              534.7          |             302.1
      torch.Size([64, 8, 8])         |              539.1          |             305.5
      torch.Size([128, 8, 8])        |              540.7          |             296.5
      torch.Size([1, 16, 16])        |              345.0          |             303.2
      torch.Size([2, 16, 16])        |              405.0          |             306.3
      torch.Size([4, 16, 16])        |              482.8          |             305.6
      torch.Size([8, 16, 16])        |              596.3          |             305.9
      torch.Size([16, 16, 16])       |              539.6          |             304.4
      torch.Size([32, 16, 16])       |              542.2          |             305.8
      torch.Size([64, 16, 16])       |              556.1          |             311.0
      torch.Size([128, 16, 16])      |              545.1          |             308.1
      torch.Size([1, 32, 32])        |              432.7          |             342.4
      torch.Size([2, 32, 32])        |              582.6          |             341.8
      torch.Size([4, 32, 32])        |              580.4          |             344.4
      torch.Size([8, 32, 32])        |              586.5          |             343.8
      torch.Size([16, 32, 32])       |              582.9          |             346.0
      torch.Size([32, 32, 32])       |              574.4          |             343.7
      torch.Size([64, 32, 32])       |              562.8          |             350.8
      torch.Size([128, 32, 32])      |              568.3          |             349.8
      torch.Size([1, 64, 64])        |              537.1          |             518.4
      torch.Size([2, 64, 64])        |              766.5          |             539.1
      torch.Size([4, 64, 64])        |              771.6          |             551.9
      torch.Size([8, 64, 64])        |              783.4          |             556.0
      torch.Size([16, 64, 64])       |              798.8          |             555.3
      torch.Size([32, 64, 64])       |              795.6          |             548.6
      torch.Size([64, 64, 64])       |              804.2          |             580.4
      torch.Size([128, 64, 64])      |              837.6          |             616.9
      torch.Size([1, 128, 128])      |              844.7          |             848.9
      torch.Size([2, 128, 128])      |             1096.7          |             873.3
      torch.Size([4, 128, 128])      |             1117.9          |             884.8
      torch.Size([8, 128, 128])      |             1138.1          |             903.6
      torch.Size([16, 128, 128])     |             1169.1          |             943.9
      torch.Size([32, 128, 128])     |             1204.8          |             981.4
      torch.Size([64, 128, 128])     |             1336.6          |            1105.8
      torch.Size([128, 128, 128])    |             1639.4          |            1473.3
      torch.Size([1, 512, 512])      |             3714.3          |            3928.6
      torch.Size([2, 512, 512])      |             4388.3          |            4179.7
      torch.Size([4, 512, 512])      |             4765.4          |            4536.9
      torch.Size([8, 512, 512])      |             5615.2          |            5441.1
      torch.Size([16, 512, 512])     |             7203.6          |            7130.2
      torch.Size([32, 512, 512])     |            10580.5          |           10503.9
      torch.Size([64, 512, 512])     |            17374.8          |           17349.6
      torch.Size([128, 512, 512])    |            32542.3          |           32548.8
      torch.Size([1, 1024, 1024])    |            10041.5          |           14292.3
      torch.Size([2, 1024, 1024])    |            17126.6          |           16971.0
      torch.Size([4, 1024, 1024])    |            20591.0          |           20490.8
      torch.Size([8, 1024, 1024])    |            27682.8          |           27560.7
      torch.Size([16, 1024, 1024])   |            41035.2          |           41035.8
      torch.Size([32, 1024, 1024])   |            67091.8          |           67345.9
      torch.Size([64, 1024, 1024])   |           119612.3          |          119782.3
      torch.Size([128, 1024, 1024])  |           230095.5          |          230766.2

Times are in microseconds (us).

```
</details>

The main reason a performance regression can be seen is related to this issue (https://github.com/pytorch/pytorch/issues/55122), and there seems to be no easy way to fix it (at least in this PR).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56887

Reviewed By: ngimel

Differential Revision: D29482342

Pulled By: mruberry

fbshipit-source-id: 4fdedf21b0d5597b289e168dff61d5f5d7727fb1
2021-07-07 09:45:23 -07:00
4a544df00d Implement and benchmark a torch.optim.multi_tensor.adagrad implementation (#59155)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59155

Test Plan: Imported from OSS

Reviewed By: iramazanli

Differential Revision: D29525213

Pulled By: ramvenkat98

fbshipit-source-id: 6d7e8da91c965d1f4e955a084ed875bab641dc9a
2021-07-07 08:08:32 -07:00
8bec478a9e MaxPool2d: use channels_last format for both output and indice when input is channels_last (#61245)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61245
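The title describes the whole change; a minimal sketch of the behavior:

```
import torch

x = torch.randn(1, 3, 8, 8).to(memory_format=torch.channels_last)
pool = torch.nn.MaxPool2d(kernel_size=2, return_indices=True)
out, indices = pool(x)
# With this change, both outputs stay in channels_last:
print(out.is_contiguous(memory_format=torch.channels_last))      # True
print(indices.is_contiguous(memory_format=torch.channels_last))  # True
```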

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D29557884

Pulled By: ezyang

fbshipit-source-id: 0d2b8cbaaf13411eefd7d867021bd6028d40e5cc
2021-07-07 07:50:28 -07:00
66158a6e90 Enable AutogradXPU DispatchKey for Intel heterogeneous computation platform. (#61105)
Summary:
Add a string wrapper for AutogradXPU to enable this DispatchKey.
We are going to use AutogradXPU as a custom autograd backend, which needs this DispatchKey.
The string wrapper is used to map "AutogradXPU" to the corresponding DispatchKey.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61105

Reviewed By: malfet

Differential Revision: D29557697

Pulled By: ezyang

fbshipit-source-id: f0c8155decc8e2fd90741650a05de5a8b5a70121
2021-07-07 07:47:01 -07:00
a69e947ffd avg_pool3d_backward: Port to structured (#59084)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59084

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D28802619

Pulled By: ezyang

fbshipit-source-id: 89a0fcdcf8976ca7c21da7a40fd26a1cba180faa
2021-07-07 07:44:17 -07:00
e4c450a4e8 The dispatch order for custom function (#60251)
Summary:
Hi, I am working on developing some custom ops.

And I found this issue:

The cause lies in the logic here: https://github.com/pytorch/pytorch/compare/master...zhuhaozhe:customer-op-trace?expand=1#diff-d7ade8589773904745c0cf965a19f24c940f1d36038f4c0ce85af2f3d89991dcL173-L177.
For all custom ops, the "Tracer" dispatch key gets the highest priority.

This makes custom ops and non-custom ops behave differently during dispatch. I do not understand whether there is some special reason to let custom ops "trace" first and then begin to "dispatch".

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60251

Reviewed By: malfet

Differential Revision: D29577131

Pulled By: ezyang

fbshipit-source-id: a8e824029cf934f09f29638b127961a6a5c332de
2021-07-07 06:31:43 -07:00
a6fea03a8a Skip codegen checks for dequantize_self, lu_unpack, _cudnn_rnn, and .*conv.*_backward.* (#61139)
Summary:
Temporary fix for fb-internal tests. This and similar failures are being discussed here:
https://github.com/pytorch/pytorch/issues/60426

Applies the below changes:
 - This may seem counterintuitive because the storage check comes before the tensor check, but if TensorImpl use count is not enforced, we should also not enforce storage use count. If an op returns one of its inputs as-is, it is possible for this input to already be aliased with another tensor, and hence it would have a StorageImpl use count greater than one.
 - Also clarify in the description that use_count is not necessarily > 1: an op may, but does not necessarily, return one of its inputs as-is.
 - Allow usage of regex in skip list

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61139

Reviewed By: malfet, Varal7

Differential Revision: D29564917

Pulled By: soulitzer

fbshipit-source-id: 806b7177117a573dd12f161cc80dcadac892f9d0
2021-07-07 05:21:19 -07:00
6f1455440b task 3: typecheck (#60805)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60805

Test Plan: Imported from OSS

Reviewed By: jamesr66a, VitalyFedyunin

Differential Revision: D29522885

Pulled By: migeed-z

fbshipit-source-id: 559a8a495a16e517af77fd5a0785a82e1ebb3bd7
2021-07-06 23:51:49 -07:00
9813b9bc0d Fix mypy.ini (#61333)
Summary:
Fixes CI regression caused by https://github.com/pytorch/pytorch/issues/61119
Unlike Python, `.ini` string lists cannot end with a trailing comma.

Fixes CI on master

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61333

Reviewed By: bhosmer

Differential Revision: D29578696

Pulled By: malfet

fbshipit-source-id: b81e5f4c0a553299c4d4bee0a9bb73748910795f
2021-07-06 22:46:09 -07:00
f0316ec0b6 Revert D24068202: [pytorch][PR] Add typing return value to init in nn.Module
Test Plan: revert-hammer

Differential Revision:
D24068202 (506397a809)

Original commit changeset: 4cd9b6ca12b5

fbshipit-source-id: f45fcf7ee6ee9198ed6f3f34956ce68a64378c32
2021-07-06 22:15:31 -07:00
98119bfce9 task 2: ast rewrite (#60622)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60622

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D29493747

Pulled By: migeed-z

fbshipit-source-id: 684fcdfd3dd441e72c77bb7a4d64c18b9849a198
2021-07-06 20:15:30 -07:00
0dc40474fe Migrate glu from the THC to ATen (CUDA) (#61153)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61153

Fixes gh-24571, fixes gh-24572
Closes gh-39586, closes gh-39586

Benchmarks
----------

The benchmarks were run with nvprof calling the operator in a loop. They show
reliable improvements for large tensors, but the TH implementation seems to fare
better for smaller tensors. For sufficiently large tensors, the ATen
implementation does win, though.

|        Shape | Dim | Master Forward (us) | This PR Forward (us) | Master Backward (us) | This PR Backward (us) |
|-------------:|-----|:-------------------:|:--------------------:|:--------------------:|:---------------------:|
|    128, 1000 | 0   |        2.4770       |        2.0820        |        3.0440        |         3.4680        |
|              | 1   |        2.7060       |        4.4850        |        3.3380        |         3.6250        |
|   128, 10000 | 0   |        26.531       |        21.366        |        38.083        |         34.623        |
|              | 1   |        27.680       |        30.465        |        38.943        |         35.204        |
|  128, 100000 | 0   |        292.09       |        219.56        |        355.57        |         324.49        |
|              | 1   |        260.43       |        243.08        |        332.25        |         323.37        |
| 128, 1000000 | 0   |        2475.7       |        1874.6        |        3810.1        |         3215.7        |
|              | 1   |        2586.3       |        2380.9        |        3349.9        |         3207.8        |
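The user-facing API is unchanged; a minimal sketch, assuming a CUDA device:

```
import torch
import torch.nn.functional as F

x = torch.randn(128, 1000, device="cuda", requires_grad=True)
y = F.glu(x, dim=1)   # forward now uses the ATen CUDA kernel
y.sum().backward()    # backward likewise
```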

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D29538093

Pulled By: ngimel

fbshipit-source-id: 1f66b45ec7c46fb8e680b50110a5fde6fe7faab7
2021-07-06 19:06:51 -07:00
7a4ffbd1da [FX] s/IS_SANDCASTLE/IS_FBCODE/ in tests (#61304)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61304

Previously, tests were unrunnable on the devserver. This fixes that.
ghstack-source-id: 133051811

Test Plan: waitforsadcastle

Reviewed By: Chillee

Differential Revision: D29561806

fbshipit-source-id: 6020e5b4ba72d6de1ea2563e70fdb0e604bee1a5
2021-07-06 17:20:53 -07:00
506397a809 Add typing return value to init in nn.Module (#45654)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/45497

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45654

Reviewed By: driazati

Differential Revision: D24068202

Pulled By: malfet

fbshipit-source-id: 4cd9b6ca12b531311302e3cdeeab39bc45d86c94
2021-07-06 17:09:30 -07:00
9f3167ebdf task 1: annotate (#60621)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60621

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D29493619

Pulled By: migeed-z

fbshipit-source-id: 1bd3fb02c90ae5b394869a474b2e6b06af0d4791
2021-07-06 16:48:11 -07:00
a1ad28da10 Refactor clang_tidy.py (#61119)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61119

This change splits the clang-tidy CI job into smaller steps and uses a
refactored version of the clang_tidy.py script.

The new folder structure is as follows:
```
tools/linter/clang_tidy
|_ __main__py
|_ requirements.txt
|_ run.py
|_ setup.sh
```

`__main__.py`

This script will run `tools/linter/clang_tidy/setup.sh` if a `build`
directory doesn't exist, mimicking what used to be done as a separate
step in the CI job.

After that, it will invoke `clang-tidy` with default arguments being
declared in the script itself (as opposed to declaring them in
lint.yml).

The reasoning behind this approach is two-fold:

- Make it easier to run `clang-tidy` locally using this script
- De-duplicate the option passing

`requirements.txt`

Contains a list of additional python dependencies needed by the
`clang-tidy` script.

`setup.sh`

If a build directory doesn't exist, this command will run the necessary
codegen and build commands for running `clang-tidy`

Example usage:
```
python3 tools/linter/clang_tidy --parallel
```
Notice that we don't have to put the `.py` at the end of `clang_tidy`.

Test Plan:
Run the following command:
```
python3 tools/linter/clang_tidy --paths torch/csrc/fx --parallel
```

Reviewed By: walterddr, janeyx99

Differential Revision: D29568582

Pulled By: 1ntEgr8

fbshipit-source-id: cd6d11c5cb8ba9f1344a87c35647a1cd8dd45b04
2021-07-06 16:02:11 -07:00
81e36d02a6 Improve error message on invalid values to Distribution methods (#61056)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/18133
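A minimal sketch of a call that now produces a more informative error (the exact wording of the message is an assumption):

```
import torch

d = torch.distributions.Bernoulli(probs=torch.tensor(0.5), validate_args=True)
# 2.0 is outside the Boolean support; the raised ValueError now names the
# offending value instead of only the failed constraint:
d.log_prob(torch.tensor(2.0))
```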

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61056

Reviewed By: jbschlosser

Differential Revision: D29510173

Pulled By: neerajprad

fbshipit-source-id: 205ec7de6c8576a73e77ee4bf01c30e99b38a52e
2021-07-06 15:44:55 -07:00
45cc207a88 Fix breakpad build + add test canary (#60990)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60990

This makes the breakpad build more explicit in its messaging and hints to cmake where to look for the library (it wasn't able to find it without `PATHS` on CI even though that works locally). This also adds a smoke test that will fail if breakpad isn't present on a CI job where it is expected (e.g. binary builds).

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D29514316

Pulled By: driazati

fbshipit-source-id: 79514363334788f311ba5d4f25deed3452f0c3eb
2021-07-06 14:15:07 -07:00
b6024b9d12 More loop transforms 2
Summary: Exact duplicate of D29410111 to fix land issues.

Test Plan: Sandcastle

Reviewed By: walterddr

Differential Revision: D29538335

fbshipit-source-id: 6a4f9ac4a505339ed242af60fe7fd4ba1fda3b32
2021-07-06 13:38:10 -07:00
c74c0c5718 add thrust/host_vector.h header for cuda 11.4 build (#61004)
Summary:
needed for cuda 11.4 build

Close https://github.com/pytorch/pytorch/issues/61011

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61004

Reviewed By: ngimel

Differential Revision: D29523896

Pulled By: malfet

fbshipit-source-id: acb11bdd19c0cc240696be21e5c492f8976fea65
2021-07-06 12:44:56 -07:00
5da507b57b Add bazel actions workflow (#61039)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61039

- Added a new template for bazel GH Actions workflow
- Simplified the workflow based on malfet's suggestion by combining build and test jobs into one as we only run a small subset of tests for bazel
- Tested the run to make sure it succeeds
- Build step takes 4 minutes, test step takes 7 minutes

The downside of this approach is that I duplicated some of the jobs in a new template file. An alternative solution would be to use something like template inheritance (https://jinja.palletsprojects.com/en/3.0.x/templates/#template-inheritance); however, that is better done in a separate PR, as the linux and windows workflows would need to be changed. Another solution is to use a bunch of if/else statements in a linux workflow template to accommodate the bazel build as part of it, but this seems less clean than template inheritance with jinja.

Here is a link to the latest bazel run with this change https://github.com/pytorch/pytorch/actions/runs/1004656584

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D29562260

Pulled By: rsemenov

fbshipit-source-id: a7d7d3a0b8092f52929fb109820bfad4574f5602
2021-07-06 12:18:43 -07:00
fac744e116 Foreach Binary Test Refactor (#59907)
Summary:
Related: https://github.com/pytorch/pytorch/issues/58833

## Changes I'm a bit concerned
- binary ops with one tensorlist and one scalarlist support complex dtypes. To realize this, I added a specialization of [`TensorListScalarListMetadata<c10::complex<double>, 1>` ](https://github.com/pytorch/pytorch/pull/59907/files#diff-131eb9b310905b15b3528da6a23e542a3a3aa952bc88f7423c98a23a8a28cca1R49). This might be out of the scope of this pull request.

cc ptrblck ngimel mcarilli
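A sketch of the tensorlist + scalarlist combination mentioned above, now exercised with complex dtypes:

```
import torch

xs = [torch.randn(2, dtype=torch.complex64) for _ in range(3)]
scalars = [1 + 1j, 2.0, 3]
ys = torch._foreach_mul(xs, scalars)  # one scalar per tensor in the list
```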

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59907

Reviewed By: mruberry

Differential Revision: D29551001

Pulled By: ngimel

fbshipit-source-id: 46b25fdba85dd4d6332a77b27376fe96cd422384
2021-07-06 11:49:38 -07:00
5503a4ac6e DOC Improves shape documentation for *Flatten (#60980)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/60841

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60980

Reviewed By: VitalyFedyunin

Differential Revision: D29526650

Pulled By: jbschlosser

fbshipit-source-id: 2b4b0b84e0652c4cf3e9a48debb3b1bfe4e04b05
2021-07-06 10:47:11 -07:00
95cada8810 Make breakpad depdendencies private (#61183)
Summary:
Otherwise, it will result in the following errors for people developing extensions:
```
CMake Error in frontends/pytorch/csrc/CMakeLists.txt:
  Imported target "torch" includes non-existent path

    "/usr/local/include/breakpad"
```

Fixes different issue reported in https://github.com/pytorch/pytorch/issues/60485

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61183

Reviewed By: driazati

Differential Revision: D29538332

Pulled By: malfet

fbshipit-source-id: e83cfd0b335e9b0b1ba5715789b09765db671346
2021-07-06 10:02:34 -07:00
635d864b26 Fix modernize-use-equals-default nolint failures in torch/csrcs (#61142)
Summary:
Test-plan: Compile + clang-tidy

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61142

Reviewed By: VitalyFedyunin

Differential Revision: D29529372

Pulled By: malfet

fbshipit-source-id: 2ccde7712a51c28243b16bbb4d1d68086e0414a6
2021-07-06 09:46:46 -07:00
718db968b8 move CI related functions out of run_test.py (#61124)
Summary:
run_test.py currently does a lot of downloading and test file/suite/case parsing, and it doesn't work well outside of the CI environment.

Restructured run_test.py, created tools/test/test_selections.py, and moved all test selection logic (reordering, categorizing slow tests, creating shards) there.

Follow up PRs should:
- refactor those file read/write logic entangled inside test_selections.py into stats/ folder
- restructure and add network independent test logics to test_test_selections.py

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61124

Test Plan:
- tools/test
- CI

Related PR:
This follows the refactoring example in: https://github.com/pytorch/pytorch/issues/60373

Reviewed By: malfet

Differential Revision: D29558981

Pulled By: walterddr

fbshipit-source-id: 7f0fd9b4720a918d82918766c002295e8df04169
2021-07-06 09:06:42 -07:00
864dcbb2cc Set sccache bucket on test runs to save some run minutes (#61140)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61140

While working on the bazel port to GitHub Actions I noticed that we do not set the sccache bucket for test runs, which causes cache misses while running test jobs. For example, in https://github.com/pytorch/pytorch/runs/2965919198?check_suite_focus=true, test run 1 uses the local cache and has 44 cache misses; with an average 9-second read per miss, that is 44*9/60 = ~7 minutes per run.

Here is another example
https://github.com/pytorch/pytorch/runs/2966210127?check_suite_focus=true

Open to feedback if there is a downside of using AWS cache.

Test Plan: Imported from OSS

Reviewed By: 1ntEgr8

Differential Revision: D29557292

Pulled By: rsemenov

fbshipit-source-id: e8fb000850ec4627d7cccf690e8f5743999fdf36
2021-07-06 07:29:57 -07:00
05c1e5b655 [sparsity] Lambda Scheduler (#59771)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59771

Implements a specific sparsity scheduler that uses a user-provided lambda to change the sparsity levels.
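A hypothetical sketch of the scheduler in use (class and argument names assumed from the surrounding diffs; treat them as illustrative):

```
import torch
from torch.ao.sparsity import WeightNormSparsifier, LambdaSL

model = torch.nn.Sequential(torch.nn.Linear(8, 8))
sparsifier = WeightNormSparsifier()
sparsifier.prepare(model, config=None)
# Scale the sparsity level by a user-provided lambda each epoch:
scheduler = LambdaSL(sparsifier, sl_lambda=lambda epoch: 0.95 ** epoch)
for _ in range(10):
    sparsifier.step()
    scheduler.step()
```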

Test Plan:
```
python test/test_ao_sparsity.py
```
Imported from OSS

Differential Revision: D29070604

Reviewed By: raghuramank100

Pulled By: z-a-f

fbshipit-source-id: c7ccbe63fe4cd6a0c3563541b7fcf93a99d0e62f
2021-07-02 21:39:38 -07:00
37ebf2e3cd [sparsity] Base sparsity level scheduler class (#59770)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59770

Implements the base scheduler class for changing the sparsity levels in the sparsifier.

Test Plan:
```
python test/test_ao_sparsity.py
```
Imported from OSS

Differential Revision: D29070603

Reviewed By: raghuramank100

Pulled By: z-a-f

fbshipit-source-id: 0b160e4eb0a2a303d2d19e6a3beb4784002b2cb7
2021-07-02 21:38:24 -07:00
ed63fb5225 Fix some more loops (#60895)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60895

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D29431572

fbshipit-source-id: fbcf48696bf2c90cc0973a767d83bb526f6ccd7f
2021-07-02 19:17:08 -07:00
43fb39c3eb [DDP] Make uneven inputs work with comm. hook (#61020)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61020

Makes uneven input support with the `join` context manager work with
custom communication hooks, ensuring that the two features work well
together. Added relevant unittests covering the allreduce and powerSGD hooks.

Instead of calling `allreduce`, the join manager now calls into `_run_reduction_hook` which will automatically run whatever hook is installed.
ghstack-source-id: 132950108
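A sketch of the combined usage (assumes an initialized process group, a DDP-wrapped `model`, and a placeholder `uneven_inputs` iterable):

```
import torch.distributed.algorithms.ddp_comm_hooks.default_hooks as default_hooks

# `model` is a torch.nn.parallel.DistributedDataParallel instance.
model.register_comm_hook(state=None, hook=default_hooks.allreduce_hook)
# Ranks may run different numbers of iterations; the join manager now runs
# the installed hook (rather than a hard-coded allreduce) for the shadow
# reductions performed on already-joined ranks.
with model.join():
    for inp in uneven_inputs:
        model(inp).sum().backward()
```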

Test Plan: CI

Reviewed By: SciPioneer

Differential Revision: D29480028

fbshipit-source-id: c91dc467a62c5f1e0ec702a2944ae3deb10f93f4
2021-07-02 18:48:21 -07:00
94b730681f [DDP] Refactor uneven inputs to take GradBucket (#61019)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61019

Changes the uneven-input logic from running allreduce directly to using the `GradBucket` structure. This is to enable support for comm hooks with join in the next diff.
ghstack-source-id: 132950107

Test Plan: ci

Reviewed By: SciPioneer

Differential Revision: D29480027

fbshipit-source-id: 7c42c53653052f71b86a75e14a5fc7ae656433f7
2021-07-02 18:47:23 -07:00
512448a425 CTCLoss: Remove dispatching in parallel region (#60599)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60599

Ref #56794

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D29446190

Pulled By: ngimel

fbshipit-source-id: eb01783c8c32a1405b58e1364fc3d71c0f054e0a
2021-07-02 17:55:56 -07:00
d42f1751d4 [sparsity] WeightNormSparsifier (#58955)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58955

Implements the weight-norm sparsifier.
This type of sparsifier computes the norms of the weights, sorts them, and zeroes out the target fraction of them.

The main implemented method is `update_mask`, which holds the main logic for changing the masks.

Test Plan:
```
python test/test_ao_sparsity.py
```
Imported from OSS

Differential Revision: D28970960

Reviewed By: raghuramank100

Pulled By: z-a-f

fbshipit-source-id: 8f2a4360ad877f430cdc1065c6777106938b58d5
2021-07-02 17:35:27 -07:00
7ab2729481 [sparsity][refactor] Import factoring out (#58707)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58707

Minor refactor that changes the format of the import.
This is done to avoid accidental circular dependencies.

Test Plan:
```
python test/test_ao_sparsity.py
```
Imported from OSS

Differential Revision: D28970961

Reviewed By: raghuramank100

Pulled By: z-a-f

fbshipit-source-id: c312742f5e218c435a1a643532f5842116bfcfff
2021-07-02 16:32:39 -07:00
973e9266ff [sparsity] Sparsifier class (#58704)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58704

Implements the base sparsifier class based on the #59835 RFC document.

This PR implements the base class for sparsification; specifically, the `prepare` method is implemented.

Test Plan:
```
python test/test_ao_sparsity.py
```
Imported from OSS

Differential Revision: D28970958

Reviewed By: raghuramank100

Pulled By: z-a-f

fbshipit-source-id: 0ef98a445c0a0aca22ce5708e34a9f94606d0e2b
2021-07-02 16:31:21 -07:00
80cab10534 [sparsity] Sparsity parametrization (#58705)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58705

The basic demo for this particular implementation can be found here:
https://gist.github.com/z-a-f/1d06ae8d5a509d3c9c1596dcb924afe0

Test Plan:
```
python test/test_ao_sparsity.py
```
Imported from OSS

Differential Revision: D28970959

Reviewed By: raghuramank100

Pulled By: z-a-f

fbshipit-source-id: 2a0bea1e0a81816690e05f83051d607c90925d32
2021-07-02 11:12:31 -07:00
5d34b7955b [sparsity][refactor] Changing linear row/col control (#60850)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60850

Test Plan:
```
python test/test_ao_sparsity.py
```

Differential Revision: D29465900

Reviewed By: raghuramank100

Pulled By: z-a-f

fbshipit-source-id: 412f50da857f377898fea79d378ae54a049b81fe
2021-07-02 11:12:30 -07:00
509b1ef9d5 [sparsity] Add sparsity tests to run_test.py (#60887)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60887

Test Plan:
```
./test/run_test.py -i test_ao_sparsity
```

Differential Revision: D29465834

Reviewed By: mruberry

Pulled By: z-a-f

fbshipit-source-id: 144f940363a20dd65c2bbfe70924c266d8791dc7
2021-07-02 11:11:20 -07:00
54673fc944 Sparse: Remove dispatch in parallel region (#60598)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60598

Ref #56794

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D29446192

Pulled By: ngimel

fbshipit-source-id: 1a11f3aa847e4ce83fc6f50cee362b7d0cb61eae
2021-07-01 21:56:17 -07:00
11b722c063 [DDP] Refactor hook running logic (#61018)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61018

Extract logic of hook running to a function `run_reduction_hook` that takes in a `GradBucket` and runs the hook/allreduce. This is mainly to prepare for join to support comm. hook in follow up diffs.
ghstack-source-id: 132924220

Test Plan: CI

Reviewed By: SciPioneer

Differential Revision: D29477143

fbshipit-source-id: 87e8e563e71821fd462d6b259c98a6a0afbcd7b4
2021-07-01 20:41:55 -07:00
b21df03f3b [DDP] Remove SPMD from get_bucket_tensors (#61017)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61017

Removes SPMD nested vector logic from this codepath. This is mostly in preparation for the next diffs in this stack which enable support for join with comm. hook.
ghstack-source-id: 132924223

Test Plan: CI

Reviewed By: SciPioneer

Differential Revision: D29477360

fbshipit-source-id: f8132a94b1abfe28586aa78ac47e13a7ce6bb137
2021-07-01 20:40:53 -07:00
4a2e8b53bb [JIT] Add `torch._C.ScriptList` (#52832)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52832

**Summary**
This commit adds `torch._C.ScriptList`, a list type that has reference
semantics across the Python/TorchScript boundary. That is, modifications
made in TorchScript to instances of `torch._C.ScriptList`
are visible in Python even when the list is not returned from the function.

`torch._C.ScriptList` is implemented using a modified version of pybind's
`stl_bind.h`-style bindings attached to `ScriptList` and `ScriptListIterator`,
wrapper classes around `c10::impl::GenericList` and
`c10::impl::GenericList::iterator`. These bindings allow instances of
`torch._C.ScriptList` to be used as if they were regular `list`s in Python. Reference semantics are achieved by simply
retrieving the `IValue` contained in `ScriptList` in `toIValue` (invoked
when converting Python arguments to `IValues` before calling TorchScript
code).

**Test Plan**
This commit adds `TestScriptList` to `test_list_dict.py`, a set of tests
that check that all of the common list operations are supported
and that instances have reference semantics across the
Python/TorchScript boundary.

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D29478121

Pulled By: SplitInfinity

fbshipit-source-id: 652cc25cfa37debe28db9527504846f22abd8b54
2021-07-01 20:28:13 -07:00
6e9e30cc1d Ignore notebooks when checking for newlines (#61156)
Summary:
Fix lint on master (these files should be considered "generated" so don't lint them)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61156

Reviewed By: malfet

Differential Revision: D29532211

Pulled By: driazati

fbshipit-source-id: a1e47f45bedf441613bdc2bd60fbf8299e5c962f
2021-07-01 18:11:43 -07:00
a4d86e0d53 [quant][fx][perf] improve runtime of prepare step for large models (#61132)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61132

For large models, the `insert_observers_for_model` function was taking a long time, especially in the case where not all of the nodes are being quantized.

For example, for a model with 21000 nodes of which only ~50 are being quantized, the breakdown of prepare_fx vs. convert_fx was:

prepare_fx 979 seconds
convert_fx 9 seconds

The main reason was that we were doing some unnecessary computation for all nodes in this function; this PR just moves that work to where it is actually used.

After this PR:
prepare_fx 26 seconds
convert_fx 9 seconds

Test Plan:
Existing tests

Imported from OSS

Reviewed By: raghuramank100

Differential Revision: D29522303

fbshipit-source-id: 7ce12582a859d02ff763abebf4a592d28e0764ca
2021-07-01 17:17:10 -07:00
277b310edb [DataLoader] Add notebook with DataPipes API example (#60680)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60680

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D29461079

Pulled By: VitalyFedyunin

fbshipit-source-id: 6532bf77113ab89a50f8bb022daf80f8477e9297
2021-07-01 16:39:28 -07:00
ca2702a776 [pruner] Make bias hook stateless (#61077)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61077

Removes the `BiasHook` class, using a function instead.
ghstack-source-id: 132899223

Test Plan:
` buck test mode/dev-nosan //caffe2/test:ao -- TestBasePruner`

https://pxl.cl/1L7Tg

Reviewed By: z-a-f

Differential Revision: D29504119

fbshipit-source-id: 6dd9689d18b17ac64e8a461f466e2c9018bc530b
2021-07-01 14:59:00 -07:00
0a7875231b [pruner] Add bias support (#60970)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60970

Support adding bias in eager mode
ghstack-source-id: 132695883

Test Plan:
`buck test mode/dev-nosan //caffe2/test:ao -- TestBasePruner`

https://pxl.cl/1L3K3

Reviewed By: z-a-f

Differential Revision: D29441499

fbshipit-source-id: 47e0fff5b3014612bd021e145160ea54e2645e24
2021-07-01 14:57:09 -07:00
87dbdef65d MAINT Adds test and docs for Linear with no batch dims (#60992)
Summary:
Towards https://github.com/pytorch/pytorch/issues/60585

This PR updates docs for `Linear` and adds a non-batch test case to `common_nn.py`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60992

Reviewed By: VitalyFedyunin

Differential Revision: D29518451

Pulled By: jbschlosser

fbshipit-source-id: 6dd79c0f21ac5b6f693e3e1ba954379d2606d4e0
2021-07-01 14:49:24 -07:00
369802a504 Add aten::avgpool2d NNAPI converter (#58538)
Summary:
Add support for the aten::avgpool2d op in the NNAPI model converter, with
variable input size support

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58538

Test Plan: pytest test/test_nnapi.py::TestNNAPI::test_avgpool2d

Reviewed By: anshuljain1

Differential Revision: D28531944

fbshipit-source-id: 43ff8c9389365698c282f204042b49c7ec84d824
2021-07-01 14:07:14 -07:00
19b6ee4d4e model_dump working with delegate models (#61043)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61043

Trying to make model_dump work with delegate models
ghstack-source-id: 132809755

Test Plan:
N509022.

The data.pkl in the lowered model:
```
bash-3.2$ python -m torch.utils.show_pickle /Users/myuan/models/backend/lowered_model.pt@*/data.pkl
torch.jit.backend_with_compiler_demo.LoweredModule.__torch__.___torch_mangle_5.ModuleAdd()(state=
 (torch.jit._pickle.restore_type_tag({'forward': torch.jit._pickle.restore_type_tag({'input_shapes': '((1, 1, 320, 240), (1, 3))',
                   'some_other_option': 'True'},
                  'Dict[str, str]')},
    'Dict[str, Any]'),
  torch.jit._pickle.restore_type_tag({'forward': 'prim::Constant#1<debug_handle>271,aten::add<debug_handle>272'},
    'Dict[str, str]'),
  True))
```
Compared to the data.pkl in scripted_model.pt:
```
__torch__.___torch_mangle_7.ModuleAdd()(state=
 {'_is_full_backward_hook': None, 'training': True})
```

Reviewed By: Amyh11325

Differential Revision: D29464860

fbshipit-source-id: d738e98ea518339465f8e3375207cf83e3dac532
2021-07-01 13:39:56 -07:00
374278f431 Improved sparse CSR tensor sampling method (#60283)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/59379

The improved sparse CSR tensor sampling method is described in https://pearu.github.io/csr_sampling.html; it features the following (a construction sketch follows the list):
- for specified `nnz`, one gets a CSR sample with the same `nnz`
- variability of the number of specified columns per row is maximized
- `crow_indices` content is randomized
- each row's `col_indices` content is sorted and filled with unique values (see also https://github.com/pytorch/pytorch/issues/60277)
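
For illustration, a hand-built CSR tensor with these properties (assuming the prototype `torch.sparse_csr_tensor` constructor; the values are arbitrary):

```python
import torch

# nnz == 4 exactly; each row's col_indices are sorted and unique.
t = torch.sparse_csr_tensor(
    crow_indices=torch.tensor([0, 1, 3, 4]),  # rows hold 1, 2, and 1 entries
    col_indices=torch.tensor([2, 0, 3, 1]),
    values=torch.ones(4),
    size=(3, 4),
)
```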

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60283

Reviewed By: bhosmer

Differential Revision: D29492605

Pulled By: cpuhrsch

fbshipit-source-id: 8d875b7c2b0573a9ab37047c6d8fe8b540295ce1
2021-07-01 13:26:19 -07:00
6ecc1a4c4f Make pytorch clang-tidy clean (#60649)
Summary:
This PR suppresses clang-tidy warnings in the codebase (for now) so that we can re-enable clang-tidy checks on master.

I ran this script to add the `NOLINTNEXTLINE` comments (on a devserver):
```bash
python3 setup.py develop

# Uses same script that's run on CI and adds the -j (parallel), -s (add comments), -k (continue if diagnostic errors are found) options
python3 tools/clang_tidy.py \
  -j \
  -s \
  -k \
  -v \
  --paths torch/csrc/ \
  -g"-torch/csrc/jit/passes/onnx/helper.cpp" \
  -g"-torch/csrc/jit/passes/onnx/shape_type_inference.cpp" \
  -g"-torch/csrc/jit/serialization/onnx.cpp" \
  -g"-torch/csrc/jit/serialization/export.cpp" \
  -g"-torch/csrc/jit/serialization/import.cpp" \
  -g"-torch/csrc/jit/serialization/import_legacy.cpp" \
  -g"-torch/csrc/onnx/init.cpp" \
  -g"-torch/csrc/cuda/nccl.*" \
  -g"-torch/csrc/cuda/python_nccl.cpp" \
  -g"-torch/csrc/autograd/FunctionsManual.cpp" \
  -g"-torch/csrc/generic/*.cpp" \
  -g"-torch/csrc/jit/codegen/cuda/runtime/*" \
  -g"-torch/csrc/deploy/interpreter/interpreter.cpp" \
  -g"-torch/csrc/deploy/interpreter/interpreter.h" \
  -g"-torch/csrc/deploy/interpreter/interpreter_impl.h" \
  -g"-torch/csrc/deploy/interpreter/test_main.cpp"
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60649

Test Plan: Verified changes by re-running the script (without the `-s` option) and seeing no warnings/errors.

Reviewed By: walterddr, janeyx99

Differential Revision: D29504258

Pulled By: 1ntEgr8

fbshipit-source-id: 78310b30ee8213b73ddb4771ad874665323e7a4e
2021-07-01 12:21:07 -07:00
a0a9ea6598 Fix documentation preview instructions (#61080)
Summary:
People don't need to self-host these anymore, since we do it automatically in PRs

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61080

Reviewed By: VitalyFedyunin, janeyx99

Differential Revision: D29506465

Pulled By: driazati

fbshipit-source-id: 45875cb229f8cc565a9a1405f52cef198ee0e687
2021-07-01 12:17:34 -07:00
60509f8921 Update DDP documentation to mention outputs not used in loss is supported (#60275)
Summary:
We recently landed a change to ensure that when running under ``find_unused_parameters=True``, not all module outputs have to be used in the loss computation, and DDP will still work as expected. Mention this update in the documentation and add some additional clarification.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60275

Reviewed By: SciPioneer

Differential Revision: D29502609

Pulled By: rohan-varma

fbshipit-source-id: ddb3129cff9492018e61813413b30711af212309
2021-07-01 11:56:53 -07:00
0128eb9a85 Fix TSAN issue in distributed tests (#59238)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59238

Creating a `multiprocessing.Manager()` launches a new process using the `fork` method (because it's the default one), and then in that subprocess it launches a new thread. TSAN really doesn't like this (and rightly so!) because we already had threads in the superprocess, and intermixing threads and forks is dangerous. The proper way to deal with this is to `exec` inside the child process or, in other words, use the `spawn` method.

Note that the method used to launch the Manager is entirely unrelated to the method used to launch our "own" subprocesses; hence we were using `fork` for the Manager even though we were using `spawn` for our own subprocesses.
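
A minimal sketch of the resulting pattern (the surrounding test harness is assumed):

```python
import multiprocessing as mp

if __name__ == "__main__":
    # An explicit spawn context makes the Manager's helper process exec
    # a fresh interpreter instead of forking a multithreaded parent.
    ctx = mp.get_context("spawn")
    with ctx.Manager() as manager:
        shared = manager.dict()
        shared["ok"] = True
```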
ghstack-source-id: 130240724

Test Plan: Reverted the silencing introduced in D28490129, ran the `test_init_rpc_then_pg` test from the TensorPipe suite and saw the original TSAN failure. Then applied my fix, re-ran the test, and the failure was gone.

Reviewed By: zhaojuanmao

Differential Revision: D28794321

fbshipit-source-id: 12242e69be399a7f02a40a0ebb3d92f92e00ce73
2021-07-01 11:53:01 -07:00
5b44d817fb Expose raw saved tensors for codegen functions (#60565)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60565

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D29466225

fbshipit-source-id: 77eb4214a1baecc501282413d99d55f8935dc01f
2021-07-01 11:25:21 -07:00
3f0f860a1c Condense JIT/Quantization triage into one workflow (#61130)
Summary:
The `.github/workflows/{jit,quantization}_triage.yml` workflows are nearly identical, so this PR consolidates them into a single GitHub Actions workflow to reduce code duplication. It also renames the workflow so it starts with a capital letter, so that it will show up alongside all our other GitHub Actions workflows on [the HUD](https://hud.pytorch.org/build2/pytorch-master).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61130

Reviewed By: walterddr

Differential Revision: D29520022

Pulled By: samestep

fbshipit-source-id: 673789762e08c2c77d72e7c20eb16d6beec573ba
2021-07-01 10:50:26 -07:00
6f92f10c94 Use a leaky singleton for CublasHandlePool. (#60987)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60987

We were seeing deadlocks as follows during shutdown:

```
Thread 1 (LWP 2432101):
#0  0x00007efca470190b in __pause_nocancel () from /lib64/libc.so.6
#1  0x00007efca49de485 in __pthread_mutex_lock_full () from /lib64/libpthread.so.0
#2  0x00007ef91d4c42c6 in __cuda_CallJitEntryPoint () from /lib64/libnvidia-ptxjitcompiler.so.1
#3  0x00007efc651ac8f1 in ?? () from /lib64/libcuda.so
#4  0x00007efc651aee03 in ?? () from /lib64/libcuda.so
#5  0x00007efc64f76b84 in ?? () from /lib64/libcuda.so
#6  0x00007efc64f77f5d in ?? () from /lib64/libcuda.so
#7  0x00007efc64eac858 in ?? () from /lib64/libcuda.so
#8  0x00007efc64eacfbc in ?? () from /lib64/libcuda.so
#9  0x00007efc7810a924 in ?? () from /usr/local/cuda/lib64/libcublas.so.11
#10 0x00007efc780fa2be in ?? () from /usr/local/cuda/lib64/libcublas.so.11
#11 0x00007efc78111044 in ?? () from /usr/local/cuda/lib64/libcublas.so.11
#12 0x00007efc7811580a in ?? () from /usr/local/cuda/lib64/libcublas.so.11
#13 0x00007efc78115aa4 in ?? () from /usr/local/cuda/lib64/libcublas.so.11
#14 0x00007efc781079ec in ?? () from /usr/local/cuda/lib64/libcublas.so.11
#15 0x00007efc780e6a7a in ?? () from /usr/local/cuda/lib64/libcublas.so.11
#16 0x00007efc7811cfa5 in ?? () from /usr/local/cuda/lib64/libcublas.so.11
#17 0x00007efc777ea98c in ?? () from /usr/local/cuda/lib64/libcublas.so.11
#18 0x00007efc777ebd80 in ?? () from /usr/local/cuda/lib64/libcublas.so.11
#19 0x00007efc777ea2c9 in ?? () from /usr/local/cuda/lib64/libcublas.so.11
#20 0x00007efc778c2e2d in cublasDestroy_v2 () from /usr/local/cuda/lib64/libcublas.so.11
#21 0x00007efc51a3fb56 in std::_Sp_counted_ptr_inplace<at::cuda::(anonymous namespace)::DeviceThreadHandlePool<cublasContext*, &at::cuda::(anonymous namespace)::createCublasHandle, &at::cuda::(anonymous namespace)::destroyCublasHandle>, std::allocator<at::cuda::(anonymous namespace)::DeviceThreadHandlePool<cublasContext*, &at::cuda::(anonymous namespace)::createCublasHandle, &at::cuda::(anonymous namespace)::destroyCublasHandle> >, (__gnu_cxx::_Lock_policy)2>::_M_dispose() () from /data/users/pritam/pytorch/torch/lib/libtorch_cuda.so
#22 0x00007efc51a3fc5f in std::shared_ptr<at::cuda::(anonymous namespace)::DeviceThreadHandlePool<cublasContext*, &at::cuda::(anonymous namespace)::createCublasHandle, &at::cuda::(anonymous namespace)::destroyCublasHandle> >::~shared_ptr() () from /data/users/pritam/pytorch/torch/lib/libtorch_cuda.so
#23 0x00007efca4648b0c in __run_exit_handlers () from /lib64/libc.so.6
#24 0x00007efca4648c40 in exit () from /lib64/libc.so.6
#25 0x0000558c8852e5f9 in Py_Exit (sts=0) at /tmp/build/80754af9/python_1614362349910/work/Python/pylifecycle.c:2292
#26 0x0000558c8852e6a7 in handle_system_exit () at /tmp/build/80754af9/python_1614362349910/work/Python/pythonrun.c:636
#27 0x0000558c8852e742 in PyErr_PrintEx (set_sys_last_vars=<optimized out>, set_sys_last_vars=<optimized out>) at /tmp/build/80754af9/python_1614362349910/work/Python/pythonrun.c:646
#28 0x0000558c88540dd6 in PyRun_SimpleStringFlags (command=0x7efca4dc9050 "from multiprocessing.spawn import spawn_main; spawn_main(tracker_fd=9, pipe_handle=13)\n", flags=0x7ffe3a986110) at /tmp/build/80754af9/python_1614362349910/work/Python/pythonrun.c:457
#29 0x0000558c88540ead in pymain_run_command (cf=0x7ffe3a986110, command=<optimized out>) at /tmp/build/80754af9/python_1614362349910/work/Modules/main.c:420
#30 pymain_run_python (pymain=0x7ffe3a986220) at /tmp/build/80754af9/python_1614362349910/work/Modules/main.c:2907
#31 pymain_main (pymain=0x7ffe3a986220) at /tmp/build/80754af9/python_1614362349910/work/Modules/main.c:3460
#32 0x0000558c8854122c in _Py_UnixMain (argc=<optimized out>, argv=<optimized out>) at /tmp/build/80754af9/python_1614362349910/work/Modules/main.c:3495
#33 0x00007efca4632493 in __libc_start_main () from /lib64/libc.so.6
#34 0x0000558c884e5e90 in _start () at ../sysdeps/x86_64/elf/start.S:103
```

This was likely caused by a static singleton that wasn't leaky. Following
the guidance in https://isocpp.org/wiki/faq/ctors#construct-on-first-use-v2 to
use a leaky singleton instead.
ghstack-source-id: 132847448

Test Plan: Verified locally.

Reviewed By: malfet

Differential Revision: D29468866

fbshipit-source-id: 89250594c5cd2643417b1da584c658b742dc5a5c
2021-07-01 10:23:07 -07:00
d2fef350f2 add embedding bag skeleton take 2 (#61126)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61126

adding skeleton implementations of quantized embedding tables with zeroes

Test Plan:
compilation and farm tests pass; also ran test_find_dangling_impls, which passed

did a manual negative test and verified the message is printed properly
```
======================================================================
FAIL: test_find_dangling_impls (test_dispatch.TestPythonDispatcher)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/data/users/hyz/fbsource/fbcode/buck-out/opt/gen/caffe2/test/others#binary,link-tree/test_dispatch.py", line 892, in test_find_dangling_impls
    self.assertEqual(
  File "/data/users/hyz/fbsource/fbcode/buck-out/opt/gen/caffe2/test/others#binary,link-tree/torch/testing/_internal/common_utils.py", line 1498, in assertEqual
    super().assertTrue(result, msg=self._get_assert_msg(msg, debug_msg=debug_msg))
AssertionError: False is not true : Scalars failed to compare as equal! 0 != 1
Expect zero dangling impls, but found: ['name: quantized::qembedding_bag_4bit_unpack\nschema: (none)\nCUDA: registered at caffe2/aten/src/ATen/native/quantized/cuda/embedding_bag.cu:394 :: (Tensor _0) -> (Tensor _0) [ boxed unboxed ]\n']
```

Reviewed By: walterddr

Differential Revision: D29518274

fbshipit-source-id: d0cb81c8bf51cdc4b83038758131ccf61e4360f5
2021-07-01 10:11:45 -07:00
e5ae0e652d [jit] Allow instance overrides of ignored methods (#61076)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61076

Previously we would always retrieve ignored methods from the
type, which doesn't work when the user has overridden the ignored method
for a specific instance.

This PR changes things up so we retrieve the ignored method as a bound
method from the object being scripted, unwrap it, then re-bind it to the
scriptmodule.
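
A hedged sketch of the pattern this enables (module and method names are illustrative):

```python
import types
import torch

class M(torch.nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * self.scale()

    @torch.jit.ignore
    def scale(self) -> int:
        return 1

def patched_scale(self) -> int:
    return 2

m = M()
# Override the ignored method on this instance only; scripting now picks
# up the instance-bound override instead of the method on the type.
m.scale = types.MethodType(patched_scale, m)
scripted = torch.jit.script(m)
print(scripted(torch.ones(2)))  # tensor([2., 2.])
```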

Test Plan: Imported from OSS

Differential Revision: D29504421

Pulled By: suo

fbshipit-source-id: 14649863ea69a8d2180dd2c4341ec9a826039de1
2021-07-01 09:26:30 -07:00
ccfdb30644 Revert D29413019: [torch] Various improvements to torch.distributed.launch and torch.distributed.run
Test Plan: revert-hammer

Differential Revision:
D29413019 (4e181dfc35)

Original commit changeset: 323bfbad9d0e

fbshipit-source-id: 1f8ae4b3d0a23f3eaff28c37e9148efff25fafe2
2021-07-01 08:44:51 -07:00
48bfc0e51c [DataLoader] Add Example Only fork DataPipe (#60679)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60679

This is an example-only DataPipe, not intended to be used in production; it is used for tutorials, tests, and documentation.
It will have to be replaced by a real `fork` upon the DataLoader update.

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D29461084

Pulled By: VitalyFedyunin

fbshipit-source-id: a7e435f055f040e358f5465092b8daa07f8e29b7
2021-07-01 08:41:26 -07:00
62b2dc2059 [DataLoader] Decorate ZipDataPipe as zip (#60678)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60678

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D29461085

Pulled By: VitalyFedyunin

fbshipit-source-id: f2037fbc67369aae10b07ef80a19e2a0ea7bf530
2021-07-01 08:41:25 -07:00
8e21ff91e2 [DataLoader] Add simple groupby DataPipe (#60675)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60675

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D29461082

Pulled By: VitalyFedyunin

fbshipit-source-id: ded5a3a1555bfd8457d64b7e61ab6729fff9cb75
2021-07-01 08:40:20 -07:00
cb7d813275 Revert D28836794: SumKernel (BFloat16): use float as accumulation type
Test Plan: revert-hammer

Differential Revision:
D28836794 (4f5c68857f)

Original commit changeset: 46ed3a862c2b

fbshipit-source-id: 3b586eeb752b7cdee909fa97a4c78876a6014770
2021-07-01 08:12:31 -07:00
11dca2e5f3 Fix some integer comparisons (#60894)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60894

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D29431512

fbshipit-source-id: b0ef7656806f378ad823e503e7c27cc563d3dc7d
2021-07-01 08:08:39 -07:00
7017dc101f Revert D29313058: add embedding bag skeleton operators
Test Plan: revert-hammer

Differential Revision:
D29313058 (ae21357ada)

Original commit changeset: b05df6ff9a7c

fbshipit-source-id: ef422aedad71dee6cb2824c58aceb66104376a65
2021-07-01 07:37:02 -07:00
d6521c2249 [pyper][emb][quantization] Support emb trained in FP16 (#60736)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60736

Add support of embedding with input data type as float16, utilize new kernel functions added in fbgemm https://github.com/pytorch/FBGEMM/pull/616

Test Plan: `buck test caffe2/test/:quantization -- test_embedding_bag`

Reviewed By: supriyar

Differential Revision: D29392320

fbshipit-source-id: 0a120b3a58b6cf1d84961831097e9581ffd2b591
2021-07-01 07:35:59 -07:00
d42aa176e4 Bump docker image tag for clang-tidy (#61115)
Summary:
The new tag should fix the "Missing <omp.h>" error message on clang-tidy runs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61115

Test Plan:
Ran the clang-tidy job using the diff from https://github.com/pytorch/pytorch/issues/60976.

Expected Output:
There should be no clang diagnostic errors.

Reviewed By: walterddr

Differential Revision: D29516845

Pulled By: 1ntEgr8

fbshipit-source-id: 554229904db67eb7a7b93b3def434b30de6a43b0
2021-07-01 07:30:28 -07:00
46595a9623 [Static Runtime] Add gflag to disable nnc and caffe2 math library (#61090)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61090

Reviewed By: ajyu

Differential Revision: D29479860

fbshipit-source-id: 2b53405f41d319f074c75d8923d97fd6a45fee4b
2021-07-01 00:01:37 -07:00
c1499a9933 Enable jit tracing to parametrization and add jit tests (#60969)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60969

This PR fixes the tracing in the parametrizations.
The current resolution is that when tracing is performed while caching is enabled, we throw an error.
Without caching, the tracing should work properly (tests added).

Currently, the parametrizations don't support scripting.
This PR introduces the same logic as with the tracing (throw error if caching).
However, scripting itself cannot be enabled due to the use of generator expressions in the parametrizations.
Added TODO to fix it.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D29462887

Pulled By: z-a-f

fbshipit-source-id: 49721d3059be58f36055d1c374080df41a748d66
2021-06-30 23:54:02 -07:00
4e181dfc35 [torch] Various improvements to torch.distributed.launch and torch.distributed.run (#60925)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60925

* Set the number of restarts for `torch.distributed.launch` to 0
* Remove the unnecessary `--use_env` warning and move the remaining `--use_env` warnings to `torch.distributed.launch`
* Make default log level WARNING
* Add new doc section around transitioning to `torch.distributed.run`
* Make `torch.distributed.launch` not use error-propagation
* Set default events handler to `null` that does not print events to console
* Add reference from `torch.distributed.launch` to `torch.distributed.run`
* Set correct preexec function that sends SIGTERM to child processes when parent dies

Issues resolved:

https://github.com/pytorch/pytorch/issues/60716
https://github.com/pytorch/pytorch/issues/60754

Test Plan:
sandcastle

    python -m torch.distributed.launch --nproc_per_node 2 main.py -> uses 0 restarts
    python -m torch.distributed.run --nproc_per_node 2 main.py -> uses default for torchelastic, 0 restarts

    python -m torch.distributed.launch --nproc_per_node=4  --use_env --no_python  main.py -> produces error
    python -m torch.distributed.launch --nproc_per_node=4  --use_env main.py -> no warning
    python -m torch.distributed.launch --nproc_per_node=4  --no_python  main.py ->warning

Output of running torch.distributed.launch without --use_env:

    $path/torch/distributed/launch.py:173: FutureWarning: The module torch.distributed.launch is deprecated
    and will be removed in future. Use torch.distributed.run.
    Note that --use_env is set by default in torch.distributed.run.
    If your script expects `--local_rank` argument to be set, please
    change it to read from `os.environ('LOCAL_RANK')` instead.

New section:

{F628923078}

{F628974089}

Reviewed By: kiukchung, cbalioglu

Differential Revision: D29413019

fbshipit-source-id: 323bfbad9d0e4aba3b10ddd7a243ca6e48169630
2021-06-30 23:31:02 -07:00
ae21357ada add embedding bag skeleton operators (#60491)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60491

Basic reference embedding bag operators; these are not going to be performant, but can be used for functionality enablement.

These operators output the right shape, but the implementation is empty (a toy illustration follows).
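
The toy below illustrates the idea (it is not one of the actual operators):

```python
import torch

def skeleton_embedding_bag(weight, indices, offsets):
    # Correct output shape, empty (zero-filled) implementation.
    return torch.zeros(offsets.numel(), weight.size(1), dtype=weight.dtype)

w = torch.randn(10, 3)
out = skeleton_embedding_bag(w, torch.tensor([1, 2, 4]), torch.tensor([0, 2]))
print(out.shape)  # torch.Size([2, 3])
```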

Test Plan: tbd

Reviewed By: vkuzo

Differential Revision: D29313058

fbshipit-source-id: b05df6ff9a7c0c6ac46ef64a42464988453bd460
2021-06-30 23:09:11 -07:00
db1dd9e7e0 add support for quantized tensors in torch.testing.assert_close (#58926)
Summary:
This adds support for quantized tensors in the same way that torch.testing._internal.common_utils.TestCase.assertEqual does (a usage sketch follows the list below):

bf269fdc98/torch/testing/_internal/common_utils.py (L1314-L1341)

- `.qscheme()` is checked for equality
- `.q_scale()` and `.q_zero_point()` are checked for equality (see comment below) for `.qscheme() == torch.per_tensor_affine`
- `.q_per_channel_scales()`, `.q_per_channel_zero_points()`, and `.q_per_channel_axis()` are checked for equality (see comment below) for `.qscheme() == torch.per_channel_affine`
- values are checked with the default checks after a `.int_repr().to(torch.int32)` call
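
A small usage sketch of the resulting behavior (scale/zero-point values are arbitrary):

```python
import torch
from torch.testing import assert_close

t = torch.tensor([0.5, 1.0, 1.5])
a = torch.quantize_per_tensor(t, scale=0.5, zero_point=0, dtype=torch.qint8)
b = torch.quantize_per_tensor(t, scale=0.5, zero_point=0, dtype=torch.qint8)
assert_close(a, b)  # same qscheme, same qparams, same int representation

c = torch.quantize_per_tensor(t, scale=0.25, zero_point=0, dtype=torch.qint8)
# assert_close(a, c) would now raise, since the scales differ.
```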

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58926

Reviewed By: jerryzh168

Differential Revision: D29483532

Pulled By: mruberry

fbshipit-source-id: 003fde7e21cf844778a879c3de0a7c84d13877bd
2021-06-30 21:43:02 -07:00
06fc637b41 Check native_function's outputs' TensorImpl and StorageImpl (#60286)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/25927

Does some checks described in https://github.com/pytorch/pytorch/issues/25927#issuecomment-589354373:
If function does not modify its inputs (non-inplace and has no out arg):
- Check TensorImpl has use_count of 1. (This should make us aware of functions that return self.)
- If function is a view function check that StorageImpl is same as that of the aliased input, otherwise, StorageImpl's use_count is 1.

Detected a couple functions that failed the check that returned TensorImpl should have use_count of 1: 'native_batch_norm', 'native_batch_norm_backward', '_embedding_bag'. (Filing issues).

Examples of generated code:
We did not update checks for in-place ops (this includes in-place views).

Example of a view:
- Check that outputs StorageImpl of `result` is the same as that of `self`.
- Check TensorImpl has use_count of 1
```cpp
at::Tensor as_strided(c10::DispatchKeySet ks, const at::Tensor & self, at::IntArrayRef size, at::IntArrayRef stride, c10::optional<int64_t> storage_offset) {
  auto& self_ = unpack(self, "self", 0);
  auto _any_requires_grad = compute_requires_grad( self );
  (void)_any_requires_grad;
  std::shared_ptr<AsStridedBackward> grad_fn;
  if (_any_requires_grad) {
    grad_fn = std::shared_ptr<AsStridedBackward>(new AsStridedBackward(), deleteNode);
    grad_fn->set_next_edges(collect_next_edges( self ));
    grad_fn->self_geometry = TensorGeometry(self);
    grad_fn->size = size.vec();
    grad_fn->stride = stride.vec();
    grad_fn->storage_offset = storage_offset;
  }
  #ifndef NDEBUG
  c10::optional<Storage> self__storage_saved =
    self_.has_storage() ? c10::optional<Storage>(self_.storage()) : c10::nullopt;
  c10::intrusive_ptr<TensorImpl> self__impl_saved;
  if (self_.defined()) self__impl_saved = self_.getIntrusivePtr();
  #endif
  auto _tmp = ([&]() {
    at::AutoDispatchBelowAutograd guard;
    return at::redispatch::as_strided(ks & c10::after_autograd_keyset, self_, size, stride, storage_offset);
  })();
  auto result = std::move(_tmp);
  #ifndef NDEBUG
  if (self__storage_saved.has_value())
    AT_ASSERT(self__storage_saved.value().is_alias_of(self_.storage()));
  if (self__impl_saved) AT_ASSERT(self__impl_saved == self_.getIntrusivePtr());
  if (self__storage_saved.has_value())
    AT_ASSERT(self__storage_saved.value().is_alias_of(result.storage())); <<<<<<<<<<<<<<<<<<<<<<<<
  AT_ASSERT(result.use_count() == 1); <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
  #endif
  if (grad_fn) {
      set_history(flatten_tensor_args( result ), grad_fn);
  }
  TORCH_CHECK_NOT_IMPLEMENTED(!(isFwGradDefined(self)), "Trying to use forward AD with as_strided that does not support it.");
  return result;
}
```
Example of non-view:
- Check that output's StorageImpl has use_count of 1.
- Check that output's TensorImpl has use_count of 1.
```cpp
at::Tensor asin(c10::DispatchKeySet ks, const at::Tensor & self) {
  auto& self_ = unpack(self, "self", 0);
  auto _any_requires_grad = compute_requires_grad( self );
  (void)_any_requires_grad;
  std::shared_ptr<AsinBackward> grad_fn;
  if (_any_requires_grad) {
    grad_fn = std::shared_ptr<AsinBackward>(new AsinBackward(), deleteNode);
    grad_fn->set_next_edges(collect_next_edges( self ));
    grad_fn->self_ = SavedVariable(self, false);
  }
  #ifndef NDEBUG
  c10::optional<Storage> self__storage_saved =
    self_.has_storage() ? c10::optional<Storage>(self_.storage()) : c10::nullopt;
  c10::intrusive_ptr<TensorImpl> self__impl_saved;
  if (self_.defined()) self__impl_saved = self_.getIntrusivePtr();
  #endif
  auto _tmp = ([&]() {
    at::AutoDispatchBelowADInplaceOrView guard;
    return at::redispatch::asin(ks & c10::after_autograd_keyset, self_);
  })();
  auto result = std::move(_tmp);
  #ifndef NDEBUG
  if (self__storage_saved.has_value())
    AT_ASSERT(self__storage_saved.value().is_alias_of(self_.storage()));
  if (self__impl_saved) AT_ASSERT(self__impl_saved == self_.getIntrusivePtr());
  if (result.has_storage()) AT_ASSERT(result.storage().use_count() == 1); <<<<<<<<<<<<<<<<<<<<<<<<<<
  AT_ASSERT(result.use_count() == 1); <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
  #endif
  if (grad_fn) {
      set_history(flatten_tensor_args( result ), grad_fn);
  }
  if (isFwGradDefined(self)) {
      auto self_t_raw = toNonOptFwGrad(self);
      auto self_t = self_t_raw.defined() ? self_t_raw : at::zeros_like(toNonOptTensor(self));
      auto self_p = toNonOptPrimal(self);
      auto result_new_fw_grad = (self_t.conj() * (-self_p * self_p + 1).rsqrt().conj()).conj();
      if (result_new_fw_grad.defined()) {
        // The hardcoded 0 here will need to be updated once we support multiple levels.
        result._set_fw_grad(result_new_fw_grad, /* level */ 0, /* is_inplace_op */ false);
      }
  }
  return result;
}
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60286

Reviewed By: jbschlosser

Differential Revision: D29402253

Pulled By: soulitzer

fbshipit-source-id: b90f34c455b8767f95a52c329db351dbbb495397
2021-06-30 19:19:01 -07:00
03b5a225a7 Test parametrization for instantiated device-specific tests (#60233)
Summary:
The `ops` decorator provides a way to parameterize a test across a given list of ops. This would be useful for modules as well (e.g. a `modules` decorator), but the mechanism by which this is accomplished is specific to ops. In the details, the `ops` decorator tags a test function with the metadata needed (list of ops, `dtypes`) and the actual tests are generated according to this metadata during the call to `instantiate_device_type_tests()`.

This PR makes this mechanism more generic, allowing for test parameterization across arbitrary dimensions. This makes a `modules` decorator (or any similar type of decorator) straightforward to implement without changes to the device-specific test instantiation logic.

One caveat is that, since this is implemented where the old `ops` decorator was (within `instantiate_device_type_tests()`), this only works for tests instantiated using the device-specific instantiation logic. Longer term, even device-specific test instantiation could be treated as an optional parameterization across device types, but this PR takes a low-risk approach for now. In practice, this just means that a `device` kwarg is required for all test signatures used with the mechanism.

The `ops` decorator has been refactored to use the generic mechanism and works the same as before, with one difference: when `OpDTypes.none` is specified, the test signature no longer needs an unused `dtype` kwarg. This is a nice bonus that demonstrates the added flexibility of a generic parameterization mechanism. The refactored form also has the bonus that all op-specific test generation logic is contained within the `ops` decorator class, improving readability.

Behind the scenes, the generic mechanism is a base decorator class (`_TestParameterizer`) from which `ops` derives. The core functionality is in the `_parameterize_test()` method, which takes in a test function and returns a generator that produces parameterized tests, including names and parameter kwargs to pass to them. Using the `ops` decorator results in a set of op-specific tests from a given generic test.
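
For context, a hedged sketch of the `ops` usage pattern that the generic mechanism preserves (imports are from the internal test framework; treat the exact signatures as assumptions):

```python
from torch.testing._internal.common_device_type import (
    instantiate_device_type_tests, ops)
from torch.testing._internal.common_methods_invocations import op_db
from torch.testing._internal.common_utils import TestCase, run_tests

class TestExample(TestCase):
    @ops(op_db)  # parameterizes the test across ops, devices, and dtypes
    def test_smoke(self, device, dtype, op):
        sample = op.sample_inputs(device, dtype)[0]
        op(sample.input, *sample.args, **sample.kwargs)

instantiate_device_type_tests(TestExample, globals())

if __name__ == "__main__":
    run_tests()
```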

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60233

Reviewed By: iramazanli

Differential Revision: D29494995

Pulled By: jbschlosser

fbshipit-source-id: a14446488c106094fafcaa75ccf8e9e3faf33bfc
2021-06-30 18:50:22 -07:00
6643df2680 [jit] Use computed loop to dispatch to next instruction in interpreter. (#60211)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60211

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D29211283

fbshipit-source-id: 2f87b5a78d4fc00ce11ed509fc15db35332690b6
2021-06-30 17:44:26 -07:00
357a21bc92 Fix numerical issue of rowwise normalization in Caffe2 and internal tests. (#60880)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60880

Fix numerical issue of rowwise normalization in Caffe2 and internal tests.

Test Plan: buck test mode/opt //dper3/dper3/modules/tests:xdeepint_test -- --exact 'dper3/dper3/modules/tests:xdeepint_test - test_xdeepint_with_full_features_with_interactions_3 (dper3.dper3.modules.tests.xdeepint_test.XdeepInt_Test)'

Reviewed By: esqu1

Differential Revision: D29431597

fbshipit-source-id: 72df52fdcbb29ad3de7b9472f25fde26cf804a76
2021-06-30 17:31:04 -07:00
0824b919ec [BE] move general script out of .circleci/ into tools/ (#60973)
Summary:
Second step in https://github.com/pytorch/pytorch/issues/60373.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60973

Reviewed By: samestep

Differential Revision: D29499385

Pulled By: walterddr

fbshipit-source-id: 22df22f78f6b9af6221917a10188218773245009
2021-06-30 17:20:05 -07:00
4036820506 Add PocketFFT support (#60976)
Summary:
Needed on platforms that do not have MKL, such as aarch64 and M1:
- Add `AT_POCKETFFT_ENABLED()` to Config.h.in
- Introduce `torch._C.has_spectral`, which is true if PyTorch was compiled with either MKL or PocketFFT
- Modify the spectral tests to use skipCPUIfNoFFT instead of skipCPUIfNoMKL

Share the implementation of the `_out` functions, as well as fft_fill_with_conjugate_symmetry_stub, between the MKL and PocketFFT implementations

Fixes https://github.com/pytorch/pytorch/issues/41592
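
A minimal sketch of the new capability check (`torch._C.has_spectral` is named in this commit; `torch.fft.rfft` is just an arbitrary spectral op):

```python
import torch

if torch._C.has_spectral:
    print(torch.fft.rfft(torch.arange(4.0)))
else:
    print("PyTorch was built without an FFT backend")
```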

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60976

Reviewed By: walterddr, driazati, janeyx99, samestep

Differential Revision: D29466530

Pulled By: malfet

fbshipit-source-id: ac5edb3d40e7c413267825f92a5e8bc4bb249caf
2021-06-30 16:28:20 -07:00
2d0c6e60a7 going back to use packaging.version.parse instead (#61053)
Summary:
I think this may be related to https://app.circleci.com/pipelines/github/pytorch/vision/9352/workflows/9c8afb1c-6157-4c82-a5c8-105c5adac57d/jobs/687003

Apparently `pkg_resources.parse_version` returns a `pkg_resources.extern.packaging.version.Version` instead of a `packaging.version.Version`, and on some older versions of setuptools it doesn't support the `.major`/`.minor` attributes. This changes it back to using `packaging.version.parse`.
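
For reference, a minimal sketch of the attribute access the change relies on:

```python
from packaging.version import parse

v = parse("1.9.0")
print(v.major, v.minor)  # 1 9 -- attributes the vendored copy may lack
```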

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61053

Test Plan: CI

Reviewed By: samestep

Differential Revision: D29494322

Pulled By: walterddr

fbshipit-source-id: 294572a10b167677440d7404e5ebe007ab59d299
2021-06-30 16:23:59 -07:00
a2ad84afbb Send test reports to S3 (#61071)
Summary:
This sends the test reports zip to S3 in addition to the GitHub artifact store. This makes it easier to query in the PR HUD since we don't have to deal with the GitHub API's rate limits / download speeds. The impact on S3 storage should be minimal since it's only 500 KB or so per run.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61071

Reviewed By: nikithamalgifb

Differential Revision: D29498941

Pulled By: driazati

fbshipit-source-id: 74bfbe7fa7d1d97fd8a6938c98dfe0caff0ab6eb
2021-06-30 16:00:01 -07:00
812ed47caa [Static runtime] Add unit tests to ops bmm and addmm (#61000)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61000

Add unit tests to bmm and addmm operators in static runtime.

Test Plan:
buck test //caffe2/benchmarks/static_runtime:static_runtime_cpptest

{F628935117}

Reviewed By: hlu1

Differential Revision: D29459679

fbshipit-source-id: 5c7fa5c9b0675c1c84f3ae3110204d663255009c
2021-06-30 15:55:58 -07:00
4ff81ab112 Escape backslashes in Windows stack traces by converting them to forward slashes (#60842)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/60187

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60842

Reviewed By: gdankel

Differential Revision: D29498498

Pulled By: malfet

fbshipit-source-id: 78e1b25a2e6bdfd3ba0c988d023c7a7f79a22cf4
2021-06-30 15:32:03 -07:00
6c1c1111de [JIT] Add reference semantics to TorchScript classes (#44324)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44324

**Summary**
This commit adds reference semantics to TorchScript class types;
modifications made to them within TorchScript will be visible in Python.
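
A minimal sketch of the behavior this enables (class and function names are illustrative):

```python
import torch

@torch.jit.script
class Counter:
    def __init__(self):
        self.n = 0

@torch.jit.script
def bump(c: Counter) -> None:
    c.n += 1

c = Counter()
bump(c)
print(c.n)  # 1 -- the mutation made inside TorchScript is visible here
```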

**Test Plan**
This commit adds a unit test to `TestClassType` that checks that
modifications made to a class type instance passed into TorchScript are
visible in Python after executing the scripted function or module.

**Fixes**
This commit closes #41421.

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D24912807

Pulled By: SplitInfinity

fbshipit-source-id: d64ac6211012425b040b987e3358253016e84ca0
2021-06-30 14:27:17 -07:00
aa728dc335 Fix fx patch module name (#61062)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61062

Instead of being 'patch', this should be the importable name of the module (it's defined as `_fx` on the `torch._C` module, so the full name should be `torch._C._fx`). This now works correctly:

```python
>>> import torch._C._fx
>>> dir(torch._C._fx)
['__doc__', '__loader__', '__name__', '__package__', '__spec__', 'patch_function']
```

Test Plan: Imported from OSS

Reviewed By: Chillee

Differential Revision: D29497018

Pulled By: driazati

fbshipit-source-id: 093aa0552b48feb0aabe47bdf72776dddd5a3b8f
2021-06-30 14:23:35 -07:00
dabadd7e20 [quant] Added reset_min_max_vals() function to observers (#60883)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60883

As per this [comment](https://github.com/pytorch/pytorch/pull/59964#discussion_r659064270), I created a `reset_min_max_vals()` function inside the observers, which will be called during input-weight equalization. This way we do not expose the implementation of the observers in the equalization code.

Test Plan:
`python test/test_quantization.py TestEqualizeFx`

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D29491848

fbshipit-source-id: 00e91959ceb3b4f3688175a1a7ba11823e929b2f
2021-06-30 14:22:08 -07:00
1a0195db49 [quant] Input-Weight Equalization - support for LinearReLU layers (#60653)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60653

Special casing was needed to get the weight attribute in the linear layers of fused LinearReLU layers.

Initial Model: `x -> linear1 -> relu`

After fusion: `x -> linearRelu`

After prepare: `x -> input_quant_obs -> input_eq_obs1 -> linearRelu -> output_quant_obs1`

After equalization functions: `x -> mul -> input_quant_obs (scaled) -> linearRelu -> output_quant_obs`

After convert: `x -> mul -> quantize_per_tensor -> quantized::linearRelu -> dequantize`

More step-throughs here: https://fb.quip.com/A9J3AsBxkykR

Test Plan:
`python test/test_quantization.py TestEqualizeFx`

Original model:
```
LinearReluModel(
  (fc): Linear(in_features=5, out_features=5, bias=True)
  (relu): ReLU()
)
```

Graph after `prepare_fx`:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_0](args = (%x,), kwargs = {})
    %x_activation_post_process_0_equalization_process_0 : [#users=1] = call_module[target=x_activation_post_process_0_equalization_process_0](args = (%x_activation_post_process_0,), kwargs = {})
    %fc : [#users=1] = call_module[target=fc](args = (%x_activation_post_process_0_equalization_process_0,), kwargs = {})
    %fc_activation_post_process_0 : [#users=1] = call_module[target=fc_activation_post_process_0](args = (%fc,), kwargs = {})
    return fc_activation_post_process_0
```

Graph after equalization functions:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_equalization_scale0 : [#users=1] = get_attr[target=x_equalization_scale0]
    %mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_equalization_scale0), kwargs = {})
    %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_0](args = (%mul,), kwargs = {})
    %fc : [#users=1] = call_module[target=fc](args = (%x_activation_post_process_0,), kwargs = {})
    %fc_activation_post_process_0 : [#users=1] = call_module[target=fc_activation_post_process_0](args = (%fc,), kwargs = {})
    return fc_activation_post_process_0
```

Graph after `convert_fx`:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_equalization_scale0 : [#users=1] = get_attr[target=x_equalization_scale0]
    %mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_equalization_scale0), kwargs = {})
    %fc_input_scale_0 : [#users=1] = get_attr[target=fc_input_scale_0]
    %fc_input_zero_point_0 : [#users=1] = get_attr[target=fc_input_zero_point_0]
    %quantize_per_tensor : [#users=1] = call_function[target=torch.quantize_per_tensor](args = (%mul, %fc_input_scale_0, %fc_input_zero_point_0, torch.quint8), kwargs = {})
    %fc : [#users=1] = call_module[target=fc](args = (%quantize_per_tensor,), kwargs = {})
    %dequantize : [#users=1] = call_method[target=dequantize](args = (%fc,), kwargs = {})
    return dequantize
```

Imported from OSS

Reviewed By: supriyar

Differential Revision: D29406999

fbshipit-source-id: add38e8e7fb84a241c3b10bfb8451b50103effd4
2021-06-30 14:22:06 -07:00
546102e161 Fix overflow in quantize_val_arm (#60079)
Summary:
By using `__builtin_add_overflow` to detect integer overflows when `zero_point` is added to the rounded integral value.
Also fixes a small typo.

After this PR, `python3 -c "import torch;print(torch.torch.quantize_per_tensor(torch.ones(10) * 2**32, 0.5, 1, torch.quint8))"` returns the same vector of `127`s on both x86_64 and aarch64 platforms.

This change merely mitigates the overflow bug; a more proper (and perhaps performance-impacting) fix would be to add `zero_point` to the floating-point values in both the serial and vectorized code. Filed https://github.com/pytorch/pytorch/issues/61047 to track this.

Also filed https://github.com/pytorch/pytorch/issues/61046 to clarify intended use of `__ARM_NEON__` define

Fixes https://github.com/pytorch/pytorch/issues/60077

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60079

Reviewed By: kimishpatel

Differential Revision: D29157883

Pulled By: malfet

fbshipit-source-id: 6f75d93e6d3d4d0d5a5eab545cb27773086b9768
2021-06-30 14:20:56 -07:00
cef0851223 Make torch.utils.benchmark numpy-free (#60564)
Summary:
PyTorch core does not depend on numpy, so the benchmark utilities should not depend on it either

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60564

Reviewed By: robieta

Differential Revision: D29497375

Pulled By: malfet

fbshipit-source-id: d9566e5b2e48868cef5568cd62f691af19ccf1f1
2021-06-30 14:17:32 -07:00
d1a4c9e682 [ROCm] allow user to override PYTORCH_ROCM_ARCH (#60602)
Summary:
Restores the ability of a user to call .jenkins/pytorch/build.sh while
also setting PYTORCH_ROCM_ARCH. Otherwise, with IN_CI=1 as the new
default, it will forcibly ignore user settings when build.sh is used
outside of CI.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60602

Reviewed By: samestep

Differential Revision: D29490791

Pulled By: janeyx99

fbshipit-source-id: b5e8a529b8e0b5020b260b4bf027a37e0c1df8d5
2021-06-30 13:35:11 -07:00
14cc234a8a Fix some comparison warnings (#60875)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60875

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D29406593

fbshipit-source-id: 0eb070ef05c1cd343c9e835786b42014d0553aa5
2021-06-30 13:09:41 -07:00
74692f3ada Loop transformation (#60874)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60874

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D29406474

fbshipit-source-id: c994361e9fdafb7c4519ce2f1c40288a9ef025be
2021-06-30 13:09:39 -07:00
a8b56ea58b Remove another for-loop in SoftMax (#60873)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60873

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D29406429

fbshipit-source-id: 3b5710ed9e5d1d14379f64670638ab119d0d78e3
2021-06-30 13:09:38 -07:00
850ff82edc Remove for-loop for getting number of elements in favour of abstraction (#60872)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60872

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D29406199

fbshipit-source-id: ae49672cf1bb370d574d0c21231477bb17dea0ca
2021-06-30 13:08:25 -07:00
95e77e0af2 [Delegate] A more specific prefix for lowered module name. (#61007)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61007

Test Plan: Imported from OSS

Reviewed By: kimishpatel, raziel

Differential Revision: D29477733

Pulled By: iseeyuan

fbshipit-source-id: 94a7a784d98a41ff7ba255955acf74bd26297c9f
2021-06-30 12:37:09 -07:00
f32f85e6da Implemented torch.corrcoef (#60420)
Summary:
Implements `torch.corrcoef` similar to [`np.corrcoef`](https://numpy.org/doc/stable/reference/generated/numpy.corrcoef.html) using `torch.cov` implemented in https://github.com/pytorch/pytorch/pull/58311.

closes https://github.com/pytorch/pytorch/issues/1254
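
A brief usage sketch (the values are chosen so the result is obvious):

```python
import torch

x = torch.tensor([[0., 1., 2.],
                  [2., 1., 0.]])
print(torch.corrcoef(x))
# tensor([[ 1., -1.],
#         [-1.,  1.]])  # the two rows are perfectly anti-correlated
```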

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60420

Reviewed By: mruberry

Differential Revision: D29474687

Pulled By: heitorschueroff

fbshipit-source-id: f3c7c5610363aebd88274a51fc77e3cf879cb611
2021-06-30 12:36:02 -07:00
d5be67a338 Expose findDanglingImpls to Python (#60827)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60827

This diff exposed Dispatcher.findDanglingImpls to Python as _C._dispatch_find_dangling_impls.
ghstack-source-id: 132799970

Test Plan: buck test mode/dev //caffe2/test:others -- test_find_dangling_impls

Reviewed By: ezyang

Differential Revision: D29416330

fbshipit-source-id: d2f26054b6e247be1bb9e818eaa7cb9e68a4a913
2021-06-30 12:31:19 -07:00
3cf267bfa6 Embedding: Remove dispatch in parallel region (#60597)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60597

Ref #56794

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D29446191

Pulled By: ngimel

fbshipit-source-id: d6ff010104ae621d5e3d9c269ed2b48407e71d67
2021-06-30 12:30:15 -07:00
4f5c68857f SumKernel (BFloat16): use float as accumulation type (#55217)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55217

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D28836794

Pulled By: VitalyFedyunin

fbshipit-source-id: 46ed3a862c2bb4c6325c78ecfc5d01761f7a113a
2021-06-30 12:27:42 -07:00
4d5edef8d4 Python composite module execution unit tests on delegation of backend_with_compiler_demo (#60801)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60801

Added unit tests for the execution of a simple composite module with a compiler (backend_with_compiler_demo).

Test Plan:
Running `python test/test_jit.py TestBackendsWithCompiler -v` succeeds.

Imported from OSS

Reviewed By: iseeyuan

Differential Revision: D29409958

fbshipit-source-id: b02e58bdcc25a2997b70ecae41a019b8596323c1
2021-06-30 12:23:32 -07:00
3957ed41a9 [DDP] Disable reducer hooks from running outside of DDP backwards. (#60921)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60921

Sometimes local modules can fire hooks (such as when the user calls
backward after using `ddp_module.module` explicitly). This isn't supported
behavior and can cause issues with various state and gradient reduction we run
in DDP, so it's best to disable this entirely.
ghstack-source-id: 132739311

Test Plan: CI

Reviewed By: SciPioneer

Differential Revision: D29435737

fbshipit-source-id: fef76a0dd2955c432131632fb81dde4a4982ad91
2021-06-30 12:19:18 -07:00
5a4282d06b fix typo in binary_build_script (#61016)
Summary:
resolve comments in https://github.com/pytorch/pytorch/issues/60849

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61016

Reviewed By: samestep

Differential Revision: D29487908

Pulled By: janeyx99

fbshipit-source-id: 32feb6c6e1009324201e3d2c6fcd9a7388791401
2021-06-30 11:52:38 -07:00
d44515c418 Fix lint (#61058)
Summary:
https://github.com/pytorch/pytorch/issues/61003 broke Lint / shellcheck because of a race condition with https://github.com/pytorch/pytorch/issues/60221. This PR fixes it.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61058

Test Plan: CI.

Reviewed By: walterddr

Differential Revision: D29494727

Pulled By: samestep

fbshipit-source-id: e6c5ea6daa47db13eb6a42cc2b5bf9c938c1839d
2021-06-30 11:45:23 -07:00
a25e6370e5 Add IMethod interface
Summary:
Expose IMethod interface, which provides a unified interface to either script or python methods backed by torchscript or torchdeploy.

IMethod provides a way to depend on a torch method without depending on a particular runtime implementation such as torchscript or python/deploy.

Test Plan: add unit tests.

Reviewed By: suo

Differential Revision: D29463455

fbshipit-source-id: 903391d9af9fbdd8fcdb096c1a136ec6ac153b7c
2021-06-30 11:28:24 -07:00
dace860008 Migrate pytorch-linux-bionic-py3.8-gcc9-coverage to GHA (#61050)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/59166

`pytorch-linux-bionic-py3.8-gcc9-coverage` build & tests can be run on `linux.2xlarge` instances on GHA,
which have AVX512 support.

Thanks

cc malfet seemethere samestep zhouzhuojie

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61050

Reviewed By: walterddr, 1ntEgr8

Differential Revision: D29493335

Pulled By: samestep

fbshipit-source-id: de79e61f13c537ef7ff30a1e04d1bbc625a06dd1
2021-06-30 11:02:57 -07:00
b4496df7d3 mkl_scsrmm needs to be disabled when MKL is not used (#60051)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60051

Introduction:
We want to minimize the number of dependencies for the SGX port. Therefore we need the ability to disable MKL when it is not used.

Problem:
There is a call to mkl_scsrmm that is enabled when CAFFE2_USE_MKL is not defined. This causes a compile error.

Solution:
Surround the call with preprocessor checks on CAFFE2_USE_MKL

Test Plan: Run the pytorch tests.

Reviewed By: LiJihang

Differential Revision: D29022635

fbshipit-source-id: 94ae9fdfe53399b64d8c2d4089eebe93d1d260e8
2021-06-30 10:40:18 -07:00
5644c31ec0 Move windows periodic jobs to GHA (#61003)
Summary:
Moves the periodic CUDA 11.3 Windows jobs to GHA.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61003

Test Plan:
https://github.com/pytorch/pytorch/pull/61003/checks?check_run_id=2947910829

Does NOT move the debuggable CI part yet

Reviewed By: malfet

Differential Revision: D29488761

Pulled By: janeyx99

fbshipit-source-id: b16b23b40fe1f6ae189292c6f2c561e5e70f122b
2021-06-30 10:25:10 -07:00
9b5e1e0734 [DataLoader] Make batch DataPipe sensitive to unbatch_level argument (#60672)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60672

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D29461086

Pulled By: VitalyFedyunin

fbshipit-source-id: efc6b3b567323defe64d3f1b30a5708107e62dd4
2021-06-30 10:04:32 -07:00
66de50cc11 [DataLoader] Make shuffle DataPipe sensitive to unbatch_level argument (#60671)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60671

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D29461083

Pulled By: VitalyFedyunin

fbshipit-source-id: 3d371017d5ce948a1e5b8182ae91033190f64da7
2021-06-30 10:03:29 -07:00
a652398465 [DataLoader] Rename transform DataPipe to legacy_transform (#60670)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60670

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D29461081

Pulled By: VitalyFedyunin

fbshipit-source-id: 57f53a91db9032a6126e86243ddea9149c473060
2021-06-30 09:49:14 -07:00
abb4ed7412 Move clang-format to lint.yml (#60918)
Summary:
Refactor and consolidate the location of lint-related workflows

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60918

Reviewed By: mruberry

Differential Revision: D29459605

Pulled By: zhouzhuojie

fbshipit-source-id: c2993cfd037a03b733a414897bd53cf407c7c268
2021-06-30 09:45:35 -07:00
0b8a7daa2a Enable multigpu_test in GHA (#60221)
Summary:
- [x] add to test matrix
- [x] enable on PRs for testing
- [x] modify the scripts so it actually runs the multigpu tests
- [x] put `num_shards` after `shard` number
- [x] use a separate test-reports artifact
- [x] run on `linux.16xlarge.nvidia.gpu`
- [x] validate that it works
- [x] disable on PRs before merging

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60221

Test Plan: CI. Example run: https://github.com/pytorch/pytorch/actions/runs/984347177

Reviewed By: malfet

Differential Revision: D29430567

Pulled By: samestep

fbshipit-source-id: 09f8e208e524579b603611479ca00515c8a1b5aa
2021-06-30 08:52:38 -07:00
5576c7bdd1 ns for fx: initial support for int8 shadows fp32 (#60419)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60419

Adds support for NS for FX shadowed activations pass to handle int8
modules shadowing fp32 modules. The difficulty here is that in order
to insert the dtype cast, we need the qparams of the input.

For the current PR, we only handle the easy cases where the previous
node is either a `quantize_per_tensor` or an OSS quantized module.
A future PR can handle more complicated cases such as various functions.
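As a hedged illustration of the cast itself (made-up qparams standing in for the ones the pass reads off the previous node):

```python
import torch

# Hypothetical qparams; in the pass they come from the previous
# quantize_per_tensor node's (or quantized module's) scale and zero_point.
scale, zero_point = 0.05, 64

fp32_act = torch.randn(2, 4)          # activation feeding the fp32 module
int8_act = torch.quantize_per_tensor( # the inserted dtype cast for the int8 shadow
    fp32_act, scale, zero_point, torch.quint8)
```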

Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_int8_shadows_fp32_simple
```

Imported from OSS

Reviewed By: hx89

Differential Revision: D29280050

fbshipit-source-id: 465257c9f82a34fa91b48ae8887355c68e00edc6
2021-06-30 08:08:46 -07:00
a5e2ea4345 Add noop register hook (#60685)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60685

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D29466224

fbshipit-source-id: 68c8aa022ccffeefd45062f1443d15c9a6824f3d
2021-06-30 07:46:34 -07:00
1fd65967e5 Revert D29312809: add quantized_resize and dequantize for some cuda backends
Test Plan: revert-hammer

Differential Revision:
D29312809 (c4cc26f26a)

Original commit changeset: c5c5eabb98bc

fbshipit-source-id: 565e215513b68eae0dacdd1660b1a01759215511
2021-06-30 07:37:09 -07:00
bfe03120ee [PyPer] Fix schema of fb::equally_split (#60852)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60852

Reviewed By: ajyu

Differential Revision: D29423425

fbshipit-source-id: 4525db1f268ca65d6851a5ec846a6ae2f710ec6b
2021-06-30 03:18:15 -07:00
af5a0df1d0 Prefer linalg::qr over qr in the C++ API (#60529)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/60060

Also adds `torch::linalg::qr` to the C++ API, as it was missing.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60529

Reviewed By: ngimel

Differential Revision: D29353133

Pulled By: mruberry

fbshipit-source-id: e18feaffca91c13940ad3d6bd1f40bb57dc101ae
2021-06-30 02:48:04 -07:00
b39770c461 Fix degenerate shape behavior for ord=+/-2 (#60273)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/59198

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60273

Reviewed By: jbschlosser

Differential Revision: D29422907

Pulled By: mruberry

fbshipit-source-id: 609cd640b0477f90bebca20865e34cbe182d3909
2021-06-30 02:17:26 -07:00
10fc58620e [PyTorch][NASProfiler] Add moduleHierarchy Python API to print out hierarchical information about a Node (#60384)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60384

Currently, inlining the module graph drops the module hierarchy info on the Python side. Here we retrieve the module hierarchy from the C++ side and expose it via a new Python API on Node called `moduleHierarchy()`.

Test Plan:
Usage:
```
torch._C._jit_pass_inline(module.graph)
torch._C._jit_pass_propagate_shapes_on_graph(module.graph)
node = module.graph.findNode("quantized::conv2d_relu")
'top(' + module.original_name + ').' + node.moduleHierarchy() + '.' + node.kind()
```
Output:
```
'top(QuantWrapper).module(FBNetHR).0(Sequential).xif0_0(ConvBNRelu).conv(ConvReLU2d).quantized::conv2d_relu'
```

Reviewed By: kimishpatel

Differential Revision: D29252169

fbshipit-source-id: 74163a87f919e061e5e75dfebc4c5cdbe8489d93
2021-06-30 01:32:31 -07:00
44b3dc4eac resolve conjugate bit in torch.testing.assert_close (#60522)
Summary:
We need to resolve the conjugate bit for complex tensors, because otherwise we may not be able to access the imaginary component:

```python
>>> torch.tensor(complex(1, 1)).conj().imag
RuntimeError: view_as_real doesn't work on unresolved conjugated tensors.  To resolve the conjugate tensor so you can view it as real, use self.resolve_conj(); however, be warned that the resulting tensor will NOT alias the original.
```
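As the error message suggests, materializing the conjugation first makes the imaginary part accessible, at the cost of a copy that no longer aliases the original:

```python
import torch

t = torch.tensor(complex(1, 1)).conj()
r = t.resolve_conj()   # materializes the conjugation; does NOT alias t
print(r.imag)          # tensor(-1.)
```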

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60522

Reviewed By: ngimel

Differential Revision: D29353095

Pulled By: mruberry

fbshipit-source-id: c36eaf883dd55041166f692f7b1d35cd2a34acfb
2021-06-30 01:31:30 -07:00
c4cc26f26a add quantized_resize and dequantize for some cuda backends (#60489)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60489

Adds entries to native_functions.yaml to enable these functions,
since the code is common between CUDA and CPU.

Test Plan: tested with a full model, unit tests on the way

Reviewed By: ezyang

Differential Revision: D29312809

fbshipit-source-id: c5c5eabb98bc192343ec78980dc4e3fc3f41d3db
2021-06-30 00:33:12 -07:00
4adc5eb6c5 [Caffe2][Testing] Check for equality first in assertTensorEqualsWithType<float> (#61006)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61006

Test Plan: Modified existing unit test to test for eps = 0. It would fail without the equality test first.

Reviewed By: ajyu

Differential Revision: D29423770

fbshipit-source-id: 168e7de00d8522c4b646a8335d0120700915f260
2021-06-29 23:31:37 -07:00
287c0ab170 [FX] Add requires_grad to TensorMetadata (#60972)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60972

For PyTorch model memory requirement calculation, requires_grad is needed. Output tensors with requires_grad are saved in the module context and increase memory during the forward pass.
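A short sketch of reading the new field, assuming the standard `ShapeProp` pass is what populates `tensor_meta`:

```python
import torch
from torch.fx import symbolic_trace
from torch.fx.passes.shape_prop import ShapeProp

class M(torch.nn.Module):
    def forward(self, x):
        return x.relu()

gm = symbolic_trace(M())
ShapeProp(gm).propagate(torch.randn(2, requires_grad=True))
for node in gm.graph.nodes:
    tm = node.meta.get('tensor_meta')
    if tm is not None:
        print(node.name, tm.requires_grad)  # field added by this PR
```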

Test Plan: Existing test cases

Reviewed By: jamesr66a

Differential Revision: D29024932

fbshipit-source-id: def990f8c6ff6fa4537bfc377c646b9d44464ebd
2021-06-29 23:07:27 -07:00
ce232e7847 [ROCM] enable fft tests (#60313)
Summary:
This PR enables fft tests on ROCm. It contains a function that generates a valid input for fft tests that call hipfftExecC2R or hipfftExecZ2D. With this helper function we are able to fix a number of fft tests. This brings to a close the series of fft PRs enabling fft tests on ROCm.
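The underlying constraint is that C2R transforms assume Hermitian-symmetric input; as a hedged illustration (not the PR's actual helper), starting from a real signal's rfft guarantees a valid input:

```python
import torch

signal = torch.randn(16)
hermitian = torch.fft.rfft(signal)           # Hermitian-symmetric by construction
restored = torch.fft.irfft(hermitian, n=16)  # C2R (e.g. hipfftExecC2R on ROCm)
assert torch.allclose(signal, restored, atol=1e-5)
```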

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60313

Reviewed By: mruberry

Differential Revision: D29463487

Pulled By: malfet

fbshipit-source-id: d0903fbf12d24ba95a42c8b7589714fdb63353ed
2021-06-29 22:43:29 -07:00
e2b42c6f52 [ROCm] Update the magma build to new commit (#60900)
Summary:
Magma master branch is updated with all the fixes required for ROCm, so updating the magma build to the new commit for ROCm pyTorch builds.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60900

Reviewed By: jbschlosser

Differential Revision: D29440587

Pulled By: malfet

fbshipit-source-id: 2ccdf48441dfff3d19c4a478e03ac11a843f8419
2021-06-29 22:38:58 -07:00
93772792e3 [nnc] Get rid of fuser trigger counters (#57334)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57334

Here's a possibly controversial PR.  These counters got in the way of
generalizing the fuser tests to handle arbitrary devices, and I guess I'm just
generally skeptical that they provide much value.  While it's true that they let us
observe whether fusion groups were created, we already have assertions based on
the shape of the graph, and I'm not sure that I trust those any less than these
counters.

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D29471484

Pulled By: bertmaher

fbshipit-source-id: f6d76f6e72dbfb581acff1d834b0c74500941b57
2021-06-29 22:22:15 -07:00
c4f718cb72 [nnc] Serialize initialization of LLVM targets (#60996)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60996

We've had a bug report of weird LLVM initialization errors, e.g.,
```
Unexpected failure in LLVM JIT: Cannot choose between targets "x86-64" and "x86-64"
```

While I haven't repro'ed that exact message, I did run a stress-test that
compiles on many threads simultaneously, and it deadlocks in
TargetRegistry::lookupTarget.  And in fact I remember debugging this before in
a different system, and finding "Clients are responsible for avoiding race
conditions in registration" in
https://llvm.org/doxygen/TargetRegistry_8cpp_source.html.

So yeah, let's lock this thing.
ghstack-source-id: 132719018

Test Plan: Heavy multithreaded compilation.  Not sure if it's suitable for landing.

Reviewed By: ZolotukhinM

Differential Revision: D29471343

fbshipit-source-id: b495e468b57e77796a08b627884d3efeca2d1f7c
2021-06-29 22:21:00 -07:00
5bc28c897e fixed launch bounds for gamma_cuda_kernel (#60393)
Summary:
Changed launch bounds for gamma_cuda_kernel from 512 to 256.

Timing data (using Nvidia Titan-V):
![GammaTimingData](https://user-images.githubusercontent.com/22803332/122821464-bc873300-d291-11eb-9be6-2fb690f0d5c7.PNG)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60393

Reviewed By: jbschlosser

Differential Revision: D29447926

Pulled By: ngimel

fbshipit-source-id: c2112f9be8ede3bb07cb72f301393f24d17e0c01
2021-06-29 19:22:07 -07:00
b3ec92cf66 BatchNorm: Remove dispatch in parallel region (#60596)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60596

Ref #56794

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D29446193

Pulled By: ngimel

fbshipit-source-id: 3ebf44a5f1e001e7dc42cd5963752b7e5b9bcbd9
2021-06-29 18:28:46 -07:00
28dc02fe9f Accumulate 16-bit float sums in 32-bit accumulators (#60387)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60387

Fixes gh-59489

Using 32-bit accumulators is a win-win: improved precision and improved
performance, since the half-precision types needed to be converted back and forth
to 32-bit float to do the arithmetic anyway.

Note that on multi-threaded or discontiguous sums, there can be partial sums
stored in the output, so those are necessarily truncated to 16-bit. Fixing this
would require a rework of TensorIterator reductions.
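A small illustration of the precision half of that win, with a plain Python loop standing in for the kernel's accumulator:

```python
import torch

x = torch.ones(4096, dtype=torch.float16)

# fp16 running sum: once the accumulator reaches 2048, adding 1.0 rounds
# back to 2048 (the fp16 spacing there is 2), so the naive sum stalls.
acc = torch.zeros((), dtype=torch.float16)
for v in x:
    acc = acc + v
print(acc)  # tensor(2048., dtype=torch.float16)

# Accumulating in fp32 and truncating once at the end is exact here.
print(x.to(torch.float32).sum().to(torch.float16))  # tensor(4096., dtype=torch.float16)
```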

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D29447187

Pulled By: ngimel

fbshipit-source-id: d0619e0ca2fe116d101460142b79ca56fd6d0840
2021-06-29 17:52:30 -07:00
f54290fd72 Expose raw saved tensors for custom functions (#60551)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60551

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D29466228

fbshipit-source-id: 7565f6cc3f2488c7e444cf81c7eb37a60c75b0e8
2021-06-29 17:21:52 -07:00
a469298707 Free space in windows libtorch build (#60849)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/60856
Removes more unneeded pre-installed software from the CI image

verification links
https://app.circleci.com/pipelines/github/pytorch/pytorch/342992/workflows/3f52cacc-ba1c-4093-804f-d4c1b1c0b806/jobs/14436533
https://app.circleci.com/pipelines/github/pytorch/pytorch/342992/workflows/3f52cacc-ba1c-4093-804f-d4c1b1c0b806/jobs/14437351

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60849

Reviewed By: mruberry

Differential Revision: D29473637

Pulled By: seemethere

fbshipit-source-id: f33dd98de32a79ba1195481f1bd9f2d5362fe16e
2021-06-29 16:53:10 -07:00
af66356d47 [skip-ci] Bump docker image tag (#60988)
Summary:
This PR bumps the docker image tag for clang-tidy. The new image runs ubuntu-20.04 (and therefore has python3.8 by default).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60988

Reviewed By: malfet

Differential Revision: D29469941

Pulled By: 1ntEgr8

fbshipit-source-id: 7268bdb23edff0bc26f275689bf4b1f1ca129df7
2021-06-29 15:23:06 -07:00
8780f8fc3c Remove extraneous process group agent test code (#60903)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60903

RPC tests using the process group backend were disabled for CI internally / externally. This removes the code for process-group-only tests. Faulty agent tests, which also use process group, will be handled in a later PR.

Test Plan: Imported from OSS

Reviewed By: jbschlosser, mrshenli

Differential Revision: D29440674

Pulled By: H-Huang

fbshipit-source-id: 4724c189a110ac821c3f4f6f1f8a5c98e057a2a4
2021-06-29 14:21:56 -07:00
d3de37609f Support fused_dropout with XPU backend (#60231)
Summary:
## Motivation
Enable the fused dropout optimization on XPU devices.

## Solution
Add the XPU device to the fused dropout eligibility check.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60231

Reviewed By: jbschlosser

Differential Revision: D29437659

Pulled By: ezyang

fbshipit-source-id: b77245bb53d3ac93ab30a2a85994376ae5928c34
2021-06-29 14:20:17 -07:00
b4a4a8434d [1/n]support double for Caffe2 ScatterWeightedSum (#60402)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60402

Add float64 data type support for ScatterWeightedSum, for cases where 10^7 precision is not sufficient.

Test Plan: buck test caffe2/caffe2/python/operator_test:sparse_ops_test -- testScatterWeightedSum

Reviewed By: jianyuh

Differential Revision: D29190324

fbshipit-source-id: 871a60744694e901a2c7685a67350860745d6729
2021-06-29 14:17:04 -07:00
5f51406a51 Modify error message when atol=0 and rtol=0 (#60897)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60897

Fixes #56377
Example output: #60898

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D29461107

Pulled By: 1ntEgr8

fbshipit-source-id: c6e15b299290aab6f8d5a19011c1d39279673f74
2021-06-29 14:17:02 -07:00
6d952dbaf0 [nnc] Fixed checking for loop carried dependence while fusing 2D reduction loops (#60609)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60609

Fixes #60310

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D29386144

Pulled By: navahgar

fbshipit-source-id: 230df4f59d6196a250ea57ff649b117d096fcdbc
2021-06-29 14:17:01 -07:00
b099f5429c Port argmin kernel to structured kernels. (#60364)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60364

Tracking issue: #55070

This PR was opened to solve the CI failures in main when merging: #59371 #59372 #59373 #59937 #59938.

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D29265855

Pulled By: ezyang

fbshipit-source-id: ccee3810940542f8b370596105826c96b32231ec
2021-06-29 14:16:59 -07:00
3e2233841f Port argmax to structured kernels. (#60363)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60363

Tracking issue: #55070

This PR was opened to solve the CI failures in main when merging: #59371 #59372 #59373 #59937 #59938.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D29265857

Pulled By: ezyang

fbshipit-source-id: 586914d2aa79028c56988896093945755a2b9781
2021-06-29 14:16:57 -07:00
df47fa5bdc Using meta checks for unary torch.all and torch.any. (#60362)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60362

This PR makes use of the newly implemented unified `at::meta::check_reduction` for
validating the inputs and configuring its `TensorIterator`.

This PR was opened to solve the CI failures in main when merging: #59371 #59372 #59373 #59937 #59938.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D29265858

Pulled By: ezyang

fbshipit-source-id: e8961b7da65a31acfed5ac3f5c1f5985ae81ec37
2021-06-29 14:16:56 -07:00
0dd90cceaf [package] track storages across lifetime of PackageExporter (#59735)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59735

1. Fixes the ABA storage-identity problem during serialization for `torch.package` by keeping references to serialized storages for the lifetime of `PackageExporter`, preventing reuse of a memory address (see the sketch below). This extends the logic used to solve the same issue on mobile.
2. Adds determinism to the naming scheme of serialized storages in export code paths which utilize `tensor_cdata_naming_scheme` (introduces a 2nd mapping in `StorageContext`, which now maps `storage cdata ptr` -> `unique id`, `unique id` -> `c10::Storage`)
3. Additionally uses the presence of a storage in the `StorageContext` instance as a marker for whether a storage has been serialized, removing the need to scan the `PythonStreamWriter` for the storage's serialization file
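A toy illustration of the ABA hazard from point 1 (hedged; not the exporter's code): without a live reference, a freed storage's address can be recycled for a brand-new storage, fooling any address-keyed bookkeeping.

```python
import torch

a = torch.ones(3)
addr = a.storage().data_ptr()   # the key an address-based cache would use
del a                           # storage freed; the allocator may reuse the address
b = torch.zeros(3)
print(addr == b.storage().data_ptr())  # may print True: same key, different data
```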

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D29075276

Pulled By: Lilyjjo

fbshipit-source-id: 15a5c30b1de99c5bd7079388f2db9b6ece2eca12
2021-06-29 14:16:54 -07:00
eb2f535689 c10::Storage python to cpp converter and typecast (#59734)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59734

Adds typecast logic to allow c10::Storage objects to cross the Python/C++ barrier with pybind11

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D29075279

Pulled By: Lilyjjo

fbshipit-source-id: 3e67b8525d308c5bccc64438ebac82b4d17ba462
2021-06-29 14:16:52 -07:00
93eba7471b Remove fetch in clang-tidy setup (#60974)
Summary:
This was necessary previously since we'd have to diff against upstream to figure out what to run in clang-tidy, but now we pull this from GitHub (https://github.com/pytorch/pytorch/issues/60045), so we can delete this part of the workflow

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60974

Reviewed By: mruberry

Differential Revision: D29466036

Pulled By: driazati

fbshipit-source-id: a9d619ab731e77bc69ab32b37cfb2c249e22a477
2021-06-29 14:15:34 -07:00
91c076eadc Add TorchVitals for DataLoader (#60959)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60959

Add TorchVitals for DataLoader; this indicates that the data loader was enabled.

This is a no-op if TORCH_VITALS environment variable is not set.

Test Plan: buck test mode/dbg caffe2/test:torch -- --regex vitals

Reviewed By: VitalyFedyunin

Differential Revision: D29445146

fbshipit-source-id: d5778fff3dafb3c0463fec7a498bff4905597518
2021-06-29 14:08:32 -07:00
652d911f81 add BFloat16 support for LayerNorm CPU (#55210)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55210

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D28836793

Pulled By: VitalyFedyunin

fbshipit-source-id: 998298deedd7a18e45fb761a0a4e0d88b65f2e0c
2021-06-29 14:08:30 -07:00
89d0e31fe5 [torch][repeat_interleave] Remove stream sync when output_size is given for scalar repeats (#60965)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60965

Same as title. A simple change to tensor creation.

Test Plan: Rely on existing signals and verify manually that sync is not happening.

Reviewed By: ngimel

Differential Revision: D29461773

fbshipit-source-id: 21d6ebfba08449da39fc7f109958f6c6978a4f32
2021-06-29 14:08:28 -07:00
086f6e557e Fix divide by zero error in the ASAN test (#60723)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/60722

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60723

Reviewed By: jbschlosser

Differential Revision: D29432147

Pulled By: albanD

fbshipit-source-id: c82cd0df8e4a04ee561ca26ae821a8b61c13a698
2021-06-29 14:07:26 -07:00
ec9c03c234 Implemented torch.cov (#58311)
Summary:
Based on https://github.com/pytorch/pytorch/pull/50466

Adds the initial implementation of `torch.cov`, similar to `numpy.cov`. For simplicity, we removed support for several `numpy.cov` parameters that are either redundant, such as `bias`, or have simple workarounds, such as `y` and `rowvar`.
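Basic usage mirrors `numpy.cov` with variables as rows, and the dropped parameters have one-line workarounds:

```python
import torch

x = torch.randn(3, 100)                  # 3 variables, 100 observations each
print(torch.cov(x))                      # 3x3 covariance matrix

y = torch.randn(1, 100)
print(torch.cov(torch.vstack([x, y])))   # instead of np.cov(x, y)

obs = torch.randn(100, 3)
print(torch.cov(obs.T))                  # instead of np.cov(obs, rowvar=False)
```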

cc PandaBoi

closes https://github.com/pytorch/pytorch/issues/19037

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58311

Reviewed By: jbschlosser

Differential Revision: D29431651

Pulled By: heitorschueroff

fbshipit-source-id: 167dea880f534934b145ba94291a9d634c25b01b
2021-06-29 14:02:39 -07:00
8f658d537d Improved JIT support for torch.einsum (#59265)
Summary:
Added JIT support for the vararg version of `torch.einsum`. Note that JIT does not support Python's Ellipsis object (`...`).
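For instance, the vararg form below can now be used inside a scripted function, sticking to explicit subscripts since `...` is unsupported in JIT:

```python
import torch

@torch.jit.script
def matmul(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    # vararg form: operands passed directly rather than packed into a list
    return torch.einsum('ij,jk->ik', a, b)

a, b = torch.randn(3, 4), torch.randn(4, 5)
assert torch.allclose(matmul(a, b), a @ b)
```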

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59265

Reviewed By: VitalyFedyunin

Differential Revision: D29328469

Pulled By: heitorschueroff

fbshipit-source-id: 5e4b177fda93255251f45d735b00c08220f0f124
2021-06-29 14:01:21 -07:00
d46eb77b04 Improve CUDA extension building error/warning messages (#59665)
Summary:
See https://github.com/pytorch/pytorch/issues/55267

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59665

Reviewed By: mruberry

Differential Revision: D29462248

Pulled By: ezyang

fbshipit-source-id: 9de13a284a14a7cd24200b9684151ce652e1eb1e
2021-06-29 13:03:30 -07:00
12b63f4046 [DDP] Fix case where new tensors with no grad_fn are returned in DDP forward. (#60882)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60882

Fixes https://github.com/pytorch/pytorch/issues/60733, which
identified an issue with a previous PR that resulted in DDP no longer
supporting cases where newly created tensors are returned that don't have a
grad_fn. As a result, the grad_fn was set to that of the `DDPSink`
custom backward, which caused errors during the backward pass.

This PR fixes the issue by ensuring we don't touch the `grad_fn` of the tensors
if it is `None`. Added relevant tests as well.
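In spirit, the guard is as simple as this sketch (hypothetical helper, not DDP's actual code):

```python
import torch

def rewire_output(t: torch.Tensor) -> torch.Tensor:
    # Hypothetical stand-in for DDP's output handling: tensors freshly
    # created in forward carry no autograd history (grad_fn is None) and
    # must be left untouched instead of being routed through DDPSink.
    if t.grad_fn is None:
        return t
    return t.clone()  # placeholder for attaching the DDPSink custom backward
```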
ghstack-source-id: 132632515

Test Plan: CI

Reviewed By: zhaojuanmao

Differential Revision: D29423822

fbshipit-source-id: a9e01046c7be50aa43ffb955f6e0f48fef4bc881
2021-06-29 12:50:48 -07:00
1db2d9b0a8 [ProcessGroupNCCL] change WARNING to INFO (#60901)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60901

Short-term fix to address
https://github.com/pytorch/pytorch/issues/60752. A longer-term fix is tracked here:
https://github.com/pytorch/pytorch/issues/53658 and will involve detecting
whether the user has called `torch.cuda.set_device` in their script and
respecting that device if so, otherwise falling back to our current approach.
ghstack-source-id: 132637336

Test Plan: CI

Reviewed By: zhaojuanmao

Differential Revision: D29439322

fbshipit-source-id: 92a18fadbb514b1c029332b60fd48075874906ff
2021-06-29 12:46:47 -07:00
150c828803 Add lint rule to keep collect_env.py python2 compliant (#60946)
Summary:
Fixes T94400857

- [x] Add lint rule
- [x] Verify lint rule works
- [x] Fix torch/utils/collect_env.py

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60946

Reviewed By: malfet, mruberry

Differential Revision: D29457294

Pulled By: rsemenov

fbshipit-source-id: 3c0670408d7aee1479e1de335291deb13a04ace9
2021-06-29 11:57:53 -07:00
808d0e3353 [caffe2] update make_mnist_db and make_image_db to move strings into DB::Put() (#60919)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60919

Update make_mnist_db.cc and make_image_db.cc to work with the DB API changes
in D29204425 (00896cb9ed).  This is similar to the changes to make_cifar_db.cc landed in
D29374754 (394f60b0fc).
ghstack-source-id: 132621346

Test Plan: buck build caffe2/binaries/...

Reviewed By: valmikir

Differential Revision: D29447314

fbshipit-source-id: 33aff85c24d8b785211287de23d46704c7eb0726
2021-06-29 11:52:43 -07:00
fab1b6cc70 .github: Increase test shards for linux GPU (#60914)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60914

Linux GPU tests are taking almost 4 hours to execute, so let's increase
the number of test shards for these jobs so they finish in a more timely fashion

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: samestep

Differential Revision: D29461968

Pulled By: seemethere

fbshipit-source-id: a1eab08f9cd3abd8ceca48871fe702d0bccd8a3f
2021-06-29 10:44:01 -07:00
5fbca0d281 Use cpu docker image for cpu builds (#60920)
Summary:
This was set to use the [CUDA 10.0 image](https://hub.docker.com/r/pytorch/manylinux-cuda100) which hasn't been updated in quite a while, so fix it to use the up-to-date [cpu image](https://hub.docker.com/r/pytorch/manylinux-cpu) instead

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60920

Reviewed By: janeyx99

Differential Revision: D29447897

Pulled By: driazati

fbshipit-source-id: 6e89091110361d0ddda859bb266e229c6cf83c2d
2021-06-29 10:11:55 -07:00
10b929bbfb Make Jeff and Jithun .circleci/docker code owners (#60958)
Summary:
Following up on https://github.com/pytorch/pytorch/pull/60658#issuecomment-870681027.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60958

Reviewed By: 1ntEgr8

Differential Revision: D29460721

Pulled By: samestep

fbshipit-source-id: 74badff6c4a17b3ff48dc2fc27d1faa9edeae097
2021-06-29 09:47:58 -07:00
53489bc385 fix for #60319 , forcing to use fork as start method in test/test_dat… (#60868)
Summary:
Fix for https://github.com/pytorch/pytorch/issues/60319, forcing the use of fork as the start method in test/test_dataloader.py

Fixes #60319

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60868

Reviewed By: mruberry

Differential Revision: D29432876

Pulled By: ejguan

fbshipit-source-id: 5da25f7cfaf8ea0803c0b1aacf2badd656799e16
2021-06-29 09:30:37 -07:00
4310044fec update unsafe flag documentation (#60899)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60899

Modify the documentation for the `unsafe` flag in `parametrize.py`
ghstack-source-id: 132591862

Test Plan:
shouldn't modify code behavior but as a double check,
`buck test mode/dev-nosan //caffe2/test:nn -- --exact 'caffe2/test:nn - test_register_and_remove_parametrization (test_nn.TestNN)'`

https://pxl.cl/1L1fw

Reviewed By: albanD

Differential Revision: D29436688

fbshipit-source-id: 85499ad22b49ad992507b9ed5e7def8231cbfeba
2021-06-29 09:25:37 -07:00
5b6818f08a [Model Averaging] Enforce a synchronization before allreduce parameters (#60891)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60891

This fix is particularly useful for local SGD when the averaging period is very small, which may cause a conflict between the gradient allreduce within the per-machine subgroup and the global parameter allreduce across the communication world.
ghstack-source-id: 132564252

Test Plan:
f281873295 (#Try1) failed due to the conflict between global process group and subgroup.
```
<Thread(configerator-monitor-singleton, started 139839806633728)>
  File "/usr/local/fbcode/platform009/lib/python3.8/threading.py", line 890, in _bootstrap
    self._bootstrap_inner()
  File "/usr/local/fbcode/platform009/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/usr/local/fbcode/platform009/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/tmp/jetter.gson7tr3/configerator/client.py", line 348, in _monitor_loop
    self._parent_thread.join(self._interval_ms / 1000)
  File "/usr/local/fbcode/platform009/lib/python3.8/threading.py", line 1015, in join
    self._wait_for_tstate_lock(timeout=max(timeout, 0))
  File "/usr/local/fbcode/platform009/lib/python3.8/threading.py", line 1027, in _wait_for_tstate_lock
    elif lock.acquire(block, timeout):
```

Fixed after adding an explicit sync: f282044866, f282241800

Reviewed By: rohan-varma

Differential Revision: D29434597

fbshipit-source-id: a4f777fc26f379639f85fda32de425cd3b337b33
2021-06-29 01:39:40 -07:00
fbd4cb1cd7 Fix error logging in common_distributed. (#60917)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60917

The second line of the error log didn't use an f-string properly.

Before fix:
```
exiting process with exit code: {MultiProcessTestCase.TEST_ERROR_EXIT_CODE}
```

After fix:
```
exiting process 3 with exit code: 10
```
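The bug pattern is the classic missing `f` prefix:

```python
exit_code = 10
print("exiting with exit code: {exit_code}")   # bug: braces printed literally
print(f"exiting with exit code: {exit_code}")  # fix: value interpolated
```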
ghstack-source-id: 132618199

Test Plan: waitforbuildbot

Reviewed By: rohan-varma

Differential Revision: D29446574

fbshipit-source-id: f806ef0470cb6aa86fe3c404e1c895514abb6488
2021-06-28 19:32:17 -07:00
d71e7ae740 [PyTorch][vulkan] Unify vtensor_from_vulkan to always return non-const ref (#59996)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59996

Just like D28811477 (dce8697aea), there's no reason we can't give it this signature.
ghstack-source-id: 132566618

Test Plan: CI

Reviewed By: AshkanAliabadi

Differential Revision: D29119070

fbshipit-source-id: d049d49c38099eef6c96e8f69909827e64376097
2021-06-28 19:25:13 -07:00
7eef78597e fixed launch bounds for grid sampler 3d (#60385)
Summary:
Changed launch bounds for grid_sampler_3d from 1024 to 512 and grid_sampler_3d_backward from 1024 to 256.

Timing data (using Nvidia Titan-V):
![GridSampler3dTimingData](https://user-images.githubusercontent.com/22803332/122813457-d3c12300-d287-11eb-99c1-6572f539660f.PNG)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60385

Reviewed By: jbschlosser

Differential Revision: D29433741

Pulled By: ngimel

fbshipit-source-id: 7f475d0c2e854ae65dd0f1fb0167dfae7e506ec9
2021-06-28 19:01:38 -07:00
d36ce61a5e use explicitly non-returning GPU atomics (#60607)
Summary:
Enables an important performance optimization for ROCm, in light of the discussion in https://github.com/pytorch/pytorch/issues/41028.

CC jithunnair-amd sunway513

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60607

Reviewed By: jbschlosser

Differential Revision: D29409894

Pulled By: ngimel

fbshipit-source-id: effca258a0f37eaefa35674a7fd19459ca7dc95b
2021-06-28 18:17:29 -07:00
d62c3ea354 [skip ci] Add GitHub Actions label for g3.16xlarge (#60888)
Summary:
Prerequisite for https://github.com/pytorch/pytorch/issues/60221.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60888

Reviewed By: seemethere

Differential Revision: D29436592

Pulled By: samestep

fbshipit-source-id: b3254139ec9c46c533f8f951a9ede3b372a65536
2021-06-28 15:49:52 -07:00
d5a44f9f12 Use expecttest from PyPI (#60658)
Summary:
This PR removes `torch/testing/_internal/expecttest.py` in favor of https://github.com/ezyang/expecttest. See also https://github.com/ezyang/ghstack/pull/71.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60658

Test Plan: CI.

Reviewed By: ezyang

Differential Revision: D29430763

Pulled By: samestep

fbshipit-source-id: b7cdc7ba37330176149fd465312118e2254ae92e
2021-06-28 15:43:34 -07:00
ddb1f293b6 Fix the NNC-disabled path in static runtime for perf comparisons
Summary:
The path which has NNC/LLVM disabled still constructs a tensor
expression, even though `supports()` will always return false, so a
`KernelScope` is necessary to manage those memory allocations.

I guess we could avoid building the TEs at all in this case, but it's pretty
clean this way.

Test Plan:
```
scripts/bertrand/static_runtime/run.sh
```

Reviewed By: hlu1

Differential Revision: D29415909

fbshipit-source-id: dde43de8516b9a2cf9f5f7f3699962bf9ccd8c30
2021-06-28 15:39:07 -07:00
9b94aa5356 [quant][fx][fix] Fused modules with object_type in qconfig (#60779)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60779

When we do fusion, we replace certain modules (such as Linear + ReLU) with fused versions (such as LinearReLU) by calling `_fuse_fx` in prepare_fx. However, when we try to look up the fused module type in qconfig_dict, we cannot find a match anymore, since the qconfig dict contains the original module types. An example is here [N882873](https://fburl.com/anp/azenjx3v).

So we now update the qconfig_dict to map the fused module types to the qconfigs used for their constituent modules. If the constituent modules are not mapped to the same qconfig, we raise an error (see the sketch below).
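A hedged sketch of that update (hypothetical helper and fusion map; the real logic lives in the FX quantization pass):

```python
import torch.nn as nn
import torch.nn.intrinsic as nni

def extend_qconfig_for_fusion(qconfig_dict, fusion_map):
    """Hypothetical helper: also key qconfig_dict by fused types, requiring
    all constituent modules of a fusion to share one qconfig."""
    obj_type = dict(qconfig_dict.get("object_type", []))
    for fused, parts in fusion_map.items():
        qconfigs = [obj_type.get(p) for p in parts]
        if any(q is not qconfigs[0] for q in qconfigs):
            raise ValueError(f"modules fused into {fused} use different qconfigs")
        obj_type[fused] = qconfigs[0]
    qconfig_dict["object_type"] = list(obj_type.items())
    return qconfig_dict

# e.g. extend_qconfig_for_fusion(qconfig_dict,
#                                {nni.LinearReLU: (nn.Linear, nn.ReLU)})
```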

Test Plan:
`python test/test_quantization.py TestFuseFx.test_qconfig_fused_module`

Imported from OSS

Reviewed By: supriyar

Differential Revision: D29406941

fbshipit-source-id: 74b5db89f4998aeb02b2bf7c37bf97326580c654
2021-06-28 15:22:22 -07:00
cadce14e02 don't return in __init__ functions (#60830)
Summary:
Fix some warnings from a code analyzer

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60830

Reviewed By: jbschlosser

Differential Revision: D29433638

Pulled By: albanD

fbshipit-source-id: 148df1d8a0a79778f18e8b6abffbddef36c5031c
2021-06-28 14:56:13 -07:00
9af8aecd00 [caffe2/libtorch] Remove already-owned source
Summary:
This source is already owned by a more fine-grained rule, so avoid a
package boundary violation by having it also be owned by an outer
rule.

Test Plan: CI

Reviewed By: aniketmathur

Differential Revision: D29422794

fbshipit-source-id: 432accc969abcb4d56bd97341a07029926939ea0
2021-06-28 14:45:34 -07:00
eeea696c02 [caffe2] Fix include of corresponding header
Summary:
AFAICT, this include was a typo, and meant to be the corresponding
header for this .cpp, but instead pulled in an unrelated header.

Test Plan: CI

Reviewed By: igorsugak

Differential Revision: D29422993

fbshipit-source-id: cc9bb29ee1f1007b68c6666ea8e389f6f39928af
2021-06-28 14:45:32 -07:00
c3977bf3da [caffe2/utils] Add some fine-grained rules to avoid package boundary violations
Test Plan: CI

Reviewed By: igorsugak

Differential Revision: D29401295

fbshipit-source-id: e921e5578c1fcc8df6bd670ae9f95722b8e32d85
2021-06-28 14:45:30 -07:00
03de807d81 [caffe2/utils] Add explicit rule to avoid package boundary violation (#60677)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60677

Add a rule to wrap conversions.h and depend on that, rather than
relying on a glob which violates package boundaries.

Test Plan: `buck2 build fbcode//caffe2/caffe2:caffe2_core`

Reviewed By: mzlee

Differential Revision: D29370841

fbshipit-source-id: d4dd383eb8457d4f5118574e34e6f17c32fde647
2021-06-28 14:43:30 -07:00
41c380e649 Enable bionic-cuda10.2-cudnn7-py3.9-gcc7 in GHA (#60204)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60204

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D29430679

Pulled By: samestep

fbshipit-source-id: 9380f5535cd370ec7aabf609a6170c8cb4df505d
2021-06-28 13:08:36 -07:00
971cdafd15 Upgrade benchmark to v1.5.5 (#60750)
Summary:
This fixes the build for gcc 11.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60750

Test Plan: CI

Reviewed By: larryliu0820

Differential Revision: D29394541

Pulled By: dreiss

fbshipit-source-id: 61557431b52a3e898ffcc32f97133b3ea94a838f
2021-06-28 13:03:03 -07:00
007ba37c9a [pruning] Speedup activation reconstruction (#60683)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60683

Vectorized reconstruction without for loops

Test Plan:
`buck test mode/dev-nosan //caffe2/test:ao -- TestBasePruner`

https://pxl.cl/1KSJQ

Reviewed By: z-a-f

Differential Revision: D29370805

fbshipit-source-id: 75402437654a0b6f6391c8590bbe3f6fe3f43d8f
2021-06-28 12:58:21 -07:00
f302e0c781 [pruning] Additional pruning tests (#60681)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60681

Adding additional pruning tests for more complex models and more pruned rows

Test Plan:
`buck test mode/dev-nosan //caffe2/test:ao -- TestBasePruner`

https://pxl.cl/1KQ2Z

Reviewed By: z-a-f

Differential Revision: D29347546

fbshipit-source-id: cb65e564dd46d24f4aca1b00dd915ee8d64f8318
2021-06-28 12:58:20 -07:00
8d4a6ef962 [pruning] Activation reconstruction (#60292)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60292

Added activation reconstruction in the `reconstruct` method

Test Plan:
`buck test mode/dev-nosan //caffe2/test:ao -- TestBasePruner`

https://pxl.cl/1KLl1

Reviewed By: z-a-f

Differential Revision: D29236569

fbshipit-source-id: 1ad085f4143eb9fa3efca51e00d810e0fdb7e9b1
2021-06-28 12:58:18 -07:00
965dad25a5 Allow resizing of parametrized tensors (#60418)
Summary:
Modify `parametrize.py` to allow resizing of parametrized tensors

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60418

Test Plan:
`buck test mode/dev-nosan //caffe2/test:nn -- --exact 'caffe2/test:nn - test_register_and_remove_parametrization (test_nn.TestNN)'`

https://pxl.cl/1L0wh

Reviewed By: z-a-f

Differential Revision: D29279442

Pulled By: kazhou

fbshipit-source-id: 4d94915748f896e7761a40ad18f4c6444f505c3a
2021-06-28 12:57:11 -07:00
956faea585 [fix] cauchy sampling inf on cuda (#60186)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/59144

As pointed out by ngimel, the issue is indeed with calling `tan`.

However, the C++ `std::tan` [documentation](https://en.cppreference.com/w/cpp/numeric/math/tan) states that

```
The function has mathematical poles at π(1/2 + n); however no common floating-point representation
is able to represent π/2 exactly, thus there is no value of the argument for which a pole error occurs.
```

All of `torch.tan`, `numpy.tan` and `math.tan` are compliant with the above statement.

<details>

```python
import torch
import math
import numpy as np

# Single Precision
print(torch.tan(torch.tensor(math.pi, device='cuda', dtype=torch.float32) * 0.5))
print(np.tan(np.array(np.pi, dtype=np.float32) * 0.5))

# Double Precision
print(math.tan(math.pi * 0.5))
print(torch.tan(torch.tensor(math.pi, device='cuda', dtype=torch.double) * 0.5))
print(np.tan(np.array(np.pi, dtype=np.float64) * 0.5))
```

Output
```
tensor(-22877334., device='cuda:0')
-22877332.42885646
1.633123935319537e+16
tensor(1.6331e+16, device='cuda:0', dtype=torch.float64)
1.633123935319537e+16
```

</details>

So this issue stems from the use of `__tanf`, the faster approximation of tan from the CUDA library (used for float16, bfloat16 and float).

8a839c5478/aten/src/ATen/NumericUtils.h (L91-L100)

The fix in the PR is to use the **slower** but more correct version.

Benchmark::
```
[ cauchy : input dtype torch.float16 device cuda ]
                             |  Before  |  After
1 threads: -------------------------------------
      (128,)                 |    3.8   |    4.3
      (256, 128)             |    3.8   |    4.2
      (2, 512, 256)          |    3.8   |    4.2
      (2, 64, 256, 128)      |   22.8   |   29.6
      (4, 2, 512, 256, 128)  |  649.6   |  869.3

Times are in microseconds (us).

[ cauchy : input dtype torch.bfloat16 device cuda ]
                             |  Before  |  After
1 threads: -------------------------------------
      (128,)                 |    3.8   |    4.3
      (256, 128)             |    3.8   |    4.3
      (2, 512, 256)          |    3.8   |    4.3
      (2, 64, 256, 128)      |   23.8   |   30.8
      (4, 2, 512, 256, 128)  |  682.5   |  904.2

Times are in microseconds (us).

[ cauchy : input dtype torch.float32 device cuda ]
                             |  Before  |  After
1 threads: --------------------------------------
      (128,)                 |     3.8  |     4.2
      (256, 128)             |     3.7  |     4.2
      (2, 512, 256)          |     3.7  |     4.2
      (2, 64, 256, 128)      |    35.3  |    37.1
      (4, 2, 512, 256, 128)  |  1020.0  |  1058.3

Times are in microseconds (us).

[- cauchy : input dtype torch.float64 device cuda ]
                             |   Before  |   After
1 threads: ----------------------------------------
      (128,)                 |      3.8  |      4.2
      (256, 128)             |      8.0  |      8.0
      (2, 512, 256)          |     46.0  |     46.0
      (2, 64, 256, 128)      |    669.2  |    669.4
      (4, 2, 512, 256, 128)  |  21255.0  |  21262.1

Times are in microseconds (us).
```

<details>

Benchmark Script:
```python
import torch
import itertools
import time
from torch.utils.benchmark import Timer
from torch.utils.benchmark import Compare
import sys
import pickle

print('Using pytorch %s' % (torch.__version__))

cuda_shapes = [(128,), (256, 128), (2, 512, 256), (2, 64, 256, 128), (4, 2, 512, 256, 128)]
cuda_dtypes = [torch.half, torch.bfloat16, torch.float, torch.double]
results = []
repeats = 10

for device in ['cuda']:
    dtypes = cuda_dtypes
    shapes = cuda_shapes

    for dtype in dtypes:
        for shape in shapes:
            t = torch.randn(shape, device=device, dtype=dtype) * 10

            tasks = [("t.cauchy_()", "After", "")]
            timers = [Timer(stmt=stmt, label=f"cauchy : input dtype {dtype} device {device}", sub_label=f"{(shape)}", description=desc, globals=globals()) for stmt, desc, label in tasks]

            for i, timer in enumerate(timers * repeats):
                results.append(
                    timer.blocked_autorange()
                )
                print(f"\r{i + 1} / {len(timers) * repeats}", end="")
                sys.stdout.flush()

with open('after-pr.pkl', 'wb') as f:
    pickle.dump(results, f)

comparison = Compare(results)
comparison.print()
```

Compare Script:
```
import torch
import itertools
import time
from torch.utils.benchmark import Timer
from torch.utils.benchmark import Compare
import sys
import pickle

with open('before-pr.pkl', 'rb') as f:
    after_results = pickle.load(f)

with open('after-pr.pkl', 'rb') as f:
    before_results = pickle.load(f)

comparison = Compare(after_results + before_results)
comparison.print()
```

</details>

TODO:
* [x] Add comment

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60186

Reviewed By: jbschlosser

Differential Revision: D29433897

Pulled By: ngimel

fbshipit-source-id: 9c5f14b83e3372bed72369f70eed9256c04385c6
2021-06-28 12:49:30 -07:00
70e205a2ab Use the new URL for docs preview link (#60893)
Summary:
This is all set up on CloudFront now with a custom domain, so we don't need the long default cloudfront domain anymore

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60893

Reviewed By: malfet

Differential Revision: D29437300

Pulled By: driazati

fbshipit-source-id: 6f5ffd1b10c5167b0022b7e64b2164508624ca91
2021-06-28 12:45:04 -07:00
f5e5ced202 Enable parallel clang-tidy on ec2 runner (#60870)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60870

This PR makes `clang-tidy` run on our self-hosted runner in a parallel fashion.

Fixes #60867

Test Plan: #60871

Reviewed By: jbschlosser

Differential Revision: D29434240

Pulled By: 1ntEgr8

fbshipit-source-id: cead30ed718ddf5e14b13afe70cb209aa16b44a0
2021-06-28 11:45:44 -07:00
c8fb785857 Print stdout and stderr to console on parallel runs (#60869)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60869

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D29434155

Pulled By: 1ntEgr8

fbshipit-source-id: 925c9d832775dbb710af9367c07962f3367fda38
2021-06-28 11:29:12 -07:00
a8057e7ef1 docs: add permute in torch docs (#60821)
Summary:
fix https://github.com/pytorch/pytorch/issues/60181

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60821

Reviewed By: VitalyFedyunin

Differential Revision: D29431949

Pulled By: jbschlosser

fbshipit-source-id: 2353afceaa188315cde1f0c955897c4750809c8e
2021-06-28 11:20:35 -07:00
d7c58e5a04 [vulkan] Implement tanh activation function (#60695)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60695

As title. Implement tanh in Vulkan.

Test Plan:
Build Pytorch repository with the build command in P425131222.

Run test command `pytorch/build/bin/vulkan_api_test`

Output:

{F627752306}

Reviewed By: SS-JIA

Differential Revision: D29375071

fbshipit-source-id: 2d613a9542774719dd78524757a677e3b2450c74
2021-06-28 10:58:44 -07:00
da70dd199d [quant] Input-Weight Equalization - tests (#60378)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60378

Created the following unit-tests to check that our equalization algorithm is as expected:
- Check the equalization scales calculated and stored in the graph are as expected
- Check the scaled weights and biases are as expected
- Check that the min/max values in the quantization observers are as expected
- Check that the graphs with equalization are structured in the same way as graphs without equalization (except that equalized graphs have additional equalization scale and mul nodes) before and after quantization

Test Plan:
`python test/test_quantization TestEqualizeFx.test_input_weight_equalization_equalization_scales`
`python test/test_quantization TestEqualizeFx.test_input_weight_equalization_weights_bias`
`python test/test_quantization TestEqualizeFx.test_input_activation_values`
`python test/test_quantization TestEqualizeFx.test_input_weight_equalization_graphs`

Imported from OSS

Reviewed By: supriyar

Differential Revision: D29406942

fbshipit-source-id: 518208546ae5835c1ebb2af217507e90af66fbe4
2021-06-28 10:44:29 -07:00
dfb9c0bae8 [quant] Input-Weight Equalization - support for connected F.linear layer (#60272)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60272

Test Plan:
`python test/test_quantization.py TestEqualizeFx`

Original model:
```
FunctionalLinear2Module(
  (linear1): Linear()
  (linear2): Linear()
)
```

Graph after `prepare_fx`:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_0](args = (%x,), kwargs = {})
    %x_activation_post_process_0_equalization_process_0 : [#users=1] = call_module[target=x_activation_post_process_0_equalization_process_0](args = (%x_activation_post_process_0,), kwargs = {})
    %linear1_w : [#users=1] = get_attr[target=linear1.w]
    %linear1_w_activation_post_process_0 : [#users=1] = call_module[target=linear1_w_activation_post_process_0](args = (%linear1_w,), kwargs = {})
    %linear1_w_activation_post_process_0_equalization_process_0 : [#users=1] = call_module[target=linear1_w_activation_post_process_0_equalization_process_0](args = (%linear1_w_activation_post_process_0,), kwargs = {})
    %linear1_b : [#users=1] = get_attr[target=linear1.b]
    %linear : [#users=1] = call_function[target=torch.nn.functional.linear](args = (%x_activation_post_process_0_equalization_process_0, %linear1_w_activation_post_process_0_equalization_process_0), kwargs = {bias: %linear1_b})
    %linear_activation_post_process_0 : [#users=1] = call_module[target=linear_activation_post_process_0](args = (%linear,), kwargs = {})
    %linear_activation_post_process_0_equalization_process_0 : [#users=1] = call_module[target=linear_activation_post_process_0_equalization_process_0](args = (%linear_activation_post_process_0,), kwargs = {})
    %linear2_w : [#users=1] = get_attr[target=linear2.w]
    %linear2_w_activation_post_process_0 : [#users=1] = call_module[target=linear2_w_activation_post_process_0](args = (%linear2_w,), kwargs = {})
    %linear2_w_activation_post_process_0_equalization_process_0 : [#users=1] = call_module[target=linear2_w_activation_post_process_0_equalization_process_0](args = (%linear2_w_activation_post_process_0,), kwargs = {})
    %linear2_b : [#users=1] = get_attr[target=linear2.b]
    %linear_1 : [#users=1] = call_function[target=torch.nn.functional.linear](args = (%linear_activation_post_process_0_equalization_process_0, %linear2_w_activation_post_process_0_equalization_process_0), kwargs = {bias: %linear2_b})
    %linear_1_activation_post_process_0 : [#users=1] = call_module[target=linear_1_activation_post_process_0](args = (%linear_1,), kwargs = {})
    return linear_1_activation_post_process_0
```

Graph after equalization steps:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_equalization_scale0 : [#users=1] = get_attr[target=x_equalization_scale0]
    %mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_equalization_scale0), kwargs = {})
    %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_0](args = (%mul,), kwargs = {})
    %linear1_w : [#users=1] = get_attr[target=linear1.w]
    %linear1_w_activation_post_process_0 : [#users=1] = call_module[target=linear1_w_activation_post_process_0](args = (%linear1_w,), kwargs = {})
    %linear1_b : [#users=1] = get_attr[target=linear1.b]
    %linear : [#users=1] = call_function[target=torch.nn.functional.linear](args = (%x_activation_post_process_0, %linear1_w_activation_post_process_0), kwargs = {bias: %linear1_b})
    %linear_activation_post_process_0 : [#users=1] = call_module[target=linear_activation_post_process_0](args = (%linear,), kwargs = {})
    %linear2_w : [#users=1] = get_attr[target=linear2.w]
    %linear2_w_activation_post_process_0 : [#users=1] = call_module[target=linear2_w_activation_post_process_0](args = (%linear2_w,), kwargs = {})
    %linear2_b : [#users=1] = get_attr[target=linear2.b]
    %linear_1 : [#users=1] = call_function[target=torch.nn.functional.linear](args = (%linear_activation_post_process_0, %linear2_w_activation_post_process_0), kwargs = {bias: %linear2_b})
    %linear_1_activation_post_process_0 : [#users=1] = call_module[target=linear_1_activation_post_process_0](args = (%linear_1,), kwargs = {})
    return linear_1_activation_post_process_0
```

Graph after `convert_fx`:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_equalization_scale0 : [#users=1] = get_attr[target=x_equalization_scale0]
    %mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_equalization_scale0), kwargs = {})
    %linear1_input_scale_0 : [#users=1] = get_attr[target=linear1_input_scale_0]
    %linear1_input_zero_point_0 : [#users=1] = get_attr[target=linear1_input_zero_point_0]
    %quantize_per_tensor : [#users=1] = call_function[target=torch.quantize_per_tensor](args = (%mul, %linear1_input_scale_0, %linear1_input_zero_point_0, torch.quint8), kwargs = {})
    %linear1_packed_weight_0 : [#users=1] = get_attr[target=linear1_packed_weight_0]
    %linear1_scale_0 : [#users=1] = get_attr[target=linear1_scale_0]
    %linear1_zero_point_0 : [#users=1] = get_attr[target=linear1_zero_point_0]
    %linear : [#users=1] = call_function[target=torch.ops.quantized.linear](args = (%quantize_per_tensor, %linear1_packed_weight_0, %linear1_scale_0, %linear1_zero_point_0), kwargs = {})
    %linear2_packed_weight_0 : [#users=1] = get_attr[target=linear2_packed_weight_0]
    %linear2_scale_0 : [#users=1] = get_attr[target=linear2_scale_0]
    %linear2_zero_point_0 : [#users=1] = get_attr[target=linear2_zero_point_0]
    %linear_1 : [#users=1] = call_function[target=torch.ops.quantized.linear](args = (%linear, %linear2_packed_weight_0, %linear2_scale_0, %linear2_zero_point_0), kwargs = {})
    %dequantize : [#users=1] = call_method[target=dequantize](args = (%linear_1,), kwargs = {})
    return dequantize
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D29267218

fbshipit-source-id: 6b97bed1a307f1d0b1f5efcbecf41f35418242f7
2021-06-28 10:44:27 -07:00
ddf2ce03bb [quant] Input-Weight Equalization - support for connected linear layers (#60034)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60034

Added support for equalizing models with connected linear
layers. To account for connected linear layers, we will additionally
multiply the previous weight values (row-wise) by the next equalization
scale, and remove the input equalization observer between the two linear
layers. We also want to scale the bias by the next equalization scale.
The math is shown here: https://fb.quip.com/fK8rA9aRM4ca .

Original Model: `x -> linear1 -> linear2`
After `prepare_fx`: `x -> InpEqObs -> InpQuantObs -> linear1 ->
OutQuantObs -> InpEqObs -> linear2`
After equalization: `x -> mul -> InpQuantObs -> linear1 -> OutQuantObs
-> linear2`
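A numeric sketch of that folding (assumed shapes; `s` plays the role of the next layer's per-channel equalization scale):

```python
import torch

x = torch.randn(5, 3)
W1, b1 = torch.randn(4, 3), torch.randn(4)
W2, b2 = torch.randn(2, 4), torch.randn(2)
s = torch.rand(4) + 0.5                 # next equalization scale

ref = (x @ W1.t() + b1) @ W2.t() + b2   # linear1 -> linear2, unequalized

W1f = W1 * s.unsqueeze(1)               # scale linear1's weight row-wise
b1f = b1 * s                            # and its bias
W2f = W2 / s.unsqueeze(0)               # linear2 absorbs the inverse
out = (x @ W1f.t() + b1f) @ W2f.t() + b2
assert torch.allclose(ref, out, atol=1e-5)
```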

Test Plan:
`python test/test_quantization.py
TestEqualizeFx.test_input_weight_equalization_convert`

Original Model:
```
Linear2Module(
  (linear1): Linear(in_features=2, out_features=2, bias=True)
  (linear2): Linear(in_features=2, out_features=2, bias=True)
)
```

Graph after `prepare_fx`:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_0](args = (%x,), kwargs = {})
    %x_activation_post_process_0_equalization_process_0 : [#users=1] = call_module[target=x_activation_post_process_0_equalization_process_0](args = (%x_activation_post_process_0,), kwargs = {})
    %linear1 : [#users=1] = call_module[target=linear1](args = (%x_activation_post_process_0_equalization_process_0,), kwargs = {})
    %linear1_activation_post_process_0 : [#users=1] = call_module[target=linear1_activation_post_process_0](args = (%linear1,), kwargs = {})
    %linear1_activation_post_process_0_equalization_process_0 : [#users=1] = call_module[target=linear1_activation_post_process_0_equalization_process_0](args = (%linear1_activation_post_process_0,), kwargs = {})
    %linear2 : [#users=1] = call_module[target=linear2](args = (%linear1_activation_post_process_0_equalization_process_0,), kwargs = {})
    %linear2_activation_post_process_0 : [#users=1] = call_module[target=linear2_activation_post_process_0](args = (%linear2,), kwargs = {})
    return linear2_activation_post_process_0
```

Graph after equalization functions:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_activation_post_process_0_equalization_process_0_scale : [#users=1] = get_attr[target=x_activation_post_process_0_equalization_process_0_scale]
    %mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_activation_post_process_0_equalization_process_0_scale), kwargs = {})
    %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_0](args = (%mul,), kwargs = {})
    %linear1 : [#users=1] = call_module[target=linear1](args = (%x_activation_post_process_0,), kwargs = {})
    %linear1_activation_post_process_0 : [#users=1] = call_module[target=linear1_activation_post_process_0](args = (%linear1,), kwargs = {})
    %linear2 : [#users=1] = call_module[target=linear2](args = (%linear1_activation_post_process_0,), kwargs = {})
    %linear2_activation_post_process_0 : [#users=1] = call_module[target=linear2_activation_post_process_0](args = (%linear2,), kwargs = {})
    return linear2_activation_post_process_0
```

Graph after `convert_fx`:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_activation_post_process_0_equalization_process_0_scale : [#users=1] = get_attr[target=x_activation_post_process_0_equalization_process_0_scale]
    %mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_activation_post_process_0_equalization_process_0_scale), kwargs = {})
    %linear1_input_scale_0 : [#users=1] = get_attr[target=linear1_input_scale_0]
    %linear1_input_zero_point_0 : [#users=1] = get_attr[target=linear1_input_zero_point_0]
    %quantize_per_tensor : [#users=1] = call_function[target=torch.quantize_per_tensor](args = (%mul, %linear1_input_scale_0, %linear1_input_zero_point_0, torch.quint8), kwargs = {})
    %linear1 : [#users=1] = call_module[target=linear1](args = (%quantize_per_tensor,), kwargs = {})
    %linear2 : [#users=1] = call_module[target=linear2](args = (%linear1,), kwargs = {})
    %dequantize : [#users=1] = call_method[target=dequantize](args = (%linear2,), kwargs = {})
    return dequantize
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D29204347

fbshipit-source-id: 6bb9e25e2468f50df523885ded2edc731f002ac1
2021-06-28 10:44:25 -07:00
7917318917 [quant] Input-Weight Equalization - support for F.linear layers (#59964)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59964

Input-Weight Equalization support for functional layers

Test Plan:
`python test/test_quantization.py
TestEqualizeFx.test_input_weight_equalization_convert`

Original model:
```
FunctionalLinearModule(
  (linear1): Linear()
)
```

Graph after `prepare_fx`:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_equalization_process_0 : [#users=1] = call_module[target=x_equalization_process_0](args = (%x,), kwargs = {})
    %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_00](args = (%x_equalization_process_0,), kwargs = {})
    %linear1_w : [#users=1] = get_attr[target=linear1.w]
    %linear1_w_equalization_process_0 : [#users=1] = call_module[target=linear1_w_equalization_process_0](args = (%linear1_w,), kwargs = {})
    %linear1_w_activation_post_process_0 : [#users=1] = call_module[target=linear1_w_activation_post_process_00](args = (%linear1_w_equalization_process_0,), kwargs = {})
    %linear1_b : [#users=1] = get_attr[target=linear1.b]
    %linear : [#users=1] = call_function[target=torch.nn.functional.linear](args = (%x_activation_post_process_0, %linear1_w_activation_post_process_0), kwargs = {bias: %linear1_b})
    %linear_activation_post_process_0 : [#users=1] = call_module[target=linear_activation_post_process_0](args = (%linear,), kwargs = {})
    return linear_activation_post_process_0
```

Graph after equalization functions:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_equalization_process_0_scale : [#users=1] = get_attr[target=x_equalization_process_0_scale]
    %mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_equalization_process_0_scale), kwargs = {})
    %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_00](args = (%mul,), kwargs = {})
    %linear1_w : [#users=1] = get_attr[target=linear1.w]
    %linear1_w_equalization_process_0 : [#users=1] = call_module[target=linear1_w_equalization_process_0](args = (%linear1_w,), kwargs = {})
    %linear1_w_activation_post_process_0 : [#users=1] = call_module[target=linear1_w_activation_post_process_00](args = (%linear1_w_equalization_process_0,), kwargs = {})
    %linear1_b : [#users=1] = get_attr[target=linear1.b]
    %linear : [#users=1] = call_function[target=torch.nn.functional.linear](args = (%x_activation_post_process_0, %linear1_w_activation_post_process_0), kwargs = {bias: %linear1_b})
    %linear_activation_post_process_0 : [#users=1] = call_module[target=linear_activation_post_process_0](args = (%linear,), kwargs = {})
    return linear_activation_post_process_0
```

Graph after `convert_fx`:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_equalization_process_0_scale : [#users=1] = get_attr[target=x_equalization_process_0_scale]
    %mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_equalization_process_0_scale), kwargs = {})
    %linear1_input_scale_0 : [#users=1] = get_attr[target=linear1_input_scale_0]
    %linear1_input_zero_point_0 : [#users=1] = get_attr[target=linear1_input_zero_point_0]
    %quantize_per_tensor : [#users=1] = call_function[target=torch.quantize_per_tensor](args = (%mul, %linear1_input_scale_0, %linear1_input_zero_point_0, torch.quint8), kwargs = {})
    %linear1_packed_weight_0 : [#users=1] = get_attr[target=linear1_packed_weight_0]
    %linear1_scale_0 : [#users=1] = get_attr[target=linear1_scale_0]
    %linear1_zero_point_0 : [#users=1] = get_attr[target=linear1_zero_point_0]
    %linear : [#users=1] = call_function[target=torch.ops.quantized.linear](args = (%quantize_per_tensor, %linear1_packed_weight_0, %linear1_scale_0, %linear1_zero_point_0), kwargs = {})
    %dequantize : [#users=1] = call_method[target=dequantize](args = (%linear,), kwargs = {})
    return dequantize
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D29135459

fbshipit-source-id: 1e69bfbb82a0c89538e55b64968effd0b11b2fde
2021-06-28 10:44:24 -07:00
387289d4a5 support non-contiguous tensor in bilinear (#38409)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/38409

Reviewed By: anjali411

Differential Revision: D29361043

Pulled By: albanD

fbshipit-source-id: 05147a9b0f7a47204bcd5ff70e281a464e8de1e6
2021-06-28 10:43:21 -07:00
f118d20bea Make requires grad check run only when grad mode is enabled (#60740)
Summary:
As per title.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60740

Reviewed By: ngimel

Differential Revision: D29405934

Pulled By: albanD

fbshipit-source-id: 35c537939a3871f5a0d2146543506e4d07465724
2021-06-28 10:40:30 -07:00
3ad3f20bff Add an optional Device parameter to pin_memory/is_pinned that does nothing (#60201)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60201

This is to flush out BC/FC problems with adding this parameter.  A later
PR will actually add the desired functionality.
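
A small sketch of the API shape (the optional device argument is a no-op placeholder per this PR; the exact signature is an assumption, and a CUDA build is required for pinning):
```
import torch

if torch.cuda.is_available():
    x = torch.randn(4)
    y = x.pin_memory()               # existing behavior, unchanged
    print(y.is_pinned())             # True
    z = x.pin_memory(device="cuda")  # new optional arg, currently ignored
    print(z.is_pinned("cuda"))       # True
```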

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D29331880

Pulled By: ezyang

fbshipit-source-id: 6036716d6ae55e6ea7ef2348b6c34a39613c8dd5
2021-06-28 10:38:52 -07:00
85af24f52b Remove some unnecessary functions from CUDAHooks (#59655)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59655

CUDAHooks is to be used solely when you need to call into CUDA
functionality from a context where you cannot directly link to
CUDA libraries.  Neither hasPrimaryContext nor
getDevceIndexWithPrimaryContext (sic) needs to be used in such
contexts.  By moving them out of CUDAHooks and calling them
directly, a dynamic dispatch can be skipped.

I also fixed the typo in getDev(i)ceIndexWithPrimaryContext

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D28972946

Pulled By: ezyang

fbshipit-source-id: edcd7a7b62aec97928f07fbf3bf413b9fb027517
2021-06-28 10:38:51 -07:00
b52849b589 Port silu_backward to structured (#58661)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58661

I removed the dispatch entry `CompositeImplicitAutograd: math_silu_backward`.
This is definitely not right, but I don't know how it works with structured core.
Keeping it triggers the following assertion failure:

```
assert dispatch.keys() != {DispatchKey.CompositeImplicitAutograd}, \
    f"unexpected name for singleton CompositeImplicitAutograd dispatch entry: expected {cpp.name(func)} " \
    f"but got {dispatch[DispatchKey.CompositeImplicitAutograd]}.  Rename your implementation to the expected " \
    "name, then delete the dispatch table"
```
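
For reference, the math silu_backward computes follows from silu(x) = x * sigmoid(x); a minimal sketch of the derivative, checked against autograd:
```
import torch
import torch.nn.functional as F

def silu_backward_ref(grad_out, x):
    # d/dx [x * sigmoid(x)] = sigmoid(x) * (1 + x * (1 - sigmoid(x)))
    s = torch.sigmoid(x)
    return grad_out * s * (1 + x * (1 - s))

x = torch.randn(5, requires_grad=True)
F.silu(x).backward(torch.ones_like(x))
print(torch.allclose(x.grad, silu_backward_ref(torch.ones_like(x), x.detach())))
```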

Test Plan: Imported from OSS

Reviewed By: soulitzer

Differential Revision: D28572530

Pulled By: ezyang

fbshipit-source-id: 410f03bddf79cda7c9f0fd66f697383ee2925d32
2021-06-28 10:37:45 -07:00
66f01db36c Make some comparisons explicit (#60505)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60505

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D29313240

fbshipit-source-id: 3f558e68cbb0328326d7540e2b3bd0c2e12ba3e2
2021-06-28 10:33:59 -07:00
f5341bd5e6 Enhance ProcessGroupWrapper with additional checks + refactor (#60237)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60237

Closes https://github.com/pytorch/pytorch/issues/58711

This diff refactors the collective consistency checking in `ProcessGroupWrapper` as described in the above issue. In particular, we no longer run separate verification checks (`all_gather`s) for shapes, op type, etc. Instead, we implement a function `serialize_fingerprint` to serialize all this data into a single tensor and only verify that.

This has the benefit of being a lot more extensible: the developer does not need to add separate `all_gather` calls in order to verify additional data in the future. We could also provide some mechanism for data that needs to be verified to be "registered" in the `CollectiveFingerPrint` struct, making it even easier to add additional data; we can consider doing this if there are significant additions to `ProcessGroupWrapper`.

We now also begin to check tensor `dtypes` and device types for consistency as well. Tests are refactored/added accordingly.
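
A minimal sketch of the single-tensor idea (names and the encoding here are illustrative assumptions, not the actual `CollectiveFingerPrint` code):
```
import torch

_DTYPE_IDS = {dt: i for i, dt in enumerate(
    [torch.float32, torch.float16, torch.int64, torch.int32, torch.uint8])}

def serialize_fingerprint(op_type: int, tensors) -> torch.Tensor:
    # Flatten everything to be verified (op type, per-tensor dtype id,
    # device index, and shape) into one int64 tensor, so a single
    # all_gather suffices to compare it across ranks.
    data = [op_type]
    for t in tensors:
        data += [_DTYPE_IDS.get(t.dtype, -1), t.device.index or 0, t.dim(), *t.shape]
    return torch.tensor(data, dtype=torch.int64)

# Each rank would all_gather this tensor and assert all copies are equal.
```
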
ghstack-source-id: 132520261

Test Plan: CI

Reviewed By: cbalioglu

Differential Revision: D28597287

fbshipit-source-id: b09f14f628df9e2457623ba81fc13fd4e214f3c9
2021-06-28 10:24:11 -07:00
aaea81e3fb [torch/distributed] remove outdated FutureWarning in distributed/elastic/util/store.py (#60807)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60807

Addresses: https://github.com/pytorch/pytorch/issues/60717

This warning should have been removed since this code is no longer in "experimental" mode.

Test Plan: N/A - just removing experimental warning that should've been removed.

Reviewed By: H-Huang, aivanou

Differential Revision: D29412972

fbshipit-source-id: 16a8a98abde70a4ae0c1ac1b14bda339cb44863a
2021-06-28 10:22:16 -07:00
94cdbbf48d Paren-matching kernel launch check without external deps (#60778)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60778

Matches parens and the opening `<<<` to make a more accurate kernel launch check.
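
A rough Python sketch of the approach (illustrative only, not the actual linter; it assumes well-formed source):
```
import re

CHECK = "C10_CUDA_KERNEL_LAUNCH_CHECK();"

def has_unchecked_launch(src: str) -> bool:
    for m in re.finditer(r"<<<", src):
        # Skip past the launch config, then match the parens of the
        # kernel's argument list to find the end of the launch statement.
        i = src.index("(", src.index(">>>", m.end()) + 3)
        depth = 0
        while True:
            if src[i] == "(":
                depth += 1
            elif src[i] == ")":
                depth -= 1
                if depth == 0:
                    break
            i += 1
        # The statement following the launch should be the check macro.
        end = src.index(";", i) + 1
        if not src[end:].lstrip().startswith(CHECK):
            return True
    return False
```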

Test Plan:
```
buck test //caffe2/test:kernel_launch_checks
```

Reviewed By: ngimel

Differential Revision: D29401624

fbshipit-source-id: 8649af7c33e67dbb24044af0134b1cea6f2e5dc3
2021-06-28 10:18:04 -07:00
88b0518a83 Python error unit tests on delegation of backend_with_compiler_demo (#60689)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60689

Added a test for errors that occur with a compiler, specifically when an
operator is not supported by the backend.
ghstack-source-id: 132485207

Test Plan:
Running python test/test_jit.py TestBackendsWithCompiler -v returns a
success.

Imported from OSS

Reviewed By: iseeyuan

Differential Revision: D29374513

fbshipit-source-id: ac52b315a01719eaa4985680939239ae058d277b
2021-06-28 09:33:03 -07:00
e63db3ae46 ENH Adds byte support for nll_loss (CUDA) (#60650)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/59765

This PR adds byte support for nll_loss on CUDA.
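
A small sketch of the newly supported case (assuming a CUDA build):
```
import torch
import torch.nn.functional as F

if torch.cuda.is_available():
    log_probs = F.log_softmax(torch.randn(4, 3, device="cuda"), dim=1)
    target = torch.tensor([0, 2, 1, 0], dtype=torch.uint8, device="cuda")
    print(F.nll_loss(log_probs, target))  # byte targets now work on CUDA
```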

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60650

Reviewed By: albanD

Differential Revision: D29429456

Pulled By: jbschlosser

fbshipit-source-id: 894c969ed6bfc6117dee8e844a7cb5b99977247c
2021-06-28 08:20:13 -07:00
7f6b2bc2d0 Add -I<directory> option to tools/linter/clang_tidy.py (#60745)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60745

Fixes #60739

Test Plan:
Run this command:
```
python3 tools/linter/clang_tidy.py --paths torch/csrc/fx -I/usr/include/path -I/usr/include/another/path --print-include-paths
```

Output:

If the paths don't exist, you should see this:
```
ignoring nonexistent directory "/usr/include/path"
ignoring nonexistent directory "/usr/include/another/path"
```

If the paths exist, you should see them listed.

Reviewed By: ngimel

Differential Revision: D29395227

Pulled By: 1ntEgr8

fbshipit-source-id: c89650546d45887cd39e574da07f08bcfec686e0
2021-06-28 06:56:02 -07:00
5b118a7f23 Don't reference reflection_pad3d in functional.py (#60837)
Summary:
To work around FC issue

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60837

Reviewed By: jbschlosser

Differential Revision: D29421142

Pulled By: ngimel

fbshipit-source-id: f5c1d9c324173b628e286f9005edf7109162066f
2021-06-27 20:54:32 -07:00
f0e972a481 To add Nesterov Adam algorithm for multi-tensor optimizers API (#59165)
Summary:
Previously, in PR https://github.com/pytorch/pytorch/issues/59009, we added NAdam to the optimizers. Here we are proposing a multi-tensor version of NAdam for PyTorch.

NAdam was proposed in the paper https://openreview.net/forum?id=OM0jvwB8jIp57ZJjtNEZ and the report http://cs229.stanford.edu/proj2015/054_report.pdf by Timothy Dozat.

It has been one of the most widely used algorithms in the deep learning community.

It is worth noting that the implementation of NAdam is inspired by the Keras implementation:
f9d3868495/tensorflow/python/keras/optimizer_v2/nadam.py
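
A short usage sketch (assuming the private module path `torch.optim._multi_tensor` that this series of PRs extends):
```
import torch
from torch.optim._multi_tensor import NAdam

model = torch.nn.Linear(10, 1)
opt = NAdam(model.parameters(), lr=2e-3)

loss = model(torch.randn(4, 10)).sum()
loss.backward()
opt.step()       # parameter updates run as fused multi-tensor ops
opt.zero_grad()
```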

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59165

Reviewed By: vincentqb

Differential Revision: D29360577

Pulled By: iramazanli

fbshipit-source-id: 0fe14016303b2df2cb8cc31912a2674acf63d1e5
2021-06-27 17:00:41 -07:00
3bfe15085d [TensorExpr] Add a mechanism to register custom TS->NNC lowerings in TensorExprKernel. (#60804)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60804

The lowerings are stored as a map c10::Symbol -> std::function, and the
signature of those functions matches the signature of
`computeOperandValue`. Custom lowerings have higher priority than the
standard ones, i.e. we can redefine how a given op is lowered.

In general this feature is aimed at unblocking users whose models
contain ops that are not yet supported by NNC - it allows users to quickly
add a custom lowering for a given op.

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D29409580

Pulled By: ZolotukhinM

fbshipit-source-id: e8e8dc9d3cb9155cfbf5c08a4216ba1b5b791a60
2021-06-27 15:27:22 -07:00
5563f4bda0 To add Rectified Adam algorithm for multi-tensor optimizers API (#59161)
Summary:
Previously, in PR https://github.com/pytorch/pytorch/issues/58968, we added RAdam to the optimizers. Here we are proposing a multi-tensor version of RAdam for PyTorch.

RAdam was proposed in the paper https://arxiv.org/pdf/1908.03265.pdf by Liyuan Liu et al.

It has been one of the most widely used algorithms in the deep learning community.

Differing from the paper, we selected the variance tractability cut-off as 5 instead of 4, as is common practice.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59161

Reviewed By: vincentqb

Differential Revision: D29360576

Pulled By: iramazanli

fbshipit-source-id: 7ccdbf12b1ee7f12e66f7d7992123a70cc818b6b
2021-06-27 13:01:20 -07:00
0fbc471d10 Support default values on NamedTuple fields (#54682)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/54682
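
A minimal sketch of what this enables in TorchScript (illustrative):
```
from typing import NamedTuple
import torch

class Point(NamedTuple):
    x: float
    y: float = 0.0  # field default, now usable from TorchScript

@torch.jit.script
def make_point() -> float:
    p = Point(1.0)  # y falls back to its default value
    return p.x + p.y

print(make_point())  # 1.0
```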

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D27327241

Pulled By: ansley

fbshipit-source-id: 76546f1770d50ebc3435bba3b74540e3c6be8a1c
2021-06-26 15:18:21 -07:00
6b53792f18 fix cuda mem leak check not properly run on master_builds (#60742)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60742

Improved the CI_MASTER flag check logic, since the flag can be unset, true, or false.

Test Plan:
search for `PYTORCH_TEST_SKIP_CUDA_MEM_LEAK_CHECK` in logs below:

- Before adding ci/master:
  - build workflow (`PYTORCH_TEST_SKIP_CUDA_MEM_LEAK_CHECK=1`): https://circleci.com/api/v1.1/project/github/pytorch/pytorch/14394913/output/107/0?file=true&allocation-id=60d5fd2fa55ae50282aec997-0-build%2F10295B30
- After adding ci/master label:
  - build workflow (`PYTORCH_TEST_SKIP_CUDA_MEM_LEAK_CHECK=0`): https://circleci.com/api/v1.1/project/github/pytorch/pytorch/14398213/output/107/0?file=true&allocation-id=60d61cf8bb9d097afc7a11aa-0-build%2F400138F1
  - master build workflow (`PYTORCH_TEST_SKIP_CUDA_MEM_LEAK_CHECK=0`): https://circleci.com/api/v1.1/project/github/pytorch/pytorch/14398198/output/107/0?file=true&allocation-id=60d61ca3467438480c963290-0-build%2F2999C909

Reviewed By: ngimel

Differential Revision: D29405732

Pulled By: walterddr

fbshipit-source-id: 09dd653cbb47ca61b1f8872851bda6db8db671b9
2021-06-26 07:05:32 -07:00
e3abccec8a [Static Runtime] Remove output type constraints (#60669)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60669

Test Plan: Added unit test to check for nested outputs.

Reviewed By: ajyu

Differential Revision: D29322025

fbshipit-source-id: a3c8d3c5f0bb7cf7fda4bc5f579adb8fa7bc3724
2021-06-26 02:36:27 -07:00
dae25c2002 Fix missing spaces in error of constant_pad_nd (#60729)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60729

Reviewed By: ZolotukhinM

Differential Revision: D29404422

Pulled By: ngimel

fbshipit-source-id: c40458c7a6ae33f61c680bff8de778a80658c250
2021-06-25 19:20:03 -07:00
9a08e87d8b Modernize for-loops in aten (#59598)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59598

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D28946826

fbshipit-source-id: 9f3f7e38833c2bc33d27243cef16ab0118c65f3a
2021-06-25 19:02:00 -07:00
7e3a694b23 supports non-leaf inputs for autograd.backward() function (#60521)
Summary:
Close https://github.com/pytorch/pytorch/issues/60268
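
A small sketch of the newly supported call (accessing `.grad` on a non-leaf may still emit a warning in some builds):
```
import torch

x = torch.randn(3, requires_grad=True)
y = x * 2                      # y is a non-leaf tensor
z = (y ** 2).sum()

# Passing a non-leaf via `inputs` no longer errors; the gradient is
# accumulated into y.grad rather than only into leaf tensors.
torch.autograd.backward(z, inputs=[y])
print(y.grad)                  # equals 2 * y
```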

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60521

Reviewed By: ngimel

Differential Revision: D29393586

Pulled By: albanD

fbshipit-source-id: 2dd2de427ecfecca8d544237bacf690e0b7c918c
2021-06-25 18:57:26 -07:00
056a8e0d5c Remove un-used parameter in _trilinear backward (#60673)
Summary:
This argument only affects speed and memory usage, so it is OK to ignore it during the backward.
As discussed, we might want to change this to speed up the backward in the future.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60673

Reviewed By: soulitzer

Differential Revision: D29370125

Pulled By: albanD

fbshipit-source-id: ad50b3ea530aeb194f5a51845523b517a50f2c71
2021-06-25 17:47:10 -07:00
f262217101 [Model Averaging] Move step out of model averaging API (#60632)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60632

Address the comment https://github.com/pytorch/pytorch/pull/60320#discussion_r654845062
ghstack-source-id: 132340278

Test Plan: buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_periodic_model_averager

Reviewed By: rohan-varma

Differential Revision: D29355609

fbshipit-source-id: 50a6f13ed70b5a5b5b92ead2f3d7082c11277af5
2021-06-25 17:20:52 -07:00
c5f0692b6e Sparse CSR: increase dtype test coverage (#60656)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60656

This PR uses `torch.testing.get_all_dtypes()` for dtype parametrisation
of tests in `test_sparse_csr.py`. It adds bool, half, bfloat16, and
complex dtypes that were previously excluded from the tests.
`torch.complex32` is omitted due to lack of coverage and lack of a
specialized `AT_DISPATCH...`.
The process of adding more dtypes to the tests revealed that `.to_dense()`
doesn't work for all dtypes.

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D29408058

Pulled By: cpuhrsch

fbshipit-source-id: 319b6f51b9786d6957d508f51657657a6d00267a
2021-06-25 17:11:21 -07:00
dd045ab540 add channels last for AdapativeMaxPool2d (#48920)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48920
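
A small usage sketch of the channels-last path this adds (whether the output keeps the NHWC layout is an assumption here):
```
import torch
import torch.nn.functional as F

x = torch.randn(2, 3, 8, 8).to(memory_format=torch.channels_last)
out = F.adaptive_max_pool2d(x, output_size=(4, 4))
print(out.is_contiguous(memory_format=torch.channels_last))
```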

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D25399467

Pulled By: VitalyFedyunin

fbshipit-source-id: d9d2cc728cc7a18a26983e96d3c3e81a23659e89
2021-06-25 16:36:20 -07:00
367aff91d8 Fix missing #pragma once in jit/method.h
Summary: it seems to be accidentally missing

Test Plan: run CI

Reviewed By: suo

Differential Revision: D29335990

fbshipit-source-id: 2790bc10d141f9484a0807ff7800024a02fd9cfa
2021-06-25 16:32:54 -07:00
8b6487c650 Add CUDA Vital (#58059)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58059

Add the CUDA.used vital sign, which is true only if CUDA was "used", i.e. a CUDA context was created.

Also adds the following features:
- Force vitals to be written even if vitals are disabled, to enable testing when the env variable is not set from the start of execution
- Add a read_vitals call for python to read existing vital signs.

Test Plan: buck test mode/dbg caffe2/test:torch -- --regex basic_vitals

Reviewed By: xuzhao9

Differential Revision: D28357615

fbshipit-source-id: 681bf9ef63cb1458df9f1c241d301a3ddf1e5252
2021-06-25 16:31:11 -07:00
9134b0e42f add a boxed CPU fallback kernel (#58065)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58065

This PR replaces the existing code-generated CPU fallback kernels that XLA uses with a single boxed CPU fallback.

Current state: there are a couple different design ideas that I want to point out, but the logic for the actually kernel is mostly done and passing tests.

### Design

To preface, I'm not 100% tied to the current design and I'm putting the PR up now for opinions and totally open to alternatives, some of which I listed below. Actually after writing this description, I'm leaning toward the following changes:
* Confirm whether or not we can remove all C++ logging info directly in the yaml.

**Current Design**

All of the CPU fallback codegen is deleted. In its place, XLA (and other external backends, later) can choose to opt into a CPU fallback by adding the following code in a C++ file. I have an corresponding [xla-side PR with the xla changes](https://github.com/pytorch/xla/pull/2945/files#diff-1a005c10039f0cb11130a3b740f5de716d2f10acaea121017016025861886798R1).

There's no actual requirement to split up the code into a .h and .cpp file, but that's necessary in the XLA case because they sometimes need to call the fallback directly from their handcrafted kernels.

```
// xla_cpu_fallback.h
#include <ATen/native/CPUFallback.h>
...
void xla_cpu_fallback(const c10::OperatorHandle& op, torch::jit::Stack* stack);
...
```
```
// xla_cpu_fallback.cpp
#include "torch_xla/csrc/aten_cpu_fallback.h"
...
void xla_cpu_fallback(const c10::OperatorHandle& op, torch::jit::Stack* stack) {
  // Do custom logging here
  ...
  // Call the actual boxed CPU fallback.
  at::native::cpu_fallback(op, stack);
}

TORCH_LIBRARY_IMPL(_, XLA, m) {
  m.fallback(torch::CppFunction::makeFromBoxedFunction<&xla_cpu_fallback>());
}
```

Now that the fallback is exposed in the backend, they can call it directly. Doing so requires converting from an unboxed to a boxed context, for which we provide a utility function. E.g.:
```
#include <ATen/native/CPUFallback.h>

at::Tensor addmm(const at::Tensor& self,const at::Tensor& mat1,const at::Tensor& mat2,const at::Scalar& beta,const at::Scalar& alpha) {
  ....
  if (...call_fallback...) {
    return at::native::call_fallback_fn<&xla_cpu_fallback, decltype(at::addmm)>::call("aten::addmm", self, mat1, mat2, beta, alpha);
  }
  ...
}
```

That `decltype(at::addmm)` logic isn't actually used everywhere in the xla-side PR yet, since you hit issues with overloads. I could use it everywhere once #58092 lands.

**Alternatives: The API for calling the CPU fallback directly is ugly, can we make it nicer?**
We could change the api to use `at::redispatch`, which would make it look something like this:
```
at::Tensor addmm(const at::Tensor& self,const at::Tensor& mat1,const at::Tensor& mat2,const at::Scalar& beta,const at::Scalar& alpha) {
  ....
  if (...call_fallback...) {
    return at::redispatch::addmm(c10::DispatchKeySet(c10::DispatchKey::CPUFallback), self, mat1, mat2, beta, alpha);
  }
  ...
}
```
Which definitely feels cleaner, but also requires adding a new DispatchKey just for this use case. Conditionally calling the CPU fallback doesn't sound like a hugely important use case, so I don't know if giving up one of our 64 dispatch key slots is worth the API improvement. Totally open to other opinions though!

Another more mild improvement that would avoid having to pass operator string names (including overloads) around would be to codegen (yet another) namespaced API. Something like this:
```
at::Tensor addmm(const at::Tensor& self,const at::Tensor& mat1,const at::Tensor& mat2,const at::Scalar& beta,const at::Scalar& alpha) {
  ....
  if (...call_fallback...) {
    return at::fallback::addmm<&xla_cpu_fallback>(self, mat1, mat2, beta, alpha);
  }
  ...
}
```

Writing that out, I actually like it more (I think it'll let us get rid of `decltype(...)`). Maybe that is nice enough to warrant a new codegen API - I haven't tried adding that yet, but if people like it I'm happy to try it out.

**More alternatives**
The current design also involves the backend manually writing and registering the boxed fallback themselves, but an alternative would be for us to do it in codegen too: they would just need to pass in all of the C++ logging that they want done in the fallback, directly through the yaml. The main downsides:
* Backend code that wants to call the fallback needs to abide by whatever convention our codegen uses to name the generated boxed fallback.
* Passing custom C++ logging through yaml is just more fragile: right now xla uses an `iostream` to log each tensor arg in the operator, so we'd have to either force other backends into the same convention or figure something else out later.

To be fair, we actually already do that: XLA has custom per-tensor-arg logging for all of the generated `out` wrappers in the codegen, which we do by passing their C++ logging info through the yaml. This seems unnecessary though, since `out` wrappers just call into a functional kernel, which is hand written with its own custom logging. So my take is: try to remove custom C++ logging from the yaml, and if it turns out to be really necessary, then we may as well take advantage of that to codegen the fallback.

### Performance impact

While ops that fall back to CPU aren't exactly hot path, we probably don't want to use a boxed fallback if it turns out to be an absolute perf killer.

I ran my benchmarks using callgrind, benchmarking both `at::add` and `at::add_out` run on XLA. My callgrind benchmark for `at::add` can be found here (the add_out benchmark looks basically the same): https://www.internalfb.com/phabricator/paste/view/P415418587. I created the benchmark by hacking the existing xla C++ test build scripts and throwing in a reference to callgrind.

I also attached the full callgrind output for each benchmark; the full output is actually pretty noisy and hard to parse, but I focused on everything underneath the `at::add()` call in the output, which was much more stable. My guess is that the noise is due to some heavyweight async startup processing that xla does.

`at::add`:
before: 88,505,130 instructions. Full output: https://www.internalfb.com/phabricator/paste/view/P415421001
after: 102,185,654 instructions. Full output: https://www.internalfb.com/phabricator/paste/view/P415421273
delta: ~15.5% increase

`at::add_out`:
before: 63,897,395 instructions. Full output: https://www.internalfb.com/intern/everpaste/?handle=GBrrKwtAPlix9wUEAOZtrFXpdO5UbsIXAAAz
after: 73,170,346 instructions. Full output: https://www.internalfb.com/phabricator/paste/view/P415423227
delta: ~14.5% increase

High level takeaway: A framework overhead increase of 10-20% doesn't seem too horrible for the CPU fallback use case.

For structured, functional ops that require a CPU fallback, we're actually in an unfortunate situation: we're doing even more work than necessary. Our codegen automatically creates a `CompositeExplicitAutograd` kernel which calls into the `out` operator. So the extra work that we end up doing is:
* An extra dispatcher hop: (at::add -> CompositeExplicitAutograd -> CPUFallback -> at::native::add) instead of (at::add -> CPUFallback -> at::native::add)
* An unnecessary tensor allocation (the CompositeExplicitAutograd kernel uses at::empty() to create an output tensor, which is immediately overwritten by the CPU fallback)
* An unnecessary meta() call (the CompositeExplicitAutograd kernel calls it to create the output tensor, but we call it again in the CPU kernel).
* unboxing->boxing->unboxing logic (this is the only strictly required piece)

There are definitely ways to avoid the unnecessary work explained above: one would be to give the boxed fallback higher priority than composite keys (there's [an issue for it here](https://github.com/pytorch/pytorch/issues/55104)), and codegen fallthroughs for all composite ops. It'll require more infra to set up, so I see it as more of a perf knob that we can apply if we need it later.

Unfortunately I couldn't dig much deeper into the differences aside from the aggregate change in instructions, since it looks like callgrind fudged some of the instruction attribution (`at::to_cpu` takes up a ton of instructions, but I don't see any attribution for the `at::native::add` kernel anywhere).

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D28833085

Pulled By: bdhirsh

fbshipit-source-id: 537ebd5d7fb5858f1158764ff47132d503c3b92b
2021-06-25 16:26:50 -07:00
ad69e2fd11 [torch] Module fix on the support of LazyModule on bug #60132 (#60517)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60517

This fixes the module support for LazyModuleMixin described in bug issue #60132.
Check the link: https://github.com/pytorch/pytorch/issues/60132

We will have to update lazy_extension given the dependency on module.py and update the unit test as well.

Test Plan:
Unit test passes

torchrec test passes

Reviewed By: albanD

Differential Revision: D29274068

fbshipit-source-id: 1c20f7f0556e08dc1941457ed20c290868346980
2021-06-25 16:20:19 -07:00
cab926b2c0 faster generate_square_subsequent_mask in nn.Transformer (#60631)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60631

Per #48360, speed up `Transformer.generate_square_subsequent_mask`. The new impl is informally ~5x faster, though the absolute difference is probably small.

PR includes Python and C++ versions as well as a couple of places where the previous impl had been copied around.
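
For reference, a sketch of the kind of single-expression formulation the speedup points at (the exact new impl is an assumption; the sketch just shows it matches the old construction):
```
import torch

sz = 5

# Old approach: build a bool mask, then two masked_fill passes.
m = (torch.triu(torch.ones(sz, sz)) == 1).transpose(0, 1)
old = m.float().masked_fill(~m, float("-inf")).masked_fill(m, 0.0)

# Faster: fill with -inf and keep only the strict upper triangle.
new = torch.triu(torch.full((sz, sz), float("-inf")), diagonal=1)

print(torch.equal(old, new))  # True
```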

Test Plan: Imported from OSS

Reviewed By: jbschlosser, albanD

Differential Revision: D29356673

Pulled By: bhosmer

fbshipit-source-id: 4c062ba0ead61a445aeef451c78777bf0b3a631e
2021-06-25 16:07:01 -07:00
7585783b8d Remove Optional[None] annotations (#60704)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60704

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D29380281

Pulled By: ansley

fbshipit-source-id: 055c17329a35375de4ebd058ee6d127475aad373
2021-06-25 15:53:58 -07:00
5ed7400b75 Fix doc preview source directory (#60792)
Summary:
`merge` is the directory with the actual changes, not `master`. Verified by downloading artifacts from https://github.com/pytorch/pytorch/pull/60777/checks and searching through the result.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60792

Reviewed By: walterddr

Differential Revision: D29405288

Pulled By: driazati

fbshipit-source-id: 419c943727c00429945c1f116645bfa22fb12456
2021-06-25 15:46:30 -07:00
7b933cd9ea configurable pre/post LayerNorm in nn.Transformer (#60593)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60593

Per #55270, this PR makes it configurable whether to run LayerNorm before or after other operations in Transformer layers.

However, it leaves for a separate PR the removal of the LayerNorm performed after the final encoder/decoder layer has run, which is redundant when a LayerNorm has been run after other in-layer operations (problem described in #24930 #50086 #51447).

Note: this means that transformers built with `nn.Transformer()` are now configurable, but will still contain a redundant LayerNorm when configured as before. However, callers of the `TransformerEncoder` and `TransformerDecoder` classes have always been able to avoid this redundancy.
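
A usage sketch, assuming the configuration is exposed as the `norm_first` constructor argument:
```
import torch.nn as nn

# Pre-LN ("norm first") encoder layer; the default stays post-LN as before.
layer = nn.TransformerEncoderLayer(d_model=512, nhead=8, norm_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=6)
```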

Reviewer notes:
1. Ran across this during other work, don't know if anybody's working on it already (most recent conversation in issues seems to be from early April). Happy to abandon if so.
2. Was looking for a quick way to add tests but it looks like the existing ones in test_nn just compare against snapshots. I could add something similar, but curious if there's any prepackaged way to add a test that LayerNorm-first (the new option) yields model that trains properly, etc.
3. New code in the `forward`s was written to minimize diff churn rather than maximize beauty :P happy to pretty it up if desired.

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D29356590

Pulled By: bhosmer

fbshipit-source-id: 308669326990b8923aab5fcd96e03b582fb21f24
2021-06-25 15:43:35 -07:00
e13a9587b4 Revert "Revert D29135358: [quant] Input-Weight Equaliaztion - convert modifications" (#60646)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60646

This reverts commit e60f9cfc58fb2fe3e2e7f65fcdbbf350e5b55a75.

Test Plan: Imported from OSS

Reviewed By: supriyar

Differential Revision: D29361191

Pulled By: angelayi

fbshipit-source-id: 275d8691d8e47da4ab80bb21b51d77ec25a0f714
2021-06-25 15:37:05 -07:00
7188d84ccf [Tools] Update path in clang_format_utils after #60473 (#60782)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60782

PR #60473 introduced a new folder nesting level; this change updates
clang_format_utils.py to adjust the way it sets up the root path
accordingly.

Test Plan: Imported from OSS

Reviewed By: zhxchen17

Differential Revision: D29403622

Pulled By: ZolotukhinM

fbshipit-source-id: 6404271615c2d263834cf538ab0153c4d41cc5c3
2021-06-25 14:30:45 -07:00
394f60b0fc [caffe2] update make_cifar_db to move the string into DB::Put() (#60692)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60692

Update make_cifar_db.cc to work with the DB API changes in D29204425 (00896cb9ed).

Test Plan: buck build caffe2/binaries:make_cifar_db

Differential Revision: D29374754

fbshipit-source-id: 23d2acd24031d11071791e398433b537215ffd38
2021-06-25 14:02:24 -07:00
e1bd4963e2 To introduce Functional API for multi-tensor (#60735)
Summary:
In this PR we change the multi-tensor optimizers to a functional API.

In the file https://github.com/pytorch/pytorch/blob/master/torch/optim/_functional.py, a functional API has been defined for most optimizers. However, we do not have a similar file / functionality for the multi-tensor optimizers:
https://github.com/pytorch/pytorch/tree/master/torch/optim/_multi_tensor

Therefore we are adding it in this PR.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60735

Reviewed By: vincentqb

Differential Revision: D29392253

Pulled By: iramazanli

fbshipit-source-id: cebc8e7b07ab11156370f5297cfb419cd9f20b46
2021-06-25 13:09:26 -07:00
8f16a38067 Add missing kernel checks (#60635)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60635

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D29355747

fbshipit-source-id: 20bae292703a54b2895a33c11e6f1b8b9a9d8195
2021-06-25 12:54:40 -07:00
dfc8247d33 Faster cumsum and cumprod backwards (#60642)
Summary:
Piggybacking on https://github.com/pytorch/pytorch/pull/58747, we can now implement the backwards of `cumsum` and `cumprod` without tricks. This minimises the number of kernels that are launched on GPU, so we see a reasonable speed-up there. We should also get better stability for ill-conditioned inputs, as we do not perform any numerical tricks to get the result.

Note that the benchmarks test forward + backward, so the true speed-up of the backward alone should be even larger. Even more so for `cumsum`, as its backward requires fewer operations than that of `cumprod`.

<details>
<summary>
Test Script
</summary>

```python
from itertools import product

import torch
from torch.utils.benchmark import Compare, Timer

def get_timer(ndims, prod_dim, dim, num_threads, device):
    size = [500]*ndims
    size[dim] = prod_dim

    x = torch.rand(*size, device=device, requires_grad=True)
    # Make sure there are no zeros, as the backward formula
    # that we are testing assumes the input has no zeros
    with torch.no_grad():
        x.add_(1e-3)
    grad = torch.ones_like(x)

    timer = Timer(
        "torch.autograd.grad([x.cumprod(dim)], [x], grad_outputs=[grad])",
        globals={"x": x, "dim": dim, "grad": grad},
        label=f"Cumprod + Backwards {device}",
        description=f"dim: {dim}",
        sub_label=f"prod_dim: {prod_dim}",
        num_threads=num_threads,
    )

    return timer.blocked_autorange(min_run_time=5)

def get_params():
    ndims = 3
    dims = range(ndims)
    prod_dims = [10, 100, 500]
    for dim, prod_dim, device in product(dims, prod_dims, ("cpu", "cuda")):
        threads = (1, 2, 4) if device == "cpu" else (1,)
        for num_threads in threads:
            yield ndims, prod_dim, dim, num_threads, device

compare = Compare([get_timer(*params) for params in get_params()])
compare.trim_significant_figures()
compare.print()
```

</details>

<details>
<summary>
Benchmark PR
</summary>

```
[------------ Cumprod + Backwards cpu -------------]
                     |  dim: 0  |  dim: 1  |  dim: 2
1 threads: -----------------------------------------
      prod_dim: 10   |     11   |     14   |     12
      prod_dim: 100  |    260   |    270   |    260
      prod_dim: 500  |   1400   |   1550   |   1360
2 threads: -----------------------------------------
      prod_dim: 10   |      6   |      6   |      6
      prod_dim: 100  |    170   |    166   |    167
      prod_dim: 500  |    902   |    950   |    858
4 threads: -----------------------------------------
      prod_dim: 10   |      4   |      3   |      3
      prod_dim: 100  |    110   |    108   |    106
      prod_dim: 500  |    576   |    590   |    547

Times are in milliseconds (ms).

[------------ Cumprod + Backwards cuda ------------]
                     |  dim: 0  |  dim: 1  |  dim: 2
1 threads: -----------------------------------------
      prod_dim: 10   |    562   |    566   |   1075
      prod_dim: 100  |   5388   |   5394   |   6697
      prod_dim: 500  |  28170   |  27580   |  30740

Times are in microseconds (us).
```

</details>

<details>
<summary>
Benchmark master
</summary>

```
[------------ Cumprod + Backwards cpu -------------]
                     |  dim: 0  |  dim: 1  |  dim: 2
1 threads: -----------------------------------------
      prod_dim: 10   |     11   |     13   |     12
      prod_dim: 100  |    270   |    270   |    256
      prod_dim: 500  |   1500   |   1590   |   1300
2 threads: -----------------------------------------
      prod_dim: 10   |      6   |      6   |      6
      prod_dim: 100  |    170   |    170   |    164
      prod_dim: 500  |    911   |    940   |    840
4 threads: -----------------------------------------
      prod_dim: 10   |      4   |      4   |      4
      prod_dim: 100  |    111   |    109   |    105
      prod_dim: 500  |    570   |    590   |    536

Times are in milliseconds (ms).

[------------ Cumprod + Backwards cuda ------------]
                     |  dim: 0  |  dim: 1  |  dim: 2
1 threads: -----------------------------------------
      prod_dim: 10   |    616   |    597   |   1109
      prod_dim: 100  |   5976   |   5723   |   7017
      prod_dim: 500  |  31110   |  29160   |  32320

Times are in microseconds (us).
```

</details>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60642

Reviewed By: ngimel

Differential Revision: D29366368

Pulled By: albanD

fbshipit-source-id: b0d692ce030352965c2f152e0f92fbb61fc5ebde
2021-06-25 12:44:12 -07:00
d3bec9f4d2 Use S3 for documentation previews (#60711)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60711

We already build the docs on each PR; this adds a step to push the relevant folder of the docs (we build the entire website for pytorch.github.io, which clocks in at around 500 MB, but we really only need the "master" docs, not every version; the master docs by themselves are around 50 MB, which is more reasonable). It uses the same S3 bucket as the artifacts but places the items at the `pytorch/pytorch/pr-previews/<pr number>` prefix. The bucket has a rule to expire resources in that prefix after 1 month.

On the AWS side the bucket has static hosting enabled with CloudFront directing to the docs preview prefix, so you can see the output at `https://d28slxzaq48q8t.cloudfront.net/<pr number>/`, e.g. https://d28slxzaq48q8t.cloudfront.net/60711/. For advertising we could link this on the HUD PR page as well as in the Dr. CI comment. We could add a CNAME on CloudFront to make this be `pr-preview.pytorch.org/<pr number>` or something but having random PRs be able to host content on the pytorch.org domain seems sketchy.

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D29398818

Pulled By: driazati

fbshipit-source-id: 24032854d83815853b3650d8e54f60b684707f76
2021-06-25 12:12:26 -07:00
aacc722aec Dispatch to Python via __torch_dispatch__ (#59760)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59760

See https://github.com/pytorch/pytorch/issues/59049

There are some moving parts to this PR; I'll structure this explanation so the straightforward parts go first, and then the less straightforward parts.

**The actual dispatch to Python.** The core logic of dispatch to Python lives in `concrete_dispatch_fn` in `torch/csrc/autograd/python_variable.cpp`. It takes the input IValue stack, scans all the arguments for Tensor arguments, and defers most of the heavy lifting to `handle_torch_function_no_python_arg_parser` which actually does all of the logic for calling out to torch dispatch (in particular, this function handles multiple dispatch situations for you). Because we have a different function name than regular `__torch_function__` handling, `handle_torch_function_no_python_arg_parser` is generalized to accept a magic method name to look for when testing if Tensors have custom handling or not. Unlike `__torch_function__`, by default there is no `__torch_dispatch__` on Tensor classes.

**Maintaining the Python dispatch key.** In order to get to the dispatch to Python logic, we must tag Tensors that have the `__torch_dispatch__` magic method with the newly added Python dispatch key (separated from PythonFuncTorch to allow for a transitional period while they migrate to this mechanism). We expose a new private property `_is_python_dispatch` that assists in debugging if a Tensor is participating in Python dispatch or not. We apply the Python dispatch key the first time a PyObject for a Tensor is constructed (THPVariable_NewWithVar), testing if `__torch_dispatch__` exists with the newly added `check_has_torch_dispatch`.

**Shallow copy and detach.** For the simple examples tested in this PR, most creations of Tensor route through the dispatcher. The exception to this is `shallow_copy_and_detach`, which bypasses the dispatcher and is used when saving tensors for backwards. When a Tensor participates in Python dispatch, we override the behavior of `shallow_copy_and_detach` to instead directly call into `__torch_dispatch__` to perform a `detach` operation (in the same way it would be invoked if you called `detach` directly). Because this Python call is triggered directly from c10::TensorImpl, it must be indirected through `PyInterpreter::detach`, which is the general mechanism for dynamic dispatching to the Python interpreter associated with a TensorImpl.

**torchdeploy compatibility.** The dispatch to Python logic cannot be directly registered to the dispatcher as it is compiled in the Python library, which will get loaded multiple times per torchdeploy interpreter. Thus, we must employ a two phase process. First, we register a fallback inside a non-Python library (aten/src/ATen/core/PythonFallbackKernel.cpp). Its job is to determine the appropriate PyInterpreter to handle the Python dispatch by going through all of the arguments and finding the first argument that has a PyObject/PyInterpreter. With this PyInterpreter, it makes another dynamic dispatch via "dispatch" which will go to the correct torchdeploy interpreter to handle dispatching to actual Python.

**Testing.** We provide a simple example of a LoggingTensor for testing, which can be used to generate TorchScript-like traces to observe what operations are being called when a Tensor is invoked. Although a LoggingTensor would be better implemented via an is-a relationship rather than a has-a relationship (as is done in the test), we've done it this way to show that arbitrarily complex compositions of tensors inside a tensor work properly.
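
A condensed sketch of such a has-a wrapper (illustrative; details such as the meta-device trick are assumptions, not a copy of the test):
```
import torch
from torch.utils._pytree import tree_map

class LoggingTensor(torch.Tensor):
    __torch_function__ = torch._C._disabled_torch_function_impl

    @staticmethod
    def __new__(cls, elem):
        # has-a: the wrapper carries the real tensor in self.elem
        r = torch.Tensor._make_subclass(cls, elem.to("meta"), elem.requires_grad)
        r.elem = elem
        return r

    @classmethod
    def __torch_dispatch__(cls, func, types, args=(), kwargs=None):
        unwrap = lambda t: t.elem if isinstance(t, cls) else t
        wrap = lambda t: cls(t) if isinstance(t, torch.Tensor) else t
        print(f"op: {func}")  # log each aten op hit at dispatch time
        out = func(*tree_map(unwrap, args), **tree_map(unwrap, kwargs or {}))
        return tree_map(wrap, out)

x = LoggingTensor(torch.randn(2))
y = x + x  # prints the op(s) dispatched for the addition
```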

**Known limitations.**

* We haven't adjusted any operator code, so some patterns may not work (as they lose the Python subclass in an unrecoverable way)
* `__torch_function__` must be explicitly disabled with `_disabled_torch_function_impl` otherwise things don't work quite correctly (in particular, what is being disabled is default subclass preservation behavior.)
* We don't ever populate kwargs, even when an argument is kwarg-only

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision:
D29017912

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Pulled By: ezyang

fbshipit-source-id: a67714d9e541d09203a8cfc85345b8967db86238
2021-06-25 11:50:32 -07:00
a53d7f8f7c Remove test linalg test skips from MAGMA integration (#58232)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/55552; majority of cases in https://github.com/pytorch/pytorch/issues/51303

Tests in torch/testing/_internal/common_methods_invocations.py  (tested through test_ops) cannot be fully removed, since the machines seem to be running out of gpu memory during the test, and needs further analysis

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58232

Reviewed By: ngimel

Differential Revision: D29394021

Pulled By: malfet

fbshipit-source-id: f108a70af33beec908ac1c0b58467f8744e6fe87
2021-06-25 11:44:49 -07:00
8216da1f23 Use python3.6 compatible APIs in clang_tidy.py (#60659)
Summary:
This PR makes `tools/clang_tidy.py` use Python 3.6 APIs for `asyncio` and `shlex`.

I ran into some issues when running this script with the `-j` flag inside of the clang-tidy docker image (which uses Python 3.6). Specifically, the functions `asyncio.run` and `shlex.join` are only available in Python >= 3.8.
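
A sketch of the 3.6-compatible substitutions (the coroutine here is a hypothetical stand-in):
```
import asyncio
import shlex

async def run_lint():  # hypothetical stand-in for the script's coroutine
    return "ok"

# Instead of asyncio.run(run_lint()), which is 3.8+:
loop = asyncio.get_event_loop()
result = loop.run_until_complete(run_lint())

# Instead of shlex.join(args), which is 3.8+:
args = ["clang-tidy", "-p", "build dir"]
cmd = " ".join(shlex.quote(a) for a in args)
```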

This change does not affect CI because we do not run the clang-tidy job in parallel.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60659

Reviewed By: albanD

Differential Revision: D29377851

Pulled By: 1ntEgr8

fbshipit-source-id: 92ab7ee6782b78d40ffccd03f1718ede4204d948
2021-06-25 10:35:03 -07:00
6322f66878 Add python version and cuda-specific folder to store extensions (#60592)
Summary:
See https://github.com/pytorch/pytorch/issues/55267

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60592

Reviewed By: albanD

Differential Revision: D29353368

Pulled By: ezyang

fbshipit-source-id: 1fbcd021f1030132c0f950f33ce4a3a2fef351e0
2021-06-25 10:27:04 -07:00
a404cc9a7b CUDA addcmul and addcdiv do math in float for 16 bits I/O (#60715)
Summary:
Currently foreach `addcmul` and `addcdiv` cast the scalar to float so that the actual math is done in FP32 when the tensor dtype is Float16/BFloat16, while the regular `addcmul` and `addcdiv` do not.

### Reproducible steps to see the behavioral difference
```ipython
In [1]: import torch; torch.__version__
Out[1]: '1.9.0'

In [2]: a, b, c = torch.tensor([60000.0], device='cuda', dtype=torch.half), torch.tensor([60000.0], device='cuda', dtype=torch.half), torch.tensor([-1.0], device='cuda', dtype=torch.half)

In [4]: torch.addcmul(a, b, c, value=2)
Out[4]: tensor([-inf], device='cuda:0', dtype=torch.float16)

In [5]: torch._foreach_addcmul([a], [b], [c], value=2)[0]
Out[5]: tensor([-60000.], device='cuda:0', dtype=torch.float16)
```

### How foreach casts?
Foreach addcmul and addcdiv cast scalar to `opmath_t` (almost equivalent to acc_type) here: 42c8439b6e/aten/src/ATen/native/cuda/ForeachPointwiseOp.cu (L30) and cast inputs and results here:
42c8439b6e/aten/src/ATen/native/cuda/ForeachFunctors.cuh (L133-L135)

Related to https://github.com/pytorch/pytorch/issues/58833 #60227 https://github.com/pytorch/pytorch/issues/60454
cc ptrblck mcarilli ngimel

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60715

Reviewed By: albanD

Differential Revision: D29385715

Pulled By: ngimel

fbshipit-source-id: 8bb2db19ab66fc99d686de056a6ee60f9f71d603
2021-06-25 10:21:35 -07:00
0be65cd52a [c10d] Fix test_collective_hang flakiness (#60662)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60662

Fixes this flaky test. Basically, sometimes a rank can exit the test
early before rank 0 calls into allreduce. In this case Gloo will throw a
connection reset error on all other ranks.
ghstack-source-id: 132363151

Test Plan: CI

Reviewed By: zhaojuanmao

Differential Revision: D29364806

fbshipit-source-id: ce0c292a2166edad57ea0dbb76df12cfd560a10d
2021-06-25 10:15:18 -07:00
474bdaf54d Add --print-include-paths option to tools/linter/clang_tidy.py (#60744)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60744

Fixes #60739

Test Plan:
Run this command:
```
python3 tools/linter/clang_tidy.py --paths torch/csrc/fx --print-include-paths
```

Output (varies from machine to machine):
```
(clang-tidy output)
.
.
.

clang -cc1 version 11.0.0 based upon LLVM 11.0.0 default target x86_64-unknown-linux-gnu
ignoring nonexistent directory "nccl/include"
ignoring nonexistent directory "/include"
ignoring duplicate directory ".."
ignoring duplicate directory "../aten/src"
ignoring duplicate directory "../third_party/onnx"
ignoring duplicate directory ".."
ignoring duplicate directory ".."
ignoring duplicate directory "../torch/lib"
ignoring duplicate directory "../torch/../third_party/gloo"
  as it is a non-system directory that duplicates a system directory
ignoring duplicate directory "../third_party/ideep/mkl-dnn/src/../include"
  as it is a non-system directory that duplicates a system directory
#include "..." search starts here:
#include <...> search starts here:
 aten/src
 ../aten/src
 .
 ..
 ../cmake/../third_party/benchmark/include
 caffe2/contrib/aten
 ../third_party/onnx
 third_party/onnx
 ../third_party/foxi
 third_party/foxi
 ../torch/../aten/src/TH
 caffe2/aten/src
 third_party
 ../torch/../third_party/valgrind-headers
 ../torch/csrc
 ../torch/csrc/api/include
 ../torch/lib
 ../torch/lib/libshm
 ../torch/csrc/api
 third_party/ideep/mkl-dnn/include
 ../third_party/fmt/include
 third_party/gloo
 ../torch/../third_party/gloo
 ../cmake/../third_party/googletest/googlemock/include
 ../cmake/../third_party/googletest/googletest/include
 ../third_party/protobuf/src
 /data/users/eltonpinto/miniconda3/envs/pytorch/include
 ../third_party/gemmlowp
 ../third_party/neon2sse
 ../third_party/XNNPACK/include
 ../third_party
 ../cmake/../third_party/eigen
 /home/eltonpinto/local/miniconda3/envs/pytorch/include/python3.8
 /home/eltonpinto/local/miniconda3/envs/pytorch/lib/python3.8/site-packages/numpy/core/include
 ../cmake/../third_party/pybind11/include
 /usr/local/cuda-11.3/include
 ../third_party/ideep/mkl-dnn/src/../include
 ../third_party/ideep/include
 /usr/lib/gcc/x86_64-redhat-linux/8/../../../../include/c++/8
 /usr/lib/gcc/x86_64-redhat-linux/8/../../../../include/c++/8/x86_64-redhat-linux
 /usr/lib/gcc/x86_64-redhat-linux/8/../../../../include/c++/8/backward
 /usr/local/include
 /usr/lib64/clang/11.0.0/include
 /usr/include

.
.
.
(more clang-tidy output)
```

Imported from OSS

Reviewed By: ngimel

Differential Revision: D29395398

fbshipit-source-id: e92077a9c4e9dee7f9d7e05df180d552e3763540
2021-06-25 10:12:15 -07:00
608f12b818 Fix --dry-run option in tools/linter/clang_tidy.py (#60744)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60744

Fixes #60741

Test Plan:
Run this command:
```
python3 tools/linter/clang_tidy.py --paths torch/csrc/fx --dry-run
```
Output:
```
clang-tidy -p build -config '{"InheritParentConfig": true, "Checks": " bugprone-*, -bugprone-forward-declaration-namespace, -bugprone-macro-parentheses, -bugprone-lambda-function-name, -bugprone-reserved-identifier, cppcoreguidelines-*, -cppcoreguidelines-avoid-magic-numbers, -cppcoreguidelines-interfaces-global-init, -cppcoreguidelines-macro-usage, -cppcoreguidelines-owning-memory, -cppcoreguidelines-pro-bounds-array-to-pointer-decay, -cppcoreguidelines-pro-bounds-constant-array-index, -cppcoreguidelines-pro-bounds-pointer-arithmetic, -cppcoreguidelines-pro-type-cstyle-cast, -cppcoreguidelines-pro-type-reinterpret-cast, -cppcoreguidelines-pro-type-static-cast-downcast, -cppcoreguidelines-pro-type-union-access, -cppcoreguidelines-pro-type-vararg, -cppcoreguidelines-special-member-functions, -facebook-hte-RelativeInclude, hicpp-exception-baseclass, hicpp-avoid-goto, modernize-*, -modernize-concat-nested-namespaces, -modernize-return-braced-init-list, -modernize-use-auto, -modernize-use-default-member-init, -modernize-use-using, -modernize-use-trailing-return-type, performance-*, -performance-noexcept-move-constructor, -performance-unnecessary-value-param, ", "HeaderFilterRegex": "torch/csrc/.*", "AnalyzeTemporaryDtors": false, "CheckOptions": null}' torch/csrc/fx/fx_init.cpp
```

Reviewed By: ngimel

Differential Revision: D29394538

Pulled By: 1ntEgr8

fbshipit-source-id: b824bc2aa63631f074e9ad17092e4e063d347395
2021-06-25 09:53:29 -07:00
3a838e4ce3 Parametrizations depending on several inputs (#60530)
Summary:
Resubmit of https://github.com/pytorch/pytorch/pull/58488

There was a line that had been changed in `test_nn.py` as caught in https://github.com/pytorch/pytorch/pull/58488#discussion_r651267668

I reverted that line, which should never have been changed. I reckon that should solve the issue.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60530

Reviewed By: ngimel

Differential Revision: D29329865

Pulled By: albanD

fbshipit-source-id: 8dfd0cd968fe26a3924dae7ca366af2c8a8639b3
2021-06-25 09:16:57 -07:00
8cba365378 Fix incorrect doc about the dtype for torch.randint described in issue #56347 (#60507)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60507

Fix incorrect documentation about the dtype for `torch.randint` described in issue #56347

Test Plan: Review documentation to make sure formatting is right

Reviewed By: bdhirsh

Differential Revision: D29321181

fbshipit-source-id: caae69a9bbb30052da518a3f5d22a7ed3504cdd2
2021-06-25 07:51:36 -07:00
d8c3d555e4 [Delegate] Support composite of lowered sub modules of the same backend (#59921)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59921

Test Plan: Imported from OSS

Reviewed By: raziel

Differential Revision: D29091143

Pulled By: iseeyuan

fbshipit-source-id: 9ffcd18681917ece8ec73a34866c53701bdee1bc
2021-06-25 07:18:32 -07:00
7c2938bf67 To refactor Sparse Adam algorithm for functional form (#59171)
Summary:
Adds Functional Interface for Sparse Adam Optimizer.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59171

Reviewed By: vincentqb

Differential Revision: D29360582

Pulled By: iramazanli

fbshipit-source-id: 5ceffd7f4b7abd1e0b758a5b8445abdf5555eba0
2021-06-25 06:35:39 -07:00
963c983366 Improve numerical stability of LayerNorm (#59987)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59987

Similar to GroupNorm, improve the numerical stability of LayerNorm with the Welford algorithm and pairwise summation.
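
For reference, a minimal sketch of Welford's online mean/variance update (the general algorithm, not the actual kernel):
```
def welford_update(count, mean, m2, new_value):
    # One streaming update; avoids the catastrophic cancellation of the
    # naive E[x^2] - E[x]^2 formula.
    count += 1
    delta = new_value - mean
    mean += delta / count
    m2 += delta * (new_value - mean)
    return count, mean, m2  # variance = m2 / count
```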

Test Plan: buck test mode/dev-nosan //caffe2/test:nn -- "LayerNorm"

Reviewed By: ngimel

Differential Revision: D29115235

fbshipit-source-id: 5183346c3c535f809ec7d98b8bdf6d8914bfe790
2021-06-25 02:22:42 -07:00
5b1f5c8f17 When creating a single partition skip the output nodes, but process possible nodes after it. (#60370)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60370

When creating a single partition, skip the output nodes but process possible nodes after them.

Test Plan: Run all CI tests.

Reviewed By: jfix71

Differential Revision: D29265278

fbshipit-source-id: 2242009973a54498d8027cce5a294558a1206fdf
2021-06-24 23:50:30 -07:00
2b51a8a935 [BackwardCompatibility] Remove aten::to from allow_list (#60147)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60147

Remove aten::to from allow_list now that the aten::to schema change has landed (D29121620 (eda2ddb5b0)).

Test Plan: CI

Reviewed By: iseeyuan

Differential Revision: D29187314

fbshipit-source-id: abdb5a560287a861f3858732f7b3da342ee4aa55
2021-06-24 22:57:57 -07:00
3ca28656fa [special] erfcx cuda support (#60519)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/50345
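
erfcx(x) = exp(x^2) * erfc(x) is the scaled complementary error function; it stays finite where the naive form overflows. A small sketch (assuming a CUDA build):
```
import torch

if torch.cuda.is_available():
    x = torch.linspace(-2.0, 30.0, 5, device="cuda")
    print(torch.special.erfcx(x))             # finite across the range
    print(torch.exp(x ** 2) * torch.erfc(x))  # naive form: inf * 0 -> nan
```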

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60519

Reviewed By: ngimel

Differential Revision: D29353105

Pulled By: mruberry

fbshipit-source-id: 2f525a347a22f96411739a16e354c7291e863f95
2021-06-24 21:50:37 -07:00
46d27a53fe cuda rpc backward sparse tensor fix (#59609)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59609

quick fix for https://github.com/pytorch/pytorch/issues/58755

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D29335722

Pulled By: gcramer23

fbshipit-source-id: 0de7e0399b30f0934320f1e9abb1b92a45bcf929
2021-06-24 21:40:43 -07:00
561132f902 Revert D29330585: [pytorch][PR] add BFloat16 support for arange on CPU
Test Plan: revert-hammer

Differential Revision:
D29330585

Original commit changeset: b8a04cee0c3f

fbshipit-source-id: dc138f9613becd083848e82d15c138d3883493c8
2021-06-24 20:57:43 -07:00
d63c236fb3 Introduce quantized convolution serialization format 3 (#60241)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60241

We're going to make a forward-incompatible change to this serialization
format soon, so I'm taking the opportunity to do a little cleanup.

- Use int for version.  This was apparently not possible when V2
  was introduced, but it works fine now as long as we use int64_t.
  (Note that the 64 bits are only used in memory.  The serializer will
  use 1 byte for small non-negative ints.)
- Remove the "packed params" tensor and replace it with a list of ints.
- Replace the "transpose" field with "flags" to allow more binary flags
  to be packed in (see the sketch after this list).
- Unify required and optional tensors.  I just made them all optional
  and added an explicit assertion for the one we require.
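
As a sketch of the idea behind the "flags" field (the bit name below is hypothetical, not the format's actual layout):

```python
TRANSPOSE_BIT = 1 << 0  # hypothetical bit assignment

def pack_flags(transpose: bool) -> int:
    # Pack independent boolean options into a single integer field.
    flags = 0
    if transpose:
        flags |= TRANSPOSE_BIT
    return flags

assert pack_flags(True) & TRANSPOSE_BIT
assert not (pack_flags(False) & TRANSPOSE_BIT)
```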

A bit of a hack: I added an always-absent tensor to the front of the
tensor list.  Without this, when passing unpacked params from Python to
the ONNX JIT pass, the type would be inferred as `List[Tensor]` if all
tensors were present, making it impossible to cast to
`std::vector<c10::optional<at::Tensor>>` without jumping through hoops.

The plan is to ship this, along with another diff that adds a flag to
indicate numerical requirements, wait a few weeks for an FC grace
period, then flip the serialization version.

Test Plan: CI.  BC tests.

Reviewed By: vkuzo, dhruvbird

Differential Revision: D29349782

Pulled By: dreiss

fbshipit-source-id: cfef5d006e940ac1b8e09dc5b4c5ecf906de8716
2021-06-24 20:52:43 -07:00
42c8439b6e TH: Clean up dead code (#60655)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60655

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D29371717

Pulled By: ngimel

fbshipit-source-id: faa71b1d4a15450c78e12aa917daec853057bce9
2021-06-24 19:42:16 -07:00
4a7d281119 Migrate THAllocator to ATen (#60325)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60325

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D29371715

Pulled By: ngimel

fbshipit-source-id: 78ec8368a48e1a4690d0664a0b02d2a235af98ff
2021-06-24 19:42:14 -07:00
d586248544 Migrate THStorage_resizeBytes to ATen (CPU) (#60324)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60324

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D29371716

Pulled By: ngimel

fbshipit-source-id: 056aee0ec87722090c133777b6948c28b03b37e4
2021-06-24 19:41:02 -07:00
ddec2e0ef4 tentative fix for adaptiveavgpool gradient computation (#60630)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/60524

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60630

Reviewed By: jbschlosser

Differential Revision: D29374257

Pulled By: ngimel

fbshipit-source-id: be05f0ceb53e6f0f0a59a83b710dafde469d4e8a
2021-06-24 19:02:32 -07:00
40a7c317bc Run BLAS F2C checks on host architecture (#60703)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/60351

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60703

Reviewed By: driazati

Differential Revision: D29379727

Pulled By: malfet

fbshipit-source-id: dadbb1d39373887f07d59d0a05e093a5d070b016
2021-06-24 18:44:41 -07:00
7bc86458e1 Revert "Revert D28833086: beef up at::_ops API" (#60214)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60214

Relanding this PR, but with a fix for windows cuda builds (example failure in master here: https://github.com/pytorch/pytorch/runs/2852662871)

This is identical to the original PR except for one change in `tools/codegen/gen.py`: `static constexpr` -> `static CONSTEXPR_EXCEPT_WIN_CUDA`

This actually took a while to figure out, until I tracked down a previous pytorch PR that encountered a similar issue: https://github.com/pytorch/pytorch/pull/40675

This reverts commit 6d0fb85a623f5ef3f3f1a2afc3660cb71fa70511.

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D29213932

Pulled By: bdhirsh

fbshipit-source-id: b90c7c10e5a51f8d6173ddca673b418e5774c248
2021-06-24 18:08:54 -07:00
9c4eec2a2d Adjust path to distributed cpp tests (#60705)
Summary:
After https://github.com/pytorch/pytorch/issues/60543 they are installed in the same folder as the rest of the tests

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60705

Reviewed By: driazati

Differential Revision: D29380670

Pulled By: malfet

fbshipit-source-id: a432d26c731e9220e00d8c800b1429b37d51655b
2021-06-24 17:42:36 -07:00
8395fdde46 Increase tolerance for some distributed tests to 5e-5 (#60462)
Summary:
On A100 GPUs 10 tests fail due to slightly higher deviations.
This fixes those.

Note that rtol is still the default; atol was increased by a factor of 5 (from 1e-5 to 5e-5).
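
For illustration, the effect of the looser atol (using `torch.testing.assert_close` as a stand-in for the test framework's `assertEqual`):

```python
import torch

a = torch.tensor([1.00000])
b = torch.tensor([1.00004])  # off by ~4e-5: fails at the default atol=1e-5
# Passes once the absolute tolerance is raised to 5e-5:
torch.testing.assert_close(a, b, rtol=1.3e-6, atol=5e-5)
```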

The failing tests were:

- test_accumulate_gradients_module
- test_accumulate_gradients_module_with_grad_is_view
- test_ddp_checkpointing_once
- test_ddp_checkpointing_twice
- test_ddp_checkpointing_unused_params
- test_ddp_checkpointing_weight_sharing
- test_nccl_backend_1gpu_module_device_ids_integer_list
- test_nccl_backend_1gpu_module_device_ids_torch_device_list
- test_nccl_backend_single_device_module_device_ids_None
- test_nccl_backend_single_device_module_empty_device_id

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60462

Reviewed By: albanD

Differential Revision: D29366145

Pulled By: zhaojuanmao

fbshipit-source-id: c3e34c007363dfebf75ccb82004a67e4d2e6f3cd
2021-06-24 17:38:54 -07:00
2fa6c7627e [CUDA graphs][BC-breaking] Removes post-backward syncs on default stream (#60421)
Summary:
Before https://github.com/pytorch/pytorch/pull/57833, calls to backward() or grad() synced only the calling thread's default stream with autograd leaf streams at the end of backward. This made the following weird pattern safe:
```python
with torch.cuda.stream(s):
    # imagine forward used many streams, so backward leaf nodes may run on many streams
    loss.backward()
# no sync
use grads
```

but a more benign-looking pattern was unsafe:
```python
with torch.cuda.stream(s):
    # imagine forward used a lot of streams, so backward leaf nodes may run on many streams
    loss.backward()
    # backward() syncs the default stream with all the leaf streams, but does not sync s with anything,
    # so counterintuitively (even though we're in the same stream context as backward()!)
    # it is NOT SAFE to use grads here, and there's no easy way to make it safe,
    # unless you manually sync on all the streams you used in forward,
    # or move "use grads" back to default stream outside the context.
    use grads
```
mruberry, ngimel, and I decided backward() should have the [same user-facing stream semantics as any cuda op](https://pytorch.org/docs/master/notes/cuda.html#stream-semantics-of-backward-passes).** In other words, the weird pattern should be unsafe, and the benign-looking pattern should be safe. Implementation-wise, this meant backward() should sync its calling thread's current stream, not the default stream, with the leaf streams.
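
Under the new semantics, the benign-looking pattern from above becomes safe (a sketch, in the same pseudocode style as the snippets above):

```python
with torch.cuda.stream(s):
    loss.backward()
    # now SAFE: backward() syncs the current stream (s) with all leaf streams
    use grads
```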

After https://github.com/pytorch/pytorch/pull/57833, backward syncs the calling thread's current stream AND default stream with all leaf streams at the end of backward. The default stream syncs were retained for temporary backward compatibility.

This PR finishes https://github.com/pytorch/pytorch/pull/57833's work by deleting syncs on the default stream.

With this PR, graph-capturing an entire backward() call should be possible (see the [test_graph_grad_scaling diffs](https://github.com/pytorch/pytorch/compare/master...mcarilli:streaming_backwards_remove_default_syncs?expand=1#diff-893b1eea27352f336f4cd832919e48d721e4e90186e63400b8596db6b82e7450R3641-R3642)).

** The first paragraph has a formatting error, which this PR should also fix.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60421

Reviewed By: albanD

Differential Revision: D29370344

Pulled By: ngimel

fbshipit-source-id: 3248bc5fb92fc517db0c15c897e5d7250f67d7fe
2021-06-24 17:34:02 -07:00
d90aefe380 Improve error message for non-differentiable inputs (#60610)
Summary:
Improve the error message when inputs should not requires_grad=True.

For example, we now get
```
RuntimeError: The function 'binary_cross_entropy' is not differentiable with respect to argument 'weight'. This input cannot have requires_grad True.
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60610

Reviewed By: anjali411

Differential Revision: D29361424

Pulled By: albanD

fbshipit-source-id: 38163ce11ae1b8df326424e95ca20e55fea2a99a
2021-06-24 17:29:16 -07:00
4ed2d5d9bb ps sparse rpc (#58003)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58003

- adds trainer class DdpTrainer
- adds trainer class DdpSparseRpcTrainer
- adds server class ParameterServerBase
- adds server class AverageParameterServer
- adds experiment ddp_cpu_sparse_rpc_nccl_allreduce
- adds experiment ddp_cuda_sparse_rpc_nccl_allreduce

quip document https://fb.quip.com/iQUtAeKIxWpF

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D29379696

Pulled By: gcramer23

fbshipit-source-id: 9cf5fb7398ba2fa3eb694afbddc4ed00d97f205f
2021-06-24 17:21:49 -07:00
fadaa52f64 [caffe2] add an EstimateAllBlobSizes operator (#59775)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59775

This operator is similar to `GetAllBlobNames` but also returns the estimated
size required to serialize each node.

One goal of this operator is to allow checkpoint saving logic to estimate the
amount of space/bandwidth required to save a checkpoint when first starting
training, without actually serializing any blobs yet.  Currently the
checkpointing logic uses `GetAllBlobNames` to determine the blobs to
checkpoint.  It can instead be updated to use `EstimateAllBlobSizes` to also
get an estimate for how much space will be required for the checkpoint.
ghstack-source-id: 132275153

Test Plan: Included a new unit test.

Reviewed By: mraway

Differential Revision: D29020227

fbshipit-source-id: 811e5d86c4b59183e84e6424c48c97739be09043
2021-06-24 16:55:22 -07:00
fe4ded01f7 [package] typing.io/re edge case hack (#60666)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60666

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D29367847

Pulled By: Lilyjjo

fbshipit-source-id: 2c38140fbb3eab61ae3de60ab475243f0338c547
2021-06-24 14:53:46 -07:00
375d201086 add BFloat16 support for arange on CPU (#60444)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60444

Reviewed By: VitalyFedyunin

Differential Revision: D29330585

Pulled By: ezyang

fbshipit-source-id: b8a04cee0c3f2ff5544e2b821324ce8fc4e9d0f2
2021-06-24 14:38:47 -07:00
7fc4e67771 ns for fx: fix shadow logger error for resnet18 (#60559)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60559

Adds `resnet18` to the integration test, and fixes the error so that
creating the shadow model works.

Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIsModels.test_resnet18
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D29336236

fbshipit-source-id: 9425aa096162d80ef3a7c98144b2301cfbccc1ea
2021-06-24 13:42:18 -07:00
4ddb2b43b7 ns for fx: expose function to add comparisons between logged values (#60311)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60311

Adds a user facing utility function to FX Numeric Suite Core APIs
for comparing the values extracted by the loggers to each other.
This is needed for any kind of analysis, so it is useful to
provide an example implementation.

Example:

```
// code

m = nn.Sequential(nn.Conv2d(1, 1, 1), nn.Conv2d(1, 1, 1)).eval()
qconfig_dict = {'': torch.quantization.default_qconfig}
mp = torch.quantization.quantize_fx.prepare_fx(m, qconfig_dict)
mq = torch.quantization.quantize_fx.convert_fx(copy.deepcopy(mp))
results = extract_weights('fp32', mp, 'int8', mq)
extend_logger_results_with_comparison(
    results, 'fp32', 'int8', compute_sqnr, 'sqnr_int8_vs_fp32')

print(results)

// results

{
  '_1': {'weight': {
    'fp32': [
      {'type': 'weight', 'values': [tensor([[[[-0.3284]]]])], 'prev_node_name': '_1', 'prev_node_target_type': "<class 'torch.nn.modules.conv.Conv2d'>", 'ref_node_name': '_1', 'index_within_arg': 0, 'index_of_arg': 0}
    ],
    'int8': [
      {'type': 'weight', 'values': [tensor([[[[-0.3297]]]], size=(1, 1, 1, 1), dtype=torch.qint8,
       quantization_scheme=torch.per_tensor_affine, scale=0.002575645223259926,
       zero_point=0)], 'prev_node_name': '_1', 'prev_node_target_type': "<class 'torch.nn.quantized.modules.conv.Conv2d'>", 'ref_node_name': '_1', 'index_within_arg': 0, 'index_of_arg': 0, 'sqnr_int8_vs_fp32': [tensor(48.1308)]}
    ]
  }},
  '_0': {'weight': {
    'fp32': [{'type': 'weight', 'values': [tensor([[[[0.5205]]]])], 'prev_node_name': '_0', 'prev_node_target_type': "<class 'torch.nn.modules.conv.Conv2d'>", 'ref_node_name': '_0', 'index_within_arg': 0, 'index_of_arg': 0}],
    'int8': [{'type': 'weight', 'values': [tensor([[[[0.5184]]]], size=(1, 1, 1, 1), dtype=torch.qint8,
       quantization_scheme=torch.per_tensor_affine, scale=0.004082232713699341,
       zero_point=0)], 'prev_node_name': '_0', 'prev_node_target_type': "<class 'torch.nn.quantized.modules.conv.Conv2d'>", 'ref_node_name': '_0', 'index_within_arg': 0, 'index_of_arg': 0, 'sqnr_int8_vs_fp32': [tensor(48.1309)]}]
  }}
}

```

Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_extend_logger_results_with_comparison
```

Imported from OSS

Reviewed By: hx89

Differential Revision: D29244715

fbshipit-source-id: a5547b449ea54e046c752119559be49bd738beea
2021-06-24 13:42:16 -07:00
31fe1c1323 ns for fx: rekey results by model node names (#60305)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60305

Adjusts the NS for FX weight and activation extraction APIs
to require a model name, and rekeys the results of these APIs
to use the node names of the specified model as layer keys.

For example, before

```
// API call
results = ns.extract_logger_info(
  model_a, model_b, ns.OutputLogger)

// results
{'base_op_1_0': {'node_output':
  {'model_a': [{'ref_node_name': 'linear1', ...}]}}}
```

and after

```
// API call
results = ns.extract_logger_info(
  model_a, model_b, ns.OutputLogger, 'model_b_name')

// results
// note: instead of `base_op_1_0`, the layer is named `linear1`
{'linear1': {'node_output':
  {'model_a': [{'ref_node_name': 'linear1', ...}]}}}
```

Note: we cannot use these names while collecting data because
node names are not guaranteed to be consistent across graphs.
This is why we only rekey as the very last step.

Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_layer_names
```

Imported from OSS

Reviewed By: hx89

Differential Revision: D29243045

fbshipit-source-id: d39ecdfdd18b07291e3ecefed2ede287b100b7d0
2021-06-24 13:41:01 -07:00
0ba4044b9d Increase some tolerances for tf32 for Conv3d tests (#60451)
Summary:
Allow those tests to pass on A100 GPUs which support tf32

Basically follow-up to https://github.com/pytorch/pytorch/pull/52871 which also increased some precisions to 0.05

For reference, these are the failures I see (the only ones in test_nn with 1.9.0):
```
FAIL: test_Conv3d_pad_same_cuda_tf32 (__main__.TestNN)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/tmp/easybuild-tmp/eb-ED4 (1f47a80e88)M3d/tmpqOhUjN/lib/python3.8/site-packages/torch/testing/_internal/common_utils.py", line 1033, in wrapper
    method(*args, **kwargs)
  File "/tmp/easybuild-tmp/eb-ED4 (1f47a80e88)M3d/tmpqOhUjN/lib/python3.8/site-packages/torch/testing/_internal/common_utils.py", line 1033, in wrapper
    method(*args, **kwargs)
  File "test_nn.py", line 11296, in with_tf32_on
    test.test_cuda(self, **kwargs)
  File "/tmp/easybuild-tmp/eb-ED4 (1f47a80e88)M3d/tmpqOhUjN/lib/python3.8/site-packages/torch/testing/_internal/common_nn.py", line 5103, in test_cuda
    test_case.assertEqualIgnoreType(cpu_d_i, gpu_d_i, atol=self.precision, rtol=0)
  File "/tmp/easybuild-tmp/eb-ED4 (1f47a80e88)M3d/tmpqOhUjN/lib/python3.8/site-packages/torch/testing/_internal/common_utils.py", line 1254, in assertEqualIgnoreType
    return self.assertEqual(*args, exact_dtype=False, **kwargs)
  File "/tmp/easybuild-tmp/eb-ED4 (1f47a80e88)M3d/tmpqOhUjN/lib/python3.8/site-packages/torch/testing/_internal/common_utils.py", line 1355, in assertEqual
    super().assertTrue(result, msg=self._get_assert_msg(msg, debug_msg=debug_msg))
AssertionError: False is not true : Tensors failed to compare as equal!With rtol=0 and atol=0.005, found 161 element(s) (out of 288) whose difference(s) exceeded the margin of error (including 0 nan compariso
ns). The greatest difference was 0.032408137116391345 (-33.45570601919647 vs. -33.42329788208008), which occurred at index (2, 0, 0, 1, 0).

======================================================================
FAIL: test_Conv3d_pad_same_dilated_cuda_tf32 (__main__.TestNN)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/tmp/easybuild-tmp/eb-ED4 (1f47a80e88)M3d/tmpqOhUjN/lib/python3.8/site-packages/torch/testing/_internal/common_utils.py", line 1033, in wrapper
    method(*args, **kwargs)
  File "/tmp/easybuild-tmp/eb-ED4 (1f47a80e88)M3d/tmpqOhUjN/lib/python3.8/site-packages/torch/testing/_internal/common_utils.py", line 1033, in wrapper
    method(*args, **kwargs)
  File "test_nn.py", line 11296, in with_tf32_on
    test.test_cuda(self, **kwargs)
  File "/tmp/easybuild-tmp/eb-ED4 (1f47a80e88)M3d/tmpqOhUjN/lib/python3.8/site-packages/torch/testing/_internal/common_nn.py", line 5103, in test_cuda
    test_case.assertEqualIgnoreType(cpu_d_i, gpu_d_i, atol=self.precision, rtol=0)
  File "/tmp/easybuild-tmp/eb-ED4 (1f47a80e88)M3d/tmpqOhUjN/lib/python3.8/site-packages/torch/testing/_internal/common_utils.py", line 1254, in assertEqualIgnoreType
    return self.assertEqual(*args, exact_dtype=False, **kwargs)
  File "/tmp/easybuild-tmp/eb-ED4 (1f47a80e88)M3d/tmpqOhUjN/lib/python3.8/site-packages/torch/testing/_internal/common_utils.py", line 1355, in assertEqual
    super().assertTrue(result, msg=self._get_assert_msg(msg, debug_msg=debug_msg))
AssertionError: False is not true : Tensors failed to compare as equal!With rtol=0 and atol=0.005, found 111 element(s) (out of 288) whose difference(s) exceeded the margin of error (including 0 nan compariso
ns). The greatest difference was 0.024654212557543076 (35.104286017977465 vs. 35.07963180541992), which occurred at index (3, 0, 0, 0, 2).

======================================================================
FAIL: test_Conv3d_pad_valid_cuda_tf32 (__main__.TestNN)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/tmp/easybuild-tmp/eb-ED4 (1f47a80e88)M3d/tmpqOhUjN/lib/python3.8/site-packages/torch/testing/_internal/common_utils.py", line 1033, in wrapper
    method(*args, **kwargs)
  File "/tmp/easybuild-tmp/eb-ED4 (1f47a80e88)M3d/tmpqOhUjN/lib/python3.8/site-packages/torch/testing/_internal/common_utils.py", line 1033, in wrapper
    method(*args, **kwargs)
  File "test_nn.py", line 11296, in with_tf32_on
    test.test_cuda(self, **kwargs)
  File "/tmp/easybuild-tmp/eb-ED4 (1f47a80e88)M3d/tmpqOhUjN/lib/python3.8/site-packages/torch/testing/_internal/common_nn.py", line 5103, in test_cuda
    test_case.assertEqualIgnoreType(cpu_d_i, gpu_d_i, atol=self.precision, rtol=0)
  File "/tmp/easybuild-tmp/eb-ED4 (1f47a80e88)M3d/tmpqOhUjN/lib/python3.8/site-packages/torch/testing/_internal/common_utils.py", line 1254, in assertEqualIgnoreType
    return self.assertEqual(*args, exact_dtype=False, **kwargs)
  File "/tmp/easybuild-tmp/eb-ED4 (1f47a80e88)M3d/tmpqOhUjN/lib/python3.8/site-packages/torch/testing/_internal/common_utils.py", line 1355, in assertEqual
    super().assertTrue(result, msg=self._get_assert_msg(msg, debug_msg=debug_msg))
AssertionError: False is not true : Tensors failed to compare as equal!With rtol=0 and atol=0.005, found 41 element(s) (out of 288) whose difference(s) exceeded the margin of error (including 0 nan comparisons). The greatest difference was 0.010903167642320355 (8.074376869119371 vs. 8.06347370147705), which occurred at index (0, 0, 1, 0, 0).

```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60451

Reviewed By: albanD

Differential Revision: D29353255

Pulled By: ngimel

fbshipit-source-id: 155a02242be5a11dcbd9dd40ab63f15c6757ae1b
2021-06-24 13:36:27 -07:00
a3ebc40bab Update intro doc for derivatives.yaml (#60614)
Summary:
Clarify some phrasing and document the findings on the different non differentiable states.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60614

Reviewed By: anjali411

Differential Revision: D29362740

Pulled By: albanD

fbshipit-source-id: 5bc2e8b8dde57ba5a9247d7c28b83c793703e35f
2021-06-24 13:20:40 -07:00
48509b1a9b Add exclusion list to _check_kernel_launches.py (#60562)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60562

Test Plan:
```
buck test //caffe2/test:kernel_launch_checks
```

Reviewed By: ngimel

Differential Revision: D29336561

fbshipit-source-id: 0cc101143d24e887e852bd6a9ab34ac43155eb63
2021-06-24 13:18:07 -07:00
a016150163 Move torch/lib/c10d to torch/csrc/distributed/c10d (#60543)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60543

Now that c10d is part of libtorch, it would also be nice if the sources all lived in one place.
ghstack-source-id: 132306292

Test Plan: It builds

Reviewed By: cbalioglu

Differential Revision: D29062002

fbshipit-source-id: d9e1301e9d73e1643fa0f0119cd2d618f1ad52e6
2021-06-24 12:38:51 -07:00
b8d7db3b31 Turn default kernels into Meyer singletons (#60568)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60568

https://github.com/pytorch/pytorch/pull/58661 induced a static
initialization order fiasco as flagged by ASAN strict_init_order=true.
On further inspection, it became clear that it was not necessary for
these to actually be globals initialized at module load time; so
I converted them into Meyer singletons which ensures they get loaded
immediately when another compilation unit requests them.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D29338019

Pulled By: ezyang

fbshipit-source-id: 282846118df6867277404a1830d0ce39fccaa769
2021-06-24 12:30:26 -07:00
4c00df12ec Include full Python version in collect_env.py output (#59632)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59632

Before:

```
Python version: 3.7 (64-bit runtime)
```

After:

```
Python version: 3.7.7 (default, Mar 23 2020, 17:31:31)  [Clang 4.0.1 (tags/RELEASE_401/final)] (64-bit runtime)
```

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D28961500

Pulled By: ezyang

fbshipit-source-id: 0f95a49abf6977941f09a64243916576a820679f
2021-06-24 12:11:01 -07:00
d52ef2497a Python basic module execution unit test on delegation of backend_with_compiler_demo (#60468)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60468

Added a unit test for the execution of a basic module with a compiler
ghstack-source-id: 132307488

Test Plan:
Running python test/test_jit.py TestBackendsWithCompiler -v returns a successful test

Imported from OSS

Reviewed By: iseeyuan

Differential Revision: D29306225

fbshipit-source-id: bf1ff075ebc63acbbe46d6ea030086405e29d7d3
2021-06-24 11:43:45 -07:00
b7298f499d Annotate NoneType as Optional[type] (#60383)
Summary:
------------
Infer NoneType as Optional[torch.Tensor] for monkeytype type inference

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60383

Test Plan:
------
python test/test_jit.py -k TestPDT.test_nonetype_as_optional_of_type

Reviewed By: gmagogsfm

Differential Revision: D29341513

Pulled By: nikithamalgifb

fbshipit-source-id: 9a96670cb5cf2560cd4e19962faef5fecea8b24a
2021-06-24 11:00:26 -07:00
5a077bb10b Optimize some reduction operators on CPU BFloat16 (#55202)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55202

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D28836790

Pulled By: VitalyFedyunin

fbshipit-source-id: f3a29633d85eb5a614652e568140e9b19509f959
2021-06-24 10:50:24 -07:00
4aff267072 Fix Windows error in distributed (#60167)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60167

We were getting errors such as this on Windows in our c10d ProcessGroup test suite:
```
  test_send_recv_all_to_all (__main__.ProcessGroupGlooTest) ... Exception in thread Thread-1:
Traceback (most recent call last):
  File "C:\Jenkins\Miniconda3\lib\threading.py", line 932, in _bootstrap_inner
    self.run()
  File "C:\Jenkins\Miniconda3\lib\threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\circleci\project\build\win_tmp\build\torch\testing\_internal\common_distributed.py", line 471, in _event_listener
    if pipe.poll(None):
  File "C:\Jenkins\Miniconda3\lib\multiprocessing\connection.py", line 257, in poll
    return self._poll(timeout)
  File "C:\Jenkins\Miniconda3\lib\multiprocessing\connection.py", line 330, in _poll
    return bool(wait([self], timeout))
  File "C:\Jenkins\Miniconda3\lib\multiprocessing\connection.py", line 883, in wait
    ov.cancel()
OSError: [WinError 6] The handle is invalid
Fatal Python error: could not acquire lock for <_io.BufferedWriter name='<stderr>'> at interpreter shutdown, possibly due to daemon threads
Python runtime state: finalizing (tstate=000001EFDF228CE0)

Thread 0x00001f68 (most recent call first):
  File "C:\Jenkins\Miniconda3\lib\threading.py", line 1202 in invoke_excepthook
  File "C:\Jenkins\Miniconda3\lib\threading.py", line 934 in _bootstrap_inner
  File "C:\Jenkins\Miniconda3\lib\threading.py", line 890 in _bootstrap

Current thread 0x00000f94 (most recent call first):
<no Python frame>
FAIL (5.009s)
```
And the process would then exit with error code 3221226505.
See: https://app.circleci.com/pipelines/github/pytorch/pytorch/337351/workflows/ad919a3e-fe9a-4566-8ad6-8b0a252f730c/jobs/14170191/steps

By looking at [the code of `_event_listener` in `common_distributed.py`](eb36f67dcc/torch/testing/_internal/common_distributed.py (L467-L489)) I think that the first exception (the one about the handle being invalid) is "expected" as it results from another thread purposely closing the pipe while that thread is polling it.

The relevant part of the problem seems to be the "could not acquire lock" one. I think this stems from the event listener thread being launched as a daemon thread, which means the interpreter will not wait for that thread to complete before shutting down. When the interpreter shuts down it instantly aborts all other threads. If the event listener thread was aborted _while_ it was logging to stdout, then that thread was holding the lock but never got to release it. This is probably what the error is complaining about. This seems to be intended/expected behavior for CPython: https://bugs.python.org/issue42717.

The solution is thus simple: don't make that thread a daemon thread, and explicitly wait for it to terminate before shutting down.
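
A minimal sketch of that idea (illustrative, not the actual test-suite code):

```python
import multiprocessing
import threading

def event_listener(pipe):
    # Block until a message arrives; a None sentinel requests shutdown.
    while True:
        if pipe.poll(None) and pipe.recv() is None:
            return

parent_conn, child_conn = multiprocessing.Pipe()
listener = threading.Thread(target=event_listener, args=(parent_conn,),
                            daemon=False)  # NOT a daemon thread
listener.start()
child_conn.send(None)  # ask the listener to exit...
listener.join()        # ...and wait for it before interpreter shutdown
```
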
ghstack-source-id: 132293710

Test Plan: Will see...

Reviewed By: pritamdamania87

Differential Revision: D29193014

fbshipit-source-id: 4aabe1fc74bf9c54ca605e7a702ac99655489780
2021-06-24 10:35:38 -07:00
f2f2f5bf20 .github: Zip test reports before uploading (#60475)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60475

Uploading many artifacts can cause issues with GHA backend leading to
errors on our side. To be safe let's zip our artifacts into one archive
so that we avoid uploading too many files at once.

See: https://github.com/actions/upload-artifact#too-many-uploads-resulting-in-429-responses

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: samestep

Differential Revision: D29307205

Pulled By: seemethere

fbshipit-source-id: da8c9957f88bdcc758969157ee696205db5d4dff
2021-06-24 10:30:51 -07:00
7e619b9588 First step to rearrange files in tools folder (#60473)
Summary:
Changes including:
- introduced `linter/`, `testing/`, `stats/` folders in `tools/`
- move appropriate scripts into these folders
- change grepped references in the pytorch/pytorch repo

Next step
- introduce `build/` folder for build scripts

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60473

Test Plan:
- CI (this is important b/c pytorch/test-infra also relies on some script references)
- tools/tests/

Reviewed By: albanD

Differential Revision: D29352716

Pulled By: walterddr

fbshipit-source-id: bad40b5ce130b35dfd9e59b8af34f9025f3285fd
2021-06-24 10:13:58 -07:00
40d2fe1053 correct filename issue for test_cpp_extensions_aot (#60604)
Summary:
Uses a file copy to create actual ninja- vs. no_ninja-suffixed Python test files.
This tricks xmlrunner into reporting test cases in the correct folder.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60604

Test Plan:
- CI reports correctly into the corresponding folders
- If download the test statistics, calculate shards now doesn't need custom logic to handle `test_cpp_extensions_aot`

CI result shown it is working properly:
https://github.com/pytorch/pytorch/pull/60604/checks?check_run_id=2900038654 vs
https://github.com/pytorch/pytorch/pull/60604/checks?check_run_id=2900038673

Reviewed By: albanD

Differential Revision: D29349562

Pulled By: walterddr

fbshipit-source-id: e86e6bc0db288a2a57bea3c5f8edf03be1773944
2021-06-24 09:20:19 -07:00
9cab894367 Fix build_only for libtorch (#60615)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/60605

We have `build_only` defined, but config.yml doesn't have the parameter; this PR fixes that. As a result, the docker image push will be skipped.

```
// in config.yml

if [ -z "${BUILD_ONLY}" ]; then
```

```
            ("11.1", [
                ("3.8", [
                    ("shard_test", [XImportant(True)]),
                    ("libtorch", [
                        (True, [
                            ('build_only', [X(True)]),
                        ]),
                    ]),
                ]),
            ]),
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60615

Reviewed By: albanD

Differential Revision: D29351567

Pulled By: zhouzhuojie

fbshipit-source-id: dab78bb91f62e8bed47739377987167fea1602cb
2021-06-24 09:11:54 -07:00
eddc5f40f9 Added GLU and FeatureAlphaDropout to nn docs (#60590)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/60563 and https://github.com/pytorch/pytorch/issues/60570

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60590

Reviewed By: albanD

Differential Revision: D29352372

Pulled By: jbschlosser

fbshipit-source-id: f81dd65deab1848a68dc202df252c416ce5214d0
2021-06-24 08:00:18 -07:00
204da12592 Reduce number of CEX when passing Tensors to Python (#60546)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60546

Before, we assumed conservatively that any Tensor passed to
THPVariable_Wrap could be aliased in another thread and therefore race.
However, THPVariable_Wrap takes its Variable by value, so if
use_count() <= 1 it is impossible for another thread to have a
reference to it.  So we can conclude that it is definitely uninitialized
if the quick test fails!

Thanks bdhirsh for pointing out the optimization opportunity here.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D29331718

Pulled By: ezyang

fbshipit-source-id: e100796fbc55a0af2c6565c6fbc9ddc8ae7ceb42
2021-06-24 07:40:39 -07:00
bdb964f89f Support RRefs that contain threading.Locks (#57943)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57943

This is a common scenario (our own tutorials propose it), hence we should ensure it works.

A more generic solution is desirable, but this should fix the immediate concern.
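
A sketch of the now-supported pattern (the worker name is hypothetical; assumes an initialized RPC agent):

```python
import threading
import torch.distributed.rpc as rpc

class ParameterServer:
    def __init__(self):
        # Holding a threading.Lock inside an object behind an RRef now works.
        self.lock = threading.Lock()

# With RPC initialized on a worker named "ps" (hypothetical):
# ps_rref = rpc.remote("ps", ParameterServer)
```
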
ghstack-source-id: 132289683

Test Plan: Added a test

Reviewed By: mrshenli

Differential Revision: D28316076

fbshipit-source-id: 64e9766189f40474298876227ea247ce5b699d97
2021-06-24 06:36:09 -07:00
4e347f1242 [docs] Fix backticks in docs (#60474)
Summary:
There is a very common error when writing docs: One forgets to write a matching `` ` ``, and something like ``:attr:`x`` is rendered in the docs. This PR fixes most (all?) of these errors (and a few others).

I found these running ``grep -r ">[^#<][^<]*\`"`` on the `docs/build/html/generated` folder. The regex finds an HTML tag that does not start with `#` (as python comments in example code may contain backticks) and that contains a backtick in the rendered HTML.

This regex has not given any false positive in the current codebase, so I am inclined to suggest that we should add this check to the CI. Would this be possible / reasonable / easy to do malfet ?

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60474

Reviewed By: mrshenli

Differential Revision: D29309633

Pulled By: albanD

fbshipit-source-id: 9621e0e9f87590cea060dd084fa367442b6bd046
2021-06-24 06:27:41 -07:00
bb9e1150ea Revert D29342234: [pytorch][PR] [CUDA graphs][BC-breaking] Removes post-backward syncs on default stream
Test Plan: revert-hammer

Differential Revision:
D29342234

Original commit changeset: 98e6be7fdd85

fbshipit-source-id: 84022973248b2254210eee57402df2c4f4bc43c6
2021-06-24 04:49:28 -07:00
2b72068a68 Make Future store Storages instead of references to DataPtrs (#60470)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60470

A Future needs to know what DataPtrs are used by its value, but it isn't always able to extract them (and even when it is, that's expensive), so they're cached. DataPtrs are kinda like unique_ptrs (movable only, cannot be copied), hence the Future can only hold _references_ to them. The Future's value, however, is unfortunately mutable (we wish that weren't the case, but we don't think we can prevent it), which means the tensor/storage that owned a DataPtr might be deleted and thus the DataPtr could be freed. This means our cached reference becomes stale, which leads to all kinds of disasters, like reading garbage data or segfaulting.

Luckily all the DataPtrs we were dealing with were held inside Storages, which have shared_ptr semantics, allowing us to hold a strong pointer to them that ensures they're kept alive.

ghstack-source-id: 132177396

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D29303570

fbshipit-source-id: d814754806fa58b24e45269e97d768485ef972ba
2021-06-24 03:56:04 -07:00
06e6d63187 Use a no-warning registry for TensorPipe backends (#60457)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60457

The "without warning" variants of the registry were introduced in https://github.com/pytorch/pytorch/pull/31126 to be used in Gloo for the exact same reason: we use a registry precisely so that backends can be overridden, no need to scare users with a warning.
ghstack-source-id: 132051268

Test Plan: Rebuilt and re-run

Reviewed By: mrshenli

Differential Revision: D29293840

fbshipit-source-id: 3450e547056b2c534166972e8266dab5479d5e43
2021-06-24 03:27:04 -07:00
d3a8505ee1 [jit] Added a pass to transform aten::cat ops to prim::Concat op with variable number of inputs (#59881)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59881

This pass is not included in the JIT flow or anywhere else at this point. The idea is that, once this lands, everyone can use it to test their workflow with this transformation, and once we are convinced it is useful and/or improves performance, we can include it in the appropriate workflow.

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D29277876

Pulled By: navahgar

fbshipit-source-id: b5be7bdcc98dced59295bd7b8f6627619cb58d41
2021-06-24 01:27:41 -07:00
c35a3dd6f2 [jit] Added a new operator for concat that takes in variadic parameters (#59880)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59880

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D29277877

Pulled By: navahgar

fbshipit-source-id: 6db24e7432f683a1d1466f9778201e0aa5d3b1ad
2021-06-24 01:26:22 -07:00
dfd2edc025 [special] add zeta (#59623)
Summary:
Reference https://github.com/pytorch/pytorch/issues/50345

`zeta` was already present in the codebase to support computation of `polygamma`.

However, `zeta` only had a `double(double, double)` signature **for CPU** before this PR (which meant that `polygamma` computations were always upcast to `double` for the zeta part).

With this PR, float computations will take place in float and double in double.

I have also refactored the code and moved the duplicated code from `Math.cuh` to `Math.h`.

**Note**: For scipy, q is optional, and if it is `None` it defaults to `1`, which corresponds to the Riemann zeta function. However, for `torch.special.zeta` I made it mandatory, because to me it feels odd that without `q` this is the Riemann zeta and with `q` it is the general Hurwitz zeta. I think sticking to just the general form made more sense, as passing `1` for q is trivial.
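
For reference, the Hurwitz zeta function generalizes the Riemann zeta function, which is recovered at q = 1:

```latex
\zeta(s, q) = \sum_{n=0}^{\infty} \frac{1}{(n + q)^{s}}, \qquad \zeta(s) = \zeta(s, 1)
```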

Verify:
* [x] Docs https://14234587-65600975-gh.circle-artifacts.com/0/docs/special.html#torch.special.zeta

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59623

Reviewed By: ngimel

Differential Revision: D29348269

Pulled By: mruberry

fbshipit-source-id: a3f9ebe1f7724dbe66de2b391afb9da1cfc3e4bb
2021-06-24 00:00:12 -07:00
26cdec6ce4 Support torch.bitwise_{left/right}_shift and __rlshift__, __rrshift__ (#59544)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/58121

This PR implements `torch.bitwise_left_shift`, `torch.bitwise_right_shift`, and `torch.Tensor.{__rlshift__/__rrshift__}` for compatibility with the Python array API standard.
(cc: mruberry, rgommers, emcastillo, kmaehashi)
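
A quick usage sketch of the new ops:

```python
import torch

a = torch.tensor([1, 2, 4])
print(torch.bitwise_left_shift(a, 1))   # tensor([2, 4, 8])
print(torch.bitwise_right_shift(a, 1))  # tensor([0, 1, 2])
print(2 >> torch.tensor([1]))           # via Tensor.__rrshift__ -> tensor([1])
```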

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59544

Reviewed By: ngimel

Differential Revision: D29348869

Pulled By: mruberry

fbshipit-source-id: 329aee296cf890735e8a9f858bccfe87c03d06ca
2021-06-23 23:57:16 -07:00
b82453cbd4 Run dist_autograd backward RPCs on appropriate CUDA streams. (#60606)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60606

TensorPipe receives tensors over the wire on custom streams and these
streams are passed to some RPC callbacks but not to `BACKWARD_AUTOGRAD_REQ`. As a
result, `BACKWARD_AUTOGRAD_REQ` ran on the default stream while still using
tensors from the custom stream. This resulted in downstream autograd operations
running on the incorrect stream.

To fix this, I've passed the streams to `BACKWARD_AUTOGRAD_REQ` as well and
added an appropriate guard.

Closes: https://github.com/pytorch/pytorch/issues/59793
ghstack-source-id: 132252069

Test Plan: Test https://github.com/pytorch/pytorch/issues/59793

Reviewed By: mrshenli

Differential Revision: D29347244

fbshipit-source-id: 8ff8b150763c970ab15c2cac8dccf56e66e9ef5d
2021-06-23 23:52:22 -07:00
675cea1adb [CUDA graphs][BC-breaking] Removes post-backward syncs on default stream (#60421)
Summary:
Before https://github.com/pytorch/pytorch/pull/57833, calls to backward() or grad() synced only the calling thread's default stream with autograd leaf streams at the end of backward. This made the following weird pattern safe:
```python
with torch.cuda.stream(s):
    # imagine forward used many streams, so backward leaf nodes may run on many streams
    loss.backward()
# no sync
use grads
```

but a more benign-looking pattern was unsafe:
```python
with torch.cuda.stream(s):
    # imagine forward used a lot of streams, so backward leaf nodes may run on many streams
    loss.backward()
    # backward() syncs the default stream with all the leaf streams, but does not sync s with anything,
    # so counterintuitively (even though we're in the same stream context as backward()!)
    # it is NOT SAFE to use grads here, and there's no easy way to make it safe,
    # unless you manually sync on all the streams you used in forward,
    # or move "use grads" back to default stream outside the context.
    use grads
```
mruberry, ngimel, and I decided backward() should have the [same user-facing stream semantics as any cuda op](https://pytorch.org/docs/master/notes/cuda.html#stream-semantics-of-backward-passes).** In other words, the weird pattern should be unsafe, and the benign-looking pattern should be safe. Implementation-wise, this meant backward() should sync its calling thread's current stream, not the default stream, with the leaf streams.

After https://github.com/pytorch/pytorch/pull/57833, backward syncs the calling thread's current stream AND default stream with all leaf streams at the end of backward. The default stream syncs were retained for temporary backward compatibility.

This PR finishes https://github.com/pytorch/pytorch/pull/57833's work by deleting syncs on the default stream.

With this PR, graph-capturing an entire backward() call should be possible (see the [test_graph_grad_scaling diffs](https://github.com/pytorch/pytorch/compare/master...mcarilli:streaming_backwards_remove_default_syncs?expand=1#diff-893b1eea27352f336f4cd832919e48d721e4e90186e63400b8596db6b82e7450R3641-R3642)).

** The first paragraph has a formatting error, which this PR should also fix.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60421

Reviewed By: VitalyFedyunin, albanD

Differential Revision: D29342234

Pulled By: ngimel

fbshipit-source-id: 98e6be7fdd8550872f0a78f9a66cb8dfe75abf63
2021-06-23 23:35:24 -07:00
00896cb9ed [caffe2] update db::Transaction::Put() to accept the value by rvalue reference (#60208)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60208

Update the DB APIs so that `db::Transaction::Put()` accepts the value by
rvalue reference.  This allows DB implementations to write data asynchronously
without being forced to make an additional copy of the data in memory.
`Put()` implementations can now use the string move constructor or assignment
operator to get the string data and continue performing the write
asynchronously after returning from `Put()`.

Note that I chose to entirely replace the existing `Put()`, removing the
ability for callers to call `Put()` with a `const std::string&` argument for
the value, rather than simply adding another overloaded version of `Put()`.

This was done because in practice there were no call sites using `Put()` that
cannot move their data in.  Eliminating the `const std::string&` API entirely
simplifies the DB implementations: DBs that wish to support move semantics do
not have to implement both the move and the copy versions of `Put()`.

Test Plan:
Searched through fbcode to try and make sure I found all `db::Transaction`
subclasses, and will check sandcastle results to help confirm.

Ran the modelstore checkpointing unit tests.

Differential Revision: D29204425

fbshipit-source-id: 28be6646e92e5df71954d4bb3dc0c8add30ed041
2021-06-23 22:12:53 -07:00
b09c0b6550 [caffe2] update the BlobSerializer acceptor to allow moving in the data (#60207)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60207

Update the `BlobSerializerBase` API so that the serizialized blob data is
passed as a `std::string&&` rather than `const std::string&`.  This allows the
acceptor to take ownership of the string data.  This allows the acceptor to do
things like queue it for storing asynchronously, rather than having to make a
copy of the data if they need it to remain valid after returning.

All existing `BlobSerializerBase` implementations already pass in a valid
rvalue reference to the data, so this change did not require updating any of
the existing serializer implementations.
ghstack-source-id: 132216750

Test Plan:
Examined all ~46 `BlobSerializerBase` subclasses in fbsource to confirm they
already pass in an rvalue reference for this argument.  Also searched for
`BlobSerializerBase` on google and did not find any external references to
this class in other open source projects that might be affected.

Differential Revision: D29204426

fbshipit-source-id: b1d567e52a5c17a01d651c70bbfa2fddbaea6cd9
2021-06-23 22:11:42 -07:00
6ea22672c4 add support for sparse tensors in torch.testing.assert_close (#58844)
Summary:
This adds support for sparse tensors the same way `torch.testing._internal.common_utils.TestCase.assertEqual` does:

5c7dace309/torch/testing/_internal/common_utils.py (L1287-L1313)

- Tensors are coalesced before comparison.
- Indices and values are compared individually.
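
A small illustration of that behavior (sketch):

```python
import torch

i = torch.tensor([[0, 1, 0], [1, 0, 1]])
v = torch.tensor([1.0, 2.0, 3.0])
a = torch.sparse_coo_tensor(i, v, (2, 2))  # holds a duplicate index (0, 1)
b = a.coalesce()                           # duplicates summed: (0, 1) -> 4.0
torch.testing.assert_close(a, b)           # passes: a is coalesced first
```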

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58844

Reviewed By: zou3519

Differential Revision: D29160250

Pulled By: mruberry

fbshipit-source-id: b0955656c2c7ff3db37a1367427ca54ca14f2e87
2021-06-23 21:59:01 -07:00
80f40b172f [Model Averaging] Periodic model averager (#60320)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60320

This averager can be used for post-local SGD.
ghstack-source-id: 131908011

Test Plan: buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_periodic_model_averager

Reviewed By: rohan-varma

Differential Revision: D29249850

fbshipit-source-id: 09675d6bb1edfb8ffbeb94510d91962532d8ca3e
2021-06-23 20:23:04 -07:00
4e51503b1f DOC Improves input and target docstring for loss functions (#60553)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/56581

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60553

Reviewed By: VitalyFedyunin

Differential Revision: D29343797

Pulled By: jbschlosser

fbshipit-source-id: cafc29d60a204a21deff56dd4900157d2adbd91e
2021-06-23 20:20:29 -07:00
6d1b4642f0 DOC Describes parameters/buffers registered as None in load_state_dict (#60549)
Summary:
Related to https://github.com/pytorch/pytorch/issues/8104

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60549

Reviewed By: VitalyFedyunin

Differential Revision: D29343732

Pulled By: jbschlosser

fbshipit-source-id: ef5ba3094c8eaf2f9c8efeba6a9d9ab52ebf8b2c
2021-06-23 20:15:22 -07:00
1e31d26b1d [Static Runtime] Fix bugs in static_runtime::to_copy (#60503)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60503

Fixed a few issues in the static_runtime::to_copy impl:
- fixed a bug with memory_format
- copy strides when appropriate. This is necessary to make sure that the fbgemm path in the copy kernel gets hit.
- fix the schema in the `ReplaceWithCopy` pass
- add registration of `static_runtime::to_copy.other`

Add more unit tests:
- test dynamic shapes
- test strided input tensor to `aten::to`
- test alias case (same input/output)
- test `to.other`

Reviewed By: ajyu

Differential Revision: D26838933

fbshipit-source-id: ec0d1a2deebe998fcfe8858e772e1ef429cb4522
2021-06-23 19:57:17 -07:00
d200e9de26 [Static Runtime] Test for dynamic shapes in SR unit tests (#60579)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60579

- Modify testStaticRuntime to take two sets of inputs so that, if the second set of inputs has bigger shapes, it triggers memory allocations in resize_ calls.
- Modify test scripts so that the output of the test op is managed by the memory planner, as explained in comments.

Reviewed By: ajyu

Differential Revision: D29221452

fbshipit-source-id: 09f0f7eb384dc8ca67594f1fa76e1e31392ee6ca
2021-06-23 19:56:05 -07:00
99b641169b Migrates nll_loss_forward from TH to Aten (CUDA) (#60097)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/24610
Aten Umbrella issue https://github.com/pytorch/pytorch/issues/24507
Related to https://github.com/pytorch/pytorch/issues/59765

The performance does not change between this PR and master with the following benchmark script:

<details>
 <summary>Benchmark script</summary>

```python
import torch
import torch.nn as nn
import time

torch.manual_seed(0)

def _time():
    torch.cuda.synchronize()
    MS_PER_SECOND = 1000
    return time.perf_counter() * MS_PER_SECOND

device = "cuda"
C = 30
softmax = nn.LogSoftmax(dim=1)
n_runs = 250

for reduction in ["none", "mean", "sum"]:
    for N in [100_000, 500_000, 1_000_000]:
        fwd_t = 0
        bwd_t = 0
        data = torch.randn(N, C, device=device)
        target = torch.empty(N, dtype=torch.long, device=device).random_(0, C)
        loss = nn.NLLLoss(reduction=reduction)
        input = softmax(data)

        for i in range(n_runs):
            t1 = _time()
            result = loss(input, target)
            t2 = _time()
            fwd_t = fwd_t + (t2 - t1)
        fwd_avg = fwd_t / n_runs
        print(
            f"input size({N}, {C}), reduction: {reduction} "
            f"forward time is {fwd_avg:.2f} (ms)"
        )
    print()
```

</details>

## master

```
input size(100000, 30), reduction: none forward time is 0.02 (ms)
input size(500000, 30), reduction: none forward time is 0.08 (ms)
input size(1000000, 30), reduction: none forward time is 0.15 (ms)

input size(100000, 30), reduction: mean forward time is 1.81 (ms)
input size(500000, 30), reduction: mean forward time is 8.24 (ms)
input size(1000000, 30), reduction: mean forward time is 16.46 (ms)

input size(100000, 30), reduction: sum forward time is 1.66 (ms)
input size(500000, 30), reduction: sum forward time is 8.24 (ms)
input size(1000000, 30), reduction: sum forward time is 16.46 (ms)
```

## this PR

```
input size(100000, 30), reduction: none forward time is 0.02 (ms)
input size(500000, 30), reduction: none forward time is 0.08 (ms)
input size(1000000, 30), reduction: none forward time is 0.15 (ms)

input size(100000, 30), reduction: mean forward time is 1.80 (ms)
input size(500000, 30), reduction: mean forward time is 8.24 (ms)
input size(1000000, 30), reduction: mean forward time is 16.46 (ms)

input size(100000, 30), reduction: sum forward time is 1.66 (ms)
input size(500000, 30), reduction: sum forward time is 8.24 (ms)
input size(1000000, 30), reduction: sum forward time is 16.46 (ms)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60097

Reviewed By: mrshenli

Differential Revision: D29303099

Pulled By: ngimel

fbshipit-source-id: fc0d636543a79ea81158d286dcfb84043bec079a
2021-06-23 19:47:01 -07:00
ef84bcfee6 Convert floating-point constants to T in Bessel functions (#59416)
Summary:
If T is float, many of the computations are more expensive than
expected. Compilers may be reluctant to optimize because such optimizations
often lead to a different outcome. Convert many constants to T before using
them to clear any doubt.

Benchmark: (Debian 11, no turbo, Release build, Intel(R) Xeon(R) E-2136 CPU @ 3.30GHz, gcc 10.2.1)

```python
import timeit
for dtype in ('torch.float',):
    for func in ('i0', 'i0e', 'i1', 'i1e'):
        for n, t in [(10_000, 10000),
                    (100_000, 1000)]:
            print(f'torch.special.{func}(torch.arange(n, dtype=torch.float32)), n = {n} for {t} times, dtype={dtype}')
            print(timeit.timeit(f'torch.special.{func}(a)', setup=f'import torch; a = torch.arange({n}, dtype=torch.float32)', number=t))
```

Before:

```
torch.special.i0(torch.arange(n, dtype=torch.float32)), n = 10000 for 10000 times, dtype=torch.float
1.539132010017056
torch.special.i0(torch.arange(n, dtype=torch.float32)), n = 100000 for 1000 times, dtype=torch.float
0.9613071230123751
torch.special.i0e(torch.arange(n, dtype=torch.float32)), n = 10000 for 10000 times, dtype=torch.float
4.32450835997588
torch.special.i0e(torch.arange(n, dtype=torch.float32)), n = 100000 for 1000 times, dtype=torch.float
1.5751779029960744
torch.special.i1(torch.arange(n, dtype=torch.float32)), n = 10000 for 10000 times, dtype=torch.float
1.0810036820184905
torch.special.i1(torch.arange(n, dtype=torch.float32)), n = 100000 for 1000 times, dtype=torch.float
0.5314770240220241
torch.special.i1e(torch.arange(n, dtype=torch.float32)), n = 10000 for 10000 times, dtype=torch.float
0.41711462699458934
torch.special.i1e(torch.arange(n, dtype=torch.float32)), n = 100000 for 1000 times, dtype=torch.float
0.1759720179834403
```

After:

```
torch.special.i0(torch.arange(n, dtype=torch.float32)), n = 10000 for 10000 times, dtype=torch.float
1.337154256994836
torch.special.i0(torch.arange(n, dtype=torch.float32)), n = 100000 for 1000 times, dtype=torch.float
0.8640981369826477
torch.special.i0e(torch.arange(n, dtype=torch.float32)), n = 10000 for 10000 times, dtype=torch.float
4.308618158014724
torch.special.i0e(torch.arange(n, dtype=torch.float32)), n = 100000 for 1000 times, dtype=torch.float
1.5217605629877653
torch.special.i1(torch.arange(n, dtype=torch.float32)), n = 10000 for 10000 times, dtype=torch.float
0.9398589830088895
torch.special.i1(torch.arange(n, dtype=torch.float32)), n = 100000 for 1000 times, dtype=torch.float
0.4667845010117162
torch.special.i1e(torch.arange(n, dtype=torch.float32)), n = 10000 for 10000 times, dtype=torch.float
0.3658539849857334
torch.special.i1e(torch.arange(n, dtype=torch.float32)), n = 100000 for 1000 times, dtype=torch.float
0.15680673700990155
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59416

Reviewed By: anjali411

Differential Revision: D29249897

Pulled By: mruberry

fbshipit-source-id: c170e78f2ab47176ea95b8442c6279d7ec1d75c2
2021-06-23 19:43:27 -07:00
08020220f3 [Testing] Adding reference tests to OpInfo class (#59369)
Summary:
This PR will ideally add a `ref` argument to the `OpInfo` base class. The idea is to add reference checks for all the _eligible_ ops. For more discussion, please check https://github.com/pytorch/pytorch/issues/58294

* [x] Migrate (but not removing yet) and modify helper functions from `UnaryUfuncOpInfo` class to `OpInfo` base class.
* [x] Test the reference checks for multiple ops. (also decide a list of different and eligible ops for this)
* [x] Handle possible edge cases (for example: `uint64` isn't implemented in PyTorch but is there in NumPy, and this needs to be handled -- more on this later) -- _Update_: We decided that these reference tests should only test for values and not types.
* [x] Create a sample PR for a single (of all different categories?) on adding reference functions to the eligible ops. -- _Update_: This is being done in this PR only.
* [x] ~Remove reference tests from `test_unary_ufuncs.py` and test to make sure that nothing breaks.~ (*Update*: We won't be touching Unary Ufunc reference tests in this PR)
* [x] Add comments, remove unnecessary prints/comments (added for debugging).

Note: To keep the PR description short, examples of edge cases encountered have been mentioned in the comments below.

cc: mruberry pmeier kshitij12345

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59369

Reviewed By: ngimel

Differential Revision: D29347252

Pulled By: mruberry

fbshipit-source-id: 69719deddb1d23c53db45287a7e66c1bfe7e65bb
2021-06-23 19:26:08 -07:00
236d3afd82 manual revert of 57575 (#60572)
Summary:
manually reverting 57575 while keeping 57574 since it's fixing a bug: https://github.com/pytorch/pytorch/issues/55609
Sandcastle couldn't do it automatically

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60572

Reviewed By: driazati

Differential Revision: D29342473

Pulled By: Krovatkin

fbshipit-source-id: 66ad7d316984a13d203158ceba9706a5f451f9b2
2021-06-23 19:21:48 -07:00
9e773ea7d5 Use accscalar_t for CUDA add/sub with Tensor and Scalar (#60454)
Summary:
Follow up of https://github.com/pytorch/pytorch/issues/60227, related to https://github.com/pytorch/pytorch/issues/59907 & https://github.com/pytorch/pytorch/issues/58833

With this pull request, `torch.add` & `torch.sub` use `acc_type` for `Scalar` if either of two arguments is `Scalar`.
This mimics the behavior of [`torch.mul`](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/cuda/BinaryMulDivKernel.cu#L18), `torch._foreach_(add|sub).Scalar` and `torch._foreach_(add|sub).ScalarList`.
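
Illustratively, the Python-level call is unchanged; only the precision of the internal scalar math moves (a minimal sketch, assuming a CUDA device):

```python
import torch

x = torch.randn(1024, dtype=torch.float16, device="cuda")
# The 0.1 scalar now participates in the CUDA kernel at accscalar_t
# (float32) precision instead of scalar_t (float16).
y = torch.add(x, 0.1)
z = torch.sub(x, 0.1)
```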

 ---

**reference**
- torch.mul CUDA kernel: b0c9762e2d/aten/src/ATen/native/cuda/BinaryMulDivKernel.cu (L17-L25)
- `torch._foreach_(add|sub).Scalar`: cast scalar b0c9762e2d/aten/src/ATen/native/cuda/ForeachBinaryOpScalar.cu (L27)
- `torch._foreach_(add|sub).ScalarList`: `BinaryOpScalarListFunctor` b0c9762e2d/aten/src/ATen/native/cuda/ForeachFunctors.cuh (L180-L182), and multi_tensor_apply handles `scalar_t` and computes in `opmath_t` (almost equivalent to `accscalar_t`) b0c9762e2d/aten/src/ATen/native/cuda/MultiTensorApply.cuh (L60-L68). `BinaryOpScalarListFunctor` is used in b0c9762e2d/aten/src/ATen/native/cuda/ForeachBinaryOpScalarList.cu (L24)

cc ngimel ptrblck mcarilli

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60454

Reviewed By: VitalyFedyunin

Differential Revision: D29345035

Pulled By: ngimel

fbshipit-source-id: 5dbafbdfe029a9544ec2e58f17d547928e017a04
2021-06-23 18:59:22 -07:00
af66824c1f [torch][segment_reduce] Add support for sum and min reductions (#60379)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60379

This concludes the support for all reduction types initially planned (min, max, mean, sum).

Next Steps:
- Cleanups
  - update default values when length is 0 and initial is not given
  - templatize the code to avoid branching on every item (and other known improvements)
- more unit tests, verification
- benchmarking

Test Plan: updated unit tests.

Reviewed By: ngimel

Differential Revision: D29268218

fbshipit-source-id: c77d91671e01dcf96c18c758fa3ea522b2e13db9
2021-06-23 18:51:44 -07:00
63219f1f9f To add Rectified Adam Algorithm to Optimizers (#58968)
Summary:
Fixes : https://github.com/pytorch/pytorch/issues/24892

In the paper https://arxiv.org/pdf/1908.03265.pdf, Liyuan Liu et al. suggested a new optimization algorithm in a similar spirit to the Adam algorithm.

The paper discusses how, without a warmup heuristic, adaptive optimization algorithms can exhibit undesirably large variance in the early stage of training, which can slow the overall convergence process.

The authors propose rectifying the variance of the adaptive learning rate when it is expected to be high.

Differing from the paper, we selected the variance tractability cut-off as 5 instead of 4. This adjustment is common practice and can be found in the author's code repository as well as in the TensorFlow Swift optimizer library:

2f03dd1970/radam/radam.py (L156)

f51ee4618d/Sources/TensorFlow/Optimizers/MomentumBased.swift (L638)
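
A minimal usage sketch, assuming the new optimizer is exposed as `torch.optim.RAdam` with the usual optimizer interface:

```python
import torch

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.RAdam(model.parameters(), lr=1e-3, betas=(0.9, 0.999))

optimizer.zero_grad()
loss = model(torch.randn(4, 10)).sum()
loss.backward()
optimizer.step()  # Adam-style step with variance rectification
```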

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58968

Reviewed By: vincentqb

Differential Revision: D29310601

Pulled By: iramazanli

fbshipit-source-id: b7bd487f72f1074f266687fd9c0c6be264a748a9
2021-06-23 18:27:57 -07:00
5a2f41a2db [torch/distributed.elastic] Fix utils.distributed_test.test_create_store_timeout_on_server to be dual-stack ip compatible (#60558)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60558

Fixes 1/2 flaky tests as described in: https://github.com/pytorch/pytorch/issues/60260

`test_create_store_timeout_on_server` tests whether trying to create a `c10d::TCPStore` server on an already taken port actually fails with an `IOError`. Prior to this change the `utils.get_socket_with_port()` util method was used to synthetically reserve a port, then try creating the `TCPStore` on that port to validate the `IOError`. The issue with this is that on a dual stack ip setup, `get_socket_with_port()` (since it uses `socket.AF_UNSPEC`) reserves an ipv6 port, while `TCPStore` will try binding to an ipv4 port, so an `IOError` is not observed.

Changing the logic of the test to create two `TCPStore` servers. The first chooses a free port (by passing `server_port=0`) while the second tries to create a `TCPStore` server on the port that the first store is already running on. This would induce an `IOError` on the second store's constructor.
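
A rough sketch of the new test logic (assuming `TCPStore` exposes its bound port via a `port` attribute, as the description implies):

```python
from datetime import timedelta

import torch.distributed as dist

# The first store binds an OS-assigned free port (server_port=0).
store1 = dist.TCPStore("localhost", 0, 1, True)
try:
    # A second server on the same port must fail, regardless of whether
    # the host prefers ipv4 or ipv6.
    dist.TCPStore("localhost", store1.port, 1, True,
                  timeout=timedelta(seconds=1))
except IOError:
    pass  # expected: the port is already taken
```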

NOTE: this change does not solve another broader issue with `TCPStore` where the server and workers can listen and connect on ipv4 vs ipv6 when they are running on dual-stack ip hosts without an ipv4 DNS entry and/or a `/etc/gai.conf` specifying the preferred bind ordering. See: https://github.com/pytorch/pytorch/pull/49124

Test Plan:
```
buck test //caffe2/test/distributed/elastic/utils:distributed_test
```

Reviewed By: cbalioglu

Differential Revision: D29334947

fbshipit-source-id: 76b998c59082cb04c0e86b7a1f3b509367fa0136
2021-06-23 17:12:18 -07:00
1a0058f593 [nnc] Merge inconsistent profiling information (#60510)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60510

We encountered a situation where loop unrolling caused us to duplicate
profiled tensor types in a manner that wasn't logically consistent (see the
attached test case).  When applying this profiling information, we need to
merge the profiled types so that we use a conservative (unspecialized) type.
ghstack-source-id: 132160002

Test Plan: new unit test, plus local predictor using P424983338

Reviewed By: Krovatkin

Differential Revision: D29322487

fbshipit-source-id: 4c18ee69c71bb0622c2e6f6aa361ab5613cbaca4
2021-06-23 17:05:32 -07:00
b5b42d4ce2 [iOS GPU] Add tests for RoIAlign (#60595)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60595

ghstack-source-id: 132245331

Test Plan: CI

Reviewed By: husthyc

Differential Revision: D29345400

fbshipit-source-id: 7406edee232a0ab7b40a4820e3ff9ac07871cdd4
2021-06-23 16:26:53 -07:00
1120a1b92e [quant][fx][fix] QAT with object_type in qconfig (#60555)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60555

When we do QAT, we swap the FP32 modules with their corresponding quantized module counterparts by calling `qat_swap_modules` in prepare.
However, when we try to look up the swapped module type in qconfig_dict, we can no longer find a match, since the qconfig dict contains the original
module type.

In this PR we update the qconfig_dict to include the modules swapped for QAT.
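
For context, a minimal sketch of the kind of `object_type` entry involved (illustrative; the float type key must keep matching after `nn.Linear` is swapped for its QAT counterpart):

```python
import torch

qconfig = torch.quantization.get_default_qat_qconfig("fbgemm")
qconfig_dict = {
    # Keyed on the original float module type; after the QAT swap the
    # prepared model contains torch.nn.qat.Linear instead.
    "object_type": [(torch.nn.Linear, qconfig)],
}
```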

Test Plan:
python test/test_quantization.py TestQuantizeFx.test_qconfig_qat_module_type

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D29337036

fbshipit-source-id: 60212eec3ee252a2445c1b58874cb36048c9f7dd
2021-06-23 15:55:25 -07:00
d867340c7b [nnc] Add LoopNest::getLoopAt to retrieve a specified inner For-stmt (#60569)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60569

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D29337767

Pulled By: huiguoo

fbshipit-source-id: e3ae23c1b290739c03d1fa5d7da25de878eb1d4c
2021-06-23 15:53:29 -07:00
c0d08dc10f [NNC] Add tile transformation in loopnest (fixed #52785) (#57758)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57758

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D28260744

Pulled By: huiguoo

fbshipit-source-id: 6b5591850aaf46455bf3c2d776fa930654839a63
2021-06-23 15:52:19 -07:00
aeea5bf4a1 [Model Averaging] Provide a util function for model averaging (#60303)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60303

The util function can be used for averaging parameters.

More optimizations can be done in the future.
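
A minimal sketch of what parameter averaging amounts to (illustrative only; the merged util lives under `torch.distributed` and its exact API may differ):

```python
import torch.distributed as dist

def average_parameters(params, group=None):
    # Sum each parameter across ranks, then divide by the world size.
    world_size = dist.get_world_size(group)
    for p in params:
        dist.all_reduce(p.data, op=dist.ReduceOp.SUM, group=group)
        p.data.div_(world_size)
```
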
ghstack-source-id: 132214212

Test Plan:
buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_average_parameters
buck test mode/dev-nosan caffe2/test/distributed:distributed_gloo_fork -- test_average_parameters

Reviewed By: rohan-varma

Differential Revision: D29242806

fbshipit-source-id: 76fb5a92adb4bdc6151a9f411e366a0ed2a31f47
2021-06-23 15:41:15 -07:00
b770c4b61a Fix ZeRO sort to be by numel (#60556)
Summary:
**Overview:**
This is a follow-up to [this PR](https://github.com/pytorch/pytorch/pull/59586) and corrects the ZeRO partitioning algorithm to sort by the number of elements in the tensor rather than the size of the first dimension. As context, that PR was meant to migrate from using a _naive greedy_ algorithm to a _sorted-greedy_ algorithm when partitioning parameters in ZeRO.
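
A minimal sketch of sorted-greedy partitioning under this correction (assumed from the description, not the exact `ZeroRedundancyOptimizer` code):

```python
def partition_parameters(params, world_size):
    sizes = [0] * world_size
    partitions = [[] for _ in range(world_size)]
    # Sort by numel (not by the first dimension), largest first, then
    # greedily assign each tensor to the currently least-loaded rank.
    for p in sorted(params, key=lambda t: t.numel(), reverse=True):
        rank = sizes.index(min(sizes))
        partitions[rank].append(p)
        sizes[rank] += p.numel()
    return partitions
```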

**Updated Results:**
The updated table for the partitions can be found [here](https://github.com/pytorch/pytorch/pull/59410#issuecomment-865203219). There, I also considered a third algorithm (sometimes known as multifit), which is more computationally expensive than the greedy and sorted-greedy algorithms but cannot perform worse. However, because of its increased complexity and lack of improved results, I chose to settle with the simpler sorted-greedy algorithm.

The `step()` latencies show slight improvements, but the improvements may be in the noise. The values below are in seconds and were generated using NCCL backend (unlike in the previous PR which used Gloo):

Two processes:
| Model | Max `optimizer.step()` Time - Greedy (Std.) | Max `optimizer.step()` Time - Sorted-Greedy (Std.) |
| --- | --- | --- |
| ResNet-50 | 0.047 (0.00142) | **0.044 (0.00025)** |
| ResNet-152 | 0.057 (0.00034) | **0.054 (0.00022)** |
| BERT | 0.021 (0.00008) | **0.020 (0.00008)** |

Four processes:
| Model | Max `optimizer.step()` Time - Greedy (Std.) | Max `optimizer.step()` Time - Sorted-Greedy (Std.) |
| --- | --- | --- |
| ResNet-50 | 0.019 (0.00065) | **0.013 (0.00040)** |
| ResNet-152 | 0.045 (0.00024) | 0.045 (0.00025) |
| BERT | 0.019 (0.00022) | **0.018 (0.00016)** |

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60556

Test Plan:
I verified that the ZeRO tests pass (via the AI AWS cluster):
```
srun -p $DEV_QUEUE --cpus-per-task=16 -t 5:00:00 --gpus-per-node=4 python test/distributed/optim/test_zero_redundancy_optimizer.py
```

Reviewed By: VitalyFedyunin

Differential Revision: D29335260

Pulled By: andwgu

fbshipit-source-id: 469d1c6e029b77c1b300a94cd1fd94b633cd28dd
2021-06-23 15:22:36 -07:00
1054ad5af3 Add back smoke tests for windows shard 1 for CircleCI (#60571)
Summary:
The reason I removed the smoke tests here was that we didn't have gflags on our GHA runners and we wanted to get sharding done sooner rather than later.

However, we shouldn't remove these tests for windows as they are important for debugging linker issues with torch. Thus, this is step 1 in adding the tests back.

Next step:
- add gflags to base ami
- remove the exist check

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60571

Test Plan: CI shouldn't break

Reviewed By: walterddr

Differential Revision: D29341850

Pulled By: janeyx99

fbshipit-source-id: 7e0c98887534d096f867e28a5482b32aa493b132
2021-06-23 14:52:14 -07:00
555c154df5 Use asyncio in tools/clang_tidy.py (#60495)
Summary:
This replaces Ninja for parallel builds with asyncio, which is more idiomatic Python and easier to debug when things go wrong, since the data never leaves Python.
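
The core pattern is roughly the following (a sketch, not the actual `tools/clang_tidy.py` code; the command strings are placeholders):

```python
import asyncio

async def run_one(cmd):
    proc = await asyncio.create_subprocess_shell(
        cmd, stdout=asyncio.subprocess.PIPE, stderr=asyncio.subprocess.PIPE)
    stdout, _ = await proc.communicate()
    return proc.returncode, stdout.decode()

async def run_all(commands, max_jobs=8):
    sem = asyncio.Semaphore(max_jobs)  # bound concurrency like ninja -j
    async def bounded(cmd):
        async with sem:
            return await run_one(cmd)
    return await asyncio.gather(*(bounded(c) for c in commands))

results = asyncio.run(run_all(["clang-tidy --version", "clang-tidy --help"]))
```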

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60495

Reviewed By: bhosmer

Differential Revision: D29315526

Pulled By: driazati

fbshipit-source-id: 196b1807fe4ee6db432d5fef146e52f96939b44d
2021-06-23 14:18:03 -07:00
2dedd96dd2 cmake: Prefer CMAKE_CURRENT_SOURCE_DIR to TORCH_SRC_DIR (#60493)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60493

TORCH_SRC_DIR appears to be a bit bugged when it comes to identifying
include directories, so let's try using CMAKE_CURRENT_SOURCE_DIR
instead.

<details>
<summary>Logs for builds with torchaudio</summary>

```
-- Building version 0.10.0a0+9e36281
running bdist_wheel
running build
running build_py
copying torchaudio/version.py -> build/lib.linux-x86_64-3.6/torchaudio
running build_ext
-- Configuring done
-- Generating done
-- Build files have been written to: /home/eliuriegas/work/audio/build/temp.linux-x86_64-3.6
[1/11] /usr/lib64/ccache/c++ -DUSE_C10D_GLOO -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -I../../third_party/kaldi/src -I../../third_party/kaldi/submodule/src -isystem /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/include -isystem /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/include/breakpad -Wall -D_GLIBCXX_USE_CXX11_ABI=1 -fvisibility=hidden -O3 -DNDEBUG -fPIC -D_GLIBCXX_USE_CXX11_ABI=1 -std=gnu++14 -MD -MT third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/base/kaldi-error.cc.o -MF third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/base/kaldi-error.cc.o.d -o third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/base/kaldi-error.cc.o -c ../../third_party/kaldi/submodule/src/base/kaldi-error.cc
[2/11] /usr/lib64/ccache/c++ -DUSE_C10D_GLOO -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -I../../third_party/kaldi/src -I../../third_party/kaldi/submodule/src -isystem /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/include -isystem /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/include/breakpad -Wall -D_GLIBCXX_USE_CXX11_ABI=1 -fvisibility=hidden -O3 -DNDEBUG -fPIC -D_GLIBCXX_USE_CXX11_ABI=1 -std=gnu++14 -MD -MT third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/base/kaldi-math.cc.o -MF third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/base/kaldi-math.cc.o.d -o third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/base/kaldi-math.cc.o -c ../../third_party/kaldi/submodule/src/base/kaldi-math.cc
[3/11] /usr/lib64/ccache/c++ -DUSE_C10D_GLOO -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -I../../third_party/kaldi/src -I../../third_party/kaldi/submodule/src -isystem /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/include -isystem /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/include/breakpad -Wall -D_GLIBCXX_USE_CXX11_ABI=1 -fvisibility=hidden -O3 -DNDEBUG -fPIC -D_GLIBCXX_USE_CXX11_ABI=1 -std=gnu++14 -MD -MT third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/feat/feature-functions.cc.o -MF third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/feat/feature-functions.cc.o.d -o third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/feat/feature-functions.cc.o -c ../../third_party/kaldi/submodule/src/feat/feature-functions.cc
[4/11] /usr/lib64/ccache/c++ -DUSE_C10D_GLOO -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -I../../third_party/kaldi/src -I../../third_party/kaldi/submodule/src -isystem /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/include -isystem /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/include/breakpad -Wall -D_GLIBCXX_USE_CXX11_ABI=1 -fvisibility=hidden -O3 -DNDEBUG -fPIC -D_GLIBCXX_USE_CXX11_ABI=1 -std=gnu++14 -MD -MT third_party/kaldi/CMakeFiles/kaldi.dir/src/matrix/kaldi-matrix.cc.o -MF third_party/kaldi/CMakeFiles/kaldi.dir/src/matrix/kaldi-matrix.cc.o.d -o third_party/kaldi/CMakeFiles/kaldi.dir/src/matrix/kaldi-matrix.cc.o -c ../../third_party/kaldi/src/matrix/kaldi-matrix.cc
[5/11] /usr/lib64/ccache/c++ -DUSE_C10D_GLOO -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -I../../third_party/kaldi/src -I../../third_party/kaldi/submodule/src -isystem /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/include -isystem /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/include/breakpad -Wall -D_GLIBCXX_USE_CXX11_ABI=1 -fvisibility=hidden -O3 -DNDEBUG -fPIC -D_GLIBCXX_USE_CXX11_ABI=1 -std=gnu++14 -MD -MT third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/feat/resample.cc.o -MF third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/feat/resample.cc.o.d -o third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/feat/resample.cc.o -c ../../third_party/kaldi/submodule/src/feat/resample.cc
[6/11] /usr/lib64/ccache/c++ -DUSE_C10D_GLOO -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -I../../third_party/kaldi/src -I../../third_party/kaldi/submodule/src -isystem /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/include -isystem /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/include/breakpad -Wall -D_GLIBCXX_USE_CXX11_ABI=1 -fvisibility=hidden -O3 -DNDEBUG -fPIC -D_GLIBCXX_USE_CXX11_ABI=1 -std=gnu++14 -MD -MT third_party/kaldi/CMakeFiles/kaldi.dir/src/matrix/kaldi-vector.cc.o -MF third_party/kaldi/CMakeFiles/kaldi.dir/src/matrix/kaldi-vector.cc.o.d -o third_party/kaldi/CMakeFiles/kaldi.dir/src/matrix/kaldi-vector.cc.o -c ../../third_party/kaldi/src/matrix/kaldi-vector.cc
[7/11] /usr/lib64/ccache/c++ -DINCLUDE_KALDI -DTORCH_API_INCLUDE_EXTENSION_H -DUSE_C10D_GLOO -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -D_torchaudio_EXPORTS -I../../ -I/tmp/tmp.GKeM3KKcFi/include/python3.6m -I../../third_party/kaldi/src -I../../third_party/kaldi/submodule/src -isystem /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/include -isystem /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/include/breakpad -Wall -D_GLIBCXX_USE_CXX11_ABI=1 -O3 -DNDEBUG -fPIC -D_GLIBCXX_USE_CXX11_ABI=1 -std=gnu++14 -MD -MT torchaudio/csrc/CMakeFiles/_torchaudio.dir/kaldi.cpp.o -MF torchaudio/csrc/CMakeFiles/_torchaudio.dir/kaldi.cpp.o.d -o torchaudio/csrc/CMakeFiles/_torchaudio.dir/kaldi.cpp.o -c ../../torchaudio/csrc/kaldi.cpp
[8/11] /usr/lib64/ccache/c++ -DUSE_C10D_GLOO -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -I../../third_party/kaldi/src -I../../third_party/kaldi/submodule/src -isystem /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/include -isystem /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/include/breakpad -Wall -D_GLIBCXX_USE_CXX11_ABI=1 -fvisibility=hidden -O3 -DNDEBUG -fPIC -D_GLIBCXX_USE_CXX11_ABI=1 -std=gnu++14 -MD -MT third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/feat/pitch-functions.cc.o -MF third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/feat/pitch-functions.cc.o.d -o third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/feat/pitch-functions.cc.o -c ../../third_party/kaldi/submodule/src/feat/pitch-functions.cc
../../third_party/kaldi/submodule/src/feat/pitch-functions.cc: In member function ‘void kaldi::OnlinePitchFeatureImpl::UpdateRemainder(const kaldi::VectorBase<float>&)’:
../../third_party/kaldi/submodule/src/feat/pitch-functions.cc:814:11: warning: unused variable ‘full_frame_length’ [-Wunused-variable]
  814 |     int32 full_frame_length = opts_.NccfWindowSize() + nccf_last_lag_;
      |           ^~~~~~~~~~~~~~~~~
../../third_party/kaldi/submodule/src/feat/pitch-functions.cc: In member function ‘void kaldi::OnlineProcessPitch::UpdateNormalizationStats(kaldi::int32)’:
../../third_party/kaldi/submodule/src/feat/pitch-functions.cc:1504:35: warning: comparison of integer expressions of different signedness: ‘std::vector<kaldi::OnlineProcessPitch::NormalizationStats>::size_type’ {aka ‘long unsigned int’} and ‘kaldi::int32’ {aka ‘int’} [-Wsign-compare]
 1504 |   if (normalization_stats_.size() <= frame)
      |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~
[9/11] : && /usr/bin/cmake -E rm -f third_party/kaldi/libkaldi.a && /usr/bin/ar qc third_party/kaldi/libkaldi.a  third_party/kaldi/CMakeFiles/kaldi.dir/src/matrix/kaldi-vector.cc.o third_party/kaldi/CMakeFiles/kaldi.dir/src/matrix/kaldi-matrix.cc.o third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/base/kaldi-error.cc.o third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/base/kaldi-math.cc.o third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/feat/feature-functions.cc.o third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/feat/pitch-functions.cc.o third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/feat/resample.cc.o && /usr/bin/ranlib third_party/kaldi/libkaldi.a && :
[10/11] : && /usr/lib64/ccache/c++ -fPIC -Wall -D_GLIBCXX_USE_CXX11_ABI=1 -O3 -DNDEBUG   -shared -Wl,-soname,_torchaudio.so -o torchaudio/csrc/_torchaudio.so torchaudio/csrc/CMakeFiles/_torchaudio.dir/pybind.cpp.o torchaudio/csrc/CMakeFiles/_torchaudio.dir/lfilter.cpp.o torchaudio/csrc/CMakeFiles/_torchaudio.dir/overdrive.cpp.o torchaudio/csrc/CMakeFiles/_torchaudio.dir/utils.cpp.o torchaudio/csrc/CMakeFiles/_torchaudio.dir/kaldi.cpp.o  -Wl,-rpath,/tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/lib:  /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/lib/libc10.so  /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/lib/libtorch_python.so  third_party/kaldi/libkaldi.a  /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/lib/libtorch.so  -Wl,--no-as-needed,"/tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so" -Wl,--as-needed  /usr/local/lib/libbreakpad_client.a  /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/lib/libc10.so  -lpthread  -Wl,--no-as-needed,"/tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/lib/libtorch.so" -Wl,--as-needed  /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/lib/libc10.so && :
[10/11] cd /home/eliuriegas/work/audio/build/temp.linux-x86_64-3.6 && /usr/bin/cmake -P cmake_install.cmake
-- Install configuration: "Release"
-- Installing: /home/eliuriegas/work/audio/build/lib.linux-x86_64-3.6/torchaudio/./_torchaudio.so
-- Set runtime path of "/home/eliuriegas/work/audio/build/lib.linux-x86_64-3.6/torchaudio/./_torchaudio.so" to ""
installing to build/bdist.linux-x86_64/wheel
running install
running install_lib
creating build/bdist.linux-x86_64/wheel
creating build/bdist.linux-x86_64/wheel/torchaudio
copying build/lib.linux-x86_64-3.6/torchaudio/kaldi_io.py -> build/bdist.linux-x86_64/wheel/torchaudio
copying build/lib.linux-x86_64-3.6/torchaudio/transforms.py -> build/bdist.linux-x86_64/wheel/torchaudio
copying build/lib.linux-x86_64-3.6/torchaudio/__init__.py -> build/bdist.linux-x86_64/wheel/torchaudio
creating build/bdist.linux-x86_64/wheel/torchaudio/compliance
copying build/lib.linux-x86_64-3.6/torchaudio/compliance/__init__.py -> build/bdist.linux-x86_64/wheel/torchaudio/compliance
copying build/lib.linux-x86_64-3.6/torchaudio/compliance/kaldi.py -> build/bdist.linux-x86_64/wheel/torchaudio/compliance
creating build/bdist.linux-x86_64/wheel/torchaudio/datasets
copying build/lib.linux-x86_64-3.6/torchaudio/datasets/cmuarctic.py -> build/bdist.linux-x86_64/wheel/torchaudio/datasets
copying build/lib.linux-x86_64-3.6/torchaudio/datasets/librispeech.py -> build/bdist.linux-x86_64/wheel/torchaudio/datasets
copying build/lib.linux-x86_64-3.6/torchaudio/datasets/libritts.py -> build/bdist.linux-x86_64/wheel/torchaudio/datasets
copying build/lib.linux-x86_64-3.6/torchaudio/datasets/vctk.py -> build/bdist.linux-x86_64/wheel/torchaudio/datasets
copying build/lib.linux-x86_64-3.6/torchaudio/datasets/__init__.py -> build/bdist.linux-x86_64/wheel/torchaudio/datasets
copying build/lib.linux-x86_64-3.6/torchaudio/datasets/commonvoice.py -> build/bdist.linux-x86_64/wheel/torchaudio/datasets
copying build/lib.linux-x86_64-3.6/torchaudio/datasets/gtzan.py -> build/bdist.linux-x86_64/wheel/torchaudio/datasets
copying build/lib.linux-x86_64-3.6/torchaudio/datasets/ljspeech.py -> build/bdist.linux-x86_64/wheel/torchaudio/datasets
copying build/lib.linux-x86_64-3.6/torchaudio/datasets/speechcommands.py -> build/bdist.linux-x86_64/wheel/torchaudio/datasets
copying build/lib.linux-x86_64-3.6/torchaudio/datasets/tedlium.py -> build/bdist.linux-x86_64/wheel/torchaudio/datasets
copying build/lib.linux-x86_64-3.6/torchaudio/datasets/utils.py -> build/bdist.linux-x86_64/wheel/torchaudio/datasets
copying build/lib.linux-x86_64-3.6/torchaudio/datasets/yesno.py -> build/bdist.linux-x86_64/wheel/torchaudio/datasets
creating build/bdist.linux-x86_64/wheel/torchaudio/_internal
copying build/lib.linux-x86_64-3.6/torchaudio/_internal/__init__.py -> build/bdist.linux-x86_64/wheel/torchaudio/_internal
copying build/lib.linux-x86_64-3.6/torchaudio/_internal/fft.py -> build/bdist.linux-x86_64/wheel/torchaudio/_internal
copying build/lib.linux-x86_64-3.6/torchaudio/_internal/module_utils.py -> build/bdist.linux-x86_64/wheel/torchaudio/_internal
creating build/bdist.linux-x86_64/wheel/torchaudio/backend
copying build/lib.linux-x86_64-3.6/torchaudio/backend/__init__.py -> build/bdist.linux-x86_64/wheel/torchaudio/backend
copying build/lib.linux-x86_64-3.6/torchaudio/backend/common.py -> build/bdist.linux-x86_64/wheel/torchaudio/backend
copying build/lib.linux-x86_64-3.6/torchaudio/backend/no_backend.py -> build/bdist.linux-x86_64/wheel/torchaudio/backend
copying build/lib.linux-x86_64-3.6/torchaudio/backend/soundfile_backend.py -> build/bdist.linux-x86_64/wheel/torchaudio/backend
copying build/lib.linux-x86_64-3.6/torchaudio/backend/sox_io_backend.py -> build/bdist.linux-x86_64/wheel/torchaudio/backend
copying build/lib.linux-x86_64-3.6/torchaudio/backend/utils.py -> build/bdist.linux-x86_64/wheel/torchaudio/backend
creating build/bdist.linux-x86_64/wheel/torchaudio/extension
copying build/lib.linux-x86_64-3.6/torchaudio/extension/__init__.py -> build/bdist.linux-x86_64/wheel/torchaudio/extension
copying build/lib.linux-x86_64-3.6/torchaudio/extension/extension.py -> build/bdist.linux-x86_64/wheel/torchaudio/extension
creating build/bdist.linux-x86_64/wheel/torchaudio/models
copying build/lib.linux-x86_64-3.6/torchaudio/models/__init__.py -> build/bdist.linux-x86_64/wheel/torchaudio/models
copying build/lib.linux-x86_64-3.6/torchaudio/models/conv_tasnet.py -> build/bdist.linux-x86_64/wheel/torchaudio/models
copying build/lib.linux-x86_64-3.6/torchaudio/models/deepspeech.py -> build/bdist.linux-x86_64/wheel/torchaudio/models
copying build/lib.linux-x86_64-3.6/torchaudio/models/wav2letter.py -> build/bdist.linux-x86_64/wheel/torchaudio/models
copying build/lib.linux-x86_64-3.6/torchaudio/models/wavernn.py -> build/bdist.linux-x86_64/wheel/torchaudio/models
creating build/bdist.linux-x86_64/wheel/torchaudio/models/wav2vec2
copying build/lib.linux-x86_64-3.6/torchaudio/models/wav2vec2/__init__.py -> build/bdist.linux-x86_64/wheel/torchaudio/models/wav2vec2
copying build/lib.linux-x86_64-3.6/torchaudio/models/wav2vec2/components.py -> build/bdist.linux-x86_64/wheel/torchaudio/models/wav2vec2
copying build/lib.linux-x86_64-3.6/torchaudio/models/wav2vec2/model.py -> build/bdist.linux-x86_64/wheel/torchaudio/models/wav2vec2
creating build/bdist.linux-x86_64/wheel/torchaudio/models/wav2vec2/utils
copying build/lib.linux-x86_64-3.6/torchaudio/models/wav2vec2/utils/__init__.py -> build/bdist.linux-x86_64/wheel/torchaudio/models/wav2vec2/utils
copying build/lib.linux-x86_64-3.6/torchaudio/models/wav2vec2/utils/import_fairseq.py -> build/bdist.linux-x86_64/wheel/torchaudio/models/wav2vec2/utils
copying build/lib.linux-x86_64-3.6/torchaudio/models/wav2vec2/utils/import_huggingface.py -> build/bdist.linux-x86_64/wheel/torchaudio/models/wav2vec2/utils
creating build/bdist.linux-x86_64/wheel/torchaudio/sox_effects
copying build/lib.linux-x86_64-3.6/torchaudio/sox_effects/__init__.py -> build/bdist.linux-x86_64/wheel/torchaudio/sox_effects
copying build/lib.linux-x86_64-3.6/torchaudio/sox_effects/sox_effects.py -> build/bdist.linux-x86_64/wheel/torchaudio/sox_effects
creating build/bdist.linux-x86_64/wheel/torchaudio/utils
copying build/lib.linux-x86_64-3.6/torchaudio/utils/__init__.py -> build/bdist.linux-x86_64/wheel/torchaudio/utils
copying build/lib.linux-x86_64-3.6/torchaudio/utils/sox_utils.py -> build/bdist.linux-x86_64/wheel/torchaudio/utils
creating build/bdist.linux-x86_64/wheel/torchaudio/functional
copying build/lib.linux-x86_64-3.6/torchaudio/functional/__init__.py -> build/bdist.linux-x86_64/wheel/torchaudio/functional
copying build/lib.linux-x86_64-3.6/torchaudio/functional/filtering.py -> build/bdist.linux-x86_64/wheel/torchaudio/functional
copying build/lib.linux-x86_64-3.6/torchaudio/functional/functional.py -> build/bdist.linux-x86_64/wheel/torchaudio/functional
creating build/bdist.linux-x86_64/wheel/torchaudio/prototype
copying build/lib.linux-x86_64-3.6/torchaudio/prototype/__init__.py -> build/bdist.linux-x86_64/wheel/torchaudio/prototype
copying build/lib.linux-x86_64-3.6/torchaudio/prototype/rnnt_loss.py -> build/bdist.linux-x86_64/wheel/torchaudio/prototype
copying build/lib.linux-x86_64-3.6/torchaudio/version.py -> build/bdist.linux-x86_64/wheel/torchaudio
copying build/lib.linux-x86_64-3.6/torchaudio/_torchaudio.so -> build/bdist.linux-x86_64/wheel/torchaudio
running install_egg_info
running egg_info
writing torchaudio.egg-info/PKG-INFO
writing dependency_links to torchaudio.egg-info/dependency_links.txt
writing requirements to torchaudio.egg-info/requires.txt
writing top-level names to torchaudio.egg-info/top_level.txt
reading manifest file 'torchaudio.egg-info/SOURCES.txt'
writing manifest file 'torchaudio.egg-info/SOURCES.txt'
Copying torchaudio.egg-info to build/bdist.linux-x86_64/wheel/torchaudio-0.10.0a0+9e36281-py3.6.egg-info
running install_scripts
adding license file "LICENSE" (matched pattern "LICEN[CS]E*")
creating build/bdist.linux-x86_64/wheel/torchaudio-0.10.0a0+9e36281.dist-info/WHEEL
creating 'dist/torchaudio-0.10.0a0+9e36281-cp36-cp36m-linux_x86_64.whl' and adding 'build/bdist.linux-x86_64/wheel' to it
adding 'torchaudio/__init__.py'
adding 'torchaudio/_torchaudio.so'
adding 'torchaudio/kaldi_io.py'
adding 'torchaudio/transforms.py'
adding 'torchaudio/version.py'
adding 'torchaudio/_internal/__init__.py'
adding 'torchaudio/_internal/fft.py'
adding 'torchaudio/_internal/module_utils.py'
adding 'torchaudio/backend/__init__.py'
adding 'torchaudio/backend/common.py'
adding 'torchaudio/backend/no_backend.py'
adding 'torchaudio/backend/soundfile_backend.py'
adding 'torchaudio/backend/sox_io_backend.py'
adding 'torchaudio/backend/utils.py'
adding 'torchaudio/compliance/__init__.py'
adding 'torchaudio/compliance/kaldi.py'
adding 'torchaudio/datasets/__init__.py'
adding 'torchaudio/datasets/cmuarctic.py'
adding 'torchaudio/datasets/commonvoice.py'
adding 'torchaudio/datasets/gtzan.py'
adding 'torchaudio/datasets/librispeech.py'
adding 'torchaudio/datasets/libritts.py'
adding 'torchaudio/datasets/ljspeech.py'
adding 'torchaudio/datasets/speechcommands.py'
adding 'torchaudio/datasets/tedlium.py'
adding 'torchaudio/datasets/utils.py'
adding 'torchaudio/datasets/vctk.py'
adding 'torchaudio/datasets/yesno.py'
adding 'torchaudio/extension/__init__.py'
adding 'torchaudio/extension/extension.py'
adding 'torchaudio/functional/__init__.py'
adding 'torchaudio/functional/filtering.py'
adding 'torchaudio/functional/functional.py'
adding 'torchaudio/models/__init__.py'
adding 'torchaudio/models/conv_tasnet.py'
adding 'torchaudio/models/deepspeech.py'
adding 'torchaudio/models/wav2letter.py'
adding 'torchaudio/models/wavernn.py'
adding 'torchaudio/models/wav2vec2/__init__.py'
adding 'torchaudio/models/wav2vec2/components.py'
adding 'torchaudio/models/wav2vec2/model.py'
adding 'torchaudio/models/wav2vec2/utils/__init__.py'
adding 'torchaudio/models/wav2vec2/utils/import_fairseq.py'
adding 'torchaudio/models/wav2vec2/utils/import_huggingface.py'
adding 'torchaudio/prototype/__init__.py'
adding 'torchaudio/prototype/rnnt_loss.py'
adding 'torchaudio/sox_effects/__init__.py'
adding 'torchaudio/sox_effects/sox_effects.py'
adding 'torchaudio/utils/__init__.py'
adding 'torchaudio/utils/sox_utils.py'
adding 'torchaudio-0.10.0a0+9e36281.dist-info/LICENSE'
adding 'torchaudio-0.10.0a0+9e36281.dist-info/METADATA'
adding 'torchaudio-0.10.0a0+9e36281.dist-info/WHEEL'
adding 'torchaudio-0.10.0a0+9e36281.dist-info/top_level.txt'
adding 'torchaudio-0.10.0a0+9e36281.dist-info/RECORD'
removing build/bdist.linux-x86_64/wheel

```

</details>

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: walterddr

Differential Revision: D29316372

Pulled By: seemethere

fbshipit-source-id: 02be64df6197c0d4bad5a5bfb3cef336c11f53ed
2021-06-23 14:08:19 -07:00
ad1041576a Fix loop types (#60504)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60504

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D29313197

fbshipit-source-id: bc86622b587e4fdb73431c2ff27300404c9693ae
2021-06-23 13:26:22 -07:00
da030c59e7 ENH Adds Byte support for nll_loss (CPU) (#60308)
Summary:
Addresses a part of https://github.com/pytorch/pytorch/issues/59765

This PR adds byte support for nll_loss on the CPU for `input.dim() == 2`.

CUDA support will be implemented when `nll_loss` migration to CUDA is completed in https://github.com/pytorch/pytorch/pull/60299 and https://github.com/pytorch/pytorch/pull/60097
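
Assuming "byte support" refers to `uint8` (Byte) class-index targets, usage looks like this (a sketch, CPU only):

```python
import torch
import torch.nn.functional as F

x = torch.log_softmax(torch.randn(4, 3), dim=1)          # input.dim() == 2
target = torch.tensor([0, 2, 1, 0], dtype=torch.uint8)   # Byte targets
loss = F.nll_loss(x, target)
```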

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60308

Reviewed By: VitalyFedyunin

Differential Revision: D29329458

Pulled By: jbschlosser

fbshipit-source-id: d3585c4966030bc61e451f8aa817406a8a3acf47
2021-06-23 12:16:45 -07:00
7bf195f360 fix kernel launch check in cross kernel
Summary: per title

Test Plan: buck test mode/opt //caffe2/test:kernel_launch_checks -- --exact 'caffe2/test:kernel_launch_checks - test_check_cuda_launches (test_kernel_launch_checks.AlwaysCheckCudaLaunchTest)' --run-disabled

Reviewed By: r-barnes

Differential Revision: D29335739

fbshipit-source-id: 385c66b1806886deba35f7fd83e29e0885999119
2021-06-23 11:47:50 -07:00
308d238377 add SequenceMask op (#60235)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60235

This diff
- added SequenceMask op in Dper3 (caffe2 & pytorch)
- added shape inference functions for SequenceMask op

Test Plan:
```
buck test dper3/dper3/modules/low_level_modules/tests:single_operators_test -- test_sequence_mask
```

Differential Revision: D29210097

fbshipit-source-id: cab3460e0fd6c49bec6d0c5c624bd4652de7604b
2021-06-23 11:33:00 -07:00
e60f9cfc58 Revert D29135358: [quant] Input-Weight Equaliaztion - convert modifications
Test Plan: revert-hammer

Differential Revision:
D29135358 (3de79b7757)

Original commit changeset: 2d0005672904

fbshipit-source-id: cac30c1202ebbce4f22e50ed920340c7b4c6849f
2021-06-23 11:23:24 -07:00
03ab5b72c9 Fix parallel tbb build (#60532)
Summary:
Small typo in https://github.com/pytorch/pytorch/issues/60183

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60532

Reviewed By: walterddr

Differential Revision: D29336173

Pulled By: ngimel

fbshipit-source-id: 57d753f21d484bbae26a23cb3eb35e497e25118a
2021-06-23 11:16:36 -07:00
bea83e2e46 Add NoChunk wrapper for pipeline args. (#57325)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57325

As per the design outlined in
https://github.com/pytorch/pytorch/issues/53952, adding a `NoChunk` wrapper for
pipeline parallelism inputs.

If a Tensor is wrapped with this wrapper, the pipeline implementation does not
split this Tensor across micro-batches and instead just replicates this tensor
as-is, similar to non-tensor arguments.
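
A rough usage sketch (import path and setup assumed; a real `Pipe` additionally needs RPC initialization and device placement):

```python
import torch
from torch.distributed.pipeline.sync import Pipe, NoChunk

x = torch.randn(16, 10)         # split into micro-batches along dim 0
mask = NoChunk(torch.ones(10))  # replicated as-is to every micro-batch

# model = Pipe(torch.nn.Sequential(...), chunks=4)
# output = model(x, mask)
```
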
ghstack-source-id: 132009305

Test Plan:
1) unit tests.
2) waitforbuildbot.

Reviewed By: SciPioneer

Differential Revision: D28109277

fbshipit-source-id: ee78c814c715d207d2796aba40b756a8e1834898
2021-06-23 11:13:14 -07:00
6385621003 Use JOB_BASE_NAME throughout code--consolidate CIRCLE_JOB (#60425)
Summary:
This PR is a first step in unifying our environment variables across CI (so that we don't have `CIRCLE_BLAH` in our GHA workflows, for example), though I'd like for this PR to be more for discussion about how best to consolidate these variables.

This small change converts most CIRCLE_JOB references in our code to JOB_BASE_NAME, as that seems the closest GHA (and ROCm) equivalent. Currently, JOB_BASE_NAME is defined as:
- in Circle: CIRCLE_JOB (name of the job, like `pytorch_linux_bionic_py3_8_gcc9_coverage_test1`)
- in GHA: the build_environment with a `-build` or `-test` tacked to the end , e.g., `pytorch-linux-xenial-cuda10.2-cudnn7-py3.6-gcc7-test`
- in ROCm: I don't actually know, but it's important for ROCm test sharding as shown in https://github.com/pytorch/pytorch/pull/60409

I am not sure if this is the intention for JOB_BASE_NAME so it is open to discussion what variable we should use if not JOB_BASE_NAME. I also don't know if it's worth the effort consolidating all these variables, so discussion is also highly encouraged there!

Next steps:
- Consolidate more CIRCLE_* references, maybe into CI_* equivalents?
- We use BUILD_ENVIRONMENT everywhere in Circle though the variable is inconsistent across binary vs CI jobs and across platforms. For example, for linux tests and builds, BUILD_ENVIRONMENT contains the `_test` and `_build` suffixes, but the windows jobs don't. In GHA, BUILD_ENVIRONMENT is similar to how it's defined in windows jobs on Circle. This inconsistency is confusing, and we can probably do something about it. I'm thinking of switching out BUILD_ENVIRONMENT for JOB_BASE_NAME in our test scripts where we actually mean JOB_BASE_NAME.
- We should probably document the meaning of the variables we consolidate somewhere, preferably in a README in some unified `ci/` folder. For example, it seems BUILD_ENVIRONMENT is supposed to capture the build environment, whereas JOB_BASE_NAME is supposed to capture the environment _and_ whether we're building or testing.

Notes:
- I did not replace CIRCLE_JOB references in third_party directories
- Previously, print_test_stats reported CIRCLE_JOB as only the build environment for GHA workflows, and I think tacking on the `build` or `test` will not harm anything, though I may be wrong.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60425

Reviewed By: seemethere, samestep

Differential Revision: D29333882

Pulled By: janeyx99

fbshipit-source-id: a82080e6205a03a1183035011ce59698eca06748
2021-06-23 11:11:21 -07:00
ff3678eec2 Disable group group backend rpc tests from running on CI (#60407)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60407

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D29278179

Pulled By: H-Huang

fbshipit-source-id: ee78085eeb04d81842c95236b8c3a33de7142a3a
2021-06-23 10:58:31 -07:00
109f831409 Support non-Tensor args in the Pipe API (#57226)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57226

As per the design outlined in
https://github.com/pytorch/pytorch/issues/53952, this PR adds support for
non-Tensor args in the pipeline.

The `NoChunk` wrapper hasn't been implemented yet and will be implemented in a
follow up PR.
ghstack-source-id: 132008356

Test Plan:
1) unit tests
2) waitforbuildbot

Reviewed By: SciPioneer

Differential Revision: D28083564

fbshipit-source-id: 5f09da238eec0167feff76fe98916dedb0a9ae4e
2021-06-23 10:53:37 -07:00
10e11dbdcd Reland D29190420: [nnc][tests] Tests and benchmarks for computeSum (#60550)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60550

Original commit changeset: ed655497a981

Whatever gcc version OSS Bazel uses wasn't happy move-constructing the
SimpleIREvaluator, so use a unique_ptr instead.

Test Plan:
CI.  Hope that the gcc version used by OSS Bazel build is
happier with this (it should be), since actually testing it locally is
an intractable pain.

Reviewed By: navahgar

Differential Revision: D29333116

fbshipit-source-id: c3e4b5d8c91eb96a43ae5315a01ca0c0f4d4a99d
2021-06-23 10:50:03 -07:00
5fd45b8089 Port any kernel to structured kernels. (#60361)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60361

Tracking issue: #55070

This PR was opened to solve the CI failures in main when merging: #59371 #59372 #59373 #59937 #59938.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D29265859

Pulled By: ezyang

fbshipit-source-id: 0cca0431569f38a168473b5cc572ced473799961
2021-06-23 10:44:24 -07:00
a5aa940f5e Port all kernel to structured kernels. (#60360)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60360

Tracking issue: #55070

This PR was opened to solve the CI failures in main when merging: #59371 #59372 #59373 #59937 #59938.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D29265856

Pulled By: ezyang

fbshipit-source-id: 6e9b45ad3fc3852bb142ae2e3d58fc5d0a911aed
2021-06-23 10:43:25 -07:00
7b2d375148 Fix convolution_depthwise3x3_winograd for multichannel output (#60460)
Summary:
Before this change it was implemented with the assumption that the number of groups, input channels, and output channels are all the same, which is not always the case.
This change extends the implementation to support any number of output channels, as long as the number of groups equals the number of input channels (i.e. kernel.size(1) == 1).

Fixes https://github.com/pytorch/pytorch/issues/60176
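
For instance, a depthwise convolution with a channel multiplier (here 2) exercises the fixed case:

```python
import torch

# groups == in_channels == 8 but out_channels == 16, so kernel.size(1) == 1
# while the old groups == in_channels == out_channels assumption is violated.
conv = torch.nn.Conv2d(8, 16, kernel_size=3, padding=1, groups=8)
y = conv(torch.randn(1, 8, 32, 32))
```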

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60460

Reviewed By: albanD

Differential Revision: D29299693

Pulled By: malfet

fbshipit-source-id: 31130c71ce86535ccfba2f4929eee3e2e287b2f0
2021-06-23 10:38:14 -07:00
c63a0d0cfe Adding windows CUDA smoke tests on PRs (#59686)
Summary:
Adding windows CUDA smoke tests on PRs (master should run the full suite).

Next step:
- Automate data update so we get a new smoke test list without manual effort

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59686

Test Plan: https://github.com/pytorch/pytorch/actions/runs/958296267 The sharded smoke tests still take long because of dependency installation

Reviewed By: walterddr

Differential Revision: D29243533

Pulled By: janeyx99

fbshipit-source-id: dde7ba127fa15c95bda0e833cc5311598fb85e2b
2021-06-23 10:13:50 -07:00
8162439cbd [DDP] Remove python GradBucket construction (#60301)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60301

`GradBucket` is not meant to be constructed by Python users; it is only
consumed as part of a communication hook.
ghstack-source-id: 131860243

Test Plan: CI

Reviewed By: SciPioneer

Differential Revision: D29239320

fbshipit-source-id: f1631a16e7d66b7e4a9b4a44698e2319005d10b2
2021-06-23 10:05:34 -07:00
e8690dacb2 To add Nesterov Adam Algorithm to Optimizers (#59009)
Summary:
Fixes : https://github.com/pytorch/pytorch/issues/5804

In the paper https://openreview.net/forum?id=OM0jvwB8jIp57ZJjtNEZ, Timothy Dozat suggested a new optimization algorithm that is essentially a combination of the NAG and Adam algorithms.

It is known that momentum in optimization algorithms can be improved with Nesterov acceleration, and Dozat investigates applying this idea to the momentum component of the Adam algorithm. The author provides experimental evidence of the idea's effectiveness in their work.

In this PR we implement the NAdam algorithm proposed in the paper. In preliminary work, http://cs229.stanford.edu/proj2015/054_report.pdf, the author shows that the decay base constant should be taken as 0.96; we follow the same convention here, as Keras does. Implementation and coding practice also follow Keras in some other places:

f9d3868495/tensorflow/python/keras/optimizer_v2/nadam.py
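
A minimal usage sketch, assuming the optimizer is exposed as `torch.optim.NAdam` with a `momentum_decay` parameter controlling the 0.96-based schedule mentioned above:

```python
import torch

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.NAdam(model.parameters(), lr=2e-3,
                              betas=(0.9, 0.999), momentum_decay=4e-3)

optimizer.zero_grad()
model(torch.randn(4, 10)).sum().backward()
optimizer.step()
```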

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59009

Reviewed By: gchanan, vincentqb

Differential Revision: D29220375

Pulled By: iramazanli

fbshipit-source-id: 4b4bb4b15f7e16f7527f368bbf4207ed345751aa
2021-06-23 08:21:43 -07:00
a2525b035c Remove unused sample input argument from functions to resolve issue #55737 (#60486)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60486

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D29311875

Pulled By: NivekT

fbshipit-source-id: 4bf451c4f8e78290398e0514860a14a335a51fa7
2021-06-23 08:02:04 -07:00
265f0e5321 Add device runtime API for the plug-in to register platform python module into torch (#59857)
Summary:
## Motivation
Allow out-of-tree PyTorch plug-ins for device types other than CUDA to add their runtime interface to the `torch` module. The runtime interface of a device can then be referred to by its device type name in the `torch` module, e.g., `torch.cuda` or `torch.xpu`.

## Solution
- Add a registration interface for a plug-in to add its platform Python module into the `torch` module under the device type name. E.g., `torch.xpu` can be used to refer to the XPU runtime interface after the XPU runtime module is registered with `torch._register_device_module('xpu', xpu_module)` in Intel's XPU plug-in.
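
A minimal sketch of the registration flow; `xpu_module` here is a stand-in for a real plug-in's runtime module:

```python
import types

import torch

xpu_module = types.ModuleType("xpu")
xpu_module.is_available = lambda: False  # placeholder runtime API

torch._register_device_module("xpu", xpu_module)
print(torch.xpu.is_available())  # the plug-in is now reachable as torch.xpu
```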

## Additional Context
More details about runtime has been discussed in https://github.com/pytorch/pytorch/issues/53707.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59857

Reviewed By: mrshenli

Differential Revision: D29309320

Pulled By: ezyang

fbshipit-source-id: b9802a5f937ddef9e0bdaf2f7692dfe463912fbe
2021-06-23 07:54:45 -07:00
c97d4d5a34 Fix test failures with some glibc libraries (#60450)
Summary:
Large complex values lead to nan/inf results when using some glibc
implementations of atanh/acos
- Skip test_reference_numerics_hard instead of "normal"
- Test the edge values only for cdouble where the stdlib/glibc implementations support those large values

Fixes https://github.com/pytorch/pytorch/issues/60259

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60450

Reviewed By: mrshenli

Differential Revision: D29304834

Pulled By: ezyang

fbshipit-source-id: d6b97456847c5573b9d2cb447bfc62abba73cb2a
2021-06-23 07:49:27 -07:00
f0e4e4be72 Clean Up ZeRO (#60285)
Summary:
**Overview:**
Being relatively new to PyTorch and ZeRO, I found parts of the code slightly hard to follow. This change strives to clean up the `ZeroRedundancyOptimizer` code in `zero_redundancy_optimizer.py` by reorganizing some computations, making variable names more explicit and consistent, and unifying terminology in the documentation. The goal is for the code to be easier to extend afterwards.

**Changes:**
1) `state_dict()`: The [logic](85517a2b70/torch/distributed/optim/zero_redundancy_optimizer.py (L510)) for updating the global `state_dict` with each rank's local `state_dict` is simplified and made more explicit. Notably, the `dict` [`local_index_to_param_id`](85517a2b70/torch/distributed/optim/zero_redundancy_optimizer.py (L513)) is unneeded. It maps `local_pg["params"][i]` to `id(global_pg["params"][i])`, so it is equivalent to make a single pass over both lists in tandem, effectively iterating over `i`, without a need for the explicit `dict`.
2) `_update_trainable()`: The function [initializes](85517a2b70/torch/distributed/optim/zero_redundancy_optimizer.py (L597)) the local optimizer if it does not exist. I am unaware of any reason for the local optimizer to be destroyed after initialization, so I moved that logic to its own function `_init_local_optimizer()`, which is called once in the constructor.
After [discussion](https://github.com/pytorch/pytorch/pull/60285#discussion_r654706728), I removed the function `_update_trainable()` itself in favor of adding a check for `parameters_as_bucket_view` in `build_param_buckets()` directly.
3) `rank_local_state_dict()`: This [function](85517a2b70/torch/distributed/optim/zero_redundancy_optimizer.py (L528)) is currently broken. It appears to be legacy and relies on the input `state_dict` to have the key `"partitions"`. For now, I have removed it and added an [issue](https://github.com/pytorch/pytorch/issues/60284). Is it a notable use case to want to access another rank's `state_dict` in particular (as opposed to consolidating the entire state and then accessing)?
4) `local_state_dict():` After [discussion](https://github.com/pytorch/pytorch/pull/60285#discussion_r655571043), I removed the function.
5) `partition_parameters()`: After [discussion](https://github.com/pytorch/pytorch/pull/60285#discussion_r654708183), I renamed the function to `_partition_parameters()` to mark it as private.
6) `_param_to_index`: After [discussion](https://github.com/pytorch/pytorch/pull/60285#discussion_r654828100), I changed the key to be the parameter itself rather than its integer ID.
7) `buckets`: I renamed the data structure to `_buckets` to mark it as private.
8) Terminology: I tried to reduce the set of terms being used instead of juggling a number of synonyms. In particular, I made an effort to distinguish between "local" and "global" and to make names more indicative of typing.
9) Style: Per the [PyTorch contributing guide](https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md#writing-documentation), I made all docstrings abide by the 80 character limit, except for the one [line](554891f6fa/torch/distributed/optim/zero_redundancy_optimizer.py (L142)) showing the example ZeRO usage. Some code lines violate the limit for readability. Also, I unified some of the minor stylistic usages out of habit.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60285

Test Plan:
The test suite passes as expected (on the AI AWS cluster):
```
gpurun python test/distributed/optim/test_zero_redundancy_optimizer.py
```
I visually inspected the generated HTML doc (as generated following [this](https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md#writing-documentation)).

Reviewed By: mrshenli

Differential Revision: D29320726

Pulled By: andwgu

fbshipit-source-id: 23f69a19ecc5e877a38fe1df0da11329428311dd
2021-06-23 07:21:40 -07:00
56481f9762 Ensure proper syncs for out-of-place grad creation (torch.autograd.grad) when backward ops run on side streams (#60127)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/59844.

Streaming backwards collects "leaf streams" for AccumulateGrad functions that stash or accumulate .grad attributes for autograd leaf tensors, and syncs those streams with some ambient stream(s) so later ops can safely consume the grads on the ambient stream(s).

But, currently, streaming backwards does not collect leaf streams for grads produced out-of-place (ie, not stashed onto a .grad attribute) by `torch.autograd.grad`, because these out-of-place grads are "captured" and returned before they reach an AccumulateGrad function. Some out-of-place grads might not even have an AccumulateGrad function to go to, because `torch.autograd.grad` can be told to make grads for non-leaf temporaries.[1]

The upshot is, when streaming backwards makes ops that produce out-of-place gradients run on side streams, no ambient stream is told to sync on these side streams, so `torch.autograd.grad` doesn't offer the same post-call safe-use guarantees for grads as the leaf accumulation of `torch.autograd.backward`.

This PR ensures `torch.autograd.grad` gives the same safe-use guarantees as `torch.autograd.backward` by also stashing leaf streams for grads created out-of-place.

I augmented a streaming backwards test to include a torch.autograd.grad attempt. The test fails on current master[2] and passes with the engine.cpp diffs.

I have no idea if this bug or its fix matter to distributed autograd. pritamdamania mrshenli should take a look before it's merged.

[1] example:
```python
leaf = torch.tensor(..., requires_grad=True)
tmp = leaf * 2
loss = tmp.sum()
torch.autograd.grad(loss, inputs=(tmp, leaf))
```
Technically, because `torch.autograd.grad` can be told to produce grads for non-leaf temporaries, these streams might NOT be "leaf streams". Maybe I should rename `leaf_streams`?

[2] the way the test currently fails is fun: it reports
```
AssertionError: False is not true : Tensors failed to compare as equal!With rtol=1.3e-06 and atol=1e-05, found 0 element(s) (out of 25) whose difference(s) exceeded the margin of error (including 0 nan comparisons). The greatest difference was 0.0 (5.0 vs. 5.0), which occurred at index (0, 0).
```
I suspect this [kafka trap](https://en.wiktionary.org/wiki/Kafkatrap) happens because assertEqual does a comparison test on the device, syncs on some bool result, sees failure, and prints the tensors post-sync, at which point it IS safe to access the values.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60127

Reviewed By: mrshenli

Differential Revision: D29276581

Pulled By: albanD

fbshipit-source-id: a9f797e2fd76e2f884cce5a32ecf5d9b704c88ee
2021-06-23 07:14:01 -07:00
b14f19b6fe Revert D29190420: [nnc][tests] Tests and benchmarks for computeSum
Test Plan: revert-hammer

Differential Revision:
D29190420 (21479ad20c)

Original commit changeset: 86246df82098

fbshipit-source-id: ed655497a981783da4c8f13e2d7fec104e3cb184
2021-06-23 06:59:37 -07:00
90cd57ee16 To add edge_order=2 and documentation for gradient operator (#58165)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/56036
Fixes https://github.com/pytorch/pytorch/issues/56130

* All interior points are computed with the second-order-accurate central differences method for the gradient operator. However, we currently only have a first-order method for the edge points. In this PR we add second-order methods for the edge points as well.

* Currently, there is no detailed description of how the gradient operator is computed with the second-order method, or of how to use its parameters correctly. We add a detailed explanation of the meaning of each parameter and of the gradient operator's return value, along with a description of the second-order computation.
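
For example, with `edge_order=2` the one-sided edge stencils are exact for quadratics (a small sketch using `torch.gradient`):

```python
import torch

t = torch.arange(5, dtype=torch.float64)
(grad,) = torch.gradient(t ** 2, spacing=1.0, edge_order=2)
# d(t^2)/dt = 2t, now recovered exactly at the edges as well:
# grad == tensor([0., 2., 4., 6., 8.], dtype=torch.float64)
```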

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58165

Reviewed By: mruberry

Differential Revision: D29305321

Pulled By: iramazanli

fbshipit-source-id: 0e0e418eed801c8510b8babe2ad3d064479fb4d6
2021-06-23 03:35:15 -07:00
7ed07e2a7d [NormalizeArgs] Retain node.meta (#60449)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60449

After normalizing args, still retain each node's `meta`

Test Plan: Added unit test.

Reviewed By: gcatron

Differential Revision: D29293179

fbshipit-source-id: 432b409790041fa4d6e759f7b46a8bee363497b0
2021-06-23 03:31:53 -07:00
66452e0a8c Ensure num_threads is initialized before calling omp_get_max_threads (#60185)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60185

`get_num_threads` is usually called before `parallel_for`, so there's no
guarantee we've initialized `num_threads` properly.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D29287814

Pulled By: ngimel

fbshipit-source-id: 7e9c86fc32d63889a57a9b1d2b7d8f3863481dce
2021-06-23 01:18:24 -07:00
19553438ed OpenMP: Refactor parallel_reduce to share code with parallel_for (#60184)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60184

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D29287817

Pulled By: ngimel

fbshipit-source-id: 734a33a8d965208662989e2497b345b68c132498
2021-06-23 01:18:22 -07:00
c75714e594 Ensure thread id is valid in nested parallel regions (#60183)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60183

Fixes https://github.com/pytorch/pytorch/pull/59149#issuecomment-863287331

`parallel_for` will call the function directly if it would have run on only a
single thread anyway. This is great for performance, but causes an issue in
nested parallel regions because `get_thread_num` will reflect the parent
parallel region instead of the current `parallel_for` call.

I fix this by using a `thread_local` variable for the current thread id and
manually setting it before each call to the user-provided function.

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D29287816

Pulled By: ngimel

fbshipit-source-id: 777f771a0900750c7f22eb1dd185d84d19282108
2021-06-23 01:17:09 -07:00
3f3fd57044 Migrate crossKernel from THC to ATen (CUDA) (#60039)
Summary:
Ref  https://github.com/pytorch/pytorch/issues/24507 (There doesn't seem to be an actual issue for cross)

This also moves the remaining operator functors in `THCTensorMathPointwise.cuh`  to `SparseCUDATensorMath.cu` which is the only file using them.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60039

Reviewed By: mrshenli

Differential Revision: D29314638

Pulled By: ngimel

fbshipit-source-id: aa7b57f6e11a933fb44f044e26945bb4a9e3de5f
2021-06-23 00:37:55 -07:00
f590cceacb [BE] Fix Convolution.cpp build warnings (#60463)
Summary:
Use `c10::irange` and `auto` to get rid of narrowing cast and signed-unsigned compilation warnings

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60463

Reviewed By: samestep

Differential Revision: D29300415

Pulled By: malfet

fbshipit-source-id: 4d7f519e2e3ebaa754364f60af762658c1b4a62e
2021-06-23 00:02:33 -07:00
3846cef2d7 Increase tolerance for test_grad_scaling_clipping (#60458)
Summary:
This makes it pass on A100 and with e.g. torch.manual_seed(6) called before running this test.

Fixes https://github.com/pytorch/pytorch/issues/60455

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60458

Reviewed By: mrshenli

Differential Revision: D29309618

Pulled By: ngimel

fbshipit-source-id: 72584087bcc949f7bc96b0644b701e69ae1fa025
2021-06-22 23:43:25 -07:00
40de03fc55 topk on CUDA supports bfloat16 (#59977)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/56176 via https://github.com/pytorch/pytorch/issues/58196
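
Usage is the standard `topk` call, now also accepting `bfloat16` on CUDA:

```python
import torch

x = torch.randn(8, device="cuda", dtype=torch.bfloat16)
values, indices = torch.topk(x, k=3)
```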

CC zasdfgbnm ngimel ptrblck

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59977

Reviewed By: mrshenli

Differential Revision: D29315018

Pulled By: ngimel

fbshipit-source-id: 0a87e7f155a97225fc6b2ec5dc0dc38a23156b41
2021-06-22 23:39:24 -07:00
21479ad20c [nnc][tests] Tests and benchmarks for computeSum (#60160)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60160

Adds a few simple tests and benchmarks for the `computeSum` op
(equivalent to `at::sum`).

The benchmarks test 1D reduction and 2D row and column reduction. Performance
is in the ballpark of ATen (14-15 GB/s) on my skylake devserver for all cases,
and occasionally better (e.g. 256k * 64 row reduction goes from 9 GB/s to 13 GB/s).

Results (on my skylake-avx512, with turbo disabled):
```
------------------------------------------------------------------------------------------
Benchmark                                   Time           CPU Iterations UserCounters...
------------------------------------------------------------------------------------------
Reduce1D/Torch/16777216               4746995 ns    4746722 ns        150 BYTES=14.1379G/s
Reduce1D/Naive/16777216              34063215 ns   34061388 ns         21 BYTES=1.97023G/s
Reduce1D/NativeRfactor/16777216       5057175 ns    5057167 ns        139 BYTES=13.2701G/s
Reduce1D/TeNaive/16777216            33868945 ns   33868851 ns         21 BYTES=1.98143G/s
Reduce1D/TeSplitTail/16777216        33902786 ns   33900436 ns         21 BYTES=1.97959G/s
Reduce1D/TeSplitMask/16777216        33922509 ns   33920604 ns         21 BYTES=1.97841G/s
Reduce1D/TeRfactorV1/16777216         5141150 ns    5141002 ns        135 BYTES=13.0537G/s
Reduce1D/Op/16777216                  5140390 ns    5140091 ns        135 BYTES=13.056G/s
Reduce2DCol/Torch/8/2097152          12824403 ns   12823563 ns         55 BYTES=5.8874G/s
Reduce2DCol/Torch/64/262144           8306873 ns    8306743 ns         83 BYTES=8.20507G/s
Reduce2DCol/Torch/4096/4096           7992364 ns    7992239 ns         87 BYTES=8.3988G/s
Reduce2DCol/OpSchedule/8/2097152/0    4866144 ns    4865766 ns        138 BYTES=15.5161G/s
Reduce2DCol/OpSchedule/64/262144/0   36668978 ns   36666415 ns         19 BYTES=1.85885G/s
Reduce2DCol/OpSchedule/4096/4096/0  155862459 ns  155801266 ns          4 BYTES=430.839M/s
Reduce2DCol/OpSchedule/8/2097152/1    8067683 ns    8061117 ns         85 BYTES=9.36563G/s
Reduce2DCol/OpSchedule/64/262144/1    7496686 ns    7496562 ns         93 BYTES=9.09183G/s
Reduce2DCol/OpSchedule/4096/4096/1    5262821 ns    5262186 ns        131 BYTES=12.7562G/s
Reduce2DCol/OpSchedule/8/2097152/2    6237899 ns    6237210 ns        109 BYTES=12.1044G/s
Reduce2DCol/OpSchedule/64/262144/2    5258012 ns    5257655 ns        127 BYTES=12.9635G/s
Reduce2DCol/OpSchedule/4096/4096/2    5231686 ns    5228241 ns        132 BYTES=12.839G/s
Reduce2DCol/OpSchedule/8/2097152/3   11088573 ns   11087557 ns         62 BYTES=6.80921G/s
Reduce2DCol/OpSchedule/64/262144/3    5338843 ns    5338326 ns        127 BYTES=12.7676G/s
Reduce2DCol/OpSchedule/4096/4096/3    4311617 ns    4308102 ns        162 BYTES=15.5812G/s
Reduce2DRow/Torch/8/2097152           4642244 ns    4641794 ns        151 BYTES=14.4575G/s
Reduce2DRow/Torch/64/262144           4628311 ns    4628245 ns        151 BYTES=14.4999G/s
Reduce2DRow/Torch/4096/4096           4894012 ns    4893316 ns        143 BYTES=13.7177G/s
Reduce2DRow/Torch/262144/64          10469098 ns   10468027 ns         68 BYTES=6.51101G/s
Reduce2DRow/Hand/262144/64            5554380 ns    5554059 ns        126 BYTES=12.2716G/s
Reduce2DRow/OpSchedule/8/2097152/0   33890363 ns   33888931 ns         21 BYTES=1.98026G/s
Reduce2DRow/OpSchedule/64/262144/0   33901317 ns   33899436 ns         21 BYTES=1.97965G/s
Reduce2DRow/OpSchedule/4096/4096/0   33500358 ns   33498815 ns         21 BYTES=2.00381G/s
Reduce2DRow/OpSchedule/262144/64/0   13132231 ns   13131049 ns         53 BYTES=5.19056G/s
Reduce2DRow/OpSchedule/8/2097152/1    5200423 ns    5200025 ns        134 BYTES=12.9055G/s
Reduce2DRow/OpSchedule/64/262144/1    5204428 ns    5204327 ns        133 BYTES=12.8949G/s
Reduce2DRow/OpSchedule/4096/4096/1    8724355 ns    8723370 ns         80 BYTES=7.69488G/s
Reduce2DRow/OpSchedule/262144/64/1 1811861280 ns 1811352083 ns          1 BYTES=37.6279M/s
Reduce2DRow/OpSchedule/8/2097152/2    9169829 ns    9168946 ns         76 BYTES=7.31915G/s
Reduce2DRow/OpSchedule/64/262144/2    9159901 ns    9158560 ns         76 BYTES=7.32747G/s
Reduce2DRow/OpSchedule/4096/4096/2    9217398 ns    9215557 ns         76 BYTES=7.28391G/s
Reduce2DRow/OpSchedule/262144/64/2   10820450 ns   10818998 ns         66 BYTES=6.29979G/s
Reduce2DRow/OpSchedule/8/2097152/3    5227921 ns    5226544 ns        133 BYTES=12.84G/s
Reduce2DRow/OpSchedule/64/262144/3    5194362 ns    5194082 ns        133 BYTES=12.9203G/s
Reduce2DRow/OpSchedule/4096/4096/3    5196080 ns    5195349 ns        134 BYTES=12.9203G/s
Reduce2DRow/OpSchedule/262144/64/3    5235189 ns    5234728 ns        133 BYTES=13.0202G/s
```

ghstack-source-id: 131753875

Test Plan: these tests

Reviewed By: navahgar

Differential Revision: D29190420

fbshipit-source-id: 86246df82098da4f5493d6c4f34a40016d95a9f0
2021-06-22 23:04:09 -07:00
fbeb8b4992 [nnc] Speed up batchnorm benchmark
Summary:
Use better scheduling: fuse and parallelize NC, fuse and
vectorize HW.

```
-----------------------------------------------
 N/C/H/W               ATen               NNC
-----------------------------------------------
1/64/112/112          45449 ns         36672 ns
1/256/14/14           15555 ns          7116 ns
1/128/28/28           15737 ns          8560 ns
1/64/56/56            20766 ns         12153 ns
1/512/7/7             16985 ns          8182 ns

5/64/112/112        2532475 ns       2069668 ns
5/256/14/14           24507 ns         12228 ns
5/128/28/28           29352 ns         20146 ns
5/64/56/56            44786 ns         38784 ns
5/512/7/7             22307 ns         20505 ns
```

Test Plan: benchmark results above

Reviewed By: navahgar

Differential Revision: D29288658

fbshipit-source-id: dd05efa4b7d26b6ad94f54a9ef6c8c47adb160b5
2021-06-22 22:57:43 -07:00
b0c9762e2d [pytorch][nnc] external function call to xnnpack ops (#59525)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59525

This PR added NNC external function call binding for two XNNPack ops:
- prepacked::linear_clamp_run
- prepacked::conv2d_clamp_run

Both ops take two arguments: a regular input tensor and a prepacked context
object that contains other parameters like weights/bias/etc. The prepacked
context object's type is a custom class.

NNC doesn't generate assembly code that reads the content of the prepacked
object directly. It simply passes it into the XNNPack ops wrapper, so both
NNC and the generated assembly code don't need to know the custom class type.

At compilation time, we use a size-1 dummy tensor as the placeholder for the
prepacked XNNPack context object.

At runtime, we pass in the raw pointer of the XNNPack context object as if it
were a regular tensor storage data pointer.

Inside the external function call wrapper, we reinterpret_cast the raw pointer
back to the custom class type before dispatching to the XNNPack ops.
ghstack-source-id: 132135512

Test Plan: unit test

Reviewed By: bertmaher

Differential Revision: D28924934

fbshipit-source-id: 15326b35dc6c022f4c3f247a2037c361e06e80b4
2021-06-22 21:29:31 -07:00
79dc500a99 Add error message for sequence length to be equal to 0 case for RNNs (#60269)
Summary:
Fixes #https://github.com/pytorch/pytorch/issues/50192

As discussed in the issue, the RNN APIs currently do not support inputs with `seq_len=0`, and the error message does not reflect this clearly. This PR addresses the issue by adding a clearer error message stating that none of the RNN APIs (nn.RNN, nn.GRU and nn.LSTM) support `seq_len=0`, for either one-directional or bi-directional layers.

```
import torch

input_size = 5
hidden_size = 6
rnn = torch.nn.GRU(input_size, hidden_size)

for seq_len in reversed(range(4)):
    output, h_n = rnn(torch.zeros(seq_len, 10, input_size))
    print('{}, {}'.format(output.shape, h_n.shape))
```

Previously, this gave the following output:

```
torch.Size([3, 10, 6]), torch.Size([1, 10, 6])
torch.Size([2, 10, 6]), torch.Size([1, 10, 6])
torch.Size([1, 10, 6]), torch.Size([1, 10, 6])
Traceback (most recent call last):
  File "test.py", line 8, in <module>
    output, h_n = rnn(torch.zeros(seq_len, 10, input_size))
  File "/opt/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/miniconda3/lib/python3.8/site-packages/torch/nn/modules/rnn.py", line 739, in forward
    result = _VF.gru(input, hx, self._flat_weights, self.bias, self.num_layers,
RuntimeError: stack expects a non-empty TensorList
```

However, with this PR, the error message changes for any combination of
[RNN, GRU and LSTM] x [one-directional, bi-directional].

Let's illustrate the change with the following code snippet:

```
import torch

input_size = 5
hidden_size = 6
rnn = torch.nn.LSTM(input_size, hidden_size, bidirectional=True)
output, h_n = rnn(torch.zeros(0, 10, input_size))
```

now gives the following output:

```
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/fsx/users/iramazanli/pytorch/torch/nn/modules/module.py", line 1054, in _call_impl
    return forward_call(*input, **kwargs)
  File "/fsx/users/iramazanli/pytorch/torch/nn/modules/rnn.py", line 837, in forward
    result = _VF.gru(input, hx, self._flat_weights, self.bias, self.num_layers,
RuntimeError: Expected sequence length to be larger than 0 in RNN
```

***********************************

A change for PackedSequence did not seem necessary, because the error message from the following code snippet is already clear about the issue:

```
import torch
import torch.nn.utils.rnn as rnn_utils
import torch.nn as nn
packed = rnn_utils.pack_sequence([])
```

returns:

```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/fsx/users/iramazanli/pytorch/torch/nn/utils/rnn.py", line 398, in pack_sequence
    return pack_padded_sequence(pad_sequence(sequences), lengths, enforce_sorted=enforce_sorted)
  File "/fsx/users/iramazanli/pytorch/torch/nn/utils/rnn.py", line 363, in pad_sequence
    return torch._C._nn.pad_sequence(sequences, batch_first, padding_value)
RuntimeError: received an empty list of sequences
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60269

Reviewed By: mrshenli

Differential Revision: D29299914

Pulled By: iramazanli

fbshipit-source-id: 5ca98faa28d4e6a5a2f7600a30049de384a3b132
2021-06-22 21:25:05 -07:00
dc9aa7b960 Add custom code filter for TS (#60309)
Summary:
-----------

Adds custom code filter for Torchscript to include tracing of forward calls.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60309

Reviewed By: zhxchen17

Differential Revision: D29317150

Pulled By: nikithamalgifb

fbshipit-source-id: d49e4dc74a2b8cc98b0d4967980d819908b7ea7b
2021-06-22 20:55:57 -07:00
3de79b7757 [quant] Input-Weight Equalization - convert modifications (#59963)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59963

When converting, before quantizing the nodes, we call
`update_obs_for_equalization()` and `convert_eq_obs()`.

`update_obs_for_equalization`:
1. For each InputEqualizationObserver, we find the corresponding
WeightEqualizationObserver.
2. For nn.Linear layers, we will create an instance of the
WeightEqualizationObserver, run forward on the observer with the given
weights.
3. Calculate the equalization scale between the
InputEqualizationObserver and WeightEqualizationObserver.

`convert_eq_obs`:
For every InputEqualizationObserver, we will do the following:
1. Create a node (ex. `x0_activation_post_process_scale`) containing the
equalization scale constant.
2. Create another node containing a `mul` operator multiplying the
equalization scale and the input.
3. Remove the current InputEqualizationObserver node, and replace it
with the `mul` node.

For every WeightEqualizationObserver, we will do the following:
1. Get the next equalization scale (we may need this for equalizing
connected linear layers).
2. Scale the weights by multiplying them by the reciprocal of the
current equalization scale and by the next equalization scale.

Currently, this supports models with `nn.Linear` layers, but does not yet
support connected linear layers.
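
A rough sketch of step 3 of `update_obs_for_equalization` (the attribute names and exact formula here are assumptions about the observers, not the actual code):

```python
import torch

def calculate_equalization_scale(input_obs, weight_obs):
    # Balance the per-channel input and weight ranges: scaling the input by s
    # and the weight columns by 1/s leaves the linear output unchanged, but
    # makes both ranges easier to quantize.
    x_range = input_obs.max_val - input_obs.min_val    # InputEqualizationObserver
    w_range = weight_obs.max_val - weight_obs.min_val  # WeightEqualizationObserver
    return torch.sqrt(w_range / x_range)
```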

Test Plan:
`python test/test_quantization.py
TestEqualizeFx.test_input_weight_equalization_convert`

Original Model:
```
.LinearModule(
  (linear): Linear(in_features=2, out_features=2, bias=True)
)
```

Graph after `prepare_fx`:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_equalization_process_0 : [#users=1] = call_module[target=x_equalization_process_0](args = (%x,), kwargs = {})
    %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_00](args = (%x_equalization_process_0,), kwargs = {})
    %linear : [#users=1] = call_module[target=linear](args = (%x_activation_post_process_0,), kwargs = {})
    %linear_activation_post_process_0 : [#users=1] = call_module[target=linear_activation_post_process_0](args = (%linear,), kwargs = {})
    return linear_activation_post_process_0
```

Graph after equalization functions:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_equalization_process_0_scale : [#users=1] = get_attr[target=x_equalization_process_0_scale]
    %mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_equalization_process_0_scale), kwargs = {})
    %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_00](args = (%mul,), kwargs = {})
    %linear : [#users=1] = call_module[target=linear](args = (%x_activation_post_process_0,), kwargs = {})
    %linear_activation_post_process_0 : [#users=1] = call_module[target=linear_activation_post_process_0](args = (%linear,), kwargs = {})
    return linear_activation_post_process_0
```

Graph after `convert_fx`:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_equalization_process_0_scale : [#users=1] = get_attr[target=x_equalization_process_0_scale]
    %mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_equalization_process_0_scale), kwargs = {})
    %linear_input_scale_0 : [#users=1] = get_attr[target=linear_input_scale_0]
    %linear_input_zero_point_0 : [#users=1] = get_attr[target=linear_input_zero_point_0]
    %quantize_per_tensor : [#users=1] = call_function[target=torch.quantize_per_tensor](args = (%mul, %linear_input_scale_0, %linear_input_zero_point_0, torch.quint8), kwargs = {})
    %linear : [#users=1] = call_module[target=linear](args = (%quantize_per_tensor,), kwargs = {})
    %dequantize : [#users=1] = call_method[target=dequantize](args = (%linear,), kwargs = {})
    return dequantize
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D29135358

fbshipit-source-id: 2d00056729041318463de61841483490b6bfeee5
2021-06-22 20:43:30 -07:00
7589d9c58b Enable rcb lookup for typing (#60413)
Summary:
-----------

For FX-traced models, types from the typing module are not available during the lookup for the function to be traced, so resolving the type yields a None object. By enabling lookup of the `typing` module in `_jit_internal.py`, we can mitigate this issue for FX tracing and scripting.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60413

Test Plan:
--------
with-proxy python test/test_jit.py -k TestPDT.test_fx_tracing_with_typing

Reviewed By: bhosmer

Differential Revision: D29314531

Pulled By: nikithamalgifb

fbshipit-source-id: 1aa651430b1074c7e6fa74ba02bbcc4e1b00b01b
2021-06-22 18:53:19 -07:00
135e203e5e avoid unnecessary copies in MultiDispatchKeySet (#60093)
Summary:
The code would previously pass Generator & optional<Tensor> by value.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60093

Reviewed By: swolchok

Differential Revision: D29310624

Pulled By: bhosmer

fbshipit-source-id: fb4a9740a57ef319aaf7c778d51430907a7c0cc5
2021-06-22 18:44:06 -07:00
4887c6e401 [quant] avoid resize calls in observer/fake_quant (#60386)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60386

During QAT we sometimes encounter errors with scripted models
`RuntimeError: cannot resize variables that require grad`

For per-tensor cases we don't need to resize some buffers, so this PR removes the extra resize ops where applicable.

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D29271905

fbshipit-source-id: 01a484a9559a3a4180490f9476d0cd3044ba0d1b
2021-06-22 17:41:43 -07:00
d3ae3e07aa parse_reports() should include hidden files (#60404)
Summary:
Not sure why there are report files starting with `.`, but in that case
`glob('**/*.xml')` should not be used, as it will silently skip them.
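
A minimal sketch of the difference (`*` in glob patterns does not match a leading dot):

```python
import glob
import os

# glob skips files whose names start with '.', e.g. '.report.xml':
visible_only = glob.glob('**/*.xml', recursive=True)

# Walking the tree picks up hidden reports as well:
all_reports = [
    os.path.join(root, name)
    for root, _, files in os.walk('.')
    for name in files
    if name.endswith('.xml')
]
```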

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60404

Reviewed By: samestep

Differential Revision: D29276459

Pulled By: malfet

fbshipit-source-id: 8e131c38013425ad786e0a9ca0c0a43e57b1679a
2021-06-22 15:53:00 -07:00
986a88056c Remove some unused variables (#60411)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60411

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D29221207

fbshipit-source-id: da6ad44036291a98f0b36b260062d077a7c2691b
2021-06-22 15:44:33 -07:00
36d4062a62 Fix some variable types (#60414)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60414

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D29221183

fbshipit-source-id: f855efca2fd08844de65d0f9ef73bcceffee657e
2021-06-22 15:44:31 -07:00
7d779f84a3 Fix some loop types (#60415)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60415

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D29221174

fbshipit-source-id: 9bc56655f198f6eb95e6b2e7a4f0573a2cd2f9a1
2021-06-22 15:43:10 -07:00
6e926f1303 Fix lint (#60472)
Summary:
This PR fixes the `mypy` failure introduced by [`numpy` 1.21.0](https://github.com/numpy/numpy/releases/tag/v1.21.0) (by pinning `numpy` to 1.20, at least for now) and the `quick-checks` failure introduced by https://github.com/pytorch/pytorch/issues/60405.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60472

Test Plan: The Lint workflow in GitHub Actions.

Reviewed By: walterddr

Differential Revision: D29313009

Pulled By: driazati

fbshipit-source-id: 53fd0e0549c26be5fc5d3c502c5891c56c83a32c
2021-06-22 14:48:07 -07:00
0c916c8a4e up the priority of numpy array comparisons in self.assertEqual (#59067)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/58988.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59067

Reviewed By: jbschlosser

Differential Revision: D28986642

Pulled By: heitorschueroff

fbshipit-source-id: 3ef2d26b4010fc3519d0a1a020ea446ffeb46ba0
2021-06-22 13:07:07 -07:00
82c52fd417 Do not wrap Tensor.{grad,_base} by default (#60464)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60464

Fixes https://github.com/szagoruyko/pytorchviz/issues/65

An alternate implementation of this PR would be to remove the
__torch_function__ interposition points for these accessors entirely.
In the end, I decided to opt for extra expressivity.  See
torch.overrides for the criterion on how I decided which accessors
should get the nowrap treatment.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D29302835

Pulled By: ezyang

fbshipit-source-id: fbe0ac4530a6cc9d6759a3fdf5514d4d7b1f7690
2021-06-22 12:49:23 -07:00
f42140cb8a Disable warn_unused_ignores again (#60480)
Summary:
Fixes https://github.com/pytorch/pytorch/pull/60006#issuecomment-866130657.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60480

Test Plan: Run `mypy --config mypy-strict.ini` with [`ruamel.yaml`](https://pypi.org/project/ruamel.yaml/) installed.

Reviewed By: zhouzhuojie

Differential Revision: D29307823

Pulled By: samestep

fbshipit-source-id: 97fa4b7dad0465c269411c48142b22ce751bf830
2021-06-22 12:42:37 -07:00
6a87e8d087 Implement erfcx() (#58194)
Summary:
Implement erfcx() https://github.com/pytorch/pytorch/issues/31945

Reference: https://github.com/pytorch/pytorch/issues/50345
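
For reference, `erfcx(x) = exp(x^2) * erfc(x)`, the scaled complementary error function. A small usage sketch, assuming the op is exposed under `torch.special` like the other functions tracked in that issue:

```python
import torch

x = torch.linspace(-2.0, 2.0, steps=5)
print(torch.special.erfcx(x))

# For large positive x, erfc(x) underflows to 0 while erfcx(x) stays
# well-scaled (asymptotically ~ 1 / (x * sqrt(pi))):
print(torch.erfc(torch.tensor([30.0])))           # tensor([0.])
print(torch.special.erfcx(torch.tensor([30.0])))  # roughly 0.0188
```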

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58194

Reviewed By: ngimel

Differential Revision: D29285979

Pulled By: mruberry

fbshipit-source-id: 5bcfe77fddfabbeb8c8068658ba6d9fec6430399
2021-06-22 12:38:38 -07:00
b34965435d Improve testing of inplace views (#59891)
Summary:
Partially addresses https://github.com/pytorch/pytorch/issues/49825 by improving the testing
 - Rename some of the old tests that had "inplace_view" in their names, but actually mean "inplace_[update_]on_view" so there is no confusion with the naming
 - Adds some tests in test_view_ops that verify basic behavior
 - Add tests that creation meta is properly handled for no-grad, multi-output, and custom function cases
 - Add test that verifies that in the cross dtype view case, the inplace views won't be accounted in the backward graph on rebase as mentioned in the issue.
 - Update inference mode tests to also check in-place

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59891

Reviewed By: albanD

Differential Revision: D29272546

Pulled By: soulitzer

fbshipit-source-id: b12acf5f0e3f788167ebe268423cdb58481b56f6
2021-06-22 12:28:09 -07:00
20bda0057e [caffe2/utils] Add explicit rule to avoid package boundary violation
Summary:
Add a rule to wrap proto_utils.h and depend on that, rather than
relying on a glob which violates package boundaries.

Reviewed By: igorsugak

Differential Revision: D29273453

fbshipit-source-id: 08f198a03d06ee2fdf61f5dbe1d0087db22aec8b
2021-06-22 12:22:24 -07:00
7c1bca9e94 [caffe2/utils] Add explicit rule to avoid package boundary violation
Summary:
Add a rule to wrap simple_queue.h and depend on that, rather than
relying on a glob which violates package boundaries.

Test Plan: `buck2 build fbcode//caffe2/caffe2:caffe2_core`

Reviewed By: igorsugak

Differential Revision: D29273415

fbshipit-source-id: f2b62a82cd6478bd71a8194d661d1c8b023c0953
2021-06-22 12:21:08 -07:00
7f2592195d Adds stream recording for cross-stream uses of gradients in streaming backward (#60230)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/33909.

I _think_ the two recordDataPtrOnStreams I added are necessary and sufficient. They're the ones that worked for dmitrivainbrand's intricate multistream pipelining in https://github.com/pytorch/pytorch/issues/33909, and I can more or less convince myself they're enough, but it's hard to be sure (and hard to test).

PRing without a test now for visibility. I'll try to come up with something.

input_buffer.cpp needs to compile in CUDA or CPU-only builds, so I can't call `c10::cuda::CUDACachingAllocator::recordStream` directly. I planned to work around this by adding a binding in VirtualGuardImpl, but https://github.com/pytorch/pytorch/pull/57047 spared me the trouble; thanks, lw.

Recording a usage stream on a generic tensor was uglier than I expected, see https://github.com/pytorch/pytorch/issues/60306. Up to you guys if adding a unified way to record streams on a tensor backed by any TensorImpl should block this PR (and if so, whether it should happen in a separate PR or as part of this PR).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60230

Reviewed By: mrshenli

Differential Revision: D29289392

Pulled By: albanD

fbshipit-source-id: 1339d382b7d238a461b082597b3962847b5201fe
2021-06-22 12:16:07 -07:00
c7d0e9da0a Add pyproject.toml (#60408)
Summary:
This makes PyTorch conform to [PEP 517](https://www.python.org/dev/peps/pep-0517/) and [PEP 518](https://www.python.org/dev/peps/pep-0518/) by explicitly stating that we use [`setuptools`](https://setuptools.readthedocs.io/). It also follows up on https://github.com/pytorch/pytorch/pull/60119#pullrequestreview-685791812 by moving our [`isort`](https://pycqa.github.io/isort/) config into the new `pyproject.toml` file. I didn't move any of our other tool configs into `pyproject.toml` in this PR because:

- `.flake8` is assumed to exist in its current format for `tools/actions_local_runner.py` to work
- `mypy.ini` is not our only `mypy` config
- `pytest.ini` has detailed comments on `addopts` which [would have to be removed](https://github.com/toml-lang/toml/issues/340#issuecomment-122164501) in TOML because that setting is [a string, not an array](https://docs.pytest.org/en/6.2.x/customize.html#pyproject-toml)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60408

Reviewed By: 1ntEgr8

Differential Revision: D29277327

Pulled By: samestep

fbshipit-source-id: 3f2e63f6cf9024f8c534cb13a0d854a75609c5ba
2021-06-22 12:12:36 -07:00
1abf45e37f Revert D29241736: [pytorch][PR] To add Rectified Adam Algorithm to Optimizers
Test Plan: revert-hammer

Differential Revision:
D29241736 (0d2a936176)

Original commit changeset: 288b9b1f3125

fbshipit-source-id: 56c4ec98647c6f1822b130726741a1c9ca193670
2021-06-22 12:08:31 -07:00
99ca2c5b4b Migrates nll_loss_backward from TH to Aten (CUDA) (#60299)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/24609
Aten Umbrella issue https://github.com/pytorch/pytorch/issues/24507
Related to https://github.com/pytorch/pytorch/issues/59765

There are no performance differences when running the following benchmark:

<details>
 <summary>Benchmark script</summary>

```python
import torch
import torch.nn as nn
import time

torch.manual_seed(0)

def _time():
    torch.cuda.synchronize()
    MS_PER_SECOND = 1000
    return time.perf_counter() * MS_PER_SECOND

device = "cuda"
C = 30
softmax = nn.LogSoftmax(dim=1)
n_runs = 250

for reduction in ["none", "mean", "sum"]:
    for N in [100_000, 500_000, 1_000_000]:
        elapsed = 0
        for i in range(n_runs):
            data = torch.randn(N, C, device=device, requires_grad=True)
            target = torch.empty(N, dtype=torch.long, device=device).random_(0, C)
            loss = nn.NLLLoss(reduction=reduction)
            input = softmax(data)
            result = loss(input, target)

            if reduction == "none":
                gradient = torch.randn(N, device=device)
            else:
                gradient = torch.randn(1, device=device).squeeze()

            t1 = _time()
            result.backward(gradient)
            t2 = _time()
            elapsed = elapsed + (t2 - t1)
        elapsed_avg = elapsed / n_runs
        print(
            f"input size({N}, {C}), reduction: {reduction} "
            f"elapsed time is {elapsed_avg:.2f} (ms)"
        )
    print()

```

</details>

## master

```
input size(100000, 30), reduction: none elapsed time is 0.19 (ms)
input size(500000, 30), reduction: none elapsed time is 0.83 (ms)
input size(1000000, 30), reduction: none elapsed time is 1.66 (ms)

input size(100000, 30), reduction: mean elapsed time is 1.50 (ms)
input size(500000, 30), reduction: mean elapsed time is 7.19 (ms)
input size(1000000, 30), reduction: mean elapsed time is 14.35 (ms)

input size(100000, 30), reduction: sum elapsed time is 1.49 (ms)
input size(500000, 30), reduction: sum elapsed time is 7.17 (ms)
input size(1000000, 30), reduction: sum elapsed time is 14.21 (ms)
```

## this PR

```
input size(100000, 30), reduction: none elapsed time is 0.19 (ms)
input size(500000, 30), reduction: none elapsed time is 0.83 (ms)
input size(1000000, 30), reduction: none elapsed time is 1.66 (ms)

input size(100000, 30), reduction: mean elapsed time is 1.48 (ms)
input size(500000, 30), reduction: mean elapsed time is 7.16 (ms)
input size(1000000, 30), reduction: mean elapsed time is 14.29 (ms)

input size(100000, 30), reduction: sum elapsed time is 1.49 (ms)
input size(500000, 30), reduction: sum elapsed time is 7.15 (ms)
input size(1000000, 30), reduction: sum elapsed time is 14.18 (ms)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60299

Reviewed By: albanD

Differential Revision: D29287613

Pulled By: ngimel

fbshipit-source-id: 21e15f2c518087e9fb797a379e1e0a3508c98509
2021-06-22 12:04:07 -07:00
fca931d181 List striding with arbitrary step size (#58537)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58537

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D28531721

Pulled By: tugsbayasgalan

fbshipit-source-id: 8c8ed32ca00366603bfb5086e87dfa62736ff4b2
2021-06-22 11:25:23 -07:00
df8a8fbc1b Improve code and documentation clarity for DataPipes APIs (#60423)
Summary:
Fixes issues that are discussed with ezyang in the comments of PR https://github.com/pytorch/pytorch/issues/59498

Improved code and documentation clarity, and refactored .filter to nesting_level directly

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60423

Reviewed By: ezyang

Differential Revision: D29281599

Pulled By: NivekT

fbshipit-source-id: a9bbaf52f492db0741c00f3ceb4022b08ddb1506
2021-06-22 11:19:08 -07:00
71b83c27e2 [pruning] Move pruning directory into experimental folder (#60395)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60395

Experimental folder so other developers know this is work in progress

Test Plan:
`buck test mode/dev-nosan //caffe2/test:ao -- TestBasePruner`

https://pxl.cl/1KGJD

Reviewed By: z-a-f

Differential Revision: D29272319

fbshipit-source-id: 93eeeceba0376753efc9a5bb69a155278ceb2fca
2021-06-22 11:08:48 -07:00
f75ea51e67 [pruning] Move pruning files to their own directory (#60293)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60293

Move pruning files to their own directory

Test Plan:
`buck test mode/dev-nosan //caffe2/test:ao -- TestBasePruner`
https://pxl.cl/1KCfz

Reviewed By: z-a-f

Differential Revision: D29238159

fbshipit-source-id: 0173a278b39ff5ee4cbd54f333f558b6fe412be5
2021-06-22 11:08:47 -07:00
b25db5251a [pruning] Base pruner class (#60278)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60278

Implemented `PruningParametrization`, which removes pruned rows, and `BasePruner`, which is the base class for structured pruning.

Test Plan:
`buck test mode/dev-nosan //caffe2/test:ao -- TestBasePruner`

https://pxl.cl/1KC2n

Reviewed By: z-a-f

Differential Revision: D29208349

fbshipit-source-id: f34e8e258bf13fa80292c2bd64d56f5ad1e72b6a
2021-06-22 11:07:31 -07:00
31a884987d Remove some TH includes from ATen (#60323)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60323

Test Plan: Imported from OSS

Reviewed By: malfet, anjali411

Differential Revision: D29252862

Pulled By: ngimel

fbshipit-source-id: 9ea13495d382c04dfd52b8dd63314f53b7e83936
2021-06-22 10:55:17 -07:00
0d2a936176 To add Rectified Adam Algorithm to Optimizers (#58968)
Summary:
Fixes : https://github.com/pytorch/pytorch/issues/24892

In the paper : https://arxiv.org/pdf/1908.03265.pdf  Liyuan Liu et al. suggested a new optimization algorithm with an essence of similar to Adam Algorithm.

It has been discussed in the paper that, without warmup heuristic, in the early stage of adaptive optimization / learning algorithms sometimes we can get undesirable large variance which can slow overall convergence process.

Authors proposed the idea of rectification of variance of adaptive learning rate when it is expected to be high.

Differing from the paper, we selected variance tractability cut-off as 5 instead of 4. This adjustment is common practice, and could be found in the code-repository and also tensorflow swift optim library as well :

2f03dd1970/radam/radam.py (L156)

f51ee4618d/Sources/TensorFlow/Optimizers/MomentumBased.swift (L638)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58968

Reviewed By: gchanan

Differential Revision: D29241736

Pulled By: iramazanli

fbshipit-source-id: 288b9b1f3125fdc6c7a7bb23fde1ea5c201c0448
2021-06-22 10:38:41 -07:00
0126f42841 [complex] torch.sigmoid: CUDA support and complex autograd support (#48647)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/48552

**Changes**

* Complex support for `torch.sigmoid` CUDA (CPU support already exists)
* Complex autograd support for `torch.sigmoid` (CUDA and CPU)
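
A minimal usage sketch (reducing to a real scalar before `backward()`, following PyTorch's complex-autograd convention):

```python
import torch

z = torch.tensor([0.5 + 0.5j], requires_grad=True)  # CPU; CUDA now works too
y = torch.sigmoid(z)
y.abs().sum().backward()  # gradients follow the conjugate (Wirtinger) convention
print(z.grad)
```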

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48647

Reviewed By: H-Huang

Differential Revision: D29163012

Pulled By: anjali411

fbshipit-source-id: 0cac0412355312675bee1cc46e090be7351d5dac
2021-06-22 10:35:00 -07:00
567e6d3a87 Remove Caffe2 thread-pool leak warning (#60318)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/57273.

Some users reported that they dislike the Caffe2 thread-pool leak warning, as it floods their logs, and have requested disabling it, or have asked for a way to filter it.

It seems caffe2 pthreadpool already exists because of some dependency in the binary distribution, so `torch.set_num_threads()` invocation isn't required to reproduce the issue (as is otherwise the case when building from the master branch).

https://github.com/pytorch/pytorch/issues/60171's test script does have a `set_num_threads` invocation, which is why I was able to reproduce the issue after building from the master branch's source code.

cc malfet & ejguan, who have the authority to make a decision.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60318

Reviewed By: albanD

Differential Revision: D29265771

Pulled By: ezyang

fbshipit-source-id: 26f678af2fec45ef8f7e1d39a57559790eb9e94b
2021-06-22 10:26:55 -07:00
91451369ed require non-empty inputs to grad() calls in the API (#52016)
Summary:
The grad() function needs to return the updated values, and hence
needs non-empty `inputs` to populate.
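
A small sketch of why: `grad()` returns one gradient per entry of `inputs`, so empty `inputs` would have nothing to populate:

```python
import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)
y = (x * x).sum()

(gx,) = torch.autograd.grad(y, inputs=[x])
print(gx)  # tensor([2., 4.])

# torch.autograd.grad(y, inputs=[])  # now raises instead of silently
#                                    # returning nothing useful
```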

Pull Request resolved: https://github.com/pytorch/pytorch/pull/52016

Test Plan:
Passes Python and C++ unit tests, and added new tests to catch this behavior.

Fixes https://github.com/pytorch/pytorch/issues/47061

Reviewed By: albanD

Differential Revision: D26406444

Pulled By: dagitses

fbshipit-source-id: 023aeca9a40cd765c5bad6a1a2f8767a33b75a1a
2021-06-22 10:10:58 -07:00
729f7cd52f Implement histogram operator on CPU (#58780)
Summary:
The existing [torch.histc](https://pytorch.org/docs/stable/generated/torch.histc.html) operator is limited in comparison to [numpy.histogram](https://numpy.org/doc/stable/reference/generated/numpy.histogram.html). This PR adds torch.histogram on CPU. The new operator replicates numpy.histogram's behavior, including support for caller-specified bin edges and weights. It was motivated by previous community requests for histogram.

The implementation was [benchmarked](https://docs.google.com/spreadsheets/d/1xCR0jODchVvwdVSAjiLsNCkmyictA6j1LNfDpWOafjw/edit?usp=sharing) against numpy.histogram as well as torch.histc. This implementation is weakly faster than numpy.histogram across all types of inputs tested, and performs in line with torch.histc for the limited inputs histc supports.
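
A short usage sketch mirroring `numpy.histogram`:

```python
import torch

x = torch.randn(1000)

# Uniform bins over an explicit range, as with numpy.histogram:
hist, bin_edges = torch.histogram(x, bins=10, range=(-3.0, 3.0))

# Caller-specified (possibly non-uniform) bin edges, plus weights:
edges = torch.tensor([-3.0, -1.0, 0.0, 1.0, 3.0])
w = torch.rand(1000)
hist2, _ = torch.histogram(x, bins=edges, weight=w)
```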

mruberry

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58780

Test Plan:
Added unit tests, OpInfo for the new torch.histogram operator.

Tested execution time on a variety of input sizes and compared to numpy.histogram performance: https://docs.google.com/spreadsheets/d/1xCR0jODchVvwdVSAjiLsNCkmyictA6j1LNfDpWOafjw/edit?usp=sharing

Reviewed By: ezyang

Differential Revision: D29134626

Pulled By: saketh-are

fbshipit-source-id: f2773085de1697f6bc6ffdeffe9a81267f51bdfc
2021-06-22 10:06:04 -07:00
3a56758e1f changed launch bound to fix col2im kernel (#60315)
Summary:
Changed launch bound for col2im kernel from 1024 to 512 to fix register spilling into local memory.

Perf comparison (using Nvidia Titan-V):

![Col2ImTimingData](https://user-images.githubusercontent.com/22803332/122627527-e0b1fc80-d064-11eb-83df-f2a1165cefcc.PNG)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60315

Reviewed By: albanD

Differential Revision: D29288113

Pulled By: ngimel

fbshipit-source-id: f78eb90941835700a1aef8e08fac6aff86dedfe9
2021-06-22 09:29:34 -07:00
926bb5d6be changed launch bounds, unrolled for loop for grid sampler 2d fwd and bwd (#60405)
Summary:
Changed launch bounds for grid sampler 2d fwd and bwd from 1024 to 256, added loop unrolling to fix register spilling into local memory.

Timing Data: (using Nvidia Titan-V)
Interpolation mode 2, padding 0, align corners False

![GridSampler2dTimingData](https://user-images.githubusercontent.com/22803332/122830305-01fd2d80-d29d-11eb-9cd3-7da533a03f33.PNG)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60405

Reviewed By: albanD

Differential Revision: D29288075

Pulled By: ngimel

fbshipit-source-id: 5e060f0c2d1cc0a3086718e6be263413dfa29689
2021-06-22 09:22:41 -07:00
23bb2ed00a Improve documentation for torch.set_rng_state (#60422)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/59974 by improving documentation for the function torch.set_rng_state

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60422

Test Plan: Only a comment is being changed.

Reviewed By: bdhirsh

Differential Revision: D29281578

Pulled By: NivekT

fbshipit-source-id: 2c160f782438b7f91f16c44f06c342e8b8b8437b
2021-06-22 07:10:50 -07:00
700df82881 [PyTorch Edge] Update iOS readme to use lite interpreter (#59841)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59841

As lite interpreter moves to beta, it's recommended to let users start using it.
ghstack-source-id: 131766778

Test Plan: CI

Reviewed By: husthyc

Differential Revision: D29048350

fbshipit-source-id: 54d2ad09b4e9475304522c80b358647bcea79b14
2021-06-22 02:17:04 -07:00
15dc320cae Fix lint build (#60438)
Summary:
per title

Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60438

Reviewed By: ngimel

Differential Revision: D29288175

Pulled By: mruberry

fbshipit-source-id: f59b579b1793fdb1d298109c2bef0a70badb37b4
2021-06-22 00:11:55 -07:00
0585daae83 fixed launch bounds for gathertopk kernel (#60314)
Summary:
Changed launch bounds for gatherTopK kernel to fix register spilling into local memory.

Comparison (Nvidia Titan-V GPU):

Args: Input size as below, k=32, dim=None

![TopKTimingData](https://user-images.githubusercontent.com/22803332/122624922-46978780-d057-11eb-9b52-d5786da432c0.PNG)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60314

Reviewed By: mruberry

Differential Revision: D29267789

Pulled By: ngimel

fbshipit-source-id: 4056efb2e44e5527786167af66a127504980a3af
2021-06-21 22:24:44 -07:00
45ae2e7863 Set TORCH_WARN_ONCE to always warn inside of assertNotWarn (#60020)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60020

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D29249909

Pulled By: mruberry

fbshipit-source-id: 10a8d5c05bd8d4aec345f70b132efd3623601f6a
2021-06-21 21:35:54 -07:00
5d476f5b95 Fix FFT documentation examples and run doctests in the test suite (#60304)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/59514

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60304

Reviewed By: anjali411

Differential Revision: D29253980

Pulled By: mruberry

fbshipit-source-id: 0654f00197e5fae338aa8edf0b61ef5692cdaa7e
2021-06-21 20:47:25 -07:00
5921b5480a ensure xml report paths are relative to */pytorch/test (#60380)
Summary:
Changes the approach.

Root cause: for some reason, `inspect.getfile` returns an absolute path instead of a path relative to `os.getcwd()` in newer Python versions. We sanitize this by stripping the CI prefix where it applies.

See:
https://app.circleci.com/pipelines/github/pytorch/pytorch/339568/workflows/43cac71c-759e-471f-83c2-d59c152dcd8a/jobs/14278585 vs. https://app.circleci.com/pipelines/github/pytorch/pytorch/339568/workflows/43cac71c-759e-471f-83c2-d59c152dcd8a/jobs/14278285

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60380

Test Plan:
CI

Plot twist:

windows tests are actually launched via
```
pushd test
python run_test.py
```
while linux/macos tests are
```
python test/run_test.py
```
This might cause problems when using `os.getcwd()`; we will see from the PR CI results.

Reviewed By: malfet

Differential Revision: D29276969

Pulled By: walterddr

fbshipit-source-id: 336c2805d0c92733e0ff4c309ff2044dc2ed4e21
2021-06-21 20:47:23 -07:00
9b30fb8528 add support for constant (#60166)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/58739. Adds support for constants as stipulated by the Python array API.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60166

Reviewed By: anjali411

Differential Revision: D29253958

Pulled By: mruberry

fbshipit-source-id: 0bc86b74d3a4eb3ec4a65c941ec2710747402db1
2021-06-21 20:47:21 -07:00
1764aa79b9 restore JOB_BASE_NAME for test1 and test2 in test.sh (#60409)
Summary:
JOB_BASE_NAME for test1 and test2 were removed by https://github.com/pytorch/pytorch/issues/60124.  This caused the ROCm CI to run all tests for both test1 and test2.  Restore the use of JOB_BASE_NAME.

Fixes https://github.com/pytorch/pytorch/issues/60377.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60409

Reviewed By: anjali411

Differential Revision: D29277560

Pulled By: walterddr

fbshipit-source-id: ddf01466492a9a626ce1b6adf87cd102d8f1fe35
2021-06-21 20:46:17 -07:00
7d39608a29 split TestAsserts by functionality (#58919)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58919

Instead of having one large TestAsserts test case, we split off tests for
self-contained functionality, like container or complex checking, into
separate test cases. That makes it a lot easier to keep an overview of
what is tested.

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D29259407

Pulled By: mruberry

fbshipit-source-id: 9769cb6d56c1a3790280542db398cb247986b09a
2021-06-21 20:44:23 -07:00
14b0191d1f make assert_equal an example how to partial torch.testing.assert_close (#58918)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58918

~Instead of a distinct `torch.testing.assert_close` and `torch.testing.assert_equal`, this makes `torch.testing.assert_equal` a special case of `torch.testing.assert_close` for `rtol=atol=0`. In this case the closeness definition `abs(actual - expected) <= atol + rtol * abs(expected)` boils down to `abs(actual - expected) <= 0`. Since `abs(x)` can never be `<0`, this is equivalent to `abs(a - b) == 0` and this again boils down to `a == b`.~

Following https://github.com/pytorch/pytorch/pull/58918#issuecomment-860642057 and some offline discussions, we opted to use `assert_equal` as an example of how to `partial` it.

This makes maintaining the module a lot easier, because we don't need to keep two functions in sync.
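
A sketch of the idea:

```python
import functools
import torch.testing

# assert_equal is just assert_close with zero tolerances:
assert_equal = functools.partial(torch.testing.assert_close, rtol=0, atol=0)

assert_equal(torch.tensor([1.0]), torch.tensor([1.0]))  # passes
```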

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D29259404

Pulled By: mruberry

fbshipit-source-id: fa1a1fa93672a7ed1c5f0e4beb0dcd45b5c14fce
2021-06-21 20:44:21 -07:00
583f072778 introduce TestingErrorMeta for internal use (#58917)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58917

In #54780 we opted to return `Optional[Exception]` from all internal
helper functions. Since then multiple PRs added functionality that needs
to amend the error message. For this we recreate the error

09a1b1cf87/torch/testing/_asserts.py (L417-L430)

To untangle this a little, this PR introduces the `_TestingErrorMeta`,
which carries the exception type and the message. The idiom

```python
exc = check_foo():
if exc:
    return exc
```

is still valid although `exc` should be renamed to `error_meta` to
reflect the new nature. In the top-level functions
`assert_(equal|close)`

```python
exc = check_foo():
if exc:
    raise exc
```

changes to

```python
error_meta = check_foo():
if error_meta:
    raise error_meta.to_error()
```

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D29259405

Pulled By: mruberry

fbshipit-source-id: 9078fe326283d5aa3d0cf256bf007887df9bfbfb
2021-06-21 20:44:20 -07:00
cf789b9941 remove pytest.UsageError (#58916)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58916

Using `pytest.UsageError` in case `pytest` is available adds almost
nothing as observed in
https://github.com/pytorch/pytorch/pull/53820#discussion_r593868752, but
makes it harder to maintain: due to the conditional import, `mypy` is
not able to handle `UsageError` in a type annotation.

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D29259409

Pulled By: mruberry

fbshipit-source-id: 82b00d13fa47db77383996d0caa69177804a48b6
2021-06-21 20:44:18 -07:00
9fffd05e54 hide top-level test functions from pytest's traceback (#58915)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58915

History:

- It was included for internal helper functions in the initial proposal
  in #53820
- It was removed in #54780, since it is not honored when used with
  `pytest`'s `--tb=native`, which is the default for PyTorch

Since PyTorch shouldn't be the only user of `assert_(equal|close)` we
add it here to the top-level functions `assert_(equal|close)`. If
`pytest` is used without `--tb=native`, the traceback for

```python
assert torch.eq(actual, expected), "Tensors are not equal!"
torch.testing.assert_equal(actual, expected)
```

looks the same, making it more concise.

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D29259406

Pulled By: mruberry

fbshipit-source-id: acee47b30b7f14def27433f7d56a4b19d77393c0
2021-06-21 20:44:16 -07:00
18d45b960b remove rogue raise in helper function (#58914)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58914

Only the top-level functions `assert_(equal|close)` should raise the
exception to keep the traceback manageable.

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D29259408

Pulled By: mruberry

fbshipit-source-id: 40dd52eec6f9e8166b3b239d5172ee44b749e8dc
2021-06-21 20:43:06 -07:00
dca97b4394 Weighted decay with frequency (count-based) (#60382)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60382

Instead of setting weight decay `w` uniformly for all ids, for each row `i` in the sparse embedding table the actual weight decay `w_i` becomes `w * freq_i`, where `freq_i = halflife / counter_i ∈ [log(2), halflife]`. The counter comes from `rowwise_counter`, with the update `counter_i = 1 + exp(-iter_delta * rho) * counter_i`.
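
A sketch of the per-row computation described above (the function names and the clamping are illustrative renderings of the formulas, not the Caffe2 operator code):

```python
import math

def update_counter(counter: float, iter_delta: float, rho: float) -> float:
    # counter_i = 1 + exp(-iter_delta * rho) * counter_i
    return 1.0 + math.exp(-iter_delta * rho) * counter

def effective_weight_decay(w: float, counter: float, halflife: float) -> float:
    # freq_i = halflife / counter_i, kept within [log(2), halflife]
    freq = min(max(halflife / counter, math.log(2)), halflife)
    return w * freq
```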

Test Plan:
buck test //caffe2/caffe2/python/operator_test:adagrad_test -- test_row_wise_sparse_adagrad

buck test caffe2/caffe2/fb/dper/layer_models/tests/split_1:sparse_nn_test_weight_decay

Reviewed By: 0x10cxR1

Differential Revision: D25581030

fbshipit-source-id: 54b3831b20516c76c559b13d8deb809e2ee3b446
2021-06-21 18:46:35 -07:00
8f03018980 [pytorch] Move signal handler test to internal codebase (#60394)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60394

Move signal handler test to internal codebase

Github issue: https://github.com/pytorch/pytorch/issues/60260

Test Plan:
buck test mode/dev-nosan //caffe2/test/distributed/elastic/multiprocessing:api_test

    buck test mode/dev-nosan //caffe2/torch/distributed/elastic/multiprocessing/fb/test:api_test

Reviewed By: cbalioglu

Differential Revision: D29273160

fbshipit-source-id: e4ae72f7f6d54cbba324119fce7446a30a6c37c9
2021-06-21 18:26:41 -07:00
af3f7a210a add BFloat16 support for kthvalue and median on CPU (#60074)
Summary:
Add BFloat16 support for kthvalue and median on CPU

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60074

Reviewed By: gchanan

Differential Revision: D29230348

Pulled By: heitorschueroff

fbshipit-source-id: fa9c086758d51069acf270faa526e4b141b0ef68
2021-06-21 17:52:18 -07:00
2606022d01 [package] fix for edge case os and os.path importing (#60276)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60276

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D29234143

Pulled By: Lilyjjo

fbshipit-source-id: 4d96dde4ef1d84f9966f9f58c883ab9bb92fe728
2021-06-21 16:54:02 -07:00
25e077bce1 [Issue 59296] added VE device (#59620)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/59296

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59620

Reviewed By: zou3519

Differential Revision: D29196830

Pulled By: ezyang

fbshipit-source-id: 7bb49f776dc755804a0ba0bc3a7dbdab9c93914e
2021-06-21 16:44:52 -07:00
9d1d799034 Added API to change logging levels for JIT (#58821)
Summary:
Description:
- Before this, the logging level could only be changed via the env
variable "PYTORCH_JIT_LOG_LEVEL"
    - The level can now be changed from Python
- Have not added stream configuration for now
- Configuration is stored in a singleton class managing the options

Issue Link: https://github.com/pytorch/pytorch/issues/54188

Gotchas:
- Created separate functions
`::torch::jit::get_jit_logging_levels/set_jit_logging_levels` instead of
using the singleton class's method directly
    - This is because when running test cases, two different instances
    of the singleton are created for the test suite and the actual code
    (`jit_log.cpp`)
    - On using these methods directly, `is_enabled` calls the singleton
    in `jit_log.cpp` while we are setting the config using another
    singleton
    - See: https://stackoverflow.com/questions/55467246/my-singleton-can-be-called-multiple-times

API:
- To set the level: `torch._C._jit_set_logging_option("level")`
- To get the level: `torch._C._jit_get_logging_option()`
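
A small usage sketch (these are private `torch._C` bindings, so treat the exact spelling as subject to change):

```python
import torch

# Equivalent to launching with PYTORCH_JIT_LOG_LEVEL=">dead_code_elimination",
# but switchable at runtime from Python:
torch._C._jit_set_logging_option(">dead_code_elimination")
print(torch._C._jit_get_logging_option())
```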

Testing:
- UTs were added for C++
- A very simple UT was added for python to just check if the API is
being called correctly
- The API was checked by running trace in a sample python file
    - Set env variable to "" and used `_jit_set_logging_option` in python to set the variable to `>dead_code_elimination`
    - The error output had logs of form [DUMP..] [UPDATE...] etc

Fixes https://github.com/pytorch/pytorch/issues/54188

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58821

Reviewed By: soulitzer

Differential Revision: D29116712

Pulled By: ZolotukhinM

fbshipit-source-id: 8f2861ee2bd567fb63b405953d035ca657a3200f
2021-06-21 16:10:49 -07:00
82a6574d89 cmake: Use BUILD_INTERFACE with TORCH_SRC_DIR (#60403)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60403

TORCH_SRC_DIR has the potential to be hardcoded thus breaking downstream
cmake extensions. Prefer CMAKE_CURRENT_SOURCE_DIR with BUILD_INTERFACE
to make it magically work together

See https://cmake.org/cmake/help/latest/command/target_include_directories.html

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: samestep

Differential Revision: D29276503

Pulled By: seemethere

fbshipit-source-id: 6ec0754de6a02cdc35a4a453d6271ac4fdfc5ee3
2021-06-21 15:37:27 -07:00
8dd1dc89cb [PyTorch][Edge] Adding tests for lite quantized models (#60226)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60226

# Context
Read this posts for details about why we need a test bench for quantized lite modules
https://fb.workplace.com/groups/2322282031156145/permalink/4289792691071726/

# This Diff
Adds test cases for Quantized Lite modules
ghstack-source-id: 131859101

Test Plan:
```
[ ~/fbsource/fbcode] buck test caffe2/test:mobile -- mobile.test_lite_script_module.TestLiteScriptQuantizedModule
Unable to connect to Buck daemon, restarting it...

Running with tpx session id: 44cf0b2f-0905-444a-95df-4a2eec774163
Trace available for this run at /tmp/tpx-20210618-093849.343917/trace.log
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/7036874461151326
    ✓ ListingSuccess: caffe2/test:mobile - main (16.736)
    ✓ Pass: caffe2/test:mobile - test_two_layer (mobile.test_lite_script_module.TestLiteScriptQuantizedModule) (14.836)
    ✓ Pass: caffe2/test:mobile - test_annotated_nested (mobile.test_lite_script_module.TestLiteScriptQuantizedModule) (15.073)
    ✓ Pass: caffe2/test:mobile - test_quantization_example (mobile.test_lite_script_module.TestLiteScriptQuantizedModule) (16.286)
    ✓ Pass: caffe2/test:mobile - test_single_layer (mobile.test_lite_script_module.TestLiteScriptQuantizedModule) (18.360)
Summary
  Pass: 4
  ListingSuccess: 1
```

https://www.internalfb.com/intern/testinfra/testconsole/testrun/7036874461151326/

Reviewed By: iseeyuan

Differential Revision: D29212232

fbshipit-source-id: 8d0b61b3f414e31720f1e3ce681ec8fa716555c1
2021-06-21 15:09:42 -07:00
5bd49c3396 fix workflow id usage in GHA (#60376)
Summary:
This fixes: https://github.com/pytorch/pytorch/issues/60139

The GHA workflow ID was previously set to `run_id`, which doesn't change across re-runs;
see: https://docs.github.com/en/actions/reference/environment-variables#default-environment-variables

Using GITHUB_RUN_NUMBER to report workflow ID instead.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60376

Test Plan:
CI
see these [with-rerun](https://github.com/pytorch/pytorch/actions/runs/952508536) and [without-rerun](https://github.com/pytorch/pytorch/actions/runs/955665324) examples: both reported everything under the same run ID, but the first one actually ran twice as many test cases as reported in Scuba. This shouldn't occur after this PR.

Reviewed By: samestep

Differential Revision: D29267455

Pulled By: walterddr

fbshipit-source-id: 00fc6b75b84861e2f7d3e21698a5f840c3c21dcd
2021-06-21 14:54:49 -07:00
1f50dc6e46 Fix ignoring Tensor properties in torch.overrides (#60050)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60050

It doesn't work to put torch.Tensor.prop.__get__ in the ignored
list.  Now it does.  (Not exercised here, see next diff in stack).

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D29171464

Pulled By: ezyang

fbshipit-source-id: e7354668b481f9275f2eb5bb3a6228d1815fecea
2021-06-21 14:49:51 -07:00
65f33ec85c Follow-up fix for compilation error on CUDA92 (#60287)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60287

Follow up of #60017

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D29236208

Pulled By: ejguan

fbshipit-source-id: f1acf9630b45fea8cbdf7d64e47661643d0a52b8
2021-06-21 13:29:11 -07:00
01e0296eb7 [special] migrate log1p, sinc, round to special namespace (#55878)
Summary:
Reference : https://github.com/pytorch/pytorch/issues/50345

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55878

Reviewed By: zou3519, janeyx99

Differential Revision: D29160593

Pulled By: mruberry

fbshipit-source-id: f3ca9c541382bab33fb85d7817ce8ddc117c6826
2021-06-21 12:34:29 -07:00
769c299dcf [caffe2] add tests for inplace elementwise ops (#60106)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60106

In Caffe2, some elementwise in-place compatible ops lack coverage for the in-place case. We add tests for a subset of them here and thereby increase coverage.

Test Plan:
```
buck test //caffe2/caffe2/python/operator_test:elementwise_ops_test
```
Let CI run.

Reviewed By: clrfb

Differential Revision: D29143189

fbshipit-source-id: 83138ad8eff8fe95c40aece53714da3577396a23
2021-06-21 12:04:18 -07:00
f66b53e8b2 Ignore unsupported attribute checker pass for torch.jit.trace (#60200)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60200

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D29207583

Pulled By: tugsbayasgalan

fbshipit-source-id: 241620209dbafc94ebdb83d99257e341b11e999b
2021-06-21 11:55:12 -07:00
b505adbb09 Fix typo in ChainDataset docs (#60336)
Summary:
* chainning -> chaining

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60336

Reviewed By: bdhirsh

Differential Revision: D29265236

Pulled By: anjali411

fbshipit-source-id: 17a9b73af9e094550bd1ee25bc9439fb8d455e2b
2021-06-21 11:47:21 -07:00
2f3be2735f Don't split oversize cached blocks (#44742)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/35901

This change is designed to prevent fragmentation in the Caching Allocator. Permissive block splitting in the allocator allows very large blocks to be split into many pieces. Once split too finely, it is unlikely all pieces will be 'free' at the same time, so the original allocation can never be returned. Anecdotally, we've seen a model run out of memory failing to alloc a 50 MB block on a 32 GB card while the caching allocator was holding 13 GB of 'split free blocks'.

Approach:

- Large blocks above a certain size are designated "oversize".  This limit is currently set one decade above "large", at 200 MB
- Oversize blocks cannot be split
- Oversize blocks must closely match the requested size (e.g. a 200 MB request will match an existing 205 MB block, but not a 300 MB block)
- In lieu of splitting oversize blocks, there is a mechanism to quickly free a single oversize block (to the system allocator) so an appropriately sized block can be allocated.  This is activated under memory pressure and prevents _release_cached_blocks()_ from triggering.  A sketch of the matching policy follows this list.
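
A minimal Python sketch of the matching policy described above (the constants and function names are my assumptions for illustration; the real logic lives in the C++ caching allocator):

```python
OVERSIZE_THRESHOLD = 200 * 1024 * 1024  # blocks above this are "oversize" (assumed)
OVERSIZE_TOLERANCE = 20 * 1024 * 1024   # max slack when matching oversize requests (assumed)

def block_is_usable(requested_size: int, block_size: int) -> bool:
    if block_size < requested_size:
        return False
    if requested_size < OVERSIZE_THRESHOLD:
        return True  # normal blocks may be split, so any large-enough block works
    # Oversize requests must closely match an existing block.
    return block_size - requested_size <= OVERSIZE_TOLERANCE

def may_split(block_size: int) -> bool:
    # Oversize blocks are never split, preventing fine-grained fragmentation.
    return block_size < OVERSIZE_THRESHOLD
```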

Initial performance tests show this is similar to or quicker than the original strategy.  Additional tests are ongoing.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44742

Reviewed By: zou3519

Differential Revision: D29186394

Pulled By: ezyang

fbshipit-source-id: c88918836db3f51df59de6d1b3e03602ebe306a9
2021-06-21 11:46:08 -07:00
eaa36ee679 Enable sharding for Windows GHA CI (#59970)
Summary:
Enables sharding for Windows on CI. To make that possible, we currently remove the smoke tests run in shard 1, which don't seem all that important as they are
1. tested on nightlies
2. seemingly tested anyway by running the test suite

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59970

Reviewed By: seemethere

Differential Revision: D29268484

Pulled By: janeyx99

fbshipit-source-id: 7f90d73037cfeb2c267b28714550316eb471b4dd
2021-06-21 11:42:22 -07:00
023907a6fe Allow Docker build on macOS (#60375)
Summary:
This PR allows developers using macOS to build Docker images locally. The `basename $(mktemp -u)` part was suggested by seemethere; I modified it slightly to appease ShellCheck and because [Docker doesn't allow uppercase characters in tags](https://stackoverflow.com/a/54291205).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60375

Test Plan:
On a Mac:
```
cd .circleci/docker
./build.sh pytorch-linux-xenial-py3.6-gcc5.4
```

Reviewed By: driazati

Differential Revision: D29267025

Pulled By: samestep

fbshipit-source-id: ba27d2fb108f573a50db069cf9ddea0414ed6074
2021-06-21 11:27:49 -07:00
27e34f731a Re-enable clang-tidy on PRs (#60297)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60297

This switches clang-tidy to the fresh tag from https://github.com/pytorch/test-infra/runs/2860763986 which has a fix for the missing OMP headers we were seeing. Along with #60225 this should restore clang-tidy to normal functionality and we shouldn't see any spurious warnings.

Test Plan: Imported from OSS

Reviewed By: seemethere, 1ntEgr8

Differential Revision: D29239783

Pulled By: driazati

fbshipit-source-id: b1893256fdb27436af03d6c5279e81f64b47fe6b
2021-06-21 11:04:09 -07:00
c16f87949f ENH Adds nn.ReflectionPad3d (#59791)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/27655

This PR adds a C++ and Python version of ReflectionPad3d with structured kernels. The implementation uses lambdas extensively to better share code between the backward and forward passes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59791

Reviewed By: gchanan

Differential Revision: D29242015

Pulled By: jbschlosser

fbshipit-source-id: 18e692d3b49b74082be09f373fc95fb7891e1b56
2021-06-21 10:53:14 -07:00
f89ae9cb8d Moves grid_sampler to autocast promote list (#58618)
Summary:
Should close https://github.com/pytorch/pytorch/issues/42218

Numerically, `grid_sampler` is fine in fp16 or fp32, but takes several inputs and expects their dtypes to match, so it belongs on the autocast promote list.
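
A small sketch of the user-visible effect (my example; requires CUDA):

```python
import torch
import torch.nn.functional as F

inp = torch.randn(1, 1, 4, 4, device='cuda', dtype=torch.half)
grid = torch.rand(1, 2, 2, 2, device='cuda', dtype=torch.float) * 2 - 1

with torch.cuda.amp.autocast():
    # With grid_sampler on the promote list, the mismatched fp16/fp32 inputs
    # are promoted to the widest dtype instead of raising a dtype error.
    out = F.grid_sample(inp, grid, align_corners=False)

print(out.dtype)  # torch.float32
```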

`grid_sampler` currently uses `gpuAtomicAdd`, notoriously slow in fp16 because it calls cuda's atomicAdd __half overload which uses a software compare-and-swap loop internally. To allow good performance if both inputs happen to be FP16, the PR also modifies `grid_sampler_[2,3]d_backward_kernel`s to use `fastAtomicAdd` instead.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58618

Reviewed By: mruberry

Differential Revision: D29257199

Pulled By: ngimel

fbshipit-source-id: 3cc7505945b480427f2fc1beb36bee80bf3853b3
2021-06-21 10:22:36 -07:00
61e0bc1955 [nnc] Remove check on initializer in compressBuffer (#60194)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60194

Test Plan: Imported from OSS

Reviewed By: bertmaher, huiguoo

Differential Revision: D29206255

Pulled By: navahgar

fbshipit-source-id: 0a68ec4067c37f06ca1ea9ddeeb5ad5e0dcb0639
2021-06-21 09:57:37 -07:00
f2bb0932da [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D29259226

fbshipit-source-id: 15fd79f6fed38d6ed2d84018852806683d5a09fa
2021-06-21 03:57:10 -07:00
5ff407df67 Skips failing MacOS tests (#60348)
Summary:
Mitigates, but does not fix https://github.com/pytorch/pytorch/issues/60347.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60348

Reviewed By: ngimel

Differential Revision: D29257917

Pulled By: mruberry

fbshipit-source-id: de9be93ddeda1ca27ea2ff4650162f886d10f1e2
2021-06-21 01:35:36 -07:00
1dee99c973 LU Solve using cublas and cusolver (#59148)
Summary:
This PR introduces cuSOLVER and cuBLAS for the `lu_solve` routine. Solves a part of https://github.com/pytorch/pytorch/issues/47953.

Since usage of cuSOLVER with MAGMA introduces performance regressions in MAGMA (https://github.com/pytorch/pytorch/issues/56590), we use heuristics for determining when to call cuSOLVER, cuBLAS or MAGMA depending on the batch and matrix sizes. The 64-bit cuSOLVER API is not introduced in this PR since there are several problems with the LU factorization using cusolver (https://github.com/pytorch/pytorch/pull/59148).

The following are performance benchmarks using various configurations:

<details>

```
[--------------------------------------------------------- LU solve CUDA torch.float64 ----------------------------------------------------------]
                                     |  lu_solve CUSOLVER  |  lu_solve MAGMA  |  lu_solve CUBLAS  |  lu_solve cuSOLVER/MAGMA  |  lu_solve TEST ALL
1 threads: ---------------------------------------------------------------------------------------------------------------------------------------
      torch.Size([1, 1, 1])          |          703.4      |        489.8     |         511.8     |             710.1         |          487.1
      torch.Size([2, 1, 1])          |          738.9      |        504.1     |         513.0     |             958.2         |          494.4
      torch.Size([4, 1, 1])          |          790.7      |        514.7     |         506.8     |             983.9         |          540.2
      torch.Size([8, 1, 1])          |          865.3      |        496.4     |         514.7     |             975.2         |          520.0
      torch.Size([16, 1, 1])         |          955.5      |        483.9     |         508.3     |             937.6         |          526.5
      torch.Size([32, 1, 1])         |         1167.7      |        495.2     |         511.2     |             934.0         |          528.7
      torch.Size([64, 1, 1])         |         1730.0      |        492.1     |         537.8     |             936.4         |          533.2
      torch.Size([128, 1, 1])        |         2748.4      |        499.7     |         526.5     |             982.9         |          540.8
      torch.Size([1, 2, 2])          |          724.6      |        498.2     |         541.7     |             715.0         |          504.7
      torch.Size([2, 2, 2])          |          737.0      |        514.3     |         527.6     |             934.5         |          524.5
      torch.Size([4, 2, 2])          |          750.5      |        524.1     |         537.4     |             935.5         |          543.0
      torch.Size([8, 2, 2])          |          844.8      |        513.7     |         538.9     |             953.3         |          534.4
      torch.Size([16, 2, 2])         |         1013.1      |        521.9     |         530.0     |             932.2         |          537.9
      torch.Size([32, 2, 2])         |         1335.8      |        515.1     |         544.4     |             939.9         |          559.5
      torch.Size([64, 2, 2])         |         1819.6      |        511.8     |         534.1     |             973.9         |          540.0
      torch.Size([128, 2, 2])        |         3018.7      |        526.3     |         546.1     |             979.3         |          543.5
      torch.Size([1, 8, 8])          |          732.5      |        524.9     |         532.9     |             762.4         |          516.8
      torch.Size([2, 8, 8])          |          771.2      |        514.9     |         538.7     |            1007.5         |          531.1
      torch.Size([4, 8, 8])          |          811.3      |        507.7     |         534.6     |            1002.2         |          548.5
      torch.Size([8, 8, 8])          |          866.6      |        530.0     |         532.0     |            1016.1         |          562.9
      torch.Size([16, 8, 8])         |          991.8      |        533.6     |         548.0     |            1022.6         |          548.5
      torch.Size([32, 8, 8])         |         1271.7      |        541.2     |         534.7     |            1013.8         |          545.6
      torch.Size([64, 8, 8])         |         1817.2      |        530.2     |         520.6     |            1008.7         |          566.3
      torch.Size([128, 8, 8])        |         2678.7      |        531.6     |         552.2     |            1006.2         |          555.0
      torch.Size([1, 16, 16])        |          738.2      |        546.1     |         536.6     |             775.6         |          540.1
      torch.Size([2, 16, 16])        |          782.6      |        543.5     |         539.6     |            1010.9         |          541.1
      torch.Size([4, 16, 16])        |          815.2      |        546.1     |         560.9     |            1012.5         |          553.1
      torch.Size([8, 16, 16])        |          877.7      |        543.0     |         547.9     |            1012.8         |          551.5
      torch.Size([16, 16, 16])       |         1008.7      |        549.2     |         562.7     |            1016.6         |          546.8
      torch.Size([32, 16, 16])       |         1291.9      |        540.8     |         560.3     |            1055.8         |          539.3
      torch.Size([64, 16, 16])       |         1846.3      |        553.5     |         556.0     |            1010.8         |          551.9
      torch.Size([128, 16, 16])      |         2953.8      |        562.7     |         547.5     |            1026.2         |          555.8
      torch.Size([1, 32, 32])        |          789.1      |        590.6     |         590.9     |             790.5         |          579.0
      torch.Size([2, 32, 32])        |          806.9      |        596.6     |         600.2     |            1085.6         |          573.8
      torch.Size([4, 32, 32])        |          852.0      |        597.9     |         588.2     |            1098.9         |          574.7
      torch.Size([8, 32, 32])        |          914.2      |        597.8     |         591.4     |            1090.3         |          585.7
      torch.Size([16, 32, 32])       |         1063.0      |        604.6     |         597.3     |            1094.0         |          580.5
      torch.Size([32, 32, 32])       |         1302.0      |        602.0     |         598.9     |            1090.3         |          583.6
      torch.Size([64, 32, 32])       |         1861.7      |        601.1     |         599.8     |            1113.4         |          588.6
      torch.Size([128, 32, 32])      |         3251.0      |        619.6     |         595.3     |            1106.8         |          608.9
      torch.Size([1, 64, 64])        |          978.6      |        842.7     |         778.6     |            1071.4         |          825.8
      torch.Size([2, 64, 64])        |         1072.3      |        845.7     |         785.4     |            1400.6         |          829.0
      torch.Size([4, 64, 64])        |         1051.9      |        842.9     |         796.1     |            1352.2         |          788.2
      torch.Size([8, 64, 64])        |         1090.3      |        834.1     |         805.2     |            1382.6         |          804.7
      torch.Size([16, 64, 64])       |         1206.9      |        835.7     |         802.2     |            1365.6         |          801.2
      torch.Size([32, 64, 64])       |         1671.2      |        846.5     |         794.5     |            1345.1         |          814.2
      torch.Size([64, 64, 64])       |         2759.3      |        848.5     |         795.4     |            1409.7         |          832.9
      torch.Size([128, 64, 64])      |         4928.6      |        877.4     |         848.3     |            1439.0         |          883.9
      torch.Size([1, 128, 128])      |         1315.6      |       1158.4     |        1130.0     |            1301.3         |         1177.1
      torch.Size([2, 128, 128])      |         1334.7      |       1198.2     |        1186.6     |            1703.9         |         1209.5
      torch.Size([4, 128, 128])      |         1374.6      |       1200.7     |        1266.2     |            1640.6         |         1272.3
      torch.Size([8, 128, 128])      |         1453.6      |       1215.9     |        1287.3     |            1669.1         |         1288.7
      torch.Size([16, 128, 128])     |         1882.1      |       1244.9     |        1337.6     |            1698.8         |         1347.1
      torch.Size([32, 128, 128])     |         2789.0      |       1284.5     |        1398.6     |            1747.6         |         1396.3
      torch.Size([64, 128, 128])     |         4763.0      |       1425.2     |        1581.7     |            1921.0         |         1584.1
      torch.Size([128, 128, 128])    |         8835.9      |       1808.9     |        1968.7     |            2197.6         |         1961.8
      torch.Size([1, 512, 512])      |         4369.9      |       4577.6     |        4804.0     |            4331.4         |         4599.0
      torch.Size([2, 512, 512])      |         4635.9      |       4850.1     |        5159.1     |            5315.4         |         4845.5
      torch.Size([4, 512, 512])      |         5367.5      |       5261.6     |        6134.7     |            5807.8         |         5345.2
      torch.Size([8, 512, 512])      |         7025.2      |       6184.5     |        7065.6     |            6711.6         |         6303.9
      torch.Size([16, 512, 512])     |        10221.3      |       7849.7     |        8820.1     |            8323.6         |         7992.1
      torch.Size([32, 512, 512])     |        16574.8      |      11208.4     |       12284.3     |           11704.7         |        11394.4
      torch.Size([64, 512, 512])     |        29500.1      |      18043.1     |       19249.3     |           18744.0         |        18242.1
      torch.Size([128, 512, 512])    |        56783.3      |      33903.9     |       34713.5     |           33893.8         |        34041.8
      torch.Size([1, 1024, 1024])    |        14864.5      |      15714.6     |       16128.1     |           14726.7         |        14992.6
      torch.Size([2, 1024, 1024])    |        17891.0      |      18553.3     |       19111.6     |           19271.5         |        19283.0
      torch.Size([4, 1024, 1024])    |        22143.4      |      21909.2     |       23667.1     |           22698.9         |        22713.8
      torch.Size([8, 1024, 1024])    |        30621.1      |      28669.9     |       30822.9     |           29725.0         |        29760.8
      torch.Size([16, 1024, 1024])   |        47045.9      |      41900.0     |       44353.8     |           43215.6         |        43237.5
      torch.Size([32, 1024, 1024])   |        79245.5      |      68316.9     |       70959.0     |           69506.4         |        69876.7
      torch.Size([64, 1024, 1024])   |       147973.9      |     121120.6     |      124601.1     |          122084.4         |       122578.7
      torch.Size([128, 1024, 1024])  |       295586.2      |     232871.8     |      237421.8     |          233765.3         |       234704.6

Times are in microseconds (us).
```

</details>

Here's the details of how the tests were performed:
* CUSOLVER - Only call `cusolver` for all problem sizes.
* MAGMA - Only call `magma` for all problem sizes (this is the current master branch).
* CUBLAS - Only call `cublas` for all problem sizes.
* cuSOLVER / MAGMA - Use cusolver for `batch_size == 1` and magma for all others.
* TEST ALL - Employ heuristics to switch between cublas/cusolver/magma. This yields the best overall results (this PR).

Script for reproducing the results:

<details>

``` python

import torch
import pickle
import itertools
from torch.utils.benchmark import Timer
import sys

shapes = [1, 2, 8, 16, 32, 64, 128, 512, 1024]
batches = [(1,), (2,), (4,), (8,), (16,), (32,), (64,), (128,)]
results = []
num_threads = 1
dtype = torch.float64
repeats = 2

from torch.testing._internal.common_utils import random_hermitian_pd_matrix

def lu_factorize_solve(mat, b):
    lu_data = torch.lu(mat)
    x = torch.lu_solve(b, *lu_data)

for shape, batch in itertools.product(shapes, batches):
    mat = torch.randn(*batch, shape, shape, dtype=dtype, device='cuda')
    b = torch.randn(*batch, shape, 1, dtype=dtype, device='cuda')

    tasks = [("lu_factorize_solve(mat, b)", "lu_solve CUSOLVER")]

    print("shape: ", shape, " batch: ", batch)

    timers = [Timer(stmt=stmt, num_threads=num_threads, label=f"LU solve CUDA {dtype}",
                    sub_label=f"{mat.shape}", description=label, globals=globals()) for stmt, label in tasks]
    for i, timer in enumerate(timers * repeats):
        results.append(
            pickle.dumps(timer.blocked_autorange())
        )
        print(f"\r{i + 1} / {len(timers) * repeats}", end="")
        sys.stdout.flush()

f = open("cusolver_lu_solve.pickle", "wb")
pickle.dump(results, f)
f.close()
```

</details>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59148

Reviewed By: H-Huang

Differential Revision: D29160609

Pulled By: mruberry

fbshipit-source-id: 7280f25db1e66aa650ea15608a6dc5d688fb4db2
2021-06-20 21:27:35 -07:00
4a3eea9a6a [quant][graphmode][fx] Produce reference linear module in convert (#60152)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60152

Test Plan:
python test/test_quantization.py TestQuantizeFx

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D29188263

fbshipit-source-id: f7bbbef5d4d747eadf7a627a4e77a5ec9bb0bc94
2021-06-20 20:08:12 -07:00
510334f34b [BE] clean up IS_PYTORCH_CI and IN_CI (#60279)
Summary:
`IS_PYTORCH_CI` and `IN_CI` are used interchangeably; however, in some cases IN_CI is not currently set because it only exists in .circleci/scripts/setup_ci_environment.sh. This cleans up the two flags and uses only IN_CI.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60279

Test Plan: CI

Reviewed By: seemethere

Differential Revision: D29239545

Pulled By: walterddr

fbshipit-source-id: a069424a2bb8790a3adfdaf0dc460301026bf8c7
2021-06-20 19:45:07 -07:00
2293ab4e53 [quant][graphmode][fx] Refactor convert for linear to use get_static_module_mapping and get_dynamic_module_mapping (#60151)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60151

Test Plan:
```
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
```

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D29188264

fbshipit-source-id: d2b77ffcf4b7446fc6c43248e43218092d2a6aea
2021-06-20 19:41:16 -07:00
a516424a70 Update internal code for torch.linalg.solve (#56613)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56613

Replace linalg_solve_helper with `lu_stub` + `lu_solve_stub`.
Once `lu_stub` and `lu_solve_stub` have a cuSOLVER-based codepath,
`torch.linalg.solve` will have it as well.
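
The user-facing API is unchanged; for reference, a quick sketch of the op being refactored (my example):

```python
import torch

A = torch.randn(3, 3, dtype=torch.float64, device='cuda')
b = torch.randn(3, 2, dtype=torch.float64, device='cuda')
x = torch.linalg.solve(A, b)  # internally an LU factorization + LU solve
assert torch.allclose(A @ x, b)
```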

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D28627408

Pulled By: mruberry

fbshipit-source-id: b95bbdf35f845a56a1489c04b53742a01b36e789
2021-06-20 19:37:12 -07:00
47d727fe1b [quant][graphmode][fx] Produce conv reference static quant modules (#60138)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60138

Test Plan:
python test/test_quantization.py TestQuantizeFx

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D29184791

fbshipit-source-id: 971a40012dbba0cf687c62a3a4af9358513c253b
2021-06-20 19:25:45 -07:00
b298013cd5 [add/sub] Cast alpha to acc_type (#60227)
Summary:
This PR lets the `torch.add` & `torch.sub` CUDA kernels cast `alpha` to `acc_type`, not `scalar_t`.
I do not remove the `cast`s from `test/test_foreach.py` because I'll do this in https://github.com/pytorch/pytorch/issues/59907 or a follow-up to it.

Current upstream `torch._foreach_add` & `torch._foreach_sub` upcast the `alpha` parameter to `acc_type<scalar_t>` while `torch.add` & `torch.sub` do not. This is problematic because the outputs of `torch.add` and `torch.sub` differ from those of `torch._foreach_add` and `torch._foreach_sub`, respectively, when the dtype of the input tensors is either `torch.half` or `torch.bfloat16`. The discrepancy is roughly proportional to `abs(alpha)`, except when `alpha` is exactly representable in 16 bits.
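
A small sketch of the discrepancy this fixes (my example; requires CUDA, and the exact pre-fix difference depends on `alpha` and the values involved):

```python
import torch

a = torch.full((4,), 1.0, dtype=torch.half, device='cuda')
b = torch.full((4,), 1.0, dtype=torch.half, device='cuda')
alpha = 0.1  # not exactly representable in 16 bits

eager = torch.add(a, b, alpha=alpha)
foreach = torch._foreach_add([a], [b], alpha=alpha)[0]
# Before this PR the two could differ, since only the foreach kernel
# upcast alpha to acc_type; after it, both do, so the results match.
print((eager - foreach).abs().max())
```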

ref:
- `torch._foreach_add` & `torch._foreach_sub` cast `alpha`: 6d0fb85a62/aten/src/ATen/native/cuda/ForeachBinaryOpList.cu (L21-L28), `BinaryOpListAlphaFunctor` is defined here: 6d0fb85a62/aten/src/ATen/native/cuda/ForeachFunctors.cuh (L202)

related: https://github.com/pytorch/pytorch/issues/58833, https://github.com/pytorch/pytorch/pull/59907

cc ngimel ptrblck mcarilli

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60227

Reviewed By: mruberry

Differential Revision: D29252759

Pulled By: ngimel

fbshipit-source-id: 847f3b9493ae30a900f7445af00aef1abcc1ab21
2021-06-20 19:05:22 -07:00
0131a5972d [DDP] Test inference works with eval() and no_grad() (#59666)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59666

Tests that inference with a DDP model won't hang when the user sets eval()
or no_grad(). Note that if the model has a SyncBN layer, both eval() and
no_grad() are needed, since eval() makes SyncBN work like a regular BN layer.
ghstack-source-id: 131906625

Test Plan: CI

Reviewed By: SciPioneer

Differential Revision: D28974146

fbshipit-source-id: 137f8245b1c303beb2416518476e70fe67c73376
2021-06-20 12:02:43 -07:00
69b2bf70f9 [pytorch] fix tools/code_analyzer for llvm 11 (#60322)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60322

Test Plan: Imported from OSS

Reviewed By: iseeyuan

Differential Revision: D29250420

Pulled By: ljk53

fbshipit-source-id: ff7f9cbacd1d9518ed81c06fc843a90d6948f760
2021-06-20 00:39:11 -07:00
c19acf816f Replace TensorRT's deprecated API in caffe2/python/trt/test_pt_onnx_trt.py (#60236)
Summary:
TensorRT v8 is going to remove some functions/methods that are used in this test.

ref:
- getMaxWorkspaceSize deprecation: b2d60b6e10/include/NvInfer.h (L6984-L6993)
- buildCudaEngine deprecation: b2d60b6e10/include/NvInfer.h (L7079-L7087)

cc ptrblck

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60236

Reviewed By: gchanan

Differential Revision: D29232376

Pulled By: ngimel

fbshipit-source-id: 2b8a48787bf61c68a81568b6026d6afd5a83e751
2021-06-19 19:56:30 -07:00
5ec4ad7f54 [special] Add special.ndtri (#58650)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/50345

TODO
* [x] Add docs https://13865352-65600975-gh.circle-artifacts.com/0/docs/special.html#torch.special.ndtri
* [x] Add comments on implementation
* [x] Clean-up
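
A quick usage sketch (my example):

```python
import torch

p = torch.tensor([0.025, 0.5, 0.975], dtype=torch.float64)
# ndtri is the inverse of the standard normal CDF (torch.special.ndtr):
torch.special.ndtri(p)  # ~ tensor([-1.9600, 0.0000, 1.9600])
```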

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58650

Reviewed By: H-Huang

Differential Revision: D29160170

Pulled By: mruberry

fbshipit-source-id: 50e4ea663920e97b8437d03d5b52bcd9dedc1a8d
2021-06-19 18:36:54 -07:00
5824a866b7 [pytorch][nnc] support custom class parameters (#59466)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59466

Change saved parameter type from at::Tensor to at::IValue to support custom
class parameters, e.g. `__torch__.torch.classes.xnnpack.Conv2dOpContext`.

The NNC produced kernels won't deal with custom class parameters directly.
They simply pass through to the external operators that take these custom
class parameters, e.g. `prepacked::conv2d_clamp_run`.

It will reuse the `__getstate__` and `__setstate__` methods on the custom class
to persist and restore the state of the parameters.

When calling into the kernel, it passes the untyped raw pointers of the custom
class objects as `void*`. This is similar to the regular tensor parameters,
for which it passes in the raw data pointer of the tensor storage. The generated
kernel needs to hardcode the expected type for each parameter and cast before
calling the external ops.
ghstack-source-id: 131897904

Test Plan: - unit tests

Reviewed By: kimishpatel

Differential Revision: D28902496

fbshipit-source-id: 4b2c0895dd28f0b7d344aa08183d42ad6a355dae
2021-06-19 06:11:01 -07:00
cac9ae1506 [iOS GPU][BE][3/n] Give MPSImage objects a label for better debugging experience (#60282)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60282

1. Add a label to the MPSImage objects; the label describes the size of the image.
2. Remove `[image markRead]`.
3. Rename two APIs for a better naming convention.
ghstack-source-id: 131839557

Test Plan:
1. CircleCI
2. buck test pp-mac

Reviewed By: SS-JIA

Differential Revision: D29232975

fbshipit-source-id: 075175c4b5a1c5b79e795f4860e1694d7c06d4f2
2021-06-18 18:47:05 -07:00
b9cd97c94b [iOS GPU][BE][2/n] Remove unused APIs (#60281)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60281

1. Remove unused APIs from MPSImageUtils.
2. Move tensor-related APIs from MetalUtils to MetalTensorUtils; delete MetalUtils.h/mm.
3. Move Metal buffer-related APIs to MetalContext.
ghstack-source-id: 131839559

Test Plan:
1. CircleCI
2. buck test pp-mac

Reviewed By: SS-JIA

Differential Revision: D29232973

fbshipit-source-id: a4c0c848883b8ef615eeb2936c1f3d18cddcb318
2021-06-18 18:47:04 -07:00
80e6e3f1da [iOS GPU][BE][1/n] Rename MPSCNNContext to MetalContext (#60280)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60280

No significant changes besides renaming the class. In the future, we'll convert this Objective-C class to C++.
ghstack-source-id: 131827490

Test Plan:
- CircleCI
- buck test pp-mac

Reviewed By: SS-JIA

Differential Revision: D29231824

fbshipit-source-id: a0d1327a55a0414011c78a7144d3b05f1579cf42
2021-06-18 18:45:24 -07:00
319890b1b2 Support *args in Pipe.forward API. (#55441)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55441

This is the first step towards supporting the proposal outlined in
https://github.com/pytorch/pytorch/issues/53952.

In this PR I've ensured Pipe.forward() accepts a *inputs argument instead of
just a single input as before. This lays the groundwork for supporting
non-Tensors and generic arguments to the Pipe API. In this PR we still only
support Tensors; non-Tensor support will come in future PRs.

For backward compatibility I've ensured a single Tuple[Tensor] input still
works as it did previously.
ghstack-source-id: 130767499

Test Plan: waitforbuildbot

Reviewed By: SciPioneer

Differential Revision: D27613887

fbshipit-source-id: 05e19e537e6d7fe4999745fc4ba9941ac54906de
2021-06-18 17:53:32 -07:00
a8430f1076 Remove PlacementSpec from ShardingSpecs. (#59990)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59990

ShardingSpecs accepted a Device/PlacementSpec and were initially
written this way for flexibility. However, this is slightly confusing given
there is no general use case for it. As a result, to keep things simple, I've
ensured that both specs accept only devices for now.

We can always extend this to include a general PlacementSpec later on.
ghstack-source-id: 131842525

Test Plan: waitforbuildbot

Reviewed By: SciPioneer, rohan-varma

Differential Revision: D29116463

fbshipit-source-id: a6f2b3f1346ac6afab91c9595d4cae4f4da04fda
2021-06-18 17:37:43 -07:00
1c97c3e3a4 DOC Adds LSTM docs for defined variables when bidirectional=True (#60120)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/59332

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60120

Reviewed By: gchanan

Differential Revision: D29240245

Pulled By: jbschlosser

fbshipit-source-id: acad9c24f41f7253a7d42cd940e54bb66e083ecf
2021-06-18 17:28:44 -07:00
aae2a3c95e Clarify ConvTransposeNd + reference links (#60291)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/56873

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60291

Reviewed By: gchanan

Differential Revision: D29239199

Pulled By: jbschlosser

fbshipit-source-id: 9b2de1a8b1a7444797f82c73195c5efc929562eb
2021-06-18 17:18:11 -07:00
e8e3394ea8 Recognize transposed dense tensors as a form of partial overlap (#59014)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59014

Fixes #48401

`assert_no_overlap` currently has a false negative where it recognizes
the transpose of a contiguous tensor as fully overlapping. This happens because
the memory regions do fully overlap, but the strides are different, so the
actual elements don't all overlap.

This PR goes slightly in the other direction: by requiring strides to match
exactly, we get false positives for some unusual situations, e.g.
```
torch.add(a, a, out=a.view([1, *a.shape]))
```
Or replacing strides of length-1 dimensions, etc. However, I think these are
sufficiently obscure that it's okay to error, and the common cases like
in-place operations still work as before.
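
For illustration, a minimal sketch of the newly rejected transposed case (my example):

```python
import torch

a = torch.ones(3, 3)
# The memory regions fully overlap, but the strides differ, so the element
# mapping is not one-to-one; with this change the overlap check raises
# instead of silently treating this as a full overlap.
torch.add(a, a, out=a.t())  # raises a RuntimeError after this PR
```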

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D29040928

Pulled By: ngimel

fbshipit-source-id: 5a636c67536a3809c83f0d3117d2fdf49c0a45e6
2021-06-18 16:29:25 -07:00
47bbc01e0b [nnc] Added micro-benchmark to show perf improvement with cat subgraph optimization (#59581)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59581

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D28955317

Pulled By: navahgar

fbshipit-source-id: 53bb3dbfafbd3b146063f305523c2e6ec96cf6b8
2021-06-18 14:32:09 -07:00
d0c4ace00f [jit] Added a tranformation to move consumers of aten::cat to its inputs, in the fused subgraphs (#59580)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59580

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D28955318

Pulled By: navahgar

fbshipit-source-id: 7504d5aea441920f4eb9234cdfa17077161ab13c
2021-06-18 14:32:07 -07:00
d4c626a346 [jit] Exported a method to get the supported list of elementwise ops (#60162)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60162

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D29190841

Pulled By: navahgar

fbshipit-source-id: bb786a653441c5b586509e25cc80d357d2223af3
2021-06-18 14:32:05 -07:00
55755edc60 [jit] Made a list for element-wise ops. (#59579)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59579

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D28955319

Pulled By: navahgar

fbshipit-source-id: 605531aedf9250a226b0401d55fda3427bdc6f33
2021-06-18 14:30:47 -07:00
a029422cae [quant][graphmode][fx][refactor] Change the env map to add dtype as a key (#60054)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60054

Previously, env in convert was Dict[str, Tuple[Node, torch.dtype]]; that is, at a given time each node can only have one dtype.
This causes a problem for the following case:
```
# original model
class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(1, 1, 1)

    def forward(self, x):
        x = self.conv(x)
        x1 = x.expand_as(x)
        x2 = torch.add(x, x1)
        return x2

# observed graph
def forward(self, x):
    x = self.activation_post_process_0(x)
    x = self.conv(x)
    x = self.activation_post_process_1(x)
    x1 = x.expand_as(x)
    x1 = self.activation_post_process_2(x1)
    x2 = torch.add(x, x1)
    x2 = self.activation_post_process_3(x2)
    return x2

# quantized graph
def forward(self, x):
    x = torch.quantize_per_tensor(x, ...)
    x = self.conv(x)  # quantized conv
    x = torch.dequantize(x)
    x1 = x.expand_as(x)
    x1 = torch.quantize_per_tensor(x1, ...)
    # Error: x is dequantized
    x2 = torch.ops.quantized.add(x, x1)
    return x2
```

Currently we have an env that is a map from node name in the observed graph to the Node in the quantized graph. The problem is that the quantized conv is followed by two operators: one expecting float input (expand_as) and one expecting quantized input (quantized add). In the quantized graph, expand_as should ideally consume the dequantized output and quantized add should consume the quantized output:

    quantized_conv - dequantize - expand_as
      \ ------- quantized_add

But currently each node in env must be either quantized or not quantized. Therefore we change env to include dtype as a key, `env: Dict[str, Dict[torch.dtype, Node]]`, e.g. `{'x': {torch.float: dequantized_node, torch.quint8: quantized_node}}`. When we load from env, we will also need to provide the dtype of the Node we want to load. We can have a separate pass to figure out this information for each node.

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D29149408

fbshipit-source-id: c9e4b7d65444ab6a6f573929bae1db5037629892
2021-06-18 13:31:43 -07:00
c0f8cad0f0 [BE] Fix shard imbalance (#60206)
Summary:
First step to address https://github.com/pytorch/pytorch/issues/60136

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60206

Reviewed By: janeyx99

Differential Revision: D29215237

Pulled By: walterddr

fbshipit-source-id: ec25beb57366ef2eaf37878cdea391b245de9bef
2021-06-18 12:49:30 -07:00
d9e7df707b [TensorExpr] Add NNC lowerings for aten::mean, aten::addmm, and aten::adaptive_avg_pool2d. (#59347)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59347

We had external call wrappers for them, but they were not used in NNC.
This PR adds lowerings using these ext calls and fixes some bugs in
them.

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D28853832

Pulled By: ZolotukhinM

fbshipit-source-id: 1718400368e1a9cf3f19180ee2290a4ed9c99d41
2021-06-18 11:56:32 -07:00
c6bb9409b8 [TensorExpr] Handle not-specified dtypes and strides. (#59346)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59346

Currently JIT has a pass to propagate shapes, but doesn't have the
capability to fill in strides and dtypes. This PR works around that by
assuming the default dtype to be Float and strides corresponding to a
contiguous layout, unless otherwise specified. Ideally we won't need
this; it is done simply as a workaround until the corresponding
features are implemented on the JIT side.

This is required for AOT compilation of mobilenet v3 with NNC.

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D28853831

Pulled By: ZolotukhinM

fbshipit-source-id: 81adb59409684f39b444909ab8ec58ee4a39d496
2021-06-18 11:56:30 -07:00
f042455a8d [JIT] ShapeProp: add missing ops from mobilenet v3. (#59163)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59163

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D28853833

Pulled By: ZolotukhinM

fbshipit-source-id: 451fb9ee848968049d26fb5623a904d8fa7bd6fc
2021-06-18 11:55:00 -07:00
3870e68644 TF32 threshold twiddling for tests (#60209)
Summary:
Following https://github.com/pytorch/pytorch/issues/59624, I observed some straggling failing tests on Ampere due to TF32 thresholds. This PR twiddles some more thresholds to fix the six failing tests I saw on A100.

CC Flamefire ptrblck ngimel

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60209

Reviewed By: gchanan

Differential Revision: D29220508

Pulled By: ngimel

fbshipit-source-id: 7c83187a246e1b3a24b181334117c0ccf2baf311
2021-06-18 11:41:33 -07:00
5f010c066f [package] Bring back save_source_file (#59962)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59962

This reverts commit 44b021d21b5681c105529881bdbaefb6d3e335f6.

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D29113224

Pulled By: zhxchen17

fbshipit-source-id: 55d42acc421c5f4abbbad9d9ed4d32b615939463
2021-06-18 11:13:35 -07:00
5a45103139 ns for fx: add API usage logging (#60103)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60103

Adds internal logging for NS for FX API usage.

Test Plan: CI

Reviewed By: jerryzh168

Differential Revision: D29166710

fbshipit-source-id: 2a1bf2f6038b0c6c5945b57b2db2de25c585a04a
2021-06-18 10:25:59 -07:00
0baad214b0 [static runtime][fix] resize to the input tensor size for full_like (#60229)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60229

Fix a bug where we did not resize to the input tensor size, causing
the output to be incorrect.

Test Plan:
Test on replayer, rebased on D29217781, with model 278203319_26.

Verify with jit outputs (D28583950)

`./buck-out/gen/admarket/lib/ranking/prediction_replayer/replayer --model_inference_type_target=DISAGG_ACCELERATOR --prediction_replayer_force_model_type=inline_cvr_post_imp_model --prediction_replayer_force_model=278203319_26 --prediction_replayer_target_tier=sigrid.predictor.perf.dianshi_staticruntime_debug_0604.test --prediction_replayer_input_stream_filename=/data/users/ansha/tmp/adfinder/filtered_requests_inline_cvr_100 --ignore_model_id_mismatch --check_performance --fully_remote_sr_connection_options="overall_timeout:10000000,processing_timeout:10000000" --use_new_encoding_for_ads_services --use_new_encoding_from_model_id_to_shard_id --sigrid_force_model_dir=/data/users/ansha/tmp/adfinder/278203319_26/ --sigrid_predictor_model_suffix=.predictor.disagg.local —use_new_encoding_from_model_id_to_shard_id=true --prediction_replayer_force_model_kind=19 --pytorch_predictor_static_runtime_enable=true --prediction_replayer_target_qps=1`

Reviewed By: hlu1, movefast1990

Differential Revision: D29218918

fbshipit-source-id: dab4bbbabeaa8367174ed90edca43d6204c65409
2021-06-18 09:56:25 -07:00
d5df274ea5 [DDP] Support for multiple backwards (#59359)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59359

Move `prepare_for_backward` into the `_DDPSink` backward instead of calling it in the DDP forward pass, so that we can run multiple backwards in DDP with `retain_graph=True`.
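
A minimal single-process sketch of what this enables (my example, assuming a gloo setup):

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

model = DDP(torch.nn.Linear(4, 4))
loss = model(torch.randn(2, 4)).sum()
loss.backward(retain_graph=True)  # first backward
loss.backward()                   # second backward, enabled by this change

dist.destroy_process_group()
```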

ghstack-source-id: 131774159

Test Plan: CI

Reviewed By: zhaojuanmao

Differential Revision: D28855226

fbshipit-source-id: 6b7b25d75b7696f5b5629078233433f97663d61c
2021-06-18 09:23:57 -07:00
3815a013ed Enable xenial-cuda11.1-cudnn8-py3.6-gcc7 in GHA (#60196)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60196

Test Plan:
https://github.com/pytorch/pytorch/issues/60198: https://github.com/pytorch/pytorch/actions/runs/947796763

I should have used `ghstack` but I forgot; will do that in the future.

Reviewed By: walterddr

Differential Revision: D29231161

Pulled By: samestep

fbshipit-source-id: 8299a248ca9c1d36c3845d1c8a10ca9bf7101124
2021-06-18 09:18:53 -07:00
d5988c5eca remove unused type: ignore directives (#60006)
Summary:
During development it is common practice to put `type: ignore` comments on lines that are correct, but `mypy` doesn't recognize this. This often stems from the fact that the `mypy` version in use wasn't able to handle the pattern.

With every new release `mypy` gets better at handling complex code. In addition to fixing all the previously accepted but now failing patterns, we should also revisit all `type: ignore` comments to see if they are still needed. Fortunately, we don't need to do it manually: by adding `warn_unused_ignores = True` to the configuration, `mypy` will error out when it encounters a `type: ignore` that is no longer needed.
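
For illustration (hypothetical line), an ignore that no longer suppresses anything is then reported:

```python
# With warn_unused_ignores = True, mypy reports for the line below:
#   error: Unused "type: ignore" comment
x: int = 1 + 1  # type: ignore
```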

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60006

Reviewed By: jbschlosser, malfet

Differential Revision: D29133237

Pulled By: albanD

fbshipit-source-id: 41e82edc5cd5affa7ccedad044b59b94dad4425a
2021-06-18 07:23:31 -07:00
7c29ca7f2b Fix Subset of a Subset not sliceable issue (#59513)
Summary:
A Dataset can be indexed by a list, but a list cannot be indexed by a list. This caused an error when slicing a Subset that was initialized with another Subset instead of a dataset.

Fixed the issue by changing the indices to a Tensor, which can be indexed by a list.
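
A minimal repro sketch of the failure (my example):

```python
import torch
from torch.utils.data import TensorDataset, Subset

ds = TensorDataset(torch.arange(10))
sub = Subset(ds, [0, 2, 4, 6, 8])
subsub = Subset(sub, [0, 1, 2])

# Slicing resolves to sub[[0, 1]], so the inner Subset tried to index its
# list of indices with a list, raising a TypeError before this fix.
print(subsub[0:2])
```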

Fixes https://github.com/pytorch/pytorch/issues/59512

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59513

Reviewed By: zou3519

Differential Revision: D29196891

Pulled By: ejguan

fbshipit-source-id: ccde6e474fbcbddd2e9c7c107bc8b5de1307cdb9
2021-06-18 07:07:34 -07:00
08ce5eedf5 [reland] Move RPC agents to libtorch (#60170)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60170

Reland of #59939.

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D29193234

fbshipit-source-id: ee2a90d6be961c10f91361512bdd4cadca43dd60
2021-06-18 05:15:09 -07:00
958b881d70 [reland] Add some TORCH_API annotations to RPC (#60169)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60169

Reland of #59939.
ghstack-source-id: 131706861

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D29193233

fbshipit-source-id: 91d3ef9003b9da7b99e1b9310b7f5a6c505d3b99
2021-06-18 05:15:07 -07:00
83fde5d981 [reland] Pass RequestCallback to FaultyPG RPC agent (#60168)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60168

Reland of #59939.
ghstack-source-id: 131706860

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D29193235

fbshipit-source-id: 170108956a041f6a91b2b21c76ab1a0e0cdd34a2
2021-06-18 05:13:57 -07:00
8a839c5478 Fix saved variable unpacking version counter (#60195)
Summary:
We only set the value and not the actual VC.
This means that in the context of double backward, if that saved tensor is saved again and the original Tensor is modified inplace, we would not detect it.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60195

Reviewed By: Varal7

Differential Revision: D29208766

Pulled By: albanD

fbshipit-source-id: 81175f8e3f111f89524f8e46f47577b2ea4fc945
2021-06-18 04:36:46 -07:00
5609c2e59c Adds an OpInfo note (#57428)
Summary:
Like the title says. The OpInfo pattern can be confusing when first encountered, so this note links the Developer Wiki and tracking issue, plus elaborates on the goals and structure of the OpInfo pattern.

cc imaginary-person, who I can't add as a reviewer, unfortunately

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57428

Reviewed By: SplitInfinity

Differential Revision: D29221874

Pulled By: mruberry

fbshipit-source-id: aa73228748c9c96eadf2b2397a8b2ec31383971e
2021-06-18 03:40:42 -07:00
ecc37184a5 Fix clang-tidy path filtering (#60225)
Summary:
PR https://github.com/pytorch/pytorch/issues/60048 neglected to include the `--paths` option for file filtering, so it ended up passing every changed file in the diff to clang-tidy (cpp files outside `torch/csrc/`, yaml/sh files, etc.). This adds that back in to make the filtering work properly again.

Tested it manually by printing out the files to lint and running

```bash
curl -L https://github.com/pytorch/pytorch/pull/60018.diff > diff
python tools/clang_tidy.py --diff-file diff --paths torch/csrc/

curl -L https://github.com/pytorch/pytorch/pull/60222.diff > diff
python tools/clang_tidy.py --diff-file diff --paths torch/csrc/
```

Should fix https://github.com/pytorch/pytorch/issues/60192 and https://github.com/pytorch/pytorch/issues/60193; the files tripping errors there shouldn't have been passed to clang-tidy in the first place (supporting aten/ for clang-tidy is a separate task)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60225

Reviewed By: zhouzhuojie

Differential Revision: D29216251

Pulled By: driazati

fbshipit-source-id: b5d7fb7161d33eb7958a6f1ccc25809942045209
2021-06-17 23:03:59 -07:00
38c3116813 [hierarchical sharding 5/n] enable table-wise -> col-wise sharding in embedding table lookup
Summary:
This diff adds table-wise -> column-wise sharding support in GroupedShardedEmbeddingBag. Changes include:
1. Add necessary member variable setup.
2. Create a new fast kernel and add fast-kernel lookup support.
3. Add intra-host all2all and cross-host all2all logic.

Test Plan:
UT
```
buck test mode/dev-nosan //caffe2/torch/fb/training_toolkit/backend/tests:test_model_materializer_full_sync_spawn
```
```
buck test caffe2/torch/fb/hpc/tests:model_sharder_test
```
QPS check:
```
buck run mode/dev-nosan -c python.package_style=inplace caffe2/torch/fb/training_toolkit/examples:sync_sgd_local_driver -- prod-preset --num-trainers 32 --use-shrunk-model false --model-version=inline_cvr_dec_2020 --fast-kernel table_batched --max-batches 10000 --num-dpp-worker-threads 16 --num-readers 100 --hpc-identity ads_model_platform --table-partition hierarchical_based --hierarchical-options "["table_based", "column_based"]" --flow-entitlement ads_global_qps
```
with diff:
dec inline_cvr:
table-wise -> table-wise (82K):
https://www.internalfb.com/mast/job/tsm_ruilinchen-SparseNNApplication_d0a0cba5?version=0&tab=status&env=PRODUCTION

table-wise -> column-wise (80k):
https://www.internalfb.com/mast/job/tsm_ruilinchen-SparseNNApplication_b1ac5873

column-wise:
dec inline_cvr:
gpu trace: https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree%2Ftraces%2Fdynocli%2F0%2F1623827677%2F127.0.0.1%2Flibkineto_activities_4550.json.gz&bucket=gpu_traces

https://www.internalfb.com/mast/job/tsm_ruilinchen-SparseNNApplication_a79e1522 (81k)

https://www.internalfb.com/mast/job/tsm_ruilinchen-SparseNNApplication_2dacc13e (88k)

row-wise(62k):
https://www.internalfb.com/mast/job/tsm_ruilinchen-SparseNNApplication_4e349cab

table-wise(90k):
https://www.internalfb.com/mast/job/tsm_ruilinchen-SparseNNApplication_5d51b608

10x ctr_mbl_feed:
```
buck run mode/dev-nosan -c python.package_style=inplace caffe2/torch/fb/training_toolkit/examples:sync_sgd_local_driver -- prod-preset --num-trainers 128 --use-shrunk-model false --model-version=ctr_mbl_oct_2020_10x_3tb --num-dpp-worker-threads 16 --num-readers 200 --fast-kernel table_batched --max-batches 5000000 --hpc-identity ads_model_platform --table-partition column_based --flow-entitlement ads_global_tc_mimo
```
column-wise:
https://www.internalfb.com/mast/job/tsm_ruilinchen-SparseNNApplication_f05fb306?version=0&tab=status&env=PRODUCTION (290k)

w/o diff:
dec inline_cvr:
column-wise (87K):
gpu trace: https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree%2Ftraces%2Fdynocli%2F0%2F1623864444%2F127.0.0.1%2Flibkineto_activities_4451.json.gz&bucket=gpu_traces
https://www.internalfb.com/mast/job/tsm_ruilinchen-SparseNNApplication_e1315f14

row-wise (60k):
https://www.internalfb.com/mast/job/tsm_ruilinchen-SparseNNApplication_8fcc0adf

table-wise (91k):
https://www.internalfb.com/mast/job/tsm_ruilinchen-SparseNNApplication_cb94ff41

10x ctr_mbl_feed:
https://www.internalfb.com/mast/job/tsm_ruilinchen-SparseNNApplication_203ef35b?version=0&tab=status&env=PRODUCTION (281k)

NE check(use deterministic reading D28711400)
```
buck run mode/dev-nosan -c python.package_style=inplace caffe2/torch/fb/training_toolkit/examples:sync_sgd_local_driver -- prod-preset --num-trainers 32 --use-shrunk-model false --model-version=inline_cvr_dec_2020 --fast-kernel table_batched --max-batches 100000 --num-dpp-worker-threads 16 --num-readers 64 --hpc-identity ads_model_platform --table-partition hierarchical_based --hierarchical-options "[table_based, column_based]" --flow-entitlement ads_global_qps --use-deterministic-model --use-deterministic-reading --model-entity-id 995557193
```
w/o this diff:
```
I0611 12:19:18.766000 647 print_publisher.py:33  master      ] Publishing batch metrics: ne-ne|lifetime_ne 0.8660048340401448
I0611 12:19:18.766000 647 print_publisher.py:33  master      ] Publishing batch metrics: ne-ne|window_ne 0.8660048340401447
I0611 12:19:18.766000 647 print_publisher.py:33  master      ] Publishing batch metrics: qps-qps|total_examples 1867776.0
I0611 12:19:18.766000 647 print_publisher.py:33  master      ] Publishing batch metrics: qps-qps|window_qps 491.5199890136719
```
https://www.internalfb.com/mast/job/tsm_ruilinchen-SparseNNApplication_15bc6243?version=0&tab=status&env=PRODUCTION

w this diff:
```
I0611 12:19:18.766000 647 print_publisher.py:33  master      ] Publishing batch metrics: ne-ne|lifetime_ne 0.8660048340401448
I0611 12:19:18.766000 647 print_publisher.py:33  master      ] Publishing batch metrics: ne-ne|window_ne 0.8660048340401447
I0611 12:19:18.766000 647 print_publisher.py:33  master      ] Publishing batch metrics: qps-qps|total_examples 1867776.0
```
https://www.internalfb.com/mast/job/tsm_ruilinchen-SparseNNApplication_15bc6243?version=0&tab=status&env=PRODUCTION

Reviewed By: JadeNie

Differential Revision: D28689126

fbshipit-source-id: 1c7879d4e3ee2b90aaf2a89e87f7b827d54173b3
2021-06-17 22:25:25 -07:00
8b55e9feaf removed cat, equal, and stack from autocast promote list (#59497)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59497

Reviewed By: zou3519

Differential Revision: D29185909

Pulled By: ngimel

fbshipit-source-id: db96239106d9e46a2704b8f457fd0463dacc1f5c
2021-06-17 21:13:22 -07:00
faf459f13e [Profiler] Fix memory profiler merge issue (#60037)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60037

The memory profiler was broken due to a mis-merge during rebase. Add the lost line back.

Reviewed By: ezyang

Differential Revision: D29143469

fbshipit-source-id: c3bf0088ca12e7535eeddbede24e28201eccd5f4
2021-06-17 21:05:23 -07:00
bcf8752fb2 updated launch bounds for trilinear 3d (#59999)
Summary:
Updates launch bounds for the upsample_trilinear_3d forward and backward kernels to remove register spilling into local memory. Improves forward-pass runtime by a 3-4x factor; the backward pass has the same runtime (probably a different bottleneck).

Timing data: (Using Nvidia Titan-V GPU)
![TrilinearTimingData](https://user-images.githubusercontent.com/22803332/121979658-72f19200-cd3f-11eb-9363-c00e2c4eea6d.PNG)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59999

Reviewed By: zou3519

Differential Revision: D29185976

Pulled By: ngimel

fbshipit-source-id: 0b2313e70e45c53938cd7262464d3aa4fab8da4a
2021-06-17 21:02:12 -07:00
7e032f18cf DOC Describes behavior for None in module.register_* (#60125)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/45834
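
A brief sketch of the behavior being documented (my illustration, not the doc text itself):

```python
import torch.nn as nn

m = nn.Module()
# Registering None reserves the name without storing a tensor; such entries
# are skipped by parameters() and buffers().
m.register_parameter("weight", None)
m.register_buffer("running_mean", None)
print(list(m.parameters()), list(m.buffers()))  # [] []
```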

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60125

Reviewed By: zou3519

Differential Revision: D29196138

Pulled By: jbschlosser

fbshipit-source-id: af736c0d66005ec33412860f00b233a5d2922137
2021-06-17 19:18:23 -07:00
047925dac1 .github: Run Windows CUDA build on pull requests (#60215)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60215

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: samestep

Differential Revision: D29214519

Pulled By: seemethere

fbshipit-source-id: 58df5ee49cc5cd46f48938f023f87a6da958f3b6
2021-06-17 16:30:31 -07:00
6af5d00e4b [torch][segment_reduce] Add support for multi-dimensional input (cuda) (#60018)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60018

Same as title. This diff finishes cuda support for currently implemented reductions and input parameters.

Next Steps:
- Add support for sum/min
- More testing and benchmarking
- Cleanup
    - Update default values when length is 0
    - Use TensorIterator
    - Update documentation

Test Plan: Unit test to cover cuda forward path.

Reviewed By: ngimel

Differential Revision: D29135373

fbshipit-source-id: d070727eeb660f56782e7ac8a5b0798be688480a
2021-06-17 16:30:30 -07:00
a727f655c8 [torch][segment_reduce] Support for multi dimension (cpu only) (#59951)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59951

Add support for multi-d input for cpu forward/backward implementation.

Next step: Adding cuda support for multi-d input.

Test Plan: Added unit tests.

Reviewed By: ngimel

Differential Revision: D29105457

fbshipit-source-id: a389ba4cc10f02434a336b8e7d36259f32552e11
2021-06-17 16:29:14 -07:00
8e67981995 .github: Disable clang-tidy for now (#60219)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60219

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: driazati

Differential Revision: D29214928

Pulled By: seemethere

fbshipit-source-id: 20cf38ebfe77ed646e25293c577937c56bd930d3
2021-06-17 16:26:31 -07:00
acf04cdedf Fix default DEFAULT_FILE_PATTERN in clang-tidy (#60212)
Summary:
Without the change, clang-tidy also checks folders like `.circleci/...`.

Example of a clang-tidy run that looked into `.circleci` changes:
https://github.com/pytorch/pytorch/runs/2844682644?check_suite_focus=true

[skip ci]

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60212

Reviewed By: seemethere

Differential Revision: D29214728

Pulled By: zhouzhuojie

fbshipit-source-id: fd53f7b2f7d88936264db1effdc06cc4fc271ca4
2021-06-17 16:25:18 -07:00
9c03de1dde Use mirrors for ubuntu apt source (#60216)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/60135

Experimented on circleci
https://app.circleci.com/pipelines/github/zhouzhuojie/gha-ci-playground/7/workflows/965c95b8-2186-434a-92ca-9cd9c8aaafdc/jobs/7

Sample logs
```
Need to get 1,389 kB of archives.
After this operation, 5,495 kB of additional disk space will be used.
Get:1 http://mirrors.ubuntu.com/mirrors.txt Mirrorlist [3,270 B]
Get:2 http://mirror.lstn.net/ubuntu focal/main amd64 libtcl8.6 amd64 8.6.10+dfsg-1 [902 kB]
Get:7 http://ubuntu.securedservers.com focal/main amd64 libipc-run-perl all 20180523.0-2 [89.7 kB]
Get:5 http://mirrors.edge.kernel.org/ubuntu focal/universe amd64 expect amd64 5.45.4-2build1 [137 kB]
Get:4 http://mirror.pnl.gov/ubuntu focal/universe amd64 tcl-expect amd64 5.45.4-2build1 [105 kB]
Get:6 http://mirror.lstn.net/ubuntu focal/main amd64 libio-pty-perl amd64 1:1.12-1 [32.4 kB]
Get:9 https://mirrors.bloomu.edu/ubuntu focal/main amd64 libtimedate-perl all 2.3200-1 [34.0 kB]
Get:8 http://la-mirrors.evowise.com/ubuntu focal/universe amd64 libtime-duration-perl all 1.21-1 [13.1 kB]
Get:3 http://mirrors.ocf.berkeley.edu/ubuntu focal/main amd64 tcl8.6 amd64 8.6.10+dfsg-1 [14.8 kB]
Get:10 http://mirrors.ocf.berkeley.edu/ubuntu focal/universe amd64 moreutils amd64 0.63-1 [60.5 kB]
Fetched 1,392 kB in 3s (464 kB/s)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60216

Reviewed By: seemethere

Differential Revision: D29214661

Pulled By: zhouzhuojie

fbshipit-source-id: ed2d85f8c0c23af4bcf33558c57472fcf9d913e8
2021-06-17 16:19:27 -07:00
3995fb1840 Add new_ones symbolic (#59255) (#59539)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59539

Add new_ones symbolic in PT-ONNX exporter

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D29046603

Pulled By: SplitInfinity

fbshipit-source-id: e7420c7b543c33e3640e62461d08ff4d5843eda7

Co-authored-by: Shubham Bhokare <shubhambhokare@gmail.com>
2021-06-17 15:49:24 -07:00
ef1c107be5 [vulkan] Do not use memcmp to compare structs (#60199)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60199

It isn't safe to use `memcmp` to determine the equality of structs because of potential padding between fields of the struct, whose byte values are indeterminate. This can cause overloaded equality operators to return false when comparing structs with equivalent fields.
This bug appears to be responsible for the Vulkan backend crashing on WorkVC release builds.

Test Plan:
Run Vulkan unit tests:

```
cd ~/fbsource
buck build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 //xplat/caffe2:pt_vulkan_api_test_binAndroid\#android-arm64 --show-output
adb push buck-out/gen/xplat/caffe2/pt_vulkan_api_test_binAndroid\#android-arm64 /data/local/tmp/vulkan_api_test
adb shell "/data/local/tmp/vulkan_api_test"
cd -
```

To test on the WorkVC RDK build, first ensure you are receiving the Vulkan models.
```
buck install fbsource//fbandroid/mode/opt fbsource//fbandroid/mode/aloha_build_rdk fbsource//fbandroid/mode/no_obfuscation fbandroid/buck-configs/buckconfig.caffe2_pkg_snpe_libs_android aloha_workvc_rdk --deep --show-full-output
```

Reviewed By: IvanKobzarev

Differential Revision: D29203177

fbshipit-source-id: e0ee79d4e635174e165b250f2cee842a09092df9
2021-06-17 15:20:30 -07:00
6d0fb85a62 Revert D28833086: beef up at::_ops API
Test Plan: revert-hammer

Differential Revision:
D28833086 (e2129d1c06)

Original commit changeset: 55f322a8378c

fbshipit-source-id: e55bf812ec411bb6bee87654f1d65ff10c046106
2021-06-17 14:28:32 -07:00
0cbb5e15d7 Correct backend in pipe_with_ddp_test (#60123)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60123

All of the tests would run with gloo, but some tests specify a
different backend param, which we should respect.
ghstack-source-id: 131688188

Test Plan: CI

Reviewed By: cbalioglu

Differential Revision: D29171549

fbshipit-source-id: 3e306060df189c0e38d5ca6dd34f4b4fbca052b9
2021-06-17 13:43:01 -07:00
acd914f039 Fix Pipe + DDP for unused parameters, static graph (#60118)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60118

Pipe + DDP has a few issues:

1) With static graph, gradients are not synchronized on the first backward pass (i.e. the delayed allreduce is not run); broken since https://github.com/pytorch/pytorch/pull/55248
2) When find_unused_parameters=True, gradient synchronization also does not happen; broken since https://github.com/pytorch/pytorch/pull/57081

The reason for both cases is that calling `DDPSink.apply(output_tensor)` does not call the custom `backward` of `DDPSink` when the `output_tensor` is actually an `OwnerRRef`, which is the case when running DDP in `Pipe`. This is because we do `backward` on the `rref.local_value()` which does not have this autograd recording.

To fix, we unwrap the RRef and reconstruct it as needed, similar to the fix in https://github.com/pytorch/pytorch/pull/49908.

To test:
All tests in pipe_with_ddp_test pass.
The reason these tests did not catch the errors earlier is that all ranks received the same model inputs, so if gradient synchronization did not occur, the grads would still be the same because the model is the same on all ranks (guaranteed by DDP). Fixed the tests to use different inputs across ranks.
ghstack-source-id: 131688187

Test Plan: CI

Reviewed By: pritamdamania87

Differential Revision: D29167283

fbshipit-source-id: fe62310db2dc6de8519eb361b1df8ae4dfce3ab8
2021-06-17 13:41:51 -07:00
2062cafaa5 [iOS GPU][MaskRCNN] Implement RoIAlign in Metal shaders using Sampler (#56075)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56075

Inspired by the CUDA implementation - https://fburl.com/diffusion/e90tabkj. The main difference is the way we implement bilinear interpolation: CUDA does this manually by iterating over every point in each bin box, whereas Metal does it by calling the sampler's sample function, which is a bit easier and faster. The result is almost identical to the result from CPU - P365102522.

We'll do another round of refactor once we have figured out how to support custom ops on GPU.
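For reference, a minimal sketch of the per-sampling-point bilinear interpolation that a CUDA-style kernel computes by hand and that Metal offloads to the sampler hardware (illustrative only, not the actual kernel code):

```
// data: row-major height x width feature map.
float bilinear(const float* data, int height, int width, float y, float x) {
  // Out-of-range sampling points contribute zero.
  if (y < -1.0f || y > height || x < -1.0f || x > width) return 0.0f;
  if (y <= 0) y = 0;
  if (x <= 0) x = 0;
  int y_low = (int)y, x_low = (int)x;
  int y_high, x_high;
  if (y_low >= height - 1) { y_high = y_low = height - 1; y = (float)y_low; }
  else { y_high = y_low + 1; }
  if (x_low >= width - 1) { x_high = x_low = width - 1; x = (float)x_low; }
  else { x_high = x_low + 1; }
  float ly = y - y_low, lx = x - x_low;
  float hy = 1.0f - ly, hx = 1.0f - lx;
  // Weighted sum of the four neighboring texels.
  return hy * hx * data[y_low * width + x_low] +
         hy * lx * data[y_low * width + x_high] +
         ly * hx * data[y_high * width + x_low] +
         ly * lx * data[y_high * width + x_high];
}
```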
ghstack-source-id: 131720620

Test Plan:
1. Circle CI
2. Sandcastle

Reviewed By: ajtulloch

Differential Revision: D27485068

fbshipit-source-id: 31e831aead9d3799a3fde96e99dd677d96bd3da1
2021-06-17 13:29:42 -07:00
e2129d1c06 beef up at::_ops API (#59115)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59115

This PR beefs up the `at::_ops::` API as a source of truth for compile-time information about each operator.

### Changes
Previously, for every op defined in native_functions.yaml, `at::_ops::` exposed an unambiguous function (e.g. `at::_ops::add_Tensor`): effectively an unambiguously named version of the C++ API that you could decltype() successfully because it had no overloads, along with a user-facing macro: `decltype(ATEN_FN2(add, Tensor)) // expands to decltype(at::_ops::add_Tensor)`.

Now, `at::_ops::add_Tensor` is a struct containing a few static fields and methods (declared in `Operators.h`, defined in `Operators.cpp`):
```
struct TORCH_API add_Tensor {
  using schema = at::Tensor (const at::Tensor &, const at::Tensor &, const at::Scalar &);
  using ptr_schema = at::Tensor (*)(const at::Tensor &, const at::Tensor &, const at::Scalar &);
  static constexpr const char* name = "aten::add";
  static constexpr const char* overload_name = "Tensor";
  static constexpr const char* schema_str = "add.Tensor(Tensor self, Tensor other, *, Scalar alpha=1) -> Tensor";
  static at::Tensor call(const at::Tensor & self, const at::Tensor & other, const at::Scalar & alpha);
  static at::Tensor redispatch(c10::DispatchKeySet dispatchKeySet, const at::Tensor & self, const at::Tensor & other, const at::Scalar & alpha);
};
```

What used to be the function `at::_ops::add_Tensor` can now be accessed as `at::_ops::add_Tensor::call`, and I've added a new macro to access the entire struct (naming suggestions welcome) - `ATEN_OP2(add, Tensor)`.
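As a hedged sketch of what this enables at compile time (assuming the `Operators.h` header introduced here; usage is illustrative, not part of the PR):

```
#include <type_traits>
#include <ATen/Operators.h>

using AddOp = at::_ops::add_Tensor;

// The schema type is available without touching the dispatcher.
static_assert(
    std::is_same<
        AddOp::schema,
        at::Tensor(const at::Tensor&, const at::Tensor&, const at::Scalar&)>::value,
    "decltype-style introspection works on the struct members");

// String metadata is likewise available as compile-time constants.
constexpr const char* kName = AddOp::name;               // "aten::add"
constexpr const char* kOverload = AddOp::overload_name;  // "Tensor"
```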

### Motivation

There were two motivations for this change:

**Codegen refactor**
The `at::_ops::` API as it exists now is (yet another) C++ entry point into the dispatcher, in addition to the Function, Method, and Redispatch APIs. Instead, after this PR, the existing three APIs are all inline-able wrapper APIs that call into the `at::_ops` API to do the real work. The function and method APIs call into `at::_ops::{op}::call`, while the redispatch API calls into `at::_ops::{op}::redispatch`.

This will hopefully make it easier to pile in any future C++ APIs that we want to code-generate. It also means that stuff like the string name, overload name, and schema of each operator is consolidated in a single place, rather than having the codegen hardcode various strings in multiple output files.

**Extra compile-time metadata**
In the [boxed CPU fallback PR](https://github.com/pytorch/pytorch/pull/58065/files#diff-c9b55f0d692a9bea8019c6f19bc46877f1efa0f9d4fc2086cf299b52768343b4R31) above this in the stack, I added a new API that external backends can use to call directly into their boxed fallback from an unboxed context. Adding extra metadata to `at::_ops` means that XLA's usage of that API doesn't require passing in the string name and overload name of each op as arguments; we can just infer them.

The updated API looks like this (see [the XLA-side PR](https://github.com/pytorch/xla/pull/2945/files#diff-5e65c3c1d847191cb691d1874732e971f09fa1aad7a980a555c3b0504a5b6470R250) for more examples):
```
return at::native::call_fallback_fn<&xla_cpu_fallback, ATEN_OP2(add, Tensor)>::call(a, b, 1.0);
```

**Characteristics of the `at::_ops` API**
(I also commented this in the codegen)

(1) It follows the Dispatcher API.

This means, e.g., that it takes in the expanded arguments rather than `TensorOptions`. This is kind of necessary for perf if we want `at::_ops` to serve as the main implementation of the existing C++ APIs. For example: if it followed the C++ API, then all of the faithful C++ factory functions would need to wrap their arguments into TensorOptions only to unwrap them again.

(2) Overload names are disambiguated.

This is the same as before; it's helpful for pytorch extenders who would like to decltype() an aten operator that has overloads, e.g. decltype(at::_ops::mul_Tensor::call).

(3) No argument defaulting is allowed.

This is more of an implementation detail to avoid #include cycles, since TensorBody.h (which defines the Tensor class) needs to include this file. The #include situation is precarious though!

(4) manual_cpp_bindings and faithful names are not included in the API.

I think this is one where we have a choice. It applies to stuff like __dispatch__is_complex() and add_outf(). These aren't "real native_functions.yaml ops"; they're just additional functions provided by the C++ API. They're implemented as wrappers in Functions.h that call into the actual operators defined here, i.e. at::_ops::is_complex::call() and at::_ops::add_out::call(). This means that ATEN_OP(is_complex) will not fastpath, and will go through the dispatcher. It also means that `ATEN_OP2(add, out)` is automatically faithful and takes its out argument at the end (this is just because it follows the dispatcher API).

**Details**

Instead of codegen'ing the existing 3 APIs in `Functions.cpp`, `TensorMethods.cpp` and `RedispatchFunctions.cpp`, I codegen them directly into the headers: `Functions.h`, `TensorBody.h`, and `RedispatchFunctions.h`. I mostly did this for perf, since we want to avoid introducing an extra function call in the hot path of every operator. These functions are also now all one-liners that call into `at::_ops`, so the compiler should just inline them all anyway.

The main downside in doing that though was that I had to bend over backwards in a few cases to avoid cyclical #include statements. The issue is that `TensorBody.h` now includes `Operators.h` (because the codegen'd method API is implemented by calling into `at::_ops`), but `TensorBody.h` also includes the definition of the Tensor class. That means that `Operators.h` can't be aware of the Tensor class; it needs to forward declare everything and avoid using the Tensor class directly. To fix cyclic includes, I had to:
- Not allow defaulting in the `at::_ops` API
- Move some code that was called when translating from C++ to Dispatcher API's directly into the codegen template (`check_tensor_options_and_extract_memory_format`)

It's not great, but I don't think this specific include cycle will break down in the near future; the only code that we need to call before getting to `Operators.cpp` is the translations from various APIs to the dispatcher API; there aren't many of them, and there's no major reason for them to live in an external utils file somewhere.
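A hedged sketch of the forward-declaration pattern that keeps the cycle from forming (file and struct names simplified, not the real headers):

```
// operators_sketch.h -- stands in for Operators.h (illustrative only).
// Forward-declare Tensor instead of including the header that defines it,
// so the Tensor header can in turn include this one without a cycle.
namespace at {

class Tensor;  // forward declaration; the full definition is not needed here

struct add_Tensor_sketch {
  // A declaration may use the incomplete Tensor type; only the definition
  // (in the .cpp file) needs the complete type.
  static Tensor call(const Tensor& self, const Tensor& other);
};

}  // namespace at
```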

Moving the code into the headers also meant that the codegen no longer needs to deal with `Functions.cpp`/`TensorMethods.cpp`/`RedispatchFunctions.cpp`. All of the functions that used to be defined in `TensorMethods.cpp` seemed small enough for me to lump into `TensorBody.h`, but some of the functions in `Functions.cpp` looked pretty big to put in a header, so I moved the file to `aten/src/ATen/native/Functions.cpp`.

It might be worth keeping `TensorMethods.cpp` there and leaving it too, in case we have any beefy hand-written tensor methods that we don't want to put in a header.

**Perf**
I ran a few benchmarks in callgrind, and didn't see a noticeable instruction count change when calling `at::add()`. I also saw in the output that `at::add()` was successfully getting inlined.

There's also probably a light risk of binary size increase; I think that there's a binary size regression test that I can run in phabricator (going to try it). I can also try inspecting `libtorch.so` directly and seeing if it's any bigger, but my hope is that the inline-ing means that we aren't generating separate symbols for `at::add` and `at::_ops::add_Tensor::call`.

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D28833086

Pulled By: bdhirsh

fbshipit-source-id: 55f322a8378cb9a3cb6642f72aa291be381dd95b
2021-06-17 13:09:46 -07:00
462448f07a Enable GHA sharding on linux (#60124)
Summary:
This is a branch off of https://github.com/pytorch/pytorch/issues/59970 that only shards on Linux so far (we're running into issues with Windows gflags).

This would enable sharding of tests on a few Linux jobs on GHA, allowing TTS to be essentially halved.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60124

Reviewed By: zou3519

Differential Revision: D29204211

Pulled By: janeyx99

fbshipit-source-id: 1cc31d1eccd564d96e2aef14c0acae96a3f0fcd0
2021-06-17 13:00:23 -07:00
bbedfd913d Run a dummy rpc._all_gather in init_rpc to avoid shutdown timeout (#59801)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59801

Fixes https://github.com/pytorch/pytorch/issues/59795.

The RPC calls in shutdown are no longer able to finish within 5s if
there are no other RPCs before `rpc.shutdown()` in that process,
because agent initialization can take longer than 5s. We didn't
have this problem previously, because TensorPipe's backend
registry used to use RPC to communicate CUDA devices in `init_rpc`.
However, after #58753, `init_rpc` uses ProcessGroup to communicate
devices, and hence the channels/transport could be uninitialized
after `init_rpc`.

Differential Revision: D29039238

Test Plan: Imported from OSS

Reviewed By: rohan-varma

Pulled By: mrshenli

fbshipit-source-id: 46f89b01a058a51d271ddef9084a67b220a067b7
2021-06-17 11:47:54 -07:00
ebafd2aadf Stop warning on .names() access in max_pool2d and max_pool2d_backward (#60059)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60059

Fixes #60053.

The problem is that `.names()` always triggers the named tensor warning.
To not trigger it, one has to guard it with has_names:
`x.has_names() ? x.names() : DimnameList{}`
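A hedged sketch of the guarded access as a helper (illustrative, not the exact max_pool2d code):

```
#include <ATen/ATen.h>

// Only read names() when the tensor actually has names; this keeps the
// named-tensor warning from firing on unnamed tensors.
at::DimnameList maybe_names(const at::Tensor& x) {
  return x.has_names() ? x.names() : at::DimnameList{};
}
```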

This is not the first time this has happened; we should probably
make it so that .names() doesn't raise a warning unless it is actually
populated with names. That's a little tricky to implement so I'm leaving
it for the future.

Test Plan:
- New test, also run `python test/test_nn.py -v -k "max_pool"` and
confirm there are no warnings.

Reviewed By: gchanan

Differential Revision: D29152737

Pulled By: zou3519

fbshipit-source-id: 89a2fdbe6a6064a7044b5b75f7d0c58e51e57509
2021-06-17 10:34:41 -07:00
ef09428804 Revert D29104399: Port all kernel to structured kernels.
Test Plan: revert-hammer

Differential Revision:
D29104399 (7809494c68)

Original commit changeset: 18bb747b7a19

fbshipit-source-id: f57043df5646f1e675e8a555cb4fa0e436953751
2021-06-17 10:32:23 -07:00
3ff5507fb0 Revert D29104395: Port any kernel to structured kernels.
Test Plan: revert-hammer

Differential Revision:
D29104395 (519698362d)

Original commit changeset: 0cfde57c22ba

fbshipit-source-id: ac5ebdc4b9d3aeb4c5eeab55c92ac931599d39d1
2021-06-17 10:32:21 -07:00
81baa7fb0d Revert D29104398: Using meta checks for unary torch.all and torch.any.
Test Plan: revert-hammer

Differential Revision:
D29104398 (c078cefa7d)

Original commit changeset: 6771b80130c9

fbshipit-source-id: 10e5a34370113fcd2f87aea2c2e76108fa9328d8
2021-06-17 10:32:20 -07:00
873dac4b5a Revert D29104397: Port argmax to structured kernels.
Test Plan: revert-hammer

Differential Revision:
D29104397 (6f3da4f4bf)

Original commit changeset: 580355cf3b4e

fbshipit-source-id: e51fb79329066bc1a6364cfa44a8732908a684ed
2021-06-17 10:32:18 -07:00
6b5e77904f Revert D29104396: Port argmin kernel to structured kernels.
Test Plan: revert-hammer

Differential Revision:
D29104396 (226d745a0b)

Original commit changeset: 39c59bcc0446

fbshipit-source-id: 82de26f925a885f65572a785fa45a9980d3a974b
2021-06-17 10:31:06 -07:00
3dc8112187 [NNC] Handle int64 indices and loop bounds (#59769)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59769

Allow loop bounds and tensor indices to be either int32 or int64, and avoid unnecessary cast ops.
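As a hedged illustration (hand-written C++, not actual NNC output) of the casts this avoids: with mismatched integer widths every index expression needs a widening cast, while matching widths need none.

```
#include <cstdint>
#include <vector>

// Mismatched widths force a cast in the index expression on every iteration.
float sum_mixed(const std::vector<float>& buf, std::int32_t n, std::int64_t stride) {
  float s = 0.f;
  for (std::int32_t i = 0; i < n; ++i) {
    s += buf[static_cast<std::int64_t>(i) * stride];
  }
  return s;
}

// Matching widths keep the index arithmetic entirely in int64.
float sum_wide(const std::vector<float>& buf, std::int64_t n, std::int64_t stride) {
  float s = 0.f;
  for (std::int64_t i = 0; i < n; ++i) {
    s += buf[i * stride];
  }
  return s;
}
```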

Test Plan:
```
build/bin/test_tensorexpr
```

Reviewed By: H-Huang

Differential Revision: D29173970

Pulled By: desertfire

fbshipit-source-id: 859a876ddb1b41535b2266089aa1222884295c78
2021-06-17 09:35:59 -07:00
96b3537e71 [NNC] Add a dtypeToCppString virtual method in IRPrinter (#59449)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59449

Make dtypeToCppString a virtual method so that a child
class can easily override the dtype string generation rule. This is
needed as preparation for making loop and tensor indices int64_t.
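A hedged sketch of the pattern (simplified names, not the real NNC classes):

```
#include <string>

struct IRPrinterSketch {
  virtual ~IRPrinterSketch() = default;
  // Child classes override this hook to change how a dtype is rendered.
  virtual std::string dtypeToCppString(bool is_index_type) const {
    return is_index_type ? "int" : "float";
  }
};

struct WideIndexPrinter : IRPrinterSketch {
  // Widen loop/index variables to int64_t without touching any other logic.
  std::string dtypeToCppString(bool is_index_type) const override {
    return is_index_type ? "int64_t"
                         : IRPrinterSketch::dtypeToCppString(is_index_type);
  }
};
```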

Test Plan:
```
build/bin/test_tensorexpr
```

Reviewed By: H-Huang

Differential Revision: D29173969

Pulled By: desertfire

fbshipit-source-id: a447badba76788354da1c79f80c834c99f105776
2021-06-17 09:34:58 -07:00
ed1da5be21 PG NCCL cleanup: remove usage of completed_ in WorkNCCL copies (#59899)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59899

Test Plan: Imported from OSS

Reviewed By: cbalioglu, osalpekar

Differential Revision: D29080299

Pulled By: agolynski

fbshipit-source-id: 9ae368f91e81f19471e0a20fc913d8e9df1b9dec
2021-06-17 09:05:35 -07:00
010f4b6f2d Add .isort.cfg (#60119)
Summary:
This adds the `.isort.cfg` file from https://github.com/pytorch/pytorch/issues/55928, but doesn't try to enforce it in CI because, as that PR showed, that is currently difficult to do. We could use this to gradually sort the codebase according to this configuration (enforcing bits and pieces in CI), but I don't do that here.

The advantage of including this file (even if we don't enforce it) is that it affects how certain tools work, thus encouraging a specific import style for people who happen to use those tools.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60119

Test Plan: Open `test/run_test.py` in VS Code and run the **Python Refactor: Sort Imports** command. Compare with and without this PR.

Reviewed By: 1ntEgr8

Differential Revision: D29199504

Pulled By: samestep

fbshipit-source-id: 83e937b0f517c60e3e7dedb6c0306173908fbbb0
2021-06-17 09:04:25 -07:00
226d745a0b Port argmin kernel to structured kernels. (#59938)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59938

Tracking issue: #55070

Test Plan: Imported from OSS

Reviewed By: soulitzer

Differential Revision: D29104396

Pulled By: ezyang

fbshipit-source-id: 39c59bcc044649c1ec9c9685366c4dda87f76aa7
2021-06-17 08:18:13 -07:00
6f3da4f4bf Port argmax to structured kernels. (#59937)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59937

Tracking issue: #55070

Test Plan: Imported from OSS

Reviewed By: soulitzer

Differential Revision: D29104397

Pulled By: ezyang

fbshipit-source-id: 580355cf3b4e9e5c934b4e51a16196087bcb3459
2021-06-17 08:18:12 -07:00
c078cefa7d Using meta checks for unary torch.all and torch.any. (#59373)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59373

This PR makes use of the newly implemented unified `at::meta::check_reduction` for
validating the inputs and configuring its `TensorIterator`.

Test Plan: Imported from OSS

Reviewed By: soulitzer

Differential Revision: D29104398

Pulled By: ezyang

fbshipit-source-id: 6771b80130c91c2f1360853127de0acebcfff183
2021-06-17 08:18:10 -07:00
519698362d Port any kernel to structured kernels. (#59372)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59372

Tracking issue: #55070

Test Plan: Imported from OSS

Reviewed By: soulitzer

Differential Revision: D29104395

Pulled By: ezyang

fbshipit-source-id: 0cfde57c22ba88607945c98f28b18df7709becd0
2021-06-17 08:18:08 -07:00
7809494c68 Port all kernel to structured kernels. (#59371)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59371

Tracking issue: #55070

Test Plan: Imported from OSS

Reviewed By: soulitzer

Differential Revision: D29104399

Pulled By: ezyang

fbshipit-source-id: 18bb747b7a19d873427d52c1145ef7cede333a0e
2021-06-17 08:16:41 -07:00
b8ab98626b only runs mem leak check on master (#60023)
Summary:
Sets an environment variable so that the CUDA mem leak check only runs on master CI jobs.

See discussion in https://github.com/pytorch/pytorch/pull/59402#issuecomment-860773034

See stats before/after disabling mem leak check: https://github.com/pytorch/pytorch/pull/59942#issuecomment-860947095

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60023

Test Plan:
https://github.com/pytorch/pytorch/issues/60108
https://github.com/pytorch/pytorch/issues/60116

Reviewed By: janeyx99

Differential Revision: D29164182

Pulled By: walterddr

fbshipit-source-id: dfe88c2c1275b6eb35f18b58aacdc220f34ccb59
2021-06-17 07:56:26 -07:00
59b10036d5 Unifies OpInfo dtype tests (#60157)
Summary:
Simplifies the OpInfo dtype tests and produces nicer error messages, like:

```
AssertionError: Items in the first set but not the second:
torch.bfloat16
Items in the second set but not the first:
torch.int64 : Attempted to compare [set] types: Expected: {torch.float64, torch.float32, torch.float16, torch.bfloat16}; Actual: {torch.float64, torch.float32, torch.float16, torch.int64}.
The supported dtypes for logcumsumexp on cuda according to its OpInfo are
        {torch.float64, torch.float32, torch.float16, torch.int64}, but the detected supported dtypes are {torch.float64, torch.float32, torch.float16, torch.bfloat16}.
        The following dtypes should be added to the OpInfo: {torch.bfloat16}. The following dtypes should be removed from the OpInfo: {torch.int64}.
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60157

Reviewed By: ngimel

Differential Revision: D29188665

Pulled By: mruberry

fbshipit-source-id: e84c9892c6040ea47adb027cfef3a6c0fd2f9f3c
2021-06-17 06:34:54 -07:00
4caca7a15b Improved torch.einsum testing and fixed bug (#59731)
Summary:
Improved torch.einsum testing and fixed a bug where lower case letters appeared before upper case letters in the sorted output order, which is inconsistent with NumPy (NumPy sorts upper case letters first).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59731

Reviewed By: SplitInfinity, ansley

Differential Revision: D29183078

Pulled By: heitorschueroff

fbshipit-source-id: a33980d273707da2d60a387a2af2fa41527ddb68
2021-06-17 04:48:47 -07:00
3698 changed files with 186605 additions and 89948 deletions

View File

@ -44,7 +44,7 @@ jobs:
is_official_build: ${{ parameters.is_official_build}}
# Sync and update PyTorch submodules
- bash: git submodule update --init --recursive
- bash: git submodule update --init --recursive --jobs 0
displayName: Update PyTorch submodules
# Build PyTorch and run unit tests - no packaging

View File

@ -47,7 +47,7 @@ jobs:
is_official_build: ${{ parameters.is_official_build}}
# Sync and update PyTorch submodules
- script: git submodule update --init --recursive
- script: git submodule update --init --recursive --jobs 0
displayName: Update PyTorch submodules
# Build PyTorch and run unit tests - no packaging

View File

@ -0,0 +1,26 @@
parameters:
  name: ''
  pool: ''
  customMatrixes: ''

jobs:
- job: ${{parameters.name}}
  timeoutInMinutes: 600
  strategy:
    matrix:
      ${{ insert }}: ${{parameters.customMatrixes}}
  pool:
    name: ${{ parameters.pool}}

  steps:
  # Clone PyTorch Tests repository
  - bash: |
      B64_PAT=$(echo -n ":$_ADOTOKEN" | base64)
      git -c http.extraHeader="Authorization: Basic ${B64_PAT}" clone $(AZURE_DEVOPS_PYTORCH_TESTS_REPO_URL)
      cd pytorch_tests
      git checkout $(PYTORCH_TESTS_CHECKOUT_BRANCH)
    env:
      _ADOTOKEN: $(AZURE_DEVOPS_CLI_PAT)
    displayName: Clone PyTorch Tests repo
  - bash: |
      bash $(Build.SourcesDirectory)/pytorch_tests/webapp/notify_webapp.sh
    displayName: Notify Webapp

View File

@ -33,7 +33,7 @@ jobs:
# Clone PyTorch Tests repository
- bash: |
B64_PAT=$(printf "%s"":$_ADOTOKEN" | base64)
B64_PAT=$(echo -n ":$_ADOTOKEN" | base64)
git -c http.extraHeader="Authorization: Basic ${B64_PAT}" clone $(AZURE_DEVOPS_PYTORCH_TESTS_REPO_URL)
cd pytorch_tests
git checkout $(PYTORCH_TESTS_CHECKOUT_BRANCH)

View File

@ -8,7 +8,7 @@ steps:
connectionType: 'connectedServiceName'
serviceConnection: circleciconn
method: 'POST'
headers: '{"Content-Type":"application/json", "BranchName":"$(_TARGET_BRANCH_TO_CHECK)", "JobName":"$(TARGET_CIRCLECI_BUILD_PR)", "PRNumber":"$(_NUMBER_BUILD_PR)", "TargetCommit":"$(_TARGET_COMMIT)", "PlanUrl":"$(System.CollectionUri)", "ProjectId":"$(System.TeamProjectId)", "HubName":"$(System.HostType)", "PlanId":"$(System.PlanId)", "JobId":"$(System.JobId)", "TimelineId":"$(System.TimelineId)", "TaskInstanceId":"$(System.TaskInstanceId)", "AuthToken":"$(System.AccessToken)"}'
headers: '{"Content-Type":"application/json", "BranchName":"$(_TARGET_BRANCH_TO_CHECK)", "JobName":"$(TARGET_CIRCLECI_BUILD_PR)", "PRNumber":"$(_TARGET_PR_NUMBER)", "TargetCommit":"$(_TARGET_COMMIT)", "PlanUrl":"$(System.CollectionUri)", "ProjectId":"$(System.TeamProjectId)", "HubName":"$(System.HostType)", "PlanId":"$(System.PlanId)", "JobId":"$(System.JobId)", "TimelineId":"$(System.TimelineId)", "TaskInstanceId":"$(System.TaskInstanceId)", "AuthToken":"$(System.AccessToken)"}'
body: ''
urlSuffix: 'api/JobStatus'
waitForCompletion: true

View File

@ -48,3 +48,13 @@ stages:
_PYTHON_VERSION: $(PYTHON_VERSION_WIN_2)
_CUDA_BUILD_VERSION: $(CUDA_BUILD_VERSION_WIN_2)
_RUN_TESTS: $(RUN_TESTS_WIN)
- stage: 'NotifyWebapp'
displayName: 'Notify Webapp that pipeline is finished'
dependsOn: NightlyCustomTests
condition: succeededOrFailed()
jobs:
- template: job_templates/notify-webapp-template.yml
parameters:
name: ubuntu_1804_CPU
pool: $(BUILD_POOL_LIN_1)

View File

@ -22,7 +22,7 @@ stages:
- template: job_templates/wheel-wait-template.yml
variables:
_TARGET_BRANCH_TO_CHECK: ${{parameters.GitHubPyTorchPRTrigger.TARGET_BRANCH_TO_CHECK_AZ_DEVOPS_PR}}
_NUMBER_BUILD_PR: ${{parameters.GitHubPyTorchPRTrigger.PR_NUMBER}}
_TARGET_PR_NUMBER: ${{parameters.GitHubPyTorchPRTrigger.PR_NUMBER}}
_TARGET_COMMIT: ${{parameters.GitHubPyTorchPRTrigger.TARGET_COMMIT}}
- stage: 'PRCustomTests'
@ -40,7 +40,23 @@ stages:
_CUDA_BUILD_VERSION: $(CUDA_BUILD_VERSION_PR)
_TARGET_CIRCLECI_BUILD: $(TARGET_CIRCLECI_BUILD_PR)
_TARGET_BRANCH_TO_CHECK: ${{parameters.GitHubPyTorchPRTrigger.TARGET_BRANCH_TO_CHECK_AZ_DEVOPS_PR}}
_NUMBER_BUILD_PR: ${{parameters.GitHubPyTorchPRTrigger.PR_NUMBER}}
_TARGET_PR_NUMBER: ${{parameters.GitHubPyTorchPRTrigger.PR_NUMBER}}
_TARGET_COMMIT: ${{parameters.GitHubPyTorchPRTrigger.TARGET_COMMIT}}
_DOCKER_IMAGE: $(DOCKER_IMAGE_PR)
_RUN_TESTS: $(RUN_TESTS_PR)
- stage: 'NotifyWebapp'
displayName: 'Notify Webapp that pipeline is finished'
dependsOn: PRCustomTests
condition: succeededOrFailed()
jobs:
- template: job_templates/notify-webapp-template.yml
parameters:
name: ubuntu_1804_CPU
pool: $(BUILD_POOL_LIN_1)
customMatrixes:
PR_Notify_WebApp:
_TARGET_CIRCLECI_BUILD: $(TARGET_CIRCLECI_BUILD_PR)
_TARGET_BRANCH_TO_CHECK: ${{parameters.GitHubPyTorchPRTrigger.TARGET_BRANCH_TO_CHECK_AZ_DEVOPS_PR}}
_TARGET_PR_NUMBER: ${{parameters.GitHubPyTorchPRTrigger.PR_NUMBER}}
_TARGET_COMMIT: ${{parameters.GitHubPyTorchPRTrigger.TARGET_COMMIT}}

View File

@ -1,3 +1,13 @@
build --copt=--std=c++14
build --copt=-I.
build --copt=-isystem --copt bazel-out/k8-fastbuild/bin
# Configuration to disable tty features for environments like CI
build:no-tty --curses no
build:no-tty --progress_report_interval 10
build:no-tty --show_progress_rate_limit 10
# Configuration to build with GPU support
build:gpu --define=cuda=true
# define a separate build folder for faster switching between configs
build:gpu --platform_suffix=-gpu

View File

@ -1 +1 @@
3.1.0
4.2.1

View File

@ -343,7 +343,6 @@ All linux builds occur in docker images. The docker images are
* Has ALL CUDA versions installed. The script pytorch/builder/conda/switch_cuda_version.sh sets /usr/local/cuda to a symlink to e.g. /usr/local/cuda-10.0 to enable different CUDA builds
* Also used for cpu builds
* pytorch/manylinux-cuda90
* pytorch/manylinux-cuda92
* pytorch/manylinux-cuda100
* Also used for cpu builds

View File

@ -126,9 +126,6 @@ class PackageFormatConfigNode(ConfigNode):
self.props["python_versions"] = python_versions
self.props["package_format"] = package_format
# XXX Disabling conda for 11.3 as there's currently no appropriate cudatoolkit available
if package_format == "conda":
self.props["gpu_versions"] = filter(lambda x: x != "cuda113", self.find_prop("gpu_versions"))
def get_children(self):
if self.find_prop("os_name") == "linux":

View File

@ -124,9 +124,9 @@ class Conf(object):
Output looks similar to:
- binary_upload:
name: binary_linux_manywheel_3_7m_cu92_devtoolset7_nightly_upload
name: binary_linux_manywheel_3_7m_cu113_devtoolset7_nightly_upload
context: org-member
requires: binary_linux_manywheel_3_7m_cu92_devtoolset7_nightly_test
requires: binary_linux_manywheel_3_7m_cu113_devtoolset7_nightly_test
filters:
branches:
only:
@ -134,7 +134,7 @@ class Conf(object):
tags:
only: /v[0-9]+(\\.[0-9]+)*-rc[0-9]+/
package_type: manywheel
upload_subfolder: cu92
upload_subfolder: cu113
"""
return {
"binary_upload": OrderedDict({

View File

@ -7,26 +7,19 @@ CONFIG_TREE_DATA = [
("5.4", [ # All this subtree rebases to master and then build
("3.6", [
("important", [X(True)]),
("parallel_tbb", [X(True)]),
("parallel_native", [X(True)]),
("pure_torch", [X(True)]),
]),
]),
# TODO: bring back libtorch test
("7", [X("3.6")]),
]),
("clang", [
("5", [
("7", [
("3.6", [
("asan", [
(True, [
("shard_test", [XImportant(True)]),
]),
]),
]),
]),
("7", [
("3.6", [
("onnx", [XImportant(True)]),
]),
]),
@ -34,38 +27,28 @@ CONFIG_TREE_DATA = [
("cuda", [
("10.2", [
("3.6", [
("shard_test", [X(True)]),
# Build are needed for slow_gradcheck
('build_only', [X(True)]),
("slow_gradcheck", [
# If you update this slow gradcheck, you should
# also update docker_definitions.py to make sure
# the docker image match the config used here
(True, [
('shard_test', [XImportant(True)]),
]),
]),
("libtorch", [
(True, [
('build_only', [X(True)]),
]),
]),
]),
]),
("11.1", [
("3.8", [
("shard_test", [XImportant(True)]),
("libtorch", [
(True, [
('build_only', [X(True)]),
]),
]),
# UNCOMMENT THE BELOW TO REENABLE LIBTORCH
# ("libtorch", [
# (True, [
# ('build_only', [X(True)]),
# ]),
# ]),
]),
]),
]),
]),
("bionic", [
("clang", [
("9", [
("3.6", [
("noarch", [XImportant(True)]),
]),
]),
("9", [
("3.6", [
("xla", [XImportant(True)]),
@ -73,31 +56,14 @@ CONFIG_TREE_DATA = [
]),
]),
]),
("cuda", [
("10.2", [
("3.9", [
("shard_test", [XImportant(True)]),
]),
]),
]),
("gcc", [
("9", [
("3.8", [
("coverage", [
(True, [
("shard_test", [XImportant(True)]),
]),
]),
]),
]),
]),
("rocm", [
("3.9", [
("3.6", [
('build_only', [XImportant(True)]),
]),
]),
]),
# @jithunnair-amd believes Jenkins builds are sufficient
# ("rocm", [
# ("3.9", [
# ("3.6", [
# ('build_only', [XImportant(True)]),
# ]),
# ]),
# ]),
]),
]

View File

@ -31,6 +31,7 @@ class Conf:
is_libtorch: bool = False
is_important: bool = False
parallel_backend: Optional[str] = None
build_only: bool = False
@staticmethod
def is_test_phase(phase):
@ -112,6 +113,8 @@ class Conf:
parameters["resource_class"] = "xlarge"
if hasattr(self, 'filters'):
parameters['filters'] = self.filters
if self.build_only:
parameters['build_only'] = miniutils.quote(str(int(True)))
return parameters
def gen_workflow_job(self, phase):
@ -175,35 +178,6 @@ class DocPushConf(object):
}
}
# TODO Convert these to graph nodes
def gen_dependent_configs(xenial_parent_config):
extra_parms = [
(["multigpu"], "large"),
(["nogpu", "NO_AVX2"], None),
(["nogpu", "NO_AVX"], None),
(["slow"], "medium"),
]
configs = []
for parms, gpu in extra_parms:
c = Conf(
xenial_parent_config.distro,
["py3"] + parms,
pyver=xenial_parent_config.pyver,
cuda_version=xenial_parent_config.cuda_version,
restrict_phases=["test"],
gpu_resource=gpu,
parent_build=xenial_parent_config,
is_important=False,
)
configs.append(c)
return configs
def gen_docs_configs(xenial_parent_config):
configs = []
@ -211,7 +185,7 @@ def gen_docs_configs(xenial_parent_config):
HiddenConf(
"pytorch_python_doc_build",
parent_build=xenial_parent_config,
filters=gen_filter_dict(branches_list=r"/.*/",
filters=gen_filter_dict(branches_list=["master", "nightly"],
tags_list=RC_PATTERN),
)
)
@ -227,7 +201,7 @@ def gen_docs_configs(xenial_parent_config):
HiddenConf(
"pytorch_cpp_doc_build",
parent_build=xenial_parent_config,
filters=gen_filter_dict(branches_list=r"/.*/",
filters=gen_filter_dict(branches_list=["master", "nightly"],
tags_list=RC_PATTERN),
)
)
@ -238,13 +212,6 @@ def gen_docs_configs(xenial_parent_config):
branch="master",
)
)
configs.append(
HiddenConf(
"pytorch_doc_test",
parent_build=xenial_parent_config
)
)
return configs
@ -369,6 +336,7 @@ def instantiate_configs(only_slow_gradcheck):
is_libtorch=is_libtorch,
is_important=is_important,
parallel_backend=parallel_backend,
build_only=build_only,
)
# run docs builds on "pytorch-linux-xenial-py3.6-gcc5.4". Docs builds
@ -389,19 +357,19 @@ def instantiate_configs(only_slow_gradcheck):
tags_list=RC_PATTERN)
c.dependent_tests = gen_docs_configs(c)
if cuda_version == "10.2" and python_version == "3.6" and not is_libtorch and not is_slow_gradcheck:
c.dependent_tests = gen_dependent_configs(c)
if (
compiler_name == "gcc"
and compiler_version == "5.4"
compiler_name != "clang"
and not rocm_version
and not is_libtorch
and not is_vulkan
and not is_pure_torch
and parallel_backend is None
and not is_noarch
and not is_slow_gradcheck
and not only_slow_gradcheck
and not build_only
):
bc_breaking_check = Conf(
"backward-compatibility-check",
distributed_test = Conf(
c.gen_build_name("") + "distributed",
[],
is_xla=False,
restrict_phases=["test"],
@ -409,7 +377,7 @@ def instantiate_configs(only_slow_gradcheck):
is_important=True,
parent_build=c,
)
c.dependent_tests.append(bc_breaking_check)
c.dependent_tests.append(distributed_test)
config_list.append(c)

View File

@ -6,35 +6,39 @@ from cimodel.data.simple.util.branch_filters import gen_filter_dict, RC_PATTERN
# TODO: make this generated from a matrix rather than just a static list
IMAGE_NAMES = [
"pytorch-linux-bionic-cuda10.2-cudnn7-py3.8-gcc9",
"pytorch-linux-bionic-cuda10.2-cudnn7-py3.9-gcc7",
"pytorch-linux-bionic-py3.6-clang9",
"pytorch-linux-bionic-cuda10.2-cudnn7-py3.6-clang9",
"pytorch-linux-bionic-py3.8-gcc9",
"pytorch-linux-xenial-cuda10-cudnn7-py3-gcc7",
"pytorch-linux-xenial-cuda10.1-cudnn7-py3-gcc7",
"pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7",
"pytorch-linux-xenial-cuda11.1-cudnn8-py3-gcc7",
"pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7",
"pytorch-linux-xenial-py3-clang5-android-ndk-r19c",
"pytorch-linux-xenial-py3-clang5-asan",
"pytorch-linux-xenial-py3-clang7-asan",
"pytorch-linux-xenial-py3-clang7-onnx",
"pytorch-linux-xenial-py3.8",
"pytorch-linux-xenial-py3.6-clang7",
"pytorch-linux-xenial-py3.6-gcc5.4", # this one is used in doc builds
"pytorch-linux-xenial-py3.6-gcc7.2",
"pytorch-linux-xenial-py3.6-gcc7",
"pytorch-linux-bionic-rocm3.9-py3.6",
"pytorch-linux-bionic-rocm4.0.1-py3.6",
"pytorch-linux-bionic-rocm4.1-py3.6",
"pytorch-linux-bionic-rocm4.2-py3.6",
"pytorch-linux-bionic-rocm4.3.1-py3.6",
]
# This entry should be an element from the list above
# This should contain the image matching the "slow_gradcheck" entry in
# pytorch_build_data.py
SLOW_GRADCHECK_IMAGE_NAME = "pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7"
def get_workflow_jobs():
def get_workflow_jobs(only_slow_gradcheck=False):
"""Generates a list of docker image build definitions"""
ret = []
for image_name in IMAGE_NAMES:
if only_slow_gradcheck and image_name is not SLOW_GRADCHECK_IMAGE_NAME:
continue
parameters = OrderedDict({
"name": quote(f"docker-{image_name}"),
"image_name": quote(image_name),

View File

@ -1,78 +0,0 @@
import cimodel.lib.miniutils as miniutils
from cimodel.data.simple.util.versions import MultiPartVersion, CudaVersion
from cimodel.data.simple.util.docker_constants import DOCKER_IMAGE_BASIC, DOCKER_IMAGE_CUDA_10_2
class GeConfigTestJob:
def __init__(self,
py_version,
gcc_version,
cuda_version,
variant_parts,
extra_requires,
use_cuda_docker=False,
build_env_override=None):
self.py_version = py_version
self.gcc_version = gcc_version
self.cuda_version = cuda_version
self.variant_parts = variant_parts
self.extra_requires = extra_requires
self.use_cuda_docker = use_cuda_docker
self.build_env_override = build_env_override
def get_all_parts(self, with_dots):
maybe_py_version = self.py_version.render_dots_or_parts(with_dots) if self.py_version else []
maybe_gcc_version = self.gcc_version.render_dots_or_parts(with_dots) if self.gcc_version else []
maybe_cuda_version = self.cuda_version.render_dots_or_parts(with_dots) if self.cuda_version else []
common_parts = [
"pytorch",
"linux",
"xenial",
] + maybe_cuda_version + maybe_py_version + maybe_gcc_version
return common_parts + self.variant_parts
def gen_tree(self):
resource_class = "gpu.medium" if self.use_cuda_docker else "large"
docker_image = DOCKER_IMAGE_CUDA_10_2 if self.use_cuda_docker else DOCKER_IMAGE_BASIC
full_name = "_".join(self.get_all_parts(False))
build_env = self.build_env_override or "-".join(self.get_all_parts(True))
props_dict = {
"name": full_name,
"build_environment": build_env,
"requires": self.extra_requires,
"resource_class": resource_class,
"docker_image": docker_image,
}
if self.use_cuda_docker:
props_dict["use_cuda_docker_runtime"] = miniutils.quote(str(1))
return [{"pytorch_linux_test": props_dict}]
WORKFLOW_DATA = [
GeConfigTestJob(
MultiPartVersion([3, 6], "py"),
MultiPartVersion([5, 4], "gcc"),
None,
["jit_legacy", "test"],
["pytorch_linux_xenial_py3_6_gcc5_4_build"]),
GeConfigTestJob(
None,
None,
CudaVersion(10, 2),
["cudnn7", "py3", "jit_legacy", "test"],
["pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_build"],
use_cuda_docker=True,
),
]
def get_workflow_jobs():
return [item.gen_tree() for item in WORKFLOW_DATA]

View File

@ -1,7 +1,7 @@
from cimodel.data.simple.util.versions import MultiPartVersion
import cimodel.lib.miniutils as miniutils
XCODE_VERSION = MultiPartVersion([12, 0, 0])
XCODE_VERSION = MultiPartVersion([12, 5, 1])
class ArchVariant:

View File

@ -1,4 +1,5 @@
import cimodel.data.simple.ios_definitions as ios_definitions
import cimodel.lib.miniutils as miniutils
class IOSNightlyJob:
@ -43,6 +44,8 @@ class IOSNightlyJob:
props_dict["ios_arch"] = self.variant
props_dict["ios_platform"] = ios_definitions.get_platform(self.variant)
props_dict["name"] = self.gen_job_name()
props_dict["use_metal"] = miniutils.quote(str(int(True)))
props_dict["use_coreml"] = miniutils.quote(str(int(True)))
template_name = "_".join([
"binary",

View File

@ -58,7 +58,7 @@ class WindowsJob:
self.cudnn_version = 8 if self.cuda_version.major == 11 else 7
arch_env_elements = (
["cuda" + str(self.cuda_version.major), "cudnn" + str(self.cudnn_version)]
["cuda" + str(self.cuda_version.major) + "." + str(self.cuda_version.minor)]
if self.cuda_version
else ["cpu"]
)
@ -78,6 +78,7 @@ class WindowsJob:
props_dict = {
"build_environment": build_environment_string,
"python_version": miniutils.quote(python_version),
"vs_version": miniutils.quote("16.8.6"),
"vc_version": miniutils.quote(self.vscode_spec.dotted_version()),
"vc_year": miniutils.quote(str(self.vscode_spec.year)),
"vc_product": self.vscode_spec.get_product(),
@ -145,10 +146,10 @@ class VcSpec:
_VC2019 = VcSpec(2019)
WORKFLOW_DATA = [
# VS2019 CUDA-10.1
WindowsJob(None, _VC2019, CudaVersion(10, 1), master_only=True),
# VS2019 CUDA-10.1 force on cpu
WindowsJob(1, _VC2019, CudaVersion(10, 1), force_on_cpu=True, master_only=True),
# VS2019 CUDA-10.2
WindowsJob(None, _VC2019, CudaVersion(10, 2), master_only=True),
# VS2019 CUDA-10.2 force on cpu
WindowsJob(1, _VC2019, CudaVersion(10, 2), force_on_cpu=True, master_only=True),
# TODO: This test is disabled due to https://github.com/pytorch/pytorch/issues/59724
# WindowsJob('_azure_multi_gpu', _VC2019, CudaVersion(11, 1), multi_gpu=True, master_and_nightly=True),

.circleci/config.yml (generated): file diff suppressed because it is too large (1217 changed lines).

View File

@ -27,5 +27,5 @@ Docker builds are now defined with `.circleci/cimodel/data/simple/docker_definit
./build.sh pytorch-linux-bionic-py3.8-gcc9 -t myimage:latest
# Set flags (see build.sh) and build image
sudo bash -c 'BREAKPAD=1 ./build.sh pytorch-linux-bionic-py3.8-gcc9 -t myimage:latest
sudo bash -c 'PROTOBUF=1 ./build.sh pytorch-linux-bionic-py3.8-gcc9 -t myimage:latest
```

View File

@ -78,119 +78,108 @@ TRAVIS_DL_URL_PREFIX="https://s3.amazonaws.com/travis-python-archives/binaries/u
case "$image" in
pytorch-linux-xenial-py3.8)
ANACONDA_PYTHON_VERSION=3.8
CMAKE_VERSION=3.10.3
GCC_VERSION=7
# Do not install PROTOBUF, DB, and VISION as a test
;;
pytorch-linux-xenial-py3.6-gcc5.4)
ANACONDA_PYTHON_VERSION=3.6
CMAKE_VERSION=3.10.3
GCC_VERSION=5
PROTOBUF=yes
DB=yes
VISION=yes
KATEX=yes
BREAKPAD=yes
;;
pytorch-linux-xenial-py3.6-gcc7.2)
ANACONDA_PYTHON_VERSION=3.6
CMAKE_VERSION=3.10.3
GCC_VERSION=7
# Do not install PROTOBUF, DB, and VISION as a test
;;
pytorch-linux-xenial-py3.6-gcc7)
ANACONDA_PYTHON_VERSION=3.6
CMAKE_VERSION=3.10.3
GCC_VERSION=7
PROTOBUF=yes
DB=yes
VISION=yes
BREAKPAD=yes
;;
pytorch-linux-xenial-cuda10-cudnn7-py3-gcc7)
CUDA_VERSION=10.0
CUDNN_VERSION=7
ANACONDA_PYTHON_VERSION=3.6
GCC_VERSION=7
PROTOBUF=yes
DB=yes
VISION=yes
BREAKPAD=yes
;;
pytorch-linux-xenial-cuda10.1-cudnn7-py3-gcc7)
CUDA_VERSION=10.1
CUDNN_VERSION=7
ANACONDA_PYTHON_VERSION=3.6
GCC_VERSION=7
PROTOBUF=yes
DB=yes
VISION=yes
KATEX=yes
BREAKPAD=yes
;;
pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7)
CUDA_VERSION=10.2
CUDNN_VERSION=7
ANACONDA_PYTHON_VERSION=3.6
CMAKE_VERSION=3.10.3
GCC_VERSION=7
PROTOBUF=yes
DB=yes
VISION=yes
KATEX=yes
BREAKPAD=yes
;;
pytorch-linux-xenial-cuda11.1-cudnn8-py3-gcc7)
CUDA_VERSION=11.1
CUDNN_VERSION=8
ANACONDA_PYTHON_VERSION=3.6
CMAKE_VERSION=3.10.3
GCC_VERSION=7
PROTOBUF=yes
DB=yes
VISION=yes
KATEX=yes
BREAKPAD=yes
;;
pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7)
CUDA_VERSION=11.3.0 # Deviating from major.minor to conform to nvidia's Docker image names
CUDNN_VERSION=8
ANACONDA_PYTHON_VERSION=3.6
CMAKE_VERSION=3.10.3
GCC_VERSION=7
PROTOBUF=yes
DB=yes
VISION=yes
KATEX=yes
BREAKPAD=yes
;;
pytorch-linux-xenial-py3-clang5-asan)
ANACONDA_PYTHON_VERSION=3.6
CLANG_VERSION=5.0
CMAKE_VERSION=3.10.3
PROTOBUF=yes
DB=yes
VISION=yes
;;
pytorch-linux-xenial-py3-clang7-asan)
ANACONDA_PYTHON_VERSION=3.6
CLANG_VERSION=7
CMAKE_VERSION=3.10.3
PROTOBUF=yes
DB=yes
VISION=yes
BREAKPAD=yes
;;
pytorch-linux-xenial-py3-clang7-onnx)
ANACONDA_PYTHON_VERSION=3.6
CLANG_VERSION=7
CMAKE_VERSION=3.10.3
PROTOBUF=yes
DB=yes
VISION=yes
BREAKPAD=yes
;;
pytorch-linux-xenial-py3-clang5-android-ndk-r19c)
ANACONDA_PYTHON_VERSION=3.6
CLANG_VERSION=5.0
CMAKE_VERSION=3.10.3
LLVMDEV=yes
PROTOBUF=yes
ANDROID=yes
ANDROID_NDK_VERSION=r19c
GRADLE_VERSION=6.8.3
CMAKE_VERSION=3.7.0
NINJA_VERSION=1.9.0
;;
pytorch-linux-xenial-py3.6-clang7)
ANACONDA_PYTHON_VERSION=3.6
CMAKE_VERSION=3.10.3
CLANG_VERSION=7
PROTOBUF=yes
DB=yes
VISION=yes
BREAKPAD=yes
;;
pytorch-linux-bionic-py3.6-clang9)
ANACONDA_PYTHON_VERSION=3.6
@ -198,7 +187,6 @@ case "$image" in
PROTOBUF=yes
DB=yes
VISION=yes
BREAKPAD=yes
VULKAN_SDK_VERSION=1.2.162.1
SWIFTSHADER=yes
;;
@ -208,8 +196,6 @@ case "$image" in
PROTOBUF=yes
DB=yes
VISION=yes
BREAKPAD=yes
BREAKPAD=yes
;;
pytorch-linux-bionic-cuda10.2-cudnn7-py3.6-clang9)
CUDA_VERSION=10.2
@ -219,17 +205,6 @@ case "$image" in
PROTOBUF=yes
DB=yes
VISION=yes
BREAKPAD=yes
;;
pytorch-linux-bionic-cuda10.2-cudnn7-py3.8-gcc9)
CUDA_VERSION=10.2
CUDNN_VERSION=7
ANACONDA_PYTHON_VERSION=3.8
GCC_VERSION=9
PROTOBUF=yes
DB=yes
VISION=yes
BREAKPAD=yes
;;
pytorch-linux-bionic-cuda10.2-cudnn7-py3.9-gcc7)
CUDA_VERSION=10.2
@ -239,7 +214,6 @@ case "$image" in
PROTOBUF=yes
DB=yes
VISION=yes
BREAKPAD=yes
;;
pytorch-linux-bionic-cuda11.0-cudnn8-py3.6-gcc9)
CUDA_VERSION=11.0
@ -249,25 +223,14 @@ case "$image" in
PROTOBUF=yes
DB=yes
VISION=yes
BREAKPAD=yes
ROCM_VERSION=3.9
;;
pytorch-linux-bionic-rocm4.0.1-py3.6)
ANACONDA_PYTHON_VERSION=3.6
GCC_VERSION=9
PROTOBUF=yes
DB=yes
VISION=yes
BREAKPAD=yes
ROCM_VERSION=4.0.1
;;
pytorch-linux-bionic-rocm4.1-py3.6)
ANACONDA_PYTHON_VERSION=3.6
GCC_VERSION=9
PROTOBUF=yes
DB=yes
VISION=yes
BREAKPAD=yes
ROCM_VERSION=4.1
;;
pytorch-linux-bionic-rocm4.2-py3.6)
@ -276,16 +239,25 @@ case "$image" in
PROTOBUF=yes
DB=yes
VISION=yes
BREAKPAD=yes
ROCM_VERSION=4.2
;;
pytorch-linux-bionic-rocm4.3.1-py3.6)
ANACONDA_PYTHON_VERSION=3.6
GCC_VERSION=9
PROTOBUF=yes
DB=yes
VISION=yes
ROCM_VERSION=4.3.1
;;
*)
# Catch-all for builds that are not hardcoded.
PROTOBUF=yes
DB=yes
VISION=yes
BREAKPAD=yes
echo "image '$image' did not match an existing build configuration"
if [[ "$image" == *xenial* ]]; then
CMAKE_VERSION=3.10.3
fi
if [[ "$image" == *py* ]]; then
extract_version_from_image_name py ANACONDA_PYTHON_VERSION
fi
@ -320,7 +292,7 @@ if [ -n "${JENKINS:-}" ]; then
JENKINS_GID=$(id -g jenkins)
fi
tmp_tag="tmp-$(cat /dev/urandom | tr -dc 'a-z' | head -c 32)"
tmp_tag=$(basename "$(mktemp -u)" | tr '[:upper:]' '[:lower:]')
# Build image
# TODO: build-arg THRIFT is not turned on for any image, remove it once we confirm
@ -348,7 +320,6 @@ docker build \
--build-arg "GCC_VERSION=${GCC_VERSION}" \
--build-arg "CUDA_VERSION=${CUDA_VERSION}" \
--build-arg "CUDNN_VERSION=${CUDNN_VERSION}" \
--build-arg "BREAKPAD=${BREAKPAD}" \
--build-arg "ANDROID=${ANDROID}" \
--build-arg "ANDROID_NDK=${ANDROID_NDK_VERSION}" \
--build-arg "GRADLE_VERSION=${GRADLE_VERSION}" \

View File

@ -1,25 +0,0 @@
#!/bin/bash
set -ex
git clone https://github.com/driazati/breakpad.git
pushd breakpad
# breakpad has no actual releases, so this is pinned to the top commit from
# main when this was forked (including the one patch commit). This uses a fork
# of the breakpad mainline that automatically daisy-chains out to any previously
# installed signal handlers (instead of overwriting them).
git checkout 5485e473ed46d065e05489e50dfc59d90dfd7e22
git clone https://chromium.googlesource.com/linux-syscall-support src/third_party/lss
pushd src/third_party/lss
# same as with breakpad, there are no real releases for this repo so use a
# commit as the pin
git checkout e1e7b0ad8ee99a875b272c8e33e308472e897660
popd
./configure
make
make install
popd
rm -rf breakpad

View File

@ -4,6 +4,9 @@ set -ex
[ -n "$CMAKE_VERSION" ]
# Remove system cmake install so it won't get used instead
apt-get remove cmake -y
# Turn 3.6.3 into v3.6
path=$(echo "${CMAKE_VERSION}" | sed -e 's/\([0-9].[0-9]\+\).*/v\1/')
file="cmake-${CMAKE_VERSION}-Linux-x86_64.tar.gz"

View File

@ -69,8 +69,8 @@ if [ -n "$ANACONDA_PYTHON_VERSION" ]; then
}
# Install PyTorch conda deps, as per https://github.com/pytorch/pytorch README
# DO NOT install cmake here as it would install a version newer than 3.5, but
# we want to pin to version 3.5.
# DO NOT install cmake here as it would install a version newer than 3.10, but
# we want to pin to version 3.10.
SCIPY_VERSION=1.1.0
if [ "$ANACONDA_PYTHON_VERSION" = "3.9" ]; then
# Install llvm-8 as it is required to compile llvmlite-0.30.0 from source
@ -86,11 +86,7 @@ if [ -n "$ANACONDA_PYTHON_VERSION" ]; then
conda_install numpy=1.18.5 astunparse pyyaml mkl mkl-include setuptools cffi future six dataclasses typing_extensions
fi
if [[ "$CUDA_VERSION" == 10.0* ]]; then
conda_install magma-cuda100 -c pytorch
elif [[ "$CUDA_VERSION" == 10.1* ]]; then
conda_install magma-cuda101 -c pytorch
elif [[ "$CUDA_VERSION" == 10.2* ]]; then
if [[ "$CUDA_VERSION" == 10.2* ]]; then
conda_install magma-cuda102 -c pytorch
elif [[ "$CUDA_VERSION" == 11.0* ]]; then
conda_install magma-cuda110 -c pytorch
@ -116,6 +112,7 @@ if [ -n "$ANACONDA_PYTHON_VERSION" ]; then
boto3==1.16.34 \
coverage==5.5 \
hypothesis==4.53.2 \
expecttest==0.1.3 \
mypy==0.812 \
tb-nightly

View File

@ -2,23 +2,6 @@
set -ex
# This function installs protobuf 2.6
install_protobuf_26() {
pb_dir="/usr/temp_pb_install_dir"
mkdir -p $pb_dir
# On the nvidia/cuda:9-cudnn7-devel-centos7 image we need this symlink or
# else it will fail with
# g++: error: ./../lib64/crti.o: No such file or directory
ln -s /usr/lib64 "$pb_dir/lib64"
curl -LO "https://github.com/google/protobuf/releases/download/v2.6.1/protobuf-2.6.1.tar.gz"
tar -xvz -C "$pb_dir" --strip-components 1 -f protobuf-2.6.1.tar.gz
pushd "$pb_dir" && ./configure && make && make check && sudo make install && sudo ldconfig
popd
rm -rf $pb_dir
}
install_ubuntu() {
apt-get update
apt-get install -y --no-install-recommends \

View File

@ -1,4 +0,0 @@
#!/bin/bash
sudo apt-get -qq update
sudo apt-get -qq install --allow-downgrades --allow-change-held-packages libnccl-dev=2.5.6-1+cuda10.1 libnccl2=2.5.6-1+cuda10.1

View File

@ -1,4 +1,10 @@
#!/bin/bash
sudo apt-get update
# also install ssh to avoid error of:
# --------------------------------------------------------------------------
# The value of the MCA parameter "plm_rsh_agent" was set to a path
# that could not be found:
# plm_rsh_agent: ssh : rsh
sudo apt-get install -y ssh
sudo apt-get install -y --allow-downgrades --allow-change-held-packages openmpi-bin libopenmpi-dev

View File

@ -2,8 +2,8 @@
set -ex
# This function installs protobuf 2.6
install_protobuf_26() {
# This function installs protobuf 3.17
install_protobuf_317() {
pb_dir="/usr/temp_pb_install_dir"
mkdir -p $pb_dir
@ -12,37 +12,32 @@ install_protobuf_26() {
# g++: error: ./../lib64/crti.o: No such file or directory
ln -s /usr/lib64 "$pb_dir/lib64"
curl -LO "https://github.com/google/protobuf/releases/download/v2.6.1/protobuf-2.6.1.tar.gz"
tar -xvz -C "$pb_dir" --strip-components 1 -f protobuf-2.6.1.tar.gz
pushd "$pb_dir" && ./configure && make && make check && sudo make install && sudo ldconfig
curl -LO "https://github.com/protocolbuffers/protobuf/releases/download/v3.17.3/protobuf-all-3.17.3.tar.gz"
tar -xvz -C "$pb_dir" --strip-components 1 -f protobuf-all-3.17.3.tar.gz
# -j2 to balance memory usage and speed.
# naked `-j` seems to use too much memory.
pushd "$pb_dir" && ./configure && make -j2 && make -j2 check && sudo make -j2 install && sudo ldconfig
popd
rm -rf $pb_dir
}
install_ubuntu() {
# Ubuntu 14.04 ships with protobuf 2.5, but ONNX needs protobuf >= 2.6
# so we install that here if on 14.04
# Ubuntu 14.04 also has cmake 2.8.12 as the default option, so we will
# Ubuntu 14.04 has cmake 2.8.12 as the default option, so we will
# install cmake3 here and use cmake3.
apt-get update
if [[ "$UBUNTU_VERSION" == 14.04 ]]; then
apt-get install -y --no-install-recommends cmake3
install_protobuf_26
else
apt-get install -y --no-install-recommends \
libprotobuf-dev \
protobuf-compiler
fi
# Cleanup
apt-get autoclean && apt-get clean
rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
install_protobuf_317
}
install_centos() {
# Centos7 ships with protobuf 2.5, but ONNX needs protobuf >= 2.6
# so we always install install that here
install_protobuf_26
install_protobuf_317
}
# Install base packages depending on the base OS

View File

@ -4,9 +4,13 @@ set -ex
install_magma() {
# "install" hipMAGMA into /opt/rocm/magma by copying after build
git clone https://bitbucket.org/icl/magma.git
git clone https://bitbucket.org/icl/magma.git -b magma_ctrl_launch_bounds
pushd magma
git checkout 878b1ce02e9cfe4a829be22c8f911e9c0b6bd88f
# The branch "magma_ctrl_launch_bounds" is having a fix over the below commit, so keeping the below comment for reference.
#git checkout 878b1ce02e9cfe4a829be22c8f911e9c0b6bd88f
# Work around non-asii characters in certain magma sources; remove this after upstream magma fixes this.
perl -i.bak -pe 's/[^[:ascii:]]//g' sparse/control/magma_zfree.cpp
perl -i.bak -pe 's/[^[:ascii:]]//g' sparse/control/magma_zsolverinfo.cpp
cp make.inc-examples/make.inc.hip-gcc-mkl make.inc
echo 'LIBDIR += -L$(MKLROOT)/lib' >> make.inc
echo 'LIB += -Wl,--enable-new-dtags -Wl,--rpath,/opt/rocm/lib -Wl,--rpath,$(MKLROOT)/lib -Wl,--rpath,/opt/rocm/magma/lib' >> make.inc
@ -15,7 +19,7 @@ install_magma() {
sed -i 's/^FOPENMP/#FOPENMP/g' make.inc
export PATH="${PATH}:/opt/rocm/bin"
make -f make.gen.hipMAGMA -j $(nproc)
LANG=C.UTF-8 make lib/libmagma.so -j $(nproc) MKLROOT=/opt/conda
make lib/libmagma.so -j $(nproc) MKLROOT=/opt/conda
make testing/testing_dgemm -j $(nproc) MKLROOT=/opt/conda
popd
mv magma /opt/rocm

View File

@ -2,23 +2,6 @@
set -ex
# This function installs protobuf 2.6
install_protobuf_26() {
pb_dir="/usr/temp_pb_install_dir"
mkdir -p $pb_dir
# On the nvidia/cuda:9-cudnn7-devel-centos7 image we need this symlink or
# else it will fail with
# g++: error: ./../lib64/crti.o: No such file or directory
ln -s /usr/lib64 "$pb_dir/lib64"
curl -LO "https://github.com/google/protobuf/releases/download/v2.6.1/protobuf-2.6.1.tar.gz"
tar -xvz -C "$pb_dir" --strip-components 1 -f protobuf-2.6.1.tar.gz
pushd "$pb_dir" && ./configure && make && make check && sudo make install && sudo ldconfig
popd
rm -rf $pb_dir
}
install_ubuntu() {
apt-get update
apt-get install -y --no-install-recommends \

View File

@ -61,6 +61,16 @@ RUN if [ -n "${VISION}" ]; then bash ./install_vision.sh; fi
RUN rm install_vision.sh
ENV INSTALLED_VISION ${VISION}
ADD ./common/install_openssl.sh install_openssl.sh
ENV OPENSSL_ROOT_DIR /opt/openssl
RUN bash ./install_openssl.sh
# (optional) Install non-default CMake version
ARG CMAKE_VERSION
ADD ./common/install_cmake.sh install_cmake.sh
RUN if [ -n "${CMAKE_VERSION}" ]; then bash ./install_cmake.sh; fi
RUN rm install_cmake.sh
# Install ccache/sccache (do this last, so we get priority in PATH)
ADD ./common/install_cache.sh install_cache.sh
ENV PATH /opt/cache/bin:$PATH
@ -72,11 +82,6 @@ ADD ./common/install_jni.sh install_jni.sh
ADD ./java/jni.h jni.h
RUN bash ./install_jni.sh && rm install_jni.sh
# Install NCCL for when CUDA is version 10.1
ADD ./common/install_nccl.sh install_nccl.sh
RUN if [ "${CUDA_VERSION}" = 10.1 ]; then bash ./install_nccl.sh; fi
RUN rm install_nccl.sh
# Install Open MPI for CUDA
ADD ./common/install_openmpi.sh install_openmpi.sh
RUN if [ -n "${CUDA_VERSION}" ]; then bash install_openmpi.sh; fi
@ -93,9 +98,5 @@ ENV TORCH_NVCC_FLAGS "-Xfatbin -compress-all"
# Install LLVM dev version (Defined in the pytorch/builder github repository)
COPY --from=pytorch/llvm:9.0.1 /opt/llvm /opt/llvm
ADD ./common/install_openssl.sh install_openssl.sh
ENV OPENSSL_ROOT_DIR /opt/openssl
RUN bash ./install_openssl.sh
USER jenkins
CMD ["bash"]

View File

@ -82,13 +82,6 @@ RUN rm AndroidManifest.xml
RUN rm build.gradle
ENV INSTALLED_ANDROID ${ANDROID}
# (optional) Install breakpad
ARG BREAKPAD
ADD ./common/install_breakpad.sh install_breakpad.sh
RUN if [ -n "${BREAKPAD}" ]; then bash ./install_breakpad.sh; fi
RUN rm install_breakpad.sh
ENV INSTALLED_BREAKPAD ${BREAKPAD}
# (optional) Install Vulkan SDK
ARG VULKAN_SDK_VERSION
ADD ./common/install_vulkan_sdk.sh install_vulkan_sdk.sh
@ -113,6 +106,10 @@ ADD ./common/install_ninja.sh install_ninja.sh
RUN if [ -n "${NINJA_VERSION}" ]; then bash ./install_ninja.sh; fi
RUN rm install_ninja.sh
ADD ./common/install_openssl.sh install_openssl.sh
RUN bash ./install_openssl.sh
ENV OPENSSL_ROOT_DIR /opt/openssl
# Install ccache/sccache (do this last, so we get priority in PATH)
ADD ./common/install_cache.sh install_cache.sh
ENV PATH /opt/cache/bin:$PATH
@ -130,9 +127,5 @@ ENV BUILD_ENVIRONMENT ${BUILD_ENVIRONMENT}
# Install LLVM dev version (Defined in the pytorch/builder github repository)
COPY --from=pytorch/llvm:9.0.1 /opt/llvm /opt/llvm
ADD ./common/install_openssl.sh install_openssl.sh
RUN bash ./install_openssl.sh
ENV OPENSSL_ROOT_DIR /opt/openssl
USER jenkins
CMD ["bash"]

View File

@ -13,10 +13,8 @@ from collections import namedtuple
import cimodel.data.binary_build_definitions as binary_build_definitions
import cimodel.data.pytorch_build_definitions as pytorch_build_definitions
import cimodel.data.simple.android_definitions
import cimodel.data.simple.bazel_definitions
import cimodel.data.simple.binary_smoketest
import cimodel.data.simple.docker_definitions
import cimodel.data.simple.ge_config_tests
import cimodel.data.simple.ios_definitions
import cimodel.data.simple.macos_definitions
import cimodel.data.simple.mobile_definitions
@ -135,8 +133,6 @@ def gen_build_workflows_tree():
cimodel.data.simple.android_definitions.get_workflow_jobs,
cimodel.data.simple.ios_definitions.get_workflow_jobs,
cimodel.data.simple.mobile_definitions.get_workflow_jobs,
cimodel.data.simple.ge_config_tests.get_workflow_jobs,
cimodel.data.simple.bazel_definitions.get_workflow_jobs,
cimodel.data.simple.binary_smoketest.get_workflow_jobs,
cimodel.data.simple.nightly_ios.get_workflow_jobs,
cimodel.data.simple.nightly_android.get_workflow_jobs,
@ -154,7 +150,10 @@ def gen_build_workflows_tree():
binary_build_definitions.get_nightly_uploads,
]
slow_gradcheck_jobs = pytorch_build_definitions.get_workflow_jobs(only_slow_gradcheck=True)
slow_gradcheck_jobs = [
pytorch_build_definitions.get_workflow_jobs,
cimodel.data.simple.docker_definitions.get_workflow_jobs,
]
return {
"workflows": {
@ -172,7 +171,7 @@ def gen_build_workflows_tree():
},
"slow_gradcheck_build": {
"when": r"<< pipeline.parameters.run_slow_gradcheck_build >>",
"jobs": slow_gradcheck_jobs,
"jobs": [f(only_slow_gradcheck=True) for f in slow_gradcheck_jobs],
},
}
}

View File

@ -55,13 +55,13 @@ else
echo "Can't tell what to checkout"
exit 1
fi
retry git submodule update --init --recursive
retry git submodule update --init --recursive --jobs 0
echo "Using Pytorch from "
git --no-pager log --max-count 1
popd
# Clone the Builder master repo
retry git clone -q https://github.com/pytorch/builder.git "$BUILDER_ROOT"
retry git clone -q https://github.com/pytorch/builder.git -b release/1.10 "$BUILDER_ROOT"
pushd "$BUILDER_ROOT"
echo "Using builder from "
git --no-pager log --max-count 1


@ -22,7 +22,7 @@ export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}
# sync submodules
cd ${PROJ_ROOT}
git submodule sync
git submodule update --init --recursive
git submodule update --init --recursive --jobs 0
# run build script
chmod a+x ${PROJ_ROOT}/scripts/build_ios.sh
@ -31,8 +31,12 @@ cat ${PROJ_ROOT}/scripts/build_ios.sh
echo "########################################################"
echo "IOS_ARCH: ${IOS_ARCH}"
echo "IOS_PLATFORM: ${IOS_PLATFORM}"
echo "USE_PYTORCH_METAL: ${USE_PYTORCH_METAL}"
echo "USE_COREML_DELEGATE: ${USE_COREML_DELEGATE}"
export IOS_ARCH=${IOS_ARCH}
export IOS_PLATFORM=${IOS_PLATFORM}
export USE_PYTORCH_METAL=${USE_PYTORCH_METAL}
export USE_COREML_DELEGATE=${USE_COREML_DELEGATE}
unbuffer ${PROJ_ROOT}/scripts/build_ios.sh 2>&1 | ts
#store the binary


@ -8,16 +8,17 @@ cd ${PROJ_ROOT}/ios/TestApp
# install fastlane
sudo gem install bundler && bundle install
# install certificates
echo "${IOS_CERT_KEY}" >> cert.txt
echo "${IOS_CERT_KEY_2022}" >> cert.txt
base64 --decode cert.txt -o Certificates.p12
rm cert.txt
bundle exec fastlane install_cert
bundle exec fastlane install_root_cert
bundle exec fastlane install_dev_cert
# install the provisioning profile
PROFILE=PyTorch_CI_2021.mobileprovision
PROFILE=PyTorch_CI_2022.mobileprovision
PROVISIONING_PROFILES=~/Library/MobileDevice/Provisioning\ Profiles
mkdir -pv "${PROVISIONING_PROFILES}"
cd "${PROVISIONING_PROFILES}"
echo "${IOS_SIGN_KEY}" >> cert.txt
echo "${IOS_SIGN_KEY_2022}" >> cert.txt
base64 --decode cert.txt -o ${PROFILE}
rm cert.txt
# run the ruby build script
@ -25,5 +26,5 @@ if ! [ -x "$(command -v xcodebuild)" ]; then
echo 'Error: xcodebuild is not installed.'
exit 1
fi
PROFILE=PyTorch_CI_2021
ruby ${PROJ_ROOT}/scripts/xcode_build.rb -i ${PROJ_ROOT}/build_ios/install -x ${PROJ_ROOT}/ios/TestApp/TestApp.xcodeproj -p ${IOS_PLATFORM} -c ${PROFILE} -t ${IOS_DEV_TEAM_ID}
PROFILE=PyTorch_CI_2022
ruby ${PROJ_ROOT}/scripts/xcode_build.rb -i ${PROJ_ROOT}/build_ios/install -x ${PROJ_ROOT}/ios/TestApp/TestApp.xcodeproj -p ${IOS_PLATFORM} -c ${PROFILE} -t ${IOS_DEV_TEAM_ID} -f Accelerate,MetalPerformanceShaders,CoreML


@ -27,11 +27,14 @@ lipo -i ${ZIP_DIR}/install/lib/*.a
cp ${PROJ_ROOT}/ios/LibTorch-Lite.h ${ZIP_DIR}/src/
cp ${PROJ_ROOT}/LICENSE ${ZIP_DIR}/
# zip the library
ZIPFILE=libtorch_ios_nightly_build.zip
export DATE="$(date -u +%Y%m%d)"
export IOS_NIGHTLY_BUILD_VERSION="1.10.0.${DATE}"
# libtorch_lite_ios_nightly_1.10.0.20210810.zip
ZIPFILE="libtorch_lite_ios_nightly_${IOS_NIGHTLY_BUILD_VERSION}.zip"
cd ${ZIP_DIR}
#for testing
touch version.txt
echo $(date +%s) > version.txt
echo "${IOS_NIGHTLY_BUILD_VERSION}" > version.txt
zip -r ${ZIPFILE} install src version.txt LICENSE
# upload to aws
# Install conda then 'conda install' awscli
@ -48,3 +51,14 @@ set +x
# echo "AWS KEY: ${AWS_ACCESS_KEY_ID}"
# echo "AWS SECRET: ${AWS_SECRET_ACCESS_KEY}"
aws s3 cp ${ZIPFILE} s3://ossci-ios-build/ --acl public-read
# create a new LibTorch-Lite-Nightly.podspec from the template
echo "cp ${PROJ_ROOT}/ios/LibTorch-Lite-Nightly.podspec.template ${PROJ_ROOT}/ios/LibTorch-Lite-Nightly.podspec"
cp ${PROJ_ROOT}/ios/LibTorch-Lite-Nightly.podspec.template ${PROJ_ROOT}/ios/LibTorch-Lite-Nightly.podspec
# update pod version
sed -i '' -e "s/IOS_NIGHTLY_BUILD_VERSION/${IOS_NIGHTLY_BUILD_VERSION}/g" ${PROJ_ROOT}/ios/LibTorch-Lite-Nightly.podspec
cat ${PROJ_ROOT}/ios/LibTorch-Lite-Nightly.podspec
# push the new LibTorch-Lite-Nightly.podspec to CocoaPods
pod trunk push --verbose --allow-warnings --use-libraries --skip-import-validation ${PROJ_ROOT}/ios/LibTorch-Lite-Nightly.podspec
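The template above carries a literal `IOS_NIGHTLY_BUILD_VERSION` placeholder that `sed` rewrites in place; `-i ''` is the BSD/macOS form, which requires an explicit (here empty) backup suffix. A tiny reproduction with a hypothetical file and version:

```
printf 's.version = "IOS_NIGHTLY_BUILD_VERSION"\n' > Demo.podspec
# BSD/macOS sed: the empty string after -i is the backup suffix
sed -i '' -e "s/IOS_NIGHTLY_BUILD_VERSION/1.10.0.20210810/g" Demo.podspec
cat Demo.podspec   # -> s.version = "1.10.0.20210810"
```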


@ -9,10 +9,6 @@ python_nodot="\$(echo $DESIRED_PYTHON | tr -d m.u)"
# Set up Python
if [[ "$PACKAGE_TYPE" == conda ]]; then
# There was a bug that was introduced in conda-package-handling >= 1.6.1 that makes archives
# above a certain size fail out when attempting to extract
# see: https://github.com/conda/conda-package-handling/issues/71
conda install -y conda-package-handling=1.6.0
retry conda create -qyn testenv python="$DESIRED_PYTHON"
source activate testenv >/dev/null
elif [[ "$PACKAGE_TYPE" != libtorch ]]; then


@ -14,6 +14,10 @@ chmod +x "$build_script"
# Build
cat >"$build_script" <<EOL
export PATH="$workdir/miniconda/bin:$PATH"
if [[ "$CIRCLE_BRANCH" == "nightly" ]]; then
export USE_PYTORCH_METAL_EXPORT=1
export USE_COREML_DELEGATE=1
fi
if [[ "$PACKAGE_TYPE" == conda ]]; then
"$workdir/builder/conda/build_pytorch.sh"
else


@ -62,7 +62,7 @@ if [[ -z "$DOCKER_IMAGE" ]]; then
if [[ "$PACKAGE_TYPE" == conda ]]; then
export DOCKER_IMAGE="pytorch/conda-cuda"
elif [[ "$DESIRED_CUDA" == cpu ]]; then
export DOCKER_IMAGE="pytorch/manylinux-cuda100"
export DOCKER_IMAGE="pytorch/manylinux-cpu"
else
export DOCKER_IMAGE="pytorch/manylinux-cuda${DESIRED_CUDA:2}"
fi


@ -8,15 +8,45 @@ export CUDA_VERSION="${DESIRED_CUDA/cu/}"
export USE_SCCACHE=1
export SCCACHE_BUCKET=ossci-compiler-cache-windows
export NIGHTLIES_PYTORCH_ROOT="$PYTORCH_ROOT"
if [[ "$CUDA_VERSION" == "92" || "$CUDA_VERSION" == "100" ]]; then
export VC_YEAR=2017
else
export VC_YEAR=2019
fi
export VC_YEAR=2019
if [[ "${DESIRED_CUDA}" == "cu111" || "${DESIRED_CUDA}" == "cu113" ]]; then
export BUILD_SPLIT_CUDA="ON"
export BUILD_SPLIT_CUDA="ON"
fi
echo "Free Space for CUDA DEBUG BUILD"
if [[ "$CIRCLECI" == 'true' ]]; then
if [[ -d "C:\\Program Files (x86)\\Microsoft Visual Studio\\2019\\Community" ]]; then
rm -rf "C:\\Program Files (x86)\\Microsoft Visual Studio\\2019\\Community"
fi
if [[ -d "C:\\Program Files (x86)\\Microsoft Visual Studio 14.0" ]]; then
rm -rf "C:\\Program Files (x86)\\Microsoft Visual Studio 14.0"
fi
if [[ -d "C:\\Program Files (x86)\\Microsoft.NET" ]]; then
rm -rf "C:\\Program Files (x86)\\Microsoft.NET"
fi
if [[ -d "C:\\Program Files\\dotnet" ]]; then
rm -rf "C:\\Program Files\\dotnet"
fi
if [[ -d "C:\\Program Files (x86)\\dotnet" ]]; then
rm -rf "C:\\Program Files (x86)\\dotnet"
fi
if [[ -d "C:\\Program Files (x86)\\Microsoft SQL Server" ]]; then
rm -rf "C:\\Program Files (x86)\\Microsoft SQL Server"
fi
if [[ -d "C:\\Program Files (x86)\\Xamarin" ]]; then
rm -rf "C:\\Program Files (x86)\\Xamarin"
fi
if [[ -d "C:\\Program Files (x86)\\Google" ]]; then
rm -rf "C:\\Program Files (x86)\\Google"
fi
fi
set +x
@ -32,7 +62,8 @@ if [[ "$CIRCLECI" == 'true' && -d "C:\\ProgramData\\Microsoft\\VisualStudio\\Pac
fi
if [[ "$CIRCLECI" == 'true' && -d "C:\\Microsoft" ]]; then
rm -rf "C:\\Microsoft\\Android*"
# don't use quotes here
rm -rf /c/Microsoft/AndroidNDK*
fi
echo "Free space on filesystem before build:"


@ -4,13 +4,7 @@ set -eux -o pipefail
source "/c/w/env"
export CUDA_VERSION="${DESIRED_CUDA/cu/}"
export VC_YEAR=2017
if [[ "$CUDA_VERSION" == "92" || "$CUDA_VERSION" == "100" ]]; then
export VC_YEAR=2017
else
export VC_YEAR=2019
fi
export VC_YEAR=2019
pushd "$BUILDER_ROOT"


@ -10,18 +10,27 @@ pt_checkout="/var/lib/jenkins/workspace"
# Since we're cat-ing this file, we need to escape all $'s
echo "cpp_doc_push_script.sh: Invoked with $*"
# Argument 1: Where to copy the built documentation for Python API to
# (pytorch.github.io/$install_path)
install_path="$1"
if [ -z "$install_path" ]; then
echo "error: cpp_doc_push_script.sh: install_path (arg1) not specified"
# for statements like ${1:-${DOCS_INSTALL_PATH:-docs/}}
# the order of operations goes:
# 1. Check if there's an argument $1
# 2. If no argument check for environment var DOCS_INSTALL_PATH
# 3. If no environment var fall back to default 'docs/'
# NOTE: It might seem odd to gather the second argument before the first,
# but since DOCS_INSTALL_PATH can be derived from DOCS_VERSION it's probably better
# to gather it first, just so we don't potentially break people who rely on this script
# Argument 2: What version of the Python API docs we are building.
version="${2:-${DOCS_VERSION:-master}}"
if [ -z "$version" ]; then
echo "error: cpp_doc_push_script.sh: version (arg2) not specified"
exit 1
fi
# Argument 2: What version of the Python API docs we are building.
version="$2"
if [ -z "$version" ]; then
echo "error: cpp_doc_push_script.sh: version (arg2) not specified"
# Argument 1: Where to copy the built documentation for Python API to
# (pytorch.github.io/$install_path)
install_path="${1:-${DOCS_INSTALL_PATH:-docs/${DOCS_VERSION}}}"
if [ -z "$install_path" ]; then
echo "error: cpp_doc_push_script.sh: install_path (arg1) not specified"
exit 1
fi
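As a quick illustration of the nested-default resolution described in the comments above, here is a minimal, self-contained sketch (the demo function is hypothetical; the variable names mirror the script's convention):

```
#!/bin/bash
# Resolution order of ${1:-${DOCS_INSTALL_PATH:-docs/}}:
#   1. the positional argument, if given
#   2. otherwise the DOCS_INSTALL_PATH environment variable
#   3. otherwise the literal default "docs/"
demo() {
  local install_path="${1:-${DOCS_INSTALL_PATH:-docs/}}"
  echo "install_path=${install_path}"
}

demo explicit/                              # install_path=explicit/
(export DOCS_INSTALL_PATH=from-env/; demo)  # install_path=from-env/
demo                                        # install_path=docs/
```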


@ -13,18 +13,27 @@ echo "python_doc_push_script.sh: Invoked with $*"
set -ex
# Argument 1: Where to copy the built documentation to
# (pytorch.github.io/$install_path)
install_path="$1"
if [ -z "$install_path" ]; then
echo "error: python_doc_push_script.sh: install_path (arg1) not specified"
# for statements like ${1:-${DOCS_INSTALL_PATH:-docs/}}
# the order of operations goes:
# 1. Check if there's an argument $1
# 2. If no argument check for environment var DOCS_INSTALL_PATH
# 3. If no environment var fall back to default 'docs/'
# NOTE: It might seem odd to gather the second argument before the first,
# but since DOCS_INSTALL_PATH can be derived from DOCS_VERSION it's probably better
# to gather it first, just so we don't potentially break people who rely on this script
# Argument 2: What version of the docs we are building.
version="${2:-${DOCS_VERSION:-master}}"
if [ -z "$version" ]; then
echo "error: python_doc_push_script.sh: version (arg2) not specified"
exit 1
fi
# Argument 2: What version of the docs we are building.
version="$2"
if [ -z "$version" ]; then
echo "error: python_doc_push_script.sh: version (arg2) not specified"
# Argument 1: Where to copy the built documentation to
# (pytorch.github.io/$install_path)
install_path="${1:-${DOCS_INSTALL_PATH:-docs/${DOCS_VERSION}}}"
if [ -z "$install_path" ]; then
echo "error: python_doc_push_script.sh: install_path (arg1) not specified"
exit 1
fi
@ -34,7 +43,7 @@ if [ "$version" == "master" ]; then
fi
# Argument 3: The branch to push to. Usually is "site"
branch="$3"
branch="${3:-${DOCS_BRANCH:-site}}"
if [ -z "$branch" ]; then
echo "error: python_doc_push_script.sh: branch (arg3) not specified"
exit 1


@ -7,6 +7,9 @@ sudo rm -f /etc/apt/heroku.list
sudo rm -f /etc/apt/openjdk-r-ubuntu-ppa-xenial.list
sudo rm -f /etc/apt/partner.list
# To increase the network reliability, let apt decide which mirror is best to use
sudo sed -i -e 's/http:\/\/.*archive/mirror:\/\/mirrors/' -e 's/\/ubuntu\//\/mirrors.txt/' /etc/apt/sources.list
retry () {
$* || $* || $* || $* || $*
}
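The retry helper above leans on `||` short-circuiting: each subsequent attempt runs only if the previous one failed, so a flaky command is tried at most five times. An equivalent, more explicit sketch (note that `"$@"` preserves argument quoting, which the unquoted `$*` in the original does not):

```
retry_loop () {
  local attempt
  # Try up to five times; return as soon as one attempt succeeds.
  for attempt in 1 2 3 4 5; do
    "$@" && return 0
  done
  return 1
}

retry_loop sudo apt-get update -qq
```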
@ -40,9 +43,9 @@ if [ -n "${USE_CUDA_DOCKER_RUNTIME:-}" ]; then
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L "https://nvidia.github.io/nvidia-docker/${distribution}/nvidia-docker.list" | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update -qq
retry sudo apt-get update -qq
# Necessary to get the `--gpus` flag to function within docker
sudo apt-get install -y nvidia-container-toolkit
retry sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker
else
# Explicitly remove nvidia docker apt repositories if not building for cuda
@ -64,6 +67,7 @@ add_to_env_file() {
}
add_to_env_file IN_CI 1
add_to_env_file CI_MASTER "${CI_MASTER:-}"
add_to_env_file COMMIT_SOURCE "${CIRCLE_BRANCH:-}"
add_to_env_file BUILD_ENVIRONMENT "${BUILD_ENVIRONMENT}"
add_to_env_file CIRCLE_PULL_REQUEST "${CIRCLE_PULL_REQUEST}"


@ -1,8 +1,8 @@
# https://developercommunity.visualstudio.com/t/install-specific-version-of-vs-component/1142479
# https://docs.microsoft.com/en-us/visualstudio/releases/2019/history#release-dates-and-build-numbers
# Where to find the links: https://docs.microsoft.com/en-us/visualstudio/releases/2019/history#release-dates-and-build-numbers
# 16.8.5 BuildTools
$VS_DOWNLOAD_LINK = "https://download.visualstudio.microsoft.com/download/pr/20130c62-1bc8-43d6-b4f0-c20bb7c79113/145a319d79a83376915d8f855605e152ef5f6fa2b2f1d2dca411fb03722eea72/vs_BuildTools.exe"
# BuildTools from S3
$VS_DOWNLOAD_LINK = "https://s3.amazonaws.com/ossci-windows/vs${env:VS_VERSION}_BuildTools.exe"
$COLLECT_DOWNLOAD_LINK = "https://aka.ms/vscollect.exe"
$VS_INSTALL_ARGS = @("--nocache","--quiet","--wait", "--add Microsoft.VisualStudio.Workload.VCTools",
"--add Microsoft.Component.MSBuild",
@ -18,32 +18,41 @@ if (${env:INSTALL_WINDOWS_SDK} -eq "1") {
$VS_INSTALL_ARGS += "--add Microsoft.VisualStudio.Component.Windows10SDK.19041"
}
if (Test-Path "${env:ProgramFiles(x86)}\Microsoft Visual Studio\Installer\vswhere.exe") {
$VS_VERSION_major = [int] ${env:VS_VERSION}.split(".")[0]
$existingPath = & "${env:ProgramFiles(x86)}\Microsoft Visual Studio\Installer\vswhere.exe" -products "Microsoft.VisualStudio.Product.BuildTools" -version "[${env:VS_VERSION}, ${env:VS_VERSION_major + 1})" -property installationPath
if (($existingPath -ne $null) -and (!${env:CIRCLECI})) {
echo "Found correctly versioned existing BuildTools installation in $existingPath"
exit 0
}
$pathToRemove = & "${env:ProgramFiles(x86)}\Microsoft Visual Studio\Installer\vswhere.exe" -products "Microsoft.VisualStudio.Product.BuildTools" -property installationPath
}
echo "Downloading VS installer from S3."
curl.exe --retry 3 -kL $VS_DOWNLOAD_LINK --output vs_installer.exe
if ($LASTEXITCODE -ne 0) {
echo "Download of the VS 2019 Version 16.8.5 installer failed"
echo "Download of the VS 2019 Version ${env:VS_VERSION} installer failed"
exit 1
}
if (Test-Path "${env:ProgramFiles(x86)}\Microsoft Visual Studio\Installer\vswhere.exe") {
$existingPath = & "${env:ProgramFiles(x86)}\Microsoft Visual Studio\Installer\vswhere.exe" -products "Microsoft.VisualStudio.Product.BuildTools" -version "[16, 17)" -property installationPath
if ($existingPath -ne $null) {
echo "Found existing BuildTools installation in $existingPath"
$VS_UNINSTALL_ARGS = @("uninstall", "--installPath", "`"$existingPath`"", "--quiet","--wait")
$process = Start-Process "${PWD}\vs_installer.exe" -ArgumentList $VS_UNINSTALL_ARGS -NoNewWindow -Wait -PassThru
$exitCode = $process.ExitCode
if (($exitCode -ne 0) -and ($exitCode -ne 3010)) {
echo "Original BuildTools uninstall failed with code $exitCode"
exit 1
}
echo "Original BuildTools uninstalled"
if ($pathToRemove -ne $null) {
echo "Uninstalling $pathToRemove."
$VS_UNINSTALL_ARGS = @("uninstall", "--installPath", "`"$pathToRemove`"", "--quiet","--wait")
$process = Start-Process "${PWD}\vs_installer.exe" -ArgumentList $VS_UNINSTALL_ARGS -NoNewWindow -Wait -PassThru
$exitCode = $process.ExitCode
if (($exitCode -ne 0) -and ($exitCode -ne 3010)) {
echo "Original BuildTools uninstall failed with code $exitCode"
exit 1
}
echo "Other versioned BuildTools uninstalled."
}
echo "Installing Visual Studio version ${env:VS_VERSION}."
$process = Start-Process "${PWD}\vs_installer.exe" -ArgumentList $VS_INSTALL_ARGS -NoNewWindow -Wait -PassThru
Remove-Item -Path vs_installer.exe -Force
$exitCode = $process.ExitCode
if (($exitCode -ne 0) -and ($exitCode -ne 3010)) {
echo "VS 2017 installer exited with code $exitCode, which should be one of [0, 3010]."
echo "VS 2019 installer exited with code $exitCode, which should be one of [0, 3010]."
curl.exe --retry 3 -kL $COLLECT_DOWNLOAD_LINK --output Collect.exe
if ($LASTEXITCODE -ne 0) {
echo "Download of the VS Collect tool failed."
@ -51,6 +60,6 @@ if (($exitCode -ne 0) -and ($exitCode -ne 3010)) {
}
Start-Process "${PWD}\Collect.exe" -NoNewWindow -Wait -PassThru
New-Item -Path "C:\w\build-results" -ItemType "directory" -Force
Copy-Item -Path "C:\Users\circleci\AppData\Local\Temp\vslogs.zip" -Destination "C:\w\build-results\"
Copy-Item -Path "${env:TEMP}\vslogs.zip" -Destination "C:\w\build-results\"
exit 1
}


@ -1,70 +1,74 @@
#!/bin/bash
set -eux -o pipefail
cuda_major_version=${CUDA_VERSION%.*}
if [[ "$cuda_major_version" == "10" ]]; then
cuda_installer_name="cuda_10.1.243_426.00_win10"
msbuild_project_dir="CUDAVisualStudioIntegration/extras/visual_studio_integration/MSBuildExtensions"
cuda_install_packages="nvcc_10.1 cuobjdump_10.1 nvprune_10.1 cupti_10.1 cublas_10.1 cublas_dev_10.1 cudart_10.1 cufft_10.1 cufft_dev_10.1 curand_10.1 curand_dev_10.1 cusolver_10.1 cusolver_dev_10.1 cusparse_10.1 cusparse_dev_10.1 nvgraph_10.1 nvgraph_dev_10.1 npp_10.1 npp_dev_10.1 nvrtc_10.1 nvrtc_dev_10.1 nvml_dev_10.1"
elif [[ "$cuda_major_version" == "11" ]]; then
if [[ "${CUDA_VERSION}" == "11.1" ]]; then
case ${CUDA_VERSION} in
10.1)
cuda_installer_name="cuda_10.1.243_426.00_win10"
cuda_install_packages="nvcc_10.1 cuobjdump_10.1 nvprune_10.1 cupti_10.1 cublas_10.1 cublas_dev_10.1 cudart_10.1 cufft_10.1 cufft_dev_10.1 curand_10.1 curand_dev_10.1 cusolver_10.1 cusolver_dev_10.1 cusparse_10.1 cusparse_dev_10.1 nvgraph_10.1 nvgraph_dev_10.1 npp_10.1 npp_dev_10.1 nvrtc_10.1 nvrtc_dev_10.1 nvml_dev_10.1"
;;
10.2)
cuda_installer_name="cuda_10.2.89_441.22_win10"
cuda_install_packages="nvcc_10.2 cuobjdump_10.2 nvprune_10.2 cupti_10.2 cublas_10.2 cublas_dev_10.2 cudart_10.2 cufft_10.2 cufft_dev_10.2 curand_10.2 curand_dev_10.2 cusolver_10.2 cusolver_dev_10.2 cusparse_10.2 cusparse_dev_10.2 nvgraph_10.2 nvgraph_dev_10.2 npp_10.2 npp_dev_10.2 nvrtc_10.2 nvrtc_dev_10.2 nvml_dev_10.2"
;;
11.1)
cuda_installer_name="cuda_11.1.0_456.43_win10"
msbuild_project_dir="visual_studio_integration/CUDAVisualStudioIntegration/extras/visual_studio_integration/MSBuildExtensions"
cuda_install_packages="nvcc_11.1 cuobjdump_11.1 nvprune_11.1 nvprof_11.1 cupti_11.1 cublas_11.1 cublas_dev_11.1 cudart_11.1 cufft_11.1 cufft_dev_11.1 curand_11.1 curand_dev_11.1 cusolver_11.1 cusolver_dev_11.1 cusparse_11.1 cusparse_dev_11.1 npp_11.1 npp_dev_11.1 nvrtc_11.1 nvrtc_dev_11.1 nvml_dev_11.1"
elif [[ "${CUDA_VERSION}" == "11.3" ]]; then
;;
11.3)
cuda_installer_name="cuda_11.3.0_465.89_win10"
msbuild_project_dir="visual_studio_integration/CUDAVisualStudioIntegration/extras/visual_studio_integration/MSBuildExtensions"
cuda_install_packages="thrust_11.3 nvcc_11.3 cuobjdump_11.3 nvprune_11.3 nvprof_11.3 cupti_11.3 cublas_11.3 cublas_dev_11.3 cudart_11.3 cufft_11.3 cufft_dev_11.3 curand_11.3 curand_dev_11.3 cusolver_11.3 cusolver_dev_11.3 cusparse_11.3 cusparse_dev_11.3 npp_11.3 npp_dev_11.3 nvrtc_11.3 nvrtc_dev_11.3 nvml_dev_11.3"
else
echo "This should not happen! ABORT."
;;
*)
echo "CUDA_VERSION $CUDA_VERSION is not supported yet"
exit 1
fi
;;
esac
if [[ -f "/c/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v${CUDA_VERSION}/bin/nvcc.exe" ]]; then
echo "Existing CUDA v${CUDA_VERSION} installation found, skipping install"
else
echo "CUDA_VERSION $CUDA_VERSION is not supported yet"
exit 1
tmp_dir=$(mktemp -d)
(
# no need to popd after, the subshell shouldn't affect the parent shell
pushd "${tmp_dir}"
cuda_installer_link="https://ossci-windows.s3.amazonaws.com/${cuda_installer_name}.exe"
curl --retry 3 -kLO $cuda_installer_link
7z x ${cuda_installer_name}.exe -o${cuda_installer_name}
pushd ${cuda_installer_name}
mkdir cuda_install_logs
set +e
# This breaks for some reason if you quote cuda_install_packages
# shellcheck disable=SC2086
./setup.exe -s ${cuda_install_packages} -loglevel:6 -log:"$(pwd -W)/cuda_install_logs"
set -e
if [[ ! -f "/c/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v${CUDA_VERSION}/bin/nvcc.exe" ]]; then
echo "CUDA installation failed"
mkdir -p /c/w/build-results
7z a "c:\\w\\build-results\\cuda_install_logs.7z" cuda_install_logs
exit 1
fi
)
rm -rf "${tmp_dir}"
fi
if [[ "$cuda_major_version" == "11" && "${JOB_EXECUTOR:-}" == "windows-with-nvidia-gpu" ]]; then
cuda_install_packages="${cuda_install_packages} Display.Driver"
fi
cuda_installer_link="https://ossci-windows.s3.amazonaws.com/${cuda_installer_name}.exe"
curl --retry 3 -kLO $cuda_installer_link
7z x ${cuda_installer_name}.exe -o${cuda_installer_name}
cd ${cuda_installer_name}
mkdir cuda_install_logs
set +e
./setup.exe -s ${cuda_install_packages} -loglevel:6 -log:"$(pwd -W)/cuda_install_logs"
set -e
if [[ "${VC_YEAR}" == "2017" ]]; then
cp -r ${msbuild_project_dir}/* "C:/Program Files (x86)/Microsoft Visual Studio/2017/${VC_PRODUCT}/Common7/IDE/VC/VCTargets/BuildCustomizations/"
if [[ -f "/c/Program Files/NVIDIA Corporation/NvToolsExt/bin/x64/nvToolsExt64_1.dll" ]]; then
echo "Existing nvtools installation found, skipping install"
else
cp -r ${msbuild_project_dir}/* "C:/Program Files (x86)/Microsoft Visual Studio/2019/${VC_PRODUCT}/MSBuild/Microsoft/VC/v160/BuildCustomizations/"
# create tmp dir for download
tmp_dir=$(mktemp -d)
(
# no need to popd after, the subshell shouldn't affect the parent shell
pushd "${tmp_dir}"
curl --retry 3 -kLO https://ossci-windows.s3.amazonaws.com/NvToolsExt.7z
7z x NvToolsExt.7z -oNvToolsExt
mkdir -p "C:/Program Files/NVIDIA Corporation/NvToolsExt"
cp -r NvToolsExt/* "C:/Program Files/NVIDIA Corporation/NvToolsExt/"
)
rm -rf "${tmp_dir}"
fi
if ! ls "/c/Program Files/NVIDIA Corporation/NvToolsExt/bin/x64/nvToolsExt64_1.dll"
then
curl --retry 3 -kLO https://ossci-windows.s3.amazonaws.com/NvToolsExt.7z
7z x NvToolsExt.7z -oNvToolsExt
mkdir -p "C:/Program Files/NVIDIA Corporation/NvToolsExt"
cp -r NvToolsExt/* "C:/Program Files/NVIDIA Corporation/NvToolsExt/"
export NVTOOLSEXT_PATH="C:\\Program Files\\NVIDIA Corporation\\NvToolsExt\\"
fi
if ! ls "/c/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v${CUDA_VERSION}/bin/nvcc.exe"
then
echo "CUDA installation failed"
mkdir -p /c/w/build-results
7z a "c:\\w\\build-results\\cuda_install_logs.7z" cuda_install_logs
exit 1
fi
cd ..
rm -rf ./${cuda_installer_name}
rm -f ./${cuda_installer_name}.exe


@ -1,32 +1,46 @@
#!/bin/bash
set -eux -o pipefail
cuda_major_version=${CUDA_VERSION%.*}
# This is typically blank but for CUDA 10* it'll be set to 10
windows_version_qualifier=""
if [[ "$cuda_major_version" == "10" ]]; then
cudnn_installer_name="cudnn-${CUDA_VERSION}-windows10-x64-v7.6.4.38"
elif [[ "$cuda_major_version" == "11" ]]; then
if [[ "${CUDA_VERSION}" == "11.1" ]]; then
cudnn_installer_name="cudnn-${CUDA_VERSION}-windows-x64-v8.0.5.39"
elif [[ "${CUDA_VERSION}" == "11.3" ]]; then
cudnn_installer_name="cudnn-${CUDA_VERSION}-windows-x64-v8.2.0.53"
else
echo "This should not happen! ABORT."
case ${CUDA_VERSION} in
10.1)
archive_version="v7.6.4.38"
windows_version_qualifier="10"
;;
10.2)
archive_version="v7.6.5.32"
windows_version_qualifier="10"
;;
11.1)
archive_version="v8.0.5.39"
;;
11.3)
archive_version="v8.2.0.53"
;;
*)
echo "CUDA_VERSION: ${CUDA_VERSION} not supported yet"
exit 1
fi
else
echo "CUDNN for CUDA_VERSION $CUDA_VERSION is not supported yet"
exit 1
fi
;;
esac
cudnn_installer_link="https://ossci-windows.s3.amazonaws.com/${cudnn_installer_name}.zip"
cudnn_installer_name="cudnn_installer.zip"
cudnn_installer_link="https://ossci-windows.s3.amazonaws.com/cudnn-${CUDA_VERSION}-windows${windows_version_qualifier}-x64-${archive_version}.zip"
cudnn_install_folder="C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v${CUDA_VERSION}/"
curl --retry 3 -O "$cudnn_installer_link"
7z x "${cudnn_installer_name}.zip" -ocudnn
# shellcheck recommends using '${var:?}/*' to avoid potentially expanding to '/*'
# Remove all of the directories before attempting to copy files
rm -rf "${cudnn_install_folder:?}/*"
cp -rf cudnn/cuda/* "${cudnn_install_folder}"
rm -rf cudnn
rm -f "${cudnn_installer_name}.zip"
if [[ -f "${cudnn_install_folder}/include/cudnn.h" ]]; then
echo "Existing cudnn installation found, skipping install..."
else
tmp_dir=$(mktemp -d)
(
pushd "${tmp_dir}"
curl --retry 3 -o "${cudnn_installer_name}" "$cudnn_installer_link"
7z x "${cudnn_installer_name}" -ocudnn
# Use '${var:?}/*' to avoid potentially expanding to '/*'
# Remove all of the directories before attempting to copy files
rm -rf "${cudnn_install_folder:?}/*"
cp -rf cudnn/cuda/* "${cudnn_install_folder}"
)
rm -rf "${tmp_dir}"
fi
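The `${cudnn_install_folder:?}` expansion above is a guard: bash aborts if the variable is unset or empty, so the `rm -rf` can never collapse toward the filesystem root. A minimal sketch of the idiom (hypothetical helper; note that the `/*` glob only expands when it sits outside the quotes):

```
#!/bin/bash
set -eu

cleanup () {
  local dir="$1"
  # ${dir:?} makes bash exit with an error if dir is empty or unset.
  rm -rf "${dir:?}"/*
}

cleanup /tmp/scratch   # removes the directory's contents
cleanup ""             # aborts: "dir: parameter null or not set"
```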


@ -15,11 +15,15 @@ pytorch_params: &pytorch_params
build_only:
type: string
default: ""
ci_master:
type: string
default: ""
environment:
BUILD_ENVIRONMENT: << parameters.build_environment >>
DOCKER_IMAGE: << parameters.docker_image >>
USE_CUDA_DOCKER_RUNTIME: << parameters.use_cuda_docker_runtime >>
BUILD_ONLY: << parameters.build_only >>
CI_MASTER: << pipeline.parameters.run_master_build >>
resource_class: << parameters.resource_class >>
pytorch_android_params: &pytorch_android_params
@ -60,6 +64,9 @@ pytorch_ios_params: &pytorch_ios_params
lite_interpreter:
type: string
default: "1"
use_coreml:
type: string
default: "0"
environment:
BUILD_ENVIRONMENT: << parameters.build_environment >>
IOS_ARCH: << parameters.ios_arch >>
@ -67,6 +74,7 @@ pytorch_ios_params: &pytorch_ios_params
SELECTED_OP_LIST: << parameters.op_list >>
USE_PYTORCH_METAL: << parameters.use_metal >>
BUILD_LITE_INTERPRETER: << parameters.lite_interpreter >>
USE_COREML_DELEGATE: << parameters.use_coreml >>
pytorch_windows_params: &pytorch_windows_params
parameters:
@ -85,6 +93,9 @@ pytorch_windows_params: &pytorch_windows_params
python_version:
type: string
default: "3.8"
vs_version:
type: string
default: "16.8.6"
vc_version:
type: string
default: "14.16"
@ -102,6 +113,7 @@ pytorch_windows_params: &pytorch_windows_params
SCCACHE_BUCKET: "ossci-compiler-cache"
CUDA_VERSION: <<parameters.cuda_version>>
PYTHON_VERSION: <<parameters.python_version>>
VS_VERSION: <<parameters.vs_version>>
VC_VERSION: <<parameters.vc_version>>
VC_YEAR: <<parameters.vc_year>>
VC_PRODUCT: <<parameters.vc_product>>


@ -171,4 +171,4 @@ commands:
cd ~/project
export ANDROID_BUILD_TYPE="<< parameters.build_type >>"
export COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0)
python3 .circleci/scripts/upload_binary_size_to_scuba.py android
python3 -m tools.stats.upload_binary_size_to_scuba android


@ -29,7 +29,7 @@
cd /pytorch && export COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0)
python3 -mpip install requests && \
SCRIBE_GRAPHQL_ACCESS_TOKEN=${SCRIBE_GRAPHQL_ACCESS_TOKEN} \
python3 /pytorch/.circleci/scripts/upload_binary_size_to_scuba.py || exit 0
python3 -m tools.stats.upload_binary_size_to_scuba || exit 0
- persist_to_workspace:
root: /
paths: final_pkgs
@ -239,7 +239,7 @@
binary_ios_build:
<<: *pytorch_ios_params
macos:
xcode: "12.0"
xcode: "12.5.1"
steps:
- attach_workspace:
at: ~/workspace
@ -266,7 +266,7 @@
binary_ios_upload:
<<: *pytorch_ios_params
macos:
xcode: "12.0"
xcode: "12.5.1"
steps:
- attach_workspace:
at: ~/workspace


@ -41,7 +41,7 @@
no_output_timeout: "1h"
command: |
set -ex
export COMMIT_DOCKER_IMAGE=${DOCKER_IMAGE}:${DOCKER_TAG}-${CIRCLE_SHA1}
export COMMIT_DOCKER_IMAGE=${DOCKER_IMAGE}:build-${DOCKER_TAG}-${CIRCLE_SHA1}
echo "DOCKER_IMAGE: "${COMMIT_DOCKER_IMAGE}
tag=${CIRCLE_TAG:1:5}
target=${tag:-master}
@ -86,7 +86,7 @@
no_output_timeout: "1h"
command: |
set -ex
export COMMIT_DOCKER_IMAGE=${DOCKER_IMAGE}:${DOCKER_TAG}-${CIRCLE_SHA1}
export COMMIT_DOCKER_IMAGE=${DOCKER_IMAGE}:build-${DOCKER_TAG}-${CIRCLE_SHA1}
echo "DOCKER_IMAGE: "${COMMIT_DOCKER_IMAGE}
tag=${CIRCLE_TAG:1:5}
target=${tag:-master}
@ -126,6 +126,7 @@
set -e
export IN_CI=1
export CROSS_COMPILE_ARM64=1
export JOB_BASE_NAME=$CIRCLE_JOB
# Install sccache
sudo curl --retry 3 https://s3.amazonaws.com/ossci-macos/sccache_v2.15 --output /usr/local/bin/sccache
@ -162,6 +163,7 @@
command: |
set -e
export IN_CI=1
export JOB_BASE_NAME=$CIRCLE_JOB
# Install sccache
sudo curl --retry 3 https://s3.amazonaws.com/ossci-macos/sccache_v2.15 --output /usr/local/bin/sccache
@ -198,6 +200,7 @@
command: |
set -e
export IN_CI=1
export JOB_BASE_NAME=$CIRCLE_JOB
chmod a+x .jenkins/pytorch/macos-test.sh
unbuffer .jenkins/pytorch/macos-test.sh 2>&1 | ts
@ -208,12 +211,14 @@
set -ex
source /Users/distiller/workspace/miniconda3/bin/activate
pip install boto3
export PYTHONPATH="$PWD"
export IN_CI=1
export JOB_BASE_NAME=$CIRCLE_JOB
# Using the same IAM user to write stats to our OSS bucket
export AWS_ACCESS_KEY_ID=${CIRCLECI_AWS_ACCESS_KEY_FOR_SCCACHE_S3_BUCKET_V4}
export AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_SCCACHE_S3_BUCKET_V4}
python tools/print_test_stats.py --upload-to-s3 --compare-with-s3 test
python -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test
when: always
- store_test_results:
path: test/test-reports
@ -235,6 +240,7 @@
set -e
export IN_CI=1
export BUILD_LITE_INTERPRETER=1
export JOB_BASE_NAME=$CIRCLE_JOB
chmod a+x ${HOME}/project/.jenkins/pytorch/macos-lite-interpreter-build-test.sh
unbuffer ${HOME}/project/.jenkins/pytorch/macos-lite-interpreter-build-test.sh 2>&1 | ts
- store_test_results:
@ -258,7 +264,7 @@
no_output_timeout: "1h"
command: |
set -eux
docker_image_commit=${DOCKER_IMAGE}:${DOCKER_TAG}-${CIRCLE_SHA1}
docker_image_commit=${DOCKER_IMAGE}:build-${DOCKER_TAG}-${CIRCLE_SHA1}
docker_image_libtorch_android_x86_32=${docker_image_commit}-android-x86_32
docker_image_libtorch_android_x86_64=${docker_image_commit}-android-x86_64
@ -347,7 +353,7 @@
no_output_timeout: "1h"
command: |
set -eux
docker_image_commit=${DOCKER_IMAGE}:${DOCKER_TAG}-${CIRCLE_SHA1}
docker_image_commit=${DOCKER_IMAGE}:build-${DOCKER_TAG}-${CIRCLE_SHA1}
docker_image_libtorch_android_x86_32_gradle=${docker_image_commit}-android-x86_32-gradle
@ -384,7 +390,7 @@
no_output_timeout: "1h"
command: |
set -e
docker_image_libtorch_android_x86_32=${DOCKER_IMAGE}:${DOCKER_TAG}-${CIRCLE_SHA1}-android-x86_32
docker_image_libtorch_android_x86_32=${DOCKER_IMAGE}:build-${DOCKER_TAG}-${CIRCLE_SHA1}-android-x86_32
echo "docker_image_libtorch_android_x86_32: "${docker_image_libtorch_android_x86_32}
# x86
@ -431,7 +437,7 @@
echo "DOCKER_IMAGE: ${DOCKER_IMAGE}:${DOCKER_TAG}"
time docker pull ${DOCKER_IMAGE}:${DOCKER_TAG} >/dev/null
git submodule sync && git submodule update -q --init --recursive --depth 1
git submodule sync && git submodule update -q --init --recursive --depth 1 --jobs 0
VOLUME_MOUNTS="-v /home/circleci/project/:/var/lib/jenkins/workspace"
export id=$(docker run --env-file "${BASH_ENV}" ${VOLUME_MOUNTS} --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins ${DOCKER_IMAGE}:${DOCKER_TAG})
@ -447,7 +453,7 @@
pytorch_ios_build:
<<: *pytorch_ios_params
macos:
xcode: "12.0"
xcode: "12.5.1"
steps:
- checkout
- run_brew_for_ios_build
@ -461,16 +467,17 @@
# install fastlane
sudo gem install bundler && bundle install
# install certificates
echo ${IOS_CERT_KEY} >> cert.txt
echo ${IOS_CERT_KEY_2022} >> cert.txt
base64 --decode cert.txt -o Certificates.p12
rm cert.txt
bundle exec fastlane install_cert
bundle exec fastlane install_root_cert
bundle exec fastlane install_dev_cert
# install the provisioning profile
PROFILE=PyTorch_CI_2021.mobileprovision
PROFILE=PyTorch_CI_2022.mobileprovision
PROVISIONING_PROFILES=~/Library/MobileDevice/Provisioning\ Profiles
mkdir -pv "${PROVISIONING_PROFILES}"
cd "${PROVISIONING_PROFILES}"
echo ${IOS_SIGN_KEY} >> cert.txt
echo ${IOS_SIGN_KEY_2022} >> cert.txt
base64 --decode cert.txt -o ${PROFILE}
rm cert.txt
- run:
@ -500,7 +507,7 @@
# sync submodules
cd ${PROJ_ROOT}
git submodule sync
git submodule update --init --recursive --depth 1
git submodule update --init --recursive --depth 1 --jobs 0
# export
export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}
@ -528,12 +535,8 @@
no_output_timeout: "30m"
command: |
set -e
if [ ${BUILD_LITE_INTERPRETER} == 0 ]; then
echo "Run Build Test is not for full jit, skipping."
exit 0
fi
PROJ_ROOT=/Users/distiller/project
PROFILE=PyTorch_CI_2021
PROFILE=PyTorch_CI_2022
# run the ruby build script
if ! [ -x "$(command -v xcodebuild)" ]; then
echo 'Error: xcodebuild is not installed.'
@ -557,21 +560,28 @@
if [ ${IOS_PLATFORM} != "SIMULATOR" ]; then
echo "not SIMULATOR build, skip it."
exit 0
elif [ ${BUILD_LITE_INTERPRETER} == 0 ]; then
echo "Run Simulator Tests is not for full jit, skipping."
exit 0
fi
WORKSPACE=/Users/distiller/workspace
PROJ_ROOT=/Users/distiller/project
source ~/anaconda/bin/activate
pip install torch torchvision --progress-bar off
#run unit test
# use the pytorch nightly build to generate models
conda install pytorch torchvision -c pytorch-nightly --yes
# generate models for different backends
cd ${PROJ_ROOT}/ios/TestApp/benchmark
mkdir -p ../models
python trace_model.py
ruby setup.rb
if [ ${BUILD_LITE_INTERPRETER} == 1 ]; then
ruby setup.rb --lite 1
else
ruby setup.rb
fi
cd ${PROJ_ROOT}/ios/TestApp
instruments -s -devices
fastlane scan
if [ ${BUILD_LITE_INTERPRETER} == 1 ]; then
fastlane scan --only_testing TestAppTests/TestAppTests/testLiteInterpreter
else
fastlane scan --only_testing TestAppTests/TestAppTests/testFullJIT
fi
pytorch_linux_bazel_build:
<<: *pytorch_params
machine:
@ -593,7 +603,7 @@
echo "Do NOT merge master branch into $CIRCLE_BRANCH in environment $BUILD_ENVIRONMENT"
git submodule sync && git submodule update -q --init --recursive --depth 1
git submodule sync && git submodule update -q --init --recursive --depth 1 --jobs 0
docker cp /home/circleci/project/. $id:/var/lib/jenkins/workspace
@ -604,7 +614,7 @@
# Push intermediate Docker image for next phase to use
if [ -z "${BUILD_ONLY}" ]; then
# Augment our output image name with bazel to avoid collisions
output_image=${DOCKER_IMAGE}:${DOCKER_TAG}-bazel-${CIRCLE_SHA1}
output_image=${DOCKER_IMAGE}:build-${DOCKER_TAG}-bazel-${CIRCLE_SHA1}
export COMMIT_DOCKER_IMAGE=$output_image
docker commit "$id" ${COMMIT_DOCKER_IMAGE}
time docker push ${COMMIT_DOCKER_IMAGE}
@ -624,7 +634,7 @@
no_output_timeout: "90m"
command: |
set -e
output_image=${DOCKER_IMAGE}:${DOCKER_TAG}-bazel-${CIRCLE_SHA1}
output_image=${DOCKER_IMAGE}:build-${DOCKER_TAG}-bazel-${CIRCLE_SHA1}
export COMMIT_DOCKER_IMAGE=$output_image
echo "DOCKER_IMAGE: "${COMMIT_DOCKER_IMAGE}
@ -684,7 +694,7 @@
no_output_timeout: "30m"
command: |
set -ex
export COMMIT_DOCKER_IMAGE=${DOCKER_IMAGE}:${DOCKER_TAG}-${CIRCLE_SHA1}
export COMMIT_DOCKER_IMAGE=${DOCKER_IMAGE}:build-${DOCKER_TAG}-${CIRCLE_SHA1}
echo "DOCKER_IMAGE: "${COMMIT_DOCKER_IMAGE}
time docker pull ${COMMIT_DOCKER_IMAGE} >/dev/null
export id=$(docker run --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins ${COMMIT_DOCKER_IMAGE})


@ -30,11 +30,11 @@ jobs:
time docker pull ${DOCKER_IMAGE}:${DOCKER_TAG} >/dev/null
export id=$(docker run --env-file "${BASH_ENV}" --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins ${DOCKER_IMAGE}:${DOCKER_TAG})
git submodule sync && git submodule update -q --init --recursive --depth 1
git submodule sync && git submodule update -q --init --recursive --depth 1 --jobs 0
docker cp /home/circleci/project/. $id:/var/lib/jenkins/workspace
export COMMAND='((echo "sudo chown -R jenkins workspace && export CIRCLE_JOB="$CIRCLE_JOB" && cd workspace && .jenkins/pytorch/build.sh && find ${BUILD_ROOT} -type f -name "*.a" -or -name "*.o" -delete") | docker exec -u jenkins -i "$id" bash) 2>&1'
export COMMAND='((echo "sudo chown -R jenkins workspace && export JOB_BASE_NAME="$CIRCLE_JOB" && cd workspace && .jenkins/pytorch/build.sh && find ${BUILD_ROOT} -type f -name "*.a" -or -name "*.o" -delete") | docker exec -u jenkins -i "$id" bash) 2>&1'
echo ${COMMAND} > ./command.sh && unbuffer bash ./command.sh | ts
@ -47,7 +47,7 @@ jobs:
# The xla build uses the same docker image as
# pytorch_linux_bionic_py3_6_clang9_build. In the push step, we have to
# distinguish between them so the test can pick up the correct image.
output_image=${DOCKER_IMAGE}:${DOCKER_TAG}-${CIRCLE_SHA1}
output_image=${DOCKER_IMAGE}:build-${DOCKER_TAG}-${CIRCLE_SHA1}
if [[ ${BUILD_ENVIRONMENT} == *"xla"* ]]; then
export COMMIT_DOCKER_IMAGE=$output_image-xla
elif [[ ${BUILD_ENVIRONMENT} == *"libtorch"* ]]; then
@ -81,7 +81,7 @@ jobs:
cd /pytorch && export COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0)
python3 -mpip install requests && \
SCRIBE_GRAPHQL_ACCESS_TOKEN=${SCRIBE_GRAPHQL_ACCESS_TOKEN} \
python3 .circleci/scripts/upload_binary_size_to_scuba.py || exit 0
python3 -m tools.stats.upload_binary_size_to_scuba || exit 0
- store_artifacts:
path: /home/circleci/project/dist
@ -105,7 +105,7 @@ jobs:
export DOCKER_TAG="f3d89a32912f62815e4feaeed47e564e887dffd6"
fi
# See Note [Special build images]
output_image=${DOCKER_IMAGE}:${DOCKER_TAG}-${CIRCLE_SHA1}
output_image=${DOCKER_IMAGE}:build-${DOCKER_TAG}-${CIRCLE_SHA1}
if [[ ${BUILD_ENVIRONMENT} == *"xla"* ]]; then
export COMMIT_DOCKER_IMAGE=$output_image-xla
elif [[ ${BUILD_ENVIRONMENT} == *"libtorch"* ]]; then
@ -158,13 +158,14 @@ jobs:
}
if is_vanilla_build; then
echo "apt-get update && apt-get install -y qemu-user gdb" | docker exec -u root -i "$id" bash
echo "apt-get update || apt-get install libgnutls30" | docker exec -u root -i "$id" bash
echo "apt-get install -y qemu-user gdb" | docker exec -u root -i "$id" bash
echo "cd workspace/build; qemu-x86_64 -g 2345 -cpu Broadwell -E ATEN_CPU_CAPABILITY=default ./bin/basic --gtest_filter=BasicTest.BasicTestCPU & gdb ./bin/basic -ex 'set pagination off' -ex 'target remote :2345' -ex 'continue' -ex 'bt' -ex='set confirm off' -ex 'quit \$_isvoid(\$_exitcode)'" | docker exec -u jenkins -i "$id" bash
else
echo "Skipping for ${BUILD_ENVIRONMENT}"
fi
- run:
name: Run tests
name: Test
no_output_timeout: "90m"
command: |
set -e
@ -173,7 +174,16 @@ jobs:
# =================== The following code will be executed inside Docker container ===================
set -ex
export SCRIBE_GRAPHQL_ACCESS_TOKEN="${SCRIBE_GRAPHQL_ACCESS_TOKEN}"
export CIRCLE_JOB="$CIRCLE_JOB"
export JOB_BASE_NAME="$CIRCLE_JOB"
# temporary fix for https://github.com/pytorch/pytorch/issues/60746
if [ -z "$CIRCLE_PR_NUMBER" ]; then
if [[ $CIRCLE_BRANCH =~ .*pull.* ]]; then
export PR_NUMBER="$(echo $CIRCLE_BRANCH | sed 's/[^0-9]//g')"
export CIRCLE_PR_NUMBER="$PR_NUMBER"
fi
else
export PR_NUMBER="$CIRCLE_PR_NUMBER"
fi
${PARALLEL_FLAGS}
cd workspace
EOL
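The temporary fix above recovers the PR number from branch names such as `pull/60746` by deleting every non-digit character. The same extraction in isolation:

```
#!/bin/bash
branch="pull/60746"
if [[ $branch =~ .*pull.* ]]; then
  # sed 's/[^0-9]//g' strips everything but digits: "pull/60746" -> "60746"
  pr_number="$(echo "$branch" | sed 's/[^0-9]//g')"
  echo "PR_NUMBER=${pr_number}"   # PR_NUMBER=60746
fi
```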
@ -220,11 +230,10 @@ jobs:
export CIRCLE_SHA1="$CIRCLE_SHA1"
export CIRCLE_PR_NUMBER="${CIRCLE_PR_NUMBER:-}"
export CIRCLE_BRANCH="$CIRCLE_BRANCH"
export CIRCLE_JOB="$CIRCLE_JOB"
export JOB_BASE_NAME="$CIRCLE_JOB"
export CIRCLE_WORKFLOW_ID="$CIRCLE_WORKFLOW_ID"
cd workspace
export PYTHONPATH="\${PWD}"
python tools/print_test_stats.py --upload-to-s3 --compare-with-s3 test
python -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test
EOL
echo "(cat docker_commands.sh | docker exec -u jenkins -e LANG=C.UTF-8 -i "$id" bash) 2>&1" > command.sh
unbuffer bash command.sh | ts
@ -254,6 +263,9 @@ jobs:
python_version:
type: string
default: "3.8"
vs_version:
type: string
default: "16.8.6"
vc_version:
type: string
default: "14.16"
@ -321,6 +333,9 @@ jobs:
python_version:
type: string
default: "3.8"
vs_version:
type: string
default: "16.8.6"
vc_version:
type: string
default: "14.16"
@ -376,9 +391,8 @@ jobs:
set -ex
export AWS_ACCESS_KEY_ID=${CIRCLECI_AWS_ACCESS_KEY_FOR_WIN_BUILD_V1}
export AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_WIN_BUILD_V1}
export PYTHONPATH="$PWD"
pip install typing_extensions boto3
python tools/print_test_stats.py --upload-to-s3 --compare-with-s3 test
python -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test
when: always
- store_test_results:
path: test/test-reports


@ -1,169 +1,3 @@
scheduled-ci:
triggers:
- schedule:
# runs every 4 hours on the 45th minute
cron: "45 0,4,8,12,16,20 * * *"
filters:
branches:
only:
- master
jobs:
- docker_build_job:
name: "docker-pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7"
image_name: "pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7"
- pytorch_linux_build:
name: periodic_pytorch_xenial_cuda11_3_cudnn8_gcc7_build
requires:
- "docker-pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7"
build_environment: "pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7-build"
docker_image: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7"
- pytorch_linux_test:
name: periodic_pytorch_xenial_cuda11_3_cudnn8_gcc7_test
requires:
- periodic_pytorch_xenial_cuda11_3_cudnn8_gcc7_build
build_environment: "pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7-test"
docker_image: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7"
use_cuda_docker_runtime: "1"
resource_class: gpu.medium
- pytorch_linux_build:
name: periodic_libtorch_xenial_cuda11_3_cudnn8_gcc7_build
requires:
- "docker-pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7"
build_environment: "pytorch-libtorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7-build"
docker_image: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7"
- pytorch_windows_build:
build_environment: pytorch-win-vs2019-cuda11-cudnn8-py3
cuda_version: "11.3"
name: periodic_pytorch_windows_cuda11.3_build
python_version: "3.8"
use_cuda: "1"
vc_product: BuildTools
vc_version: "14.28.29333"
vc_year: "2019"
- pytorch_windows_test:
build_environment: pytorch-win-vs2019-cuda11-cudnn8-py3
cuda_version: "11.3"
executor: windows-with-nvidia-gpu
name: periodic_pytorch_windows_cuda11.3_test1
python_version: "3.8"
requires:
- periodic_pytorch_windows_cuda11.3_build
test_name: pytorch-windows-test1
use_cuda: "1"
vc_product: BuildTools
vc_version: "14.28.29333"
vc_year: "2019"
- pytorch_windows_test:
build_environment: pytorch-win-vs2019-cuda11-cudnn8-py3
cuda_version: "11.3"
executor: windows-with-nvidia-gpu
name: periodic_pytorch_windows_cuda11.3_test2
python_version: "3.8"
requires:
- periodic_pytorch_windows_cuda11.3_build
test_name: pytorch-windows-test2
use_cuda: "1"
vc_product: BuildTools
vc_version: "14.28.29333"
vc_year: "2019"
# The following allows these jobs to run on ci-all and release branches
debuggable-scheduled-ci:
jobs:
- docker_build_job:
name: "docker-pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7"
image_name: "pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7"
filters:
branches:
only:
- /ci-all\/.*/
- /release\/.*/
- pytorch_linux_build:
name: pytorch_linux_xenial_cuda11_3_cudnn8_py3_gcc7_build
requires:
- "docker-pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7"
build_environment: "pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7-build"
docker_image: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7"
filters:
branches:
only:
- /ci-all\/.*/
- /release\/.*/
- pytorch_linux_test:
name: pytorch_linux_xenial_cuda11_3_cudnn8_py3_gcc7_test
requires:
- pytorch_linux_xenial_cuda11_3_cudnn8_py3_gcc7_build
build_environment: "pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7-test"
docker_image: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7"
use_cuda_docker_runtime: "1"
resource_class: gpu.medium
filters:
branches:
only:
- /ci-all\/.*/
- /release\/.*/
- pytorch_linux_build:
name: pytorch_libtorch_linux_xenial_cuda11_3_cudnn8_py3_gcc7_build
requires:
- "docker-pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7"
build_environment: "pytorch-libtorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7-build"
docker_image: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7"
filters:
branches:
only:
- /ci-all\/.*/
- /release\/.*/
- pytorch_windows_build:
build_environment: pytorch-win-vs2019-cuda11-cudnn8-py3
cuda_version: "11.3"
name: pytorch_windows_vs2019_py38_cuda11.3_build
python_version: "3.8"
use_cuda: "1"
vc_product: BuildTools
vc_version: "14.28.29333"
vc_year: "2019"
filters:
branches:
only:
- /ci-all\/.*/
- /release\/.*/
- pytorch_windows_test:
build_environment: pytorch-win-vs2019-cuda11-cudnn8-py3
cuda_version: "11.3"
executor: windows-with-nvidia-gpu
name: pytorch_windows_vs2019_py38_cuda11.3_test1
python_version: "3.8"
requires:
- pytorch_windows_vs2019_py38_cuda11.3_build
test_name: pytorch-windows-test1
use_cuda: "1"
vc_product: BuildTools
vc_version: "14.28.29333"
vc_year: "2019"
filters:
branches:
only:
- /ci-all\/.*/
- /release\/.*/
- pytorch_windows_test:
build_environment: pytorch-win-vs2019-cuda11-cudnn8-py3
cuda_version: "11.3"
executor: windows-with-nvidia-gpu
name: pytorch_windows_vs2019_py38_cuda11.3_test2
python_version: "3.8"
requires:
- pytorch_windows_vs2019_py38_cuda11.3_build
test_name: pytorch-windows-test2
use_cuda: "1"
vc_product: BuildTools
vc_version: "14.28.29333"
vc_year: "2019"
filters:
branches:
only:
- /ci-all\/.*/
- /release\/.*/
# the following clones pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7's tests but enables
# slow tests and sets an environment variable so gradcheck runs with fast_mode=False
slow-gradcheck-scheduled-ci:


@ -9,6 +9,7 @@ bugprone-*,
-bugprone-reserved-identifier,
cppcoreguidelines-*,
-cppcoreguidelines-avoid-magic-numbers,
-cppcoreguidelines-avoid-non-const-global-variables,
-cppcoreguidelines-interfaces-global-init,
-cppcoreguidelines-macro-usage,
-cppcoreguidelines-owning-memory,
@ -21,6 +22,7 @@ cppcoreguidelines-*,
-cppcoreguidelines-pro-type-union-access,
-cppcoreguidelines-pro-type-vararg,
-cppcoreguidelines-special-member-functions,
-cppcoreguidelines-non-private-member-variables-in-classes,
-facebook-hte-RelativeInclude,
hicpp-exception-baseclass,
hicpp-avoid-goto,
@ -37,5 +39,6 @@ performance-*,
'
HeaderFilterRegex: 'torch/csrc/.*'
AnalyzeTemporaryDtors: false
WarningsAsErrors: '*'
CheckOptions:
...

.gitattributes vendored

@ -1 +1,4 @@
*.bat text eol=crlf
*.bat text eol=crlf
.circleci/config.yml linguist-generated=true
.github/workflows/generated-*.yml linguist-generated=true
.github/generated-* linguist-generated=true


@ -1,5 +1,5 @@
---
name: "\U0001F680Feature Request"
name: "\U0001F680 Feature Request"
about: Submit a proposal/request for a new PyTorch feature
---

.github/actionlint.yaml vendored Normal file

@ -0,0 +1,8 @@
self-hosted-runner:
labels:
- linux.2xlarge
- linux.8xlarge.nvidia.gpu
- linux.16xlarge.nvidia.gpu
- windows.4xlarge
- windows.8xlarge.nvidia.gpu
- bm-runner

.github/generated-ciflow-ruleset.json generated vendored Normal file

@ -0,0 +1,102 @@
{
"__comment": "@generated DO NOT EDIT MANUALLY, Generation script: .github/scripts/generate_ci_workflows.py",
"label_rules": {
"ciflow/all": [
"libtorch-linux-xenial-cuda10.2-py3.6-gcc7",
"libtorch-linux-xenial-cuda11.3-py3.6-gcc7",
"linux-bionic-cuda10.2-py3.9-gcc7",
"linux-bionic-py3.6-clang9",
"linux-bionic-py3.8-gcc9-coverage",
"linux-xenial-cuda10.2-py3.6-gcc7",
"linux-xenial-cuda11.3-py3.6-gcc7",
"linux-xenial-py3.6-gcc5.4",
"linux-xenial-py3.6-gcc7-bazel-test",
"parallelnative-linux-xenial-py3.6-gcc5.4",
"periodic-libtorch-linux-xenial-cuda11.1-py3.6-gcc7",
"periodic-linux-xenial-cuda11.1-py3.6-gcc7",
"periodic-win-vs2019-cuda11.1-py3",
"puretorch-linux-xenial-py3.6-gcc5.4",
"win-vs2019-cpu-py3",
"win-vs2019-cuda10.2-py3",
"win-vs2019-cuda11.3-py3"
],
"ciflow/bazel": [
"linux-xenial-py3.6-gcc7-bazel-test"
],
"ciflow/coverage": [
"linux-bionic-py3.8-gcc9-coverage"
],
"ciflow/cpu": [
"linux-bionic-py3.6-clang9",
"linux-bionic-py3.8-gcc9-coverage",
"linux-xenial-py3.6-gcc5.4",
"linux-xenial-py3.6-gcc7-bazel-test",
"parallelnative-linux-xenial-py3.6-gcc5.4",
"puretorch-linux-xenial-py3.6-gcc5.4",
"win-vs2019-cpu-py3"
],
"ciflow/cuda": [
"libtorch-linux-xenial-cuda10.2-py3.6-gcc7",
"libtorch-linux-xenial-cuda11.3-py3.6-gcc7",
"linux-bionic-cuda10.2-py3.9-gcc7",
"linux-xenial-cuda10.2-py3.6-gcc7",
"linux-xenial-cuda11.3-py3.6-gcc7",
"periodic-libtorch-linux-xenial-cuda11.1-py3.6-gcc7",
"periodic-linux-xenial-cuda11.1-py3.6-gcc7",
"periodic-win-vs2019-cuda11.1-py3",
"win-vs2019-cuda10.2-py3",
"win-vs2019-cuda11.3-py3"
],
"ciflow/default": [
"linux-bionic-py3.6-clang9",
"linux-bionic-py3.8-gcc9-coverage",
"linux-xenial-cuda11.3-py3.6-gcc7",
"linux-xenial-py3.6-gcc5.4",
"linux-xenial-py3.6-gcc7-bazel-test",
"win-vs2019-cpu-py3",
"win-vs2019-cuda11.3-py3"
],
"ciflow/libtorch": [
"libtorch-linux-xenial-cuda10.2-py3.6-gcc7",
"libtorch-linux-xenial-cuda11.3-py3.6-gcc7",
"periodic-libtorch-linux-xenial-cuda11.1-py3.6-gcc7"
],
"ciflow/linux": [
"libtorch-linux-xenial-cuda10.2-py3.6-gcc7",
"libtorch-linux-xenial-cuda11.3-py3.6-gcc7",
"linux-bionic-cuda10.2-py3.9-gcc7",
"linux-bionic-py3.6-clang9",
"linux-bionic-py3.8-gcc9-coverage",
"linux-xenial-cuda10.2-py3.6-gcc7",
"linux-xenial-cuda11.3-py3.6-gcc7",
"linux-xenial-py3.6-gcc5.4",
"linux-xenial-py3.6-gcc7-bazel-test",
"parallelnative-linux-xenial-py3.6-gcc5.4",
"periodic-libtorch-linux-xenial-cuda11.1-py3.6-gcc7",
"periodic-linux-xenial-cuda11.1-py3.6-gcc7",
"puretorch-linux-xenial-py3.6-gcc5.4"
],
"ciflow/noarch": [
"linux-bionic-py3.6-clang9"
],
"ciflow/scheduled": [
"periodic-libtorch-linux-xenial-cuda11.1-py3.6-gcc7",
"periodic-linux-xenial-cuda11.1-py3.6-gcc7",
"periodic-win-vs2019-cuda11.1-py3"
],
"ciflow/slow": [
"linux-bionic-cuda10.2-py3.9-gcc7",
"linux-xenial-cuda10.2-py3.6-gcc7"
],
"ciflow/win": [
"periodic-win-vs2019-cuda11.1-py3",
"win-vs2019-cpu-py3",
"win-vs2019-cuda10.2-py3",
"win-vs2019-cuda11.3-py3"
],
"ciflow/xla": [
"linux-bionic-py3.6-clang9"
]
},
"version": "v1"
}


@ -13,5 +13,9 @@ labels_to_circle_params:
- run_build
ci/master:
parameter: run_master_build
set_to_false:
- run_build
ci/slow-gradcheck:
parameter: run_slow_gradcheck_build
set_to_false:
- run_build


@ -1 +1,2 @@
tracking_issue: 24422
ciflow_tracking_issue: 64124

.github/regenerate.sh vendored Executable file

@ -0,0 +1,6 @@
#!/bin/bash -e
# Allows this script to be invoked from any directory:
cd "$(dirname "$0")"
python3 scripts/generate_ci_workflows.py


@ -27,6 +27,11 @@ runner_types:
os: linux
max_available: 50
disk_size: 150
linux.16xlarge.nvidia.gpu:
instance_type: g3.16xlarge
os: linux
max_available: 10
disk_size: 150
windows.4xlarge:
instance_type: c5d.4xlarge
os: windows


@ -13,7 +13,10 @@ WORKFLOWS = REPO_ROOT / ".github" / "workflows"
def concurrency_key(filename: Path) -> str:
workflow_name = filename.with_suffix("").name.replace("_", "-")
return f"{workflow_name}-${{{{ github.event.pull_request.number || github.sha }}}}"
if workflow_name.startswith("generated-"):
workflow_name = workflow_name[len("generated-"):]
return f"{workflow_name}-${{{{ github.event.pull_request.number || github.sha }}}}" \
"-${{ github.event_name == 'workflow_dispatch' }}"
def should_check(filename: Path) -> bool:


@ -1,222 +1,586 @@
#!/usr/bin/env python3
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Any, Dict
from typing import Dict, Set
import jinja2
import json
import os
import sys
from typing_extensions import Literal
YamlShellBool = Literal["''", 1]
Arch = Literal["windows", "linux"]
DOCKER_REGISTRY = "308535385114.dkr.ecr.us-east-1.amazonaws.com"
GITHUB_DIR = Path(__file__).parent.parent
# it would be nice to statically specify that build_environment must be
# present, but currently Python has no easy way to do that
# https://github.com/python/mypy/issues/4617
PyTorchWorkflow = Dict[str, Any]
GITHUB_DIR = Path(__file__).resolve().parent.parent
WINDOWS_CPU_TEST_RUNNER = "windows.4xlarge"
WINDOWS_CUDA_TEST_RUNNER = "windows.8xlarge.nvidia.gpu"
def PyTorchWindowsWorkflow(
*,
build_environment: str,
test_runner_type: str,
cuda_version: str,
on_pull_request: bool = False
) -> PyTorchWorkflow:
return {
"build_environment": build_environment,
"test_runner_type": test_runner_type,
"cuda_version": cuda_version,
"on_pull_request": on_pull_request,
}
WINDOWS_RUNNERS = {
WINDOWS_CPU_TEST_RUNNER,
WINDOWS_CUDA_TEST_RUNNER,
}
LINUX_CPU_TEST_RUNNER = "linux.2xlarge"
LINUX_CUDA_TEST_RUNNER = "linux.8xlarge.nvidia.gpu"
LINUX_RUNNERS = {
LINUX_CPU_TEST_RUNNER,
LINUX_CUDA_TEST_RUNNER,
}
CUDA_RUNNERS = {
WINDOWS_CUDA_TEST_RUNNER,
LINUX_CUDA_TEST_RUNNER,
}
CPU_RUNNERS = {
WINDOWS_CPU_TEST_RUNNER,
LINUX_CPU_TEST_RUNNER,
}
LABEL_CIFLOW_ALL = "ciflow/all"
LABEL_CIFLOW_BAZEL = "ciflow/bazel"
LABEL_CIFLOW_COVERAGE = "ciflow/coverage"
LABEL_CIFLOW_CPU = "ciflow/cpu"
LABEL_CIFLOW_CUDA = "ciflow/cuda"
LABEL_CIFLOW_DEFAULT = "ciflow/default"
LABEL_CIFLOW_LIBTORCH = "ciflow/libtorch"
LABEL_CIFLOW_LINUX = "ciflow/linux"
LABEL_CIFLOW_SCHEDULED = "ciflow/scheduled"
LABEL_CIFLOW_SLOW = "ciflow/slow"
LABEL_CIFLOW_WIN = "ciflow/win"
LABEL_CIFLOW_XLA = "ciflow/xla"
LABEL_CIFLOW_NOARCH = "ciflow/noarch"
def PyTorchLinuxWorkflow(
*,
build_environment: str,
docker_image_base: str,
test_runner_type: str,
on_pull_request: bool = False,
enable_doc_jobs: bool = False,
) -> PyTorchWorkflow:
return {
"build_environment": build_environment,
"docker_image_base": docker_image_base,
"test_runner_type": test_runner_type,
"on_pull_request": on_pull_request,
"enable_doc_jobs": enable_doc_jobs,
}
@dataclass
class CIFlowConfig:
enabled: bool = False
# Used to enable workflows to run on pytorch/pytorch-canary
run_on_canary: bool = False
labels: Set[str] = field(default_factory=set)
trigger_action: str = 'unassigned'
trigger_actor: str = 'pytorchbot'
root_job_name: str = 'ciflow_should_run'
root_job_condition: str = ''
# trigger_action_only controls whether we listen only for the trigger_action of a pull_request.
# If it's False, we listen on all default pull_request actions; this is useful while
# ciflow (via probot) is not yet automated.
trigger_action_only: bool = False
def gen_root_job_condition(self) -> None:
# TODO: Make conditions strict
# At the beginning of the ciflow rollout we keep everything the same as before
# Once fully rolled out, we can enforce strict constraints
# e.g. ADD env.GITHUB_ACTOR == '{self.trigger_actor}
# REMOVE github.event.action !='{self.trigger_action}'
label_conditions = [
f"contains(github.event.pull_request.labels.*.name, '{label}')" for label in sorted(self.labels)]
if self.run_on_canary:
self.root_job_condition = "(github.repository_owner == 'pytorch') && "
else:
self.root_job_condition = "(github.repository == 'pytorch/pytorch') && "
self.root_job_condition += f"((github.event_name != 'pull_request') || (github.event.assignee.login != 'pytorchbot' ) || " \
f"(github.event.action !='{self.trigger_action}') || " \
f"({' || '.join(label_conditions)}))"
def reset_root_job(self) -> None:
self.root_job_name = ''
self.root_job_condition = ''
def __post_init__(self) -> None:
if not self.enabled:
self.reset_root_job()
return
self.labels.add(LABEL_CIFLOW_ALL)
self.gen_root_job_condition()
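To see what gen_root_job_condition assembles, this standalone fragment re-runs its string construction for a hypothetical non-canary config with the default trigger_action of 'unassigned':

```
labels = sorted({"ciflow/all", "ciflow/cpu", "ciflow/win"})
trigger_action = "unassigned"

label_conditions = [
    f"contains(github.event.pull_request.labels.*.name, '{label}')"
    for label in labels
]
condition = "(github.repository == 'pytorch/pytorch') && "
condition += (
    "((github.event_name != 'pull_request') || "
    "(github.event.assignee.login != 'pytorchbot' ) || "
    f"(github.event.action !='{trigger_action}') || "
    f"({' || '.join(label_conditions)}))"
)
# One long GitHub Actions `if:` expression gating the root ciflow job.
print(condition)
```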
def generate_workflow_file(
*,
workflow: PyTorchWorkflow,
workflow_template: jinja2.Template,
) -> Path:
output_file_path = GITHUB_DIR / f"workflows/{workflow['build_environment']}.yml"
with open(output_file_path, "w") as output_file:
GENERATED = "generated"
output_file.writelines([f"# @{GENERATED} DO NOT EDIT MANUALLY\n"])
output_file.write(workflow_template.render(**workflow))
output_file.write("\n")
return output_file_path
@dataclass
class CIFlowRuleset:
version = 'v1'
output_file = f'{GITHUB_DIR}/generated-ciflow-ruleset.json'
label_rules: Dict[str, Set[str]] = field(default_factory=dict)
def add_label_rule(self, labels: Set[str], workflow_name: str) -> None:
for label in labels:
if label in self.label_rules:
self.label_rules[label].add(workflow_name)
else:
self.label_rules[label] = {workflow_name}
def generate_json(self) -> None:
GENERATED = "generated"  # Note: please keep the variable GENERATED, otherwise Phabricator will hide the whole file
output = {
"__comment": f"@{GENERATED} DO NOT EDIT MANUALLY, Generation script: .github/scripts/generate_ci_workflows.py",
"version": self.version,
"label_rules": {
label: sorted(list(workflows))
for label, workflows in self.label_rules.items()
}
}
with open(self.output_file, 'w') as outfile:
json.dump(output, outfile, indent=2, sort_keys=True)
outfile.write('\n')
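A short usage sketch of the ruleset accumulator defined above, assuming the CIFlowRuleset class is in scope (workflow names taken from the generated ruleset earlier in this changeset):

```
ruleset = CIFlowRuleset()
ruleset.add_label_rule({"ciflow/all", "ciflow/cpu"}, "linux-xenial-py3.6-gcc5.4")
ruleset.add_label_rule({"ciflow/all", "ciflow/cuda"}, "win-vs2019-cuda11.3-py3")

# label_rules now maps each label to the set of workflows it triggers:
# "ciflow/all" -> both workflows, "ciflow/cpu"/"ciflow/cuda" -> one each.
ruleset.generate_json()  # writes .github/generated-ciflow-ruleset.json
```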
@dataclass
class CIWorkflow:
# Required fields
arch: Arch
build_environment: str
test_runner_type: str
# Optional fields
ciflow_config: CIFlowConfig = field(default_factory=CIFlowConfig)
cuda_version: str = ''
docker_image_base: str = ''
enable_doc_jobs: bool = False
exclude_test: bool = False
is_coverage: bool = False
is_libtorch: bool = False
is_scheduled: str = ''
num_test_shards: int = 1
on_pull_request: bool = False
only_build_on_pull_request: bool = False
only_run_smoke_tests_on_pull_request: bool = False
num_test_shards_on_pull_request: int = -1
distributed_test: bool = True
# The following variables will be set as environment variables,
# so it's easier for both shell and Python scripts to consume them when false is represented as the empty string.
enable_jit_legacy_test: YamlShellBool = "''"
enable_distributed_test: YamlShellBool = "''"
enable_multigpu_test: YamlShellBool = "''"
enable_nogpu_no_avx_test: YamlShellBool = "''"
enable_nogpu_no_avx2_test: YamlShellBool = "''"
enable_slow_test: YamlShellBool = "''"
enable_docs_test: YamlShellBool = "''"
enable_backwards_compat_test: YamlShellBool = "''"
enable_xla_test: YamlShellBool = "''"
enable_noarch_test: YamlShellBool = "''"
def __post_init__(self) -> None:
if self.is_libtorch:
self.exclude_test = True
if not self.on_pull_request:
self.only_build_on_pull_request = False
if self.distributed_test:
self.enable_distributed_test = 1
# If num_test_shards_on_pull_request is not user-defined, default to num_test_shards unless we are
# only running smoke tests on the pull request.
if self.num_test_shards_on_pull_request == -1:
# Don't waste resources on runner spinup and cooldown for another shard if we are only running a few tests
if self.only_run_smoke_tests_on_pull_request:
self.num_test_shards_on_pull_request = 1
else:
self.num_test_shards_on_pull_request = self.num_test_shards
self.assert_valid()
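# Illustration of the sharding default (hypothetical values): with
# num_test_shards=2 and only_run_smoke_tests_on_pull_request=True, the PR run
# collapses to a single shard; without the smoke-test flag it inherits all 2 shards.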
def assert_valid(self) -> None:
err_message = f"invalid test_runner_type for {self.arch}: {self.test_runner_type}"
if self.arch == 'linux':
assert self.test_runner_type in LINUX_RUNNERS, err_message
if self.arch == 'windows':
assert self.test_runner_type in WINDOWS_RUNNERS, err_message
if self.ciflow_config.enabled:
# if LABEL_CIFLOW_DEFAULT is in the labels, trigger_action_only must be False (and vice versa)
assert self.ciflow_config.trigger_action_only != (LABEL_CIFLOW_DEFAULT in self.ciflow_config.labels)
assert self.on_pull_request
assert LABEL_CIFLOW_ALL in self.ciflow_config.labels
assert LABEL_CIFLOW_ALL in self.ciflow_config.root_job_condition
if self.arch == 'linux':
assert LABEL_CIFLOW_LINUX in self.ciflow_config.labels
if self.arch == 'windows':
assert LABEL_CIFLOW_WIN in self.ciflow_config.labels
if self.test_runner_type in CUDA_RUNNERS:
assert LABEL_CIFLOW_CUDA in self.ciflow_config.labels
if self.test_runner_type in CPU_RUNNERS:
assert LABEL_CIFLOW_CPU in self.ciflow_config.labels
def generate_workflow_file(self, workflow_template: jinja2.Template) -> None:
output_file_path = GITHUB_DIR / f"workflows/generated-{self.build_environment}.yml"
with open(output_file_path, "w") as output_file:
GENERATED = "generated" # Note that please keep the variable GENERATED otherwise phabricator will hide the whole file
output_file.writelines([f"# @{GENERATED} DO NOT EDIT MANUALLY\n"])
try:
content = workflow_template.render(asdict(self))
except Exception as e:
print(f"Failed on template: {workflow_template}", file=sys.stderr)
raise e
output_file.write(content)
if content[-1] != "\n":
output_file.write("\n")
print(output_file_path)
WINDOWS_WORKFLOWS = [
CIWorkflow(
arch="windows",
build_environment="win-vs2019-cpu-py3",
cuda_version="cpu",
test_runner_type=WINDOWS_CPU_TEST_RUNNER,
on_pull_request=True,
num_test_shards=2,
ciflow_config=CIFlowConfig(
enabled=True,
run_on_canary=True,
labels={LABEL_CIFLOW_DEFAULT, LABEL_CIFLOW_CPU, LABEL_CIFLOW_WIN}
),
),
CIWorkflow(
arch="windows",
build_environment="win-vs2019-cuda10.2-py3",
cuda_version="10.2",
test_runner_type=WINDOWS_CUDA_TEST_RUNNER,
on_pull_request=True,
num_test_shards=2,
ciflow_config=CIFlowConfig(
enabled=True,
trigger_action_only=True,
labels={LABEL_CIFLOW_CUDA, LABEL_CIFLOW_WIN}
),
),
CIWorkflow(
arch="windows",
build_environment="win-vs2019-cuda11.3-py3",
cuda_version="11.3",
test_runner_type=WINDOWS_CUDA_TEST_RUNNER,
num_test_shards=2,
on_pull_request=True,
only_run_smoke_tests_on_pull_request=True,
ciflow_config=CIFlowConfig(
enabled=True,
run_on_canary=True,
labels={LABEL_CIFLOW_DEFAULT, LABEL_CIFLOW_CUDA, LABEL_CIFLOW_WIN}
),
),
CIWorkflow(
arch="windows",
build_environment="periodic-win-vs2019-cuda11.1-py3",
cuda_version="11.1",
test_runner_type=WINDOWS_CUDA_TEST_RUNNER,
num_test_shards=2,
is_scheduled="45 0,4,8,12,16,20 * * *",
on_pull_request=True,
ciflow_config=CIFlowConfig(
enabled=True,
trigger_action_only=True,
labels={LABEL_CIFLOW_SCHEDULED, LABEL_CIFLOW_WIN, LABEL_CIFLOW_CUDA}
),
),
]
LINUX_WORKFLOWS = [
CIWorkflow(
arch="linux",
build_environment="linux-xenial-py3.6-gcc5.4",
docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3.6-gcc5.4",
test_runner_type=LINUX_CPU_TEST_RUNNER,
on_pull_request=True,
enable_jit_legacy_test=1,
enable_doc_jobs=True,
enable_docs_test=1,
enable_backwards_compat_test=1,
num_test_shards=2,
ciflow_config=CIFlowConfig(
enabled=True,
run_on_canary=True,
labels={LABEL_CIFLOW_DEFAULT, LABEL_CIFLOW_LINUX, LABEL_CIFLOW_CPU}
),
),
# ParallelTBB does not have a maintainer and is currently flaky
# CIWorkflow(
# arch="linux",
# build_environment="paralleltbb-linux-xenial-py3.6-gcc5.4",
# docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3.6-gcc5.4",
# test_runner_type=LINUX_CPU_TEST_RUNNER,
# # This is a master-only job even though on_pull_request is set to True
# on_pull_request=True,
# ciflow_config=CIFlowConfig(
# enabled=True,
# trigger_action_only=True,
# labels={LABEL_CIFLOW_LINUX, LABEL_CIFLOW_CPU},
# ),
# ),
CIWorkflow(
arch="linux",
build_environment="parallelnative-linux-xenial-py3.6-gcc5.4",
docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3.6-gcc5.4",
test_runner_type=LINUX_CPU_TEST_RUNNER,
# This is a master-only job even though on_pull_request is set to True
on_pull_request=True,
ciflow_config=CIFlowConfig(
enabled=True,
trigger_action_only=True,
labels={LABEL_CIFLOW_LINUX, LABEL_CIFLOW_CPU},
),
),
# Build PyTorch with BUILD_CAFFE2=OFF
CIWorkflow(
arch="linux",
build_environment="puretorch-linux-xenial-py3.6-gcc5.4",
docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3.6-gcc5.4",
test_runner_type=LINUX_CPU_TEST_RUNNER,
exclude_test=True,
# This is a master-only job even though on_pull_request is set to True
on_pull_request=True,
ciflow_config=CIFlowConfig(
enabled=True,
trigger_action_only=True,
labels={LABEL_CIFLOW_LINUX, LABEL_CIFLOW_CPU},
),
),
# CIWorkflow(
# arch="linux",
# build_environment="linux-xenial-py3.6-gcc7",
# docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3.6-gcc7",
# test_runner_type=LINUX_CPU_TEST_RUNNER,
# ),
# CIWorkflow(
# arch="linux",
# build_environment="linux-xenial-py3.6-clang5-asan",
# docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3-clang5-asan",
# test_runner_type=LINUX_CPU_TEST_RUNNER,
# ),
# CIWorkflow(
# arch="linux",
# build_environment="linux-xenial-py3.6-clang7-onnx",
# docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3-clang7-onnx",
# test_runner_type=LINUX_CPU_TEST_RUNNER,
# ),
CIWorkflow(
arch="linux",
build_environment="linux-bionic-cuda10.2-py3.9-gcc7",
docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-bionic-cuda10.2-cudnn7-py3.9-gcc7",
test_runner_type=LINUX_CUDA_TEST_RUNNER,
num_test_shards=2,
on_pull_request=True,
ciflow_config=CIFlowConfig(
enabled=True,
run_on_canary=True,
trigger_action_only=True,
labels={LABEL_CIFLOW_SLOW, LABEL_CIFLOW_LINUX, LABEL_CIFLOW_CUDA}
),
),
CIWorkflow(
arch="linux",
build_environment="linux-xenial-cuda10.2-py3.6-gcc7",
docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7",
test_runner_type=LINUX_CUDA_TEST_RUNNER,
enable_jit_legacy_test=1,
enable_multigpu_test=1,
enable_nogpu_no_avx_test=1,
enable_nogpu_no_avx2_test=1,
enable_slow_test=1,
num_test_shards=2,
on_pull_request=True,
ciflow_config=CIFlowConfig(
enabled=True,
trigger_action_only=True,
labels=set([LABEL_CIFLOW_SLOW, LABEL_CIFLOW_LINUX, LABEL_CIFLOW_CUDA]),
),
),
CIWorkflow(
arch="linux",
build_environment="libtorch-linux-xenial-cuda10.2-py3.6-gcc7",
docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7",
test_runner_type=LINUX_CUDA_TEST_RUNNER,
is_libtorch=True,
on_pull_request=True,
ciflow_config=CIFlowConfig(
enabled=True,
trigger_action_only=True,
labels=set([LABEL_CIFLOW_LIBTORCH, LABEL_CIFLOW_LINUX, LABEL_CIFLOW_CUDA]),
),
),
CIWorkflow(
arch="linux",
build_environment="linux-xenial-cuda11.3-py3.6-gcc7",
docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7",
test_runner_type=LINUX_CUDA_TEST_RUNNER,
num_test_shards=2,
on_pull_request=True,
ciflow_config=CIFlowConfig(
enabled=True,
labels=set([LABEL_CIFLOW_DEFAULT, LABEL_CIFLOW_LINUX, LABEL_CIFLOW_CUDA]),
),
),
CIWorkflow(
arch="linux",
build_environment="libtorch-linux-xenial-cuda11.3-py3.6-gcc7",
docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7",
test_runner_type=LINUX_CUDA_TEST_RUNNER,
is_libtorch=True,
on_pull_request=True,
ciflow_config=CIFlowConfig(
enabled=True,
trigger_action_only=True,
labels=set([LABEL_CIFLOW_LIBTORCH, LABEL_CIFLOW_LINUX, LABEL_CIFLOW_CUDA]),
),
),
CIWorkflow(
arch="linux",
build_environment="periodic-linux-xenial-cuda11.1-py3.6-gcc7",
docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-cuda11.1-cudnn8-py3-gcc7",
test_runner_type=LINUX_CUDA_TEST_RUNNER,
num_test_shards=2,
is_scheduled="45 0,4,8,12,16,20 * * *",
on_pull_request=True,
ciflow_config=CIFlowConfig(
enabled=True,
trigger_action_only=True,
labels={LABEL_CIFLOW_SCHEDULED, LABEL_CIFLOW_LINUX, LABEL_CIFLOW_CUDA}
),
),
CIWorkflow(
arch="linux",
build_environment="periodic-libtorch-linux-xenial-cuda11.1-py3.6-gcc7",
docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-cuda11.1-cudnn8-py3-gcc7",
test_runner_type=LINUX_CUDA_TEST_RUNNER,
is_libtorch=True,
is_scheduled="45 0,4,8,12,16,20 * * *",
on_pull_request=True,
ciflow_config=CIFlowConfig(
enabled=True,
trigger_action_only=True,
labels={LABEL_CIFLOW_SCHEDULED, LABEL_CIFLOW_LINUX, LABEL_CIFLOW_LIBTORCH, LABEL_CIFLOW_CUDA},
),
),
CIWorkflow(
arch="linux",
build_environment="linux-bionic-py3.8-gcc9-coverage",
docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-bionic-py3.8-gcc9",
test_runner_type=LINUX_CPU_TEST_RUNNER,
on_pull_request=True,
is_coverage=True,
num_test_shards=2,
ciflow_config=CIFlowConfig(
enabled=True,
labels={LABEL_CIFLOW_DEFAULT, LABEL_CIFLOW_COVERAGE, LABEL_CIFLOW_LINUX, LABEL_CIFLOW_CPU},
),
),
CIWorkflow(
arch="linux",
build_environment="linux-bionic-py3.6-clang9",
docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-bionic-py3.6-clang9",
test_runner_type=LINUX_CPU_TEST_RUNNER,
on_pull_request=True,
num_test_shards=2,
distributed_test=False,
enable_noarch_test=1,
ciflow_config=CIFlowConfig(
enabled=True,
labels={LABEL_CIFLOW_DEFAULT, LABEL_CIFLOW_LINUX, LABEL_CIFLOW_CPU, LABEL_CIFLOW_XLA, LABEL_CIFLOW_NOARCH},
),
),
# CIWorkflow(
# arch="linux",
# build_environment="linux-bionic-rocm3.9-py3.6",
# docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-bionic-rocm3.9-py3.6",
# test_runner_type=LINUX_CPU_TEST_RUNNER,
# ),
# CIWorkflow(
# arch="linux",
# build_environment="linux-xenial-py3.6-clang5-android-ndk-r19c-x86_32",
# docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c",
# test_runner_type=LINUX_CPU_TEST_RUNNER,
# ),
# CIWorkflow(
# arch="linux",
# build_environment="linux-xenial-py3.6-clang5-android-ndk-r19c-x86_64",
# docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c",
# test_runner_type=LINUX_CPU_TEST_RUNNER,
# ),
# CIWorkflow(
# arch="linux",
# build_environment="linux-xenial-py3.6-clang5-android-ndk-r19c-arm-v7a",
# docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c",
# test_runner_type=LINUX_CPU_TEST_RUNNER,
# ),
# CIWorkflow(
# arch="linux",
# build_environment="linux-xenial-py3.6-clang5-android-ndk-r19c-arm-v8a",
# docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c",
# test_runner_type=LINUX_CPU_TEST_RUNNER,
# ),
# CIWorkflow(
# arch="linux",
# build_environment="linux-xenial-py3.6-clang5-mobile",
# docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3-clang5-asan",
# test_runner_type=LINUX_CPU_TEST_RUNNER,
# ),
# CIWorkflow(
# arch="linux",
# build_environment="linux-xenial-py3.6-clang5-mobile-custom-dynamic",
# docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c",
# test_runner_type=LINUX_CPU_TEST_RUNNER,
# ),
# CIWorkflow(
# arch="linux",
# build_environment="linux-xenial-py3.6-clang5-mobile-custom-static",
# docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c",
# test_runner_type=LINUX_CPU_TEST_RUNNER,
# ),
# CIWorkflow(
# arch="linux",
# build_environment="linux-xenial-py3.6-clang5-mobile-code-analysis",
# docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c",
# test_runner_type=LINUX_CPU_TEST_RUNNER,
# ),
]
BAZEL_WORKFLOWS = [
CIWorkflow(
arch="linux",
build_environment="linux-xenial-py3.6-gcc7-bazel-test",
docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-bionic-cuda10.2-cudnn7-py3.9-gcc7",
test_runner_type=LINUX_CPU_TEST_RUNNER,
on_pull_request=True,
ciflow_config=CIFlowConfig(
enabled=True,
labels={LABEL_CIFLOW_DEFAULT, LABEL_CIFLOW_BAZEL, LABEL_CIFLOW_CPU, LABEL_CIFLOW_LINUX},
),
),
]
if __name__ == "__main__":
jinja_env = jinja2.Environment(
variable_start_string="!{{",
loader=jinja2.FileSystemLoader(str(GITHUB_DIR.joinpath("templates"))),
undefined=jinja2.StrictUndefined,
)
template_and_workflows = [
(jinja_env.get_template("linux_ci_workflow.yml.j2"), LINUX_WORKFLOWS),
(jinja_env.get_template("windows_ci_workflow.yml.j2"), WINDOWS_WORKFLOWS)
(jinja_env.get_template("windows_ci_workflow.yml.j2"), WINDOWS_WORKFLOWS),
(jinja_env.get_template("bazel_ci_workflow.yml.j2"), BAZEL_WORKFLOWS),
]
# Delete the existing generated files first; this should align with the .gitattributes file description.
existing_workflows = GITHUB_DIR.glob("workflows/generated-*")
for w in existing_workflows:
try:
os.remove(w)
except Exception as e:
print(f"Error occurred when deleting file {w}: {e}")
ciflow_ruleset = CIFlowRuleset()
for template, workflows in template_and_workflows:
for workflow in workflows:
workflow.generate_workflow_file(workflow_template=template)
if workflow.ciflow_config.enabled:
ciflow_ruleset.add_label_rule(workflow.ciflow_config.labels, workflow.build_environment)
elif workflow.on_pull_request:
# If ciflow is disabled but the workflow still runs on_pull_request, record it
# under the special LABEL_CIFLOW_DEFAULT label in the ruleset; this is later
# turned into an actual LABEL_CIFLOW_DEFAULT label on the workflow.
# During the rollout phase it behaves the same as LABEL_CIFLOW_DEFAULT.
ciflow_ruleset.add_label_rule({LABEL_CIFLOW_DEFAULT}, workflow.build_environment)
ciflow_ruleset.generate_json()
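# Typical invocation (a sketch; run from the repository root with jinja2 installed):
#   python3 .github/scripts/generate_ci_workflows.py
# This regenerates .github/workflows/generated-*.yml and
# .github/generated-ciflow-ruleset.json.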

View File

@ -0,0 +1,94 @@
#!/usr/bin/env python3
"""Generates a matrix to be utilized through github actions
Will output a matrix to represent our testing configurations, which is currently
dictated by just sharding.
"""
import json
import os
import re
from typing import Dict
from typing_extensions import TypedDict
class Config(TypedDict):
num_shards: int
runner: str
def get_disabled_issues() -> str:
pr_body = os.getenv('PR_BODY', '')
# The below regex is meant to match all *case-insensitive* keywords that
# GitHub has delineated would link PRs to issues, more details here:
# https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue.
# E.g., "Close #62851", "fixES #62851" and "RESOLVED #62851" would all match, but not
# "closes #62851" --> extra space, "fixing #62851" --> not a keyword, nor "fix 62851" --> no #
regex = '(?i)(Close(d|s)?|Resolve(d|s)?|Fix(ed|es)?) #([0-9]+)'
issue_numbers = [x[4] for x in re.findall(regex, pr_body)]
return ','.join(issue_numbers)
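# A quick sketch of the matching behavior (hypothetical PR body): for
#   "This fixes #123 and Closes #456, but see #789."
# the function returns "123,456" -- "fixes #123" and "Closes #456" use linking
# keywords, while the bare "#789" does not.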
def main() -> None:
TEST_RUNNER_TYPE = os.getenv('TEST_RUNNER_TYPE')
assert TEST_RUNNER_TYPE is not None
ON_PULL_REQUEST = os.getenv('GITHUB_HEAD_REF')
NUM_TEST_SHARDS_ON_PULL_REQUEST = os.getenv('NUM_TEST_SHARDS_ON_PULL_REQUEST')
NUM_TEST_SHARDS = int(os.getenv('NUM_TEST_SHARDS', '1'))
if ON_PULL_REQUEST and NUM_TEST_SHARDS_ON_PULL_REQUEST:
NUM_TEST_SHARDS = int(NUM_TEST_SHARDS_ON_PULL_REQUEST)
MULTIGPU_RUNNER_TYPE = os.getenv('MULTIGPU_RUNNER_TYPE')
NOGPU_RUNNER_TYPE = os.getenv('NOGPU_RUNNER_TYPE')
configs: Dict[str, Config] = {}
if os.getenv('ENABLE_JIT_LEGACY_TEST'):
configs['jit_legacy'] = {'num_shards': 1, 'runner': TEST_RUNNER_TYPE}
if MULTIGPU_RUNNER_TYPE is not None and os.getenv('ENABLE_MULTIGPU_TEST'):
configs['multigpu'] = {'num_shards': 1, 'runner': MULTIGPU_RUNNER_TYPE}
if NOGPU_RUNNER_TYPE is not None and os.getenv('ENABLE_NOGPU_NO_AVX_TEST'):
configs['nogpu_NO_AVX'] = {'num_shards': 1, 'runner': NOGPU_RUNNER_TYPE}
if NOGPU_RUNNER_TYPE is not None and os.getenv('ENABLE_NOGPU_NO_AVX2_TEST'):
configs['nogpu_NO_AVX2'] = {'num_shards': 1, 'runner': NOGPU_RUNNER_TYPE}
if os.getenv('ENABLE_DISTRIBUTED_TEST'):
configs['distributed'] = {'num_shards': 1, 'runner': TEST_RUNNER_TYPE}
if os.getenv('ENABLE_SLOW_TEST'):
configs['slow'] = {'num_shards': 1, 'runner': TEST_RUNNER_TYPE}
if os.getenv('ENABLE_DOCS_TEST'):
configs['docs_test'] = {'num_shards': 1, 'runner': TEST_RUNNER_TYPE}
if os.getenv('ENABLE_BACKWARDS_COMPAT_TEST'):
configs['backwards_compat'] = {'num_shards': 1, 'runner': TEST_RUNNER_TYPE}
if os.getenv('ENABLE_XLA_TEST'):
configs['xla'] = {'num_shards': 1, 'runner': TEST_RUNNER_TYPE}
if os.getenv('ENABLE_NOARCH_TEST'):
configs['noarch'] = {'num_shards': 1, 'runner': TEST_RUNNER_TYPE}
matrix = {
'include': [
{
'config': 'default',
'shard': shard,
'num_shards': NUM_TEST_SHARDS,
'runner': TEST_RUNNER_TYPE,
}
for shard in range(1, NUM_TEST_SHARDS + 1)
] + [
{
'config': name,
'shard': shard,
'num_shards': config['num_shards'],
'runner': config['runner'],
}
for name, config in configs.items()
for shard in range(1, config['num_shards'] + 1)
]
}
render_matrix = {'config': list(dict.fromkeys(x['config'] for x in matrix['include']))}
print(json.dumps({'matrix': matrix, 'render-matrix': render_matrix}, indent=2))
print(f'::set-output name=matrix::{json.dumps(matrix)}')
print(f'::set-output name=render-matrix::{json.dumps(render_matrix)}')
print(f'::set-output name=ignore-disabled-issues::{get_disabled_issues()}')
if __name__ == "__main__":
main()
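# A hypothetical output sketch for TEST_RUNNER_TYPE=linux.2xlarge,
# NUM_TEST_SHARDS=2, and ENABLE_DISTRIBUTED_TEST set (all other toggles unset):
# {
#   "include": [
#     {"config": "default", "shard": 1, "num_shards": 2, "runner": "linux.2xlarge"},
#     {"config": "default", "shard": 2, "num_shards": 2, "runner": "linux.2xlarge"},
#     {"config": "distributed", "shard": 1, "num_shards": 1, "runner": "linux.2xlarge"}
#   ]
# }
# plus a render-matrix of {"config": ["default", "distributed"]} and the
# ::set-output lines consumed by downstream jobs.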

View File

@ -65,6 +65,8 @@ class PytorchVersion:
self.no_build_suffix = no_build_suffix
def get_post_build_suffix(self) -> str:
if self.no_build_suffix:
return ""
if self.gpu_arch_type == "cuda":
return f"+cu{self.gpu_arch_version.replace('.', '')}"
return f"+{self.gpu_arch_type}{self.gpu_arch_version}"
@ -87,9 +89,9 @@ def main() -> None:
)
parser.add_argument(
"--no-build-suffix",
action="store_true",
help="Whether or not to add a build suffix typically (+cpu)",
default=os.environ.get("NO_BUILD_SUFFIX", False)
default=strtobool(os.environ.get("NO_BUILD_SUFFIX", "False"))
)
parser.add_argument(
"--gpu-arch-type",

View File

@ -0,0 +1,11 @@
function Get-SSH-Sessions {
Get-Process sshd -IncludeUserName |
Where-Object UserName -notLike "*SYSTEM*" |
Select-Object Id
}
$runningSessions = Get-SSH-Sessions
foreach ($session in $runningSessions) {
Stop-Process -id $session.Id
}

View File

@ -13,6 +13,7 @@ Testing environment:
# 1. Does not reuse the build artifact in other CI workflows
# 2. CI jobs are serialized because there is only one worker
import os
import git # type: ignore[import]
import pathlib
import argparse
import subprocess
@ -23,6 +24,7 @@ CUDA_VERSION = "cu102"
PYTHON_VERSION = "3.7"
TORCHBENCH_CONFIG_NAME = "config.yaml"
MAGIC_PREFIX = "RUN_TORCHBENCH:"
MAGIC_TORCHBENCH_PREFIX = "TORCHBENCH_BRANCH:"
ABTEST_CONFIG_TEMPLATE = """# This config is automatically generated by run_torchbench.py
start: {control}
end: {treatment}
@ -57,7 +59,7 @@ def extract_models_from_pr(torchbench_path: str, prbody_file: str) -> List[str]:
lines = map(lambda x: x.strip(), pf.read().splitlines())
magic_lines = list(filter(lambda x: x.startswith(MAGIC_PREFIX), lines))
if magic_lines:
# Only the first magic line will be recognized.
model_list = list(map(lambda x: x.strip(), magic_lines[0][len(MAGIC_PREFIX):].split(",")))
# Shortcut: if model_list is ["ALL"], run all the tests
if model_list == ["ALL"]:
@ -71,6 +73,26 @@ def extract_models_from_pr(torchbench_path: str, prbody_file: str) -> List[str]:
return []
return model_list
def identify_torchbench_branch(torchbench_path: str, prbody_file: str) -> None:
branch_name: str = ""
with open(prbody_file, "r") as pf:
lines = map(lambda x: x.strip(), pf.read().splitlines())
magic_lines = list(filter(lambda x: x.startswith(MAGIC_TORCHBENCH_PREFIX), lines))
if magic_lines:
# Only the first magic line will be recognized.
branch_name = magic_lines[0][len(MAGIC_TORCHBENCH_PREFIX):].strip()
# If not specified, directly return without the branch checkout
if not branch_name:
return
try:
print(f"Checking out the TorchBench branch: {branch_name} ...")
repo = git.Repo(torchbench_path)
origin = repo.remotes.origin
origin.fetch(branch_name)
repo.create_head(branch_name, origin.refs[branch_name]).checkout()
except git.exc.GitCommandError:
raise RuntimeError(f'{branch_name} doesn\'t exist in the pytorch/benchmark repository. Please double check.')
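# Example PR body lines this script reacts to (hypothetical values):
#   RUN_TORCHBENCH: resnet18, mobilenet_v2   -> benchmark only these two models
#   RUN_TORCHBENCH: ALL                      -> benchmark every model
#   TORCHBENCH_BRANCH: my-feature-branch     -> check out this TorchBench branch first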
def run_torchbench(pytorch_path: str, torchbench_path: str, output_dir: str) -> None:
# Copy system environment so that we will not override
env = dict(os.environ)
@ -96,6 +118,12 @@ if __name__ == "__main__":
if not models:
print("Can't parse the model filter from the pr body. Currently we only support allow-list.")
exit(1)
# Identify the specified TorchBench branch, verify that it exists, and check it out
try:
identify_torchbench_branch(args.torchbench_path, args.pr_body)
except RuntimeError as e:
print(f"Identify TorchBench branch failed: {str(e)}")
exit(1)
print(f"Ready to run TorchBench with benchmark. Result will be saved in the directory: {output_dir}.")
# Run TorchBench with the generated config
torchbench_config = gen_abtest_config(args.pr_base_sha, args.pr_head_sha, models)

View File

@ -0,0 +1,17 @@
function Get-SSH-Users {
# Gets ssh sessions for all users not named SYSTEM
Get-CimInstance -ClassName Win32_Process -Filter "Name = 'sshd.exe'" |
Get-CimAssociatedInstance -Association Win32_SessionProcess |
Get-CimAssociatedInstance -Association Win32_LoggedOnUser |
Where-Object {$_.Name -ne 'SYSTEM'} |
Measure-Object
}
$usersLoggedOn = Get-SSH-Users
Write-Output "Holding runner until all ssh sessions have logged out"
while ($usersLoggedOn.Count -gt 0) {
$usersLoggedOn = Get-SSH-Users
Write-Output "."
Start-Sleep -s 5
}

.github/scripts/wait_for_ssh_to_drain.sh vendored Executable file
View File

@ -0,0 +1,13 @@
#!/usr/bin/env bash
set -eou pipefail
echo "Holding runner for 2 hours until all ssh sessions have logged out"
for _ in $(seq 1440); do
# Break if no ssh session exists anymore
if [ "$(who)" = "" ]; then
break
fi
echo "."
sleep 5
done

View File

@ -0,0 +1,137 @@
{%- extends "linux_ci_workflow.yml.j2" -%}
{%- set exclude_test = true -%}
{% block name -%}
# Template is at: .github/templates/bazel_ci_workflow.yml.j2
# Generation script: .github/scripts/generate_ci_workflows.py
name: !{{ build_environment }}
{%- endblock %}
on:
{%- if on_pull_request %}
pull_request:
{%- if ciflow_config.enabled %}
{%- if ciflow_config.trigger_action_only %}
types: [!{{ ciflow_config.trigger_action }}]
{%- else %}
types: [opened, synchronize, reopened, !{{ ciflow_config.trigger_action }}]
{%- endif %}
{%- endif %}
{%- else %}
# TODO: Enable pull_request builds when we can verify capacity can be met by auto-scalers
{%- endif %}
{% block build +%}
# building and testing in a single job since bazel runs only a small subset of tests
build-and-test:
runs-on: !{{ test_runner_type }}
needs: [calculate-docker-image, !{{ ciflow_config.root_job_name }}]
env:
DOCKER_IMAGE: ${{ needs.calculate-docker-image.outputs.docker_image }}
JOB_BASE_NAME: !{{ build_environment }}-build-and-test
NUM_TEST_SHARDS: !{{ num_test_shards }}
CONTINUE_THROUGH_ERROR: ${{ github.repository == 'pytorch/pytorch' && (github.event_name == 'push' || github.event_name == 'schedule') }}
steps:
!{{ common.setup_ec2_linux() }}
!{{ common.checkout_pytorch("recursive") }}
- name: Pull docker image
run: |
docker pull "${DOCKER_IMAGE}"
- name: Determine shm-size
run: |
shm_size="1g"
case "${BUILD_ENVIRONMENT}" in
*cuda*)
shm_size="2g"
;;
*rocm*)
shm_size="8g"
;;
esac
echo "SHM_SIZE=${shm_size}" >> "${GITHUB_ENV}"
- name: Output disk space left
run: |
sudo df -H
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Build
run: |
# detached container should get cleaned up by teardown_ec2_linux
container_name=$(docker run \
-e BUILD_ENVIRONMENT \
-e JOB_BASE_NAME \
-e MAX_JOBS="$(nproc --ignore=2)" \
-e SCCACHE_BUCKET \
-e CUSTOM_TEST_ARTIFACT_BUILD_DIR \
-e PR_LABELS \
-e SKIP_SCCACHE_INITIALIZATION=1 \
-e TORCH_CUDA_ARCH_LIST \
-e http_proxy="!{{ common.squid_proxy }}" -e https_proxy="!{{ common.squid_proxy }}" -e no_proxy="!{{ common.squid_no_proxy }}" \
--env-file="/tmp/github_env_${GITHUB_RUN_ID}" \
--security-opt seccomp=unconfined \
--cap-add=SYS_PTRACE \
--tty \
--detach \
--user jenkins \
-v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \
-w /var/lib/jenkins/workspace \
"${DOCKER_IMAGE}"
)
docker exec -t "${container_name}" sh -c 'sudo chown -R jenkins . && sudo chown -R jenkins /dev && .jenkins/pytorch/build.sh'
!{{ common.parse_ref() }}
- name: Display and upload binary build size statistics (Click Me)
# temporary hack: set CIRCLE_* vars, until we update
# tools/stats/print_test_stats.py to natively support GitHub Actions
env:
AWS_DEFAULT_REGION: us-east-1
SCRIBE_GRAPHQL_ACCESS_TOKEN: ${{ secrets.SCRIBE_GRAPHQL_ACCESS_TOKEN }}
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
CIRCLE_TAG: ${{ steps.parse-ref.outputs.tag }}
CIRCLE_WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
run: |
COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0)
export COMMIT_TIME
pip3 install requests==2.26
python3 -m tools.stats.upload_binary_size_to_scuba || exit 0
- name: Test
run: |
# detached container should get cleaned up by teardown_ec2_linux
export SHARD_NUMBER=0
# TODO: Stop building test binaries as part of the build phase
# Make sure we copy test results from bazel-testlogs symlink to
# a regular directory ./test/test-reports
container_name=$(docker run \
-e BUILD_ENVIRONMENT \
-e CUSTOM_TEST_ARTIFACT_BUILD_DIR \
-e GITHUB_ACTIONS \
-e IN_CI \
-e SHARD_NUMBER \
-e JOB_BASE_NAME \
-e MAX_JOBS="$(nproc --ignore=2)" \
-e SCCACHE_BUCKET \
-e CONTINUE_THROUGH_ERROR \
-e PR_LABELS \
-e http_proxy="!{{ common.squid_proxy }}" -e https_proxy="!{{ common.squid_proxy }}" -e no_proxy="!{{ common.squid_no_proxy }}" \
--env-file="/tmp/github_env_${GITHUB_RUN_ID}" \
--security-opt seccomp=unconfined \
--cap-add=SYS_PTRACE \
--shm-size="${SHM_SIZE}" \
--tty \
--detach \
--user jenkins \
-v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \
-w /var/lib/jenkins/workspace \
"${DOCKER_IMAGE}"
)
docker exec -t "${container_name}" sh -c 'sudo chown -R jenkins . && sudo chown -R jenkins /dev && .jenkins/pytorch/test.sh && cp -Lr ./bazel-testlogs ./test/test-reports'
- name: Chown workspace
if: always()
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
!{{ common.upload_test_reports(name='bazel') }}
!{{ common.upload_test_statistics(build_environment) }}
!{{ common.teardown_ec2_linux() }}
{%- endblock %}

.github/templates/common.yml.j2 vendored Normal file
View File

@ -0,0 +1,186 @@
{%- set upload_artifact_s3_action = "seemethere/upload-artifact-s3@v3" -%}
{# squid_proxy is a private ELB that is only available to GHA custom runners #}
{%- set squid_proxy = "http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -%}
{# squid_no_proxy is a common set of fixed domains and IPs that we don't need to proxy. See https://docs.aws.amazon.com/AmazonECS/latest/developerguide/http_proxy_config.html#windows-proxy #}
{%- set squid_no_proxy = "localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" -%}
{%- macro concurrency(build_environment) -%}
concurrency:
group: !{{ build_environment }}-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }}
cancel-in-progress: true
{%- endmacro -%}
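{#- A hedged illustration: for a hypothetical build_environment of
    "linux-xenial-py3.6-gcc5.4", the macro above renders to:
      concurrency:
        group: linux-xenial-py3.6-gcc5.4-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }}
        cancel-in-progress: true
-#}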
{%- macro display_ec2_information() -%}
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
{%- endmacro -%}
{%- macro parse_ref() -%}
- name: Parse ref
id: parse-ref
run: .github/scripts/parse_ref.py
{%- endmacro -%}
{%- macro upload_test_statistics(build_environment) -%}
- name: Display and upload test statistics (Click Me)
if: always()
# temporary hack: set CIRCLE_* vars, until we update
# tools/stats/print_test_stats.py to natively support GitHub Actions
env:
AWS_DEFAULT_REGION: us-east-1
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
JOB_BASE_NAME: !{{ build_environment }}-test
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
CIRCLE_TAG: ${{ steps.parse-ref.outputs.tag }}
CIRCLE_WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
shell: bash
run: |
python3 -m pip install -r requirements.txt
python3 -m pip install boto3==1.16.34
python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test
{%- endmacro -%}
{%- macro setup_ec2_linux() -%}
!{{ display_ec2_information() }}
- name: Log in to ECR
env:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
# Ensure the working directory gets chowned back to the current user
retry docker pull "${ALPINE_IMAGE}"
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
rm -rf "${GITHUB_WORKSPACE:?}/*"
rm -f ~/.ssh/authorized_keys
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
{%- endmacro -%}
{%- macro teardown_ec2_linux() -%}
- name: Hold runner for 2 hours or until ssh sessions have drained
# Always hold for active ssh sessions
if: always()
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Chown workspace
if: always()
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Kill containers, clean up images
if: always()
run: |
# ignore expansion of "docker ps -q" since it could be empty
# shellcheck disable=SC2046
docker stop $(docker ps -q) || true
# Prune all of the docker images
docker system prune -af
{%- endmacro -%}
{%- macro checkout_pytorch(submodules) -%}
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: !{{ submodules }}
{%- endmacro -%}
{%- macro upload_test_reports(name) -%}
- name: Zip test reports for upload
if: always()
env:
{%- if name == 'linux' or name == 'windows' %}
FILE_SUFFIX: '${{ github.job }}-${{ matrix.config }}-${{ matrix.shard }}-${{ matrix.num_shards }}-${{ matrix.runner }}'
{%- else %}
FILE_SUFFIX: '!{{ name }}-${{ github.job }}'
{%- endif %}
{%- if name == 'windows' %}
shell: powershell
run: |
# -ir => recursive include all files in pattern
7z a "test-reports-$Env:FILE_SUFFIX.zip" -ir'!test\*.xml'
{%- else %}
run: |
# Remove any previous test reports if they exist
rm -f test-reports-*.zip
zip -r "test-reports-${FILE_SUFFIX}.zip" test -i '*.xml'
{%- endif %}
- uses: actions/upload-artifact@v2
name: Store Test Reports
if: always()
with:
{%- if name == 'linux' or name == 'windows' %}
name: test-reports-${{ matrix.config }}
{%- else %}
name: test-reports-!{{ name }}
{%- endif %}
retention-days: 14
if-no-files-found: error
path:
{%- if name == 'windows' %}
pytorch-${{ github.run_id }}/test-reports-*.zip
{%- else %}
test-reports-*.zip
{%- endif %}
- uses: !{{ upload_artifact_s3_action }}
name: Store Test Reports on S3
if: always()
with:
retention-days: 14
if-no-files-found: error
path:
{%- if name == 'windows' %}
pytorch-${{ github.run_id }}/test-reports-*.zip
{%- else %}
test-reports-*.zip
{%- endif %}
{%- endmacro -%}
{%- macro render_test_results() -%}
- name: Install render_test_results dependencies
if: always()
shell: bash
run: |
python3 -m pip install junitparser==2.1.1 rich==10.9.0
- name: "[[ Click me for rendered test results (useful for finding failing tests) ]]"
if: always()
shell: bash
# Encoding is weird on Windows; just try to default to utf-8 if possible
env:
PYTHONIOENCODING: "utf-8"
run: |
python3 tools/render_junit.py test/
{%- endmacro -%}

View File

@ -1,55 +1,79 @@
{% import 'common.yml.j2' as common %}
{%- block name -%}
# Template is at: .github/templates/linux_ci_workflow.yml.j2
# Generation script: .github/scripts/generate_ci_workflows.py
name: !{{ build_environment }}
{%- endblock %}
on:
{%- if on_pull_request %}
pull_request:
{%- if ciflow_config.enabled %}
{%- if ciflow_config.trigger_action_only %}
types: [!{{ ciflow_config.trigger_action }}]
{%- else %}
types: [opened, synchronize, reopened, !{{ ciflow_config.trigger_action }}]
{%- endif %}
{%- endif %}
{%- else %}
# TODO: Enable pull_request builds when we can verify capacity can be met by auto-scalers
{%- endif %}
{%- if is_scheduled %}
schedule:
- cron: !{{ is_scheduled }}
{%- else %}
push:
branches:
- master
- release/*
{%- endif %}
workflow_dispatch:
env:
BUILD_ENVIRONMENT: !{{ build_environment }}
DOCKER_IMAGE_BASE: !{{ docker_image_base }}
SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2
XLA_CLANG_CACHE_S3_BUCKET_NAME: ossci-compiler-clang-cache-circleci-xla
TORCH_CUDA_ARCH_LIST: 5.2
IN_CI: 1
# This is used only during the phase of adding wheel tests; it will be removed once completed
IN_WHEEL_TEST: 1
# Used for custom_operator, jit_hooks, custom_backend, see .jenkins/pytorch/build.sh
CUSTOM_TEST_ARTIFACT_BUILD_DIR: build/custom_test_artifacts
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
!{{ common.concurrency(build_environment) }}
jobs:
{%- if ciflow_config.enabled %}
!{{ ciflow_config.root_job_name }}:
runs-on: ubuntu-18.04
if: ${{ !{{ ciflow_config.root_job_condition }} }}
env:
LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
steps:
- name: noop
run: echo running !{{ ciflow_config.root_job_name }}
- name: print labels
run: echo "${LABELS}"
{%- endif %}
calculate-docker-image:
runs-on: linux.2xlarge
{%- if ciflow_config.enabled %}
needs: [!{{ ciflow_config.root_job_name }}]
{%- endif %}
env:
DOCKER_BUILDKIT: 1
timeout-minutes: 90
outputs:
docker_image: ${{ steps.calculate-tag.outputs.docker_image }}
steps:
!{{ common.setup_ec2_linux() }}
!{{ common.checkout_pytorch("false") }}
- name: Calculate docker image tag
id: calculate-tag
run: |
@ -89,93 +113,78 @@ jobs:
fi
echo ::set-output name=rebuild::yes
- name: Build and push docker image
if: ${{ steps.check.outputs.rebuild }}
env:
DOCKER_TAG: ${{ steps.calculate-tag.outputs.docker_tag }}
DOCKER_SKIP_S3_UPLOAD: 1
run: |
export IMAGE_NAME=${DOCKER_IMAGE_BASE#308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/}
cd .circleci/docker && ./build_docker.sh
{% block build +%}
build:
runs-on: linux.2xlarge
needs: [calculate-docker-image, !{{ ciflow_config.root_job_name }}]
env:
DOCKER_IMAGE: ${{ needs.calculate-docker-image.outputs.docker_image }}
JOB_BASE_NAME: !{{ build_environment }}-build
steps:
!{{ common.setup_ec2_linux() }}
!{{ common.checkout_pytorch("recursive") }}
- name: Pull docker image
run: |
docker pull "${DOCKER_IMAGE}"
- name: Build
run: |
# detached container should get cleaned up by teardown_ec2_linux
container_name=$(docker run \
-e BUILD_ENVIRONMENT \
-e JOB_BASE_NAME \
-e MAX_JOBS="$(nproc --ignore=2)" \
-e SCCACHE_BUCKET \
-e XLA_CLANG_CACHE_S3_BUCKET_NAME \
-e CUSTOM_TEST_ARTIFACT_BUILD_DIR \
-e SKIP_SCCACHE_INITIALIZATION=1 \
-e TORCH_CUDA_ARCH_LIST \
-e PR_LABELS \
-e http_proxy="!{{ common.squid_proxy }}" -e https_proxy="!{{ common.squid_proxy }}" -e no_proxy="!{{ common.squid_no_proxy }}" \
--env-file="/tmp/github_env_${GITHUB_RUN_ID}" \
--security-opt seccomp=unconfined \
--cap-add=SYS_PTRACE \
--tty \
--detach \
--user jenkins \
-v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \
-w /var/lib/jenkins/workspace \
"${DOCKER_IMAGE}" \
sh -c 'sudo chown -R jenkins . && .jenkins/pytorch/build.sh'
"${DOCKER_IMAGE}"
)
docker exec -t "${container_name}" sh -c 'sudo chown -R jenkins . && .jenkins/pytorch/build.sh'
!{{ common.parse_ref() }}
- name: Display and upload binary build size statistics (Click Me)
# temporary hack: set CIRCLE_* vars, until we update
# tools/stats/print_test_stats.py to natively support GitHub Actions
env:
AWS_DEFAULT_REGION: us-east-1
IS_GHA: 1
SCRIBE_GRAPHQL_ACCESS_TOKEN: ${{ secrets.SCRIBE_GRAPHQL_ACCESS_TOKEN }}
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
CIRCLE_TAG: ${{ steps.parse-ref.outputs.tag }}
CIRCLE_WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
run: |
COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0)
export COMMIT_TIME
pip3 install requests==2.26 boto3==1.16.34
python3 -m tools.stats.upload_binary_size_to_scuba || exit 0
- name: Chown workspace
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
{%- if not is_libtorch %}
- name: Archive artifacts into zip
run: |
zip -1 -r artifacts.zip dist/ build/custom_test_artifacts build/lib build/bin .pytorch-test-times.json
- uses: !{{ common.upload_artifact_s3_action }}
name: Store PyTorch Build Artifacts on S3
with:
name: ${{ env.BUILD_ENVIRONMENT }}
@ -183,38 +192,78 @@ jobs:
if-no-files-found: error
path:
artifacts.zip
{%- endif %}
!{{ common.teardown_ec2_linux() }}
{%- endblock %}
{%- if not exclude_test %}
{% block test +%}
generate-test-matrix:
runs-on: ubuntu-18.04
{%- if ciflow_config.enabled %}
needs: [!{{ ciflow_config.root_job_name }}]
{%- endif %}
env:
TEST_RUNNER_TYPE: !{{ test_runner_type }}
ENABLE_DISTRIBUTED_TEST: !{{ enable_distributed_test }}
ENABLE_JIT_LEGACY_TEST: !{{ enable_jit_legacy_test }}
ENABLE_MULTIGPU_TEST: !{{ enable_multigpu_test }}
ENABLE_NOGPU_NO_AVX_TEST: !{{ enable_nogpu_no_avx_test }}
ENABLE_NOGPU_NO_AVX2_TEST: !{{ enable_nogpu_no_avx2_test }}
ENABLE_SLOW_TEST: !{{ enable_slow_test }}
ENABLE_DOCS_TEST: !{{ enable_docs_test }}
ENABLE_BACKWARDS_COMPAT_TEST: !{{ enable_backwards_compat_test }}
ENABLE_XLA_TEST: !{{ enable_xla_test }}
ENABLE_NOARCH_TEST: !{{ enable_noarch_test }}
NUM_TEST_SHARDS: !{{ num_test_shards }}
MULTIGPU_RUNNER_TYPE: linux.16xlarge.nvidia.gpu
NOGPU_RUNNER_TYPE: linux.2xlarge
PR_BODY: ${{ github.event.pull_request.body }}
outputs:
matrix: ${{ steps.set-matrix.outputs.matrix }}
render-matrix: ${{ steps.set-matrix.outputs.render-matrix }}
ignore-disabled-issues: ${{ steps.set-matrix.outputs.ignore-disabled-issues }}
container:
image: python:3.9
steps:
- name: Install dependencies
run: pip install typing-extensions==3.10
- name: Clone pytorch/pytorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
- name: Generating test matrix
id: set-matrix
run: .github/scripts/generate_pytorch_test_matrix.py
test:
needs: [calculate-docker-image, build, generate-test-matrix, !{{ ciflow_config.root_job_name }}]
strategy:
matrix: ${{ fromJson(needs.generate-test-matrix.outputs.matrix) }}
fail-fast: false
runs-on: ${{ matrix.runner }}
env:
DOCKER_IMAGE: ${{ needs.calculate-docker-image.outputs.docker_image }}
JOB_BASE_NAME: !{{ build_environment }}-test
TEST_CONFIG: ${{ matrix.config }}
SHARD_NUMBER: ${{ matrix.shard }}
NUM_TEST_SHARDS: ${{ matrix.num_shards }}
PYTORCH_IGNORE_DISABLED_ISSUES: ${{ needs.generate-test-matrix.outputs.ignore-disabled-issues }}
CONTINUE_THROUGH_ERROR: ${{ github.repository == 'pytorch/pytorch' && (github.event_name == 'push' || github.event_name == 'schedule') }}
steps:
!{{ common.setup_ec2_linux() }}
!{{ common.checkout_pytorch("recursive") }}
- name: Pull docker image
run: |
docker pull "${DOCKER_IMAGE}"
- name: Install nvidia driver, nvidia-docker runtime, set GPU_FLAG
if: ${{ contains(env.BUILD_ENVIRONMENT, 'cuda') && !contains(matrix.config, 'nogpu') }}
run: |
bash .github/scripts/install_nvidia_utils_linux.sh
echo "GPU_FLAG=--gpus all" >> "${GITHUB_ENV}"
@ -240,134 +289,98 @@ jobs:
- name: Output disk space left
run: |
sudo df -H
!{{ common.parse_ref() }}
- name: Test
env:
PR_NUMBER: ${{ github.event.pull_request.number }}
IS_GHA: 1
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
AWS_DEFAULT_REGION: us-east-1
run: |
if [[ $TEST_CONFIG == 'multigpu' ]]; then
TEST_COMMAND=.jenkins/pytorch/multigpu-test.sh
else
TEST_COMMAND=.jenkins/pytorch/test.sh
fi
if [[ $NUM_TEST_SHARDS -ne 2 ]]; then
export SHARD_NUMBER=0
fi
# detached container should get cleaned up by teardown_ec2_linux
# TODO: Stop building test binaries as part of the build phase
# Used for GPU_FLAG since that doesn't play nice
# shellcheck disable=SC2086
container_name=$(docker run \
${GPU_FLAG:-} \
-e BUILD_ENVIRONMENT \
-e PR_NUMBER \
-e CUSTOM_TEST_ARTIFACT_BUILD_DIR \
-e GITHUB_ACTIONS \
-e IN_CI \
-e IS_GHA \
-e CIRCLE_BRANCH \
-e CIRCLE_SHA1 \
-e CIRCLE_PR_NUMBER \
-e AWS_DEFAULT_REGION \
-e IN_WHEEL_TEST \
-e SHARD_NUMBER \
-e JOB_BASE_NAME \
-e TEST_CONFIG \
-e NUM_TEST_SHARDS \
-e PYTORCH_IGNORE_DISABLED_ISSUES \
-e PR_LABELS \
-e CONTINUE_THROUGH_ERROR \
-e MAX_JOBS="$(nproc --ignore=2)" \
-e SCCACHE_BUCKET \
-e http_proxy="!{{ common.squid_proxy }}" -e https_proxy="!{{ common.squid_proxy }}" -e no_proxy="!{{ common.squid_no_proxy }}" \
-e XLA_CLANG_CACHE_S3_BUCKET_NAME \
--env-file="/tmp/github_env_${GITHUB_RUN_ID}" \
--security-opt seccomp=unconfined \
--cap-add=SYS_PTRACE \
--shm-size="${SHM_SIZE}" \
--tty \
--detach \
--name="${container_name}" \
--user jenkins \
-v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \
-w /var/lib/jenkins/workspace \
"${DOCKER_IMAGE}" \
sh -c 'sudo chown -R jenkins . && pip install dist/*.whl && .jenkins/pytorch/test.sh'
"${DOCKER_IMAGE}"
)
docker exec -t "${container_name}" sh -c "sudo chown -R jenkins . && pip install dist/*.whl && ${TEST_COMMAND}"
- name: Chown workspace
if: always()
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
!{{ common.render_test_results() }}
{%- if is_coverage %}
- name: Report coverage
run: |
python3 -mpip install codecov==2.1.12
python3 -mcodecov
{%- endif %}
!{{ common.upload_test_reports(name='linux') }}
!{{ common.upload_test_statistics(build_environment) }}
!{{ common.teardown_ec2_linux() }}
{% endblock %}
{%- endif -%}
{%- if enable_doc_jobs %}
build-docs:
runs-on: linux.2xlarge
strategy:
matrix:
docs_type: [cpp, python]
needs: [calculate-docker-image, build, !{{ ciflow_config.root_job_name }}]
env:
DOCKER_IMAGE: ${{ needs.calculate-docker-image.outputs.docker_image }}
DOCS_TYPE: ${{ matrix.docs_type }}
steps:
!{{ common.setup_ec2_linux() }}
!{{ common.checkout_pytorch("recursive") }}
- name: Pull docker image
run: |
docker pull "${DOCKER_IMAGE}"
- uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b
name: Download PyTorch Build Artifacts
with:
@ -375,45 +388,64 @@ jobs:
- name: Unzip artifacts
run: |
unzip -o artifacts.zip
- name: Build ${{ matrix.docs_type }} docs
run: |
set -ex
time docker pull "${DOCKER_IMAGE}" > /dev/null
echo "${GITHUB_REF}"
ref=${GITHUB_REF##*/}
target=${ref//v}
# detached container should get cleaned up by teardown_ec2_linux
container_name=$(docker run \
-e BUILD_ENVIRONMENT \
-e CUSTOM_TEST_ARTIFACT_BUILD_DIR \
-e IN_CI \
-e MAX_JOBS="$(nproc --ignore=2)" \
-e CIRCLE_SHA1="$GITHUB_SHA" \
-e DOCS_VERSION="${target}" \
-e DOCS_TYPE \
-e PR_LABELS \
--env-file="/tmp/github_env_${GITHUB_RUN_ID}" \
--security-opt seccomp=unconfined \
--cap-add=SYS_PTRACE \
--name="$GITHUB_SHA" \
--tty \
--detach \
--user jenkins \
-v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \
-w /var/lib/jenkins/workspace \
"${DOCKER_IMAGE}" \
bash -c "sudo chown -R jenkins . && pip install dist/*.whl && ./.circleci/scripts/python_doc_push_script.sh docs/$target $target site"
"${DOCKER_IMAGE}"
)
docker exec -t "${container_name}" bash -c "sudo chown -R jenkins . && pip install dist/*.whl && ./.circleci/scripts/${DOCS_TYPE}_doc_push_script.sh"
- name: Chown workspace
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- uses: !{{ common.upload_artifact_s3_action }}
name: Upload Python Docs Preview
if: ${{ github.event_name == 'pull_request' && matrix.docs_type == 'python' }}
with:
retention-days: 14
s3-bucket: doc-previews
if-no-files-found: error
path: pytorch.github.io/docs/merge/
s3-prefix: pytorch/${{ github.event.pull_request.number }}
- uses: !{{ common.upload_artifact_s3_action }}
name: Upload C++ Docs Preview
if: ${{ github.event_name == 'pull_request' && matrix.docs_type == 'cpp' }}
with:
retention-days: 14
if-no-files-found: error
s3-bucket: doc-previews
path: cppdocs/
s3-prefix: pytorch/${{ github.event.pull_request.number }}/cppdocs
- name: Archive artifacts into zip
run: |
zip -r "docs_${DOCS_TYPE}.zip" "${GITHUB_WORKSPACE}/pytorch.github.io" "${GITHUB_WORKSPACE}/cppdocs"
- uses: actions/upload-artifact@v2
name: Store PyTorch Build Artifacts
with:
name: docs_${{ matrix.docs_type }}
path: docs_${{ matrix.docs_type }}.zip
if-no-files-found: error
!{{ common.teardown_ec2_linux() }}
{%- endif -%}

View File

@ -1,15 +1,43 @@
{% import 'common.yml.j2' as common %}
{%- macro wait_and_kill_ssh() -%}
- name: Wait until all sessions have drained
shell: powershell
if: always()
timeout-minutes: 120
run: |
.github\scripts\wait_for_ssh_to_drain.ps1
- name: Kill active ssh sessions if still around (Useful if workflow was cancelled)
shell: powershell
if: always()
run: |
.github\scripts\kill_active_ssh_sessions.ps1
{%- endmacro -%}
# Template is at: .github/templates/windows_ci_workflow.yml.j2
# Generation script: .github/scripts/generate_ci_workflows.py
name: Windows CI (!{{ build_environment }})
name: !{{ build_environment }}
on:
{%- if on_pull_request %}
pull_request:
{%- if ciflow_config.enabled %}
{%- if ciflow_config.trigger_action_only %}
types: [!{{ ciflow_config.trigger_action }}]
{%- else %}
types: [opened, synchronize, reopened, !{{ ciflow_config.trigger_action }}]
{%- endif %}
{%- endif %}
{%- endif %}
{%- if is_scheduled %}
schedule:
- cron: !{{ is_scheduled }}
{%- else %}
push:
branches:
- master
- release/*
{%- endif %}
workflow_dispatch:
env:
@ -18,33 +46,56 @@ env:
CUDA_VERSION: "!{{ cuda_version }}"
IN_CI: 1
INSTALL_WINDOWS_SDK: 1
JOB_BASE_NAME: test
PYTHON_VERSION: "3.8"
PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
SCCACHE_BUCKET: "ossci-compiler-cache"
VC_PRODUCT: "BuildTools"
VC_VERSION: ""
VS_VERSION: "16.8.6"
VC_YEAR: "2019"
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
no_proxy: !{{ common.squid_no_proxy }}
{%- if cuda_version != "cpu" %}
TORCH_CUDA_ARCH_LIST: "7.0"
USE_CUDA: 1
{%- endif %}
concurrency:
group: !{{ build_environment }}-${{ github.event.pull_request.number || github.sha }}
cancel-in-progress: true
!{{ common.concurrency(build_environment) }}
jobs:
{%- if ciflow_config.enabled %}
!{{ ciflow_config.root_job_name }}:
runs-on: ubuntu-18.04
if: ${{ !{{ ciflow_config.root_job_condition }} }}
steps:
- name: noop
run: echo running !{{ ciflow_config.root_job_name }}
{%- endif %}
build:
runs-on: "windows.4xlarge"
defaults:
run:
working-directory: pytorch-${{ github.run_id }}
{%- if ciflow_config.enabled %}
needs: [!{{ ciflow_config.root_job_name }}]
{%- endif %}
env:
JOB_BASE_NAME: !{{ build_environment }}-build
http_proxy: "!{{ common. squid_proxy }}"
https_proxy: "!{{ common.squid_proxy }}"
steps:
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Checkout PyTorch
uses: actions/checkout@v2
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
submodules: recursive
- name: Clean workspace (including things in .gitignore)
shell: bash
run: |
git clean -xdf
path: pytorch-${{ github.run_id }}
# deep clone, to allow use of git merge-base
fetch-depth: 0
!{{ common.display_ec2_information() }}
- name: Install Visual Studio 2019 toolchain
shell: powershell
run: |
@ -61,6 +112,8 @@ jobs:
{%- endif %}
- name: Build
shell: bash
env:
PYTORCH_FINAL_PACKAGE_DIR: /c/${{ github.run_id }}/build-results/
run: |
.jenkins/pytorch/win-build.sh
# Upload to github so that people can click and download artifacts
@ -73,31 +126,86 @@ jobs:
retention-days: 14
if-no-files-found: error
name: ${{ env.BUILD_ENVIRONMENT }}
path: C:\w\build-results
path: C:\${{ github.run_id }}\build-results
- name: Upload artifacts to s3
if: always()
uses: seemethere/upload-artifact-s3@9d7ceb0ab39c2c88d93ef7792b27425b27d59162
uses: !{{ common.upload_artifact_s3_action }}
with:
retention-days: 14
if-no-files-found: error
name: ${{ env.BUILD_ENVIRONMENT }}
path: C:\w\build-results
path: C:\${{ github.run_id }}\build-results
!{{ wait_and_kill_ssh() }}
- name: Cleanup build-results and workspaces
if: always()
shell: bash
env:
PYTORCH_FINAL_PACKAGE_DIR: /c/${{ github.run_id }}/build-results/
# Should remove the entirety of pytorch-${{ github.run_id }}
run: |
rm -rf "${PYTORCH_FINAL_PACKAGE_DIR}"
rm -rf ./*
generate-test-matrix:
{%- if ciflow_config.enabled %}
needs: [!{{ ciflow_config.root_job_name }}]
{%- endif %}
runs-on: ubuntu-18.04
env:
TEST_RUNNER_TYPE: !{{ test_runner_type }}
NUM_TEST_SHARDS: !{{ num_test_shards }}
NUM_TEST_SHARDS_ON_PULL_REQUEST: !{{ num_test_shards_on_pull_request }}
PR_BODY: ${{ github.event.pull_request.body }}
outputs:
matrix: ${{ steps.set-matrix.outputs.matrix }}
render-matrix: ${{ steps.set-matrix.outputs.render-matrix }}
ignore-disabled-issues: ${{ steps.set-matrix.outputs.ignore-disabled-issues }}
container:
image: python:3.9
steps:
- name: Install dependencies
run: pip install typing-extensions==3.10
- name: Clone pytorch/pytorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
- name: Generating test matrix
id: set-matrix
run: .github/scripts/generate_pytorch_test_matrix.py
test:
runs-on: !{{ test_runner_type }}
{%- if only_build_on_pull_request %}
if: ${{ github.event_name == 'push' }}
{%- endif %}
env:
JOB_BASE_NAME: !{{ build_environment }}-test
needs:
- build
SHARD_NUMBER: ${{ matrix.shard }}
NUM_TEST_SHARDS: ${{ matrix.num_shards }}
TEST_CONFIG: ${{ matrix.config }}
http_proxy: "!{{ common.squid_proxy }}"
https_proxy: "!{{ common.squid_proxy }}"
RUN_SMOKE_TESTS_ONLY_ON_PR: !{{ only_run_smoke_tests_on_pull_request }}
PYTORCH_IGNORE_DISABLED_ISSUES: ${{ needs.generate-test-matrix.outputs.ignore-disabled-issues }}
CONTINUE_THROUGH_ERROR: ${{ github.repository == 'pytorch/pytorch' && (github.event_name == 'push' || github.event_name == 'schedule') }}
needs: [build, generate-test-matrix, !{{ ciflow_config.root_job_name }}]
strategy:
matrix: ${{ fromJson(needs.generate-test-matrix.outputs.matrix) }}
fail-fast: false
runs-on: ${{ matrix.runner }}
defaults:
run:
working-directory: pytorch-${{ github.run_id }}
steps:
- name: Checkout PyTorch
uses: actions/checkout@v2
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
submodules: recursive
- name: Clean workspace (including things in .gitignore)
shell: bash
run: |
git clean -xdf
path: pytorch-${{ github.run_id }}
# deep clone, to allow use of git merge-base
fetch-depth: 0
!{{ common.display_ec2_information() }}
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Install Visual Studio 2019 toolchain
shell: powershell
run: |
@ -126,71 +234,26 @@ jobs:
name: Setup Python3
with:
python-version: '3.x'
- name: Run test scripts
- name: Test
shell: bash
env:
PYTORCH_FINAL_PACKAGE_DIR: /c/${{ github.run_id }}/build-results/
run: |
if [[ $NUM_TEST_SHARDS -ne 2 ]]; then
export SHARD_NUMBER=0
fi
if [[ -n $GITHUB_HEAD_REF && "$RUN_SMOKE_TESTS_ONLY_ON_PR" == "true" ]]; then
export RUN_SMOKE_TESTS_ONLY=1
fi
.jenkins/pytorch/win-test.sh
- uses: actions/upload-artifact@v2
name: Store PyTorch Test Reports
!{{ common.upload_test_reports(name='windows') }}
!{{ common.render_test_results() }}
!{{ wait_and_kill_ssh() }}
!{{ common.parse_ref() }}
!{{ common.upload_test_statistics(build_environment) }}
- name: Cleanup workspace
if: always()
with:
name: test-reports
retention-days: 14
if-no-files-found: error
path:
test/**/*.xml
# this is a separate step from test because the log files from test are too
# long: GitHub tries to render all of the log files when you click through
# an action, causing extreme slowdown on actions that contain too many logs
# (like test); we could always fold this back into the test job, but that
# wouldn't create the best experience
render_test_results:
if: always()
needs:
- test
runs-on: ubuntu-18.04
# TODO: Make this into a composite step
steps:
- name: Checkout PyTorch
uses: actions/checkout@v2
with:
# deep clone, to allow tools/print_test_stats.py to use Git commands
fetch-depth: 0
- uses: actions/download-artifact@v2
name: Download PyTorch Test Reports
with:
name: test-reports
path: test/test-reports
- uses: actions/setup-python@v2
with:
python-version: 3.9
- name: Install dependencies
# boto3 version copied from .circleci/docker/common/install_conda.sh
shell: bash
# Should remove the entirety of pytorch-${{ github.run_id }}
run: |
pip install -r requirements.txt
pip install boto3==1.16.34 junitparser rich
- name: Output Test Results (Click Me)
run: |
python tools/render_junit.py test
- name: Parse ref
id: parse-ref
run: .github/scripts/parse_ref.py
- name: Display and upload test statistics (Click Me)
# temporary hack: set CIRCLE_* vars, until we update
# tools/print_test_stats.py to natively support GitHub Actions
env:
SCRIBE_GRAPHQL_ACCESS_TOKEN: ${{ secrets.SCRIBE_GRAPHQL_ACCESS_TOKEN }}
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_OSSCI_METRICS_ACCESS_KEY_ID }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_OSSCI_METRICS_SECRET_ACCESS_KEY }}
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
CIRCLE_JOB: !{{ build_environment }}
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
CIRCLE_TAG: ${{ steps.parse-ref.outputs.tag }}
CIRCLE_WORKFLOW_ID: ${{ github.run_id }} # dunno if this corresponds
run: |
export PYTHONPATH=$PWD
python tools/print_test_stats.py --upload-to-s3 --compare-with-s3 test
rm -rf ./*
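
The Windows test step above gates sharding and smoke tests with plain environment checks: anything other than the expected two shards falls back to the unsharded mode (shard 0), and on pull requests (where `GITHUB_HEAD_REF` is non-empty) the run can be restricted to smoke tests. A standalone sketch of that gating, with placeholder values for the variables the matrix would normally supply:

```
#!/usr/bin/env bash
set -eu

NUM_TEST_SHARDS=1                  # placeholder; comes from the test matrix
GITHUB_HEAD_REF="my-feature"       # non-empty only on pull requests
RUN_SMOKE_TESTS_ONLY_ON_PR=true    # placeholder workflow setting

# With anything other than the expected two shards, fall back to
# the unsharded mode the test script understands (shard 0).
if [[ $NUM_TEST_SHARDS -ne 2 ]]; then
  export SHARD_NUMBER=0
fi

# On PRs (HEAD ref set), optionally restrict the run to smoke tests.
if [[ -n $GITHUB_HEAD_REF && "$RUN_SMOKE_TESTS_ONLY_ON_PR" == "true" ]]; then
  export RUN_SMOKE_TESTS_ONLY=1
fi

echo "SHARD_NUMBER=${SHARD_NUMBER:-unset} RUN_SMOKE_TESTS_ONLY=${RUN_SMOKE_TESTS_ONLY:-0}"
```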


@ -1,66 +0,0 @@
name: Add annotations
on:
workflow_run:
types:
- completed
workflows:
- Lint
jobs:
annotate:
strategy:
fail-fast: false
matrix:
name:
- flake8-py3
- clang-tidy
runs-on: ubuntu-18.04
steps:
- name: Download artifact
uses: actions/github-script@v3
env:
RUN_ID: ${{ github.event.workflow_run.id }}
LINT_NAME: ${{ matrix.name }}
with:
# https://securitylab.github.com/research/github-actions-preventing-pwn-requests/
script: |
const artifacts = await github.actions.listWorkflowRunArtifacts({
owner: context.repo.owner,
repo: context.repo.repo,
run_id: process.env.RUN_ID,
});
const filteredArtifacts = artifacts.data.artifacts.filter(artifact => {
return artifact.name == process.env.LINT_NAME;
});
if (filteredArtifacts.length > 0) {
const matchArtifact = filteredArtifacts[0];
const download = await github.actions.downloadArtifact({
owner: context.repo.owner,
repo: context.repo.repo,
artifact_id: matchArtifact.id,
archive_format: 'zip',
});
const fs = require('fs');
fs.writeFileSync(
`${process.env.GITHUB_WORKSPACE}/linter-output.zip`,
Buffer.from(download.data),
);
}
- name: Unzip artifact
id: unzip
run: |
if unzip linter-output.zip annotations.json commit-sha.txt; then
echo ::set-output \
name=sha::"$(grep -Em1 '^[[:xdigit:]]{40}$' commit-sha.txt)"
fi
- if: steps.unzip.outputs.sha
name: Add annotations
uses: pytorch/add-annotations-github-action@master
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
with:
check_name: ${{ matrix.name }}
linter_output_path: annotations.json
commit_sha: ${{ steps.unzip.outputs.sha }}
mode: json
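
Because `workflow_run` artifacts are produced by untrusted PR code (see the securitylab link above), the annotate job only trusts a commit SHA that is exactly 40 hex digits on its own line. A standalone sketch of that validation against a simulated `commit-sha.txt`:

```
#!/usr/bin/env bash
set -eu

# Simulate the artifact's commit-sha.txt with attacker-controllable content.
printf 'not-a-sha\n0123456789abcdef0123456789abcdef01234567\n' > commit-sha.txt

# -E: extended regex, -m1: first match only; the ^...$ anchors reject
# anything that is not exactly 40 hex digits on its own line.
sha=$(grep -Em1 '^[[:xdigit:]]{40}$' commit-sha.txt)
echo "validated sha: ${sha}"
```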


@ -6,8 +6,15 @@ on:
pull_request_target:
types: [edited, opened, synchronize, reopened]
concurrency:
group: auto-label-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }}
cancel-in-progress: true
jobs:
auto-label-rocm:
if: ${{ github.repository == 'pytorch/pytorch' }}
runs-on: ubuntu-18.04
steps:
- name: Retrieve information


@ -16,7 +16,7 @@ jobs:
image: python:3.9
steps:
- name: Clone pytorch/pytorch
uses: actions/checkout@v2
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
- name: Generating build matrix
id: set-matrix
run: |
@ -57,12 +57,12 @@ jobs:
- name: Clean runner workspace
run: rm -rf "$GITHUB_WORKSPACE"
- name: Clone pytorch/pytorch
uses: actions/checkout@v2
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
path: pytorch
submodules: recursive
- name: Clone pytorch/builder
uses: actions/checkout@v2
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
repository: pytorch/builder
path: builder
@ -91,23 +91,25 @@ jobs:
with:
name: pytorch-conda-py${{ matrix.python_version }}-${{matrix.gpu_arch_type}}-${{ matrix.gpu_arch_version }}
path: /remote/**/*.bz2
- name: Parse ref
id: parse-ref
run: .github/scripts/parse_ref.py
- name: Display and upload binary build size statistics (Click Me)
# temporary hack: set CIRCLE_* vars, until we update
# tools/print_test_stats.py to natively support GitHub Actions
# tools/stats/print_test_stats.py to natively support GitHub Actions
env:
SCRIBE_GRAPHQL_ACCESS_TOKEN: ${{ secrets.SCRIBE_GRAPHQL_ACCESS_TOKEN }}
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
CIRCLE_TAG: ${{ steps.parse-ref.outputs.tag }}
CIRCLE_WORKFLOW_ID: ${{ github.run_id }} # dunno if this corresponds
CIRCLE_WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
run: |
export PYTHONPATH=$PWD
COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0)
export COMMIT_TIME
pip3 install requests
python3 .circleci/scripts/upload_binary_size_to_scuba.py || exit 0
pip3 install requests==2.26
python3 -m tools.stats.upload_binary_size_to_scuba || exit 0
concurrency:
group: build-linux-conda-${{ github.event.pull_request.number || github.sha }}
group: build-linux-conda-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }}
cancel-in-progress: true
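
The binary-size step follows a pattern repeated across these workflows: put the repo root on `PYTHONPATH`, capture the commit timestamp with a `|| echo 0` fallback, pin the one dependency, and let the upload fail soft with `|| exit 0` so stats never break a build. A sketch of that skeleton, assuming it is run from a PyTorch checkout (the module path is the one from the diff above):

```
#!/usr/bin/env bash
set -u

# Make first-party tooling importable as modules from the repo root.
export PYTHONPATH=$PWD

# Commit timestamp in epoch seconds, falling back to 0 outside a repo.
COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0)
export COMMIT_TIME

# Pin the dependency; never let a stats upload fail the job.
pip3 install requests==2.26
python3 -m tools.stats.upload_binary_size_to_scuba || exit 0
```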


@ -16,7 +16,7 @@ jobs:
image: python:3.9
steps:
- name: Clone pytorch/pytorch
uses: actions/checkout@v2
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
- name: Generating build matrix
id: set-matrix
run: |
@ -51,12 +51,12 @@ jobs:
- name: Clean runner workspace
run: rm -rf "$GITHUB_WORKSPACE"
- name: Clone pytorch/pytorch
uses: actions/checkout@v2
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
path: pytorch
submodules: recursive
- name: Clone pytorch/builder
uses: actions/checkout@v2
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
repository: pytorch/builder
path: builder
@ -90,23 +90,25 @@ jobs:
with:
name: pytorch-libtorch-${{ matrix.libtorch_variant }}-${{ matrix.devtoolset }}-${{matrix.gpu_arch_type}}-${{ matrix.gpu_arch_version }}
path: /remote/**/*.zip
- name: Parse ref
id: parse-ref
run: .github/scripts/parse_ref.py
- name: Display and upload binary build size statistics (Click Me)
# temporary hack: set CIRCLE_* vars, until we update
# tools/print_test_stats.py to natively support GitHub Actions
# tools/stats/print_test_stats.py to natively support GitHub Actions
env:
SCRIBE_GRAPHQL_ACCESS_TOKEN: ${{ secrets.SCRIBE_GRAPHQL_ACCESS_TOKEN }}
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
CIRCLE_TAG: ${{ steps.parse-ref.outputs.tag }}
CIRCLE_WORKFLOW_ID: ${{ github.run_id }} # dunno if this corresponds
CIRCLE_WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
run: |
export PYTHONPATH=$PWD
COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0)
export COMMIT_TIME
pip3 install requests
python3 .circleci/scripts/upload_binary_size_to_scuba.py || exit 0
pip3 install requests==2.26
python3 -m tools.stats.upload_binary_size_to_scuba || exit 0
concurrency:
group: build-linux-libtorch-${{ github.event.pull_request.number || github.sha }}
group: build-linux-libtorch-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }}
cancel-in-progress: true


@ -16,7 +16,7 @@ jobs:
image: python:3.9
steps:
- name: Clone pytorch/pytorch
uses: actions/checkout@v2
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
- name: Generating build matrix
id: set-matrix
run: |
@ -46,12 +46,12 @@ jobs:
- name: Clean runner workspace
run: rm -rf "$GITHUB_WORKSPACE"
- name: Clone pytorch/pytorch
uses: actions/checkout@v2
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
path: pytorch
submodules: recursive
- name: Clone pytorch/builder
uses: actions/checkout@v2
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
repository: pytorch/builder
path: builder
@ -89,23 +89,25 @@ jobs:
with:
name: pytorch-wheel-py${{ matrix.python_version }}-${{matrix.gpu_arch_type}}-${{ matrix.gpu_arch_version }}
path: /remote/**/*.whl
- name: Parse ref
id: parse-ref
run: .github/scripts/parse_ref.py
- name: Display and upload binary build size statistics (Click Me)
# temporary hack: set CIRCLE_* vars, until we update
# tools/print_test_stats.py to natively support GitHub Actions
# tools/stats/print_test_stats.py to natively support GitHub Actions
env:
SCRIBE_GRAPHQL_ACCESS_TOKEN: ${{ secrets.SCRIBE_GRAPHQL_ACCESS_TOKEN }}
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
CIRCLE_TAG: ${{ steps.parse-ref.outputs.tag }}
CIRCLE_WORKFLOW_ID: ${{ github.run_id }} # dunno if this corresponds
CIRCLE_WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
run: |
export PYTHONPATH=$PWD
COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0)
export COMMIT_TIME
pip3 install requests
python3 .circleci/scripts/upload_binary_size_to_scuba.py || exit 0
pip3 install requests==2.26
python3 -m tools.stats.upload_binary_size_to_scuba || exit 0
concurrency:
group: build-linux-wheels-${{ github.event.pull_request.number || github.sha }}
group: build-linux-wheels-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }}
cancel-in-progress: true


@ -1,48 +0,0 @@
name: clang-format
on:
pull_request:
jobs:
clang-format:
runs-on: ubuntu-18.04
steps:
- name: Setup Python
uses: actions/setup-python@v2
with:
python-version: 3.x
architecture: x64
- name: Fetch PyTorch
uses: actions/checkout@v2
with:
fetch-depth: 0 # deep clone, to allow us to use git merge-base
- name: Run clang-format
env:
BASE_SHA: ${{ github.event.pull_request.base.sha }}
run: |
set -eu
# This is necessary to get the same results regardless of whether the
# PR was opened directly or from a forked repo. See: `9f890a92` for more info.
git remote add upstream https://github.com/pytorch/pytorch
git fetch upstream "$GITHUB_BASE_REF"
# only run clang-format on allowlisted files
echo "~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~"
echo "| clang-format failures found! Run: "
echo "| tools/clang_format_ci.sh ${BASE_SHA} "
echo "| to fix this error. "
echo "| For more info, see: https://github.com/pytorch/pytorch/wiki/clang-format "
echo "~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~"
tools/clang_format_ci.sh "${BASE_SHA}"
GIT_DIFF=$(git diff)
if [[ -z $GIT_DIFF ]]; then
exit 0
fi
echo "$GIT_DIFF"
exit 1
concurrency:
group: clang-format-${{ github.event.pull_request.number || github.sha }}
cancel-in-progress: true
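
The deleted clang-format workflow used a standard formatter-in-CI pattern: run the formatter in place, then fail the job exactly when `git diff` is non-empty, echoing the diff so the author sees what to fix. A minimal sketch, assuming a git checkout and using a placeholder formatter invocation in place of `tools/clang_format_ci.sh`:

```
#!/usr/bin/env bash
set -eu

# Placeholder formatter invocation; tools/clang_format_ci.sh "$BASE_SHA"
# played this role in the deleted workflow.
clang-format -i ./*.cpp 2>/dev/null || true

# Fail exactly when the formatter changed something, and show the diff.
GIT_DIFF=$(git diff)
if [[ -z $GIT_DIFF ]]; then
  exit 0
fi
echo "$GIT_DIFF"
exit 1
```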

.github/workflows/create_release.yml (new file, 53 lines)

@ -0,0 +1,53 @@
name: Create Release
on:
push:
tags: ['v*']
branches: [master]
release:
types: [published]
pull_request:
paths: [.github/workflows/create_release.yml]
jobs:
release:
if: ${{ github.repository == 'pytorch/pytorch' }}
name: Create Release
runs-on: ubuntu-latest
steps:
- uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
submodules: 'recursive'
- name: Fake name for PRs
if: ${{ github.event_name == 'pull_request' }}
run: echo "PT_GITHUB_REF=refs/tags/pr-tag" >> "$GITHUB_ENV"
- name: Real name for non-PRs
if: ${{ github.event_name != 'pull_request' }}
run: echo "PT_GITHUB_REF=$GITHUB_REF" >> "$GITHUB_ENV"
- name: Set filenames
run: |
tag_or_branch="${PT_GITHUB_REF#refs/tags/}"
tag_or_branch="${tag_or_branch#refs/heads/}"
echo "PT_RELEASE_NAME=pytorch-$tag_or_branch" >> "$GITHUB_ENV"
echo "PT_RELEASE_FILE=pytorch-$tag_or_branch.tar.gz" >> "$GITHUB_ENV"
- name: Create source distribution
run: |
# Create a new folder with the specified name so that extracting the archive yields that folder
rm -rf "/tmp/$PT_RELEASE_NAME"
cp -r "$PWD" "/tmp/$PT_RELEASE_NAME"
mv "/tmp/$PT_RELEASE_NAME" .
# Cleanup
rm -r "$PT_RELEASE_NAME"/{.azure_pipelines,.circleci,.jenkins}
find "$PT_RELEASE_NAME" -name '.git*' -exec rm -rv {} \; || true
# Create archive
tar -czf "$PT_RELEASE_FILE" "$PT_RELEASE_NAME"
echo "Created source archive $PT_RELEASE_FILE with content: $(ls -a "$PT_RELEASE_NAME")"
- name: Upload source distribution
if: ${{ github.event_name == 'release' }}
uses: softprops/action-gh-release@v1
with:
files: ${{env.PT_RELEASE_FILE}}
concurrency:
group: create-release-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }}
cancel-in-progress: true
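
The release job above maps a ref such as `refs/tags/v1.10.0` or `refs/heads/master` to artifact names with `${var#prefix}` stripping, which is a harmless no-op whenever the prefix is absent. A standalone sketch over both ref shapes:

```
#!/usr/bin/env bash
set -eu

for PT_GITHUB_REF in refs/tags/v1.10.0 refs/heads/master; do
  tag_or_branch="${PT_GITHUB_REF#refs/tags/}"     # strips the tag prefix if present
  tag_or_branch="${tag_or_branch#refs/heads/}"    # strips the branch prefix if present
  echo "PT_RELEASE_NAME=pytorch-$tag_or_branch"
  echo "PT_RELEASE_FILE=pytorch-$tag_or_branch.tar.gz"
done
```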


@ -0,0 +1,283 @@
# @generated DO NOT EDIT MANUALLY
# Template is at: .github/templates/linux_ci_workflow.yml.j2
# Generation script: .github/scripts/generate_ci_workflows.py
name: libtorch-linux-xenial-cuda10.2-py3.6-gcc7
on:
pull_request:
types: [unassigned]
push:
branches:
- master
- release/*
workflow_dispatch:
env:
BUILD_ENVIRONMENT: libtorch-linux-xenial-cuda10.2-py3.6-gcc7
DOCKER_IMAGE_BASE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7
SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2
XLA_CLANG_CACHE_S3_BUCKET_NAME: ossci-compiler-clang-cache-circleci-xla
TORCH_CUDA_ARCH_LIST: 5.2
IN_CI: 1
# Temporary flag for the phase of adding wheel tests; it will be removed once that work is completed
IN_WHEEL_TEST: 1
# Used for custom_operator, jit_hooks, custom_backend, see .jenkins/pytorch/build.sh
CUSTOM_TEST_ARTIFACT_BUILD_DIR: build/custom_test_artifacts
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
concurrency:
group: libtorch-linux-xenial-cuda10.2-py3.6-gcc7-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }}
cancel-in-progress: true
jobs:
ciflow_should_run:
runs-on: ubuntu-18.04
if: ${{ (github.repository == 'pytorch/pytorch') && ((github.event_name != 'pull_request') || (github.event.assignee.login != 'pytorchbot' ) || (github.event.action !='unassigned') || (contains(github.event.pull_request.labels.*.name, 'ciflow/all') || contains(github.event.pull_request.labels.*.name, 'ciflow/cuda') || contains(github.event.pull_request.labels.*.name, 'ciflow/libtorch') || contains(github.event.pull_request.labels.*.name, 'ciflow/linux'))) }}
env:
LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
steps:
- name: noop
run: echo running ciflow_should_run
- name: print labels
run: echo "${LABELS}"
calculate-docker-image:
runs-on: linux.2xlarge
needs: [ciflow_should_run]
env:
DOCKER_BUILDKIT: 1
timeout-minutes: 90
outputs:
docker_image: ${{ steps.calculate-tag.outputs.docker_image }}
steps:
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Log in to ECR
env:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
# Ensure the working directory gets chowned back to the current user
retry docker pull "${ALPINE_IMAGE}"
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
rm -rf "${GITHUB_WORKSPACE:?}/*"
rm -f ~/.ssh/authorized_keys
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: false
- name: Calculate docker image tag
id: calculate-tag
run: |
DOCKER_TAG=$(git rev-parse HEAD:.circleci/docker)
echo "::set-output name=docker_tag::${DOCKER_TAG}"
echo "::set-output name=docker_image::${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"
- name: Check if image should be built
id: check
env:
DOCKER_TAG: ${{ steps.calculate-tag.outputs.docker_tag }}
BASE_REVISION: ${{ github.event.pull_request.base.sha || github.sha }}
run: |
set -x
# Check if image already exists, if it does then skip building it
if docker manifest inspect "${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"; then
exit 0
fi
if [[ "$BASE_REVISION" = "$(git rev-parse HEAD)" ]]; then
# if we're on the base branch then use the parent commit
MERGE_BASE=$(git rev-parse HEAD~)
else
# otherwise we're on a PR, so use the most recent base commit
MERGE_BASE=$(git merge-base HEAD "$BASE_REVISION")
fi
# Covers the case where a previous tag doesn't exist for the tree
# this is only really applicable on trees that don't have `.circleci/docker` at their merge base, i.e. nightly
if ! git rev-parse "$MERGE_BASE:.circleci/docker"; then
echo "Directory '.circleci/docker' not found in commit $MERGE_BASE, you should probably rebase onto a more recent commit"
exit 1
fi
PREVIOUS_DOCKER_TAG=$(git rev-parse "$MERGE_BASE:.circleci/docker")
# If no image exists but the hash is the same as the previous hash then we should error out here
if [[ "${PREVIOUS_DOCKER_TAG}" = "${DOCKER_TAG}" ]]; then
echo "ERROR: Something has gone wrong and the previous image isn't available for the merge-base of your branch"
echo " contact the PyTorch team to restore the original images"
exit 1
fi
echo ::set-output name=rebuild::yes
- name: Build and push docker image
if: ${{ steps.check.outputs.rebuild }}
env:
DOCKER_TAG: ${{ steps.calculate-tag.outputs.docker_tag }}
DOCKER_SKIP_S3_UPLOAD: 1
run: |
export IMAGE_NAME=${DOCKER_IMAGE_BASE#308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/}
cd .circleci/docker && ./build_docker.sh
build:
runs-on: linux.2xlarge
needs: [calculate-docker-image, ciflow_should_run]
env:
DOCKER_IMAGE: ${{ needs.calculate-docker-image.outputs.docker_image }}
JOB_BASE_NAME: libtorch-linux-xenial-cuda10.2-py3.6-gcc7-build
steps:
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Log in to ECR
env:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
# Ensure the working directory gets chowned back to the current user
retry docker pull "${ALPINE_IMAGE}"
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
rm -rf "${GITHUB_WORKSPACE:?}/*"
rm -f ~/.ssh/authorized_keys
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: recursive
- name: Pull docker image
run: |
docker pull "${DOCKER_IMAGE}"
- name: Build
run: |
# detached container should get cleaned up by teardown_ec2_linux
container_name=$(docker run \
-e BUILD_ENVIRONMENT \
-e JOB_BASE_NAME \
-e MAX_JOBS="$(nproc --ignore=2)" \
-e SCCACHE_BUCKET \
-e XLA_CLANG_CACHE_S3_BUCKET_NAME \
-e CUSTOM_TEST_ARTIFACT_BUILD_DIR \
-e SKIP_SCCACHE_INITIALIZATION=1 \
-e TORCH_CUDA_ARCH_LIST \
-e PR_LABELS \
-e http_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e https_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e no_proxy="localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" \
--env-file="/tmp/github_env_${GITHUB_RUN_ID}" \
--security-opt seccomp=unconfined \
--cap-add=SYS_PTRACE \
--tty \
--detach \
--user jenkins \
-v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \
-w /var/lib/jenkins/workspace \
"${DOCKER_IMAGE}"
)
docker exec -t "${container_name}" sh -c 'sudo chown -R jenkins . && .jenkins/pytorch/build.sh'
- name: Parse ref
id: parse-ref
run: .github/scripts/parse_ref.py
- name: Display and upload binary build size statistics (Click Me)
# temporary hack: set CIRCLE_* vars, until we update
# tools/stats/print_test_stats.py to natively support GitHub Actions
env:
AWS_DEFAULT_REGION: us-east-1
IS_GHA: 1
SCRIBE_GRAPHQL_ACCESS_TOKEN: ${{ secrets.SCRIBE_GRAPHQL_ACCESS_TOKEN }}
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
CIRCLE_TAG: ${{ steps.parse-ref.outputs.tag }}
CIRCLE_WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
run: |
COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0)
export COMMIT_TIME
pip3 install requests==2.26 boto3==1.16.34
python3 -m tools.stats.upload_binary_size_to_scuba || exit 0
- name: Chown workspace
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Hold runner for 2 hours or until ssh sessions have drained
# Always hold for active ssh sessions
if: always()
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Chown workspace
if: always()
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Kill containers, clean up images
if: always()
run: |
# ignore expansion of "docker ps -q" since it could be empty
# shellcheck disable=SC2046
docker stop $(docker ps -q) || true
# Prune all of the docker images
docker system prune -af
- name: Hold runner for 2 hours or until ssh sessions have drained
# Always hold for active ssh sessions
if: always()
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Clean up docker images
if: always()
run: |
# Prune all of the docker images
docker system prune -af
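
The `calculate-docker-image` job above keys the image tag on the git *tree* hash of `.circleci/docker`, so the tag changes only when that directory's contents change; comparing against the merge-base tag then distinguishes "needs rebuild" from "the image should already exist". A hedged sketch of the derivation, assuming a clone with that directory and an `origin/master` ref (the registry name is a placeholder):

```
#!/usr/bin/env bash
set -euo pipefail

DOCKER_IMAGE_BASE="example.registry/pytorch-linux"   # placeholder registry

# Tree hash of the directory at HEAD: stable across unrelated commits.
DOCKER_TAG=$(git rev-parse "HEAD:.circleci/docker")
echo "image would be: ${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"

# The same hash at the merge base means no rebuild should be needed --
# if the image is missing anyway, something has gone wrong upstream.
MERGE_BASE=$(git merge-base HEAD origin/master)
PREVIOUS_DOCKER_TAG=$(git rev-parse "${MERGE_BASE}:.circleci/docker")
if [[ "${PREVIOUS_DOCKER_TAG}" = "${DOCKER_TAG}" ]]; then
  echo "tag unchanged since merge-base; expect the image to already exist"
fi
```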


@ -0,0 +1,283 @@
# @generated DO NOT EDIT MANUALLY
# Template is at: .github/templates/linux_ci_workflow.yml.j2
# Generation script: .github/scripts/generate_ci_workflows.py
name: libtorch-linux-xenial-cuda11.3-py3.6-gcc7
on:
pull_request:
types: [unassigned]
push:
branches:
- master
- release/*
workflow_dispatch:
env:
BUILD_ENVIRONMENT: libtorch-linux-xenial-cuda11.3-py3.6-gcc7
DOCKER_IMAGE_BASE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7
SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2
XLA_CLANG_CACHE_S3_BUCKET_NAME: ossci-compiler-clang-cache-circleci-xla
TORCH_CUDA_ARCH_LIST: 5.2
IN_CI: 1
# Temporary flag for the phase of adding wheel tests; it will be removed once that work is completed
IN_WHEEL_TEST: 1
# Used for custom_operator, jit_hooks, custom_backend, see .jenkins/pytorch/build.sh
CUSTOM_TEST_ARTIFACT_BUILD_DIR: build/custom_test_artifacts
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
concurrency:
group: libtorch-linux-xenial-cuda11.3-py3.6-gcc7-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }}
cancel-in-progress: true
jobs:
ciflow_should_run:
runs-on: ubuntu-18.04
if: ${{ (github.repository == 'pytorch/pytorch') && ((github.event_name != 'pull_request') || (github.event.assignee.login != 'pytorchbot' ) || (github.event.action !='unassigned') || (contains(github.event.pull_request.labels.*.name, 'ciflow/all') || contains(github.event.pull_request.labels.*.name, 'ciflow/cuda') || contains(github.event.pull_request.labels.*.name, 'ciflow/libtorch') || contains(github.event.pull_request.labels.*.name, 'ciflow/linux'))) }}
env:
LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
steps:
- name: noop
run: echo running ciflow_should_run
- name: print labels
run: echo "${LABELS}"
calculate-docker-image:
runs-on: linux.2xlarge
needs: [ciflow_should_run]
env:
DOCKER_BUILDKIT: 1
timeout-minutes: 90
outputs:
docker_image: ${{ steps.calculate-tag.outputs.docker_image }}
steps:
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Log in to ECR
env:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
# Ensure the working directory gets chowned back to the current user
retry docker pull "${ALPINE_IMAGE}"
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
rm -rf "${GITHUB_WORKSPACE:?}/*"
rm -f ~/.ssh/authorized_keys
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: false
- name: Calculate docker image tag
id: calculate-tag
run: |
DOCKER_TAG=$(git rev-parse HEAD:.circleci/docker)
echo "::set-output name=docker_tag::${DOCKER_TAG}"
echo "::set-output name=docker_image::${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"
- name: Check if image should be built
id: check
env:
DOCKER_TAG: ${{ steps.calculate-tag.outputs.docker_tag }}
BASE_REVISION: ${{ github.event.pull_request.base.sha || github.sha }}
run: |
set -x
# Check if image already exists, if it does then skip building it
if docker manifest inspect "${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"; then
exit 0
fi
if [[ "$BASE_REVISION" = "$(git rev-parse HEAD)" ]]; then
# if we're on the base branch then use the parent commit
MERGE_BASE=$(git rev-parse HEAD~)
else
# otherwise we're on a PR, so use the most recent base commit
MERGE_BASE=$(git merge-base HEAD "$BASE_REVISION")
fi
# Covers the case where a previous tag doesn't exist for the tree
# this is only really applicable on trees that don't have `.circleci/docker` at their merge base, i.e. nightly
if ! git rev-parse "$MERGE_BASE:.circleci/docker"; then
echo "Directory '.circleci/docker' not found in commit $MERGE_BASE, you should probably rebase onto a more recent commit"
exit 1
fi
PREVIOUS_DOCKER_TAG=$(git rev-parse "$MERGE_BASE:.circleci/docker")
# If no image exists but the hash is the same as the previous hash then we should error out here
if [[ "${PREVIOUS_DOCKER_TAG}" = "${DOCKER_TAG}" ]]; then
echo "ERROR: Something has gone wrong and the previous image isn't available for the merge-base of your branch"
echo " contact the PyTorch team to restore the original images"
exit 1
fi
echo ::set-output name=rebuild::yes
- name: Build and push docker image
if: ${{ steps.check.outputs.rebuild }}
env:
DOCKER_TAG: ${{ steps.calculate-tag.outputs.docker_tag }}
DOCKER_SKIP_S3_UPLOAD: 1
run: |
export IMAGE_NAME=${DOCKER_IMAGE_BASE#308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/}
cd .circleci/docker && ./build_docker.sh
build:
runs-on: linux.2xlarge
needs: [calculate-docker-image, ciflow_should_run]
env:
DOCKER_IMAGE: ${{ needs.calculate-docker-image.outputs.docker_image }}
JOB_BASE_NAME: libtorch-linux-xenial-cuda11.3-py3.6-gcc7-build
steps:
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Log in to ECR
env:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
# Ensure the working directory gets chowned back to the current user
retry docker pull "${ALPINE_IMAGE}"
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
rm -rf "${GITHUB_WORKSPACE:?}/*"
rm -f ~/.ssh/authorized_keys
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: recursive
- name: Pull docker image
run: |
docker pull "${DOCKER_IMAGE}"
- name: Build
run: |
# detached container should get cleaned up by teardown_ec2_linux
container_name=$(docker run \
-e BUILD_ENVIRONMENT \
-e JOB_BASE_NAME \
-e MAX_JOBS="$(nproc --ignore=2)" \
-e SCCACHE_BUCKET \
-e XLA_CLANG_CACHE_S3_BUCKET_NAME \
-e CUSTOM_TEST_ARTIFACT_BUILD_DIR \
-e SKIP_SCCACHE_INITIALIZATION=1 \
-e TORCH_CUDA_ARCH_LIST \
-e PR_LABELS \
-e http_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e https_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e no_proxy="localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" \
--env-file="/tmp/github_env_${GITHUB_RUN_ID}" \
--security-opt seccomp=unconfined \
--cap-add=SYS_PTRACE \
--tty \
--detach \
--user jenkins \
-v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \
-w /var/lib/jenkins/workspace \
"${DOCKER_IMAGE}"
)
docker exec -t "${container_name}" sh -c 'sudo chown -R jenkins . && .jenkins/pytorch/build.sh'
- name: Parse ref
id: parse-ref
run: .github/scripts/parse_ref.py
- name: Display and upload binary build size statistics (Click Me)
# temporary hack: set CIRCLE_* vars, until we update
# tools/stats/print_test_stats.py to natively support GitHub Actions
env:
AWS_DEFAULT_REGION: us-east-1
IS_GHA: 1
SCRIBE_GRAPHQL_ACCESS_TOKEN: ${{ secrets.SCRIBE_GRAPHQL_ACCESS_TOKEN }}
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
CIRCLE_TAG: ${{ steps.parse-ref.outputs.tag }}
CIRCLE_WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
run: |
COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0)
export COMMIT_TIME
pip3 install requests==2.26 boto3==1.16.34
python3 -m tools.stats.upload_binary_size_to_scuba || exit 0
- name: Chown workspace
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Hold runner for 2 hours or until ssh sessions have drained
# Always hold for active ssh sessions
if: always()
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Chown workspace
if: always()
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Kill containers, clean up images
if: always()
run: |
# ignore expansion of "docker ps -q" since it could be empty
# shellcheck disable=SC2046
docker stop $(docker ps -q) || true
# Prune all of the docker images
docker system prune -af
- name: Hold runner for 2 hours or until ssh sessions have drained
# Always hold for active ssh sessions
if: always()
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Clean up docker images
if: always()
run: |
# Prune all of the docker images
docker system prune -af
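
Several steps above wrap flaky network calls such as `docker pull` in a tiny inline retry helper: try once, then retry after 1s and again after 2s, letting the last attempt's exit status stand. The same helper extracted as a standalone sketch, with a placeholder command in place of the real pull:

```
#!/usr/bin/env bash

# Try once, then after 1s, then after 2s; the last attempt's exit
# status is what the caller sees.
retry () {
  "$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}

# Placeholder flaky command; the workflows use: retry docker pull "${ALPINE_IMAGE}"
retry curl -fsSL -o /dev/null https://example.com
```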


@ -0,0 +1,562 @@
# @generated DO NOT EDIT MANUALLY
# Template is at: .github/templates/linux_ci_workflow.yml.j2
# Generation script: .github/scripts/generate_ci_workflows.py
name: linux-bionic-cuda10.2-py3.9-gcc7
on:
pull_request:
types: [unassigned]
push:
branches:
- master
- release/*
workflow_dispatch:
env:
BUILD_ENVIRONMENT: linux-bionic-cuda10.2-py3.9-gcc7
DOCKER_IMAGE_BASE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-bionic-cuda10.2-cudnn7-py3.9-gcc7
SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2
XLA_CLANG_CACHE_S3_BUCKET_NAME: ossci-compiler-clang-cache-circleci-xla
TORCH_CUDA_ARCH_LIST: 5.2
IN_CI: 1
# Temporary flag for the phase of adding wheel tests; it will be removed once that work is completed
IN_WHEEL_TEST: 1
# Used for custom_operator, jit_hooks, custom_backend, see .jenkins/pytorch/build.sh
CUSTOM_TEST_ARTIFACT_BUILD_DIR: build/custom_test_artifacts
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
concurrency:
group: linux-bionic-cuda10.2-py3.9-gcc7-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }}
cancel-in-progress: true
jobs:
ciflow_should_run:
runs-on: ubuntu-18.04
if: ${{ (github.repository_owner == 'pytorch') && ((github.event_name != 'pull_request') || (github.event.assignee.login != 'pytorchbot' ) || (github.event.action !='unassigned') || (contains(github.event.pull_request.labels.*.name, 'ciflow/all') || contains(github.event.pull_request.labels.*.name, 'ciflow/cuda') || contains(github.event.pull_request.labels.*.name, 'ciflow/linux') || contains(github.event.pull_request.labels.*.name, 'ciflow/slow'))) }}
env:
LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
steps:
- name: noop
run: echo running ciflow_should_run
- name: print labels
run: echo "${LABELS}"
calculate-docker-image:
runs-on: linux.2xlarge
needs: [ciflow_should_run]
env:
DOCKER_BUILDKIT: 1
timeout-minutes: 90
outputs:
docker_image: ${{ steps.calculate-tag.outputs.docker_image }}
steps:
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Log in to ECR
env:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
# Ensure the working directory gets chowned back to the current user
retry docker pull "${ALPINE_IMAGE}"
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
rm -rf "${GITHUB_WORKSPACE:?}/*"
rm -f ~/.ssh/authorized_keys
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: false
- name: Calculate docker image tag
id: calculate-tag
run: |
DOCKER_TAG=$(git rev-parse HEAD:.circleci/docker)
echo "::set-output name=docker_tag::${DOCKER_TAG}"
echo "::set-output name=docker_image::${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"
- name: Check if image should be built
id: check
env:
DOCKER_TAG: ${{ steps.calculate-tag.outputs.docker_tag }}
BASE_REVISION: ${{ github.event.pull_request.base.sha || github.sha }}
run: |
set -x
# Check if image already exists, if it does then skip building it
if docker manifest inspect "${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"; then
exit 0
fi
if [[ "$BASE_REVISION" = "$(git rev-parse HEAD)" ]]; then
# if we're on the base branch then use the parent commit
MERGE_BASE=$(git rev-parse HEAD~)
else
# otherwise we're on a PR, so use the most recent base commit
MERGE_BASE=$(git merge-base HEAD "$BASE_REVISION")
fi
# Covers the case where a previous tag doesn't exist for the tree
# this is only really applicable on trees that don't have `.circleci/docker` at their merge base, i.e. nightly
if ! git rev-parse "$MERGE_BASE:.circleci/docker"; then
echo "Directory '.circleci/docker' not found in commit $MERGE_BASE, you should probably rebase onto a more recent commit"
exit 1
fi
PREVIOUS_DOCKER_TAG=$(git rev-parse "$MERGE_BASE:.circleci/docker")
# If no image exists but the hash is the same as the previous hash then we should error out here
if [[ "${PREVIOUS_DOCKER_TAG}" = "${DOCKER_TAG}" ]]; then
echo "ERROR: Something has gone wrong and the previous image isn't available for the merge-base of your branch"
echo " contact the PyTorch team to restore the original images"
exit 1
fi
echo ::set-output name=rebuild::yes
- name: Build and push docker image
if: ${{ steps.check.outputs.rebuild }}
env:
DOCKER_TAG: ${{ steps.calculate-tag.outputs.docker_tag }}
DOCKER_SKIP_S3_UPLOAD: 1
run: |
export IMAGE_NAME=${DOCKER_IMAGE_BASE#308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/}
cd .circleci/docker && ./build_docker.sh
build:
runs-on: linux.2xlarge
needs: [calculate-docker-image, ciflow_should_run]
env:
DOCKER_IMAGE: ${{ needs.calculate-docker-image.outputs.docker_image }}
JOB_BASE_NAME: linux-bionic-cuda10.2-py3.9-gcc7-build
steps:
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Log in to ECR
env:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
# Ensure the working directory gets chowned back to the current user
retry docker pull "${ALPINE_IMAGE}"
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
rm -rf "${GITHUB_WORKSPACE:?}/*"
rm -f ~/.ssh/authorized_keys
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: recursive
- name: Pull docker image
run: |
docker pull "${DOCKER_IMAGE}"
- name: Build
run: |
# detached container should get cleaned up by teardown_ec2_linux
container_name=$(docker run \
-e BUILD_ENVIRONMENT \
-e JOB_BASE_NAME \
-e MAX_JOBS="$(nproc --ignore=2)" \
-e SCCACHE_BUCKET \
-e XLA_CLANG_CACHE_S3_BUCKET_NAME \
-e CUSTOM_TEST_ARTIFACT_BUILD_DIR \
-e SKIP_SCCACHE_INITIALIZATION=1 \
-e TORCH_CUDA_ARCH_LIST \
-e PR_LABELS \
-e http_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e https_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e no_proxy="localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" \
--env-file="/tmp/github_env_${GITHUB_RUN_ID}" \
--security-opt seccomp=unconfined \
--cap-add=SYS_PTRACE \
--tty \
--detach \
--user jenkins \
-v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \
-w /var/lib/jenkins/workspace \
"${DOCKER_IMAGE}"
)
docker exec -t "${container_name}" sh -c 'sudo chown -R jenkins . && .jenkins/pytorch/build.sh'
- name: Parse ref
id: parse-ref
run: .github/scripts/parse_ref.py
- name: Display and upload binary build size statistics (Click Me)
# temporary hack: set CIRCLE_* vars, until we update
# tools/stats/print_test_stats.py to natively support GitHub Actions
env:
AWS_DEFAULT_REGION: us-east-1
IS_GHA: 1
SCRIBE_GRAPHQL_ACCESS_TOKEN: ${{ secrets.SCRIBE_GRAPHQL_ACCESS_TOKEN }}
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
CIRCLE_TAG: ${{ steps.parse-ref.outputs.tag }}
CIRCLE_WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
run: |
COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0)
export COMMIT_TIME
pip3 install requests==2.26 boto3==1.16.34
python3 -m tools.stats.upload_binary_size_to_scuba || exit 0
- name: Chown workspace
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Archive artifacts into zip
run: |
zip -1 -r artifacts.zip dist/ build/custom_test_artifacts build/lib build/bin .pytorch-test-times.json
- uses: seemethere/upload-artifact-s3@v3
name: Store PyTorch Build Artifacts on S3
with:
name: ${{ env.BUILD_ENVIRONMENT }}
retention-days: 14
if-no-files-found: error
path:
artifacts.zip
- name: Hold runner for 2 hours or until ssh sessions have drained
# Always hold for active ssh sessions
if: always()
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Chown workspace
if: always()
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Kill containers, clean up images
if: always()
run: |
# ignore expansion of "docker ps -q" since it could be empty
# shellcheck disable=SC2046
docker stop $(docker ps -q) || true
# Prune all of the docker images
docker system prune -af
- name: Hold runner for 2 hours or until ssh sessions have drained
# Always hold for active ssh sessions
if: always()
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Clean up docker images
if: always()
run: |
# Prune all of the docker images
docker system prune -af
generate-test-matrix:
runs-on: ubuntu-18.04
needs: [ciflow_should_run]
env:
TEST_RUNNER_TYPE: linux.8xlarge.nvidia.gpu
ENABLE_DISTRIBUTED_TEST: 1
ENABLE_JIT_LEGACY_TEST: ''
ENABLE_MULTIGPU_TEST: ''
ENABLE_NOGPU_NO_AVX_TEST: ''
ENABLE_NOGPU_NO_AVX2_TEST: ''
ENABLE_SLOW_TEST: ''
ENABLE_DOCS_TEST: ''
ENABLE_BACKWARDS_COMPAT_TEST: ''
ENABLE_XLA_TEST: ''
ENABLE_NOARCH_TEST: ''
NUM_TEST_SHARDS: 2
MULTIGPU_RUNNER_TYPE: linux.16xlarge.nvidia.gpu
NOGPU_RUNNER_TYPE: linux.2xlarge
PR_BODY: ${{ github.event.pull_request.body }}
outputs:
matrix: ${{ steps.set-matrix.outputs.matrix }}
render-matrix: ${{ steps.set-matrix.outputs.render-matrix }}
ignore-disabled-issues: ${{ steps.set-matrix.outputs.ignore-disabled-issues }}
container:
image: python:3.9
steps:
- name: Install dependencies
run: pip install typing-extensions==3.10
- name: Clone pytorch/pytorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
- name: Generating test matrix
id: set-matrix
run: .github/scripts/generate_pytorch_test_matrix.py
test:
needs: [calculate-docker-image, build, generate-test-matrix, ciflow_should_run]
strategy:
matrix: ${{ fromJson(needs.generate-test-matrix.outputs.matrix) }}
fail-fast: false
runs-on: ${{ matrix.runner }}
env:
DOCKER_IMAGE: ${{ needs.calculate-docker-image.outputs.docker_image }}
JOB_BASE_NAME: linux-bionic-cuda10.2-py3.9-gcc7-test
TEST_CONFIG: ${{ matrix.config }}
SHARD_NUMBER: ${{ matrix.shard }}
NUM_TEST_SHARDS: ${{ matrix.num_shards }}
PYTORCH_IGNORE_DISABLED_ISSUES: ${{ needs.generate-test-matrix.outputs.ignore-disabled-issues }}
CONTINUE_THROUGH_ERROR: ${{ github.repository == 'pytorch/pytorch' && (github.event_name == 'push' || github.event_name == 'schedule') }}
steps:
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Log in to ECR
env:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
# Ensure the working directory gets chowned back to the current user
retry docker pull "${ALPINE_IMAGE}"
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
rm -rf "${GITHUB_WORKSPACE:?}/*"
rm -f ~/.ssh/authorized_keys
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: recursive
- name: Pull docker image
run: |
docker pull "${DOCKER_IMAGE}"
- name: Install nvidia driver, nvidia-docker runtime, set GPU_FLAG
if: ${{ contains(env.BUILD_ENVIRONMENT, 'cuda') && !contains(matrix.config, 'nogpu') }}
run: |
bash .github/scripts/install_nvidia_utils_linux.sh
echo "GPU_FLAG=--gpus all" >> "${GITHUB_ENV}"
- name: Determine shm-size
run: |
shm_size="1g"
case "${BUILD_ENVIRONMENT}" in
*cuda*)
shm_size="2g"
;;
*rocm*)
shm_size="8g"
;;
esac
echo "SHM_SIZE=${shm_size}" >> "${GITHUB_ENV}"
- uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b
name: Download PyTorch Build Artifacts
with:
name: ${{ env.BUILD_ENVIRONMENT }}
- name: Unzip artifacts
run: |
unzip -o artifacts.zip
- name: Output disk space left
run: |
sudo df -H
- name: Parse ref
id: parse-ref
run: .github/scripts/parse_ref.py
- name: Test
env:
PR_NUMBER: ${{ github.event.pull_request.number }}
IS_GHA: 1
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
AWS_DEFAULT_REGION: us-east-1
run: |
if [[ $TEST_CONFIG == 'multigpu' ]]; then
TEST_COMMAND=.jenkins/pytorch/multigpu-test.sh
else
TEST_COMMAND=.jenkins/pytorch/test.sh
fi
if [[ $NUM_TEST_SHARDS -ne 2 ]]; then
export SHARD_NUMBER=0
fi
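# Assumption: SHARD_NUMBER=0 signals the test scripts to run the full,
# unsharded suite when the matrix isn't the expected 2-shard layout; see
# .jenkins/pytorch/test.sh for the authoritative behavior.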
# detached container should get cleaned up by teardown_ec2_linux
# TODO: Stop building test binaries as part of the build phase
# Used for GPU_FLAG since that doesn't play nice with quoting
# shellcheck disable=SC2086
container_name=$(docker run \
${GPU_FLAG:-} \
-e BUILD_ENVIRONMENT \
-e PR_NUMBER \
-e CUSTOM_TEST_ARTIFACT_BUILD_DIR \
-e GITHUB_ACTIONS \
-e IN_CI \
-e IS_GHA \
-e CIRCLE_BRANCH \
-e CIRCLE_SHA1 \
-e CIRCLE_PR_NUMBER \
-e AWS_DEFAULT_REGION \
-e IN_WHEEL_TEST \
-e SHARD_NUMBER \
-e JOB_BASE_NAME \
-e TEST_CONFIG \
-e NUM_TEST_SHARDS \
-e PYTORCH_IGNORE_DISABLED_ISSUES \
-e PR_LABELS \
-e CONTINUE_THROUGH_ERROR \
-e MAX_JOBS="$(nproc --ignore=2)" \
-e SCCACHE_BUCKET \
-e http_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e https_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e no_proxy="localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" \
-e XLA_CLANG_CACHE_S3_BUCKET_NAME \
--env-file="/tmp/github_env_${GITHUB_RUN_ID}" \
--security-opt seccomp=unconfined \
--cap-add=SYS_PTRACE \
--shm-size="${SHM_SIZE}" \
--tty \
--detach \
--name="${container_name}" \
--user jenkins \
-v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \
-w /var/lib/jenkins/workspace \
"${DOCKER_IMAGE}"
)
docker exec -t "${container_name}" sh -c "sudo chown -R jenkins . && pip install dist/*.whl && ${TEST_COMMAND}"
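# Pattern: the container is started detached above so the teardown logic
# (teardown_ec2_linux, per the comment above) can stop it even if this
# step fails; the tests themselves run via docker exec in that container.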
- name: Chown workspace
if: always()
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Install render_test_results dependencies
if: always()
shell: bash
run: |
python3 -m pip install junitparser==2.1.1 rich==10.9.0
- name: "[[ Click me for rendered test results (useful for finding failing tests) ]]"
if: always()
shell: bash
# Encoding is weird on Windows; just try to default to UTF-8 if possible
env:
PYTHONIOENCODING: "utf-8"
run: |
python3 tools/render_junit.py test/
- name: Zip test reports for upload
if: always()
env:
FILE_SUFFIX: '${{ github.job }}-${{ matrix.config }}-${{ matrix.shard }}-${{ matrix.num_shards }}-${{ matrix.runner }}'
run: |
# Remove any previous test reports if they exist
rm -f test-reports-*.zip
zip -r "test-reports-${FILE_SUFFIX}.zip" test -i '*.xml'
- uses: actions/upload-artifact@v2
name: Store Test Reports
if: always()
with:
name: test-reports-${{ matrix.config }}
retention-days: 14
if-no-files-found: error
path:
test-reports-*.zip
- uses: seemethere/upload-artifact-s3@v3
name: Store Test Reports on S3
if: always()
with:
retention-days: 14
if-no-files-found: error
path:
test-reports-*.zip
- name: Display and upload test statistics (Click Me)
if: always()
# temporary hack: set CIRCLE_* vars, until we update
# tools/stats/print_test_stats.py to natively support GitHub Actions
env:
AWS_DEFAULT_REGION: us-east-1
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
JOB_BASE_NAME: linux-bionic-cuda10.2-py3.9-gcc7-test
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
CIRCLE_TAG: ${{ steps.parse-ref.outputs.tag }}
CIRCLE_WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
shell: bash
run: |
python3 -m pip install -r requirements.txt
python3 -m pip install boto3==1.16.34
python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test
- name: Hold runner for 2 hours or until ssh sessions have drained
# Always hold for active ssh sessions
if: always()
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Chown workspace
if: always()
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Kill containers, clean up images
if: always()
run: |
# ignore expansion of "docker ps -q" since it could be empty
# shellcheck disable=SC2046
docker stop $(docker ps -q) || true
# Prune all of the docker images
docker system prune -af


@@ -0,0 +1,562 @@
# @generated DO NOT EDIT MANUALLY
# Template is at: .github/templates/linux_ci_workflow.yml.j2
# Generation script: .github/scripts/generate_ci_workflows.py
name: linux-bionic-py3.6-clang9
on:
pull_request:
types: [opened, synchronize, reopened, unassigned]
push:
branches:
- master
- release/*
workflow_dispatch:
env:
BUILD_ENVIRONMENT: linux-bionic-py3.6-clang9
DOCKER_IMAGE_BASE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-bionic-py3.6-clang9
SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2
XLA_CLANG_CACHE_S3_BUCKET_NAME: ossci-compiler-clang-cache-circleci-xla
TORCH_CUDA_ARCH_LIST: 5.2
IN_CI: 1
# This is used only during the phase of adding wheel tests; it will be removed once completed
IN_WHEEL_TEST: 1
# Used for custom_operator, jit_hooks, custom_backend, see .jenkins/pytorch/build.sh
CUSTOM_TEST_ARTIFACT_BUILD_DIR: build/custom_test_artifacts
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
concurrency:
group: linux-bionic-py3.6-clang9-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }}
cancel-in-progress: true
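# The concurrency group keys on PR number (or commit SHA for pushes), so a
# new push to the same PR cancels the in-flight run; the trailing
# workflow_dispatch boolean keeps manual runs in their own group.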
jobs:
ciflow_should_run:
runs-on: ubuntu-18.04
if: ${{ (github.repository == 'pytorch/pytorch') && ((github.event_name != 'pull_request') || (github.event.assignee.login != 'pytorchbot' ) || (github.event.action !='unassigned') || (contains(github.event.pull_request.labels.*.name, 'ciflow/all') || contains(github.event.pull_request.labels.*.name, 'ciflow/cpu') || contains(github.event.pull_request.labels.*.name, 'ciflow/default') || contains(github.event.pull_request.labels.*.name, 'ciflow/linux') || contains(github.event.pull_request.labels.*.name, 'ciflow/noarch') || contains(github.event.pull_request.labels.*.name, 'ciflow/xla'))) }}
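# Gating logic: outside pytorch/pytorch this job (and everything that
# `needs` it) is skipped; for pull requests it is skipped only on
# pytorchbot-driven "unassigned" events that carry none of the listed
# ciflow/* labels. Pushes and manual dispatches always pass.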
env:
LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
steps:
- name: noop
run: echo running ciflow_should_run
- name: print labels
run: echo "${LABELS}"
calculate-docker-image:
runs-on: linux.2xlarge
needs: [ciflow_should_run]
env:
DOCKER_BUILDKIT: 1
timeout-minutes: 90
outputs:
docker_image: ${{ steps.calculate-tag.outputs.docker_image }}
steps:
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Log in to ECR
env:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
# Ensure the working directory gets chowned back to the current user
retry docker pull "${ALPINE_IMAGE}"
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
rm -rf "${GITHUB_WORKSPACE:?}/*"
rm -f ~/.ssh/authorized_keys
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: false
- name: Calculate docker image tag
id: calculate-tag
run: |
DOCKER_TAG=$(git rev-parse HEAD:.circleci/docker)
echo "::set-output name=docker_tag::${DOCKER_TAG}"
echo "::set-output name=docker_image::${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"
- name: Check if image should be built
id: check
env:
DOCKER_TAG: ${{ steps.calculate-tag.outputs.docker_tag }}
BASE_REVISION: ${{ github.event.pull_request.base.sha || github.sha }}
run: |
set -x
# Check if image already exists, if it does then skip building it
if docker manifest inspect "${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"; then
exit 0
fi
if [[ "$BASE_REVISION" = "$(git rev-parse HEAD)" ]]; then
# if we're on the base branch then use the parent commit
MERGE_BASE=$(git rev-parse HEAD~)
else
# otherwise we're on a PR, so use the most recent base commit
MERGE_BASE=$(git merge-base HEAD "$BASE_REVISION")
fi
# Covers the case where a previous tag doesn't exist for the tree
# this is only really applicable to trees that don't have `.circleci/docker` at their merge base, i.e. nightly
if ! git rev-parse "$MERGE_BASE:.circleci/docker"; then
echo "Directory '.circleci/docker' not found in commit $MERGE_BASE, you should probably rebase onto a more recent commit"
exit 1
fi
PREVIOUS_DOCKER_TAG=$(git rev-parse "$MERGE_BASE:.circleci/docker")
# If no image exists but the hash is the same as the previous hash then we should error out here
if [[ "${PREVIOUS_DOCKER_TAG}" = "${DOCKER_TAG}" ]]; then
echo "ERROR: Something has gone wrong and the previous image isn't available for the merge-base of your branch"
echo " contact the PyTorch team to restore the original images"
exit 1
fi
echo ::set-output name=rebuild::yes
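# Summary of the checks above: skip the rebuild if the image already
# exists in ECR; hard-fail if the tag is unchanged from the merge-base
# yet no image exists (the image was lost); otherwise emit rebuild=yes.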
- name: Build and push docker image
if: ${{ steps.check.outputs.rebuild }}
env:
DOCKER_TAG: ${{ steps.calculate-tag.outputs.docker_tag }}
DOCKER_SKIP_S3_UPLOAD: 1
run: |
export IMAGE_NAME=${DOCKER_IMAGE_BASE#308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/}
cd .circleci/docker && ./build_docker.sh
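# `${DOCKER_IMAGE_BASE#<prefix>}` is bash prefix stripping; with the value
# set in this workflow it yields IMAGE_NAME=pytorch-linux-bionic-py3.6-clang9.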
build:
runs-on: linux.2xlarge
needs: [calculate-docker-image, ciflow_should_run]
env:
DOCKER_IMAGE: ${{ needs.calculate-docker-image.outputs.docker_image }}
JOB_BASE_NAME: linux-bionic-py3.6-clang9-build
steps:
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Log in to ECR
env:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
# Ensure the working directory gets chowned back to the current user
retry docker pull "${ALPINE_IMAGE}"
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
rm -rf "${GITHUB_WORKSPACE:?}/*"
rm -f ~/.ssh/authorized_keys
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: recursive
- name: Pull docker image
run: |
docker pull "${DOCKER_IMAGE}"
- name: Build
run: |
# detached container should get cleaned up by teardown_ec2_linux
container_name=$(docker run \
-e BUILD_ENVIRONMENT \
-e JOB_BASE_NAME \
-e MAX_JOBS="$(nproc --ignore=2)" \
-e SCCACHE_BUCKET \
-e XLA_CLANG_CACHE_S3_BUCKET_NAME \
-e CUSTOM_TEST_ARTIFACT_BUILD_DIR \
-e SKIP_SCCACHE_INITIALIZATION=1 \
-e TORCH_CUDA_ARCH_LIST \
-e PR_LABELS \
-e http_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e https_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e no_proxy="localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" \
--env-file="/tmp/github_env_${GITHUB_RUN_ID}" \
--security-opt seccomp=unconfined \
--cap-add=SYS_PTRACE \
--tty \
--detach \
--user jenkins \
-v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \
-w /var/lib/jenkins/workspace \
"${DOCKER_IMAGE}"
)
docker exec -t "${container_name}" sh -c 'sudo chown -R jenkins . && .jenkins/pytorch/build.sh'
- name: Parse ref
id: parse-ref
run: .github/scripts/parse_ref.py
- name: Display and upload binary build size statistics (Click Me)
# temporary hack: set CIRCLE_* vars, until we update
# tools/stats/print_test_stats.py to natively support GitHub Actions
env:
AWS_DEFAULT_REGION: us-east-1
IS_GHA: 1
SCRIBE_GRAPHQL_ACCESS_TOKEN: ${{ secrets.SCRIBE_GRAPHQL_ACCESS_TOKEN }}
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
CIRCLE_TAG: ${{ steps.parse-ref.outputs.tag }}
CIRCLE_WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
run: |
COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0)
export COMMIT_TIME
pip3 install requests==2.26 boto3==1.16.34
python3 -m tools.stats.upload_binary_size_to_scuba || exit 0
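# The trailing `|| exit 0` makes this upload best-effort: a failure in the
# size-stats script never fails the build job itself.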
- name: Chown workspace
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Archive artifacts into zip
run: |
zip -1 -r artifacts.zip dist/ build/custom_test_artifacts build/lib build/bin .pytorch-test-times.json
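# zip -1 picks the fastest compression level; the archive carries the
# built wheels (dist/), custom test artifacts, build libs/binaries and
# .pytorch-test-times.json for the downstream test job to unzip.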
- uses: seemethere/upload-artifact-s3@v3
name: Store PyTorch Build Artifacts on S3
with:
name: ${{ env.BUILD_ENVIRONMENT }}
retention-days: 14
if-no-files-found: error
path:
artifacts.zip
- name: Hold runner for 2 hours or until ssh sessions have drained
# Always hold for active ssh sessions
if: always()
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Chown workspace
if: always()
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Kill containers, clean up images
if: always()
run: |
# ignore expansion of "docker ps -q" since it could be empty
# shellcheck disable=SC2046
docker stop $(docker ps -q) || true
# Prune all of the docker images
docker system prune -af
- name: Hold runner for 2 hours or until ssh sessions have drained
# Always hold for active ssh sessions
if: always()
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Clean up docker images
if: always()
run: |
# Prune all of the docker images
docker system prune -af
generate-test-matrix:
runs-on: ubuntu-18.04
needs: [ciflow_should_run]
env:
TEST_RUNNER_TYPE: linux.2xlarge
ENABLE_DISTRIBUTED_TEST: ''
ENABLE_JIT_LEGACY_TEST: ''
ENABLE_MULTIGPU_TEST: ''
ENABLE_NOGPU_NO_AVX_TEST: ''
ENABLE_NOGPU_NO_AVX2_TEST: ''
ENABLE_SLOW_TEST: ''
ENABLE_DOCS_TEST: ''
ENABLE_BACKWARDS_COMPAT_TEST: ''
ENABLE_XLA_TEST: ''
ENABLE_NOARCH_TEST: 1
NUM_TEST_SHARDS: 2
MULTIGPU_RUNNER_TYPE: linux.16xlarge.nvidia.gpu
NOGPU_RUNNER_TYPE: linux.2xlarge
PR_BODY: ${{ github.event.pull_request.body }}
outputs:
matrix: ${{ steps.set-matrix.outputs.matrix }}
render-matrix: ${{ steps.set-matrix.outputs.render-matrix }}
ignore-disabled-issues: ${{ steps.set-matrix.outputs.ignore-disabled-issues }}
container:
image: python:3.9
steps:
- name: Install dependencies
run: pip install typing-extensions==3.10
- name: Clone pytorch/pytorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
- name: Generating test matrix
id: set-matrix
run: .github/scripts/generate_pytorch_test_matrix.py
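# The generated matrix is consumed below via fromJson(); a purely
# illustrative (not authoritative) shape would be:
#   {"include": [{"config": "default", "shard": 1, "num_shards": 2, "runner": "linux.2xlarge"}]}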
test:
needs: [calculate-docker-image, build, generate-test-matrix, ciflow_should_run]
strategy:
matrix: ${{ fromJson(needs.generate-test-matrix.outputs.matrix) }}
fail-fast: false
runs-on: ${{ matrix.runner }}
env:
DOCKER_IMAGE: ${{ needs.calculate-docker-image.outputs.docker_image }}
JOB_BASE_NAME: linux-bionic-py3.6-clang9-test
TEST_CONFIG: ${{ matrix.config }}
SHARD_NUMBER: ${{ matrix.shard }}
NUM_TEST_SHARDS: ${{ matrix.num_shards }}
PYTORCH_IGNORE_DISABLED_ISSUES: ${{ needs.generate-test-matrix.outputs.ignore-disabled-issues }}
CONTINUE_THROUGH_ERROR: ${{ github.repository == 'pytorch/pytorch' && (github.event_name == 'push' || github.event_name == 'schedule') }}
steps:
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Log in to ECR
env:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
# Ensure the working directory gets chowned back to the current user
retry docker pull "${ALPINE_IMAGE}"
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
rm -rf "${GITHUB_WORKSPACE:?}/*"
rm -f ~/.ssh/authorized_keys
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: recursive
- name: Pull docker image
run: |
docker pull "${DOCKER_IMAGE}"
- name: Install nvidia driver, nvidia-docker runtime, set GPU_FLAG
if: ${{ contains(env.BUILD_ENVIRONMENT, 'cuda') && !contains(matrix.config, 'nogpu') }}
run: |
bash .github/scripts/install_nvidia_utils_linux.sh
echo "GPU_FLAG=--gpus all" >> "${GITHUB_ENV}"
- name: Determine shm-size
run: |
shm_size="1g"
case "${BUILD_ENVIRONMENT}" in
*cuda*)
shm_size="2g"
;;
*rocm*)
shm_size="8g"
;;
esac
echo "SHM_SIZE=${shm_size}" >> "${GITHUB_ENV}"
- uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b
name: Download PyTorch Build Artifacts
with:
name: ${{ env.BUILD_ENVIRONMENT }}
- name: Unzip artifacts
run: |
unzip -o artifacts.zip
- name: Output disk space left
run: |
sudo df -H
- name: Parse ref
id: parse-ref
run: .github/scripts/parse_ref.py
- name: Test
env:
PR_NUMBER: ${{ github.event.pull_request.number }}
IS_GHA: 1
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
AWS_DEFAULT_REGION: us-east-1
run: |
if [[ $TEST_CONFIG == 'multigpu' ]]; then
TEST_COMMAND=.jenkins/pytorch/multigpu-test.sh
else
TEST_COMMAND=.jenkins/pytorch/test.sh
fi
if [[ $NUM_TEST_SHARDS -ne 2 ]]; then
export SHARD_NUMBER=0
fi
# detached container should get cleaned up by teardown_ec2_linux
# TODO: Stop building test binaries as part of the build phase
# Used for GPU_FLAG since that doesn't play nice with quoting
# shellcheck disable=SC2086
container_name=$(docker run \
${GPU_FLAG:-} \
-e BUILD_ENVIRONMENT \
-e PR_NUMBER \
-e CUSTOM_TEST_ARTIFACT_BUILD_DIR \
-e GITHUB_ACTIONS \
-e IN_CI \
-e IS_GHA \
-e CIRCLE_BRANCH \
-e CIRCLE_SHA1 \
-e CIRCLE_PR_NUMBER \
-e AWS_DEFAULT_REGION \
-e IN_WHEEL_TEST \
-e SHARD_NUMBER \
-e JOB_BASE_NAME \
-e TEST_CONFIG \
-e NUM_TEST_SHARDS \
-e PYTORCH_IGNORE_DISABLED_ISSUES \
-e PR_LABELS \
-e CONTINUE_THROUGH_ERROR \
-e MAX_JOBS="$(nproc --ignore=2)" \
-e SCCACHE_BUCKET \
-e http_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e https_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e no_proxy="localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" \
-e XLA_CLANG_CACHE_S3_BUCKET_NAME \
--env-file="/tmp/github_env_${GITHUB_RUN_ID}" \
--security-opt seccomp=unconfined \
--cap-add=SYS_PTRACE \
--shm-size="${SHM_SIZE}" \
--tty \
--detach \
--name="${container_name}" \
--user jenkins \
-v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \
-w /var/lib/jenkins/workspace \
"${DOCKER_IMAGE}"
)
docker exec -t "${container_name}" sh -c "sudo chown -R jenkins . && pip install dist/*.whl && ${TEST_COMMAND}"
- name: Chown workspace
if: always()
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Install render_test_results dependencies
if: always()
shell: bash
run: |
python3 -m pip install junitparser==2.1.1 rich==10.9.0
- name: "[[ Click me for rendered test results (useful for finding failing tests) ]]"
if: always()
shell: bash
# Encoding is weird on Windows; just try to default to UTF-8 if possible
env:
PYTHONIOENCODING: "utf-8"
run: |
python3 tools/render_junit.py test/
- name: Zip test reports for upload
if: always()
env:
FILE_SUFFIX: '${{ github.job }}-${{ matrix.config }}-${{ matrix.shard }}-${{ matrix.num_shards }}-${{ matrix.runner }}'
run: |
# Remove any previous test reports if they exist
rm -f test-reports-*.zip
zip -r "test-reports-${FILE_SUFFIX}.zip" test -i '*.xml'
- uses: actions/upload-artifact@v2
name: Store Test Reports
if: always()
with:
name: test-reports-${{ matrix.config }}
retention-days: 14
if-no-files-found: error
path:
test-reports-*.zip
- uses: seemethere/upload-artifact-s3@v3
name: Store Test Reports on S3
if: always()
with:
retention-days: 14
if-no-files-found: error
path:
test-reports-*.zip
- name: Display and upload test statistics (Click Me)
if: always()
# temporary hack: set CIRCLE_* vars, until we update
# tools/stats/print_test_stats.py to natively support GitHub Actions
env:
AWS_DEFAULT_REGION: us-east-1
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
JOB_BASE_NAME: linux-bionic-py3.6-clang9-test
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
CIRCLE_TAG: ${{ steps.parse-ref.outputs.tag }}
CIRCLE_WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
shell: bash
run: |
python3 -m pip install -r requirements.txt
python3 -m pip install boto3==1.16.34
python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test
- name: Hold runner for 2 hours or until ssh sessions have drained
# Always hold for active ssh sessions
if: always()
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Chown workspace
if: always()
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Kill containers, clean up images
if: always()
run: |
# ignore expansion of "docker ps -q" since it could be empty
# shellcheck disable=SC2046
docker stop $(docker ps -q) || true
# Prune all of the docker images
docker system prune -af


@@ -0,0 +1,566 @@
# @generated DO NOT EDIT MANUALLY
# Template is at: .github/templates/linux_ci_workflow.yml.j2
# Generation script: .github/scripts/generate_ci_workflows.py
name: linux-bionic-py3.8-gcc9-coverage
on:
pull_request:
types: [opened, synchronize, reopened, unassigned]
push:
branches:
- master
- release/*
workflow_dispatch:
env:
BUILD_ENVIRONMENT: linux-bionic-py3.8-gcc9-coverage
DOCKER_IMAGE_BASE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-bionic-py3.8-gcc9
SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2
XLA_CLANG_CACHE_S3_BUCKET_NAME: ossci-compiler-clang-cache-circleci-xla
TORCH_CUDA_ARCH_LIST: 5.2
IN_CI: 1
# This is used only during the phase of adding wheel tests; it will be removed once completed
IN_WHEEL_TEST: 1
# Used for custom_operator, jit_hooks, custom_backend, see .jenkins/pytorch/build.sh
CUSTOM_TEST_ARTIFACT_BUILD_DIR: build/custom_test_artifacts
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
concurrency:
group: linux-bionic-py3.8-gcc9-coverage-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }}
cancel-in-progress: true
jobs:
ciflow_should_run:
runs-on: ubuntu-18.04
if: ${{ (github.repository == 'pytorch/pytorch') && ((github.event_name != 'pull_request') || (github.event.assignee.login != 'pytorchbot' ) || (github.event.action !='unassigned') || (contains(github.event.pull_request.labels.*.name, 'ciflow/all') || contains(github.event.pull_request.labels.*.name, 'ciflow/coverage') || contains(github.event.pull_request.labels.*.name, 'ciflow/cpu') || contains(github.event.pull_request.labels.*.name, 'ciflow/default') || contains(github.event.pull_request.labels.*.name, 'ciflow/linux'))) }}
env:
LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
steps:
- name: noop
run: echo running ciflow_should_run
- name: print labels
run: echo "${LABELS}"
calculate-docker-image:
runs-on: linux.2xlarge
needs: [ciflow_should_run]
env:
DOCKER_BUILDKIT: 1
timeout-minutes: 90
outputs:
docker_image: ${{ steps.calculate-tag.outputs.docker_image }}
steps:
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Log in to ECR
env:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
# Ensure the working directory gets chowned back to the current user
retry docker pull "${ALPINE_IMAGE}"
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
rm -rf "${GITHUB_WORKSPACE:?}/*"
rm -f ~/.ssh/authorized_keys
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: false
- name: Calculate docker image tag
id: calculate-tag
run: |
DOCKER_TAG=$(git rev-parse HEAD:.circleci/docker)
echo "::set-output name=docker_tag::${DOCKER_TAG}"
echo "::set-output name=docker_image::${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"
- name: Check if image should be built
id: check
env:
DOCKER_TAG: ${{ steps.calculate-tag.outputs.docker_tag }}
BASE_REVISION: ${{ github.event.pull_request.base.sha || github.sha }}
run: |
set -x
# Check if image already exists, if it does then skip building it
if docker manifest inspect "${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"; then
exit 0
fi
if [[ "$BASE_REVISION" = "$(git rev-parse HEAD)" ]]; then
# if we're on the base branch then use the parent commit
MERGE_BASE=$(git rev-parse HEAD~)
else
# otherwise we're on a PR, so use the most recent base commit
MERGE_BASE=$(git merge-base HEAD "$BASE_REVISION")
fi
# Covers the case where a previous tag doesn't exist for the tree
# this is only really applicable to trees that don't have `.circleci/docker` at their merge base, i.e. nightly
if ! git rev-parse "$MERGE_BASE:.circleci/docker"; then
echo "Directory '.circleci/docker' not found in commit $MERGE_BASE, you should probably rebase onto a more recent commit"
exit 1
fi
PREVIOUS_DOCKER_TAG=$(git rev-parse "$MERGE_BASE:.circleci/docker")
# If no image exists but the hash is the same as the previous hash then we should error out here
if [[ "${PREVIOUS_DOCKER_TAG}" = "${DOCKER_TAG}" ]]; then
echo "ERROR: Something has gone wrong and the previous image isn't available for the merge-base of your branch"
echo " contact the PyTorch team to restore the original images"
exit 1
fi
echo ::set-output name=rebuild::yes
- name: Build and push docker image
if: ${{ steps.check.outputs.rebuild }}
env:
DOCKER_TAG: ${{ steps.calculate-tag.outputs.docker_tag }}
DOCKER_SKIP_S3_UPLOAD: 1
run: |
export IMAGE_NAME=${DOCKER_IMAGE_BASE#308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/}
cd .circleci/docker && ./build_docker.sh
build:
runs-on: linux.2xlarge
needs: [calculate-docker-image, ciflow_should_run]
env:
DOCKER_IMAGE: ${{ needs.calculate-docker-image.outputs.docker_image }}
JOB_BASE_NAME: linux-bionic-py3.8-gcc9-coverage-build
steps:
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Log in to ECR
env:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
# Ensure the working directory gets chowned back to the current user
retry docker pull "${ALPINE_IMAGE}"
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
rm -rf "${GITHUB_WORKSPACE:?}/*"
rm -f ~/.ssh/authorized_keys
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: recursive
- name: Pull docker image
run: |
docker pull "${DOCKER_IMAGE}"
- name: Build
run: |
# detached container should get cleaned up by teardown_ec2_linux
container_name=$(docker run \
-e BUILD_ENVIRONMENT \
-e JOB_BASE_NAME \
-e MAX_JOBS="$(nproc --ignore=2)" \
-e SCCACHE_BUCKET \
-e XLA_CLANG_CACHE_S3_BUCKET_NAME \
-e CUSTOM_TEST_ARTIFACT_BUILD_DIR \
-e SKIP_SCCACHE_INITIALIZATION=1 \
-e TORCH_CUDA_ARCH_LIST \
-e PR_LABELS \
-e http_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e https_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e no_proxy="localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" \
--env-file="/tmp/github_env_${GITHUB_RUN_ID}" \
--security-opt seccomp=unconfined \
--cap-add=SYS_PTRACE \
--tty \
--detach \
--user jenkins \
-v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \
-w /var/lib/jenkins/workspace \
"${DOCKER_IMAGE}"
)
docker exec -t "${container_name}" sh -c 'sudo chown -R jenkins . && .jenkins/pytorch/build.sh'
- name: Parse ref
id: parse-ref
run: .github/scripts/parse_ref.py
- name: Display and upload binary build size statistics (Click Me)
# temporary hack: set CIRCLE_* vars, until we update
# tools/stats/print_test_stats.py to natively support GitHub Actions
env:
AWS_DEFAULT_REGION: us-east-1
IS_GHA: 1
SCRIBE_GRAPHQL_ACCESS_TOKEN: ${{ secrets.SCRIBE_GRAPHQL_ACCESS_TOKEN }}
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
CIRCLE_TAG: ${{ steps.parse-ref.outputs.tag }}
CIRCLE_WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
run: |
COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0)
export COMMIT_TIME
pip3 install requests==2.26 boto3==1.16.34
python3 -m tools.stats.upload_binary_size_to_scuba || exit 0
- name: Chown workspace
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Archive artifacts into zip
run: |
zip -1 -r artifacts.zip dist/ build/custom_test_artifacts build/lib build/bin .pytorch-test-times.json
- uses: seemethere/upload-artifact-s3@v3
name: Store PyTorch Build Artifacts on S3
with:
name: ${{ env.BUILD_ENVIRONMENT }}
retention-days: 14
if-no-files-found: error
path:
artifacts.zip
- name: Hold runner for 2 hours or until ssh sessions have drained
# Always hold for active ssh sessions
if: always()
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Chown workspace
if: always()
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Kill containers, clean up images
if: always()
run: |
# ignore expansion of "docker ps -q" since it could be empty
# shellcheck disable=SC2046
docker stop $(docker ps -q) || true
# Prune all of the docker images
docker system prune -af
- name: Hold runner for 2 hours or until ssh sessions have drained
# Always hold for active ssh sessions
if: always()
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Clean up docker images
if: always()
run: |
# Prune all of the docker images
docker system prune -af
generate-test-matrix:
runs-on: ubuntu-18.04
needs: [ciflow_should_run]
env:
TEST_RUNNER_TYPE: linux.2xlarge
ENABLE_DISTRIBUTED_TEST: 1
ENABLE_JIT_LEGACY_TEST: ''
ENABLE_MULTIGPU_TEST: ''
ENABLE_NOGPU_NO_AVX_TEST: ''
ENABLE_NOGPU_NO_AVX2_TEST: ''
ENABLE_SLOW_TEST: ''
ENABLE_DOCS_TEST: ''
ENABLE_BACKWARDS_COMPAT_TEST: ''
ENABLE_XLA_TEST: ''
ENABLE_NOARCH_TEST: ''
NUM_TEST_SHARDS: 2
MULTIGPU_RUNNER_TYPE: linux.16xlarge.nvidia.gpu
NOGPU_RUNNER_TYPE: linux.2xlarge
PR_BODY: ${{ github.event.pull_request.body }}
outputs:
matrix: ${{ steps.set-matrix.outputs.matrix }}
render-matrix: ${{ steps.set-matrix.outputs.render-matrix }}
ignore-disabled-issues: ${{ steps.set-matrix.outputs.ignore-disabled-issues }}
container:
image: python:3.9
steps:
- name: Install dependencies
run: pip install typing-extensions==3.10
- name: Clone pytorch/pytorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
- name: Generating test matrix
id: set-matrix
run: .github/scripts/generate_pytorch_test_matrix.py
test:
needs: [calculate-docker-image, build, generate-test-matrix, ciflow_should_run]
strategy:
matrix: ${{ fromJson(needs.generate-test-matrix.outputs.matrix) }}
fail-fast: false
runs-on: ${{ matrix.runner }}
env:
DOCKER_IMAGE: ${{ needs.calculate-docker-image.outputs.docker_image }}
JOB_BASE_NAME: linux-bionic-py3.8-gcc9-coverage-test
TEST_CONFIG: ${{ matrix.config }}
SHARD_NUMBER: ${{ matrix.shard }}
NUM_TEST_SHARDS: ${{ matrix.num_shards }}
PYTORCH_IGNORE_DISABLED_ISSUES: ${{ needs.generate-test-matrix.outputs.ignore-disabled-issues }}
CONTINUE_THROUGH_ERROR: ${{ github.repository == 'pytorch/pytorch' && (github.event_name == 'push' || github.event_name == 'schedule') }}
steps:
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Log in to ECR
env:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
# Ensure the working directory gets chowned back to the current user
retry docker pull "${ALPINE_IMAGE}"
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
rm -rf "${GITHUB_WORKSPACE:?}/*"
rm -f ~/.ssh/authorized_keys
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: recursive
- name: Pull docker image
run: |
docker pull "${DOCKER_IMAGE}"
- name: Install nvidia driver, nvidia-docker runtime, set GPU_FLAG
if: ${{ contains(env.BUILD_ENVIRONMENT, 'cuda') && !contains(matrix.config, 'nogpu') }}
run: |
bash .github/scripts/install_nvidia_utils_linux.sh
echo "GPU_FLAG=--gpus all" >> "${GITHUB_ENV}"
- name: Determine shm-size
run: |
shm_size="1g"
case "${BUILD_ENVIRONMENT}" in
*cuda*)
shm_size="2g"
;;
*rocm*)
shm_size="8g"
;;
esac
echo "SHM_SIZE=${shm_size}" >> "${GITHUB_ENV}"
- uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b
name: Download PyTorch Build Artifacts
with:
name: ${{ env.BUILD_ENVIRONMENT }}
- name: Unzip artifacts
run: |
unzip -o artifacts.zip
- name: Output disk space left
run: |
sudo df -H
- name: Parse ref
id: parse-ref
run: .github/scripts/parse_ref.py
- name: Test
env:
PR_NUMBER: ${{ github.event.pull_request.number }}
IS_GHA: 1
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
AWS_DEFAULT_REGION: us-east-1
run: |
if [[ $TEST_CONFIG == 'multigpu' ]]; then
TEST_COMMAND=.jenkins/pytorch/multigpu-test.sh
else
TEST_COMMAND=.jenkins/pytorch/test.sh
fi
if [[ $NUM_TEST_SHARDS -ne 2 ]]; then
export SHARD_NUMBER=0
fi
# detached container should get cleaned up by teardown_ec2_linux
# TODO: Stop building test binaries as part of the build phase
# Used for GPU_FLAG since that doesn't play nice with quoting
# shellcheck disable=SC2086
container_name=$(docker run \
${GPU_FLAG:-} \
-e BUILD_ENVIRONMENT \
-e PR_NUMBER \
-e CUSTOM_TEST_ARTIFACT_BUILD_DIR \
-e GITHUB_ACTIONS \
-e IN_CI \
-e IS_GHA \
-e CIRCLE_BRANCH \
-e CIRCLE_SHA1 \
-e CIRCLE_PR_NUMBER \
-e AWS_DEFAULT_REGION \
-e IN_WHEEL_TEST \
-e SHARD_NUMBER \
-e JOB_BASE_NAME \
-e TEST_CONFIG \
-e NUM_TEST_SHARDS \
-e PYTORCH_IGNORE_DISABLED_ISSUES \
-e PR_LABELS \
-e CONTINUE_THROUGH_ERROR \
-e MAX_JOBS="$(nproc --ignore=2)" \
-e SCCACHE_BUCKET \
-e http_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e https_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e no_proxy="localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" \
-e XLA_CLANG_CACHE_S3_BUCKET_NAME \
--env-file="/tmp/github_env_${GITHUB_RUN_ID}" \
--security-opt seccomp=unconfined \
--cap-add=SYS_PTRACE \
--shm-size="${SHM_SIZE}" \
--tty \
--detach \
--name="${container_name}" \
--user jenkins \
-v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \
-w /var/lib/jenkins/workspace \
"${DOCKER_IMAGE}"
)
docker exec -t "${container_name}" sh -c "sudo chown -R jenkins . && pip install dist/*.whl && ${TEST_COMMAND}"
- name: Chown workspace
if: always()
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Install render_test_results dependencies
if: always()
shell: bash
run: |
python3 -m pip install junitparser==2.1.1 rich==10.9.0
- name: "[[ Click me for rendered test results (useful for finding failing tests) ]]"
if: always()
shell: bash
# Encoding is weird on Windows; just try to default to UTF-8 if possible
env:
PYTHONIOENCODING: "utf-8"
run: |
python3 tools/render_junit.py test/
- name: Report coverage
run: |
python3 -m pip install codecov==2.1.12
python3 -m codecov
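# Note: unlike its neighbors this step has no `if: always()`, so coverage
# is only reported when the test step succeeds; codecov auto-detects the
# coverage reports produced by the run (an assumption, not verified here).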
- name: Zip test reports for upload
if: always()
env:
FILE_SUFFIX: '${{ github.job }}-${{ matrix.config }}-${{ matrix.shard }}-${{ matrix.num_shards }}-${{ matrix.runner }}'
run: |
# Remove any previous test reports if they exist
rm -f test-reports-*.zip
zip -r "test-reports-${FILE_SUFFIX}.zip" test -i '*.xml'
- uses: actions/upload-artifact@v2
name: Store Test Reports
if: always()
with:
name: test-reports-${{ matrix.config }}
retention-days: 14
if-no-files-found: error
path:
test-reports-*.zip
- uses: seemethere/upload-artifact-s3@v3
name: Store Test Reports on S3
if: always()
with:
retention-days: 14
if-no-files-found: error
path:
test-reports-*.zip
- name: Display and upload test statistics (Click Me)
if: always()
# temporary hack: set CIRCLE_* vars, until we update
# tools/stats/print_test_stats.py to natively support GitHub Actions
env:
AWS_DEFAULT_REGION: us-east-1
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
JOB_BASE_NAME: linux-bionic-py3.8-gcc9-coverage-test
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
CIRCLE_TAG: ${{ steps.parse-ref.outputs.tag }}
CIRCLE_WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
shell: bash
run: |
python3 -m pip install -r requirements.txt
python3 -m pip install boto3==1.16.34
python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test
- name: Hold runner for 2 hours or until ssh sessions have drained
# Always hold for active ssh sessions
if: always()
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Chown workspace
if: always()
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Kill containers, clean up images
if: always()
run: |
# ignore expansion of "docker ps -q" since it could be empty
# shellcheck disable=SC2046
docker stop $(docker ps -q) || true
# Prune all of the docker images
docker system prune -af


@@ -0,0 +1,562 @@
# @generated DO NOT EDIT MANUALLY
# Template is at: .github/templates/linux_ci_workflow.yml.j2
# Generation script: .github/scripts/generate_ci_workflows.py
name: linux-xenial-cuda10.2-py3.6-gcc7
on:
pull_request:
types: [unassigned]
push:
branches:
- master
- release/*
workflow_dispatch:
env:
BUILD_ENVIRONMENT: linux-xenial-cuda10.2-py3.6-gcc7
DOCKER_IMAGE_BASE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7
SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2
XLA_CLANG_CACHE_S3_BUCKET_NAME: ossci-compiler-clang-cache-circleci-xla
TORCH_CUDA_ARCH_LIST: 5.2
IN_CI: 1
# This is used only during the phase of adding wheel tests; it will be removed once completed
IN_WHEEL_TEST: 1
# Used for custom_operator, jit_hooks, custom_backend, see .jenkins/pytorch/build.sh
CUSTOM_TEST_ARTIFACT_BUILD_DIR: build/custom_test_artifacts
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
concurrency:
group: linux-xenial-cuda10.2-py3.6-gcc7-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }}
cancel-in-progress: true
jobs:
ciflow_should_run:
runs-on: ubuntu-18.04
if: ${{ (github.repository == 'pytorch/pytorch') && ((github.event_name != 'pull_request') || (github.event.assignee.login != 'pytorchbot' ) || (github.event.action !='unassigned') || (contains(github.event.pull_request.labels.*.name, 'ciflow/all') || contains(github.event.pull_request.labels.*.name, 'ciflow/cuda') || contains(github.event.pull_request.labels.*.name, 'ciflow/linux') || contains(github.event.pull_request.labels.*.name, 'ciflow/slow'))) }}
env:
LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
steps:
- name: noop
run: echo running ciflow_should_run
- name: print labels
run: echo "${LABELS}"
calculate-docker-image:
runs-on: linux.2xlarge
needs: [ciflow_should_run]
env:
DOCKER_BUILDKIT: 1
timeout-minutes: 90
outputs:
docker_image: ${{ steps.calculate-tag.outputs.docker_image }}
steps:
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Log in to ECR
env:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
# Ensure the working directory gets chowned back to the current user
retry docker pull "${ALPINE_IMAGE}"
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
rm -rf "${GITHUB_WORKSPACE:?}/*"
rm -f ~/.ssh/authorized_keys
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: false
- name: Calculate docker image tag
id: calculate-tag
run: |
DOCKER_TAG=$(git rev-parse HEAD:.circleci/docker)
echo "::set-output name=docker_tag::${DOCKER_TAG}"
echo "::set-output name=docker_image::${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"
- name: Check if image should be built
id: check
env:
DOCKER_TAG: ${{ steps.calculate-tag.outputs.docker_tag }}
BASE_REVISION: ${{ github.event.pull_request.base.sha || github.sha }}
run: |
set -x
# Check if image already exists, if it does then skip building it
if docker manifest inspect "${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"; then
exit 0
fi
if [[ "$BASE_REVISION" = "$(git rev-parse HEAD)" ]]; then
# if we're on the base branch then use the parent commit
MERGE_BASE=$(git rev-parse HEAD~)
else
# otherwise we're on a PR, so use the most recent base commit
MERGE_BASE=$(git merge-base HEAD "$BASE_REVISION")
fi
# Covers the case where a previous tag doesn't exist for the tree
# this is only really applicable to trees that don't have `.circleci/docker` at their merge base, i.e. nightly
if ! git rev-parse "$MERGE_BASE:.circleci/docker"; then
echo "Directory '.circleci/docker' not found in commit $MERGE_BASE, you should probably rebase onto a more recent commit"
exit 1
fi
PREVIOUS_DOCKER_TAG=$(git rev-parse "$MERGE_BASE:.circleci/docker")
# If no image exists but the hash is the same as the previous hash then we should error out here
if [[ "${PREVIOUS_DOCKER_TAG}" = "${DOCKER_TAG}" ]]; then
echo "ERROR: Something has gone wrong and the previous image isn't available for the merge-base of your branch"
echo " contact the PyTorch team to restore the original images"
exit 1
fi
echo ::set-output name=rebuild::yes
- name: Build and push docker image
if: ${{ steps.check.outputs.rebuild }}
env:
DOCKER_TAG: ${{ steps.calculate-tag.outputs.docker_tag }}
DOCKER_SKIP_S3_UPLOAD: 1
run: |
export IMAGE_NAME=${DOCKER_IMAGE_BASE#308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/}
cd .circleci/docker && ./build_docker.sh
build:
runs-on: linux.2xlarge
needs: [calculate-docker-image, ciflow_should_run]
env:
DOCKER_IMAGE: ${{ needs.calculate-docker-image.outputs.docker_image }}
JOB_BASE_NAME: linux-xenial-cuda10.2-py3.6-gcc7-build
steps:
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Log in to ECR
env:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
# Ensure the working directory gets chowned back to the current user
retry docker pull "${ALPINE_IMAGE}"
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
rm -rf "${GITHUB_WORKSPACE:?}/*"
rm -f ~/.ssh/authorized_keys
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: recursive
- name: Pull docker image
run: |
docker pull "${DOCKER_IMAGE}"
- name: Build
run: |
# detached container should get cleaned up by teardown_ec2_linux
container_name=$(docker run \
-e BUILD_ENVIRONMENT \
-e JOB_BASE_NAME \
-e MAX_JOBS="$(nproc --ignore=2)" \
-e SCCACHE_BUCKET \
-e XLA_CLANG_CACHE_S3_BUCKET_NAME \
-e CUSTOM_TEST_ARTIFACT_BUILD_DIR \
-e SKIP_SCCACHE_INITIALIZATION=1 \
-e TORCH_CUDA_ARCH_LIST \
-e PR_LABELS \
-e http_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e https_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e no_proxy="localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" \
--env-file="/tmp/github_env_${GITHUB_RUN_ID}" \
--security-opt seccomp=unconfined \
--cap-add=SYS_PTRACE \
--tty \
--detach \
--user jenkins \
-v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \
-w /var/lib/jenkins/workspace \
"${DOCKER_IMAGE}"
)
docker exec -t "${container_name}" sh -c 'sudo chown -R jenkins . && .jenkins/pytorch/build.sh'
- name: Parse ref
id: parse-ref
run: .github/scripts/parse_ref.py
- name: Display and upload binary build size statistics (Click Me)
# temporary hack: set CIRCLE_* vars, until we update
# tools/stats/print_test_stats.py to natively support GitHub Actions
env:
AWS_DEFAULT_REGION: us-east-1
IS_GHA: 1
SCRIBE_GRAPHQL_ACCESS_TOKEN: ${{ secrets.SCRIBE_GRAPHQL_ACCESS_TOKEN }}
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
CIRCLE_TAG: ${{ steps.parse-ref.outputs.tag }}
CIRCLE_WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
run: |
COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0)
export COMMIT_TIME
pip3 install requests==2.26 boto3==1.16.34
python3 -m tools.stats.upload_binary_size_to_scuba || exit 0
- name: Chown workspace
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Archive artifacts into zip
run: |
zip -1 -r artifacts.zip dist/ build/custom_test_artifacts build/lib build/bin .pytorch-test-times.json
- uses: seemethere/upload-artifact-s3@v3
name: Store PyTorch Build Artifacts on S3
with:
name: ${{ env.BUILD_ENVIRONMENT }}
retention-days: 14
if-no-files-found: error
path:
artifacts.zip
- name: Hold runner for 2 hours or until ssh sessions have drained
# Always hold for active ssh sessions
if: always()
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Chown workspace
if: always()
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Kill containers, clean up images
if: always()
run: |
# ignore expansion of "docker ps -q" since it could be empty
# shellcheck disable=SC2046
docker stop $(docker ps -q) || true
# Prune all of the docker images
docker system prune -af
- name: Hold runner for 2 hours or until ssh sessions have drained
# Always hold for active ssh sessions
if: always()
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Clean up docker images
if: always()
run: |
# Prune all of the docker images
docker system prune -af
generate-test-matrix:
runs-on: ubuntu-18.04
needs: [ciflow_should_run]
env:
TEST_RUNNER_TYPE: linux.8xlarge.nvidia.gpu
ENABLE_DISTRIBUTED_TEST: 1
ENABLE_JIT_LEGACY_TEST: 1
ENABLE_MULTIGPU_TEST: 1
ENABLE_NOGPU_NO_AVX_TEST: 1
ENABLE_NOGPU_NO_AVX2_TEST: 1
ENABLE_SLOW_TEST: 1
ENABLE_DOCS_TEST: ''
ENABLE_BACKWARDS_COMPAT_TEST: ''
ENABLE_XLA_TEST: ''
ENABLE_NOARCH_TEST: ''
NUM_TEST_SHARDS: 2
MULTIGPU_RUNNER_TYPE: linux.16xlarge.nvidia.gpu
NOGPU_RUNNER_TYPE: linux.2xlarge
PR_BODY: ${{ github.event.pull_request.body }}
outputs:
matrix: ${{ steps.set-matrix.outputs.matrix }}
render-matrix: ${{ steps.set-matrix.outputs.render-matrix }}
ignore-disabled-issues: ${{ steps.set-matrix.outputs.ignore-disabled-issues }}
container:
image: python:3.9
steps:
- name: Install dependencies
run: pip install typing-extensions==3.10
- name: Clone pytorch/pytorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
- name: Generating test matrix
id: set-matrix
run: .github/scripts/generate_pytorch_test_matrix.py
test:
needs: [calculate-docker-image, build, generate-test-matrix, ciflow_should_run]
strategy:
matrix: ${{ fromJson(needs.generate-test-matrix.outputs.matrix) }}
fail-fast: false
runs-on: ${{ matrix.runner }}
env:
DOCKER_IMAGE: ${{ needs.calculate-docker-image.outputs.docker_image }}
JOB_BASE_NAME: linux-xenial-cuda10.2-py3.6-gcc7-test
TEST_CONFIG: ${{ matrix.config }}
SHARD_NUMBER: ${{ matrix.shard }}
NUM_TEST_SHARDS: ${{ matrix.num_shards }}
PYTORCH_IGNORE_DISABLED_ISSUES: ${{ needs.generate-test-matrix.outputs.ignore-disabled-issues }}
CONTINUE_THROUGH_ERROR: ${{ github.repository == 'pytorch/pytorch' && (github.event_name == 'push' || github.event_name == 'schedule') }}
steps:
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Log in to ECR
env:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
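# retry: up to three attempts with 1s/2s pauses in between; presumably guards
# against transient ECR pull failures right after login.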
# Ensure the working directory gets chowned back to the current user
retry docker pull "${ALPINE_IMAGE}"
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
rm -rf "${GITHUB_WORKSPACE:?}/*"
rm -f ~/.ssh/authorized_keys
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: recursive
- name: Pull docker image
run: |
docker pull "${DOCKER_IMAGE}"
- name: Install nvidia driver, nvidia-docker runtime, set GPU_FLAG
if: ${{ contains(env.BUILD_ENVIRONMENT, 'cuda') && !contains(matrix.config, 'nogpu') }}
run: |
bash .github/scripts/install_nvidia_utils_linux.sh
echo "GPU_FLAG=--gpus all" >> "${GITHUB_ENV}"
- name: Determine shm-size
run: |
shm_size="1g"
case "${BUILD_ENVIRONMENT}" in
*cuda*)
shm_size="2g"
;;
*rocm*)
shm_size="8g"
;;
esac
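# Docker's default /dev/shm is only 64MB; CUDA and ROCm test runs need more
# shared memory (e.g. for DataLoader worker IPC), hence the larger sizes
# (assumption based on the build environments matched above).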
echo "SHM_SIZE=${shm_size}" >> "${GITHUB_ENV}"
- uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b
name: Download PyTorch Build Artifacts
with:
name: ${{ env.BUILD_ENVIRONMENT }}
- name: Unzip artifacts
run: |
unzip -o artifacts.zip
- name: Output disk space left
run: |
sudo df -H
- name: Parse ref
id: parse-ref
run: .github/scripts/parse_ref.py
- name: Test
env:
PR_NUMBER: ${{ github.event.pull_request.number }}
IS_GHA: 1
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
AWS_DEFAULT_REGION: us-east-1
run: |
if [[ $TEST_CONFIG == 'multigpu' ]]; then
TEST_COMMAND=.jenkins/pytorch/multigpu-test.sh
else
TEST_COMMAND=.jenkins/pytorch/test.sh
fi
if [[ $NUM_TEST_SHARDS -ne 2 ]]; then
export SHARD_NUMBER=0
fi
# detached container should get cleaned up by teardown_ec2_linux
# TODO: Stop building test binaries as part of the build phase
# SC2086 is disabled below because GPU_FLAG must expand unquoted (it may be empty)
# shellcheck disable=SC2086
container_name=$(docker run \
${GPU_FLAG:-} \
-e BUILD_ENVIRONMENT \
-e PR_NUMBER \
-e CUSTOM_TEST_ARTIFACT_BUILD_DIR \
-e GITHUB_ACTIONS \
-e IN_CI \
-e IS_GHA \
-e CIRCLE_BRANCH \
-e CIRCLE_SHA1 \
-e CIRCLE_PR_NUMBER \
-e AWS_DEFAULT_REGION \
-e IN_WHEEL_TEST \
-e SHARD_NUMBER \
-e JOB_BASE_NAME \
-e TEST_CONFIG \
-e NUM_TEST_SHARDS \
-e PYTORCH_IGNORE_DISABLED_ISSUES \
-e PR_LABELS \
-e CONTINUE_THROUGH_ERROR \
-e MAX_JOBS="$(nproc --ignore=2)" \
-e SCCACHE_BUCKET \
-e http_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e https_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e no_proxy="localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" \
-e XLA_CLANG_CACHE_S3_BUCKET_NAME \
--env-file="/tmp/github_env_${GITHUB_RUN_ID}" \
--security-opt seccomp=unconfined \
--cap-add=SYS_PTRACE \
--shm-size="${SHM_SIZE}" \
--tty \
--detach \
--name="${container_name}" \
--user jenkins \
-v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \
-w /var/lib/jenkins/workspace \
"${DOCKER_IMAGE}"
)
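# The container above was started detached with a tty so it outlives this
# step's shell; the actual test run happens via exec, and the cleanup steps
# below (or teardown_ec2_linux) remove it afterwards.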
docker exec -t "${container_name}" sh -c "sudo chown -R jenkins . && pip install dist/*.whl && ${TEST_COMMAND}"
- name: Chown workspace
if: always()
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Install render_test_results dependencies
if: always()
shell: bash
run: |
python3 -m pip install junitparser==2.1.1 rich==10.9.0
- name: "[[ Click me for rendered test results (useful for finding failing tests) ]]"
if: always()
shell: bash
# Encoding is weird on Windows; default to UTF-8 where possible
env:
PYTHONIOENCODING: "utf-8"
run: |
python3 tools/render_junit.py test/
- name: Zip test reports for upload
if: always()
env:
FILE_SUFFIX: '${{ github.job }}-${{ matrix.config }}-${{ matrix.shard }}-${{ matrix.num_shards }}-${{ matrix.runner }}'
run: |
# Remove any previous test reports if they exist
rm -f test-reports-*.zip
zip -r "test-reports-${FILE_SUFFIX}.zip" test -i '*.xml'
- uses: actions/upload-artifact@v2
name: Store Test Reports
if: always()
with:
name: test-reports-${{ matrix.config }}
retention-days: 14
if-no-files-found: error
path:
test-reports-*.zip
- uses: seemethere/upload-artifact-s3@v3
name: Store Test Reports on S3
if: always()
with:
retention-days: 14
if-no-files-found: error
path:
test-reports-*.zip
- name: Display and upload test statistics (Click Me)
if: always()
# temporary hack: set CIRCLE_* vars, until we update
# tools/stats/print_test_stats.py to natively support GitHub Actions
env:
AWS_DEFAULT_REGION: us-east-1
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
JOB_BASE_NAME: linux-xenial-cuda10.2-py3.6-gcc7-test
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
CIRCLE_TAG: ${{ steps.parse-ref.outputs.tag }}
CIRCLE_WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
shell: bash
run: |
python3 -m pip install -r requirements.txt
python3 -m pip install boto3==1.16.34
python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test
- name: Hold runner for 2 hours or until ssh sessions have drained
# Always hold for active ssh sessions
if: always()
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Chown workspace
if: always()
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Kill containers, clean up images
if: always()
run: |
# ignore expansion of "docker ps -q" since it could be empty
# shellcheck disable=SC2046
docker stop $(docker ps -q) || true
# Prune all of the docker images
docker system prune -af


@@ -0,0 +1,562 @@
# @generated DO NOT EDIT MANUALLY
# Template is at: .github/templates/linux_ci_workflow.yml.j2
# Generation script: .github/scripts/generate_ci_workflows.py
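# To regenerate after editing the template (a minimal sketch, assuming the
# script takes no arguments and rewrites the generated workflows in place):
#   python3 .github/scripts/generate_ci_workflows.py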
name: linux-xenial-cuda11.3-py3.6-gcc7
on:
pull_request:
types: [opened, synchronize, reopened, unassigned]
push:
branches:
- master
- release/*
workflow_dispatch:
env:
BUILD_ENVIRONMENT: linux-xenial-cuda11.3-py3.6-gcc7
DOCKER_IMAGE_BASE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7
SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2
XLA_CLANG_CACHE_S3_BUCKET_NAME: ossci-compiler-clang-cache-circleci-xla
TORCH_CUDA_ARCH_LIST: 5.2
IN_CI: 1
# Used only while wheel tests are being phased in; will be removed once that work completes
IN_WHEEL_TEST: 1
# Used for custom_operator, jit_hooks, custom_backend, see .jenkins/pytorch/build.sh
CUSTOM_TEST_ARTIFACT_BUILD_DIR: build/custom_test_artifacts
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
concurrency:
group: linux-xenial-cuda11.3-py3.6-gcc7-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }}
cancel-in-progress: true
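# Runs keyed to the same PR number (or commit SHA for pushes) cancel each
# other; the workflow_dispatch flag in the group key keeps manual runs in a
# separate group so CI pushes never cancel them (assumption).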
jobs:
ciflow_should_run:
runs-on: ubuntu-18.04
if: ${{ (github.repository == 'pytorch/pytorch') && ((github.event_name != 'pull_request') || (github.event.assignee.login != 'pytorchbot' ) || (github.event.action !='unassigned') || (contains(github.event.pull_request.labels.*.name, 'ciflow/all') || contains(github.event.pull_request.labels.*.name, 'ciflow/cuda') || contains(github.event.pull_request.labels.*.name, 'ciflow/default') || contains(github.event.pull_request.labels.*.name, 'ciflow/linux'))) }}
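# Gate: run only in the upstream repo, and for pull_request events only when
# the event isn't pytorchbot's "unassigned" trigger or a matching ciflow/*
# label is present (assumption about the ciflow labeling protocol).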
env:
LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
steps:
- name: noop
run: echo running ciflow_should_run
- name: print labels
run: echo "${LABELS}"
calculate-docker-image:
runs-on: linux.2xlarge
needs: [ciflow_should_run]
env:
DOCKER_BUILDKIT: 1
timeout-minutes: 90
outputs:
docker_image: ${{ steps.calculate-tag.outputs.docker_image }}
steps:
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Log in to ECR
env:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
# Ensure the working directory gets chowned back to the current user
retry docker pull "${ALPINE_IMAGE}"
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
rm -rf "${GITHUB_WORKSPACE:?}/*"
rm -f ~/.ssh/authorized_keys
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: false
- name: Calculate docker image tag
id: calculate-tag
run: |
DOCKER_TAG=$(git rev-parse HEAD:.circleci/docker)
echo "::set-output name=docker_tag::${DOCKER_TAG}"
echo "::set-output name=docker_image::${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"
- name: Check if image should be built
id: check
env:
DOCKER_TAG: ${{ steps.calculate-tag.outputs.docker_tag }}
BASE_REVISION: ${{ github.event.pull_request.base.sha || github.sha }}
run: |
set -x
# Check if image already exists, if it does then skip building it
if docker manifest inspect "${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"; then
exit 0
fi
if [[ "$BASE_REVISION" = "$(git rev-parse HEAD)" ]]; then
# if we're on the base branch then use the parent commit
MERGE_BASE=$(git rev-parse HEAD~)
else
# otherwise we're on a PR, so use the most recent base commit
MERGE_BASE=$(git merge-base HEAD "$BASE_REVISION")
fi
# Covers the case where a previous tag doesn't exist for the tree
# this is only really applicable to trees that don't have `.circleci/docker` at their merge base, e.g. nightly
if ! git rev-parse "$MERGE_BASE:.circleci/docker"; then
echo "Directory '.circleci/docker' not found in commit $MERGE_BASE, you should probably rebase onto a more recent commit"
exit 1
fi
PREVIOUS_DOCKER_TAG=$(git rev-parse "$MERGE_BASE:.circleci/docker")
# If no image exists but the hash is the same as the previous hash then we should error out here
if [[ "${PREVIOUS_DOCKER_TAG}" = "${DOCKER_TAG}" ]]; then
echo "ERROR: Something has gone wrong and the previous image isn't available for the merge-base of your branch"
echo " contact the PyTorch team to restore the original images"
exit 1
fi
echo ::set-output name=rebuild::yes
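# Net effect: rebuild only when no image exists for this tag and the
# .circleci/docker tree actually changed since the merge-base; a missing
# image with an unchanged tree is treated as an error above.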
- name: Build and push docker image
if: ${{ steps.check.outputs.rebuild }}
env:
DOCKER_TAG: ${{ steps.calculate-tag.outputs.docker_tag }}
DOCKER_SKIP_S3_UPLOAD: 1
run: |
export IMAGE_NAME=${DOCKER_IMAGE_BASE#308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/}
cd .circleci/docker && ./build_docker.sh
build:
runs-on: linux.2xlarge
needs: [calculate-docker-image, ciflow_should_run]
env:
DOCKER_IMAGE: ${{ needs.calculate-docker-image.outputs.docker_image }}
JOB_BASE_NAME: linux-xenial-cuda11.3-py3.6-gcc7-build
steps:
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Log in to ECR
env:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
# Ensure the working directory gets chowned back to the current user
retry docker pull "${ALPINE_IMAGE}"
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
rm -rf "${GITHUB_WORKSPACE:?}/*"
rm -f ~/.ssh/authorized_keys
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: recursive
- name: Pull docker image
run: |
docker pull "${DOCKER_IMAGE}"
- name: Build
run: |
# detached container should get cleaned up by teardown_ec2_linux
container_name=$(docker run \
-e BUILD_ENVIRONMENT \
-e JOB_BASE_NAME \
-e MAX_JOBS="$(nproc --ignore=2)" \
-e SCCACHE_BUCKET \
-e XLA_CLANG_CACHE_S3_BUCKET_NAME \
-e CUSTOM_TEST_ARTIFACT_BUILD_DIR \
-e SKIP_SCCACHE_INITIALIZATION=1 \
-e TORCH_CUDA_ARCH_LIST \
-e PR_LABELS \
-e http_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e https_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e no_proxy="localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" \
--env-file="/tmp/github_env_${GITHUB_RUN_ID}" \
--security-opt seccomp=unconfined \
--cap-add=SYS_PTRACE \
--tty \
--detach \
--user jenkins \
-v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \
-w /var/lib/jenkins/workspace \
"${DOCKER_IMAGE}"
)
docker exec -t "${container_name}" sh -c 'sudo chown -R jenkins . && .jenkins/pytorch/build.sh'
- name: Parse ref
id: parse-ref
run: .github/scripts/parse_ref.py
- name: Display and upload binary build size statistics (Click Me)
# temporary hack: set CIRCLE_* vars, until we update
# tools/stats/print_test_stats.py to natively support GitHub Actions
env:
AWS_DEFAULT_REGION: us-east-1
IS_GHA: 1
SCRIBE_GRAPHQL_ACCESS_TOKEN: ${{ secrets.SCRIBE_GRAPHQL_ACCESS_TOKEN }}
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
CIRCLE_TAG: ${{ steps.parse-ref.outputs.tag }}
CIRCLE_WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
run: |
COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0)
export COMMIT_TIME
pip3 install requests==2.26 boto3==1.16.34
python3 -m tools.stats.upload_binary_size_to_scuba || exit 0
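# "|| exit 0" makes the upload best-effort: a stats failure never fails the build.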
- name: Chown workspace
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Archive artifacts into zip
run: |
zip -1 -r artifacts.zip dist/ build/custom_test_artifacts build/lib build/bin .pytorch-test-times.json
- uses: seemethere/upload-artifact-s3@v3
name: Store PyTorch Build Artifacts on S3
with:
name: ${{ env.BUILD_ENVIRONMENT }}
retention-days: 14
if-no-files-found: error
path:
artifacts.zip
- name: Hold runner for 2 hours or until ssh sessions have drained
# Always hold for active ssh sessions
if: always()
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Chown workspace
if: always()
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Kill containers, clean up images
if: always()
run: |
# ignore expansion of "docker ps -q" since it could be empty
# shellcheck disable=SC2046
docker stop $(docker ps -q) || true
# Prune all of the docker images
docker system prune -af
- name: Hold runner for 2 hours or until ssh sessions have drained
# Always hold for active ssh sessions
if: always()
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Clean up docker images
if: always()
run: |
# Prune all of the docker images
docker system prune -af
generate-test-matrix:
runs-on: ubuntu-18.04
needs: [ciflow_should_run]
env:
TEST_RUNNER_TYPE: linux.8xlarge.nvidia.gpu
ENABLE_DISTRIBUTED_TEST: 1
ENABLE_JIT_LEGACY_TEST: ''
ENABLE_MULTIGPU_TEST: ''
ENABLE_NOGPU_NO_AVX_TEST: ''
ENABLE_NOGPU_NO_AVX2_TEST: ''
ENABLE_SLOW_TEST: ''
ENABLE_DOCS_TEST: ''
ENABLE_BACKWARDS_COMPAT_TEST: ''
ENABLE_XLA_TEST: ''
ENABLE_NOARCH_TEST: ''
NUM_TEST_SHARDS: 2
MULTIGPU_RUNNER_TYPE: linux.16xlarge.nvidia.gpu
NOGPU_RUNNER_TYPE: linux.2xlarge
PR_BODY: ${{ github.event.pull_request.body }}
outputs:
matrix: ${{ steps.set-matrix.outputs.matrix }}
render-matrix: ${{ steps.set-matrix.outputs.render-matrix }}
ignore-disabled-issues: ${{ steps.set-matrix.outputs.ignore-disabled-issues }}
container:
image: python:3.9
steps:
- name: Install dependencies
run: pip install typing-extensions==3.10
- name: Clone pytorch/pytorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
- name: Generating test matrix
id: set-matrix
run: .github/scripts/generate_pytorch_test_matrix.py
test:
needs: [calculate-docker-image, build, generate-test-matrix, ciflow_should_run]
strategy:
matrix: ${{ fromJson(needs.generate-test-matrix.outputs.matrix) }}
fail-fast: false
runs-on: ${{ matrix.runner }}
env:
DOCKER_IMAGE: ${{ needs.calculate-docker-image.outputs.docker_image }}
JOB_BASE_NAME: linux-xenial-cuda11.3-py3.6-gcc7-test
TEST_CONFIG: ${{ matrix.config }}
SHARD_NUMBER: ${{ matrix.shard }}
NUM_TEST_SHARDS: ${{ matrix.num_shards }}
PYTORCH_IGNORE_DISABLED_ISSUES: ${{ needs.generate-test-matrix.outputs.ignore-disabled-issues }}
CONTINUE_THROUGH_ERROR: ${{ github.repository == 'pytorch/pytorch' && (github.event_name == 'push' || github.event_name == 'schedule') }}
steps:
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Log in to ECR
env:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
# Ensure the working directory gets chowned back to the current user
retry docker pull "${ALPINE_IMAGE}"
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
rm -rf "${GITHUB_WORKSPACE:?}/*"
rm -f ~/.ssh/authorized_keys
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: recursive
- name: Pull docker image
run: |
docker pull "${DOCKER_IMAGE}"
- name: Install nvidia driver, nvidia-docker runtime, set GPU_FLAG
if: ${{ contains(env.BUILD_ENVIRONMENT, 'cuda') && !contains(matrix.config, 'nogpu') }}
run: |
bash .github/scripts/install_nvidia_utils_linux.sh
echo "GPU_FLAG=--gpus all" >> "${GITHUB_ENV}"
- name: Determine shm-size
run: |
shm_size="1g"
case "${BUILD_ENVIRONMENT}" in
*cuda*)
shm_size="2g"
;;
*rocm*)
shm_size="8g"
;;
esac
echo "SHM_SIZE=${shm_size}" >> "${GITHUB_ENV}"
- uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b
name: Download PyTorch Build Artifacts
with:
name: ${{ env.BUILD_ENVIRONMENT }}
- name: Unzip artifacts
run: |
unzip -o artifacts.zip
- name: Output disk space left
run: |
sudo df -H
- name: Parse ref
id: parse-ref
run: .github/scripts/parse_ref.py
- name: Test
env:
PR_NUMBER: ${{ github.event.pull_request.number }}
IS_GHA: 1
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
AWS_DEFAULT_REGION: us-east-1
run: |
if [[ $TEST_CONFIG == 'multigpu' ]]; then
TEST_COMMAND=.jenkins/pytorch/multigpu-test.sh
else
TEST_COMMAND=.jenkins/pytorch/test.sh
fi
if [[ $NUM_TEST_SHARDS -ne 2 ]]; then
export SHARD_NUMBER=0
fi
# detached container should get cleaned up by teardown_ec2_linux
# TODO: Stop building test binaries as part of the build phase
# SC2086 is disabled below because GPU_FLAG must expand unquoted (it may be empty)
# shellcheck disable=SC2086
container_name=$(docker run \
${GPU_FLAG:-} \
-e BUILD_ENVIRONMENT \
-e PR_NUMBER \
-e CUSTOM_TEST_ARTIFACT_BUILD_DIR \
-e GITHUB_ACTIONS \
-e IN_CI \
-e IS_GHA \
-e CIRCLE_BRANCH \
-e CIRCLE_SHA1 \
-e CIRCLE_PR_NUMBER \
-e AWS_DEFAULT_REGION \
-e IN_WHEEL_TEST \
-e SHARD_NUMBER \
-e JOB_BASE_NAME \
-e TEST_CONFIG \
-e NUM_TEST_SHARDS \
-e PYTORCH_IGNORE_DISABLED_ISSUES \
-e PR_LABELS \
-e CONTINUE_THROUGH_ERROR \
-e MAX_JOBS="$(nproc --ignore=2)" \
-e SCCACHE_BUCKET \
-e http_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e https_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e no_proxy="localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" \
-e XLA_CLANG_CACHE_S3_BUCKET_NAME \
--env-file="/tmp/github_env_${GITHUB_RUN_ID}" \
--security-opt seccomp=unconfined \
--cap-add=SYS_PTRACE \
--shm-size="${SHM_SIZE}" \
--tty \
--detach \
--name="${container_name}" \
--user jenkins \
-v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \
-w /var/lib/jenkins/workspace \
"${DOCKER_IMAGE}"
)
docker exec -t "${container_name}" sh -c "sudo chown -R jenkins . && pip install dist/*.whl && ${TEST_COMMAND}"
- name: Chown workspace
if: always()
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Install render_test_results dependencies
if: always()
shell: bash
run: |
python3 -m pip install junitparser==2.1.1 rich==10.9.0
- name: "[[ Click me for rendered test results (useful for finding failing tests) ]]"
if: always()
shell: bash
# Encoding is weird on Windows; default to UTF-8 where possible
env:
PYTHONIOENCODING: "utf-8"
run: |
python3 tools/render_junit.py test/
- name: Zip test reports for upload
if: always()
env:
FILE_SUFFIX: '${{ github.job }}-${{ matrix.config }}-${{ matrix.shard }}-${{ matrix.num_shards }}-${{ matrix.runner }}'
run: |
# Remove any previous test reports if they exist
rm -f test-reports-*.zip
zip -r "test-reports-${FILE_SUFFIX}.zip" test -i '*.xml'
- uses: actions/upload-artifact@v2
name: Store Test Reports
if: always()
with:
name: test-reports-${{ matrix.config }}
retention-days: 14
if-no-files-found: error
path:
test-reports-*.zip
- uses: seemethere/upload-artifact-s3@v3
name: Store Test Reports on S3
if: always()
with:
retention-days: 14
if-no-files-found: error
path:
test-reports-*.zip
- name: Display and upload test statistics (Click Me)
if: always()
# temporary hack: set CIRCLE_* vars, until we update
# tools/stats/print_test_stats.py to natively support GitHub Actions
env:
AWS_DEFAULT_REGION: us-east-1
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
JOB_BASE_NAME: linux-xenial-cuda11.3-py3.6-gcc7-test
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
CIRCLE_TAG: ${{ steps.parse-ref.outputs.tag }}
CIRCLE_WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
shell: bash
run: |
python3 -m pip install -r requirements.txt
python3 -m pip install boto3==1.16.34
python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test
- name: Hold runner for 2 hours or until ssh sessions have drained
# Always hold for active ssh sessions
if: always()
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Chown workspace
if: always()
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Kill containers, clean up images
if: always()
run: |
# ignore expansion of "docker ps -q" since it could be empty
# shellcheck disable=SC2046
docker stop $(docker ps -q) || true
# Prune all of the docker images
docker system prune -af


@@ -0,0 +1,709 @@
# @generated DO NOT EDIT MANUALLY
# Template is at: .github/templates/linux_ci_workflow.yml.j2
# Generation script: .github/scripts/generate_ci_workflows.py
name: linux-xenial-py3.6-gcc5.4
on:
pull_request:
types: [opened, synchronize, reopened, unassigned]
push:
branches:
- master
- release/*
workflow_dispatch:
env:
BUILD_ENVIRONMENT: linux-xenial-py3.6-gcc5.4
DOCKER_IMAGE_BASE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3.6-gcc5.4
SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2
XLA_CLANG_CACHE_S3_BUCKET_NAME: ossci-compiler-clang-cache-circleci-xla
TORCH_CUDA_ARCH_LIST: 5.2
IN_CI: 1
# Used only while wheel tests are being phased in; will be removed once that work completes
IN_WHEEL_TEST: 1
# Used for custom_operator, jit_hooks, custom_backend, see .jenkins/pytorch/build.sh
CUSTOM_TEST_ARTIFACT_BUILD_DIR: build/custom_test_artifacts
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
concurrency:
group: linux-xenial-py3.6-gcc5.4-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }}
cancel-in-progress: true
jobs:
ciflow_should_run:
runs-on: ubuntu-18.04
if: ${{ (github.repository_owner == 'pytorch') && ((github.event_name != 'pull_request') || (github.event.assignee.login != 'pytorchbot' ) || (github.event.action !='unassigned') || (contains(github.event.pull_request.labels.*.name, 'ciflow/all') || contains(github.event.pull_request.labels.*.name, 'ciflow/cpu') || contains(github.event.pull_request.labels.*.name, 'ciflow/default') || contains(github.event.pull_request.labels.*.name, 'ciflow/linux'))) }}
env:
LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
steps:
- name: noop
run: echo running ciflow_should_run
- name: print labels
run: echo "${LABELS}"
calculate-docker-image:
runs-on: linux.2xlarge
needs: [ciflow_should_run]
env:
DOCKER_BUILDKIT: 1
timeout-minutes: 90
outputs:
docker_image: ${{ steps.calculate-tag.outputs.docker_image }}
steps:
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Log in to ECR
env:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
# Ensure the working directory gets chowned back to the current user
retry docker pull "${ALPINE_IMAGE}"
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
rm -rf "${GITHUB_WORKSPACE:?}/*"
rm -f ~/.ssh/authorized_keys
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: false
- name: Calculate docker image tag
id: calculate-tag
run: |
DOCKER_TAG=$(git rev-parse HEAD:.circleci/docker)
echo "::set-output name=docker_tag::${DOCKER_TAG}"
echo "::set-output name=docker_image::${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"
- name: Check if image should be built
id: check
env:
DOCKER_TAG: ${{ steps.calculate-tag.outputs.docker_tag }}
BASE_REVISION: ${{ github.event.pull_request.base.sha || github.sha }}
run: |
set -x
# Check if image already exists, if it does then skip building it
if docker manifest inspect "${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"; then
exit 0
fi
if [[ "$BASE_REVISION" = "$(git rev-parse HEAD)" ]]; then
# if we're on the base branch then use the parent commit
MERGE_BASE=$(git rev-parse HEAD~)
else
# otherwise we're on a PR, so use the most recent base commit
MERGE_BASE=$(git merge-base HEAD "$BASE_REVISION")
fi
# Covers the case where a previous tag doesn't exist for the tree
# this is only really applicable to trees that don't have `.circleci/docker` at their merge base, e.g. nightly
if ! git rev-parse "$MERGE_BASE:.circleci/docker"; then
echo "Directory '.circleci/docker' not found in commit $MERGE_BASE, you should probably rebase onto a more recent commit"
exit 1
fi
PREVIOUS_DOCKER_TAG=$(git rev-parse "$MERGE_BASE:.circleci/docker")
# If no image exists but the hash is the same as the previous hash then we should error out here
if [[ "${PREVIOUS_DOCKER_TAG}" = "${DOCKER_TAG}" ]]; then
echo "ERROR: Something has gone wrong and the previous image isn't available for the merge-base of your branch"
echo " contact the PyTorch team to restore the original images"
exit 1
fi
echo ::set-output name=rebuild::yes
- name: Build and push docker image
if: ${{ steps.check.outputs.rebuild }}
env:
DOCKER_TAG: ${{ steps.calculate-tag.outputs.docker_tag }}
DOCKER_SKIP_S3_UPLOAD: 1
run: |
export IMAGE_NAME=${DOCKER_IMAGE_BASE#308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/}
cd .circleci/docker && ./build_docker.sh
build:
runs-on: linux.2xlarge
needs: [calculate-docker-image, ciflow_should_run]
env:
DOCKER_IMAGE: ${{ needs.calculate-docker-image.outputs.docker_image }}
JOB_BASE_NAME: linux-xenial-py3.6-gcc5.4-build
steps:
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Log in to ECR
env:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
# Ensure the working directory gets chowned back to the current user
retry docker pull "${ALPINE_IMAGE}"
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
rm -rf "${GITHUB_WORKSPACE:?}/*"
rm -f ~/.ssh/authorized_keys
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: recursive
- name: Pull docker image
run: |
docker pull "${DOCKER_IMAGE}"
- name: Build
run: |
# detached container should get cleaned up by teardown_ec2_linux
container_name=$(docker run \
-e BUILD_ENVIRONMENT \
-e JOB_BASE_NAME \
-e MAX_JOBS="$(nproc --ignore=2)" \
-e SCCACHE_BUCKET \
-e XLA_CLANG_CACHE_S3_BUCKET_NAME \
-e CUSTOM_TEST_ARTIFACT_BUILD_DIR \
-e SKIP_SCCACHE_INITIALIZATION=1 \
-e TORCH_CUDA_ARCH_LIST \
-e PR_LABELS \
-e http_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e https_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e no_proxy="localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" \
--env-file="/tmp/github_env_${GITHUB_RUN_ID}" \
--security-opt seccomp=unconfined \
--cap-add=SYS_PTRACE \
--tty \
--detach \
--user jenkins \
-v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \
-w /var/lib/jenkins/workspace \
"${DOCKER_IMAGE}"
)
docker exec -t "${container_name}" sh -c 'sudo chown -R jenkins . && .jenkins/pytorch/build.sh'
- name: Parse ref
id: parse-ref
run: .github/scripts/parse_ref.py
- name: Display and upload binary build size statistics (Click Me)
# temporary hack: set CIRCLE_* vars, until we update
# tools/stats/print_test_stats.py to natively support GitHub Actions
env:
AWS_DEFAULT_REGION: us-east-1
IS_GHA: 1
SCRIBE_GRAPHQL_ACCESS_TOKEN: ${{ secrets.SCRIBE_GRAPHQL_ACCESS_TOKEN }}
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
CIRCLE_TAG: ${{ steps.parse-ref.outputs.tag }}
CIRCLE_WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
run: |
COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0)
export COMMIT_TIME
pip3 install requests==2.26 boto3==1.16.34
python3 -m tools.stats.upload_binary_size_to_scuba || exit 0
- name: Chown workspace
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Archive artifacts into zip
run: |
zip -1 -r artifacts.zip dist/ build/custom_test_artifacts build/lib build/bin .pytorch-test-times.json
- uses: seemethere/upload-artifact-s3@v3
name: Store PyTorch Build Artifacts on S3
with:
name: ${{ env.BUILD_ENVIRONMENT }}
retention-days: 14
if-no-files-found: error
path:
artifacts.zip
- name: Hold runner for 2 hours or until ssh sessions have drained
# Always hold for active ssh sessions
if: always()
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Chown workspace
if: always()
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Kill containers, clean up images
if: always()
run: |
# ignore expansion of "docker ps -q" since it could be empty
# shellcheck disable=SC2046
docker stop $(docker ps -q) || true
# Prune all of the docker images
docker system prune -af
- name: Hold runner for 2 hours or until ssh sessions have drained
# Always hold for active ssh sessions
if: always()
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Clean up docker images
if: always()
run: |
# Prune all of the docker images
docker system prune -af
generate-test-matrix:
runs-on: ubuntu-18.04
needs: [ciflow_should_run]
env:
TEST_RUNNER_TYPE: linux.2xlarge
ENABLE_DISTRIBUTED_TEST: 1
ENABLE_JIT_LEGACY_TEST: 1
ENABLE_MULTIGPU_TEST: ''
ENABLE_NOGPU_NO_AVX_TEST: ''
ENABLE_NOGPU_NO_AVX2_TEST: ''
ENABLE_SLOW_TEST: ''
ENABLE_DOCS_TEST: 1
ENABLE_BACKWARDS_COMPAT_TEST: 1
ENABLE_XLA_TEST: ''
ENABLE_NOARCH_TEST: ''
NUM_TEST_SHARDS: 2
MULTIGPU_RUNNER_TYPE: linux.16xlarge.nvidia.gpu
NOGPU_RUNNER_TYPE: linux.2xlarge
PR_BODY: ${{ github.event.pull_request.body }}
outputs:
matrix: ${{ steps.set-matrix.outputs.matrix }}
render-matrix: ${{ steps.set-matrix.outputs.render-matrix }}
ignore-disabled-issues: ${{ steps.set-matrix.outputs.ignore-disabled-issues }}
container:
image: python:3.9
steps:
- name: Install dependencies
run: pip install typing-extensions==3.10
- name: Clone pytorch/pytorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
- name: Generating test matrix
id: set-matrix
run: .github/scripts/generate_pytorch_test_matrix.py
test:
needs: [calculate-docker-image, build, generate-test-matrix, ciflow_should_run]
strategy:
matrix: ${{ fromJson(needs.generate-test-matrix.outputs.matrix) }}
fail-fast: false
runs-on: ${{ matrix.runner }}
env:
DOCKER_IMAGE: ${{ needs.calculate-docker-image.outputs.docker_image }}
JOB_BASE_NAME: linux-xenial-py3.6-gcc5.4-test
TEST_CONFIG: ${{ matrix.config }}
SHARD_NUMBER: ${{ matrix.shard }}
NUM_TEST_SHARDS: ${{ matrix.num_shards }}
PYTORCH_IGNORE_DISABLED_ISSUES: ${{ needs.generate-test-matrix.outputs.ignore-disabled-issues }}
CONTINUE_THROUGH_ERROR: ${{ github.repository == 'pytorch/pytorch' && (github.event_name == 'push' || github.event_name == 'schedule') }}
steps:
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Log in to ECR
env:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
# Ensure the working directory gets chowned back to the current user
retry docker pull "${ALPINE_IMAGE}"
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
rm -rf "${GITHUB_WORKSPACE:?}/*"
rm -f ~/.ssh/authorized_keys
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: recursive
- name: Pull docker image
run: |
docker pull "${DOCKER_IMAGE}"
- name: Install nvidia driver, nvidia-docker runtime, set GPU_FLAG
if: ${{ contains(env.BUILD_ENVIRONMENT, 'cuda') && !contains(matrix.config, 'nogpu') }}
run: |
bash .github/scripts/install_nvidia_utils_linux.sh
echo "GPU_FLAG=--gpus all" >> "${GITHUB_ENV}"
- name: Determine shm-size
run: |
shm_size="1g"
case "${BUILD_ENVIRONMENT}" in
*cuda*)
shm_size="2g"
;;
*rocm*)
shm_size="8g"
;;
esac
echo "SHM_SIZE=${shm_size}" >> "${GITHUB_ENV}"
- uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b
name: Download PyTorch Build Artifacts
with:
name: ${{ env.BUILD_ENVIRONMENT }}
- name: Unzip artifacts
run: |
unzip -o artifacts.zip
- name: Output disk space left
run: |
sudo df -H
- name: Parse ref
id: parse-ref
run: .github/scripts/parse_ref.py
- name: Test
env:
PR_NUMBER: ${{ github.event.pull_request.number }}
IS_GHA: 1
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
AWS_DEFAULT_REGION: us-east-1
run: |
if [[ $TEST_CONFIG == 'multigpu' ]]; then
TEST_COMMAND=.jenkins/pytorch/multigpu-test.sh
else
TEST_COMMAND=.jenkins/pytorch/test.sh
fi
if [[ $NUM_TEST_SHARDS -ne 2 ]]; then
export SHARD_NUMBER=0
fi
# detached container should get cleaned up by teardown_ec2_linux
# TODO: Stop building test binaries as part of the build phase
# SC2086 is disabled below because GPU_FLAG must expand unquoted (it may be empty)
# shellcheck disable=SC2086
container_name=$(docker run \
${GPU_FLAG:-} \
-e BUILD_ENVIRONMENT \
-e PR_NUMBER \
-e CUSTOM_TEST_ARTIFACT_BUILD_DIR \
-e GITHUB_ACTIONS \
-e IN_CI \
-e IS_GHA \
-e CIRCLE_BRANCH \
-e CIRCLE_SHA1 \
-e CIRCLE_PR_NUMBER \
-e AWS_DEFAULT_REGION \
-e IN_WHEEL_TEST \
-e SHARD_NUMBER \
-e JOB_BASE_NAME \
-e TEST_CONFIG \
-e NUM_TEST_SHARDS \
-e PYTORCH_IGNORE_DISABLED_ISSUES \
-e PR_LABELS \
-e CONTINUE_THROUGH_ERROR \
-e MAX_JOBS="$(nproc --ignore=2)" \
-e SCCACHE_BUCKET \
-e http_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e https_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e no_proxy="localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" \
-e XLA_CLANG_CACHE_S3_BUCKET_NAME \
--env-file="/tmp/github_env_${GITHUB_RUN_ID}" \
--security-opt seccomp=unconfined \
--cap-add=SYS_PTRACE \
--shm-size="${SHM_SIZE}" \
--tty \
--detach \
--name="${container_name}" \
--user jenkins \
-v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \
-w /var/lib/jenkins/workspace \
"${DOCKER_IMAGE}"
)
docker exec -t "${container_name}" sh -c "sudo chown -R jenkins . && pip install dist/*.whl && ${TEST_COMMAND}"
- name: Chown workspace
if: always()
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Install render_test_results dependencies
if: always()
shell: bash
run: |
python3 -m pip install junitparser==2.1.1 rich==10.9.0
- name: "[[ Click me for rendered test results (useful for finding failing tests) ]]"
if: always()
shell: bash
# Encoding is weird on Windows; default to UTF-8 where possible
env:
PYTHONIOENCODING: "utf-8"
run: |
python3 tools/render_junit.py test/
- name: Zip test reports for upload
if: always()
env:
FILE_SUFFIX: '${{ github.job }}-${{ matrix.config }}-${{ matrix.shard }}-${{ matrix.num_shards }}-${{ matrix.runner }}'
run: |
# Remove any previous test reports if they exist
rm -f test-reports-*.zip
zip -r "test-reports-${FILE_SUFFIX}.zip" test -i '*.xml'
- uses: actions/upload-artifact@v2
name: Store Test Reports
if: always()
with:
name: test-reports-${{ matrix.config }}
retention-days: 14
if-no-files-found: error
path:
test-reports-*.zip
- uses: seemethere/upload-artifact-s3@v3
name: Store Test Reports on S3
if: always()
with:
retention-days: 14
if-no-files-found: error
path:
test-reports-*.zip
- name: Display and upload test statistics (Click Me)
if: always()
# temporary hack: set CIRCLE_* vars, until we update
# tools/stats/print_test_stats.py to natively support GitHub Actions
env:
AWS_DEFAULT_REGION: us-east-1
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
JOB_BASE_NAME: linux-xenial-py3.6-gcc5.4-test
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
CIRCLE_TAG: ${{ steps.parse-ref.outputs.tag }}
CIRCLE_WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
shell: bash
run: |
python3 -m pip install -r requirements.txt
python3 -m pip install boto3==1.16.34
python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test
- name: Hold runner for 2 hours or until ssh sessions have drained
# Always hold for active ssh sessions
if: always()
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Chown workspace
if: always()
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Kill containers, clean up images
if: always()
run: |
# ignore expansion of "docker ps -q" since it could be empty
# shellcheck disable=SC2046
docker stop $(docker ps -q) || true
# Prune all of the docker images
docker system prune -af
build-docs:
runs-on: linux.2xlarge
strategy:
matrix:
docs_type: [cpp, python]
needs: [calculate-docker-image, build, ciflow_should_run]
env:
DOCKER_IMAGE: ${{ needs.calculate-docker-image.outputs.docker_image }}
DOCS_TYPE: ${{ matrix.docs_type }}
steps:
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Log in to ECR
env:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
# Ensure the working directory gets chowned back to the current user
retry docker pull "${ALPINE_IMAGE}"
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
rm -rf "${GITHUB_WORKSPACE:?}/*"
rm -f ~/.ssh/authorized_keys
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: recursive
- name: Pull docker image
run: |
docker pull "${DOCKER_IMAGE}"
- uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b
name: Download PyTorch Build Artifacts
with:
name: ${{ env.BUILD_ENVIRONMENT }}
- name: Unzip artifacts
run: |
unzip -o artifacts.zip
- name: Build ${{ matrix.docs_type }} docs
run: |
set -ex
time docker pull "${DOCKER_IMAGE}" > /dev/null
echo "${GITHUB_REF}"
ref=${GITHUB_REF##*/}
target=${ref//v}
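# ${GITHUB_REF##*/} keeps only the last path segment (branch or tag name);
# ${ref//v} then deletes every "v", so "v1.10" becomes "1.10" (note this
# would also strip a "v" occurring elsewhere in the name).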
# detached container should get cleaned up by teardown_ec2_linux
container_name=$(docker run \
-e BUILD_ENVIRONMENT \
-e CUSTOM_TEST_ARTIFACT_BUILD_DIR \
-e IN_CI \
-e MAX_JOBS="$(nproc --ignore=2)" \
-e CIRCLE_SHA1="$GITHUB_SHA" \
-e DOCS_VERSION="${target}" \
-e DOCS_TYPE \
-e PR_LABELS \
--env-file="/tmp/github_env_${GITHUB_RUN_ID}" \
--security-opt seccomp=unconfined \
--cap-add=SYS_PTRACE \
--tty \
--detach \
--user jenkins \
-v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \
-w /var/lib/jenkins/workspace \
"${DOCKER_IMAGE}"
)
docker exec -t "${container_name}" bash -c "sudo chown -R jenkins . && pip install dist/*.whl && ./.circleci/scripts/${DOCS_TYPE}_doc_push_script.sh"
- name: Chown workspace
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- uses: seemethere/upload-artifact-s3@v3
name: Upload Python Docs Preview
if: ${{ github.event_name == 'pull_request' && matrix.docs_type == 'python' }}
with:
retention-days: 14
s3-bucket: doc-previews
if-no-files-found: error
path: pytorch.github.io/docs/merge/
s3-prefix: pytorch/${{ github.event.pull_request.number }}
- uses: seemethere/upload-artifact-s3@v3
name: Upload C++ Docs Preview
if: ${{ github.event_name == 'pull_request' && matrix.docs_type == 'cpp' }}
with:
retention-days: 14
if-no-files-found: error
s3-bucket: doc-previews
path: cppdocs/
s3-prefix: pytorch/${{ github.event.pull_request.number }}/cppdocs
- name: Archive artifacts into zip
run: |
zip -r "docs_${DOCS_TYPE}.zip" "${GITHUB_WORKSPACE}/pytorch.github.io" "${GITHUB_WORKSPACE}/cppdocs"
- uses: actions/upload-artifact@v2
name: Store PyTorch Build Artifacts
with:
name: docs_${{ matrix.docs_type }}
path: docs_${{ matrix.docs_type }}.zip
if-no-files-found: error
- name: Hold runner for 2 hours or until ssh sessions have drained
# Always hold for active ssh sessions
if: always()
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Chown workspace
if: always()
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Kill containers, clean up images
if: always()
run: |
# ignore expansion of "docker ps -q" since it could be empty
# shellcheck disable=SC2046
docker stop $(docker ps -q) || true
# Prune all of the docker images
docker system prune -af


@@ -0,0 +1,367 @@
# @generated DO NOT EDIT MANUALLY
# Template is at: .github/templates/bazel_ci_workflow.yml.j2
# Generation script: .github/scripts/generate_ci_workflows.py
name: linux-xenial-py3.6-gcc7-bazel-test
on:
pull_request:
types: [opened, synchronize, reopened, unassigned]
push:
branches:
- master
- release/*
workflow_dispatch:
env:
BUILD_ENVIRONMENT: linux-xenial-py3.6-gcc7-bazel-test
DOCKER_IMAGE_BASE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-bionic-cuda10.2-cudnn7-py3.9-gcc7
SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2
XLA_CLANG_CACHE_S3_BUCKET_NAME: ossci-compiler-clang-cache-circleci-xla
TORCH_CUDA_ARCH_LIST: 5.2
IN_CI: 1
# Used only while wheel tests are being phased in; will be removed once that work completes
IN_WHEEL_TEST: 1
# Used for custom_operator, jit_hooks, custom_backend, see .jenkins/pytorch/build.sh
CUSTOM_TEST_ARTIFACT_BUILD_DIR: build/custom_test_artifacts
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
concurrency:
group: linux-xenial-py3.6-gcc7-bazel-test-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }}
cancel-in-progress: true
jobs:
ciflow_should_run:
runs-on: ubuntu-18.04
if: ${{ (github.repository == 'pytorch/pytorch') && ((github.event_name != 'pull_request') || (github.event.assignee.login != 'pytorchbot' ) || (github.event.action !='unassigned') || (contains(github.event.pull_request.labels.*.name, 'ciflow/all') || contains(github.event.pull_request.labels.*.name, 'ciflow/bazel') || contains(github.event.pull_request.labels.*.name, 'ciflow/cpu') || contains(github.event.pull_request.labels.*.name, 'ciflow/default') || contains(github.event.pull_request.labels.*.name, 'ciflow/linux'))) }}
env:
LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
steps:
- name: noop
run: echo running ciflow_should_run
- name: print labels
run: echo "${LABELS}"
calculate-docker-image:
runs-on: linux.2xlarge
needs: [ciflow_should_run]
env:
DOCKER_BUILDKIT: 1
timeout-minutes: 90
outputs:
docker_image: ${{ steps.calculate-tag.outputs.docker_image }}
steps:
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Log in to ECR
env:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
# Ensure the working directory gets chowned back to the current user
retry docker pull "${ALPINE_IMAGE}"
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
rm -rf "${GITHUB_WORKSPACE:?}/*"
rm -f ~/.ssh/authorized_keys
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: false
- name: Calculate docker image tag
id: calculate-tag
run: |
DOCKER_TAG=$(git rev-parse HEAD:.circleci/docker)
echo "::set-output name=docker_tag::${DOCKER_TAG}"
echo "::set-output name=docker_image::${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"
- name: Check if image should be built
id: check
env:
DOCKER_TAG: ${{ steps.calculate-tag.outputs.docker_tag }}
BASE_REVISION: ${{ github.event.pull_request.base.sha || github.sha }}
run: |
set -x
# Check if image already exists, if it does then skip building it
if docker manifest inspect "${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"; then
exit 0
fi
if [[ "$BASE_REVISION" = "$(git rev-parse HEAD)" ]]; then
# if we're on the base branch then use the parent commit
MERGE_BASE=$(git rev-parse HEAD~)
else
# otherwise we're on a PR, so use the most recent base commit
MERGE_BASE=$(git merge-base HEAD "$BASE_REVISION")
fi
# Covers the case where a previous tag doesn't exist for the tree
# this is only really applicable to trees that don't have `.circleci/docker` at their merge base, e.g. nightly
if ! git rev-parse "$MERGE_BASE:.circleci/docker"; then
echo "Directory '.circleci/docker' not found in commit $MERGE_BASE, you should probably rebase onto a more recent commit"
exit 1
fi
PREVIOUS_DOCKER_TAG=$(git rev-parse "$MERGE_BASE:.circleci/docker")
# If no image exists but the hash is the same as the previous hash then we should error out here
if [[ "${PREVIOUS_DOCKER_TAG}" = "${DOCKER_TAG}" ]]; then
echo "ERROR: Something has gone wrong and the previous image isn't available for the merge-base of your branch"
echo " contact the PyTorch team to restore the original images"
exit 1
fi
echo ::set-output name=rebuild::yes
- name: Build and push docker image
if: ${{ steps.check.outputs.rebuild }}
env:
DOCKER_TAG: ${{ steps.calculate-tag.outputs.docker_tag }}
DOCKER_SKIP_S3_UPLOAD: 1
run: |
export IMAGE_NAME=${DOCKER_IMAGE_BASE#308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/}
cd .circleci/docker && ./build_docker.sh
# building and testing in a single job since bazel runs only a small subset of tests
build-and-test:
runs-on: linux.2xlarge
needs: [calculate-docker-image, ciflow_should_run]
env:
DOCKER_IMAGE: ${{ needs.calculate-docker-image.outputs.docker_image }}
JOB_BASE_NAME: linux-xenial-py3.6-gcc7-bazel-test-build-and-test
NUM_TEST_SHARDS: 1
CONTINUE_THROUGH_ERROR: ${{ github.repository == 'pytorch/pytorch' && (github.event_name == 'push' || github.event_name == 'schedule') }}
steps:
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Log in to ECR
env:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
# Ensure the working directory gets chowned back to the current user
retry docker pull "${ALPINE_IMAGE}"
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
rm -rf "${GITHUB_WORKSPACE:?}/*"
rm -f ~/.ssh/authorized_keys
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: recursive
- name: Pull docker image
run: |
docker pull "${DOCKER_IMAGE}"
- name: Determine shm-size
run: |
shm_size="1g"
case "${BUILD_ENVIRONMENT}" in
*cuda*)
shm_size="2g"
;;
*rocm*)
shm_size="8g"
;;
esac
echo "SHM_SIZE=${shm_size}" >> "${GITHUB_ENV}"
- name: Output disk space left
run: |
sudo df -H
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Build
run: |
# detached container should get cleaned up by teardown_ec2_linux
container_name=$(docker run \
-e BUILD_ENVIRONMENT \
-e JOB_BASE_NAME \
-e MAX_JOBS="$(nproc --ignore=2)" \
-e SCCACHE_BUCKET \
-e CUSTOM_TEST_ARTIFACT_BUILD_DIR \
-e PR_LABELS \
-e SKIP_SCCACHE_INITIALIZATION=1 \
-e TORCH_CUDA_ARCH_LIST \
-e http_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e https_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e no_proxy="localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" \
--env-file="/tmp/github_env_${GITHUB_RUN_ID}" \
--security-opt seccomp=unconfined \
--cap-add=SYS_PTRACE \
--tty \
--detach \
--user jenkins \
-v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \
-w /var/lib/jenkins/workspace \
"${DOCKER_IMAGE}"
)
docker exec -t "${container_name}" sh -c 'sudo chown -R jenkins . && sudo chown -R jenkins /dev && .jenkins/pytorch/build.sh'
- name: Parse ref
id: parse-ref
run: .github/scripts/parse_ref.py
- name: Display and upload binary build size statistics (Click Me)
# temporary hack: set CIRCLE_* vars, until we update
# tools/stats/print_test_stats.py to natively support GitHub Actions
env:
AWS_DEFAULT_REGION: us-east-1
SCRIBE_GRAPHQL_ACCESS_TOKEN: ${{ secrets.SCRIBE_GRAPHQL_ACCESS_TOKEN }}
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
CIRCLE_TAG: ${{ steps.parse-ref.outputs.tag }}
CIRCLE_WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
run: |
COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0)
export COMMIT_TIME
pip3 install requests==2.26
python3 -m tools.stats.upload_binary_size_to_scuba || exit 0
- name: Test
run: |
# detached container should get cleaned up by teardown_ec2_linux
export SHARD_NUMBER=0
# TODO: Stop building test binaries as part of the build phase
# Make sure we copy test results from bazel-testlogs symlink to
# a regular directory ./test/test-reports
container_name=$(docker run \
-e BUILD_ENVIRONMENT \
-e CUSTOM_TEST_ARTIFACT_BUILD_DIR \
-e GITHUB_ACTIONS \
-e IN_CI \
-e SHARD_NUMBER \
-e JOB_BASE_NAME \
-e MAX_JOBS="$(nproc --ignore=2)" \
-e SCCACHE_BUCKET \
-e CONTINUE_THROUGH_ERROR \
-e PR_LABELS \
-e http_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e https_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e no_proxy="localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" \
--env-file="/tmp/github_env_${GITHUB_RUN_ID}" \
--security-opt seccomp=unconfined \
--cap-add=SYS_PTRACE \
--shm-size="${SHM_SIZE}" \
--tty \
--detach \
--user jenkins \
-v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \
-w /var/lib/jenkins/workspace \
"${DOCKER_IMAGE}"
)
docker exec -t "${container_name}" sh -c 'sudo chown -R jenkins . && sudo chown -R jenkins /dev && .jenkins/pytorch/test.sh && cp -Lr ./bazel-testlogs ./test/test-reports'
- name: Chown workspace
if: always()
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Zip test reports for upload
if: always()
env:
FILE_SUFFIX: 'bazel-${{ github.job }}'
run: |
# Remove any previous test reports if they exist
rm -f test-reports-*.zip
zip -r "test-reports-${FILE_SUFFIX}.zip" test -i '*.xml'
- uses: actions/upload-artifact@v2
name: Store Test Reports
if: always()
with:
name: test-reports-bazel
retention-days: 14
if-no-files-found: error
path:
test-reports-*.zip
- uses: seemethere/upload-artifact-s3@v3
name: Store Test Reports on S3
if: always()
with:
retention-days: 14
if-no-files-found: error
path:
test-reports-*.zip
- name: Display and upload test statistics (Click Me)
if: always()
# temporary hack: set CIRCLE_* vars, until we update
# tools/stats/print_test_stats.py to natively support GitHub Actions
env:
AWS_DEFAULT_REGION: us-east-1
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
JOB_BASE_NAME: linux-xenial-py3.6-gcc7-bazel-test-test
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
CIRCLE_TAG: ${{ steps.parse-ref.outputs.tag }}
CIRCLE_WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
shell: bash
run: |
python3 -m pip install -r requirements.txt
python3 -m pip install boto3==1.16.34
python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test
- name: Hold runner for 2 hours or until ssh sessions have drained
# Always hold for active ssh sessions
if: always()
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Chown workspace
if: always()
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Kill containers, clean up images
if: always()
run: |
# ignore expansion of "docker ps -q" since it could be empty
# shellcheck disable=SC2046
docker stop $(docker ps -q) || true
# Prune all of the docker images
docker system prune -af
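
A pattern that recurs throughout these generated workflows is the inline `retry` helper wrapped around `docker pull`: it simply re-runs its argument list up to three times, sleeping briefly between attempts. A slightly generalized sketch, with a configurable attempt count and exponential backoff (both are illustrative additions, not part of the workflow):

```
# Retry a command up to $1 times, doubling the delay after each failure.
retry () {
  local attempts=$1; shift
  local delay=1 i
  for ((i = 1; i <= attempts; i++)); do
    "$@" && return 0
    if (( i < attempts )); then
      sleep "${delay}"
      delay=$((delay * 2))
    fi
  done
  return 1
}

retry 3 docker pull "${ALPINE_IMAGE}"
```
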


@@ -0,0 +1,562 @@
# @generated DO NOT EDIT MANUALLY
# Template is at: .github/templates/linux_ci_workflow.yml.j2
# Generation script: .github/scripts/generate_ci_workflows.py
name: parallelnative-linux-xenial-py3.6-gcc5.4
on:
pull_request:
types: [unassigned]
push:
branches:
- master
- release/*
workflow_dispatch:
env:
BUILD_ENVIRONMENT: parallelnative-linux-xenial-py3.6-gcc5.4
DOCKER_IMAGE_BASE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3.6-gcc5.4
SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2
XLA_CLANG_CACHE_S3_BUCKET_NAME: ossci-compiler-clang-cache-circleci-xla
TORCH_CUDA_ARCH_LIST: 5.2
IN_CI: 1
# This is used only during the phase of adding wheel tests; it will be removed once completed
IN_WHEEL_TEST: 1
# Used for custom_operator, jit_hooks, custom_backend, see .jenkins/pytorch/build.sh
CUSTOM_TEST_ARTIFACT_BUILD_DIR: build/custom_test_artifacts
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
concurrency:
group: parallelnative-linux-xenial-py3.6-gcc5.4-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }}
cancel-in-progress: true
jobs:
ciflow_should_run:
runs-on: ubuntu-18.04
if: ${{ (github.repository == 'pytorch/pytorch') && ((github.event_name != 'pull_request') || (github.event.assignee.login != 'pytorchbot' ) || (github.event.action !='unassigned') || (contains(github.event.pull_request.labels.*.name, 'ciflow/all') || contains(github.event.pull_request.labels.*.name, 'ciflow/cpu') || contains(github.event.pull_request.labels.*.name, 'ciflow/linux'))) }}
env:
LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
steps:
- name: noop
run: echo running ciflow_should_run
- name: print labels
run: echo "${LABELS}"
calculate-docker-image:
runs-on: linux.2xlarge
needs: [ciflow_should_run]
env:
DOCKER_BUILDKIT: 1
timeout-minutes: 90
outputs:
docker_image: ${{ steps.calculate-tag.outputs.docker_image }}
steps:
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Log in to ECR
env:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
# Ensure the working directory gets chowned back to the current user
retry docker pull "${ALPINE_IMAGE}"
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
rm -rf "${GITHUB_WORKSPACE:?}/*"
rm -f ~/.ssh/authorized_keys
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: false
- name: Calculate docker image tag
id: calculate-tag
run: |
DOCKER_TAG=$(git rev-parse HEAD:.circleci/docker)
echo "::set-output name=docker_tag::${DOCKER_TAG}"
echo "::set-output name=docker_image::${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"
- name: Check if image should be built
id: check
env:
DOCKER_TAG: ${{ steps.calculate-tag.outputs.docker_tag }}
BASE_REVISION: ${{ github.event.pull_request.base.sha || github.sha }}
run: |
set -x
# Check if image already exists, if it does then skip building it
if docker manifest inspect "${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"; then
exit 0
fi
if [[ "$BASE_REVISION" = "$(git rev-parse HEAD)" ]]; then
# if we're on the base branch then use the parent commit
MERGE_BASE=$(git rev-parse HEAD~)
else
# otherwise we're on a PR, so use the most recent base commit
MERGE_BASE=$(git merge-base HEAD "$BASE_REVISION")
fi
# Covers the case where a previous tag doesn't exist for the tree
# this is only really applicable to trees that don't have `.circleci/docker` at their merge base, e.g. nightly
if ! git rev-parse "$MERGE_BASE:.circleci/docker"; then
echo "Directory '.circleci/docker' not found in commit $MERGE_BASE, you should probably rebase onto a more recent commit"
exit 1
fi
PREVIOUS_DOCKER_TAG=$(git rev-parse "$MERGE_BASE:.circleci/docker")
# If no image exists but the hash is the same as the previous hash then we should error out here
if [[ "${PREVIOUS_DOCKER_TAG}" = "${DOCKER_TAG}" ]]; then
echo "ERROR: Something has gone wrong and the previous image isn't available for the merge-base of your branch"
echo " contact the PyTorch team to restore the original images"
exit 1
fi
echo "::set-output name=rebuild::yes"
- name: Build and push docker image
if: ${{ steps.check.outputs.rebuild }}
env:
DOCKER_TAG: ${{ steps.calculate-tag.outputs.docker_tag }}
DOCKER_SKIP_S3_UPLOAD: 1
run: |
export IMAGE_NAME=${DOCKER_IMAGE_BASE#308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/}
cd .circleci/docker && ./build_docker.sh
build:
runs-on: linux.2xlarge
needs: [calculate-docker-image, ciflow_should_run]
env:
DOCKER_IMAGE: ${{ needs.calculate-docker-image.outputs.docker_image }}
JOB_BASE_NAME: parallelnative-linux-xenial-py3.6-gcc5.4-build
steps:
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Log in to ECR
env:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
# Ensure the working directory gets chowned back to the current user
retry docker pull "${ALPINE_IMAGE}"
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
rm -rf "${GITHUB_WORKSPACE:?}/*"
rm -f ~/.ssh/authorized_keys
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: recursive
- name: Pull docker image
run: |
docker pull "${DOCKER_IMAGE}"
- name: Build
run: |
# detached container should get cleaned up by teardown_ec2_linux
container_name=$(docker run \
-e BUILD_ENVIRONMENT \
-e JOB_BASE_NAME \
-e MAX_JOBS="$(nproc --ignore=2)" \
-e SCCACHE_BUCKET \
-e XLA_CLANG_CACHE_S3_BUCKET_NAME \
-e CUSTOM_TEST_ARTIFACT_BUILD_DIR \
-e SKIP_SCCACHE_INITIALIZATION=1 \
-e TORCH_CUDA_ARCH_LIST \
-e PR_LABELS \
-e http_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e https_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e no_proxy="localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" \
--env-file="/tmp/github_env_${GITHUB_RUN_ID}" \
--security-opt seccomp=unconfined \
--cap-add=SYS_PTRACE \
--tty \
--detach \
--user jenkins \
-v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \
-w /var/lib/jenkins/workspace \
"${DOCKER_IMAGE}"
)
docker exec -t "${container_name}" sh -c 'sudo chown -R jenkins . && .jenkins/pytorch/build.sh'
- name: Parse ref
id: parse-ref
run: .github/scripts/parse_ref.py
- name: Display and upload binary build size statistics (Click Me)
# temporary hack: set CIRCLE_* vars, until we update
# tools/stats/print_test_stats.py to natively support GitHub Actions
env:
AWS_DEFAULT_REGION: us-east-1
IS_GHA: 1
SCRIBE_GRAPHQL_ACCESS_TOKEN: ${{ secrets.SCRIBE_GRAPHQL_ACCESS_TOKEN }}
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
CIRCLE_TAG: ${{ steps.parse-ref.outputs.tag }}
CIRCLE_WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
run: |
COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0)
export COMMIT_TIME
pip3 install requests==2.26 boto3==1.16.34
python3 -m tools.stats.upload_binary_size_to_scuba || exit 0
- name: Chown workspace
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Archive artifacts into zip
run: |
zip -1 -r artifacts.zip dist/ build/custom_test_artifacts build/lib build/bin .pytorch-test-times.json
- uses: seemethere/upload-artifact-s3@v3
name: Store PyTorch Build Artifacts on S3
with:
name: ${{ env.BUILD_ENVIRONMENT }}
retention-days: 14
if-no-files-found: error
path:
artifacts.zip
- name: Hold runner for 2 hours or until ssh sessions have drained
# Always hold for active ssh sessions
if: always()
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Chown workspace
if: always()
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Kill containers, clean up images
if: always()
run: |
# ignore expansion of "docker ps -q" since it could be empty
# shellcheck disable=SC2046
docker stop $(docker ps -q) || true
# Prune all of the docker images
docker system prune -af
- name: Hold runner for 2 hours or until ssh sessions have drained
# Always hold for active ssh sessions
if: always()
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Clean up docker images
if: always()
run: |
# Prune all of the docker images
docker system prune -af
generate-test-matrix:
runs-on: ubuntu-18.04
needs: [ciflow_should_run]
env:
TEST_RUNNER_TYPE: linux.2xlarge
ENABLE_DISTRIBUTED_TEST: 1
ENABLE_JIT_LEGACY_TEST: ''
ENABLE_MULTIGPU_TEST: ''
ENABLE_NOGPU_NO_AVX_TEST: ''
ENABLE_NOGPU_NO_AVX2_TEST: ''
ENABLE_SLOW_TEST: ''
ENABLE_DOCS_TEST: ''
ENABLE_BACKWARDS_COMPAT_TEST: ''
ENABLE_XLA_TEST: ''
ENABLE_NOARCH_TEST: ''
NUM_TEST_SHARDS: 1
MULTIGPU_RUNNER_TYPE: linux.16xlarge.nvidia.gpu
NOGPU_RUNNER_TYPE: linux.2xlarge
PR_BODY: ${{ github.event.pull_request.body }}
outputs:
matrix: ${{ steps.set-matrix.outputs.matrix }}
render-matrix: ${{ steps.set-matrix.outputs.render-matrix }}
ignore-disabled-issues: ${{ steps.set-matrix.outputs.ignore-disabled-issues }}
container:
image: python:3.9
steps:
- name: Install dependencies
run: pip install typing-extensions==3.10
- name: Clone pytorch/pytorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
- name: Generating test matrix
id: set-matrix
run: .github/scripts/generate_pytorch_test_matrix.py
test:
needs: [calculate-docker-image, build, generate-test-matrix, ciflow_should_run]
strategy:
matrix: ${{ fromJson(needs.generate-test-matrix.outputs.matrix) }}
fail-fast: false
runs-on: ${{ matrix.runner }}
env:
DOCKER_IMAGE: ${{ needs.calculate-docker-image.outputs.docker_image }}
JOB_BASE_NAME: parallelnative-linux-xenial-py3.6-gcc5.4-test
TEST_CONFIG: ${{ matrix.config }}
SHARD_NUMBER: ${{ matrix.shard }}
NUM_TEST_SHARDS: ${{ matrix.num_shards }}
PYTORCH_IGNORE_DISABLED_ISSUES: ${{ needs.generate-test-matrix.outputs.ignore-disabled-issues }}
CONTINUE_THROUGH_ERROR: ${{ github.repository == 'pytorch/pytorch' && (github.event_name == 'push' || github.event_name == 'schedule') }}
steps:
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Log in to ECR
env:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
# Ensure the working directory gets chowned back to the current user
retry docker pull "${ALPINE_IMAGE}"
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
rm -rf "${GITHUB_WORKSPACE:?}/*"
rm -f ~/.ssh/authorized_keys
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: recursive
- name: Pull docker image
run: |
docker pull "${DOCKER_IMAGE}"
- name: Install nvidia driver, nvidia-docker runtime, set GPU_FLAG
if: ${{ contains(env.BUILD_ENVIRONMENT, 'cuda') && !contains(matrix.config, 'nogpu') }}
run: |
bash .github/scripts/install_nvidia_utils_linux.sh
echo "GPU_FLAG=--gpus all" >> "${GITHUB_ENV}"
- name: Determine shm-size
run: |
shm_size="1g"
case "${BUILD_ENVIRONMENT}" in
*cuda*)
shm_size="2g"
;;
*rocm*)
shm_size="8g"
;;
esac
echo "SHM_SIZE=${shm_size}" >> "${GITHUB_ENV}"
- uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b
name: Download PyTorch Build Artifacts
with:
name: ${{ env.BUILD_ENVIRONMENT }}
- name: Unzip artifacts
run: |
unzip -o artifacts.zip
- name: Output disk space left
run: |
sudo df -H
- name: Parse ref
id: parse-ref
run: .github/scripts/parse_ref.py
- name: Test
env:
PR_NUMBER: ${{ github.event.pull_request.number }}
IS_GHA: 1
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
AWS_DEFAULT_REGION: us-east-1
run: |
if [[ $TEST_CONFIG == 'multigpu' ]]; then
TEST_COMMAND=.jenkins/pytorch/multigpu-test.sh
else
TEST_COMMAND=.jenkins/pytorch/test.sh
fi
if [[ $NUM_TEST_SHARDS -ne 2 ]]; then
export SHARD_NUMBER=0
fi
# detached container should get cleaned up by teardown_ec2_linux
# TODO: Stop building test binaries as part of the build phase
# Used for GPU_FLAG since that doesn't play nice
# shellcheck disable=SC2086
container_name=$(docker run \
${GPU_FLAG:-} \
-e BUILD_ENVIRONMENT \
-e PR_NUMBER \
-e CUSTOM_TEST_ARTIFACT_BUILD_DIR \
-e GITHUB_ACTIONS \
-e IN_CI \
-e IS_GHA \
-e CIRCLE_BRANCH \
-e CIRCLE_SHA1 \
-e CIRCLE_PR_NUMBER \
-e AWS_DEFAULT_REGION \
-e IN_WHEEL_TEST \
-e SHARD_NUMBER \
-e JOB_BASE_NAME \
-e TEST_CONFIG \
-e NUM_TEST_SHARDS \
-e PYTORCH_IGNORE_DISABLED_ISSUES \
-e PR_LABELS \
-e CONTINUE_THROUGH_ERROR \
-e MAX_JOBS="$(nproc --ignore=2)" \
-e SCCACHE_BUCKET \
-e http_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e https_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e no_proxy="localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" \
-e XLA_CLANG_CACHE_S3_BUCKET_NAME \
--env-file="/tmp/github_env_${GITHUB_RUN_ID}" \
--security-opt seccomp=unconfined \
--cap-add=SYS_PTRACE \
--shm-size="${SHM_SIZE}" \
--tty \
--detach \
--name="${container_name}" \
--user jenkins \
-v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \
-w /var/lib/jenkins/workspace \
"${DOCKER_IMAGE}"
)
docker exec -t "${container_name}" sh -c "sudo chown -R jenkins . && pip install dist/*.whl && ${TEST_COMMAND}"
- name: Chown workspace
if: always()
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Install render_test_results dependencies
if: always()
shell: bash
run: |
python3 -m pip install junitparser==2.1.1 rich==10.9.0
- name: "[[ Click me for rendered test results (useful for finding failing tests) ]]"
if: always()
shell: bash
# Encoding is weird on Windows; just try to default to utf-8 if possible
env:
PYTHONIOENCODING: "utf-8"
run: |
python3 tools/render_junit.py test/
- name: Zip test reports for upload
if: always()
env:
FILE_SUFFIX: '${{ github.job }}-${{ matrix.config }}-${{ matrix.shard }}-${{ matrix.num_shards }}-${{ matrix.runner }}'
run: |
# Remove any previous test reports if they exist
rm -f test-reports-*.zip
zip -r "test-reports-${FILE_SUFFIX}.zip" test -i '*.xml'
- uses: actions/upload-artifact@v2
name: Store Test Reports
if: always()
with:
name: test-reports-${{ matrix.config }}
retention-days: 14
if-no-files-found: error
path:
test-reports-*.zip
- uses: seemethere/upload-artifact-s3@v3
name: Store Test Reports on S3
if: always()
with:
retention-days: 14
if-no-files-found: error
path:
test-reports-*.zip
- name: Display and upload test statistics (Click Me)
if: always()
# temporary hack: set CIRCLE_* vars, until we update
# tools/stats/print_test_stats.py to natively support GitHub Actions
env:
AWS_DEFAULT_REGION: us-east-1
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
JOB_BASE_NAME: parallelnative-linux-xenial-py3.6-gcc5.4-test
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
CIRCLE_TAG: ${{ steps.parse-ref.outputs.tag }}
CIRCLE_WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
shell: bash
run: |
python3 -m pip install -r requirements.txt
python3 -m pip install boto3==1.16.34
python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test
- name: Hold runner for 2 hours or until ssh sessions have drained
# Always hold for active ssh sessions
if: always()
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Chown workspace
if: always()
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Kill containers, clean up images
if: always()
run: |
# ignore expansion of "docker ps -q" since it could be empty
# shellcheck disable=SC2046
docker stop $(docker ps -q) || true
# Prune all of the docker images
docker system prune -af
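
These workflows predate the `$GITHUB_OUTPUT` file mechanism, so steps publish values with the `::set-output` workflow command, which GitHub Actions parses out of stdout; later steps read them via `steps.<id>.outputs.<name>`, and dependent jobs via `needs.<job>.outputs.<name>`. A condensed sketch of the flow used by `calculate-docker-image` (identifiers taken from the workflow above):

```
# Inside the step with id: calculate-tag
DOCKER_TAG=$(git rev-parse HEAD:.circleci/docker)
echo "::set-output name=docker_tag::${DOCKER_TAG}"
echo "::set-output name=docker_image::${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"
# Consumed as ${{ steps.calculate-tag.outputs.docker_tag }} in the same job,
# and as ${{ needs.calculate-docker-image.outputs.docker_image }} downstream.
```

`::set-output` has since been deprecated in favor of appending to `$GITHUB_OUTPUT`, but it was the supported mechanism when these files were generated.
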


@@ -0,0 +1,281 @@
# @generated DO NOT EDIT MANUALLY
# Template is at: .github/templates/linux_ci_workflow.yml.j2
# Generation script: .github/scripts/generate_ci_workflows.py
name: periodic-libtorch-linux-xenial-cuda11.1-py3.6-gcc7
on:
pull_request:
types: [unassigned]
schedule:
- cron: 45 0,4,8,12,16,20 * * *
workflow_dispatch:
env:
BUILD_ENVIRONMENT: periodic-libtorch-linux-xenial-cuda11.1-py3.6-gcc7
DOCKER_IMAGE_BASE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda11.1-cudnn8-py3-gcc7
SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2
XLA_CLANG_CACHE_S3_BUCKET_NAME: ossci-compiler-clang-cache-circleci-xla
TORCH_CUDA_ARCH_LIST: 5.2
IN_CI: 1
# This is used only during the phase of adding wheel tests; it will be removed once completed
IN_WHEEL_TEST: 1
# Used for custom_operator, jit_hooks, custom_backend, see .jenkins/pytorch/build.sh
CUSTOM_TEST_ARTIFACT_BUILD_DIR: build/custom_test_artifacts
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
concurrency:
group: periodic-libtorch-linux-xenial-cuda11.1-py3.6-gcc7-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }}
cancel-in-progress: true
jobs:
ciflow_should_run:
runs-on: ubuntu-18.04
if: ${{ (github.repository == 'pytorch/pytorch') && ((github.event_name != 'pull_request') || (github.event.assignee.login != 'pytorchbot' ) || (github.event.action !='unassigned') || (contains(github.event.pull_request.labels.*.name, 'ciflow/all') || contains(github.event.pull_request.labels.*.name, 'ciflow/cuda') || contains(github.event.pull_request.labels.*.name, 'ciflow/libtorch') || contains(github.event.pull_request.labels.*.name, 'ciflow/linux') || contains(github.event.pull_request.labels.*.name, 'ciflow/scheduled'))) }}
env:
LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
steps:
- name: noop
run: echo running ciflow_should_run
- name: print labels
run: echo "${LABELS}"
calculate-docker-image:
runs-on: linux.2xlarge
needs: [ciflow_should_run]
env:
DOCKER_BUILDKIT: 1
timeout-minutes: 90
outputs:
docker_image: ${{ steps.calculate-tag.outputs.docker_image }}
steps:
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Log in to ECR
env:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
# Ensure the working directory gets chowned back to the current user
retry docker pull "${ALPINE_IMAGE}"
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
rm -rf "${GITHUB_WORKSPACE:?}/*"
rm -f ~/.ssh/authorized_keys
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: false
- name: Calculate docker image tag
id: calculate-tag
run: |
DOCKER_TAG=$(git rev-parse HEAD:.circleci/docker)
echo "::set-output name=docker_tag::${DOCKER_TAG}"
echo "::set-output name=docker_image::${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"
- name: Check if image should be built
id: check
env:
DOCKER_TAG: ${{ steps.calculate-tag.outputs.docker_tag }}
BASE_REVISION: ${{ github.event.pull_request.base.sha || github.sha }}
run: |
set -x
# Check if image already exists, if it does then skip building it
if docker manifest inspect "${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"; then
exit 0
fi
if [[ "$BASE_REVISION" = "$(git rev-parse HEAD)" ]]; then
# if we're on the base branch then use the parent commit
MERGE_BASE=$(git rev-parse HEAD~)
else
# otherwise we're on a PR, so use the most recent base commit
MERGE_BASE=$(git merge-base HEAD "$BASE_REVISION")
fi
# Covers the case where a previous tag doesn't exist for the tree
# this is only really applicable to trees that don't have `.circleci/docker` at their merge base, e.g. nightly
if ! git rev-parse "$MERGE_BASE:.circleci/docker"; then
echo "Directory '.circleci/docker' not found in commit $MERGE_BASE, you should probably rebase onto a more recent commit"
exit 1
fi
PREVIOUS_DOCKER_TAG=$(git rev-parse "$MERGE_BASE:.circleci/docker")
# If no image exists but the hash is the same as the previous hash then we should error out here
if [[ "${PREVIOUS_DOCKER_TAG}" = "${DOCKER_TAG}" ]]; then
echo "ERROR: Something has gone wrong and the previous image isn't available for the merge-base of your branch"
echo " contact the PyTorch team to restore the original images"
exit 1
fi
echo "::set-output name=rebuild::yes"
- name: Build and push docker image
if: ${{ steps.check.outputs.rebuild }}
env:
DOCKER_TAG: ${{ steps.calculate-tag.outputs.docker_tag }}
DOCKER_SKIP_S3_UPLOAD: 1
run: |
export IMAGE_NAME=${DOCKER_IMAGE_BASE#308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/}
cd .circleci/docker && ./build_docker.sh
build:
runs-on: linux.2xlarge
needs: [calculate-docker-image, ciflow_should_run]
env:
DOCKER_IMAGE: ${{ needs.calculate-docker-image.outputs.docker_image }}
JOB_BASE_NAME: periodic-libtorch-linux-xenial-cuda11.1-py3.6-gcc7-build
steps:
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Log in to ECR
env:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
# Ensure the working directory gets chowned back to the current user
retry docker pull "${ALPINE_IMAGE}"
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
rm -rf "${GITHUB_WORKSPACE:?}/*"
rm -f ~/.ssh/authorized_keys
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: recursive
- name: Pull docker image
run: |
docker pull "${DOCKER_IMAGE}"
- name: Build
run: |
# detached container should get cleaned up by teardown_ec2_linux
container_name=$(docker run \
-e BUILD_ENVIRONMENT \
-e JOB_BASE_NAME \
-e MAX_JOBS="$(nproc --ignore=2)" \
-e SCCACHE_BUCKET \
-e XLA_CLANG_CACHE_S3_BUCKET_NAME \
-e CUSTOM_TEST_ARTIFACT_BUILD_DIR \
-e SKIP_SCCACHE_INITIALIZATION=1 \
-e TORCH_CUDA_ARCH_LIST \
-e PR_LABELS \
-e http_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e https_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e no_proxy="localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" \
--env-file="/tmp/github_env_${GITHUB_RUN_ID}" \
--security-opt seccomp=unconfined \
--cap-add=SYS_PTRACE \
--tty \
--detach \
--user jenkins \
-v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \
-w /var/lib/jenkins/workspace \
"${DOCKER_IMAGE}"
)
docker exec -t "${container_name}" sh -c 'sudo chown -R jenkins . && .jenkins/pytorch/build.sh'
- name: Parse ref
id: parse-ref
run: .github/scripts/parse_ref.py
- name: Display and upload binary build size statistics (Click Me)
# temporary hack: set CIRCLE_* vars, until we update
# tools/stats/print_test_stats.py to natively support GitHub Actions
env:
AWS_DEFAULT_REGION: us-east-1
IS_GHA: 1
SCRIBE_GRAPHQL_ACCESS_TOKEN: ${{ secrets.SCRIBE_GRAPHQL_ACCESS_TOKEN }}
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
CIRCLE_TAG: ${{ steps.parse-ref.outputs.tag }}
CIRCLE_WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
run: |
COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0)
export COMMIT_TIME
pip3 install requests==2.26 boto3==1.16.34
python3 -m tools.stats.upload_binary_size_to_scuba || exit 0
- name: Chown workspace
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Hold runner for 2 hours or until ssh sessions have drained
# Always hold for active ssh sessions
if: always()
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Chown workspace
if: always()
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Kill containers, clean up images
if: always()
run: |
# ignore expansion of "docker ps -q" since it could be empty
# shellcheck disable=SC2046
docker stop $(docker ps -q) || true
# Prune all of the docker images
docker system prune -af
- name: Hold runner for 2 hours or until ssh sessions have drained
# Always hold for active ssh sessions
if: always()
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Clean up docker images
if: always()
run: |
# Prune all of the docker images
docker system prune -af
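
The `Display EC2 information` step in each job queries the instance metadata service at 169.254.169.254 directly, i.e. IMDSv1 style. On instances configured to require IMDSv2, the same lookup needs a short-lived session token first; a sketch of that variant (not used by these workflows):

```
# IMDSv2: obtain a session token, then present it on each metadata request.
TOKEN=$(curl -fsSL -X PUT "http://169.254.169.254/latest/api/token" \
  -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")
curl -fsSL -H "X-aws-ec2-metadata-token: ${TOKEN}" \
  "http://169.254.169.254/latest/meta-data/instance-type"
```
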


@@ -0,0 +1,560 @@
# @generated DO NOT EDIT MANUALLY
# Template is at: .github/templates/linux_ci_workflow.yml.j2
# Generation script: .github/scripts/generate_ci_workflows.py
name: periodic-linux-xenial-cuda11.1-py3.6-gcc7
on:
pull_request:
types: [unassigned]
schedule:
- cron: 45 0,4,8,12,16,20 * * *
workflow_dispatch:
env:
BUILD_ENVIRONMENT: periodic-linux-xenial-cuda11.1-py3.6-gcc7
DOCKER_IMAGE_BASE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda11.1-cudnn8-py3-gcc7
SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2
XLA_CLANG_CACHE_S3_BUCKET_NAME: ossci-compiler-clang-cache-circleci-xla
TORCH_CUDA_ARCH_LIST: 5.2
IN_CI: 1
# This is used only during the phase of adding wheel tests; it will be removed once completed
IN_WHEEL_TEST: 1
# Used for custom_operator, jit_hooks, custom_backend, see .jenkins/pytorch/build.sh
CUSTOM_TEST_ARTIFACT_BUILD_DIR: build/custom_test_artifacts
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
concurrency:
group: periodic-linux-xenial-cuda11.1-py3.6-gcc7-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }}
cancel-in-progress: true
jobs:
ciflow_should_run:
runs-on: ubuntu-18.04
if: ${{ (github.repository == 'pytorch/pytorch') && ((github.event_name != 'pull_request') || (github.event.assignee.login != 'pytorchbot' ) || (github.event.action !='unassigned') || (contains(github.event.pull_request.labels.*.name, 'ciflow/all') || contains(github.event.pull_request.labels.*.name, 'ciflow/cuda') || contains(github.event.pull_request.labels.*.name, 'ciflow/linux') || contains(github.event.pull_request.labels.*.name, 'ciflow/scheduled'))) }}
env:
LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
steps:
- name: noop
run: echo running ciflow_should_run
- name: print labels
run: echo "${LABELS}"
calculate-docker-image:
runs-on: linux.2xlarge
needs: [ciflow_should_run]
env:
DOCKER_BUILDKIT: 1
timeout-minutes: 90
outputs:
docker_image: ${{ steps.calculate-tag.outputs.docker_image }}
steps:
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Log in to ECR
env:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
# Ensure the working directory gets chowned back to the current user
retry docker pull "${ALPINE_IMAGE}"
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
rm -rf "${GITHUB_WORKSPACE:?}/*"
rm -f ~/.ssh/authorized_keys
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: false
- name: Calculate docker image tag
id: calculate-tag
run: |
DOCKER_TAG=$(git rev-parse HEAD:.circleci/docker)
echo "::set-output name=docker_tag::${DOCKER_TAG}"
echo "::set-output name=docker_image::${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"
- name: Check if image should be built
id: check
env:
DOCKER_TAG: ${{ steps.calculate-tag.outputs.docker_tag }}
BASE_REVISION: ${{ github.event.pull_request.base.sha || github.sha }}
run: |
set -x
# Check if image already exists, if it does then skip building it
if docker manifest inspect "${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"; then
exit 0
fi
if [[ "$BASE_REVISION" = "$(git rev-parse HEAD)" ]]; then
# if we're on the base branch then use the parent commit
MERGE_BASE=$(git rev-parse HEAD~)
else
# otherwise we're on a PR, so use the most recent base commit
MERGE_BASE=$(git merge-base HEAD "$BASE_REVISION")
fi
# Covers the case where a previous tag doesn't exist for the tree
# this is only really applicable to trees that don't have `.circleci/docker` at their merge base, e.g. nightly
if ! git rev-parse "$MERGE_BASE:.circleci/docker"; then
echo "Directory '.circleci/docker' not found in commit $MERGE_BASE, you should probably rebase onto a more recent commit"
exit 1
fi
PREVIOUS_DOCKER_TAG=$(git rev-parse "$MERGE_BASE:.circleci/docker")
# If no image exists but the hash is the same as the previous hash then we should error out here
if [[ "${PREVIOUS_DOCKER_TAG}" = "${DOCKER_TAG}" ]]; then
echo "ERROR: Something has gone wrong and the previous image isn't available for the merge-base of your branch"
echo " contact the PyTorch team to restore the original images"
exit 1
fi
echo "::set-output name=rebuild::yes"
- name: Build and push docker image
if: ${{ steps.check.outputs.rebuild }}
env:
DOCKER_TAG: ${{ steps.calculate-tag.outputs.docker_tag }}
DOCKER_SKIP_S3_UPLOAD: 1
run: |
export IMAGE_NAME=${DOCKER_IMAGE_BASE#308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/}
cd .circleci/docker && ./build_docker.sh
build:
runs-on: linux.2xlarge
needs: [calculate-docker-image, ciflow_should_run]
env:
DOCKER_IMAGE: ${{ needs.calculate-docker-image.outputs.docker_image }}
JOB_BASE_NAME: periodic-linux-xenial-cuda11.1-py3.6-gcc7-build
steps:
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Log in to ECR
env:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
# Ensure the working directory gets chowned back to the current user
retry docker pull "${ALPINE_IMAGE}"
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
rm -rf "${GITHUB_WORKSPACE:?}/*"
rm -f ~/.ssh/authorized_keys
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: recursive
- name: Pull docker image
run: |
docker pull "${DOCKER_IMAGE}"
- name: Build
run: |
# detached container should get cleaned up by teardown_ec2_linux
container_name=$(docker run \
-e BUILD_ENVIRONMENT \
-e JOB_BASE_NAME \
-e MAX_JOBS="$(nproc --ignore=2)" \
-e SCCACHE_BUCKET \
-e XLA_CLANG_CACHE_S3_BUCKET_NAME \
-e CUSTOM_TEST_ARTIFACT_BUILD_DIR \
-e SKIP_SCCACHE_INITIALIZATION=1 \
-e TORCH_CUDA_ARCH_LIST \
-e PR_LABELS \
-e http_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e https_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e no_proxy="localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" \
--env-file="/tmp/github_env_${GITHUB_RUN_ID}" \
--security-opt seccomp=unconfined \
--cap-add=SYS_PTRACE \
--tty \
--detach \
--user jenkins \
-v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \
-w /var/lib/jenkins/workspace \
"${DOCKER_IMAGE}"
)
docker exec -t "${container_name}" sh -c 'sudo chown -R jenkins . && .jenkins/pytorch/build.sh'
- name: Parse ref
id: parse-ref
run: .github/scripts/parse_ref.py
- name: Display and upload binary build size statistics (Click Me)
# temporary hack: set CIRCLE_* vars, until we update
# tools/stats/print_test_stats.py to natively support GitHub Actions
env:
AWS_DEFAULT_REGION: us-east-1
IS_GHA: 1
SCRIBE_GRAPHQL_ACCESS_TOKEN: ${{ secrets.SCRIBE_GRAPHQL_ACCESS_TOKEN }}
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
CIRCLE_TAG: ${{ steps.parse-ref.outputs.tag }}
CIRCLE_WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
run: |
COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0)
export COMMIT_TIME
pip3 install requests==2.26 boto3==1.16.34
python3 -m tools.stats.upload_binary_size_to_scuba || exit 0
- name: Chown workspace
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Archive artifacts into zip
run: |
zip -1 -r artifacts.zip dist/ build/custom_test_artifacts build/lib build/bin .pytorch-test-times.json
- uses: seemethere/upload-artifact-s3@v3
name: Store PyTorch Build Artifacts on S3
with:
name: ${{ env.BUILD_ENVIRONMENT }}
retention-days: 14
if-no-files-found: error
path:
artifacts.zip
- name: Hold runner for 2 hours or until ssh sessions have drained
# Always hold for active ssh sessions
if: always()
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Chown workspace
if: always()
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Kill containers, clean up images
if: always()
run: |
# ignore expansion of "docker ps -q" since it could be empty
# shellcheck disable=SC2046
docker stop $(docker ps -q) || true
# Prune all of the docker images
docker system prune -af
- name: Hold runner for 2 hours or until ssh sessions have drained
# Always hold for active ssh sessions
if: always()
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Clean up docker images
if: always()
run: |
# Prune all of the docker images
docker system prune -af
generate-test-matrix:
runs-on: ubuntu-18.04
needs: [ciflow_should_run]
env:
TEST_RUNNER_TYPE: linux.8xlarge.nvidia.gpu
ENABLE_DISTRIBUTED_TEST: 1
ENABLE_JIT_LEGACY_TEST: ''
ENABLE_MULTIGPU_TEST: ''
ENABLE_NOGPU_NO_AVX_TEST: ''
ENABLE_NOGPU_NO_AVX2_TEST: ''
ENABLE_SLOW_TEST: ''
ENABLE_DOCS_TEST: ''
ENABLE_BACKWARDS_COMPAT_TEST: ''
ENABLE_XLA_TEST: ''
ENABLE_NOARCH_TEST: ''
NUM_TEST_SHARDS: 2
MULTIGPU_RUNNER_TYPE: linux.16xlarge.nvidia.gpu
NOGPU_RUNNER_TYPE: linux.2xlarge
PR_BODY: ${{ github.event.pull_request.body }}
outputs:
matrix: ${{ steps.set-matrix.outputs.matrix }}
render-matrix: ${{ steps.set-matrix.outputs.render-matrix }}
ignore-disabled-issues: ${{ steps.set-matrix.outputs.ignore-disabled-issues }}
container:
image: python:3.9
steps:
- name: Install dependencies
run: pip install typing-extensions==3.10
- name: Clone pytorch/pytorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
- name: Generating test matrix
id: set-matrix
run: .github/scripts/generate_pytorch_test_matrix.py
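
The `generate-test-matrix` job publishes a JSON matrix through `::set-output`, and the `test` job below fans it out with `fromJson(...)` so each entry becomes one runner with its own `config`, `shard`, and `num_shards`. The real output comes from `.github/scripts/generate_pytorch_test_matrix.py`; the shape sketched below is hypothetical and only meant to show the kind of object `fromJson` consumes:

```
# Hypothetical two-shard matrix for this workflow's default config:
cat <<'EOF'
{
  "include": [
    {"config": "default", "shard": 1, "num_shards": 2, "runner": "linux.8xlarge.nvidia.gpu"},
    {"config": "default", "shard": 2, "num_shards": 2, "runner": "linux.8xlarge.nvidia.gpu"}
  ]
}
EOF
```
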
test:
needs: [calculate-docker-image, build, generate-test-matrix, ciflow_should_run]
strategy:
matrix: ${{ fromJson(needs.generate-test-matrix.outputs.matrix) }}
fail-fast: false
runs-on: ${{ matrix.runner }}
env:
DOCKER_IMAGE: ${{ needs.calculate-docker-image.outputs.docker_image }}
JOB_BASE_NAME: periodic-linux-xenial-cuda11.1-py3.6-gcc7-test
TEST_CONFIG: ${{ matrix.config }}
SHARD_NUMBER: ${{ matrix.shard }}
NUM_TEST_SHARDS: ${{ matrix.num_shards }}
PYTORCH_IGNORE_DISABLED_ISSUES: ${{ needs.generate-test-matrix.outputs.ignore-disabled-issues }}
CONTINUE_THROUGH_ERROR: ${{ github.repository == 'pytorch/pytorch' && (github.event_name == 'push' || github.event_name == 'schedule') }}
steps:
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Log in to ECR
env:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
# Ensure the working directory gets chowned back to the current user
retry docker pull "${ALPINE_IMAGE}"
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
rm -rf "${GITHUB_WORKSPACE:?}/*"
rm -f ~/.ssh/authorized_keys
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: recursive
- name: Pull docker image
run: |
docker pull "${DOCKER_IMAGE}"
- name: Install nvidia driver, nvidia-docker runtime, set GPU_FLAG
if: ${{ contains(env.BUILD_ENVIRONMENT, 'cuda') && !contains(matrix.config, 'nogpu') }}
run: |
bash .github/scripts/install_nvidia_utils_linux.sh
echo "GPU_FLAG=--gpus all" >> "${GITHUB_ENV}"
- name: Determine shm-size
run: |
shm_size="1g"
case "${BUILD_ENVIRONMENT}" in
*cuda*)
shm_size="2g"
;;
*rocm*)
shm_size="8g"
;;
esac
echo "SHM_SIZE=${shm_size}" >> "${GITHUB_ENV}"
- uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b
name: Download PyTorch Build Artifacts
with:
name: ${{ env.BUILD_ENVIRONMENT }}
- name: Unzip artifacts
run: |
unzip -o artifacts.zip
- name: Output disk space left
run: |
sudo df -H
- name: Parse ref
id: parse-ref
run: .github/scripts/parse_ref.py
- name: Test
env:
PR_NUMBER: ${{ github.event.pull_request.number }}
IS_GHA: 1
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
AWS_DEFAULT_REGION: us-east-1
run: |
if [[ $TEST_CONFIG == 'multigpu' ]]; then
TEST_COMMAND=.jenkins/pytorch/multigpu-test.sh
else
TEST_COMMAND=.jenkins/pytorch/test.sh
fi
if [[ $NUM_TEST_SHARDS -ne 2 ]]; then
export SHARD_NUMBER=0
fi
# detached container should get cleaned up by teardown_ec2_linux
# TODO: Stop building test binaries as part of the build phase
# Used for GPU_FLAG since that doesn't play nice
# shellcheck disable=SC2086
container_name=$(docker run \
${GPU_FLAG:-} \
-e BUILD_ENVIRONMENT \
-e PR_NUMBER \
-e CUSTOM_TEST_ARTIFACT_BUILD_DIR \
-e GITHUB_ACTIONS \
-e IN_CI \
-e IS_GHA \
-e CIRCLE_BRANCH \
-e CIRCLE_SHA1 \
-e CIRCLE_PR_NUMBER \
-e AWS_DEFAULT_REGION \
-e IN_WHEEL_TEST \
-e SHARD_NUMBER \
-e JOB_BASE_NAME \
-e TEST_CONFIG \
-e NUM_TEST_SHARDS \
-e PYTORCH_IGNORE_DISABLED_ISSUES \
-e PR_LABELS \
-e CONTINUE_THROUGH_ERROR \
-e MAX_JOBS="$(nproc --ignore=2)" \
-e SCCACHE_BUCKET \
-e http_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e https_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e no_proxy="localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" \
-e XLA_CLANG_CACHE_S3_BUCKET_NAME \
--env-file="/tmp/github_env_${GITHUB_RUN_ID}" \
--security-opt seccomp=unconfined \
--cap-add=SYS_PTRACE \
--shm-size="${SHM_SIZE}" \
--tty \
--detach \
--name="${container_name}" \
--user jenkins \
-v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \
-w /var/lib/jenkins/workspace \
"${DOCKER_IMAGE}"
)
docker exec -t "${container_name}" sh -c "sudo chown -R jenkins . && pip install dist/*.whl && ${TEST_COMMAND}"
- name: Chown workspace
if: always()
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Install render_test_results dependencies
if: always()
shell: bash
run: |
python3 -m pip install junitparser==2.1.1 rich==10.9.0
- name: "[[ Click me for rendered test results (useful for finding failing tests) ]]"
if: always()
shell: bash
# Encoding is weird on Windows; just try to default to utf-8 if possible
env:
PYTHONIOENCODING: "utf-8"
run: |
python3 tools/render_junit.py test/
- name: Zip test reports for upload
if: always()
env:
FILE_SUFFIX: '${{ github.job }}-${{ matrix.config }}-${{ matrix.shard }}-${{ matrix.num_shards }}-${{ matrix.runner }}'
run: |
# Remove any previous test reports if they exist
rm -f test-reports-*.zip
zip -r "test-reports-${FILE_SUFFIX}.zip" test -i '*.xml'
- uses: actions/upload-artifact@v2
name: Store Test Reports
if: always()
with:
name: test-reports-${{ matrix.config }}
retention-days: 14
if-no-files-found: error
path:
test-reports-*.zip
- uses: seemethere/upload-artifact-s3@v3
name: Store Test Reports on S3
if: always()
with:
retention-days: 14
if-no-files-found: error
path:
test-reports-*.zip
- name: Display and upload test statistics (Click Me)
if: always()
# temporary hack: set CIRCLE_* vars, until we update
# tools/stats/print_test_stats.py to natively support GitHub Actions
env:
AWS_DEFAULT_REGION: us-east-1
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
JOB_BASE_NAME: periodic-linux-xenial-cuda11.1-py3.6-gcc7-test
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
CIRCLE_TAG: ${{ steps.parse-ref.outputs.tag }}
CIRCLE_WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
shell: bash
run: |
python3 -m pip install -r requirements.txt
python3 -m pip install boto3==1.16.34
python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test
- name: Hold runner for 2 hours or until ssh sessions have drained
# Always hold for active ssh sessions
if: always()
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Chown workspace
if: always()
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Kill containers, clean up images
if: always()
run: |
# ignore expansion of "docker ps -q" since it could be empty
# shellcheck disable=SC2046
docker stop $(docker ps -q) || true
# Prune all of the docker images
docker system prune -af

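The test step above uses a detach-then-exec pattern: the container is started with `--detach` so the always-running teardown step ("Kill containers, clean up images") can stop it even if the job is cancelled mid-test, and the actual test command runs through `docker exec`. A minimal sketch of the pattern, with a placeholder image and command standing in for the real `DOCKER_IMAGE` and `TEST_COMMAND`:

```
#!/usr/bin/env bash
# Detach-then-exec sketch; the image and command are placeholders.
set -euo pipefail
IMAGE="alpine:3.14"
TEST_COMMAND="echo running tests"
# --detach keeps the container alive independently of this script,
# so a separate cleanup step can still reach it by name or ID.
container_name=$(docker run --detach --tty "${IMAGE}" sh)
# The real work happens via exec inside the already-running container.
docker exec -t "${container_name}" sh -c "${TEST_COMMAND}"
docker stop "${container_name}"
```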

@@ -0,0 +1,314 @@
# @generated DO NOT EDIT MANUALLY
# Template is at: .github/templates/windows_ci_workflow.yml.j2
# Generation script: .github/scripts/generate_ci_workflows.py
name: periodic-win-vs2019-cuda11.1-py3
on:
pull_request:
types: [unassigned]
schedule:
- cron: 45 0,4,8,12,16,20 * * *
workflow_dispatch:
env:
BUILD_ENVIRONMENT: periodic-win-vs2019-cuda11.1-py3
BUILD_WHEEL: 1
CUDA_VERSION: "11.1"
IN_CI: 1
INSTALL_WINDOWS_SDK: 1
PYTHON_VERSION: "3.8"
PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
SCCACHE_BUCKET: "ossci-compiler-cache"
VC_PRODUCT: "BuildTools"
VC_VERSION: ""
VS_VERSION: "16.8.6"
VC_YEAR: "2019"
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
no_proxy: localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock
TORCH_CUDA_ARCH_LIST: "7.0"
USE_CUDA: 1
concurrency:
group: periodic-win-vs2019-cuda11.1-py3-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }}
cancel-in-progress: true
jobs:
ciflow_should_run:
runs-on: ubuntu-18.04
if: ${{ (github.repository == 'pytorch/pytorch') && ((github.event_name != 'pull_request') || (github.event.assignee.login != 'pytorchbot' ) || (github.event.action !='unassigned') || (contains(github.event.pull_request.labels.*.name, 'ciflow/all') || contains(github.event.pull_request.labels.*.name, 'ciflow/cuda') || contains(github.event.pull_request.labels.*.name, 'ciflow/scheduled') || contains(github.event.pull_request.labels.*.name, 'ciflow/win'))) }}
steps:
- name: noop
run: echo running ciflow_should_run
build:
runs-on: "windows.4xlarge"
defaults:
run:
working-directory: pytorch-${{ github.run_id }}
needs: [ciflow_should_run]
env:
JOB_BASE_NAME: periodic-win-vs2019-cuda11.1-py3-build
http_proxy: "http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128"
https_proxy: "http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128"
steps:
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
submodules: recursive
path: pytorch-${{ github.run_id }}
# deep clone, to allow use of git merge-base
fetch-depth: 0
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Install Visual Studio 2019 toolchain
shell: powershell
run: |
.\.circleci\scripts\vs_install.ps1
- name: Install Cuda
shell: bash
run: |
.circleci/scripts/windows_cuda_install.sh
- name: Install Cudnn
shell: bash
run: |
.circleci/scripts/windows_cudnn_install.sh
- name: Build
shell: bash
env:
PYTORCH_FINAL_PACKAGE_DIR: /c/${{ github.run_id }}/build-results/
run: |
.jenkins/pytorch/win-build.sh
# Upload to github so that people can click and download artifacts
- name: Upload artifacts to Github
if: always()
uses: actions/upload-artifact@v2
# Don't fail on upload to GH since it's only for user convenience
continue-on-error: true
with:
retention-days: 14
if-no-files-found: error
name: ${{ env.BUILD_ENVIRONMENT }}
path: C:\${{ github.run_id }}\build-results
- name: Upload artifacts to s3
if: always()
uses: seemethere/upload-artifact-s3@v3
with:
retention-days: 14
if-no-files-found: error
name: ${{ env.BUILD_ENVIRONMENT }}
path: C:\${{ github.run_id }}\build-results
- name: Wait until all sessions have drained
shell: powershell
if: always()
timeout-minutes: 120
run: |
.github\scripts\wait_for_ssh_to_drain.ps1
- name: Kill active ssh sessions if still around (Useful if workflow was cancelled)
shell: powershell
if: always()
run: |
.github\scripts\kill_active_ssh_sessions.ps1
- name: Cleanup build-results and workspaces
if: always()
shell: bash
env:
PYTORCH_FINAL_PACKAGE_DIR: /c/${{ github.run_id }}/build-results/
# Should remove the entirety of pytorch-${{ github.run_id }}
run: |
rm -rf "${PYTORCH_FINAL_PACKAGE_DIR}"
rm -rf ./*
generate-test-matrix:
needs: [ciflow_should_run]
runs-on: ubuntu-18.04
env:
TEST_RUNNER_TYPE: windows.8xlarge.nvidia.gpu
NUM_TEST_SHARDS: 2
NUM_TEST_SHARDS_ON_PULL_REQUEST: 2
PR_BODY: ${{ github.event.pull_request.body }}
outputs:
matrix: ${{ steps.set-matrix.outputs.matrix }}
render-matrix: ${{ steps.set-matrix.outputs.render-matrix }}
ignore-disabled-issues: ${{ steps.set-matrix.outputs.ignore-disabled-issues }}
container:
image: python:3.9
steps:
- name: Install dependencies
run: pip install typing-extensions==3.10
- name: Clone pytorch/pytorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
- name: Generating test matrix
id: set-matrix
run: .github/scripts/generate_pytorch_test_matrix.py
test:
env:
JOB_BASE_NAME: periodic-win-vs2019-cuda11.1-py3-test
SHARD_NUMBER: ${{ matrix.shard }}
NUM_TEST_SHARDS: ${{ matrix.num_shards }}
TEST_CONFIG: ${{ matrix.config }}
http_proxy: "http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128"
https_proxy: "http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128"
RUN_SMOKE_TESTS_ONLY_ON_PR: False
PYTORCH_IGNORE_DISABLED_ISSUES: ${{ needs.generate-test-matrix.outputs.ignore-disabled-issues }}
CONTINUE_THROUGH_ERROR: ${{ github.repository == 'pytorch/pytorch' && (github.event_name == 'push' || github.event_name == 'schedule') }}
needs: [build, generate-test-matrix, ciflow_should_run]
strategy:
matrix: ${{ fromJson(needs.generate-test-matrix.outputs.matrix) }}
fail-fast: false
runs-on: ${{ matrix.runner }}
defaults:
run:
working-directory: pytorch-${{ github.run_id }}
steps:
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
submodules: recursive
path: pytorch-${{ github.run_id }}
# deep clone, to allow use of git merge-base
fetch-depth: 0
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Install Visual Studio 2019 toolchain
shell: powershell
run: |
.\.circleci\scripts\vs_install.ps1
- name: Install Cuda
shell: bash
run: |
.circleci/scripts/windows_cuda_install.sh
- name: Install Cudnn
shell: bash
run: |
.circleci/scripts/windows_cudnn_install.sh
- uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b
name: Download PyTorch Build Artifacts
with:
name: ${{ env.BUILD_ENVIRONMENT }}
path: C:\${{ github.run_id }}\build-results
- name: Check build-results folder
shell: powershell
run: |
tree /F C:\$Env:GITHUB_RUN_ID\build-results
# Needed for coverage in win-test.sh
- uses: actions/setup-python@v2
name: Setup Python3
with:
python-version: '3.x'
- name: Test
shell: bash
env:
PYTORCH_FINAL_PACKAGE_DIR: /c/${{ github.run_id }}/build-results/
run: |
if [[ $NUM_TEST_SHARDS -ne 2 ]]; then
export SHARD_NUMBER=0
fi
if [[ -n $GITHUB_HEAD_REF && "$RUN_SMOKE_TESTS_ONLY_ON_PR" == "true" ]]; then
export RUN_SMOKE_TESTS_ONLY=1
fi
.jenkins/pytorch/win-test.sh
- name: Zip test reports for upload
if: always()
env:
FILE_SUFFIX: '${{ github.job }}-${{ matrix.config }}-${{ matrix.shard }}-${{ matrix.num_shards }}-${{ matrix.runner }}'
shell: powershell
run: |
# -ir => recursive include all files in pattern
7z a "test-reports-$Env:FILE_SUFFIX.zip" -ir'!test\*.xml'
- uses: actions/upload-artifact@v2
name: Store Test Reports
if: always()
with:
name: test-reports-${{ matrix.config }}
retention-days: 14
if-no-files-found: error
path:
pytorch-${{ github.run_id }}/test-reports-*.zip
- uses: seemethere/upload-artifact-s3@v3
name: Store Test Reports on S3
if: always()
with:
retention-days: 14
if-no-files-found: error
path:
pytorch-${{ github.run_id }}/test-reports-*.zip
- name: Install render_test_results dependencies
if: always()
shell: bash
run: |
python3 -m pip install junitparser==2.1.1 rich==10.9.0
- name: "[[ Click me for rendered test results (useful for finding failing tests) ]]"
if: always()
shell: bash
# Encoding is weird on windows, just try to default to utf-8 if possible
env:
PYTHONIOENCODING: "utf-8"
run: |
python3 tools/render_junit.py test/
- name: Wait until all sessions have drained
shell: powershell
if: always()
timeout-minutes: 120
run: |
.github\scripts\wait_for_ssh_to_drain.ps1
- name: Kill active ssh sessions if still around (Useful if workflow was cancelled)
shell: powershell
if: always()
run: |
.github\scripts\kill_active_ssh_sessions.ps1
- name: Parse ref
id: parse-ref
run: .github/scripts/parse_ref.py
- name: Display and upload test statistics (Click Me)
if: always()
# temporary hack: set CIRCLE_* vars, until we update
# tools/stats/print_test_stats.py to natively support GitHub Actions
env:
AWS_DEFAULT_REGION: us-east-1
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
JOB_BASE_NAME: periodic-win-vs2019-cuda11.1-py3-test
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
CIRCLE_TAG: ${{ steps.parse-ref.outputs.tag }}
CIRCLE_WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
shell: bash
run: |
python3 -m pip install -r requirements.txt
python3 -m pip install boto3==1.16.34
python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test
- name: Cleanup workspace
if: always()
shell: bash
# Should remove the entirety of pytorch-${{ github.run_id }}
run: |
rm -rf ./*

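The test steps in these workflows all share one sharding convention: the generated matrix supplies `SHARD_NUMBER` and `NUM_TEST_SHARDS`, and anything other than two-way sharding collapses to `SHARD_NUMBER=0`, which the test scripts appear to treat as a single unsharded run. A standalone sketch of that gate (the defaults below are illustrative, not the matrix's real values):

```
#!/usr/bin/env bash
# Sharding gate sketch; default values are assumptions for illustration.
set -euo pipefail
NUM_TEST_SHARDS="${NUM_TEST_SHARDS:-1}"
SHARD_NUMBER="${SHARD_NUMBER:-1}"
if [[ $NUM_TEST_SHARDS -ne 2 ]]; then
  # Only two-way sharding is supported; fall back to one full run.
  export SHARD_NUMBER=0
fi
echo "running shard ${SHARD_NUMBER} of ${NUM_TEST_SHARDS}"
```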

@@ -1,10 +1,11 @@
# @generated DO NOT EDIT MANUALLY
# Template is at: .github/templates/linux_ci_workflow.yml.j2
# Generation script: .github/scripts/generate_ci_workflows.py
name: Linux CI (pytorch-linux-xenial-cuda10.2-cudnn7-py3.6-gcc7)
name: puretorch-linux-xenial-py3.6-gcc5.4
on:
# TODO: Enable pull_request builds when we can verify capacity can be met by auto-scalers
pull_request:
types: [unassigned]
push:
branches:
- master
@@ -12,42 +13,92 @@ on:
workflow_dispatch:
env:
BUILD_ENVIRONMENT: pytorch-linux-xenial-cuda10.2-cudnn7-py3.6-gcc7
DOCKER_IMAGE_BASE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7
BUILD_ENVIRONMENT: puretorch-linux-xenial-py3.6-gcc5.4
DOCKER_IMAGE_BASE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3.6-gcc5.4
SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2
XLA_CLANG_CACHE_S3_BUCKET_NAME: ossci-compiler-clang-cache-circleci-xla
TORCH_CUDA_ARCH_LIST: 5.2
IN_CI: 1
# This is used for the phase of adding wheel tests only, will be removed once completed
IN_WHEEL_TEST: 1
# Used for custom_operator, jit_hooks, custom_backend, see .jenkins/pytorch/build.sh
CUSTOM_TEST_ARTIFACT_BUILD_DIR: build/custom_test_artifacts
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
concurrency:
group: pytorch-linux-xenial-cuda10.2-cudnn7-py3.6-gcc7-${{ github.event.pull_request.number || github.sha }}
group: puretorch-linux-xenial-py3.6-gcc5.4-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }}
cancel-in-progress: true
jobs:
ciflow_should_run:
runs-on: ubuntu-18.04
if: ${{ (github.repository == 'pytorch/pytorch') && ((github.event_name != 'pull_request') || (github.event.assignee.login != 'pytorchbot' ) || (github.event.action !='unassigned') || (contains(github.event.pull_request.labels.*.name, 'ciflow/all') || contains(github.event.pull_request.labels.*.name, 'ciflow/cpu') || contains(github.event.pull_request.labels.*.name, 'ciflow/linux'))) }}
env:
LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
steps:
- name: noop
run: echo running ciflow_should_run
- name: print labels
run: echo "${LABELS}"
calculate-docker-image:
runs-on: linux.2xlarge
needs: [ciflow_should_run]
env:
DOCKER_BUILDKIT: 1
timeout-minutes: 90
outputs:
docker_image: ${{ steps.calculate-tag.outputs.docker_image }}
steps:
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Log in to ECR
env:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
retry docker pull "${ALPINE_IMAGE}"
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
rm -rf "${GITHUB_WORKSPACE:?}/*"
rm -f ~/.ssh/authorized_keys
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Checkout PyTorch
uses: actions/checkout@v2
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: false
- name: Calculate docker image tag
id: calculate-tag
run: |
@@ -87,7 +138,7 @@ jobs:
fi
echo ::set-output name=rebuild::yes
- name: Build and push docker image
if: steps.check.outputs.rebuild
if: ${{ steps.check.outputs.rebuild }}
env:
DOCKER_TAG: ${{ steps.calculate-tag.outputs.docker_tag }}
DOCKER_SKIP_S3_UPLOAD: 1
@@ -97,83 +148,115 @@ jobs:
build:
runs-on: linux.2xlarge
needs: calculate-docker-image
needs: [calculate-docker-image, ciflow_should_run]
env:
DOCKER_IMAGE: ${{ needs.calculate-docker-image.outputs.docker_image }}
JOB_BASE_NAME: puretorch-linux-xenial-py3.6-gcc5.4-build
steps:
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Log in to ECR
env:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Checkout PyTorch
uses: actions/checkout@v2
retry docker pull "${ALPINE_IMAGE}"
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
rm -rf "${GITHUB_WORKSPACE:?}/*"
rm -f ~/.ssh/authorized_keys
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
fetch-depth: 0 # deep clone, to allow sharding to use git rev-list
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: recursive
- name: Pull docker image
run: |
docker pull "${DOCKER_IMAGE}"
- name: Preserve github env variables for use in docker
- name: Build
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Build PyTorch
run: |
docker run \
# detached container should get cleaned up by teardown_ec2_linux
container_name=$(docker run \
-e BUILD_ENVIRONMENT \
-e JOB_BASE_NAME \
-e MAX_JOBS="$(nproc --ignore=2)" \
-e SCCACHE_BUCKET \
-e XLA_CLANG_CACHE_S3_BUCKET_NAME \
-e CUSTOM_TEST_ARTIFACT_BUILD_DIR \
-e SKIP_SCCACHE_INITIALIZATION=1 \
-e TORCH_CUDA_ARCH_LIST \
-e PR_LABELS \
-e http_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e https_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e no_proxy="localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" \
--env-file="/tmp/github_env_${GITHUB_RUN_ID}" \
--security-opt seccomp=unconfined \
--cap-add=SYS_PTRACE \
--tty \
--detach \
--user jenkins \
-v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \
-w /var/lib/jenkins/workspace \
"${DOCKER_IMAGE}" \
sh -c 'sudo chown -R jenkins . && .jenkins/pytorch/build.sh'
"${DOCKER_IMAGE}"
)
docker exec -t "${container_name}" sh -c 'sudo chown -R jenkins . && .jenkins/pytorch/build.sh'
- name: Parse ref
id: parse-ref
run: .github/scripts/parse_ref.py
- name: Display and upload binary build size statistics (Click Me)
# temporary hack: set CIRCLE_* vars, until we update
# tools/print_test_stats.py to natively support GitHub Actions
# tools/stats/print_test_stats.py to natively support GitHub Actions
env:
AWS_DEFAULT_REGION: us-east-1
IS_GHA: 1
SCRIBE_GRAPHQL_ACCESS_TOKEN: ${{ secrets.SCRIBE_GRAPHQL_ACCESS_TOKEN }}
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
CIRCLE_TAG: ${{ steps.parse-ref.outputs.tag }}
CIRCLE_WORKFLOW_ID: ${{ github.run_id }} # dunno if this corresponds
CIRCLE_WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
run: |
export PYTHONPATH=$PWD
COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0)
export COMMIT_TIME
pip3 install requests
python3 .circleci/scripts/upload_binary_size_to_scuba.py || exit 0
pip3 install requests==2.26 boto3==1.16.34
python3 -m tools.stats.upload_binary_size_to_scuba || exit 0
- name: Chown workspace
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Archive artifacts into zip
run: |
zip -r artifacts.zip dist/ build/
# Upload to github so that people can click and download artifacts
- uses: actions/upload-artifact@v2
# Don't fail on upload to GH since it's only for user convenience
continue-on-error: true
name: Store PyTorch Build Artifacts on Github
with:
name: ${{ env.BUILD_ENVIRONMENT }}
retention-days: 14
if-no-files-found: error
path:
artifacts.zip
- uses: seemethere/upload-artifact-s3@9d7ceb0ab39c2c88d93ef7792b27425b27d59162
zip -1 -r artifacts.zip dist/ build/custom_test_artifacts build/lib build/bin .pytorch-test-times.json
- uses: seemethere/upload-artifact-s3@v3
name: Store PyTorch Build Artifacts on S3
with:
name: ${{ env.BUILD_ENVIRONMENT }}
@@ -181,158 +264,31 @@ jobs:
if-no-files-found: error
path:
artifacts.zip
- name: Clean up docker images
- name: Hold runner for 2 hours or until ssh sessions have drained
# Always hold for active ssh sessions
if: always()
run: |
# Prune all of the docker images
docker system prune -af
test:
runs-on: linux.8xlarge.nvidia.gpu
needs:
- calculate-docker-image
- build
env:
DOCKER_IMAGE: ${{ needs.calculate-docker-image.outputs.docker_image }}
steps:
- name: Log in to ECR
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
- name: Chown workspace
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)/../":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Checkout PyTorch
uses: actions/checkout@v2
with:
submodules: recursive
- name: Pull docker image
run: |
docker pull "${DOCKER_IMAGE}"
- name: Install nvidia driver, nvidia-docker runtime, set GPU_FLAG
if: ${{ contains(env.BUILD_ENVIRONMENT, 'cuda') }}
run: |
bash .github/scripts/install_nvidia_utils_linux.sh
echo "GPU_FLAG=--gpus all" >> "${GITHUB_ENV}"
- name: Determine shm-size
run: |
shm_size="1g"
case "${BUILD_ENVIRONMENT}" in
*cuda*)
shm_size="2g"
;;
*rocm*)
shm_size="8g"
;;
esac
echo "SHM_SIZE=${shm_size}" >> "${GITHUB_ENV}"
- uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b
name: Download PyTorch Build Artifacts
with:
name: ${{ env.BUILD_ENVIRONMENT }}
- name: Unzip artifacts
run: |
unzip -o artifacts.zip
- name: Output disk space left
run: |
sudo df -H
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Test PyTorch
run: |
# TODO: Stop building test binaries as part of the build phase
# Used for GPU_FLAG since that doesn't play nice
# shellcheck disable=SC2086
docker run \
${GPU_FLAG:-} \
-e BUILD_ENVIRONMENT \
-e CUSTOM_TEST_ARTIFACT_BUILD_DIR \
-e GITHUB_ACTIONS \
-e IN_CI \
-e MAX_JOBS="$(nproc --ignore=2)" \
--env-file="/tmp/github_env_${GITHUB_RUN_ID}" \
--security-opt seccomp=unconfined \
--cap-add=SYS_PTRACE \
--shm-size="${SHM_SIZE}" \
--tty \
--user jenkins \
-v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \
-w /var/lib/jenkins/workspace \
"${DOCKER_IMAGE}" \
sh -c 'sudo chown -R jenkins . && pip install dist/*.whl && .jenkins/pytorch/test.sh'
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Chown workspace
if: always()
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- uses: actions/upload-artifact@v2
name: Store PyTorch Test Reports
if: always()
with:
name: test-reports
retention-days: 14
if-no-files-found: error
path:
test/**/*.xml
- name: Clean up docker images
if: always()
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
# Prune all of the docker images
docker system prune -af
# this is a separate step from test because the log files from test are too
# long: basically, GitHub tries to render all of the log files when you click
# through an action causing extreme slowdown on actions that contain too many
# logs (like test); we can always move it back to the other one, but it
# doesn't create the best experience
render_test_results:
if: always()
needs:
- test
runs-on: ubuntu-18.04
steps:
- name: Checkout PyTorch
uses: actions/checkout@v2
with:
# deep clone, to allow tools/print_test_stats.py to use Git commands
fetch-depth: 0
- uses: actions/download-artifact@v2
name: Download PyTorch Test Reports
with:
name: test-reports
path: test/test-reports
- uses: actions/setup-python@v2
with:
python-version: 3.9
- name: Install dependencies
# boto3 version copied from .circleci/docker/common/install_conda.sh
run: |
pip install -r requirements.txt
pip install boto3==1.16.34 junitparser rich
- name: Output Test Results (Click Me)
run: |
python tools/render_junit.py test
- name: Parse ref
id: parse-ref
run: .github/scripts/parse_ref.py
- name: Display and upload test statistics (Click Me)
# temporary hack: set CIRCLE_* vars, until we update
# tools/print_test_stats.py to natively support GitHub Actions
env:
SCRIBE_GRAPHQL_ACCESS_TOKEN: ${{ secrets.SCRIBE_GRAPHQL_ACCESS_TOKEN }}
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_OSSCI_METRICS_ACCESS_KEY_ID }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_OSSCI_METRICS_SECRET_ACCESS_KEY }}
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
CIRCLE_JOB: pytorch-linux-xenial-cuda10.2-cudnn7-py3.6-gcc7
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
CIRCLE_TAG: ${{ steps.parse-ref.outputs.tag }}
CIRCLE_WORKFLOW_ID: ${{ github.run_id }} # dunno if this corresponds
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
export PYTHONPATH=$PWD
python tools/print_test_stats.py --upload-to-s3 --compare-with-s3 test
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Kill containers, clean up images
if: always()
run: |
# ignore expansion of "docker ps -q" since it could be empty
# shellcheck disable=SC2046
docker stop $(docker ps -q) || true
# Prune all of the docker images
docker system prune -af
- name: Hold runner for 2 hours or until ssh sessions have drained
# Always hold for active ssh sessions
if: always()
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Clean up docker images
if: always()
run: |
# Prune all of the docker images
docker system prune -af

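One notable change in the diff above: the chown step now pre-pulls the Alpine image through a tiny inline `retry` helper (three attempts, backing off one then two seconds) and then runs it with `--pull=never`, so a flaky registry pull no longer fails the step outright. The helper extracted as a standalone sketch (the pulled image is a placeholder):

```
#!/usr/bin/env bash
# Retry sketch: up to three attempts with 1s, then 2s of backoff.
set -euo pipefail
retry () {
  "$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
retry docker pull "alpine:3.14"   # placeholder image
```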
.github/workflows/generated-win-vs2019-cpu-py3.yml (generated, new file)

@@ -0,0 +1,298 @@
# @generated DO NOT EDIT MANUALLY
# Template is at: .github/templates/windows_ci_workflow.yml.j2
# Generation script: .github/scripts/generate_ci_workflows.py
name: win-vs2019-cpu-py3
on:
pull_request:
types: [opened, synchronize, reopened, unassigned]
push:
branches:
- master
- release/*
workflow_dispatch:
env:
BUILD_ENVIRONMENT: win-vs2019-cpu-py3
BUILD_WHEEL: 1
CUDA_VERSION: "cpu"
IN_CI: 1
INSTALL_WINDOWS_SDK: 1
PYTHON_VERSION: "3.8"
PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
SCCACHE_BUCKET: "ossci-compiler-cache"
VC_PRODUCT: "BuildTools"
VC_VERSION: ""
VS_VERSION: "16.8.6"
VC_YEAR: "2019"
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
no_proxy: localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock
concurrency:
group: win-vs2019-cpu-py3-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }}
cancel-in-progress: true
jobs:
ciflow_should_run:
runs-on: ubuntu-18.04
if: ${{ (github.repository_owner == 'pytorch') && ((github.event_name != 'pull_request') || (github.event.assignee.login != 'pytorchbot' ) || (github.event.action !='unassigned') || (contains(github.event.pull_request.labels.*.name, 'ciflow/all') || contains(github.event.pull_request.labels.*.name, 'ciflow/cpu') || contains(github.event.pull_request.labels.*.name, 'ciflow/default') || contains(github.event.pull_request.labels.*.name, 'ciflow/win'))) }}
steps:
- name: noop
run: echo running ciflow_should_run
build:
runs-on: "windows.4xlarge"
defaults:
run:
working-directory: pytorch-${{ github.run_id }}
needs: [ciflow_should_run]
env:
JOB_BASE_NAME: win-vs2019-cpu-py3-build
http_proxy: "http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128"
https_proxy: "http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128"
steps:
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
submodules: recursive
path: pytorch-${{ github.run_id }}
# deep clone, to allow use of git merge-base
fetch-depth: 0
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Install Visual Studio 2019 toolchain
shell: powershell
run: |
.\.circleci\scripts\vs_install.ps1
- name: Build
shell: bash
env:
PYTORCH_FINAL_PACKAGE_DIR: /c/${{ github.run_id }}/build-results/
run: |
.jenkins/pytorch/win-build.sh
# Upload to github so that people can click and download artifacts
- name: Upload artifacts to Github
if: always()
uses: actions/upload-artifact@v2
# Don't fail on upload to GH since it's only for user convenience
continue-on-error: true
with:
retention-days: 14
if-no-files-found: error
name: ${{ env.BUILD_ENVIRONMENT }}
path: C:\${{ github.run_id }}\build-results
- name: Upload artifacts to s3
if: always()
uses: seemethere/upload-artifact-s3@v3
with:
retention-days: 14
if-no-files-found: error
name: ${{ env.BUILD_ENVIRONMENT }}
path: C:\${{ github.run_id }}\build-results
- name: Wait until all sessions have drained
shell: powershell
if: always()
timeout-minutes: 120
run: |
.github\scripts\wait_for_ssh_to_drain.ps1
- name: Kill active ssh sessions if still around (Useful if workflow was cancelled)
shell: powershell
if: always()
run: |
.github\scripts\kill_active_ssh_sessions.ps1
- name: Cleanup build-results and workspaces
if: always()
shell: bash
env:
PYTORCH_FINAL_PACKAGE_DIR: /c/${{ github.run_id }}/build-results/
# Should remove the entirety of pytorch-${{ github.run_id }}
run: |
rm -rf "${PYTORCH_FINAL_PACKAGE_DIR}"
rm -rf ./*
generate-test-matrix:
needs: [ciflow_should_run]
runs-on: ubuntu-18.04
env:
TEST_RUNNER_TYPE: windows.4xlarge
NUM_TEST_SHARDS: 2
NUM_TEST_SHARDS_ON_PULL_REQUEST: 2
PR_BODY: ${{ github.event.pull_request.body }}
outputs:
matrix: ${{ steps.set-matrix.outputs.matrix }}
render-matrix: ${{ steps.set-matrix.outputs.render-matrix }}
ignore-disabled-issues: ${{ steps.set-matrix.outputs.ignore-disabled-issues }}
container:
image: python:3.9
steps:
- name: Install dependencies
run: pip install typing-extensions==3.10
- name: Clone pytorch/pytorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
- name: Generating test matrix
id: set-matrix
run: .github/scripts/generate_pytorch_test_matrix.py
test:
env:
JOB_BASE_NAME: win-vs2019-cpu-py3-test
SHARD_NUMBER: ${{ matrix.shard }}
NUM_TEST_SHARDS: ${{ matrix.num_shards }}
TEST_CONFIG: ${{ matrix.config }}
http_proxy: "http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128"
https_proxy: "http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128"
RUN_SMOKE_TESTS_ONLY_ON_PR: False
PYTORCH_IGNORE_DISABLED_ISSUES: ${{ needs.generate-test-matrix.outputs.ignore-disabled-issues }}
CONTINUE_THROUGH_ERROR: ${{ github.repository == 'pytorch/pytorch' && (github.event_name == 'push' || github.event_name == 'schedule') }}
needs: [build, generate-test-matrix, ciflow_should_run]
strategy:
matrix: ${{ fromJson(needs.generate-test-matrix.outputs.matrix) }}
fail-fast: false
runs-on: ${{ matrix.runner }}
defaults:
run:
working-directory: pytorch-${{ github.run_id }}
steps:
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
submodules: recursive
path: pytorch-${{ github.run_id }}
# deep clone, to allow use of git merge-base
fetch-depth: 0
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Install Visual Studio 2019 toolchain
shell: powershell
run: |
.\.circleci\scripts\vs_install.ps1
- uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b
name: Download PyTorch Build Artifacts
with:
name: ${{ env.BUILD_ENVIRONMENT }}
path: C:\${{ github.run_id }}\build-results
- name: Check build-results folder
shell: powershell
run: |
tree /F C:\$Env:GITHUB_RUN_ID\build-results
# Needed for coverage in win-test.sh
- uses: actions/setup-python@v2
name: Setup Python3
with:
python-version: '3.x'
- name: Test
shell: bash
env:
PYTORCH_FINAL_PACKAGE_DIR: /c/${{ github.run_id }}/build-results/
run: |
if [[ $NUM_TEST_SHARDS -ne 2 ]]; then
export SHARD_NUMBER=0
fi
if [[ -n $GITHUB_HEAD_REF && "$RUN_SMOKE_TESTS_ONLY_ON_PR" == "true" ]]; then
export RUN_SMOKE_TESTS_ONLY=1
fi
.jenkins/pytorch/win-test.sh
- name: Zip test reports for upload
if: always()
env:
FILE_SUFFIX: '${{ github.job }}-${{ matrix.config }}-${{ matrix.shard }}-${{ matrix.num_shards }}-${{ matrix.runner }}'
shell: powershell
run: |
# -ir => recursive include all files in pattern
7z a "test-reports-$Env:FILE_SUFFIX.zip" -ir'!test\*.xml'
- uses: actions/upload-artifact@v2
name: Store Test Reports
if: always()
with:
name: test-reports-${{ matrix.config }}
retention-days: 14
if-no-files-found: error
path:
pytorch-${{ github.run_id }}/test-reports-*.zip
- uses: seemethere/upload-artifact-s3@v3
name: Store Test Reports on S3
if: always()
with:
retention-days: 14
if-no-files-found: error
path:
pytorch-${{ github.run_id }}/test-reports-*.zip
- name: Install render_test_results dependencies
if: always()
shell: bash
run: |
python3 -m pip install junitparser==2.1.1 rich==10.9.0
- name: "[[ Click me for rendered test results (useful for finding failing tests) ]]"
if: always()
shell: bash
# Encoding is weird on windows, just try to default to utf-8 if possible
env:
PYTHONIOENCODING: "utf-8"
run: |
python3 tools/render_junit.py test/
- name: Wait until all sessions have drained
shell: powershell
if: always()
timeout-minutes: 120
run: |
.github\scripts\wait_for_ssh_to_drain.ps1
- name: Kill active ssh sessions if still around (Useful if workflow was cancelled)
shell: powershell
if: always()
run: |
.github\scripts\kill_active_ssh_sessions.ps1
- name: Parse ref
id: parse-ref
run: .github/scripts/parse_ref.py
- name: Display and upload test statistics (Click Me)
if: always()
# temporary hack: set CIRCLE_* vars, until we update
# tools/stats/print_test_stats.py to natively support GitHub Actions
env:
AWS_DEFAULT_REGION: us-east-1
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
JOB_BASE_NAME: win-vs2019-cpu-py3-test
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
CIRCLE_TAG: ${{ steps.parse-ref.outputs.tag }}
CIRCLE_WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
shell: bash
run: |
python3 -m pip install -r requirements.txt
python3 -m pip install boto3==1.16.34
python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test
- name: Cleanup workspace
if: always()
shell: bash
# Should remove the entirety of pytorch-${{ github.run_id }}
run: |
rm -rf ./*

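The `generate-test-matrix` job above is how `strategy.matrix` gets populated: `generate_pytorch_test_matrix.py` presumably prints `::set-output` workflow commands on stdout, and the test job reads them back through `fromJson(needs.generate-test-matrix.outputs.matrix)`. A bash sketch of the producing side, with an invented two-shard matrix rather than the generator's real output:

```
#!/usr/bin/env bash
# set-output sketch; the JSON matrix below is made up for illustration.
set -euo pipefail
matrix='{"include":[{"config":"default","shard":1,"num_shards":2,"runner":"windows.4xlarge"},{"config":"default","shard":2,"num_shards":2,"runner":"windows.4xlarge"}]}'
# The value must stay on one line for the workflow-command parser.
echo "::set-output name=matrix::${matrix}"
```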
.github/workflows/generated-win-vs2019-cuda10.2-py3.yml (generated, new file)

@@ -0,0 +1,316 @@
# @generated DO NOT EDIT MANUALLY
# Template is at: .github/templates/windows_ci_workflow.yml.j2
# Generation script: .github/scripts/generate_ci_workflows.py
name: win-vs2019-cuda10.2-py3
on:
pull_request:
types: [unassigned]
push:
branches:
- master
- release/*
workflow_dispatch:
env:
BUILD_ENVIRONMENT: win-vs2019-cuda10.2-py3
BUILD_WHEEL: 1
CUDA_VERSION: "10.2"
IN_CI: 1
INSTALL_WINDOWS_SDK: 1
PYTHON_VERSION: "3.8"
PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
SCCACHE_BUCKET: "ossci-compiler-cache"
VC_PRODUCT: "BuildTools"
VC_VERSION: ""
VS_VERSION: "16.8.6"
VC_YEAR: "2019"
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
no_proxy: localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock
TORCH_CUDA_ARCH_LIST: "7.0"
USE_CUDA: 1
concurrency:
group: win-vs2019-cuda10.2-py3-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }}
cancel-in-progress: true
jobs:
ciflow_should_run:
runs-on: ubuntu-18.04
if: ${{ (github.repository == 'pytorch/pytorch') && ((github.event_name != 'pull_request') || (github.event.assignee.login != 'pytorchbot' ) || (github.event.action !='unassigned') || (contains(github.event.pull_request.labels.*.name, 'ciflow/all') || contains(github.event.pull_request.labels.*.name, 'ciflow/cuda') || contains(github.event.pull_request.labels.*.name, 'ciflow/win'))) }}
steps:
- name: noop
run: echo running ciflow_should_run
build:
runs-on: "windows.4xlarge"
defaults:
run:
working-directory: pytorch-${{ github.run_id }}
needs: [ciflow_should_run]
env:
JOB_BASE_NAME: win-vs2019-cuda10.2-py3-build
http_proxy: "http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128"
https_proxy: "http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128"
steps:
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
submodules: recursive
path: pytorch-${{ github.run_id }}
# deep clone, to allow use of git merge-base
fetch-depth: 0
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Install Visual Studio 2019 toolchain
shell: powershell
run: |
.\.circleci\scripts\vs_install.ps1
- name: Install Cuda
shell: bash
run: |
.circleci/scripts/windows_cuda_install.sh
- name: Install Cudnn
shell: bash
run: |
.circleci/scripts/windows_cudnn_install.sh
- name: Build
shell: bash
env:
PYTORCH_FINAL_PACKAGE_DIR: /c/${{ github.run_id }}/build-results/
run: |
.jenkins/pytorch/win-build.sh
# Upload to github so that people can click and download artifacts
- name: Upload artifacts to Github
if: always()
uses: actions/upload-artifact@v2
# Don't fail on upload to GH since it's only for user convenience
continue-on-error: true
with:
retention-days: 14
if-no-files-found: error
name: ${{ env.BUILD_ENVIRONMENT }}
path: C:\${{ github.run_id }}\build-results
- name: Upload artifacts to s3
if: always()
uses: seemethere/upload-artifact-s3@v3
with:
retention-days: 14
if-no-files-found: error
name: ${{ env.BUILD_ENVIRONMENT }}
path: C:\${{ github.run_id }}\build-results
- name: Wait until all sessions have drained
shell: powershell
if: always()
timeout-minutes: 120
run: |
.github\scripts\wait_for_ssh_to_drain.ps1
- name: Kill active ssh sessions if still around (Useful if workflow was cancelled)
shell: powershell
if: always()
run: |
.github\scripts\kill_active_ssh_sessions.ps1
- name: Cleanup build-results and workspaces
if: always()
shell: bash
env:
PYTORCH_FINAL_PACKAGE_DIR: /c/${{ github.run_id }}/build-results/
# Should remove the entirety of pytorch-${{ github.run_id }}
run: |
rm -rf "${PYTORCH_FINAL_PACKAGE_DIR}"
rm -rf ./*
generate-test-matrix:
needs: [ciflow_should_run]
runs-on: ubuntu-18.04
env:
TEST_RUNNER_TYPE: windows.8xlarge.nvidia.gpu
NUM_TEST_SHARDS: 2
NUM_TEST_SHARDS_ON_PULL_REQUEST: 2
PR_BODY: ${{ github.event.pull_request.body }}
outputs:
matrix: ${{ steps.set-matrix.outputs.matrix }}
render-matrix: ${{ steps.set-matrix.outputs.render-matrix }}
ignore-disabled-issues: ${{ steps.set-matrix.outputs.ignore-disabled-issues }}
container:
image: python:3.9
steps:
- name: Install dependencies
run: pip install typing-extensions==3.10
- name: Clone pytorch/pytorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
- name: Generating test matrix
id: set-matrix
run: .github/scripts/generate_pytorch_test_matrix.py
test:
env:
JOB_BASE_NAME: win-vs2019-cuda10.2-py3-test
SHARD_NUMBER: ${{ matrix.shard }}
NUM_TEST_SHARDS: ${{ matrix.num_shards }}
TEST_CONFIG: ${{ matrix.config }}
http_proxy: "http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128"
https_proxy: "http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128"
RUN_SMOKE_TESTS_ONLY_ON_PR: False
PYTORCH_IGNORE_DISABLED_ISSUES: ${{ needs.generate-test-matrix.outputs.ignore-disabled-issues }}
CONTINUE_THROUGH_ERROR: ${{ github.repository == 'pytorch/pytorch' && (github.event_name == 'push' || github.event_name == 'schedule') }}
needs: [build, generate-test-matrix, ciflow_should_run]
strategy:
matrix: ${{ fromJson(needs.generate-test-matrix.outputs.matrix) }}
fail-fast: false
runs-on: ${{ matrix.runner }}
defaults:
run:
working-directory: pytorch-${{ github.run_id }}
steps:
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
submodules: recursive
path: pytorch-${{ github.run_id }}
# deep clone, to allow use of git merge-base
fetch-depth: 0
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Install Visual Studio 2019 toolchain
shell: powershell
run: |
.\.circleci\scripts\vs_install.ps1
- name: Install Cuda
shell: bash
run: |
.circleci/scripts/windows_cuda_install.sh
- name: Install Cudnn
shell: bash
run: |
.circleci/scripts/windows_cudnn_install.sh
- uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b
name: Download PyTorch Build Artifacts
with:
name: ${{ env.BUILD_ENVIRONMENT }}
path: C:\${{ github.run_id }}\build-results
- name: Check build-results folder
shell: powershell
run: |
tree /F C:\$Env:GITHUB_RUN_ID\build-results
# Needed for coverage in win-test.sh
- uses: actions/setup-python@v2
name: Setup Python3
with:
python-version: '3.x'
- name: Test
shell: bash
env:
PYTORCH_FINAL_PACKAGE_DIR: /c/${{ github.run_id }}/build-results/
run: |
if [[ $NUM_TEST_SHARDS -ne 2 ]]; then
export SHARD_NUMBER=0
fi
if [[ -n $GITHUB_HEAD_REF && "$RUN_SMOKE_TESTS_ONLY_ON_PR" == "true" ]]; then
export RUN_SMOKE_TESTS_ONLY=1
fi
.jenkins/pytorch/win-test.sh
- name: Zip test reports for upload
if: always()
env:
FILE_SUFFIX: '${{ github.job }}-${{ matrix.config }}-${{ matrix.shard }}-${{ matrix.num_shards }}-${{ matrix.runner }}'
shell: powershell
run: |
# -ir => recursive include all files in pattern
7z a "test-reports-$Env:FILE_SUFFIX.zip" -ir'!test\*.xml'
- uses: actions/upload-artifact@v2
name: Store Test Reports
if: always()
with:
name: test-reports-${{ matrix.config }}
retention-days: 14
if-no-files-found: error
path:
pytorch-${{ github.run_id }}/test-reports-*.zip
- uses: seemethere/upload-artifact-s3@v3
name: Store Test Reports on S3
if: always()
with:
retention-days: 14
if-no-files-found: error
path:
pytorch-${{ github.run_id }}/test-reports-*.zip
- name: Install render_test_results dependencies
if: always()
shell: bash
run: |
python3 -m pip install junitparser==2.1.1 rich==10.9.0
- name: "[[ Click me for rendered test results (useful for finding failing tests) ]]"
if: always()
shell: bash
# Encoding is weird on windows, just try to default to utf-8 if possible
env:
PYTHONIOENCODING: "utf-8"
run: |
python3 tools/render_junit.py test/
- name: Wait until all sessions have drained
shell: powershell
if: always()
timeout-minutes: 120
run: |
.github\scripts\wait_for_ssh_to_drain.ps1
- name: Kill active ssh sessions if still around (Useful if workflow was cancelled)
shell: powershell
if: always()
run: |
.github\scripts\kill_active_ssh_sessions.ps1
- name: Parse ref
id: parse-ref
run: .github/scripts/parse_ref.py
- name: Display and upload test statistics (Click Me)
if: always()
# temporary hack: set CIRCLE_* vars, until we update
# tools/stats/print_test_stats.py to natively support GitHub Actions
env:
AWS_DEFAULT_REGION: us-east-1
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
JOB_BASE_NAME: win-vs2019-cuda10.2-py3-test
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
CIRCLE_TAG: ${{ steps.parse-ref.outputs.tag }}
CIRCLE_WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
shell: bash
run: |
python3 -m pip install -r requirements.txt
python3 -m pip install boto3==1.16.34
python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test
- name: Cleanup workspace
if: always()
shell: bash
# Should remove the entirety of pytorch-${{ github.run_id }}
run: |
rm -rf ./*

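The Test step above gates its smoke-test shortcut on `GITHUB_HEAD_REF`, which GitHub sets only for `pull_request`-triggered runs, combined with the per-workflow `RUN_SMOKE_TESTS_ONLY_ON_PR` toggle (`False` here, `True` in the cuda11.3 workflow below). The gate in isolation, with both variables defaulting to the "not a PR" case:

```
#!/usr/bin/env bash
# PR smoke-test gate sketch; defaults model a non-PR run.
set -euo pipefail
RUN_SMOKE_TESTS_ONLY_ON_PR="${RUN_SMOKE_TESTS_ONLY_ON_PR:-false}"
# GITHUB_HEAD_REF is non-empty only on pull_request events.
if [[ -n ${GITHUB_HEAD_REF:-} && "$RUN_SMOKE_TESTS_ONLY_ON_PR" == "true" ]]; then
  export RUN_SMOKE_TESTS_ONLY=1
fi
echo "smoke-only: ${RUN_SMOKE_TESTS_ONLY:-0}"
```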
.github/workflows/generated-win-vs2019-cuda11.3-py3.yml (generated, new file)

@@ -0,0 +1,316 @@
# @generated DO NOT EDIT MANUALLY
# Template is at: .github/templates/windows_ci_workflow.yml.j2
# Generation script: .github/scripts/generate_ci_workflows.py
name: win-vs2019-cuda11.3-py3
on:
pull_request:
types: [opened, synchronize, reopened, unassigned]
push:
branches:
- master
- release/*
workflow_dispatch:
env:
BUILD_ENVIRONMENT: win-vs2019-cuda11.3-py3
BUILD_WHEEL: 1
CUDA_VERSION: "11.3"
IN_CI: 1
INSTALL_WINDOWS_SDK: 1
PYTHON_VERSION: "3.8"
PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
SCCACHE_BUCKET: "ossci-compiler-cache"
VC_PRODUCT: "BuildTools"
VC_VERSION: ""
VS_VERSION: "16.8.6"
VC_YEAR: "2019"
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
no_proxy: localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock
TORCH_CUDA_ARCH_LIST: "7.0"
USE_CUDA: 1
concurrency:
group: win-vs2019-cuda11.3-py3-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }}
cancel-in-progress: true
jobs:
ciflow_should_run:
runs-on: ubuntu-18.04
if: ${{ (github.repository_owner == 'pytorch') && ((github.event_name != 'pull_request') || (github.event.assignee.login != 'pytorchbot' ) || (github.event.action !='unassigned') || (contains(github.event.pull_request.labels.*.name, 'ciflow/all') || contains(github.event.pull_request.labels.*.name, 'ciflow/cuda') || contains(github.event.pull_request.labels.*.name, 'ciflow/default') || contains(github.event.pull_request.labels.*.name, 'ciflow/win'))) }}
steps:
- name: noop
run: echo running ciflow_should_run
build:
runs-on: "windows.4xlarge"
defaults:
run:
working-directory: pytorch-${{ github.run_id }}
needs: [ciflow_should_run]
env:
JOB_BASE_NAME: win-vs2019-cuda11.3-py3-build
http_proxy: "http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128"
https_proxy: "http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128"
steps:
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
submodules: recursive
path: pytorch-${{ github.run_id }}
# deep clone, to allow use of git merge-base
fetch-depth: 0
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Install Visual Studio 2019 toolchain
shell: powershell
run: |
.\.circleci\scripts\vs_install.ps1
- name: Install Cuda
shell: bash
run: |
.circleci/scripts/windows_cuda_install.sh
- name: Install Cudnn
shell: bash
run: |
.circleci/scripts/windows_cudnn_install.sh
- name: Build
shell: bash
env:
PYTORCH_FINAL_PACKAGE_DIR: /c/${{ github.run_id }}/build-results/
run: |
.jenkins/pytorch/win-build.sh
# Upload to github so that people can click and download artifacts
- name: Upload artifacts to Github
if: always()
uses: actions/upload-artifact@v2
# Don't fail on upload to GH since it's only for user convenience
continue-on-error: true
with:
retention-days: 14
if-no-files-found: error
name: ${{ env.BUILD_ENVIRONMENT }}
path: C:\${{ github.run_id }}\build-results
- name: Upload artifacts to s3
if: always()
uses: seemethere/upload-artifact-s3@v3
with:
retention-days: 14
if-no-files-found: error
name: ${{ env.BUILD_ENVIRONMENT }}
path: C:\${{ github.run_id }}\build-results
- name: Wait until all sessions have drained
shell: powershell
if: always()
timeout-minutes: 120
run: |
.github\scripts\wait_for_ssh_to_drain.ps1
- name: Kill active ssh sessions if still around (Useful if workflow was cancelled)
shell: powershell
if: always()
run: |
.github\scripts\kill_active_ssh_sessions.ps1
- name: Cleanup build-results and workspaces
if: always()
shell: bash
env:
PYTORCH_FINAL_PACKAGE_DIR: /c/${{ github.run_id }}/build-results/
# Should remove the entirety of pytorch-${{ github.run_id }}
run: |
rm -rf "${PYTORCH_FINAL_PACKAGE_DIR}"
rm -rf ./*
generate-test-matrix:
needs: [ciflow_should_run]
runs-on: ubuntu-18.04
env:
TEST_RUNNER_TYPE: windows.8xlarge.nvidia.gpu
NUM_TEST_SHARDS: 2
NUM_TEST_SHARDS_ON_PULL_REQUEST: 1
PR_BODY: ${{ github.event.pull_request.body }}
outputs:
matrix: ${{ steps.set-matrix.outputs.matrix }}
render-matrix: ${{ steps.set-matrix.outputs.render-matrix }}
ignore-disabled-issues: ${{ steps.set-matrix.outputs.ignore-disabled-issues }}
container:
image: python:3.9
steps:
- name: Install dependencies
run: pip install typing-extensions==3.10
- name: Clone pytorch/pytorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
- name: Generating test matrix
id: set-matrix
run: .github/scripts/generate_pytorch_test_matrix.py
test:
env:
JOB_BASE_NAME: win-vs2019-cuda11.3-py3-test
SHARD_NUMBER: ${{ matrix.shard }}
NUM_TEST_SHARDS: ${{ matrix.num_shards }}
TEST_CONFIG: ${{ matrix.config }}
http_proxy: "http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128"
https_proxy: "http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128"
RUN_SMOKE_TESTS_ONLY_ON_PR: True
PYTORCH_IGNORE_DISABLED_ISSUES: ${{ needs.generate-test-matrix.outputs.ignore-disabled-issues }}
CONTINUE_THROUGH_ERROR: ${{ github.repository == 'pytorch/pytorch' && (github.event_name == 'push' || github.event_name == 'schedule') }}
needs: [build, generate-test-matrix, ciflow_should_run]
strategy:
matrix: ${{ fromJson(needs.generate-test-matrix.outputs.matrix) }}
fail-fast: false
runs-on: ${{ matrix.runner }}
defaults:
run:
working-directory: pytorch-${{ github.run_id }}
steps:
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
submodules: recursive
path: pytorch-${{ github.run_id }}
# deep clone, to allow use of git merge-base
fetch-depth: 0
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Install Visual Studio 2019 toolchain
shell: powershell
run: |
.\.circleci\scripts\vs_install.ps1
- name: Install Cuda
shell: bash
run: |
.circleci/scripts/windows_cuda_install.sh
- name: Install Cudnn
shell: bash
run: |
.circleci/scripts/windows_cudnn_install.sh
- uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b
name: Download PyTorch Build Artifacts
with:
name: ${{ env.BUILD_ENVIRONMENT }}
path: C:\${{ github.run_id }}\build-results
- name: Check build-results folder
shell: powershell
run: |
tree /F C:\$Env:GITHUB_RUN_ID\build-results
# Needed for coverage in win-test.sh
- uses: actions/setup-python@v2
name: Setup Python3
with:
python-version: '3.x'
- name: Test
shell: bash
env:
PYTORCH_FINAL_PACKAGE_DIR: /c/${{ github.run_id }}/build-results/
run: |
if [[ $NUM_TEST_SHARDS -ne 2 ]]; then
export SHARD_NUMBER=0
fi
if [[ -n $GITHUB_HEAD_REF && "$RUN_SMOKE_TESTS_ONLY_ON_PR" == "true" ]]; then
export RUN_SMOKE_TESTS_ONLY=1
fi
.jenkins/pytorch/win-test.sh
- name: Zip test reports for upload
if: always()
env:
FILE_SUFFIX: '${{ github.job }}-${{ matrix.config }}-${{ matrix.shard }}-${{ matrix.num_shards }}-${{ matrix.runner }}'
shell: powershell
run: |
# -ir => recursive include all files in pattern
7z a "test-reports-$Env:FILE_SUFFIX.zip" -ir'!test\*.xml'
- uses: actions/upload-artifact@v2
name: Store Test Reports
if: always()
with:
name: test-reports-${{ matrix.config }}
retention-days: 14
if-no-files-found: error
path:
pytorch-${{ github.run_id }}/test-reports-*.zip
- uses: seemethere/upload-artifact-s3@v3
name: Store Test Reports on S3
if: always()
with:
retention-days: 14
if-no-files-found: error
path:
pytorch-${{ github.run_id }}/test-reports-*.zip
- name: Install render_test_results dependencies
if: always()
shell: bash
run: |
python3 -m pip install junitparser==2.1.1 rich==10.9.0
- name: "[[ Click me for rendered test results (useful for finding failing tests) ]]"
if: always()
shell: bash
# Encoding is weird on windows, just try to default to utf-8 if possible
env:
PYTHONIOENCODING: "utf-8"
run: |
python3 tools/render_junit.py test/
- name: Wait until all sessions have drained
shell: powershell
if: always()
timeout-minutes: 120
run: |
.github\scripts\wait_for_ssh_to_drain.ps1
- name: Kill active ssh sessions if still around (Useful if workflow was cancelled)
shell: powershell
if: always()
run: |
.github\scripts\kill_active_ssh_sessions.ps1
- name: Parse ref
id: parse-ref
run: .github/scripts/parse_ref.py
- name: Display and upload test statistics (Click Me)
if: always()
# temporary hack: set CIRCLE_* vars, until we update
# tools/stats/print_test_stats.py to natively support GitHub Actions
env:
AWS_DEFAULT_REGION: us-east-1
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
JOB_BASE_NAME: win-vs2019-cuda11.3-py3-test
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
CIRCLE_TAG: ${{ steps.parse-ref.outputs.tag }}
CIRCLE_WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
shell: bash
run: |
python3 -m pip install -r requirements.txt
python3 -m pip install boto3==1.16.34
python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test
- name: Cleanup workspace
if: always()
shell: bash
# Should remove the entirety of pytorch-${{ github.run_id }}
run: |
rm -rf ./*

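All of these test jobs archive their JUnit XML the same way: one zip per shard, named by a `FILE_SUFFIX` built from job, config, shard, shard count, and runner, so uploads from parallel shards never collide. Windows does it with `7z a ... -ir'!test\*.xml'`; the Linux jobs use `zip`. The Linux form as a standalone sketch (the suffix value is a placeholder):

```
#!/usr/bin/env bash
# Report-zipping sketch; FILE_SUFFIX is an illustrative placeholder.
set -euo pipefail
FILE_SUFFIX="test-default-1-2-linux.2xlarge"
rm -f test-reports-*.zip                  # drop stale archives first
zip -r "test-reports-${FILE_SUFFIX}.zip" test -i '*.xml'
```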
Some files were not shown because too many files have changed in this diff.