Compare commits

...

2546 Commits

Author SHA1 Message Date
36449ea931 (torch/elastic) add fqdn hostname to error printout (#66182) (#66662)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66182

closes https://github.com/pytorch/pytorch/issues/63174

Does a few things:

1. adds hostname to the error report
2. moves the "root cause" section to the end (presumably since the logs are being "tailed" we want the root cause to appear at the end)
3. moves redundant error info logging to debug
4. makes the border at most 60 characters long and left-justifies the header

NOTE: YOU HAVE TO annotate your main function with torch.distributed.elastic.multiprocessing.errors.record, otherwise no traceback is printed (this is because python exception propagation does NOT work out of the box for IPC - hence the extra record annotation).
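
For reference, a minimal sketch of the required annotation (the script body here is just a placeholder):

```
from torch.distributed.elastic.multiprocessing.errors import record

@record
def main():
    # placeholder body; a real script would parse args and train
    raise RuntimeError("foobar")

if __name__ == "__main__":
    main()
```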

Test Plan:
Sample

```
============================================================
run_script_path FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2021-10-05_17:37:22
  host      : devvm4955.prn0.facebook.com
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 3296201)
  error_file: /home/kiuk/tmp/elastic/none_3_lsytqe/attempt_0/0/error.json
  traceback :
  Traceback (most recent call last):
    File "/tmp/jetter.xr3_x6qq/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 372, in wrapper
      return f(*args, **kwargs)
    File "main.py", line 28, in main
      raise RuntimeError(args.throws)
  RuntimeError: foobar

============================================================
```

Reviewed By: cbalioglu, aivanou

Differential Revision: D31416492

fbshipit-source-id: 0aeaf6e634e23ce0ea7f6a03b12c8a9ac57246e9
2021-10-14 18:35:23 -07:00
b544cbddfa Handle shared memory cases in MathBitFallback (#66667)
* Handle shared memory cases in MathBitFallback (#63602)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63602

This PR fixes the case when a read and write is performed on a memory shared between mutable and (or) non-mutable arguments. Example:
```
a = torch.tensor([1 + 1j])
b = a.conj()
b.add_(a)  # should return tensor([2]) but returns tensor([2-2j])
```

The issue here is that in the conjugate fallback, we resolve the conjugation in-place for mutable arguments which can be a problem as shown above in the case when other input arguments share memory with the mutable argument(s).
This PR fixes this issue by:
1. first scanning through the operator input arguments and creating a vector of mutable arguments that have the conj bit set to `True` (and accordingly setting the flag `check_for_alias_with_mut_arg` to `True` or `False`).
2. Iterating through all the arguments. At this time we only look at the non-mutable arguments. If `check_for_alias_with_mut_arg` is set to `True`, then we iterate through `mutable_inputs` to check whether the current arg tensor aliases any of the entries in `mutable_inputs`. If it does, we clone the non-mutable tensor arg; otherwise we resolve the conjugation as before.
3. Now we look through the mutable_inputs vector (which contains only mutable input tensors with conj bit set to `True`). We in-place conjugate each of the entries in the vector.
4. Do the computation.
5. Re-conjugate the mutable argument tensors.

NOTE: `TensorLists` are not fully handled in ConjugateFallback. Please see the in-line comment for more details.

Fixes https://github.com/pytorch/pytorch/issues/59943

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D30466905

Pulled By: anjali411

fbshipit-source-id: 58058e5e6481da04a12d03f743c1491942a6cc9b

* fix lint (#66572)

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66572

Test Plan: Imported from OSS

Reviewed By: seemethere

Differential Revision: D31624043

Pulled By: suo

fbshipit-source-id: 9db9cee3140d78c2a2f0c937be84755206fee1dd

Co-authored-by: anjali411 <chourdiaanjali123@gmail.com>
Co-authored-by: Michael Suo <suo@fb.com>
2021-10-14 18:34:13 -07:00
ddf3092581 Disable .numpy() and .tolist() for tensor subclasses and f… (#66642)
* Disable .numpy() and .tolist() for tensor subclasses and fix .tolist() for conjugated and negated tensors (#66082)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66082

Fixes https://github.com/pytorch/pytorch/issues/66024 #65779

cc ezyang anjali411 dylanbespalko mruberry Lezcano nikitaved albanD

Test Plan: Imported from OSS

Reviewed By: Gamrix, albanD

Differential Revision: D31615588

Pulled By: anjali411

fbshipit-source-id: c3e65ef0fe301630eb76732ccd7819683c09aa19

* Apply suggestions from code review

Co-authored-by: Nikita Shulga <nikita.shulga@gmail.com>
Co-authored-by: Nikita Shulga <nshulga@fb.com>
2021-10-14 16:00:56 -07:00
cc360fa38f Delete extraneous whitespaces 2021-10-14 15:57:16 -07:00
3c134b8b1e Disable .numpy() and .tolist() for tensor subclasses and fix .tolist() for conjugated and negated tensors (#66082) (#66576)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66082

Fixes https://github.com/pytorch/pytorch/issues/66024 #65779

cc ezyang anjali411 dylanbespalko mruberry Lezcano nikitaved albanD

Test Plan: Imported from OSS

Reviewed By: Gamrix, albanD

Differential Revision: D31615588

Pulled By: anjali411

fbshipit-source-id: c3e65ef0fe301630eb76732ccd7819683c09aa19
2021-10-14 13:16:03 -07:00
4a514dd81e Call PyArray_Check only if NumPy is available (#66433) (#66629)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/66353

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66433

Reviewed By: seemethere, janeyx99

Differential Revision: D31548290

Pulled By: malfet

fbshipit-source-id: 3b094bc8195d0392338e0bdc6df2f39587b85bb3
2021-10-14 09:46:41 -07:00
c3ea586e32 fix normal with empty std (#66524) 2021-10-14 09:42:41 -07:00
9509e8a3d6 Fix cosine similarity dim checks (#66214)
* fix cosine similarity dimensionality check

* fix shapes in the doc
2021-10-08 07:22:40 -07:00
1774a6a2f4 [ONNX] Deprecate various args (#65962)
* [ONNX] Remove argument _retain_param_name from torch.onnx.export() function. (#61702) (#64370)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64370

As of now, the "_retain_param_name" parameter has no description in PyTorch docs website. According to code, this argument determines if we keep the original parameter names of PyTorch model in the final ONNX graph. If this is False, those original parameter names will be replaced with a series of integers starting from 1.

Since setting numbers as parameter names make no sense to users, we remove this argument from the torch.onnx.export() function to increase user experience of calling this function.

This PR will still keep it in torch.onnx.export() function for backward support while all backend logic has been changed to work as _retain_param_name is set to True.
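
A minimal sketch of the resulting default behavior (model and file name are illustrative):

```
import torch

model = torch.nn.Linear(4, 2)
# After this change the exporter always keeps the original parameter names
# ("weight", "bias") instead of replacing them with integers.
torch.onnx.export(model, torch.randn(1, 4), "linear.onnx")
```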

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D30905270

Pulled By: malfet

fbshipit-source-id: ca60757ca17daaff937e9f08da42596086795f4a

Co-authored-by: fatcat-z <zhang-ji@outlook.com>

* [ONNX] Remove strip_doc_string param from torch.onnx.export() function. (#61712) (#64371)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64371

As of now, the "strip_doc_string" parameter was described as below:

strip_doc_string (bool, default True): do not include the field
doc_string``` from the exported model. Otherwise the field will mention the source code locations for model``.

This is usually useless to users who want to transform a PyTorch model to ONNX one. Only when the user wants to debug the export process, these source code locations could provide benefits.

To make the export() function more friendly by providing less parameters, we combined "strip_doc_string" into "verbose" parameter. If a user set verbose to True, it means the users need some log information for debugging the export process and this is similar with the purpose of strip_doc_string parameter.

But the usage of these 2 arguments are opposite: setting verbose to True means we want to print log information to help debug, which means strip_doc_string should be False. And this is how we replace strip_doc_string with verbose argument in this PR.

This PR will still keep it in torch.onnx.export() function for backward support while the usage of it has been combined with verbose argument.
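
A minimal sketch of the replacement, assuming a toy model (names are placeholders):

```
import torch

model = torch.nn.Linear(4, 2)
dummy_input = torch.randn(1, 4)
# Before: torch.onnx.export(model, dummy_input, "model.onnx", strip_doc_string=False)
# After: verbose=True keeps the doc_string fields (source code locations) for debugging.
torch.onnx.export(model, dummy_input, "model.onnx", verbose=True)
```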

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D30905268

Pulled By: malfet

fbshipit-source-id: 2f06eb805c01fe15ff7a1b4f6595c937ba716d60

Co-authored-by: fatcat-z <zhang-ji@outlook.com>

* [ONNX] minor doc improvements and cleanup (#62514) (#64373)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64373

* Fix some bad formatting and clarify things in onnx.rst.
* In `export_to_pretty_string`:
    * Add documentation for previously undocumented args.
    * Document that `f` arg is ignored and mark it deprecated.
    * Update tests to stop setting `f`.
    * Warn if `_retain_param_name` is set.
* Use double quotes for string literals in test_operators.py.

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D30905271

Pulled By: malfet

fbshipit-source-id: 3627eeabf40b9516c4a83cfab424ce537b36e4b3

* [ONNX] Deprecated the example_outputs param from torch.onnx.export() function. (#62815) (#64380)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64380

* `example_outputs` used to determine the type and shape of the outputs without tracing the execution of the model. It had to be provided when exporting a ScriptModule or ScriptFunction with the export() function.

* Since we can work out `example_outputs` internally instead of having it provided by the user, we deprecated this argument in the export() function to improve the user experience of calling this function.
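
A minimal sketch of the new calling convention (toy model; the old call is shown in a comment):

```
import torch

scripted = torch.jit.script(torch.nn.Linear(4, 2))
x = torch.randn(1, 4)
# Previously required: torch.onnx.export(scripted, x, "model.onnx", example_outputs=scripted(x))
torch.onnx.export(scripted, x, "model.onnx")  # outputs are now worked out internally
```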

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D30905266

Pulled By: malfet

fbshipit-source-id: d00b00d7d02b365d165028288ad915678caa51f2

Co-authored-by: hwangdeyu <dejack953@outlook.com>

* [ONNX] Deprecate use_external_data_format param from torch.onnx.export() function. (#62257) (#64382)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64382

* The `use_external_data_format` parameter is used for large models that cannot be exported because of the 2GB protobuf limit.

* When `use_external_data_format` is set to True, the model is exported in the ONNX external data format, in which case some of the model parameters are stored in external binary files and not in the ONNX model file itself.

* This PR marks this parameter as DEPRECATED and checks the model proto size in code instead of relying on the user: if the size is larger than 2GB, then `use_external_data_format = True` is applied automatically (see the sketch below).
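
A minimal sketch of the resulting behavior (the model here is a stand-in; the switch only matters past the 2GB limit):

```
import torch

model = torch.nn.Linear(4, 2)  # stand-in; the behavior only matters for models over 2GB
# The exporter now checks the proto size itself and switches to the ONNX
# external-data format automatically when the 2GB protobuf limit is exceeded.
torch.onnx.export(model, torch.randn(1, 4), "model.onnx")
```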

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D30905265

Pulled By: malfet

fbshipit-source-id: 82b4e17bfa6a8de2bfd700a5282c12f6835603cb

Co-authored-by: hwangdeyu <dejack953@outlook.com>

* fix clang-tidy error introduced by #64382 (#65977)

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65977

Reviewed By: ngimel

Differential Revision: D31423174

Pulled By: malfet

fbshipit-source-id: 0ea560b9a6ddd6431f70bd3ac10ace68e26ab352

Co-authored-by: BowenBao <bowbao@microsoft.com>
Co-authored-by: fatcat-z <zhang-ji@outlook.com>
Co-authored-by: hwangdeyu <dejack953@outlook.com>
2021-10-08 07:21:29 -07:00
a27906c250 Convert Sampler back to lazy construction (#63646) (#65926)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63646

Fixes #63609

Test Plan: Imported from OSS

Reviewed By: NivekT

Differential Revision: D30451774

Pulled By: ejguan

fbshipit-source-id: 550d77494326446d1a42b5da0559e0d384c47413
2021-10-08 07:20:03 -07:00
49f52b6c07 Revert "Added option to update parameters using state_dict in AveragedModel (#65495) (#65755)" (#66308)
This reverts commit 5f1a434599b46afd99607839d15892e09269a1c4.
2021-10-08 07:17:47 -07:00
5f1a434599 Added option to update parameters using state_dict in AveragedModel (#65495) (#65755)
* Added option to update parameters using state_dict in AveragedModel (#65495)

Summary:
While implementing [EMA](https://github.com/pytorch/vision/pull/4381) (which extends AveragedModel) in torchvision, update_parameters() from AveragedModel could not be used as it did not handle state_dict(), so a custom update_parameters() needed to be defined in the [EMA class](https://github.com/pytorch/vision/pull/4406). This PR handles that scenario, removing the need for the custom update_parameters() implementation.

Discussion: https://github.com/pytorch/vision/pull/4406#pullrequestreview-753734102

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65495

Reviewed By: datumbox

Differential Revision: D31176742

Pulled By: prabhat00155

fbshipit-source-id: 326d14876018f21cf602bab5eaba344678dbabe2
(cherry picked from commit 2ea724b1fd543304e3be7bd223cac451cd093e16)

* Added validation of mode parameter in AveragedModel (#65921)

Summary:
Discussion: https://github.com/pytorch/pytorch/pull/65495#issuecomment-930460469

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65921

Reviewed By: albanD

Differential Revision: D31310105

Pulled By: prabhat00155

fbshipit-source-id: 417691832a7c793744830c11e0ce53e3972d21a3
(cherry picked from commit c7748fc172553da66368fd0b7fea3fe5661e2dc1)
2021-10-06 11:13:31 -07:00
ecbf5a7439 Tweak file_diff_from_base for release/1.10 branch (#66202) 2021-10-06 08:34:46 -07:00
4e3ebebcff [DataPipe] DataPipe Fix and Deprecation Warnings for Release 1.10 (#65932)
* Unify the output pathname of archive reader and extractor (#65424)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65424

This PR is re-implementation for https://github.com/facebookexternal/torchdata/pull/93
Same PR has landed into torchdata https://github.com/facebookexternal/torchdata/pull/157

Test Plan: Imported from OSS

Reviewed By: soulitzer

Differential Revision: D31090447

Pulled By: ejguan

fbshipit-source-id: 45af1ad9b24310bebfd6e010f41cff398946ba65

* [DatePipe] add deprecation warnings for DataPipes that will solely exist in TorchData (#65827)

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65827

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D31272794

Pulled By: NivekT

fbshipit-source-id: 8da8266184b4df050422904cbc5fca6d7c3d2e02

* [DataPipe] Fixes an issue where TarArchiveReader closes stream when read into a buffer (#65877)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65877

Fixes #65808

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D31296041

Pulled By: NivekT

fbshipit-source-id: cdcad3a333ae9781d6063678a122a128955b0ff4

Co-authored-by: Erjia Guan <erjia@fb.com>
2021-10-05 20:54:40 -07:00
2b46c95e7c [iOS][CI] Update dev certs (#66004) (#66188)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/65988

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66004

Reviewed By: xta0

Differential Revision: D31340893

Pulled By: malfet

fbshipit-source-id: 3bf0be266e9686a73d62e86c5cf0bebeb0416260

Co-authored-by: Tao Xu <taox@fb.com>
2021-10-05 20:12:40 -07:00
5f3eee1ca5 Fix backward compatibility tests (#66186)
Compare operator list against RC1 build rather than against nightly
2021-10-05 20:12:13 -07:00
4731f33d02 Fix Windows ninja builds when MAX_JOBS is specified (#65444) (#66155)
Summary:
Reported by cloudhan in https://github.com/pytorch/pytorch/pull/64733#issuecomment-924545463

Fixes regression introduced by 047e68235f

cc malfet seemethere

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65444

Reviewed By: dagitses, seemethere

Differential Revision: D31103260

Pulled By: malfet

fbshipit-source-id: 9d5454a64cb8a0b96264119cf16582cc5afed284
2021-10-05 12:03:27 -07:00
ecfcb8ff5a Binary building without python fix (#66031) (#66117)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/66030

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66031

Reviewed By: VitalyFedyunin

Differential Revision: D31356243

Pulled By: malfet

fbshipit-source-id: d1537bc65bbba5d6497ecb8db7160a397eca81fd
2021-10-05 12:02:51 -07:00
6aadfda9e2 [ci] try installing libgnutls to fix cert error (#65934) (#65979)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65934

see: https://github.com/pytorch/pytorch/issues/65931, this was a
suggested remediation on the linked issue

Test Plan: Imported from OSS

Reviewed By: malfet, zhouzhuojie

Differential Revision: D31313040

Pulled By: suo

fbshipit-source-id: a9e2b82a1e879962af768ed3049c73ab77394738

Co-authored-by: Michael Suo <suo@fb.com>
2021-09-30 18:55:44 -07:00
13666d20fd [DataPipe] Fix deepcopy filehandle for Mapper and in-place modification for IterableWrapper (#65220) (#65924)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65220

Fixes #65221

- Remove deepcopy from Mapper to support file handles
- Convert `IterableWrapper` to deep-copy the iterable instance within each iterator, preventing in-place modification from changing the data seen in later epochs (see the sketch below)
- Convert `IDP` to `IterableWrapper` in test_datapipe.py
- Refine the variable names (prevent using `dp` that is module reference)
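
A minimal sketch of the `IterableWrapper` behavior after this fix (data values are illustrative):

```
from torch.utils.data.datapipes.iter import IterableWrapper

dp = IterableWrapper([[1], [2]])
for item in dp:
    item.append(0)   # in-place mutation during the first epoch...
print(list(dp))      # ...does not leak into the next epoch: still [[1], [2]]
```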

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D31021886

Pulled By: ejguan

fbshipit-source-id: 72a9eee66c758e2717d591cd0942892bddedc223
2021-09-30 18:36:49 -07:00
1fa17a20fc Fix the slowdown of _object_to_tensor since 1.9 (#65721) (#65835)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65721

Closes: https://github.com/pytorch/pytorch/issues/65696

The bug was introduced in https://github.com/pytorch/pytorch/pull/55861, and it causes a 100X slowdown since 1.9.
ghstack-source-id: 139128267

Test Plan:
Performance test:
```
import time

from torch.distributed.distributed_c10d import _object_to_tensor

start = time.time()
_object_to_tensor("x" * 50_000_000)
print("Time:", time.time() - start)
```

Reviewed By: rohan-varma

Differential Revision: D31219794

fbshipit-source-id: 1abec38f9d51361c1eab6ad5efd87b589322e208

Co-authored-by: Yi Wang <wayi@fb.com>
2021-09-29 14:38:54 -07:00
c05547fa6c Fix test reporting git merge-base (#65787) 2021-09-28 15:48:32 -07:00
0e857bf109 [1.10] Remove torch.vmap (#65496)
torch.vmap is a prototype feature and should not be in the stable
binary. This PR:
- Removes the torch.vmap API
- Removes the documentation entry for torch.vmap
- Changes the vmap tests to use an internal API instead of torch.vmap.

Test Plan:
- Tested locally (test_torch, test_autograd, test_type_hints, test_vmap),
but also wait for CI.
2021-09-24 10:29:08 -07:00
ad22804b95 [release/1.10] Pin builder and xla repo (#65433)
Pin builder to https://github.com/pytorch/builder/commits/release/1.10
Pin xla to https://github.com/pytorch/xla/tree/r1.10
2021-09-21 16:16:22 -07:00
eb4fb1ed81 THCTensor cleanup (#65369)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65369

Reviewed By: bhosmer

Differential Revision: D31071406

Pulled By: ngimel

fbshipit-source-id: bbc3f2781003333641524aeb692b944fd3ad8d7a
2021-09-21 10:28:19 -07:00
600df80296 [PT/ShardedTensor]Allow zero size local shard (#65007)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65007

Relax the shard size check in ShardMetadata to allow zero-size local shards.

When sharding a tensor on N ranks, some ranks may have an empty shard allocated. As we are assuming SPMD, the ranks w/ an empty shard still need to participate in all collectives, and we need to allow this in ShardMetadata.
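
A hedged sketch of what is now accepted; the module path and field names follow the 1.10-era sharding prototype and should be treated as assumptions:

```
from torch.distributed._sharding_spec import ShardMetadata

# A rank whose slice of the tensor is empty still gets a (zero-sized) shard:
empty_shard = ShardMetadata(
    shard_offsets=[4, 0],
    shard_lengths=[0, 4],  # zero rows on this rank is now accepted
    placement="rank:3/cuda:3",
)
```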

Test Plan: Unit tests and CLI

Reviewed By: jiaqizhai, wanchaol

Differential Revision: D30926566

fbshipit-source-id: afa562c94ffa8f8d91d65ddb4c348156d871dc36
2021-09-21 09:58:54 -07:00
7f6580a868 OpInfo: nn.functional.conv2d (#65233)
Summary:
Reland : https://github.com/pytorch/pytorch/issues/63517
Reference: https://github.com/pytorch/pytorch/issues/54261

Reference: facebookresearch/functorch#78

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65233

Reviewed By: malfet

Differential Revision: D31025538

Pulled By: zou3519

fbshipit-source-id: b1cd38c22f4cb8eedd3f958e02dd7410dcbb8d8d
2021-09-21 09:26:23 -07:00
9324181d0a [JIT] Re-land "Add aten::slice optimization" (#65341)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65341

The changes in D30231044 (babd449978) were removed due to a downstream issue in glow. Now that the issue has been fixed by D30849396, we can safely re-introduce the changes.

Test Plan:
`buck test //caffe2/test:jit -- TestPeephole`

Glow test:
* `buck test //glow/fb/torch_glow/tests:unfuse_glow_ops_test`
* qxy11 confirmed that the problematic glow model now loads correctly with these changes

Reviewed By: eellison

Differential Revision: D31056878

fbshipit-source-id: 049903ee04ba88885cc9d1a91427af0f1f44f681
2021-09-21 07:29:51 -07:00
9c23f6eb7d [nn] TripletMarginLoss and PairwiseDistance : no batch dim (#64882)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/60585

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64882

Reviewed By: malfet

Differential Revision: D31055577

Pulled By: jbschlosser

fbshipit-source-id: 2f0a5a08619b672026b48a78bc7d83a6dccba0bf
2021-09-21 07:29:48 -07:00
d35ee431d8 correlate forward and backward op (#62553)
Summary:
Use startThreadId+seqNumber of the forward op and fwdThreadId+seqNumber of the backward op to correlate pairs of them.
third_party/kineto should be updated accordingly: https://github.com/pytorch/kineto/pull/372

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62553

Reviewed By: malfet

Differential Revision: D30125728

Pulled By: gdankel

fbshipit-source-id: 9877a54392ba043d0eac56ce5b7bbf244277fa7e
2021-09-21 07:28:29 -07:00
f0ada4bd54 [docs] Remove .data from some docs (#65358)
Summary:
Related to https://github.com/pytorch/pytorch/issues/30987. Fix the following task:

- [ ] Remove the use of `.data` in all our internal code:
  - [ ] ...
  - [x] `docs/source/scripts/build_activation_images.py` and `docs/source/notes/extending.rst`

In `docs/source/scripts/build_activation_images.py`, I used `nn.init` because the snippet already assumes `nn` is available (the class inherits from `nn.Module`).

cc albanD

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65358

Reviewed By: malfet

Differential Revision: D31061790

Pulled By: albanD

fbshipit-source-id: be936c2035f0bdd49986351026fe3e932a5b4032
2021-09-21 06:32:31 -07:00
daa50f1e9f Adds keyword only args to gradcheck (#65290)
Summary:
Changes the call signature of gradcheck so that its kwargs are keyword-only.

Also modifies the return call from gradgradcheck to reflect these changes.

Fixes https://github.com/pytorch/pytorch/issues/65165
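
A minimal sketch of the new calling convention:

```
import torch
from torch.autograd import gradcheck

x = torch.randn(3, dtype=torch.double, requires_grad=True)
gradcheck(torch.sin, (x,), eps=1e-6, atol=1e-5)  # OK: options passed by name
# gradcheck(torch.sin, (x,), 1e-6)               # TypeError after this change
```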

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65290

Reviewed By: soulitzer

Differential Revision: D31061316

Pulled By: albanD

fbshipit-source-id: 3505569a33a497a8be4347bdd425bb2b8e536999
2021-09-21 06:31:07 -07:00
880098a7e3 [PyTorch Edge] Backport function for defaults args with out args, flag on (#63651)
Summary:
1. Enable support for operators with default args and out args. For `torch.add(x, h, out=x)`, the number of specified arguments will be 3 instead of 4.
2. Bump bytecode version from 6 to 7
3. Implement the backport_v7_to_v6 function. Also slightly refactor the local_thread to allow re-emitting operators.
4. unittest to cover backport function
5. Update expect result from 4 to 3 in unit test DefaultArgsWithOutArg to cover the number of specified arguments.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63651

ghstack-source-id: 138539912

Test Plan:
```
caffe2/test/cpp/jit:jit - LiteInterpreterTest.DefaultArgsWithOutArg
caffe2/test/cpp/jit:jit - LiteInterpreterTest.DefaultArgsPinvWithOutArg
caffe2/test/cpp/jit:jit - LiteInterpreterTest.BackPortByteCodeModelAllVersions
```

Reviewed By: raziel, tugsbayasgalan

Differential Revision: D30454080

fbshipit-source-id: 357c50b96682430675142d20d688d1f64e1de307
2021-09-20 22:50:30 -07:00
5826d207ad [JIT] Delete obsolete message: or if you absolutely have to, use c10::impl::GenericDict(c10::impl::deprecatedUntypedDict()) (#65164)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65164

Looks like it was forgotten in https://github.com/pytorch/pytorch/pull/25439

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D31072625

Pulled By: pbelevich

fbshipit-source-id: a5ffcfb0836f962ab6952a187ba7717c4d4a6e33
2021-09-20 22:50:28 -07:00
19a1063888 [JIT] Support device as Dict key (#65079)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65079

This is required to use RPC DeviceMap aka Dict[torch.device, torch.device] in torchscript

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D31072626

Pulled By: pbelevich

fbshipit-source-id: 51cfa5653db86de73b624e9157d68d1b319bfc64
2021-09-20 22:49:15 -07:00
512834b61d Reduce PyTorch warnings: Cast fix xplat/caffe2/aten/src/ATen/core/DeprecatedTypeProperties.h (#65031)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65031

Test Plan:
```
buck build --show-output //caffe2/torch/fb/sparsenn:sparsenn_operators

buck test caffe2/torch/fb/sparsenn:test
```

Reviewed By: r-barnes

Differential Revision: D30948791

fbshipit-source-id: 13046e1d0ce2c24864ad38f318ca5e34b1bb9552
2021-09-20 20:29:58 -07:00
0dc98728bc Basic implementation of ShardedLinear using ShardedTensor. (#64128)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64128

This PR implements a sharded nn.Linear layer using ShardedTensors with
the following limitations:

1) Works only for ChunkShardingSpec.
2) The implementation is only aimed at demonstrating functionality and is most likely
not performant at all.

The PR also introduces a `shard_parameter` API to easily shard parameters of
`nn.Modules`. This also has the following limitations:

1) Works only for ChunkShardingSpec.
2) It is not performant, since it uses broadcast instead of scatter because
ProcessGroupNCCL doesn't yet support scatter.

Overall user API for running a sharded linear would be something like this:

```
# SPMD programming paradigm running same code on all nodes.
fc = nn.Linear(10, 10)

# Setup sharding.
sharding_spec=ChunkShardingSpec(...)
shard_parameter(fc, 'weight', sharding_spec, src_rank=0)

# Run as a normal linear layer.
inp = torch.rand(10, 10)
output = fc(inp)
```
ghstack-source-id: 138500985

Test Plan:
1) unit tests.
2) waitforbuildbot

Reviewed By: wanchaol, bowangbj

Differential Revision: D30621215

fbshipit-source-id: 1aa7478568c18a4572f6c3462fdf24a4cbde01d6
2021-09-20 18:31:11 -07:00
257a18d951 Track peak memory usage (#65157)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65157

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D31029049

Pulled By: driazati

fbshipit-source-id: 3e87e94e4872d118ad191aef2b77b8cefe90aeb6
2021-09-20 17:25:16 -07:00
58909395ab Fix logic to determine master vs PR (#65155)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65155

This was bugged before on empty strings, which caused the hook to write on any job, not just `master`, regardless of the `only_on_master` flag.

Test Plan: see `[scribe] Skipping RDS write on PR` in the logs for `linux-xenial-cuda11.3-py3.6-gcc7`

Reviewed By: malfet

Differential Revision: D31029048

Pulled By: driazati

fbshipit-source-id: 77c4a60e443d8fc19990755a3a346576afee86d8
2021-09-20 17:25:14 -07:00
60915eb810 [quant] Add fp32/fp16 zero_point support for CPU fakeQuant (#65055)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65055

Test Plan: Imported from OSS

Reviewed By: jingsh, supriyar

Differential Revision: D30975238

Pulled By: b-koopman

fbshipit-source-id: 2000660ffe71cb85d00fdabaf8fc3ba7323f9a1e
2021-09-20 17:25:12 -07:00
ce101fed02 [PyPer] copy-free freeze_module (#65118)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65118

Cloning the module can increase memory use. By freezing the module directly without cloning it first, we can avoid this memory usage increase.

Reviewed By: eellison, movefast1990

Differential Revision: D30955053

fbshipit-source-id: 2feb738eddcf66aa68c92bf695cc05b57bd990f0
2021-09-20 17:25:10 -07:00
ca649851c6 Reduce PyTorch warnings: Cast fix xplat/caffe2/c10/core/TensorOptions.h (#65030)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65030

Test Plan:
```
buck build --show-output //caffe2/torch/fb/sparsenn:sparsenn_operators

buck test caffe2/torch/fb/sparsenn:test
```

Reviewed By: r-barnes

Differential Revision: D30948721

fbshipit-source-id: 16fe42daab35709c56a4d3ccc276ea635a3510c1
2021-09-20 17:23:58 -07:00
2465a103b8 [iOS] Zero out NSError to avoid heap corruptions for the OSS builds (#65355)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65355

I've been seeing heap corruptions in the CMake builds due to the NSError* not being initialized with `nil`. However, I haven't seen this issue in the BUCK builds.
ghstack-source-id: 138502708

Test Plan:
1. Test the OSS builds to make sure the heap corruption has gone.
2. Test the Buck build in the playground app
3. Circle CI

Reviewed By: hanton

Differential Revision: D31048010

fbshipit-source-id: cfd8d614f3f91f09caee4aab61237007ec080481
2021-09-20 16:31:23 -07:00
b7adb3350a Add crow_/col_indices to view types (#63176)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/61103

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63176

Reviewed By: malfet, albanD

Differential Revision: D30315882

Pulled By: cpuhrsch

fbshipit-source-id: eedae5265a757ed68fd69e4f9d07070b05de4bd8
2021-09-20 14:35:58 -07:00
31f61122da Creating a helper function to generate a unique name for an attr in a module (#64970)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64970

Add a helper function to create a unique name for an attr.
This can be used when we want to add a weight to a module.

Test Plan: run CI.

Reviewed By: jfix71

Differential Revision: D30921497

fbshipit-source-id: 598569d107df8b516ff12920a4bef3a42577e987
2021-09-20 14:35:56 -07:00
b45ec16310 Add support to lower acc_ops.transpose (#65036)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65036

Reviewed By: jfix71, 842974287

Differential Revision: D30934503

fbshipit-source-id: 51880d3d36492f5206f77c9d1a994d8532597b62
2021-09-20 14:35:54 -07:00
e33a1fa680 [fx] give warning instead of fatal the program when submod not found during adding get_attr (#65225)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65225

Currently, when creating a get_attr node, if the attribute is in a submodule, we'll first find the submodule. If the submodule isn't in the owning module, we throw an exception.

However, if the attribute can't be found, we give a warning but still allow the get_attr node to be created. To align with this behavior, we change the reaction when the submodule is not found from a fatal error to a warning.

Test Plan: CI

Reviewed By: jamesr66a, jfix71

Differential Revision: D31021535

fbshipit-source-id: 4c0b471448c09cc927d0f47b5bf56594f25a8863
2021-09-20 14:35:52 -07:00
8fb253757d Remove @balioglu from PyTorch Distributed code owners (#65239)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65239

Due to too much noise caused by the GitHub notifications, going forward I prefer to track PRs manually.
ghstack-source-id: 138386041

Test Plan: N/A

Reviewed By: mrshenli

Differential Revision: D31027792

fbshipit-source-id: 6578e41d4ab53ad2c64a41584716f4903298cd6b
2021-09-20 14:34:37 -07:00
e3210ca184 [CUDA graphs] Beta, not prototype (#65247)
Summary:
Powers have decided this API should be listed as beta.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65247

Reviewed By: malfet

Differential Revision: D31057940

Pulled By: ngimel

fbshipit-source-id: 137b63cbd2c7409fecdc161a22135619bfc96bfa
2021-09-20 13:32:36 -07:00
b71f01f70d Fix full backward hook when grad is disabled (#65335)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/59901. See discussion in the issue.
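
A hedged repro sketch (the exact failing pattern from the linked issue is an assumption):

```
import torch

m = torch.nn.Linear(2, 2)
m.register_full_backward_hook(lambda mod, grad_in, grad_out: None)
with torch.no_grad():
    m(torch.randn(1, 2))  # previously misbehaved with grad disabled; now runs cleanly
```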

cc albanD soulitzer

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65335

Reviewed By: malfet

Differential Revision: D31055865

Pulled By: albanD

fbshipit-source-id: 53605df62bc73c99d8908248087ab400b81ac495
2021-09-20 13:31:19 -07:00
2abf3594d5 Fix unassigned ciflow trigger (#65354)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/65250#issuecomment-923120764

This is a limitation of GitHub Action triggers: it's hard to introduce a condition before the workflow runs, which is why we intentionally picked the rare event ("unassigned"). The fix, I think, for people who didn't opt into ciflow and manually unassign is to run all the CI (otherwise we'd introduce a new condition for this, which isn't worth making things even more complex).

The `unassigned` event payload looks like this, just to make sure `github.event.assignee.login` points to the right location.

```
  {
    "action": "unassigned",
    "assignee": {
      "avatar_url": "https://avatars.githubusercontent.com/u/658840?v=4",
      "events_url": "https://api.github.com/users/zhouzhuojie/events{/privacy}",
      "followers_url": "https://api.github.com/users/zhouzhuojie/followers",
      "following_url": "https://api.github.com/users/zhouzhuojie/following{/other_user}",
      "gists_url": "https://api.github.com/users/zhouzhuojie/gists{/gist_id}",
      "gravatar_id": "",
      "html_url": "https://github.com/zhouzhuojie",
      "id": 658840,
      "login": "zhouzhuojie",
      "node_id": "MDQ6VXNlcjY1ODg0MA==",
      "organizations_url": "https://api.github.com/users/zhouzhuojie/orgs",
      "received_events_url": "https://api.github.com/users/zhouzhuojie/received_events",
      "repos_url": "https://api.github.com/users/zhouzhuojie/repos",
      "site_admin": false,
      "starred_url": "https://api.github.com/users/zhouzhuojie/starred{/owner}{/repo}",
      "subscriptions_url": "https://api.github.com/users/zhouzhuojie/subscriptions",
      "type": "User",
      "url": "https://api.github.com/users/zhouzhuojie"
    },
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65354

Reviewed By: malfet, seemethere, janeyx99

Differential Revision: D31060212

Pulled By: zhouzhuojie

fbshipit-source-id: ce815cc96e8a00016646d6f02f0917169fa652dc
2021-09-20 12:33:23 -07:00
378949b83c fix typo missing f string (#65226)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65226

Reviewed By: malfet

Differential Revision: D31055793

Pulled By: albanD

fbshipit-source-id: fafac53e75223c4f599bd2162095aacad7b690df
2021-09-20 12:31:54 -07:00
0430d1da12 [iOS] Fix the TestApp (#65319)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65319

Test Plan: Imported from OSS

Reviewed By: hanton

Differential Revision: D31049543

Pulled By: xta0

fbshipit-source-id: ff0d0baac30682c63b2a28254ee0a5d8d9b8ca6f
2021-09-20 11:28:40 -07:00
3e64c9e176 [Pipe] Add a WithDevice wrapper to specify device execution for a module. (#65190)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65190

As described in https://github.com/pytorch/pytorch/issues/65093, there
could be modules which don't have any parameters/buffers. In this case, Pipe
determines that the module should be executed on CPU. However this might result
in unnecessary GPU to CPU transfers whereas the user expected the module to be
executed on the GPU itself by keeping its inputs and outputs on GPU.

For this use case, we introduce a `WithDevice` wrapper which can be used to
override which device a particular module should be executed on as part of the
pipeline.
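
A minimal sketch of the wrapper (device placement and layer sizes are illustrative; `Pipe` additionally requires the RPC framework to be initialized):

```
import torch.nn as nn
from torch.distributed.pipeline.sync import Pipe
from torch.distributed.pipeline.sync.pipe import WithDevice

dropout = nn.Dropout()  # no parameters/buffers, so Pipe alone would place it on CPU
model = nn.Sequential(
    nn.Linear(16, 16).cuda(0),
    WithDevice(dropout, 'cuda:0'),  # keep execution on cuda:0, avoiding a GPU->CPU hop
    nn.Linear(16, 16).cuda(1),
)
pipe = Pipe(model, chunks=8)
```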

Closes: https://github.com/pytorch/pytorch/issues/65093
ghstack-source-id: 138376272

Test Plan:
1) waitforbuildbot
2) unit tests

Reviewed By: SciPioneer

Differential Revision: D31010027

fbshipit-source-id: 4c1c61d3c6feeef341e002e5f7e83dd33ff3a516
2021-09-20 11:27:27 -07:00
0a3cf8886a Torchhub: More robust assumption regarding main or master branch (#64364)
Summary:
Closes https://github.com/pytorch/pytorch/issues/63753

This PR changes the assumption regarding the default branch of a repo to the following:

> If main exists then use main, otherwise use master

This will make torchhub more robust w.r.t. the ongoing changes where repos use `main` instead of `master` as the development / default branch.

cc nairbv NicolasHug

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64364

Reviewed By: saketh-are

Differential Revision: D30731551

Pulled By: NicolasHug

fbshipit-source-id: 7232a30e956dcccca21933a29de5eddd711aa99b
2021-09-20 10:36:13 -07:00
99e4ab5d44 [Static Runtime] Implement and enable variadic tuple unpack (#64934)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64934

Add a new op `static_runtime::VarTupleUnpack` and a graph pass transforming graph sequences from:
```
%0, %1 = prim::TupleUnpack(%a)
%2, %3 = prim::TupleUnpack(%b)
```
into:
```
%0, %1, %2, %3 = static_runtime::VarTupleUnpack(%a, %b)
```

The pass is only applied to contiguous blocks of `TupleUnpack` nodes. This is the most straightforward way to guarantee correctness, and it is sufficient for the models we care about.

Test Plan: New unit tests: `buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest -- VarTupleUnpack`

Reviewed By: d1jang

Differential Revision: D30872109

fbshipit-source-id: 1ed4a7e201c532da28f703a3a50241c392a6c7e9
2021-09-20 10:36:11 -07:00
14347d0dd5 [quant][fx][graphmode] Fix a bug for sub (#65109)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65109

Previously, for sub we set the dtype with the qconfig since it's matched with a QuantizeHandler. However, this is incorrect: the dtype for sub is decided by whether its output is quantized or not, so we added an is_output_quantized check when deciding the dtype for the output of sub.

Later: is_output_quantized now depends on is_reference, which is pretty confusing and may cause problems down the road; we should remove this dependency in the future.

Test Plan:
python test/test_quantization.py TestQuantizeFx.test_sub_scalar

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D30977826

fbshipit-source-id: 551fd63bd61b43b3c3415944ff73174e3a21cc8a
2021-09-20 10:36:09 -07:00
c562ebca23 Revert "Revert D30558877: Ported std/var to ReductionOpInfo (#65262)
Summary:
Reland of https://github.com/pytorch/pytorch/issues/63978

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65262

Reviewed By: mruberry

Differential Revision: D31037360

Pulled By: ngimel

fbshipit-source-id: 1c60f40c547229767cba3bbe7e11ca0fbbc8f95f
2021-09-20 10:36:06 -07:00
fb1e6835cc simplify torch.meshgrid's shape computation (#62905)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62905

Reviewed By: mruberry

Differential Revision: D31021274

Pulled By: dagitses

fbshipit-source-id: c219389bdc543e9592f7b1c707acfbf752ee6f34
2021-09-20 10:34:45 -07:00
cf60d24028 [DataPipe] Unlimited buffer for Forker and Demultiplexer (#64994)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64994

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D30934362

Pulled By: ejguan

fbshipit-source-id: d3b774d7e28c0b9659e999511e5a68c3929857d4
2021-09-20 09:30:39 -07:00
88032d8943 Automated submodule update: FBGEMM (#64640)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).

New submodule commit: d1ecc7dbe2

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64640

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: jspark1105

Differential Revision: D30805660

fbshipit-source-id: 9f783862e89fe3974badd5194ef793db55e7d275
2021-09-18 16:29:30 -07:00
d8189db80f [quant][fx2trt] Generate engine graph for explicit quant/implicit quant and fp16 graph (#65289)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65289

Turn on VERBOSE logging and use engine visualizer to generate the graph.

Runtime:
```
explicit quant result diff max tensor(0.0771)
implicit quant result diff max tensor(0.1909)
trt fp16 time (ms/iter) 1.0740923881530762
trt int8 time (ms/iter) 0.5288887023925781
trt implicit int8 time (ms/iter) 0.6334662437438965
PyTorch time (CUDA) (ms/iter) 4.448361396789551
PyTorch time (CPU) (ms/iter) 45.13296604156494
```

Generated Graphs:
```
explicit int8: https://www.internalfb.com/intern/graphviz/?paste=P458669571
implicit int8: https://www.internalfb.com/intern/graphviz/?paste=P458669656
fp16: https://www.internalfb.com/intern/graphviz/?paste=P458669708
```

Test Plan:
```
buck run mode/opt -c python.package_style=inplace caffe2:fx2trt_quantized_resnet_test 2>log
buck run //deeplearning/trt/fx2trt/tools:engine_layer_visualize -- --log_file log
```

Reviewed By: 842974287

Differential Revision: D30955035

fbshipit-source-id: 24949458ad9823fb026d56d78a6ee1c6874b6034
2021-09-18 13:30:37 -07:00
7f8d622d70 [Static Runtime] Add perf metrics for number of managed tensors & unmanaged values (#64992)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64992

This change lets Static Runtime print out the number of managed tensors & unmanaged values as performance metrics during profile runs.

We will use/enhance these metrics to guide the effort of managing output tensors.

Test Plan:
Confirmed that a profile run prints out the added metric values on inline_cvr nets:
```
(inline_cvr/local)
...
Total number of managed tensors: 2754
Total number of unmanaged values: 3240
...
(inline_cvr/local_ro)
Total number of managed tensors: 1554
Total number of unmanaged values: 2966
...
(inline_cvr/remote_ro)
Total number of managed tensors: 1439
Total number of unmanaged values: 28
...
```

Reviewed By: hlu1

Differential Revision: D30926617

fbshipit-source-id: b86e071003ac941b9663db103eaa7c614466b4e0
2021-09-18 11:26:37 -07:00
4a128ed811 Remove incorrect stride assert in Reduce.cuh (#65227)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/37583

Per discussion with ngimel, the condition asserted here may not always hold after TensorIterator's dimension coalescing and reordering. However, the reduction output should still be correct when `sub_iter.strides(0)[0]` is non-zero.

I've verified correctness empirically by:
1. Lowering the threshold ([configured here](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/TensorIterator.cpp#L1127)) at which iterators are split into sub-iterators, making it easier to trigger.
2. Generating many tensors with random dimensions and randint elements which produce a non-zero `sub_iter.strides(0)[0]` in the CUDA kernel.
3. Verifying that the reduction `t.sum(dim=0)` produces the same results for those tensors on CPU and on CUDA (see the sketch below).
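
A minimal sketch of that check, assuming integer-valued doubles so the sums are exact:

```
import torch

# integer-valued doubles sum exactly, so CPU and CUDA must agree bit-for-bit
t = torch.randint(0, 10, (4096, 7)).double()
assert torch.equal(t.cuda().sum(dim=0).cpu(), t.sum(dim=0))
```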

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65227

Reviewed By: ngimel

Differential Revision: D31031406

Pulled By: saketh-are

fbshipit-source-id: 5cbf2001224454c74f6db42455c507365ad1f2b1
2021-09-18 10:29:13 -07:00
543185a0fd support using gradients named for outputs in derivatives (#63947)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63947

Fixes #62196

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D30541485

Pulled By: dagitses

fbshipit-source-id: ea1dd0edd1a51936a295631e52b85e9c022a9c87
2021-09-18 07:31:45 -07:00
926a3d2e85 clarify implementation of check_grad_usage (#64439)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64439

1) remove unused fully_implemented
2) rename used_grad to uses_grad and make it a boolean
3) rename used_grads to num_grads_uses
4) add comments explaining what some of the checks mean

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D30733904

Pulled By: dagitses

fbshipit-source-id: dccbbef8a4be8713215ef91aa97a34124f06a7a1
2021-09-18 07:30:30 -07:00
d3e36fade2 [quant][fx2trt] Enable comparison with implicit quant mode (#65043)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65043

Currently got following result, will take a look at the executed graph again:
```
trt fp16 time (ms/iter) 1.0483217239379883
trt int8 time (ms/iter) 0.5329632759094238
trt implicit int8 time (ms/iter) 0.6769704818725586
PyTorch time (ms/iter) 6.453146934509277
```

Test Plan:
```
python torch/fx/experimental/fx2trt/example/quantized_resnet_test.py
```

Imported from OSS

Reviewed By: 842974287

Differential Revision: D30954871

fbshipit-source-id: 8d7ff82b8f5d0b7946fbd38a7cddede7d40b28aa
2021-09-17 23:29:35 -07:00
4150b672aa [Codemod][FBSourceBlackLinter] Daily arc lint --take BLACK
Reviewed By: zertosh

Differential Revision: D31039372

fbshipit-source-id: a5e54a9b1d2ef97e9bc206b9e2a82124e5a22a7a
2021-09-17 20:33:12 -07:00
6707dfeefb Remove 9.2 related macros for CONSTEXPR (#65066)
Summary:
Removes C10_HOST_CONSTEXPR_EXCEPT_CUDA92 references in the code

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65066

Reviewed By: driazati

Differential Revision: D31022520

Pulled By: janeyx99

fbshipit-source-id: f02cdc6caba5b48405575242921f5845ff18f729
2021-09-17 17:31:20 -07:00
1cd9018b6f Make github.com in noproxy list (#65256)
Summary:
An attempt to solve some rate-limiting issues we saw when calling GitHub APIs

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65256

Reviewed By: seemethere

Differential Revision: D31035115

Pulled By: zhouzhuojie

fbshipit-source-id: 7efd5d5af7d91805e4bf27b86847791e991b741e
2021-09-17 17:31:18 -07:00
50c29fef3e remove utils.cpp (#65184)
Summary:
Dead code

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65184

Reviewed By: mruberry

Differential Revision: D31031777

Pulled By: ngimel

fbshipit-source-id: 13633888229a7af8cfd8ea7e55ea2880b2e47273
2021-09-17 17:31:15 -07:00
19471c54a6 [fx const fold] fix a case when some inputs are unused (#65223)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65223

If there are unused inputs, they won't appear in `submod_1`. We need to add all the unused inputs so that the model after const fold has the same inputs as the original model.

Reviewed By: jfix71

Differential Revision: D31021217

fbshipit-source-id: b7452c90d133b747e0699936a81d3fee14af9cc9
2021-09-17 17:29:55 -07:00
992dad1855 [Profiler] Update kineto submodule (#65236)
Summary:
Update to latest kineto revision. See Kineto repo for change log.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65236

Reviewed By: malfet

Differential Revision: D31031638

Pulled By: gdankel

fbshipit-source-id: 681655b2e8e151895afa91445ced0fd57a11fa93
2021-09-17 16:26:30 -07:00
4408b755bc [fx2trt] re-enable profiler and some miscs for TRTModule (#65072)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65072

Previously disabled attaching trt profiler to exec context in TRTModule because https://fburl.com/mc33n880 states that `enqueue()` doesn't support profiling. Seems to be a lie though. Re-enable attaching profiler in this diff.

Also added a bunch of checks for dtype and shape, and fixed saving state_dict and loading back.

Test Plan: buck run mode/opt -c python.package_style=inplace -j 40 deeplearning/trt/fx2trt:acc2trt_test

Reviewed By: yinghai

Differential Revision: D30962757

fbshipit-source-id: 9c664b0500a8169b7952f6f912239a5a05772aea
2021-09-17 16:26:28 -07:00
afa25c77f1 [package] Make it possible to re-save a PackageImporter module (#65101)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65101

As title. Previously this was guarded against for implementation
simplicity, as we didn't really think there was a use case for saving a
mangled module name directly.

But people started doing stuff like:
```
exporter.save_module(my_imported_obj.__module__)
```
which implicitly passes along the mangled module name.

This PR makes it so that given `PackageImporter` instance can always
import modules that it created, and changes `PackageExporter` to
properly demangle the resulting module name when writing the package to
the export archive.

Differential Revision: D30975712

Test Plan: Imported from OSS

Pulled By: suo

fbshipit-source-id: d9e849bf651713890e72dccdcef74fa52d377149
2021-09-17 16:25:11 -07:00
487c771593 [FX] Fix tracing of bitwise and/or (#65196)
Summary:
Previously resulted in `AttributeError: module 'operator' has no attribute 'and'`

and/or are python keywords, so they are renamed to `operator.and_` and `operator.or_`
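
A minimal repro sketch of the fixed behavior (module and assertion are illustrative):

```
import operator
import torch
import torch.fx as fx

class M(torch.nn.Module):
    def forward(self, a, b):
        return a & b  # bitwise and: must map to operator.and_, not the keyword "and"

gm = fx.symbolic_trace(M())
assert any(node.target is operator.and_ for node in gm.graph.nodes)
```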

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65196

Reviewed By: Chillee

Differential Revision: D31020336

Pulled By: jansel

fbshipit-source-id: 51d888151fe78c0c1197ecaf161976b219c59694
2021-09-17 14:33:02 -07:00
6596173811 Revert D30731191: [pytorch][PR] Torchhub: rewrite commit hash check to avoid using unnecessary GitHub API credits
Test Plan: revert-hammer

Differential Revision:
D30731191 (f9bf144a0c)

Original commit changeset: d1ee7c2ef259

fbshipit-source-id: 5c7207f66c5354ce7b9ac2594e4f5b8307619b0c
2021-09-17 14:33:00 -07:00
3d32dec5ba [ONNX] Deprecate enable_onnx_checker argument in torch.onnx.export() (#61708) (#64369)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64369

As of now, the "enable_onnx_checker" parameter was described as below:

enable_onnx_checker (bool, default True): If True the ONNX model checker will be run to ensure the exported model is a valid ONNX model.

An invalid ONNX graph is useless to users, so such a check should be done for each call.

In this PR, we still write the model to an ONNX file even if it is invalid, and the exception is thrown after the ONNX file has been created. This lets the user output an invalid ONNX graph for debugging.

This PR still keeps the argument in torch.onnx.export() for backward compatibility, while all backend logic has been changed to behave as if enable_onnx_checker were set to True.

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D30905267

Pulled By: malfet

fbshipit-source-id: 3ad3f68e77fcec012cc7ef674cc9a61755eebc9e

Co-authored-by: fatcat-z <zhang-ji@outlook.com>
2021-09-17 14:31:41 -07:00
ae00075ac7 [Static Runtime] Move MemoryPlanner out into memory_planner.cpp (#65123)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65123

This change re-reverts D30883290 (0e11454d19). D30883290 (0e11454d19) broke the OSS build because it implicitly removed the default move constructor of `StaticRuntime`.

```
ep 15 15:39:57 /var/lib/jenkins/workspace/benchmarks/static_runtime/deep_wide_pt_bench.cc:95:10: error: call to implicitly-deleted copy constructor of 'torch::jit::StaticRuntime'
Sep 15 15:39:57   return torch::jit::StaticRuntime(*smod);
Sep 15 15:39:57          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Sep 15 15:39:57 /var/lib/jenkins/workspace/torch/csrc/jit/runtime/static/impl.h:321:34: note: copy constructor of 'StaticRuntime' is implicitly deleted because field 'planner_' has a deleted copy constructor
Sep 15 15:39:57   std::unique_ptr<MemoryPlanner> planner_;
Sep 15 15:39:57                                  ^
Sep 15 15:39:57 /usr/bin/../lib/gcc/x86_64-linux-gnu/5.4.0/../../../../include/c++/5.4.0/bits/unique_ptr.h:356:7: note: 'unique_ptr' has been explicitly marked deleted here
Sep 15 15:39:57       unique_ptr(const unique_ptr&) = delete;
Sep 15 15:39:57       ^
Sep 15 15:39:57 /var/lib/jenkins/workspace/benchmarks/static_runtime/deep_wide_pt_bench.cc:99:9: error: call to implicitly-deleted copy constructor of 'torch::jit::StaticRuntime'
Sep 15 15:39:57    auto sr = getStaticRuntime();
Sep 15 15:39:57         ^    ~~~~~~~~~~~~~~~~~~
Sep 15 15:39:57 /var/lib/jenkins/workspace/torch/csrc/jit/runtime/static/impl.h:321:34: note: copy constructor of 'StaticRuntime' is implicitly deleted because field 'planner_' has a deleted copy constructor
Sep 15 15:39:57   std::unique_ptr<MemoryPlanner> planner_;
Sep 15 15:39:57                                  ^
Sep 15 15:39:57 /usr/bin/../lib/gcc/x86_64-linux-gnu/5.4.0/../../../../include/c++/5.4.0/bits/unique_ptr.h:356:7: note: 'unique_ptr' has been explicitly marked deleted here
Sep 15 15:39:57       unique_ptr(const unique_ptr&) = delete;
Sep 15 15:39:57       ^
Sep 15 15:39:57 2 errors generated.
```

This change fixes the issue by explicitly defining the default move constructor (courtesy of mikeiovine).

Original Summary:

This change moves `MemoryPlanner` out of impl.cpp into memory_planner.cpp.

`MemoryPlanner` performs an independent sub-task of static analysis of a graph, and creating memory planning, and allocating/deallocating managed Tensors.

This change will reduce merge conflicts as I work on MemoryPlanner more actively for output Tensor support.

Test Plan: - Confirm that OSS build went well (See External Tests section).

Reviewed By: mikeiovine

Differential Revision: D30983292

fbshipit-source-id: a59f407fa1123527824157268111144a1bf58116
2021-09-17 13:32:01 -07:00
eaf85fad62 [PyTorch] Extract parseOperator() into a standalone source file (#65179)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65179

This follows up on this PR: https://github.com/pytorch/pytorch/pull/61862. The purpose is to modularize operator parsing so that it can be used as needed without pulling the whole `import.cpp` into the build.

Test Plan: Added a unit test in `test_lite_predictor.cpp` called `ParseOperators`, similar to `ParseBytecode`.

Reviewed By: iseeyuan

Differential Revision: D31006555

fbshipit-source-id: c38e221800af4cf72963a353c452c5437f56a0ac
2021-09-17 13:31:59 -07:00
35084ee451 [PyTorch] Improve OperatorEntry::getKernelForDispatchKey (#64838)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64838

The returned pointer, if present, could never be nullptr, so there is no reason to wrap it in an optional rather than just using the nullptr state. The repeated calls to kernels_.at() were not getting optimized away, so just use the perfectly good iterator that find() already gave us.
ghstack-source-id: 138304429

Test Plan: CI

Reviewed By: bdhirsh

Differential Revision: D30875748

fbshipit-source-id: 9cbb875715b7a582380c7402155fdbe21944dc85
2021-09-17 13:31:56 -07:00
fcaf526815 avoid moving Argument in infer_schema (#64822)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64822

Turns out the suppressed lint message was trying to tell us something: we can construct our Argument in-place rather than creating a temporary and moving it into the argument vector.
ghstack-source-id: 138304423

Test Plan: CI, profile op registration and observe reduced Argument move ctor and dtor costs

Reviewed By: smessmer

Differential Revision: D30860718

fbshipit-source-id: c8da45ab7e61b5df9fa1273301896309bca108b5
2021-09-17 13:31:54 -07:00
79cbcd3e7c [PyTorch] Fix missing move in Argument ctor (#64821)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64821

Not moving adds excess refcounting overhead.
ghstack-source-id: 138304432

Test Plan: CI

Reviewed By: dhruvbird

Differential Revision: D30860720

fbshipit-source-id: de695e5cdfb1fa314b53a8bcb291343ae4eb87a5
2021-09-17 13:31:51 -07:00
5a3475df21 [PyTorch] shrink Argument (#64820)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64820

Putting boolean fields next to each other avoids wasting space for padding.
ghstack-source-id: 138304433

Test Plan: CI

Reviewed By: dhruvbird

Differential Revision: D30860717

fbshipit-source-id: ad45c37574a7c857958978aad42fd1333c6b29ee
2021-09-17 13:31:48 -07:00
132d65ed25 [PyTorch] Compare pointers before calling expensive Type comparison (#64784)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64784

See code comment for explanation.
ghstack-source-id: 138304431

Test Plan: Reduced overhead in findSchemaDifferences while profiling registration at startup in a case where I forced duplicates to be registered (by looping in RegisterDispatchKey.cpp).

Reviewed By: dhruvbird

Differential Revision: D30854036

fbshipit-source-id: 568733c3cf449697cdeb74cf57fed0926729fa68
2021-09-17 13:31:46 -07:00
cf5c00f155 CI: Consolidate Build and Test naming for better stats collection (#65232)
Summary:
All PyTorch build steps should now be named "Build" and all test steps "Test" for workflows that test PyTorch on Linux and Windows.

I left the binary stuff alone as that build is different.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65232

Reviewed By: driazati, seemethere

Differential Revision: D31024232

Pulled By: janeyx99

fbshipit-source-id: 24b1a1e2b1b25aba70b7adc41603ec8fa4ce7dd6
2021-09-17 13:30:31 -07:00
45bd0f6181 Back out "Revert D30745960: [DDP] Remove SPMD from self.modules_buffers" (#64778)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64778

Original commit changeset: d3f3fb813c45
ghstack-source-id: 138326910

Test Plan: ci

Reviewed By: H-Huang

Differential Revision: D30849443

fbshipit-source-id: 15dab8a959a29d2e2fefac6ad52b8d8168eacc02
2021-09-17 12:28:36 -07:00
70f286c1e2 Back out "Revert D30745961: [DDP] Remove self.modules_params" (#64777)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64777

Original commit changeset: 59f7cc50d369
ghstack-source-id: 138326909

Test Plan: ci

Reviewed By: H-Huang

Differential Revision: D30849442

fbshipit-source-id: bb87ba83935374d8a3ebbc29365df1417dd4f26f
2021-09-17 12:28:34 -07:00
61dfcbf4bc Back out "Revert D30745921: [DDP] Fix when buffers are reassigned in module" (#64776)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64776

Original commit changeset: 343ead86bf1e
ghstack-source-id: 138326914

Test Plan: ci

Reviewed By: H-Huang

Differential Revision: D30849444

fbshipit-source-id: 9a72805416fe7d6c68e51bdcdb88f6e1fecb614d
2021-09-17 12:28:32 -07:00
cce5381238 [xplat][pytorch]: Disabling excessive logging. (#65170)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65170

Disabling excessive logging. These are per-frame log statements
that output lots of logs to the Skylight command line.

Test Plan:
```
cd ~/fbsource
buck build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 //xplat/caffe2:pt_vulkan_api_test_binAndroid\#android-arm64 --show-output
adb push buck-out/gen/xplat/caffe2/pt_vulkan_api_test_binAndroid\#android-arm64 /data/local/tmp/vulkan_api_test
adb shell "/data/local/tmp/vulkan_api_test"
cd -
```

Reviewed By: SS-JIA

Differential Revision: D30778852

fbshipit-source-id: bcf75ec417dfe3e9ce3df92a1894352772bd663d
2021-09-17 12:28:30 -07:00
047e68235f delegate parallelism to Ninja when possible (#64733)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64733

The previous implementation was wrong when CPU scheduling affinity is
set. In fact, it is still wrong if Ninja is not being used.

When there is CPU scheduling affinity set, the number of processors
available on the system likely exceeds the number of processors that
are usable to the build. We ought to use
`len(os.sched_getaffinity(0))` to determine the effective parallelism.

This change is more minimal and instead just delegates to Ninja (which
handles this correctly) when it is used.
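
For reference, a small sketch of the affinity-aware count mentioned above (standard library only; `os.sched_getaffinity` is Linux-specific):

```python
import os

def effective_parallelism() -> int:
    # CPUs this process may actually run on (respects taskset/cgroup
    # affinity), rather than the total CPUs present on the machine.
    try:
        return len(os.sched_getaffinity(0))
    except AttributeError:
        # sched_getaffinity is not available on all platforms
        return os.cpu_count() or 1
```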

Test Plan:
I verified this worked correctly using Ninja on a 96-core machine
with 24 cores available for scheduling by checking:
 * the cmake command did not specify "-j"
 * the number of top-level jobs in top/pstree never exceeded 26 (24 +
   2)

And I verified we get the legacy behavior by specifying USE_NINJA=0 on
the build.

Reviewed By: jbschlosser, driazati

Differential Revision: D30968796

Pulled By: dagitses

fbshipit-source-id: 29547dd378fea793957bcc2f7d52d5def1ecace2
2021-09-17 12:28:28 -07:00
b936a10074 add test for number of jobs when building (#65162)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65162

Test Plan: Imported from OSS

Reviewed By: driazati

Differential Revision: D30998006

Pulled By: dagitses

fbshipit-source-id: 8b8d45668acf0e6c0f16df0f705a1af8c6d4f22d
2021-09-17 12:28:25 -07:00
1ee66a5278 Remove CUDA 9.2 references conditionals and workarounds (#65070)
Summary:
Title says it all

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65070

Reviewed By: malfet

Differential Revision: D30966464

Pulled By: janeyx99

fbshipit-source-id: e454906fd5d7d321d390939ba5d237e1d9b150f8
2021-09-17 12:28:23 -07:00
51e12f0071 fix torch.distributed.elastic event docs (#64974)
Summary:
the example code wasn't working for me.

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang cbalioglu gcramer23

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64974

Reviewed By: kiukchung, cbalioglu

Differential Revision: D30926481

Pulled By: edward-io

fbshipit-source-id: f5e32cc2b948b6ee30d84a8247856f39fc786f67
2021-09-17 12:27:09 -07:00
bbe25af0df [nnc] Updated inlining to handle cases when producer indices are constants after eval (#65044)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65044

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D30954655

Pulled By: navahgar

fbshipit-source-id: dfaedb5af710b2625ceec3a443a6c4e34158ab16
2021-09-17 11:28:48 -07:00
03fc636d5c [nnc] Updated inliner to remove assertions and exception (#64719)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64719

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D30828583

Pulled By: navahgar

fbshipit-source-id: 9826a59085a210e44d101a843ff2cae440dfd633
2021-09-17 11:28:46 -07:00
340531f2e0 [ONNX] Do not use numpy in ONNX opsets (#65188)
Summary:
Replace `torch.tensor([numpy.arange(a, b, c)])` with `torch.arange(a, b, c).unsqueeze(0)`
Replace `tuple(numpy.add(a, b))` with `tuple(x + y for (x, y) in zip(a, b))`

As `numpy` is an optional dependency, it shouldn't be used in PyTorch core by default
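
The replacements are straightforward to check by hand (a sketch; dtypes can differ between the numpy and pure-torch spellings in edge cases):

```python
import torch

a, b, c = 0, 10, 2
t = torch.arange(a, b, c).unsqueeze(0)        # numpy-free torch.tensor([np.arange(a, b, c)])

lhs, rhs = (1, 2, 3), (10, 20, 30)
s = tuple(x + y for (x, y) in zip(lhs, rhs))  # numpy-free tuple(np.add(lhs, rhs))
print(t, s)
```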

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65188

Reviewed By: mruberry

Differential Revision: D31009490

Pulled By: malfet

fbshipit-source-id: 528e48f055bf9ac1de1fd7e94c0be41915df9a0b
2021-09-17 11:28:44 -07:00
7ced25eee3 [CoreML][OSS] Include Core ML in iOS/MacOS nightlies (#65075)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65075

Need to drop one line at - https://github.com/pytorch/builder/blob/master/conda/pytorch-nightly/meta.yaml#L65
ghstack-source-id: 138324213

Test Plan:
- Check the iOS nightly builds
  - `pod install LibTorch-Lite-Nightly`

Reviewed By: hanton

Differential Revision: D30912269

fbshipit-source-id: b07679b75ecf89beae2975c37cf17d2449a3304f
2021-09-17 11:27:20 -07:00
f9c0a39ad9 add a test case for const fold (#65224)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65224

Add a test case for the fix D30996277 (8c38d141df).

Test Plan: buck test mode/opt -c python.package_style=inplace -c fbcode.nvcc_arch=v100,a100 -c fbcode.enable_gpu_sections=true -j 40 caffe2/test:fx_const_fold -- test_const_fold_module_attr

Reviewed By: jfix71

Differential Revision: D31000386

fbshipit-source-id: f444361839decc583bf93ac946cfe2049376719e
2021-09-17 10:32:07 -07:00
3c003aa6ae [PyTorchEdge] promote prim ops by using ops table for mobile runtime (#64816)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64816

## Context:
Promoting prim ops:
Certain prim ops are more frequent than others (like tupleIndex, raiseException, ...). These ops are so frequent that they have been chosen for promotion to first-class instructions. Promoting them requires multiple steps and support from the TS team, as it changes how the bytecode is serialized and deserialized. So, to prevent multiple bytecode version bumps and to provide stability while these changes happen, an interim iterative process is proposed which uses a table to look up a "promoted" op's function. This allows us to rapidly update the ops list and test on production models without having to change the bytecode. In case of failure, we can quickly revert this change.

## Observation
The ops are chosen based on the notebook N1135657 which examines the top frequent ops.

## Fix
An interim solution: a static table which, given a prim op name, returns the function to be applied on the stack. This lets `function.cpp` look up the "promoted" op. As a fallback, the "promoted" op still resides in `register_prim_ops.cpp` so that the prim op's function is never missed.
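
A minimal Python sketch of the table idea (the real table is C++ in `function.cpp`; the names here are illustrative only):

```python
def tuple_index(stack):
    # Pops (tuple, index) off the interpreter stack and pushes tuple[index].
    index = stack.pop()
    tup = stack.pop()
    stack.append(tup[index])

PROMOTED_PRIM_OPS = {"prim::TupleIndex": tuple_index}

def lookup_op(name, fallback_registry):
    # Check the promoted table first; fall back to the regular registry
    # so the op's function is never missed.
    return PROMOTED_PRIM_OPS.get(name) or fallback_registry[name]
```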

ghstack-source-id: 138261338

Test Plan:
```
[pavithran@67109.od ~/fbsource/fbcode (eddab7da6)]$ buck test caffe2/test/cpp/jit:jit -- BackendTest.TestComposite
Building: finished in 5.4 sec (100%) 7284/7284 jobs, 0/7284 updated
  Total time: 5.8 sec
More details at https://www.internalfb.com/intern/buck/build/480191aa-a1ba-42ca-99e9-ee4bf2b06d65
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: 867382eb-327f-43d7-a45c-875b7f484b15
Trace available for this run at /tmp/tpx-20210914-100224.283682/trace.log
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/844425134506115
    ✓ ListingSuccess: caffe2/test/cpp/jit:jit - main (12.159)
    ✓ Pass: caffe2/test/cpp/jit:jit - BackendTest.TestCompositeWithSetStates (0.797)
    ✓ Pass: caffe2/test/cpp/jit:jit - BackendTest.TestComposite (0.779)
Summary
  Pass: 2
  ListingSuccess: 1
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/844425134506115
```

{F663491347}

Reviewed By: iseeyuan

Differential Revision: D30819926

fbshipit-source-id: 4cbe05d5761bdc9d62ef08e18172dcf64cb49526
2021-09-17 10:32:05 -07:00
ecfc784e67 Revert D30993855: [pytorch][PR] OpInfo: nn.functional.conv2d
Test Plan: revert-hammer

Differential Revision:
D30993855 (873255c6d9)

Original commit changeset: 7402f99addb4

fbshipit-source-id: b0539daa195dc6a3739bce5c264cb2177b7721ff
2021-09-17 10:32:02 -07:00
18fa58c4e9 [CoreML][OSS] Integrate with CMake (#64523)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64523

- Build PyTorch with the CoreML delegate - `USE_PYTORCH_METAL=ON python setup.py install --cmake`
- Build iOS static libs - `IOS_PLATFORM=SIMULATOR USE_COREML_DELEGATE=1  ./scripts/build_ios.sh`
ghstack-source-id: 138324216

Test Plan:
- Test the Helloword example

{F657778559}

Reviewed By: iseeyuan

Differential Revision: D30594041

fbshipit-source-id: 8cece0b2d4b3ef82d3ef4da8c1054919148beb16
2021-09-17 10:32:00 -07:00
c1415a0a72 [Reland] [Model Averaging] Simplify PostLocalSGD Optimizer API (#65197)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65197

1. The constructor accepts a local optimizer instance instead of the inputs of local optimizer constructor and the class type.
2. The parameters are read from local optimizer's param_groups instead of a separate input.

Proposal: https://github.com/pytorch/pytorch/issues/59699
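
A rough sketch of the simplified construction described in (1) and (2); the import paths and names follow the proposal and may not match the final API exactly:

```python
import torch
import torch.nn as nn
# assumed module paths; treat as illustrative
from torch.distributed.optim import PostLocalSGDOptimizer
from torch.distributed.algorithms.model_averaging.averagers import PeriodicModelAverager

model = nn.Linear(8, 2)
local_opt = torch.optim.SGD(model.parameters(), lr=0.1)
averager = PeriodicModelAverager(period=4, warmup_steps=100)

# Pass a constructed local optimizer instance; parameters are read from
# its param_groups rather than supplied as a separate argument.
opt = PostLocalSGDOptimizer(optim=local_opt, averager=averager)
```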
ghstack-source-id: 138307226

Test Plan: buck test mode/dev-nosan //caffe2/test/distributed:distributed_nccl_spawn -- test_post_localSGD_optimizer_parity

Reviewed By: rohan-varma

Differential Revision: D31007439

fbshipit-source-id: bbb0526e6763ef76775b85088571506b3942c722
2021-09-17 10:31:58 -07:00
752a820230 Bf16 matmul (#64619)
Summary:
Re-create PR to fix https://github.com/pytorch/pytorch/pull/61891.

Drop the support for addbmm.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64619

Reviewed By: jbschlosser

Differential Revision: D30902995

Pulled By: VitalyFedyunin

fbshipit-source-id: dc318d73adff8f6974c9752d0d097e69276f8206
2021-09-17 10:31:56 -07:00
f9bf144a0c Torchhub: rewrite commit hash check to avoid using unnecessary GitHub API credits (#64362)
Summary:
This PR adds more detailed error messages to torchhub if the commit hash validation goes wrong, providing suggestions to the users on how to resolve the issue.

It also documents why such validation is important.

EDIT: it also avoids validating some stuff when we know "stuff" isn't a commit, since there's no risk in this case

CC malfet mthrok

cc nairbv NicolasHug

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64362

Reviewed By: gchanan, malfet

Differential Revision: D30731191

Pulled By: NicolasHug

fbshipit-source-id: d1ee7c2ef2591dd7a5291977af1635ada2552d1b
2021-09-17 10:30:39 -07:00
0559cb37cd [FX] Ensure BC coverage for all of torch.fx.passes (#65081)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65081

Test Plan: Imported from OSS

Reviewed By: jbschlosser, khabinov

Differential Revision: D30967428

Pulled By: jamesr66a

fbshipit-source-id: 2ff83da728dc469f086cf504e71b43396db612d8
2021-09-17 09:32:43 -07:00
cf7409e184 [FX] Move graph_manipulation and param_fetch out of experimental and into passes (#65183)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65183

ghstack-source-id: 138309655

Test Plan: waitforsadcastle

Reviewed By: protonu

Differential Revision: D31007630

fbshipit-source-id: 77d14b284737aabbe2b9e6394177a0c2e40aafba
2021-09-17 09:32:40 -07:00
6aa04b6843 [fx2trt] make gpu trace better (#65168)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65168

Add record_function to TRTModule and EngineHolder so each part appears on the GPU trace.

Test Plan: CI

Reviewed By: wushirong

Differential Revision: D30997968

fbshipit-source-id: b90662f20a8c0d321846c222f3e8c8eb7e010eba
2021-09-17 09:32:37 -07:00
a8d7b885c5 [CoreML][iOS/MacOS] Add the CoreML executor (#64522)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64522

The `PTMCoreMLExecutor` serves as a bridge between the delegate APIs and Core ML runtime.
ghstack-source-id: 138324217

allow-large-files

Test Plan:
iOS:
Run the CoreML tests in the playground app

MacOS:

```
buck test pp-macos

PASS     633ms  1 Passed   0 Skipped   0 Failed   CoreMLTests
```

{F657776101}

Reviewed By: raziel, iseeyuan

Differential Revision: D30594042

fbshipit-source-id: a42a5307a24c2f364333829f3a84f7b9a51e1b3e
2021-09-17 09:32:34 -07:00
aafeea3a6c Allow extra unused arguments in symbolic shape function (#65095)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65095

The reason I didn't do this initially was that I was worried that matching one schema to another schema with an extra argument might change semantics, e.g. Add(Tensor, Tensor) to Add(Tensor, Tensor, Tensor) might be different. However, we don't actually need to worry about this because the graph schema isn't used for node matching, unlike in symbolic_script.cpp.

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D30972081

Pulled By: eellison

fbshipit-source-id: d4089e8feafc330df2ca158866fe779a7da0b073
2021-09-17 09:31:02 -07:00
6eafe7f15e Actually deprecate __torch_function__ as plain methods (#64843)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64843

Fix for https://github.com/pytorch/pytorch/issues/63767

Test Plan: Imported from OSS

Reviewed By: heitorschueroff

Differential Revision: D30991425

Pulled By: albanD

fbshipit-source-id: 1214143b8aea87e6ff406c7fc13096bd15d1a768
2021-09-17 08:32:53 -07:00
1ed9c33d08 Update fx proxy to use classmethod for __torch_function__ (#64842)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64842

Change `__torch_function__` to follow the best-practice guideline of using classmethods.
I am not sure how to handle the case where multiple tracer objects are given as input, but given that we were previously getting an arbitrary tracer based on the "self" that was arbitrarily chosen by the torch_function caller, the new implementation is no worse.
Let me know what you think!
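
For context, the classmethod form this moves toward looks roughly like this (a minimal sketch for a `torch.Tensor` subclass):

```python
import torch

class LoggingSubclass(torch.Tensor):
    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        # Dispatch is keyed on the class, not on an arbitrary "self" instance.
        print(f"called: {func}")
        return super().__torch_function__(func, types, args, kwargs or {})

x = torch.randn(3).as_subclass(LoggingSubclass)
y = x + 1  # prints the intercepted function before computing
```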

Test Plan: Imported from OSS

Reviewed By: heitorschueroff

Differential Revision: D30991423

Pulled By: albanD

fbshipit-source-id: d28940df230b543952b278a0eb2d61cf7ae123ce
2021-09-17 08:32:51 -07:00
473e55d5b2 Use classmethods for overrides (#64841)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64841

Test Plan: Imported from OSS

Reviewed By: heitorschueroff

Differential Revision: D30991424

Pulled By: albanD

fbshipit-source-id: 551e2119768f3a4292713f3bfa83930f5506adbd
2021-09-17 08:32:49 -07:00
a95fabfecb Fix port allocation race condition for elastic test (#65149)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65149

Fixes #64789

There is a race condition between when the free port is acquired and when it is used to create the store, during which the port may have been taken. Since this test only tests that the timeout is triggered for TCPStore, we can bind to any port on TCPStore creation.

This only affects the test on the server (since that is where the port is used), but I changed both tests for clarity.
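
The general pattern for avoiding this class of race (independent of TCPStore specifics) is to bind to port 0 and let the OS choose atomically:

```python
import socket

# Binding to port 0 makes the OS assign a free port atomically, closing the
# window between "find a free port" and "bind it" in which another process
# could grab the port.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.bind(("localhost", 0))
port = sock.getsockname()[1]  # the port that was actually assigned
```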

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang cbalioglu gcramer23

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D30993166

Pulled By: H-Huang

fbshipit-source-id: eac4f28d641ac87c4ebee89df83f90955144f2f1
2021-09-17 08:32:47 -07:00
f101070587 Small improvements to compare_models_torch binary (#65171)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65171

Add the model comparison binary to BUCK, and also add some quality of life features such as controlling the input range.

Test Plan:
```
# Build the binary
cd ~/fbsource
buck build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 //xplat/caffe2:ptmobile_compareAndroid\#android-arm64 --show-output
# Push it to the device
adb push buck-out/gen/xplat/caffe2/ptmobile_compareAndroid\#android-arm64 /data/local/tmp/compare_models

# Run the benchmark binary
BENCH_CMD="/data/local/tmp/compare_models"
BENCH_CMD+=" --model=$PATH_TO_MODEL"
BENCH_CMD+=" --refmodel=$PATH_TO_REFERENCE_MODEL"
BENCH_CMD+=" --input_type=float --input_dims=$MODEL_INPUT_SIZE"
BENCH_CMD+=" --iter=100"
BENCH_CMD+=" --tolerance 1e-5"

```

Reviewed By: beback4u

Differential Revision: D30371322

fbshipit-source-id: 5e520aaf119c90985a1d5a135f76e4057148333b
2021-09-17 08:32:45 -07:00
9601deb1b3 Disable autograd fallback tests on Windows (#65147)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65147

I think they trigger an MSVC bug per https://github.com/pytorch/pytorch/issues/48763
ghstack-source-id: 138247203

Test Plan: breakpointed https://www.internalfb.com/intern/sandcastle/job/9007199738584981/ and sush'ed into the host and ran `buck build arvr/mode/win/opt //xplat/caffe2:autograd_libtorch_test_ovrsource` in `/cygdrive/d/ovrsource-null-hg`

Reviewed By: soulitzer

Differential Revision: D30992685

fbshipit-source-id: 06c6fb2c18d55490f89fc91ee5b7a4c5a7faf1c6
2021-09-17 08:32:43 -07:00
aaffcfe9cd implement "xy" indexing for torch.meshgrid (#62724)
Summary:
This is step 4/7 of https://github.com/pytorch/pytorch/issues/50276. This allows the use of `"xy"` indexing but doesn't change any defaults.
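
A quick illustration of the two modes with 1-D inputs of lengths 3 and 2:

```python
import torch

x = torch.tensor([1, 2, 3])
y = torch.tensor([4, 5])

ii, jj = torch.meshgrid(x, y, indexing="ij")  # matrix indexing: shapes (3, 2)
xx, yy = torch.meshgrid(x, y, indexing="xy")  # Cartesian indexing: shapes (2, 3)
```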

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62724

Reviewed By: heitorschueroff

Differential Revision: D30995290

Pulled By: dagitses

fbshipit-source-id: 08a6a6144b20bc019f68bc3c52e3bbf967976d8f
2021-09-17 08:31:17 -07:00
d37c02be08 Allow parametrization to be nested (#65167)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/65163

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65167

Reviewed By: jbschlosser

Differential Revision: D31002318

Pulled By: albanD

fbshipit-source-id: b1f1c6c9efa9e83af9789ed13efc133f777f418e
2021-09-17 07:29:01 -07:00
9157a2889f Pass GITHUB_TOKEN to linux CI jobs and avoid skipping torchhub tests (#64807)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/64760

This should hopefully put the torchhub tests back.

This also avoids skipping the torchhub tests: currently the tests are skipped if they fail, which pretty much defeats the purpose of having a test in the first place since we're never notified when they do fail.

cc ezyang seemethere malfet lg20987 pytorch/pytorch-dev-infra nairbv NicolasHug

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64807

Reviewed By: seemethere

Differential Revision: D30994585

Pulled By: NicolasHug

fbshipit-source-id: 561782c22462b5cfec99cca153eb59623db5660a
2021-09-17 03:30:56 -07:00
7dc3858deb [CoreML][fbcode] Add the preprocess python APIs (#64521)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64521

Add the preprocess part for the CoreML delegate. Check out `example.py` for usage.
ghstack-source-id: 138324214

Test Plan:
```
(base) [taox@devvm2780.vll0 ~/fbsource/fbcode/caffe2/fb]  buck run coreml:example -- --model="/home/taox/mobilenetv2/mobilenetv2.pt" --out="/home/taox/mobilenetv2/mobilenetv2_coreml.pt"
Parsing buck files: finished in 0.5 sec
Downloaded 0/1 artifacts, 0.00 bytes, 100.0% cache miss (for updated rules)
Building: finished in 10.6 sec (100%) 12611/57623 jobs, 1/57623 updated
  Total time: 11.1 sec
Converting Frontend ==> MIL Ops: 100%|██████████████████████████████████████████▉| 382/383 [00:00<00:00, 692.58 ops/s]
Running MIL optimization passes: 100%|███████████████████████████████████████████| 18/18 [00:00<00:00, 45.55 passes/s]
Translating MIL ==> MLModel Ops: 100%|███████████████████████████████████████████| 704/704 [00:01<00:00, 468.56 ops/s]
input {
  name: "input_0"
  type {
    multiArrayType {
      shape: 1
      shape: 3
      shape: 224
      shape: 224
      dataType: FLOAT32
    }
  }
}
output {
  name: "645"
  type {
    multiArrayType {
      dataType: FLOAT32
    }
  }
}
metadata {
  userDefined {
    key: "com.github.apple.coremltools.source"
    value: "torch==1.10.0a0+fb"
  }
  userDefined {
    key: "com.github.apple.coremltools.version"
    value: "4.1"
  }
}

{'inputs': '[["input_0", "0", "[1, 3, 224, 224]"]]', 'outputs': '[["645", "0", "[1, 1000]"]]', 'config': '{"spec_ver": "4", "backend": "cpu", "allow_low_precision": "True"}', 'metadata': '{"coremltool_ver": "4.1", "torch_ver": "torch==1.10.0a0+fb"}'}
WARNING: Logging before InitGoogleLogging() is written to STDERR
W0826 13:27:12.690302 2477051 backend_detail.cpp:376] Warning: Backend [coreml] is not available. Execution of this Module is still possible by saving and loading on a device where the backend is available. (function codegen_backend_module)
graph(%self.1 : torch.jit.LoweredModule.coreml.__torch__.torchvision.models.mobilenetv2.MobileNetV2,
      %x.1 : Tensor):
  %51 : str = prim::Constant[value="Exception: Backend is not available."]()
  %50 : str = prim::Constant[value="AssertionError: "]()
  %14 : str = prim::Constant[value="forward"]() # <string>:5:62
  %48 : Tensor = prim::Uninitialized()
  %44 : Tensor = prim::Uninitialized()
  %typed_inputs.1 : Any[] = prim::ListConstruct(%x.1)
  %__backend.3 : __torch__.torch.classes.__backends__.coreml = prim::GetAttr[name="__backend"](%self.1)
  %8 : bool = prim::CallMethod[name="is_available"](%__backend.3) # <string>:4:19
  %49 : Tensor = prim::If(%8) # <string>:4:16
    block0():
      %__backend : __torch__.torch.classes.__backends__.coreml = prim::GetAttr[name="__backend"](%self.1)
      %__handles : Dict(str, Any) = prim::GetAttr[name="__handles"](%self.1)
      %15 : Any = aten::__getitem__(%__handles, %14) # <string>:5:47
      %17 : Any[] = prim::CallMethod[name="execute"](%__backend, %15, %typed_inputs.1) # <string>:5:24
      %18 : Any = prim::ListUnpack(%17)
      %20 : bool = prim::isinstance[types=[Tensor]](%18)
      %39 : Tensor = prim::If(%20) # <string>:6:18
        block0():
          %22 : Tensor = prim::unchecked_cast(%18)
          -> (%22)
        block1():
           = prim::RaiseException(%50) # <string>:6:18
          -> (%44)
      -> (%39)
    block1():
       = prim::RaiseException(%51) # <string>:9:18
      -> (%48)
  return (%49)

```

Reviewed By: raziel

Differential Revision: D30585154

fbshipit-source-id: 66c7d2e931be6eaa3c43a0ee131ea8046452449d
2021-09-17 00:25:14 -07:00
8241193d76 [Static Runtime] Introduce static_runtime::dict_unpack (#64771)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64771

Test Plan:
- Added `StaticRuntime.RemoveImmutableInputDictLookupsWithImmutableInputDict`
- Added `StaticRuntime.RemoveImmutableInputDictLookupsWithMutableInputDict`
- TBD: Perf impact measurement

Reviewed By: mikeiovine

Differential Revision: D30685083

fbshipit-source-id: 050a92ef3b3ed0fdc0ab7a13a4b5dbfede9342a9
2021-09-16 23:25:13 -07:00
e6c39a521b [ONNX] Update submodule to 1.10.1 (#63716) (#64576)
Summary:
Stack from [ghstack](https://github.com/ezyang/ghstack):
* **https://github.com/pytorch/pytorch/issues/64576 [ONNX] Update submodule to 1.10.1 (https://github.com/pytorch/pytorch/issues/63716)**

* [ONNX] Update IR version to 7

* [ONNX] update submodule to 1.10.1

* Disable some tests in caffe2 that fail b/c caffe2 doesn't support the
  new ops.
* Update Bazel file.

* Update expect files for new ONNX IR version

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64576

Reviewed By: jansel

Differential Revision: D31006896

Pulled By: msaroufim

fbshipit-source-id: f3bf97709f23a5a2cd49c708e7363231f2c1961a
2021-09-16 22:29:54 -07:00
9117eed6ed [FX] Add torch.ops.profiler._record_function_{enter,exit} as stateful ops for DCE (#65180)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65180

Test Plan: Imported from OSS

Reviewed By: jansel

Differential Revision: D31007115

Pulled By: jamesr66a

fbshipit-source-id: 823b15db712a382a4f2a4fd409983d47bc067150
2021-09-16 21:31:54 -07:00
02dec91212 [quant] AO migration of the torch/quantization/utils.py (phase 1) (#64919)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64919

AO Team is migrating the existing torch.quantization into torch.ao.quantization. We are doing it one file at a time to make sure that the internal callsites are updated properly. This migrates the quantization utilities.
ghstack-source-id: 138303325

Test Plan: `buck test mode/dev //caffe2/test:quantization`

Reviewed By: jerryzh168

Differential Revision: D30899082

fbshipit-source-id: 85eb38c419e417147e71758b682cd095308dd0c9
2021-09-16 21:30:18 -07:00
64641eaee6 [acc_utils] Add print_model_info (#65045)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65045

This is a useful tool for printing out all of the ops that are found in a model after acc_tracer. It assumes the provided model has no `call_module` or `call_method` nodes, which is generally reasonable for a model that has been successfully traced by the acc_tracer.
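
In spirit, the utility amounts to counting node kinds in the traced `torch.fx.GraphModule`; a hypothetical re-implementation sketch:

```python
from collections import Counter

import torch.fx

def print_model_info(gm: torch.fx.GraphModule) -> None:
    # Hypothetical sketch: tally each op kind seen in the traced graph.
    counts = Counter()
    for node in gm.graph.nodes:
        if node.op in ("placeholder", "get_attr", "output"):
            counts[node.op] += 1
        else:
            # after acc_tracer these are call_function nodes, e.g. acc_ops.linear
            counts[f"{node.target.__module__}.{node.target.__name__}"] += 1
    print("Model Info:")
    for name, n in sorted(counts.items()):
        print(f"> {name}: {n}")
```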

Test Plan:
Tested locally. Sample output:
```
Model Info:
> placeholder: 1184
> get_attr: 655
> output: 2
> torch.fx.experimental.fx_acc.acc_ops.add: 2
> torch.fx.experimental.fx_acc.acc_ops.cat: 23
> torch.fx.experimental.fx_acc.acc_ops.embedding_bag: 576
> torch.fx.experimental.fx_acc.acc_ops.layer_norm: 15
> torch.fx.experimental.fx_acc.acc_ops.linear: 27
> torch.fx.experimental.fx_acc.acc_ops.matmul: 3
> torch.fx.experimental.fx_acc.acc_ops.mul: 17
> torch.fx.experimental.fx_acc.acc_ops.permute: 2
> torch.fx.experimental.fx_acc.acc_ops.reshape: 419
> torch.fx.experimental.fx_acc.acc_ops.sigmoid: 16
> torch.fx.experimental.fx_acc.acc_ops.slice_tensor: 630
> torch.fx.experimental.fx_acc.acc_ops.sum: 4
> torch.fx.experimental.fx_acc.acc_ops.tanh: 315
```

Reviewed By: 842974287

Differential Revision: D30954829

fbshipit-source-id: 5c4f0770667b72859b74099d9f4575284fc48bd2
2021-09-16 20:29:22 -07:00
8c38d141df Add back the owning_module fix (#65159)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65159

This was a legit fix originally introduced in D30905949 (446d95a7f6). But we hesitated and removed it for some reason. Putting it back.

Reviewed By: 842974287

Differential Revision: D30996277

fbshipit-source-id: 3f5eede11dba2072e7cd5ae6ca7ac81d55fb75fa
2021-09-16 19:29:56 -07:00
c886406ce0 Add dropout shape inference as no-op in acc_tracer (#65113)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65113

Register dropout as no-op in acc_tracer & Add shape inference for no-op

Test Plan:
buck test glow/fb/fx/acc_tracer:test_acc_shape_inference -- test_unary_15_dropout_no_op
buck test glow/fb/fx/oss_acc_tracer:test_acc_tracer -- test_dropout

Reviewed By: jfix71

Differential Revision: D30880679

fbshipit-source-id: 592fe50e17137c94c12727658191dedf08daf8cf
2021-09-16 18:26:55 -07:00
6f120ada50 Pin SciPy to 1.6.2 on Windows (#65017)
Summary:
Re-enable previously disabled test_distributions

Note: conda does not have SciPy-1.6.3, only 1.6.2

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65017

Reviewed By: seemethere

Differential Revision: D31003199

Pulled By: malfet

fbshipit-source-id: 96b9d2a833f703008bb1f4df9361db8ec6f8ccc6
2021-09-16 18:25:43 -07:00
0a5149019f Added logging for the Reducer's non-member functions. (#65023)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65023

Added an optional logger parameter for the non-member functions `compute_bucket_assignment_by_size` and `verify_replica0_across_processes`. If a logger is provided, `TORCH_CHECK` assertions are replaced with a wrapper that logs the error to the DDP reducer's logger before calling `TORCH_CHECK`. If a logger is not provided, `TORCH_CHECK` is still called.

Modified python-side calls to `_compute_bucket_assignment_by_size` and `_verify_model_across_ranks` to include a logger whenever possible. A notable exception is when these non-member functions are called in DDP's constructor - we cannot pass in a logger there, as it may not have been initialized yet.

We also added 4 new tests: `test_compute_bucket_assignment_by_size_sparse_error_{with, without}_logger`, which test the `_compute_bucket_assignment_by_size` function to ensure that sparse tensors are rejected and the errors are logged, and `test_verify_model_across_rank_{with, without}_logger`, which call `_verify_model_across_ranks` to ensure that ill-formed models (where a rank has a different number of parameters compared to rank 0) are rejected and the errors are logged. The test `test_ddp_model_diff_across_ranks` remains unchanged - while it does construct an ill-formed DDP instance which triggers the error in `_verify_model_across_ranks`, we cannot check the logger because this error appears in the constructor.

Lastly, did some cleanup of the `test_ddp_model_diff_across_ranks` function to make the logic of choosing which context manager and error message to use cleaner.
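
The log-then-check pattern described above, sketched generically in Python (the real code wraps C++ `TORCH_CHECK`; the logger method name here is hypothetical):

```python
def check_with_optional_logger(condition: bool, msg: str, logger=None) -> None:
    # Log through the reducer's logger when one is available, then fail the
    # check either way; with no logger this degenerates to a plain check.
    if not condition:
        if logger is not None:
            logger.set_error_and_log(msg)  # hypothetical method name
        raise RuntimeError(msg)
```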

Test Plan:
**Build commands**
`buck build mode/dev-nosan //caffe2/test/distributed:distributed_nccl_spawn --keep-going`

`buck build mode/dev-nosan //caffe2/test/distributed:distributed_gloo_spawn --keep-going`

**Test commands**
Test for `_compute_bucket_assignment_by_size` (Python)/ `compute_bucket_assignment_by_size` (C++)
`BACKEND={nccl, gloo} WORLD_SIZE=2 ../buck-out/dev/gen/caffe2/test/distributed/distributed_{nccl, gloo}_spawn#binary.par -r test_compute_bucket_assignment_by_size_sparse_error_{with, without}_logger`

Test for `_verify_model_across_ranks` (Python)/`verify_replicas0_across_process` (C++)
`BACKEND={nccl, gloo} WORLD_SIZE=2 ../buck-out/dev/gen/caffe2/test/distributed/distributed_{nccl, gloo}_spawn#binary.par -r test_verify_model_across_ranks_{with, without}_logger`

Test that constructs an ill-formed DDP instance. Only did cleanup of this function.
`BACKEND={nccl, gloo} WORLD_SIZE=2 ../buck-out/dev/gen/caffe2/test/distributed/distributed_{nccl, gloo}_spawn#binary.par -r test_ddp_model_diff_across_ranks`

Reviewed By: rohan-varma

Differential Revision: D30924790

fbshipit-source-id: dae6fa82485a204a6a4b022f2d073417d07ebb2f
2021-09-16 16:39:39 -07:00
873255c6d9 OpInfo: nn.functional.conv2d (#63517)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/54261

Reference: https://github.com/facebookresearch/functorch/issues/78

Mostly inspired from https://github.com/pytorch/pytorch/issues/62882

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63517

Reviewed By: heitorschueroff

Differential Revision: D30993855

Pulled By: zou3519

fbshipit-source-id: 7402f99addb4ef8f19c2ce1a09ed9006e737cc7e
2021-09-16 14:27:36 -07:00
4c4c03124b Remove old references to 9.2 in documentation (#65059)
Summary:
Removes references in .rst and README.md and comments in the Dockerfile

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65059

Reviewed By: malfet

Differential Revision: D30961110

Pulled By: janeyx99

fbshipit-source-id: 702a9a81bf08125ec4ac38bc656fc2c128c30018
2021-09-16 13:24:05 -07:00
4c15f8e8b4 Provide function interface for remove_duplicate_output_args (#65134)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65134

So that its implementation can be abstracted and replaced

Test Plan: Run linter, CI

Reviewed By: 842974287

Differential Revision: D30966916

fbshipit-source-id: 92ec78c7410d0be14faecb0ba1eafdc74bab5a5d
2021-09-16 13:17:37 -07:00
f9c341fdf2 Add type annotation for TRTInterpreter.run (#65135)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65135

Opportunistically adding type annotations as I work through the fx2trt code base.

Test Plan: run linter and CI

Reviewed By: houseroad, 842974287

Differential Revision: D30903185

fbshipit-source-id: 3f700b57f4433f2d312c1ff2e6b99948e3c8845c
2021-09-16 13:16:06 -07:00
8a094e3270 [quant]ao migration for quantization mappings and fuser method mappings hg mv (#64985)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64985

moving quantization_mappings.py and fuser_method_mappings.py to the ao folder while retaining backwards compatibility

also added a dict test

ghstack-source-id: 138215312

Test Plan:
buck test mode/dev //caffe2/test:quantization

https://www.internalfb.com/intern/testinfra/testrun/7036874471986444

buck test mode/dev //caffe2/test:quantization -- TestAOMigrationQuantization

https://www.internalfb.com/intern/testinfra/testrun/5348024625792701

Reviewed By: z-a-f

Differential Revision: D30982551

fbshipit-source-id: 00f53bd44009d6012a7de852000aad6885131edb
2021-09-16 12:59:20 -07:00
9af6fe991c Remove CUDA 9.2 and older references from our cmake (#65065)
Summary:
Removes old CUDA references in our cuda.cmake

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65065

Reviewed By: malfet

Differential Revision: D30992673

Pulled By: janeyx99

fbshipit-source-id: 85b524089ed57e5acbc71720267cf05e24a8c20a
2021-09-16 12:54:49 -07:00
67570a60ba Disable ParallelTBB (#65092)
Summary:
As ParallelTBB's `at::get_thread_num` is not compatible with the general model used by OpenMP and ParallelNative (where it is a contiguous thread index within a parallel loop), see https://github.com/pytorch/pytorch/issues/64571#issuecomment-914691883

More examples of similar regressions: https://github.com/pytorch/pytorch/runs/3612142217

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65092

Reviewed By: zhouzhuojie

Differential Revision: D30995936

Pulled By: malfet

fbshipit-source-id: db145b6a850d794f2c954f59f30249b291473e36
2021-09-16 12:38:45 -07:00
96cb05b49a Introduce TensorRT as a builtin module for torch::deploy. (#63818)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63818

ghstack-source-id: 138156957

Test Plan: next diff

Reviewed By: wconstab

Differential Revision: D30499309

fbshipit-source-id: 4ab1bc9896243c0c1503afb18fbfb196fc37404e
2021-09-16 11:27:51 -07:00
8eb21488fd [JIT] Improve BatchMM mutability handling (#65097)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65097

Previously, BatchMM would skip any block containing any mutable
operators. Now it will avoid batching any operation whose inputs or
outputs are ever mutated. Specifically: consider a tree of ADD, T,
and MM nodes rooted at an ADD node.  If any input or output to any
node in the tree is ever mutated, then the entire tree will be ignored
by BatchMM.
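
Illustratively, the kind of tree BatchMM targets, and a mutation that now disqualifies it (a TorchScript sketch):

```python
import torch

@torch.jit.script
def eligible(a, b, c, d):
    # An ADD-rooted tree of mm results: a candidate for batching.
    return torch.mm(a, b) + torch.mm(c, d)

@torch.jit.script
def ineligible(a, b, c, d):
    out = torch.mm(a, b) + torch.mm(c, d)
    a.add_(1.0)  # 'a' feeds the tree and is mutated, so the tree is skipped
    return out
```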

Test Plan: python test/test_jit.py TestBatchMM

Reviewed By: eellison

Differential Revision: D30973515

Pulled By: davidberard98

fbshipit-source-id: 9d836faa1ef0c9e3fefe0ffc0bd265f275471f48
2021-09-16 10:46:14 -07:00
f309f8fbd4 [quant] ao migration of observer and qconfig (#64982)
Summary:
(Had to recreate this diff so it wasn't dependent on the stack)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64982

migration of qconfig.py and observer.py to torch/ao/quantization using the new test format
ghstack-source-id: 138215256

Test Plan:
buck test mode/opt //caffe2/test:quantization

https://www.internalfb.com/intern/testinfra/testconsole/testrun/8444249354294701/

buck test mode/dev //caffe2/test:quantization -- TestAOMigrationQuantization

https://www.internalfb.com/intern/testinfra/testrun/3940649742829796

Reviewed By: z-a-f

Differential Revision: D30982534

fbshipit-source-id: 48d08969b1984311ceb036eac0877c811cd6add9
2021-09-16 10:33:16 -07:00
97e86cf319 [Fix] Raise error when empty index tensor is passed (gather) (#65006)
Summary:
See https://github.com/pytorch/pytorch/pull/63312#issuecomment-919330081 for context.

cc: ezyang ysiraichi

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65006

Reviewed By: mruberry

Differential Revision: D30937730

Pulled By: ezyang

fbshipit-source-id: a8f77b1f40d07e7e3bef6caaafa119685f297638
2021-09-16 10:14:26 -07:00
874f9bd509 [FX] Gate FXGraphDrawer on whether pydot is installed (#65088)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65088

Test Plan: Imported from OSS

Reviewed By: khabinov

Differential Revision: D30967951

Pulled By: jamesr66a

fbshipit-source-id: dba2f13a47889b3d4187de925b4fe74ee90b7f79
2021-09-16 10:04:33 -07:00
2c57bbf521 add support for indexing to meshgrid (#62722)
Summary:
This is step 3/7 of https://github.com/pytorch/pytorch/issues/50276. It only adds support for the argument but doesn't implement new indexing modes yet.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62722

Test Plan:
Verified this is not FC breaking by adding logging to both meshgrid
overloads and then calling meshgrid twice:

`meshgrid(*tensors)`
  and
`meshgrid(*tensors, indexing='ij')`

This confirmed that the former signature triggered the original native
function and the latter signature triggered the new native function.

Reviewed By: H-Huang

Differential Revision: D30394313

Pulled By: dagitses

fbshipit-source-id: e265cb114d8caae414ee2305dc463b34fdb57fa6
2021-09-16 09:59:49 -07:00
67bd2a31b5 [Reland] Add python mode (#64360)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64360

This PR adds a (private) enable_python_mode context manager.
(see torch/utils/_python_dispatch.py).
enable_python_mode accepts the type of a __torch_dispatch__ object
as its argument. Whenever an operator gets called inside of the
context manager, it dispatches to the __torch_dispatch__ of
the passed-in type.

Example usage:
```
with enable_python_mode(LoggingTensor):
    z = torch.empty([])
    assert isinstance(z, LoggingTensor)
```

There are quite a few changes that were made to support this.

First, we added TorchDispatchTypeObject, a C++ struct that represents the
type of a `__torch_dispatch__` object (e.g. LoggingTensor).
It holds both the PyObject* representing the class and a PyInterpreter*
so we know which Python interpreter it came from.

Next, we updated the concrete_dispatch_fn in python_variable.cpp to accept
a `const std::shared_ptr<TorchDispatchTypeObject>&` argument. When this
is null, dispatching happens as usual. When it is non-null, we prepend
the TorchDispatchTypeObject's PyObject* to the overloaded args list so that
it is considered first for dispatch.

To get that to work, we changed how `handle_torch_dispatch_no_python_arg_parser`
works. The "overloaded args list" previously only consisted of Tensor PyObjects,
but now it can have types in addition to Tensors!
- We renamed `append_overloaded_arg` to `append_overloaded_tensor`
- We added a new `append_overloaded_type` that appends a type to
overloaded_args
- We added special handling in `handle_torch_dispatch_no_python_arg_parser`
and `append_overloaded_arg` to handle types in addition to Tensors.

Then, there is PythonMode and PythonModeTLS.
- We reuse the DispatchKey::Python dispatch key as a mode key
- We use PythonMode::enter and PythonMode::exit to enable/disable
DispatchKey::Python and set the PythonModeTLS.
- PythonModeTLS stores a TorchDispatchTypeObject as metadata.
- PythonMode is in libtorch_python, and PythonModeTLS is in ATen.
This split is due to the libtorch_python library boundary (because we need
to save TLS in ATen/ThreadLocalState)
- We modify the PythonFallbackKernel to look up
the relevant TorchDispatchTypeObject (if Python Mode is active) and
dispatch using it.

There are two more miscellaneous changes:
- internal_new_from_data (torch/csrc/utils/tensor_new.cpp) gets an
exclude guard. enable_python_mode currently does not handle
torch.tensor and the exclude guard is to prevent a bug.

Future:
- This PR does not allow for the nesting of Python modes. In the future we
should be able to enable this with a more sane no_dispatch API and by changing
the TLS to a stack. For now I did not need this for CompositeImplicitAutograd testing.

Test Plan: - new tests

Reviewed By: ezyang

Differential Revision: D30698082

Pulled By: zou3519

fbshipit-source-id: 7094a90eee6aa51f8b71bc4d91cfb6f49e9691f8
2021-09-16 09:02:30 -07:00
8800a8b428 Revert D30888794: [Model Averaging] Simplify PostLocalSGD Optimizer API
Test Plan: revert-hammer

Differential Revision:
D30888794 (3d312b3b8e)

Original commit changeset: 21261b480f6b

fbshipit-source-id: 87abb7e8cd9ecaac909ec6c3ee053fa7c4ae1975
2021-09-16 06:39:57 -07:00
83878e19ff Improve LSTM documentation for proj_size > 0 (#65102)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/65053. Although the documentation states that:

fe0f9d1daf/torch/nn/modules/rnn.py (L500-L506)

It seems that the definition of `weight_ih_l[k]` could be improved by specifying what happens when `k > 0` and `proj_size > 0`. As `proj_size` is only used in LSTM, no changes are needed for the other RNNs.
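
Concretely, with `proj_size > 0` the hidden state passed between layers has size `proj_size`, so for `k > 0` the input-hidden weights shrink accordingly (a sketch based on my reading of the docs; verify shapes locally):

```python
import torch.nn as nn

lstm = nn.LSTM(input_size=10, hidden_size=20, num_layers=2, proj_size=5)
print(lstm.weight_ih_l0.shape)  # torch.Size([80, 10]): 4*hidden_size x input_size
print(lstm.weight_ih_l1.shape)  # torch.Size([80, 5]):  layer 1 consumes the projected hidden state
print(lstm.weight_hh_l0.shape)  # torch.Size([80, 5]):  recurrent weights also see proj_size
```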

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65102

Reviewed By: supriyar

Differential Revision: D30975781

Pulled By: jbschlosser

fbshipit-source-id: 12df06e5e6a8d5de0ad10fb15e33c3e6311c11d3
2021-09-16 06:35:27 -07:00
f69cf3cf2f [Static Runtime] Use FastSet instead of std::set everywhere (#65114)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65114

There doesn't seem to be any reason to use std::set for sets of pointers, right?
ghstack-source-id: 138198504

Reviewed By: hlu1

Differential Revision: D30978450

fbshipit-source-id: 4599c6249fda3a89959f839d3bf6400c5891f82c
2021-09-15 21:44:54 -07:00
0bda7476cf Reduce PyToch Warnings - Cast fixes from D26624430 (#65015)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65015

Split out the existing fixes into a diff we can land separately.

Test Plan:
pooled_embeddings_modules_test

Parsing buck files: finished in 8.3 sec
Creating action graph: finished in 38.3 sec
[RE] Metadata: Session ID=[https://fburl.com/b/reSessionID-9bea421c-875e-4168-9e00-7d67479b1a9f]
[RE] Waiting on 46 remote actions. Completed 905 actions remotely, action cache hit rate: 5.08%.
Downloaded 7002/8869 artifacts, 560.00 Mbytes, 11.6% cache miss (for updated rules)
Building: finished in 13:12.4 min (100%) 31964/31964 jobs, 17344/31964 updated
  Total time: 13:59.1 min
More details at https://www.internalfb.com/intern/buck/build/b9a58bba-e0aa-4c2b-8824-a0c4074b0954
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: 28cbe2b1-6fbc-450c-91c9-c06a7ff1d53b
Trace available for this run at /tmp/tpx-20210914-114921.005504/trace.log
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/1407375088325000
    ✓ ListingSuccess: caffe2/torch/fb/sparsenn:pooled_embeddings_modules_test - main (23.849)
    ✂ Omit: caffe2/torch/fb/sparsenn:pooled_embeddings_modules_test - test_permutation (caffe2.torch.fb.sparsenn.tests.pooled_embeddings_modules_test.PooledEmbeddingModulesTest_1_cuda)
Test output:
> This test was disabled.
To run this test locally, add the command line flag --run-disabled to your test command (prefix with -- if using buck).
To view why this is disabled or re-enable this test in the test console, visit https://our.intern.facebook.com/intern/testinfra/testdetail/562949981577936
    ↻ Skip: caffe2/torch/fb/sparsenn:pooled_embeddings_modules_test - test_permutation (caffe2.torch.fb.sparsenn.tests.pooled_embeddings_modules_test.PooledEmbeddingModulesTest_0_cpu) (13.201)
Test output:
> Repro command : $(cat "/tmp/tpx-20210914-114921.005504/dc174692-8d92-4459-8b8f-201643c6ab7d/execution_command")
Skipped: CUDA is not available or no GPUs detected
stdout:

stderr:

    ↻ Skip: caffe2/torch/fb/sparsenn:pooled_embeddings_modules_test - test_permutation_autograd (caffe2.torch.fb.sparsenn.tests.pooled_embeddings_modules_test.PooledEmbeddingModulesTest_1_cuda) (13.201)
Test output:
> Repro command : $(cat "/tmp/tpx-20210914-114921.005504/dc174692-8d92-4459-8b8f-201643c6ab7d/execution_command")
Skipped: CUDA is not available or no GPUs detected
stdout:

stderr:

    ✓ Pass: caffe2/torch/fb/sparsenn:pooled_embeddings_modules_test - test_compatibility (caffe2.torch.fb.sparsenn.tests.pooled_embeddings_modules_test.PooledEmbeddingModulesTest_1_cuda) (13.201)
    ↻ Skip: caffe2/torch/fb/sparsenn:pooled_embeddings_modules_test - test_permutation_autograd (caffe2.torch.fb.sparsenn.tests.pooled_embeddings_modules_test.PooledEmbeddingModulesTest_0_cpu) (13.201)
Test output:
> Repro command : $(cat "/tmp/tpx-20210914-114921.005504/dc174692-8d92-4459-8b8f-201643c6ab7d/execution_command")
Skipped: CUDA is not available or no GPUs detected
stdout:

stderr:

    ✓ Pass: caffe2/torch/fb/sparsenn:pooled_embeddings_modules_test - test_compatibility (caffe2.torch.fb.sparsenn.tests.pooled_embeddings_modules_test.PooledEmbeddingModulesTest_0_cpu) (13.201)
    ✓ Pass: caffe2/torch/fb/sparsenn:pooled_embeddings_modules_test - main (13.201)
Summary
  Pass: 3
  Skip: 3
    ↻ caffe2/torch/fb/sparsenn:pooled_embeddings_modules_test - test_permutation (caffe2.torch.fb.sparsenn.tests.pooled_embeddings_modules_test.PooledEmbeddingModulesTest_0_cpu)
    ↻ caffe2/torch/fb/sparsenn:pooled_embeddings_modules_test - test_permutation_autograd (caffe2.torch.fb.sparsenn.tests.pooled_embeddings_modules_test.PooledEmbeddingModulesTest_1_cuda)
    ↻ caffe2/torch/fb/sparsenn:pooled_embeddings_modules_test - test_permutation_autograd (caffe2.torch.fb.sparsenn.tests.pooled_embeddings_modules_test.PooledEmbeddingModulesTest_0_cpu)
  Omit: 1
    ✂ caffe2/torch/fb/sparsenn:pooled_embeddings_modules_test - test_permutation (caffe2.torch.fb.sparsenn.tests.pooled_embeddings_modules_test.PooledEmbeddingModulesTest_1_cuda)
  ListingSuccess: 1

shape_inference_mode_test

[amrelshennawy@devvm855.ftw0 /data/users/amrelshennawy/fbsource/fbcode] buck test caffe2/torch/fb/sparsenn:shape_inference_mode_test
Downloaded 6/18 artifacts, 11.69 Kbytes, 53.8% cache miss (for updated rules)
Building: finished in 1.6 sec (100%) 110/110 jobs, 26/110 updated
  Total time: 1.8 sec
More details at https://www.internalfb.com/intern/buck/build/0e5f45b2-5777-49e9-a3b0-09bd05687b2b
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: 99509108-5ff3-4b1a-b7b3-2f43c4036209
Trace available for this run at /tmp/tpx-20210914-120119.723607/trace.log
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/6192449502564504
    ✓ ListingSuccess: caffe2/torch/fb/sparsenn:shape_inference_mode_test - main (0.374)
    ✓ Pass: caffe2/torch/fb/sparsenn:shape_inference_mode_test - test_set_upper_bound_mode (torch.python.fb.shape_inference_mode_test.TestShapeInferenceMode) (0.249)
    ✓ Pass: caffe2/torch/fb/sparsenn:shape_inference_mode_test - test_set_upper_bound_settings (torch.python.fb.shape_inference_mode_test.TestShapeInferenceMode) (0.253)
Summary
  Pass: 2
  ListingSuccess: 1

test
[amrelshennawy@devvm855.ftw0 /data/users/amrelshennawy/fbsource/fbcode] buck test caffe2/torch/fb/sparsenn:test
Parsing buck files: finished in 1.1 sec
Creating action graph: finished in 38.6 sec
Downloaded 6/30 artifacts, 11.29 Kbytes, 66.7% cache miss (for updated rules)
Building: finished in 41.6 sec (100%) 26783/26783 jobs, 43/26783 updated
  Total time: 01:21.4 min
More details at https://www.internalfb.com/intern/buck/build/8f794eb0-3d3c-4ee3-9aec-5ec5cec1b0f4
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: a06164b5-d7d7-444c-a4ff-e312cb9970d9
Trace available for this run at /tmp/tpx-20210914-120428.464799/trace.log
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/3377699789132066
    ✓ ListingSuccess: caffe2/torch/fb/sparsenn:test - main (16.637)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_dense_mlp_quantize_ops (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (17.870)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_clip_ranges_shape_inference_mode (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (17.922)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_gather_ranges_to_dense_caffe2 (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (18.348)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_self_binning_histogram_quantile_simple (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (18.370)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_recat_embedding_grad_output_mixed_D_batch (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (18.516)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_xl_embedding_bag_byte_rowwise_offsets (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (18.515)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_offsets_to_ranges (caffe2.torch.fb.sparsenn.tests.sigrid_transforms_test.SigridTransformsOpsTest) (18.861)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_xl_embedding_bags (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (18.873)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_offsets_to_ranges_out (caffe2.torch.fb.sparsenn.tests.sigrid_transforms_test.SigridTransformsOpsTest) (18.969)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_pack_segments_pad_minf (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (19.104)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_deprecated_multiple_runs (caffe2.torch.fb.sparsenn.tests.sigrid_transforms_test.SigridTransformsOpsTest) (19.342)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_deprecated_sigrid_transform (caffe2.torch.fb.sparsenn.tests.sigrid_transforms_test.SigridTransformsOpsTest) (19.664)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_offsets_to_ranges_out_empty_batch (caffe2.torch.fb.sparsenn.tests.sigrid_transforms_test.SigridTransformsOpsTest) (19.745)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_clip_lengths (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (19.771)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_multiple_runs_torch_bind (caffe2.torch.fb.sparsenn.tests.sigrid_transforms_test.SigridTransformsOpsTest) (19.944)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_offsets_to_ranges_empty_batch (caffe2.torch.fb.sparsenn.tests.sigrid_transforms_test.SigridTransformsOpsTest) (19.944)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_gather_ranges_shape_inference_mode (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (20.245)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_prior_correction_calibration_prediction_nonbinary (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (20.328)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_8bitfakefused (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (20.501)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_deprecated_ranges (caffe2.torch.fb.sparsenn.tests.sigrid_transforms_test.SigridTransformsOpsTest) (20.608)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_clip_lengths_inference_tests (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (22.403)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_broadcast_cat_out (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (23.025)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_clip_lengths_negatives_tests (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (23.956)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_broadcast_cat (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (24.100)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_transform_torch_bind (caffe2.torch.fb.sparsenn.tests.sigrid_transforms_test.SigridTransformsOpsTest) (17.384)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_expand_values_scores_tensor (caffe2.torch.fb.sparsenn.tests.sigrid_transforms_test.SigridTransformsOpsTest) (18.672)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_expand_empty_values_scores_tensor (caffe2.torch.fb.sparsenn.tests.sigrid_transforms_test.SigridTransformsOpsTest) (18.679)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_pack_segments (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (17.726)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_expand_ranges_tensor (caffe2.torch.fb.sparsenn.tests.sigrid_transforms_test.SigridTransformsOpsTest) (17.567)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_batch_box_cox_all_zeros (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (19.036)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_rowwise_prune_op_32bit_indices (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (18.430)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_transform_torch_bind_upper_bound (caffe2.torch.fb.sparsenn.tests.sigrid_transforms_test.SigridTransformsOpsTest) (18.176)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_expand_dense_feature_tensor (caffe2.torch.fb.sparsenn.tests.sigrid_transforms_test.SigridTransformsOpsTest) (19.006)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_clip_ranges_gather (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (18.555)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_xl_int_nbit_split_embedding_codegen_lookup_function (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (18.791)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_pack_segments_smaller_max_len (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (18.737)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_self_binning_histogram_quantile_pos (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (20.212)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_xl_embedding_bag_2bit_rowwise_offsets (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (18.612)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_prior_correction_calibration_prediction_binary (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (20.858)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_tracing_torch_bind_upper_bound (caffe2.torch.fb.sparsenn.tests.sigrid_transforms_test.SigridTransformsOpsTest) (19.002)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_deprecated_tracing (caffe2.torch.fb.sparsenn.tests.sigrid_transforms_test.SigridTransformsOpsTest) (20.824)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_self_binning_histogram_quantile_1d_counts (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (18.976)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_recat_embedding_grad_output_mixed_D (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (19.832)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_batch_one_hot_lengths (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (19.844)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_clip_ranges (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (18.558)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_batch_box_cox_non_zeros (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (19.418)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_prior_correction_calibration_accumulate (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (19.222)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_unsqueeze_vector (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (19.327)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_xl_embedding_bag_4bit_rowwise_offsets (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (17.772)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_self_binning_histogram_quantile (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (18.425)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_broadcast_cat_backward (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (17.956)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_expand_offsets_tensor (caffe2.torch.fb.sparsenn.tests.sigrid_transforms_test.SigridTransformsOpsTest) (19.320)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_gather_ranges (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (17.923)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_batch_one_hot (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (18.549)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_deprecated_sigrid_transforms_create (caffe2.torch.fb.sparsenn.tests.sigrid_transforms_test.SigridTransformsOpsTest) (18.932)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_clip_ranges_gather_lengths_to_offsets (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (18.807)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_length_to_row_idx (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (17.738)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_tracing_torch_bind (caffe2.torch.fb.sparsenn.tests.sigrid_transforms_test.SigridTransformsOpsTest) (20.175)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_batch_box_cox_mixed (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (18.116)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_self_binning_histogram_quantile_1d_bins (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (18.671)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_permute_out (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (19.002)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_create_sigrid_transforms_torch_bind (caffe2.torch.fb.sparsenn.tests.sigrid_transforms_test.SigridTransformsOpsTest) (18.151)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_ranges_torch_bind (caffe2.torch.fb.sparsenn.tests.sigrid_transforms_test.SigridTransformsOpsTest) (16.780)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_self_binning_histogram_quantile_no_bins (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (19.185)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_cumsum (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (19.242)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_self_binning_histogram_quantile_le_one (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (19.876)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_pack_and_unpack_segments (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (19.222)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_self_binning_histogram_quantile_dims (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (20.007)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_sigrid_hash_op (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (18.959)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_rowwise_prune_op_64bit_indices (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (18.601)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_ranges_torch_bind_upper_bound (caffe2.torch.fb.sparsenn.tests.sigrid_transforms_test.SigridTransformsOpsTest) (17.977)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_broadcast_stack (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (22.588)
    ✓ Pass: caffe2/torch/fb/sparsenn:test - test_multiple_runs_torch_bind_upper_bound (caffe2.torch.fb.sparsenn.tests.sigrid_transforms_test.SigridTransformsOpsTest) (15.342)
Summary
  Pass: 73
  ListingSuccess: 1
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/3377699789132066

Did not run (no GPU on my devserver):
gpu_test
cpp_gpu_test

Reviewed By: r-barnes

Differential Revision: D30940399

fbshipit-source-id: d867ca646723340775a49c1b983cdab64f2d67d8
2021-09-15 21:20:41 -07:00
db601434ef Bug fix (#65105)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65105

Using buildErrorMessage in external_functions.cpp was breaking the build target nnc_cpu_backend_lib, because buildErrorMessage is defined in tensorexpr/kernel.cpp, which is not included in mobile builds and which we don't want to include in mobile builds.
Also, buildErrorMessage wraps error messages for the fuser, whereas nnc_aten_conv2d is now only used in the AOT workflow and is not called by the fuser. Wrapping assertion failures with a fuser error message would therefore be misleading for the AOT workflow.

Test Plan:
Before fix:
```
+ buck build //xplat/caffe2/fb/lite_predictor:lite_predictor_nnc
Downloading... 3/3 artifacts, 24.81 Kbytes, 0.0% cache miss (for updated rules)
Building... 1.7 sec (99%) 4639/4641 jobs, 3/4641 updated
     - //xplat/caffe2/fb/lite_predictor:lite_predictor_nnc#binary... 0.7 sec (running c++ link[0.6 sec])
Command failed with exit code 1.

command: [/data/users/priyaramani/fbsource/buck-out/cells/fbcode/gen/aab7ed39/tools/build/buck/wrappers/__ld__/ld.sh, --ld=/data/users/priyaramani/fbsource/fbcode/third-party-buck/platform009/build/llvm-fb/9.0.0/bin/clang++, --cc=/data/users/priyaramani/fbsource/buck-out/cells/fbcode/gen/aab7ed39/tools/build/buck/wrappers/__fbc...
<truncated>
...

stderr: clang-9: warning: argument unused during compilation: '-pthread' [-Wunused-command-line-argument]
ld.lld: error: undefined symbol: torch::jit::tensorexpr::buildErrorMessage(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
>>> referenced by external_functions.cpp:69 (xplat/caffe2/torch/csrc/jit/tensorexpr/external_functions.cpp:69)
>>>               ../nnc_cpu_backend_lib#compile-external_functions.cpp.o50e02bc2,platform009-clang/torch/csrc/jit/tensorexpr/external_functions.cpp.o:(nnc_aten_conv2d) in archive /data/users/priyaramani/fbsource/buck-out/gen/aab7ed39/xplat/caffe2/nnc_cpu_backend_lib#platform009-clang,static/libnnc_cpu_backend_lib.a
clang-9: error: linker command failed with exit code 1 (use -v to see invocation)

    When running <c++ link>.
    When building rule //xplat/caffe2/fb/lite_predictor:lite_predictor_nnc#binary (ovr_config//platform/linux:x86_64-fbcode).
clang-9: warning: argument unused during compilation: '-pthread' [-Wunused-command-line-argument]
ld.lld: error: undefined symbol: torch::jit::tensorexpr::buildErrorMessage(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
>>> referenced by external_functions.cpp:69 (xplat/caffe2/torch/csrc/jit/tensorexpr/external_functions.cpp:69)
>>>               ../nnc_cpu_backend_lib#compile-external_functions.cpp.o50e02bc2,platform009-clang/torch/csrc/jit/tensorexpr/external_functions.cpp.o:(nnc_aten_conv2d) in archive /data/users/priyaramani/fbsource/buck-out/gen/aab7ed39/xplat/caffe2/nnc_cpu_backend_lib#platform009-clang,static/libnnc_cpu_backend_lib.a
clang-9: error: linker command failed with exit code 1 (use -v to see invocation)

Command failed with exit code 1.

command: [/data/users/priyaramani/fbsource/buck-out/cells/fbcode/gen/aab7ed39/tools/build/buck/wrappers/__ld__/ld.sh, --ld=/data/users/priyaramani/fbsource/fbcode/third-party-buck/platform009/build/llvm-fb/9.0.0[DEBUG kernel.cpp:2766]       }
```

After fix:
```
+ buck build //xplat/caffe2/fb/lite_predictor:lite_predictor_nnc
Action graph will be rebuilt because files have been added or removed.
clang-9: warning: argument unused during compilation: '-pthread' [-Wunused-command-line-argument]

Downloaded 11/15 artifacts, 78.37 Kbytes, 15.4% cache miss (for updated rules)
Building: finished in 7.4 sec (100%) 4718/4718 jobs, 46/4718 updated
  Total time: 7.5 sec
More details at https://www.internalfb.com/intern/buck/build/b87be016-340c-49f8-b832-0c1de70aae9e
```

Reviewed By: ZolotukhinM

Differential Revision: D30975952

fbshipit-source-id: 85c028cc6af63c03b505b51302f5158c23e1a047
2021-09-15 20:11:30 -07:00
2bb898e039 [acc_ops] Add support for torch variants of squeeze and mul (#65037)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65037

As titled.

Test Plan: updated unit tests

Reviewed By: yuhc

Differential Revision: D30952224

fbshipit-source-id: aaf75b27b4fc6c0436ba7bfcf324f761b900171b
2021-09-15 19:41:04 -07:00
206646d6ed Add NNC AOT Compiler executable (#63994)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63994

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D30582149

Pulled By: priyaramani

fbshipit-source-id: 3bbf085428824c3cb308e006c18bb0a57f50fef6
2021-09-15 19:18:24 -07:00
e0ecd09011 [quant] AO migration of the _correct_bias.py, _equalize.py, and _learnable_fake_quantize.py (#64917)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64917

AO Team is migrating the existing torch.quantization into torch.ao.quantization. We are doing it one file at a time to make sure that the internal callsites are updated properly.
This migrates the following files from torch.quantization to torch.ao.quantization:
- `_correct_bias.py`
- `_equalize.py`
- `_learnable_fake_quantize.py`

**Note:** These files are migrated completely without any warning. The old location is thus silently deprecated.

Test Plan: `buck test mode/dev //caffe2/test:quantization -- TestBiasCorrection`

Reviewed By: vkuzo

Differential Revision: D30898565

fbshipit-source-id: 1d39be2539dd1adfcb42e16bdcc0daf5c8316bbd
2021-09-15 18:15:39 -07:00
3ceecebed0 .circleci/.jenkins: Remove 9.2 references in CI (#65024)
Summary:
Removes 9.2 references in CI scripts and configs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65024

Reviewed By: driazati

Differential Revision: D30945948

Pulled By: janeyx99

fbshipit-source-id: 77890a00520c61500a934a90a74e3fcca84c09b5
2021-09-15 18:06:57 -07:00
d9d8250e3f .github: GHA add retry for docker run in chown workspace step (#65104)
Summary:
This should help prevent further errors in GHA workflows during the Chown Workspace step such as https://github.com/pytorch/pytorch/runs/3614067053

I did not add retries to other steps with docker run
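
The idea, sketched in Python (the real step is shell inside the GHA workflow, and the docker command below is a stand-in, not the workflow's actual invocation):

```python
# Illustrative retry-with-backoff around a flaky subprocess, of the kind added
# to the chown-workspace docker step; the real workflow implements this in shell.
import subprocess
import time

def run_with_retries(cmd, attempts=3):
    for attempt in range(attempts):
        if subprocess.run(cmd).returncode == 0:
            return
        time.sleep(2 ** attempt)  # simple exponential backoff between attempts
    raise RuntimeError(f"{cmd} failed after {attempts} attempts")

# Stand-in for the actual docker run invocation used by the workflow:
run_with_retries(["docker", "run", "--rm", "alpine", "true"])
```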

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65104

Reviewed By: seemethere

Differential Revision: D30976330

Pulled By: janeyx99

fbshipit-source-id: e403008548aa01c9a0a4ccebe56df0e889dd045c
2021-09-15 18:02:07 -07:00
03389dc851 Revert D30752939: [pytorch][PR] nvfuser update
Test Plan: revert-hammer

Differential Revision:
D30752939 (cfaecaf40b)

Original commit changeset: ce122e80f01b

fbshipit-source-id: 57685df8f9946032a06eff1de8a3d1498500d2d2
2021-09-15 17:38:47 -07:00
c151d62f45 [quant] AO migration of the quant_types.py (phase 1) (#64916)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64916

AO Team is migrating the existing torch.quantization into torch.ao.quantization. We are doing it one file at a time to make sure that the internal callsites are updated properly.
This migrates quant_type.py from torch.quantization to torch.ao.quantization.
At this point both locations will be supported. Eventually the torch.quantization will be deprecated.
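
A sketch of what "both locations will be supported" means in practice, assuming (as with the other phase-1 migrations) that the old module re-exports the new one; `QuantType` is the enum defined in this file:

```python
# Both import paths are expected to resolve to the same definitions in phase 1.
from torch.ao.quantization.quant_type import QuantType as NewQuantType  # new canonical location
from torch.quantization.quant_type import QuantType as OldQuantType     # old location, kept working

assert NewQuantType is OldQuantType  # same class, reachable via two paths
```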

Test Plan: `buck test mode/dev //caffe2/test:quantization -- TestAOMigrationQuantization`

Reviewed By: vkuzo

Differential Revision: D30898422

fbshipit-source-id: 3e6126b49f0565a4136d6928cea9eb25368927ff
2021-09-15 17:30:00 -07:00
a42996f16e [quant] AO migration of the fuse_modules.py (phase 1) (#64913)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64913

AO Team is migrating the existing torch.quantization into torch.ao.quantization. We are doing it one file at a time to make sure that the internal callsites are updated properly.
This migrates fuse_modules.py from torch.quantization to torch.ao.quantization.
At this point both locations will be supported. Eventually the torch.quantization will be deprecated.

Test Plan: `buck test mode/dev //caffe2/test:quantization`

Reviewed By: vkuzo

Differential Revision: D30882819

fbshipit-source-id: 1926ad6aa49136aceb5b625dcef4bfde3a2860d4
2021-09-15 17:28:47 -07:00
7e9c599784 [TensorExpr] Add a method for sanitizing Var and Buf names in Stmt. (#65010)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65010

This pass ensures all names are legal and not duplicated.

Fixes #52727.
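
A rough sketch of what such a sanitization pass does (in Python for brevity; the real pass is C++ over `Var`/`Buf` names, and the exact renaming scheme here is illustrative):

```python
import re

def sanitize_names(names):
    """Make every name a legal identifier and resolve duplicates with suffixes."""
    used = set()
    out = []
    for name in names:
        name = re.sub(r"\W", "_", name) or "v"  # replace illegal characters
        if name[0].isdigit():
            name = "_" + name                   # identifiers cannot start with a digit
        candidate, i = name, 1
        while candidate in used:                # deduplicate: x, x_1, x_2, ...
            candidate = f"{name}_{i}"
            i += 1
        used.add(candidate)
        out.append(candidate)
    return out

print(sanitize_names(["x", "x", "1y", "a.b"]))  # ['x', 'x_1', '_1y', 'a_b']
```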

Test Plan: Imported from OSS

Reviewed By: bertmaher, navahgar

Differential Revision: D30939717

Pulled By: ZolotukhinM

fbshipit-source-id: 7dbe7f937de41f22ad49137a5e067d698443ed63
2021-09-15 17:15:06 -07:00
3d5923366d .github: Enable only specific workflows for canary (#65099)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65099

Utilizes ciflow to enable only specific workflows for
pytorch/pytorch-canary to reduce noise on that specific repository

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D30973691

Pulled By: seemethere

fbshipit-source-id: 371765535b42a00bd72c2551c4faebf733d759f0
2021-09-15 16:53:12 -07:00
59c486f2f3 ci: Disable jit legacy on circleci, enable on gha (#65106)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65106

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

cc ezyang seemethere malfet lg20987 pytorch/pytorch-dev-infra

Test Plan: Imported from OSS

Reviewed By: malfet, janeyx99

Differential Revision: D30976186

Pulled By: seemethere

fbshipit-source-id: 8958f821eab9aa284496c57915894ed70f6b2fff
2021-09-15 16:11:38 -07:00
b75d3cae4c CI: Upgrade windows 10.1 jobs to 10.2 (#65080)
Summary:
These are the first 2 steps in the following task:
1. Upgrade 10.1 to 10.2
2. Migrate force_on_cpu job to GHA

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65080

Test Plan: https://github.com/pytorch/pytorch/pull/65086

Reviewed By: seemethere

Differential Revision: D30973655

Pulled By: janeyx99

fbshipit-source-id: 67ab69ea99ff9e0336400a7173efef6d7daac07c
2021-09-15 16:04:50 -07:00
3f27c1ae78 Replace windows 10.2 smoke tests on PRs to be 11.3 (#65090)
Summary:
As we default to Linux CUDA 11.3 on PRs, we should do the same thing with Windows (instead of having 10.2 be the default). This means that 10.2 will now be master only, and 11.3 Windows smoke tests will run on every PR.

This also copies over the "run smoke tests only" config; removing that will be in a separate PR once there's a firmer decision.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65090

Reviewed By: seemethere

Differential Revision: D30968382

Pulled By: janeyx99

fbshipit-source-id: c73f9a2cc800b678909365c4d80627d29fc09f94
2021-09-15 16:01:07 -07:00
ec1af11c2e Revert D30883290: [Static Runtime] Move MemoryPlanner out into memory_planner.cpp
Test Plan: revert-hammer

Differential Revision:
D30883290 (0e11454d19)

Original commit changeset: a37570f8d943

fbshipit-source-id: 65c57a2b0d2e3c7006765195dd519e8cf2472f72
2021-09-15 15:40:34 -07:00
37bcefa248 [quant] Removing hardcoded "torch.quantization.observer" for migration (#64981)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64981

This would have caused errors when observer.py was moved to ao.

see: D30391189
ghstack-source-id: 138118430

Test Plan:
buck test mode/opt //caffe2/test:quantization -- --exact 'caffe2/test:quantization - test_dynamic_quant_multi_uses (quantization.jit.test_quantize_jit.TestQuantizeDynamicJitPasses)'

buck test mode/opt //caffe2/test:quantization -- --exact 'caffe2/test:quantization - test_save_load_state_dict_script (quantization.core.test_workflow_module.TestObserver)'

Reviewed By: supriyar

Differential Revision: D30432008

fbshipit-source-id: 754727a89c78f6ceada6f8ff92c304f3953f38fc
2021-09-15 15:22:19 -07:00
fe0f9d1daf [Caffe2][easy] Avoid spurious vector copy in TransposeOp (#64403)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64403

No need to copy to the heap here.
ghstack-source-id: 138033019

Test Plan: CI

Reviewed By: smacke

Differential Revision: D30712506

fbshipit-source-id: 5f4131b2569ebb1f5092262aaddb17215dea88f1
2021-09-15 15:15:51 -07:00
208cf051d4 [Caffe2] Don't pass vector by value in SqueezeOp (#64400)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64400

There appears to be no need to copy this vector.
ghstack-source-id: 138033020

Test Plan: CI

Reviewed By: smacke

Differential Revision: D30711014

fbshipit-source-id: b9fcf3d496a663b8478aa22d52b2c41f8f85e90f
2021-09-15 15:14:30 -07:00
177ebea4c5 Use RDS for build size tracking (#64303)
Summary:
This adds 2 utilities: `register_rds_table` and `rds_write`. `register_rds_table` needs to be called once with the schema for the data that `rds_write` will write. These go to a lambda called `rds-proxy`, which will write to/read from the DB as necessary. This data can then be arbitrarily queried via `rds-proxy` (for use in CI) or on metrics.pytorch.org (for analysis).

It also hooks these up for build size tracking (which previously was not working on GHA)
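
For illustration only: the two function names come from this commit, but the argument shapes below are guesses rather than the actual API, and the import path is omitted because it isn't shown here.

```python
# Hypothetical usage sketch; signatures and values are assumptions.
# register_rds_table is called once to declare the schema...
register_rds_table(
    "binary_size",                                         # assumed table name
    schema={"build_name": "string", "size_bytes": "int"},  # assumed schema format
)
# ...after which rds_write can send rows through the rds-proxy lambda.
rds_write(
    "binary_size",
    [{"build_name": "linux-xenial-py3.6-gcc7", "size_bytes": 123456789}],
)
```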

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64303

Reviewed By: mruberry

Differential Revision: D30941182

Pulled By: driazati

fbshipit-source-id: 12c5575ddd29902477464fc989ad76a052306b9b
2021-09-15 14:47:37 -07:00
cfaecaf40b nvfuser update (#63745)
Summary:
Syncing the nvfuser code base from the devel branch. Listing a few of our developments since the last sync:

- Extends support to normalization and reduction kernels.
- Multiple kernel launches for a single `CudaFusionGroup`. The hierarchical caching system has been updated to cache graph segmentation.
- profile_ivalue is enabled to convert dynamic scalars into compile-time constants required by the codegen (e.g. reduction axes).

To keep this PR simple and relatively review-free, we stripped most external changes and submitted them as separate PRs, so this gigantic PR is easier to handle.

Internal updates are files located in:
1. updates in nvfuser codegen `torch/csrc/jit/codegen/cuda`
2. added nvfuser specific benchmarks `benchmarks/cpp/nvfuser`
3. nvfuser jit cpp tests `test/cpp/jit/test_gpu.cpp` `test/cpp/jit/test_gpu_shift.cpp` `test/cpp/jit/test_gpu_validator.h`

Updates affecting integration:

1. profile_ivalue enabled for nvfuser; related changes are in `torch/csrc/jit/runtime/*`
2. exposed a few more symbols in `aten/src/ATen/core/*` used by codegen

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63745

Reviewed By: saketh-are

Differential Revision: D30752939

Pulled By: malfet

fbshipit-source-id: ce122e80f01bcd3865f5bd3c4dfde660665fd84c
2021-09-15 14:42:55 -07:00
59988f81bd Add embedding shape analysis (#64323)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64323

Test Plan: Imported from OSS

Reviewed By: driazati

Differential Revision: D30738145

Pulled By: eellison

fbshipit-source-id: be12408330d671bc65cf645aa2c20fafd954e6a9
2021-09-15 13:45:48 -07:00
29514bfcdb Max Pool with indices (#64121)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64121

Add support for aten operators which return multiple outputs
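
For example, pooling with indices is one such multi-output op; from Python it can be reached like this:

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 1, 4, 4)
# With return_indices=True a single aten op produces two outputs:
# the pooled values and the flat indices of the selected elements.
values, indices = F.max_pool2d(x, kernel_size=2, return_indices=True)
print(values.shape, indices.shape)  # torch.Size([1, 1, 2, 2]) for both
```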

Test Plan: Imported from OSS

Reviewed By: driazati

Differential Revision: D30738142

Pulled By: eellison

fbshipit-source-id: 0d7e51187bd5e3e9b43f0fdb5178366a97aec943
2021-09-15 13:45:46 -07:00
2626cd3ba4 Add Maxpool to shape analysis / Opinfo (#63530)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63530

How to review: check that the generated inputs are a good representation of the op semantics; that should be sufficient for correctness. As a bonus, you can also double-check the op size semantics by going to https://codebrowser.bddppq.com/pytorch/pytorch/, typing in native::{op_name}, and looking at the op implementation.

Test Plan: Imported from OSS

Reviewed By: driazati

Differential Revision: D30738147

Pulled By: eellison

fbshipit-source-id: cf52339e572ee04e0d6167fd95d8a82d58ea7706
2021-09-15 13:44:33 -07:00
425f173f9d [quant][refactor] Change the structure of the ao migration tests (#64912)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64912

The test naming was confusing and ambiguous. The file was changed to reflect the framework that is being migrated ("quantization" instead of "quantize"). Also, the common testing class was extracted out.
ghstack-source-id: 138157450

Test Plan: `buck test mode/dev //caffe2/test:quantization -- TestAOMigrationQuantization`

Reviewed By: vkuzo

Differential Revision: D30898214

fbshipit-source-id: 017f95995271d35bcdf6ff6a1b3974b837543e84
2021-09-15 13:15:43 -07:00
2967a48b78 Add retries to ECR login step (#65013)
Summary:
Switch retry mode from `legacy` to `standard` (https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-retries.html#cli-usage-retries-configure) and up the number of retries.
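
For reference, the same two knobs expressed through botocore's `Config` object (the workflow itself configures the AWS CLI via its environment/config file, and the attempt count below is an assumption since the commit doesn't quote the number):

```python
import boto3
from botocore.config import Config

# "standard" retry mode with a raised attempt cap (10 here is illustrative).
config = Config(retries={"mode": "standard", "max_attempts": 10})
ecr = boto3.client("ecr", config=config)  # e.g. the client behind ECR logins
```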

Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65013

Reviewed By: zhouzhuojie, mruberry

Differential Revision: D30943292

Pulled By: driazati

fbshipit-source-id: 0a21e9b4eacbb77e6aca22f9256d94cd591b23cd
2021-09-15 13:12:57 -07:00
df3d649380 To add state dict and load_dict for Chained Scheduler (#65034)
Summary:
Adding state_dict() and load_state_dict() methods for Chained Scheduler
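
A minimal usage sketch of the new methods, assuming the stock `torch.optim.lr_scheduler` classes from this release:

```python
import torch

model = torch.nn.Linear(2, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
sched = torch.optim.lr_scheduler.ChainedScheduler([
    torch.optim.lr_scheduler.ConstantLR(opt, factor=0.5, total_iters=2),
    torch.optim.lr_scheduler.ExponentialLR(opt, gamma=0.9),
])

state = sched.state_dict()    # new: snapshots all chained schedulers
sched.load_state_dict(state)  # new: restores them, e.g. after checkpointing
```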

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65034

Reviewed By: prabhat00155, nateanl

Differential Revision: D30958207

Pulled By: datumbox

fbshipit-source-id: 1a587a330d34e0548e891a39f8fb5a3d251b71fa
2021-09-15 13:11:41 -07:00
6512838fab [ONNX] Enhance shape (two changes merged) (#64585)
Summary:
Enhanced shape inference by introducing typeReliableMap.
[ONNX] exporter changes for torch hub models (https://github.com/pytorch/pytorch/issues/62856)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64585

Reviewed By: ezyang

Differential Revision: D30870418

Pulled By: msaroufim

fbshipit-source-id: 87a294799cb87d649d1d13b6114a5cfbac9be15c

Co-authored-by: jiafatom <jiafa@microsoft.com>
2021-09-15 13:02:19 -07:00
0e11454d19 [Static Runtime] Move MemoryPlanner out into memory_planner.cpp (#65011)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65011

This change moves `MemoryPlanner` out of impl.cpp into memory_planner.cpp.

`MemoryPlanner` performs an independent sub-task: statically analyzing a graph, creating a memory plan, and allocating/deallocating managed Tensors.

This change will reduce merge conflicts as I work on MemoryPlanner more actively for output Tensor support.

Test Plan: N/A

Reviewed By: mikeiovine

Differential Revision: D30883290

fbshipit-source-id: a37570f8d9430224a6987d2190bcf81cf875043d
2021-09-15 12:57:39 -07:00
db134a6843 (torch.distributed.elastic) properly format traceback on error (#65041)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65041

Fixes a bug introduced in https://github.com/pytorch/pytorch/pull/64036 where the traceback of the error handler is printed out rather than the traceback of the actual exception.

Fixes https://github.com/pytorch/pytorch/issues/60910
Closes https://github.com/pytorch/pytorch/issues/60910

BEFORE (note that the `py_callstack` is NOT the traceback of the RuntimeError):
```
**************************************************************************************************************************************************************************************************************************************************
                                                                                                              run_script_path FAILED
==================================================================================================================================================================================================================================================
Root Cause:
[0]:
  time: 2021-09-14_22:01:06
  rank: 0 (local_rank: 0)
  exitcode: 1 (pid: 1092727)
  error_file: /tmp/torchelastic_aeyvjbpe/none_8zuih7tj/attempt_0/0/error.json
  msg:
    {
      "message": "RuntimeError: rasing error since --throw was specified",
      "extraInfo": {
        "py_callstack": [
          "  File \"<string>\", line 1, in <module>\n",
          "  File \"/usr/local/fbcode/platform009/lib/python3.8/multiprocessing/spawn.py\", line 116, in spawn_main\n    exitcode = _main(fd, parent_sentinel)\n",
          "  File \"/usr/local/fbcode/platform009/lib/python3.8/multiprocessing/spawn.py\", line 129, in _main\n    return self._bootstrap(parent_sentinel)\n",
          "  File \"/usr/local/fbcode/platform009/lib/python3.8/multiprocessing/process.py\", line 315, in _bootstrap\n    self.run()\n",
          "  File \"/usr/local/fbcode/platform009/lib/python3.8/multiprocessing/process.py\", line 108, in run\n    self._target(*self._args, **self._kwargs)\n",
          "  File \"/data/users/kiuk/fbsource/fbcode/buck-out/dev/gen/caffe2/run#link-tree/torch/multiprocessing/spawn.py\", line 59, in _wrap\n    fn(i, *args)\n",
          "  File \"/data/users/kiuk/fbsource/fbcode/buck-out/dev/gen/caffe2/run#link-tree/torch/distributed/elastic/multiprocessing/api.py\", line 382, in _wrap\n    ret = record(fn)(*args_)\n",
          "  File \"/data/users/kiuk/fbsource/fbcode/buck-out/dev/gen/caffe2/run#link-tree/torch/distributed/elastic/multiprocessing/errors/__init__.py\", line 373, in wrapper\n    error_handler.record_exception(e)\n",
          "  File \"/data/users/kiuk/fbsource/fbcode/buck-out/dev/gen/caffe2/run#link-tree/torch/distributed/elastic/multiprocessing/errors/error_handler.py\", line 86, in record_exception\n    _write_error(e, self._get_error_file_path())\n",
          "  File \"/data/users/kiuk/fbsource/fbcode/buck-out/dev/gen/caffe2/run#link-tree/torch/distributed/elastic/multiprocessing/errors/error_handler.py\", line 26, in _write_error\n    \"py_callstack\": traceback.format_stack(),\n"
        ],
        "timestamp": "1631682066"
      }
    }

==================================================================================================================================================================================================================================================
Other Failures:
  <NO_OTHER_FAILURES>
**************************************************************************************************************************************************************************************************************************************************
```

AFTER (note the traceback is the traceback of the RuntimeError):
```
********************************************************************************
                             run_script_path FAILED
================================================================================
Root Cause:
[0]:
  time: 2021-09-14_21:49:25
  rank: 0 (local_rank: 0)
  exitcode: 1 (pid: 1014681)
  error_file: /tmp/torchelastic_q0zods2c/none_qwmz5dgj/attempt_0/0/error.json
  msg: Traceback (most recent call last):
    File "/data/users/kiuk/fbsource/fbcode/buck-out/dev/gen/caffe2/run#link-tree/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 361, in wrapper
      return f(*args, **kwargs)
    File "/data/users/kiuk/fbsource/fbcode/buck-out/dev/gen/caffe2/run#link-tree/torch/distributed/run.py", line 671, in run_script_path
      runpy.run_path(sys.argv[0], run_name="__main__")
    File "/usr/local/fbcode/platform009/lib/python3.8/runpy.py", line 265, in run_path
      return _run_module_code(code, init_globals, run_name,
    File "/usr/local/fbcode/platform009/lib/python3.8/runpy.py", line 97, in _run_module_code
      _run_code(code, mod_globals, init_globals,
    File "/usr/local/fbcode/platform009/lib/python3.8/runpy.py", line 87, in _run_code
      exec(code, run_globals)
    File "/home/kiuk/tmp/test.py", line 55, in <module>
      main()
    File "/data/users/kiuk/fbsource/fbcode/buck-out/dev/gen/caffe2/run#link-tree/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 361, in wrapper
      return f(*args, **kwargs)
    File "/home/kiuk/tmp/test.py", line 25, in main
      raise RuntimeError("rasing error since --throw was specified")
  RuntimeError: rasing error since --throw was specified

================================================================================
Other Failures:
  <NO_OTHER_FAILURES>
********************************************************************************
```

Test Plan:
(see summary for before and after)

`test.py` contents:
```
import argparse
import os
import sys

import torch
import torch.distributed as dist
import torch.nn.functional as F

from torch.distributed.elastic.multiprocessing.errors import record

def parse_args(argv):
    parser = argparse.ArgumentParser(description="test script")
    parser.add_argument("--init_method", type=str, default="env://")
    parser.add_argument("--backend", type=str, default="gloo")
    parser.add_argument("--throw", action="store_true", default=False)
    parser.add_argument("--exit", action="store_true", default=False)
    return parser.parse_args()

@record
def main():
    args = parse_args(sys.argv[1:])

    if args.throw:
        raise RuntimeError("rasing error since --throw was specified")

    if args.exit:
        sys.exit(1)

    init_method=args.init_method
    backend=args.backend

    world_size = int(os.environ["WORLD_SIZE"])
    rank = int(os.environ["RANK"])

    print(f"initializing `{backend}` process group with rank={rank}, world_size={world_size} at {init_method}")

    dist.init_process_group(
        backend=backend,
        init_method=init_method,
        world_size=world_size,
        rank=rank)

    print(f"successfully initialized process group with rank={dist.get_rank()}, world_size={dist.get_world_size()}")

    t = F.one_hot(torch.tensor(rank), num_classes=world_size)
    dist.all_reduce(t)
    derived_world_size = torch.sum(t).item()
    if derived_world_size != world_size:
        raise RuntimeError(f"derived world size: {derived_world_size} != actual world size: {world_size}")
    else:
        print(f"sucessfully derived world size: {derived_world_size} (expected: {world_size}). Exiting")

if __name__ == "__main__":
    main()
```

run it as:

```
$ python -m torch.distributed.run --nproc_per_node 2 test.py --throw
```

Reviewed By: cbalioglu

Differential Revision: D30953731

fbshipit-source-id: bbea04c59c2aec58969cf44d8e3723d5f8abe8a8
2021-09-15 12:50:21 -07:00
4bf7959de2 Remove run_functional_checks from test_autograd and create necessary OpInfos (#64993)
Summary:
OpInfo tracker: https://github.com/pytorch/pytorch/issues/54261

 - Eliminate duplicated testing logic in test_autograd
 - Moved tests that rely on this testing logic to use OpInfos
   - `cat` already has OpInfo (no action needed)
   - Created OpInfo for `block_diag` and `broadcast_tensors`

Running into some FX errors. Added op to skip-list and created an issue here: https://github.com/pytorch/pytorch/issues/64997
Both `block_diag` and `broadcast_tensors` are variadic, so skipping `test_variant_consistency_jit` (from comments on other OpInfos, it looks like JIT does not support variadic tensors)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64993

Reviewed By: jbschlosser

Differential Revision: D30961736

Pulled By: soulitzer

fbshipit-source-id: e169305384a683acae1178c4e12e9e214a67226a
2021-09-15 12:45:38 -07:00
21017ad1a1 Dispatch.h: Avoid including ivalue (#64165)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64165

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D30728587

Pulled By: ezyang

fbshipit-source-id: d0d2e97491d9d5e2d2fc2d6e51420a4467c1bba4
2021-09-15 12:16:44 -07:00
211ad231dc To add state_dict and load_state_dict to SequentialLR (#65035)
Summary:
To add state_dict() and load_state_dict() methods to SequentialLR
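
A short sketch mirroring the ChainedScheduler change above (stock scheduler classes; values are illustrative):

```python
import torch

model = torch.nn.Linear(2, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
sched = torch.optim.lr_scheduler.SequentialLR(
    opt,
    schedulers=[
        torch.optim.lr_scheduler.ConstantLR(opt, factor=0.1, total_iters=2),
        torch.optim.lr_scheduler.ExponentialLR(opt, gamma=0.9),
    ],
    milestones=[2],  # switch from the first scheduler to the second after epoch 2
)

state = sched.state_dict()    # new: serializable scheduler state
sched.load_state_dict(state)  # new: restore from a checkpoint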

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65035

Reviewed By: prabhat00155, nateanl

Differential Revision: D30958204

Pulled By: datumbox

fbshipit-source-id: 65114e1b07146526ae2680233f5cd42b2534d67a
2021-09-15 12:01:51 -07:00
8a652e0e91 [CircleCI] Disable pytorch_linux_xenial_cuda10_2 test jobs (#65071)
Summary:
As all of them have been migrated to GHA:
- pytorch_linux_pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_distributed_test -> "linux-xenial-cuda11.3-py3.6-gcc7 / test (distributed, 1, 1, linux.8xlarge.nvidia.gpu)"
- pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_test1 -> "linux-xenial-cuda10.2-py3.6-gcc7 / test (default, 1, 2, linux.8xlarge.nvidia.gpu)"
- pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_test2 -> "linux-xenial-cuda10.2-py3.6-gcc7 / test (default, 2, 2, linux.8xlarge.nvidia.gpu)"
- pytorch_linux_xenial_cuda10_2_cudnn7_py3_multigpu_test -> "linux-xenial-cuda10.2-py3.6-gcc7 / test (multigpu, 1, 1, linux.16xlarge.nvidia.gpu)"
- pytorch_linux_xenial_cuda10_2_cudnn7_py3_nogpu_NO_AVX2_test -> "linux-xenial-cuda10.2-py3.6-gcc7 / test (nogpu_NO_AVX2, 1, 1, linux.2xlarge)"
- pytorch_linux_xenial_cuda10_2_cudnn7_py3_nogpu_NO_AVX_test -> "linux-xenial-cuda10.2-py3.6-gcc7 / test (nogpu_NO_AVX, 1, 1, linux.2xlarge)"
- pytorch_linux_xenial_cuda10_2_cudnn7_py3_slow_test -> "linux-xenial-cuda10.2-py3.6-gcc7 / test (slow, 1, 1, linux.8xlarge.nvidia.gpu)"

"pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_build" is still a holdout due to slow gradchecks

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65071

Reviewed By: driazati, seemethere, janeyx99

Differential Revision: D30963413

Pulled By: malfet

fbshipit-source-id: d9a5188ce7eb2f60547b91b854a5db83af2b10e7
2021-09-15 11:59:40 -07:00
f1ce64a58e Starter Task 1 (#64927)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64927

Mypy error corrections

Test Plan: Corrected mypy errors to make the code less prone to bugs by modifying types or adding lines that avoid special undesired cases, e.g. asserting that a variable is not None.

Reviewed By: wushirong

Differential Revision: D30901654

fbshipit-source-id: daae8692603b8b38203a98f673c455749c2fb855
2021-09-15 11:55:07 -07:00
dab6496dbe [ROCm] Update CI images for ROCm 4.3.1 (#64610)
Summary:
Signed-off-by: Kyle Chen <kylechen@amd.com>

reference:
https://github.com/pytorch/pytorch/issues/58017

jithunnair-amd
jeffdaily
arindamroy-eng

cc jeffdaily sunway513 jithunnair-amd ROCmSupport

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64610

Reviewed By: seemethere

Differential Revision: D30964582

Pulled By: malfet

fbshipit-source-id: a8335d3d32d7f1557d3cf6cb055ad0f9c49ef7aa
2021-09-15 11:49:54 -07:00
54d060a8c9 Port all and any full reductions to structured kernels. (#64642)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64642

Tracking issue: #55070

This PR creates out overloads for both `all` and `any` kernels (full reduction overload),
and ports them to structured kernels.
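
From the Python side, the new overloads make calls like the following work (a small sketch):

```python
import torch

x = torch.tensor([True, False, True])
out = torch.empty((), dtype=torch.bool)  # 0-dim result of a full reduction

torch.all(x, out=out)  # out= variant of the full reduction
print(out)             # tensor(False)
torch.any(x, out=out)
print(out)             # tensor(True)
```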

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D30867354

Pulled By: ezyang

fbshipit-source-id: 46bccaf6c94a09ed77cc6c724d1183c82f801751
2021-09-15 11:06:47 -07:00
54cdf651fd [PyTorch] remove string_view::operator[] bounds check (#64670)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64670

Bounds checking is not required for `std::string_view`, and the checking hoses performance for the following performance prototype diff.
ghstack-source-id: 138037531

Test Plan: CI

Reviewed By: ezyang, bhosmer

Differential Revision: D30747515

fbshipit-source-id: 1f4374415a82dfdccce76ea2c6885c13cb93d369
2021-09-15 09:57:58 -07:00
57420a6063 [PyTorch][easy] Add cbegin/cend to SmallVector (#64682)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64682

Looks like it was forked from llvm before cbegin and cend existed.
ghstack-source-id: 138036981

Test Plan: CI

Reviewed By: dhruvbird

Differential Revision: D30814434

fbshipit-source-id: 9740fa8d3df1c90b77298a95ab9f1d0cf8c90320
2021-09-15 09:57:56 -07:00
bdbc622988 [PyTorch] Avoid extra std::vector in parseSchemaOrName (#64678)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64678

We know we only want one declaration, so let's not create an excess std::vector (and thus a heap allocation) for that.
ghstack-source-id: 138036978

Test Plan: CI

Reviewed By: dhruvbird, tugsbayasgalan

Differential Revision: D30813785

fbshipit-source-id: c67e0100cdef5d894282939fb6d39a57309bc240
2021-09-15 09:56:41 -07:00
0f1bccb692 [quant] Removing unnecessary import from torch/quantization/quantize.py (#64910)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64910

This bled through from the original location. Removing it is not just refactoring, but also prevents potential recursive imports.
ghstack-source-id: 138112663

Test Plan: `buck test mode/dev //caffe2/test:quantization`

Reviewed By: vkuzo

Differential Revision: D30882924

fbshipit-source-id: 8652a334a5186c635761ea5e50f978d1f1078c12
2021-09-15 09:39:04 -07:00
3fb33b38b9 [Static Runtime] Check if outputs of a node do not overlap with each other (#63013)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63013

This change enhances the current memory overlap check to include outputs: it enforces a constraint that the outputs of a node must NOT overlap with each other, since the node is supposed to update all of them at the same time.

This check will detect a problem like T97393697 immediately in debug mode.
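
A conceptual Python sketch of the pairwise constraint (the actual check is C++ inside Static Runtime and is more careful about storages and strides; the byte-range test below is exact only for contiguous tensors):

```python
def byte_range(t):
    # Conservative byte extent on the underlying storage.
    start = t.data_ptr()
    return start, start + t.numel() * t.element_size()

def outputs_overlap(outputs):
    # Enforce: no two outputs of one node may share memory.
    for i in range(len(outputs)):
        for j in range(i + 1, len(outputs)):
            (a0, a1), (b0, b1) = byte_range(outputs[i]), byte_range(outputs[j])
            if a0 < b1 and b0 < a1:
                return True
    return False
```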

Test Plan:
- Added a unittest `ProcessedNode.VerifyMemoryOverlapWithOverlappingOutputs`

- Ran `inline_cvr` on ./buck-out/opt/gen/caffe2/caffe2/fb/predictor/ptvsc2_predictor_bench with this diff and confirmed that the checking condition holds true during the run.

Reviewed By: hlu1

Differential Revision: D30211705

fbshipit-source-id: 994d8dace2422e2498e504eb61452a55739238c0
2021-09-15 08:38:05 -07:00
26e43fe9f3 Forward fix SkipInfo missing mypy (#65063)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65063

Reviewed By: malfet

Differential Revision: D30961556

Pulled By: janeyx99

fbshipit-source-id: 9618e12ba873fb48fe5c846a48d4560ad521eb3e
2021-09-15 08:30:38 -07:00
fb8bdb8039 When test set_affinity, don't hardcode the CPU ID (#65042)
Summary:
The set_affinity test always fails when the number of CPUs is smaller than 3. Changed the test to pick CPU IDs dynamically based on the number of CPUs on the system.
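
The shape of that fix, sketched with the standard (Linux-only) `os` affinity calls; the actual test code may differ:

```python
import os

# Instead of hardcoding CPU IDs, derive them from what actually exists.
available = sorted(os.sched_getaffinity(0))  # CPUs this process may run on
target = {available[-1]}                     # e.g. pin to the last available CPU
os.sched_setaffinity(0, target)
assert os.sched_getaffinity(0) == target
```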

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65042

Reviewed By: jbschlosser

Differential Revision: D30960554

Pulled By: ejguan

fbshipit-source-id: 55ac12714b4b0964b48c3617b79a7a345d40ebce
2021-09-15 08:10:59 -07:00
c625f971d3 [DataPipe] Make TarArchiveReader and ZipArchiveReader accepts FileSream with attempt to close and additional warning (#64788)
Summary:
ghstack is not working for the second commit so I'm manually creating this PR for now. Please only look at changes related to the second commit in this PR (there is a PR for the first commit).

This PR removes TarArchiveReader's dependency on the FileLoader DataPipe by allowing it to use an IterDataPipe of path names as input rather than a tuple of path name and a stream.

It also adds tests to ensure that the DataPipe functions properly when it is read multiple times or reset halfway through reading.

The whole stack fixes https://github.com/pytorch/pytorch/issues/64281 - issues related to unclosed buffer stream.

Stack:
* __->__ https://github.com/pytorch/pytorch/issues/64788
* https://github.com/pytorch/pytorch/issues/64786

cc VitalyFedyunin ejguan

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64788

Reviewed By: jbschlosser, ejguan

Differential Revision: D30901176

Pulled By: NivekT

fbshipit-source-id: 59746a8d0144fc6d3ce0feb2d76445b82e6d414e
2021-09-15 07:34:29 -07:00
32c5da8cd2 add OpInfo for torch.nn.functional.dropout (#62315)
Summary:
Addresses facebookresearch/functorch#78.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62315

Reviewed By: mruberry

Differential Revision: D30932765

Pulled By: zou3519

fbshipit-source-id: 481c67b59a966b4d640973d252b3e392d8db728e
2021-09-15 07:18:04 -07:00
d6d286f651 [dnnlowp] reduce num of test cases to avoid time out (#64935)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64935

As title

Test Plan: CI

Reviewed By: dskhudia

Differential Revision: D30889157

fbshipit-source-id: 316c808806b084bd2e44c56e1cdb61adf2369a9d
2021-09-14 21:32:12 -07:00
b7ec7d760d Generic test parametrization functionality (#60753)
Summary:
This PR plays around with implementation & usage of a `parametrize` decorator for test parametrization similar to `pytest.mark.parametrize`, based on previous work introducing a `_TestParametrizer` class. It works with the internal `DeviceTest` hierarchy & composes with `dtype`, `skip*`, and other decorators. Basic usage is demonstrated in `test/test_blah.py`:

```python
import unittest
from itertools import product
from torch.testing._internal.common_device_type import (
    instantiate_device_type_tests, deviceCountAtLeast, ops)
from torch.testing._internal.common_methods_invocations import op_db
from torch.testing._internal.common_utils import (
    TestCase, run_tests, parametrize, instantiate_parametrized_tests, subtest)

class TestBlah(TestCase):
    parametrize("x", range(5))
    def test_default_names(self, x):
        print('Passed in:', x)

    # Use default names but add an expected failure.
    parametrize("x", [subtest(0, decorators=[unittest.expectedFailure]),
                       *range(1, 5)])
    def test_default_names_expected_failure(self, x):
        if x == 0:
            raise RuntimeError('Boom')
        print('Passed in:', x)

    parametrize("bias", [False, True], name_fn=lambda b: 'bias' if b else 'no_bias')
    def test_custom_names(self, bias):
        print('Passed in:', bias)

    parametrize("bias", [subtest(True, name='bias'),
                          subtest(False, name='no_bias')])
    def test_custom_names_alternate(self, bias):
        print('Passed in:', bias)

    parametrize("x,y", [(1, 2), (1, 3), (1, 4)])
    def test_two_things_default_names(self, x, y):
        print('Passed in:', x, y)

    parametrize("x", [1, 2, 3])
    parametrize("y", [4, 5, 6])
    def test_two_things_composition(self, x, y):
        print('Passed in:', x, y)

    parametrize("x", [subtest(0, decorators=[unittest.expectedFailure]),
                       *range(1, 3)])
    parametrize("y", [4, 5, subtest(6, decorators=[unittest.expectedFailure])])
    def test_two_things_composition_expected_failure(self, x, y):
        if x == 0 or y == 6:
            raise RuntimeError('Boom')
        print('Passed in:', x, y)

    parametrize("x", [1, 2])
    parametrize("y", [3, 4])
    parametrize("z", [5, 6])
    def test_three_things_composition(self, x, y, z):
        print('Passed in:', x, y, z)

    parametrize("x", [1, 2], name_fn=str)
    parametrize("y", [3, 4], name_fn=str)
    parametrize("z", [5, 6], name_fn=str)
    def test_three_things_composition_custom_names(self, x, y, z):
        print('Passed in:', x, y, z)

    parametrize("x,y", product(range(2), range(3)))
    def test_two_things_product(self, x, y):
        print('Passed in:', x, y)

    parametrize("x,y", [subtest((1, 2), name='double'),
                         subtest((1, 3), name='triple'),
                         subtest((1, 4), name='quadruple')])
    def test_two_things_custom_names(self, x, y):
        print('Passed in:', x, y)

    parametrize("x,y", [(1, 2), (1, 3), (1, 4)], name_fn=lambda x, y: '{}_{}'.format(x, y))
    def test_two_things_custom_names_alternate(self, x, y):
        print('Passed in:', x, y)

class TestDeviceBlah(TestCase):
    parametrize("x", range(10))
    def test_default_names(self, device, x):
        print('Passed in:', device, x)

    parametrize("x,y", [(1, 2), (3, 4), (5, 6)])
    def test_two_things(self, device, x, y):
        print('Passed in:', device, x, y)

    @deviceCountAtLeast(1)
    def test_multiple_devices(self, devices):
        print('Passed in:', devices)

    @ops(op_db)
    @parametrize("flag", [False, True], lambda f: 'flag_enabled' if f else 'flag_disabled')
    def test_op_parametrized(self, device, dtype, op, flag):
        print('Passed in:', device, dtype, op, flag)

instantiate_parametrized_tests(TestBlah)
instantiate_device_type_tests(TestDeviceBlah, globals())

if __name__ == '__main__':
    run_tests()
```

Generated tests:
```
TestBlah.test_custom_names_alternate_bias
TestBlah.test_custom_names_alternate_no_bias
TestBlah.test_custom_names_bias
TestBlah.test_custom_names_no_bias
TestBlah.test_default_names_expected_failure_x_0
TestBlah.test_default_names_expected_failure_x_1
TestBlah.test_default_names_expected_failure_x_2
TestBlah.test_default_names_expected_failure_x_3
TestBlah.test_default_names_expected_failure_x_4
TestBlah.test_default_names_x_0
TestBlah.test_default_names_x_1
TestBlah.test_default_names_x_2
TestBlah.test_default_names_x_3
TestBlah.test_default_names_x_4
TestBlah.test_three_things_composition_custom_names_1_3_5
TestBlah.test_three_things_composition_custom_names_1_3_6
TestBlah.test_three_things_composition_custom_names_1_4_5
TestBlah.test_three_things_composition_custom_names_1_4_6
TestBlah.test_three_things_composition_custom_names_2_3_5
TestBlah.test_three_things_composition_custom_names_2_3_6
TestBlah.test_three_things_composition_custom_names_2_4_5
TestBlah.test_three_things_composition_custom_names_2_4_6
TestBlah.test_three_things_composition_x_1_y_3_z_5
TestBlah.test_three_things_composition_x_1_y_3_z_6
TestBlah.test_three_things_composition_x_1_y_4_z_5
TestBlah.test_three_things_composition_x_1_y_4_z_6
TestBlah.test_three_things_composition_x_2_y_3_z_5
TestBlah.test_three_things_composition_x_2_y_3_z_6
TestBlah.test_three_things_composition_x_2_y_4_z_5
TestBlah.test_three_things_composition_x_2_y_4_z_6
TestBlah.test_two_things_composition_expected_failure_x_0_y_4
TestBlah.test_two_things_composition_expected_failure_x_0_y_5
TestBlah.test_two_things_composition_expected_failure_x_0_y_6
TestBlah.test_two_things_composition_expected_failure_x_1_y_4
TestBlah.test_two_things_composition_expected_failure_x_1_y_5
TestBlah.test_two_things_composition_expected_failure_x_1_y_6
TestBlah.test_two_things_composition_expected_failure_x_2_y_4
TestBlah.test_two_things_composition_expected_failure_x_2_y_5
TestBlah.test_two_things_composition_expected_failure_x_2_y_6
TestBlah.test_two_things_composition_x_1_y_4
TestBlah.test_two_things_composition_x_1_y_5
TestBlah.test_two_things_composition_x_1_y_6
TestBlah.test_two_things_composition_x_2_y_4
TestBlah.test_two_things_composition_x_2_y_5
TestBlah.test_two_things_composition_x_2_y_6
TestBlah.test_two_things_composition_x_3_y_4
TestBlah.test_two_things_composition_x_3_y_5
TestBlah.test_two_things_composition_x_3_y_6
TestBlah.test_two_things_custom_names_alternate_1_2
TestBlah.test_two_things_custom_names_alternate_1_3
TestBlah.test_two_things_custom_names_alternate_1_4
TestBlah.test_two_things_custom_names_double
TestBlah.test_two_things_custom_names_quadruple
TestBlah.test_two_things_custom_names_triple
TestBlah.test_two_things_default_names_x_1_y_2
TestBlah.test_two_things_default_names_x_1_y_3
TestBlah.test_two_things_default_names_x_1_y_4
TestBlah.test_two_things_product_x_0_y_0
TestBlah.test_two_things_product_x_0_y_1
TestBlah.test_two_things_product_x_0_y_2
TestBlah.test_two_things_product_x_1_y_0
TestBlah.test_two_things_product_x_1_y_1
TestBlah.test_two_things_product_x_1_y_2
TestDeviceBlahCPU.test_default_names_x_0_cpu
TestDeviceBlahCPU.test_default_names_x_1_cpu
TestDeviceBlahCPU.test_default_names_x_2_cpu
TestDeviceBlahCPU.test_default_names_x_3_cpu
TestDeviceBlahCPU.test_default_names_x_4_cpu
TestDeviceBlahCPU.test_default_names_x_5_cpu
TestDeviceBlahCPU.test_default_names_x_6_cpu
TestDeviceBlahCPU.test_default_names_x_7_cpu
TestDeviceBlahCPU.test_default_names_x_8_cpu
TestDeviceBlahCPU.test_default_names_x_9_cpu
TestDeviceBlahCPU.test_multiple_devices_cpu
TestDeviceBlahCPU.test_op_parametrized_<opname>_<variant>_cpu_uint8_flag_enabled_cpu
TestDeviceBlahCPU.test_two_things_x_1_y_2_cpu
TestDeviceBlahCPU.test_two_things_x_3_y_4_cpu
TestDeviceBlahCPU.test_two_things_x_5_y_6_cpu
TestDeviceBlahMETA.test_default_names_x_0_meta
TestDeviceBlahMETA.test_default_names_x_1_meta
TestDeviceBlahMETA.test_default_names_x_2_meta
TestDeviceBlahMETA.test_default_names_x_3_meta
TestDeviceBlahMETA.test_default_names_x_4_meta
TestDeviceBlahMETA.test_default_names_x_5_meta
TestDeviceBlahMETA.test_default_names_x_6_meta
TestDeviceBlahMETA.test_default_names_x_7_meta
TestDeviceBlahMETA.test_default_names_x_8_meta
TestDeviceBlahMETA.test_default_names_x_9_meta
TestDeviceBlahMETA.test_multiple_devices_meta
TestDeviceBlahMETA.test_op_parametrized_<opname>_<variant>_meta_uint8_flag_enabled_meta
TestDeviceBlahMETA.test_two_things_x_1_y_2_meta
TestDeviceBlahMETA.test_two_things_x_3_y_4_meta
TestDeviceBlahMETA.test_two_things_x_5_y_6_meta
```

Caveats:
* `parametrize` decorators cannot be "stacked" yet; each one overwrites the previous. This will change to either:
  * Allow stacking of multiple decorators
  * Error out with a nice error message if multiple decorators are specified

The PR introduces `instantiate_parametrized_tests()` in addition to `instantiate_device_type_tests()`. The former should be used for non-device-specific tests, and the latter should be used for device-specific tests, as usual. Both of these support the `parametrize` decorator. Only the latter supports the `ops` decorator (no change here- this was already the case).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60753

Reviewed By: saketh-are

Differential Revision: D30606615

Pulled By: jbschlosser

fbshipit-source-id: a34f36d643f68a6e221f419d9bb3e1ae1d84dd65
2021-09-14 19:52:59 -07:00
6ab97fbc28 [vulkan] Use volk to load vulkan libraries and fix Windows build errors (#64988)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64988

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64968

The current wrapper (provided by [Vulkan-Tools](https://github.com/KhronosGroup/Vulkan-Tools/tree/master/common)) can't handle dynamically loading Vulkan on Windows/Mac. Therefore, we can bring in [volk](https://github.com/zeux/volk) to load the vulkan libraries for other platforms.

1. Use `volk` with `link_style="static"` on Windows only; use `vulkan_wrapper` for all other platforms (temporary solution)
2. Make DotSlash work on Windows when resolving glslc path

Test Plan:
For Android:

```
cd ~/fbsource
buck build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 //xplat/caffe2:pt_vulkan_api_test_binAndroid\#android-arm64 --show-output
adb push buck-out/gen/xplat/caffe2/pt_vulkan_api_test_binAndroid\#android-arm64 /data/local/tmp/vulkan_api_test
adb shell "/data/local/tmp/vulkan_api_test"
cd -
```

For Mac:
```
buck build //xplat/caffe2:pt_vulkan_api_test_binAppleMac
./buck-out/gen/xplat/caffe2/pt_vulkan_api_test_binAppleMac\#macosx-x86_64
```

On Local OSS repo with `pr/64988` branch:

The build and test are fine. Note that `VulkanAPITest.log_softmax()` has been broken for the past month. Ivan will take a look when he is available.

Build: `BUILD_TEST=1 USE_VULKAN=1 USE_VULKAN_SHADERC_RUNTIME=1 USE_VULKAN_WRAPPER=0 MACOSX_DEPLOYMENT_TARGET=10.9 CC=clang CXX=clang++ python setup.py install`

Test: `$PYTORCH_ROOT/build/bin/vulkan_api_test /data/local/tmp`

```
Running main() from ../third_party/googletest/googletest/src/gtest_main.cc
[==========] Running 69 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 69 tests from VulkanAPITest
[ RUN      ] VulkanAPITest.adaptive_avg_pool2d
[       OK ] VulkanAPITest.adaptive_avg_pool2d (228 ms)
[ RUN      ] VulkanAPITest.add
[       OK ] VulkanAPITest.add (51 ms)
[ RUN      ] VulkanAPITest.add_broadcast0
[       OK ] VulkanAPITest.add_broadcast0 (13 ms)
[ RUN      ] VulkanAPITest.add_broadcast1
[       OK ] VulkanAPITest.add_broadcast1 (9 ms)
[ RUN      ] VulkanAPITest.add_broadcast2
[       OK ] VulkanAPITest.add_broadcast2 (9 ms)
[ RUN      ] VulkanAPITest.add_
[       OK ] VulkanAPITest.add_ (60 ms)
[ RUN      ] VulkanAPITest.add_broadcast0_
[       OK ] VulkanAPITest.add_broadcast0_ (10 ms)
[ RUN      ] VulkanAPITest.add_broadcast1_
[       OK ] VulkanAPITest.add_broadcast1_ (1 ms)
[ RUN      ] VulkanAPITest.add_scalar
[       OK ] VulkanAPITest.add_scalar (24 ms)
[ RUN      ] VulkanAPITest.add_scalar_
[       OK ] VulkanAPITest.add_scalar_ (8 ms)
[ RUN      ] VulkanAPITest.addmm
[       OK ] VulkanAPITest.addmm (22 ms)
[ RUN      ] VulkanAPITest.addmm_expand
[       OK ] VulkanAPITest.addmm_expand (12 ms)
[ RUN      ] VulkanAPITest.avg_pool2d
[       OK ] VulkanAPITest.avg_pool2d (9 ms)
[ RUN      ] VulkanAPITest.clamp
[       OK ] VulkanAPITest.clamp (92 ms)
[ RUN      ] VulkanAPITest.clamp_
[       OK ] VulkanAPITest.clamp_ (60 ms)
[ RUN      ] VulkanAPITest.conv2d
[       OK ] VulkanAPITest.conv2d (15 ms)
[ RUN      ] VulkanAPITest.conv2d_dw
[       OK ] VulkanAPITest.conv2d_dw (15 ms)
[ RUN      ] VulkanAPITest.conv2d_pw
[       OK ] VulkanAPITest.conv2d_pw (34 ms)
[ RUN      ] VulkanAPITest.conv2d_winograd
[       OK ] VulkanAPITest.conv2d_winograd (10 ms)
[ RUN      ] VulkanAPITest.copy
[       OK ] VulkanAPITest.copy (1 ms)
[ RUN      ] VulkanAPITest.div
[       OK ] VulkanAPITest.div (32 ms)
[ RUN      ] VulkanAPITest.div_broadcast0
[       OK ] VulkanAPITest.div_broadcast0 (11 ms)
[ RUN      ] VulkanAPITest.div_broadcast1
[       OK ] VulkanAPITest.div_broadcast1 (9 ms)
[ RUN      ] VulkanAPITest.div_broadcast2
[       OK ] VulkanAPITest.div_broadcast2 (7 ms)
[ RUN      ] VulkanAPITest.div_
[       OK ] VulkanAPITest.div_ (46 ms)
[ RUN      ] VulkanAPITest.div_broadcast0_
[       OK ] VulkanAPITest.div_broadcast0_ (9 ms)
[ RUN      ] VulkanAPITest.div_broadcast1_
[       OK ] VulkanAPITest.div_broadcast1_ (2 ms)
[ RUN      ] VulkanAPITest.div_scalar
[       OK ] VulkanAPITest.div_scalar (95 ms)
[ RUN      ] VulkanAPITest.div_scalar_
[       OK ] VulkanAPITest.div_scalar_ (18 ms)
[ RUN      ] VulkanAPITest.empty
[       OK ] VulkanAPITest.empty (0 ms)
[ RUN      ] VulkanAPITest.hardsigmoid
[       OK ] VulkanAPITest.hardsigmoid (76 ms)
[ RUN      ] VulkanAPITest.hardsigmoid_
[       OK ] VulkanAPITest.hardsigmoid_ (80 ms)
[ RUN      ] VulkanAPITest.hardshrink
[       OK ] VulkanAPITest.hardshrink (630 ms)
[ RUN      ] VulkanAPITest.hardshrink_
[       OK ] VulkanAPITest.hardshrink_ (573 ms)
[ RUN      ] VulkanAPITest.leaky_relu
[       OK ] VulkanAPITest.leaky_relu (271 ms)
[ RUN      ] VulkanAPITest.leaky_relu_
[       OK ] VulkanAPITest.leaky_relu_ (254 ms)
[ RUN      ] VulkanAPITest.hardswish
[       OK ] VulkanAPITest.hardswish (83 ms)
[ RUN      ] VulkanAPITest.hardswish_
[       OK ] VulkanAPITest.hardswish_ (72 ms)
[ RUN      ] VulkanAPITest.max_pool2d
[       OK ] VulkanAPITest.max_pool2d (16 ms)
[ RUN      ] VulkanAPITest.mean
[       OK ] VulkanAPITest.mean (17 ms)
[ RUN      ] VulkanAPITest.mean2d
[       OK ] VulkanAPITest.mean2d (20 ms)
[ RUN      ] VulkanAPITest.mm
[       OK ] VulkanAPITest.mm (12 ms)
[ RUN      ] VulkanAPITest.mul
[       OK ] VulkanAPITest.mul (28 ms)
[ RUN      ] VulkanAPITest.mul_broadcast0
[       OK ] VulkanAPITest.mul_broadcast0 (9 ms)
[ RUN      ] VulkanAPITest.mul_broadcast1
[       OK ] VulkanAPITest.mul_broadcast1 (9 ms)
[ RUN      ] VulkanAPITest.mul_broadcast2
[       OK ] VulkanAPITest.mul_broadcast2 (9 ms)
[ RUN      ] VulkanAPITest.mul_
[       OK ] VulkanAPITest.mul_ (43 ms)
[ RUN      ] VulkanAPITest.mul_broadcast0_
[       OK ] VulkanAPITest.mul_broadcast0_ (8 ms)
[ RUN      ] VulkanAPITest.mul_broadcast1_
[       OK ] VulkanAPITest.mul_broadcast1_ (1 ms)
[ RUN      ] VulkanAPITest.mul_scalar
[       OK ] VulkanAPITest.mul_scalar (64 ms)
[ RUN      ] VulkanAPITest.mul_scalar_
[       OK ] VulkanAPITest.mul_scalar_ (17 ms)
[ RUN      ] VulkanAPITest.reflection_pad2d
[       OK ] VulkanAPITest.reflection_pad2d (7 ms)
[ RUN      ] VulkanAPITest.reshape
[       OK ] VulkanAPITest.reshape (73 ms)
[ RUN      ] VulkanAPITest.reshape_
[       OK ] VulkanAPITest.reshape_ (41 ms)
[ RUN      ] VulkanAPITest.sigmoid
[       OK ] VulkanAPITest.sigmoid (81 ms)
[ RUN      ] VulkanAPITest.sigmoid_
[       OK ] VulkanAPITest.sigmoid_ (68 ms)
[ RUN      ] VulkanAPITest.softmax
[       OK ] VulkanAPITest.softmax (28 ms)
[ RUN      ] VulkanAPITest.log_softmax
Max Diff allowed: 5.87862e-05
../aten/src/ATen/test/vulkan_api_test.cpp:1470: Failure
Value of: check
  Actual: false
Expected: true
[  FAILED  ] VulkanAPITest.log_softmax (19 ms)
[ RUN      ] VulkanAPITest.tanh
[       OK ] VulkanAPITest.tanh (63 ms)
[ RUN      ] VulkanAPITest.tanh_
[       OK ] VulkanAPITest.tanh_ (68 ms)
[ RUN      ] VulkanAPITest.sub
[       OK ] VulkanAPITest.sub (28 ms)
[ RUN      ] VulkanAPITest.sub_broadcast0
[       OK ] VulkanAPITest.sub_broadcast0 (9 ms)
[ RUN      ] VulkanAPITest.sub_broadcast1
[       OK ] VulkanAPITest.sub_broadcast1 (9 ms)
[ RUN      ] VulkanAPITest.sub_broadcast2
[       OK ] VulkanAPITest.sub_broadcast2 (8 ms)
[ RUN      ] VulkanAPITest.sub_
[       OK ] VulkanAPITest.sub_ (43 ms)
[ RUN      ] VulkanAPITest.sub_broadcast0_
[       OK ] VulkanAPITest.sub_broadcast0_ (10 ms)
[ RUN      ] VulkanAPITest.sub_broadcast1_
[       OK ] VulkanAPITest.sub_broadcast1_ (2 ms)
[ RUN      ] VulkanAPITest.upsample_nearest2d
[       OK ] VulkanAPITest.upsample_nearest2d (5 ms)
[ RUN      ] VulkanAPITest.mobilenetv2
[       OK ] VulkanAPITest.mobilenetv2 (82 ms)
[----------] 69 tests from VulkanAPITest (3885 ms total)

[----------] Global test environment tear-down
[==========] 69 tests from 1 test suite ran. (3885 ms total)
[  PASSED  ] 68 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] VulkanAPITest.log_softmax

 1 FAILED TEST
```

Differential Revision: D30925995

fbshipit-source-id: 1b1b7f7f22090064424a5379d2f0559d0da7846a
2021-09-14 19:35:05 -07:00
ff6b475d4a [fix] don't expose unique_dim in torch (#63080)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/62793

This is mostly a quick fix. I think the more correct fix could be renaming `unique_dim` to `_unique_dim`, which could be BC-breaking for C++ users (maybe). Maybe something else I am missing.

~~Not sure how to add a test for it.~~ Have tested it locally.

We can add a test like the following. Tested this locally; it fails currently but passes with the fix.
```python
def test_wildcard_import(self):
    exec('from torch import *')
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63080

Reviewed By: gchanan

Differential Revision: D30738711

Pulled By: zou3519

fbshipit-source-id: b86d0190e45ba0b49fd2cffdcfd2e3a75cc2a35e
2021-09-14 18:19:17 -07:00
36cac2be4d [CUDA graphs] moves memory sharing intro paragraph (#64996)
Summary:
Puts memory sharing intro under Sharing memory... header, where it should have been all along.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64996

Reviewed By: mruberry

Differential Revision: D30948619

Pulled By: ngimel

fbshipit-source-id: 5d9dd267b34e9d3fc499d4738377b58a22da1dc2
2021-09-14 17:53:43 -07:00
36a0d97281 Revert D30558877: Ported std/var to ReductionOpInfo and minimum/maximum to BinaryUfuncInfo
Test Plan: revert-hammer

Differential Revision:
D30558877 (382e008fbf)

Original commit changeset: 3e62ff24a935

fbshipit-source-id: 3b9f03c1f43c6d5f2738ed139d0236f2ded78dbf
2021-09-14 17:33:38 -07:00
3d312b3b8e [Model Averaging] Simplify PostLocalSGD Optimizer API (#64885)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64885

1) The constructor accepts a local optimizer instance instead of the inputs of the local optimizer's constructor and the class type.
2) The parameters are read from the local optimizer's `param_groups` instead of a separate input (a usage sketch follows).
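A hedged usage sketch of the simplified API. The class and averager names below follow the proposal linked underneath; treat the exact signatures as assumptions rather than the definitive API:

```python
import torch
from torch.distributed.optim import PostLocalSGDOptimizer
import torch.distributed.algorithms.model_averaging.averagers as averagers

# In real use `model` would be a DDP-wrapped module inside an initialized
# process group; a bare module is used here only to show the constructor shape.
model = torch.nn.Linear(4, 4)
local_opt = torch.optim.SGD(model.parameters(), lr=0.1)  # any constructed optimizer

opt = PostLocalSGDOptimizer(
    optim=local_opt,  # pass the instance, not (class, constructor inputs)
    averager=averagers.PeriodicModelAverager(period=4, warmup_steps=100),
)
# Parameters are read from local_opt.param_groups; no separate params input.
```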

Proposal: https://github.com/pytorch/pytorch/issues/59699
ghstack-source-id: 137865867

Test Plan: buck test mode/dev-nosan //caffe2/test/distributed:distributed_nccl_spawn -- test_post_localSGD_optimizer_parity

Reviewed By: rohan-varma

Differential Revision: D30888794

fbshipit-source-id: 21261b480f6bbb9b2333426020e3f350da3f73c2
2021-09-14 16:37:14 -07:00
382e008fbf Ported std/var to ReductionOpInfo and minimum/maximum to BinaryUfuncInfo (#63978)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63978

Test Plan: Imported from OSS

Reviewed By: saketh-are

Differential Revision: D30558877

Pulled By: heitorschueroff

fbshipit-source-id: 3e62ff24a935784fc93a76a0f46a1deb060ba680
2021-09-14 16:18:09 -07:00
c65128679b [DataPipe] Improve Mapper to accept input/output index when apply fn (#64951)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64951

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D30910035

Pulled By: ejguan

fbshipit-source-id: d687fe10939920a3617a60552fe743e8526438a0
2021-09-14 15:46:42 -07:00
670853295a [quant][tensorrt] Add tensorrt backend config (#64623)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64623

The config api will change, but we'll add configs gradually for TensorRT to unblock experimentation

Test Plan:
python torch/fx/experimental/fx2trt/example/unittests.py

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D30800474

fbshipit-source-id: 3c4640de1205a0f19b62943ab84f386d80394ec2
2021-09-14 15:27:33 -07:00
85222c050f [PyTorch] Add c10::hash<c10::ArrayRef<T>> (#64277)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64277

Just moved the vector implementation to ArrayRef and re-implemented the former using the latter.
ghstack-source-id: 137978947

Test Plan: existing CI

Reviewed By: dhruvbird

Differential Revision: D30647666

fbshipit-source-id: c0f4f06c348d36882ec0db802be44d8c7749562f
2021-09-14 14:22:12 -07:00
5d4efed83e [PyTorch] Add OpCode cache in ByteCodeDeserializer (#64110)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64110

As the code comment says, we can exploit pickler string interning to accelerate OpCode parsing. No more strcmp!
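A minimal Python sketch of the interning trick (the real implementation is in the C++ deserializer; the opcode table below is hypothetical):

```python
# The unpickler interns repeated strings, so identical opcode names arrive
# as the *same* object. Caching on object identity then skips string
# comparison entirely after the first lookup. (Safe here only because the
# interned strings stay alive for the duration of the parse.)
OPCODES = {"OP": 0, "OPN": 1}       # hypothetical opcode -> handler table

_cache = {}

def parse_opcode(name):
    key = id(name)                   # identity check instead of strcmp
    if key not in _cache:
        _cache[key] = OPCODES[name]  # slow path runs once per interned string
    return _cache[key]

op = "OP"                            # stands in for an interned string
assert parse_opcode(op) == 0 and parse_opcode(op) == 0
```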
ghstack-source-id: 137978946

Test Plan:
Pixel 3 before: https://www.internalfb.com/intern/aibench/details/591414145082422
Pixel 3 after: https://www.internalfb.com/intern/aibench/details/484557404703261

new mean is 292 ms, down from 302 ms.

Reviewed By: dhruvbird

Differential Revision: D30615052

fbshipit-source-id: 9707625e778388a7920ab72704d71ad57ddaac17
2021-09-14 14:22:10 -07:00
a9121df09c [PyTorch] Remove implicit conversion from Tuple to vector reference (#63993)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63993

This seems to be unused, and it's pretty scary.
ghstack-source-id: 137978949

Test Plan: CI

Reviewed By: lw

Differential Revision: D30560441

fbshipit-source-id: 08b7ce971fd1e2dbeddbf37b02413fef513b4753
2021-09-14 14:22:08 -07:00
452402b984 [PyTorch] Fix SourceRangeDeserializer vector copy (#64031)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64031

More copies of tuple elements.
ghstack-source-id: 137978948

Test Plan:
Pixel 3 before: https://our.intern.facebook.com/intern/aibench/details/724509739115867
Pixel 3 after: https://our.intern.facebook.com/intern/aibench/details/232361457767293

Top-line number doesn't seem to have moved, but we can see that the vector copy disappeared in the flame graph.

Reviewed By: raziel

Differential Revision: D30559545

fbshipit-source-id: e5343abae96b8e80e0ccec482ad316884ae231ea
2021-09-14 14:20:45 -07:00
57eda69219 [fx2trt] fix elementwise op converter with one operand being a literal and has different type (#65004)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65004

If we have some code like `torch.add(x, 1)` and `x` is a float tensor, then things fall apart during conversion, because currently we add a constant layer of int32 dtype for `1` when we actually need float dtype.

This diff adds an arg to `get_trt_tensor` which specifies the dtype of the constant layer we create.

Also, starts adding docstrings for functions.
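A small eager-mode illustration of the dtype mismatch; the `get_trt_tensor` call in the comment is paraphrased from the description above, not an exact signature:

```python
import torch

x = torch.randn(2, 3)   # float32 tensor
y = torch.add(x, 1)     # eager mode promotes the int literal
print(y.dtype)          # torch.float32

# A converter that materialized `1` as an int32 constant layer would clash
# with x's float32 dtype at the elementwise layer. The fix passes the
# intended dtype when creating the constant, roughly:
#   get_trt_tensor(network, 1, name, dtype=torch.float32)   # paraphrased
```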

Reviewed By: yinghai

Differential Revision: D30852156

fbshipit-source-id: 650ce72d2794093a4616e640ea503dcc1c6b2bc4
2021-09-14 12:27:37 -07:00
3727baea6f [PyTorch Edge][Model Loading] Operator Call De-dup at TorchScript Serialization Level [2/2] (#64269)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64269

Revert changes in D29826210 (693d8f2f07) (we don't need operator lambda caching since there aren't duplicate operators anymore)

This diff stack results in an additional approx 12% speedup in model loading time (from 229ms to 200ms) when run against an 87MB speech model that jiatongzhou provided.
ghstack-source-id: 138014904

Test Plan:
**Speech Transducer v25 model (as in D29826210 (693d8f2f07))**

|| Before | After |
|Load Time|[229ms](https://www.internalfb.com/intern/aibench/details/160889436133243)|[200ms](https://www.internalfb.com/intern/aibench/details/837884532607514)|
|Save File Size|[86.23 MB](https://lookaside.facebook.com/intern/diff/file/data/?number=658544950)|[86.1 MB](https://lookaside.facebook.com/intern/diff/file/data/?number=658554403)|

The "after" flamegraph shows significantly less time is spent on ```append_operator``` than before.

Steps
- Check out desired commit in devserver (base branch or this diff)
- ```buck build bento/kernels:bento_kernel_pytorch```
- Use N1094068 with pytorch_local kernel to save model for lite interpreter
- Edit ```aibench/specifications/models/pytorch/speech_transducer/v25.json ``` to have new model location and md5
- ```buck run aibench:run_bench -- -b aibench/specifications/models/pytorch/speech_transducer/v25.json --framework pytorch --platform android/arm64 --devices "S8US" --force_profile --remote ```

**Test that saving a model with de-dup ops doesn't change its output**
https://www.internalfb.com/intern/anp/view/?id=1137434

Reviewed By: iseeyuan

Differential Revision: D30615710

fbshipit-source-id: bb4052f0f16eccab386585e94411056f94bce43c
2021-09-14 12:12:46 -07:00
86e6bed0d4 [PyTorch Edge][Model Loading] Operator Call De-dup at TorchScript Serialization Level [1/2] (#64268)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64268

If the same pair of operator name and num inputs has been used to add an instruction to the operator table previously (and the operator's schema is not vararg), use the same index as that instruction rather than creating a new one.
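A minimal Python sketch of the de-dup rule, assuming a simple list-backed operator table (the real change lives in the TorchScript serializer):

```python
op_table = []    # serialized operator table
op_index = {}    # (operator name, num inputs) -> existing table index

def add_operator(name, num_inputs, is_vararg=False):
    key = (name, num_inputs)
    if not is_vararg and key in op_index:
        return op_index[key]   # reuse the existing instruction's index
    idx = len(op_table)
    op_table.append(key)
    if not is_vararg:
        op_index[key] = idx    # vararg schemas are never de-duped
    return idx

assert add_operator("aten::add.Tensor", 2) == add_operator("aten::add.Tensor", 2)
```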
ghstack-source-id: 138014905

Test Plan: Phabricator tests, and test performance changes in next diff

Reviewed By: iseeyuan, tugsbayasgalan

Differential Revision: D30615434

fbshipit-source-id: f442f557f12412693a73004ce44733ccef063b82
2021-09-14 12:11:32 -07:00
97df69eac6 .github: Add render test results step (#64937)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64937

Adds CLI output for rendered test results to go alongside test execution; users should be able to quickly diagnose test failures like so:
![fdsfdsfdsfdsf](https://user-images.githubusercontent.com/1700823/133156245-ba939cbf-8aa2-47a7-b1fb-7cc876ca75c4.png)

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

cc ezyang seemethere malfet lg20987 pytorch/pytorch-dev-infra

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D30917897

Pulled By: seemethere

fbshipit-source-id: f51ea499462e3cfd64496cb711b84a93971c91bd
2021-09-14 11:25:14 -07:00
d188204323 remove SkipInfo class (#64972)
Summary:
per title

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64972

Reviewed By: mruberry

Differential Revision: D30924598

Pulled By: ngimel

fbshipit-source-id: 1ac1ec8fd50ca27e3cd36c12a588d334e7466899
2021-09-14 11:23:54 -07:00
eedc234e33 [PyTorch] Don't store multiple kernels per key on mobile (#64447)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64447

As the code comment says, we needn't worry about Jupyter notebooks on mobile.
ghstack-source-id: 137951718

Test Plan: Profiled startup of //caffe2/caffe2/fb/high_perf_models/pytorch/benchmark_framework_overheads:cpp_benchmark on devserver with -niter 0 -nrep 0 and `C10_DISPATCHER_ONE_KERNEL_PER_DISPATCH_KEY` defined. Time spent in sherwood_v3_table lookups went way down.

Reviewed By: ezyang, bhosmer

Differential Revision: D30736094

fbshipit-source-id: bcc22cd0d9adceba259a03898c992759d501fe89
2021-09-14 10:36:43 -07:00
446d95a7f6 [fx const fold] fix some cases with deep model hierarchy (#64945)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64945

In the const folding pass, we try to create `get_attr` nodes in submod_1 for `get_attr` nodes that are in the main graph. But we don't have the real attributes in submod_1. To fix this, we assign the main module as the owning module of submod_1's graph.

The fix above would cause a problem for `call_module` nodes in submod_1, because during the split, modules get inlined into submod_1 (target changed from "mod.a.b" to "mod_a_b"). Changing the owning module would make those `call_module` nodes unable to find the referenced module. To fix this, we set the target module to the main module.
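A short torch.fx example of why `get_attr` resolution depends on the owning module (a standalone illustration, not the const-folding pass itself):

```python
import torch
import torch.fx as fx

class Sub(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.w = torch.nn.Parameter(torch.randn(3))
    def forward(self, x):
        return x + self.w

class Main(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.sub = Sub()
    def forward(self, x):
        return self.sub(x)

gm = fx.symbolic_trace(Main())
# get_attr targets are qualified paths ("sub.w") that only resolve against
# a module that really owns the attribute -- hence re-pointing the split
# graph's owning module to the main module in the fix above.
print([n.target for n in gm.graph.nodes if n.op == "get_attr"])  # ['sub.w']
```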

Reviewed By: jfix71

Differential Revision: D30905949

fbshipit-source-id: cd67bc8fe4b8ad4344ae97b8e36753fdce3ece6d
2021-09-14 09:45:44 -07:00
00e6e0c593 [Model Averaging] Revert #63895 (#64903)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64903

Fix the accuracy regression caused by https://github.com/pytorch/pytorch/pull/63895.

Test Plan:
buck test mode/dev-nosan //caffe2/test/distributed:distributed_nccl_spawn -- test_periodic_model_averager
buck test mode/dev-nosan //caffe2/test/distributed:distributed_nccl_spawn -- test_post_localSGD_optimizer_parity

Reviewed By: rohan-varma

Differential Revision: D30894688

fbshipit-source-id: fe00b8b23b860d9f806f87c1b6caba1d0b807485
2021-09-14 09:45:42 -07:00
882b67dff4 Drop incremental linking on Windows with REL_WITH_DEB_INFO=1. (#64892)
Summary:
The library will no longer link properly on VS 2019 (14.29.30133). To
ensure that engineers building on Windows can use and debug with this
build type, incremental linking needs to be turned off for this build
flag.

Verified that this build type successfully builds, links, and provides
debuggable Python modules on Windows.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64892

Reviewed By: jbschlosser

Differential Revision: D30902565

Pulled By: malfet

fbshipit-source-id: e5286a4c6f45c7cbe4cdc1b98560129bd386970b
2021-09-14 09:44:18 -07:00
01cfea9485 Disable target determination for now (#64921)
Summary:
There were several reports of target determinator incorrectly skipping
tests, most recent one is https://github.com/pytorch/pytorch/issues/64902

Let's disable it until it could be further stabilized

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64921

Reviewed By: seemethere, janeyx99

Differential Revision: D30901186

Pulled By: malfet

fbshipit-source-id: 531afd2d390c6b51f727330d5dd1882d70b6fdde
2021-09-14 09:40:13 -07:00
4e225da363 print_test_stats.py: dedup test report upload name with TEST_CONFIG (#64948)
Summary:
Connected with issue https://github.com/pytorch/pytorch/issues/64845, takeover of https://github.com/pytorch/pytorch/issues/64091

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64948

Reviewed By: malfet, seemethere

Differential Revision: D30908592

Pulled By: janeyx99

fbshipit-source-id: dc31b0bbc9f4e35d23412aa14acbbab7422b4146
2021-09-14 09:01:06 -07:00
e884554008 Make {select,slice,diagonal}_backward primitives wrt autograd (#64933)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64933

Fixes https://github.com/facebookresearch/functorch/issues/108

This is a short-term fix. A longer-term fix would be to either:
1. have proper {select,slice,diagonal}_embed functions
2. have efficient {select,slice,diagonal}_scatter functions (and
efficient zero tensors).

NB: I didn't use diag_embed because diag_embed is slightly different
from diagonal_backward.

There are no BC concerns because TorchScript (luckily) does not
serialize the backwards graph.

Test Plan:
- run tests
- run benchmarks.
https://gist.github.com/zou3519/e7c0774d1ac97f32aa02ec44d81e60e1.
Surprisingly the instruction count goes down. This is probably because
we create fewer autograd nodes now.

Reviewed By: ezyang

Differential Revision: D30909333

Pulled By: zou3519

fbshipit-source-id: 3b33e13010ba13b4d487b346aa9bee8a0e8c378c
2021-09-14 08:10:59 -07:00
2853c7da22 Replace composite dispatch with CompositeExplicitAutograd (#64641)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64641

`sum`, `mean`, and `norm` were ported to structured kernels in #61642, #61643, and #62711,
respectively. Those PRs changed related overloads into composite kernels. However, their
dispatch section remained the same, when they really should be marked as
`CompositeExplicitAutograd`. This PR fixes this issue.

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D30867122

Pulled By: ezyang

fbshipit-source-id: b951aee41a3cab9ca546df826a285d60013e3b3a
2021-09-14 07:56:54 -07:00
09d221e8d4 Revert D30711934: [pytorch][PR] Use RDS for build size tracking
Test Plan: revert-hammer

Differential Revision:
D30711934 (1cd0252eed)

Original commit changeset: 0af808ddf528

fbshipit-source-id: 6f67ed5cbaf333cc55729be2a23e385772e31b10
2021-09-14 06:10:03 -07:00
f23f21dafe [TensorExpr] Remove 'Placeholder' class. (#64887)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64887

BufHandle has exactly the same functionality and should be used instead.

Differential Revision: D30889483

Test Plan: Imported from OSS

Reviewed By: navahgar

Pulled By: ZolotukhinM

fbshipit-source-id: 365fe8e396731b88920535a3de96bd3301aaa3f3
2021-09-14 00:22:44 -07:00
199031c48e [TensorExpr] PyBinds: improve QoL of pybind users. (#64886)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64886

Bind methods for implicit conversions and constructors to avoid
boilerplate code.

Differential Revision: D30889193

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Pulled By: ZolotukhinM

fbshipit-source-id: 137c0c98f7f1576e1bb97c8de8a900b28407a30e
2021-09-14 00:21:28 -07:00
caaa6efc1a Fix use of deprecated tensor.type() in SegmentReduce.cpp (#64151)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64151

Reviewed By: mruberry

Differential Revision: D30917268

Pulled By: ngimel

fbshipit-source-id: 63427372b651ac495d48ef552eba5fbf0e4378e9
2021-09-13 23:16:47 -07:00
d4b4d83521 [quant] handle empty input in fused_moving_avg_obs_fake_quant op (#64829)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64829

If an empty input is passed in, the aminmax operator fails with a runtime error like
```
RuntimeError: aminmax(): cannot compute aminmax over an empty dimension as the operation has no identity.
```

To avoid this during training, we just return the input if we find it to be empty.
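A minimal sketch of the guard in eager Python (the real change is inside the fused observer op):

```python
import torch

# The failure being guarded against:
try:
    torch.aminmax(torch.empty(0))
except RuntimeError as e:
    print(e)  # aminmax(): cannot compute aminmax over an empty dimension ...

# The guard: pass empty inputs through untouched instead of observing them.
def observe(x):
    if x.numel() == 0:
        return x
    lo, hi = torch.aminmax(x)  # safe now; a real observer would update
    return x                   # its running min/max from (lo, hi)
```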

Test Plan:
python test/test_quantization.py TestFusedObsFakeQuant

Imported from OSS

Reviewed By: jingsh

Differential Revision: D30870879

fbshipit-source-id: 0cb4b187449a45a37150a77510d2292f93a7d1cd
2021-09-13 22:22:31 -07:00
0aef44cb3d Add forward AD for torch.linalg.eigh (#62163)
Summary:
This PR adds forward mode differentiation for `torch.linalg.eigh` and a few other functions required for tests to pass.

For some reason running tests for `torch.linalg.eigvalsh` and complex `torch.linalg.eigh` hangs. These tests are skipped for now.

cc ezyang albanD zou3519 gqchen pearu nikitaved soulitzer Lezcano Varal7 jianyuh mruberry heitorschueroff walterddr IvanYashchuk xwang233

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62163

Reviewed By: jbschlosser

Differential Revision: D30903988

Pulled By: albanD

fbshipit-source-id: d6a74adb9e6d2f4be8ac707848ecabf06d629823
2021-09-13 21:15:38 -07:00
35c82dbf5c [THC] remove TensorTypeUtils and TensorInfo (#64965)
Summary:
per title

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64965

Reviewed By: mruberry

Differential Revision: D30916754

Pulled By: ngimel

fbshipit-source-id: b24020d6a7ce8a05a5ab6c579d176dd94dd3b1d7
2021-09-13 20:36:28 -07:00
816048e7e6 EmbeddingBag sort thrust->cub (#64498)
Summary:
Partially fixes https://github.com/pytorch/pytorch/issues/57505

Also fixes a warning I found when compiling:
```
/home/gaoxiang/pytorch-cub/torch/csrc/distributed/c10d/quantization/quantization_gpu.cu(7): warning: inline qualifier ignored for "__global__" function
```
I also updated the bfloat16 guard to CUDA 11.5

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64498

Reviewed By: mruberry

Differential Revision: D30917077

Pulled By: ngimel

fbshipit-source-id: fb9df08fd469038478a563014b5af7452b4b28c0
2021-09-13 19:51:12 -07:00
ed30afd480 Speed up torch.unique_consecutive() (#64835)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/62690

Following the approach of `unique_consecutive_cpu_template`, this PR reimplements `_unique_dim_cpu_impl` to get better performance.
Also, because the overhead of `unique_dim_consecutive_cpu` is quite large, we directly call `unique_consecutive_cpu_template` when we know the given input is a 1-d array.

## Benchmark
### Script
```python
import torch
import time

torch.manual_seed(0)
t = torch.randint(500, (10000000, ))
t = torch.sort(t)[0]

start = time.time()
uniques, inverse, counts = torch.unique_consecutive(t, dim=0, return_inverse=True, return_counts=True)
end = time.time()
print("torch.unique_consecutive(dim=0) time:", end - start)

start = time.time()
uniques2, inverse2, counts2 = torch.unique_consecutive(t, return_inverse=True, return_counts=True)
end = time.time()
print("torch.unique_consecutive() time:", end - start)

t = torch.randint(500, (10000000, 2))
t = torch.sort(t)[0]

start = time.time()
uniques, inverse, counts = torch.unique_consecutive(t, dim=0, return_inverse=True, return_counts=True)
end = time.time()
print("torch.unique_consecutive(dim=0) time:", end - start)

start = time.time()
uniques, inverse, counts = torch.unique_consecutive(t, dim=1, return_inverse=True, return_counts=True)
end = time.time()
print("torch.unique_consecutive(dim=1) time:", end - start)
```

### Before
```
torch.unique_consecutive(dim=0) time: 78.64345622062683
torch.unique_consecutive() time: 0.029544353485107422
torch.unique_consecutive(dim=0) time: 91.49796152114868
torch.unique_consecutive(dim=1) time: 0.30872368812561035
```

### After
```
torch.unique_consecutive(dim=0) time: 0.08256125450134277
torch.unique_consecutive() time: 0.08162403106689453
torch.unique_consecutive(dim=0) time: 35.58408498764038
torch.unique_consecutive(dim=1) time: 1.6258199214935303
```

## System Information
```
Collecting environment information...
PyTorch version: 1.10.0a0+git7f1932e
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.3 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: 10.0.0-4ubuntu1
CMake version: version 3.16.3
Libc version: glibc-2.31

Python version: 3.8.10 (default, Jun  2 2021, 10:49:15)  [GCC 9.4.0] (64-bit runtime)
Python platform: Linux-5.11.0-34-generic-x86_64-with-glibc2.29
Is CUDA available: False
CUDA runtime version: No CUDA
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.21.2
[pip3] torch==1.10.0a0+gitbe09195
[conda] Could not collect
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64835

Reviewed By: jbschlosser

Differential Revision: D30894906

Pulled By: ngimel

fbshipit-source-id: 42ab76d638391ce6c4e589d9c71bdf7579310ad9
2021-09-13 19:00:36 -07:00
ab5e1c69a7 [WIP] Example of DataPipes and DataFrames integration (#60840)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60840

Test Plan: Imported from OSS

Reviewed By: wenleix, ejguan

Differential Revision: D29461080

Pulled By: VitalyFedyunin

fbshipit-source-id: 4909394dcd39e97ee49b699fda542b311b7e0d82
2021-09-13 18:50:15 -07:00
ee554e2e96 Re-land Fix test report uploading (#64958)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64958

This is a re-do of #64846 which was missing a path prefix for windows test reports

Test Plan: Imported from OSS

Reviewed By: seemethere

Differential Revision: D30915253

Pulled By: driazati

fbshipit-source-id: d14d0a64d2f8aabc335db9c4d0d2b63512887c66
2021-09-13 18:36:26 -07:00
f159f12fee [iOS][OSS][BE] Add Simulator tests for full JIT (#64851)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64851

ghstack-source-id: 137970229

Test Plan: CircleCI

Reviewed By: hanton, cccclai

Differential Revision: D30877963

fbshipit-source-id: 7bb8ade1959b85c3902ba9dc0660cdac8f558d64
2021-09-13 18:16:08 -07:00
fd09e564d6 add acc_ops.max, acc_ops.maximum, consolidate acc_ops.min and acc_ops.minimum
Summary:
This diff adds `acc_ops.max` and `acc_ops.maximum` support.
It further consolidates the logic for `acc_ops.min` and `acc_ops.minimum` to match the logic for max.

torch.max has three behaviors:
```
1. max(input)
2. max(input, dim, keepdim=False, *, out=None)
3. max(input, other, *, out=None)
```

Likewise, `torch.min` has three identical behaviors.

I've chosen to implement each as an acc_op, then map to the appropriate one.

The third max overload is effectively `torch.maximum`, so I've implemented it as that.
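The three behaviors in eager PyTorch, for reference:

```python
import torch

x = torch.randn(2, 3)
y = torch.randn(2, 3)

m = torch.max(x)                 # 1. single max over all elements
vals, idx = torch.max(x, dim=1)  # 2. per-dim reduction -> (values, indices)
ew = torch.max(x, y)             # 3. elementwise max, same as torch.maximum
assert torch.equal(ew, torch.maximum(x, y))
```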

Reviewed By: yinghai, jfix71, 842974287

Differential Revision: D30551464

fbshipit-source-id: 0a2eec10e5185cbf7d9984eec3fd399b23528b2a
2021-09-13 18:04:33 -07:00
3855c24639 Add BFloat16 support for cross, tril, triu, tril_indices, triu_indices and cumsum operators on CPU (#62454)
Summary:
Add BFloat16 support for cross, tril, triu, tril_indices, triu_indices and cumsum operators on CPU.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62454

Reviewed By: albanD

Differential Revision: D30845805

Pulled By: heitorschueroff

fbshipit-source-id: f83836862e38109ec929e83567133e9e88096b8b
2021-09-13 17:59:43 -07:00
1cd0252eed Use RDS for build size tracking (#64303)
Summary:
This adds 2 utilities: `register_rds_table` and `rds_write`. `register_rds_table` needs to be called once with the schema for the data that `rds_write` will write. These go to a lambda called `rds-proxy`, which will write to/read from the DB as necessary. This data can then be arbitrarily queried via `rds-proxy` (for use in CI) or on metrics.pytorch.org (for analysis).

It also hooks these up for build size tracking (which previously was not working on GHA)

TODO:
* verify output in logs + clean up prints

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64303

Reviewed By: malfet, seemethere

Differential Revision: D30711934

Pulled By: driazati

fbshipit-source-id: 0af808ddf528a24875a378caeb1aa9cb0693f802
2021-09-13 17:48:44 -07:00
c4073af61d Add skipIfTBB decorator (#64942)
Summary:
And replace two existing usages in the codebase with it

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64942

Reviewed By: jbschlosser

Differential Revision: D30906382

Pulled By: malfet

fbshipit-source-id: e7f20f53aff734b0379eded361255543dab4fa4b
2021-09-13 17:11:51 -07:00
8131bc85d0 Raise TypeError on assigned grad with wrong type (#64876)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/64813

Raises a TypeError when the value assigned to a grad is not a Tensor or
None.

Adds tests.
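A quick illustration of the new behavior (the output text is indicative, not verbatim):

```python
import torch

x = torch.randn(3, requires_grad=True)
x.grad = torch.zeros(3)   # OK: a Tensor
x.grad = None             # OK: None
try:
    x.grad = 5            # neither -> TypeError after this change
except TypeError as e:
    print(e)
```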

cc ezyang gchanan

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64876

Reviewed By: anjali411

Differential Revision: D30901678

Pulled By: soulitzer

fbshipit-source-id: dbb3cb5fd0bbac6918e0b2e2f51d340daa43dee0
2021-09-13 16:41:45 -07:00
1e25a84993 kill SkipInfo (#64878)
Summary:
Per offline discussion, replaces SkipInfo with DecorateInfo. SkipInfo class itself is not removed yet to give functorch time to replace its SkipInfos.
cc zou3519

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64878

Reviewed By: mruberry

Differential Revision: D30908052

Pulled By: ngimel

fbshipit-source-id: 5124180b25c6e32517722883b9f3a2b488e3fe20
2021-09-13 16:32:36 -07:00
3710edc86b Fix TRTOperatorSupport (#64873)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64873

Fix TRTOperatorSupport's key naming to match the key generated by torch.fx.passes.tools_common.get_node_target. get_node_target is used by splitter_base to determine, by name, whether an operator is supported.

Test Plan:
print out the supported operator dict and check name.
Run TRTSplitter with lrm_split_model_generator and verify split result is correct with all supported operators printed.
current split result:
```
Supported node types in the model:
acc_ops.size: ((), {'input': torch.float32})
acc_ops.getitem: ((), {'input': torch.float32})
acc_ops.getitem: ((), {'input': None})
acc_ops.reshape: ((), {'input': torch.float32})
acc_ops.unsqueeze: ((), {'input': torch.float32})
acc_ops.linear: ((), {'input': torch.float32, 'weight': torch.float32})
acc_ops.linear: ((), {'input': torch.float32, 'weight': torch.float32, 'bias': torch.float32})
acc_ops.mul: ((), {'input': torch.float32, 'other': torch.float32})
acc_ops.cat: ((), {})
acc_ops.add: ((), {'input': torch.float32, 'other': torch.float32})
acc_ops.add: ((), {'input': torch.float32})
acc_ops.tanh: ((), {'input': torch.float32})
acc_ops.transpose: ((), {'input': torch.float32})
acc_ops.matmul: ((), {'input': torch.float32, 'other': torch.float32})
acc_ops.div: ((), {'input': torch.float32, 'other': torch.float32})
acc_ops.squeeze: ((), {'input': torch.float32})
acc_ops.noop: ((), {'input': torch.float32})
acc_ops.layer_norm: ((), {'input': torch.float32, 'weight': torch.float32, 'bias': torch.float32})
acc_ops.permute: ((), {'input': torch.float32})
acc_ops.sigmoid: ((), {'input': torch.float32})
acc_ops.flatten: ((), {'input': torch.float32})
acc_ops.softmax: ((), {'input': torch.float32})
acc_ops.sum: ((), {'input': torch.float32})

Unsupported node types in the model:
torch.ops.fb.pad_sequence_embeddings: ((), {'embeddings': torch.float32, 'offsets': torch.int32})
acc_ops.linalg_norm: ((), {'input': torch
```

Reviewed By: yinghai

Differential Revision: D30884463

fbshipit-source-id: 22442aa6a69cd148ce9bc8be8f62157dd6d19954
2021-09-13 15:55:15 -07:00
914e3a861a Revert D30878101: [pytorch][PR] Fix test report uploading
Test Plan: revert-hammer

Differential Revision:
D30878101 (fba40bfc1a)

Original commit changeset: 0730f17fa3f4

fbshipit-source-id: dad89e68b4daf656dd0b592bc9b2758f00af38c6
2021-09-13 15:24:44 -07:00
6101cbcedb torch.ao migration: fake_quantize.py, phase 1 (#64814)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64814

1. move the file
```
hg mv caffe2/torch/quantization/fake_quantize.py caffe2/torch/ao/quantization/
```

2. create a new file in the old location and copy the imports
3. fix all callsites inside `torch`

Test Plan:
```
buck test mode/dev //caffe2/test:quantization
```

Reviewed By: z-a-f

Differential Revision: D30866792

fbshipit-source-id: 7a221cb46c0ab01f1c5de9be061f09ecc83ce23e
2021-09-13 15:22:28 -07:00
e4314dac57 [PyTorch] Reduce heap allocations in OperatorName::setNamespaceIfNotSet (#64673)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64673

We are now guaranteed to allocate at most one time in this function.
ghstack-source-id: 137786392

Test Plan: Previous diff adds test coverage for this function.

Reviewed By: dhruvbird

Differential Revision: D30813014

fbshipit-source-id: 17d844a1cc8c30574afcc6b0b41b219e62c0b723
2021-09-13 14:33:55 -07:00
000f3310d7 [PyTorch] Add test for operator_name (#64672)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64672

Just a small struct missing test coverage. Next diff changes it.
ghstack-source-id: 137786388

Test Plan: CI

Reviewed By: dhruvbird

Differential Revision: D30813013

fbshipit-source-id: 05f39494bb9512a71a928bfe6fcfa710016bdf61
2021-09-13 14:32:50 -07:00
c99277e177 handle the case in acc_ops.sum when dim == 0, differentiating it from the case when dim is None (#64869)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64869

handle the case in acc_ops.sum when dim == 0, differentiating it from the case when dim is None

Reviewed By: 842974287

Differential Revision: D30872739

fbshipit-source-id: 2755d3230804a16ef1c9289f804138c6dd7766b3
2021-09-13 14:24:16 -07:00
0561e104d9 fix build error when system cmake3 version >=3.5 but <=3.10 (#64914)
Summary:
For a PyTorch source build using conda, an error is raised at 8535418a06/CMakeLists.txt (L1) when the CMake version is < 3.10. It can be fixed by upgrading CMake in the conda env, but CentOS also ships cmake3, and PyTorch first checks whether cmake3's version is >= 3.5; so if the user's system cmake3 is >= 3.5 (but < 3.10), PyTorch will use the system's cmake3, which hits a build error like:
```
CMake Error at CMakeLists.txt:1 (cmake_minimum_required):
  CMake 3.10 or higher is required.  You are running version 3.6.3

-- Configuring incomplete, errors occurred!
```

We need to check that cmake3 is also >= 3.10; if not, then check conda's CMake version.
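A minimal Python sketch of the corrected selection logic, with hypothetical helper and return conventions (the real check lives in the build tooling):

```python
def choose_cmake(cmake3_version, cmake_version, minimum=(3, 10)):
    # Prefer system cmake3 only if it actually meets the real minimum,
    # not merely the old 3.5 threshold.
    if cmake3_version is not None and cmake3_version >= minimum:
        return "cmake3"
    if cmake_version is not None and cmake_version >= minimum:
        return "cmake"
    raise RuntimeError("CMake >= 3.10 is required")

# An old system cmake3 (3.6.3) is now skipped in favor of conda's cmake.
assert choose_cmake((3, 6, 3), (3, 16, 3)) == "cmake"
```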

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64914

Reviewed By: jbschlosser

Differential Revision: D30901673

Pulled By: ezyang

fbshipit-source-id: 064e2c5bc0b9331d6ecd65cd700e5a42c3403790
2021-09-13 13:26:06 -07:00
fba40bfc1a Fix test report uploading (#64846)
Summary:
Previously we just weren't uploading Windows test report XML files to S3, only to GitHub Actions. This was different from Linux, where we use both (though maybe we can kill the GHA upload in a follow-up PR, since I don't think it's very useful anymore). This factors it all out into a macro so they both do the same thing. This also fixes the naming of uploaded files to include info about the job name (the full config, so they can be matched to the job visually or by the included job id).

See https://hud.pytorch.org/pr/64846 for results

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64846

Reviewed By: seemethere

Differential Revision: D30878101

Pulled By: driazati

fbshipit-source-id: 0730f17fa3f46a32c131f52669084c3103b0e616
2021-09-13 13:22:54 -07:00
af984c78a9 Pin SciPy to 1.6.3 on Mac (take 2) (#64922)
Summary:
It's already pinned by via docker install on Linux

`scipy.stats.`[`poisson`|`geom`|`binom`] returns quite different results between 1.6.x and 1.7+ versions of SciPy, which results in several distribution tests failing accuracy thresholds

Reland of https://github.com/pytorch/pytorch/pull/64844 but limited to just the Mac platform
A follow-up PR for Windows is coming as well

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64922

Reviewed By: janeyx99

Differential Revision: D30901257

Pulled By: malfet

fbshipit-source-id: 0543e7bae9d3bbeb8b6be7b3ecf605880f97665f
2021-09-13 12:48:11 -07:00
1bea49c716 [Deploy] Avoid use-after-free during autograd shutdown (#64620)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64620

`autograd` extension module's shutdown logic destructs `PyThreadState` by `pybind11::gil_scoped_acquire` using the RAII pattern.

The problem is that torch.deploy also destructs `PyThreadState` as part of its shutdown process (https://www.internalfb.com/phabricator/paste/view/P456363738), causing double destruction and a use-after-free.

This change adds `defined(USE_DEPLOY)` as a special case to avoid destruction of `PyThreadState` to the existing special treatment for  `IS_PYTHON_3_9_PLUS`.

Test Plan: Added `TorchpyTest.Autograd` unittest to ensure that torch.deploy can create multiple instances that use autograd without causing a crash.

Reviewed By: albanD

Differential Revision: D30779080

fbshipit-source-id: 4de3283cc2d394acc9b8141c17cacbfab5eea052
2021-09-13 12:43:10 -07:00
fd716fcda2 [Pytorch Edge] Quantized Ops Dtype Selective (#63680)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63680

Quantized ops were not covered by DType selectivity. Add the check, and adjust call sites to be constexpr-friendly.

Test Plan: CI (this covers all model unit tests), verified that segmentation (a model that uses some of these quant ops) still works on instagram.

Reviewed By: dhruvbird, raymondethan

Differential Revision: D30457626

fbshipit-source-id: 5ba850d2b53a18558dfbb1cfaa78d8f53b5dbad8
2021-09-13 11:04:07 -07:00
4ca40aeb83 Disable more of the pragma warning stuff (#64899)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64899

ghstack-source-id: 137882055

Test Plan: sandcastle, ossci

Reviewed By: malfet, ngimel

Differential Revision: D30893691

fbshipit-source-id: 67ec8cc9f212aa16a201771603236e429944b561
2021-09-13 10:58:31 -07:00
8cfc74400a [PyTorch] Gate tls_local_dispatch_key_set off on iOS too (#64753)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64753

This may possibly be causing problems on iOS. (Maybe we should just revert inlining access to this thing? Really don't understand what's wrong with it, though.)
ghstack-source-id: 137830520

Test Plan: CI

Reviewed By: iseeyuan

Differential Revision: D30826897

fbshipit-source-id: 0438dee9d49e7601c26cdca0e8540229c777eddb
2021-09-13 10:54:28 -07:00
d4b031b31e typo fix (#64615)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64615

Reviewed By: jbschlosser

Differential Revision: D30884298

Pulled By: ngimel

fbshipit-source-id: 230f9d06aa85abcdd69828a1ea0a83f36cbfcb17
2021-09-13 10:50:01 -07:00
01e92f2a56 [nn] no batch dim support: CosineEmbeddingLoss (#64590)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/60585

TODO
* [x] Add tests

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64590

Reviewed By: H-Huang

Differential Revision: D30900775

Pulled By: jbschlosser

fbshipit-source-id: d24e72787017e79afbf8f04a94901a290485b81a
2021-09-13 10:45:33 -07:00
2ae938e15e Fixes failure in test_dataloader.py that occurs on jetson boards (#64757)
Summary:
CUDA IPC is not supported for jetsons

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64757

Reviewed By: jbschlosser

Differential Revision: D30900593

Pulled By: ejguan

fbshipit-source-id: c6b2e8a9746276fdb4a009b6412e47cc8aac69f2
2021-09-13 10:11:04 -07:00
8e63199c7c .github: Always run chown workspace (#64854)
Summary:
In some workflow runs, like https://github.com/pytorch/pytorch/runs/3568714658, the chown workspace step is duplicated.

Is that intentional? Unfortunately it is pretty necessary since (w/ docker) the folder can sometimes be in a broken permission state before and after we run jobs.

So this PR makes the second chown workspace run always because that's the true intention of the step.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64854

Reviewed By: jbschlosser, seemethere

Differential Revision: D30879289

Pulled By: janeyx99

fbshipit-source-id: 4157ff826c86e8c912deb1ba0cb5c47ea7596529
2021-09-13 10:06:31 -07:00
70e64feda7 Reland .circleci: Skip cuda /cudnn install if existing (#64880)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64880

This reverts commit 5836a116d0de214d6d759e70671f23150a5deaba.

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D30885675

Pulled By: seemethere

fbshipit-source-id: 8c96584d5a632170e29f91c5daf0206680a78661
2021-09-13 09:52:16 -07:00
3d976d9ceb torch.ao migration: quantize_jit.py phase1 (#64860)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64860

ghstack-source-id: 137885395

Test Plan: buck test mode/dev //caffe2/test:quantization

Reviewed By: jerryzh168

Differential Revision: D30880574

fbshipit-source-id: 9629027dd3b00bb8d45633e1564fc03a866f8c31
2021-09-13 08:41:48 -07:00
9d52651d4e torch.ao migration: stubs.py phase 1 (#64861)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64861

1. move the file
  ```
  hg mv caffe2/torch/quantization/stubs.py caffe2/torch/ao/quantization/
  ```

  2. create a new file in the old location and copy the imports
  3. fix all call sites inside `torch`
ghstack-source-id: 137885365

Test Plan: buck test mode/dev //caffe2/test:quantization

Reviewed By: jerryzh168

Differential Revision: D30879678

fbshipit-source-id: a2d24f25d01064212aca15e94e8c78240ba48953
2021-09-13 08:40:29 -07:00
c08b2491cc add BFloat16 operators on CPU: cummax, cummin (#63307)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63307

Reviewed By: nikithamalgifb

Differential Revision: D30342002

Pulled By: anjali411

fbshipit-source-id: eee6e640da996ef0e983960119608d9c12405336
2021-09-13 08:00:17 -07:00
d932ddd24b fix quantization.rst doc (#64802)
Summary:
As titled.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64802

Reviewed By: jbschlosser

Differential Revision: D30887210

Pulled By: vkuzo

fbshipit-source-id: 0267883d3065d724ea654a28db78f5fe5702ef06
2021-09-13 07:19:54 -07:00
9c73a48ecf ND Embeddings benchmark - Standardize randomized inputs (#64707)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64707

Use torch.randn instead of torch.from_numpy to generate the tensor

Test Plan: buck run //caffe2/benchmarks/operator_benchmark/pt:qembedding_pack_test

Reviewed By: jingsh

Differential Revision: D30817302

fbshipit-source-id: 924c05517812b4b9f7df05a8999f9236cfe7b672
2021-09-13 06:47:35 -07:00
b37503e452 Initial implementation of nanmean (#62671)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62671

Very crude first implementation of `torch.nanmean`. The current reduction kernels do not have good support for implementing nan* variants. Rather than implementing new kernels for each nan* operator, I will work on new reduction kernels with support for a `nan_policy` flag and then I will port `nanmean` to use that.
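A reference sketch of the intended semantics in eager Python (not the kernel): average over the non-NaN elements only.

```python
import torch

def nanmean_ref(x):
    mask = ~x.isnan()
    return x.nan_to_num(0.0).sum() / mask.sum()

t = torch.tensor([1.0, float("nan"), 3.0])
print(nanmean_ref(t))   # tensor(2.)
```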

**TODO**

- [x] Fix autograd issue

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D30515181

Pulled By: heitorschueroff

fbshipit-source-id: 303004ebd7ac9cf963dc4f8e2553eaded5f013f0
2021-09-13 05:53:58 -07:00
8535418a06 [Reland] Added reference tests to ReductionOpInfo (#64273)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64273

Reintroduced sample_inputs_prod and constrained the range of values for large reference tests.

This reverts commit e4fd2ab59ce8645f5ae9477c7724b6af82124b3b.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D30672097

Pulled By: heitorschueroff

fbshipit-source-id: b44ed8dfd5eb0c74c194164dafc3242f6728a78f
2021-09-12 20:05:43 -07:00
1cb3507ed3 Adds DLPack support (#57110)
Summary:
Partially Fixes https://github.com/pytorch/pytorch/issues/55090
Depends on https://github.com/pytorch/pytorch/issues/55365

Inspired by https://github.com/dmlc/dlpack/issues/57#issuecomment-774482973

Question: in PyTorch we can't create streams or easily synchronize them from just an integer. Should we add an [`ExternalStream`](https://docs.cupy.dev/en/stable/reference/generated/cupy.cuda.ExternalStream.html) object like the one we have in CuPy?

TODO: Add tests

Would like some feedback as this design needs quite a few iterations
rgommers leofang
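For context, a small sketch of the DLPack exchange utilities that already exist in torch (the new `__dlpack__` protocol work is in the PR itself):

```python
import torch
from torch.utils.dlpack import to_dlpack, from_dlpack

t = torch.arange(4)
capsule = to_dlpack(t)    # export as a DLPack capsule
u = from_dlpack(capsule)  # import; shares the same memory (zero-copy)
u[0] = 42
print(t[0])               # tensor(42)
```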

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57110

Reviewed By: saketh-are

Differential Revision: D30761481

Pulled By: mruberry

fbshipit-source-id: e85d78df3c1f8defc2a698878da89cd843cb1209
2021-09-12 19:47:15 -07:00
d46ea03871 [fix] fix test_python_dispatch with pytest (#64574)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/62501

Another approach for fixing the same issue

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64574

Reviewed By: ngimel

Differential Revision: D30867237

Pulled By: ezyang

fbshipit-source-id: c632a1e0b241effdc21ae929abe42fccec88aa24
2021-09-12 17:06:55 -07:00
be79da3303 Revert D30876591: [pytorch][PR] Pin scipy to 1.6.3 on Windows and Mac
Test Plan: revert-hammer

Differential Revision:
D30876591 (39f2b9de2a)

Original commit changeset: 4946e0922063

fbshipit-source-id: b8beff3d973b21fe09c158baef25344030f8fb08
2021-09-12 15:56:40 -07:00
1577c106dc torch.ao migration: numeric suite, eager and fx (#64817)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64817

This migrates `torch.quantization._numeric_suite` to `torch.ao.ns._numeric_suite`, and `torch.quantization._numeric_suite_fx` to `torch.ao.ns._numeric_suite_fx`.

1. move the files
```
HG: move eager mode
hg mv caffe2/torch/quantization/_numeric_suite.py caffe2/torch/ao/ns/
HG: move fx
hg mv caffe2/torch/quantization/_numeric_suite_fx.py caffe2/torch/ao/ns/
hg mv caffe2/torch/quantization/ns/* caffe2/torch/ao/ns/fx/
```

2. create new versions of `_numeric_suite.py` and `_numeric_suite_fx.py` with
imports

3. update all FB callsites

Test Plan: buck test mode/dev //caffe2/test:quantization

Reviewed By: z-a-f

Differential Revision: D30867538

fbshipit-source-id: 120ee830434ca490c1183a187a518eebcbbaf22c
2021-09-12 12:00:45 -07:00
39f2b9de2a Pin scipy to 1.6.3 on Windows and Mac (#64844)
Summary:
It's already pinned by via docker install on Linux

As `scipy.stats.`[`poisson`|`geom`|`binom`] returns quite different results in 1.7+ versions of SciPy

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64844

Reviewed By: driazati

Differential Revision: D30876591

Pulled By: malfet

fbshipit-source-id: 4946e0922063e9ac320c218a0b089f73486466f7
2021-09-12 10:53:48 -07:00
47144de473 Revert D30867266: [pytorch][PR] TST Adds gradcheck and gradgradcheck to module info
Test Plan: revert-hammer

Differential Revision:
D30867266 (67ebde5645)

Original commit changeset: cbc073326151

fbshipit-source-id: 00234e01eafc45fb999f7c83a397f9d6b3e01e46
2021-09-12 10:30:28 -07:00
30a7c768d7 [RFC] Modularize functions of parsing bytecode (#61862)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61862

Modularize the bytecode-table parsing functions so that they can be used as needed in situations other than the mobile lite interpreter.
* The decoupled functions are re-used by current lite interpreter loader.
* The bytecode can be serialized/deserialized from other formats.
* The decoupled functions have minimum dependencies on other PyTorch components.

Next:
Build a driver binary to include the parser and interpreter, but only has necessary dependency on other PyTorch components.
ghstack-source-id: 137867287

Test Plan:
As an example, a simple bytecode is parsed to a mobile function, and directly run in the added unit test, `RunTimeTest:ParseBytecode`. It contains basic control flow (if, else) and basic data orchestration (list construction).
CI

Reviewed By: larryliu0820

Differential Revision: D29798382

Pulled By: iseeyuan

fbshipit-source-id: 1c173a5f5d37097e3a97baec3f3e48e1eea1400f
2021-09-11 22:24:05 -07:00
dd2d48df07 Revert D30875977: [caffe2] [aten] Remove loose (unpaired) #pragma warning ( pop ) in TensorBase.h
Test Plan: revert-hammer

Differential Revision:
D30875977 (1f35d20a89)

Original commit changeset: bd593feb5a75

fbshipit-source-id: 4c82dbc857fdb28e0240dacc1a0e607a76552bb4
2021-09-11 17:18:37 -07:00
d13e0c9c39 [iOS][OSS][BE] Update XCode to use 12.5.1 (#64850)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64850

ghstack-source-id: 137827895

Test Plan: CircleCI

Reviewed By: hanton

Differential Revision: D30877964

fbshipit-source-id: 803f2506a755b3815024704e6177c7826bc42de8
2021-09-11 11:24:06 -07:00
c9eb312ce9 [iOS][OSS][BE] Remove unused files (#64849)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64849

ghstack-source-id: 137827893

Test Plan: CircleCI

Reviewed By: hanton

Differential Revision: D30877962

fbshipit-source-id: a76f7fe888b990ba6cad650f72be7f4a1e58a2f1
2021-09-11 11:22:55 -07:00
82ac3f108d [TensorExpr] Move 2 graph passes from kernel.cpp to graph_opt.cpp (#64828)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64828

Also, make `removeUnusedSelfArgument` more consistent with other passes
by mutating the graph in-place rather than returning a copy.

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D30870776

Pulled By: ZolotukhinM

fbshipit-source-id: 4873f01b013921143a5aa43746d655a2d8d620c9
2021-09-11 10:23:15 -07:00
ff65f637df [TensorExpr] Add debug logging (store/load tracing) to IREval. (#64848)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64848

Test Plan: Imported from OSS

Reviewed By: Chillee

Differential Revision: D30878278

Pulled By: ZolotukhinM

fbshipit-source-id: bd946075336ba2e9786602161c236a0ff8a5a011
2021-09-11 09:25:55 -07:00
180e4fbfae [TensorExpr] LLVMCodegen: fix lowering for UInt->Float casts. (#64862)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64862

Previously we erroneously were looking at dst signedness. This was
discovered when we tried to implement quantize/dequantize ops.

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D30881696

Pulled By: ZolotukhinM

fbshipit-source-id: 34af842e5e52a3b6b5d2e70c4ef32f910a20341f
2021-09-11 09:24:36 -07:00
1f35d20a89 [caffe2] [aten] Remove loose (unpaired) #pragma warning ( pop ) in TensorBase.h (#64870)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64870

Remove loose (unpaired) #pragma warning ( pop ) in TensorBase.h
Issue started with D30728580 (d701357d92), was fixed with D30846958 (40098f48a1), and brought back again with the reversion of D30846958 (40098f48a1).

Reviewed By: H-Huang

Differential Revision: D30875977

fbshipit-source-id: bd593feb5a75245470e43ad568ebdd3f1738da7c
2021-09-11 00:43:19 -07:00
d4a86c1f3b [quant][fx2trt] Add lowering support for reference linear/conv modules (#64368)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64368

Test Plan:
python torch/fx/experimental/fx2trt/example/quantized_resnet_test.py

Imported from OSS

Reviewed By: 842974287

Differential Revision: D30708738

fbshipit-source-id: 88142b7ce43ed96093597112dab03a2d277de993
2021-09-10 22:25:27 -07:00
4481c87ac4 [tensorexpr] Simplify x/100 -> 0 if x is a non-negative integer less than 100. (#64763)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64763

Simplification pattern:
  x/N -> 0; N is a constant positive integer and x is a for-loop index whose range is a subset of [0, N).
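A quick check of the arithmetic fact behind the pattern:

```python
# For a positive integer N and a loop index i with 0 <= i < N,
# integer division i // N is always 0, so x/N folds to the constant 0.
N = 100
assert all(i // N == 0 for i in range(N))
```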

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D30845854

Pulled By: huiguoo

fbshipit-source-id: 814d69ed4be05e57405c222183cc1c6c526721cd
2021-09-10 20:33:02 -07:00
5836a116d0 Revert D30869803: .circleci: Skip cuda /cudnn install if existing
Test Plan: revert-hammer

Differential Revision:
D30869803 (717d267e19)

Original commit changeset: 9eb3bd20875d

fbshipit-source-id: bef8d0c693696307a3be7abd5331b7fa813d754a
2021-09-10 18:56:50 -07:00
67ebde5645 TST Adds gradcheck and gradgradcheck to module info (#64444)
Summary:
Follow up to https://github.com/pytorch/pytorch/issues/61935

cc albanD mruberry jbschlosser walterddr

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64444

Reviewed By: ngimel

Differential Revision: D30867266

Pulled By: jbschlosser

fbshipit-source-id: cbc0733261517dbfcdd3415d969b9e802b62b7ac
2021-09-10 16:53:11 -07:00
c60075d4b5 Preserve types during empty container assignment (#58911)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58911

Stack from [ghstack](https://github.com/ezyang/ghstack):
* __->__ #58911

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D30785623

Pulled By: ansley

fbshipit-source-id: 4e05d6369318974290fea02ad2bc148293c25090
2021-09-10 16:49:21 -07:00
b4855619d1 Always upload stats to S3 (#64853)
Summary:
It's not very useful that stats are only uploaded when the tests all pass.

Like for this failing run, the stats were not uploaded to Scribe or S3: https://github.com/pytorch/pytorch/runs/3568714658

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64853

Reviewed By: seemethere

Differential Revision: D30878361

Pulled By: janeyx99

fbshipit-source-id: 19a4c520efdd5575785a3ffbc60e6c09456b9e0d
2021-09-10 16:49:19 -07:00
f3f410880a [DataPipe] Remove ZipArchiveReader's dependency on FileLoader (#64786)
Summary:
Stack from [ghstack](https://github.com/ezyang/ghstack):
* https://github.com/pytorch/pytorch/issues/64788
* __->__ https://github.com/pytorch/pytorch/issues/64786

This PR removes ZipArchiveReader's dependency on the FileLoader DataPipe, by allowing it to use an IterDataPipe of path names as input rather than a tuple of path name and a stream.

It also adds additional tests to ensure that the DataPipe is functioning properly when it is read multiple times or reset half way through reading.

The whole stack fixes issues related to unclosed buffer stream (see https://github.com/pytorch/pytorch/issues/64281).

cc VitalyFedyunin ejguan

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64786

Reviewed By: ngimel

Differential Revision: D30870968

Pulled By: NivekT

fbshipit-source-id: 64b04d1697b99772f2fa20fc141668e6b8e18c41
2021-09-10 16:49:17 -07:00
717d267e19 .circleci: Skip cuda /cudnn install if existing (#64825)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64825

Rewrites this script to only install the CUDA tools if they are not already
pre-installed

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D30869803

Pulled By: seemethere

fbshipit-source-id: 9eb3bd20875df0f2b18f5314ac825dbdf91637b5
2021-09-10 16:49:14 -07:00
dafa0a5a3b [doc][hackathon] To add Adadelta Optimizer to the documentation (#63255)
Summary:
It has been discussed before that adding descriptions of optimization algorithms to the PyTorch core documentation may result in a nice optimization research tutorial. In the following tracking issue we list all the necessary algorithms and links to the originally published papers: https://github.com/pytorch/pytorch/issues/63236.

In this PR we are adding a description of the AdaDelta algorithm to the documentation. For more details, we refer to the paper here: https://arxiv.org/abs/1212.5701

<img width="654" alt="AdaDeltaalg" src="https://user-images.githubusercontent.com/73658284/132770544-82ccf90a-1d54-4ad5-8fc4-51c8dec63a12.png">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63255

Reviewed By: ngimel

Differential Revision: D30867589

Pulled By: iramazanli

fbshipit-source-id: 5ba602c20c724a4486bdd38b73e1b64c0e767bdc
2021-09-10 16:49:12 -07:00
d8ae3cc318 Add more error checking in subclass creation (#64746)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64746

This extracts the error checking that used to be in the PR above.
We are not going to land the proposed fix there, but I think we want this error checking in right now as these would lead to respectively a memory leak and arbitrary memory read/write.

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D30867569

Pulled By: albanD

fbshipit-source-id: bf468033fb8b49fcb26eed423f5fad82b4a46c56
2021-09-10 16:49:10 -07:00
89f94fc15f Move THPVariable_NewWithVar around (#64550)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64550

Just moves a function around to make the next PR easier to read.

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D30867570

Pulled By: albanD

fbshipit-source-id: 99ae925568ed29ca7fdea059762c21d430d4a204
2021-09-10 16:49:08 -07:00
2cc9778495 [MicroBench] Added a log_vml version of the signed log1p kernel (#64205)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64205

The log_vml version of the micro-bench is over **2x** faster than the log1p version. Here are the perf numbers:

```
---------------------------------------------------------------------------------------------
Benchmark                                   Time             CPU   Iterations UserCounters...
---------------------------------------------------------------------------------------------
SignedLog1pBench/ATen/10/1467           45915 ns        45908 ns        14506 GB/s=2.5564G/s
SignedLog1pBench/NNC/10/1467            40469 ns        40466 ns        17367 GB/s=2.9002G/s
SignedLog1pBench/NNCLogVml/10/1467      19560 ns        19559 ns        35902 GB/s=6.00016G/s
```

Thanks to bertmaher for pointing this out.
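
For reference, a minimal Python sketch of the two computations being benchmarked, assuming the usual definition of signed log1p (sign(x) * log1p(|x|)); the function names are illustrative, not the kernel's actual API:

```python
import torch

def signed_log1p(x: torch.Tensor) -> torch.Tensor:
    # sign(x) * log1p(|x|): symmetric around zero, defined for all reals
    return torch.sign(x) * torch.log1p(torch.abs(x))

def signed_log1p_via_log(x: torch.Tensor) -> torch.Tensor:
    # log1p(|x|) == log(1 + |x|), which lets the backend use a vectorized log
    return torch.sign(x) * torch.log(1 + torch.abs(x))

x = torch.randn(1024)
assert torch.allclose(signed_log1p(x), signed_log1p_via_log(x), atol=1e-6)
```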

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D30644716

Pulled By: navahgar

fbshipit-source-id: ba2b32c79d4265cd48a2886b0c62d0e89ff69c19
2021-09-10 16:49:06 -07:00
cad7a4b0ea [nnc] Added an implementation of sign op (#64033)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64033

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D30579197

Pulled By: navahgar

fbshipit-source-id: f9f7fa7f2ffa109cf4e441eb1af821b8b891d4d3
2021-09-10 16:49:04 -07:00
3fbb49e75d Extend 2Dim embedding bag benchmarking to include 3Dim benchmarks (#64647)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64647

Add support for benchmarking 8-bit quantization of N-D batched embeddings. Currently this only works for 3-dim embeddings and still requires thought on ramping up from 3 dims to N dims.

Test Plan: ```buck run //caffe2/benchmarks/operator_benchmark/pt:qembedding_pack_test```

Reviewed By: jingsh

Differential Revision: D30770085

fbshipit-source-id: 26659020f3458991592065a05366bde0f060494e
2021-09-10 16:49:02 -07:00
227aafd1d9 Revert D30846958: [caffe2/aten] Remove loose #pragma warning ( pop ) in TensorBase.h
Test Plan: revert-hammer

Differential Revision:
D30846958 (40098f48a1)

Original commit changeset: 52a3fb66e426

fbshipit-source-id: 1d749f6981756f2169d6867538555a945cbb8ca6
2021-09-10 16:47:08 -07:00
5060b69d62 [DataPipe] fixing tests related fork() to remove warnings (#64827)
Summary:
There are two warnings produced by `test_fork_datapipe`. This PR addresses the issues raised by those warnings without impacting the test cases.

cc VitalyFedyunin ejguan

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64827

Reviewed By: ejguan

Differential Revision: D30870528

Pulled By: NivekT

fbshipit-source-id: 580a001c6fa3ff6f8b04a7e5183e58861938204b
2021-09-10 11:01:42 -07:00
ade4bf3e82 [tensorexpr] Add 'pre_alloc' argument in python API of tensorexpr kernel (#64718)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64718

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D30826582

Pulled By: huiguoo

fbshipit-source-id: 6c173c8964f2643039273cdc83e64fb02bb5f381
2021-09-10 10:03:00 -07:00
92cd5ab1cb Skip conjugate and negate fallback for view ops and their in-place versions (#64392)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64392

cc ezyang anjali411 dylanbespalko mruberry Lezcano nikitaved

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D30866330

Pulled By: anjali411

fbshipit-source-id: 7b2f51486bf1d610ad2b1472306bab608ee69c37
2021-09-10 09:57:27 -07:00
54b72a99ef To add Rprop documentation (#63866)
Summary:
It has been discussed before that adding descriptions of optimization algorithms to the PyTorch Core documentation may result in a nice optimization research tutorial. The following tracking issue lists all the necessary algorithms and links to the originally published papers: https://github.com/pytorch/pytorch/issues/63236.

In this PR we are adding a description of Rprop to the documentation. For more details, we refer to the paper: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.21.1417

<img width="657" alt="Rpropalg" src="https://user-images.githubusercontent.com/73658284/132750009-a5ec059e-6d53-4c67-917b-57174c8ca27b.png">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63866

Reviewed By: ngimel

Differential Revision: D30867590

Pulled By: iramazanli

fbshipit-source-id: 0d2d4ffc6c4d939290bbbaa84d2c6e901ed8b54a
2021-09-10 09:49:10 -07:00
c7b03e2b83 [ROCm] define C10_WARP_SIZE to warpSize HIP constant (#64302)
Summary:
warpSize is defined as a constexpr in HIP headers. It is incorrect to assume a warpSize of 64. This change fixes the C10_WARP_SIZE definition in torch sources similar to [how it was done in caffe2](https://github.com/pytorch/pytorch/blob/master/caffe2/utils/GpuDefs.cuh#L10-L14).

cc jeffdaily sunway513 jithunnair-amd ROCmSupport

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64302

Reviewed By: mrshenli

Differential Revision: D30785975

Pulled By: malfet

fbshipit-source-id: 68f8333182ad4d02bd0c8d02f1751a50bc5bafa7
2021-09-10 09:43:47 -07:00
db3fcf0af3 fix typo in torch/onnx/utils.py (#63396)
Summary:
fixes minor typo

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63396

Reviewed By: pbelevich

Differential Revision: D30644295

Pulled By: SplitInfinity

fbshipit-source-id: c506f67383909aa2c0c7c533038446b4b3d76a3a
2021-09-10 09:37:44 -07:00
c12df2dc23 build: bump bazel to 4.2.1 (#64455)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64455

Reviewed By: saketh-are

Differential Revision: D30752580

Pulled By: malfet

fbshipit-source-id: 4f5cc6f820396348181c09463f7e5628b5f69471
2021-09-10 08:30:10 -07:00
63b180beed ROCm MIOpen NHWC Convolution support (#63617)
Summary:
- Added 2D-Convolution NHWC support
  - on ROCm 4.3, with the `PYTORCH_MIOPEN_SUGGEST_NHWC=1` flag
  - May need to force MIOpen to search for solutions (see examples below for flags)

**PYTORCH_MIOPEN_SUGGEST_NHWC Environment Flag**
MIOpen does not officially support NHWC yet, although convolution support has been added to tip-of-tree of MIOpen. This flag is intended to be a short-lived flag to explicitly turn on NHWC support until ROCm officially supports NHWC and performance is verified.

**Examples**
1. Example usage 1 : Run test on ROCm4.3
`PYTORCH_TEST_WITH_ROCM=1 PYTORCH_MIOPEN_SUGGEST_NHWC=1 MIOPEN_FIND_ENFORCE=4 MIOPEN_DEBUG_CONV_GEMM=0 MIOPEN_FIND_MODE=1 pytest test_nn.py -v -k "test_conv_cudnn_nhwc" `
2. Example usage 2: Run the following with `PYTORCH_MIOPEN_SUGGEST_NHWC=1` on ROCm4.3.
```
#!/usr/bin/env python3
import torch
model = torch.nn.Conv2d(8, 4, 3).cuda().half()
model = model.to(memory_format=torch.channels_last)
input = torch.randint(1, 10, (2, 8, 4, 4), dtype=torch.float32, requires_grad=True)
input = input.to(device="cuda", memory_format=torch.channels_last, dtype=torch.float16)

# should print True for is_contiguous(channels_last), and strides must match NHWC format
print(input.is_contiguous(memory_format=torch.channels_last), input.shape, input.stride() )

out = model(input)

# should print True for is_contiguous(channels_last), and strides must match NHWC format
print("Contiguous channel last :", out.is_contiguous(memory_format=torch.channels_last), " out shape :",  out.shape, "out stride :", out.stride() )
```

See https://pytorch.org/tutorials/intermediate/memory_format_tutorial.html for more examples.

cc jeffdaily sunway513 jithunnair-amd ROCmSupport

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63617

Reviewed By: saketh-are

Differential Revision: D30730800

Pulled By: ezyang

fbshipit-source-id: 61906a0f30be8299e6547d312ae6ac91cc7c3238
2021-09-10 08:06:32 -07:00
2a81e8b8f1 Let all_reduce_coalesced and all_gather_coalesced return Future objects (#64722)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64722

`all_reduce_coalesced` and `all_gather_coalesced` were never publicly
documented in our API docs, so I would assume the blast radius is small.

The motivation for this change is to allow implementing
`all_reduce_coalesced` and `all_gather_coalesced` by re-using the `allreduce`
and `allgather` C++ cores and performing the flatten and copy only on the Python
side. With that, we can then remove `all_reduce_coalesced` and
`all_gather_coalesced` from C++ ProcessGroup APIs. For the async mode,
the copy-back logic after the communication will need to be chained
as a callback on the returned Future and use the chained child Future
as the return value (otherwise, we will need to wrap the child Future
into another work handle). This PR tries to test if we can directly
return a Future without breaking tests and internal use cases. If yes,
it will make the consolidation a lot easier.
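
A rough Python sketch of the direction described above (flatten on the Python side, reuse the core allreduce, chain the copy-back as a callback on the returned Future). The helper is hypothetical and is not this PR's actual implementation:

```python
import torch
import torch.distributed as dist
from torch._utils import _flatten_dense_tensors, _unflatten_dense_tensors

def all_reduce_coalesced_sketch(tensors, group=None):
    flat = _flatten_dense_tensors(tensors)  # one buffer, one collective
    work = dist.all_reduce(flat, group=group, async_op=True)
    fut = work.get_future()

    def copy_back(f):
        reduced = f.value()[0]  # the reduced flat buffer
        for t, r in zip(tensors, _unflatten_dense_tensors(reduced, tensors)):
            t.copy_(r)
        return tensors

    # the chained child future is what callers would wait on
    return fut.then(copy_back)
```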

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse agolynski SciPioneer H-Huang mrzzd cbalioglu gcramer23

Test Plan: Imported from OSS

Reviewed By: rohan-varma

Differential Revision: D30830994

Pulled By: mrshenli

fbshipit-source-id: dcde0ed9245e9e8fee357b3588b07d540a4b6318
2021-09-10 07:45:25 -07:00
88fff22023 torch.lu: forward AD support (#64742)
Summary:
As per title.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64742

Reviewed By: H-Huang

Differential Revision: D30841227

Pulled By: albanD

fbshipit-source-id: dc4d043ab94358594adb110fbbbb60750c98262a
2021-09-10 07:19:11 -07:00
be091950d0 [const_fold] Keep around node.meta for replaced folded ops (#64782)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64782

Previously, get_attrs that were added to the graph did not retain node.meta after folding. Add such support, and improve coverage in general here.

Test Plan: Added test coverage.

Reviewed By: protonu

Differential Revision: D30852704

fbshipit-source-id: ece87a61c69b2e68982964c6adc4dde14dae12c7
2021-09-09 23:52:39 -07:00
40098f48a1 [caffe2/aten] Remove loose #pragma warning ( pop ) in TensorBase.h (#64773)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64773

Remove loose `#pragma warning ( pop )` in TensorBase.h.

Reviewed By: ezyang

Differential Revision: D30846958

fbshipit-source-id: 52a3fb66e426bc16ef7bde2a13e26e8293969026
2021-09-09 23:45:45 -07:00
95d98dfeec Add TRTSplitter (#64762)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64762

Extract and format TRTSplitter from the fx2trt_example code. The current implementation is tentative and subject to change based on the feeds model lowering progress.

Test Plan:
Manual print of supported operators:
`{<class 'torch.nn.modules.activation.ReLU'>: None, <function relu at 0x7f9b1abd0790>: None, <class 'torch.nn.modules.activation.Sigmoid'>: None, <class 'torch.nn.modules.pooling.AdaptiveAvgPool2d'>: None, <built-in method add of type object at 0x7f9b7f402498>: None, <built-in function add>: None, <built-in method add of PyCapsule object at 0x7f9b1a3dc690>: None, <built-in method add_relu of PyCapsule object at 0x7f9b1a34cf90>: None, <class 'torch.nn.modules.batchnorm.BatchNorm2d'>: None, <class 'torch.nn.quantized.modules.batchnorm.BatchNorm2d'>: None, <class 'torch.nn.modules.conv.Conv2d'>: None, <class 'torch.nn.quantized.modules.conv.Conv2d'>: None, <class 'torch.nn.intrinsic.quantized.modules.conv_relu.ConvReLU2d'>: None, <class 'torch.nn.modules.linear.Linear'>: None, <class 'torch.nn.quantized.modules.linear.Linear'>: None, <class 'torch.nn.modules.pooling.MaxPool2d'>: None, <built-in function mul>: None, <built-in method mul of type object at 0x7f9b7f402498>: None, <built-in method mul of PyCapsule object at 0x7f9b1a3dc6c0>: None, <built-in method flatten of type object at 0x7f9b7f402498>: None, <class 'torch.nn.quantized.modules.DeQuantize'>: None, <built-in method dequantize of type object at 0x7f9b7f402498>: None, 'dequantize': None, <class 'torch.nn.quantized.modules.Quantize'>: None, <built-in method quantize_per_tensor of type object at 0x7f9b7f402498>: None, <class 'torch.nn.modules.linear.Identity'>: None, <function conv2d at 0x7f9b1a1fe9d0>: None, <function flatten at 0x7f9b1a1f5ca0>: None, <function size at 0x7f9b1a1f5b80>: None, <function batch_norm at 0x7f9b1a1feaf0>: None, <function layer_norm at 0x7f9b1a1feb80>: None, <function softmax at 0x7f9b1a1f9550>: None, <function relu at 0x7f9b1a1fe040>: None, <function sin at 0x7f9b1a2030d0>: None, <function cos at 0x7f9b1a203160>: None, <function tan at 0x7f9b1a2031f0>: None, <function sinh at 0x7f9b1a1fe160>: None, <function cosh at 0x7f9b1a1fe280>: None, <function tanh at 0x7f9b1a1fe310>: None, <function asin at 0x7f9b1a1fe3a0>: None, <function acos at 0x7f9b1a1fe430>: None, <function atan at 0x7f9b1a1fe4c0>: None, <function exp at 0x7f9b1a1fe550>: None, <function log at 0x7f9b1a1fe5e0>: None, <function sqrt at 0x7f9b1a1fe670>: None, <function reciprocal at 0x7f9b1a1fe700>: None, <function abs at 0x7f9b1a1fe790>: None, <function neg at 0x7f9b1a1fe820>: None, <function floor at 0x7f9b1a1fe8b0>: None, <function ceil at 0x7f9b1a1fe940>: None, <function sum at 0x7f9b1a1f9c10>: None, <function max_pool2d at 0x7f9b1a1f5d30>: None, <function squeeze at 0x7f9b1a1f5c10>: None, <function add at 0x7f9b1a1f91f0>: None, <function sub at 0x7f9b1a1f9ca0>: None, <function div at 0x7f9b1a1f9dc0>: None, <function mul at 0x7f9b1a1f9d30>: None, <function pow at 0x7f9b1a1f9e50>: None, <function min_two_tensors_input at 0x7f9b1a1f9940>: None, <function unsqueeze at 0x7f9b1a1f9280>: None, <function topk at 0x7f9b1a203280>: None, <function adaptive_avg_pool2d at 0x7f9b1a1f5dc0>: None, <function avg_pool2d at 0x7f9b1a1f5e50>: None, <function reshape at 0x7f9b1a203550>: None, <function slice_tensor at 0x7f9b1a1fee50>: None, <function split at 0x7f9b1a1fec10>: None, <function linear at 0x7f9b1a1f51f0>: None, <function clamp at 0x7f9b1a1f93a0>: None, <function tuple_construct at 0x7f9b1a1fed30>: None, <function contiguous at 0x7f9b1a1f9430>: None, <function getitem at 0x7f9b1a203310>: None, <function cat at 0x7f9b1a1f9310>: None, <function transpose at 0x7f9b1a1f94c0>: None, <function matmul at 0x7f9b1a1f98b0>: None, <function sigmoid at 
0x7f9b1a1fe1f0>: None, <function permute at 0x7f9b1a1f9670>: None, <function quantize_per_tensor at 0x7f9b1a1f9b80>: None, <function dequantize at 0x7f9b1a1f99d0>: None, <function sign at 0x7f9b1a1f5ee0>: None}`

Reviewed By: 842974287

Differential Revision: D30798047

fbshipit-source-id: 69076a550874425b7186fbbf2ecf03da4a99b42f
2021-09-09 21:08:57 -07:00
88c0ea9131 [PyTorch] Fix missing move in torch::jit::Lexer::next (#64653)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64653

Saves shared_ptr refcount inc/dec in SourceRange.
ghstack-source-id: 137608457

Test Plan: Profiled startup of framework overheads benchmark from high_per_models; self time spent in next() is way down.

Reviewed By: dhruvbird

Differential Revision: D30739240

fbshipit-source-id: ac455678c9d46e657b111d3788d4369983028674
2021-09-09 19:01:07 -07:00
b7b4f63bbc [PyTorch] Use std::find in the JIT lexer (#64652)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64652

If nothing else, it is slightly clearer code.
ghstack-source-id: 137608456

Test Plan: CI

Reviewed By: dhruvbird

Differential Revision: D30739239

fbshipit-source-id: bc7917b59883ca4a33fc6916b4e422bad79cf04b
2021-09-09 18:59:27 -07:00
a17d6c7f80 [TensorExpr] Simplify TE IR before applying any transformations. (#64717)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64717

This also exposed several bugs, which are fixed in this PR.

Differential Revision: D30826408

Test Plan: Imported from OSS

Reviewed By: navahgar

Pulled By: ZolotukhinM

fbshipit-source-id: a67ec5739aceed9ffdf0d24f77eb3787cefe4560
2021-09-09 18:50:51 -07:00
ef2c9d7d8a [quant][fix] Fix quantization for sub_scalar (#64603)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64603

We'll insert an observer only when both the operator and the dtype are supported

Test Plan:
python test/test_quantization.py TestQuantizeFx.test_sub_scalar

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D30797025

fbshipit-source-id: a77c21e2749405534fc245374cf33a0657a3d2c8
2021-09-09 17:18:31 -07:00
1b5b210f2c [Android] print type name for IValues (#64602)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64602

Print the type name in the error message for easier debugging.

Test Plan:
Example:
java.lang.IllegalStateException: Expected IValue type Tensor, actual type TensorList

Reviewed By: beback4u

Differential Revision: D30782318

fbshipit-source-id: 60d88a659e7b4bb2b574b12c7652a28f0d5ad0d2
2021-09-09 17:06:15 -07:00
11ef68938c [caffe2][tiny] Add logging to report what the current lengths are when mismatched lengths are detected (#64768)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64768

as title

Differential Revision: D30846637

fbshipit-source-id: 266768c81b315fdebba854135ea2db1faf67fd6a
2021-09-09 16:46:55 -07:00
d4b09dbab3 [doc][hackathon] To add Adagrad Optimizer to the documentation (#63254)
Summary:
It has been discussed before that adding descriptions of optimization algorithms to the PyTorch Core documentation may result in a nice optimization research tutorial. The following tracking issue lists all the necessary algorithms and links to the originally published papers: https://github.com/pytorch/pytorch/issues/63236.

In this PR we are adding a description of Adagrad to the documentation. For more details, we refer to the paper: http://jmlr.org/papers/v12/duchi11a.html

<img width="658" alt="AdaGradAlgo" src="https://user-images.githubusercontent.com/73658284/132743276-a52ea3fb-70a5-4788-94b7-f99367907a26.png">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63254

Reviewed By: albanD

Differential Revision: D30852139

Pulled By: iramazanli

fbshipit-source-id: 9e496560a97e92be8386585b01d9bd3bba4b0c66
2021-09-09 15:41:29 -07:00
9ad75281f6 [Static Runtime] Fix resize_output_check warning coming from prim::VarConcat (#64765)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64765

Test Plan: Tested the fix with BR v1 model predictor-replayer setup.

Reviewed By: ajyu

Differential Revision: D30846506

fbshipit-source-id: 3ef3c93f11285c7cd1e2b188ca298a7ab4fba579
2021-09-09 14:38:50 -07:00
7f1932e1b9 Rename profiler metadata key (#63743)
Summary:
Rename the metadata key to match the variable name

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63743

Reviewed By: albanD

Differential Revision: D30839501

Pulled By: gdankel

fbshipit-source-id: b9b4e670dcc9557b8d8d0730baea0ad39a1a0ca4
2021-09-09 13:06:16 -07:00
6cc8cc6e56 Add support for lowering info during serialize_module, and add padding/partial to it (#5810)
Summary:
Pull Request resolved: https://github.com/pytorch/glow/pull/5810

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64725

- Any info added to the dict in node.meta["lowering_info"] will be added to the node_rep during serialization.
- Use this to add annotations on placeholders that allow partial inputs and require padding.
- Check for these annotations and set them in the NNPICompiledFunction as expected

Test Plan: Validated working on inline_cvr in stack. Additionally existing fx_glow end to end tests should still pass.

Reviewed By: 842974287

Differential Revision: D30824192

fbshipit-source-id: def64ef097aa35c337abb494415f7d437c6c7fa9
2021-09-09 13:01:28 -07:00
d43fb75a21 cat_shape_check: Fixes dimension in the error message for CUDA cat shape check and removes unnecessary offending index information (#64556)
Summary:
Fixes: https://github.com/pytorch/pytorch/issues/64207

Thank you, SsnL for providing the reproducing script.

cc ngimel

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64556

Reviewed By: albanD

Differential Revision: D30843859

Pulled By: ngimel

fbshipit-source-id: 457ebe80eaef793d9f5d35ee962b6697e5de1907
2021-09-09 12:51:11 -07:00
2c243ed112 Enable the on-demand performance PR testing to run on a specified TB branch (#64701)
Summary:
This is to enable performance testing of experimental features such as LazyTensor.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64701

Test Plan:
TorchBench CI

RUN_TORCHBENCH: BERT_pytorch, mobilenet_v3_large
TORCHBENCH_BRANCH: v1.0

Reviewed By: seemethere

Differential Revision: D30847389

Pulled By: xuzhao9

fbshipit-source-id: 6853b368fa6f1ba8ffde517805c74bf318dcb35b
2021-09-09 12:41:21 -07:00
517033916c .github: Remove add_annotations workflow (#64449)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64449

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: suo, janeyx99

Differential Revision: D30738460

Pulled By: seemethere

fbshipit-source-id: f1259fcba2f0c14a9bcfbe811ec0a4bf61106619
2021-09-09 12:22:12 -07:00
9797a32faf [Dist/CI] Remove dist from target determinator (#64721)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64721

There are a couple of PRs where distributed CI did not run and we expected
it to. Examples:

https://github.com/pytorch/pytorch/pull/64513/checks?check_run_id=3539190960,
https://github.com/pytorch/pytorch/pull/64113. All distributed tests should've
been run on these PRs, but we can see they were not:

```
Determination is skipping distributed/test_c10d_common
Determination is skipping distributed/test_c10d_gloo
Determination is skipping distributed/test_c10d_nccl
Determination is skipping distributed/test_c10d_spawn_gloo
Determination is skipping distributed/test_c10d_spawn_nccl
Running distributed/test_data_parallel without determination
Determination is skipping distributed/test_distributed_spawn
Determination is skipping distributed/test_jit_c10d
```

Since it is important to run distributed tests on PRs that touch distributed,
exclude distributed from target_det_list for now.
ghstack-source-id: 137654015

Test Plan: CI

Reviewed By: driazati, mrshenli

Differential Revision: D30830455

fbshipit-source-id: 8b0fdf5b57c2c647b0d82c48e2bb8e2bdbe4d307
2021-09-09 12:07:43 -07:00
46c886e8a6 fix acc topk's handling of the case when dim=0, fix tests as well (#64727)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64727

The acc ops converter for topk has a subtle bug (I found this while trying to introduce max/min):
the code does not differentiate between dim == None and dim == 0, but these are different computations.
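
A quick eager-mode illustration of the difference (when dim is omitted, topk runs over the last dimension, which is not the same as dim=0 for anything above 1-D):

```python
import torch

x = torch.tensor([[1., 9.],
                  [8., 2.]])

# dim not given: topk runs over the last dimension (per-row maxima here)
print(torch.topk(x, 1).values)         # tensor([[9.], [8.]])

# dim=0: topk runs over the first dimension (per-column maxima here)
print(torch.topk(x, 1, dim=0).values)  # tensor([[8., 9.]])
```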

Reviewed By: jfix71, 842974287

Differential Revision: D30833621

fbshipit-source-id: 6cd84e6ca4e95bb1a6d6465e61830b76808a9c78
2021-09-09 10:35:23 -07:00
3d3ff4a9e7 Fix a shadowed variable (#64695)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64695

Resolves this warning:
```
caffe2/aten/src/ATen/ParallelOpenMP.h:109:63: warning: declaration of 'int64_t begin' shadows a parameter [-Wshadow=compatible-local]
  109 |   internal::invoke_parallel(begin, end, grain_size, [&](int64_t begin, int64_t end) {
      |                                                       ~~~~~~~~^~~~~
caffe2/aten/src/ATen/ParallelOpenMP.h:86:1: note: shadowed declaration is here
   85 | inline scalar_t parallel_reduce(
      |                 ~~~~~~~~~~~~~~~~
   86 |     const int64_t begin,
      | ^   ~
```

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D30816128

fbshipit-source-id: 3adff6d94eea9fbd65885e88283cae10b87dba18
2021-09-09 10:34:01 -07:00
8deaa476ac Added more version comparison operations (#63848)
Summary:
Currently the [TorchVersion](1022443168/torch/torch_version.py (L13)) only supports 'greater than' and 'equal to' operations for comparing torch versions, so something like `TorchVersion('1.5.0') < (1,5,1)` or `TorchVersion('1.5.0') >= (1,5)` will throw an error.

I have added 'less than' (`__lt__()`), 'greater than or equal to' (`__ge__()`) and 'less than or equal to' (`__le__()`) operations, so that the TorchVersion object can be useful for a wider range of version comparisons.
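
A small sketch of the comparisons this enables, assuming the `torch.torch_version` module path from the linked file:

```python
from torch.torch_version import TorchVersion

v = TorchVersion('1.5.0')
print(v < (1, 5, 1))   # True, via the new __lt__
print(v >= (1, 5))     # True, via the new __ge__
print(v <= '1.5.0')    # True, via the new __le__
print(v > (1, 4))      # True, already supported before this PR
```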

cc seemethere zsol

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63848

Reviewed By: fmassa, heitorschueroff

Differential Revision: D30526996

Pulled By: seemethere

fbshipit-source-id: 1db6bee555043e0719fd541cec27810852590940
2021-09-09 10:30:20 -07:00
cfa6162e5e Reverts cat and stack warning when out= is not the expected shape (#64714)
Summary:
These warnings are being thrown too aggressively at the moment. See https://github.com/pytorch/pytorch/issues/64709 for a follow-up to reenable them once internal call sites are reviewed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64714

Reviewed By: ngimel

Differential Revision: D30822965

Pulled By: mruberry

fbshipit-source-id: 3ad7c92d381d42ac6187ed84afab477c579a8f35
2021-09-09 10:03:22 -07:00
2b41bf40c5 To add SequentialLR to PyTorch Core Schedulers (#64037)
Summary:
Partially resolves https://github.com/pytorch/vision/issues/4281

In this PR we are proposing a new scheduler, SequentialLR, which calls a list of different schedulers during different phases of the training process.

The main motivation for this scheduler is the recently gained popularity of a warm-up phase at the start of training. It has been shown that taking small steps in the initial stages of training can speed up convergence.

SequentialLR makes it possible to start with a small constant (or linearly increasing) learning rate and then hand off to the actual target learning rate scheduler.

```python
scheduler1 = ConstantLR(optimizer, factor=0.1, total_iters=2)
scheduler2 = ExponentialLR(optimizer, gamma=0.9)
scheduler = SequentialLR(optimizer, schedulers=[scheduler1, scheduler2], milestones=[5])

for epoch in range(100):
    train(...)
    validate(...)
    scheduler.step()
```

This code snippet will call `ConstantLR` for the first 5 epochs and follow up with `ExponentialLR` in the remaining epochs.

This scheduler can be used to chain any group of schedulers one after another. The main consideration is that every time we switch to a new scheduler, we assume that the new scheduler starts from the beginning (the zeroth epoch).

We also add the Chained Scheduler to the `optim.rst` and `lr_scheduler.pyi` files here.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64037

Reviewed By: albanD

Differential Revision: D30841099

Pulled By: iramazanli

fbshipit-source-id: 94f7d352066ee108eef8cda5f0dcb07f4d371751
2021-09-09 09:36:32 -07:00
c3203efe80 [pytorch] Make qlinear weight packing thread safe (#63804)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63804

Adding a lock around weight packing section of qlinear + qlinear_dynamic

Test Plan: automated tests

Reviewed By: kimishpatel

Differential Revision: D30340957

fbshipit-source-id: 1c9faf796c4ffbc74345396188a6f1154a76bea6
2021-09-09 09:31:48 -07:00
dc53546655 torch.lu_solve: forward AD support (#64646)
Summary:
As per title.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64646

Reviewed By: VitalyFedyunin

Differential Revision: D30807898

Pulled By: albanD

fbshipit-source-id: 1f943c22357dd1b3662cfe0d2a26af68e3a2df4c
2021-09-09 08:58:00 -07:00
b7c86365d1 [nnc] Handled cast in index expression during inlining (#64716)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64716

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D30826388

Pulled By: navahgar

fbshipit-source-id: 7e446602f650527e0d954e437f0370602019e040
2021-09-09 08:30:52 -07:00
652a8bf7d0 [nnc] Updated indices during broadcast to use int64_t (#64627)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64627

This fixes the root cause of S242719

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D30801686

Pulled By: navahgar

fbshipit-source-id: b6d3ebdc7eb57116eaced53c2f35c7798bb17e80
2021-09-09 08:29:37 -07:00
459653a0f6 Revert D30745921: [DDP] Fix when buffers are reassigned in module
Test Plan: revert-hammer

Differential Revision:
D30745921 (d59ecc02df)

Original commit changeset: 25eb1edbf445

fbshipit-source-id: 343ead86bf1e2d0b2d4124be331ea2fa437303ad
2021-09-09 08:23:16 -07:00
5bc53ac5ef Revert D30745961: [DDP] Remove self.modules_params
Test Plan: revert-hammer

Differential Revision:
D30745961 (8c09510294)

Original commit changeset: 32d102502570

fbshipit-source-id: 59f7cc50d369b6cc2856cf4ebd0f58b96202336d
2021-09-09 08:23:14 -07:00
f1aaf8afcd Revert D30745960: [DDP] Remove SPMD from self.modules_buffers
Test Plan: revert-hammer

Differential Revision:
D30745960 (1553259520)

Original commit changeset: 66a8f9847e9f

fbshipit-source-id: d3f3fb813c45ac1b0ff15c6154b2e99e5dbab433
2021-09-09 08:22:12 -07:00
3bf93d769c [JIT] Add gradient check in constants (#64613)
Summary:
fixes internal issue

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64613

Reviewed By: Gamrix

Differential Revision: D30799016

Pulled By: eellison

fbshipit-source-id: 48ef52d1cac627919e6cd232216d24878a2a8b58
2021-09-09 08:13:57 -07:00
d4b1016850 Filter out _disabled_torch_function_impl from handle_torch_function (#64689)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64689

This brings it in line with the C++ implementation.

Fixes https://github.com/pytorch/pytorch/issues/64687

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D30816215

Pulled By: ezyang

fbshipit-source-id: ed36af6c35467ae678d9548197efd97c36d38dec
2021-09-09 07:29:09 -07:00
239366c9c2 To add Rectified Adam Description to Documentation (#63772)
Summary:
It has been discussed before that adding descriptions of optimization algorithms to the PyTorch Core documentation may result in a nice optimization research tutorial. The following tracking issue lists all the necessary algorithms and links to the originally published papers: https://github.com/pytorch/pytorch/issues/63236.

In this PR we are adding a description of the Rectified Adam algorithm to the documentation. For more details, we refer to the paper: https://arxiv.org/abs/1908.03265

<img width="446" alt="RadamAlgo" src="https://user-images.githubusercontent.com/73658284/132587815-4764b642-df53-4e41-975f-72e0f40fdc48.png">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63772

Reviewed By: datumbox

Differential Revision: D30839694

Pulled By: iramazanli

fbshipit-source-id: 6f5629ce56e10c66a451433334b587b99eda1610
2021-09-09 07:10:36 -07:00
5b21f172a4 [doc][hackathon] To add AdamW Optimizer to the documentation (#63252)
Summary:
It has been discussed before that adding descriptions of optimization algorithms to the PyTorch Core documentation may result in a nice optimization research tutorial. The following tracking issue lists all the necessary algorithms and links to the originally published papers: https://github.com/pytorch/pytorch/issues/63236.

In this PR we are adding a description of the AdamW algorithm to the documentation. For more details, we refer to the paper: https://arxiv.org/abs/1711.05101

<img width="442" alt="AdamWalgo" src="https://user-images.githubusercontent.com/73658284/132589957-6d381e96-cb62-40d0-990f-82a32ec455be.png">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63252

Reviewed By: datumbox

Differential Revision: D30839685

Pulled By: iramazanli

fbshipit-source-id: 1a426c874ab86408d286a34f41aefcf5b21167c0
2021-09-09 07:05:31 -07:00
39ce801d1f To add Adamax algorithm to documentation (#63903)
Summary:
It has been discussed before that adding descriptions of optimization algorithms to the PyTorch Core documentation may result in a nice optimization research tutorial. The following tracking issue lists all the necessary algorithms and links to the originally published papers: https://github.com/pytorch/pytorch/issues/63236.

In this PR we are adding a description of the Adamax algorithm to the documentation. For more details, we refer to the paper: https://arxiv.org/abs/1412.6980

<img width="447" alt="Adamx" src="https://user-images.githubusercontent.com/73658284/132577306-878ce64c-627a-4086-808c-d0482868d4a1.png">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63903

Reviewed By: albanD

Differential Revision: D30819055

Pulled By: iramazanli

fbshipit-source-id: 37f748cbea9f93bf37193ee30fc295fb1a1e9ffd
2021-09-09 06:42:33 -07:00
15c21fa45f [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D30835585

fbshipit-source-id: a7d35319fd3ae3eddd29b69d299d842f68d587f6
2021-09-09 04:23:50 -07:00
233e3e5bb4 Fix lop1p lowering bug (#64724)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64724

`1` will introduce an int tensor instead of a float tensor, which doesn't work well with downstream (elementwise) operators. The error would look like
```
[TensorRT] WARNING: IElementWiseLayer with inputs (Unnamed Layer* 1) [Unary]_output and (Unnamed Layer* 2) [Constant]_output: first input has type Float but second input has type Int32.
```
Changing the constant to a float type fixes this.
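
A minimal eager-mode sketch of the dtype issue (illustrative only; the actual fix lives in the lowering code):

```python
import torch

x = torch.rand(4)              # float32, positive values

int_one = torch.tensor(1)      # int64 constant: mixing it into an elementwise
                               # add is what trips TensorRT (Float vs Int32)
float_one = torch.tensor(1.0)  # float32 constant matching x's dtype

# log1p(x) decomposed as log(x + 1) with a float constant keeps both
# elementwise inputs at the same dtype
decomposed = torch.log(x + float_one)
assert torch.allclose(decomposed, torch.log1p(x), atol=1e-6)
```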

Reviewed By: 842974287

Differential Revision: D30796959

fbshipit-source-id: 0538e4dd960df9ce87a2d4cafe8f1a0c061b6bad
2021-09-09 00:59:44 -07:00
d0b207e68b Migrate uses of THCReduceApplyUtils to cuda_utils::BlockReduce (#64713)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64713

Resubmit of #64442

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D30825646

Pulled By: ngimel

fbshipit-source-id: 66b06bd0b30b401833e337920681d19d96b11f9d
2021-09-08 22:09:01 -07:00
1553259520 [DDP] Remove SPMD from self.modules_buffers (#64474)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64474

No need for a nested list here.
ghstack-source-id: 137526312

Test Plan: CI

Reviewed By: zhaojuanmao

Differential Revision: D30745960

fbshipit-source-id: 66a8f9847e9fe1e02c51b79647e93bf7665cf4d9
2021-09-08 19:16:15 -07:00
8c09510294 [DDP] Remove self.modules_params (#64473)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64473

Unused after SPMD deprecated.
ghstack-source-id: 137526305

Test Plan: CI

Reviewed By: zhaojuanmao

Differential Revision: D30745961

fbshipit-source-id: 32d102502570291e01579e5b47a6d74dc71013bb
2021-09-08 19:16:13 -07:00
d59ecc02df [DDP] Fix when buffers are reassigned in module (#64472)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64472

Sometimes, a user module can reassign a tensor buffer, as in:

```
self.buffer = torch.randn(1, 2) # in init
self.buffer += 1 # in forward
```

In this case, `self.modules_buffers` will become outdated and we should
repopulate `self.modules_buffers` if we need to sync module buffers.

See https://github.com/pytorch/pytorch/issues/63916 for full description of the
issue.
ghstack-source-id: 137526309

Test Plan: CI

Reviewed By: zhaojuanmao

Differential Revision: D30745921

fbshipit-source-id: 25eb1edbf445703a481802e07f3058d38ea6fc64
2021-09-08 19:14:55 -07:00
b6544ef815 [PyTorch] Fix MobileDebugInfo vector copy (#64030)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64030

ghstack-source-id: 137566816

Test Plan:
Pixel 3 before:  https://our.intern.facebook.com/intern/aibench/details/320277034999340
Pixel 3 after: https://our.intern.facebook.com/intern/aibench/details/724509739115867

We can see the vector copy disappear in the flame graph. Overall mean decreased from 354 ms to 348 ms (though I'm not sure this is outside the usual noise).

Reviewed By: raziel

Differential Revision: D30559032

fbshipit-source-id: 6d8bb5396d3449cc63023ee7acf694b5d146ddc1
2021-09-08 18:32:50 -07:00
0d0d2f2ac5 [PyTorch] move from input ivalues in ByteCodeDeserializer (#64029)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64029

This should save us a separate pass over the data structure to destroy it.
ghstack-source-id: 137566821

Test Plan:
Pixel3
before:
https://www.internalfb.com/intern/aibench/details/503337445067962
after:
https://our.intern.facebook.com/intern/aibench/details/320277034999340

Overall mean time decreased from 373 ms to 358 ms. In the flame graph, we
can see that some time spent destroying a vector of IValues was moved
into parseMethods, and the new parseMethods time is less than the old
time plus the recursive destruction time.

Reviewed By: dhruvbird

Differential Revision: D30559530

fbshipit-source-id: d080295a846745ea03ac50f08f4f6c95f4eaf3d8
2021-09-08 18:32:48 -07:00
f5e76b4e38 [PyTorch] Copy vectors less in Function::append_operator (#63977)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63977

Doesn't seem to be any reason to copy these argument vectors.
ghstack-source-id: 137566815

Test Plan: CI

Reviewed By: dhruvbird, raziel

Differential Revision: D30550301

fbshipit-source-id: 33c199f975e4fb62c50a8210dc08aa9bb7a3e2f2
2021-09-08 18:31:38 -07:00
0ef32625a8 [FX] make visualizer produce different formatted output (#64699)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64699

Previously we just hardcoded the svg format. We should give folks a choice of what format they want to see. If we're given a weird extension like .abc, this will error out, which we expect to be the right behavior.

Reviewed By: houseroad

Differential Revision: D30718883

fbshipit-source-id: fe8827262f94ea6887999bb225de763d1909eef8
2021-09-08 18:22:12 -07:00
86e3b2727e Re-enable nightly doc pushes (#64708)
Summary:
That were accidentally disabled by https://github.com/pytorch/pytorch/pull/64222

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64708

Reviewed By: seemethere

Differential Revision: D30822089

Pulled By: malfet

fbshipit-source-id: 056b5c006f236c78ffe8afa4a5eab2f35e1bce89
2021-09-08 18:07:54 -07:00
9a6c2a75b8 [acc_tracer] Enable check_mutable_operations (#64456)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64456

att

Test Plan: CI

Reviewed By: protonu

Differential Revision: D30679174

fbshipit-source-id: 73f3a07d58380cd44fb3481aa97d463c0a964de8
2021-09-08 16:11:15 -07:00
5c27a580ec [tensorexpr] Allocate intermediate buffers at compile time (#64227)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64227

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D30652220

Pulled By: huiguoo

fbshipit-source-id: cd75005cdfa42751318de7174b44e14a3a01634e
2021-09-08 15:34:44 -07:00
527348a6fe [tensorexpr] Add 'is_allocated' flag for buffers and use it to insert 'Alloc/Free' stmts (#64226)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64226

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D30652221

Pulled By: huiguoo

fbshipit-source-id: ef9bb0e3db2c444b476e5fc23956bc34ae0f0111
2021-09-08 15:34:42 -07:00
f90153cda3 [acc_normalizer] Improve error when kwarg normalization fails (#64408)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64408

att

Test Plan: NFC

Reviewed By: protonu

Differential Revision: D30716392

fbshipit-source-id: e1c3bb1afcd5363a9d502549d8a46b90226be40c
2021-09-08 15:33:32 -07:00
4533e76e7c Update breakpad to an existing commit: 7d188f6 (#64666)
Summary:
Fixes issue https://github.com/pytorch/pytorch/issues/64561

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64666

Reviewed By: driazati

Differential Revision: D30814127

Pulled By: hyuen

fbshipit-source-id: 511a30fc26153569b1cd39f34e4a1a6bb99cc5e4
2021-09-08 15:29:10 -07:00
149f1114fe To add Stochastic Gradient Descent to Documentation (#63805)
Summary:
It has been discussed before that adding descriptions of optimization algorithms to the PyTorch Core documentation may result in a nice optimization research tutorial. The following tracking issue lists all the necessary algorithms and links to the originally published papers: https://github.com/pytorch/pytorch/issues/63236.

In this PR we are adding a description of Stochastic Gradient Descent to the documentation.

<img width="466" alt="SGDalgo" src="https://user-images.githubusercontent.com/73658284/132585881-b351a6d4-ece0-4825-b9c0-126d7303ed53.png">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63805

Reviewed By: albanD

Differential Revision: D30818947

Pulled By: iramazanli

fbshipit-source-id: 3812028e322c8a64f4343552b0c8c4582ea382f3
2021-09-08 15:22:30 -07:00
ff18195df9 .github: Upgrade windows CUDA 10.1 -> 10.2 (#64658)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64658

We don't release 10.1 anymore so let's bump to 10.2

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: malfet, janeyx99

Differential Revision: D30811178

Pulled By: seemethere

fbshipit-source-id: c504ebf7f0d4c0d6229319d774f808b4ba0facd9
2021-09-08 14:43:33 -07:00
cc0565326c Add plugin for linalg norm operation (#64611)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64611

Add a plugin for torch.linalg.norm. This plugin only correctly supports norm operations that do not change batch_size, so vector inputs, or matrix inputs whose dims include '0', are not supported by this plugin.

Test Plan: Unit test

Reviewed By: 842974287

Differential Revision: D30525958

fbshipit-source-id: 0d66b60a390bb6235166e5a80390090d0acf691a
2021-09-08 14:33:20 -07:00
a97015f22c Revert D30735341: Migrate uses of THCReduceApplyUtils to cuda_utils::BlockReduce
Test Plan: revert-hammer

Differential Revision:
D30735341 (a5ad08ec70)

Original commit changeset: 3cb58bed8f1f

fbshipit-source-id: 874dd0f93b24a99694db42a15714834069d402bc
2021-09-08 14:27:40 -07:00
b12150608e [fx] make const fold code more pythonic (#64451)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64451

No functional change.

Test Plan:
```
buck test caffe2/test:fx_const_fold
```

Reviewed By: jfix71, RoshanPAN, houseroad

Differential Revision: D30718255

fbshipit-source-id: 95f98561c7f33fcc6c839db68683c85eb152c949
2021-09-08 13:55:10 -07:00
24e1315d4b [quant] Enable jit tracing on quantizable LSTM (resubmission) (#64638)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64638

The quantizable LSTM didn't support jit tracing because it had several non-traceable paths. We sacrifice some of the user experience to enable the tracing.
The main UX feature removed is a user-friendly message when trying to access the backwards path in a bidirectional LSTM: when the bidirectional flag is False, we used to throw a nice error message when the user tried accessing backwards weights. Now the message is the default one (removed properties).

Test Plan: `buck test mode/dev //caffe2/test:quantization -- test_custom_module_lstm`

Reviewed By: HDCharles

Differential Revision: D30803753

fbshipit-source-id: a639955a96cee22538d9436f1c952a5d121f50f9
2021-09-08 13:34:18 -07:00
d701357d92 Factor out TensorBase that doesn't depend on native operators (#63612)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63612

This makes Tensor inherit from a new class TensorBase that provides a subset of Tensor that doesn't
directly depend on native_functions.yaml. Code that only includes TensorBase.h will thus not need to
be rebuilt every time someone changes an operator signature.

Making `Tensor` inherit from this class means that `const TensorBase&` parameters will be callable
with an ordinary `Tensor`. I've also made `Tensor` constructible and assignable from `TensorBase` to
minimize friction in code mixing the two types.

To help enforce that `Tensor.h` and `Functions.h` aren't accidentally included, I've added an error
into `Operators.h` if `TORCH_ASSERT_NO_OPERATORS` is defined. We can either set this in the build
system for certain folders, or just define it at the top of any file.

I've also included an example of manually special-casing the commonly used `contiguous` operator.
The inline function's slow path defers to `TensorBase::__dispatch_contiguous` which is defined in
`Tensor.cpp`. I've made it so `OptionalTensorRef` is constructible from `TensorBase`, so I can
materialize a `Tensor` for use in dispatch without actually increasing its refcount.

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D30728580

Pulled By: ezyang

fbshipit-source-id: 2cbc8eee08043382ee6904ea8e743b1286921c03
2021-09-08 13:28:54 -07:00
92318a9116 Make doc previews use its own S3 bucket (#64594)
Summary:
We had been using the gha-artifacts bucket (which previously only stored workflow artifacts) to keep the docs around. This makes it hard to see how our storage for artifacts vs docs is trending.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64594

Reviewed By: seemethere

Differential Revision: D30794328

Pulled By: driazati

fbshipit-source-id: 6b2721a3d76e8a273bde055783d56551f8409edd
2021-09-08 11:36:50 -07:00
43c0f033fc TST Adds inplace checks to module_info (#63739)
Summary:
Follow up to https://github.com/pytorch/pytorch/pull/61935

This PR adds inplace checks to `test_modules`. This version checks the constructor for `inplace` and performs the check automatically.
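
A minimal sketch of what such an automatic inplace check verifies, shown here with ReLU (illustrative; the real test hooks into module_info entries):

```python
import torch

# an inplace variant must match its out-of-place counterpart
m_out = torch.nn.ReLU(inplace=False)
m_in = torch.nn.ReLU(inplace=True)

x = torch.randn(8)
expected = m_out(x)
actual = m_in(x.clone())  # clone so the inplace op doesn't clobber x
assert torch.equal(expected, actual)
```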

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63739

Reviewed By: saketh-are

Differential Revision: D30737774

Pulled By: jbschlosser

fbshipit-source-id: 8813534511e9296c8424d1ca878412726ddd4043
2021-09-08 11:08:12 -07:00
a5ad08ec70 Migrate uses of THCReduceApplyUtils to cuda_utils::BlockReduce (#64442)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64442

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D30735341

Pulled By: ngimel

fbshipit-source-id: 3cb58bed8f1f5aa32fd49fd37b10c8490bcc645a
2021-09-08 11:02:12 -07:00
deb9775c07 .github: Run docker containers in detach mode (#64459)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64459

Should allow users to exec into the docker container if using with-ssh,
even if the build / test command has finished executing

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D30742797

Pulled By: seemethere

fbshipit-source-id: 969ed8799216c6051439c7d41ab709b2d40938ac
2021-09-08 11:01:08 -07:00
18d24bb537 [NNC] Add Softplus operator (#64589)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64589

Adding softplus operator lowering for NNC, enabling element-wise fusion as well.

Test Plan: Added a test in test_jit_fuser.py

Reviewed By: bertmaher

Differential Revision: D30736449

fbshipit-source-id: 6c5fc3bceb5cef2322ecd4449f827e4af018ea93
2021-09-08 10:49:58 -07:00
35413a16f7 Add __matmul__ to the magic methods for FX tracing (#64512)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/64483
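
With `__matmul__` among the traced magic methods, the `@` operator becomes traceable; a quick check:

```python
import torch
from torch.fx import symbolic_trace

def f(x, y):
    return x @ y  # dispatches through __matmul__

gm = symbolic_trace(f)
print(gm.graph)  # contains a call_function node for operator.matmul
gm(torch.randn(2, 3), torch.randn(3, 4))  # the traced module still runs
```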

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64512

Reviewed By: mrshenli

Differential Revision: D30797265

Pulled By: Chillee

fbshipit-source-id: 7630e048a960e0b27c4309d04d85301abe325189
2021-09-08 10:03:48 -07:00
195cb4efa8 update scatter formula (#64546)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/63430

Already tested by the OpInfo gradient tests
544c8e6a5d/torch/testing/_internal/common_methods_invocations.py (L8575-L8577)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64546

Reviewed By: saketh-are

Differential Revision: D30768759

Pulled By: albanD

fbshipit-source-id: 27d144971c51a956a232fc7d02df5c9d2706d565
2021-09-08 10:02:35 -07:00
1409492fdb fixing trapezoid() comments for clarity (#64592)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64592

cc mruberry rgommers heitorschueroff

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D30785663

Pulled By: NivekT

fbshipit-source-id: e968687fbb83a59bb46ce6858c6caafa5aa04412
2021-09-08 09:45:46 -07:00
dd8f6ac597 Add forward mode differentiation for torch.linalg.cholesky and transpose (#62159)
Summary:
This PR adds forward mode differentiation for `torch.linalg.cholesky`, `torch.linalg.cholesky_ex`, and `transpose` functions.
Complex tests for Cholesky fail because, for some reason, gradcheck sends matrices full of zeros to the `cholesky_jvp` function.
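
A small sketch of what forward mode differentiation through Cholesky looks like with the public forward-AD API, assuming a build that includes this PR:

```python
import torch
import torch.autograd.forward_ad as fwAD

A = torch.randn(3, 3, dtype=torch.float64)
A = A @ A.T + 3 * torch.eye(3, dtype=torch.float64)  # symmetric positive definite
dA = torch.randn(3, 3, dtype=torch.float64)
dA = (dA + dA.T) / 2                                  # symmetric perturbation

with fwAD.dual_level():
    dual_A = fwAD.make_dual(A, dA)
    L = torch.linalg.cholesky(dual_A)
    primal, tangent = fwAD.unpack_dual(L)  # tangent is the JVP of cholesky at A
```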

cc ezyang albanD zou3519 gqchen pearu nikitaved soulitzer Lezcano Varal7 jianyuh mruberry heitorschueroff walterddr IvanYashchuk xwang233

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62159

Reviewed By: mrshenli

Differential Revision: D30776829

Pulled By: albanD

fbshipit-source-id: 32e5539ed6423eed8c18cce16271330ab0ea8d5e
2021-09-08 09:44:30 -07:00
a2934b38f8 Fix typo embedding_renorm_cuda_ (#64542)
Summary:
Fixes #{issue number}

cc ezyang albanD zou3519 gqchen pearu nikitaved soulitzer Lezcano Varal7 ngimel

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64542

Reviewed By: mrshenli

Differential Revision: D30792842

Pulled By: ngimel

fbshipit-source-id: c9a548256d02b3ce6fb77dd9fb058084f2c91608
2021-09-08 09:36:24 -07:00
e0e832c2ba [c10d] Provide failure reason from ProcessGroup when aborting NCCL comm (#64241)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64241

When things go wrong, PG NCCL aborts nccl communicators via `ncclCommAbort`, but one issue is that the error is often set to `ncclSystemError` (see https://github.com/pytorch/pytorch/blob/master/torch/csrc/distributed/c10d/NCCLUtils.hpp#L176) when that might not be the true cause; the actual issue may be that some prior work timed out, the communicator was aborted on another rank, etc.

This results in a lot of confusion when debugging jobs with a large no. of processes as the current message for ncclSystemError is not very informative: https://github.com/pytorch/pytorch/blob/master/torch/csrc/distributed/c10d/NCCLUtils.hpp#L22

The fix here is to pass in a string exception message from PG NCCL down to `NCCLUtils` which will aim to raise that as the actual issue and not the confusing `ncclSystemError` message.

Test Plan: CI

Reviewed By: pallab-zz, cbalioglu

Differential Revision: D30658855

fbshipit-source-id: 17661dbe0a1bb8cc5b87b637c47634b1f52f54e1
2021-09-08 09:19:24 -07:00
7205ca0210 Change MaxUnpool to accept tensors with 0-dim batch sizes. (#64082)
Summary:
Part of the fix for https://github.com/pytorch/pytorch/issues/38115.

Changes the `MaxUnpool` module to work with zero-sized batch dimensions.
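
A quick sketch of the behavior this enables, assuming the pooling side already accepts empty batches:

```python
import torch
import torch.nn as nn

pool = nn.MaxPool2d(2, stride=2, return_indices=True)
unpool = nn.MaxUnpool2d(2, stride=2)

x = torch.randn(0, 3, 8, 8)        # zero-sized batch dimension
out, indices = pool(x)             # shapes (0, 3, 4, 4)
restored = unpool(out, indices)    # previously rejected; now (0, 3, 8, 8)
print(restored.shape)
```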

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64082

Reviewed By: mrshenli

Differential Revision: D30793907

Pulled By: jbschlosser

fbshipit-source-id: d21aa665be5aa18f592b39ef7b4e3cbc632e21ed
2021-09-08 08:41:09 -07:00
ba8c1fc648 Add Half conversion of bit cast for SYCL kernel (#64340)
Summary:
## Motivation
Enhance the performance of Half/float conversion in SYCL kernels.

## Solution
Add the native SYCL half type to help convert the half from/to float in the kernel code.

## Additional Context
`__SYCL_DEVICE_ONLY__` is a MACRO only valid when compiling the kernel code for SYCL backend.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64340

Reviewed By: gchanan

Differential Revision: D30720823

Pulled By: ezyang

fbshipit-source-id: e7e770d02df5b2d45da61d2fed3ba59383b3dc3a
2021-09-08 08:25:47 -07:00
7f0feafa55 [nnc] Provide helpful error messages about turning off the fuser (#64516)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64516

If fuser compilation fails due to a bug (which should be highly
unlikely at this point) we want to direct the user on how to unblock themselves by
disabling fusion, in addition to requesting that they report a bug.
ghstack-source-id: 137398537

Test Plan: existing tests

Reviewed By: ZolotukhinM

Differential Revision: D30758051

fbshipit-source-id: 98be89f1b1d4fb3bc816f5b2634c618b9297930e
2021-09-08 08:10:22 -07:00
768014b3e6 Allow disabling cache in autocast (automatic mixed precision) (#63552)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63552

In this PR, we want to exclude these 2 cases from the `Autocast` weight cache usage:

- Using `torch.jit.trace` under `Autocast`
As reported in https://github.com/pytorch/pytorch/issues/50231 and several other discussions, when using `torch.jit.trace` under `Autocast`, the trace process hits Autocast's weight cache and fails. So we should disable the weight cache during tracing.
- Using `Autocast` with `Grad mode`

  - Usually we use `Grad mode` for training. Since the weights change at every step in the training phase, we don't need to cache them.
  - For the recommended `Autocast` training case in the [doc](https://pytorch.org/docs/stable/amp.html), `Autocast` clears the cache every time it leaves the context. We should disable the cache to save these clear operations.
    ```
    model = Net().cuda()
    optimizer = optim.SGD(model.parameters(), ...)

    for input, target in data:
        optimizer.zero_grad()
        with autocast():
            output = model(input)
            loss = loss_fn(output, target)
        loss.backward()
        optimizer.step()
    ```

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D30644913

Pulled By: ezyang

fbshipit-source-id: ad7bc87372e554e7aa1aa0795e9676871b3974e7
2021-09-08 07:47:18 -07:00
b616132403 Adding support for lowering 4Bit EmbeddingBag Operator (#5806)
Summary:
Pull Request resolved: https://github.com/pytorch/glow/pull/5806

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64001

Add 4 bit embeddingbag operator in  acc_ops.

Test Plan: Let CI run.

Reviewed By: jfix71

Differential Revision: D30532824

fbshipit-source-id: bf476c9710477792aae202dacf64e23539c33bd9
2021-09-08 07:13:16 -07:00
2223737da9 restore test_inplace_comparison_ops_require_inputs_have_same_dtype Expected behavior (#64267)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64267

This test expects every operation to throw a runtime error.

It also reinserts the in-place operation test and fixes a bug in the comparison operations.

fix: #64018
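
A sketch of the pattern the test exercises; per this commit, each in-place comparison with mismatched input dtypes is expected to raise:

```python
import torch

a = torch.zeros(3, dtype=torch.float32)
b = torch.zeros(3, dtype=torch.int32)

try:
    a.lt_(b)  # in-place comparison with mismatched dtypes
except RuntimeError as e:
    print("raised, as the test expects:", e)
```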

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D30720915

Pulled By: ezyang

fbshipit-source-id: 215a6556d20770f70f4ced1c1f9a9753933f1d37
2021-09-08 06:42:12 -07:00
9cc44aad21 [quant] AO migration of the quantize.py (resubmission) (#64445)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64445

AO Team is migrating the existing torch.quantization into torch.ao.quantization. We are doing it one file at a time to make sure that the internal callsites are updated properly.
This migrates quantize.py from torch.quantization to torch.ao.quantization.
At this point both locations will be supported. Eventually torch.quantization will be deprecated.
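
During the migration window both import locations are expected to resolve; a sketch:

```python
# legacy location, kept working during the migration
from torch.quantization.quantize import prepare, convert

# new location after the AO migration; new code should prefer this
from torch.ao.quantization.quantize import prepare, convert
```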

Test Plan: `buck test mode/dev //caffe2/test:quantization`

Reviewed By: HDCharles

Differential Revision: D30734870

fbshipit-source-id: dc204f3cc46bff2cc81c95159eab9d333b43bb4b
2021-09-08 04:58:47 -07:00
72274e2a2f [TensorExpr] Don't rely on exceptions in Vectorizer. (#64609)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64609

We've been using exceptions to indicate whether vectorization succeeded
or not, but that posed some problems (e.g. we spent too much time
symbolicating these exceptions). This change converts this mechanism to
a standard error return code.

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D30795342

Pulled By: ZolotukhinM

fbshipit-source-id: 16e38b37bcdd78ceb438ac814cc377f35b058e17
2021-09-08 00:25:34 -07:00
2341ec9ef1 [fx_const_fold] Fix constant folding for attrs in submodule hierarchies (#64342)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64342

Previously we weren't handling the case where an attribute was in a module that wasn't the root.

Test Plan: Added unit test coverage.

Reviewed By: yinghai

Differential Revision: D30691730

fbshipit-source-id: b39b5cf748c4c882f315a4f32b51ad88cc7a43ed
2021-09-07 22:44:39 -07:00
5721205417 Add __ge__ to TorchVersion (#64565)
Summary:
This PR adds a greater-or-equal comparison so that the base class's (str) comparison method is not used.
This is necessary for a correct comparison with a version string.

Previously the following was the case:
```py
>>> torch.__version__
'1.10.0.dev20210830+cpu'
>>> torch.__version__>"1.9"
True
>>> torch.__version__>="1.9"
False  # Wrong output since the base class (str) was used for __ge__ comparison
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64565

Reviewed By: raghuramank100

Differential Revision: D30790463

Pulled By: mrshenli

fbshipit-source-id: 79c680f8b448001b34d3e5d5332124a78bea4e34
2021-09-07 20:16:09 -07:00
81fe2c5e49 add out variant of linear (#61801)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61801

resubmitting because the last one was unrecoverable due to making changes incorrectly in the stack

Test Plan: Imported from OSS

Reviewed By: desertfire

Differential Revision: D29812510

Pulled By: makslevental

fbshipit-source-id: ba9685dc81b6699724104d5ff3211db5852370a6
2021-09-07 19:58:52 -07:00
71ba76b1b5 Fix building docs instructions (#64508)
Summary:
Fixes #64507

Removed a duplicate instruction and linted the file a bit (consistent spacing around code blocks/headers, adding code types to code blocks, removing `$` from bash code blocks where unnecessary).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64508

Reviewed By: raghuramank100

Differential Revision: D30791164

Pulled By: mrshenli

fbshipit-source-id: a00db32dcfdd1ecc194c836f31174c806062eb6d
2021-09-07 19:01:52 -07:00
4e98304eb9 Fix quicklint (#64612)
Summary:
Fixes land-race introduced by a22c936b63

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64612

Reviewed By: ngimel

Differential Revision: D30798648

Pulled By: malfet

fbshipit-source-id: ca546f68141d44493deba7bbf840e5f9662e8558
2021-09-07 18:52:22 -07:00
e777e1b01c Revert D29998114: [pytorch][PR] enable bf16 mkldnn path for gemm
Test Plan: revert-hammer

Differential Revision:
D29998114 (acc9f9afc8)

Original commit changeset: 459dc5874c63

fbshipit-source-id: 1994623a3afc22a94bd0cf5de766b023185f5238
2021-09-07 18:45:13 -07:00
1a033b45dd [JIT] Fix a bug of rejecting ops with AliasAnalysisKind::CONSERVATIVE (#64336)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64336

Currently AliasDB rejects any user-defined op with `AliasAnalysisKind::CONSERVATIVE` if it does not have special treatment for alias analysis. For example, the following alias schema gets rejected:

```
  m.def(torch::schema(
      "namescope::my_op(...) -> ...",
      c10::AliasAnalysisKind::CONSERVATIVE));
```

This rejection condition is contradictory: AliasDB can handle ops with `CONSERVATIVE` in a general way, without any special casing, at https://fburl.com/diffusion/op5u72sk (calling https://fburl.com/diffusion/h3aws5dd), which seems entirely appropriate for conservative alias analysis.

This change corrects the rejection condition so that it triggers only for ops that *do* have special casing yet are marked `CONSERVATIVE`, since the two cannot be used simultaneously.

Test Plan:
Confirmed that
```
  m.def(torch::schema(
      "namescope::my_op(...) -> ...",
      c10::AliasAnalysisKind::CONSERVATIVE));
```
gets accepted and all of `my_op`'s inputs and outputs are set to point to the wildcard (*) by AliasDB.

Reviewed By: eellison

Differential Revision: D30690121

fbshipit-source-id: 431cc1a84edd5227f52b44a0fd85d5eb16f3c288
2021-09-07 18:26:31 -07:00
8e1fdd4cd3 Add symbolic shape comparison optimization (#64300)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64300

Test Plan: Imported from OSS

Reviewed By: driazati

Differential Revision: D30738146

Pulled By: eellison

fbshipit-source-id: 96287798535b367f23d3e9430d70fc02c59744ab
2021-09-07 18:22:32 -07:00
474a51b6bf Refactor to use shape arguments (#64299)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64299

Test Plan: Imported from OSS

Reviewed By: driazati

Differential Revision: D30738141

Pulled By: eellison

fbshipit-source-id: 37ca30de81349ecf23d8656291863737b6ad6d96
2021-09-07 18:22:30 -07:00
bccbe310ef Add view with negative dim (#63516)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63516

how to review: check that the generated inputs are a good representation of the op semantics; that should be sufficient for correctness. As a bonus, you can also double-check the op size semantics by going to https://codebrowser.bddppq.com/pytorch/pytorch/, typing in native::{op_name}, and looking at the op implementation.

Test Plan: Imported from OSS

Reviewed By: driazati

Differential Revision: D30738143

Pulled By: eellison

fbshipit-source-id: c7cd01cb2c8a13cb2664415f3d98aedec19a8e07
2021-09-07 18:22:28 -07:00
5a1f8b8573 Generalize expand logic (#63615)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63615

how to review: check that the generated inputs are a good representation of the op semantics; that should be sufficient for correctness. As a bonus, you can also double-check the op size semantics by going to https://codebrowser.bddppq.com/pytorch/pytorch/, typing in native::{op_name}, and looking at the op implementation.

Test Plan: Imported from OSS

Reviewed By: driazati

Differential Revision: D30738148

Pulled By: eellison

fbshipit-source-id: 4ef74a9c9b39c0beb73949e63aa844c46ab637eb
2021-09-07 18:22:26 -07:00
5eb8cec663 Add permute, arange (#63407)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63407

Test Plan: Imported from OSS

Reviewed By: driazati

Differential Revision: D30738149

Pulled By: eellison

fbshipit-source-id: 36d572488408d38b0643aa93cb08aab5c45218ad
2021-09-07 18:22:24 -07:00
cf2d15bf84 Add support for slice, select with int, index_select (#63365)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63365

Test Plan: Imported from OSS

Reviewed By: driazati

Differential Revision: D30738144

Pulled By: eellison

fbshipit-source-id: 7e0c572209bdc6e62ecb4fd1f06f80291de69803
2021-09-07 18:22:22 -07:00
c8a608b197 Add squeeze, unsqueeze, transpose shape functions (#63099)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63099

These are checked by OpInfos, which represent all of the inputs and semantics of the operators, so this should be an easy stamp.

Test Plan: Imported from OSS

Reviewed By: desertfire, astaff

Differential Revision: D30347514

Pulled By: eellison

fbshipit-source-id: 37b4c9ecd8c222cc12bf39166181464b43218830
2021-09-07 18:22:19 -07:00
a39f3c68b7 Add batch of unary functions (#63050)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63050

Test Plan: Imported from OSS

Reviewed By: priyaramani, astaff

Differential Revision: D30347513

Pulled By: eellison

fbshipit-source-id: abaf641778671d17df87a2b7b47bad7501a91b5a
2021-09-07 18:21:04 -07:00
c1b701bc3e Back out "update rpc tensorpipe logic for sparse tensors" (#64575)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64575

Original commit changeset: daee9a567645

Test Plan: unit test

Reviewed By: gcramer23

Differential Revision: D30778736

fbshipit-source-id: 8d9386158fb6a3d025c149cdc37558d57c615e9f
2021-09-07 18:00:39 -07:00
566ee1217f Use trsm for triangular_solve in CPU (#63567)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63567

The current implementation called trtrs on CPU and trsm on CUDA.
See https://github.com/pytorch/pytorch/issues/56326#issuecomment-825496115 for a discussion on the differences between
these two functions and why we prefer trsm over trtrs on CUDA.

This PR also exposes the `side` argument of this function, which is used
in the second PR of this stack to optimise the number of copies one needs to make
when preparing the arguments to be sent to the backends.

It also changes the use of `bool`s to a common enum type that represents
whether a matrix is transposed / conj transposed, etc. This makes the API
consistent; before, the behaviour of these functions with `transpose=True`
and `conjugate_transpose=True` was not well defined.
Functions to transform this type into the specific types / chars for the different
libraries are provided under the names `to_blas`, `to_lapack`, `to_magma`, etc.

This is the first of a stack of PRs that aim to improve the performance of
`linalg.solve_triangular`. `trsm` has an extra parameter (`side`), which allows
eliding the copy of the triangular matrix in many cases.
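
For context, a minimal usage sketch of the solver this stack targets, using the `torch.triangular_solve` API available at the time (the `side` argument itself stays internal to the backends):
```py
import torch

A = torch.randn(3, 3).triu() + 3 * torch.eye(3)  # well-conditioned upper-triangular matrix
b = torch.randn(3, 2)
x, _ = torch.triangular_solve(b, A, upper=True)
assert torch.allclose(A @ x, b, atol=1e-5)
```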

Fixes https://github.com/pytorch/pytorch/issues/56326

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D30566479

Pulled By: mruberry

fbshipit-source-id: 3831af9b51e09fbfe272c17c88c21ecf45413212
2021-09-07 17:26:17 -07:00
52ff9bc639 [iOS][Metal] Add aten:hardswish (#64588)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64588

Add `aten::hardswish` to run the mobilenetv3 model from torchvision.
ghstack-source-id: 137479323

Test Plan:
- buck test pp-macos
- circleCI

Reviewed By: beback4u

Differential Revision: D30781008

fbshipit-source-id: 83454869195ef4ab50570ea9b3bf2a55f32a3e86
2021-09-07 15:41:29 -07:00
2c351c76e0 [special] Alias igamma, igammac to special.gammaninc, special.gammaincc (#61902)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/50345

Also added relevant OpInfo

TODO:
* [x] Check rendered docs gammainc : https://docs-preview.pytorch.org/61902/special.html#torch.special.gammainc
* [x] Check rendered docs gammaincc: https://docs-preview.pytorch.org/61902/special.html#torch.special.gammaincc

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61902

Reviewed By: ngimel

Differential Revision: D30761428

Pulled By: mruberry

fbshipit-source-id: 06a16432873357958d53364f12a4e91c29779d26
2021-09-07 15:31:26 -07:00
b01d2d1d3e Disables four failing distributions tests on windows (#64596)
Summary:
Per title. Unblocks CI. See https://github.com/pytorch/pytorch/issues/64595.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64596

Reviewed By: mrshenli

Differential Revision: D30787296

Pulled By: mruberry

fbshipit-source-id: 84b90cb25c0185f1851db02425ea40aa13d3e598
2021-09-07 15:29:13 -07:00
a22c936b63 Add lint to ensure .github/ pypi dependencies are pinned (#64463)
Summary:
Example failing run: https://github.com/pytorch/pytorch/pull/64463/checks?check_run_id=3501249102

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64463

Reviewed By: janeyx99

Differential Revision: D30744930

Pulled By: driazati

fbshipit-source-id: 4dd97054db1d4c776a4512bc3d664987cd7b6d23
2021-09-07 15:28:11 -07:00
7e88d0b370 Update explicit_ci_jobs to work with GHA (#64598)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64598

This adds a filter option rather than an all-or-nothing so it's easier to iterate on a specific job.

```bash
python tools/testing/explicit_ci_jobs.py --filter-gha '*generated-linux-*gcc5.4*'
```

See #64600 for an example usage

NB: If you regenerate the workflows you will need to re-run that command to re-delete everything.

Test Plan: Imported from OSS

Reviewed By: janeyx99

Differential Revision: D30788850

Pulled By: driazati

fbshipit-source-id: a32c266bbd876c396665bceef9a0a961b4586564
2021-09-07 15:21:12 -07:00
a48d83a575 Move ParallelTBB to GHA (take 2) (#64193)
Summary:
2nd attempt to do the same
Skip failing `TestTensorCreationCPU.test_trilu_indices_cpu`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64193

Reviewed By: mrshenli

Differential Revision: D30779469

Pulled By: malfet

fbshipit-source-id: 5c51fcbb383d0823d0e953d7af181b5f22eda9ab
2021-09-07 15:11:00 -07:00
369db8924f [Static Runtime] Add first iter metric (#64457)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64457

The first iteration is special since it initializes the memory planner. This change logs and reports first iteration time during benchmarking. It also generates a FAI-PEP output when `generate_ai_pep_output` is set.

Test Plan:
Run any benchmark, and observe:
```
I0902 15:19:32.528977 2492358 impl.cpp:948] PyTorchObserver {"value":6.415958881378174,"unit":"ms","metric":"latency","type":"static_runtime_first_iter"}
...
First iter time: 6.41596 ms
```

Note that this metric is likely to have significantly more noise than the others since we don't have as many data points.

Unit tests: `buck test //caffe2/test:static_runtime`

Reviewed By: d1jang

Differential Revision: D30740619

fbshipit-source-id: 4dcfccd5629f4fa34254fd355073ef19e151245a
2021-09-07 15:00:30 -07:00
3bd69d3020 add bundle input into AIBench (#64557)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64557

MaskRCNN speed depends on how many people are detected in the detection stage. A random input from the dataloader doesn't satisfy this. In order to standardize the benchmarking, we use two standard images, containing two and three people respectively.

Test Plan: AIBench result: https://www.internalfb.com/intern/aibench/details/945883114818980

Reviewed By: axitkhurana

Differential Revision: D30446049

fbshipit-source-id: a2826fdb69e9f840c0afc566c4cbbcde1c2fba89
2021-09-07 14:46:23 -07:00
3c87f55752 Automated submodule update: FBGEMM (#64582)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).

New submodule commit: 3ce04fc664

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64582

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: mrshenli

Differential Revision: D30779695

fbshipit-source-id: 22460a4047e2462e672eb4931e44648ae6bde627
2021-09-07 14:16:22 -07:00
acc9f9afc8 enable bf16 mkldnn path for gemm (#61891)
Summary:
# Goal: Integrate mkldnn bf16 gemm into PyTorch

## BF16 support for mm, addmm, bmm, addbmm, baddbmm, mv, addmv, dot (with the mkldnn matmul primitive):
https://oneapi-src.github.io/oneDNN/group__dnnl__api__matmul.html
For gemm-related ops, we keep all inputs in plain format, so we do not introduce opaque tensors for these ops and thereby save memory copies.

![mkldnn bf16 gemm integration](https://user-images.githubusercontent.com/54701539/126263077-4b5134e1-52a7-4fad-94fb-19e13a0377f6.png)

The minimal integration would be to dispatch to mkldnn only in addmm, but for gemm with 3-D input (with an additional dim for "batch") this would call mkldnn gemm "batch" times. Since mkldnn matmul supports inputs with multiple dims, we directly dispatch to mkldnn gemm in {bmm, addbmm, baddbmm} to reduce the time spent creating mkldnn memory descriptors, primitives, etc.

Because the definition of "bias" differs between mkldnn (where it must have shape (1, N)) and PyTorch (where it can have the same shape as the gemm result (M, N)), we use a fused sum to handle it.

## User Case:
The user-facing API is exactly the same as before because no opaque tensors are introduced. Since PyTorch already supported the bf16 data type for CPU tensors, we can leverage the existing bf16 gemm unit tests.
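
For illustration, bf16 gemm keeps the usual eager API (a minimal sketch; on a supported CPU this may now hit the mkldnn bf16 kernel):
```py
import torch

a = torch.randn(128, 64, dtype=torch.bfloat16)
b = torch.randn(64, 32, dtype=torch.bfloat16)
bias = torch.randn(128, 32, dtype=torch.bfloat16)
out = torch.addmm(bias, a, b)  # same user-facing call as fp32
```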

## Gemm performance gain on CPX 28Cores/Socket:
Note: data is collected using PyTorch operator benchmarks: https://github.com/pytorch/pytorch/tree/master/benchmarks/operator_benchmark (with adding bfloat16 dtype)

### use 1 thread on 1 core
### torch.addmm (M, N) * (N, K) + (M, K)
| impl |16x16x16|32x32x32| 64x64x64 | 128x128x128| 256x256x256| 512x512x512|1024x1024x1024|
|:---:|:---:| :---: | :---: | :---: | :---: | :---: | :---: |
| aten-fp32| 4.115us|4.583us|8.230us|26.972us|211.857us|1.458ms|11.258ms|
| aten-bf16 | 15.812us| 105.087us|801.787us|3.767ms|20.274ms|122.440ms|836.453ms|
| mkldnn-bf16 |20.561us |22.510us|24.551us|37.709us|143.571us|0.835ms|5.76ms|

We can see that mkldnn-bf16 is better than aten-bf16, but for smaller shapes mkldnn-bf16 is not better than aten-fp32. This is due to oneDNN overhead, which behaves like a "constant" cost and becomes negligible as the problem size grows. We are also continuing to optimize kernel efficiency and to reduce this overhead.

More shapes
| impl |1x2048x2048|2048x1x2048| 2048x2048x1 |
|:---:|:---:| :---: | :---: |
| aten-fp32| 0.640ms|3.794ms|0.641ms|
| aten-bf16 | 2.924ms| 3.868ms|23.413ms|
| mkldnn-bf16 |0.335ms |4.490ms|0.368ms|

### use 1 socket (28 thread, 28 core)
| impl | 256x256x256| 512x512x512|1024x1024x1024| 2048x2048x2048|4096x4096x4096|
|:---:| :---: | :---: | :---: | :---: | :---: |
| aten-fp32| 35.943us |140.315us|643.510us|5.827ms|41.761ms|
| mkldnn-bf16 |53.432us|114.716us|421.858us|2.863ms|23.029ms|

More shapes
| impl |128x2048x2048|2048x128x2048| 2048x2048x128 |
|:---:|:---:| :---: | :---: |
| aten-fp32| 0.561ms|0.458ms|0.406ms|
| mkldnn-bf16 |0.369ms |0.331ms|0.239ms|

We do not show aten-bf16 for this case since aten-bf16 always computes on a single thread and its performance is extremely poor. The trend for this case is similar to that of 1 thread on 1 core.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61891

Reviewed By: iramazanli

Differential Revision: D29998114

Pulled By: VitalyFedyunin

fbshipit-source-id: 459dc5874c638d62f290c96684ca0a694ded4b5a
2021-09-07 13:00:37 -07:00
337c71be05 Array API: Add torch.linalg.matmul alias to torch.matmul (#63227)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/62811

Add `torch.linalg.matmul` alias to `torch.matmul`. Note that the `linalg.matmul` doesn't have a `method` variant.
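
A quick sanity check of the alias (minimal sketch):
```py
import torch

a, b = torch.randn(2, 3), torch.randn(3, 4)
assert torch.equal(torch.linalg.matmul(a, b), torch.matmul(a, b))
```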

Also cleaning up `torch/_torch_docs.py` when formatting is not needed.

cc IvanYashchuk Lezcano mruberry rgommers

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63227

Reviewed By: mrshenli

Differential Revision: D30770235

Pulled By: mruberry

fbshipit-source-id: bfba77dfcbb61fcd44f22ba41bd8d84c21132403
2021-09-07 12:35:32 -07:00
8407ce7e38 [small BE] .github: refactor concurrency into a common macro (#64587)
Summary:
By using a macro for these concurrency groups, we can edit just one place for the linux and windows workflows (vs 2).

I wanted to loop all the other workflow files in as well, but since those aren't generated, the macros won't work the same way.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64587

Reviewed By: mrshenli

Differential Revision: D30783224

Pulled By: janeyx99

fbshipit-source-id: ae16ebb12d2d63a563d28f0ce88e280f68ed4b9b
2021-09-07 12:31:55 -07:00
7e4ebe06ca Fixes issue related torch.trapezoid broadcasting behavior and documentation (#64054)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64054

Fixes #63608

cc mruberry rgommers heitorschueroff

Test Plan: Imported from OSS

Reviewed By: saketh-are

Differential Revision: D30617078

Pulled By: NivekT

fbshipit-source-id: 815896ec56d447562790df4d662e94fd13457e2a
2021-09-07 11:41:55 -07:00
c9d6ca4c54 Add space in Feature Request issue template (#64563)
Summary:
Add space between emoji and text in Feature Request issue template

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64563

Reviewed By: janeyx99

Differential Revision: D30779429

Pulled By: seemethere

fbshipit-source-id: 3625299923a7022fa66473633524a6620d58188b
2021-09-07 11:36:06 -07:00
85eeb4d682 Clean up op BC check list (#64584)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64584

It has been a while since the last cleanup, and the list has grown really long.

Test Plan: ci

Reviewed By: hl475

Differential Revision: D30779350

fbshipit-source-id: 908b47d0b9a16b784aad6a34c5c87f923500c247
2021-09-07 11:25:40 -07:00
43248d9112 [doc][hackathon] To add Adam Optimizer to the documentation (#63251)
Summary:
It has been discussed before that adding descriptions of optimization algorithms to the PyTorch core documentation may result in a nice optimization research tutorial. The following tracking issue lists all the necessary algorithms with links to the originally published papers: https://github.com/pytorch/pytorch/issues/63236.

In this PR we are adding a description of the Adam algorithm to the documentation. For more details, we refer to the paper https://arxiv.org/abs/1412.6980.

<img width="442" alt="Screen Shot 2021-08-27 at 6 37 54 PM" src="https://user-images.githubusercontent.com/73658284/131195297-35fce613-3691-4fed-b42d-db234d4fcd7c.png">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63251

Reviewed By: albanD

Differential Revision: D30779163

Pulled By: iramazanli

fbshipit-source-id: 319a80fc3952793b0d064d0e641ddc1de3c05a86
2021-09-07 11:03:35 -07:00
adb85b32d3 minor fix for elastic doc (#64531)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64531

fix #64530

Test Plan: unit test

Reviewed By: mrshenli

Differential Revision: D30760879

fbshipit-source-id: 94ed1476e886513427d928a36f5be6b9bfff0826
2021-09-07 09:31:01 -07:00
26b7ff5aea deprecate dtype getters from torch.testing namespace (#63554)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63554

Following https://github.com/pytorch/pytorch/pull/61840#issuecomment-884087809, this deprecates all the dtype getters publicly exposed in the `torch.testing` namespace. The reason for this is twofold:

1. If someone is not familiar with the C++ dispatch macros PyTorch uses, the names are misleading. For example `torch.testing.floating_types()` will only give you `float32` and `float64` skipping `float16` and `bfloat16`.
2. The dtype getters provide very minimal functionality that can be easily emulated by downstream libraries.

We thought about [providing a replacement](https://gist.github.com/pmeier/3dfd2e105842ad0de4505068a1a0270a), but ultimately decided against it. The major problem is BC: if we keep it, either the namespace gets messy again whenever a new dtype is added, or we need to somehow version the return values of the getters.
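
For example, a downstream library can emulate a getter in a couple of lines (a minimal sketch; the function name and the chosen tuple are assumptions, not a drop-in for the deprecated getter):
```py
import torch

def all_floating_types():
    # Unlike the deprecated torch.testing.floating_types(), also cover
    # the low-precision floating dtypes.
    return (torch.float16, torch.bfloat16, torch.float32, torch.float64)
```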

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D30662206

Pulled By: mruberry

fbshipit-source-id: a2bdb10ab02ae665df1b5b76e8afa9af043bbf56
2021-09-07 08:58:51 -07:00
f767cf6683 To change WarmUp Scheduler with ConstantLR and LinearLR (#64395)
Summary:
Partially unblocks https://github.com/pytorch/vision/issues/4281

Previously we added warm-up schedulers to PyTorch core in PR https://github.com/pytorch/pytorch/pull/60836; they had two modes of execution - linear and constant - depending on the warm-up function.

In this PR we are changing this interface to a more direct form, separating the linear and constant modes into separate schedulers. In particular,

```Python
scheduler1 = WarmUpLR(optimizer, warmup_factor=0.1, warmup_iters=5, warmup_method="constant")
scheduler2 = WarmUpLR(optimizer, warmup_factor=0.1, warmup_iters=5, warmup_method="linear")
```

will look like

```Python
scheduler1 = ConstantLR(optimizer, warmup_factor=0.1, warmup_iters=5)
scheduler2 = LinearLR(optimizer, warmup_factor=0.1, warmup_iters=5)
```

correspondingly.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64395

Reviewed By: datumbox

Differential Revision: D30753688

Pulled By: iramazanli

fbshipit-source-id: e47f86d12033f80982ddf1faf5b46873adb4f324
2021-09-07 08:42:31 -07:00
75b9e4a128 [JIT] Freeze unrolls constant loops (#63614)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63614

There are a number of optimizations (`RemoveListMutation` in particular) that are tied to loop unrolling in `runOptimizations`. However, these were not invoked from `freeze_module` since the freezing pass should be idempotent.

This diff makes `runOptimizations` run `UnrollConstantLoops` instead of `UnrollLoops`. `freeze_module` is then able to run these optimizations.
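
A minimal sketch of the kind of pattern this lets `freeze` clean up (a constant-trip-count loop containing list mutation):
```py
import torch

class M(torch.nn.Module):
    def forward(self, x):
        outs = []
        for i in range(3):      # constant trip count -> unrollable
            outs.append(x + i)  # list mutation removable after unrolling
        return torch.stack(outs)

frozen = torch.jit.freeze(torch.jit.script(M()).eval())
```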

Test Plan: Observed that `freeze_module` applies `RemoveListMutation`

Reviewed By: eellison

Differential Revision: D30437356

fbshipit-source-id: cba04bd958a48ad51b151aa3264f3d5bbb1fc2a4
2021-09-07 08:06:47 -07:00
adbcc819cd Fix fx2trt SplitterBase non_tensor_input logic (#64286)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64286

During graph splitting, `_SplitterBase` supports taking into consideration whether the subnet boundary nodes
produce "supported" outputs that will cross the acc/non-acc boundary. Specifically, if the backend only
supports Tensor-based data passing across the boundary, then we cannot split the graph at a place where the node
output is a non-Tensor type (e.g., `Tuple[Tensor]`).

There's currently a bug in this logic: it does not correctly detect the output type of a Node. Instead of
using `Node.meta['tensor_meta']`, we should check `Node.meta['type']`.

`Node.meta['tensor_meta']` is not appropriate because this key will exist if the node output is an iterable
and one of the elements is of type `Tensor`. So `Tuple[Tensor]` would be wrongly considered "supported".
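
A minimal sketch of the corrected check (the helper name is hypothetical):
```py
import torch
from torch.fx import Node

def produces_supported_output(node: Node) -> bool:
    # Rely on the recorded Python type; 'tensor_meta' can also be present
    # for containers like Tuple[Tensor], which are not supported here.
    return node.meta.get('type', None) is torch.Tensor
```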

Test Plan:
arc lint
run CI tests

Reviewed By: yinghai, 842974287

Differential Revision: D30617147

fbshipit-source-id: e8ba70dfaddc05cafb8037d58fca73b7ccbb1a49
2021-09-07 04:02:29 -07:00
32fbeb170d Update error messages that use LAPACK error codes (#63864)
Summary:
This PR updates the` batchCheckErrors` and `singleCheckErrors` functions so that the error messages are defined only once.
`batchCheckErrors` function reuses `singleCheckErrors` now.

Fixes https://github.com/pytorch/pytorch/issues/63220, fixes https://github.com/pytorch/pytorch/issues/59779

cc jianyuh nikitaved pearu mruberry heitorschueroff walterddr IvanYashchuk xwang233 Lezcano

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63864

Reviewed By: ngimel

Differential Revision: D30672933

Pulled By: mruberry

fbshipit-source-id: 0ba37ff98ef278efdb12c3890aa07d687047da7a
2021-09-07 00:05:46 -07:00
1a1fb31cfa Support torch.concat alias, add cat OpInfo & remove OpInfo test_out skips {cat, stack, hstack, vtack, dstack} (#62560)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/61767

## Changes

- [x] Add `torch.concat` alias to `torch.cat`
- [x] Add OpInfo for `cat`/`concat`
- [x] Fix `test_out` skips (Use `at::native::resize_output` or `at::native::resize_output_check`)
  - [x] `cat`/`concat`
  - [x] `stack`
  - [x] `hstack`
  - [x] `dstack`
  - [x] `vstack`/`row_stack`
- [x] Remove redundant tests for `cat`/`stack`

~I've not added `cat`/`concat` to OpInfo `op_db` yet, since cat is a little more tricky than other OpInfos (should have a lot of tests) and currently there are no OpInfos for that. I can try to add that in a subsequent PR or maybe here itself, whatever is suggested.~
**Edit**: cat/concat OpInfo has been added.

**Note**: I've added the named tensor support for `concat` alias as well, maybe that's out of spec in `array-api` but it is still useful for consistency in PyTorch.
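
A quick sanity check of the new alias (minimal sketch):
```py
import torch

t = torch.ones(2, 2)
assert torch.equal(torch.concat([t, t], dim=0), torch.cat([t, t], dim=0))
```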

Thanks to krshrimali for guidance on my first PR :))

cc mruberry rgommers pmeier asmeurer leofang AnirudhDagar asi1024 emcastillo kmaehashi heitorschueroff krshrimali

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62560

Reviewed By: saketh-are

Differential Revision: D30762069

Pulled By: mruberry

fbshipit-source-id: 6985159d1d9756238890488a0ab3ae7699d94337
2021-09-06 23:57:18 -07:00
0a1aaff0de Remove dead code from THC (THCApply.cuh) (#64559)
Summary:
cc peterbell10

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64559

Reviewed By: mruberry

Differential Revision: D30769526

Pulled By: ngimel

fbshipit-source-id: 034a5c778a2b902cffa57b76511fa0dcdea26825
2021-09-06 21:26:08 -07:00
571a2becf3 Move ParallelNative and PureTorch to GHA (#64452)
Summary:
The ParallelTBB move is separated out into https://github.com/pytorch/pytorch/pull/64193 as it requires some further investigation.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64452

Reviewed By: seemethere, janeyx99

Differential Revision: D30738337

Pulled By: malfet

fbshipit-source-id: 81c46423e903058bd1a3e8553e8a10ce978eeefd
2021-09-06 11:40:44 -07:00
544c8e6a5d Mark functions in backend header as inline to suppress warning (#64098)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64098

Reviewed By: kimishpatel, iseeyuan

Differential Revision: D30593104

fbshipit-source-id: 328196b9bc4a89a28ad89bede7e337107976c303
2021-09-05 16:45:23 -07:00
bcc7e82371 Revert D30745610: [nnc] Make our exceptions c10::Errors, get C++ stacktraces
Test Plan: revert-hammer

Differential Revision:
D30745610 (18b2751ea1)

Original commit changeset: a1cfaa7364ef

fbshipit-source-id: 9b716053b96a65745240ddef1c456c44d5d09671
2021-09-05 16:08:09 -07:00
49fe829cae [Vulkan] Code Quality: Remove duplicate code for hardshrink and leaky_relu functions (#64405)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64405

Code quality improvement: removed duplicate code for hardshrink and leaky_relu functions.
ghstack-source-id: 137319378

Test Plan:
```
buck build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 //xplat/caffe2:pt_vulkan_api_test_binAndroid\#android-arm64 --show-output
adb push buck-out/gen/xplat/caffe2/pt_vulkan_api_test_binAndroid\#android-arm64 /data/local/tmp/vulkan_api_test
adb shell "/data/local/tmp/vulkan_api_test"
```

Reviewed By: SS-JIA

Differential Revision: D30690251

fbshipit-source-id: 5729d1f32946e42f41df77756a8313f297dd822f
2021-09-05 12:53:58 -07:00
1901c675e1 Back out "nn.functional.linear OpInfo" (#64517)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64517

Original commit changeset: ca41dbd98176

Test Plan: PyTorch CI

Reviewed By: ngimel

Differential Revision: D30758201

fbshipit-source-id: 2d3274293d340373b8af86083336607818019619
2021-09-05 02:25:00 -07:00
008bf6689b Back out "D30740897 Add fusion enabled apis" (#64500)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64500

D30740897 (39aeb3bf63) broke caffe2/torch/fb/module_factory/optimizers/tests:test_full_sync_optimizer_needed_coverage (https://fburl.com/test/mb46jxon) and blocked training_platform_unit_tests

{F660271297}

multsect results confirms

```
multisect --config FBCODE_TEST bisect 844424966128796 --workers 16 revisions --begin 09629edc --end fc86b434
D30740897 (39aeb3bf63)

```

{F660271232}

Test Plan:
```
buck test mode/opt //caffe2/torch/fb/module_factory/optimizers/tests:test_full_sync_optimizer_needed_coverage

Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/4785074671474181
    ✓ Pass: caffe2/torch/fb/module_factory/optimizers/tests:test_full_sync_optimizer_needed_coverage - main (3.729)
Summary
  Pass: 1

```

Differential Revision: D30753916

fbshipit-source-id: 302fd4113ef1f3069846be03edc2300d82b66719
2021-09-04 20:55:58 -07:00
18b2751ea1 [nnc] Make our exceptions c10::Errors, get C++ stacktraces (#64332)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64332

With this diff, if a compiler bug occurs (unlikely, I know!) we'll be able to get a c++ stacktrace leading to the exception, rather than just a terse message.  E.g.,
```
RuntimeError: UNSUPPORTED DTYPE
Exception raised from compilation_error at ../torch/csrc/jit/tensorexpr/exceptions.h:32 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x6b (0x7f966659b2eb in /fsx/users/bertrand/c\
onda/envs/pytorch/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x376f099 (0x7f966a195099 in /fsx/users/bertrand/conda/envs/pytorch/lib/python3.8/site-packages/torch/lib/libtorch_cuda.so)
frame #2: <unknown function> + 0x3763bf5 (0x7f966a189bf5 in /fsx/users/bertrand/conda/envs/pytorch/lib/python3.8/site-packages/torch/lib/libtorch_cuda.so)
frame #3: torch::jit::tensorexpr::CudaCodeGen::Initialize() + 0xdd8 (0x7f966a193368 in /fsx/users/bertrand/conda/envs/pytorch/lib/python3.8/site-packages/torch/lib/libtorch_cuda\
.so)
```

Test Plan: Imported from OSS

Reviewed By: huiguoo

Differential Revision: D30745610

Pulled By: bertmaher

fbshipit-source-id: a1cfaa7364ef4120de834e9cbe57ced1d082ab4e
2021-09-04 20:31:54 -07:00
6cac7ca980 Ensure num_threads is initialized in get_num_threads (#64486)
Summary:
Possible source of the recent layernorm CI failures. `lazy_init_num_threads` appears at the top of `parallel_for` and can change the number of threads set. So, we need to ensure `num_threads` is initialized during `get_num_threads` calls as well. It's already done this way for OpenMP, but is missing from other parallel backends.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64486

Reviewed By: mruberry

Differential Revision: D30752615

Pulled By: ngimel

fbshipit-source-id: 085873ce312edbee1254c0aaae30dec7fcfe2c57
2021-09-04 12:38:09 -07:00
604e885925 Automated submodule update: FBGEMM (#64338)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).

New submodule commit: 9ccb2714a9

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64338

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: jspark1105

Differential Revision: D30690319

fbshipit-source-id: 884d1f950cd1f7d2a77b79affb9215f285d5d0da
2021-09-04 00:44:28 -07:00
a91a278d60 Fix copy_transpose_valid condition for copy_same_type_transpose_ (#64425)
Summary:
Thanks to ngimel for the hint where the problem might be (https://github.com/pytorch/pytorch/issues/64358#issuecomment-910868849)!

I added a test that fails on master to verify the fix. The shape `(60, 60)` was chosen because of `MIN_SZ = 60 * 60` in `copy_transpose_valid`.

Fixes https://github.com/pytorch/pytorch/issues/64358

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64425

Reviewed By: mruberry

Differential Revision: D30752725

Pulled By: ngimel

fbshipit-source-id: f40370ea8365c94e30f8e8a3dcab5f3b3462464a
2021-09-03 18:50:33 -07:00
e4ff14ad59 [CUDA graphs] Error if attempting to capture uncapturable nccl (#64440)
Summary:
NCCL < 2.9.6 is not capturable. Attempting to capture it can cause nasty behavior (for example, I've seen capture succeed, but replay silently hang). PyTorch should preempt this with a friendlier error.

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse agolynski SciPioneer H-Huang mrzzd cbalioglu gcramer23

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64440

Reviewed By: mruberry

Differential Revision: D30733884

Pulled By: ngimel

fbshipit-source-id: 5f2df3cf5cc0e5e68f49bf22a80d9f58064dc7ec
2021-09-03 13:23:07 -07:00
0e3b45eaef Fix logical typo in _compare_trilu_indices (#64468)
Summary:
I'm pretty sure that repeating the same call twice is meaningless; the intent was to call `tril`/`tril_indices` in the first case and `triu`/`triu_indices` in the other.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64468

Reviewed By: mruberry

Differential Revision: D30744978

Pulled By: malfet

fbshipit-source-id: 7cd36789a7ebf1cc263fb2d875e479c05e7588a4
2021-09-03 10:22:49 -07:00
6831d8e379 Support Union in TorchScript (#64234)
Summary:
This PR replaces the https://github.com/pytorch/pytorch/pull/53180 PR stack, which has all the review discussions. A replacement was needed due to a messy Sandcastle issue.
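
A minimal sketch of the newly supported annotation:
```py
from typing import Union

import torch

@torch.jit.script
def to_tensor(x: Union[int, torch.Tensor]) -> torch.Tensor:
    if isinstance(x, int):
        return torch.tensor(x)
    return x
```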

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64234

Reviewed By: gmagogsfm

Differential Revision: D30656444

Pulled By: ansley

fbshipit-source-id: 77536c8bcc88162e2c72636026ca3c16891d669a
2021-09-03 06:12:24 -07:00
91b926fab3 Add fx2trt pass for removing duplicate output args (#64461)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64461

Fx2TRT does not support duplicate nodes in the output args tuple.

This pass removes duplicate output args from the target subnets and fixes their uses in the top level module where the subnets are called. This pass must be called after acc split on the top-level net and subsequent calls to the acc trace on the subnets.

This pass will change both the subnets and top level module.
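
A minimal FX-level sketch of the subnet side of such a pass, assuming the subnet returns a tuple (the fix-up of `getitem` uses in the top-level module is omitted):
```py
import torch.fx as fx

def dedupe_output_args(gm: fx.GraphModule) -> fx.GraphModule:
    output_node = next(n for n in gm.graph.nodes if n.op == 'output')
    args = list(output_node.args[0])  # entries of the output tuple
    uniq = []
    for a in args:
        if a not in uniq:             # drop repeated output entries
            uniq.append(a)
    output_node.args = (tuple(uniq),)
    gm.recompile()
    return gm
```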

Test Plan:
Run:

```
buck run mode/opt -c python.package_style=inplace //caffe2/torch/fb/fx2trt/tests/passes/:test_remove_duplicate_output_args

```

Reviewed By: yinghai

Differential Revision: D30740499

fbshipit-source-id: 98459f7677980b21c7bffda918158001285572db
2021-09-02 23:04:12 -07:00
39aeb3bf63 Add fusion enabled apis (#64429)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64429

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D30740897

Pulled By: eellison

fbshipit-source-id: 446aa63b5d763f1cfffea62547db7294368e3438
2021-09-02 22:19:09 -07:00
7031fbdc63 update optimize_for_inference docs (#64428)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64428

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D30740898

Pulled By: eellison

fbshipit-source-id: b94d2c3deb661a6ba048f19e8c1d5e1799667eeb
2021-09-02 22:17:58 -07:00
e1c3e5f830 [resubmit][FX] Prototype for guarding against mutable operations in tracing (#64467)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64467

Test Plan: Imported from OSS

Reviewed By: driazati

Differential Revision: D30744870

Pulled By: jamesr66a

fbshipit-source-id: fc652f8b17748f90dbeb83fabf3bd5bb57d6ff1a
2021-09-02 21:13:21 -07:00
cd82bc1af9 Skips layer norm OpInfo on tbb platform (#64469)
Summary:
The OpInfo tests appear to be discovering a layer norm x tbb issue that requires investigation. Skipping tests on that platform for now to restore CI signal.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64469

Reviewed By: ngimel

Differential Revision: D30745746

Pulled By: mruberry

fbshipit-source-id: 282484cc00b867fac85b7df61430d64277da6421
2021-09-02 20:53:01 -07:00
c19bd05e84 THC: Cleanup dead code (#64441)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64441

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D30735342

Pulled By: ngimel

fbshipit-source-id: 84ab36f7aec6b8cd7f1f34c19a58a382c06ad68d
2021-09-02 17:45:16 -07:00
db692ec0b3 Regenerate generated github workflows (#64465)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64465

These were out of date and causing master failures

Test Plan: Imported from OSS

Reviewed By: zhouzhuojie

Differential Revision: D30744594

Pulled By: driazati

fbshipit-source-id: 09a21c3c5d9bc83b368d66cabbafd1ba83302dd3
2021-09-02 17:31:29 -07:00
e161872aab Revert D30732630: [quant] Enable jit tracing on quantizable LSTM
Test Plan: revert-hammer

Differential Revision:
D30732630 (116142143c)

Original commit changeset: 443e351ebb0e

fbshipit-source-id: 49001392f01366f3b1ccc31139f824c80b86cd40
2021-09-02 17:08:26 -07:00
046ed57a4d Revert D30055886: [quant] AO migration of the quantize.py
Test Plan: revert-hammer

Differential Revision:
D30055886 (44e3ed88c9)

Original commit changeset: 8ef7470f9fa6

fbshipit-source-id: c5bd3ead43a2d44b9e56872ec5bd7a195bdac725
2021-09-02 16:59:59 -07:00
4968d0b34f [POC] .github: Add event name to concurrency (#64402)
Summary:
This would ensure that manually/API triggered workflows would not cancel other triggered workflows. For example, the manually triggered periodic 11.1 linux job cancelled the scheduled one here, which we may not want:
![image](https://user-images.githubusercontent.com/31798555/131752175-1c99d56e-d344-46e1-b8ac-9c12bba0569a.png).

This would be helpful later as we use more dispatched workflows (e.g., for bisect functionality)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64402

Reviewed By: malfet

Differential Revision: D30734860

Pulled By: janeyx99

fbshipit-source-id: 220016716094666e9af836fcd716dd529cf23d8a
2021-09-02 16:24:05 -07:00
b12f34e8c2 update rpc tensorpipe logic for sparse tensors (#62960)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62960

A bug was filed a few years ago about sending sparse tensors over rpc (#30807).

This PR updates the rpc/tensorpipe logic for CUDA sparse tensors. During serialization, the pickler.cpp implementation breaks a sparse tensor down into two tensors plus metadata. torch/csrc/distributed/rpc/tensorpipe_agent.cpp needs to be updated because it has no logic for sparse tensors: it pushes a single device for a sparse tensor, which is wrong because after serialization there are two tensors, and the second tensor would have no device and could therefore end up on the wrong target device. tensorpipe_utils.cpp needs to be updated because deserialization happens after the data is received on the target pipe; it takes the two tensors and the metadata that were sent and rebuilds the sparse tensor. There are two tpDescriptors but only one tensor after deserialization, so the logic is updated to verify that the sparse tensor is on the correct device using the first tpDescriptor.
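
For reference, the two dense tensors backing a sparse COO tensor (a minimal sketch):
```py
import torch

i = torch.tensor([[0, 1], [1, 0]])
v = torch.tensor([3.0, 4.0])
s = torch.sparse_coo_tensor(i, v, (2, 2)).coalesce()
print(s.indices())  # dense int64 tensor of coordinates
print(s.values())   # dense tensor of the corresponding values
```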

This PR also updates ivalue.cpp and ivalue.h to support more paths for sparse COO tensors.

I tested these changes by adding sparse tests to rpc_test.py and dist_autograd_test.py.

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D30717285

Pulled By: gcramer23

fbshipit-source-id: daee9a56764550f56b131f9dd8e74e23113d6714
2021-09-02 16:16:19 -07:00
32a93c2424 Revert D30675780: [FX] Prototype for guarding against mutable operations in tracing
Test Plan: revert-hammer

Differential Revision:
D30675780 (795387477f)

Original commit changeset: b2116b51dcc8

fbshipit-source-id: d4f1173f4989556ea54974f4c2739ef85a705fae
2021-09-02 16:07:29 -07:00
116142143c [quant] Enable jit tracing on quantizable LSTM (#64438)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64438

The quantizable LSTM didn't support jit tracing because it had several non-traceable paths. We sacrifice some of the user experience to enable tracing.

The main UX feature removed is a user-friendly message when trying to access the backwards path when the bidirectional flag is `False`: we used to throw a nice error message when the user tried accessing backwards weights. Now the message is the default one (the properties were removed).

Test Plan: `buck test mode/dev //caffe2/test:quantization -- test_custom_module_lstm`

Reviewed By: mtl67

Differential Revision: D30732630

fbshipit-source-id: 443e351ebb0e2b636c86dea9691b9bf42ffe618f
2021-09-02 15:59:20 -07:00
795387477f [FX] Prototype for guarding against mutable operations in tracing (#64295)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64295

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D30675780

Pulled By: jamesr66a

fbshipit-source-id: b2116b51dcc87357f0c84192c4c336680875e27a
2021-09-02 15:17:04 -07:00
3c79e0b314 .github: Migrate pytorch_linux_bionic_py_3_6_clang9 to GHA (#64218)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64218

Relies on https://github.com/fairinternal/pytorch-gha-infra/pull/11

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

cc ezyang seemethere malfet walterddr lg20987 pytorch/pytorch-dev-infra bdhirsh

Test Plan: Imported from OSS

Reviewed By: malfet, H-Huang, janeyx99

Differential Revision: D30651516

Pulled By: seemethere

fbshipit-source-id: e5843dfe84f096f2872d88f2e53e9408ad2fe399
2021-09-02 14:51:00 -07:00
257623da39 Switch Shuffler to use iter-local buffer (#64195)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64195

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D30642947

Pulled By: ejguan

fbshipit-source-id: d4b52479b4ae37ad693388b9cdb8eed83a136474
2021-09-02 13:40:28 -07:00
f555348aaa Disable CircleCI ROCm build (#64434)
Summary:
Per jithunnair-amd's suggestion

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64434

Reviewed By: seemethere, janeyx99

Differential Revision: D30732289

Pulled By: malfet

fbshipit-source-id: 1932d0a7d1e648006f8030c8237b187d0709f688
2021-09-02 13:32:02 -07:00
4ce9c530d6 [DataPipe] removing filter's inheritance from map (#64404)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64404

This PR removes `filter`'s inheritance from `map`. This allows `filter` to not have a `__len__` function, which is the behavior we want.

cc VitalyFedyunin ejguan

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D30713120

Pulled By: NivekT

fbshipit-source-id: 4d5d07555297ee2bd4b49842c0d26cdc00638f6c
2021-09-02 13:09:47 -07:00
4f43480186 [DataPipe] adding/removing __len__ for different DataPipe (#64398)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64398

cc VitalyFedyunin ejguan

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D30710437

Pulled By: NivekT

fbshipit-source-id: 524eda43a2faa0db0c1a662bf9bb4283f0ade83c
2021-09-02 13:08:32 -07:00
3cd0a4ac15 Fix test_ind_worker_queue by setting max_num_worker based on system resource (#63779)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63779

Fixes #63657

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D30494185

Pulled By: ejguan

fbshipit-source-id: d1bd24299b25d589889604aaf18ad347bdff4df4
2021-09-02 12:36:56 -07:00
7d010539c9 ENH Adds test and docs for modules that already support no batch dims (#62729)
Summary:
Towards https://github.com/pytorch/pytorch/issues/60585

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62729

Reviewed By: H-Huang

Differential Revision: D30669546

Pulled By: jbschlosser

fbshipit-source-id: c771c98c1fd9d28fa984b72893585c738c736505
2021-09-02 12:36:54 -07:00
d0cb26ba57 [DDP] Fix logging iterations (#64411)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64411

These are not actually the training iterations; they are offset by how
frequently DDP stats collection actually runs (the default being
kDDPRuntimeLoggingSampleRate = 100). So with this change, they are
logged to scuba at iterations 10, 10 * 100, 40 * 100, etc.

Test Plan: CI

Reviewed By: zhaojuanmao

Differential Revision: D30718274

fbshipit-source-id: 146bd2428753c93363bee37e487f40104fce3c18
2021-09-02 12:35:01 -07:00
22f3bcd164 .github: Move squid vars to common vars (#64436)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64436

Moves the squid variables to our common jinja template so that when we
have to update them they're all in the same place.

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

cc ezyang seemethere malfet lg20987 pytorch/pytorch-dev-infra

Test Plan: Imported from OSS

Reviewed By: malfet, zhouzhuojie

Differential Revision: D30732776

Pulled By: seemethere

fbshipit-source-id: 22e3757c4eec775baa8abbaac2ba2a0c69c2b2a9
2021-09-02 11:31:54 -07:00
c932afe39b .github: Move upload-artifact-s3 to common var (#64435)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64435

Move upload-artifact-s3 to a common variable to be used amongst our
jinja templates; this should make it easier to update these images
in the future.

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

cc ezyang seemethere malfet lg20987 pytorch/pytorch-dev-infra

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D30732777

Pulled By: seemethere

fbshipit-source-id: 51cd485f5abae134c3c49dfa878e6303ba8e5f25
2021-09-02 11:31:52 -07:00
1519b6084f nn.functional.linear OpInfo (#61971)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61971

Test Plan: - wait for tests

Reviewed By: heitorschueroff

Differential Revision: D30013750

Pulled By: zou3519

fbshipit-source-id: ca41dbd98176c12e50ad1410a658f4b06fe99a1e
2021-09-02 11:31:50 -07:00
c0cdbb1cc5 Revert D30468409: Add fx2trt pass for removing duplicate output args
Test Plan: revert-hammer

Differential Revision:
D30468409 (6da7552a8e)

Original commit changeset: b4d91b76ab5d

fbshipit-source-id: e138dc425fe55ffe3585ea5fac4db476931bafed
2021-09-02 11:31:49 -07:00
9214450b7f [tensorexpr] Wrap error msgs with buildErrorMessages for internal asserts (#64409)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64409

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D30717786

Pulled By: huiguoo

fbshipit-source-id: a3b147d339ff4927f14efa24407cd3b63d80001d
2021-09-02 11:30:34 -07:00
6da7552a8e Add fx2trt pass for removing duplicate output args (#64433)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64433

Fx2TRT does not support duplicate nodes in the output args tuple.

This pass removes duplicate output args from the target subnets and fixes their uses in the top level module where the subnets are called. This pass must be called after acc split on the top-level net and subsequent calls to the acc trace on the subnets.

This pass will change both the subnets and top level module.

Test Plan:
Run:

```
buck run mode/opt -c python.package_style=inplace //caffe2/torch/fb/fx2trt/tests/passes/:test_remove_duplicate_output_args

```

Reviewed By: 842974287

Differential Revision: D30468409

fbshipit-source-id: b4d91b76ab5d8a5275d68dd48d1327a44c22568e
2021-09-02 10:40:37 -07:00
aeafcde087 CI: Enable using labels to control GHA workflows (#64314)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/62852

Sets a global environment variable containing a list of PR labels. For this PR, the PR_LABELS variable looks like:
```
[
  "cla signed",
  "ciflow/default"
]
```
confirmed in a run: https://github.com/pytorch/pytorch/runs/3490072161?check_suite_focus=true

This information can be used in other workflow steps to control the logic. For example, if I want to force a build, I can label my PR with "force-build" and do something like the following in my build script:
```
if [[ "${PR_LABELS}" = *force-build* ]]; then
   python setup.py install
else
   #use cached wheel or something
fi
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64314

Reviewed By: driazati

Differential Revision: D30714570

Pulled By: janeyx99

fbshipit-source-id: 80b060ee32643ddd22eb7b8ec548579c7ccf6441
2021-09-02 10:34:44 -07:00
66ddc6ef9e Fixes and details to torchhub docs (#63783)
Summary:
This PR:

- adds a few details regarding the newly added `skip_validation` parameter https://github.com/pytorch/pytorch/pull/62139
- uses double-backticks instead of single-backticks since this is rst, not markdown.
- adds a few minor doc nits here and there

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63783

Reviewed By: zou3519

Differential Revision: D30696658

Pulled By: NicolasHug

fbshipit-source-id: 6f01c7eb3cfcd7e17e4c33c09d193054fa18ad36
2021-09-02 09:32:57 -07:00
50067c020a TST Adds __repr__ and str to module info (#63737)
Summary:
Follow up to https://github.com/pytorch/pytorch/pull/61935

This PR adds `test_repr` to `test_modules`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63737

Reviewed By: gchanan

Differential Revision: D30729642

Pulled By: jbschlosser

fbshipit-source-id: c11a28bc0739abd3ed40727389dd28ed4069edad
2021-09-02 09:32:55 -07:00
2c258d91cc Fix torch.istft length mismatch and window runtime error (#63469)
Summary:
The PR fixes two issues:
- See https://github.com/pytorch/pytorch/issues/62747 and https://github.com/pytorch/audio/issues/1409: there is a length mismatch when the given ``length`` parameter is longer than expected. This adds padding logic consistent with librosa.
- See https://github.com/pytorch/pytorch/issues/62323: the current implementation checks that the min value of window_envelop.abs() is greater than zero. In librosa, the signal is normalized only at the non-zero values via indexing, like:
```
approx_nonzero_indices = ifft_window_sum > util.tiny(ifft_window_sum)
y[approx_nonzero_indices] /= ifft_window_sum[approx_nonzero_indices]
```
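
For reference, a minimal round-trip where the fixed ``length`` handling matters (a sketch, not code from the PR):
```py
import torch

x = torch.randn(1000)
window = torch.hann_window(256)
spec = torch.stft(x, n_fft=256, window=window, return_complex=True)
y = torch.istft(spec, n_fft=256, window=window, length=1000)
assert y.shape == x.shape  # output is padded/trimmed to the requested length
```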

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63469

Reviewed By: fmassa

Differential Revision: D30695827

Pulled By: nateanl

fbshipit-source-id: d034e53f0d65b3fd1dbd150c9c5acf3faf25a164
2021-09-02 09:31:47 -07:00
616fd9219d [Static Runtime] Add sign/abs/lop1p/mul fusion pass (#64209)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64209

Add a new fusion pass that transforms the following pattern:
```
graph(%input):
    %0 : Tensor = aten::sign(%input)
    %1 : Tensor = aten::abs(%input)
    %2 : Tensor = aten::log1p(%1)
    %res : Tensor = aten::mul(%0, %2)
    return (%res)
```
Into a single op:
```
graph(%input):
    %res : Tensor = static_runtime::signed_log1p(%input)
    return (%res)
```
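
For reference, the fused op computes the following (eager-mode sketch):
```py
import torch

def signed_log1p(x: torch.Tensor) -> torch.Tensor:
    # Equivalent to the four-op pattern above: sign(x) * log1p(|x|).
    return torch.sign(x) * torch.log1p(torch.abs(x))
```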

The intent is to reduce the number of passes over the tensor. However, enabling this pass actually causes a performance regression, probably due to a lack of vectorization in the fused implementation. Because of this issue, this diff **does not** enable this pass.

Followup: navahgar will add an NNC kernel which is faster than the unfused version and enable this pass. We still need this version as a fallback since the NNC kernel will not support all dtypes.

Test Plan:
`buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest -- SignedLog1p`

Test passed with new graph pass disabled and enabled.

Reviewed By: hlu1

Differential Revision: D30559929

fbshipit-source-id: e4e080cb2e6a705cfdde1fc98bee92b723f8132a
2021-09-02 08:31:40 -07:00
cd3be4675f [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D30710635

fbshipit-source-id: e8dae05a7e3a19d656067a4f102aab4a3c93ac42
2021-09-02 08:31:37 -07:00
f04e6594ed Fix broken caffe2 test: PlanExecutorTest.BlockingErrorPlan (#64401)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64401

PlanExecutorTest.BlockingErrorPlan uses `ASSERT_DEATH` which internally performs a `fork()`. This can cause problems under certain configurations that use threads. This change updates this test to use the "threadsafe" style for GTest death tests in order to improve its quality in multithreaded environments.

Test Plan:
I confirmed that this change fixes the issue on my devvm with the following command:
```
buck test mode/dev //caffe2/caffe2:caffe2_test_cpu -- PlanExecutorTest.BlockingErrorPlan
```

Reviewed By: praihan

Differential Revision: D30709447

fbshipit-source-id: 12ffd9ad0371e2e5b43a9873c80568e5ab02d246
2021-09-02 08:30:29 -07:00
b737629ff0 simplify op name determination into a single forward pass (#64261)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64261

Note that this does not preserve byte-for-byte compatibility with
existing names.

Test Plan:
* Rely on CI to catch gross errors.
* Merge after release cut to catch subtle issues.

Reviewed By: albanD

Differential Revision: D30700647

Pulled By: dagitses

fbshipit-source-id: 7b02f34b8fae3041240cc78fbc6bcae498c3acd4
2021-09-02 07:32:11 -07:00
b2c7c1dfcf fix copy.deepcopy on LinearPackedParams (#64367)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64367

This is the same thing as https://github.com/pytorch/pytorch/pull/56154
but for quantized linear. It fixes the behavior of `copy.deepcopy` on
these modules. Before this PR, copied instances of `LinearPackedParams`
were not properly initialized, and inspecting them raised errors about
missing `_modules`. After this PR, inspecting and using the copies
works.
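
A minimal sketch of the now-working behavior:
```py
import copy

import torch.nn.quantized as nnq

m = nnq.Linear(4, 4)
m2 = copy.deepcopy(m)  # previously produced a broken, partially-initialized copy
print(m2)              # inspecting the copy no longer raises
```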

Test Plan:
```
python test/test_quantization.py TestStaticQuantizedModule.test_linear_api
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D30702667

fbshipit-source-id: 38c26d1e72663416eeb989985b77ffc2052c12b9
2021-09-02 06:30:42 -07:00
99b064fac4 [jit] shape propagation for prepack (#63585)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63585

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D30428905

Pulled By: IvanKobzarev

fbshipit-source-id: c18f6605a69b2e000bdf14a23e637c5a1c2ec64c
2021-09-02 05:30:38 -07:00
cdb46f4c6e extract TestAutogradComplex into its own test file (#63400)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63400

This is the first step to break up test_autograd.py for #63205.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D30541499

Pulled By: dagitses

fbshipit-source-id: 8d9d32007938b9eade0e88f95a6a3190e7e2ef01
2021-09-02 04:34:35 -07:00
be5b05c1dc require that TARGET_DET_LIST is sorted (and sort it here) (#64102)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64102

We sort this list so that we may add comments to indicate the absence
of a file right where that file would need to be put. This makes it
difficult to wrongly add such a file.

The sorting itself was done programmatically to ensure that no entries
were inadvertently removed.

I printed the sorted list with:

```
  for p in sorted(TARGET_DET_LIST):
    print(f'    "{p}",')
```

Then copied it back into the file.

Test Plan: Imported from OSS

Reviewed By: driazati

Differential Revision: D30625076

Pulled By: dagitses

fbshipit-source-id: cf36fcb3e53e274b76d1f4aae83da1f53c03f9ed
2021-09-02 04:34:33 -07:00
aedd70fcfe Fix list() and help() torchhub functions for Windows (#63773)
Summary:
This PR fixes the help() and list() torchhub functions, which were probably failing on Windows since the `/` OS separator was hardcoded.

Before merging this I need to double check whether the CI actually runs the corresponding tests on Windows or not

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63773

Reviewed By: zou3519

Differential Revision: D30695664

Pulled By: NicolasHug

fbshipit-source-id: fac328163fd05db804a8186ae28f22b3cc3a6404
2021-09-02 04:34:31 -07:00
030154e241 Remove outdated comment in hub.py (#63757)
Summary:
This PR removes an outdated comment about Python2 that was originally introduced in https://github.com/pytorch/pytorch/pull/25083/files. The code has changed since then, but the comment wasn't removed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63757

Reviewed By: zou3519

Differential Revision: D30695656

Pulled By: NicolasHug

fbshipit-source-id: 431cf414588b9e5a1ad6acdae724ff5af1b16971
2021-09-02 04:34:29 -07:00
1c735768ed Update hub.load() signature to avoid polluting kwargs param (#63755)
Summary:
This PR addresses an old comment about Python2 EOL, directly putting some parameters in the function signature instead of in a `**kwargs` dict.

I believe the changes are fully backward compatible.
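
A hedged sketch of the shape of the change (the parameter list here is illustrative, not the exact final signature):

```python
# Before: common options were popped out of **kwargs inside the function.
# After: they become explicit keyword arguments in the signature.
def load(repo_or_dir, model, *args, source="github", force_reload=False,
         verbose=True, **kwargs):
    ...
```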

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63755

Reviewed By: zou3519

Differential Revision: D30695634

Pulled By: NicolasHug

fbshipit-source-id: 398f347c5a04bfb58e77e46773a869cb9d0eb225
2021-09-02 04:32:22 -07:00
6db8f7a709 Fix TRTModule not adding outputs in order (#64418)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64418

In T99368564, we found that when running a TRT lowered module, the output tensors are out-of-order compared to the output from the original, non-lowered module. It turns out that in `TRTModule.forward()`, we cannot rely on the `ICudaEngine` bindings' natural order indices to create the output tensors; rather, we should explicitly construct the output tensors from the bindings' names, in an order that we supply.
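
A minimal sketch of the idea, with assumed names (not the actual `TRTModule` code):

```python
def gather_outputs(engine, bindings, output_names):
    # Look bindings up by name, in the order we supply, instead of
    # trusting the engine's natural binding-index order.
    outputs = []
    for name in output_names:
        idx = engine.get_binding_index(name)
        outputs.append(bindings[idx])
    return outputs
```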

Test Plan:
* Arc lint
* Run CI/sandcastle tests
* Run GPU lowering using commands and code changes in D30171741 and ensure we don't observe out-of-order outputs

Reviewed By: yinghai

Differential Revision: D30693545

fbshipit-source-id: 32a894ceeb148fcf4e8d279be3835c7d1f1aa2ba
2021-09-02 01:36:23 -07:00
76e187aa08 Port gather to structured kernel (#63312)
Summary:
Will add a description once this is ready for review.

cc: ysiraichi ezyang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63312

Reviewed By: iramazanli

Differential Revision: D30597447

Pulled By: ezyang

fbshipit-source-id: d36e59835c2f4b38e286032dd2a1111a7e16b7e5
2021-09-02 01:36:21 -07:00
ee8a6c1d14 Replace std::unordered_map<c10::Device, c10::Device> with DeviceMap (#64393)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64393

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse agolynski SciPioneer H-Huang mrzzd cbalioglu gcramer23

Test Plan: Imported from OSS

Reviewed By: rohan-varma

Differential Revision: D30708384

Pulled By: pbelevich

fbshipit-source-id: 1c565727e4f09cd9e560874dd90aa403470b4a97
2021-09-02 01:36:19 -07:00
8d5b95019d [PyTorch Edge] Support default args with out arg, flag off (#63540)
Summary:
1. Allow consuming operators with default arguments and out arguments. The flag is off to keep the same behavior as v6; PR #63651 turns the flag on.
2. Add two unittests to cover this type of operators.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63540

ghstack-source-id: 137211562

Test Plan:
```
caffe2/test/cpp/jit:jit - LiteInterpreterTest.DefaultArgsWithOutArg
caffe2/test/cpp/jit:jit - LiteInterpreterTest.DefaultArgsPinvWithOutArg
```

Reviewed By: raziel, iseeyuan, tugsbayasgalan

Differential Revision: D30414156

fbshipit-source-id: 0f3a219a22aee10ac53184cbd95940726c459d1f
2021-09-02 01:36:16 -07:00
0addd75be9 Remove unnecessary resize_output (#64272)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64272

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: H-Huang, bdhirsh

Differential Revision: D30686941

Pulled By: ezyang

fbshipit-source-id: de60e6f1115648f8cf7daaa1e652594fe8b06742
2021-09-02 01:34:17 -07:00
69e1207084 Move graph util to fx2trt (#64064)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64064

Move the original util from torch2trt to the fx2trt dir, since torch2trt is going to be deprecated. This is a follow-up diff for D30379124.

Test Plan: manual

Reviewed By: yinghai, mikekgfb

Differential Revision: D30591687

fbshipit-source-id: ae0e59dfbc2d2e2aa4f3ccea7cff2291c7deb388
2021-09-01 22:34:11 -07:00
71e149834b Add a warning about DataLoader num_workers > 0 "memory leak" (#64337)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64337

See https://github.com/pytorch/pytorch/issues/13246

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D30690320

Pulled By: ezyang

fbshipit-source-id: 2751aca05a94e63d25162599f458855988516fad
2021-09-01 21:49:41 -07:00
d067f15622 [Dist CI] Move rest of distributed tests to their own CI job (#64253)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64253

Follow up to D30496178 (f4aff3a346) to move the rest of distributed tests to their own jobs for Linux GHA.
ghstack-source-id: 137233785

Test Plan: CI

Reviewed By: walterddr

Differential Revision: D30662999

fbshipit-source-id: f7cfbc0d1223aca52120f17f9da987d70fda8de6
2021-09-01 21:43:41 -07:00
4d6314a16e [DDP] Log num threads (#64072)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64072

Log gloo threads to DDP logging.
ghstack-source-id: 137119480

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D30596083

fbshipit-source-id: 2b4f6e762cb5d850be6056bcc5922029a1af3c91
2021-09-01 18:36:15 -07:00
59c6ceb6a8 add documentation to shape inference algorithm (#64312)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64312

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D30709254

Pulled By: migeed-z

fbshipit-source-id: 3297d26fe6727c5b9ca176625b1683d787f59659
2021-09-01 18:34:17 -07:00
778af56504 [DDP Comm Hook] Add debugging communication hooks to ddp_comm_hooks.rst (#64352)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64352

as title
ghstack-source-id: 137246253

Test Plan: N/A

Reviewed By: rohan-varma

Differential Revision: D30694089

fbshipit-source-id: a78110b11d59bb0718f43c99ede23f2fd8ab21d0
2021-09-01 17:37:19 -07:00
bf9d66586c [DDP Comm Hook] Create a noop hook for performance debugging (#64344)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64344

As title.

Additionally, avoid using numpy array in test_ddp_hooks.py.
ghstack-source-id: 137170449

Test Plan: buck test mode/dev-nosan caffe2/test/distributed/algorithms/ddp_comm_hooks:test_ddp_hooks -- test_ddp_comm_hook_noop_hook

Reviewed By: rohan-varma

Differential Revision: D30693220

fbshipit-source-id: e17f0d1c6198863cf20a53566f586a6bff602522
2021-09-01 17:36:22 -07:00
baceea4426 [DDP] Add more logging iterations (#64071)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64071

Adding more logging iterations to get additional data.
ghstack-source-id: 137119476

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D30579367

fbshipit-source-id: 57195266ada5e5926f0d8eaf4fb4e01dc98924d7
2021-09-01 17:32:17 -07:00
59fcbd172b Fix incorrect DDP test (#64074)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64074

Previous PR https://github.com/pytorch/pytorch/pull/63831 did not actually test the error in https://github.com/pytorch/pytorch/issues/63812. Introduce a test
directly from the repro that simulates it.
ghstack-source-id: 137171460

Test Plan: CI

Reviewed By: SciPioneer

Differential Revision: D30569719

fbshipit-source-id: fd61250ef6d291c093607663d91d6d2cb5574eb7
2021-09-01 16:34:06 -07:00
9b8f9d5a25 [c10d] Prefer use of torch_check (#63928)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63928

`throw std::invalid_argument` results in not getting stack traces with
TORCH_SHOW_CPP_STACKTRACES=1, so instead prefer `TORCH_CHECK` here.
ghstack-source-id: 137135328

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D30533955

fbshipit-source-id: 33e5bf4f449e3043dec68da93f8022f6624d9675
2021-09-01 16:34:05 -07:00
5d80a48cef Add fast path for addmm when the inputs are conjugate (#59380)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59380

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D28898374

Pulled By: anjali411

fbshipit-source-id: eab0e64d37bb57c18b54cabb8e5c00666338ba04
2021-09-01 16:34:02 -07:00
a8f9aab840 [DDP Comm Hook] Add bf16 gradient compression to ddp_comm_hooks.rst (#64346)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64346

as title
ghstack-source-id: 137170288

Test Plan: N/A

Reviewed By: rohan-varma

Differential Revision: D30693513

fbshipit-source-id: 8c64b8404ff3b0322e1bbbd93f6ef051ea91307d
2021-09-01 16:34:00 -07:00
ed89937d2c [quant][graphmode][fx] Add fbgemm backend_config_dict (#64288)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64288

This is just to set up the file structure and unblock experimentation.
The format for backend_config_dict will change in the future.

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: zou3519

Differential Revision: D30699457

fbshipit-source-id: 28211a4def05d34757850c045a36e311f54760fe
2021-09-01 16:32:43 -07:00
69f4401b7b Make datasets in ConcatDataset not need to be sized (#64114)
Summary:
`datasets` needs to be iterable, but it also has to be sized because its length is checked, even though it is converted to a list immediately afterwards. By swapping the order of these two lines, it no longer needs to be sized.
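
A simplified sketch of the reordering (not the full class):

```python
class ConcatDataset:
    def __init__(self, datasets):
        # Materialize first: the input now only needs to be iterable...
        self.datasets = list(datasets)
        # ...and the length check happens on the resulting list.
        assert len(self.datasets) > 0, "datasets should not be an empty iterable"
```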

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64114

Reviewed By: H-Huang

Differential Revision: D30641480

Pulled By: ejguan

fbshipit-source-id: 7e16548c2123afa65b83845f9929271fa07fe1e8
2021-09-01 15:32:50 -07:00
535526b95c Restore LayerNorm numerics test (#64385)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64385

It was deleted in https://github.com/pytorch/pytorch/pull/63276.

The numerics test was meant to check LayerNorm behavior on large inputs,
but we deleted it without realizing that.

Test Plan: - wait for tests.

Reviewed By: ngimel

Differential Revision: D30702950

Pulled By: zou3519

fbshipit-source-id: a480e26c45ec38fb628938b70416cdb22d976a46
2021-09-01 15:32:49 -07:00
7ffcf15503 [quant][graphmode][api] Add backend_config_dict to prepare_fx api (#64135)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64135

We want to start aligning the api with the design in https://github.com/pytorch/pytorch/wiki/Extending-PyTorch-Quantization-to-Custom-Backends

We plan to gradually move things from `prepare_custom_config_dict` and `convert_custom_config_dict`
to `backend_config_dict` and allow custom backend developers to define their own way of quantizing operators.
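
A hedged usage sketch of the new argument (the dict's schema was explicitly still in flux at this point):

```python
import torch
from torch.quantization import get_default_qconfig
from torch.quantization.quantize_fx import prepare_fx

model = torch.nn.Sequential(torch.nn.Linear(4, 4)).eval()
qconfig_dict = {"": get_default_qconfig("fbgemm")}
# Passing None keeps the default behavior; backends supply their own dict.
prepared = prepare_fx(model, qconfig_dict, backend_config_dict=None)
```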

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: zou3519

Differential Revision: D30699456

fbshipit-source-id: e3c068da8d3da2270f57719f7159cc71cafa8598
2021-09-01 15:32:47 -07:00
93bc03622e Silent rm error for sccache log file (#64388)
Summary:
Sample reporting from dr.ci

![image](https://user-images.githubusercontent.com/658840/131724645-75afa04f-7554-4674-8e7c-cf139c84d994.png)

The `rm` command is not actually running into problems; we just need to silence the console output.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64388

Reviewed By: walterddr, malfet, seemethere

Differential Revision: D30704439

Pulled By: zhouzhuojie

fbshipit-source-id: ecd35531decf05b75cef30d08d46635f81112f67
2021-09-01 15:32:45 -07:00
9495674905 [xplat][metal] Add getters and setters for ivars in Conv2dOpContext (#57395)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57395

As title
ghstack-source-id: 137223806

(Note: this ignores all push blocking failures!)

Test Plan:
### Lib Build
- `buck build caffe2:aten_metal_prepack`

### Integration Test
- `arc focus2 pp-ops -a ModelRunner`
- Click "Test Person/Hair Segmentation Model"

{F612831435}

- Image Classification Demo

{F614144868}

Reviewed By: xta0

Differential Revision: D28132020

fbshipit-source-id: 73560263a9d14e9ecfa39c69deb158a2ed8cb179
2021-09-01 15:31:12 -07:00
968d7ee46a [structured] Preserve computed elements from meta func to impl (#61746)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61746

**Summary**
This commit introduces a new feature for structured kernels that allows
kernels to declare quantities as "precomputed" in
`native_functions.yaml`, compute them once in the `meta` function and
reuse them again in the `impl`. The names and types of these quantities
are used to generate code for a struct containing them that the `meta`
function must return. In the case of a handful of surveyed kernels
(`all,`, `any`, `avg_pool2d`), these quantities that are used both in
the `meta` and `impl` have the same meaning as certain kernel arguments
and in fact supersede them. Accordingly, the correspondence between a
kernel argument and the precomputed elements that supersede it is also
captured in `native_functions.yaml`. This information is used to unpack
the struct returned by `meta` and pass its contents correctly to the
`impl` function.

The primary goal is to avoid recompute and enhance developer experience
(e.g. sometimes people can forget to compute these elements while
porting a kernel).

Test Plan: Imported from OSS

Reviewed By: tugsbayasgalan

Differential Revision: D30407831

Pulled By: SplitInfinity

fbshipit-source-id: 00975525ea373721fe52d06f75cd4ac91f3dc556
2021-09-01 14:34:25 -07:00
4aad366111 [Static Runtime] Make per-op latency readable by FAI-PEP (#64315)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64315

Add a new flag `generate_ai_pep_output` to `StaticRuntime::benchmark`. If set, produces per-op-kind average total latency in milliseconds in a JSON format recognized by [Facebook AI performance evaluation platform (FAI-PEP)](https://github.com/facebook/FAI-PEP).

This is useful for observing the impact of changes that make a big difference for a specific op, but do not affect the overall SR latency by more than a few percent.
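
For illustration, FAI-PEP scrapes stdout for JSON records along these lines (the field names and values here are assumptions, not the exact schema this diff emits):

```python
import json

record = {"type": "aten::sigmoid", "metric": "latency", "unit": "ms", "value": "0.05"}
print("PyTorchObserver " + json.dumps(record))  # the prefix marks lines FAI-PEP parses
```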

Reviewed By: hlu1

Differential Revision: D30679352

fbshipit-source-id: c847fa6ea20774aaf1e7949b11db4421d1f70b7e
2021-09-01 14:34:22 -07:00
86c9654291 Update optimize_for_mobile to preserve node's debug information (#63106)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63106

Propagate debug info to the re-written nodes in the graph.

Test Plan:
- Clone open source repo and build
- ``` python3 test/test_jit.py TestOptimizeForMobilePreserveDebugInfo ```
- Tests pass

Reviewed By: kimishpatel

Differential Revision: D28654659

fbshipit-source-id: 2d7c87f2fb95a3be53246375f35639bbd97c237e
2021-09-01 14:34:20 -07:00
15ff25d1fc Break up "@generated" string so Phabricator shows changes
Summary: Created from CodeHub with https://fburl.com/edit-in-codehub

Test Plan:
CI

Sandcastle run

Reviewed By: larryliu0820

Differential Revision: D30701781

fbshipit-source-id: 3acab8b65a327c4ec7da90bc855ecf02f801c40a
2021-09-01 14:34:18 -07:00
e322547fe6 Add forward AD support for custom Functions (#64061)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64061

Test Plan: Imported from OSS

Reviewed By: soulitzer

Differential Revision: D30640868

Pulled By: albanD

fbshipit-source-id: b0e6610430a879074d6d5306443772fc154b431f
2021-09-01 14:33:09 -07:00
25e2578967 Fix bytes_written and bytes_read (#64244)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64244

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64040

In operator cost inference functions, in many places we are using sizeof(x.data_type()). Since data_type() returns a 32 bit integer from [this enum](https://www.internalfb.com/code/fbsource/[15e7ffe4073cf08c61077c7c24a4839504b964a2]/fbcode/caffe2/caffe2/proto/caffe2.proto?lines=20), we are basically always getting 4 for sizeof(x.data_type()) no matter what actual data type x has. Big thanks to Jack Langman for specifically pointing to this bug.

We would instead use the size in bytes based on actual data type.

Test Plan:
Added unit tests BatchMatMulMemCostTest:

buck test //caffe2/caffe2/fb/fbgemm:batch_matmul_op_test -- BatchMatMulMemCostTest

Extended existing unit test test_columnwise_concat for different data types:

buck test //caffe2/caffe2/python/operator_test:concat_op_cost_test -- test_columnwise_concat

Reviewed By: CrazySherman

Differential Revision: D30656698

fbshipit-source-id: d42c0c9a0c5b0ddc5dba39e4994f1f85a5e618bf
2021-09-01 13:35:41 -07:00
03a58a2ba0 [Caffe2] Create fewer strings during argument fetching (#64285)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64285

With C++14 heterogeneous ordered container lookup, it is no longer necessary to create a `std::string` in order to look up elements of a `CaffeMap` keyed by std::string. Accordingly, this diff reworks the argument-getting operator functions to avoid that in favor of `c10::string_view`.
ghstack-source-id: 137139818
ghstack-source-id: 137139818

Test Plan: buildsizebot iOS apps -- code size win. less strings is probably marginally good for perf but this only happens at setup time anyway.

Reviewed By: dzhulgakov

Differential Revision: D26826676

fbshipit-source-id: ee653b14dc2c528bae8c90f0fc6a7a419cbca1d6
2021-09-01 13:30:54 -07:00
468001600c Back out "Revert D30327514: [Pytorch lite predictor] Use KinetoEdgeCPUProfiler for operator profiling." (#64307)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64307

Original commit changeset: 0b2aa7c57d08

Restores original changes.
This diff changes the way operator profiling is done in the lite predictor
benchmarking binary.
Instead of using custom callbacks, it uses KinetoEdgeCPUProfiler to profile
events and then generates operator-level metrics from them.
Since KinetoEvents do not contain CPU clock time, we now report only wallclock
time.
This unifies the various profiling efforts we have for benchmarking purposes. In
production we will still use the observer-based mechanism, but the advantage of
using the Kineto profiler is that we get a few other things for free, such as:
- chrome trace generation.
- operator level memory profiling (to be added)
- flop counts (to be added)

Furthermore, we could potentially use a Python post-processing script to parse the
Chrome trace and generate output similar to torch.profiler. (To be done)

Furthermore, this removes some tests from test_lite_interpreter.cpp which were testing module hierarchy in debug info; they should be covered by test_mobile_profiler.cpp.

Test Plan:
aibench run
Model without debug info:
https://www.internalfb.com/intern/aibench/details/219598441154763
Model with debug info and --print_module_info true (see Operator summary has now module hierarchy information).
https://www.internalfb.com/intern/aibench/details/617154236292985

Reviewed By: raziel

Differential Revision: D30680354

fbshipit-source-id: b6ba0d59c510c13d13d9935b1d8051cc82ffa4e9
2021-09-01 13:29:35 -07:00
421d8f86b6 Add a record scope around autograd::engine::evaluate_function (#63619)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63619

Adds a RECORD_FUNCTION with the function that is being evaluated as part
of backward execution. This has been useful in picking up some operations
in the backward pass that otherwise would not show up, for example custom
autograd functions that run custom C++ code.
ghstack-source-id: 137041723

Test Plan:
CI

benchmark:
buck run mode/opt //scripts/rvarm1/ddp:bench

Reviewed By: albanD

Differential Revision: D30439492

fbshipit-source-id: 955917770cdf2a2edb0303223ace710b668ba388
2021-09-01 12:32:30 -07:00
0b48d96895 [Bootcamp] Include both python unittest and parser parameters in --help and -h flag (#64297)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/45945

Creates a new thread to run -h or --help with unittest.main if the help flag is present, and keeps the add_help default for parameters.

Includes both the python unittest and parser parameters in the --help and -h flags, and will remain up to date since both messages are displayed.
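
A rough sketch of the approach (assumed structure, not the exact patch):

```python
import sys
import threading
import unittest

# Running unittest.main in a separate thread lets its --help output print
# without its SystemExit terminating the process, so the custom parser can
# still print its own help afterwards.
if "-h" in sys.argv or "--help" in sys.argv:
    t = threading.Thread(target=unittest.main, kwargs={"module": None})
    t.start()
    t.join()
```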

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64297

Test Plan:
Imported from GitHub

`python test/test_spectral_ops.py --help`

Output:
```
% python test/test_spectral_ops.py --help
usage: test_spectral_ops.py [-h] [-v] [-q] [--locals] [-f] [-c] [-b] [-k TESTNAMEPATTERNS] [tests [tests ...]]

positional arguments:
  tests                a list of any number of test modules, classes and test methods.

optional arguments:
  -h, --help           show this help message and exit
  -v, --verbose        Verbose output
  -q, --quiet          Quiet output
  --locals             Show local variables in tracebacks
  -f, --failfast       Stop on first fail or error
  -c, --catch          Catch Ctrl-C and display results so far
  -b, --buffer         Buffer stdout and stderr during tests
  -k TESTNAMEPATTERNS  Only run tests which match the given substring

Examples:
  test_spectral_ops.py                           - run default set of tests
  test_spectral_ops.py MyTestSuite               - run suite 'MyTestSuite'
  test_spectral_ops.py MyTestCase.testSomething  - run MyTestCase.testSomething
  test_spectral_ops.py MyTestCase                - run all 'test*' test methods
                                       in MyTestCase

usage: test_spectral_ops.py [-h] [--subprocess] [--seed SEED] [--accept] [--jit_executor JIT_EXECUTOR] [--repeat REPEAT]
                            [--test_bailouts] [--save-xml [SAVE_XML]] [--discover-tests] [--log-suffix LOG_SUFFIX]
                            [--run-parallel RUN_PARALLEL] [--import-slow-tests [IMPORT_SLOW_TESTS]]
                            [--import-disabled-tests [IMPORT_DISABLED_TESTS]]

optional arguments:
  -h, --help            show this help message and exit
  --subprocess          whether to run each test in a subprocess
  --seed SEED
  --accept
  --jit_executor JIT_EXECUTOR
  --repeat REPEAT
  --test_bailouts
  --save-xml [SAVE_XML]
  --discover-tests
  --log-suffix LOG_SUFFIX
  --run-parallel RUN_PARALLEL
  --import-slow-tests [IMPORT_SLOW_TESTS]
  --import-disabled-tests [IMPORT_DISABLED_TESTS]
  ```

Also ran some other tests to make sure tests still worked, and other tests with --help or -h flag

Reviewed By: seemethere

Differential Revision: D30677776

Pulled By: PatrickKan

fbshipit-source-id: eb3d6e3fa677137ec703ec3a23808efb99acc896
2021-09-01 12:30:47 -07:00
c6505cc383 [FX] Fix python code generation for wrapped getattr() with default value (#64271)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64271

Closes #60417

Modified emit_node() in fx/graph.py to generate getattr() call with default value when len(node.args) != 2 instead of accessing the attribute.
Added test_torch_fx_getattr() in test/test_fx.py.
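
A simplified sketch of the decision (names assumed; the real code lives in `emit_node()`):

```python
def emit_getattr(obj_repr, name, default=None, has_default=False):
    if not has_default:
        return f"{obj_repr}.{name}"                      # plain attribute access
    return f"getattr({obj_repr}, {name!r}, {default!r})"  # preserve the default
```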

Test Plan:
pytest test/test_fx.py

Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D30671265

fbshipit-source-id: f2db9ea47e0cb247547e200684f715aab006c374
2021-09-01 11:30:27 -07:00
87d8ab6e50 [nnc] Updated generic error message with info about turning off the fuser (#64316)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64316

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D30683942

Pulled By: navahgar

fbshipit-source-id: d86607563672213f99a1436dcf4f5dc28053b713
2021-09-01 10:31:50 -07:00
c4f3f6e62d Fixes reduction launch config (#64304)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/48573
See also https://github.com/pytorch/pytorch/pull/64194

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64304

Reviewed By: janeyx99

Differential Revision: D30689600

Pulled By: ngimel

fbshipit-source-id: bf2103ca177fd3b6e27bc0324b81925234483a29
2021-09-01 10:30:40 -07:00
d5bfdd3dac OpInfo for nn.functional.layer_norm (#63276)
Summary:
Please see https://github.com/facebookresearch/functorch/issues/78 and https://github.com/pytorch/pytorch/issues/54261.

Note:

* This PR also adds a reference test inspired by existing tests in `test_nn.py`.

cc: mruberry zou3519

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63276

Reviewed By: ejguan

Differential Revision: D30452483

Pulled By: zou3519

fbshipit-source-id: 2578d01ca34e031668a41bd284db60c31ae1fba8
2021-09-01 09:31:45 -07:00
d1f3d85fd8 fix GradBucket.is_last() logic (#63768)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63768

Passed the number of buckets to the GradBucket constructor, so that the .is_last() function can check whether the index equals num_buckets - 1.
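
A minimal sketch of the corrected logic (field names assumed; the real class is C++ with Python bindings):

```python
class GradBucket:
    def __init__(self, index, num_buckets):
        self.index = index
        self.num_buckets = num_buckets  # newly passed into the constructor

    def is_last(self):
        return self.index == self.num_buckets - 1
```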

Test Plan:
buck test mode/dev-nosan //caffe2/test/distributed/algorithms/ddp_comm_hooks:test_ddp_hooks

test output: https://www.internalfb.com/intern/testinfra/testconsole/testrun/8162774375985873/

Reviewed By: SciPioneer, mrshenli

Differential Revision: D30455913

fbshipit-source-id: 8c67ca69cbf191d6e189e09248407eb167bb24b6
2021-09-01 09:29:13 -07:00
92b31b59af Revert D29699456: [pytorch][PR] Enable Half, BFloat16, and Complex dtypes for coo-coo sparse matmul [CUDA]
Test Plan: revert-hammer

Differential Revision:
D29699456 (ad4848565e)

Original commit changeset: 407ae53392ac

fbshipit-source-id: b6c70ba8bb28c0c38de47857030b69792a8470de
2021-09-01 07:32:24 -07:00
0c4e4e588e [FX] Rename reduce functions back to their old, public names (#64324)
Summary:
Unfortunately, pickle serializes the names of these functions. This change also puts them under backward-compatibility enforcement.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64324

Test Plan: Local repro https://fb.workplace.com/groups/3440841732711443/permalink/4018921611570116/

Reviewed By: SplitInfinity, TailofJune

Differential Revision: D30684185

Pulled By: jamesr66a

fbshipit-source-id: 900701220155d15115cd0c07cf7774a2891bd04f
2021-08-31 22:36:11 -07:00
05ecaefbbf [Metal][GPU] Enable metal for simulators and fix test failures if possible (#64322)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64322

As title
ghstack-source-id: 137143877

Test Plan:
- `aibench-cli mobile`
- Select iOS -> `y` -> `1` -> `n` -> "--metal_op_test"
- Select all iPhone 6 + iPhone 7 + iPhone 8 and an iPhone X or 11 or 12
```
Benchmark Submitted. Find more details at: https://our.intern.facebook.com/intern/aibench/details/318120612514604
Benchmark Status:
        D10AP-12.0.1: DONE
        N71mAP-14.3: DONE
DUMMY latency:
        D10AP-12.0.1: 4319.3
        N71mAP-14.3: 8868.51
I0831 16:06:27.210558 605277 ClientSingletonManager.cpp:99] Shutting down Manifold ClientSingletonManager
```

Reviewed By: xta0

Differential Revision: D30147163

fbshipit-source-id: 2de6bbd9bd525e32ca92b2845eb435800855edcc
2021-08-31 22:36:09 -07:00
24e50b8453 [CUDA graphs] hotfix for test_graph_ (#64339)
Summary:
Graphed workloads that try to capture a full backward pass must do warmup on a non-default stream. If warmup happens on the default stream, AccumulateGrad functions might tag themselves to run on the default stream, and therefore won't be capturable.

ngimel and I suspect some test_cuda.py tests run with the default stream as the ambient stream, which breaks `test_graph_grad_scaling` because `test_graph_grad_scaling` does warmup on the ambient stream _assuming_ the ambient stream is a non-default stream.

This PR explicitly sets a side stream for the warmup in `test_graph_grad_scaling`, which is what I should have done all along because it's what the new documentation recommends.
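
A minimal sketch of warmup on a side stream (the model and input names are placeholders):

```python
import torch

model = torch.nn.Linear(8, 8).cuda()
inputs = torch.randn(4, 8, device="cuda")

s = torch.cuda.Stream()
s.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(s):  # warmup must happen off the default stream
    for _ in range(3):
        model(inputs).sum().backward()
torch.cuda.current_stream().wait_stream(s)
```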

I pushed the PR branch straight to the main pytorch repo because we need to run ci-all on it, and I'm not sure what the requirements are these days.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64339

Reviewed By: mruberry

Differential Revision: D30690711

Pulled By: ngimel

fbshipit-source-id: 91ad75f46a11f311e25bc468ea184e22acdcc25a
2021-08-31 22:34:10 -07:00
479fc4e412 Remove outdated warning about RecursiveScriptModule not being copiable (#64085)
Summary:
RecursiveScriptModule has its customized `__copy__` and `__deepcopy__` defined. The warning/error that says it is not copiable is outdated.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64085

Reviewed By: rohan-varma

Differential Revision: D30598623

Pulled By: gmagogsfm

fbshipit-source-id: 0701d8617f42d818bc7b88244caee4cd47fbe976
2021-08-31 21:31:32 -07:00
8337a3fb3f [TensorExpr] Wrap error messages with buildErrorMessage call. (#64330)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64330

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D30687226

Pulled By: ZolotukhinM

fbshipit-source-id: ade1be2ad6847c6afbba60307ef854696821b4e3
2021-08-31 20:31:16 -07:00
a87808de93 Fix bug in ShardedTensorMetadata serde. (#63902)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63902

The 'memory_format' field was not being serialized correctly and used
the same encoding for different fields.
ghstack-source-id: 137142406

Test Plan: waitforbuildbot

Reviewed By: bowangbj

Differential Revision: D30527324

fbshipit-source-id: f0f223e2d660ef6e4abae9649d9992acc36e1278
2021-08-31 20:31:14 -07:00
fa5676a41b Delete some dead code from RRefMessageBase (#64298)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64298

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse agolynski SciPioneer H-Huang mrzzd cbalioglu gcramer23

Test Plan: Imported from OSS

Reviewed By: rohan-varma

Differential Revision: D30676702

Pulled By: pbelevich

fbshipit-source-id: 77dbc0f8064c3518376454ff573d45ed0274956b
2021-08-31 20:30:04 -07:00
6bb4b5d150 disallow empty named dims list to flatten(names, name) (#61953)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/61137 by raising an error if an empty tuple is passed in for the names:
```
>>> torch.empty((2, 3), names=['a', 'b']).flatten((), 'abc')
RuntimeError: flatten(tensor, dims, out_dim): dims cannot be empty
```

or from the original issue:
```
>>> torch.empty((2, 3)).flatten((), 'abc')
RuntimeError: flatten(tensor, dims, out_dim): dims cannot be empty
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61953

Reviewed By: iramazanli

Differential Revision: D30574571

Pulled By: malfet

fbshipit-source-id: e606e84458a8dd66e5da6d0eb1a260f37b4ce91b
2021-08-31 19:32:30 -07:00
c59970db6b [caffe2][easy] Save heap allocation in ConcatOp (#63529)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63529

Output() takes an IntArrayRef, so we can just use a std::initializer_list (stack-allocated array) instead of std::vector here.
ghstack-source-id: 137085908

Test Plan: existing CI

Reviewed By: mruberry

Differential Revision: D29687400

fbshipit-source-id: 9f2a7c6679f2552c098bb1bf7befaca18e0e5d4d
2021-08-31 18:33:32 -07:00
b23e4f6086 Convert mul to use opmath_gpu_kernel_with_scalars (#64019)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64019

Note that previously the functor operated on scalar_t and
this modifies it to operate on opmath_t, but this is not
a problem as half precision was implemented by performing the
compute in float anyway.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D30575282

Pulled By: ezyang

fbshipit-source-id: cc6900ef996e755740afe48f9cb4d0366858dd47
2021-08-31 18:33:30 -07:00
0733582087 Use the correct overloaded name to skip boxed autograd not implemented kernel registration (#64182)
Summary:
Some internal use_count tests are failing for `dequantize_self` because we only compare the skip list with the base name `dequantize`, when we should be comparing with the full name, including the overload.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64182

Reviewed By: albanD

Differential Revision: D30639909

Pulled By: soulitzer

fbshipit-source-id: d4d22dd1a5c8f7180251ce7739830764cce6f151
2021-08-31 18:33:28 -07:00
09e610e36d [Static Runtime] Out version for softmax (#64243)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64243

Test Plan:
```
> buck run //caffe2/benchmarks/static_runtime:static_runtime_cpptest -- --v=1
...
V0830 16:35:22.524479 613839 impl.cpp:1410] Switch to out variant for node: %5 : Tensor = aten::softmax(%a.1, %dim.1, %dtype.1)
...
[       OK ] StaticRuntime.IndividualOps_Softmax (803 ms)
```

Reviewed By: hlu1

Differential Revision: D30656149

fbshipit-source-id: 115b7b4a75448fd6a5c526808080ca9a4251302c
2021-08-31 18:33:26 -07:00
0b9cdeb295 .circleci: Remove already migrated CUDA configs (#64231)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64231

This removes the CUDA 11.1 and CUDA 10.2 configs that we had
previously migrated to GHA

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

cc ezyang seemethere malfet walterddr lg20987 pytorch/pytorch-dev-infra

Test Plan: Imported from OSS

Reviewed By: zhouzhuojie

Differential Revision: D30683811

Pulled By: seemethere

fbshipit-source-id: 71b0761461557d871c26eb02f665a2e4d9b1d9fb
2021-08-31 18:33:24 -07:00
23da90ab84 .github: Consolidate linux setup / teardown (#64229)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64229

Consolidates linux setup / teardown into easy to use jinja2 macros

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

cc ezyang seemethere malfet walterddr lg20987 pytorch/pytorch-dev-infra

Test Plan: Imported from OSS

Reviewed By: zhouzhuojie, driazati

Differential Revision: D30683810

Pulled By: seemethere

fbshipit-source-id: 2578630df3e212fb79392a699090553baef44cc2
2021-08-31 18:31:48 -07:00
5ecb966e0f Add ciflow-tracking issue to pytorch-probot (#64125)
Summary:
Doesn't do anything yet...

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64125

Reviewed By: zhouzhuojie

Differential Revision: D30620283

Pulled By: malfet

fbshipit-source-id: 91869d35c1b70a55e32261d2c32fb0136ec33960
2021-08-31 17:38:34 -07:00
9e25634833 [TensorExpr] Move declaration of buildErrorMessage to exception.h (#64301)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64301

Test Plan: Imported from OSS

Reviewed By: navahgar, huiguoo

Differential Revision: D30678215

Pulled By: ZolotukhinM

fbshipit-source-id: 599c83b3890450a0fb6526815f037eec9563661c
2021-08-31 17:37:29 -07:00
44fcb00a56 Fix redundant class definition in GraphModule singleton constructor (#64274)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/63883

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64274

Reviewed By: jamesr66a

Differential Revision: D30675970

Pulled By: jayleverett

fbshipit-source-id: e74ef2a28013f0fa7c58d14f38e66cfe48d26b74
2021-08-31 17:34:14 -07:00
c2da103fe6 Discover new tests in run_tests.py (#64246)
Summary:
Introduce a `discover_tests` function that globs for all Python files
starting with `test_` in the test folder, excluding subfolders which are
executed differently
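
A rough sketch of the globbing approach (assumed signature, not the exact implementation):

```python
from pathlib import Path

def discover_tests(test_dir="test"):
    # Non-recursive glob, so subfolders that are executed differently are skipped.
    return sorted(p.stem for p in Path(test_dir).glob("test_*.py"))
```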

Fixes https://github.com/pytorch/pytorch/issues/64178

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64246

Reviewed By: walterddr, seemethere

Differential Revision: D30661652

Pulled By: malfet

fbshipit-source-id: a52e78ec717b6846add267579dd8d9ae75326bf9
2021-08-31 17:32:55 -07:00
0457a85d45 Revert D30543236: Add python mode
Test Plan: revert-hammer

Differential Revision:
D30543236 (4bd03b0242)

Original commit changeset: ef5444d96a5a

fbshipit-source-id: b0042ac2c22765fa11d6d00bf751f6a4489eb6d8
2021-08-31 15:28:33 -07:00
6c8cb9bd76 [DataPipe] export fork, mux, demux for public usage (#64279)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64279

cc VitalyFedyunin ejguan

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D30671971

Pulled By: NivekT

fbshipit-source-id: 056ac12ef7183b254d1eec341145594639e47ef6
2021-08-31 14:34:30 -07:00
491bf7cb74 [DataPipe] adding description, __len__, tests for mux() (#64224)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64224

cc VitalyFedyunin ejguan

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D30651551

Pulled By: NivekT

fbshipit-source-id: f8af98ba71a592900b992a8077432062ec57bb48
2021-08-31 14:34:28 -07:00
9a0456939b Try the forked checkout action with retry (#64120)
Summary:
Fixes #{issue number}

The main difference is:
ffc6f93ad4

We can test multiple times in this PR to see if it works, and will make the `retry` number configurable if it's usable.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64120

Reviewed By: malfet

Differential Revision: D30656099

Pulled By: zhouzhuojie

fbshipit-source-id: a89932196bb0c44e412a34664ed6a061b02ef92e
2021-08-31 14:34:26 -07:00
13484084a6 fix syntax error in bfloat16 PR (#64122)
Summary:
Fixes a prior syntax error from a previous PR.

cc ngimel

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64122

Reviewed By: H-Huang

Differential Revision: D30643596

Pulled By: ngimel

fbshipit-source-id: 0a2d5a40fb6dc7339cd03112e57ef0e1bf8a000e
2021-08-31 14:33:12 -07:00
8d08b103be [CUDA graphs] Prototype API and documentation (#63269)
Summary:
RFC: https://github.com/pytorch/pytorch/issues/61880

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63269

Reviewed By: mruberry

Differential Revision: D30596643

Pulled By: ngimel

fbshipit-source-id: b1f8061406364b667e2c2d4d30fbce1f0d8456be
2021-08-31 13:34:23 -07:00
1c2b5e59ae Remove ref to test_distributed_fork (#64197)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64197

Removes this line as test is gone.
ghstack-source-id: 136986275

Test Plan: CI

Reviewed By: walterddr

Differential Revision: D30642929

fbshipit-source-id: a0c7dfdfb35a4a7f7ec1b881dbea53d85136012c
2021-08-31 13:31:27 -07:00
555171a273 .circleci: Remove migrated jobs, move docs builds (#64222)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64222

Removes both backwards_compat as well as docs_test from the general
gcc5.4 config and moves the docs build from being run on every PR to
only being run on master.

We can remove docs builds when we migrate the docs push job (including
all secrets associated with that)

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

cc ezyang seemethere malfet walterddr lg20987 pytorch/pytorch-dev-infra

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D30650953

Pulled By: seemethere

fbshipit-source-id: ac11da6a551a6c81f3dc1d47fd81846cbfe9975a
2021-08-31 13:30:13 -07:00
347ef69529 [ao][docs] Clarify operator support for quantization (#63270)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63270

Add table to quantization main page showing supported modules
for static and dynamic quantization.
ghstack-source-id: 137087204

Test Plan: Imported from OSS

Reviewed By: HDCharles

Differential Revision: D30658654

fbshipit-source-id: a82c998e1db6370596d5b0ca4c7cc96c1c90f30e
2021-08-31 12:32:47 -07:00
3a46edb8d8 ns for fx: make layer types more readable (#64270)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64270

Before this PR, layer types were populated by doing
`str(module_instance)` and `str(function)`. This resulted
in moderately readable strings for modules, and poorly readable
strings for functions.

This PR switches the logic to use `torch.typename` utility instead.
The results are significantly more readable.

Example function type:

```
# before
'<built-in method linear of PyCapsule object at 0x7fe9b20ce7b0>'

# after
'torch._ops.quantized.PyCapsule.linear'
```

Example module type:

```
# before
"<class 'torch.nn.quantized.modules.conv.Conv2d'>"

# after
'torch.nn.quantized.modules.conv.Conv2d'
```

Test Plan:
Manually inspect NS results for modules and functions, verify they are
more readable.

Imported from OSS

Differential Revision: D30669545

Reviewed By: jerryzh168

Pulled By: vkuzo

fbshipit-source-id: 60959e5cafa0a4992b083bf99f5d8260f9acdac0
2021-08-31 12:31:34 -07:00
845bc89811 [fx2trt] Add acc_ops.sign and converter for it (#63876)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63876

Add `acc_ops.sign` which maps from `torch.sign`.

Add a plugin (not support dynamic shape currently) for `acc_ops.sign`. The plugin calls `at::sign` directly.

Test Plan: buck test mode/opt -c python.package_style=inplace -c fbcode.nvcc_arch=a100 caffe2/torch/fb/fx2trt:test_unary_ops

Reviewed By: yinghai

Differential Revision: D30518081

fbshipit-source-id: a0b9e6c30deac0b04b8cb09a162579e229985330
2021-08-31 11:31:53 -07:00
83e28a7d28 Use stacklevel for floordiv deprecation warnings (#64034)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/60548

`Tensor.__floordiv__` was indirectly deprecated by deprecation of `torch.floor_divide` (see https://github.com/pytorch/pytorch/issues/43874). Deprecating it directly provides clearer feedback.

Repro:
```
import torch
x = torch.tensor(0)
x // 1
```

Before this change, a deprecation warning was triggered within the C++ implementation of floor_divide:
```
UserWarning: floor_divide is deprecated, and will be removed in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values.
To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). (Triggered internally at  ../aten/src/ATen/native/BinaryOps.cpp:571.)
  return torch.floor_divide(self, other)
```

After this change, the warning instead cites the user's offending line of Python code:
```
UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values.
To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  x // 1
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64034

Reviewed By: mruberry

Differential Revision: D30658010

Pulled By: saketh-are

fbshipit-source-id: b0e6c5008d741897509d102f4a89efb47de4aa2a
2021-08-31 11:27:56 -07:00
b9275a4003 [ao][docs] Add description of qconfig and qengine to quantization page (#63582)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63582

Current quantization docs do not define qconfig and qengine. Added text to define these concepts before they are used.
ghstack-source-id: 137051719
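
For illustration, the two concepts as they appear in user code:

```python
import torch

torch.backends.quantized.engine = "fbgemm"                  # qengine: which kernels run
qconfig = torch.quantization.get_default_qconfig("fbgemm")  # qconfig: how to observe/quantize
```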

Test Plan: Imported from OSS

Reviewed By: HDCharles

Differential Revision: D30658656

fbshipit-source-id: a45a0fcdf685ca1c3f5c3506337246a430f8f506
2021-08-31 10:33:07 -07:00
ca8dd296ee Add OpInfo for nn.functional.cosine_similarity (#62959)
Summary:
Please see https://github.com/facebookresearch/functorch/issues/78 and https://github.com/pytorch/pytorch/issues/54261.

Notes:

* Some redundant tests from `test_nn.py` have been removed. I'm unsure whether the precision checks can be removed as well.
* Broadcasting is also checked in the OpInfo for `cosine_similarity`.

cc: mruberry zou3519 Chillee

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62959

Reviewed By: heitorschueroff

Differential Revision: D30520176

Pulled By: zou3519

fbshipit-source-id: 14e902eb4bcce875edab28a1669a2ea021052b9b
2021-08-31 10:31:36 -07:00
0ef8760bf6 [DataPipe] implementing __len__ for fork (no valid length for demux) (#64215)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64215

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D30648672

Pulled By: NivekT

fbshipit-source-id: 4780f2f6a79ae15a4009092475e7d92f96dd09a2
2021-08-31 08:32:31 -07:00
0deb7a0bc0 [DataPipe] implementing demux() (#63650)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63650

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D30493944

Pulled By: NivekT

fbshipit-source-id: 0aa06dee8c7fb1744975b8f6a0694b90c11ef80d
2021-08-31 08:32:29 -07:00
eee054e6ea [DataPipe] implementing fork() (#63649)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63649

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D30493945

Pulled By: NivekT

fbshipit-source-id: 40db7d4134facd266d86bc0dc2edf2729c4e5842
2021-08-31 08:32:27 -07:00
67cb131458 Revert D30327514: [Pytorch lite predictor] Use KinetoEdgeCPUProfiler for operator profiling.
Test Plan: revert-hammer

Differential Revision:
D30327514 (bc9277dca3)

Original commit changeset: 3bb2f2daaaed

fbshipit-source-id: 0b2aa7c57d08de77c9aaa75e546a7d0938610f64
2021-08-31 08:30:36 -07:00
3c15822f5f [Static Runtime] Implement aten::nonzero out variant (#64126)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64126

Test Plan:
Confirm out variant is called:

```
> buck run //caffe2/benchmarks/static_runtime:static_runtime_cpptest -- --v=1
```

Reviewed By: mikeiovine

Differential Revision: D30617729

fbshipit-source-id: 752749638c8f467815efa57021cb3de5c728ab1b
2021-08-31 00:51:15 -07:00
a3d6dae319 Automated submodule update: FBGEMM (#64213)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).

New submodule commit: 9d69998df6

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64213

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: jspark1105

Differential Revision: D30647878

fbshipit-source-id: b903b39441b4e28dda7eab226ac874e2227e750a
2021-08-30 21:33:17 -07:00
bc9277dca3 [Pytorch lite predictor] Use KinetoEdgeCPUProfiler for operator profiling. (#63367)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63367

This diff changes the way operator profiling is done in the lite predictor
benchmarking binary.
Instead of using custom callbacks, it uses KinetoEdgeCPUProfiler to profile
events and then generates operator-level metrics from them.
Since KinetoEvents do not contain CPU clock time, we now report only wallclock
time.
This unifies the various profiling efforts we have for benchmarking purposes. In
production we will still use the observer-based mechanism, but the advantage of
using the Kineto profiler is that we get a few other things for free, such as:
- chrome trace generation.
- operator level memory profiling (to be added)
- flop counts (to be added)

Furthermore, we could potentially use a Python post-processing script to parse the
Chrome trace and generate output similar to torch.profiler. (To be done)

Test Plan:
aibench run
Model without debug info:
https://www.internalfb.com/intern/aibench/details/219598441154763
Model with debug info and `--print_module_info true` (see Operator summary has now module hierarchy information).
https://www.internalfb.com/intern/aibench/details/617154236292985

Reviewed By: raziel

Differential Revision: D30327514

fbshipit-source-id: 3bb2f2daaaedfb04bd6f5d9c91292783f9c4344f
2021-08-30 20:54:51 -07:00
7ca4728e6d Compile BatchLinearAlgebra without nvcc (#64146)
Summary:
These files only use cuda libraries interfaces, so don't actually need to be compiled with nvcc.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64146

Reviewed By: ezyang

Differential Revision: D30633189

Pulled By: ngimel

fbshipit-source-id: c9d0ae5259a10cb49332d31f0da89ad758736ea8
2021-08-30 20:18:21 -07:00
e7fb35021a [nnc] Enable fusion of bfloat16 ops (#64196)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64196

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D30643864

Pulled By: bertmaher

fbshipit-source-id: e95edeaf7089464d713ea1d1f951743d3e5f61c5
2021-08-30 20:09:36 -07:00
538647fe1f [WIP][FX] BC guarantees for 1.10 (#63888)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63888

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D30523133

Pulled By: jamesr66a

fbshipit-source-id: b04cc0d842a74862f42ecba98b757310cd2ec7b0
2021-08-30 19:56:46 -07:00
09dfaa0339 add operation list for AutocastCPU (#63534)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63534

In this PR:
* We have changed the default dtype of `AutocastCPU` from `float16` to `bfloat16`, as discussed in https://github.com/pytorch/pytorch/pull/61002 (see the sketch after this list).
* We also update the operation list which needs casting to `lower_precision_fp` or `float32`.
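
A hedged usage sketch of the new default:

```python
import torch

a = torch.randn(4, 4)
b = torch.randn(4, 4)
with torch.cpu.amp.autocast():  # dtype now defaults to torch.bfloat16
    c = torch.mm(a, b)
```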

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D30644914

Pulled By: ezyang

fbshipit-source-id: 8b93485ba452b3759611e3f0ac88e920fe495ac1
2021-08-30 19:30:33 -07:00
93f1090267 Update contribution_guide.rst (#64142)
Summary:
Grammatical update.

Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64142

Reviewed By: mruberry

Differential Revision: D30639394

Pulled By: ezyang

fbshipit-source-id: cf1a4dfbd8e34b0772f1b09f5d820278e8ef8574
2021-08-30 19:26:59 -07:00
6b85c99ce5 Avoid an unnecessary list creation in DataChunk (#64111)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64111

Reviewed By: mruberry

Differential Revision: D30639383

Pulled By: ezyang

fbshipit-source-id: 96b243307413c99a67d55d862a71937e1ef210f4
2021-08-30 19:25:42 -07:00
c7c711bfb8 Add optional tensor arguments to (#63967)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/63435

Adds optional tensor arguments to the torch function handling checks. The only one I didn't do this for in the functional file was `multi_head_attention_forward`, since that already took care of some optional tensor arguments but not others, so it seemed like the arguments were specifically chosen.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63967

Reviewed By: albanD

Differential Revision: D30640441

Pulled By: ezyang

fbshipit-source-id: 5ef9554d2fb6c14779f8f45542ab435fb49e5d0f
2021-08-30 19:21:26 -07:00
cb7cf823b3 add BFloat16 support for fold and unfold on CPU (#62880)
Summary:
Add BFloat16 support for fold and unfold operators on CPU.
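
A quick usage check of the newly supported dtype:

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 8, 8).to(torch.bfloat16)
patches = F.unfold(x, kernel_size=2)                    # im2col
y = F.fold(patches, output_size=(8, 8), kernel_size=2)  # col2im
```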

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62880

Reviewed By: iramazanli

Differential Revision: D30576387

Pulled By: zou3519

fbshipit-source-id: c48f6e56702bfea34448db1b3a1634c49c5d8ec8
2021-08-30 19:14:10 -07:00
ffc2612087 Add acc_gpu_kernel_with_scalars and port add to use it (#63884)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63884

See https://dev-discuss.pytorch.org/t/cuda-loops-case-study-code-generation-vs-templates/302
for explanation of what's going on here.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D30545296

Pulled By: ezyang

fbshipit-source-id: f0da52153ae63599fe1d57e90e73f50ca2116939
2021-08-30 19:10:16 -07:00
a49907f984 Modify inline doc for DataPipe (#64221)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64221

List of tasks in this PR
- [x]  Add inline doc for DataPipe
- [x] Improve the inline doc
- [x] Expose DataPipe to `datapipes.iter` (`UnBatcher`) Note: `Forker`, `Demux`, `Mux` are exposed in another PR authored by Kevin
- [x] Add correct typing to DataPipe
- [x] Unify the argument to `datapipe` rather than `source_datapipe`

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D30650541

Pulled By: ejguan

fbshipit-source-id: c09d1b9742b8097d8e645c15947cef80c876877b
2021-08-30 18:45:46 -07:00
af85bc5ffd Replace group_by_key by group_by IterDataPipe (#64220)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64220

Remove `ByKeyGrouperIterDataPipe` due to duplicated functionality.
Fix a bug in `GrouperIterDataPipe` using the existing test.

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D30650542

Pulled By: ejguan

fbshipit-source-id: 666b4d28282fb4f49f3ff101b8d08be16a50d836
2021-08-30 18:45:44 -07:00
4bd03b0242 Add python mode (#63496)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63496

This PR adds a (private) enable_python_mode context manager.
(see torch/utils/_python_dispatch.py).
enable_python_mode accepts the type of a __torch_dispatch__ object
as its argument. Whenever an operator gets called inside of the
context manager, it dispatches to the __torch_dispatch__ of
the passed-in type.

Example usage:
```
with enable_python_mode(LoggingTensor):
    z = torch.empty([])
    assert isinstance(z, LoggingTensor)
```

There are quite a few changes that were made to support this.

First, we added TorchDispatchTypeObject, a C++ struct that represents the
type of a `__torch_dispatch__` object (e.g. LoggingTensor).
It holds both the PyObject* representing the class and a PyInterpreter*
so we know which Python interpreter it came from.

Next, we updated the concrete_dispatch_fn in python_variable.cpp to accept
a `const std::shared_ptr<TorchDispatchTypeObject>&` argument. When this
is null, dispatching happens as usual. When it is non-null, we prepend
the TorchDispatchTypeObject's PyObject* to the overloaded args list so that
it is considered first for dispatch.

To get that to work, we changed how `handle_torch_dispatch_no_python_arg_parser`
works. The "overloaded args list" previously only consisted of Tensor PyObjects,
but now it can have types in addition to Tensors!
- We renamed `append_overloaded_arg` to `append_overloaded_tensor`
- We added a new `append_overloaded_type` that appends a type to
overloaded_args
- We added special handling in `handle_torch_dispatch_no_python_arg_parser`
and `append_overloaded_arg` to handle types in addition to Tensors.

Then, there is PythonMode and PythonModeTLS.
- We reuse the DispatchKey::Python dispatch key as a mode key
- We use PythonMode::enter and PythonMode::exit to enable/disable
DispatchKey::Python and set the PythonModeTLS.
- PythonModeTLS stores a TorchDispatchTypeObject as metadata.
- PythonMode is in libtorch_python, and PythonModeTLS is in ATen.
This split is due to the libtorch_python library boundary (because we need
to save TLS in ATen/ThreadLocalState)
- We modify the PythonFallbackKernel to look up
the relevant TorchDispatchTypeObject (if Python Mode is active) and
dispatch using it.

There are two more miscellaneous changes:
- internal_new_from_data (torch/csrc/utils/tensor_new.cpp) gets an
exclude guard. enable_python_mode currently does not handle
torch.tensor and the exclude guard is to prevent a bug.

Future:
- This PR does not allow for the nesting of Python modes. In the future we
should be able to enable this with a more sane no_dispatch API and by changing
the TLS to a stack. For now I did not need this for CompositeImplicitAutograd testing.

Test Plan: - new tests

Reviewed By: malfet, albanD

Differential Revision: D30543236

Pulled By: zou3519

fbshipit-source-id: ef5444d96a5a957d1657b7e37dce80f9a497d452
2021-08-30 18:44:35 -07:00
ebc0aacf83 [nnc] Fix half2float conversion and re-enable float16 (#64199)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64199

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D30643865

Pulled By: bertmaher

fbshipit-source-id: 9de6adca53bd08839328cbaf6364f7de9550264b
2021-08-30 18:37:55 -07:00
1f16c22dc8 [Static Runtime] Implement aten::cumsum out variant (#64159)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64159

Test Plan:
Confirm out variant is called for both versions:

```
> buck run //caffe2/benchmarks/static_runtime:static_runtime_cpptest -- --v=1
```

Reviewed By: mikeiovine

Differential Revision: D30622819

fbshipit-source-id: a2c8c7f969dae5f507718fb3d513e1fb4f026736
2021-08-30 16:18:22 -07:00
5401159b8f OpInfo for nn.functional.interpolate (#61956)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61956

Each mode goes through a different implementation so they are listed as
different variants.

Test Plan: - run tests

Reviewed By: malfet

Differential Revision: D30013751

Pulled By: zou3519

fbshipit-source-id: 4253b40b55667d7486ef2d98b441c13d807ab292
2021-08-30 16:00:43 -07:00
a7ae73a238 BUG Fixes regression for nllloss gradcheck (#64203)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/64163

This PR includes the fix and the opinfo from https://github.com/pytorch/pytorch/pull/63854/ for non-regression testing.

cc albanD mruberry jbschlosser

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64203

Reviewed By: albanD

Differential Revision: D30647522

Pulled By: jbschlosser

fbshipit-source-id: 2974d299763505908fa93532aca2bd5d5b71f2e9
2021-08-30 15:13:09 -07:00
ad4848565e Enable Half, BFloat16, and Complex dtypes for coo-coo sparse matmul [CUDA] (#59980)
Summary:
This PR enables Half, BFloat16, ComplexFloat, and ComplexDouble support for matrix-matrix multiplication of COO sparse matrices.
The change is applied only to CUDA 11+ builds.

`cusparseSpGEMM` also supports `CUDA_C_16F` (complex float16) and `CUDA_C_16BF` (complex bfloat16). PyTorch also supports the complex float16 dtype (`ScalarType::ComplexHalf`), but there is no convenient dispatch, so this dtype is omitted in this PR.
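
A minimal sketch of the newly enabled path (assumes a CUDA 11+ build and a CUDA device; shapes and values are arbitrary):

```python
import torch

# sparse @ sparse matmul in half precision, enabled by this PR on CUDA 11+
a = torch.randn(4, 8, dtype=torch.half, device="cuda").to_sparse()
b = torch.randn(8, 2, dtype=torch.half, device="cuda").to_sparse()
c = torch.sparse.mm(a, b)  # returns a sparse COO tensor
```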

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59980

Reviewed By: ngimel

Differential Revision: D29699456

Pulled By: cpuhrsch

fbshipit-source-id: 407ae53392acb2f92396a62a57cbaeb0fe6e950b
2021-08-30 15:06:25 -07:00
c3464e78a4 Revert D30561459: Fix bytes_written and bytes_read
Test Plan: revert-hammer

Differential Revision:
D30561459 (e98173ff34)

Original commit changeset: 976fa5167097

fbshipit-source-id: 43f4c234ca400820fe6db5b4f37a25e14dc4b0dd
2021-08-30 14:59:54 -07:00
e4fd2ab59c Back out "Added reference tests to ReductionOpInfo" (#64183)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64183

Original commit changeset: 6a1f82ac2819

Test Plan: CI

Reviewed By: soulitzer

Differential Revision: D30639835

fbshipit-source-id: e238043c6fbd0453317a9ed219e348298f98aaca
2021-08-30 14:48:10 -07:00
8f88f797db [quant][graphmode][fx] Add reference quantized conv module (#63828)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63828

Added a reference quantized conv module for the custom backend flow; the reference quantized module will
have the following code:
```
        w(float) -- quant - dequant \
        x(float) ------------- F.conv2d ---
```
In the full model, we will see
```
        w(float) -- quant - *dequant \
        x -- quant --- *dequant --  *F.conv2d --- *quant - dequant
```
and the backend should be able to fuse the ops with `*` into a quantized conv2d
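
A minimal Python sketch of this reference pattern (the module name, scale, and zero_point below are made up for illustration; this is not the actual module added by the PR):

```python
import torch
import torch.nn.functional as F

class RefQuantConv2d(torch.nn.Module):
    """Reference pattern: the weight goes through quant -> dequant, compute stays in float."""
    def __init__(self, weight, scale=0.1, zero_point=0):
        super().__init__()
        self.weight = weight
        self.scale, self.zero_point = scale, zero_point

    def forward(self, x):
        w_q = torch.quantize_per_tensor(self.weight, self.scale, self.zero_point, torch.qint8)
        # a backend can pattern-match the quant/dequant/conv sequence and fuse it
        return F.conv2d(x, w_q.dequantize())

m = RefQuantConv2d(torch.randn(8, 3, 3, 3))
y = m(torch.randn(1, 3, 8, 8))
```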

Test Plan:
python test/test_quantization.py TestQuantizeFx.test_conv_linear_reference

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D30504749

fbshipit-source-id: e1d8c43a0e0d6d9ea2375b8ca59a9c0f455514fb
2021-08-30 14:23:17 -07:00
65050ec924 Back out "[JIT] Add aten::slice optimization"
Summary:
Original commit changeset: d12ee39f6828
build-break
overriding_review_checks_triggers_an_audit_and_retroactive_review
Oncall Short Name: dskhudia

Test Plan: Local run succeeds

Differential Revision: D30633990

fbshipit-source-id: 91cf7cc0ad7e47d919347c2a1527688e062e0c62
2021-08-30 14:05:04 -07:00
09e53c0cfe .github: Adding configuration for backwards_compat (#64204)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64204

Adds backwards_compat to our existing test matrix for github actions

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

cc ezyang seemethere malfet walterddr lg20987 pytorch/pytorch-dev-infra

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D30646764

Pulled By: seemethere

fbshipit-source-id: f0da6027e29fab03aff058cb13466fae5dcf3678
2021-08-30 13:59:00 -07:00
9035a1cb4d .github: Adding configuration for docs_test (#64201)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64201

Adds docs_test to our existing test matrix for github actions

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

cc ezyang seemethere malfet walterddr lg20987 pytorch/pytorch-dev-infra

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D30646765

Pulled By: seemethere

fbshipit-source-id: 946adae01ff1f1f7ebe626e408e161b77b19a011
2021-08-30 13:57:20 -07:00
85df73658c Make name() part of IMethod interface (#63995)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63995

JIT methods already have name() in their interface, and Py methods have names in their implementation.  I'm adding this for a particular case where someone tried to use name() on a JIT method that we're replacing with an IMethod.

Test Plan: add case to imethod API test

Reviewed By: suo

Differential Revision: D30559401

fbshipit-source-id: 76236721f5cd9a9d9d488ddba12bfdd01d679a2c
2021-08-30 13:31:55 -07:00
b9933f08b9 Fix type annotation in tools/nightly.py (#64202)
Summary:
`tempfile.TemporaryDirectory` is generic only in Python 3.9 and above.

Work around this by wrapping the type annotation in quotes.
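
A minimal sketch of the quoting workaround (the function name is hypothetical; only the quoted annotation matters):

```python
import tempfile

# The quoted annotation stays a string at definition time, so this parses on
# Python < 3.9, where tempfile.TemporaryDirectory is not subscriptable.
def make_staging_dir() -> "tempfile.TemporaryDirectory[str]":
    return tempfile.TemporaryDirectory(prefix="nightly-")
```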

Fixes https://github.com/pytorch/pytorch/issues/64017

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64202

Reviewed By: janeyx99

Differential Revision: D30644215

Pulled By: malfet

fbshipit-source-id: 3c16240b9fa899bd4d572c1732a7d87d3dd0fbd5
2021-08-30 13:27:43 -07:00
f3e329cbec Implements the orthogonal parametrization (#62089)
Summary:
Implements an orthogonal / unitary parametrisation.

It passes the tests, and I have trained a couple of models with this implementation, so I believe it should be correct. The implementation is quite subtle. I'm tagging nikitaved and IvanYashchuk as reviewers in case they have comments or see room for optimisation of the code, in particular of the `forward` function.

Fixes https://github.com/pytorch/pytorch/issues/42243
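
A minimal usage sketch, assuming the parametrisation is exposed as `torch.nn.utils.parametrizations.orthogonal`:

```python
import torch
from torch.nn.utils.parametrizations import orthogonal  # import path assumed

linear = orthogonal(torch.nn.Linear(5, 5))
w = linear.weight  # recomputed through the parametrisation on access
print(torch.allclose(w @ w.T, torch.eye(5), atol=1e-5))  # should print True
```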

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62089

Reviewed By: ezyang

Differential Revision: D30639063

Pulled By: albanD

fbshipit-source-id: 988664f333ac7a75ce71ba44c8d77b986dff2fe6
2021-08-30 13:12:07 -07:00
e98173ff34 Fix bytes_written and bytes_read (#64040)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64040

In operator cost inference functions, in many places we are using sizeof(x.data_type()). Since data_type() returns a 32 bit integer from [this enum](https://www.internalfb.com/code/fbsource/[15e7ffe4073cf08c61077c7c24a4839504b964a2]/fbcode/caffe2/caffe2/proto/caffe2.proto?lines=20), we are basically always getting 4 for sizeof(x.data_type()) no matter what actual data type x has. Big thanks to Jack Langman for specifically pointing to this bug.

We instead use the size in bytes of the actual data type.

Test Plan:
Added unit tests BatchMatMulMemCostTest:

buck test //caffe2/caffe2/fb/fbgemm:batch_matmul_op_test -- BatchMatMulMemCostTest

Extended existing unit test test_columnwise_concat for different data types:

buck test //caffe2/caffe2/python/operator_test:concat_op_cost_test -- test_columnwise_concat

Differential Revision: D30561459

fbshipit-source-id: 976fa5167097a35af548498480001aafd7851d93
2021-08-30 12:57:31 -07:00
eafe33c995 remove componentwise comparison of complex values in torch.testing.assert_close (#63841)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63841

Closes #61906.

cc ezyang gchanan

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D30633526

Pulled By: mruberry

fbshipit-source-id: ddb5d61838cd1e12d19d0093799e827344382cdc
2021-08-30 12:38:44 -07:00
401bbb2aa0 remove componentwise comparison of complex values in TestCase.assertEqual (#63572)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63572

Addresses #61906. Issue will be fixed later in the stack when `torch.testing.assert_close` got the same treatment.

cc ezyang gchanan

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D30633527

Pulled By: mruberry

fbshipit-source-id: c2002a4998a7a75cb2ab83f87190bde43a9d4f7c
2021-08-30 12:36:45 -07:00
a8ffe81b2c Bring back old algorithm for sorting on small number of segments (#64127)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/63456
The code was copy-pasted from the previous commit without modification.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64127

Reviewed By: mruberry

Differential Revision: D30632090

Pulled By: ngimel

fbshipit-source-id: 58bbdd9b0423f01d4e65e2ec925ad9a3f88efc9b
2021-08-30 12:30:50 -07:00
d37636901e [Doc] make_tensor to torch.testing module (#63925)
Summary:
This PR aims to add `make_tensor` to the `torch.testing` module in PyTorch docs.
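
For instance, the kind of usage example the docs entry would carry (a minimal sketch; arguments are arbitrary):

```python
import torch
from torch.testing import make_tensor

# a 2x3 float tensor with values drawn from [-1, 1)
t = make_tensor((2, 3), device="cpu", dtype=torch.float32, low=-1.0, high=1.0)
```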

TODOs:

* [x] Add examples

cc: pmeier mruberry brianjo

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63925

Reviewed By: ngimel

Differential Revision: D30633487

Pulled By: mruberry

fbshipit-source-id: 8e5a1f880c6ece5925b4039fee8122bd739538af
2021-08-30 12:25:40 -07:00
5b0dfd0f8a Fix bad use of channels last kernel in sync batch norm backward (#64100)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/64039

There are two distinct problems here.
1. If `grad_output` is channels last but the input is not, then the input would be read as if it were channels last, i.e. the wrong values would be read.
2. `use_channels_last_kernels` doesn't guarantee that `suggest_memory_format` will actually return channels last, so use `empty_like` instead so the strides always match.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64100

Reviewed By: mruberry

Differential Revision: D30622127

Pulled By: ngimel

fbshipit-source-id: e28cc57215596817f1432fcdd6c49d69acfedcf2
2021-08-30 12:16:30 -07:00
ac99d63f83 [jit] Make operation call accept Stack& instead Stack* (#63414)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63414

Misuse of a raw pointer here, since the stack is never null.
ghstack-source-id: 136938318

Test Plan:
compiles.

Imported from OSS

Reviewed By: ejguan

Differential Revision: D30375410

fbshipit-source-id: 9d65b620bb76d90d886c800f54308520095d58ee
2021-08-30 11:49:20 -07:00
93d2e5090f Improve performance of index_select by avoiding item (#63008)
Summary:
Partially fixes https://github.com/pytorch/pytorch/issues/61788

From a CUDA perspective: item already pulls all Tensor content onto the host (albeit one-by-one), which incurs very expensive memory transfers. This way we'll do it all at once.
From a CPU perspective: item has a lot of overhead as a native function in comparison to simply using a pointer.

Overall there are still many performance gains to be had, but this is a small change that should take us into a more usable landscape. This doesn't land a separate benchmark; I postulate one isn't necessary to decide on the benefit of this change (we'll also see if it shows up indirectly), though adding one is still a good follow-up item.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63008

Reviewed By: zou3519

Differential Revision: D30211160

Pulled By: cpuhrsch

fbshipit-source-id: 70b752be5df51afc66b5aa1c77135d1205520cdd
2021-08-30 09:50:41 -07:00
e24c3644d8 [Static Runtime] aten::cat out version when it is not being replaced by prim::VarConcat (#64157)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64157

The UseVariadicCat optimization is not applied to aten::cat if the list input to the op cannot be moved to a position before the op (https://fburl.com/diffusion/l6kweimu). For these cases we need an out version for SR.

Test Plan:
Confirm out variant is called:
```
> buck run //caffe2/benchmarks/static_runtime:static_runtime_cpptest -- --v=1
```

Reviewed By: d1jang

Differential Revision: D30598574

fbshipit-source-id: 74cfa8291dc8b5df4aef58adfb1ab2a16f10d90a
2021-08-30 09:42:38 -07:00
16ecdbbaa2 [PyTorch] Fix missing move in unpickler (#63974)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63974

Saw some time spent in this for model loading, no reason not to move here.
ghstack-source-id: 136760979

Test Plan: Re-profile model loading on devserver; IValue copy ctor time has gone down

Reviewed By: dhruvbird

Differential Revision: D30548923

fbshipit-source-id: 42000f2e18582762b43353cca10ae094833de3b3
2021-08-30 09:38:55 -07:00
9777887f0e [PyTorch] Reduce copies/refcount bumps in BytecodeDeserializer::parseMethods (#63961)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63961

Saw a report that this function was slow and was doing unexplained vector copies. First pass to remove a bunch of copying.
ghstack-source-id: 136760976

Test Plan:
Pixel 3
before: https://our.intern.facebook.com/intern/aibench/details/461850118893980
after: https://www.internalfb.com/intern/aibench/details/48965886029524

MilanBoard failed to return data from simpleperf

Reviewed By: dhruvbird

Differential Revision: D30544551

fbshipit-source-id: 0e2b5471a10c0803d52c923e6fb5625f5542b99d
2021-08-30 09:37:10 -07:00
dc4fd3bdda [MicroBench] Added a micro benchmark for a signed log1p kernel. (#64032)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64032

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D30579198

Pulled By: navahgar

fbshipit-source-id: a53d68225fba768b26491d14b535f8f2dcf50c0e
2021-08-30 09:27:51 -07:00
f79df24859 Automated submodule update: FBGEMM (#64149)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).

New submodule commit: f6dfed87a1

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64149

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: jspark1105

Differential Revision: D30632209

fbshipit-source-id: aa1cebaf50169c3a93dbcb994fa47e29d6b6a0d7
2021-08-30 08:30:57 -07:00
82174330d0 [DataLoader2] Adding Messages, Protocols, Loop wrappers (#63882)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63882

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D30627452

Pulled By: VitalyFedyunin

fbshipit-source-id: 561ea2df07f3572e04401171946154024126387b
2021-08-30 07:57:20 -07:00
7701ea48be remove one more distributed test (#64108)
Summary:
Follow-up on https://github.com/pytorch/pytorch/issues/62896: one more place where we should remove the distributed test.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64108

Reviewed By: janeyx99, soulitzer

Differential Revision: D30614062

Pulled By: walterddr

fbshipit-source-id: 6576415dc2d481d65419da19c5aa0afc37a86cff
2021-08-30 07:51:11 -07:00
093a12aaa9 [nnc] Updated internal asserts to include more detailed error messages (#64118)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64118

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D30616944

Pulled By: navahgar

fbshipit-source-id: 35289696cc0e7faa01599304243b86f0febc6daf
2021-08-30 04:40:51 -07:00
a836d83957 [nnc] Fixed warning due to implicit parameter conversion (#64117)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64117

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D30616945

Pulled By: navahgar

fbshipit-source-id: eaf69232ac4a684ab5f97a54a514971655f86ef3
2021-08-30 04:39:34 -07:00
d3bcba5f85 ENH Adds label_smoothing to cross entropy loss (#63122)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/7455

Partially resolves pytorch/vision#4281
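
A minimal usage sketch of the new argument:

```python
import torch

loss = torch.nn.CrossEntropyLoss(label_smoothing=0.1)
logits = torch.randn(8, 5)
target = torch.randint(5, (8,))
# targets become a mixture of the one-hot label and a uniform distribution
out = loss(logits, target)
```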

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63122

Reviewed By: iramazanli

Differential Revision: D30586076

Pulled By: jbschlosser

fbshipit-source-id: 06afc3aa1f8b9edb07fe9ed68c58968ad1926924
2021-08-29 23:33:04 -07:00
8af1407eab [Static Runtime] Out version for torch.linalg.norm (#64070)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64070

Test Plan:
Confirm out variant is called for both versions:

```
> buck run //caffe2/benchmarks/static_runtime:static_runtime_cpptest -- --v=1
```

Reviewed By: d1jang

Differential Revision: D30595816

fbshipit-source-id: e88d88d4fc698774e83a98efce66b8fa4e281563
2021-08-29 21:00:11 -07:00
44e3ed88c9 [quant] AO migration of the quantize.py (#64086)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64086

AO Team is migrating the existing torch.quantization into torch.ao.quantization. We are doing it one file at a time to make sure that the internal callsites are updated properly.

This migrates the `quantize.py` from torch.quantization to `torch.ao.quantization`.

At this point both locations will be supported. Eventually torch.quantization will be deprecated.
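
During the migration window both import paths resolve to the same functionality:

```python
# legacy location (to be deprecated eventually)
from torch.quantization import quantize
# new location
from torch.ao.quantization import quantize
```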

Test Plan: `buck test mode/opt //caffe2/test:quantization`

Reviewed By: jerryzh168, raghuramank100

Differential Revision: D30055886

fbshipit-source-id: 8ef7470f9fa640c0042bef5bb843e7a05ecd0b9f
2021-08-29 20:30:01 -07:00
29ad84f252 Removes beta warning from the special module documentation (#64148)
Summary:
Updates documentation per feature review. torch.special is now stable.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64148

Reviewed By: ngimel

Differential Revision: D30632049

Pulled By: mruberry

fbshipit-source-id: 8f6148ec7737e7b3a90644eeca23eb217eda513d
2021-08-29 19:38:46 -07:00
c5ed31e4a7 add channel last support for MaxUnpool2d (#49984)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49984

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D26007051

Pulled By: VitalyFedyunin

fbshipit-source-id: 6c54751ade4092e03c1651aaa60380f7d6e92f6b
2021-08-29 18:37:10 -07:00
9db56531f7 Revert D30620966: [pytorch][PR] Move Parallel[Native|TBB] to GHA
Test Plan: revert-hammer

Differential Revision:
D30620966 (223f886032)

Original commit changeset: 9a23e4b3e168

fbshipit-source-id: b9248d377b9a7b850dfb3f10f3350fbc9855acfe
2021-08-29 15:51:27 -07:00
710a2e933f [DOC] Add doc for maybe_wrap_dim (#63161)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63161

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D30629451

Pulled By: tugsbayasgalan

fbshipit-source-id: b03f030f197e10393a8ff223b240d23c30858028
2021-08-29 14:19:28 -07:00
7ebdbf82dc add support for sending cpu sparse tensors over rpc (#62794)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62794

This PR updates JIT serialization to support pickling sparse COO tensors.
It also updates message.cpp to support sparse COO tensors.
A bug about this was filed a few years ago: https://github.com/pytorch/pytorch/issues/30807.

I tested the fix by adding sparse tensor tests to rpc_test.py and dist_autograd_test.py.
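
A minimal single-process sketch of the now-supported path (the address/port values are placeholders):

```python
import os
import torch
import torch.distributed.rpc as rpc

os.environ["MASTER_ADDR"] = "localhost"
os.environ["MASTER_PORT"] = "29500"
rpc.init_rpc("worker0", rank=0, world_size=1)

s = torch.sparse_coo_tensor([[0, 1]], [1.0, 2.0], (4,))
# the sparse CPU tensor is pickled across the RPC boundary and back
out = rpc.rpc_sync("worker0", torch.add, args=(s, s))
rpc.shutdown()
```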

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse agolynski SciPioneer H-Huang mrzzd cbalioglu gcramer23 gmagogsfm

Test Plan: Imported from OSS

Reviewed By: soulitzer

Differential Revision: D30608848

Pulled By: gcramer23

fbshipit-source-id: 629ba8e4a3d8365875a709c9b87447c7a71204fb
2021-08-29 11:35:00 -07:00
52d7dd7398 [DOC] improve docstring for Optimizer.state_dict (#63153)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63153

Fixes: https://github.com/pytorch/pytorch/issues/60121

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D30629462

Pulled By: tugsbayasgalan

fbshipit-source-id: a9160e02ac53bb1a6219879747d73aae9ebe4d2f
2021-08-29 10:20:58 -07:00
371c6612b3 Automated submodule update: FBGEMM (#64141)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).

New submodule commit: 9939bac9de

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64141

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: jspark1105

Differential Revision: D30629417

fbshipit-source-id: 1b1ad3d4caff925f798b86b358ab193554c9b8e0
2021-08-29 09:58:04 -07:00
2e6221a232 [nnc] Make 64-bit dimensions work (#64077)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64077

We were assuming kernel dimensions fit in 32 bits (the old fuser made
this assumption too), but we should be able to support 64.
ghstack-source-id: 136933272

Test Plan: unit tests; new IR level test with huge sizes

Reviewed By: ZolotukhinM

Differential Revision: D30596689

fbshipit-source-id: 23b7e393a2ebaecb0c391a6b1f0c4b05a98bcc94
2021-08-28 19:59:47 -07:00
405c15516c Parse int64 sizes/strides (#64076)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64076

We were parsing sizes into int32s, so if you had a tensor with more
than 2^32 elements, you couldn't represent it.
ghstack-source-id: 136933273

Test Plan: parseIR with size of 4e9

Reviewed By: ZolotukhinM

Differential Revision: D30521116

fbshipit-source-id: 1e28e462cba52d648e0e2acb4e234d86aae25a3e
2021-08-28 19:58:34 -07:00
4f969db325 [nnc] Fix batchnorm implementation (#64112)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64112

Fixes #64062

Test Plan: Imported from OSS

Reviewed By: zhxchen17

Differential Revision: D30622897

Pulled By: bertmaher

fbshipit-source-id: 7d7c6131aa786e61fa1d0a517288396a0bdb1d22
2021-08-28 19:20:35 -07:00
aefa2f3e64 To add RMSProp algorithm documentation (#63721)
Summary:
It has been discussed before that adding description of Optimization algorithms to PyTorch Core documentation may result in a nice Optimization research tutorial. In the following tracking issue we mentioned about all the necessary algorithms and links to the originally published paper  https://github.com/pytorch/pytorch/issues/63236.

In this PR we are adding description of RMSProp to the documentation.  For more details, we refer to the paper   https://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf

<img width="464" alt="RMSProp" src="https://user-images.githubusercontent.com/73658284/131179226-3fb6fe5a-5301-4948-afbe-f38bf57f24ff.png">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63721

Reviewed By: albanD

Differential Revision: D30612426

Pulled By: iramazanli

fbshipit-source-id: c3ac630a9658d1282866b53c86023ac10cf95398
2021-08-28 15:55:56 -07:00
8b6266fe4f Automated submodule update: FBGEMM (#64129)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).

New submodule commit: f14e794814

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64129

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: jspark1105

Differential Revision: D30621549

fbshipit-source-id: 34c109e75c96a261bf370f7a06dbb8b9004860ab
2021-08-28 11:56:17 -07:00
223f886032 Move Parallel[Native|TBB] to GHA (#64123)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64123

Reviewed By: driazati

Differential Revision: D30620966

Pulled By: malfet

fbshipit-source-id: 9a23e4b3e16870f77bf18df4370cd468603d592d
2021-08-28 11:50:30 -07:00
d0c63e857d Enhancement for smart serialization for out schemas (#63096)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63096

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D30415255

Pulled By: tugsbayasgalan

fbshipit-source-id: eb40440a3b46258394d035479f5fc4a4baa12bcc
2021-08-28 11:46:27 -07:00
f4496528e3 [Light] Fix error message (#64010)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64010

Fixing typos in an error message

Test Plan:
Error message before fix:
Lite Interpreter verson number does not match. The model version must be between 3 and 5But the model version is 6

Error message after fix:
Lite Interpreter version number does not match. The model version must be between 3 and 5 but the model version is 6

Reviewed By: larryliu0820

Differential Revision: D30568367

fbshipit-source-id: 205f3278ee8dcf38579dbb828580a9e986ccacc1
2021-08-27 22:54:38 -07:00
0d0605eaa9 [quant][graphmode][fx] Add reference quantized linear module (#63627)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63627

Added a reference quantized linear module for the custom backend flow; the reference quantized module will
have the following code:
```
        w(float) -- quant - dequant \
        x(float) ------------- F.linear ---
```
In the full model, we will see
```
        w(float) -- quant - *dequant \
        x -- quant --- *dequant --  *F.linear --- *quant - dequant
```
and the backend should be able to fuse the ops with `*` into a quantized linear

Test Plan:
python test/test_quantization.py TestQuantizeFx.test_conv_linear_reference

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D30504750

fbshipit-source-id: 5729921745c2b6a0fb344efc3689f3b170e89500
2021-08-27 22:53:24 -07:00
a3a7a67048 [iOS][GPU] Consolidate array and non-array kernel for hardswish (#63369)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63369

ghstack-source-id: 136918152

(Note: this ignores all push blocking failures!)

Test Plan:
- `buck test pp-macos`
- Op tests in PyTorchPlayground app
- Run mobilenetv3 test

https://pxl.cl/1Ncls

Reviewed By: xta0

Differential Revision: D30354454

fbshipit-source-id: 88bf4f8b5871e63170161b3f3e44f99b8a3086c6
2021-08-27 19:31:08 -07:00
9ccb9299e0 To add Nesterov Adam algorithm description to documentation (#63793)
Summary:
It has been discussed before that adding description of Optimization algorithms to PyTorch Core documentation may result in a nice Optimization research tutorial. In the following tracking issue we mentioned about all the necessary algorithms and links to the originally published paper  https://github.com/pytorch/pytorch/issues/63236.

In this PR we are adding description of Nesterov Adam Algorithm to the documentation.  For more details, we refer to the paper  https://openreview.net/forum?id=OM0jvwB8jIp57ZJjtNEZ

<img width="439" alt="NAdam" src="https://user-images.githubusercontent.com/73658284/131185124-e81b2edf-33d9-4a9d-a7bf-f7e5eea47d7c.png">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63793

Reviewed By: NivekT

Differential Revision: D30617057

Pulled By: iramazanli

fbshipit-source-id: cd2054b0d9b6883878be74576e86e307f32f1435
2021-08-27 19:29:34 -07:00
07c5cb8c48 [Static Runtime] Optimize memory planner initialization (#64101)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64101

Checking `getOutOfPlaceOperation(n)` is a very expensive operation, especially in multithreaded environments, due to a lock acquisition when the NNC cache is queried. This slows down the memory planner initialization time, and by extension, the latency for the first static runtime inference.

There are two optimizations in this diff:
* Cache the result of `p_node->has_out_variant()` to avoid the call to `getOutOfPlaceOperation`. This speeds up calls to `canReuseInputOutputs`, which in turn speeds up `isOptimizableContainerType`
* Precompute all `isOptimizableContainerType` during static runtime initialization to avoid a pass over all of each node's inputs.
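
A language-agnostic sketch of the first optimization, written in Python with names that only loosely mirror the real ones: pay the expensive, lock-guarded lookup once per node at initialization, then answer queries from the cached flag.

```python
class ProcessedNode:
    def __init__(self, node, get_out_of_place_operation):
        # the expensive, lock-acquiring lookup happens exactly once, at init time
        self._has_out_variant = get_out_of_place_operation(node) is not None

    def has_out_variant(self):
        return self._has_out_variant  # O(1), no lock acquisition

p = ProcessedNode(object(), lambda n: None)  # lookup returns None -> no out variant
assert not p.has_out_variant()
```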

Test Plan: All unit tests pass: `buck test caffe2/benchmarks/static_runtime/...`

Reviewed By: movefast1990

Differential Revision: D30595579

fbshipit-source-id: 70aaa7af9589c739c672788bf662f711731864f2
2021-08-27 17:40:43 -07:00
2d75ab0c8f [TensorExpr] Update tutorial. (#64109)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64109

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D30614050

Pulled By: ZolotukhinM

fbshipit-source-id: e8f9bd9ef2483e6eafbc0bd5394d311cd694c7b2
2021-08-27 16:19:29 -07:00
3abbcf079d .github: Add cpp_docs job to current gcc5 workflow (#64044)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64044

Adds the cpp_docs job to the current workflow, and also modifies the scripts
around building docs so that they can be driven through environment
variables with sane defaults rather than requiring explicitly passed
arguments.

Ideally should not break current jobs running in circleci but those
should eventually be turned off anyways.

Coincides with work from:
* https://github.com/seemethere/upload-artifact-s3/pull/1
* https://github.com/seemethere/upload-artifact-s3/pull/2

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

cc ezyang seemethere malfet walterddr lg20987 pytorch/pytorch-dev-infra

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D30610010

Pulled By: seemethere

fbshipit-source-id: f67adeb1bd422bb9e24e0f1ec0098cf9c648f283
2021-08-27 16:06:12 -07:00
6ccb74b837 Update codegen to use boxed kernel (#63459)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63459

 - Replaces the usual registration when "requires_derivative" is True (i.e., we still need a grad_fn) but `fn.info` is `None` (TODO: maybe also ensure differentiable inputs > 0, to match requires_derivative).
 - Adds some (temporary?) fixes to some sparse functions. See: https://github.com/pytorch/pytorch/issues/63549
 - Leaves in place the codegen that generates the NotImplemented node (though removing it should only be one line), because some ops listed under `RESET_GRAD_ACCUMULATOR` have an extra function call. We would need to make this list of ops available to C++, which would mean either codegen-ing a list of strings or moving RESET_GRAD_ACCUMULATOR to C++ land. We could do this in a future PR if necessary.

Test Plan: Imported from OSS

Reviewed By: heitorschueroff

Differential Revision: D30518571

Pulled By: soulitzer

fbshipit-source-id: 99a35cbced46292d1b4e51594ae4d534c2caf8b6
2021-08-27 15:01:50 -07:00
90a6498a12 Add autograd not implemented boxed fallback (#63458)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63458

See description and discussion from https://github.com/pytorch/pytorch/pull/62450

Test Plan: Imported from OSS

Reviewed By: heitorschueroff

Differential Revision: D30518572

Pulled By: soulitzer

fbshipit-source-id: 3b1504d49abb84560ae17077f0dec335749c9882
2021-08-27 15:00:28 -07:00
8406dba65a Removing references to ProcessGroupAgent in comments (#64051)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64051

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse agolynski SciPioneer H-Huang mrzzd cbalioglu gcramer23

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D30587076

Pulled By: jaceyca

fbshipit-source-id: 414cb95faad0b4da0eaf2956c0668af057f93574
2021-08-27 14:47:37 -07:00
bdde898d9c Add README to datapipes (#63982)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63982

Add a readme to `datapipes` for developers. This can be a replacement for https://github.com/pytorch/pytorch/blob/master/torch/utils/data/datapipes_tutorial_dev_loaders.ipynb

After this PR is landed, the README.md will be added to PyTorch Wiki

Test Plan: Imported from OSS

Reviewed By: soulitzer

Differential Revision: D30554198

Pulled By: ejguan

fbshipit-source-id: 6091aae8ef915c7c1f00fbf45619c86c9558d308
2021-08-27 14:17:08 -07:00
358c46f99e Implement leaky relu op
Summary: Implemented leaky relu op as per: https://www.internalfb.com/tasks/?t=97492679

Test Plan:
buck build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 //xplat/caffe2:pt_vulkan_api_test_binAndroid\#android-arm64 --show-output
adb push buck-out/gen/xplat/caffe2/pt_vulkan_api_test_binAndroid\#android-arm64 /data/local/tmp/vulkan_api_test
adb shell "/data/local/tmp/vulkan_api_test"

all tests pass, including new ones

Reviewed By: SS-JIA

Differential Revision: D30186225

fbshipit-source-id: fdb1f8f7b3a28b5504581822185c0475dcd53a3e
2021-08-27 13:52:49 -07:00
18cb3fc910 [FX] Validate data type of target on Node Construction (#64050)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64050

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D30585535

Pulled By: yqhu

fbshipit-source-id: 96778a87e75f510b4ef42f0e5cf76b35b7b2f331
2021-08-27 13:40:57 -07:00
ff4569ae29 Sparse CUDA: rename files *.cu -> *.cpp (#63894)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63894

This PR introduces a few code structure changes. There is no need to use
the .cu extension for pure C++ code that does not use CUDA. Moved
`s_addmm_out_csr_sparse_dense_cuda_worker` from the .cu file to a separate
.cpp file.

cc nikitaved pearu cpuhrsch IvanYashchuk ngimel

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D30548771

Pulled By: cpuhrsch

fbshipit-source-id: 6f12d36e7e506d2fdbd57ef33eb73192177cd904
2021-08-27 13:22:54 -07:00
8fc1064b7f [PyTorch] Reduce code size of register_prim_ops.cpp (#61494)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61494

Creating a constexpr array and then looping over it is much cheaper than emitting a function call per item.
ghstack-source-id: 136639302

Test Plan:
fitsships

Buildsizebot some mobile apps to check size impact.

Reviewed By: dhruvbird, iseeyuan

Differential Revision: D29646977

fbshipit-source-id: 6144999f6acfc4e5dcd659845859702051344d88
2021-08-27 12:56:35 -07:00
6a76ee04de Adding alltoall_single collective to collective quantization API (#63154)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63154

The collective quantization API now supports alltoall, alltoall_single, and allscatter. The test is also included.
ghstack-source-id: 136856877

Test Plan: buck test mode/dev-nosan //caffe2/test/distributed/algorithms/quantization:DistQuantizationTests_nccl -- test_all_to_all_single_bfp16

Reviewed By: wanchaol

Differential Revision: D30255251

fbshipit-source-id: 856f4fa12de104689a03a0c8dc9e3ecfd41cad29
2021-08-27 12:46:31 -07:00
04108592a3 New TLS to disable forward mode AD (#63117)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63117

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D30388097

Pulled By: albanD

fbshipit-source-id: f1bc777064645db1ff848bdd64af95bffb530984
2021-08-27 11:59:24 -07:00
6257f5b168 [pruner] add README to repo (#64099)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64099

adding readme to pruner in OSS
ghstack-source-id: 136867516

Test Plan: should not affect behavior

Reviewed By: z-a-f

Differential Revision: D30608045

fbshipit-source-id: 3e9899a853395b2e91e8a69a5d2ca5f3c2acc646
2021-08-27 11:52:59 -07:00
101a626330 Improve distributed.get_rank() API docstring (#63296)
Summary:
See discussion in https://pytorch.slack.com/archives/CBHSWPNM7/p1628792389008600

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63296

Reviewed By: cbalioglu

Differential Revision: D30332042

Pulled By: mrshenli

fbshipit-source-id: 3a642fda2e106fd35b67709ed2adb60e408854c2
2021-08-27 11:34:55 -07:00
196fd3ee7a Modules note v2 (#63963)
Summary:
This PR expands the [note on modules](https://pytorch.org/docs/stable/notes/modules.html) with additional info for 1.10.

It adds the following:
* Examples of using hooks
* Examples of using apply()
* Examples for ParameterList / ParameterDict
* register_parameter() / register_buffer() usage
* Discussion of train() / eval() modes
* Distributed training overview / links
* TorchScript overview / links
* Quantization overview / links
* FX overview / links
* Parametrization overview / link to tutorial

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63963

Reviewed By: albanD

Differential Revision: D30606604

Pulled By: jbschlosser

fbshipit-source-id: c1030b19162bcb5fe7364bcdc981a2eb6d6e89b4
2021-08-27 11:30:18 -07:00
19c1b45f25 Detect out argument in the schema (#62755)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62755

After this change, an out argument can be checked by calling `is_out()`.

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D30415256

Pulled By: tugsbayasgalan

fbshipit-source-id: b2e1fa46bab7c813aaede1f44149081ef2df566d
2021-08-27 11:20:33 -07:00
9f1f22b9bc [Static Runtime] Add out variant of quantized::embedding_bag_byte_prepack (#64081)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64081

This change add an out variant of `quantized::embedding_bag_byte_prepack`.

Test Plan:
- Added `ShapeInferenceTest.QEmbeddingBagByteUnpack`.

- Observed

```
V0824 13:38:49.723708 1322143 impl.cpp:1394] Switch to out variant for node: %2 : Tensor = quantized::embedding_bag_byte_prepack(%input)
```

Reviewed By: hlu1

Differential Revision: D30504216

fbshipit-source-id: 1d9d428e77a15bcc7da373d65e7ffabaf9c6caf2
2021-08-27 10:53:23 -07:00
6ab3a21098 fix resize bug (#61166)
Summary:
I think the original intention here is for this code to take effect only in the align_corners case (because output_size = 1 and the divisor will be 0), but it affects the non-align_corners case too. For example:

```python
import numpy as np
import torch

# 2x2 float input downsampled by a factor of 0.5 to a 1x1 output
# (bilinear interpolation requires a floating-point dtype)
input = torch.tensor(
    np.arange(1, 5, dtype=np.float32).reshape((1, 1, 2, 2)))
m = torch.nn.Upsample(scale_factor=0.5, mode="bilinear")
out = m(input)
```

The result we expect is [[[[2.5]]]], but PyTorch returns [[[[1.0]]]], which differs from OpenCV and PIL; this PR fixes that.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61166

Reviewed By: malfet

Differential Revision: D30543178

Pulled By: heitorschueroff

fbshipit-source-id: 21a4035483981986b0ae4a401ef0efbc565ccaf1
2021-08-27 10:49:31 -07:00
538c30a713 [caffe2] fixes to allow stricter compilation flag (#64016)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64016

In order to increase the strictness of compilation for some targets depending on caffe2, we need to fix some errors uncovered when raising such flags.

This change introduces the required override tokens for virtual destructors

Test Plan: CI. Moreover, targets depending on caffe2 that use clang strict warnings now compile.

Reviewed By: kalman5

Differential Revision: D30541714

fbshipit-source-id: 564af31b4a9df3536d7d6f43ad29e1d0c7040551
2021-08-27 10:38:37 -07:00
eca87f729d Added reference tests to ReductionOpInfo (#62900)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62900

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D30408815

Pulled By: heitorschueroff

fbshipit-source-id: 6a1f82ac281920ff7405a42f46ccd796e60af9d6
2021-08-27 10:32:16 -07:00
babd449978 [JIT] Add aten::slice optimization (#63049)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63049

Given a graph produced from a function like this:
```
def foo():
    li = [1, 2, 3, 4, 5, 6]
    return li[0:2]
```
This pass produces a graph like this:
```
def foo():
    li = [1, 2]
    return li
```

These changes are mostly adapted from https://github.com/pytorch/pytorch/pull/62297/

Test Plan: `buck test //caffe2/test:jit -- TestPeephole`

Reviewed By: eellison

Differential Revision: D30231044

fbshipit-source-id: d12ee39f68289a574f533041a5adb38b2f000dd5
2021-08-27 10:12:45 -07:00
3abb606091 Add doc for nn.MultiMarginLoss (shape, example) (#63760)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/63747

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63760

Reviewed By: malfet

Differential Revision: D30541581

Pulled By: jbschlosser

fbshipit-source-id: 99560641e614296645eb0e51999513f57dfcfa98
2021-08-27 09:51:05 -07:00
a9983ac09c Refactor structured set_output in Register{DispatchKey}.cpp (#62188)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62188

These parts of the `set_output` code are identical for all operators in the
kernel registration files. So, this moves them from being copied into every
class to two helper functions at the top of the file.

Test Plan: Imported from OSS

Reviewed By: soulitzer

Differential Revision: D29962045

Pulled By: albanD

fbshipit-source-id: 753b8aac755f3c91b77ffa2c30a89ac91a84b7c4
2021-08-27 09:38:27 -07:00
f922b58b5f [bazel] GPU-support: add @local_config_cuda and @cuda (#63604)
Summary:
## Context

We take the first step toward GPU bazel support by adding bazel external workspaces `local_config_cuda` and `cuda`, where the first one has some hardcoded values and lists of files, and the second one provides a nicer, high-level wrapper that maps onto the bazel targets already expected by pytorch, which are guarded with the `if_cuda` macro.

The prefix `local_config_` signifies the fact that we are breaking the bazel hermeticity philosophy by explicitly relying on the CUDA installation that is present on the machine.

## Testing

Notice an important scenario that is unlocked by this change: compilation of cpp code that depends on cuda libraries (i.e. cuda.h and so on).

Before:
```
sergei.vorobev@cs-sv7xn77uoy-gpu-1628706590:~/src/pytorch4$ bazelisk build --define=cuda=true //:c10
ERROR: /home/sergei.vorobev/src/pytorch4/tools/config/BUILD:12:1: no such package 'tools/toolchain': BUILD file not found in any of the following directories. Add a BUILD file to a directory to mark it as a package.
 - /home/sergei.vorobev/src/pytorch4/tools/toolchain and referenced by '//tools/config:cuda_enabled_and_capable'
ERROR: While resolving configuration keys for //:c10: Analysis failed
ERROR: Analysis of target '//:c10' failed; build aborted: Analysis failed
INFO: Elapsed time: 0.259s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (2 packages loaded, 2 targets configured)
```

After:
```
sergei.vorobev@cs-sv7xn77uoy-gpu-1628706590:~/src/pytorch4$ bazelisk build --define=cuda=true //:c10
INFO: Analyzed target //:c10 (6 packages loaded, 246 targets configured).
INFO: Found 1 target...
Target //:c10 up-to-date:
  bazel-bin/libc10.lo
  bazel-bin/libc10.so
INFO: Elapsed time: 0.617s, Critical Path: 0.04s
INFO: 0 processes.
INFO: Build completed successfully, 1 total action
```

The `//:c10` target is a good testing one for this, because it has such cases where the [glob is different](075024b9a3/BUILD.bazel (L76-L81)), based on do we compile for CUDA or not.

## What is out of scope of this PR

This PR is a first in a series of providing the comprehensive GPU bazel build support. Namely, we don't tackle the [cu_library](11a40ad915/tools/rules/cu.bzl (L2)) implementation here. This would be a separate large chunk of work.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63604

Reviewed By: soulitzer

Differential Revision: D30442083

Pulled By: malfet

fbshipit-source-id: b2a8e4f7e5a25a69b960a82d9e36ba568eb64595
2021-08-27 09:33:42 -07:00
22d38bd10d [OSS] Enable Metal in PyTorch MacOS nightly builds (#63718)
Summary:
Build on https://github.com/pytorch/pytorch/pull/63825

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63718

Test Plan:
1. Add the `ci/binaries` label to the PR, so the CI will build those nightly builds

2. Make sure the following CI jobs build with the `USE_PYTORCH_METAL_EXPORT` option set to `ON`:
```
ci/circleci: binary_macos_arm64_conda_3_8_cpu_nightly_build
ci/circleci: binary_macos_arm64_conda_3_9_cpu_nightly_build
ci/circleci: binary_macos_arm64_wheel_3_8_cpu_nightly_build
ci/circleci: binary_macos_arm64_wheel_3_9_cpu_nightly_build
ci/circleci: binary_macos_conda_3_6_cpu_nightly_build
ci/circleci: binary_macos_conda_3_7_cpu_nightly_build
ci/circleci: binary_macos_conda_3_8_cpu_nightly_build
ci/circleci: binary_macos_conda_3_9_cpu_nightly_build
ci/circleci: binary_macos_libtorch_3_7_cpu_nightly_build
ci/circleci: binary_macos_wheel_3_6_cpu_nightly_build
ci/circleci: binary_macos_wheel_3_7_cpu_nightly_build
ci/circleci: binary_macos_wheel_3_8_cpu_nightly_build
ci/circleci: binary_macos_wheel_3_9_cpu_nightly_build
```

3. Test `conda` and `wheel` builds locally on the [HelloWorld-Metal](https://github.com/pytorch/ios-demo-app/tree/master/HelloWorld-Metal) demo with [(Prototype) Use iOS GPU in PyTorch](https://pytorch.org/tutorials/prototype/ios_gpu_workflow.html)

(1) conda
```
conda install https://15667941-65600975-gh.circle-artifacts.com/0/Users/distiller/project/final_pkgs/pytorch-1.10.0.dev20210826-py3.8_0.tar.bz2
```
(2) wheel
```
pip3 install https://15598647-65600975-gh.circle-artifacts.com/0/Users/distiller/project/final_pkgs/torch-1.10.0.dev20210824-cp38-none-macosx_10_9_x86_64.whl
```

Reviewed By: xta0

Differential Revision: D30593167

Pulled By: hanton

fbshipit-source-id: 471da204e94b29c11301c857c50501307a5f0785
2021-08-27 09:25:05 -07:00
a43e7a51d7 Adds return type annotation for fork_rng function (#63724)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/63723

Since it's a generator function, the type annotation should be `Generator`.
![image](https://user-images.githubusercontent.com/47299190/130318830-29ef9529-0daa-463c-90b2-1b11f63ade8a.png)
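
For context, a minimal use of the generator-backed context manager:

```python
import torch

with torch.random.fork_rng():
    torch.manual_seed(0)
    x = torch.randn(2)  # RNG state outside the block is left untouched
```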

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63724

Reviewed By: iramazanli

Differential Revision: D30543098

Pulled By: heitorschueroff

fbshipit-source-id: ebdd34749defe1e26c899146786a0357ab4b4b9b
2021-08-27 09:03:40 -07:00
ad8eddbd80 More robust check of whether a class is defined in torch (#64083)
Summary:
This would prevent bugs for classes that
1) are defined in a module that happens to start with `torch`, say `torchvision`
2) are defined in torch but imported with an alias like `import torch as th`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64083

Reviewed By: soulitzer

Differential Revision: D30598369

Pulled By: gmagogsfm

fbshipit-source-id: 9d3a7135737b2339c9bd32195e4e69a9c07549d4
2021-08-27 08:55:35 -07:00
f2c47cf4db [Static Runtime] Out version for fmod (#64046)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64046

Test Plan:
Confirm out variant is used:
```
> //caffe2/benchmarks/static_runtime:static_runtime_cpptest -- --v=1

V0826 23:31:30.321382 193428 impl.cpp:1395] Switch to out variant for node: %4 : Tensor = aten::fmod(%a.1, %b.1)
```

Reviewed By: mikeiovine

Differential Revision: D30581228

fbshipit-source-id: dfab9a16ff8afd40b29338037769f938f154bf74
2021-08-27 03:05:06 -07:00
c90b3cb1da [Static Runtime] Manage temporary Tensors for aten::layer_norm (#64078)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64078

This change converts `aten::layer_norm -> output Tensor` to `static_runtime::layer_norm -> (output Tensor, tmp1 Tensor, tmp2 Tensor)` so that the static runtime manages the `tmp1` and `tmp2` Tensors.

Currently the out-variant of `aten::layer_norm` creates two temporary Tensors inside it:
```
    at::Tensor mean = create_empty_from({M}, *X);
    at::Tensor rstd = create_empty_from({M}, *X);
```
that the static runtime misses an opportunity to manage.

This change puts them into (unused) output Tensors of a new placeholder op `static_runtime::layer_norm` so that the static runtime can manage them, since the static runtime currently chooses to manage only output tensors.

Test Plan:
- Enhanced `StaticRuntime.LayerNorm` to ensure that `static_runtime::layer_norm` gets activated.

- Confirmed that the new op gets activated during testing:

```
V0825 12:51:50.017890 2265227 impl.cpp:1396] Switch to out variant for node: %8 : Tensor, %9 : Tensor, %10 : Tensor = static_runtime::layer_norm(%input.1, %normalized_shape.1, %4, %4, %5, %3)

```

Reviewed By: hlu1

Differential Revision: D30486475

fbshipit-source-id: 5121c44ab58c2d8a954aa0bbd9dfeb7468347a2d
2021-08-27 02:44:43 -07:00
3c3bba4169 [Static Runtime] Use F14FastMap/F14FastSet (#63999)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63999

Use folly::F14FastMap/F14FastSet instead of std::unordered_map/unordered_set in the Static Runtime code base. folly::F14FastMap/F14FastSet implements the same APIs as std::unordered_map/unordered_set but faster. For details see https://github.com/facebook/folly/blob/master/folly/container/F14.md

Reviewed By: d1jang

Differential Revision: D30566149

fbshipit-source-id: 20a7fa2519e4dde96fb3fc61ef6c92bf6d759383
2021-08-27 01:40:41 -07:00
3f1c809470 [static runtime] port c2 argmin kernel (#63632)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63632

Local benchmarking with 1 input repeated for 10k iterations on the 290331537_4 local net. Reduces argmin runtime by about 80% and local net execution time by about 0.71-0.77 ms.

Before:
```
I0826 17:25:53.972786 1104614 PyTorchPredictorBenchLib.cpp:313] PyTorch run finished. Milliseconds per iter: 7.37599. Iters per second: 135.57
```
```
Static runtime ms per iter: 8.22086. Iters per second: 121.642
Time per node type:
        4.13527 ms.    50.9157%. fb::sigrid_transforms_torch_bind (1 nodes, out variant)
       0.868506 ms.    10.6935%. aten::argmin (1 nodes, out variant)
...
```

After:
```
I0826 17:17:54.165174 1064079 PyTorchPredictorBenchLib.cpp:313] PyTorch run finished. Milliseconds per iter: 6.66724. Iters per second: 149.987
```
```
Static runtime ms per iter: 7.68172. Iters per second: 130.179
Time per node type:
         4.1452 ms.    54.0612%. fb::sigrid_transforms_torch_bind (1 nodes, out variant)
       0.656778 ms.    8.56562%. fb::quantized_linear (8 nodes)
       0.488229 ms.    6.36741%. static_runtime::to_copy (827 nodes, out variant)
       0.372678 ms.    4.86042%. aten::argmin (1 nodes, out variant)
...Time per node type:
        3.39387 ms.    53.5467%. fb::sigrid_transforms_torch_bind (1 nodes, out variant)
       0.636216 ms.    10.0379%. fb::quantized_linear (8 nodes, out variant)
       0.410535 ms.    6.47721%. fb::clip_ranges_to_gather_to_offsets (304 nodes, out variant)
       0.212721 ms.     3.3562%. fb::clip_ranges_gather_sigrid_hash_precompute_v3 (157 nodes, out variant)
       0.173736 ms.    2.74111%. aten::matmul (1 nodes, out variant)
       0.150514 ms.    2.37474%. aten::argmin (1 nodes, out variant)
```
P447422384

Test Plan:
Test with local replayer sending traffic to `ansha_perf_test_0819.test`, and compare outputs to jit interpreter.

Start compute tier:
```
RUN_UUID=ansha_perf_test_0819.test.storage JOB_EXPIRE_TIME=864000 MODEL_ID=290331537_4 PREDICTOR_TAG= PREDICTOR_VERSION=405 PREDICTOR_TYPE=CPU ADDITIONAL_FLAGS="--enable_disagg_file_split=true --enable_adx=false --load_remote_file_locally=true --pytorch_predictor_static_runtime_whitelist_by_id=290331537" GFLAGS_CONFIG_PATH=sigrid/predictor/gflags/predictor_gflags_ads_perf_cpu_pyper SMC_TIER_NAME=sigrid.predictor.perf.ansha_per_test_0819.test.storage CLUSTER=tsp_rva ENTITLEMENT_NAME=ads_ranking_infra_test_t6 PREDICTOR_LOCAL_DIRECTORY= ICET_CONFIG_PATH= NNPI_COMPILATION_CONFIG_FILE= NUM_TASKS=1 NNPI_NUM_WORKERS=0 tw job start /data/users/ansha/fbsource/fbcode/tupperware/config/admarket/sigrid/predictor/predictor_perf_canary.tw
```

Start nnpi tier:
```
RUN_UUID=ansha_perf_test_0819.test JOB_EXPIRE_TIME=247200 MODEL_ID=290331537_4 PREDICTOR_TAG= PREDICTOR_VERSION=343 PREDICTOR_TYPE=NNPI_TWSHARED ADDITIONAL_FLAGS="--torch_glow_min_fusion_group_size=30 --pytorch_storage_tier_replayer_sr_connection_options=overall_timeout:1000000,processing_timeout:1000000 --predictor_storage_smc_tier=sigrid.predictor.perf.ansha_perf_test_0819.test.storage --pytorch_predictor_static_runtime_whitelist_by_id=290331537" GFLAGS_CONFIG_PATH=sigrid/predictor/gflags/predictor_gflags_ads_perf_glow_nnpi_pyper_v1 SMC_TIER_NAME=sigrid.predictor.perf.ansha_perf_test_0819.test CLUSTER=tsp_rva ENTITLEMENT_NAME=ads_ranking_infra_test_t17 PREDICTOR_LOCAL_DIRECTORY= ICET_CONFIG_PATH= NNPI_COMPILATION_CONFIG_FILE= NUM_TASKS=1 NNPI_NUM_WORKERS=0 tw job start /data/users/ansha/fbsource/fbcode/tupperware/config/admarket/sigrid/predictor/predictor_perf_canary.tw
```

```buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest -- StaticRuntime.IndividualOps_Argmin --print-passing-details```

Compared outputs to jit interpreter to check for no differences greater than 1e-3 (with nnc on) https://www.internalfb.com/intern/diff/view-version/136824794/

Reviewed By: hlu1

Differential Revision: D30445635

fbshipit-source-id: 048de8867ac72f764132295d1ebfa843cde2fa27
2021-08-26 23:19:19 -07:00
294db0603f [quant] Add support for linear_relu fusion for FP16 dynamic quant (#63826)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63826

Support the conversion of the intrinsic LinearReLU module to the dynamically quantized LinearReLU module.
Verify that the support works for both linear-module and functional-linear fusion.

Test Plan:
python test/test_quantization.py test_dynamic_with_fusion

Imported from OSS

Reviewed By: iramazanli

Differential Revision: D30503513

fbshipit-source-id: 70446797e9670dfef7341cba2047183d6f88b70f
2021-08-26 21:12:06 -07:00
cec44aa574 [quant] Add op support for linear_relu_dynamic_fp16 (#63824)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63824

Add a fused operator implementation that will work with the quantization fusion APIs.
Once the FBGEMM FP16 kernel supports relu fusion natively, we can remove the addition from the PT operator.

Test Plan:
python test/test_quantization.py

Imported from OSS

Reviewed By: heitorschueroff

Differential Revision: D30503514

fbshipit-source-id: 6bf3bd53f47ffaa3f1d178eaad8cc980a7f5258a
2021-08-26 21:12:04 -07:00
975f4ccad6 [quant] support linear_relu_dynamic for qnnpack backend (#63820)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63820

Adds support directly in the operator to call the relu operator if relu fusion is enabled.
Once QNNPACK natively supports relu fusion in linear_dynamic, this can be removed.

Test Plan:
python test/test_quantization.py TestDynamicQuantizedLinear.test_qlinear

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D30502813

fbshipit-source-id: 3352ee5f73e482b6d1941f389d720a461b84ba23
2021-08-26 21:12:02 -07:00
c7027f19ef [quant][fx] Add support for dynamic linear + relu fusion (INT8) (#63799)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63799

Add a new module that can be used for module swap with the nni.LinearReLU module in the convert function.
Supports INT8 currently (since the FP16 op doesn't have relu fusion yet).

Fixes #55393

Test Plan:
python test/test_quantization.py test_dynamic_fusion

Imported from OSS

Reviewed By: heitorschueroff

Differential Revision: D30502812

fbshipit-source-id: 3668e4f001a0626d469e17ac323acf582ee28a51
2021-08-26 21:10:46 -07:00
63c90ec3bf [torch/deploy] add torch.distributed to build (#63918)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63918

Previously we were building with `USE_DISTRIBUTED` off, because c10d was built as a separate library for historical reasons. Since then, lw has merged the c10d build into libtorch, so this is fairly easy to turn on.

Differential Revision:
D30492442

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.intern.facebook.com/intern/diff/D30492442/)!

Test Plan: added a unit test

Reviewed By: wconstab

Pulled By: suo

fbshipit-source-id: 843b8fcf349a72a7f6fcbd1fcc8961268690fb8c
2021-08-26 20:58:44 -07:00
65e6194aeb Introduce the torchrun entrypoint (#64049)
Summary:
This PR introduces a new `torchrun` entrypoint that simply "points" to `python -m torch.distributed.run`. It is shorter and less error-prone to type and gives a nicer syntax than a rather cryptic `python -m ...` command line. Along with the new entrypoint the documentation is also updated and places where `torch.distributed.run` are mentioned are replaced with `torchrun`.
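
For example, the two invocations below are equivalent after this change (the script name and flag value are illustrative):

```
python -m torch.distributed.run --nproc_per_node=2 train.py
torchrun --nproc_per_node=2 train.py
```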

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse agolynski SciPioneer H-Huang mrzzd cbalioglu gcramer23

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64049

Reviewed By: cbalioglu

Differential Revision: D30584041

Pulled By: kiukchung

fbshipit-source-id: d99db3b5d12e7bf9676bab70e680d4b88031ae2d
2021-08-26 20:17:48 -07:00
510d2ece81 Merge script and _script_pdt API (#62420)
Summary:
Merge the `torch.jit.script` and `torch.jit._script_pdt` APIs. This PR merges profile-directed typing with the script API.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62420

Reviewed By: iramazanli

Differential Revision: D30579015

Pulled By: nikithamalgifb

fbshipit-source-id: 99ba6839d235d61b2dd0144b466b2063a53ccece
2021-08-26 18:58:19 -07:00
0e8c3c51d9 port glu to use structured kernel approach (#61800)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61800

resubmitting because the [last one](https://github.com/pytorch/pytorch/pull/61433) was unrecoverable due to making changes incorrectly in the stack

Test Plan: Imported from OSS

Reviewed By: iramazanli

Differential Revision: D29812492

Pulled By: makslevental

fbshipit-source-id: c3dfeacd1e00a526e24fbaab02dad48069d690ef
2021-08-26 18:01:28 -07:00
a5f35ac7cd Run through failures on trunk (#64063)
Summary:
This PR runs all the tests on trunk instead of stopping on first failure.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64063

Reviewed By: malfet, seemethere

Differential Revision: D30592020

Pulled By: janeyx99

fbshipit-source-id: 318b225cdf918a98f73e752d1cc0227d9227f36c
2021-08-26 17:38:19 -07:00
0c9dce90ed [pytorch] add per_sample_weights support for embedding_bag_4bit_rowwise_offsets (#63605)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63605

Reviewed By: houseroad

Differential Revision: D30434664

fbshipit-source-id: eb4cbae3c705f9dec5c073a56f0f23daee353bc1
2021-08-26 17:31:45 -07:00
81764d1153 document that torch.triangular_solve has optional out= parameter (#63253)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63253

Fixes #57955
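
A small sketch of the now-documented parameter (shapes illustrative):

```python
import torch

A = torch.randn(3, 3).triu_()  # upper-triangular coefficient matrix
b = torch.randn(3, 2)
solution = torch.empty(3, 2)
cloned_A = torch.empty(3, 3)
# out= takes a tuple matching the (solution, cloned_coefficient) outputs
torch.triangular_solve(b, A, out=(solution, cloned_A))
```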

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D30312134

Pulled By: dagitses

fbshipit-source-id: 4f484620f5754f4324a99bbac1ff783c64cee6b8
2021-08-26 17:28:17 -07:00
ed573a8e08 Enable test_api IMethodTest in OSS (#63345)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63345

This diff did the following few things to enable the tests:
1. Exposed IMethod as TORCH_API.
2. Linked torch_deploy to test_api if USE_DEPLOY == 1.
3. Generated torch::deploy examples when building torch_deploy library.

Test Plan: ./build/bin/test_api --gtest_filter=IMethodTest.*

Reviewed By: ngimel

Differential Revision: D30346257

Pulled By: alanwaketan

fbshipit-source-id: 932ae7d45790dfb6e00c51893933a054a0fad86d
2021-08-26 16:50:52 -07:00
0bd8d0951d [Static Runtime] Remove unnecessary fb::equally_split nodes (#64022)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64022

Test Plan: - Added unittest `StaticRuntime.RemoveEquallySplitListUnpack`.

Reviewed By: hlu1

Differential Revision: D30472189

fbshipit-source-id: 36040b0146f4be9d0d0fda293f7205f43aad0b87
2021-08-26 16:29:43 -07:00
dfa35ab3e7 [pytorch][quant][oss] Support 2-bit embedding_bag op "embedding_bag_2bit_rowwise_offsets" (#63658)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63658

Support 2-bit embedding_bag op "embedding_bag_2bit_rowwise_offsets"

Reviewed By: jingsh, supriyar

Differential Revision: D30454994

fbshipit-source-id: 7aa7bfe405c2ffff639d5658a35181036e162dc9
2021-08-26 16:09:35 -07:00
92a154aa29 Move variabletype functions around (#63330)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63330

 - This is in preparation for templated/boxed autograd-not-implemented fallback
 - Make sure VariableTypeUtils does not depend on generated code
 - Lift `isFwGradDefined` into `autograd/functions/utils.cpp` so it's available to mobile builds
 - Removes `using namespace at` from VariableTypeUtils; previously we needed this for the templated version. It's not strictly necessary now, but it is still a good change to avoid name conflicts if this header is included elsewhere in the future.

Test Plan: Imported from OSS

Reviewed By: heitorschueroff

Differential Revision: D30518573

Pulled By: soulitzer

fbshipit-source-id: a0fb904baafc9713de609fffec4b813f6cfcc000
2021-08-26 16:02:39 -07:00
49353e319c More sharded_tensor creation ops: sharded_tensor.zeros, sharded_tensor.full, sharded_tensor.rand (#63732)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63732

Test Plan:
$ python test/distributed/_sharded_tensor/test_sharded_tensor.py  --v

$ python test/distributed/_sharded_tensor/test_sharded_tensor.py TestCreateTensorFromParams --v
$ python test/distributed/_sharded_tensor/test_sharded_tensor.py TestShardedTensorChunked --v

Imported from OSS

Differential Revision:
D30472621

Reviewed By: pritamdamania87

Pulled By: bowangbj

fbshipit-source-id: fd8ebf9b815fdc292ad1aad521f9f4f454163d0e
2021-08-26 16:01:38 -07:00
49b782b2cb Add shard number to print_test_stats.py upload name (#64055)
Summary:
Now that the render test results job is gone, each shard on GHA is uploading a JSON test stats report. To ensure differentiation, this PR includes the shard number in the report name.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64055

Reviewed By: iramazanli

Differential Revision: D30586869

Pulled By: janeyx99

fbshipit-source-id: fd19f347131deec51486bb0795e4e13ac19bc71a
2021-08-26 15:43:29 -07:00
085278f8b1 Derivatives of relu (#63027) (#63089)
Summary:
Optimizes the relu and leaky_relu derivatives to reduce the VRAM needed for the backward passes.

Fixes https://github.com/pytorch/pytorch/issues/63027
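
The underlying identity, as I understand the optimization: since `relu(x) > 0` exactly where `x > 0`, the backward mask can be computed from the saved *output* rather than the input, so the input tensor no longer has to be kept alive for the backward pass:

```python
import torch

x = torch.randn(5, requires_grad=True)
y = torch.relu(x)
# The mask derived from the output equals the mask derived from the input:
assert torch.equal(y > 0, x > 0)
grad_in = torch.ones_like(y) * (y > 0)  # d relu(x)/dx applied to an incoming grad of ones
```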

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63089

Reviewed By: iramazanli

Differential Revision: D30582049

Pulled By: albanD

fbshipit-source-id: a9481fe8c10cbfe2db485e28ce80cabfef501eb8
2021-08-26 15:33:25 -07:00
7861dba7f6 Automated submodule update: FBGEMM (#62879)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).

New submodule commit: ce54703857

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62879

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: jspark1105

Differential Revision: D30154801

fbshipit-source-id: b2ce185da6f6cadf5128f82b15097d9e13e9e6a0
2021-08-26 15:20:06 -07:00
aeec177833 [JIT] UseVariadicOp takes list_idx parameter (#63915)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63915

Previously, this function only worked for variadic op substitutions of the form `op(list, args) -> variadic_op(list_1, ..., list_n, args)`. This change allows for transformations of the form `op(args_0, list, args_1) -> variadic_op(args_0, list_1, ..., list_n, args_1)`.

Test Plan:
`buck test caffe2/test/cpp/jit:jit -- Stack Concat`

(tests exercising `list_idx != 0` will be added further up in this diff stack)

Reviewed By: navahgar

Differential Revision: D30529729

fbshipit-source-id: 568080679c3b40bdaedee56bef2e8a5ce7985d2f
2021-08-26 14:10:35 -07:00
d8d8e4902a [torch/elastic] Pretty print the failure message captured by @record (#64036)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64036

This PR slightly revises the implementation of the internal `_format_failure()` method in order to pretty print the error message captured in a subprocess by the `record` annotation.

With this PR a failure log is formatted as below:

```
Root Cause:
[0]:
  time: 2021-08-26_17:12:07
  rank: 0 (local_rank: 0)
  exitcode: 1 (pid: 8045)
  error_file: /tmp/torchelastic_6cj9eppm/6d9d844a-6ce4-4838-93ed-1639a9525b00_rec9kuv3/attempt_0/0/error.json
  msg:
    {
      "message": "ValueError: Test",
      "extraInfo": {
        "py_callstack": [
          "  File \"/data/home/balioglu/fail.py\", line 7, in <module>\n    main()\n",
          "  File \"/fsx/users/balioglu/repos/pytorch/torch/distributed/elastic/multiprocessing/errors/__init__.py\", line 373, in wrapper\n    error_handler.record_exception(e)\n",
          "  File \"/fsx/users/balioglu/repos/pytorch/torch/distributed/elastic/multiprocessing/errors/error_handler.py\", line 86, in record_exception\n    _write_error(e, self._get_error_file_path())\n",
          "  File \"/fsx/users/balioglu/repos/pytorch/torch/distributed/elastic/multiprocessing/errors/error_handler.py\", line 26, in _write_error\n    \"py_callstack\": traceback.format_stack(),\n"
        ],
        "timestamp": "1629997927"
      }
    }
```

in contrast to the old formatting:

```
Root Cause:
[0]:
  time: 2021-08-26_17:15:50
  rank: 0 (local_rank: 0)
  exitcode: 1 (pid: 9417)
  error_file: /tmp/torchelastic_22pwarnq/19f22638-848c-4b8f-8379-677f34fc44e7_u43o9vs7/attempt_0/0/error.json
  msg: "{'message': 'ValueError: Test', 'extraInfo': {'py_callstack': 'Traceback (most recent call last):\n  File "/fsx/users/balioglu/repos/pytorch/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 351, in wrapper\n    return f(*args, **kwargs)\n  File "/data/home/balioglu/fail.py", line 5, in main\n    raise ValueError("BALIOGLU")\nValueError: BALIOGLU\n', 'timestamp': '1629998150'}}"
```
ghstack-source-id: 136761768

Test Plan: Run the existing unit tests.

Reviewed By: kiukchung

Differential Revision: D30579025

fbshipit-source-id: 37df0b7c7ec9b620355766122986c2c77e8495ae
2021-08-26 13:56:46 -07:00
5a12cb611f To add Chained Scheduler to the list of PyTorch schedulers. (#63491)
Summary:
In this PR we are introducing ChainedScheduler, which was initially proposed in the discussion https://github.com/pytorch/pytorch/pull/26423#discussion_r329976246 .

The idea is to provide a user-friendly chaining method for schedulers, especially for cases where many of them are involved and we want a clean, easy-to-read interface for them. This method will be even more crucial once CompositeSchedulers and schedulers for different types of parameters are involved.

The immediate application of ChainedScheduler is expected to be in the TorchVision library, to combine WarmUpLR and MultiStepLR https://github.com/pytorch/vision/blob/master/references/video_classification/scheduler.py#L5 . However, this method can be expected to apply to many other use cases as well.

### Example
The usage is as simple as below:

```python
sched=ChainedScheduler([ExponentialLR(self.opt, gamma=0.9),
                        WarmUpLR(self.opt, warmup_factor=0.2, warmup_iters=4, warmup_method="constant"),
                        StepLR(self.opt, gamma=0.1, step_size=3)])
```

Then calling
```python
sched.step()
```
triggers the step function of all three schedulers consecutively.

Partially resolves https://github.com/pytorch/vision/issues/4281

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63491

Reviewed By: datumbox, mruberry

Differential Revision: D30576180

Pulled By: iramazanli

fbshipit-source-id: b43f0749f55faab25079641b7d91c21a891a87e4
2021-08-26 13:30:21 -07:00
7cfbc85821 [fx_acc] [fx2trt] add acc op mapper for argmin and converter for topk (#63823)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63823

Add mapper for `torch.argmin` which maps it to `acc_ops.flatten` (optional) + `acc_ops.topk` + `acc_ops.getitem` + `acc_ops.squeeze` (optional). This diff doesn't allow mapping if `dim=None && keepdim=True` in `torch.argmin`.

Add fx2trt converter for `acc_ops.topk`.

Test Plan:
buck test mode/opt glow/fb/fx/oss_acc_tracer:test_acc_tracer -- test_argmin
buck run mode/opt caffe2/torch/fb/fx2trt:test_topk

Reviewed By: jfix71

Differential Revision: D30501771

fbshipit-source-id: 0babc45e69bac5e61ff0b9b4dfb98940398e3e57
2021-08-26 13:16:22 -07:00
cbfec02007 [Static Runtime] Add native op for aten::expand_as (#64024)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64024

`aten::expand_as` creates a view of the input tensor. This change adds its native op implementation for the static runtime.

Test Plan: - Added `StaticRuntime.IndividualOps_ExpandAs`

Reviewed By: hlu1

Differential Revision: D30546851

fbshipit-source-id: e53483048af890bc41b6192a1ab0c5ba0ee2bdc0
2021-08-26 13:05:53 -07:00
95d0b3199b Back out "[ONNX] Fix an issue that optimizations might adjust graph inputs unexpectedly. (#61280)" (#64004)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64004

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63904

Fixes T98808160

Test Plan: T98808160

Reviewed By: msaroufim

Differential Revision: D30527450

fbshipit-source-id: 6262901a78ca929cecda1cf740893139aa26f1b4
2021-08-26 12:49:42 -07:00
c5cc185b6d Allow uncompiled strings as input to checkScriptRaisesRegex (#63901)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63901

cc gmagogsfm

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D30579472

Pulled By: ansley

fbshipit-source-id: 59ee09c1f25278d4f6e51f626588251bd095c6ea
2021-08-26 12:17:07 -07:00
48c57b9b2e Leverage TensorPipe's automatic SHM address selection (#63028)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63028

TensorPipe until now required PyTorch to come up with a unique identifier to use as the address for the UNIX domain socket used in the SHM transport. However, the Linux kernel can automatically assign an available address (like it does with IP ports), and TensorPipe now supports this, so we can remove that now-unneeded PyTorch logic.

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D30220352

fbshipit-source-id: 78e8a6ef5916b2a72df26cdc9cd367b9d083e821
2021-08-26 12:15:53 -07:00
ad47fb8858 Rename IterableAsDataPipe to IterableWrapper (#63981)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63981

Rename `IterableAsDataPipe` to `IterableWrapper` based on our naming convention `Op-er`
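
Usage is unchanged apart from the name; a quick sketch:

```python
from torch.utils.data.datapipes.iter import IterableWrapper

dp = IterableWrapper(range(5))  # wraps any iterable as an IterDataPipe
print(list(dp))  # [0, 1, 2, 3, 4]
```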

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D30554197

Pulled By: ejguan

fbshipit-source-id: c2eacb20df5645d83ca165d6a1591f7e4791990f
2021-08-26 10:23:25 -07:00
0f6b524665 [NNC] Add C++ codegen backend to NNC (#62869)
Summary:
Adds a C++ codegen backend to NNC to generate C++ for CPU instead of generating LLVM IR.
Tensors are represented as blobs of float. Vector operations are devectorized/unrolled.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62869

Test Plan:
The branch https://github.com/pytorch/pytorch/tree/mvz-nnc-aot-prototype makes it possible to AOT-compile the whole MobileNetV3 model into binary code through LLVM codegen in NNC.

I forked that branch to https://github.com/cheng-chang/pytorch/tree/cc-aot-cpp, merged this PR into it, and modified `fancy_compile` to compile MobileNetV3 into C++ through

```
import torch

m = torch.jit.load('mobnet.pt')
m.eval()
f = torch.jit.freeze(m)
torch._C._fancy_compile(f.graph, [1, 3, 224, 224])
```

The generated C++ file `mobnet.cc` can be found at https://gist.github.com/cheng-chang/e2830cc6920b39204ebf368035b2bcec.

I manually compiled the generated C++ through `g++ -o mobnet -std=c++14 -L./build/lib -ltorch_cpu -ltorch mobnet.cc`, and it succeeded.

Reviewed By: ZolotukhinM

Differential Revision: D30149482

Pulled By: cheng-chang

fbshipit-source-id: e77b189f0353e37cd309423a48a513e668d07675
2021-08-26 09:56:37 -07:00
6d31ba6ddc [nnc] Sanitized the names of constants in the input graph. (#63990)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/63923

The input graph can contain constants whose names contain special characters. So, all names of constants in the input graph need to be sanitized.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63990

Reviewed By: ZolotukhinM

Differential Revision: D30558432

Pulled By: navahgar

fbshipit-source-id: de5b0c23d50ee8997f40f2c0fc605dda3719186f
2021-08-26 09:52:02 -07:00
ba5f1b1076 [nnc] Fix dtype promotion involving scalars (#64002)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64002

Fixes https://github.com/pytorch/vision/issues/4315

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D30566979

Pulled By: bertmaher

fbshipit-source-id: eaa98b9534a926be7fcd337d46c5a0acb3243179
2021-08-26 09:43:15 -07:00
1354ee417a run_test.py: add option to run only core tests (#63976)
Summary:
This is in response to a feature request from some folks in the core team to have a local command that would only run relevant "core" tests. The idea is to give developers a smoke-test option to run locally before making a PR, in order to verify that their changes did not break core functionality. These smoke tests are not meant to be short, but rather relevant.

This PR enables that by allowing developers to run `python test/run_test.py --core` or `python test/run_test.py -core` in order to run the CORE_TEST_LIST, which is currently test_nn.py, test_torch.py, and test_ops.py.

I am not the best person to judge what should be considered "core", so please comment which tests should be included and/or excluded from the CORE_TEST_LIST!

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63976

Test Plan:
```
(pytorch) janeyx@janeyx-mbp test % python run_test.py --core -v
Selected tests: test_nn, test_ops, test_torch
Running test_nn ... [2021-08-25 14:48:28.865078]
Executing ['/Users/janeyx/miniconda3/envs/pytorch/bin/python', 'test_nn.py', '-v'] ... [2021-08-25 14:48:28.865123]
test_to (__main__.PackedSequenceTest) ... ok
test_to_memory_format (__main__.PackedSequenceTest) ... ok
```

Reviewed By: walterddr

Differential Revision: D30575560

Pulled By: janeyx99

fbshipit-source-id: 3f151982c1e315e50e60cb0d818adaea34556a04
2021-08-26 09:29:57 -07:00
fbe7133b58 [Static Runtime] Disable out variant of aten::clone (#63980)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63980

The out variant implementation of `aten::clone` causes a crash, which needs further investigation. This change disables it until the problem gets fixed.

Note that `inline_cvr` doesn't use `aten::clone` as of now, so no perf implication: https://www.internalfb.com/phabricator/paste/view/P446858755?lines=121

Test Plan: N/A

Reviewed By: hlu1

Differential Revision: D30544149

fbshipit-source-id: facb334d67473f622b36862fbdb2633358556fdf
2021-08-26 08:10:13 -07:00
7ccc4b5cc8 [CI] move distributed test into its own CI job (#62896)
Summary:
Moving distributed to its own job.

- [x] ensure there should be a distributed test job for every default test job matrix (on GHA)
- [x] ensure that circleci jobs works for distributed as well
- [x] waiting for test distributed to have its own run_test.py launch options, see https://github.com/pytorch/pytorch/issues/63147

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62896

Reviewed By: seemethere

Differential Revision: D30230856

Pulled By: walterddr

fbshipit-source-id: 0cad620f6cd9e56c727c105458d76539a5ae976f
2021-08-26 08:02:20 -07:00
733755f72c remove special grad_mode tls handling (#63116)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63116

This PR removes the special flag to disable grad mode tracking on the ThreadLocalState and replaces it with an explicit setter that users can use.
This reduces the complexity of ThreadLocalState.

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D30388098

Pulled By: albanD

fbshipit-source-id: 85641b3d711179fb78ff6a41ed077548dc821a2f
2021-08-26 07:51:30 -07:00
950f7c0237 Added API tests to ReductionOpInfo and ported amax/amin/nansum tests (#62899)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62899

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D30408816

Pulled By: heitorschueroff

fbshipit-source-id: 6cb0aa7fa7edba93549ef873baa2fb8a003bd91d
2021-08-26 07:18:43 -07:00
10da1fc3f8 Deify opmath_t into its own header, align with accscalar_t (#63986)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63986

Fixes #63985

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D30555996

Pulled By: ezyang

fbshipit-source-id: b6e4d56a5658ed028ffc105cc4b479faa6882b65
2021-08-26 06:59:46 -07:00
774ae0851d [OpInfo] Added ReductionOpInfo subclass of OpInfo and ported sum test (#62737)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62737

ReductionOpInfo is a specialization of OpInfo for reduction operators. For now, it is designed to work with reductions that return a single tensor and that reduce all elements along one or more dimensions to a single value. In particular this excludes operators such as `max` and `min` that return multiple tensors and `quantile` that can return multiple values.

fixes https://github.com/pytorch/pytorch/issues/49746

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D30406568

Pulled By: heitorschueroff

fbshipit-source-id: 218b1da1902f67bcf4c3681e2a0f0029a25d51f1
2021-08-26 06:06:38 -07:00
c02eda8166 Update TensorPipe submodule
Summary: The bot failed to do it.

Test Plan: D30542677

Reviewed By: beauby

Differential Revision: D30573500

fbshipit-source-id: 50abd6fc415cead0a6b6d9290fa0e5f97d0e4989
2021-08-26 05:44:38 -07:00
61d88cdd1c use const auto& as type for grad alias (#63949)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63949

This is an extension of the discussion in
https://github.com/pytorch/pytorch/pull/63040#discussion_r687793027.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D30546789

Pulled By: dagitses

fbshipit-source-id: 3046aff4f129d5492d73dfb67717a824e16ffee8
2021-08-26 04:44:03 -07:00
5757d03145 Add logging for _MinimizerBase
Summary: Add logging so we know which nodes are currently being visited

Test Plan: lint & SC tests

Reviewed By: 842974287

Differential Revision: D30509865

fbshipit-source-id: 09e77e44c97c825242e0b24f90463b50f3ca19c6
2021-08-26 00:52:58 -07:00
a6f767ed3d Fix issue re: DDP and create_graph=True (#63831)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63831

Closes https://github.com/pytorch/pytorch/issues/63812

`at::mul_out` is not supported when `grad` itself requires grad, which is useful for computing higher order derivatives.

In this case, fall back to a mul + copy instead of mul_out.
ghstack-source-id: 136614644

Test Plan: UT

Reviewed By: SciPioneer

Differential Revision: D30505573

fbshipit-source-id: 83532b6207b3d80116fcc4dff0e5520d73b3454f
2021-08-25 23:50:25 -07:00
3b284ab024 Adding BFP16 quantization/dequantization support to OSS (#63059)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63059

Adds support for the BFP16 quantization method to OSS. Currently only CPU is supported.
ghstack-source-id: 136639528

Test Plan: Imported from OSS

Reviewed By: wanchaol

Differential Revision: D30194538

fbshipit-source-id: ac248567ad8028457c2a91b77ef2ce81709fce53
2021-08-25 23:41:34 -07:00
9d95d48567 (torch.distributed) Add torch.distributed.is_torchelastic_launched() util method + make init_method=tcp:// compatible with torchelastic (#63910)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63910

Addresses the current issue that `init_method=tcp://` is not compatible with `torch.distributed.run` and `torch.distributed.launch`. When running with a training script that initializes the process group with `init_method=tcp://localhost:$port` as such:

```
$ python -u -m torch.distributed.run --max_restarts 0 --nproc_per_node 1 --nnodes 1 --master_addr $(hostname) --master_port 6000 ~/tmp/test.py
```

An `Address in use` error is raised since the training script tries to create a TCPStore on port 6000, which is already taken since the elastic agent is already running a TCPStore on that port.

For details see: https://github.com/pytorch/pytorch/issues/63874.

This change does a couple of things:

1. Adds `is_torchelastic_launched()` check function that users can use in the training scripts to see whether the script is launched via torchelastic.
1. Update the `torch.distributed` docs page to include the new `is_torchelastic_launched()` function.
1. Makes `init_method=tcp://` torchelastic compatible by modifying `_tcp_rendezvous_handler` in `torch.distributed.rendezvous` (this is NOT the elastic rendezvous, it is the old rendezvous module which is slotted for deprecation in future releases) to check `is_torchelastic_launched()` AND `torchelastic_use_agent_store()` and if so, only create TCPStore clients (no daemons, not even for rank 0).
1. Adds a bunch of unittests to cover the different code paths

NOTE: the issue mentions that we should fail-fast with an assertion on `init_method!=env://` when `is_torchelastic_launched()` is `True`. There are three registered init_methods in pytorch: env://, tcp://, file://. Since this diff makes tcp:// compatible with torchelastic and I've validated that file is compatible with torchelastic. There is no need to add assertions. I did update the docs to point out that env:// is the RECOMMENDED init_method. We should probably deprecate the other init_methods in the future but this is out of scope for this issue.
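
A minimal sketch of how a training script might use the new check (backend and addresses are illustrative):

```python
import torch.distributed as dist

if dist.is_torchelastic_launched():
    # Launched via torchrun / torch.distributed.run: the agent already set up
    # the env vars and store, so the recommended env:// rendezvous just works.
    dist.init_process_group(backend="gloo", init_method="env://")
else:
    dist.init_process_group(
        backend="gloo", init_method="tcp://localhost:29500", rank=0, world_size=1
    )
```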

Test Plan: Unittests.

Reviewed By: cbalioglu

Differential Revision: D30529984

fbshipit-source-id: 267aea6d4dad73eb14a2680ac921f210ff547cc5
2021-08-25 22:57:43 -07:00
b629ea4620 Update persons_of_interest.rst (#63907)
Summary:

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63907

Reviewed By: jspisak

Differential Revision: D30534972

Pulled By: dzhulgakov

fbshipit-source-id: ba726fc53e292a362c387cc8b5f7776ca2a2544c
2021-08-25 22:50:54 -07:00
b1154cc774 enable equal_nan for complex values in isclose (#63571)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63571
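
A quick sketch of the new behavior (my reading of the change; `equal_nan=True` was previously not usable with complex inputs):

```python
import torch

nan = float("nan")
a = torch.tensor([complex(nan, nan)])
print(torch.isclose(a, a))                  # tensor([False]): NaN != NaN
print(torch.isclose(a, a, equal_nan=True))  # tensor([True])
```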

Test Plan: Imported from OSS

Reviewed By: malfet, ngimel

Differential Revision: D30560127

Pulled By: mruberry

fbshipit-source-id: 8958121ca24e7c139d869607903aebbe87bc0740
2021-08-25 22:05:49 -07:00
49c8fbc92f Clean up related to type refinements (#62444)
Summary:
Creates a helper function to refine the types into a TorchScript-compatible format in the MonkeyType config for profile-directed typing

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62444

Reviewed By: malfet

Differential Revision: D30548159

Pulled By: nikithamalgifb

fbshipit-source-id: 7c09ce5f5e043d069313b87112837d7e226ade1f
2021-08-25 21:53:00 -07:00
80a61142e4 inference for algebraic expressions (#63822)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63822

Infers algebraic expressions and adds them to our symbolic shape inferencer. Works for Conv2d and can be extended to other operations.
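
For context, the kind of algebraic expression involved for Conv2d is the standard output-size formula (from the `torch.nn.Conv2d` docs), which the inferencer can carry symbolically over an unknown input size; a concrete sketch:

```python
import math

def conv2d_out_dim(in_dim, kernel_size, stride=1, padding=0, dilation=1):
    # Standard Conv2d output-size formula from the torch.nn.Conv2d docs.
    return math.floor(
        (in_dim + 2 * padding - dilation * (kernel_size - 1) - 1) / stride + 1
    )

assert conv2d_out_dim(224, kernel_size=3, stride=2, padding=1) == 112
```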

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D30518469

Pulled By: migeed-z

fbshipit-source-id: b92dfa40b2d834a535177da42b851701b8f7178c
2021-08-25 20:47:23 -07:00
124ae597fb [quant] Fixing the conversion of the quantizable RNN (#63879)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63879

Quantizable RNN had a bug, where the `from_observed` was an instance method, instead of a class method. This caused the `tq.convert` to fail. This fixes the issue by making the `from_observed` a classmethod.

The tests were passing before because the unittests were not using the custom module path, but a conventional `from_float`, which is also supported.

Test Plan:
`buck test mode/dev //caffe2/test:quantization -- test_custom_module_lstm`

```
buck test mode/dev //caffe2/test:quantization -- test_custom_module_lstm
Parsing buck files: finished in 0.5 sec
Downloaded 0/2 artifacts, 0.00 bytes, 100.0% cache miss (for updated rules)
Building: finished in 9.2 sec (100%) 12622/12622 jobs, 2/12622 updated
  Total time: 9.7 sec
More details at https://www.internalfb.com/intern/buck/build/0d87b987-649f-4d06-b0e2-97b5077
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: cb99305f-65c9-438b-a99f-a0a2a3089778
Trace available for this run at /tmp/tpx-20210824-115652.540356/trace.log
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/5066549645030046
    ✓ ListingSuccess: caffe2/test:quantization - main (12.550)
    ✓ Pass: caffe2/test:quantization - test_custom_module_lstm (quantization.core.test_quantized_op.TestQuantizedOps) (174.867)
Summary
  Pass: 1
  ListingSuccess: 1
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/5066549645030046
```

Reviewed By: jerryzh168, mtl67

Differential Revision: D30520473

fbshipit-source-id: bc5d0b5bb079fd146e2614dd42526fc7d4d4f3c6
2021-08-25 20:39:02 -07:00
2ea2711501 Make frozen symbol name customizable in torch deploy. (#63817)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63817

ghstack-source-id: 136699671

Test Plan: eyes

Reviewed By: wconstab

Differential Revision: D29571559

fbshipit-source-id: 8e3caa4932ef8d7c8559f264f0e9bb5474ad2237
2021-08-25 20:10:35 -07:00
f4bc28990f Compute cuda reduction buffer size in elements (#63969)
Summary:
Resubmit of https://github.com/pytorch/pytorch/issues/63885

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63969

Reviewed By: mruberry

Differential Revision: D30549423

Pulled By: ngimel

fbshipit-source-id: b16d25030d44ced789c125a333d72b02a8f45067
2021-08-25 18:18:37 -07:00
01b8162d00 Back out "Revert D30384746: [fx2trt] Add a test for quantized resnet18" (#63973)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63973

Original commit changeset: b93235323e22

Test Plan: buck run mode/opt -c python.package_style=inplace caffe2:fx2trt_quantized_resnet_test

Reviewed By: 842974287

Differential Revision: D30546036

fbshipit-source-id: 2c8302456f072d04da00cf9ad97aa8304bc5e43e
2021-08-25 17:52:22 -07:00
57d4c6cf42 replace self.assertTrue(torch.allclose(..)) with self.assertEqual(…) (#63637)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/63565
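
The mechanical replacement looks like this (a sketch using the internal test base class):

```python
import torch
from torch.testing._internal.common_utils import TestCase, run_tests

class MyTest(TestCase):
    def test_close(self):
        actual = torch.ones(3) + 1e-8
        expected = torch.ones(3)
        # before: self.assertTrue(torch.allclose(actual, expected))
        # after: tolerance-aware comparison with a useful failure message
        self.assertEqual(actual, expected)

if __name__ == "__main__":
    run_tests()
```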

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63637

Reviewed By: malfet

Differential Revision: D30541266

Pulled By: mruberry

fbshipit-source-id: ab461949782c6908a589ea098fcfcf5c3e081ee6
2021-08-25 16:47:40 -07:00
1be1c901aa Remove render_test_results job (#63877)
Summary:
This removes the `render_test_results` job we had before, which had been causing some confusion among devs when it failed and isn't really necessary now that we can render test results on the PR HUD.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63877

Reviewed By: walterddr, janeyx99

Differential Revision: D30546705

Pulled By: driazati

fbshipit-source-id: 55fdafdb6f80924d941ffc15ee10787cb54f34a1
2021-08-25 15:55:55 -07:00
ba0e6a1e03 [EASY] Update the clang-tidy error message (#63370)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63370

As shown by this CI run, the actual thing that is incorrect is the prompt.
https://github.com/pytorch/pytorch/actions/runs/1137298261

The CI runs the below command instead of the original command.
The original command errors out when importing another file on line 1.
Trying to fix the code to work with the original command causes the CI to error out.

We should actually ask the user to run
`python3 -m tools.linter.install.clang_tidy`

Test Plan: Imported from OSS

Reviewed By: janeyx99, heitorschueroff

Differential Revision: D30530216

Pulled By: Gamrix

fbshipit-source-id: 2a2b8d539dcc2839e4000c13e82c207fa89bfc9f
2021-08-25 15:30:13 -07:00
44ede71751 Shard python_torch_functions.cpp (#62187)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62187

This file can take 3 minutes on its own to compile and, after
python_functions.cpp, is the second-biggest limiting factor for the compile time
of `libtorch_python` on a 32-core Threadripper. This splits it into 3 files that
take around 1 minute each to compile.

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D29962048

Pulled By: albanD

fbshipit-source-id: 99016d75912bff483fe21b130cef43a6882f8c0e
2021-08-25 15:10:43 -07:00
730ce29baf Add note on ifdefing based on CUDA_VERSION for ROCm path (#62850)
Summary:
CUDA_VERSION and HIP_VERSION follow very unrelated versioning schemes, so it does not make sense to use CUDA_VERSION to determine the ROCm path. This note explicitly addresses it.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62850

Reviewed By: mruberry

Differential Revision: D30547562

Pulled By: malfet

fbshipit-source-id: 02990fa66a88466c2330ab85f446b25b78545150
2021-08-25 15:02:03 -07:00
b5b9ce146f Small fixes to the Contributing.txt (#63385)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63385

Correcting a mistake for the pytorch uninstall, and
adding an extra note for Darwin.

Test Plan: Imported from OSS

Reviewed By: janeyx99, heitorschueroff

Differential Revision: D30530234

fbshipit-source-id: e0f88a1725eeadabfb4b28c1da11e369ee878ab4
2021-08-25 14:50:37 -07:00
52ebe7e14e Back out "Temporary fix for remote gpu execution issue" (#63983)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63983

Tests the fixes in D30545351. It should resolve the issue of the remote-execution flag being populated incorrectly.

Test Plan: CI

Reviewed By: malfet, seemethere

Differential Revision: D30549443

fbshipit-source-id: b3895909f5cd654ba163b77950872b332fbad3fe
2021-08-25 14:37:01 -07:00
5b548f6f64 Shape Propagation Pass: Fix AdaptiveAveragePooling2d (#63629)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63629

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D30461727

Pulled By: priyaramani

fbshipit-source-id: 3873d1d636f79185680b82de06174d8de288c941
2021-08-25 13:13:41 -07:00
ab5cf5a1eb Move existing target determinator to tools (#63809)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63809

This moves out the modulefinder determinator to `tools/testing` since it is supposed to be CI-only. This also simplifies run_test.py a little bit.

Test Plan: Imported from OSS

Reviewed By: malfet, seemethere, janeyx99

Differential Revision: D30497438

Pulled By: driazati

fbshipit-source-id: 1d203037af5af6a20c1e7812da935e7cbb5cd82f
2021-08-25 13:03:53 -07:00
7edeead796 Add a comment on the potential implicit type up-casting (#63905)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63905

as title
ghstack-source-id: 136590703

Test Plan: N/A

Reviewed By: mrshenli

Differential Revision: D30527929

fbshipit-source-id: 69402bbfa87cfd8fc166ce313cde9736ee072589
2021-08-25 12:47:45 -07:00
b0782f0f32 add BFloat16 support for bernoulli and Dropout on CPU (#56372)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56372

Test Plan: Imported from OSS

Reviewed By: heitorschueroff

Differential Revision: D28836792

Pulled By: VitalyFedyunin

fbshipit-source-id: ede951d172a59276e11383fd767778ab959b5a6b
2021-08-25 12:01:27 -07:00
7299565768 Update torch.distributed.run OMP_NUM_THREADS message to log.warning (#63953)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63953

Closes #61138

Test:
`python -m torch.distributed.run --nproc_per_node 2 test.py`
Still outputs message

`LOGLEVEL=ERROR python -m torch.distributed.run --nproc_per_node 2 test.py`
Does not output message anymore

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse agolynski SciPioneer H-Huang mrzzd cbalioglu gcramer23

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D30542997

Pulled By: H-Huang

fbshipit-source-id: e7da30dcda51516abf4e56f1f510132e44397027
2021-08-25 11:55:06 -07:00
3d4aabfc48 Fix ciflow/all label generation (#63954)
Summary:
The `ciflow/all` label is added automatically, but it needs to be added before we call `gen_root_job_condition`.

- fix the order of adding `ciflow/all`
- refactor all the string into global constants

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63954

Reviewed By: malfet

Differential Revision: D30545596

Pulled By: zhouzhuojie

fbshipit-source-id: 83ab668f0234488afb855a72e3ebd4503f7f1a78
2021-08-25 11:32:32 -07:00
67d8e7b659 Reformat run_test.py (#63808)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63808

`black run_test.py`

Test Plan: Imported from OSS

Reviewed By: seemethere

Differential Revision: D30497437

Pulled By: driazati

fbshipit-source-id: 41b29b73f41fa4bb15fce5eaa69f8efe614e02f7
2021-08-25 11:27:18 -07:00
64d605bab8 [Static Runtime] Added caching for the NNC code generated for Logit. (#63840)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63840

Added NNC generated code for Logit to the cache.

```
Logit NNC Benchmark     w/o cache (ns)   w/ cache (ns)
logit_nnc_sleef/64                 543            536
logit_nnc_sleef/512               3517           3465
logit_nnc_sleef/8192             88483          85881
logit_nnc_sleef/32768           337016         323090
logit_nnc_fast/64                  167            163
logit_nnc_fast/512                 866            817
logit_nnc_fast/8192              13069          12801
logit_nnc_fast/32768             53429          52530
logit_nnc_vml/64                   164            151
logit_nnc_vml/512                  783            769
logit_nnc_vml/8192               11563          11674
logit_nnc_vml/32768              46720          46452
```

Test Plan: Unit tests and inline_cvr model.

Reviewed By: hlu1

Differential Revision: D30405424

fbshipit-source-id: 938b1b74758e2612ae151bac890c5f8ebbc42d50
2021-08-25 11:19:58 -07:00
dde07cad6f [Static Runtime] Added a variable for clamp in the NNC code for Logit. (#63839)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63839

Replaced the use of a constant for clamp in the NNC code for Logit
with a variable. This makes it easier to enable caching for Logit.

There is no performance difference with this change, as shown in the micro-benchmarks below.

```
Logit NNC Benchmark     const-clamp (ns)   var-clamp (ns)
logit_nnc_sleef/64                   550             543
logit_nnc_sleef/512                 3514            3517
logit_nnc_sleef/8192               85537           82900
logit_nnc_sleef/32768             347635          337016
logit_nnc_fast/64                    173             167
logit_nnc_fast/512                   829             866
logit_nnc_fast/8192                13286           13069
logit_nnc_fast/32768               51116           53429
logit_nnc_vml/64                     146             164
logit_nnc_vml/512                    773             783
logit_nnc_vml/8192                 11556           11563
logit_nnc_vml/32768                44815           46720
```

Test Plan: SR unit tests and the inline_cvr model.

Reviewed By: bertmaher

Differential Revision: D30405466

fbshipit-source-id: adb891fdae5746439931ce5f43165291fec08f52
2021-08-25 11:19:55 -07:00
a2399a76e1 [Static Runtime] Moved NNC operator definitions to separate files. (#63838)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63838

Refactored NNC operator definitions code into separate files.

Made `TEWrapper` a class with a fixed set of methods and added separate definitions for them based on `TORCH_ENABLE_LLVM` to keep the same functionality as before.

Test Plan: Build and ran Static Runtime tests.

Reviewed By: hlu1

Differential Revision: D30405467

fbshipit-source-id: 606ef852bb820d5e23a0f8af1bf5dc122e90bceb
2021-08-25 11:18:32 -07:00
8a22d4fa5c [Reland] Replacing the p.data access in utils with tensor.set_ . Passes both test_post_localSGD_optimizer_parity and test_periodic_model_averager tests (#63895)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63895

When updating the model parameter, updating `parameter.data` is no longer recommended, because this `data` field will be deprecated in the future.

The replacement is `tensor.set_`.
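
A minimal sketch of the pattern (tensor names illustrative):

```python
import torch

param = torch.nn.Parameter(torch.zeros(3))
averaged = torch.ones(3)
with torch.no_grad():
    # instead of: param.data = averaged
    param.set_(averaged)
print(param)  # tensor([1., 1., 1.], requires_grad=True)
```
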
ghstack-source-id: 136593433

Test Plan:
buck test mode/dev-nosan //caffe2/test/distributed:distributed_nccl_spawn -- test_periodic_model_averager
buck test mode/dev-nosan //caffe2/test/distributed:distributed_nccl_spawn -- test_post_localSGD_optimizer_parity

Reviewed By: SciPioneer

Differential Revision: D30526178

fbshipit-source-id: a1ac0ec3665d8623edd5bf94f01c1132daff5c00
2021-08-25 11:12:55 -07:00
ab954cb0d1 clean up engine.cpp thread state (#63115)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63115

This actually changes:
- callbacks now run with proper grad mode even in worker threads
- graphtask's Future callbacks now run with proper TLS when erroring
  out from a worker thread

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D30388100

Pulled By: albanD

fbshipit-source-id: 7ae9c461c2f0040548dd9e1e314f25e8da0c2e67
2021-08-25 11:08:43 -07:00
c06dfd7c26 [fx2trt] Check input device in TRTModule (#63893)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63893

Add a check to ensure all the inputs are on cuda device.

Test Plan: CI

Reviewed By: kflu, houseroad

Differential Revision: D30525265

fbshipit-source-id: 6e50b70fd535defc1f802d51e8bb991b2dd73741
2021-08-25 10:25:34 -07:00
6324d98e9e bf16 Error message cleanup as well as addition of is_bf16_supported (#63798)
Summary:
ngimel
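
The new helper can be used to gate bf16 usage; a minimal sketch:

```python
import torch

if torch.cuda.is_available() and torch.cuda.is_bf16_supported():
    x = torch.ones(4, device="cuda", dtype=torch.bfloat16)
else:
    x = torch.ones(4, dtype=torch.bfloat16)  # bf16 on CPU as a fallback
```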

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63798

Reviewed By: heitorschueroff

Differential Revision: D30526187

Pulled By: ngimel

fbshipit-source-id: c484aec14638097c96c720095d3491249b6b2d14
2021-08-25 09:59:59 -07:00
eebac46282 [pruner] add getter for pruned outputs in base pruner (#63520)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63520

Rather than having to call `module.parametrizations.weight[0].pruned_outputs` each time we need to access the set of pruned indices, we add a getter `get_module_pruned_outputs` which takes the module as an argument and returns the set.

This is used for testing.
ghstack-source-id: 136561130

Test Plan:
` buck test mode/dev-nosan //caffe2/test:ao -- TestBasePruner`

https://pxl.cl/1N4gK

Reviewed By: z-a-f

Differential Revision: D30374558

fbshipit-source-id: e38dfee0879cadde52b942e899a3d8d7151ee493
2021-08-25 09:57:29 -07:00
83b132b112 [pruner] add support for pruning BatchNorm2d (#63519)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63519

If the pruner is pruning biases along with weights, and the model has BatchNorm2d following pruned Conv2d layers, then the corresponding channels of the BatchNorm must also be pruned.

Specifically, they need to be zeroed out rather than fully removed, since in eager mode the dimensions between layers need to be preserved.

To do this, we add a pruning parametrization called `ZeroesParametrization`, which zeroes out pruned channels rather than removing them.

The user must provide, in the config, a tuple of the Conv2d and BatchNorm layers that go together. The `prepare` method will add the tuple to the `module_groups`; then it will add a PruningParametrization to the Conv2d layer and a ZeroesParametrization to the BatchNorm, and set their pruned sets to be the same set. That way, during `step`, both masks are updated with the same pruned indices.

ghstack-source-id: 136562278

Test Plan:
`buck test mode/dev-nosan //caffe2/test:ao -- TestBasePruner`

https://pxl.cl/1N1P6

Reviewed By: z-a-f

Differential Revision: D30349855

fbshipit-source-id: 3199d3688d5a70963f9b32d7a8fdac3962ae6a65
2021-08-25 09:56:19 -07:00
c1dfd58715 Minor OptionalTensorRef updates (#63611)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63611

A few minor updates to `OptionalTensorRef`:
1. use `Tensor`'s `unsafe_borrow_t` constructor, which avoids an unnecessary `nullptr` check.
2. copy constructor cannot defer to the `const Tensor&` constructor because it checks the tensor is
defined, and so would fail for disengaged optionals.
3. use copy-swap idiom to avoid issues with self-assignment. `x = x` should be a no-op, but the old
version would clear `x`.
4. Add pointer-like access for consistency with `optional` and `MaybeOwned`

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D30484704

Pulled By: ezyang

fbshipit-source-id: 738f4bd22359eaecd0a519a04e89a4b44d92da5b
2021-08-25 09:37:02 -07:00
5ab356ffe6 Update CMake minimum version to 3.10 (#63660)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63660

Test Plan: Imported from OSS

Reviewed By: janeyx99, mruberry

Differential Revision: D30543878

fbshipit-source-id: a7d938807653f39727f2cc7d7ca167200567b6a0
2021-08-25 09:25:43 -07:00
34ed16ffef Temporary fix for remote gpu execution issue (#63899)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63899

See: T99020845

Test Plan: sandcastle

Reviewed By: heitorschueroff

Differential Revision: D30527384

fbshipit-source-id: ce9933e5e181322c02d4ed17f3fdaabe4c5ba29e
2021-08-25 09:14:03 -07:00
01c35115d8 Fix bug in check_empty_containers (#63492)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63492

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D30402749

Pulled By: ansley

fbshipit-source-id: 7de533355fe91ca4f45b2bafc3bfb205a028c1ed
2021-08-25 09:05:08 -07:00
8c897d254d Swap CUDA 11.1 and 11.3 in CI to make 11.1 periodic (#63900)
Summary:
Preparing for supporting 11.3 in the next release.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63900

Reviewed By: malfet

Differential Revision: D30541437

Pulled By: janeyx99

fbshipit-source-id: a7297da7f7818a4291b1c321d62d76fc2c0f1f90
2021-08-25 09:01:26 -07:00
3926fdbaa4 [skip ci] Add generated comment to ruleset json (#63896)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63896

Reviewed By: heitorschueroff

Differential Revision: D30529820

Pulled By: zhouzhuojie

fbshipit-source-id: 7529803af23ea36a7bcb673cd399da80da8e3feb
2021-08-25 08:53:33 -07:00
87a661c79f Revert D30526034: [pytorch][PR] compute reduction intermediate buffer size in elements
Test Plan: revert-hammer

Differential Revision:
D30526034 (e69a1398cb)

Original commit changeset: 0aca7f887974

fbshipit-source-id: a22472723818d6fe0c11a6e134080df1ac408038
2021-08-25 07:17:22 -07:00
839eaa2e91 Revert D30384746: [fx2trt] Add a test for quantized resnet18
Test Plan: revert-hammer

Differential Revision:
D30384746 (10dfa58eba)

Original commit changeset: 1a8638777116

fbshipit-source-id: b93235323e229b391f5456f6e3543988062dd0d4
2021-08-25 00:43:06 -07:00
10dfa58eba [fx2trt] Add a test for quantized resnet18 (#63446)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63446

Add a test for quantized resnet18 running in TensorRT

Test Plan: buck run mode/opt -c python.package_style=inplace caffe2:fx2trt_quantized_resnet_test

Reviewed By: 842974287

Differential Revision: D30384746

fbshipit-source-id: 1a863877711618cd23d887694269ed9e44ee606c
2021-08-24 21:34:23 -07:00
0301c3bc01 [quant][graphmode][fx] Make maxpool and flatten produce the reference pattern (#63501)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63501

Currently some of the ops are considered to work with both float and quantized input,
so we may produce patterns like "quant - some_op - dequant". This might not work well with the backend;
we may consider changing everything to produce "quant - dequant - some_op - quant - dequant" instead
in the future. This PR fixes it for maxpool and flatten only, to unblock resnet benchmarking on TensorRT.

Test Plan:
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: mruberry

Differential Revision: D30402788

fbshipit-source-id: 892c5ff6552775070e2c1453f65846590fb12735
2021-08-24 21:31:01 -07:00
d388a1a5df [TensorExpr] LLVMCodegen: Use addFnAttr instead of addAttribute which was deleted. (#63886)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63886

cc gmagogsfm

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D30523135

Pulled By: ZolotukhinM

fbshipit-source-id: 62e125f917b2a0153eb30879d93cf956587a05e0
2021-08-24 21:23:06 -07:00
c8527bc398 [qunat][graphmode][fx] Add a separate lower_to_native_backend function for relu (#62861)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62861

This PR adds a lower_to_native_backend function to lower a quantized reference model
to a model that uses fbgemm/qnnpack ops. We'll gradually add support and remove
the fbgemm/qnnpack specific handling in quantization_patterns.py

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D30165828

fbshipit-source-id: de1149cd7e7c1840c17c251cd4d35004afd015b7
2021-08-24 21:07:03 -07:00
e69a1398cb compute reduction intermediate buffer size in elements (#63885)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/63869
`iter` strides are in bytes, and we are additionally multiplying the size computed from those strides by `sizeof(arg_t)`. Computing `output_memory_size` in elements should be enough.
This doesn't fix the still-real problem of allocating a large intermediate tensor, but it makes that tensor smaller, typically by a factor of 4.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63885

Reviewed By: mruberry

Differential Revision: D30526034

Pulled By: ngimel

fbshipit-source-id: 0aca7f887974b7776e380463bbd82d32a5786ee8
2021-08-24 19:39:21 -07:00
ba126df614 TST Adds more modules into common module tests (#62999)
Summary:
This PR moves some modules into `common_modules` to see what it looks like.

While migrating some no-batch modules into `common_modules`, I noticed that `desc` is not used for the name. This means we cannot use `-k` to filter tests. This PR moves the sample generation into `_parametrize_test` and passes the already-generated `module_input` into users of `modules(modules_db)`.

I can see this is a little different from opsinfo and would be happy to revert to the original implementation of `modules`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62999

Reviewed By: heitorschueroff

Differential Revision: D30522737

Pulled By: jbschlosser

fbshipit-source-id: 7ed1aeb3753fc97a4ad6f1a3c789727c78e1bc73
2021-08-24 19:16:32 -07:00
544af391b5 Allow arbitrary objects in state_dicts (#62976)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/62094

Introduces functionality for adding arbitrary objects to module state_dicts. To take advantage of this, the following functions can be defined on a module:
* `get_extra_state(self) -> dict` - Returns a dict defining any extra state this module wants to save
* `set_extra_state(self, state)` - Subsumes the given state within the module

In the details, a sub-dictionary is stored in the state_dict under the key `_extra_state` for each module that requires extra state.
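
A minimal sketch of the hooks (the `version` field is an invented example):

```python
import torch.nn as nn

class MyModule(nn.Module):
    def __init__(self):
        super().__init__()
        self.version = 3  # non-tensor state we want round-tripped

    def get_extra_state(self):
        return {"version": self.version}

    def set_extra_state(self, state):
        self.version = state["version"]

m = MyModule()
m.version = 7
m.load_state_dict(MyModule().state_dict())
assert m.version == 3  # restored via the "_extra_state" entry
```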

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62976

Reviewed By: heitorschueroff

Differential Revision: D30518657

Pulled By: jbschlosser

fbshipit-source-id: 5fb35ab8e3d36f35e3e96dcd4498f8c917d1f386
2021-08-24 19:06:14 -07:00
58ef99bd5a TST Adds pickle testing for ModuleInfo (#63736)
Summary:
Follow up to https://github.com/pytorch/pytorch/pull/61935

This PR adds `test_pickle` to `test_modules`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63736

Reviewed By: heitorschueroff

Differential Revision: D30522462

Pulled By: jbschlosser

fbshipit-source-id: a03b66ea0d81c6d0845c4fddf0ddc3714bbf0ab1
2021-08-24 19:04:46 -07:00
8dda299d96 Re-apply: [nnc] Support thread level parallelism in fused kernels (#63776)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63776

I reverted this out of an abundance of caution because some test
failures occurred, but they were all due to precision issues fixed lower in
this stack.  Let's try again.

I've rolled the elimination of the allow-parallelism-in-fusions toggle into
this diff since they're pretty tightly coupled.
ghstack-source-id: 136529847

Test Plan: CI

Reviewed By: huiguoo

Differential Revision: D30484555

fbshipit-source-id: 38fd33520f710585d1130c365a8c60c9ce794a59
2021-08-24 18:56:55 -07:00
1787b905c4 Don't switch executors mid test (#63830)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63830

It's really not safe to change the executor out from under models that may have
already been partially compiled.
ghstack-source-id: 136526228

Test Plan:
```
DEBUG=1 CFLAGS="-fsanitize=address" CXXFLAGS="-fsanitize=address" USE_LLVM=$(realpath ../llvm-project/install) CMAKE_PREFIX_PATH=$CONDA_PREFIX python setup.py install
LD_PRELOAD=/lib64/libasan.so.5 numactl -C3 pytest -v --cov --cov-report xml:test/coverage.xml --cov-append onnx/test_pytorch_onnx_onnxruntime.py::TestONNXRuntime_opset11 -s
```

Reviewed By: desertfire

Differential Revision: D30504489

fbshipit-source-id: 188581cb53f0cf5bd3442d1e9d46e8c0c7e124f8
2021-08-24 18:56:53 -07:00
543130511a [nnc] Disable erf and erfc (#63775)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63775

These introduce small accuracy differences that cause some internal
tests to fail, and it's not worth fixing the tests right now because they're
slower than the ATen ops anyways.
ghstack-source-id: 136526229

Test Plan:
```
buck test mode/dev //aml/eccv/mcm/training:tests -- --exact 'aml/eccv/mcm/training:tests - test_build_torch_script_model (aml.eccv.mcm.training.tests.publish_helper_tests.TransformerPredictorPublishHelperTests)'
```

Reviewed By: navahgar

Differential Revision: D30484557

fbshipit-source-id: 095a9c810539a499105b76e1d96843dbc61b0079
2021-08-24 18:55:45 -07:00
d454c9e76e Migrate THCTensor_copyIgnoringOverlaps to ATen (#63505)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63505

This isn't a public operator, just a helper function used in CUDA_tensor_apply.

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D30441305

Pulled By: ngimel

fbshipit-source-id: 84fabc701cbd8479e02d80f373a3dd62d70df2ce
2021-08-24 18:50:28 -07:00
5b28e3c183 [quant][graphmode][fx] Add reference option support for binary ops (#62698)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62698

We also removed the special handling in match_utils for binary ops

Test Plan:
python test/test_quantize.py TestQuantizeFx
python test/test_quantize.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D30093781

fbshipit-source-id: 58cc972de8211a80dd4d111e25dc4ad36057933f
2021-08-24 18:22:11 -07:00
6fa646ad54 [StaticRuntime] Fix bug in HasInplaceOp (#63842)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63842

Reviewed By: mikeiovine

Differential Revision: D30506914

fbshipit-source-id: b2e358cfb991dacdb295b61bbc37beb36b73b852
2021-08-24 17:07:45 -07:00
956c8fa01e Microbenchmarking matrix mult (einsum, torch.mul, torch.mm) (#63654)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63654

Test Plan:
```
> buck run mode/opt caffe2/benchmarks/operator_benchmark/pt:matrix_mult_test

# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: einsum_bmm
# Mode: Eager
# Name: einsum_bmm_B4_M5_N3_K2_cpu
# Input: B: 4, M: 5, N: 3, K: 2, device: cpu
Forward Execution Time (us) : 27.970

# Benchmarking PyTorch: einsum_bmm
# Mode: Eager
# Name: einsum_bmm_B32_M25_N20_K30_cpu
# Input: B: 32, M: 25, N: 20, K: 30, device: cpu
Forward Execution Time (us) : 41.830

# Benchmarking PyTorch: einsum_bmm
# Mode: Eager
# Name: einsum_bmm_B128_M100_N120_K110_cpu
# Input: B: 128, M: 100, N: 120, K: 110, device: cpu
Forward Execution Time (us) : 499.114

# Benchmarking PyTorch: bmm
# Mode: Eager
# Name: bmm_B4_M5_N3_K2_cpu
# Input: B: 4, M: 5, N: 3, K: 2, device: cpu
Forward Execution Time (us) : 6.268

# Benchmarking PyTorch: bmm
# Mode: Eager
# Name: bmm_B32_M25_N20_K30_cpu
# Input: B: 32, M: 25, N: 20, K: 30, device: cpu
Forward Execution Time (us) : 12.676

# Benchmarking PyTorch: bmm
# Mode: Eager
# Name: bmm_B128_M100_N120_K110_cpu
# Input: B: 128, M: 100, N: 120, K: 110, device: cpu
Forward Execution Time (us) : 438.219

# Benchmarking PyTorch: einsum_elementwise
# Mode: Eager
# Name: einsum_elementwise_B4_M5_N3_cpu
# Input: B: 4, M: 5, N: 3, device: cpu
Forward Execution Time (us) : 7.657

# Benchmarking PyTorch: einsum_elementwise
# Mode: Eager
# Name: einsum_elementwise_B32_M25_N20_cpu
# Input: B: 32, M: 25, N: 20, device: cpu
Forward Execution Time (us) : 18.523

# Benchmarking PyTorch: einsum_elementwise
# Mode: Eager
# Name: einsum_elementwise_B100_M90_N110_cpu
# Input: B: 100, M: 90, N: 110, device: cpu
Forward Execution Time (us) : 55.103

# Benchmarking PyTorch: mul
# Mode: Eager
# Name: mul_B4_M5_N3_cpu
# Input: B: 4, M: 5, N: 3, device: cpu
Forward Execution Time (us) : 2.501

# Benchmarking PyTorch: mul
# Mode: Eager
# Name: mul_B32_M25_N20_cpu
# Input: B: 32, M: 25, N: 20, device: cpu
Forward Execution Time (us) : 10.589

# Benchmarking PyTorch: mul
# Mode: Eager
# Name: mul_B100_M90_N110_cpu
# Input: B: 100, M: 90, N: 110, device: cpu
Forward Execution Time (us) : 50.102
```

Reviewed By: ajyu

Differential Revision: D30455179

fbshipit-source-id: 9f2d92b2d2b860f41a8e59be2cc086d75b587f7b
2021-08-24 16:26:26 -07:00
6d58c83007 Turn off layer norm in jit symbolic differentiation (#63816)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63816

Test Plan:
Confirmed this can rescue the NE:

https://www.internalfb.com/mast/job/torchx_xdwang-SparseNNApplication_72cf593d

Reviewed By: ngimel

Differential Revision: D30498746

fbshipit-source-id: 4a387f32ee2f70685de6104459c7f21bfbddc187
2021-08-24 15:47:13 -07:00
41ffec07ce Add a common autograd TLS state (#63860)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63860

Test Plan: Imported from OSS

Reviewed By: heitorschueroff

Differential Revision: D30513253

Pulled By: albanD

fbshipit-source-id: 97d76ed54dfbdf4ba3fc7051ce3b9bb636cefb4b
2021-08-24 15:34:06 -07:00
865d127a66 .github: Enable with-ssh for Windows (#63440)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63440

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: janeyx99

Differential Revision: D30521460

Pulled By: seemethere

fbshipit-source-id: e987e170e73fb4f9d9f024bed0e58404ed206848
2021-08-24 14:14:27 -07:00
4e37a015c7 [FX] Fix _replicate_for_data_parallel (#63821)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63821

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D30502115

Pulled By: jamesr66a

fbshipit-source-id: 0f004f95def6e1ba21ccbeab40cb0a739a0ad20c
2021-08-24 13:48:15 -07:00
5be17ec1fc Do not modify saved variables in-place for spectral norm during power iteration (#62293)
Summary:
Interestingly enough, the original code did have a mechanism that aims to prevent this very issue,
but it performs a clone AFTER modifying u and v in-place.
This doesn't work because we can later use the cloned u and v in operations that save tensors for backward, and the next time we execute forward, we modify the same cloned u and v in-place.
So if the goal is to avoid modifying a saved variable in-place, we should clone it BEFORE the in-place operation.
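
For intuition, here is a minimal sketch of the aliasing hazard (illustrative only, not the actual spectral-norm code):
```
import torch

# An in-place op mutates every alias of a tensor, so anything "saved"
# before the mutation silently sees the new values.
u = torch.randn(3)
saved = u                      # pretend autograd saved this for backward
u.add_(1.0)                    # in-place power-iteration style update
print(torch.equal(saved, u))   # True: `saved` was corrupted

# Cloning BEFORE the in-place op protects whatever was saved earlier.
v = torch.randn(3)
saved = v
v = v.clone()                  # detach from the saved storage first...
v.add_(1.0)                    # ...then mutate freely
print(torch.equal(saved, v))   # False: `saved` is intact
```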

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62293

Reviewed By: bdhirsh

Differential Revision: D30489750

Pulled By: soulitzer

fbshipit-source-id: cbe8dea885aef97adda8481f7a822e5bd91f7889
2021-08-24 13:08:59 -07:00
4a0776100e Migrate legacy lstsq from THC to ATen (CUDA) (#63504)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63504

Closes gh-24592

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D30441304

Pulled By: ngimel

fbshipit-source-id: ec176596f54bc084af48a73d1dbb0dcb82fec593
2021-08-24 12:47:16 -07:00
699c764d2e Revert D30513613: Removing tensor.data usage in utils with tensor set_ method
Test Plan: revert-hammer

Differential Revision:
D30513613 (d08a36f831)

Original commit changeset: 402efb9c30fa

fbshipit-source-id: 911c66a9852de77dc5274b5fb373258c0c97739a
2021-08-24 12:20:37 -07:00
835dac0869 Merge common fields from TensorInitParams and ShardedTensorMetadata into TensorProperties (#63731)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63731
1) Follow-up to the [last comment on PR #63378](https://github.com/pytorch/pytorch/pull/63378#discussion_r693143053)
2) Also updated the caller side (usage of ShardedTensorMetadata) in fbcode

Ref: [landing workflow 3](https://www.internalfb.com/intern/wiki/PyTorch/PyTorchDev/Workflow/Landing/#landing-your-prs-from-gi-1)

Test Plan:
Imported from OSS

OSS: (pytorch).. $ python test/distributed/_sharded_tensor/test_sharded_tensor.py --v
FB:  fbcode $ buck test mode/dev //aiplatform/modelstore/checkpointing/pyper/tests:checkpoint_utils_test

Reviewed By: wanchaol, heitorschueroff

Differential Revision: D30472281

fbshipit-source-id: 727fb0e7f10eab4eb7a10476194e9008f2ac1fb5
2021-08-24 11:49:06 -07:00
d08a36f831 Removing tensor.data usage in utils with tensor set_ method (#63867)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63867

When updating a model parameter, writing to `parameter.data` is no longer recommended, because the `data` field will be deprecated in the future.

The replacement is `tensor.set_`.
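
For illustration, a minimal sketch of the new pattern (hypothetical parameter update, not the exact code in this diff):
```
import torch

param = torch.nn.Parameter(torch.zeros(3))
averaged = torch.ones(3)       # e.g. the result of model averaging

# Old style (discouraged): param.data = averaged
# New style: swap in the new values with set_ under no_grad.
with torch.no_grad():
    param.set_(averaged)

print(param)                   # now holds [1., 1., 1.]
```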

ghstack-source-id: 136531233

Test Plan: buck test mode/dev-nosan //caffe2/test/distributed:distributed_nccl_spawn -- test_periodic_model_averager

Reviewed By: SciPioneer

Differential Revision: D30513613

fbshipit-source-id: 402efb9c30fafc3f285bebc631639f656ceae585
2021-08-24 11:20:44 -07:00
73431449b3 update readme and contributing.md (#63843)
Summary:
1. In fact, Visual Studio isn't supported as a CMake generator
2. I was asked many times why there's an error like 'Could NOT find OpenMP'
3. Add the newly added Best Practices link to CONTRIBUTING.md

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63843

Reviewed By: seemethere, heitorschueroff

Differential Revision: D30514095

Pulled By: janeyx99

fbshipit-source-id: 76715a1d8c049122546e5a7778cafe54e4dfd5d6
2021-08-24 10:52:11 -07:00
e6dc7bc61b Subprocess encoding fixes for cpp extension (#63756)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/63584

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63756

Reviewed By: bdhirsh

Differential Revision: D30485046

Pulled By: ezyang

fbshipit-source-id: 4f0ac383da4e8843e2a602dceae85f389d7434ee
2021-08-24 10:46:11 -07:00
14d4723abd add bf16 support for bucketize (#55588)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55588

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D28836796

Pulled By: VitalyFedyunin

fbshipit-source-id: c9ae5b969c30a45473533be5f29bb497f8da5143
2021-08-24 10:31:42 -07:00
1256dcd509 [pruner] modify base pruner to prune bias by default (#63202)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63202

By default, the pruner will also prune biases, such that the whole output channel is removed. The user can manually set `also_prune_bias` to False when calling `prepare` if they don't want the bias to be pruned.
ghstack-source-id: 136466671

Test Plan:
`buck test mode/dev-nosan //caffe2/test:ao -- TestBasePruner`

https://pxl.cl/1MV32

modify `fusion_tests` according to API change
`buck test mode/opt //scripts/kazhou:fusion_tests`

https://pxl.cl/1NbKz

Reviewed By: z-a-f

Differential Revision: D30294494

fbshipit-source-id: c84655648bee0035559195ca855b98fb7edaa134
2021-08-24 10:25:45 -07:00
16ba20507a [pruner] amend base pruner API to match base sparsifier (#63178)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63178

Update base pruner API to match base sparsifier API as defined in D28970960 / PR58955

Changes include:
- `enable_mask_update = True` in `__init__`
- `prepare` takes model and config instead of constructor
- convert functionality renamed to `squash_mask`, `convert` method call now raises Error
- `activation_handles` and `bias_handles` initialized in `_prepare` instead of the constructor
ghstack-source-id: 136467595

Test Plan:
Function names updates according to changes

`buck test mode/dev-nosan //caffe2/test:ao -- TestBasePruner`

https://pxl.cl/1MTgH

TODO will need to modify `fbcode/scripts/kazhou/fusion_tests.py` to use new API

Reviewed By: z-a-f

Differential Revision: D30287179

fbshipit-source-id: d4727bea1873b500f2d4bb784db26d532bf26cce
2021-08-24 10:25:43 -07:00
5dee15401c [pruner] refactor ActivationReconstruction forward hooks (#63158)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63158

Combined functionality for `ActivationReconstruction` for both Linear and Conv2d in one class. The only difference between the old classes was the size and indexing of the reconstructed tensor -- that logic can be generalized by iterating over the size of `output`.
ghstack-source-id: 136467465

Test Plan:
`buck test mode/dev-nosan //caffe2/test:ao -- TestBasePruner`

https://pxl.cl/1MSSv

Reviewed By: raghuramank100

Differential Revision: D30282765

fbshipit-source-id: 08a1e4e0650511019fff85cf52b41dd818b0c7f8
2021-08-24 10:24:29 -07:00
7774a4e95b [Static Runtime] Implement prim::VarStack out variant (#63579)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63579

Provide a static runtime out variant implementation for the new op introduced in D30426232 (1385f9fb12).

Test Plan: `buck test //caffe2/benchmarks/static_runtime:static_runtime_cpptest -- IndividualOps_VarStack`

Reviewed By: navahgar

Differential Revision: D30410525

fbshipit-source-id: bc59a3d8ad23e3d94561ec2dca9cc20687dbadf8
2021-08-24 09:44:29 -07:00
227cb268bc [Reland] Embedding thrust->cub migration (#63806)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/63427

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63806

Reviewed By: bdhirsh

Differential Revision: D30498255

Pulled By: ngimel

fbshipit-source-id: 78b7085a92a168cf0163f53dcb712bac922f5235
2021-08-24 09:30:32 -07:00
94d621584a optimize BFloat16 elemwise operators CPU: sigmoid, sigmoid_backward, tanh_backward, addcmul, addcdiv (#55221)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55221

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D28836797

Pulled By: VitalyFedyunin

fbshipit-source-id: 6b79098c902ffe65d228668118ef36fb49bab800
2021-08-24 08:56:17 -07:00
33a163d886 Enable BFloat16 LeakyReLU and RReLU in CPU path (#61514)
Summary:
Enable and optimize BFloat16 LeakyReLU and RReLU in CPU path.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61514

Reviewed By: ejguan

Differential Revision: D30257612

Pulled By: VitalyFedyunin

fbshipit-source-id: 8cc0d1faacd02dcc9827af724a86d95b6952748f
2021-08-24 08:34:56 -07:00
2ca2761f3c ENH Adds no_batch_dim for NLLLoss (#62651)
Summary:
Towards https://github.com/pytorch/pytorch/issues/60585

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62651

Reviewed By: VitalyFedyunin

Differential Revision: D30303340

Pulled By: jbschlosser

fbshipit-source-id: 7ab478cf63bf6cd1f850cad5fd101e74a2cfe3f5
2021-08-24 08:27:27 -07:00
d3be02d100 fix batchnorm2d issue when input is non contiguous (#63392)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63392

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D30476317

Pulled By: VitalyFedyunin

fbshipit-source-id: 03055a0aec21cf2c029b6f32315da2b09cb722d0
2021-08-24 08:24:01 -07:00
1385f9fb12 [JIT] Add variadic stack op (#63578)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63578

Added a new op `prim::VarStack` and a pass that transforms instances of `aten::stack(list, dim)` into `prim::VarStack(list[0], ..., list[n], dim)`. Also provided a JIT interpreter implementation.

Most of the implementation/tests are the same as `prim::VarConcat`.
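
In eager terms the two forms are equivalent; the variadic op just lets the optimizer drop the intermediate list construction (sketch):
```
import torch

a, b = torch.randn(2, 3), torch.randn(2, 3)

# aten::stack consumes a list the graph must first build via ListConstruct;
# prim::VarStack conceptually takes the unpacked tensors plus dim directly,
# i.e. VarStack(a, b, 0), producing the same result:
out = torch.stack([a, b], dim=0)
assert out.shape == (2, 2, 3)
```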

Test Plan: `buck test caffe2/test/cpp/jit:jit -- TestStackOpt`

Reviewed By: navahgar

Differential Revision: D30426232

fbshipit-source-id: 9829a7db6e0a5038c9b7528c43c25b0c221aa2ce
2021-08-24 08:20:54 -07:00
f4aff3a346 [BE] add distributed run_test options (#63147)
Summary:
Currently, distributed tests are mixed in with test_python.
We would like to run the distributed tests as their own batch, so we need to split them out.

This adds an option to include/exclude distributed tests via CUSTOM_HANDLERS.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63147

Test Plan:
- locally run with the additional run_test.py options.
- CI

Dependency: found a bug in mpiexec test and need https://github.com/pytorch/pytorch/issues/63580 to fix it first.

Reviewed By: bdhirsh

Differential Revision: D30496178

Pulled By: walterddr

fbshipit-source-id: 7903a57b619f2425028028f944211938823918a6
2021-08-24 08:03:01 -07:00
688f06cac3 Revert D30388099: Add a common autograd TLS state
Test Plan: revert-hammer

Differential Revision:
D30388099 (83d9bad44a)

Original commit changeset: 8e03f940150f

fbshipit-source-id: f6d60fec66e8292f5268335bb8a3e7e1a662f23b
2021-08-24 07:22:39 -07:00
9914fb6615 ENH Adds no_batch_dim tests/docs for LPPool1d and Identity (#62190)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/60585

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62190

Reviewed By: ejguan

Differential Revision: D29942385

Pulled By: jbschlosser

fbshipit-source-id: 00df6f6f01ad039631bb8679f8de94863aac7650
2021-08-24 06:59:41 -07:00
83d9bad44a Add a common autograd TLS state (#63114)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63114

This PR collapses the GradMode and InferenceMode thread-local booleans into a single thread-local uint8.
This helps reduce the number of thread-local variable accesses done when we propagate ThreadLocalStates.

Note that this is even more beneficial as we will add a forward-mode AD TLS (similar to GradMode) higher in this stack, and this new structure should reduce the perf impact of adding that new TLS.

Here is the full benchmark result between master and the top of this stack: https://gist.github.com/albanD/e421101e9ed344e94999bef3a54bf0f3
tl;dr: it gives a benefit in most cases and is never detrimental.
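
The real implementation is C++ thread-local state, but the bit-packing idea looks roughly like this (illustrative sketch; the mask names are made up):
```
# Two booleans packed into one byte: a single TLS read fetches both modes.
GRAD_MODE_MASK = 0x01
INFERENCE_MODE_MASK = 0x02

state = 0x00  # stands in for the single thread-local uint8

def set_flag(state, mask, enabled):
    return state | mask if enabled else state & ~mask

state = set_flag(state, GRAD_MODE_MASK, True)
assert state & GRAD_MODE_MASK            # GradMode on
assert not state & INFERENCE_MODE_MASK   # InferenceMode off
```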

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D30388099

Pulled By: albanD

fbshipit-source-id: 8e03f940150ff063c2edd792733663413ae2f486
2021-08-24 06:54:02 -07:00
c545b099aa Separating quantization test from distributed_test (#63058)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63058

Dedicating separate tests for different quantization methods. Currently supporting FP16 method.
ghstack-source-id: 136499767

Test Plan: buck test mode/dev //caffe2/test/distributed/algorithms/quantization:quantization_gloo_fork -- name_of_the_test

Reviewed By: wanchaol

Differential Revision: D30142580

fbshipit-source-id: 3aacec1a231a662067d2b48c001f0c69fefcdd60
2021-08-24 01:44:55 -07:00
f0d274294d [TensorExpr] Nuke KernelArena and KernelScope. (#63587)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63587

Now that there are no classes using KernelArena for memory management, we
can remove it.

Differential Revision: D30429115

Test Plan: Imported from OSS

Reviewed By: navahgar

Pulled By: ZolotukhinM

fbshipit-source-id: 375f6f9294d27790645eeb7cb5a8e87047a57544
2021-08-24 00:32:16 -07:00
62d02f2b57 [TensorExpr] Make 'Tensor' a value type. (#63586)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63586

This is another commit in transition from KernelArena memory management.
Tensor is essentially just a pair of <BufPtr, StmtPtr> and we don't need
to dynamically allocate it at all - it's cheap to pass it by value, and
that's what we're switching to in this commit.

After this change nothing uses KernelScope/KernelArena and they can be
safely removed.

Differential Revision: D30429114

Test Plan: Imported from OSS

Reviewed By: navahgar

Pulled By: ZolotukhinM

fbshipit-source-id: f90b859cfe863692b7beffbe9bd0e4143df1e819
2021-08-24 00:32:13 -07:00
4e15a6f495 [TensorExpr] Switch Exprs and Stmt from kernel-arena to shared_ptr. (#63216)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63216

Currently there are three classes managed by KernelArena: Expr, Stmt, and Tensor (and derived classes). KernelArena has been a long-standing pain point for NNC devs, and we're moving away from that memory-management model to a ref-count-based model (using shared_ptr). This commit switches Expr and Stmt to shared_ptr and is the biggest change in this transition. Later commits will detach Tensor from KernelArena and kill the arena + scope altogether.

Differential Revision: D30353195

Test Plan: Imported from OSS

Reviewed By: navahgar

Pulled By: ZolotukhinM

fbshipit-source-id: 9575225ada3d0fb65087ae40435f3dfea4792cae
2021-08-24 00:32:11 -07:00
dd96c26066 [TensorExpr] More NFC changes like Expr* -> ExprPtr. (#63778)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63778

This is a preparation for a switch from raw pointers to shared pointers
as a memory model for TE expressions and statements.

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D30487425

Pulled By: ZolotukhinM

fbshipit-source-id: 9cbe817b7d4e5fc2f150b29bb9b3bf578868f20c
2021-08-24 00:30:49 -07:00
5b7cdc5a3d add channels last for GroupNorm (#49821)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49821

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D26007053

Pulled By: VitalyFedyunin

fbshipit-source-id: 34a48d5d3b66a159febf3c3d96748fbaba1b9e31
2021-08-23 22:54:59 -07:00
f5d585391d Add ROCm as a platform for which tests can be disabled (#63813)
Summary:
Realized we were missing ROCm as a platform on which one could disable a flaky test. (like how this issue specifies windows https://github.com/pytorch/pytorch/issues/61655)

cc jeffdaily sunway513 jithunnair-amd ROCmSupport

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63813

Reviewed By: seemethere

Differential Revision: D30498478

Pulled By: janeyx99

fbshipit-source-id: f1abe8677e1ddd01de3291e1618272ad8e287dc4
2021-08-23 18:50:04 -07:00
d96ef8c1b1 [Static Runtime] SR clones graph input (#63704)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63704

Previously SR did not clone the graph. This was leading to subtle bugs in `testStaticRuntime`; static runtime would modify its graph, and the graph used by the JIT interpreter would change as well. The JIT interpreter would then crash if SR-only ops were added!

Cloning the graph is more consistent with the behavior of the `Module` ctor.

Test Plan: `buck test caffe2/benchmarks/static_runtime/...`

Reviewed By: hlu1

Differential Revision: D30463294

fbshipit-source-id: b771551a1f55f95fde79373b23babcf3e5ddf726
2021-08-23 18:45:41 -07:00
195c60d844 [fx2trt] Add acc op and converter for torch.pow (#63795)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63795

att

Test Plan: buck run mode/opt caffe2/torch/fb/fx2trt:test_binary_ops

Reviewed By: jackm321, wushirong

Differential Revision: D30492488

fbshipit-source-id: 6d615770567b13720316f06fd2f866ea2fdc2995
2021-08-23 18:18:31 -07:00
e1bdebf685 Adding DataLoader2 class as future replacement of DataLoader (#63742)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63742

Supports sharding and batching on the loader level.

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D30494506

Pulled By: VitalyFedyunin

fbshipit-source-id: 6648e09d955055ac38e3a4e3973f701acefca762
2021-08-23 18:09:07 -07:00
fc07489ec5 [BE] Enable PostLocalSGD tests on windows (#63463)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63463

Now that `torch.distributed.optim` gates DistributedOptimizer on RPC availability, local sgd optimizer can be used on windows.
ghstack-source-id: 136437632

Test Plan: Ci

Reviewed By: SciPioneer

Differential Revision: D30358922

fbshipit-source-id: 9b56aebf1075f026637296d338805ad8851c9d40
2021-08-23 17:49:03 -07:00
16a4434422 [BE] Enable functional optim tests for windows (#63462)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63462

Now that `torch.distributed.optim` gates DistributedOptimizer on RPC availability, these tests can be run on windows.
ghstack-source-id: 136437635

Test Plan: CI

Reviewed By: SciPioneer

Differential Revision: D30358923

fbshipit-source-id: 36739bdfe7214789f17de652d30c62c2bc124c73
2021-08-23 17:49:01 -07:00
630ec2e190 [fx_acc] Add mapper for torch.log1p (#63792)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63792

Map `torch.log1p` to `acc_ops.add` + `acc_ops.log`.
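
In eager terms the decomposition is simply the following (sketch); note that `torch.log1p` itself is more accurate for very small x, so the mapping trades a little precision for converter coverage:
```
import torch

x = torch.rand(4)
decomposed = torch.log(torch.add(x, 1.0))   # acc_ops.add + acc_ops.log
assert torch.allclose(torch.log1p(x), decomposed)
```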

Test Plan: buck test mode/opt glow/fb/fx/oss_acc_tracer:test_acc_tracer -- test_log1p

Reviewed By: wushirong

Differential Revision: D30491706

fbshipit-source-id: bcbeddf06131113185d2019cfd7cf5e9193a8a78
2021-08-23 17:48:59 -07:00
e4f44bec27 Fix pocketfft include path in mobile build (#63714)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63714

PocketFFT was disabled for CMake < 3.9, but CMake 3.11 is the first version to support `INCLUDE_DIRECTORIES` as a target property. So updating to CMake 3.10 causes the mobile builds to fail. Instead of limiting the CMake support, this just adds the include directory to the entire target.

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D30498369

Pulled By: malfet

fbshipit-source-id: 83372e29c477c97e7015763b7c29d6d7e456bcef
2021-08-23 17:48:57 -07:00
fc47497905 Simplify ccache instructions in CONTRIBUTING.md (#62549)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62549

When building CUDA files with native CMake support, it will respect the
`CMAKE_CUDA_COMPILER_LAUNCHER` setting. So, there's no need for symlinks.

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D30498488

Pulled By: malfet

fbshipit-source-id: 71c2ae9d4570cfac2a64d777bc95cda3764332a0
2021-08-23 17:47:38 -07:00
d9231dc3df Skip archiving useless build artifacts (#63785)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63785

We currently zip up everything in `build/` which includes a lot of cruft (`.o` files, random things copied in from dependencies, etc). This makes the artifact bigger (slower upload/download times, and takes about 1.5 minutes to archive). This change makes archiving instead take ~15 seconds and removes the 50 second upload to GitHub step that isn't as useful now that we have the HUD PR page that lists out all artifacts.

Test Plan: Imported from OSS

Reviewed By: seemethere, janeyx99

Differential Revision: D30494444

Pulled By: driazati

fbshipit-source-id: 93202dba7387daeb4859a938110b02ff2dc2ccc4
2021-08-23 17:40:01 -07:00
172e5c76ab Fix some memory bugs in onnx passes (#63754)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63754

Running onnx tests with ASAN uncovers several memory errors.  These two are caused by: (1) iterating the uses list of a node after mutation, and (2) accessing the `blocks` attribute of a possibly deleted node.

To reproduce (this is on a CentOS 7 box):
```
DEBUG=1 CFLAGS="-fsanitize=address" CXXFLAGS="-fsanitize=address" USE_LLVM=$(realpath ../llvm-project/install) CMAKE_PREFIX_PATH=$CONDA_PREFIX python setup.py install
LD_PRELOAD=$(realpath /lib64/libasan.so.5) numactl -C3 pytest -v --cov --cov-report xml:test/coverage.xml --cov-append onnx/test_pytorch_onnx_onnxruntime.py::TestONNXRuntime_opset11 -s
```

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D30493939

Pulled By: bertmaher

fbshipit-source-id: e16e19dc9b4c9896e102ca8bf04c8bedfdde87af
2021-08-23 17:31:45 -07:00
fc6dd0bc00 [JIT] Move UseVariadicCat internals (#63577)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63577

Since other variadic ops will have an almost identical implementation, we can generalize the `UseVariadicCat` implementation and put it in a common folder.

Also moved some test utilities that other variadic op tests will likely need.

Test Plan: `buck test caffe2/test/cpp/jit:jit -- ConcatOptTest`

Reviewed By: navahgar

Differential Revision: D30409937

fbshipit-source-id: 925c11c27b58ce98cb8368d2a205e26ba66d3db9
2021-08-23 17:30:36 -07:00
130549d61b Fix typo in NNAPI tests (#63797)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63797

nnapi memory format test has a typo

Test Plan:
pytest test/test_nnapi.py::TestNNAPI

Imported from OSS

Reviewed By: Amyh11325

Differential Revision: D30495473

fbshipit-source-id: 8edad7c01a080847a64a2797e077ec4d6077552a
2021-08-23 16:34:24 -07:00
84890aae35 [Static Runtime] Add an out variant op for aten::abs (#63675)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63675

This change adds an out variant implementation for `aten::abs`.

Test Plan:
- Observed `V0820 14:14:08.880342 101788 impl.cpp:1394] Switch to out variant for node: %3 : Tensor = aten::abs(%a.1)`

- Perf impact: TBD

Reviewed By: hlu1

Differential Revision: D30461317

fbshipit-source-id: 0c0230bd40afe463ae1ccb222c2a1207ebcf4191
2021-08-23 16:25:10 -07:00
55f8f95ad4 fix git diff issue (#63408)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/60111, ideally we should merge this before https://github.com/pytorch/pytorch/issues/63360 but we can also test this with https://github.com/pytorch/pytorch/issues/63360 easily.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63408

Test Plan:
- This is confirmed working with a local test.sh run by setting PR_NUMBER
- should be validated by GHA CI as well

Concern:
- currently GHA CI is consistently running into a proxy 403 rate-limit-exceeded issue. However, the worst case is not generating any git diff files, which is exactly the same as the current behavior.
- depends on https://github.com/pytorch/pytorch/issues/63770.

Reviewed By: driazati, janeyx99

Differential Revision: D30489355

Pulled By: walterddr

fbshipit-source-id: a638b7ae5820f29a7aca6cc40ff390ab253cb174
2021-08-23 15:38:18 -07:00
49be16d50a .github: Add ec2 information as a step (#63784)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63784

Also creates the common.yml.j2 file as a place to store common code
amongst the templates

Should look like:
![image](https://user-images.githubusercontent.com/1700823/130495226-f18b8c0f-1ea7-4097-8bbb-e998fabb71f2.png)

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: malfet, driazati

Differential Revision: D30490682

Pulled By: seemethere

fbshipit-source-id: 18028b4acff938ef54cd6e4877561b2d830a11cf
2021-08-23 15:04:04 -07:00
7946f8a9f6 Rename DataPipe to Op-er (#63325)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63325

Rename each DataPipe to an operation name ending with er. Functional API should remain `verb` such as `read_from_tar` , `shuffle`, ... (Discussed in [here](https://github.com/facebookexternal/torchdata/pull/97#discussion_r688553905))
- Batch -> Batcher
- Collate -> Collator
- Concat -> Concater
- GroupByKey -> ByKeyGrouper (?)
- ListDirFiles -> FileLister
- LoadFilesFromDisk -> FileLoader
- Map -> Mapper
- ReadFilesFromTar -> TarArchiveReader
- ReadFilesFromZip -> ZipArchiveReader
- ReadLinesFromFile -> LineReader
- Shuffle -> Shuffler
- ToBytes -> StreamReader
- Transforms -> Transformer
- Zip -> Zipper

Let me know if you have a better name for any of these DataPipes

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D30466950

Pulled By: ejguan

fbshipit-source-id: 72909dca7b3964ab83b965891f96cc1ecf62d049
2021-08-23 14:36:10 -07:00
a781340bf7 Add equality constraints for some acc opeartions for symbolic inference (#63689)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63689

Test Plan:
buck run mode/opt-clang caffe2/torch/fb/model_transform/experimental:fx_ir_lower_inline_cvr -- \
    --action=lower_and_run \
    --filename=inline_cvr_7x_dec_2020.model \
    --print_glow_glog=True

Reviewed By: jamesr66a

Differential Revision: D30462113

fbshipit-source-id: 0b2a1ce9770561248527d47c07b80112491dc949
2021-08-23 14:11:08 -07:00
0bc7fef406 [Static Runtime] Remove unused fusion patterns (#63636)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63636

Reviewed By: d1jang

Differential Revision: D30446573

fbshipit-source-id: 3abb7f697380f3b4e865b98c594de359b5e26b96
2021-08-23 12:55:09 -07:00
a709ab34a8 [nnc] Re-enable CPU fusion" (#63665)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63665

This reverts commit 125e2d02e575612eb427104e7c67f1c28f090db8.

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D30471646

Pulled By: bertmaher

fbshipit-source-id: 4189869566f03b5f9ada78d78830f6a34946eed6
2021-08-23 12:42:42 -07:00
560cd88195 Kill THCUNN (#63429)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63429

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D30441308

Pulled By: ngimel

fbshipit-source-id: 3ae342a2f8d5c7f8827b637c4055c5d1b0a1be26
2021-08-23 12:07:16 -07:00
db1b27fa8d fix mpi ssh runtime error (#63580)
Summary:
should fix https://github.com/pytorch/pytorch/issues/60756.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63580

Test Plan:
- this CI.
- validated by running on the bionic_cuda container: https://app.circleci.com/pipelines/github/pytorch/pytorch/366632/workflows/478602fb-698f-4210-ac09-d9c61af5c62b/jobs/15472104

Reviewed By: malfet

Differential Revision: D30486472

Pulled By: walterddr

fbshipit-source-id: d83ab88d163d4a468f03961a13d891b658668a7f
2021-08-23 09:45:33 -07:00
98449f5bba hotfix clone issue (#63770)
Summary:
This was discovered during https://github.com/pytorch/pytorch/issues/63408. For some reason, only this checkout action does not correctly set fetch-depth

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63770

Reviewed By: malfet, janeyx99

Differential Revision: D30486110

Pulled By: walterddr

fbshipit-source-id: a67395cca2487407ed0d49c8c89587935ca5f212
2021-08-23 09:30:48 -07:00
f1d865346f [ONNX] add test images to repo (#63717)
Summary:
This is better than the status quo:
* Test doesn't download files from the internet -> faster and more
  reliable.
* Test doesn't leave the git working directory dirty.

Rather than using the original images, I've copied some images from
the pytorch/vision repo. This will keep the tests in the two repos
in sync, while avoiding adding new assets to the vision repo.

See https://github.com/pytorch/vision/pull/4176.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63717

Reviewed By: janeyx99

Differential Revision: D30466016

Pulled By: malfet

fbshipit-source-id: 2c56d4c11b5c74db1764576bf1c95ce4ae714574
2021-08-23 07:43:21 -07:00
bafd875f74 Allow implementing either backward or vjp for Function (#63434)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63434

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D30431968

Pulled By: albanD

fbshipit-source-id: 0bb88664283486a9fd3364e6c3d79442a44625c2
2021-08-23 07:07:11 -07:00
726fd26b3e Update ROCm PyTorch persons of interest (#55206)
Summary:
cc jeffdaily sunway513

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55206

Reviewed By: VitalyFedyunin

Differential Revision: D30296584

Pulled By: dzhulgakov

fbshipit-source-id: 6e5c610cc6b7c7fd58b80fa3f9de31f269341a88
2021-08-22 22:31:09 -07:00
d6133b2fe6 Remove _fork_processes from common_distributed.py (#63711)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63711

This removes `_fork_process` from common_distributed.py and fixes all
other callpoints to use `spawn_process` instead.
ghstack-source-id: 136395719

Test Plan: waitforbuildbot

Reviewed By: xush6528

Differential Revision: D30463834

fbshipit-source-id: 0c09e8a996d0e5b912c8cdd45488a39951bac4db
2021-08-22 18:57:12 -07:00
2289a12f21 Made FuncTorchBatched decompose CompositeImplicitAutograd (#63616)
Summary:
See https://github.com/facebookresearch/functorch/issues/56

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63616

Reviewed By: zou3519

Differential Revision: D30438316

Pulled By: Chillee

fbshipit-source-id: e84446d9f68b87daa0cfff75b3b8a972f36ec85a
2021-08-21 17:14:39 -07:00
e926f75b0b BatchNorm autodiff re-enabled (#57321)
Summary:
Turns on BN in autodiff:

1. outputs an empty tensor for running stats to bypass the autodiff issue with None;
2. fixes BN inference backward in cuDNN & MIOpen, where backward now falls back to the native batchnorm kernel instead;

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57321

Reviewed By: albanD, ngimel

Differential Revision: D30250419

Pulled By: jansel

fbshipit-source-id: a62553789c20fb50a820003a056f40d9d642dfaa
2021-08-21 09:07:31 -07:00
37d60c08e5 Revert D30360382: [nnc] Support thread level parallelism in fused kernels
Test Plan: revert-hammer

Differential Revision:
D30360382 (d6d86efb1c)

Original commit changeset: 29acf4e932c6

fbshipit-source-id: e0531113135d30eabb172dc1537d5dd6d65dc438
2021-08-21 03:46:43 -07:00
76da46ccdc Revert D30417127: Remove flag to toggle CPU fusion in the presence of parallelism
Test Plan: revert-hammer

Differential Revision:
D30417127 (6600bc9651)

Original commit changeset: b77d7c68364f

fbshipit-source-id: 6b52fb83a84fe241945e3cb3eeb71050d1d9c8f1
2021-08-21 03:38:07 -07:00
8871ff29b7 [sharded_tensor] add readonly tensor properties (#63679)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63679

This PR adds read-only tensor properties to ShardedTensor, to match torch.Tensor behavior.

Test Plan: test_sharded_tensor_metadata

Reviewed By: pritamdamania87

Differential Revision: D30459343

fbshipit-source-id: 9aec8ecfe76479eed25f3b843495e5719ed2956d
2021-08-20 22:17:11 -07:00
b2a601ffe5 [Static Runtime] Implement out variant for fb::quantized_linear (#63635)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63635

Reviewed By: ajyu

Differential Revision: D30446234

fbshipit-source-id: 1ef014186ff725930a97d0159626f9233ee74030
2021-08-20 21:42:22 -07:00
2d58f3f56d NNAPI: Support const values in binary ops
Summary:
The NNAPI converter previously failed when a binary op received one const value and one tensor.
Code suggestions from dreiss

Test Plan:
pytest test/test_nnapi.py::TestNNAPI::test_pointwise_binary

Imported from OSS

Reviewed By: anshuljain1

Differential Revision: D28893881

fbshipit-source-id: 59240373fb03c6fdafa4cb2fa4d8408dd20092f6
2021-08-20 21:10:26 -07:00
b4f5809db8 Migrate thnn_conv2d from THC to ATen (#63428)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63428

Closes gh-24644, closes gh-24645

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D30441307

Pulled By: ngimel

fbshipit-source-id: 9c3dec469c0525831ae398df261cf41b7df7e373
2021-08-20 18:29:02 -07:00
3ee1f81dce Extend _sharded_tensor constructor to support other ops like torch.ones (#63378)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63378

a) Introduce InitCommonParams to wrap tensor creation params
b) Factor local tensor initiation into common_params so that tensor value is not hard specified in ShardedTensor constructor
c) Add _sharded_tensor.ones(...) to exemplify - Note memory_format arg is not provided to be consistent as torch.ones
d) Follow-up: more ops like torch.full, torch.zeros, torch.rand, etc.

Test:
$ python test/distributed/_sharded_tensor/test_sharded_tensor.py TestCreateTensorFromParams --v
$ python test/distributed/_sharded_tensor/test_sharded_tensor.py TestShardedTensorChunked.test_create_sharded_tensor_with_ones --v
$ python test/distributed/_sharded_tensor/test_sharded_tensor.py TestShardedTensorEnumerable.test_create_sharded_tensor_with_ones --v

Test Plan: Imported from OSS

Reviewed By: pritamdamania87, wanchaol

Differential Revision: D30359245

Pulled By: bowangbj

fbshipit-source-id: 85768fcb36e9d9d40213036884b1266930a91701
2021-08-20 17:11:34 -07:00
7c0f5b9aa4 [clang-tidy] Enable more folders (#63380)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63380

Crosses off some more of #62011, see the test in the stacked PR #63381

Test Plan: Imported from OSS

Reviewed By: malfet, seemethere

Differential Revision: D30455843

Pulled By: driazati

fbshipit-source-id: d473545d05ffa0b2476968f0b1c55f3a16a2c755
2021-08-20 16:40:42 -07:00
e0fe5699c4 enable increment build for build_libtorch (#63074)
Summary:
Since issue https://github.com/pytorch/pytorch/issues/59859 is resolved, rerun_cmake in build_libtorch should no longer be hardcoded.

build_libtorch is necessary to generate a debug version of libtorch.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63074

Reviewed By: VitalyFedyunin, seemethere

Differential Revision: D30306705

Pulled By: malfet

fbshipit-source-id: f2077d334191f4973da0681560937bc8bab730c1
2021-08-20 16:30:34 -07:00
efe01c59e3 [Doc] Deprecation notice for only_inputs argument (#63631)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/63544.

Changed docstring accordingly. I'm new here, not sure if the style is okay. Please check.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63631

Reviewed By: ejguan

Differential Revision: D30459439

Pulled By: soulitzer

fbshipit-source-id: 8df3c509d1dd39764815b099ab47229550126cbe
2021-08-20 15:49:49 -07:00
bcf8e2f57e Remove breakpad from docker image (#63598)
Summary:
As of https://github.com/pytorch/pytorch/issues/63186 we're doing this properly via a third_party cmake build, so we don't need it here anymore.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63598

Reviewed By: walterddr, malfet

Differential Revision: D30432250

Pulled By: driazati

fbshipit-source-id: d0d5db14355cf574e42c0d0ed786bb26230180bd
2021-08-20 15:48:39 -07:00
da0820e553 add BFloat16 operators on CPU: range, sinh, cosh, frexp, nan_to_num (#61826)
Summary:
Added BFloat16 support for range, sinh, cosh, frexp, and nan_to_num on CPU, and collected benchmark data for these ops for the BFloat16 and Float32 data types using PyTorch's operator_benchmark tool on an Intel(R) Xeon(R) Platinum 8180 CPU @ 2.50GHz.

Number of cores: 1 core, 28 cores(1 socket)
[cosh_sinh_benchmark.txt](https://github.com/pytorch/pytorch/files/6974313/cosh_sinh_benchmark.txt)
[frexp_benchmark.txt](https://github.com/pytorch/pytorch/files/6974315/frexp_benchmark.txt)
[nan_to_num_benchmark.txt](https://github.com/pytorch/pytorch/files/6974317/nan_to_num_benchmark.txt)
[range_benchmark.txt](https://github.com/pytorch/pytorch/files/6974318/range_benchmark.txt)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61826

Reviewed By: saketh-are

Differential Revision: D30257259

Pulled By: VitalyFedyunin

fbshipit-source-id: 394cd713e6394050a8c90b2160633beb675d71dd
2021-08-20 14:56:52 -07:00
a8de0d83fe empty caching allocator before test_avg_pool2d large subtest (#63528)
Summary:
Otherwise, unrecoverable OOM occurs on MI25.  Fixes broken ROCm CI test1.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63528

Reviewed By: malfet, zhouzhuojie

Differential Revision: D30459151

Pulled By: walterddr

fbshipit-source-id: 63e205c4f486fcbdd514cfb0ed8e38584f894585
2021-08-20 14:01:45 -07:00
b008bb4443 Include iostream in ProcessGroupMPI.cpp (#63656)
Summary:
As it uses `std::cerr`, which in turn triggers the compilation regression introduced by https://github.com/pytorch/pytorch/pull/61500
Fixes https://github.com/pytorch/pytorch/issues/63653

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63656

Reviewed By: ejguan

Differential Revision: D30455824

Pulled By: malfet

fbshipit-source-id: 29f316e7f7fd8e7dcbee2666e7a985f25bf56515
2021-08-20 13:15:40 -07:00
07e41cf2d7 [easy]Unbreak caffe2benchmarking build (#63655)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63655

ghstack-source-id: 136324310

Test Plan: buck build //fbobjc/Apps/Internal/Caffe2Benchmarking:Caffe2Benchmarking fbobjc/mode/iphonesimulator

Reviewed By: hl475, JacobSzwejbka

Differential Revision: D30455659

fbshipit-source-id: b6da6be4f89b6e84753ef0849ffedea04785034a
2021-08-20 12:57:27 -07:00
1dd648f1c4 [ONNX] Support torch.dot and torch.nn.utils.spectral_norm (#62596) (#62765)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62765

Fixes #27723

Test Plan: Imported from OSS

Reviewed By: SplitInfinity

Differential Revision: D30375181

Pulled By: msaroufim

fbshipit-source-id: 715f4745899757ec405877980cd20c826028eb2c

Co-authored-by: BowenBao <bowbao@microsoft.com>
2021-08-20 12:46:56 -07:00
db0771b05d [ONNX] Update repeat_interleave for dynamic repeats (#59979) (#62764)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62764

Fixes #58733

- Support dynamic interleave for cases where the repeat values are only known at runtime (see the eager-mode sketch below)
- Moved the repeat_interleave symbolic from opset 11 to opset 13, as sequence output types for loop outputs are needed for this change
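
For reference, the dynamic-repeats case in eager mode:
```
import torch

x = torch.tensor([1, 2, 3])
repeats = torch.tensor([2, 1, 3])   # per-element counts, only known at runtime
print(torch.repeat_interleave(x, repeats))   # tensor([1, 1, 2, 3, 3, 3])
```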

Test Plan: Imported from OSS

Reviewed By: SplitInfinity

Differential Revision: D30375179

Pulled By: msaroufim

fbshipit-source-id: 787f96bf91d124fd0483761088c5f4ae930d96a9

Co-authored-by: Shubham Bhokare <shubhambhokare@gmail.com>
2021-08-20 12:46:54 -07:00
8760254911 [ONNX] Fix an issue that optimizations might adjust graph inputs unexpectedly. (#61280) (#62763)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62763

This PR fixes the issue that the graph inputs might be updated when we export the model in inference mode.

When a model is exported in inference mode, some optimizations are made. One side effect of these optimizations is that the graph inputs might be adjusted. Such optimizations include:

	1. Conv and BatchNorm op fusion.
	2. Constant folding.

If the user sets export_params=False, or sets keep_initializers_as_inputs=True, it's highly likely that the user wants to provide the corresponding parameters or initializers as the inputs of the graph.
In such a situation, no matter whether the model is exported in inference mode or training mode, the exporter needs to prevent the above optimizations from adjusting the graph inputs. This way, the graph inputs match the inputs that users provided.

The changes in this PR add an additional common check of whether the above optimizations should be done: from the values of the export_params and keep_initializers_as_inputs arguments, infer whether the graph inputs are allowed to be adjusted.
If not, these optimizations are skipped, even if the other requirements are met.

Besides these code changes, the documentation of the parameters below has been updated so that users can better decide how to leverage them for different purposes:

	1. export_params
	2. training
	3. do_constant_folding
	4. keep_initializers_as_inputs

Test Plan: Imported from OSS

Reviewed By: SplitInfinity

Differential Revision: D30375183

Pulled By: msaroufim

fbshipit-source-id: 4db8b9695649eb32a3a0fefa950ee2e5651bdba0

Co-authored-by: fatcat-z <jiz@microsoft.com>
2021-08-20 12:46:52 -07:00
a65d1ae7cc [ONNX] Fix controlflow shape inference with contrib op (#60707) (#62762)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62762

`ONNXShapeTypeInference` for node `n` is skipped if `n` is non ONNX namespace, or if `n` contains any non ONNX namespace nodes. This prevents controlflow nodes containing contrib ops from running `SpecialPostProcess`, which sets up correct node output shape/type information in rare cases.

This PR depends on opset 14 export https://github.com/pytorch/pytorch/pull/59486

Test Plan: Imported from OSS

Reviewed By: SplitInfinity

Differential Revision: D30375180

Pulled By: msaroufim

fbshipit-source-id: 5deacec39f091deb4d75ddd9e660e12fca7f16c5

Co-authored-by: BowenBao <bowbao@microsoft.com>
2021-08-20 12:45:53 -07:00
125e2d02e5 Revert D30417370: [nnc] Enable CPU fusion
Test Plan: revert-hammer

Differential Revision:
D30417370 (b9fc656cf2)

Original commit changeset: 84ce7a578a36

fbshipit-source-id: cd23774cdc3273fd72f8a05f1900eaf36f373e6b
2021-08-20 12:30:21 -07:00
2d671ca41b [8/N] Remove c10d/ddp fork tests. (#63454)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63454

Continuation of https://github.com/pytorch/pytorch/pull/63443, this
PR removes all fork tests from torch.distributed.
ghstack-source-id: 136285511

Test Plan: waitforbuildbot

Reviewed By: SciPioneer

Differential Revision: D30387872

fbshipit-source-id: f6d6313db126ae7b95b86f78a1e0726887c5c513
2021-08-20 12:23:18 -07:00
71da114412 Revert D30426527: Adding DataLoader2 class as future replacement of DataLoader
Test Plan: revert-hammer

Differential Revision:
D30426527 (5a7133b87f)

Original commit changeset: e5905d3364c4

fbshipit-source-id: 794d8a4e9256ccff8cf894aee10eff6adc30d502
2021-08-20 12:06:52 -07:00
70a3210eca Add BinaryUfuncOpInfo and broadcasting tests (#61964)
Summary:
As proof of concept, this PR uses the new `BinaryUfuncOpInfo` in broadcasting tests for `add`, `sub`, `mul`, `div`, `floor_div`, and `true_div`.
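
As a reminder, the broadcasting behavior these tests exercise (eager sketch):
```
import torch

a = torch.randn(3, 1)
b = torch.randn(1, 4)
out = torch.add(a, b)        # shapes broadcast elementwise to (3, 4)
assert out.shape == (3, 4)
```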

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61964

Reviewed By: ngimel

Differential Revision: D30407734

Pulled By: mruberry

fbshipit-source-id: ada28994f43b0635f279f45a02ecba18bc8ee033
2021-08-20 11:44:15 -07:00
b9fc656cf2 [nnc] Enable CPU fusion (#63545)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63545

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D30417370

Pulled By: bertmaher

fbshipit-source-id: 84ce7a578a3678d5562bab99d1dc00330c4f72d1
2021-08-20 11:18:21 -07:00
6600bc9651 Remove flag to toggle CPU fusion in the presence of parallelism (#63514)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63514

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D30417127

Pulled By: bertmaher

fbshipit-source-id: b77d7c68364f2af73570740540f3b1152313016e
2021-08-20 11:18:19 -07:00
d6d86efb1c [nnc] Support thread level parallelism in fused kernels (#63386)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63386

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D30360382

Pulled By: bertmaher

fbshipit-source-id: 29acf4e932c669ce0f35823faea9099bcd8119b6
2021-08-20 11:18:17 -07:00
c78ab28441 Add support for the ONNX Runtime Eager Mode backend (#58248)
Summary:
This PR implements the necessary hooks/stubs/enums/etc for complete ONNX Runtime (ORT) Eager Mode integration. The actual extension will live out of tree at https://github.com/pytorch/ort.

We have been [working on this at Microsoft](https://github.com/microsoft/onnxruntime-pytorch/tree/eager-ort/torch_onnxruntime) for the last few months, and are finally ready to contribute the PyTorch core changes upstream (nothing major or exciting, just the usual boilerplate for adding new backends).

The ORT backend will allow us to ferry [almost] all torch ops into granular ONNX kernels that ORT will eagerly execute against any devices it supports (therefore, we only need a single ORT backend from a PyTorch perspective).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58248

Reviewed By: astaff

Differential Revision: D30344992

Pulled By: albanD

fbshipit-source-id: 69082b32121246340d686e16653626114b7714b2
2021-08-20 11:17:13 -07:00
b95ce1591d Add docs describing saved tensor hooks (#62362)
Summary:
Add section to the Autograd mechanics docs to describe the recently
exposed saved tensors (https://github.com/pytorch/pytorch/issues/52451), how to register packing / unpacking
hooks (https://github.com/pytorch/pytorch/issues/60975) and how to use default hooks (https://github.com/pytorch/pytorch/issues/61834)
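
A minimal sketch of the hook API those docs cover (identity hooks that just log; `pack` could instead offload or compress the saved tensor):
```
import torch

def pack(x):
    print("packing saved tensor of shape", tuple(x.shape))
    return x                 # must return something unpack can invert

def unpack(x):
    return x

with torch.autograd.graph.saved_tensors_hooks(pack, unpack):
    a = torch.randn(3, requires_grad=True)
    (a * a).sum().backward() # `a` is saved for backward, so `pack` fires
```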

Sister PR: https://github.com/pytorch/pytorch/issues/62361 (will add a link from autograd.rst to notes/autograd in whatever PR does not land first)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62362

Reviewed By: soulitzer

Differential Revision: D30453177

Pulled By: Varal7

fbshipit-source-id: f5759977b069ff0ef36a47b08856d297691a6caa
2021-08-20 11:10:51 -07:00
03cc46a0ac [fx2trt] Add layernorm plugin for dynamic shape (#63620)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63620

Added a layernorm dynamic plugin so that it works when an explicit batch dim is required. Needed for the IG model.

Changed how we create a plugin layer: instead of instantiating the plugin directly, we now use the plugin creator with `PluginFieldCollection`.

Follow ups:
Another way to convert layernorm is by breaking it down to supported trt layers. T97398182

Test Plan: layernorm unittest

Reviewed By: yinghai

Differential Revision: D30138205

fbshipit-source-id: aebe021d8de818e20376634f30e84579b9807f9b
2021-08-20 10:52:42 -07:00
5f997a7d2f [PyTorch][Edge] Improve InflatableArgs for Bundled Inputs (#62368)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62368

# Context
The bundled-inputs mechanism accepts an expression, in the form of the string InflatableArg.fmt, that is applied to the inputs to inflate them. InflatableArg.fmt provides the flexibility to apply a custom transformation during inflation. When the input arguments to a function are not of Tensor type, TorchScript casts the inputs from type T to Optional[T] and expects the function to handle the nullable (None) case as well. This becomes tricky to handle in one-line code or lambda functions.

We propose an alternative that allows an InflatableArg to include the text of a TorchScript function, which is defined on the module as a helper and then used in the inflation expression. This is provided via InflatableArg.fmt_fn. Please refer to pytorch/test/test_bundled_inputs.py for an example of how to use it.

Also see JacobSzwejbka's comment on this [here](https://github.com/pytorch/pytorch/pull/62368#issuecomment-892012812)

# Mitigation
Allow InflatableArg to include the text of a TorchScript function that is defined on the module as a helper and then used in its inflation expression.
ghstack-source-id: 135158680

Test Plan:
To run `test_dict_args`

```
(base) [pavithran@devvm1803.vll0 /data/users/pavithran/fbsource/fbcode] buck test //caffe2/test:test_bundled_inputs -- test_dict_args
Action graph will be rebuilt because files have been added or removed.
Building: finished in 5.4 sec (100%) 12180/12180 jobs, 0/12180 updated
  Total time: 5.8 sec
More details at https://www.internalfb.com/intern/buck/build/fafcf277-1095-4cba-978d-6022f0d391ad
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: 5ef9de71-c1b1-406b-a6c0-3321c2368b8d
Trace available for this run at /tmp/tpx-20210727-163946.454212/trace.log
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/7036874465805934
    ✓ ListingSuccess: caffe2/test:test_bundled_inputs - main (11.365)
    ✓ Pass: caffe2/test:test_bundled_inputs - test_dict_args (test_bundled_inputs.TestBundledInputs) (12.307)
Summary
  Pass: 1
  ListingSuccess: 1
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/7036874465805934
```

To check the py code of TS module:
P433043973

Reviewed By: dreiss

Differential Revision: D29950421

fbshipit-source-id: c819ec5c94429b7fbf6c4beb0259457f169b08ec
2021-08-20 09:36:08 -07:00
5a7133b87f Adding DataLoader2 class as future replacement of DataLoader (#63523)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63523

Supports sharding and batching on the loader level.
* #63522 Adding IterableAsDataPipe IterDataPipe, useful for tests and simple cases

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D30426527

Pulled By: VitalyFedyunin

fbshipit-source-id: e5905d3364c4880e720dd62fb066f08881c71a6e
2021-08-20 09:01:55 -07:00
99e28baeba Small custom function refactor which doesn't change anything (#63433)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63433

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D30431970

Pulled By: albanD

fbshipit-source-id: 905fa4d2ddeca18005b1bcb13dd6f8a080327e7c
2021-08-20 08:44:23 -07:00
0f2c60f0e3 Adding IterableAsDataPipe IterDataPipe (#63522)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63522

Supports sharding and batching on the loader level.
* #63522 Adding IterableAsDataPipe IterDataPipe, useful for tests and simple cases

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D30426528

Pulled By: VitalyFedyunin

fbshipit-source-id: 535b5cc1505bb58731fcca8170541ac5ee7bd417
2021-08-20 08:38:23 -07:00
ae901e372e [Static Runtime] Enable RemoveListMutation (#63536)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63536

Enable a pass that transforms sequences like this:
```
li = []
li.append(1)
li.append(2)
```
into this:
```
li = [1, 2]
```
Initially I implemented this pass myself (D30387213), but I discovered that there is an existing pass that does the same thing.

Reviewed By: hlu1

Differential Revision: D30412970

fbshipit-source-id: 0810ef03480878d5039bd800a40f5fd31c2652ec
2021-08-20 06:15:41 -07:00
913c1f83f4 [Static Runtime] Add native op for aten::detach (#63625)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63625

This change adds a static runtime's native op implementation for `aten::detach` op.

See the standard  `aten::detach`'s implementation (https://codebrowser.bddppq.com/pytorch/pytorch/aten/src/ATen/native/TensorShape.cpp.html#_ZN2at6native6detachERKNS_6TensorE ) for comparison.
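
For reference, the eager semantics the native op reproduces:
```
import torch

x = torch.randn(3, requires_grad=True)
y = x.detach()                       # new Tensor sharing the same storage
assert y.data_ptr() == x.data_ptr()  # no copy is made
assert not y.requires_grad           # detached from the autograd graph
```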

Test Plan:
- Added `StaticRuntime.IndividualOps_Detach`.

- Observed

```
V0819 18:55:33.181188 3092034 impl.cpp:1398] Switch to native impl for node: %a.1 : Tensor = aten::detach(%input.1)
```

Reviewed By: hlu1

Differential Revision: D30443187

fbshipit-source-id: d6e0eadb1b817e0a126c4fc97526abc276ee8a17
2021-08-20 00:46:27 -07:00
bec75daa77 Update protobuf to 3.13.1 (#62571)
Summary:
Update bazel to 4.10.0

Update ASAN_SYMBOLIZER_PATH to llvm-7
Suppress `vptr` ubsan violations in `test_jit`
Fix ProtoBuf patching for ONNX, which caused Windows builds to crash while attempting to free a `std::string` allocated on the stack

Fixes https://github.com/pytorch/pytorch/issues/62569

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62571

Reviewed By: walterddr

Differential Revision: D30048685

Pulled By: malfet

fbshipit-source-id: 6462c1bef9c42318551d2cf906bbab41e1d4e1cd
2021-08-19 23:43:55 -07:00
d82667f7e2 [nnc] Updated sliceTail to do inplace mutation (#63532)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63532

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D30412184

Pulled By: navahgar

fbshipit-source-id: e7669d3b9d24e14501f3feb6505c88d1d42030c6
2021-08-19 22:55:30 -07:00
5e31a3b904 [nnc] Updated sliceHead to do inplace mutation (#63531)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63531

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D30412183

Pulled By: navahgar

fbshipit-source-id: 47ee9482a36e606788d28d22eee4edaca45ffa50
2021-08-19 22:54:05 -07:00
0a66d5b325 [PyTorch] Remove unnecessary iostream includes in headers (#61500)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61500

libstdc++ defines a static variable called `std::__ioinit` in iostream that adds global-constructor size overhead to each translation unit that includes iostream. To reduce the size overhead from that, we can often include ostream instead.
ghstack-source-id: 136163529

Test Plan: buildsizebot some mobile apps

Reviewed By: dhruvbird

Differential Revision: D29648016

fbshipit-source-id: 9c3139712c71248513cc5032d21e77f3ecbae8fe
2021-08-19 18:54:51 -07:00
b99a299c60 [PyTorch] Remove unused dump() methods in vec headers (#63533)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63533

These methods don't seem to be used, and they use std::cout, which incurs a small code size overhead on platforms using libstdc++ due to std::__ioinit (see #61500). Seems like we can just delete them?
ghstack-source-id: 136163409

Test Plan:
CI

Reviewers: #sentinel, dhruvbird

Reviewed By: dskhudia

Differential Revision: D30412269

fbshipit-source-id: 380b9aa2f9aabc4107188b6b209d2afc1769c0ee
2021-08-19 18:53:49 -07:00
0b6cc8daf2 [PyTorch][Edge] Support backtrace symbolication for Android builds (#63339)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63339

# Context
https://fb.workplace.com/groups/pytorch.dev/permalink/900474523864362/?comment_id=901125403799274&reply_comment_id=905023386742809

##### WHAT IS A STACK TRACE?
A stack trace (also called stack backtrace or stack traceback) is a report of the active stack frames at a certain point in time during the execution of a program.

Typically when an exception is thrown, one would expect to see the code (file:line) that threw the exception, and every intermediate frame up to and including the main function.

We are enabling android stack trace to help debugging on android devices.

Test Plan:
## Steps to test
```
buck build fbsource//xplat/caffe2/mode/aibench_pytorch_android -c pt.enable_qpl=0 -c pt.has_backtraces=1 fbsource//xplat/caffe2/fb/lite_predictor:lite_predictorAndroid#android-x86_64

one_world android emulator android-28

adb push ~/fbsource/buck-out/gen/xplat/caffe2/fb/lite_predictor/lite_predictorAndroid#android-x86_64 /data/local/tmp

cd /data/local/tmp
./lite_predictorAndroid#android-x86_64

./lite_predictorAndroid#android-x86_64 --model ./detect.bc --input_dims "1,3,192,192" --input_type float --warmup 20 --iter 5 --report_pep true
```

## See how model file is not found stack traces is:

### before
```
./lite_predictorAndroid#android-x86_64 --model ./detect.bc --input_dims "1,3,192,192" --input_type float --warmup 20 --iter 5 --report_pep true

Run with 2 threads
Run with 2 threads
Loading model...
terminating with uncaught exception of type c10::Error: open file failed, file path: ./detect.bc
Exception raised from RAIIFile at xplat/caffe2/caffe2/serialize/file_adapter.cc:13 (most recent call first):
(no backtrace available)
Aborted
```

### after
```
134|generic_x86_64:/data/local/tmp $ ./lite_predictorAndroid#android-x86_64 --model ./detect.bc --input_dims "1,3,192,192" --input_type float --warmup 20 --iter 5 --report_pep true
Run with 2 threads
Run with 2 threads
Loading model...
terminating with uncaught exception of type c10::Error: open file failed, file path: ./detect.bc
Exception raised from RAIIFile at xplat/caffe2/caffe2/serialize/file_adapter.cc:13 (most recent call first):
 frame #0       c10::get_backtrace(unsigned long, unsigned long, bool)[0x59494274f10e]
 frame #1       [0x5949427b1eee]
 frame #2       [0x5949427b1eb2]
 frame #3       [0x5949427b1cdc]
 frame #4       std::__ndk1::function<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > ()>::operator()() const[0x5949427afc34]
 frame #5       c10::Error::Error(c10::SourceLocation, std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> >)[0x5949427b05b1]
 frame #6       c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const&)[0x5949427aca5f]
 frame #7       caffe2::serialize::FileAdapter::RAIIFile::RAIIFile(std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const&)[0x5949426b37b2]
 frame #8       caffe2::serialize::FileAdapter::FileAdapter(std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const&)[0x5949426b3903]
 frame #9       torch::jit::_load_for_mobile(std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const&, c10::optional<c10::Device>, std::__ndk1::unordered_map<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> >, std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> >, std::__ndk1::hash<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > >, std::__ndk1::equal_to<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > >, std::__ndk1::allocator<std::__ndk1::pair<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const, std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > > > >&)[0x5949422737bd]
 frame #10      torch::jit::_load_for_mobile(std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const&, c10::optional<c10::Device>)[0x594942273769]
 frame #11      benchmark(std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const&, int, std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const&, std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const&, std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const&, bool, int, int, int, bool, int, bool, int, double, bool, bool, bool, std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const&)[0x59494189b21d]
 frame #12      main[0x594941882aff]
 frame #13      __libc_init[0x7b699d08578d]
```

### What we get on Linux
```
(base) [pavithran@devvm1803.vll0 /data/users/pavithran/fbsource] ./buck-out/gen/xplat/caffe2/fb/lite_predictor/lite_predictor --model ./detect.bc --input_dims "1,3,192,192" --input_type float --warmup 20 --iter 5 --report_pep true
Run with 24 threads
Run with 24 threads
Loading model...
terminate called after throwing an instance of 'c10::Error'
  what():  open file failed, file path: ./detect.bc
Exception raised from RAIIFile at xplat/caffe2/caffe2/serialize/file_adapter.cc:13 (most recent call first):
frame #0: ./buck-out/gen/xplat/caffe2/fb/lite_predictor/lite_predictor() [0x20cb7fe]
frame #1: ./buck-out/gen/xplat/caffe2/fb/lite_predictor/lite_predictor() [0x20cb6c6]
frame #2: std::function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > ()>::operator()() const + 0x54 (0x20ca4e4 in ./buck-out/gen/xplat/caffe2/fb/lite_predictor/lite_predictor)
frame #3: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x57 (0x20ca9a7 in ./buck-out/gen/xplat/caffe2/fb/lite_predictor/lite_predictor)
frame #4: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x7a (0x20c823a in ./buck-out/gen/xplat/caffe2/fb/lite_predictor/lite_predictor)
frame #5: caffe2::serialize::FileAdapter::RAIIFile::RAIIFile(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x96 (0x206f3d6 in ./buck-out/gen/xplat/caffe2/fb/lite_predictor/lite_predictor)
frame #6: caffe2::serialize::FileAdapter::FileAdapter(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x42 (0x206f502 in ./buck-out/gen/xplat/caffe2/fb/lite_predictor/lite_predictor)
frame #7: torch::jit::_load_for_mobile(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, c10::optional<c10::Device>, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >&) + 0x30 (0x1be826c in ./buck-out/gen/xplat/caffe2/fb/lite_predictor/lite_predictor)
frame #8: torch::jit::_load_for_mobile(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, c10::optional<c10::Device>) + 0x35 (0x1be8214 in ./buck-out/gen/xplat/caffe2/fb/lite_predictor/lite_predictor)
frame #9: benchmark(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool, int, int, int, bool, int, bool, int, double, bool, bool, bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x16d (0x12093ad in ./buck-out/gen/xplat/caffe2/fb/lite_predictor/lite_predictor)
frame #10: main + 0x25c (0x11f933c in ./buck-out/gen/xplat/caffe2/fb/lite_predictor/lite_predictor)
frame #11: __libc_start_main + 0x105 (0x7fc7b9f2ed95 in /usr/local/fbcode/platform009/lib/libc.so.6)
frame #12: _start + 0x2a (0x11f902a in ./buck-out/gen/xplat/caffe2/fb/lite_predictor/lite_predictor)

Aborted (core dumped)
```

Reviewed By: dhruvbird

Differential Revision: D30135947

fbshipit-source-id: f50c634ef4545843305cad4b4a14a8776b1aec76
2021-08-19 18:41:29 -07:00
f2bf0f229f Revert D30359218: [pytorch][PR] [doc] pre-commit fix instructions
Test Plan: revert-hammer

Differential Revision:
D30359218 (4e1d84ae8f)

Original commit changeset: 61771babeac4

fbshipit-source-id: c2ac0a4a7463fafa03ad0b20bfb0701a8c1476c4
2021-08-19 16:48:04 -07:00
d0d27f6971 Add concurrency group for more workflows (#63606)
Summary:
Fixes unnecessary duplicated workflows runs

![image](https://user-images.githubusercontent.com/658840/130146332-ecf54e49-3538-49c1-88de-b099f1c1e41f.png)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63606

Reviewed By: malfet, mruberry

Differential Revision: D30436889

Pulled By: zhouzhuojie

fbshipit-source-id: aafbad1edc45e3ab9bceb00e8f3b4204f18e43d0
2021-08-19 15:39:28 -07:00
71ab48ed3b acc type inference (#63119)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63119

Test Plan:
buck run mode/opt-clang caffe2/torch/fb/model_transform/experimental:fx_ir_lower_inline_cvr -- \
    --action=lower_and_run \
    --filename=inline_cvr_7x_dec_2020.model \
    --print_glow_glog=True

Reviewed By: jamesr66a, jfix71, ansley

Differential Revision: D30235895

fbshipit-source-id: dab7f96e1799b99eeae0ee519cf0ddd636fddf2e
2021-08-19 15:23:56 -07:00
ccca66597a Replace hardcoded values in IndexKernel.cu (#63372)
Summary:
This is a small change that helps maintain the Cruise PyTorch fork, since we use a different hardcoded value.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63372

Reviewed By: mruberry

Differential Revision: D30396171

Pulled By: ejguan

fbshipit-source-id: cc0023f58b5922d3d98c7283495e6dc8d35049b6
2021-08-19 15:02:28 -07:00
e5ab0d1013 DataLoader: allow non-integer Samplers (#63500)
Summary:
Not entirely sure how to use `TypeVar`, but if someone could give me a hint it would be appreciated. Also let me know if you want me to add tests so we can make sure non-integer samplers actually work; `test/test_dataloader.py` seems like the correct location, but that's a big file.

Fixes https://github.com/pytorch/pytorch/issues/63483
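
A minimal sketch of what this enables: a map-style dataset indexed by strings, with a sampler that yields those keys (all names here are hypothetical):

```python
from torch.utils.data import DataLoader, Dataset, Sampler

class StringKeyedDataset(Dataset):
    """Map-style dataset indexed by string keys instead of integers."""
    def __init__(self, data):
        self.data = data
    def __getitem__(self, key):
        return self.data[key]
    def __len__(self):
        return len(self.data)

class KeySampler(Sampler):
    """Sampler that yields string keys rather than integer indices."""
    def __init__(self, keys):
        self.keys = keys
    def __iter__(self):
        return iter(self.keys)
    def __len__(self):
        return len(self.keys)

ds = StringKeyedDataset({"a": 1, "b": 2})
loader = DataLoader(ds, sampler=KeySampler(["a", "b"]))
print([batch for batch in loader])  # [tensor([1]), tensor([2])]
```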

ejguan

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63500

Reviewed By: mruberry

Differential Revision: D30403689

Pulled By: ejguan

fbshipit-source-id: 464e09e5aad3215b94a29cc5e21cb4b10ec136e3
2021-08-19 14:55:46 -07:00
11a40ad915 [Pytorch] Fix callstack pointer serialization bug (#63576)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63576

We serialize the function name associated with an InlinedCallStackPtr. This is
derived by querying the Function* stored in the InlinedCallStack. However,
this is a raw pointer that is not guaranteed to be valid when serialization
happens. On the other hand, we also store the function name separately when
constructing the InlinedCallStack anyway, so this change uniformly relies on
function_name instead of Function*.

Test Plan: Internal build's asan failure + CI

Reviewed By: larryliu0820

Differential Revision: D30427029

fbshipit-source-id: de9617482404785920ed2e67b72f38461590fba3
2021-08-19 13:35:52 -07:00
6c3ebccc00 Updating the names of these functions (#63513)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63513

Updating these names per Jerry's nits in the previous PR.

Test Plan: Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D30406710

fbshipit-source-id: a9f1577a2b8c4a93f5005e0f6278b7d7348d8b66
2021-08-19 13:34:34 -07:00
ce6fe50158 Revert embedding thrust->cub migration (#63451)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/63427

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63451

Reviewed By: mruberry

Differential Revision: D30398482

Pulled By: ngimel

fbshipit-source-id: e153786d204215555a6571688eabae712facad7e
2021-08-19 13:03:33 -07:00
99203580a9 Updates internal assert_allclose callsites in favor of assert_close (#61841)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61841

Redo of #60863.

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D30408145

Pulled By: mruberry

fbshipit-source-id: 0b34ebc7f23ba38ecd89640b61d8aca59b7eab58
2021-08-19 12:50:41 -07:00
efd70b7ce6 Modernizes add and mul documentation (#63309)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/39329.

The documentation for torch.add and torch.mul was sorely out of date and even included deprecated references. This PR modernizes their descriptions consistent with torch.sub.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63309

Reviewed By: ngimel

Differential Revision: D30338004

Pulled By: mruberry

fbshipit-source-id: ee1c2a8106af8341253cafb0003b06e8f652624d
2021-08-19 12:49:30 -07:00
d986d4bf63 [special] use __all__ to hide internal imports (#63135)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/50345

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63135

Reviewed By: ngimel

Differential Revision: D30364287

Pulled By: mruberry

fbshipit-source-id: 20078668943fafa45ce09610634b1d2c424b1922
2021-08-19 12:45:43 -07:00
0c3904d180 [BF16] Add a missing thread local specifier to autocast_gpu_dtype (#63416)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63416

Fix a missing thread_local specifier introduced by a recent PR:

https://github.com/pytorch/pytorch/pull/61002

Test Plan: Unit Tests

Reviewed By: ngimel

Differential Revision: D30376154

fbshipit-source-id: c70d37ec85c3eba88eb87f766f1c4e7aeff8eaf9
2021-08-19 12:39:27 -07:00
535d44141b [7/N] Remove fork tests for RPC. (#63443)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63443

After https://github.com/pytorch/pytorch/pull/63442, all distributed
tests can run with opt-asan. As a result, we can now remove all of our fork
based tests.

This is the first PR in a stack, which first removes fork based tests from RPC.
ghstack-source-id: 136177744

Test Plan: waitforbuildbot

Reviewed By: lw

Differential Revision: D30384905

fbshipit-source-id: 86d438aebaa6cb02ae2a966fea244849849a1889
2021-08-19 11:22:40 -07:00
bd8608cd5c Use CMake for breakpad (#63186)
Summary:
We currently build breakpad from [this fork](https://github.com/driazati/breakpad) to include extra logic to restore signal handlers that were previously present. With some [new additions](https://github.com/google/breakpad/compare/main...driazati:main) this fork now includes a CMake based build, so we can add breakpad as a proper dependency rather than rely on including it in Docker images as a system library which is error prone (we have a bunch of images) and hard to extend to MacOS / Windows. This also includes some changes to the crash handling code to support MacOS / Windows in a similar way to Linux.

```python
import torch

# On Windows this writes crashes to C:\Users\<user>\AppData\pytorch_crashes
# On MacOS/Linux this writes crashes to /tmp/pytorch_crashes
torch.utils._crash_handler.enable_minidumps()

# Easy way to cause a segfault and trigger the handler
torch.bincount(input=torch.tensor([9223372036854775807]))
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63186

Reviewed By: malfet, seemethere

Differential Revision: D30318404

Pulled By: driazati

fbshipit-source-id: 0d7daf3701cfaba5451cc529a0730272ab1eb1dc
2021-08-19 10:42:01 -07:00
e030b81356 [easy] Fix missing move in TupleType::createNamed (#61572)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61572

ghstack-source-id: 136161829

Test Plan: CI

Reviewed By: SplitInfinity

Differential Revision: D29672872

fbshipit-source-id: d8ba2d54f7914dbeb3fc52aa21dd77025951c4b5
2021-08-19 10:38:52 -07:00
3aa4521fe8 [hpc] use fx2trt for exploration track (#63535)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63535

Reviewed By: yinghai, jianyuh

Differential Revision: D30272810

fbshipit-source-id: 61f3edf2a2282cd8c268a92acf92feb05a6ae3e1
2021-08-19 10:18:56 -07:00
885e312ce0 Add permute021 fx2trt converter (#63238)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63238

Reviewed By: yinghai

Differential Revision: D30295373

fbshipit-source-id: 2a189fe485edaa978fd03e4b8d8582edb34ec648
2021-08-19 10:17:48 -07:00
e7831fe5de [PyTorch] Test IValue move/copy/assign/swap more (#54717)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54717

Hit more tags in these tests
ghstack-source-id: 136140508

Test Plan: buck test //caffe2/aten:ivalue_test

Reviewed By: anjali411

Differential Revision: D27339736

fbshipit-source-id: 610c8e92846bb70ba725ab117440326ab50af5ce
2021-08-19 09:50:40 -07:00
79693bb86a Use linecache.lazycache to cache generated code. (#63453)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63453

Instead of patching linecache.getlines, use linecache.lazycache and
parts of the loader protocol described in PEP-302
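
Roughly how the lazy lookup works, as a standalone sketch (not the actual FX code; the loader and filename are made up):

```python
import linecache

SOURCE = "def f():\n    return 42\n"

class FakeLoader:
    # Only get_source from the PEP-302 loader protocol is needed by linecache.
    def get_source(self, name):
        return SOURCE

# lazycache declines filenames fully wrapped in angle brackets, hence the ".0".
filename = "<generated>.0"
linecache.lazycache(filename, {"__name__": "generated", "__loader__": FakeLoader()})

# The source is only fetched (via get_source) when the lines are requested.
print(linecache.getlines(filename))  # ['def f():\n', '    return 42\n']
```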

Test Plan:
python3 test/test_fx.py

Imported from OSS

Reviewed By: suo

Differential Revision: D30388176

fbshipit-source-id: 92933711ecf3a21a07e1d6b0d1185ab0efd8341c
2021-08-19 09:17:01 -07:00
e1334512a3 Add fastpath for dot and vdot when the inputs have conj bit set to True (#62915)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62915

Up to 45% and 20% perf improvement on CUDA and CPU, respectively.
Consistent improvement in perf for all cases -- see the perf numbers in the comments below.
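
A small illustration of the affected calls (my example, not from the PR):

```python
import torch

a = torch.randn(4, dtype=torch.complex64)
b = torch.randn(4, dtype=torch.complex64)

# a.conj() only flips the lazy conjugate bit; the fast path lets dot consume
# it directly instead of materializing a conjugated copy first.
print(torch.dot(a.conj(), b))
print(torch.vdot(a, b))  # vdot conjugates its first argument: same result
```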

Test Plan: Imported from OSS

Reviewed By: heitorschueroff

Differential Revision: D30404006

Pulled By: anjali411

fbshipit-source-id: 565940da28c7761d993cf43346932c24292e8a4d
2021-08-19 08:42:24 -07:00
f596aa8b77 Poisson zero rate (#61511)
Summary:
This PR fixes https://github.com/pytorch/pytorch/issues/53485 by allowing zero rates for the Poisson distribution. This implementation is consistent with `scipy.stats.poisson` which admits zero rates. In addition to addressing the aforementioned issue, this PR makes two supporting changes:

1. add a `nonnegative` constraint to enforce non-negative rates for the Poisson distribution.
2. adjust the evaluation of the gradient of `xlogy` such that it is well defined for `x == 0 and y == 0` (a short sketch of the resulting behavior follows this list).
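
A minimal sketch of the resulting behavior (my example, assuming the changes above):

```python
import torch
from torch.distributions import Poisson

# A zero rate is now accepted; all probability mass sits at count 0.
dist = Poisson(torch.tensor([0.0, 1.0]))
value = torch.tensor([0.0, 0.0])
print(dist.log_prob(value))  # tensor([ 0., -1.]): log P(X=0) is 0 when rate=0
```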

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61511

Reviewed By: ejguan

Differential Revision: D30352917

Pulled By: albanD

fbshipit-source-id: f3d33da58360e80d75eb83519f199b93232a2a2d
2021-08-19 08:30:28 -07:00
be9be9bfdd add distributed/_sharded_tensor/test_sharded_tensor to ROCM_BLOCKLIST (#63508)
Summary:
Fixes the current ROCm CI test2 breakage until tensorpipe is fully supported by ROCm.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63508

Reviewed By: ejguan

Differential Revision: D30406450

Pulled By: walterddr

fbshipit-source-id: c07509271d5d33901f3eaf7ffb916dc3626e1f9a
2021-08-19 07:50:55 -07:00
e7c4988b52 To fix the chainability at epoch zero for some schedulers (#63457)
Summary:
It has been discussed in the https://github.com/pytorch/pytorch/pull/60836#issuecomment-899084092 that we have observed an obstacle to chain some type of learning rate schedulers. In particular we observed

* some of the learning rate schedulers return their initial learning rates at epoch 0 as
```
       return self.base_lrs
```

* This can be a problem when two schedulers are chained as

```
     scheduler1.step()
     scheduler2.step()
```

In particular, the effect of scheduler1 at epoch 0 is completely ignored. This would not be an issue if scheduler1 were ineffective at epoch 0, as is the case for many schedulers; however, for schedulers such as warm-up schedulers, whose multiplicative value at epoch 0 is smaller than 1, this can lead to undesired behavior.

The following code snippet illustrates the problem better

## Reproducing the bug

```python
import torch
from torch.nn import Parameter
from torch.optim import SGD
from torch.optim.lr_scheduler import WarmUpLR, ExponentialLR

model = [Parameter(torch.randn(2, 2, requires_grad=True))]
optimizer = SGD(model, 1.0)
scheduler1 = WarmUpLR(optimizer, warmup_factor=0.1, warmup_iters=5, warmup_method="constant")
scheduler2 = ExponentialLR(optimizer, gamma=0.9)

for epoch in range(10):
     print(epoch, scheduler2.get_last_lr()[0])
     optimizer.step()
     scheduler1.step()
     scheduler2.step()
```

### Current Result

```
0 1.0
1 0.9
2 0.81
3 0.7290000000000001
4 0.6561000000000001
5 5.904900000000001
6 5.314410000000001
7 4.782969000000001
8 4.304672100000001
9 3.874204890000001
```

### Expected Result

```
0 1.0
1 0.9
2 0.81
3 0.7290000000000001
4 0.6561000000000001
5 0.5904900000000001
6 0.5314410000000001
7 0.4782969000000001
8 0.4304672100000001
9 0.3874204890000001
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63457

Reviewed By: datumbox

Differential Revision: D30424160

Pulled By: iramazanli

fbshipit-source-id: 3e15af8d278c872cd6f53406b55f4d3ce5002867
2021-08-19 07:17:03 -07:00
2d5b19f62b Update full backward hook doc with not-same-object note (#63245)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/61446

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63245

Reviewed By: ejguan

Differential Revision: D30352656

Pulled By: albanD

fbshipit-source-id: 7000ecb54a80f2da968ec7600b98574b608578ae
2021-08-19 06:50:56 -07:00
47a9e8ff32 [Static Runtime] Support __getitem__ for lists (#63398)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63398

This change provides a native `__getitem__` implementation for lists to avoid overhead associated with falling back to the JIT interpreter.

Test Plan: Unit tests: `buck test //caffe2/benchmarks/static_runtime:static_runtime_cpptest`

Reviewed By: hlu1

Differential Revision: D30368464

fbshipit-source-id: e0e0971508cd5d9bcf6025606993dc24ecbf6764
2021-08-19 06:38:51 -07:00
ce61100923 Revert D29399533: Hoisting common expressions out of If blocks
Test Plan: revert-hammer

Differential Revision:
D29399533 (9477211e7d)

Original commit changeset: 9336b9dc48c0

fbshipit-source-id: f081c7280203f40328bcbb0c03a7c6a007acedb7
2021-08-19 06:20:40 -07:00
6bb68ba507 Fix interpreter debug logging message (#63499)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63499

https://github.com/pytorch/pytorch/pull/62418 combined the instruction and debug handle. This change fixes the debugging message.
ghstack-source-id: 136184053

Test Plan: Uncomment and it works

Reviewed By: kimishpatel, raziel

Differential Revision: D30390699

fbshipit-source-id: e32b7b297ad3b7d8bffebd025d15519083a244c4
2021-08-19 02:14:13 -07:00
5254e3adb8 layernorm inplace (#63437)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63437

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D30388824

Pulled By: Krovatkin

fbshipit-source-id: 852d19bf238544c5de177ed5854dcd01c7ae5572
2021-08-18 23:07:25 -07:00
531262fe2e layernorm (#63436)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63436

use MKLDNN layernorm

use mkldnn version 2

address Elias feedback

fix build CI errors

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D30388825

Pulled By: Krovatkin

fbshipit-source-id: fb909bfbf53cb8567a43aac40f51c491daeec908
2021-08-18 23:05:39 -07:00
6e00b31b15 [TensorExpr] Make CacheReplacer and IndexFlattener mutate stmts/exprs inplace. (#63527)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63527

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D30411411

Pulled By: ZolotukhinM

fbshipit-source-id: efb14ee57b36537fa4fefa89bdd6bafe7151c012
2021-08-18 22:59:31 -07:00
1d62fb8a63 [TensorExpr] Speedup ExternalCall.ComputeInterop test by reducing tensor sizes. (#63526)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63526

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D30411410

Pulled By: ZolotukhinM

fbshipit-source-id: d9a99afac14d2238b5100c98ae9ed4467f9f05ea
2021-08-18 22:58:25 -07:00
773c8b6440 support optional comparisons with different but comparable types (#62890)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/62565

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62890

Reviewed By: ejguan

Differential Revision: D30396008

Pulled By: dagitses

fbshipit-source-id: fca02207509f882973d54484f89c4d116505fc66
2021-08-18 21:40:38 -07:00
2544664e54 Beef up comment in AccumulateType (#63503)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63503

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D30403160

Pulled By: ezyang

fbshipit-source-id: 6cb24418152d9fb146f86b6f973ec50f1a397a58
2021-08-18 20:59:37 -07:00
0d437fe6d0 BF16 allreduce hook (#63260)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63260

Add BF16 all-reduce communication hook. Skip if CUDA version < 11 or NCCL version < 2.9.7.

Reviewed By: SciPioneer

Differential Revision: D30238317

fbshipit-source-id: bad35bf7d43f10f1c40997a282b831b61ef592bb
2021-08-18 20:53:49 -07:00
9477211e7d Hoisting common expressions out of If blocks (#59492)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59492

Adding code to find common expressions from the two subblocks of an if
operation and hoist them before the if block.
This also allows Dead Code Elimination to
then eliminate some if blocks.

Also eliminated some dead code in the codebase.
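
A toy example of the kind of code the pass targets (my illustration; `x + 1` appears in both branches and can be hoisted above the If node):

```python
import torch

@torch.jit.script
def f(x: torch.Tensor, cond: bool):
    if cond:
        y = x + 1  # common to both branches: eligible for hoisting
        z = y * 2
    else:
        y = x + 1
        z = y * 3
    return z

# The unoptimized IR still has x + 1 in both arms of prim::If; the new pass
# hoists it above the If during optimization.
print(f.graph)
```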

Test Plan:
python test_jit.py TestIfHoisting

Imported from OSS

Reviewed By: ngimel

Differential Revision: D29399533

fbshipit-source-id: 9336b9dc48c02c38862f98f98cd72fc1767a1802
2021-08-18 16:29:30 -07:00
d9547b9bb2 Nnapi Delegation: Quick improvements (#63489)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63489

A few quick improvements to the Android NNAPI Delegate, some of which were discussed here https://github.com/pytorch/pytorch/pull/62272:
1) `throw std::exception` replaced with `TORCH_CHECK` to reduce runtime
size (nnapi_backend_lib.cpp)
2) weights processing moved from compile to preprocess step, since it can
be done AOT (nnapi_backend_lib.cpp & nnapi_backend_preprocess.cpp)
3) `ser_model_` and `shape_compute_module_` member variables removed, since they are never used after
`init()`, so they are not needed (nnapi_backend_lib.cpp)

Test Plan:
Unit tests: `python test/test_jit.py TestNnapiBackend`
Run SparkAR segmentation with delegated NNAPI as done here D30259033 (can use `jf download GAekdAwsyGKXhggFALN4LnSBTzcubsIXAAAz --file "v303-nnd-mod.ptl"` to get a preprocessed model from these changes)

Imported from OSS

Reviewed By: raziel, iseeyuan

Differential Revision: D30398880

fbshipit-source-id: b6872e1e9ccd583622b80659da00c83fdd82580e
2021-08-18 16:25:01 -07:00
4dcc2197ce [fix] tensor_split : non-contiguous indices tensor (#63390)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/63281

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63390

Reviewed By: ejguan

Differential Revision: D30362649

Pulled By: mruberry

fbshipit-source-id: 3ea3ad02199e4345beb0b580d056babd56112309
2021-08-18 16:10:17 -07:00
1f4e019d8e [Vulkan] Fix incorrect input range for Hardshrink tests (#63515)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63515

Fixed an inappropriate input range for the Hardshrink tests:
The range -10 to +10 for input tensors is more appropriate when we use the test set of lambda values {-4.2, -1.0, -0.42, 0.0, 0.42, 1.0, 4.2, 42.42}.
ghstack-source-id: 136141416

Test Plan:
```
buck build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 //xplat/caffe2:pt_vulkan_api_test_binAndroid\#android-arm64 --show-output
adb push buck-out/gen/xplat/caffe2/pt_vulkan_api_test_binAndroid\#android-arm64 /data/local/tmp/vulkan_api_test
adb shell "/data/local/tmp/vulkan_api_test"
```
Note that the test can fail sporadically due to the precision loss by FP16(Vulkan)/FP32(CPU). This issue will be handled separately after some design discussions.

Reviewed By: SS-JIA

Differential Revision: D30389646

fbshipit-source-id: 7224bd8ba4e4972f5fc147df8a0cb84808f8c62e
2021-08-18 15:52:12 -07:00
15eec8e1d1 using PR number instead of IN_PULL_REQUEST (#63360)
Summary:
PR numbers should be available on GHA after this.

This fixes an issue where the target determinator was not working, discovered when manually running https://github.com/pytorch/pytorch/issues/63412.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63360

Reviewed By: malfet, zhouzhuojie, seemethere

Differential Revision: D30374615

Pulled By: walterddr

fbshipit-source-id: eee8d8bb7aa4308a6a50cfdcd4423a96d846777f
2021-08-18 15:05:10 -07:00
779a3d47b0 [Static Runtime] Benchmark reports native nodes (#63346)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63346

We have seen that we can get significant perf wins essentially for free by implementing native ops for ops that we cannot write out variants for (e.g. TupleUnpack D30306955 (078b8004a6), append D30326461 (9d9e7a8d72)). Therefore, whether or not SR is using a native implementation is valuable information. By capturing this in the benchmarking suite, we can hopefully avoid wasting time profiling/manually inspecting `native_ops.cpp`

Reviewed By: hlu1

Differential Revision: D30346752

fbshipit-source-id: 205b090513b6a5a6ce4cb92f75ab0395b15d08f9
2021-08-18 15:05:08 -07:00
139413078f [FX] make ASTRewriter patch wrapped functions properly (#62987)
Summary:
Reference the same global namespace (instead of copying it) in ASTRewriter so that wrapped functions are patched properly.

Fixes #62071

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62987

Test Plan:
To test it you may write this snippet and ensure the results are as shown in the comments:

```
import torch
import torch.fx

@torch.fx.wrap
def to_be_wrapped(x):
    return torch.relu(x)

class Foo(torch.nn.Module):
    def forward(self, x):
        return to_be_wrapped(x)

traced = torch.fx.symbolic_trace(Foo())
print(traced.graph)
"""
graph():
    %x : [#users=1] = placeholder[target=x]
    %to_be_wrapped : [#users=1] = call_function[target=__main__.to_be_wrapped](args = (%x,), kwargs = {})
    return to_be_wrapped
"""

from torch.fx.experimental.rewriter import RewritingTracer

rt = RewritingTracer()
graph = rt.trace(Foo())
print(graph)
"""
### AFTER FIX (CORRECT):
graph():
    %x : [#users=1] = placeholder[target=x]
    %to_be_wrapped : [#users=1] = call_function[target=__main__.to_be_wrapped](args = (%x,), kwargs = {})
    return to_be_wrapped

### BEFORE FIX (WRONG):
graph():
    %x : [#users=1] = placeholder[target=x]
    %relu : [#users=1] = call_function[target=torch.relu](args = (%x,), kwargs = {})
    return relu
"""
```

Reviewed By: ansley

Differential Revision: D30396176

Pulled By: mostafaelhoushi

fbshipit-source-id: f61eddf32e9ef42b5f5c3ce21d559945214ee833
2021-08-18 15:03:57 -07:00
9bbf80969e [PyTorch] Avoid using std::regex for device string parsing in Device.cpp (#63464)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63464

This was previously committed as D30281388 (4d6f98ecad), but was reverted due to t98478641. jnkwok1 confirmed that this change was not the root cause, so trying to land it again.

Currently, `std::regex` is used for parsing device strings. This is undesirable for a few reasons.

1. Increases binary size
2. Slows down model loading
3. Potentially uses more memory at runtime
4. Takes marginally longer to build code that uses std::regex than code that does not

This change avoids the use of `std::regex` for parsing the device string since we don't need to.
ghstack-source-id: 136006963
ghstack-source-id: 136081898

Test Plan:
### AI Bench Runs

**Before this change:**
1. Model Load time: [252ms](https://www.internalfb.com/intern/aibench/details/332471502816548)
2. Model unload time: 3.5ms

**After this change:**
1. Model Load time: [240ms](https://www.internalfb.com/intern/aibench/details/652195589031318), which is an approx 5% reduction for the current model. I suspect percentage wise, it will be larger for smaller models since this is a fixed cost reduction.
2. Model unload time: 3.3ms (probably too small to be meaningfully impactful to an end user).

### BSB Results

```
D30281388 (4d6f98ecad)-V1 (https://www.internalfb.com/intern/diff/D30281388 (4d6f98ecad)/?dest_number=135713848)

messenger-pika-optimized-device: Succeeded
Change in Download Size for arm64 + 3x assets variation: -7.1 KiB
Change in Uncompressed Size for arm64 + 3x assets variation: -17.6 KiB

Mbex Comparison: https://our.intern.facebook.com/intern/mbex/bsb:551399955987465@base/bsb:551399955987465@diff/
```

Reviewed By: raziel, pavithranrao

Differential Revision: D30388269

fbshipit-source-id: 10942e7aa56f9ea47aa479a8f50187f2ce2899bf
2021-08-18 14:55:12 -07:00
7fdba4564a [TensorExpr] IRSimplifier: sort terms in polynomials, terms, minterms, maxterms. (#63197)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63197

This solves non-determinism from using hash values in sort methods.
Changes in tests are mostly mechanical.

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D30292776

Pulled By: ZolotukhinM

fbshipit-source-id: 74f57b53c3afc9d4be45715fd74781271373e055
2021-08-18 14:49:27 -07:00
8bdd542417 [TensorExpr] Add debug logging to LoopNest::computeInline. (#63196)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63196

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D30292778

Pulled By: ZolotukhinM

fbshipit-source-id: d8a111b75466a9354f6d048119cc6f814c9d5abb
2021-08-18 14:48:05 -07:00
feba6806c9 clarify that torch.finfo.tiny is the smallest normal number (#63241)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63241

This is a common source of confusion, but it matches the NumPy
behavior.

Fixes #44010
Fixes #59526
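
For example (behavior as documented; this snippet is mine, not part of the patch):

```python
import torch

fi = torch.finfo(torch.float32)
print(fi.tiny)  # 1.1754943508222875e-38: the smallest *normal* float32

# Subnormal values below `tiny` still exist; they just trade away precision.
print(torch.tensor(2.0 ** -149))  # tensor(1.4013e-45), the smallest subnormal
```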

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D30307646

Pulled By: dagitses

fbshipit-source-id: d848140ba267560387d83f3e7acba8c3cdc53d82
2021-08-18 13:44:52 -07:00
9253dc1e58 Fix segmentation fault due to access to destroyed CudaIPCGlobalEntities instance (#56141)
Summary:
There is an instance of the static destruction order fiasco where cuda_ipc_global_entities may be accessed after it is destroyed. See https://github.com/pytorch/pytorch/issues/51961

This change uses a flag and avoids accesses to the destroyed class when it is set to false.

Fixes https://github.com/pytorch/pytorch/issues/51961

This removes the function to clear shared_blocks introduced by https://github.com/pytorch/pytorch/issues/53080 which had multiple issues: Unprotected access to a shared structure and modification of the vector which is being cleared by the destructors of the objects contained.
I.e. what happened was:

- `CudaIPCSentDataLimbo_.clear_shared_blocks();` is called from the destructor of CudaIPCGlobalEntities as of your PR
- This deletes instances of `CudaIPCSentData` which hold `at::DataPtr` created by `GetNewRefCountedSentData`
- This means `CudaIPCSentDataDelete` is called with still active pointers
- Hence `CudaIPCSentDataLimbo_.add` is called adding a new value to `shared_blocks_`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56141

Reviewed By: ejguan

Differential Revision: D30397279

Pulled By: VitalyFedyunin

fbshipit-source-id: ce4b8b90fa1c90d275e5eca93ba84321cbc6140a
2021-08-18 13:38:55 -07:00
877e6f2be3 Bugfix for fuse qconfig comparison (#63384)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63384

In some cases, changes to the qconfig on a module would cause the
fusions to fail. This bugfix solves that problem by adding a
qconfig function comparison that compares the functions within the
qconfig rather than the modules the qconfigs are on. The comparison
looks at the partial object within QConfig.activation/weight.p and
compares args, keywords, and func. This has to be done manually
because partial doesn't implement __eq__, so == falls back to identity (is).
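
The underlying Python behavior, in isolation:

```python
from functools import partial

a = partial(min, 0)
b = partial(min, 0)

print(a == b)  # False: partial has no __eq__, so == compares identity
print((a.func, a.args, a.keywords) == (b.func, b.args, b.keywords))  # True
```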

Test Plan:
python test/test_quantization.py
TestFuseFx.test_problematic_fuse_example

Imported from OSS

Reviewed By: supriyar, ejguan

Differential Revision: D30386264

fbshipit-source-id: 51e358c021c39d6f48dc12ad2a82b2838677b9de
2021-08-18 13:31:56 -07:00
2aa19f33c6 [ONNX] Fix for batchnorm training op mode (#52758) (#62760)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62760

* Rebase

# Conflicts:
#	torch/csrc/jit/passes/onnx/eval_peephole.cpp

# Conflicts:
#	test/onnx/test_utility_funs.py
#	torch/onnx/symbolic_opset9.py

* Update symbolic_opset12.py

* Update test.sh
# Conflicts:
#	.jenkins/caffe2/test.sh

* Merge

* Fix utility tests

# Conflicts:
#	test/onnx/test_pytorch_onnx_onnxruntime.py
#	test/onnx/test_utility_funs.py

* Fix for comment

* Enable BN tests

* Fix for test

* Update test_pytorch_onnx_onnxruntime.py

* Update test_pytorch_onnx_onnxruntime.py

* Update test_utility_funs.py

* Update test_pytorch_onnx_onnxruntime.py

Test Plan: Imported from OSS

Reviewed By: SplitInfinity

Differential Revision: D30349060

Pulled By: msaroufim

fbshipit-source-id: 93312c17607974731c17099ae181acb6e4c1c409
2021-08-18 13:29:07 -07:00
e182401062 [ONNX] Remove aten parameter (#61652) (#62759)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62759

* remove aten argument in export()

* add export_to_pretty_string default value OperatorExportTypes.ONNX

* add DPYTORCH_ONNX_CAFFE2_BUNDLE description

Test Plan: Imported from OSS

Reviewed By: SplitInfinity

Differential Revision: D30349062

Pulled By: msaroufim

fbshipit-source-id: d9738f3aa8b80eac54548d0b9494f9f1e544f20f

Co-authored-by: Gary Miguel <garymiguel@microsoft.com>
2021-08-18 13:29:04 -07:00
3a7bbf5fb7 [ONNX] Add support for opset14 in PT-ONNX exporter (#59486) (#62758)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62758

* Add initial changes for opset14

* Fixed flake

* Add onnx submodule changes and removed utility func tests

* Add updated batchNorm symbolic

* Add triu/tril symbolics

* Fix lint

* Fixed test failures

* Add reshape with allowzero

* Added tests/refactored opset versioning

* Bump onnxruntime version

* Fix clang/lint failures

* Add reshape shape inference for opset 14

* Changes for allowzero

* Fix lint/clang and test failures

* Updated PR

* Flake fixes

* Fix flake

* Remove new_jit_api tests

* Add opset14 models

* Update allowzero

* Fix test failures

Test Plan: Imported from OSS

Reviewed By: SplitInfinity

Differential Revision: D30349063

Pulled By: msaroufim

fbshipit-source-id: 54724246149b01a2f627c43d7396253a7e9c9eb9

Co-authored-by: Shubham Bhokare <sbhokare@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
2021-08-18 13:29:01 -07:00
99b154b8be [ONNX] Support lstm_cell symbolic (#61476) (#62757)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62757

Support lstm_cell symbolic

Test Plan: Imported from OSS

Reviewed By: SplitInfinity

Differential Revision: D30349061

Pulled By: msaroufim

fbshipit-source-id: f236177e3e5c62a30b7e4d91a623bcaef21b5eb1

Co-authored-by: jiafatom <jiafa@microsoft.com>
2021-08-18 13:27:46 -07:00
d661e646ad [FX] Fix GraphModule deepcopy to use deepcopied graph (#63090)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63090

Test Plan: Imported from OSS

Reviewed By: ansley

Differential Revision: D30252471

Pulled By: jamesr66a

fbshipit-source-id: cafd7d7917935a5ea6ffa2a7fe9e9b2a9578b3e3
2021-08-18 13:17:14 -07:00
11fbd3958c MaybeOwned page for dev wiki (#63450)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63450

Brief guide to understanding `MaybeOwned<Tensor>`, aimed at C++ PT devs who are obliged to interact with existing uses of it, rather than encouraging new usage.

For reviewers: I haven't yet added a link to this page from anywhere. I'm thinking the right place is the [dev wiki main page C++ section](https://github.com/pytorch/pytorch/wiki#c) but happy to put it wherever makes sense, suggestions welcome.

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D30402313

Pulled By: bhosmer

fbshipit-source-id: 69b15909ecafcd8d88e44f664f88c3ad4eb26d84
2021-08-18 12:08:58 -07:00
9bb1371cc2 Disable RDYNAMIC check with MSVC (#62949)
Summary:
When testing with clang-cl, the flag is added though it is unsupported and that generates a few warnings. Tried a few alternatives like https://cmake.org/cmake/help/latest/module/CheckLinkerFlag.html, but they just don't work.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62949

Reviewed By: zhouzhuojie, driazati

Differential Revision: D30359206

Pulled By: malfet

fbshipit-source-id: 1bd27ad5772fe6757fa8c3a4bddf904f88d70b7b
2021-08-18 11:51:23 -07:00
d4593d9d08 document why wrappers exist in torch.functional (#62847)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/62844.

These wrappers are not super obvious, but ultimately stem from the lack of support for functions with variadic args in native_functions.yaml. https://github.com/pytorch/pytorch/issues/62845 tracks that issue.
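
A sketch of the pattern (the wrapper name is hypothetical; `torch.meshgrid` is a real example of such a wrapper):

```python
import torch

def variadic_meshgrid(*tensors):
    # native_functions.yaml cannot express *args, so the backend op takes a
    # single TensorList; the Python layer supplies the variadic surface.
    return torch.meshgrid(list(tensors))

x, y = variadic_meshgrid(torch.arange(3), torch.arange(2))
print(x.shape, y.shape)  # torch.Size([3, 2]) torch.Size([3, 2])
```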

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62847

Reviewed By: VitalyFedyunin

Differential Revision: D30305016

Pulled By: dagitses

fbshipit-source-id: 716fcecb0417b770bc92cfd8c54f7ead89070896
2021-08-18 11:51:21 -07:00
f0f5cffde9 [DDP] Add a debug check in cpp fp16 compress (#63379)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63379

This codepath has been prone to bugs, as seen in the diff below; this check
will help guard against changes/refactors that touch it, as a basic sanity
check. It is enabled only in debug builds so as not to affect perf.
ghstack-source-id: 136056093

Test Plan: CI

Reviewed By: SciPioneer

Differential Revision: D30358440

fbshipit-source-id: e1b3893a223722c2593ceed8696a09c7d07d47c1
2021-08-18 11:51:19 -07:00
ac1ece054b [DDP][Grad compression] Fix fp16 cpp hook (#63375)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63375

I think tensor.copy_(tensor.to(torch::kFloat16)); will keep it as
float32.
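
The same pitfall in Python terms, for intuition (my illustration):

```python
import torch

t = torch.randn(3)            # float32 gradient bucket
t.copy_(t.to(torch.float16))  # copy_ casts back to the destination's dtype
print(t.dtype)                # torch.float32 -- nothing was actually compressed

half = t.to(torch.float16)    # keeping the half-precision tensor itself works
print(half.dtype)             # torch.float16
```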

Tested by adding the following line:

```
LOG(INFO) << "Type is: " << compressed_tensor.scalar_type();
```

before:

```
I0816 17:03:09.823688 364141 default_comm_hooks.cpp:21] Type is: Float
```
after:

```
I0816 17:01:16.779052 353924 default_comm_hooks.cpp:21] Type is: Half
```
ghstack-source-id: 136056092

Test Plan: ci

Reviewed By: SciPioneer

Differential Revision: D30356256

fbshipit-source-id: 8208a705acd7628541cd43c8bf61d007dfdd2435
2021-08-18 11:49:35 -07:00
4e1d84ae8f [doc] pre-commit fix instructions (#61717)
Summary:
fix invalid instruction

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61717

Reviewed By: zhouzhuojie, driazati

Differential Revision: D30359218

Pulled By: malfet

fbshipit-source-id: 61771babeac4d34425a61ce49f38a7099b521eec
2021-08-18 11:42:25 -07:00
50a3b6a6a8 Make SkipInfo with expected_failure an XFAIL (#63481)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63481

This PR changes the SkipInfo decorators to use unittest.expectedFailure so that the test reports as XFAIL as opposed to PASSED.

Note that changing the expectedFailure here 30e1c74dc1/torch/testing/_internal/common_device_type.py (L879) to an XFAIL is not possible because the decision of whether to decorate is delayed until the wrapper function is called.

fixes https://github.com/pytorch/pytorch/issues/63363
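
For reference, the reporting difference in plain unittest terms:

```python
import unittest

class Example(unittest.TestCase):
    @unittest.expectedFailure
    def test_known_bug(self):
        self.assertEqual(1, 2)  # reported as an expected failure (x), not PASSED

if __name__ == "__main__":
    unittest.main()
```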

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D30397154

Pulled By: heitorschueroff

fbshipit-source-id: c5e4911969ad8667763eec4203dbbc6a51178592
2021-08-18 11:36:18 -07:00
2f615f6313 Improve custom function docs (#60312)
Summary:
- Adds some code examples for `ctx` methods and make requirements of arguments more clear
- Type annotations for `save_for_backward`, `mark_dirty`, `mark_non_differentiable`, and `set_materialize_grads` (BC-breaking?)
- Refactor `torch.autograd.Function` doc

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60312

Reviewed By: VitalyFedyunin

Differential Revision: D30314961

Pulled By: soulitzer

fbshipit-source-id: a284314b65662e26390417bd2b6b12cd85e68dc8
2021-08-18 11:31:31 -07:00
d565a7bd68 [6/N] Enable opt-asan for elastic and launcher tests. (#63442)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63442

Continuation of https://github.com/pytorch/pytorch/pull/62051, I've
enabled elastic and launcher tests to run in opt-asan mode which is supported
with spawn multiprocessing.

This allows us to completely get rid of fork based tests from torch.distributed
and have all tests run in spawn mode.
ghstack-source-id: 136057123

Test Plan: waitforbuildbot

Reviewed By: cbalioglu

Differential Revision: D30384267

fbshipit-source-id: ad3447cfb9d6e31e7ec8332d64c8ff1054858dcb
2021-08-18 10:48:49 -07:00
af3cbfed95 Add validation check in fx2trt interpreter (#63424)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63424

Add a validation check in fx2trt for missing converter operators. If any op is missing, interpreter init will report the missing operators.

Test Plan:
For call_function and call_method:
manually tested with the feeds benchmark and verified that init failed with the expected message.
{F642390780}

For call_module:
Specify a module as a leaf node so that acc_tracer traces it as a node; then, in fx2trt.py, make the CONVERTER initialization stage skip recording all modules; initialize the interpreter and call the validator function; verify that the output includes the missing module name (the return value is printed in the screenshot below).

{F643458718}

Reviewed By: 842974287

Differential Revision: D30294832

fbshipit-source-id: 243dca3fdfc6a174ded65248938e2a234aec19c6
2021-08-18 10:41:10 -07:00
7df2324120 [pytorch] Make qconv forward() thread safe (#63432)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63432

There's a race condition in quantized models when multiple threads call forward(), due to qnnpack packing the weights the first time the operator is called. This change locks the entire apply_impl function.

Test Plan:
https://github.com/pytorch/pytorch/issues/58055

Ran the script before and after, original crashes went away

Reviewed By: kimishpatel

Differential Revision: D30229520

fbshipit-source-id: d06cabe24199a80325cd57f24a7fd60624be2cf7
2021-08-18 10:37:13 -07:00
565578cdab Use fastAtomicAdd in EmbeddingBag (mode "max") backward (#63298)
Summary:
Rel: https://github.com/pytorch/pytorch/issues/62695

### This PR
|   n_tokens |   num_embeddings |   embedding_dim | mode   |    bwd_fp32 |    bwd_fp16 |
|-----------:|-----------------:|----------------:|:-------|------------:|------------:|
|       4096 |             4096 |            4096 | max    | 0.000326228 | 0.000181448 |
|       4096 |             4096 |           16384 | max    | 0.00102805  | 0.000618136 |
|       4096 |            16384 |            4096 | max    | 0.000907326 | 0.000530422 |
|       4096 |            16384 |           16384 | max    | 0.00334988  | 0.00264645  |
|      16384 |             4096 |            4096 | max    | 0.000366449 | 0.000320232 |
|      16384 |             4096 |           16384 | max    | 0.00126421  | 0.00104183  |
|      16384 |            16384 |            4096 | max    | 0.00087738  | 0.00065068  |
|      16384 |            16384 |           16384 | max    | 0.00379229  | 0.00298201  |

### Original
|   n_tokens |   num_embeddings |   embedding_dim | mode   |    bwd_fp32 |    bwd_fp16 |
|-----------:|-----------------:|----------------:|:-------|------------:|------------:|
|       4096 |             4096 |            4096 | max    | 0.00032407  | 0.000188231 |
|       4096 |             4096 |           16384 | max    | 0.00104356  | 0.000624001 |
|       4096 |            16384 |            4096 | max    | 0.000902069 | 0.000527382 |
|       4096 |            16384 |           16384 | max    | 0.00302202  | 0.00255153  |
|      16384 |             4096 |            4096 | max    | 0.000384343 | 0.000403249 |
|      16384 |             4096 |           16384 | max    | 0.00126445  | 0.00135069  |
|      16384 |            16384 |            4096 | max    | 0.000880814 | 0.000825679 |
|      16384 |            16384 |           16384 | max    | 0.00337611  | 0.00319515  |

cc xwang233 ptrblck ngimel

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63298

Reviewed By: mruberry

Differential Revision: D30383583

Pulled By: ngimel

fbshipit-source-id: 14dd9d67002c53a153721812709033c198f68c1e
2021-08-18 10:14:40 -07:00
e2ddaec5cf Reverting launch bounds change in topK that induced a regression in perf (#63431)
Summary:
[topkwsyncs.zip](https://github.com/pytorch/pytorch/files/7003077/topkwsyncs.zip)

Running this script on NVIDIA containers 21.08 vs. 21.07, we see the following perf drops:
topk(input=(dtype=torch.float16,shape=[60, 201600]), k=2000, dim=1, sorted=True) - 0.63

topk(input=(dtype=torch.float32,shape=[120000]), k=12000, dim=0, sorted=False) - 0.55

topk(input=(dtype=torch.float16,shape=[5, 201600]), k=2000, dim=1, sorted=True) - 0.55

topk(input=(dtype=torch.float32,shape=[1, 10000]), k=1000, dim=1, sorted=False) - 0.33

The relative perf drop is reported as (21.08_time - 21.07_time) / 21.07_time

I narrowed down the source of the regression to this commit: https://github.com/pytorch/pytorch/pull/60314
which reduced launch bounds from 1024 to 512.

The perf did not seem to regress in the original evidence provided for changing 1024 to 512 because the input shapes in that benchmark were much smaller than the input shapes of the tensors in which I am seeing the regression. I suggest reverting to 1024: with 512 there was no considerable perf improvement for small inputs and a major perf regression for large tensors.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63431

Reviewed By: mruberry

Differential Revision: D30384087

Pulled By: ngimel

fbshipit-source-id: 11eecbba82a069b1d4579d674c3f644ab8060ad2
2021-08-18 09:44:07 -07:00
383a33a0eb Make DataChunk support list in-place ops (#63422)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63422

Fixes #63095

Make `DataChunk` delegate to the corresponding list methods. It will then support the following in-place operations (see the sketch after this list):
- `sort`
- `reverse`
- `append`
- `extend`
- `random.shuffle`
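
A minimal sketch of the resulting behavior (the import path is an assumption; adjust to wherever `DataChunk` lives in your build):

```python
import random

from torch.utils.data import DataChunk  # import path assumed

chunk = DataChunk([3, 1, 2])
chunk.sort()           # in-place list methods now mutate the wrapped data
chunk.append(4)
random.shuffle(chunk)  # works because DataChunk behaves like a mutable list
print(list(chunk))
```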

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D30379027

Pulled By: ejguan

fbshipit-source-id: d176bd0cc8b89b915c7bb184ff243ab1f605616d
2021-08-18 08:48:47 -07:00
cyy
93582e3bba A tiny fix in MT19937RNGEngine (#63219)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63219

Reviewed By: VitalyFedyunin

Differential Revision: D30341484

Pulled By: ezyang

fbshipit-source-id: 0ff4499d0f4a3dfeb991c0f10fe3248c6ca1c992
2021-08-18 08:05:23 -07:00
c508433617 Implement subclass priority for __torch_dispatch__ (#63411)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63411

In order to get this behavior, you have to use append_overloaded,
which I forgot to use in the previous implementation.  I exposed
an internal helper function which is more appropriate for dispatch
to Python where we know that an argument is definitely a Tensor (and
this test no longer needs to be done).

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D30374489

Pulled By: ezyang

fbshipit-source-id: 43b08c00d1958c9b26d82a025d19f0b67bb85590
2021-08-18 07:49:03 -07:00
061b36e2f5 [fx2trt] Add dequantize support (#63448)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63448

Only available after TensorRT 8.0

Test Plan: buck run mode/opt caffe2/torch/fb/fx2trt:test_dequantize

Reviewed By: 842974287

Differential Revision: D30296863

fbshipit-source-id: 44b9630ef0d210e7f20e650dc81c519f7e41f5f3
2021-08-18 07:44:17 -07:00
a00d587849 add OpInfo for torch.linalg.tensorinv (#62326)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/53739.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62326

Reviewed By: H-Huang

Differential Revision: D30136376

Pulled By: zou3519

fbshipit-source-id: 04ec9450e8866667649af401c7559b96ddc91491
2021-08-18 07:37:34 -07:00
30e1c74dc1 Update cuda amp to also check xla device (#63413)
Summary:
Fixes https://github.com/pytorch/xla/issues/3086. PyTorch/XLA:GPU also uses CUDA AMP. I verified the pt/xla `test_autocast` with this fix and all tests passed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63413

Reviewed By: ngimel

Differential Revision: D30380785

Pulled By: bdhirsh

fbshipit-source-id: fd1a1de7d224c616fc3fa90b80a688a21f6b1ecc
2021-08-18 06:44:10 -07:00
4a390a56c4 [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D30391472

fbshipit-source-id: d4eb1e7debea8905e7fee5f026c082bee65e78f3
2021-08-18 04:20:05 -07:00
2b303f3f31 enhance comparison tests for c10::optional (#62887)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62887

Reviewed By: VitalyFedyunin

Differential Revision: D30305044

Pulled By: dagitses

fbshipit-source-id: d0a3a9e4ea186915ef087543aaf81a606f943380
2021-08-18 04:08:05 -07:00
0f2f6a79cb clarify the documentation of torch.meshgrid (#62977)
Summary:
Also warn about the behavior differences from `numpy.meshgrid`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62977

Reviewed By: mruberry, ngimel

Differential Revision: D30220930

Pulled By: dagitses

fbshipit-source-id: ae6587b41792721cae2135376c58121b4634e296
2021-08-18 04:01:22 -07:00
f8a84a80cd [5/N] Run opt-asan with detect_leaks=0 (#63361)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63361

Python multiprocessing doesn't support LSAN and causes false positives
instead. As a result, disabling LSAN for these tests so that we can still run
with opt-asan
ghstack-source-id: 135962489

Test Plan: waitforbuildbot

Reviewed By: rohan-varma

Differential Revision: D30352269

fbshipit-source-id: f6ab5abce7bdef00cd5e1f5977424d2b151174af
2021-08-18 01:59:56 -07:00
d431c77d76 [sharded_tensor] fix typing issue for placement (#63426)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63426

placement should be either a string or a _remote_device; this fixes the type annotation to match the behavior
ghstack-source-id: 136041125

Reviewed By: pritamdamania87

Differential Revision: D30379702

fbshipit-source-id: 34e226494240923b433e3a39cc08c84d42cdad6b
2021-08-17 23:11:48 -07:00
2fd14735d6 [easy][PyTorchEdge] print error message when failing to load model file (#63404)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63404

# Context
Loading a model file using `fopen` might error out for multiple reasons. Repro'ing the error on devices takes some time and effort. Logging the errno will help in debugging and fixing the error quickly.

# Mitigation
Print out the error message from `fopen` to help users debug the issue.

Test Plan:
```
(base) [pavithran@devvm1803.vll0 /data/users/pavithran/fbsource] buck run xplat/caffe2/fb/lite_predictor:lite_predictor -- --model=/home/pavithran/models/prod/GAaNhAoTIV6cIvgJAHn30m8NR1QgbmQwAAAA.ptl --use_bundled_input=0
Building: finished in 0.5 sec (100%) 354/354 jobs, 0/354 updated
  Total time: 0.6 sec
Run with 24 threads
Run with 24 threads
Loading model...
terminate called after throwing an instance of 'c10::Error'
  what():  open file failed because of errno 2 on fopen: No such file or directory, file path: /home/pavithran/models/prod/GAaNhAoTIV6cIvgJAHn30m8NR1QgbmQwAAAA.ptl
Exception raised from RAIIFile at xplat/caffe2/caffe2/serialize/file_adapter.cc:15 (most recent call first):
(no backtrace available)
```

Reviewed By: dhruvbird

Differential Revision: D30372308

fbshipit-source-id: 5346e828f53f6bc5d871b403586566a3332a389a
2021-08-17 22:27:49 -07:00
15144ade25 [fx2trt] Add quantize_per_tensor support (#63447)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63447

Only available in TRT 8.0 and above

Test Plan: buck run mode/opt caffe2/torch/fb/fx2trt:test_quantize_per_tensor

Reviewed By: 842974287

Differential Revision: D30322844

fbshipit-source-id: dfd925e3432de128f2925b1aa55d6125e63359af
2021-08-17 21:37:26 -07:00
3fd8e09102 Fix RPC Python User Function Error Handling (#63406)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63406

The `RemoteException` will be thrown on the caller side when converting
the response message to IValue. Since it is a Python error, the error
message needs to be extracted explicitly and clear the `PyErr`.

Test Plan: Imported from OSS

Reviewed By: rohan-varma, ngimel

Differential Revision: D30372741

Pulled By: mrshenli

fbshipit-source-id: 1f72a7ee0c39cc2ef070f99884c142f7b3e0543d
2021-08-17 20:14:03 -07:00
f12f667e12 [torch] Set default log level for torch elastic (#63214)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63214

The default log level in fb and oss is different: in oss we use WARNING and in fb we use INFO.

Test Plan: unittests, f291441502

Reviewed By: cbalioglu

Differential Revision: D30296298

fbshipit-source-id: 89067352be767255fbc66e790ec333582de64c6c
2021-08-17 19:58:13 -07:00
dcf90b797c [BE] remove _SUPPORTED_OPTIM_MAP from tests (#63383)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63383

Per title
ghstack-source-id: 135966157

Test Plan: CI

Reviewed By: SciPioneer

Differential Revision: D30358921

fbshipit-source-id: 965e054e525194b1ee55980340df275bab355c9b
2021-08-17 17:17:25 -07:00
5b8862abf1 [DDP] Support step_param for AdamW (#63382)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63382

Per title
ghstack-source-id: 135966156

Test Plan: CI

Reviewed By: SciPioneer

Differential Revision: D30255446

fbshipit-source-id: e6ffbf339db0bc5b4702d02b74a462309df07c75
2021-08-17 17:16:11 -07:00
cd5e9dcc1d [quant][graphmode][fx][fix] Fix quantization for tuple arguments (#63376)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63376

Previously, when a tuple was an argument to a quantizable op, it would mistakenly be transformed into a list;
this PR fixes that.

Test Plan:
python test/test_quantization.py TestQuantizeFx.test_preserve_tuple

Imported from OSS

Reviewed By: raghuramank100

Differential Revision: D30357642

fbshipit-source-id: 82d10805d9c00c003cc99983dca68b6455ff7b2e
2021-08-17 17:01:24 -07:00
975542c314 Add more ciflow labels for more workflows (#63410)
Summary:
- Add more ciflow labels and enable them for more workflows.
- Only the 'ciflow/default' workflows run by default at pull_request time.
- Other workflows can be triggered manually (by adding the labels and unassigning pytorchbot), or by waiting for pytorchbot's comment opt-in rollout.
- The label design is a logical `OR`, i.e. adding 'ciflow/cuda' + 'ciflow/win' will trigger the union of the two sets. (Design feedback is needed here.)

Typical default workflows for normal PRs.

<details>
<summary>Generated label rules</summary>

![image](https://user-images.githubusercontent.com/658840/129779905-eb5e56dd-a696-4040-9eb6-71ecb6487dc1.png)

```
{
  "label_rules": {
    "ciflow/all": [
      "libtorch-linux-xenial-cuda10.2-py3.6-gcc7",
      "libtorch-linux-xenial-cuda11.1-py3.6-gcc7",
      "linux-bionic-cuda10.2-py3.9-gcc7",
      "linux-bionic-py3.8-gcc9-coverage",
      "linux-xenial-cuda10.2-py3.6-gcc7",
      "linux-xenial-cuda11.1-py3.6-gcc7",
      "linux-xenial-py3.6-gcc5.4",
      "linux-xenial-py3.6-gcc7-bazel-test",
      "periodic-libtorch-linux-xenial-cuda11.3-py3.6-gcc7",
      "periodic-linux-xenial-cuda11.3-py3.6-gcc7",
      "periodic-win-vs2019-cuda11.3-py3",
      "win-vs2019-cpu-py3",
      "win-vs2019-cuda10.1-py3",
      "win-vs2019-cuda11.1-py3"
    ],
    "ciflow/bazel": [
      "linux-xenial-py3.6-gcc7-bazel-test"
    ],
    "ciflow/coverage": [
      "linux-bionic-py3.8-gcc9-coverage"
    ],
    "ciflow/cpu": [
      "linux-bionic-py3.8-gcc9-coverage",
      "linux-xenial-py3.6-gcc5.4",
      "linux-xenial-py3.6-gcc7-bazel-test",
      "win-vs2019-cpu-py3"
    ],
    "ciflow/cuda": [
      "libtorch-linux-xenial-cuda10.2-py3.6-gcc7",
      "libtorch-linux-xenial-cuda11.1-py3.6-gcc7",
      "linux-bionic-cuda10.2-py3.9-gcc7",
      "linux-xenial-cuda10.2-py3.6-gcc7",
      "linux-xenial-cuda11.1-py3.6-gcc7",
      "periodic-libtorch-linux-xenial-cuda11.3-py3.6-gcc7",
      "periodic-linux-xenial-cuda11.3-py3.6-gcc7",
      "periodic-win-vs2019-cuda11.3-py3",
      "win-vs2019-cuda10.1-py3",
      "win-vs2019-cuda11.1-py3"
    ],
    "ciflow/default": [
      "linux-bionic-py3.8-gcc9-coverage",
      "linux-xenial-cuda11.1-py3.6-gcc7",
      "linux-xenial-py3.6-gcc5.4",
      "linux-xenial-py3.6-gcc7-bazel-test",
      "win-vs2019-cpu-py3",
      "win-vs2019-cuda10.1-py3"
    ],
    "ciflow/libtorch": [
      "libtorch-linux-xenial-cuda10.2-py3.6-gcc7",
      "libtorch-linux-xenial-cuda11.1-py3.6-gcc7",
      "periodic-libtorch-linux-xenial-cuda11.3-py3.6-gcc7"
    ],
    "ciflow/linux": [
      "libtorch-linux-xenial-cuda10.2-py3.6-gcc7",
      "libtorch-linux-xenial-cuda11.1-py3.6-gcc7",
      "linux-bionic-cuda10.2-py3.9-gcc7",
      "linux-bionic-py3.8-gcc9-coverage",
      "linux-xenial-cuda10.2-py3.6-gcc7",
      "linux-xenial-cuda11.1-py3.6-gcc7",
      "linux-xenial-py3.6-gcc5.4",
      "linux-xenial-py3.6-gcc7-bazel-test",
      "periodic-libtorch-linux-xenial-cuda11.3-py3.6-gcc7",
      "periodic-linux-xenial-cuda11.3-py3.6-gcc7"
    ],
    "ciflow/scheduled": [
      "periodic-libtorch-linux-xenial-cuda11.3-py3.6-gcc7",
      "periodic-linux-xenial-cuda11.3-py3.6-gcc7",
      "periodic-win-vs2019-cuda11.3-py3"
    ],
    "ciflow/slow": [
      "linux-bionic-cuda10.2-py3.9-gcc7",
      "linux-xenial-cuda10.2-py3.6-gcc7"
    ],
    "ciflow/win": [
      "periodic-win-vs2019-cuda11.3-py3",
      "win-vs2019-cpu-py3",
      "win-vs2019-cuda10.1-py3",
      "win-vs2019-cuda11.1-py3"
    ]
  },
  "version": "v1"
}
```
</details>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63410

Reviewed By: ngimel

Differential Revision: D30378553

Pulled By: zhouzhuojie

fbshipit-source-id: 4e0953740793e5e72b95018f8ab2ce4a6a364c38
2021-08-17 17:00:09 -07:00
da87d648b3 F.avg_pool3 CUDA backward: gpuAtomicAddNoReturn -> fastAtomicAdd (#63387)
Summary:
Rel: https://github.com/pytorch/pytorch/issues/62695

In the following two tables, I set `kernel_size` to 3 and `stride` to 2.
In benchmark, input tensors have the shape of (N, C, n_features, n_features, n_features).
Tested on RTX3080 w/ CUDA11.4 Update 1.

## This PR

|   N |   C |   n_features | dtype         |        time |
|----:|----:|-------------:|:--------------|------------:|
|  32 |   3 |            8 | torch.float16 | 7.46846e-05 |
|  32 |   3 |            8 | torch.float32 | 8.18968e-05 |
|  32 |   3 |           32 | torch.float16 | 0.000156748 |
|  32 |   3 |           32 | torch.float32 | 0.000165236 |
|  32 |   3 |          128 | torch.float16 | 0.00549854  |
|  32 |   3 |          128 | torch.float32 | 0.008926    |

## master (6acd87f)

|   N |   C |   n_features | dtype         |        time |
|----:|----:|-------------:|:--------------|------------:|
|  32 |   3 |            8 | torch.float16 | 7.60436e-05 |
|  32 |   3 |            8 | torch.float32 | 7.55072e-05 |
|  32 |   3 |           32 | torch.float16 | 0.000189292 |
|  32 |   3 |           32 | torch.float32 | 0.000168645 |
|  32 |   3 |          128 | torch.float16 | 0.00699538  |
|  32 |   3 |          128 | torch.float32 | 0.00890226  |

master's time divided by PR's time is as follows:

| N | C | n_features | master / PR |
|---:|---:|---------------:|----------------:|
| 32 | 3 | 8 | 1.018 |
| 32 | 3 | 32 | 1.208 |
| 32 | 3 | 128 | 1.272 |

cc: xwang233 ptrblck ngimel

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63387

Reviewed By: mruberry

Differential Revision: D30381434

Pulled By: ngimel

fbshipit-source-id: 3b97aee4b0d457a0277a0d31ac56d4151134c099
2021-08-17 16:53:13 -07:00
6e5d065b2b Add pocketfft as submodule (#62841)
Summary:
Using https://github.com/mreineck/pocketfft

Also delete explicit installation of pocketfft during the build as it will be available via submodule

Limit PocketFFT support to cmake-3.10 or newer, as `set_source_files_properties` does not seem to work as expected with cmake-3.5

Partially addresses https://github.com/pytorch/pytorch/issues/62821

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62841

Reviewed By: seemethere

Differential Revision: D30140441

Pulled By: malfet

fbshipit-source-id: d1a1cf1b43375321f5ec5b3d0b538f58082f7825
2021-08-17 15:29:56 -07:00
078dcc4e97 [wip] Move smallest bucket to end after rebuild buckets (#62279)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62279

Before rebuilding buckets, `kDefaultFirstBucketBytes` is misleading because we reverse the parameter indices when initializing the reducer, so it is actually the size of the last bucket.

Currently, rebuilding buckets sets this to be the first bucket size; this change tests whether keeping it as the last bucket size can help perf.

This is currently experimental only, and we don't plan to land it unless experiments show a clear win.
ghstack-source-id: 135966897

Test Plan: CI

Reviewed By: SciPioneer

Differential Revision: D29927931

fbshipit-source-id: 55b949986fa2c3bade6fcb4bf5b513461bf0f490
2021-08-17 15:04:50 -07:00
e0e2796fa9 adding a note to the documentation of polar (#63259)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63259

Fix #52919

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D30342536

Pulled By: NivekT

fbshipit-source-id: 4c61a86f96a6370cc64652bf652c4ae25c9f4601
2021-08-17 14:48:32 -07:00
bcddc71f26 [quant][graphmode][fx][bc-breaking] Support for reference pattern for fixqparam ops in eval mode (#62608)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62608

Insert an extra fixed-qparam fake quant at the output of fixed-qparam ops in fbgemm (e.g. sigmoid)
so that we can produce reference patterns for these ops.

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: iramazanli

Differential Revision: D30053978

fbshipit-source-id: c527944b6e791bb4d45ebe96265af52794203695
2021-08-17 14:42:40 -07:00
9cd24e12a1 Revert D30281388: [PyTorch] Avoid using std::regex for device string parsing in Device.cpp
Test Plan: revert-hammer

Differential Revision:
D30281388 (4d6f98ecad)

Original commit changeset: 4d998e9f313e

fbshipit-source-id: 11134b3400cc3e851155c9c1b6fb59308ff1567b
2021-08-17 14:40:27 -07:00
495e7e4815 Fix zero-dim handling in torch.matmul (#63359)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63359

Fixes #63352. The problem was that in e.g. `torch.matmul(A, B)` with A,
B having shapes [3, 2, 0] and [0, 2], the code attempts to call
`A.view(-1, 0)` which fails due to "-1 being ambiguous". The solution is
to manually compute what we want the shape of the view to be.
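
A minimal repro sketch, using the shapes from the description (illustrative only):

```python
import torch

A = torch.randn(3, 2, 0)
B = torch.randn(0, 2)
# Previously this raised an error from the internal A.view(-1, 0) call;
# after the fix it returns an all-zero tensor, since the contraction dim is 0.
out = torch.matmul(A, B)
print(out.shape)  # torch.Size([3, 2, 2])
```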

Test Plan: - new tests

Reviewed By: ngimel

Differential Revision: D30351583

Pulled By: zou3519

fbshipit-source-id: 7625691fe8b85d96a4073409596a932c303e3e8c
2021-08-17 13:44:47 -07:00
1dc2b52764 [TensorExpr] Add a wrapper for all expr and stmt pointers. (#63195)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63195

This helps us to later switch from using KernelArena with raw pointers
to shared pointers without having to change all our source files at
once.

The changes are mechanical and should not affect any functionality.

With this PR, we're changing the following:
 * `Add*` --> `AddPtr`
 * `new Add(...)` --> `alloc<Add>(...)`
 * `dynamic_cast<Add*>` --> `to<Add>`
 * `static_cast<Add*>` --> `static_to<Add>`

Due to some complications with args forwarding, some places became more
verbose, e.g.:
 * `new Block({})` --> `new Block(std::vector<ExprPtr>())`

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D30292779

Pulled By: ZolotukhinM

fbshipit-source-id: 150301c7d2df56b608b035827b6a9a87f5e2d9e9
2021-08-17 13:44:45 -07:00
a2db5d34a5 OpInfo fix: conv_transpose2d (#63389)
Summary:
Addresses comment: https://github.com/pytorch/pytorch/pull/62882#issuecomment-899679606.

cc: mruberry ngimel

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63389

Reviewed By: mruberry

Differential Revision: D30377481

Pulled By: ngimel

fbshipit-source-id: 0fa21acc3503c259c9b27463e8555247c43d9e2e
2021-08-17 13:42:36 -07:00
9d9e7a8d72 [Static Runtime] Implement aten::append (#63350)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63350

Add a native implementation for `aten::append`, the list append op.

Test Plan: New unit test: `buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest -- Append`

Reviewed By: hlu1

Differential Revision: D30326461

fbshipit-source-id: 0dbdf6cc82e78c7c36db39583256f6b87385e3d3
2021-08-17 13:40:18 -07:00
6621df9a6a [vulkan] Add log_softmax (#63193)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63193

Test Plan: Imported from OSS

Reviewed By: SS-JIA

Differential Revision: D30291987

fbshipit-source-id: 89c6560274e5a841e5af249f6963b67ef6826f4c
2021-08-17 13:36:02 -07:00
b0396e39f4 [quant][fx] Ensure qconfig works for QAT with multiple modules (#63343)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63343

The previous implementation had a bug where we were trying to modify an ordered dict value while iterating through it.
This fixes it by creating a copy before modifying it.
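
A minimal sketch of the bug pattern (hypothetical names, not the actual fx code):

```python
qconfig_dict = {"conv": "qconfig_a", "linear": "qconfig_b"}

# Buggy: raises "RuntimeError: dictionary changed size during iteration"
# for name in qconfig_dict:
#     qconfig_dict[name + ".fused"] = qconfig_dict[name]

# Fixed: iterate over a copy of the keys before mutating the dict
for name in list(qconfig_dict):
    qconfig_dict[name + ".fused"] = qconfig_dict[name]
```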

Test Plan:
python test/test_quantization.py TestQuantizeFx.test_qconfig_qat_module_type

Imported from OSS

Reviewed By: raghuramank100

Differential Revision: D30346116

fbshipit-source-id: 0e33dad1163e8bff3fd363bfd04de8f7114d7a3a
2021-08-17 11:40:51 -07:00
e000dfcf97 Add return type hint and improve the docstring of consume_prefix_in_state_dict_if_present method (#63388)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63388

Context: https://discuss.pytorch.org/t/how-to-use-the-helper-function-consume-prefix-in-state-dict-if-present/129505/3

Make it clear that this method strips the prefix in place rather than returning a new value.

Additional reformatting is also applied.
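
A short usage sketch (assuming the helper's location in `torch.nn.modules.utils`):

```python
import torch
from torch.nn.modules.utils import consume_prefix_in_state_dict_if_present

state_dict = {"module.weight": torch.zeros(2), "module.bias": torch.zeros(2)}
# Strips the prefix in place and returns None:
consume_prefix_in_state_dict_if_present(state_dict, "module.")
print(list(state_dict))  # ['weight', 'bias']
```
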
ghstack-source-id: 135973393

Test Plan: waitforbuildbot

Reviewed By: rohan-varma

Differential Revision: D30360931

fbshipit-source-id: 1a0c7967a4c86f729e3c810686c21dec43d1dd7a
2021-08-17 11:30:27 -07:00
fcc840eae0 Add handling of ifs to shape propagation (#62914)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62914

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D30196945

Pulled By: eellison

fbshipit-source-id: 1c0c7f938c4547330fd1dba8ab7dd0b99a79b6a9
2021-08-17 11:26:42 -07:00
3975c08e5d Small shape analysis changes (#62911)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62911

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D30196946

Pulled By: eellison

fbshipit-source-id: 2562bab323088d9c1440ae0431e533f9bcc513d3
2021-08-17 11:26:40 -07:00
e2227e86e4 Add a few peepholes (#62910)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62910

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D30196947

Pulled By: eellison

fbshipit-source-id: d88c92616d4de4f47ff4fcf5c1994e629ca20395
2021-08-17 11:26:38 -07:00
9a60759453 Propagate symbolic dimensions through idioms like x.view(y.size()) (#61975)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61975

Propagate symbolic dimensions through size calls. We did this by associating SymbolicSizes with integer inputs by looking through their constructors for `x.size(1)` or `x.size()` nodes.

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D30196948

Pulled By: eellison

fbshipit-source-id: 377fc1d2f6d396c52dc0e87fa814b15720f1414e
2021-08-17 11:25:22 -07:00
60cadd0bd1 [fx2trt] Refactor linear op to use mm + add
Summary:
Previously, linear was translated to fully_connected, which only works when the weight is a constant.
This diff changes that to mm + add so that the weight can be an ITensor, allowing the weight - quantize - dequantize
pattern in the produced TensorRT network.
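
Conceptually, the converter now lowers linear the way this plain-PyTorch sketch does (for illustration only; this is not the TRT converter code):

```python
import torch

def linear_via_mm_add(x, weight, bias):
    # F.linear(x, weight, bias) == x @ weight.t() + bias
    return torch.mm(x, weight.t()) + bias

x = torch.randn(4, 8)
w = torch.randn(16, 8)
b = torch.randn(16)
assert torch.allclose(linear_via_mm_add(x, w, b),
                      torch.nn.functional.linear(x, w, b), atol=1e-6)
```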

Test Plan: buck run mode/opt caffe2/torch/fb/fx2trt:test_linear

Reviewed By: 842974287

Differential Revision: D30294751

fbshipit-source-id: 596fbd4c81caef8df41a002a2e14fbf22d9d2a80
2021-08-17 10:52:28 -07:00
517aa8965a Updates set_default_dtype documentation (#63233)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/60560.

The description of set_default_dtype is updated to clarify that it affects the interpretation of Python numbers as either float32 (complex64) or float64 (complex128) and that default (floating) dtypes other than float32 or float64 are unsupported.
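
A short illustration of the clarified behavior:

```python
import torch

torch.set_default_dtype(torch.float64)
print(torch.tensor(1.5).dtype)   # torch.float64
print(torch.tensor(1.5j).dtype)  # torch.complex128

torch.set_default_dtype(torch.float32)
print(torch.tensor(1.5).dtype)   # torch.float32
print(torch.tensor(1.5j).dtype)  # torch.complex64
```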

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63233

Reviewed By: VitalyFedyunin

Differential Revision: D30306396

Pulled By: mruberry

fbshipit-source-id: bbee62f323c773b23b2fa45cb99122bc28197432
2021-08-17 10:41:03 -07:00
63554cfb3d Remove backend_debug from torch_core srcs and replace with library dependency (#63111)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63111

### Problem:
Buck contains at least two libraries which have `backend_debug_info.cpp` as a source, `torch_core` and `backend_interface_lib`. `backend_debug_info.cpp` registers BackendDebugInfo as a class. If targets contain both libraries (e.g. sparkAR debug build with NNAPI delegation), then BackendDebugInfo is registered twice, causing a runtime error.
### Solution:
These changes remove `backend_debug_info.cpp` and `backend_interface.cpp` as a source in `torch_core` and adds backend_interface_lib as a dependency instead.

**build_variables.bzl:**
- Added a list that excludes `backend_debug_info.cpp` and `backend_interface.cpp` ( both srcs already included by `backend_interface_lib`)

**buck:**
- torch_core: Removed `backend_debug_info.cpp` from srcs and added `backend_interface_lib` deps
- backend_interface_lib: Replaced `torch_mobile_core` dep with more specific deps
  - to avoid an indirect dep between `torch_core` and `torch_mobile_core`

ghstack-source-id: 135981061

Test Plan:
### Test Plan:
Build and run SparkAR internally with Android NNAPI Delegation (`buck build --show-output arstudioplayer_arm64_debug`)
and internal tests.

Reviewed By: iseeyuan

Differential Revision: D30259034

fbshipit-source-id: 0c14c827732f07fb9b9bd25a999828b51793cdcc
2021-08-17 10:33:35 -07:00
3aecec609f Move Android Nnapi srcs from aten_native_cpu to aten_cpu (#62919)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62919

Move Android NNAPI srcs (nnapi_bind.cpp, nnapi_wrapper.cpp, nnapi_model_loader.cpp) from aten_native_cpu to aten_cpu, so that later the NNAPI delegate's execution library can depend on it.

aten_native_cpu is built selectively per app, but the srcs have no selective components and are required for the NNAPI delegate library in D30259033.

See Buck Dependencies: https://docs.google.com/document/d/17RuWkqWKCO6sc5fKzIDkGeNhhvMk7BvJOqeSnGsHZ8o/edit?usp=sharing
ghstack-source-id: 135981062

Test Plan: `buck build --show-output arstudioplayer_arm64_debug` and internal tests

Reviewed By: iseeyuan

Differential Revision: D30164867

fbshipit-source-id: 0beff481ff250e75664ce8393beabbeb9db66770
2021-08-17 10:32:30 -07:00
c982f13a80 [android][vulkan] Fix model loading for Vulkan backend (#63402)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63402

Test Plan: Imported from OSS

Reviewed By: SS-JIA

Differential Revision: D30370692

Pulled By: IvanKobzarev

fbshipit-source-id: 73311b9b767fe9ed3ae390db59d6aa2c4a98f06d
2021-08-17 10:20:32 -07:00
f70b9ee5de Advertise USE_PRECOMPILED_HEADERS in CONTRIBUTING.md (#62827)
Summary:
This option was added in https://github.com/pytorch/pytorch/issues/61940 and fits with this section's theme of improving build times.

I've also changed it to a `cmake_dependent_option` instead of `FATAL_ERROR`ing for older CMake versions.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62827

Reviewed By: astaff

Differential Revision: D30342102

Pulled By: malfet

fbshipit-source-id: 3095b44b7085aee8a884ec95cba9f8998d4442e7
2021-08-17 10:14:40 -07:00
011fdc3b7e [fx] persist tracer_cls on fx.Graph when deep copying (#63353)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63353

Custom deepcopy method copies all nodes but does not copy the tracer_cls attribute

Reviewed By: houseroad

Differential Revision: D30349424

fbshipit-source-id: 3e98bdac8a8a992eb0b4ec67fe80bb2e5cf3884d
2021-08-17 09:57:48 -07:00
4d6f98ecad [PyTorch] Avoid using std::regex for device string parsing in Device.cpp (#63204)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63204

Currently, `std::regex` is used for parsing device strings. This is undesirable for a few reasons.

1. Increases binary size
2. Slows down model loading
3. Potentially uses more memory at runtime
4. Takes marginally longer time to build code that uses std::regex v/s not using std::regex

This change avoids the use of `std::regex` for parsing the device string since we don't need to.
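
For reference, the device strings the hand-written parser needs to accept are simple, e.g.:

```python
import torch

# Typical strings handled by c10::Device parsing:
torch.device("cpu")
torch.device("cuda")
torch.device("cuda:1")
```
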
ghstack-source-id: 136006963

Test Plan:
### AI Bench Runs

**Before this change:**
1. Model Load time: [252ms](https://www.internalfb.com/intern/aibench/details/332471502816548)
2. Model unload time: 3.5ms

**After this change:**
1. Model Load time: [240ms](https://www.internalfb.com/intern/aibench/details/652195589031318), which is an approx 5% reduction for the current model. I suspect percentage wise, it will be larger for smaller models since this is a fixed cost reduction.
2. Model unload time: 3.3ms (probably too small to be meaningfully impactful to an end user).

### BSB Results

```
D30281388-V1 (https://www.internalfb.com/intern/diff/D30281388/?dest_number=135713848)

messenger-pika-optimized-device: Succeeded
Change in Download Size for arm64 + 3x assets variation: -7.1 KiB
Change in Uncompressed Size for arm64 + 3x assets variation: -17.6 KiB

Mbex Comparison: https://our.intern.facebook.com/intern/mbex/bsb:551399955987465@base/bsb:551399955987465@diff/
```

Reviewed By: raziel

Differential Revision: D30281388

fbshipit-source-id: 4d998e9f313e6366d9d89a6a73cd090ddfb059fc
2021-08-17 09:23:48 -07:00
013a42bdb1 [PyTorch] Add Device_test.cpp (#63203)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63203

Currently, `c10::Device` isn't being tested - i.e. there's no test to ensure that the device string parsing works as expected. This diff adds very basic tests to assert that the stuff we expect to work works, and the stuff that we don't expect to work doesn't work.

ghstack-source-id: 136006962

Test Plan:
New test. Ran as:

```
cd fbsource/fbcode/
buck test //caffe2/c10:c10_test_0 -- -r '.*DeviceTest.*'
```

Reviewed By: dreiss, raziel

Differential Revision: D30286910

fbshipit-source-id: b5699068dcbba89d5d224dbaf74b175f3f785a00
2021-08-17 09:22:35 -07:00
336aa9cd85 change with_callable_args to return a fresh _PartialWrapper (#63374)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/63326

Currently `get_callable_args` has the side effect of mutating the input _PartialWrapper. When that input is one of the global defaults, there are all sorts of lifetime issues that crop up. (Details in the linked issue.) So far as I can tell, we only need to make a constructor which is module (and by extension device) aware, so making a fresh one should have the same effect without leaking the last call's module.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63374

Test Plan: the repro in https://github.com/pytorch/pytorch/issues/63326 now reports no leaked Tensors, and all quantization tests pass locally.

Reviewed By: HDCharles

Differential Revision: D30359360

Pulled By: robieta

fbshipit-source-id: aef33261ac49952d8d90da868a57ab063dfc456e
2021-08-17 09:11:38 -07:00
7bad9ac78a Fix flaky test for dp saved tensor hooks (#63324)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63324

Fix for https://www.internalfb.com/tasks/?t=98258963
`catch_warnings` seems to trigger only once in certain cases where it
should trigger twice.
This test is only meant to check whether hooks are triggered or not,
so changing it to `self.assertGreater` is OK.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D30340833

Pulled By: Varal7

fbshipit-source-id: 1bfb9437befe9e8ab8f95efe5f513337fa9bdc5c
2021-08-17 08:56:58 -07:00
2992d92b5a Add mode to TarArchiveReader (#63332)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63332

Add a corresponding PR from [torchdata](https://github.com/facebookexternal/torchdata/pull/101)

Test Plan: Imported from OSS

Reviewed By: astaff

Differential Revision: D30350151

Pulled By: ejguan

fbshipit-source-id: bced4a1ee1ce89d4e91e678327342e1c095dbb9e
2021-08-17 07:28:37 -07:00
cae5ddc427 add torch.meshgrid() OpInfo (#62720)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/62719

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62720

Reviewed By: astaff

Differential Revision: D30344574

Pulled By: dagitses

fbshipit-source-id: ed42d9fe20741df98018efb08e640fca370583fb
2021-08-17 04:04:24 -07:00
22f78144c7 Extends warning on norm docs (#63310)
Summary:
torch.norm has a couple of documentation issues, like https://github.com/pytorch/pytorch/issues/44552 and https://github.com/pytorch/pytorch/issues/38595, but since it's deprecated, this PR simply clarifies that the documentation (and implementation) of torch.norm may be incorrect. This should be additional encouragement for users to migrate to torch.linalg.vector_norm and torch.linalg.matrix_norm.
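
A sketch of the suggested migration (assuming the torch.linalg functions named above are available in your build):

```python
import torch

x = torch.randn(3, 4)

# Instead of the deprecated torch.norm(...):
vec_norm = torch.linalg.vector_norm(x.flatten(), ord=2)
mat_norm = torch.linalg.matrix_norm(x, ord="fro")
```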

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63310

Reviewed By: ngimel

Differential Revision: D30337997

Pulled By: mruberry

fbshipit-source-id: 0fdcc438f36e4ab29e21e0a64709e4f35a2467ba
2021-08-16 22:23:45 -07:00
ad94248b57 Cleanup dead code (#63328)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63328

This code supported the old `at::_fft_with_size` operator which no longer exists.

Test Plan: Imported from OSS

Reviewed By: astaff

Differential Revision: D30343557

Pulled By: mruberry

fbshipit-source-id: 7a71585e013acb46c98f14fd40e15bdfbf026bac
2021-08-16 22:13:08 -07:00
877b649bc3 Workaround for cuFFT bug (#63327)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63327

Fixes #63152

Test Plan: Imported from OSS

Reviewed By: astaff

Differential Revision: D30343558

Pulled By: mruberry

fbshipit-source-id: 68e17a07650f65f397e26efc417e97e2ab302f82
2021-08-16 22:11:52 -07:00
794b04c6c8 Add step to report code coverage from GHA (#63373)
Summary:
Similar to the logic provided in b2069e7d01/.circleci/verbatim-sources/job-specs/pytorch-job-specs.yml (L197-L201)

Fixes https://github.com/pytorch/pytorch/issues/63366

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63373

Reviewed By: walterddr

Differential Revision: D30357737

Pulled By: malfet

fbshipit-source-id: 20b115eb4d6412bd9895680308a9097742d2ae7b
2021-08-16 20:42:38 -07:00
548c717cbd [TensorExpr] Remove test_train from tensorexpr tests. (#63194)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63194

This test implements functionality that is used nowhere, and its author no
longer works on it. This PR also adds test_approx to CMakeLists, where
it had been missing.

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D30292777

Pulled By: ZolotukhinM

fbshipit-source-id: ab6d98e729320a16f1b02ea0c69734f5e7fb2554
2021-08-16 20:36:31 -07:00
e7724bb100 [JIT] Set future's error to current exception as is when --torch_jit_enable_rethrow_caught_exception=true (#63348)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63348

This change addresses singlaiiit's comment on D30241792 (61b49c8e41), and makes the JIT interpreter's behavior consistent between the cases where `future` is set and where it is not.

Test Plan: Enhanced `EnableRethrowCaughtExceptionTest.EnableRethrowCaughtExceptionTestRethrowsCaughtException` to cover the modified code path.

Reviewed By: singlaiiit

Differential Revision: D30347782

fbshipit-source-id: 79ce57283154ca4372e5341217d942398db21ac8
2021-08-16 17:32:13 -07:00
075024b9a3 [Static Runtime] Fix a bug that assigns multiple outputs to single storage (#63012)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63012

This change fixes a bug where the static runtime's memory optimizer assigned multiple outputs of a node to the same storage. Fixing this bug enables the static runtime to run `inline_cvr` with its memory optimizer enabled.

A problematic line from `inline_cvr` was as follows:
```
  %7767 : Tensor, %getitem_6419.1 : Tensor = fb::gather_ranges(%tensor74.1, %7764)
```
where enabling the memory optimizer assigned `%7767` and `%getitem_6419.1` to the same storage, which corrupted their data during the 2nd iteration.

This change fixes the aforementioned bug by marking all inputs & outputs of a node as `alive` during our liveness analysis. By doing that, no inputs / outputs will collide with each other. I believe this is a fair assumption that most ops' implementations already rely on, but one that was missing from our analysis before this change.

Test Plan: - Added a unittest `StaticRuntime.ValuesShareSameStorageDoesNotContainOutputsFromSameNode` to cover the new code.

Reviewed By: hlu1

Differential Revision: D30202018

fbshipit-source-id: 10287a1bee9e86be16a5201e9a7cd7c7f046bab9
2021-08-16 16:52:02 -07:00
068d6fec5c [Model Averaging] Add a few member methods of PostLocalSGDOptimizer (#63340)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63340

Some member methods are needed, such as those for accessing optimizer states. These are necessary for integration with PyTorch Lightning.

Proposal: https://github.com/pytorch/pytorch/issues/59699
ghstack-source-id: 135912246

Test Plan: buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_ddp_hook_parity_post_localSGD

Reviewed By: rohan-varma

Differential Revision: D30328794

fbshipit-source-id: e585b874313bd266fdc7c79936e2af98700c7bad
2021-08-16 16:39:01 -07:00
aa63c0d9df [PyPer] Skip printing out per node time when do_profile is on (#63256)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63256

This suppresses printing the per-node time, which is very long when the net has many ops. It can easily be turned back on by setting `--pt_sr_print_per_node_time=1`.

Reviewed By: ajyu, mikeiovine

Differential Revision: D30298331

fbshipit-source-id: 32b3f93b3fe19d335654168311fda93331a1e706
2021-08-16 16:32:19 -07:00
b2069e7d01 Refactor NnapiCompilation registration into its own file (#63183)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63183

Move registration of NnapiCompilation into its own file, so that `nnapi_bind.cpp` (which contains the implementation of NnapiCompilation) can be moved to `aten_cpu`, while maintaining the selectiveness of registration.

`nnapi_bind.cpp` is moved to `aten_cpu` in https://github.com/pytorch/pytorch/pull/62919. See the PR for more details on why it's needed.

ghstack-source-id: 135900318

Test Plan: Nnapi unit tests: `python test/test_nnapi.py`

Reviewed By: iseeyuan

Differential Revision: D30288708

fbshipit-source-id: 6ed5967fa6bd018075469d18e68f844d413cf265
2021-08-16 15:45:26 -07:00
da36bbcd35 Add section to CONTRIBUTING.md explaining developer docs (#63228)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63228

It is a quick summary and links to a page on the Developer Wiki that has
more detail.

Test Plan: Imported from OSS

Reviewed By: SplitInfinity

Differential Revision: D30347109

Pulled By: zou3519

fbshipit-source-id: a6242986d275e5279ca3f61ade2294a132d268c4
2021-08-16 15:44:10 -07:00
4982fc4ecf test: Add ability to set CONTINUE_THROUGH_ERROR (#63357)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63357

Adds the ability to set CONTINUE_THROUGH_ERROR as an environment
variable so that we can easily set it without having to add the flag
directly

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: astaff

Differential Revision: D30351108

Pulled By: seemethere

fbshipit-source-id: 767fa9bd24e1399f359eb24d16f6cc985a2d7173
2021-08-16 15:35:40 -07:00
6acd87fe6a Add driver function to run test_sharded_tensor.py and test_sharding_spec.py (#63189)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63189

Add a main --> run_tests function to each test file, which is needed to launch the real test cases in the OSS flow.
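
A minimal sketch of the pattern being added (hypothetical file contents):

```python
from torch.testing._internal.common_utils import run_tests

# ... TestShardingSpec / TestShardedTensor test classes defined above ...

if __name__ == "__main__":
    run_tests()  # launches the test cases when the file is run directly
```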

Test Plan:
Before:
$ python test/distributed/_sharding_spec/test_sharding_spec.py --v   ==> nothing happened
$ python test/distributed/_sharded_tensor/test_sharded_tensor.py --v ==> nothing happened

After:

$ python test/distributed/_sharding_spec/test_sharding_spec.py --v   ==>

test_chunked_sharding_spec (__main__.TestShardingSpec) ... ok
test_device_placement (__main__.TestShardingSpec) ... ok
test_enumerable_sharding_spec (__main__.TestShardingSpec) ... ok

$ python test/distributed/_sharded_tensor/test_sharded_tensor.py --v

test_complete_world_size (__main__.TestShardedTensorChunked) ... ok
test_insufficient_sharding_dims (__main__.TestShardedTensorChunked) ... ok
test_invalid_pg_rpc_ranks (__main__.TestShardedTensorChunked) ... [W tensorpipe_agent.cpp:699] RPC agent for worker2 encountered error when reading incoming request from worker0: eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259)
ok
test_invalid_sharding (__main__.TestShardedTensorChunked) ... ok
test_load_state_dict_errors (__main__.TestShardedTensorChunked) ... ok
test_multiple_local_shards (__main__.TestShardedTensorChunked) ... ok
test_new_group (__main__.TestShardedTensorChunked) ... ok
test_partial_world_size (__main__.TestShardedTensorChunked) ... ok
test_sharded_tensor_metadata (__main__.TestShardedTensorChunked) ... ok
test_sharded_tensor_sizes (__main__.TestShardedTensorChunked) ... ok
test_sharding_columns (__main__.TestShardedTensorChunked) ... ok
test_state_dict (__main__.TestShardedTensorChunked) ... ok
test_state_dict_new_group (__main__.TestShardedTensorChunked) ... ok
test_state_dict_no_sharded_tensors (__main__.TestShardedTensorChunked) ... ok
test_grid_sharding (__main__.TestShardedTensorEnumerable) ... ok
test_multiple_local_shards (__main__.TestShardedTensorEnumerable) ... ok
test_new_group (__main__.TestShardedTensorEnumerable) ... ok
test_partial_world_size (__main__.TestShardedTensorEnumerable) ... ok
test_sharded_tensor_metadata (__main__.TestShardedTensorEnumerable) ... ok
test_uneven_shards (__main__.TestShardedTensorEnumerable) ... ok
test_with_rpc_names (__main__.TestShardedTensorEnumerable) ... ok
test_init_from_local_shards (__main__.TestShardedTensorFromLocalShards) ... ok
test_init_from_local_shards_invalid_shards (__main__.TestShardedTensorFromLocalShards) ... ok
test_init_from_local_shards_invalid_shards_gaps (__main__.TestShardedTensorFromLocalShards) ...

Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D30294094

fbshipit-source-id: 08f0431a12ea854abe00dc920205b10ba43ae6b6
2021-08-16 15:25:32 -07:00
f4f2c1231a [fx2trt] add unsqueeze converter (#63355)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63355

Added converter for acc_ops.unsqueeze. Needed for ig model.

Didn't add support for inputs that have more than one dynamic dim; this is not needed right now, and I expect it would be a rare case.

Test Plan: unit test

Reviewed By: yinghai

Differential Revision: D30138293

fbshipit-source-id: 899fe8eb68387de83195a2f6e199618d96f09a9e
2021-08-16 15:18:43 -07:00
078b8004a6 [Static Runtime] Implement prim::TupleUnpack (#63243)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63243

Add `prim::TupleUnpack` native op to static runtime.

Test Plan: Unit test: `buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest`

Reviewed By: hlu1

Differential Revision: D30306955

fbshipit-source-id: 21923d6cbd5545c144ac051b3d48b37ec6e610cf
2021-08-16 14:56:30 -07:00
a12b371f7c [fx2trt] Factor out add_matrix_multiply_layer
Summary: Factor out the function so that it can be reused in future diffs

Test Plan: buck run mode/opt caffe2/torch/fb/fx2trt:test_matmul

Reviewed By: 842974287

Differential Revision: D30322823

fbshipit-source-id: 069b945de2c744cdbcca1618b62827692dfb4174
2021-08-16 14:13:37 -07:00
dc5ce22a1a A re-open PR: Avoid re-creating the random number generator in RandomSampler (#63026)
Summary:
More details can be found in the old pr: https://github.com/pytorch/pytorch/pull/53085

ejguan  Thanks for your guidance. I tried to reopen this PR following your instructions.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63026

Reviewed By: anjali411

Differential Revision: D30224920

Pulled By: ejguan

fbshipit-source-id: 2fa83bd4a2661485e553447fe3e57ce723f2716d
2021-08-16 14:08:37 -07:00
3f06f29577 Improve pip package determination (#63321)
Summary:
Invoking `pip` or `pip3` yields the list of packages for whichever `pip` alias is on the path, rather than for the Python interpreter currently being executed. Changed `get_pip_packages` to use `sys.executable + '-mpip'` instead.
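
A sketch of the approach (not the exact collect_env code):

```python
import subprocess
import sys

# Query the pip that belongs to the running interpreter, rather than
# the first `pip` alias found on PATH:
out = subprocess.check_output([sys.executable, "-mpip", "list", "--format=freeze"])
print(out.decode())
```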

Also, add mypy to the list of packages of interest

Discovered while looking at https://github.com/pytorch/pytorch/issues/63279

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63321

Reviewed By: walterddr

Differential Revision: D30342099

Pulled By: malfet

fbshipit-source-id: fc8d17cf2ddcf18236cfde5c1b9edb4e72804ee0
2021-08-16 13:54:39 -07:00
4a59f0b9d9 [Profiler] Change FLOP/s to Total FLOPs (#62779)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62779

Change from floating point operations per second to total floating point operations. This requires removing the division by execution time from the Kineto-computed FLOPs and updating the necessary documentation.

Test Plan:
Running the following script:

```
import torch
from torch.profiler import profile
import torchvision.models as models

model = models.resnet18().eval()
inputs = torch.randn(5, 3, 224, 224)
with torch.no_grad():
    with profile(record_shapes=True, with_flops=True) as prof:
        model(inputs)
print(prof.key_averages().table(sort_by="cpu_time_total"))
```

Before diff results in:

{F636640118}

And after diff should be about `(27.78 * 10^9) FLOP/s * .652838 seconds =18135839640 FLOP = 18.136 GFLOP`.  Running the script again yields this answer:

{F636655686}

------------------------------------

Reviewed By: gdankel

Differential Revision: D29972997

fbshipit-source-id: 0f8d9f264b7d9f8f6bb3f10ab7c2c9794291e28b
2021-08-16 13:43:32 -07:00
d2e8359971 Fix triage workflow when the card already exists in project (#63347)
Summary:
Fixes issues like https://github.com/pytorch/pytorch/runs/3336787242

```
RequestError [HttpError]: Validation Failed: {"resource":"ProjectCard","code":"unprocessable","field":"data","message":"Project already has the associated issue"}
Error: Unhandled error: HttpError: Validation Failed: {"resource":"ProjectCard","code":"unprocessable","field":"data","message":"Project already has the associated issue"}
    at /home/runner/work/_actions/actions/github-script/v2/dist/index.js:7531:23
    at processTicksAndRejections (internal/process/task_queues.js:93:5)
    at async eval (eval at callAsyncFunction (/home/runner/work/_actions/actions/github-script/v2/dist/index.js:7985:56), <anonymous>:63:1)
    at async main (/home/runner/work/_actions/actions/github-script/v2/dist/index.js:8011:20) {
  name: 'HttpError',
  status: 422,

...
```

The card may already exist, so a `422` status code is expected and is simply ignored; any other error is re-thrown.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63347

Reviewed By: malfet

Differential Revision: D30348529

Pulled By: zhouzhuojie

fbshipit-source-id: 36647837bfccad43ce01eb5dfe6642e685615037
2021-08-16 13:33:58 -07:00
3ce67efea2 [opinfo] nn.functional.pad (#62814)
Summary:
Reference: https://github.com/facebookresearch/functorch/issues/78

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62814

Reviewed By: VitalyFedyunin

Differential Revision: D30307492

Pulled By: zou3519

fbshipit-source-id: 4f6062eb4a3c91ed1795df1f82846afa0abafcdc
2021-08-16 13:29:34 -07:00
1e8de64c66 Add expecttest to requirements.txt (#63320)
Summary:
This PR closes the developer environment gap left by https://github.com/pytorch/pytorch/issues/60658 by adding [expecttest](https://github.com/ezyang/expecttest) to `requirements.txt`. Thus it provides a solution to one of the short-term problems that https://github.com/pytorch/pytorch/issues/60697 tries to solve, but does not provide a long-term solution to https://github.com/pytorch/pytorch/issues/61375.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63320

Reviewed By: malfet

Differential Revision: D30340654

Pulled By: samestep

fbshipit-source-id: 26c8f8c9889cce4a94fafb1bf2f0d6df4c70503f
2021-08-16 13:22:43 -07:00
e75ed4a4b5 add comma to prevent syntax errors (#62492)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62492

Reviewed By: VitalyFedyunin

Differential Revision: D30304684

Pulled By: ezyang

fbshipit-source-id: db08ca39bcecbfd79ea50df18536bf4e87f51e15
2021-08-16 12:27:31 -07:00
0074a099a8 Retry apt-get during setup_ci_workspace (#63319)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63319

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D30346067

Pulled By: bertmaher

fbshipit-source-id: 2aafa97e78f9297553d772b2524d6f1c0ebaa46e
2021-08-16 12:20:51 -07:00
dbcfd7739f Make torch.lu differentiable for wide/tall inputs + jit (#61564)
Summary:
As per title.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61564

Reviewed By: astaff

Differential Revision: D30338136

Pulled By: mruberry

fbshipit-source-id: f01436fc90980544cdfa270feee16bb3dda21b93
2021-08-16 11:40:57 -07:00
979180cd01 [Model Averaging] Allow subgroup to be None in PostLocalSGDState (#63277)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63277

`PostLocalSGDState` requires a subgroup. To initialize this subgroup, a global process group must be initialized. However, this imposes the restriction that a hook state can only be provided after distributed environment initialization, which is incompatible with the Lightning DDP plugin setup, where the hook state should be provided before distributed environment initialization.
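
A sketch of what this enables (the constructor arguments here follow the summary and are assumptions, not the exact API):

```python
from torch.distributed.algorithms.ddp_comm_hooks.post_localSGD_hook import (
    PostLocalSGDState,
)

# Can now be constructed before the distributed environment is initialized,
# since subgroup=None is accepted:
state = PostLocalSGDState(
    process_group=None, subgroup=None, start_localSGD_iter=100
)
```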

Proposal: https://github.com/pytorch/pytorch/issues/59699
ghstack-source-id: 135848575

Test Plan: buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_ddp_hook_parity_post_localSGD

Reviewed By: cbalioglu

Differential Revision: D30325041

fbshipit-source-id: 7b870166d096d306c3f2f7c69816a705cec0bebd
2021-08-16 10:07:41 -07:00
d5d5f42ea9 Revert "[docs] Update docs for NegativeBinomial (#45693)" (#63192)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63192

**Summary**
This reverts commit 402caaeba513929dcfe12df183c764b0ef43f688. As per the
dicussion in #62178, this commit was not needed.

**Test Plan**
Continuous integration.

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D30293202

Pulled By: SplitInfinity

fbshipit-source-id: 91ee7ad0523a9880605d83fe9712c39df67384a8
2021-08-16 09:14:44 -07:00
d1cbee7b2b Refactor BucketBatch (#63185)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63185

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D30288893

Pulled By: ejguan

fbshipit-source-id: b88b792d12a83c99d8ea9e516e3b4c54a82100f6
2021-08-16 06:42:56 -07:00
56d609d93e Replace str by repr for DataChunk (#63184)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63184

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D30288892

Pulled By: ejguan

fbshipit-source-id: 45c88fdd3987e234f2c22ebbbfd8d5044983c34c
2021-08-16 06:41:38 -07:00
e50e8b07d8 [nnc] Updated IRMutator and IRSimplifier to perform in-place mutations. (#63246)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63246

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D30309636

Pulled By: navahgar

fbshipit-source-id: 409ea8d6982888cfee9127e6248044dd2ed9d8d4
2021-08-16 00:09:22 -07:00
a421cba325 [docs][ao] Add overload information for fake_quantize_per_tensor_affine (#63258)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63258

This function supports scalar and tensor qparams

Test Plan:
CI

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D30316432

fbshipit-source-id: 8b2f5582e7e095fdda22c17d178abcbc89a2d1fc
2021-08-15 22:47:05 -07:00
0831b59cf5 [docs][ao] Add missing docstrings for quantized_max_pool1d and quantized_max_pool2d (#63242)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63242

These functions are part of the native functions namespace as well as the quantized namespace

Test Plan:
CI

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D30316430

fbshipit-source-id: cd9c839e5c1a961e3c6944e514c16fbc256a2f0c
2021-08-15 22:47:03 -07:00
a090073fe4 [docs][ao] Add missing documentation for torch.quantized_batch_norm (#63240)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63240

Op is exposed via torch.quantized_batch_norm to the end user without any existing documentation

Test Plan:
CI

Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D30316431

fbshipit-source-id: bf2dc8b7b6f497cf73528eaa2bedef9f65029d84
2021-08-15 22:45:56 -07:00
50fc8e8250 [OpInfo] Add expected_failure kwarg to SkipInfo (#62963)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62963

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D30327199

Pulled By: heitorschueroff

fbshipit-source-id: 45231eca11d1697a4449d79849fb17264d128a6b
2021-08-15 18:09:20 -07:00
8987726cc6 Small refactor for OpInfo decorators (#62713)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62713

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D30327200

Pulled By: heitorschueroff

fbshipit-source-id: 1899293990c8c0a66da88646714b38f1aae9179d
2021-08-15 18:08:12 -07:00
3ca3349555 [Pytorch Edge] Fix broken test post changes in error reporting format. (#63287)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63287

Recent changes in https://github.com/pytorch/pytorch/pull/62419 changed
the way module hierarchy is reported. Now it includes information about
function names as well.

Test Plan:
python test/mobile/test_lite_script_module.py
TestLiteScriptModule.test_save_mobile_module_with_debug_info_with_trace

Imported from OSS

Reviewed By: iseeyuan

Differential Revision: D30328512

fbshipit-source-id: ddd6b11b9ab01cc725f4568a35eff7a92f17204b
2021-08-15 16:14:11 -07:00
cec08e7032 To add warm-up scheduler to optim (#60836)
Summary:
Warm-up of learning rate scheduling was initially discussed by Priya et al. in the paper: https://arxiv.org/pdf/1706.02677.pdf .

In section 2.2 of the paper, they discuss and propose the idea of warming up learning rate schedules in order to prevent large variance / noise in the learning rate. The idea has been further discussed in the following papers:
  * Akilesh Gotmare et al. https://arxiv.org/abs/1810.13243
  * Bernstein et al http://proceedings.mlr.press/v80/bernstein18a/bernstein18a.pdf
  * Liyuan Liu et al: https://arxiv.org/pdf/1908.03265.pdf

There are two popularly used types of learning rate warm-up:
  * Constant warmup (start with a very small constant learning rate)
  * Linear warmup (start with a small learning rate and gradually increase it)

In this PR we add warm-up as a learning rate scheduler. Note that learning rate schedulers are chainable, which means that we can combine the warm-up scheduler with any other learning rate scheduler to build a more sophisticated schedule.

## Linear Warmup

Linear warmup multiplies the learning rate by a pre-defined constant, warmup_factor, in the first epoch (epoch 0), and then increases this multiplier linearly until it reaches one after warmup_iters epochs. Hence the multiplier at the i-th step is:

                    warmup_factor + (1 - warmup_factor) * i / warmup_iters

Moreover, the ratio of this quantity at step i to step i-1 gives

           1 + (1 - warmup_factor) / [warmup_iters * warmup_factor + (i-1) * (1 - warmup_factor)]

which is used in the get_lr() method of our implementation. Below we provide an example of how to use the linear warmup scheduler and show how it works.

```python
import torch
from torch.nn import Parameter
from torch.optim import SGD
from torch.optim.lr_scheduler import WarmUpLR

model = [Parameter(torch.randn(2, 2, requires_grad=True))]
optimizer = SGD(model, 0.1)
scheduler = WarmUpLR(optimizer, warmup_factor=0.1, warmup_iters=10, warmup_method="linear")

for epoch in range(15):

    print(epoch, scheduler.get_last_lr()[0])

    optimizer.step()
    scheduler.step()
```

```
0 0.010000000000000002
1 0.019000000000000003
2 0.028000000000000008
3 0.03700000000000001
4 0.04600000000000001
5 0.055000000000000014
6 0.06400000000000002
7 0.07300000000000002
8 0.08200000000000003
9 0.09100000000000004
10 0.10000000000000005
11 0.10000000000000005
12 0.10000000000000005
13 0.10000000000000005
14 0.10000000000000005
```

## Constant Warmup

Constant warmup has a straightforward idea: multiply the learning rate by warmup_factor until we reach epoch warmup_iters, then do nothing for the following epochs.

```python
import torch
from torch.nn import Parameter
from torch.optim import SGD
from torch.optim.lr_scheduler import WarmUpLR

model = [Parameter(torch.randn(2, 2, requires_grad=True))]
optimizer = SGD(model, 0.1)
scheduler = WarmUpLR(optimizer, warmup_factor=0.1, warmup_iters=5, warmup_method="constant")

for epoch in range(10):

    print(epoch, scheduler.get_last_lr()[0])

    optimizer.step()
    scheduler.step()
```

```
0 0.010000000000000002
1 0.010000000000000002
2 0.010000000000000002
3 0.010000000000000002
4 0.010000000000000002
5 0.10000000000000002
6 0.10000000000000002
7 0.10000000000000002
8 0.10000000000000002
9 0.10000000000000002
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60836

Reviewed By: saketh-are

Differential Revision: D29537615

Pulled By: iramazanli

fbshipit-source-id: d910946027acc52663b301f9c56ade686e62cb69
2021-08-15 12:31:45 -07:00
8e0998ca70 Move fx2trt and oss_acc_tracer to oss (#63101)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63101

Move internal fx2trt to torch/fx/experimental/fx2trt and merge the two TRT interpreters we have right now. cc: mortzur, as this might affect the uru exporting script.

Move oss_acc_tracer to torch/fx/experimental/fx_acc.

Test Plan: CI

Reviewed By: jerryzh168

Differential Revision: D30257909

fbshipit-source-id: 4e374965fbf88d72e91844d9e9b6ff9b98f467d1
2021-08-15 11:53:36 -07:00
0ce4d30c44 Hide all symbols in llvm namespace (#63272)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63272

Test Plan: Imported from OSS

Reviewed By: nikithamalgifb

Differential Revision: D30331695

Pulled By: bertmaher

fbshipit-source-id: d35130c96f7e2a31fa86d9d80de59002e96301df
2021-08-15 11:29:43 -07:00
045c4cb82f Add copy button to code snippets in docs (#63149)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63149

Test Plan: Imported from OSS

Reviewed By: navahgar, albanD

Differential Revision: D30308891

Pulled By: anjali411

fbshipit-source-id: ad51180ab2f27c4525682b2603bbf753bb8f1ce9
2021-08-15 06:25:32 -07:00
38c185189c [Pytorch Edge] Enable kineto profiler on mobile via EdgeKinetoProfiler (#62419)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62419

This diff adds support for the CPU-only Kineto profiler on mobile, thus
enabling Chrome trace generation on mobile. This brings the C++ API for
mobile profiling on par with TorchScript.
This is done via:
1. Utilizing debug handle annotations in KinetoEvent.
2. Adding post-processing capability, via callbacks, to
KinetoThreadLocalState.
3. Creating a new RAII-style profiler, KinetoEdgeCPUProfiler, which can be
used in the surrounding scope of model execution. This will write the Chrome
trace to the location specified in the profiler constructor.

Test Plan:
MobileProfiler.ModuleHierarchy

Imported from OSS

Reviewed By: raziel

Differential Revision: D29993660

fbshipit-source-id: 0b44f52f9e9c5f5aff81ebbd9273c254c3c03299
2021-08-13 21:40:19 -07:00
77a6436cac [Pytorch Mobile] Combining instructions and debug handles in a single struct (#62418)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62418

Debug handles have a one-to-one correspondence with instructions, so just
combine them into one struct.

Test Plan:
CI

Imported from OSS

Reviewed By: raziel

Differential Revision: D29993661

fbshipit-source-id: 125c7163174cf66624dd95f110fdc8208fea8a07
2021-08-13 21:40:17 -07:00
1b04d99f55 [Pytorch Profiler] Introduce scopes to enableProfiler (#62417)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62417

This diff adds an option to make enableProfiler enable callbacks only
for certain RecordScopes.
Why?
Profiling has some overhead when we repeatedly execute callbacks for
all scopes. On the mobile side, where we often have small quantized models,
this overhead can be large. We observed that by profiling only the top-level
op and skipping profiling of the other ATen ops called within it, we can limit
this overhead. For example, we profile at::conv2d but skip the nested
at::convolution -> at::convolution_ calls, as well as any further ops like
transpose that they invoke. Of course this limits visibility, but at
least this way we get a choice.

Test Plan: Imported from OSS

Reviewed By: ilia-cher

Differential Revision: D29993659

fbshipit-source-id: 852d3ae7822f0d94dc6e507bd4019b60d488ef69
2021-08-13 21:40:15 -07:00
b00afe135d [Pytorch Profiler] Add debug_handles to KinetoEvent (#62228)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62228

This diff adds debug handles to events and provides a way to use
RECORD_FUNCTIONs that pass debug_handles down to the profiler, which
records them in the events.

Why add debug_handles?
For PyTorch Mobile, with the lite interpreter, we generate debug handles
that can be used to lazily symbolicate exception traces into model-level
stack traces, similar to the model-level stack traces you get with
TorchScript models. The debug_handles also enable getting the module
hierarchy for a lite interpreter model, support for which was added to
the Kineto profiler in previous diffs.

Follow-up plan:
1. Enable scope callbacks so that the lite interpreter can use them to
profile only top-level ops.
2. Enable post-processing callbacks that take KinetoEvents and populate
the module hierarchy using debug handles.

This will let us use the Kineto profiler for lite interpreter use cases on
mobile. The aim is to use an RAII guard to similarly generate Chrome traces
for mobile use cases as well, although only for top-level ops.

Test Plan:
test_misc : RecordDebugHandles.Basic

Imported from OSS

Reviewed By: ilia-cher

Differential Revision: D29935899

fbshipit-source-id: 4f06dc411b6b5fe0ffaebdd26d3274c96f8f389b
2021-08-13 21:40:14 -07:00
44b12ba862 [Pytorch Profiler] Move start timestamp to end of start callback (#62191)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62191

This moves the start timestamping to the end of the callback. This way we don't
account for callstack/module-hierarchy-related overhead in op runtime.

Test Plan:
CI

Imported from OSS

Reviewed By: ilia-cher

Differential Revision: D29910519

fbshipit-source-id: f462031a81ae12b3db7993cf482e5ad93a35e096
2021-08-13 21:40:12 -07:00
54f2eb6e7e [Pytorch Profiler] Add support for adding module hierarchy to KinetoEvent (#61792)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61792

This PR adds module hierarchy information to events.
What is the module hierarchy information attached to events?
While profiling a TorchScript module, when events are added, we ask the JIT
for the module hierarchy associated with the node being
executed. At the time that node executes, there might be multiple
frames in the interpreter's stack. For each frame, we find the
corresponding node and query its module hierarchy.
The module hierarchy corresponding to a node is associated with the node's
InlinedCallStack, which tracks the path via which the
node was inlined. Thus, during the inlining process we annotate the
module information corresponding to the CallMethod nodes being inlined.

With this PR, chrome trace will contain additional metadata:
"Module Hierarchy". This can look like this:
TOP(ResNet)::forward.SELF(ResNet)::_forward_impl.layer1(Sequential)::forward.0(BasicBlock)::forward.conv1(Conv2d)::forward.SELF(Conv2d)::_conv_forward
It contains the module instance, type name, and method name at each level
of the call stack.

Test Plan:
test_profiler

Imported from OSS

Reviewed By: raziel, ilia-cher

Differential Revision: D29745442

fbshipit-source-id: dc8dfaf7c5b8ab256ff0b2ef1e5ec265ca366528
2021-08-13 21:39:10 -07:00
385b082854 add substract of max and testcase (#63132)
Summary:
As discussed in https://github.com/pytorch/pytorch/pull/62897, the BF16/non-last-dim Softmax path was missing the subtraction of the max value, which causes an overflow in the `exp()` calculation when the input tensor's values are large, such as `1000.0`.
To avoid this issue, we add the max-value subtraction and the corresponding test cases in this PR.
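
A minimal sketch of the max-subtraction trick in eager-mode Python (the PR itself changes the C++ CPU kernel, so this is illustrative only):
```python
import torch

def stable_softmax(x, dim):
    # softmax is shift-invariant: subtracting the per-slice max keeps exp()
    # from overflowing without changing the result
    x_max = x.max(dim=dim, keepdim=True).values
    e = (x - x_max).exp()
    return e / e.sum(dim=dim, keepdim=True)

x = torch.full((2, 4), 1000.0)
print(stable_softmax(x, dim=0))  # finite, uniform probabilities
print(torch.softmax(x, dim=0))   # matches
```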

Note: without the max-value subtraction (e.g. due to accidental reverts or changes), we get the following error message from the test case:
```
AssertionError: False is not true : Tensors failed to compare as equal!With rtol=0.05 and atol=0.05, found 103984 element(s) (out of 126720) whose difference(s) exceeded the margin of error (including 103984 nan comparisons). The greatest difference was nan (0.0 vs. nan), which occurred at index (0, 0, 0, 1).
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63132

Reviewed By: VitalyFedyunin

Differential Revision: D30280792

Pulled By: cpuhrsch

fbshipit-source-id: 722821debf983bbb4fec878975fa8a4da0d1d866
2021-08-13 20:50:49 -07:00
baedb559e3 OpInfo: nn.functional.conv_transpose2d (#62882)
Summary:
See https://github.com/facebookresearch/functorch/issues/78 and https://github.com/pytorch/pytorch/issues/54261.

cc: mruberry zou3519 Chillee

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62882

Reviewed By: bdhirsh

Differential Revision: D30280804

Pulled By: zou3519

fbshipit-source-id: e40cdf43e98c1f11e45df6b8bc13110b4d29c45f
2021-08-13 17:11:23 -07:00
f8e217a17e refactor fx2trt example script so it can be imported as a library (#63262)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63262

Just create a `__main__` guard.

Test Plan: run linter, sandcastle tests

Reviewed By: 842974287

Differential Revision: D30263617

fbshipit-source-id: 8044ce5d815b043c3778591384cb13d9a89d0048
2021-08-13 16:59:29 -07:00
3f43a8b9a3 [iOS] Add LibTorch-Lite-Nightly pod (#63239)
Summary:
D30090760 (e182b459d9) was reverted by D30303292 because of a lint issue in `LibTorch-Lite-Nightly.podspec.template`. Resubmit the diff after fixing the issue.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63239

Test Plan: Imported from OSS

Reviewed By: xta0

Differential Revision: D30315690

Pulled By: hanton

fbshipit-source-id: f0fa719ffc3b8181ab28c123584ae5c1da8992c0
2021-08-13 16:21:41 -07:00
809e1e7457 Allow TransformerEncoder and TransformerDecoder to accept 0-dim batch sized tensors. (#62800)
Summary:
This PR fixes a part of https://github.com/pytorch/pytorch/issues/12013, which is summarized concretely in https://github.com/pytorch/pytorch/issues/38115.

It allows TransformerEncoder and TransformerDecoder (along with the inner `Layer` classes) to accept inputs with 0-dimensional batch sizes.
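
A quick sketch of the newly supported case (module sizes are arbitrary, for illustration):
```python
import torch
import torch.nn as nn

layer = nn.TransformerEncoderLayer(d_model=8, nhead=2)
encoder = nn.TransformerEncoder(layer, num_layers=2)

src = torch.randn(5, 0, 8)  # (seq_len, batch=0, d_model) -- previously errored
out = encoder(src)
print(out.shape)  # torch.Size([5, 0, 8])
```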

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62800

Reviewed By: VitalyFedyunin

Differential Revision: D30303240

Pulled By: jbschlosser

fbshipit-source-id: 8f8082a6f2a9f9d7ce0b22a942d286d5db62bd12
2021-08-13 16:11:57 -07:00
ab7a472980 [ROCm] Update HIP_VERSION to TORCH_HIP_VERSION (#62786)
Summary:
- HIP_VERSION semantic versioning will change in ROCm 4.3. These changes essentially remove the dependency on the HIP_VERSION provided in the HIP header, to keep the code compatible with both older and newer versions of ROCm.
- TORCH_HIP_VERSION is derived from HIP_VERSION_MAJOR and HIP_VERSION_MINOR

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62786

Reviewed By: bdhirsh

Differential Revision: D30281682

Pulled By: seemethere

fbshipit-source-id: e41e69fb9e13de5ddd1af99ba5bbdcbb7b64b673
2021-08-13 15:00:43 -07:00
e711b5ce6c Respect user-set CMAKE_PREFIX_PATH (#61904)
Summary:
Fixes the case where the `CMAKE_PREFIX_PATH` variable gets silently overwritten by a user specified environment variable.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61904

Reviewed By: walterddr, malfet

Differential Revision: D29792014

Pulled By: cbalioglu

fbshipit-source-id: babacc8d5a1490bff1e14247850cc00c6ba9e6be
2021-08-13 13:49:05 -07:00
90a96e0642 Remove left-over print in test_diff_graph_inline_threshold (#63231)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63231

Reviewed By: VitalyFedyunin

Differential Revision: D30305851

Pulled By: gmagogsfm

fbshipit-source-id: 43da3b5f49ad4a6a2d6d174acf792f3ccf41a463
2021-08-13 13:11:27 -07:00
cc6b023cba Add CostInferenceFunction for SplitOp (#63133)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63133

SplitOp is costly but was missing a cost inference function, which hurts cost-based balancing. The changes are:
(1) Addition of CostInferenceFunction for SplitOp
(2) Small fix in CostInferenceFunction for ConcatOp

Test Plan:
Added unit tests:

buck test //caffe2/caffe2/python/operator_test:split_op_cost_test

buck test //caffe2/caffe2/python/operator_test:concat_op_cost_test

Reviewed By: smacke

Differential Revision: D30247360

fbshipit-source-id: 989e962f3a981acc85b73aac3fb23e603b7d1591
2021-08-13 12:28:15 -07:00
acdad8bc63 [docs] Merge note block in torch.lu documentation (#63156)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63156

**Summary**
This commit merges the four successive `Note` blocks that appear in the
documentation for `torch.lu`. Each one only has one line in it, so all
of them have been merged into one block with a bulleted list that
contains the original items.

**Test Plan**
Continuous integration.

*Before*
<img width="888" alt="Captura de Pantalla 2021-08-12 a la(s) 10 48 39 a  m" src="https://user-images.githubusercontent.com/4392003/129244443-b7d1594e-8833-4c20-a911-e1bf7ca88a8d.png">

*After*
<img width="932" alt="Captura de Pantalla 2021-08-12 a la(s) 10 48 46 a  m" src="https://user-images.githubusercontent.com/4392003/129244462-1f39dcdb-90e0-4fd9-a95f-343b0b6be1f1.png">

**Fixes**
This commit fixes #62339.

Test Plan: Imported from OSS

Reviewed By: navahgar, pbelevich

Differential Revision: D30292633

Pulled By: SplitInfinity

fbshipit-source-id: cb9071165629bfe7316b1d2fe952e4354c75d48f
2021-08-13 12:11:35 -07:00
e5c32cdde7 [docs] Remove input parameter from Tensor.flatten docs (#63180)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63180

**Summary**
This commit removes the `input` parameter from the signature for
`Tensor.flatten` shown in its documentation. This parameter is accepted
by `torch.flatten` but not `Tensor.flatten` (since the input is the
`Tensor` on which `flatten` is invoked).

**Test Plan**
Continuous integration.

**Fixes**
This commit fixes #57478.

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D30293156

Pulled By: SplitInfinity

fbshipit-source-id: 4ad70d638af009fb6bdeb703433b306904d39a76
2021-08-13 12:10:16 -07:00
548fe682e2 [docs] Add cross references to torch.transpose and torch.t (#63177)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63177

**Summary**
This commit adds a link in the documentation for `torch.transpose` that
directs to `torch.t` and vice versa. These two functions are related and
it is useful for users of one to know about the other.

**Test Plan**
Continuous integration.

**Fixes**
This commit fixes #56267.

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D30292654

Pulled By: SplitInfinity

fbshipit-source-id: 8e60cd7a598ff8b4756cb30141399dfe8e118338
2021-08-13 11:51:55 -07:00
7107c367b5 [docs] Mention vsplit, hsplit and tensor_split in Tensor views doc (#63191)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63191

**Summary**
This commit adds `vsplit`, `hsplit` and `tensor_split` to the list of
view ops on the Tensor Views documentation page.

**Test Plan**
Continuous integration.

*Before*
<img width="195" alt="Captura de Pantalla 2021-08-12 a la(s) 2 55 07 p  m" src="https://user-images.githubusercontent.com/4392003/129275921-c1cfdf6c-9f1f-45f3-98b6-1de7a0f0cc84.png">

*After*
<img width="197" alt="Captura de Pantalla 2021-08-12 a la(s) 2 55 15 p  m" src="https://user-images.githubusercontent.com/4392003/129275936-de4afde7-0143-4e1d-b38f-c86256f4896c.png">

**Fixes**
This commit fixes #62727.

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D30293181

Pulled By: SplitInfinity

fbshipit-source-id: 283783a4ccc3ebc50cb0a427e55c7a6cb618ffd7
2021-08-13 11:44:38 -07:00
38a825c648 Allow Average Pooling modules to accept tensors with 0-dim batch sizes. (#62025)
Summary:
This PR fixes a part of https://github.com/pytorch/pytorch/issues/12013, which is summarized concretely in https://github.com/pytorch/pytorch/issues/38115.

It introduces changes and tests that allow the Average Pooling layers to accept tensors with 0-sized batch dimensions and return meaningful results.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62025

Reviewed By: VitalyFedyunin

Differential Revision: D30303256

Pulled By: jbschlosser

fbshipit-source-id: 5f727e62a7c58d2b8bb49fcc3bd7688474917ba5
2021-08-13 11:31:17 -07:00
de7ae9e9b6 [skip ci] fix workflow code generation (#63235)
Summary:
Fixes a failing clean-git check for workflow code generation introduced by https://github.com/pytorch/pytorch/pull/63148.

`generated-win-vs2019-cuda10-py3.yml` was renamed to `generated-win-vs2019-cuda10.1-py3.yml`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63235

Reviewed By: VitalyFedyunin

Differential Revision: D30306474

Pulled By: zhouzhuojie

fbshipit-source-id: cbae1ace064e360e8ca0c0e997116bdb20d54d46
2021-08-13 10:38:30 -07:00
000e3a0881 [Static Runtime] Add pass to eliminate __getitem__/DictConstruct calls (#62429)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62429

Introduce a new pass to eliminate calls to `prim::DictConstruct/aten::__getitem__`. Given a graph like this:
```
%2 : Dict = prim::DictConstruct(%key, %value)
%3 : Tensor = aten::__getitem__(%2, %key)
%4 : Tensor = op(%3)
```
This pass produces a graph like this (after dead code elimination):
```
%4 : Tensor = op(%value)
```

This optimization is applied in the static runtime.

Test Plan:
`buck test //caffe2/test:jit -- TestPeephole`

**local.forward performance summary**
About 3% runtime benefit. All `DictConstruct` calls optimized out, `__getitem__` calls reduced significantly (~50% of them are cut out)
P438354810

**local_request_only.forward performance summary**
About 14% runtime benefit. Again, all `DictConstruct` calls optimized out, 50% `__getitem__` calls removed.
P438359742

There is some variance in the runtime measurements, so take these numbers with a grain of salt. Also note that the benefit does not exist in the shrunk model, since there are no `DictConstruct` calls there.

Reviewed By: hlu1

Differential Revision: D29995087

fbshipit-source-id: f376376a46ff808115afd2d60446e5db8f6f752f
2021-08-13 10:21:16 -07:00
fcc1f87b6a Fixing user inputs for low, high in make_tensor (#61108)
Summary:
**TODOs:**

* [x] Do not clamp inputs for low and high when given and valid.
* [x] Devise rules for modifying `low` and `high` when extremals/invalid values passed.
* [x] Testing with `test_references_numerics_hard` with the revised changes. _(I've tested locally, the changes will take place in a separate PR though after offline discussion with mruberry)_
* [x] Revise comments/documentation for `make_tensor`

See https://github.com/pytorch/pytorch/issues/61758 for tracker issue.

cc: mruberry pmeier

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61108

Reviewed By: VitalyFedyunin

Differential Revision: D30296167

Pulled By: mruberry

fbshipit-source-id: 67e8d15b173209a9c97ca013231494a5fa99f8c7
2021-08-13 10:13:12 -07:00
720a7a0d81 [hackathon] fix benchmarking script in CONTRIBUTING (#63199)
Summary:
[skip ci]
Per title

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63199

Reviewed By: mruberry

Differential Revision: D30305487

Pulled By: ngimel

fbshipit-source-id: 2704c4f08ab976a55c9f8c2fe54cd4f3f39412cf
2021-08-13 09:50:48 -07:00
bd9fad25c2 [codemod][lint][caffe2] Extend BLACK coverage
Test Plan: Sandcastle

Reviewed By: zsol

Differential Revision: D30302716

fbshipit-source-id: f9724d4f4d1b8950f581cc2c6c77eedf19b4b6fc
2021-08-13 09:28:10 -07:00
c5f3ab6982 ENH Adds no_batch_dim to FractionalMaxPool2d (#62490)
Summary:
Towards https://github.com/pytorch/pytorch/issues/60585

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62490

Reviewed By: bdhirsh

Differential Revision: D30287143

Pulled By: jbschlosser

fbshipit-source-id: 1b9dd932157f571adf3aa2c98c3c6b56ece8fa6e
2021-08-13 08:48:40 -07:00
61b49c8e41 [JIT] Add a flag to rethrow caught exception in jit interpreter (#63073)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63073

It turned out to be less than ideal to print a verbose stacktrace in exception messages in high-QPS services (see the related task) with a non-negligible failure rate: long stacktraces get truncated, which loses the original exception message thrown from native code. In such a use case it is actually desirable to retain only the message of the original exception thrown directly from native code.

This change adds a new flag `torch_jit_disable_exception_stacktrace` to the pytorch jit interpreter to suppress stacktrace in the messages of exception thrown from the interpreter.

Reviewed By: Krovatkin

Differential Revision: D30241792

fbshipit-source-id: c340225c69286663cbd857bd31ba6f1736b1ac4c
2021-08-13 08:44:24 -07:00
32b6104f37 Port norm kernel to structured kernels. (#62711)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62711

Tracking issue: #55070

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D30109866

Pulled By: ezyang

fbshipit-source-id: 894c9496894d059c7690a174b75bbd4db7ed6016
2021-08-13 08:27:48 -07:00
07bb6e4fd0 Port prod kernel to structured kernels. (#62024)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62024

Tracking issue: #55070

In this PR, I also broke down the meta functions of other reduction kernels (e.g. `all`,
`argmax`, `sum`) into the composition of common patterns.

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D29847122

Pulled By: ezyang

fbshipit-source-id: a6680a6cf6e59bb46b8ffe7bf2a3a611d6e0fd14
2021-08-13 08:27:46 -07:00
1280363bad Port mean kernel to structured kernels. (#61643)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61643

Tracking issue: #55070

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D29783866

Pulled By: ezyang

fbshipit-source-id: dc95baf593096c03fb5f292ee6c36de3cc7f2b35
2021-08-13 08:26:01 -07:00
2d75703c6a Remove req to call step() in training loop (#63164)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63164

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D30284616

Pulled By: andwgu

fbshipit-source-id: afdb677fb08851b139178a9f6d782196f26773e1
2021-08-13 08:22:44 -07:00
28f9e108b1 Pass _allow_empty_param_list into func opt ctor (#63163)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63163

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D30284615

Pulled By: andwgu

fbshipit-source-id: 4857f5b618ec5b007648737ab532ce605e5d70dc
2021-08-13 08:22:42 -07:00
bd81c9178a Simplify data structures, add uniform approximation, fix mem leak (#63162)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63162

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D30284617

Pulled By: andwgu

fbshipit-source-id: 9bd9e5f89abcc0d3dac56b85d55cc88e843baa9f
2021-08-13 08:20:59 -07:00
75f198d48d [docs][ao] update quantize_per_tensor to mention overloads (#63165)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63165

Add details about the overloads for
* list of tensors input
* supporting tensor scale/zero-point inputs

Test Plan:
CI

Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D30291045

fbshipit-source-id: 9fc6418792c5e3a35417eeb8d31de4a4bfcbb7a5
2021-08-13 08:00:10 -07:00
5abeac3ef7 Make saved tensors default hooks thread local (#62909)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62909

This PR makes saved tensors default hooks thread local.
This allows using default hooks in a multithreaded context.
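
A rough sketch of the use case this unlocks, assuming the `torch.autograd.graph.saved_tensors_hooks` context manager (which installs the default hooks) is available:
```python
import threading
import torch

def pack(t):
    return t.clone()  # stand-in for a real pack hook (e.g. offload to CPU)

def unpack(t):
    return t

def worker():
    # with thread-local defaults, this registration no longer leaks into
    # autograd graphs built concurrently by other threads
    with torch.autograd.graph.saved_tensors_hooks(pack, unpack):
        x = torch.randn(4, requires_grad=True)
        (x * x).sum().backward()

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```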

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D30165416

Pulled By: Varal7

fbshipit-source-id: 10a7d580661d3d94bdaf398c4e076b7bea11c16b
2021-08-13 07:49:20 -07:00
cb23976f9f Allow 0-dim batch sizes for AdaptiveMaxPool and MaxPool. (#62088)
Summary:
This PR fixes a part of https://github.com/pytorch/pytorch/issues/12013, which is summarized concretely in https://github.com/pytorch/pytorch/issues/38115.

It allows `MaxPool` and `AdaptiveMaxPool` to accept tensors whose batch size is 0. Some changes have been made to modernize the tests so that they show the name of the C++ function that throws an error.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62088

Reviewed By: bdhirsh

Differential Revision: D30281285

Pulled By: jbschlosser

fbshipit-source-id: 52bffc67bfe45a78e11e4706b62cce1469eba1b9
2021-08-13 07:33:17 -07:00
72bc6dc8c3 DOC Improve documentation for LayerNorm (#63144)
Summary:
As noted in this [commit](7026995f3c) and [issue](https://github.com/pytorch/pytorch/pull/59178#issuecomment-897485295), [Line 134](47e286d024/torch/nn/modules/normalization.py (L134)) overwrites the "embedding" variable, which causes an error when instantiating `nn.LayerNorm`.

I suggest renaming the "embedding" in [Line 133](47e286d024/torch/nn/modules/normalization.py (L133)) to "embedding_dim".

The final example is:
```
batch, sentence_length, embedding_dim = 20, 5, 10
embedding = torch.randn(batch, sentence_length, embedding_dim)
layer_norm = nn.LayerNorm(embedding_dim)
```

Fixes #59178.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63144

Reviewed By: bdhirsh

Differential Revision: D30288778

Pulled By: jbschlosser

fbshipit-source-id: e74b11430e302dae5661bf6e830ee5ac6c1838c4
2021-08-13 07:04:40 -07:00
aa665e1ab8 Revert D30090760: [iOS] Add podspec for libTorch-lite nightly build
Test Plan: revert-hammer

Differential Revision:
D30090760 (e182b459d9)

Original commit changeset: 361aa2ed24a1

fbshipit-source-id: 9c0dfee80a80eb012b142d3928204d6eb8025b0a
2021-08-13 06:45:43 -07:00
dcb5eb8d9b OpInfo for torch.nn.functional.normalize (#62635)
Summary:
See https://github.com/facebookresearch/functorch/issues/78 and https://github.com/pytorch/pytorch/issues/54261

cc: mruberry zou3519 Chillee

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62635

Reviewed By: H-Huang

Differential Revision: D30136503

Pulled By: zou3519

fbshipit-source-id: 258c069f30d9c2a51ed27dadf94f3703b9432a4a
2021-08-13 06:36:50 -07:00
741accb11e Implements backward for torch.lu_solve (#61681)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/22620
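
A small sketch of what this enables (double precision chosen so gradient checks are meaningful):
```python
import torch

A = torch.randn(3, 3, dtype=torch.float64)
b = torch.randn(3, 1, dtype=torch.float64, requires_grad=True)

LU, pivots = torch.lu(A)
x = torch.lu_solve(b, LU, pivots)  # solves A @ x = b from the LU factorization
x.sum().backward()                 # differentiating through lu_solve now works
print(b.grad.shape)                # torch.Size([3, 1])
```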

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61681

Reviewed By: ngimel

Differential Revision: D30063116

Pulled By: mruberry

fbshipit-source-id: e095b0cadfb7c8b37a7ef91bae5b5dc170d8ef1c
2021-08-12 21:17:11 -07:00
126ff6222e Moving getattr_from_fqn to torch.quantization.utils (#63107)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63107

Moving this function because the functionality would be useful outside of NS (Numeric Suite).
ghstack-source-id: 135727260

Test Plan: buck test //caffe2/test:quantization_fx mode/dev-nosan --keep-going --config client.id=nuclide --show-full-output -- suite

Reviewed By: supriyar

Differential Revision: D30260735

fbshipit-source-id: 58deabdd0f3b03b0ee7ee92be0548a0945084d65
2021-08-12 20:59:01 -07:00
07b00fc324 ENH Migrate nll_loss2d from THC to ATen (#62826)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/24608
Fixes https://github.com/pytorch/pytorch/issues/24607

With the following benchmark, the backward pass runs a little slower. This is strange since the implementation should be exactly the same.

<details>
 <summary>Benchmark script</summary>

```python
from itertools import product

import torch
import torch.nn as nn
import torch.nn.functional as F
import time

torch.manual_seed(0)
MS_PER_SECOND = 1000

def _time():
    torch.cuda.synchronize()
    return time.perf_counter() * MS_PER_SECOND

device = "cuda"
C = 3
n_runs = 30
reductions = ["none", "sum", "mean"]
Ns = [128, 256, 512]
Hs = [128, 256, 512]

for reduction, N, H in product(reductions, Ns, Hs):
    total_fwd_time = 0
    total_back_time = 0
    if reduction == "none":
        grad_out = torch.randn(N, H, H, device=device)
    else:
        grad_out = torch.randn(1)[0]

    for _ in range(n_runs):
        input = torch.randn(N, C, H, H, device=device, requires_grad=True)
        target = torch.rand(N, H, H, device=device).mul(3).floor().long()

        # forward
        start = _time()
        result = F.nll_loss(input, target, reduction=reduction)
        total_fwd_time += _time() - start

    result = F.nll_loss(input, target, reduction=reduction)
    for _ in range(n_runs):
        # backward
        start = _time()
        result.backward(grad_out, retain_graph=True)
        total_back_time += _time() - start

    fwd_avg = total_fwd_time / n_runs
    bwd_avg = total_back_time / n_runs
    print(
        f"input size({N}, {C}, {H}, {H}), reduction: {reduction}, fwd: {fwd_avg:.2f} (ms), back: {bwd_avg:.2f} (ms)"
    )

```

</details>

<details>
 <summary>master results</summary>

```
input size(128, 3, 128, 128), reduction: none, fwd: 0.34 (ms), back: 0.57 (ms)
input size(128, 3, 256, 256), reduction: none, fwd: 2.56 (ms), back: 3.85 (ms)
input size(128, 3, 512, 512), reduction: none, fwd: 14.54 (ms), back: 16.62 (ms)
input size(256, 3, 128, 128), reduction: none, fwd: 1.26 (ms), back: 1.78 (ms)
input size(256, 3, 256, 256), reduction: none, fwd: 7.07 (ms), back: 8.22 (ms)
input size(256, 3, 512, 512), reduction: none, fwd: 29.38 (ms), back: 33.29 (ms)
input size(512, 3, 128, 128), reduction: none, fwd: 3.41 (ms), back: 4.05 (ms)
input size(512, 3, 256, 256), reduction: none, fwd: 14.32 (ms), back: 16.46 (ms)
input size(512, 3, 512, 512), reduction: none, fwd: 59.20 (ms), back: 66.68 (ms)
input size(128, 3, 128, 128), reduction: sum, fwd: 0.08 (ms), back: 0.21 (ms)
input size(128, 3, 256, 256), reduction: sum, fwd: 0.21 (ms), back: 0.73 (ms)
input size(128, 3, 512, 512), reduction: sum, fwd: 0.82 (ms), back: 2.86 (ms)
input size(256, 3, 128, 128), reduction: sum, fwd: 0.12 (ms), back: 0.39 (ms)
input size(256, 3, 256, 256), reduction: sum, fwd: 0.42 (ms), back: 1.45 (ms)
input size(256, 3, 512, 512), reduction: sum, fwd: 1.53 (ms), back: 5.66 (ms)
input size(512, 3, 128, 128), reduction: sum, fwd: 0.21 (ms), back: 0.74 (ms)
input size(512, 3, 256, 256), reduction: sum, fwd: 0.78 (ms), back: 2.86 (ms)
input size(512, 3, 512, 512), reduction: sum, fwd: 2.98 (ms), back: 11.23 (ms)
input size(128, 3, 128, 128), reduction: mean, fwd: 0.07 (ms), back: 0.21 (ms)
input size(128, 3, 256, 256), reduction: mean, fwd: 0.21 (ms), back: 0.73 (ms)
input size(128, 3, 512, 512), reduction: mean, fwd: 0.82 (ms), back: 2.86 (ms)
input size(256, 3, 128, 128), reduction: mean, fwd: 0.13 (ms), back: 0.39 (ms)
input size(256, 3, 256, 256), reduction: mean, fwd: 0.42 (ms), back: 1.45 (ms)
input size(256, 3, 512, 512), reduction: mean, fwd: 1.54 (ms), back: 5.65 (ms)
input size(512, 3, 128, 128), reduction: mean, fwd: 0.22 (ms), back: 0.74 (ms)
input size(512, 3, 256, 256), reduction: mean, fwd: 0.78 (ms), back: 2.87 (ms)
input size(512, 3, 512, 512), reduction: mean, fwd: 2.98 (ms), back: 11.23 (ms)
```

</details>

<details>
 <summary>PR results</summary>

```
input size(128, 3, 128, 128), reduction: none, fwd: 0.33 (ms), back: 0.59 (ms)
input size(128, 3, 256, 256), reduction: none, fwd: 2.51 (ms), back: 3.92 (ms)
input size(128, 3, 512, 512), reduction: none, fwd: 14.52 (ms), back: 17.05 (ms)
input size(256, 3, 128, 128), reduction: none, fwd: 1.23 (ms), back: 1.85 (ms)
input size(256, 3, 256, 256), reduction: none, fwd: 7.07 (ms), back: 8.45 (ms)
input size(256, 3, 512, 512), reduction: none, fwd: 29.39 (ms), back: 34.21 (ms)
input size(512, 3, 128, 128), reduction: none, fwd: 3.40 (ms), back: 4.18 (ms)
input size(512, 3, 256, 256), reduction: none, fwd: 14.33 (ms), back: 16.90 (ms)
input size(512, 3, 512, 512), reduction: none, fwd: 59.04 (ms), back: 68.36 (ms)
input size(128, 3, 128, 128), reduction: sum, fwd: 0.07 (ms), back: 0.25 (ms)
input size(128, 3, 256, 256), reduction: sum, fwd: 0.21 (ms), back: 0.86 (ms)
input size(128, 3, 512, 512), reduction: sum, fwd: 0.82 (ms), back: 3.33 (ms)
input size(256, 3, 128, 128), reduction: sum, fwd: 0.12 (ms), back: 0.46 (ms)
input size(256, 3, 256, 256), reduction: sum, fwd: 0.42 (ms), back: 1.70 (ms)
input size(256, 3, 512, 512), reduction: sum, fwd: 1.53 (ms), back: 6.58 (ms)
input size(512, 3, 128, 128), reduction: sum, fwd: 0.21 (ms), back: 0.87 (ms)
input size(512, 3, 256, 256), reduction: sum, fwd: 0.78 (ms), back: 3.34 (ms)
input size(512, 3, 512, 512), reduction: sum, fwd: 2.98 (ms), back: 13.07 (ms)
input size(128, 3, 128, 128), reduction: mean, fwd: 0.07 (ms), back: 0.26 (ms)
input size(128, 3, 256, 256), reduction: mean, fwd: 0.21 (ms), back: 0.86 (ms)
input size(128, 3, 512, 512), reduction: mean, fwd: 0.82 (ms), back: 3.34 (ms)
input size(256, 3, 128, 128), reduction: mean, fwd: 0.12 (ms), back: 0.46 (ms)
input size(256, 3, 256, 256), reduction: mean, fwd: 0.42 (ms), back: 1.72 (ms)
input size(256, 3, 512, 512), reduction: mean, fwd: 1.53 (ms), back: 6.60 (ms)
input size(512, 3, 128, 128), reduction: mean, fwd: 0.21 (ms), back: 0.87 (ms)
input size(512, 3, 256, 256), reduction: mean, fwd: 0.78 (ms), back: 3.33 (ms)
input size(512, 3, 512, 512), reduction: mean, fwd: 2.98 (ms), back: 13.07 (ms)
```

</details>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62826

Reviewed By: bdhirsh

Differential Revision: D30282279

Pulled By: ngimel

fbshipit-source-id: 4aa0ff3f8af0632957417931d332ec486a12b52d
2021-08-12 18:07:15 -07:00
219ba6575b add autowrap_functions kwarg to fx.Tracer (#62106)
Summary:
Implements feature request https://github.com/pytorch/pytorch/issues/62021

Test it out with

```python
from torch import fx
from torch import nn

def fx_int(x):
    return int(x)

class MyModule(nn.Module):
    def forward(self, x):
        return fx_int(x.shape[0] / 2)

tracer = fx.Tracer(autowrap_functions=(fx_int,))  # or remove kwarg to demonstrate symbolic trace error
tracer.trace(MyModule())
```

First time contributor, so please advise if I could have done anything to make lives easier for next time.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62106

Reviewed By: SplitInfinity, driazati

Differential Revision: D30080834

Pulled By: jamesr66a

fbshipit-source-id: 68fadf8c881ea7930e7afd62b642874010fe4903
2021-08-12 17:38:25 -07:00
7a1ab9f5d7 [fx] store Tracer class on Graph and GraphModule for package deserialization [v2, the re-do] (#63121)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63121

Re-introducing this diff with a small change: skip setting Tracer classes on GraphModules when the Tracer class is not defined at module level (which prevents pickling).

Previous (reverted) pull request: https://github.com/pytorch/pytorch/pull/62497

Reviewed By: houseroad

Differential Revision: D30252776

fbshipit-source-id: 42d2bc846e4b32d00563419c38c02b63cd0986e6
2021-08-12 17:28:50 -07:00
988ef190e3 Show warning in eager mode for empty containers (#62978)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/54873

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62978

Reviewed By: navahgar

Differential Revision: D30278343

Pulled By: ansley

fbshipit-source-id: ebb19f7b8a10720f2612b99a2668d1ebbc1f2d16
2021-08-12 16:11:27 -07:00
e182b459d9 [iOS] Add podspec for libTorch-lite nightly build (#62691)
Summary:
The nightly pod version will be aligned with the [PyTorch nightly build version](https://github.com/pytorch/pytorch/blob/master/.circleci/scripts/binary_populate_env.sh#L88) and the [CocoaPods version specification](https://guides.cocoapods.org/using/the-podfile.html#specifying-pod-versions); the version format of the podspec is `PyTorch version + nightly build date`, e.g. `1.10.0.20210812`.

Usage:
1. Add `pod 'LibTorch-Lite-Nightly'` to `Podfile`
2. Run `pod install` to install the nightly built lib
3. Run `pod update` to update the lib to the latest version

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62691

Test Plan:
* Test on [TestApp](https://github.com/pytorch/pytorch/tree/master/ios/TestApp) and [HelloWorld](https://github.com/pytorch/ios-demo-app):
Podfile: `pod 'LibTorch-Lite-Nightly'`

* Test on Private Pod:
{F642106928}

Reviewed By: xta0

Differential Revision: D30090760

Pulled By: hanton

fbshipit-source-id: 361aa2ed24a11d6aced8374cb45f70f49bd5da52
2021-08-12 15:35:14 -07:00
0b89e69e7c [BE] delete GHA generated workflow files before regen (#63148)
Summary:
Unlike CircleCI, where all workflows go in one file, legacy generated GHA files will silently remain in one's PR, e.g. when we change the build_environment name. That's not ideal.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63148

Reviewed By: bdhirsh

Differential Revision: D30283382

Pulled By: walterddr

fbshipit-source-id: ffdd5bf9561dd38499052855a12ee5cf838a20b0
2021-08-12 14:43:00 -07:00
ba25527ffc [iOS][GPU] Fix the clamp shader function for x86_64 (#63062)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63062

Previously, due to the need to support iOS 10.0, we used an fp16 version of the clamp kernel on Metal, which didn't work well on x86_64. Since we no longer need to support 10.0, we can use the fp32 version, which works on both arm64 and x86_64.
ghstack-source-id: 135536785

Test Plan:
- `buck test pp-macos`
- Op tests in the playground app

{F641013793}

Reviewed By: husthyc

Differential Revision: D30239931

fbshipit-source-id: 6ad1bf71422b537e052fbd7b7465ba8deb7ca0cf
2021-08-12 13:20:27 -07:00
ed7ece389d Forbid inplace modification of a saved tensor's pack_hook input (#62717)
Summary:
When using saved tensors hooks (especially default hooks),
if the user defines a `pack_hook` that modifies its input,
it can cause some surprising behavior.

The goal of this PR is to prevent future user headache by catching
inplace modifications of the input of `pack_hook` and raising an error if
applicable.
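
A sketch of the kind of hook this now rejects, assuming the `torch.autograd.graph.saved_tensors_hooks` API:
```python
import torch

def bad_pack(t):
    return t.mul_(2)  # mutates the tensor autograd just saved

def unpack(t):
    return t

a = torch.randn(3, requires_grad=True).clone()  # non-leaf, so mul_ is legal
with torch.autograd.graph.saved_tensors_hooks(bad_pack, unpack):
    b = a * a  # saving `a` invokes bad_pack; expected to raise an error
               # about in-place modification of a saved tensor
```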

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62717

Reviewed By: albanD

Differential Revision: D30255243

Pulled By: Varal7

fbshipit-source-id: 8d73f1e1b50b697a59a2849b5e21cf0aa7493b76
2021-08-12 12:40:10 -07:00
aa5141f204 Update CONTRIBUTING.md to remove ProcessGroupAgent (#63160)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63160

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D30284439

Pulled By: H-Huang

fbshipit-source-id: 53c31b6917ef5e2125e146fb0ed73ae3d76a8cf9
2021-08-12 12:26:12 -07:00
96fb1a56ea add use_strict_trace to tensorboard add_graph method (#63120)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63120

FAIM returns dictionaries as the model output, which throws an error when trying to trace with add_graph. We pass `strict` through to the tracer to make this user-configurable.

User post: https://fb.workplace.com/groups/pytorchLightning/permalink/1510194972650369/?comment_id=1510252919311241&reply_comment_id=1510281112641755
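
A minimal sketch of the new knob (requires the tensorboard package; the model and shapes are illustrative):
```python
import torch
from torch.utils.tensorboard import SummaryWriter

class DictOutputModel(torch.nn.Module):
    def forward(self, x):
        return {"out": x * 2}  # dict outputs fail under strict tracing

writer = SummaryWriter()
# relax the tracer so dict-returning models can still be graphed
writer.add_graph(DictOutputModel(), torch.randn(1, 3), use_strict_trace=False)
writer.close()
```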

Test Plan: unit test

Reviewed By: Reubend

Differential Revision: D30265890

fbshipit-source-id: 58b25d9500b875a29a664aa9ef4c1e7f13631fa1
2021-08-12 12:12:12 -07:00
1022443168 Revert D30279364: [codemod][lint][fbcode/c*] Enable BLACK by default
Test Plan: revert-hammer

Differential Revision:
D30279364 (b004307252)

Original commit changeset: c1ed77dfe43a

fbshipit-source-id: eab50857675c51e0088391af06ec0ecb14e2347e
2021-08-12 11:45:01 -07:00
ed0b8a3e83 LayerNorm Support in autodiff: (#50467)
Summary:
1. extend autodiff by adding entry for layer_norm in symbolic script, we now use native_layer_norm_backward
2. added backward function `layernorm_double_backward` for `native_layer_norm_backward`, preserves double backward support for LayerNorm in autodiff/ScriptModule
3. added python test to verify autodiff on layer_norm with various configuration of optional tensors; (verify the fix in https://github.com/pytorch/pytorch/issues/49430)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50467

Reviewed By: eellison

Differential Revision: D30232864

Pulled By: jansel

fbshipit-source-id: b9c33075386aff96afff7415df9f94388bfb474a

Co-authored-by: Ryan Spring <rspring@nvidia.com>
Co-authored-by: Jie <jiej@nvidia.com>
2021-08-12 11:05:53 -07:00
b004307252 [codemod][lint][fbcode/c*] Enable BLACK by default
Test Plan: manual inspection & sandcastle

Reviewed By: zertosh

Differential Revision: D30279364

fbshipit-source-id: c1ed77dfe43a3bde358f92737cd5535ae5d13c9a
2021-08-12 10:58:35 -07:00
aac3c7bd06 [reland] OpInfo: adaptive_avg_pool2d (#62935)
Summary:
This PR is an attempt to reland https://github.com/pytorch/pytorch/pull/62704.

**What has changed?**

The op has non-deterministic behavior, hence an appropriate `gradcheck` wrapper had to be added.

cc: mruberry zou3519 heitorschueroff kshitij12345

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62935

Reviewed By: anjali411

Differential Revision: D30225095

Pulled By: zou3519

fbshipit-source-id: 644873cc21d44b19c8b68f9edff691913778de0e
2021-08-12 09:46:38 -07:00
daba551922 [BE] shorten CI name part2 (#63030)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/62357
There's no need to specify the cuDNN version, since it is already implied by the CUDA version.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63030

Reviewed By: zhouzhuojie, driazati

Differential Revision: D30226354

Pulled By: walterddr

fbshipit-source-id: 7e2dc577810e0ce80ee27569c25a814566250ab1
2021-08-12 08:14:22 -07:00
eea52b7d47 Skip zero test on windows (#63087)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63087

Test failed unexpectedly on Windows; see
https://github.com/pytorch/pytorch/issues/63086. Skip for now while we
investigate.
ghstack-source-id: 135631811

Test Plan: CI

Reviewed By: ngimel

Differential Revision: D30251300

fbshipit-source-id: 8acb1ea8863c654c171fe989ac24446c321c085d
2021-08-12 00:38:42 -07:00
4d7a12f68b BatchNorm: Use resize_output and empty, instead of empty_like (#63084)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/62967

This lets each of the three implementations choose which memory format
to use for the output, meaning channels_last can be used in more cases.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63084

Reviewed By: saketh-are

Differential Revision: D30255740

Pulled By: ngimel

fbshipit-source-id: 48d42850952ec910b29521a1c4e530eb2b29df5e
2021-08-11 23:47:24 -07:00
d5a7579597 [quant] Make version 1 the default for get_default_qat_qconfig (#63043)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63043

In version 1 we use the fused module/operator during QAT. Making this the default for all QAT runs going forward.

Older models saved after prepare_qat_fx can still load their state_dict into a model prepared using version 1.
The state_dict will still have the same attribute for the observer/fake_quant modules.

There may be some numerics difference between the old observer code in observer.py and the new fused module that was
re-written in C++/CUDA to perform observe + fake_quantize.

This PR also updates the test to check for the new module instead of the default FakeQuantize module.
Note: there are also some changes to make the operator work for multi-dim per-channel quantization + updated the test for that.
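
A short sketch of selecting between the versions (assuming the existing `version` argument on `get_default_qat_qconfig`):
```python
import torch

# version 1 (now the default) uses the fused observer + fake-quant module
qconfig_new = torch.quantization.get_default_qat_qconfig("fbgemm")
# the previous behavior remains reachable explicitly
qconfig_old = torch.quantization.get_default_qat_qconfig("fbgemm", version=0)
```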

Test Plan:
python test/test_quantization.py TestSerialization.test_default_qat_qconfig

Imported from OSS

Reviewed By: raghuramank100

Differential Revision: D30232222

fbshipit-source-id: f3553a1926ab7c663bbeed6d574e30a7e90dfb5b
2021-08-11 22:06:44 -07:00
91525d42d9 Fix sharded tensor tests. (#63054)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63054

1) Ensure these tests are skipped in environments without any GPUs.
2) Add the test to run_test.py
ghstack-source-id: 135595698

Test Plan: waitforbuildbot

Reviewed By: wanchaol

Differential Revision: D30239159

fbshipit-source-id: 21b543ba72e8d10182bc77e7ae1fd34fd4096509
2021-08-11 21:46:45 -07:00
bf7d03ff1f Port log_softmax_backward_data to structured kernel (#62372)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62372

Test Plan: Imported from OSS

Reviewed By: saketh-are

Differential Revision: D30240242

Pulled By: SplitInfinity

fbshipit-source-id: 67d5e4b1543c2e43675e905ce18ca49c11e33748
2021-08-11 21:03:59 -07:00
ba603594fd Port log_softmax to structured kernel (#57374)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57374

Test Plan: Imported from OSS

Reviewed By: saketh-are

Differential Revision: D30240243

Pulled By: SplitInfinity

fbshipit-source-id: de6617c75d16e26d607a884c25b8752b7b561737
2021-08-11 21:02:48 -07:00
d2eda7f2f3 Add ciflow_ruleset.json generator along with gha ci (#63097)
Summary:
- Add `.github/generated-ciflow-ruleset.json` for ciflow-bot (so that we can generate better comments)
- The lint job also checks git dirty to make sure that the file is always in sync with ciflow configs

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63097

Reviewed By: saketh-are

Differential Revision: D30263278

Pulled By: zhouzhuojie

fbshipit-source-id: bad68105a228e892ba071b29ecfdf433e1038054
2021-08-11 17:14:40 -07:00
04caef8e1d Improve IMethod::getArgumentNames to deal with empty argument names list (#62947)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62947

This diff improves IMethod::getArgumentNames to deal with an empty argument names list.

Test Plan:
buck test mode/dev //caffe2/caffe2/fb/predictor:pytorch_predictor_test -- PyTorchDeployPredictor.GetEmptyArgumentNamesValidationMode
buck test mode/dev //caffe2/caffe2/fb/predictor:pytorch_predictor_test -- PyTorchDeployPredictor.GetEmptyArgumentNamesRealMode

Reviewed By: wconstab

Differential Revision: D30179974

fbshipit-source-id: c7aec35c360a73318867c5b77ebfec3affee47e3
2021-08-11 16:44:00 -07:00
5cf32c1d09 Fix Nnapi backend execute's dangling pointer (#63092)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63092

Bug discovered while testing NNAPI Delegate on SparkAR.
Using
```
c10::IntArrayRef order = {0, 2, 3, 1};
fixed_inputs.push_back(tensorInp.get(i).permute(order).contiguous());
```
results in a garbage value for `order` inside `permute()`: `c10::IntArrayRef` is a non-owning view, and here it points at a temporary initializer list that is destroyed at the end of the declaration statement.
Writing the braced list directly inside the call, i.e. `permute({0, 2, 3, 1})`, keeps the temporary alive for the whole expression and fixes this issue. The problem is seemingly related to https://github.com/pytorch/pytorch/issues/44409, but luckily the solution in this case is simple.

Bug wasn't caught earlier, since regular unit tests weren't affected by the dangling pointer, and address sanitizer NNAPI tests are turned off due to there being a different failure (T95764916).
ghstack-source-id: 135526129

Test Plan:
Run Unit tests: `python test/test_jit.py`

Build and run SparkAR on an Android phone at the top of this diff stack (D30173959): `buck build --show-output arstudioplayer_arm64_debug -c pt.enable_nnapi=1`

Reviewed By: raziel, iseeyuan

Differential Revision: D30237504

fbshipit-source-id: c946d81feefc453b43d9295d8d6f509cafdcec03
2021-08-11 14:26:48 -07:00
709ac6853a Fix warnings (#62930)
Summary:
- Add `-Wno-writable-strings` (which is clang's flavor of `-Wwrite-strings`) to the list of warnings ignored while compiling torch_python.
- Avoid unnecessary copies in range loops.
- Fix a number of signed-unsigned comparisons.

Found while building locally on M1

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62930

Reviewed By: albanD

Differential Revision: D30171981

Pulled By: malfet

fbshipit-source-id: 25bd43dab5675f927ca707e32737ed178b04651e
2021-08-11 14:07:10 -07:00
855e8f2b17 [iOS][GPU] Consolidate array and non-array kernel for upsampling_nearest2d (#63061)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63061

Cleanup the redundant shader code for the upsampling nearest kernel.
ghstack-source-id: 135524349

Test Plan:
- `buck test pp-macos`
- Op tests in PyTorchPlayground app

Reviewed By: husthyc

Differential Revision: D30236905

fbshipit-source-id: e1e001b446452b077e6db719b0519c9070f3300b
2021-08-11 13:29:39 -07:00
456364729e irange-ify 13b (#62476)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62476

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D30001445

fbshipit-source-id: 6f4525338c80e9f929695f47f36ca9c72d96a75d
2021-08-11 13:13:44 -07:00
31c1983603 Add BFloat16 support for unique and unique_consecutive on CPU (#62559)
Summary:
Add BFloat16 support for unique and unique_consecutive on CPU.
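
For example:
```python
import torch

x = torch.tensor([1.0, 1.0, 2.0, 1.0], dtype=torch.bfloat16)
print(torch.unique(x))              # tensor([1., 2.], dtype=torch.bfloat16)
print(torch.unique_consecutive(x))  # tensor([1., 2., 1.], dtype=torch.bfloat16)
```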

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62559

Reviewed By: saketh-are

Differential Revision: D30250675

Pulled By: ngimel

fbshipit-source-id: 26e48f971d87f3b86db237e8ad3a4b74eb3c1def
2021-08-11 12:54:46 -07:00
51a67d3168 Add Github action to upload full source releases (#63022)
Summary:
These release tarballs include the submodules.
The action runs on every tag and master-branch push but will not upload anything;
this makes sure nothing is broken when an actual release happens.

When a release is created, the action runs and uploads the tarball.

Fixes https://github.com/pytorch/pytorch/issues/62708

As I don't have access rights here and testing is obviously hard (as a new release needs to be published), I set up a test at https://github.com/Flamefire/pytorch/releases/tag/testtag
See also the run(s) at https://github.com/Flamefire/pytorch/actions/workflows/create_release.yml

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63022

Reviewed By: saketh-are

Differential Revision: D30256253

Pulled By: seemethere

fbshipit-source-id: ab5fe131452de14ae3768b91c221e68c536cb3aa
2021-08-11 12:47:17 -07:00
821c1edea9 Embedding thrust->cub: unique (#63042)
Summary:
Followup of https://github.com/pytorch/pytorch/pull/62495

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63042

Reviewed By: saketh-are

Differential Revision: D30231084

Pulled By: ngimel

fbshipit-source-id: 03b0a88107e8a2aee3570881d81bf2b676f525cd
2021-08-11 12:40:36 -07:00
fa22f6303f [PyTorch] Add flop count for addmm (#61895)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61895

* Add a FLOP count for addmm: it should be `2*m*n*k`, since each of the `m*n` output elements requires `k` multiplies and `k` adds.

Share the same code path for `addmm` and `mm`.
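
A sketch of eyeballing the count, assuming the autograd profiler's `with_flops` option:
```python
import torch

m, k, n = 8, 16, 32
bias = torch.randn(m, n)
a, b = torch.randn(m, k), torch.randn(k, n)

with torch.autograd.profiler.profile(with_flops=True) as prof:
    torch.addmm(bias, a, b)

# aten::addmm should report 2*m*n*k = 8192 FLOPs
print(prof.key_averages().table())
```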

Test Plan:
Imported from OSS

`python test/test_profiler.py`
Run a sample profile and check that FLOPS for `aten::addmm` is correct.

`[chowar@devbig053.frc2 ~/local/pytorch/build] ninja bin/test_jit`
`[chowar@devbig053.frc2 ~/local/pytorch/build] ./bin/test_jit --gtest_filter='ComputeFlopsTest*'`

Reviewed By: dskhudia

Differential Revision: D29785671

fbshipit-source-id: d1512036202d7234a981bda897af1f75808ccbfe
2021-08-11 12:33:43 -07:00
fb4ba9e664 XNNPack Input Pointer Caching Comment (#62818)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62818

Added a comment to explain why we no longer need to manually cache pointers/parameters for convolution, as removed in D29777605 (f5c6c3947e)

Test Plan: Sandcastle tests (no code changed)

Reviewed By: kimishpatel

Differential Revision: D30113489

fbshipit-source-id: d697f05816acbd367d59a4aced1925303c683d40
2021-08-11 11:55:42 -07:00
82123758ba _convert_coo_to_csr CPP and CUDA functionality (#61838)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/57381 and improves https://github.com/pytorch/pytorch/pull/61340 via dedicated `coo_to_csr` functionalities.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61838

Reviewed By: ezyang

Differential Revision: D30132736

Pulled By: cpuhrsch

fbshipit-source-id: a1fd074c0d70366a524d219a620b94f8bed71d7c
2021-08-11 11:37:20 -07:00
b8e6144e0a Add a _RemoteDevice structure for ShardedTensor/ShardingSpec. (#62927)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62927

As part of the ShardedTensor work, we realized we do need some sort of
_RemoteDevice structure that deals with our format of "workername/device" so
that users don't have to worry about parsing this string directly.

Right now this structure is just the bare minimum and is mostly a container for
describing a remote device. It is currently only used in ShardedTensor,
ShardingSpec and RemoteModule.

Once we actually have a consolidated remote device proposal, this class can be
extended appropriately if needed.
ghstack-source-id: 135534086

Test Plan:
1) unit tests
2) waitforbuildbot

Reviewed By: SciPioneer

Differential Revision: D30170689

fbshipit-source-id: 1ac2e81c7a597dc40bf3fbf2c1168c382c66649f
2021-08-11 11:27:32 -07:00
b746fed164 [Pytorch Edge] Move RuntimeCompatibilityInfo Factory Method (#63005)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63005

Realized I forgot to move the Runtime half of these functions to be within the struct.

Test Plan: ci

Reviewed By: pavithranrao

Differential Revision: D30205521

fbshipit-source-id: ccd87d7d78450dd0dd23ba493bbb9d87be4640a5
2021-08-11 11:15:57 -07:00
3d3ad0a52f [easy] add an inplace argument to MutableNetProto.to_net() and core.Net() constructor (#63068)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63068

The caffe2 core.Net constructor can accept a caffe2_pb2.NetDef proto, but it always creates a copy. This is wasteful when we can prove that the proto being passed to it will not be used anywhere else. So we add an "inplace" argument to the `core.Net` constructor that allows clients to give away ownership of the passed proto without copying. We default this argument to `False`, ensuring that behavior does not change unless explicitly requested.
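
A small sketch of the new argument (proto contents are illustrative):
```python
from caffe2.proto import caffe2_pb2
from caffe2.python import core

proto = caffe2_pb2.NetDef()
proto.name = "example_net"

copied = core.Net(proto)               # default: the proto is copied
owned = core.Net(proto, inplace=True)  # ownership given away, no copy;
                                       # `proto` must not be reused elsewhere
```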

Test Plan: Let CI run.

Differential Revision: D29976510

fbshipit-source-id: 26e13ca76f3431b8ef0de51f08bbf263491d323e
2021-08-11 11:10:52 -07:00
c090ae291e Fix gha render-test-result mixed failure passthrough (#63056)
Summary:
To fix something like https://github.com/pytorch/pytorch/actions/runs/1114555082

![image](https://user-images.githubusercontent.com/658840/128956528-86997457-5e18-4ae1-83cc-aa7d0ca03c0e.png)

Not sure why `needs.test.result` doesn't capture the `failure` case before, so changed it to `needs.test.result != 'skipped' || failure()`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63056

Reviewed By: walterddr, tktrungna

Differential Revision: D30240112

Pulled By: zhouzhuojie

fbshipit-source-id: d159cc3f79ed5d604ae12583736b37ac28e8d87c
2021-08-11 09:45:31 -07:00
4ea6a3aa74 Fix issues with printing certain torch modules (#62447)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/54420

When I tested on master with the testing code below, there were multiple objects tracked by the garbage collector that could not be printed.

Testing code:
```
import torch
import gc

print(torch.__version__)

a = torch.rand(10)

print(a)

# print every object currently tracked by the garbage collector
for obj in gc.get_objects():
    print(obj)
```

### 1
```
print(torch.classes)
```

As SplitInfinity mentioned in the GitHub issue, the solution here is to set `__file__` for `torch.classes` to something. Similar to [_ops.py](https://github.com/pytorch/pytorch/blob/master/torch/_ops.py#L69), where `__file__` is set to `_ops.py`, we could set `__file__` for `torch.classes` to `_classes.py`.

### 2
```
print(torch._ops.ops.quantized)
print(torch._ops.ops.atan)
```

When we try to print these two modules, it will call `_OpNamespace::__getattr__`, but the `op_name` is `__file__`. This becomes a problem when `torch._C._jit_get_operation(qualified_op_name)` [(link)](https://github.com/pytorch/pytorch/blob/master/torch/_ops.py#L60) tries to look for an actual op on the native C++ side.

Only when we get the attribute for an actual op, e.g. `print(torch._ops.ops.quantized.elu)`, does the `op_name` become proper (e.g. `elu`).

My current solution is to return a hardcoded string (i.e. “torch.ops”) if `op_name` is `"__file__"`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62447

Reviewed By: saketh-are

Differential Revision: D30234654

Pulled By: yidawang-oss

fbshipit-source-id: de43a8f599739c749fb3307eea015cc61f1da60e
2021-08-11 09:40:41 -07:00
5c00091f02 Shard python_functions.cpp (#62186)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62186

This file takes 6 minutes on its own to compile and is the limiting factor for
building `libtorch_python` on a 32-core threadripper. This splits the file into
5 shards which take around 50 seconds each to compile.

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D29962046

Pulled By: albanD

fbshipit-source-id: df13cfaebd54296f10609f67ae74a850c329bd37
2021-08-11 09:21:26 -07:00
c5de83adca Fix inconsisteny between Python and JIT power operation (#62842)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62842

Test Plan:
Wrote unit test TestAtenPow to test behavior of aten::pow when:
1. base is int, exponent is int
2. base is int, exponent is float
3. base is float, exponent is int
4. base is float, exponent is float

Specifically, we test that when the base is zero and the exponent is negative, we raise an error. In all other cases, we expect the behavior to match the result returned by Python.

Because the C++ code relies on overloading, we need to make sure all combinations of types give us the expected result.
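
A sketch of the aligned behavior in a scripted function:
```python
import torch

@torch.jit.script
def power(base: int, exp: int):
    return base ** exp

print(power(2, -2))  # 0.25, matching Python
power(0, -1)         # now raises, mirroring Python's ZeroDivisionError
```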

Reviewed By: zhxchen17

Differential Revision: D30146115

Pulled By: szewaiyuen7

fbshipit-source-id: dc661897ad38da286ee454120fbe41314b7f2995
2021-08-11 08:41:46 -07:00
f446e835ee Fix CUDA_KERNEL_ASSERT ambiguous symbol in NDEBUG mode (#62527)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62527

If NDEBUG is applied inconsistently during compilation, we might get an 'ambiguous declaration' error. Let's make sure that the forward declaration matches glibc, including all specifiers.

Test Plan: sandcastle

Reviewed By: mdschatz

Differential Revision: D30030051

fbshipit-source-id: 9f4d5f1d4e74f0a4eaeeaaaad76b93ee485d8bcd
2021-08-11 01:10:09 -07:00
f7611b31aa [4/N] Enable opt-asan for distributed unit tests. (#62051)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62051

The goal here is to enable opt-asan for "spawn" based unit tests since
this works for "spawn" unlike "dev-asan". As a result, we can run ASAN for
"spawn" unit tests as well.

This means we can completely remove fork unit tests from the code base since
the only purpose for these tests was to run ASAN.
ghstack-source-id: 135523770

Test Plan: waitforbuildbot

Reviewed By: SciPioneer

Differential Revision: D29854514

fbshipit-source-id: 02a5bfcfae2afc21badecff77082c7a6ad83636b
2021-08-10 22:38:31 -07:00
847a7cfa10 Back out "[fx] store Tracer class on Graph and GraphModule for package deserialization" (#63053)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63053

Original commit changeset: eca09424ad30

The original diff - D30019214 (6286d33878) breaks the publish flow in model saving.

Test Plan: ci

Differential Revision: D30236517

fbshipit-source-id: 3e05db02fc1cbbc2ed262c83bf56d555277abb34
2021-08-10 21:58:08 -07:00
324673a537 rebase for autocast updates to include device_type and dtype flags (#61002)
Summary:
Fixes #55374 (https://github.com/pytorch/pytorch/issues/55374).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61002

Reviewed By: malfet, mruberry

Differential Revision: D30016812

Pulled By: ngimel

fbshipit-source-id: 6e09a29f539d28e9aea5cd9489b1e633cc588033
2021-08-10 20:03:12 -07:00
a55cae3d37 Fix missing element types and shapes when autograd.Function has multiple tensor outputs (#57966)
Summary:
When generating IR for autograd.Function, if the function has multiple outputs, a TupleUnpack may be inserted after the original function node, and PyTorch only assigns proper information (tensor element type and shape) to the TupleUnpack and forgets the original function node. In contrast, if autograd.Function only produces one output, the original function node may have the tensor element type and shape in its output schema.

Before this PR:
- (simplified) IR for autograd.Function with one output: input (tensor, dtype=float32, shape=[2, 3]) -> PythonOp -> output (tensor, dtype=float32, shape=[4, 5])
- (simplified) IR for autograd.Function with multiple outputs: input (tensor, dtype=float32, shape=[2, 3]) -> PythonOp -> output_0 **(tensor)**, output_1 **(tensor)** -> TupleUnpack output_2 (tensor, dtype=float32, shape=[4, 5]), output_3 (tensor, dtype=float32, shape=[6, 7])

After this PR:
- (simplified) IR for autograd.Function with one output: input (tensor, dtype=float32, shape=[2, 3]) -> PythonOp -> output (tensor, dtype=float32, shape=[4, 5])
- (simplified) IR for autograd.Function with multiple outputs: input (tensor, dtype=float32, shape=[2, 3]) -> PythonOp -> output_0 **(tensor, dtype=float32, shape=[4, 5])**, output_1 **(tensor, dtype=float32, shape=[6, 7])** -> TupleUnpack output_2 (tensor, dtype=float32, shape=[4, 5]), output_3 (tensor, dtype=float32, shape=[6, 7])

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57966

Reviewed By: zhxchen17

Differential Revision: D30208207

Pulled By: gmagogsfm

fbshipit-source-id: 42a3d1f9c0932133112a85df0c49cf4ea0afa175
2021-08-10 19:48:11 -07:00
390c0ac403 remove dead code (#63031)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63031

Reviewed By: mruberry

Differential Revision: D30225094

Pulled By: ngimel

fbshipit-source-id: 3666a0fa120bea85225cd3ee04f89d64952d2862
2021-08-10 18:41:13 -07:00
94c5309369 Revert D30199482: [pytorch][PR] Add BFloat16 support for unique and unique_consecutive on CPU
Test Plan: revert-hammer

Differential Revision:
D30199482 (fc0b8e6033)

Original commit changeset: 6f2d9cc1a528

fbshipit-source-id: 39e9f202bcbd978525f792173d4f97b5b329b5b1
2021-08-10 18:27:18 -07:00
d1f9c03cef Use const auto with irange (#62990)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62990

Test Plan: Sandcastle

Reviewed By: zhouzhuojie

Differential Revision: D30199748

fbshipit-source-id: 284b208ffa3c6c4749e5ac9b1fccb28914590f2c
2021-08-10 17:59:01 -07:00
d893b44cd8 change nccl version reporting (#62916)
Summary:
https://github.com/pytorch/pytorch/issues/62295

Previously the packing and unpacking of the NCCL version "integer" was done to have parity with the upstream NCCL version encoding. However, there doesn't seem to be any place where this integer is directly compared with a version integer sourced from upstream NCCL, and syncing the encoding seems to be error-prone (e.g., a recent change where a special case was added for minor versions >= 10; see `src/nccl.h.in` (L22) at commit 7e51592129).

This patch changes the reporting to return a tuple of version numbers instead (to preserve ease-of-use for comparisons) and tweaks the passing between C/Python to avoid the digit overflow problem.
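
A sketch of the resulting usage (the feature name in the comment is a placeholder):
```python
import torch

# now a tuple of ints, e.g. (2, 10, 3), instead of a packed integer
version = torch.cuda.nccl.version()
print(version)
if version >= (2, 4):
    print("NCCL is new enough for the feature we care about")
```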

CC ngimel mcarilli

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62916

Reviewed By: anjali411

Differential Revision: D30201069

Pulled By: mrshenli

fbshipit-source-id: 2e4e7c69f001c3f22bd04aa6df6a992e538bea45
2021-08-10 17:46:27 -07:00
f307120df4 Update test_torch_deploy (#62838)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62838

Fixes #62380

* update test functions to call the wheel install folder {sitepackages}/torch instead of the build/ folder
* add symbolic links for the shared libraries called by the tests (this is a bit hacky and should properly be fixed by setting the rpath before compiling -- similar to https://github.com/pytorch/pytorch/blob/master/.jenkins/pytorch/test.sh#L204-L208).

### Test plan
check if all ci workflows pass

Test Plan: Imported from OSS

Reviewed By: walterddr

Differential Revision: D30193141

Pulled By: tktrungna

fbshipit-source-id: 72c2bd3a740fca0f72e4803df505240193692c44
2021-08-10 16:29:50 -07:00
af6ed084b4 update test_libtorch (#62797)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62797

Fixes #62380

* update test functions to use the wheel install folder ({sitepackages}/torch) instead of the build/ folder
* add symbolic links for the shared libraries that the tests call (this is a bit hacky and should instead be fixed by setting the rpath before compiling -- similar to https://github.com/pytorch/pytorch/blob/master/.jenkins/pytorch/test.sh#L204-L208).

### Test plan
check if all ci workflows pass

Test Plan: Imported from OSS

Reviewed By: walterddr

Differential Revision: D30193140

Pulled By: tktrungna

fbshipit-source-id: d8e54c403f42abbbbe4556abf40c22a7955df737
2021-08-10 16:29:48 -07:00
2f5ac9c0ba update test distributed (#62796)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62796

Fixes #62380

* update test functions to use the wheel install folder ({sitepackages}/torch) instead of the build/ folder
* add symbolic links for the shared libraries that the tests call (this is a bit hacky and should instead be fixed by setting the rpath before compiling -- similar to https://github.com/pytorch/pytorch/blob/master/.jenkins/pytorch/test.sh#L204-L208).

### Test plan
check if all ci workflows pass

Test Plan: Imported from OSS

Reviewed By: driazati

Differential Revision: D30193142

Pulled By: tktrungna

fbshipit-source-id: 1247f9eda1c11c763c31c7383c77545b1ead1a60
2021-08-10 16:29:47 -07:00
dfe8445cd7 update test_vulkan (#62795)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62795

Test Plan: Imported from OSS

Reviewed By: driazati

Differential Revision: D30124421

Pulled By: tktrungna

fbshipit-source-id: 235ba166b02f7334e89cb2493024067851bf5b9b
2021-08-10 16:29:45 -07:00
25c3b9dc10 update test_rpc (#62781)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62781

Test Plan: Imported from OSS

Reviewed By: walterddr, zhouzhuojie

Differential Revision: D30124391

Pulled By: tktrungna

fbshipit-source-id: 99c275d6c9f23b4f274fd0ca19a16879ed27afd5
2021-08-10 16:28:35 -07:00
f807229fd4 [ONNX] add support for prim::Uninitialized in lower_tuples pass (#56912)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/56911

Code from issue generates this Torchscript:
```
graph(%self : __torch__.MyModule,
      %t.1 : Tensor):
  %12 : None = prim::Constant()
  %7 : str = prim::Constant[value="Negative input"]() # /mnt/nvdl/usr/msladek/notes/python_code/unitialized.py:11:28
  %3 : int = prim::Constant[value=0]() # /mnt/nvdl/usr/msladek/notes/python_code/unitialized.py:10:15
  %9 : int = prim::Constant[value=5]() # /mnt/nvdl/usr/msladek/notes/python_code/unitialized.py:13:31
  %33 : (Tensor, Tensor) = prim::Uninitialized()
  %4 : Tensor = aten::lt(%t.1, %3) # /mnt/nvdl/usr/msladek/notes/python_code/unitialized.py:10:11
  %6 : bool = aten::Bool(%4) # /mnt/nvdl/usr/msladek/notes/python_code/unitialized.py:10:11
  %34 : (Tensor, Tensor) = prim::If(%6) # /mnt/nvdl/usr/msladek/notes/python_code/unitialized.py:10:8
    block0():
       = prim::RaiseException(%7) # /mnt/nvdl/usr/msladek/notes/python_code/unitialized.py:11:12
      -> (%33)
    block1():
      %11 : int[] = prim::ListConstruct(%9)
      %16 : Tensor = aten::zeros(%11, %12, %12, %12, %12) # /mnt/nvdl/usr/msladek/notes/python_code/unitialized.py:13:19
      %18 : int[] = prim::ListConstruct(%9)
      %23 : Tensor = aten::zeros(%18, %12, %12, %12, %12) # /mnt/nvdl/usr/msladek/notes/python_code/unitialized.py:13:35
      %24 : (Tensor, Tensor) = prim::TupleConstruct(%16, %23)
      -> (%24)
  return (%34)
```

The problem is that the ONNX exporter's lower_tuples pass doesn't support forwarding tuples through prim::Uninitialized.
The solution is:
1. add prim::Uninitialized to supported_op in the lower_tuples pass
2. since prim::Uninitialized now has multiple outputs, call giveFreshAlias for every output

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56912

Reviewed By: nikithamalgifb

Differential Revision: D29837200

Pulled By: SplitInfinity

fbshipit-source-id: 321fae6fe52b1523df5653dbb9ea73b998ef1cda
2021-08-10 16:21:16 -07:00
4d0497034c Remove process_group_agent and faulty_process_group_agent files (#62985)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62985

Remove the process_group_agent and faulty_process_group_agent code now that PROCESS_GROUP backend has been deprecated for RPC (https://github.com/pytorch/pytorch/issues/55615). Discussed with xush6528 that it was okay to remove ProcessGroupAgentTest and ProcessGroupAgentBench which depended on process_group_agent.

Test Plan: CI tests

Reviewed By: pritamdamania87

Differential Revision: D30195576

fbshipit-source-id: 8b4381cffadb868b19d481198015d0a67b205811
2021-08-10 15:57:39 -07:00
790553811c fix sort and topk with discontiguous out (#63029)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/62645 and https://github.com/pytorch/pytorch/issues/62940. The root cause of those bugs is a bad interaction between `collapseDims` and setting the size of the sorting/topK dimension to 1. If all other dimensions happen to be 1, `collapseDims` thinks that the `1` dimension is collapsible (even though it was specifically marked to be preserved) and loses its stride information. If the dimension were really of size 1, the stride information would be unimportant, but since in reality that dimension is not 1 and was only set to 1 for convenience, the loss of stride information results in incorrect outputs.
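A repro sketch in the spirit of the linked issues (the shapes and strided `out` buffers are illustrative):

```python
import torch

x = torch.randn(1, 5, device="cuda")
# every other column of wider buffers -> discontiguous `out` tensors
values = torch.empty(1, 10, device="cuda")[:, ::2]
indices = torch.empty(1, 10, device="cuda", dtype=torch.long)[:, ::2]
torch.sort(x, dim=-1, out=(values, indices))
# before this fix, the strided outputs could contain incorrect values
assert torch.equal(values, x.sort(dim=-1).values)
```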

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63029

Reviewed By: heitorschueroff

Differential Revision: D30224925

Pulled By: ngimel

fbshipit-source-id: 269dd375c5cd57c6007fe91f729f8c60a2e7a264
2021-08-10 15:45:28 -07:00
500b24e303 [iOS] enable Metal in the nightly build (#62855)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62855

Test Plan: Test on Private Pod with the [HelloWorld](https://fburl.com/3hiwkkhm) demo

Reviewed By: xta0

Differential Revision: D30174151

Pulled By: hanton

fbshipit-source-id: 22cd8663ac239811bf8ed1c3b6301460d798dbfa
2021-08-10 15:18:58 -07:00
3beb65d45d test_cudnn_convolution_relu skipCUDAIfRocm
Summary: skip rocm test for test_cudnn_convolution_relu

Test Plan: This skips a test

Reviewed By: ngimel

Differential Revision: D30233620

fbshipit-source-id: 31eab8b03c3f15674e0d262a8f55965c1aa6b809
2021-08-10 15:15:23 -07:00
557047eb4c Add docstring for saved tensors default hooks (#62361)
Summary:
Add documentation for the saved tensors default hooks introduced in https://github.com/pytorch/pytorch/issues/61834 / https://github.com/pytorch/pytorch/issues/62563

Sister PR: https://github.com/pytorch/pytorch/issues/62362 (will add a link from autograd.rst to notes/autograd in whatever PR does not land first)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62361

Reviewed By: zou3519

Differential Revision: D30081997

Pulled By: Varal7

fbshipit-source-id: cb923e943e1d96db9669c1d863d693af30910c62
2021-08-10 14:59:38 -07:00
dbb7be2e79 [iOS][CI] Store every version of nightlies in S3 (#63039)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63039

Test Plan: Imported from OSS

Reviewed By: hanton

Differential Revision: D30229385

Pulled By: xta0

fbshipit-source-id: 15b438a6326159258803ab97e67dc9ec5db50d59
2021-08-10 14:33:36 -07:00
990c2190d1 [quant][graphmode] Reference pattern support for elu (#62607)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62607

Removing the quantize handler for elu since it can be covered by DefaultNodeQuantizeHandler

Test Plan:
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: iramazanli

Differential Revision: D30053977

fbshipit-source-id: 426789443e928bb01a88907de616cbda5866f621
2021-08-10 14:00:39 -07:00
f836c4f8bd [fix] TestMultiThreadAutograd: propagate exception from child thread to main thread (#63018)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/62895

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63018

Reviewed By: anjali411

Differential Revision: D30225856

Pulled By: Varal7

fbshipit-source-id: b5dd7999de5060e06f8958ea3ce49e0b74110971
2021-08-10 13:56:49 -07:00
bfa67264d1 [1/N] Nnapi backend execute and compile (#62272)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62272

Added Android NNAPI delegate implementation of runtime initialization (compilation) and execution.
The delegate's preprocess step was [previously implemented](https://github.com/pytorch/pytorch/pull/62225). Now the rest of the delegate, which implements client-side execution, is added.

**nnapi_backend_lib.cpp**:
Implementation of delegate's compile and execute.
`execute()` is essentially a C++ implementation of [`NnapiModule`](https://github.com/pytorch/pytorch/blob/master/torch/backends/_nnapi/prepare.py), which wraps an NNAPI Compilation and handles preparation of weights, inputs, and outputs.
- Any steps that can be done before execution are moved to `compile()`.
    - `init()` cannot be moved to `compile()` because it requires real inputs for dynamic shaping.
    - `shape_compute_module` cannot currently be deserialized in `compile()`, since mobile::Module has no IValue conversion.
- Processed arguments that are modified by `init()` must be kept as member variables. Any other processed arguments are passed through a dictionary, `handles`.

**nnapi_bind.cpp & nnapi_bind.h**:
Created a header file for `nnapi_bind.cpp`, so that its NnapiCompilation class can be used by `nnapi_backend_lib.cpp`.
**test_backend_nnapi.py**:
Enabled execution testing.
ghstack-source-id: 135432844

Test Plan:
Imported from OSS

Tested on devserver.
1. Load and unpack a special devserver build of NNAPI: `jf download GICWmAAzUR0eo20TAPasVts8ObhobsIXAAAz --file "nnapi-host-linux.tar.xz"`
2. `export LIBNEURALNETWORKS_PATH=/path/to/libneuralnetworks.so`
3. Run unittests: `python test/test_jit.py TestNnapiBackend` and `python test/test_nnapi.py`

TODO: test with lite interpreter runtime

Reviewed By: raziel, iseeyuan

Differential Revision: D29944873

fbshipit-source-id: 48967d873e79ef2cce9bcba2aeea3c52f7a18c07
2021-08-10 13:37:39 -07:00
fc0b8e6033 Add BFloat16 support for unique and unique_consecutive on CPU (#62559)
Summary:
Add BFloat16 support for unique and unique_consecutive on CPU.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62559

Reviewed By: anjali411

Differential Revision: D30199482

Pulled By: ngimel

fbshipit-source-id: 6f2d9cc1a528bea7c723139a4f1b14e4b2213601
2021-08-10 13:22:54 -07:00
cb7f35d47a [quant][refactor] Checking activation_dtype instead of activation_post_process (#62489)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62489

Addressing comment from previous PR: https://github.com/pytorch/pytorch/pull/62374#discussion_r679354145

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: iramazanli

Differential Revision: D30053980

fbshipit-source-id: 79c216410282eccd6f0a8f24e38c55c4d18ec0d0
2021-08-10 12:17:36 -07:00
6d21e36f21 LU solve uses cuBLAS and cuSOLVER for matrices with dim > 1024 (#61815)
Summary:
This PR builds off of https://github.com/pytorch/pytorch/issues/59148 and modifies the `lu_solve` routine to avoid MAGMA for `b` or `lu_data` matrices with any dimension > 1024, since MAGMA has a bug when dealing with such matrices (https://bitbucket.org/icl/magma/issues/19/dgesv_batched-dgetrs_batched-fails-for).
Fixes https://github.com/pytorch/pytorch/issues/36921
Fixes https://github.com/pytorch/pytorch/issues/61929
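A sketch of the now-working path (sizes chosen to exceed the 1024 threshold; the residual check is illustrative):

```python
import torch

A = torch.randn(2048, 2048, device="cuda", dtype=torch.float64)
b = torch.randn(2048, 3, device="cuda", dtype=torch.float64)
LU, pivots = torch.lu(A)           # LU factorization
x = torch.lu_solve(b, LU, pivots)  # dims > 1024 now avoid the MAGMA bug
print((A @ x - b).abs().max())     # small residual instead of garbage
```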

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61815

Reviewed By: anjali411

Differential Revision: D30199618

Pulled By: ngimel

fbshipit-source-id: 06870793f697e9c35aaaa8254b8a8b1a38bd3aa9
2021-08-10 11:07:16 -07:00
0c39cea3d2 [sharded_tensor] add default fields to ShardedTensorMetadata (#62867)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62867

This add default fields for ShardedTensorMetadata, to allow easy construction and modification afterwards.
ghstack-source-id: 135284133

Test Plan: ShardedTensorMetadata validity should be guarded with `init_from_local_shards` API and its tests.

Reviewed By: pritamdamania87

Differential Revision: D30148481

fbshipit-source-id: 0d99f41f23dbeb4201a36109556ba23b9a6c6fb1
2021-08-10 11:00:01 -07:00
5fb79f61a8 [DDP] Dont set thread local state in reducer autograd hook. (#62996)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62996

No need to set this because the autograd engine already propagates TLS
states.
ghstack-source-id: 135438220

Test Plan: CI

Reviewed By: albanD

Differential Revision: D30202078

fbshipit-source-id: e5e917269a03afd7a6b8e61f28b45cdb71ac3e64
2021-08-10 10:50:16 -07:00
6915bc0781 [typing] suppress errors in fbcode/caffe2 - batch 2
Test Plan: Sandcastle

Differential Revision: D30222378

fbshipit-source-id: 6a0a5d210266f19de63273240a080365c9143eb0
2021-08-10 10:26:52 -07:00
ea808df25d Test shape analysis with opinfos (#59814)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59814

Using OpInfos to test shape analysis. By default, we just check that we don't give incorrect answers; if `assert_jit_shape_analysis` is true, we test that we correctly propagate the full shape. And it found a couple of bugs 😃

Test Plan: Imported from OSS

Reviewed By: Krovatkin

Differential Revision: D30200058

Pulled By: eellison

fbshipit-source-id: 6226be87f5390277cfa5a1fffaa1b072d4bc8803
2021-08-10 09:47:33 -07:00
7312bd953c add support for a few more opinfos in jit (#59812)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59812

This is sort of a half measure: we can successfully trace through OpInfos that are registered as lambdas, we just can't script them. This change tests whether the op is a lambda, in which case it bails... see the next PR to get resize_ to work; maybe this should be consolidated with that...

Test Plan: Imported from OSS

Reviewed By: pbelevich, zhxchen17

Differential Revision: D30200061

Pulled By: eellison

fbshipit-source-id: 7e3c9b0be746b16f0f57ece49f6fbe20bf6535ec
2021-08-10 09:47:32 -07:00
9cbdc90d73 Don't substitute in symbolic shapes to shape compute graph (#59811)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59811

We don't want to actually substitute in symbolic shapes, because it invalidates the partially evaluated graph for further use.

Test Plan: Imported from OSS

Reviewed By: pbelevich, zhxchen17

Differential Revision: D30200059

Pulled By: eellison

fbshipit-source-id: 267ed97d8421fe480dec494cdf0dec9cf9ed3ba2
2021-08-10 09:47:30 -07:00
7db0bcfb40 small cleanups (#59810)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59810

Rephrasings and cleanup of dead code

Test Plan: Imported from OSS

Reviewed By: pbelevich, zhxchen17

Differential Revision: D30200062

Pulled By: eellison

fbshipit-source-id: b03e5adb928aa46bee6685667cad43333b6e6016
2021-08-10 09:47:28 -07:00
9cd990de0d Only optimize after change (redo) (#59809)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59809

Somehow this didn't get landed previously due to a ghstack mixup

Test Plan: Imported from OSS

Reviewed By: pbelevich, zhxchen17

Differential Revision: D30200060

Pulled By: eellison

fbshipit-source-id: 47f256421a1fe1a005cd11fcc4d7f023b5990834
2021-08-10 09:46:13 -07:00
4c630773e8 [jit] warn if _check_overload_body fails to find source
Summary:
Under certain conditions (particularly if a module is frozen, like with
PyInstaller or torch::deploy), we will not have source code available for
functions. `import torch` should still work in this case, but this check is
currently causing it to raise an exception.

Since this is an initial check (if an overload is actually exercised there will
be hard failure), raise a warning and move on.

Test Plan: unit tests

Reviewed By: eellison

Differential Revision: D30214271

fbshipit-source-id: eb021503e416268e8585e0708d6271c1e7b91e95
2021-08-10 09:28:50 -07:00
aa89d5f7f6 [quant] Update get_default_qat_qconfig to return the fused observer+fake_quant module (#62702)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62702

Expose the qconfig to the user to speed up training by leveraging the fused module.
The module currently supports per-tensor/per-channel moving avg observer and fake-quantize.
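A minimal usage sketch, assuming the fused variant is selected through a `version` argument of `get_default_qat_qconfig` (the model below is a stand-in):

```python
import torch.nn as nn
from torch.quantization import get_default_qat_qconfig, prepare_qat

model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU()).train()
# assumption: version=1 selects the fused observer + fake-quant module
model.qconfig = get_default_qat_qconfig("fbgemm", version=1)
prepare_qat(model, inplace=True)
print(model)  # the conv's activation observer should be the fused module
```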

For details on perf benefits, refer to https://github.com/pytorch/pytorch/pull/61691

Test Plan: Imported from OSS

Reviewed By: raghuramank100

Differential Revision: D30093719

fbshipit-source-id: b78deb7810f5b597474b9b9a0395d361d04eb46a
2021-08-10 09:28:49 -07:00
08d1a12d69 [quant] add reduce_range option to FusedMovingAvgFakeQuantize module (#62863)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62863

To make this consistent with other observers, add a reduce_range option that can be used to update quant_min/quant_max

Test Plan:
python test/test_quantization.py test_fused_mod_reduce_range

Imported from OSS

Reviewed By: raghuramank100

Differential Revision: D30146602

fbshipit-source-id: a2015f095766f9c884611e9ab6942528bc9bc972
2021-08-10 09:27:01 -07:00
978490d7c7 Codegen: Fix operator::name on windows (#62278)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62278

In `Operators.h` we're using `str(BaseOperatorName)`, while in
`OperatorsEverything.cpp` we're using `str(OperatorName)`. e.g.
```
STATIC_CONSTEXPR_STR_INL_EXCEPT_WIN_CUDA(name, "aten::abs")
```
vs
```
STATIC_CONST_STR_OUT_OF_LINE_FOR_WIN_CUDA(abs_out, name, "aten::abs.out")
```

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D29962047

Pulled By: albanD

fbshipit-source-id: 5a05b898fc734a4751c2b0187e4eeea4efb0502b
2021-08-10 07:58:09 -07:00
cdf702b60c Reject kwonly arguments passed positionally in torch.ops (#62981)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62981

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
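
A sketch of the behavior change, using `aten::sum.dim_IntList` (whose `dtype` argument sits after `*` in the schema) as an example:

```python
import torch

t = torch.randn(2, 3)
# keyword-only arguments must now be passed by keyword through torch.ops
out = torch.ops.aten.sum(t, [0], False, dtype=torch.float64)  # OK
# torch.ops.aten.sum(t, [0], False, torch.float64)  # now raises instead
```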

Test Plan: Imported from OSS

Reviewed By: Chillee

Differential Revision: D30211030

Pulled By: ezyang

fbshipit-source-id: aae426592e92bf3a50076f470e153a4ae7d6f101
2021-08-10 07:16:00 -07:00
9e7b6bb69f Allow LocalResponseNorm to accept 0 dim batch sizes (#62801)
Summary:
This issue fixes a part of https://github.com/pytorch/pytorch/issues/12013, which is summarized concretely in  https://github.com/pytorch/pytorch/issues/38115.

This PR allows `LocalResponseNorm` to accept tensors with 0 dimensional batch size.
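A quick sketch of the newly accepted input (the shape is illustrative):

```python
import torch
import torch.nn as nn

m = nn.LocalResponseNorm(size=2)
x = torch.empty(0, 4, 8, 8)  # zero-sized batch dimension
print(m(x).shape)            # torch.Size([0, 4, 8, 8]) rather than an error
```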

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62801

Reviewed By: zou3519

Differential Revision: D30165282

Pulled By: jbschlosser

fbshipit-source-id: cce0b2d12dbf47dc8ed6247c267bf2f2305f858a
2021-08-10 06:54:52 -07:00
061062ae2a Update TensorPipe submodule
Test Plan: CI ran as part of https://github.com/pytorch/pytorch/pull/60938.

Reviewed By: beauby

Differential Revision: D30219343

fbshipit-source-id: 531338f912fee488d312d23da8bda63ceb862aa9
2021-08-10 05:46:12 -07:00
3df4870343 [Reland][DDP] Support not all outputs used in loss calculation (#61753)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61753

Reland of https://github.com/pytorch/pytorch/pull/57081.
Main difference is that the former diff moved `prepare_for_backward` check into `DDPSink` backward, but that resulted in issues due to potential autograd engine races. The original diff moved `prepare_for_backward` into `DDPSink` as part of a long-term plan to always call it within `DDPSink`.

In particular this doesn't work because `prepare_for_backward` sets `expect_autograd_hooks=true` which enables autograd hooks to fire, but there were several use cases internally where autograd hooks were called before DDPSink called `prepare_for_backward`, resulting in errors/regression.

We instead keep the call to `prepare_for_backward` in the forward pass, but still run outputs through `DDPSink` when find_unused_parameters=True. As a result, outputs that are not used when computing loss have `None` gradients, and we don't touch them if they are globally `None`. Note that the hooks still fire with an undefined gradient, which is how we avoid the Reducer erroring out with the message that some hooks did not fire.

Added the unittests that were part of the reverted diff.
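A sketch of the now-supported pattern, in the spirit of those unittests (a single-process gloo group is used purely for illustration; the grad-stays-None behavior follows the description above):

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

class Net(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.used = torch.nn.Linear(4, 4)
        self.skipped = torch.nn.Linear(4, 4)

    def forward(self, x):
        return self.used(x), self.skipped(x)

model = DDP(Net(), find_unused_parameters=True)
out, _ignored = model(torch.randn(2, 4))
out.sum().backward()                     # the loss ignores the second output
print(model.module.skipped.weight.grad)  # None, and no Reducer error
dist.destroy_process_group()
```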
ghstack-source-id: 135388925

Test Plan: CI

Reviewed By: zhaojuanmao

Differential Revision: D29726179

fbshipit-source-id: 54c8819e0aa72c61554104723a5b9c936501e719
2021-08-09 22:29:11 -07:00
5ed6e4429e To fix variance computation for complex Adam (#62946)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/59998

It has been discussed in the issue that the variance term of the Adam optimizer is currently not computed correctly for the complex domain. As stated in the Generalization to Complex numbers section of https://en.wikipedia.org/wiki/Variance, the variance of a complex random variable X is computed as E[(X - mu)(X - mu)*], where mu = E[X] and * stands for the conjugate.

However, the Adam implementation currently computes it via E[(X - mu)(X - mu)], which doesn't return the right variance value; in particular, it returns a complex number. Variance is defined to be a real number even when the underlying random variable is complex.

We fix this issue here, and test that the resulting variance is indeed a real number.
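A sketch of the corrected second-moment update (the variable names mirror the optimizer's conventions, but this is not the actual implementation):

```python
import torch

beta2 = 0.999
grad = torch.randn(3, dtype=torch.cfloat)
exp_avg_sq = torch.zeros(3)  # the running second moment stays real-valued

# E[(X - mu)(X - mu)*]: multiplying by the conjugate keeps the estimate real
exp_avg_sq = beta2 * exp_avg_sq + (1 - beta2) * (grad * grad.conj()).real
print(exp_avg_sq.dtype)  # torch.float32, not a complex dtype
```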

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62946

Reviewed By: albanD

Differential Revision: D30196038

Pulled By: iramazanli

fbshipit-source-id: ab0a6f31658aeb56bdcb211ff86eaa29f3f0d718
2021-08-09 17:54:43 -07:00
3c1d1170a4 [quant][graphmode][fx] Attach a weight qparam dict to linear and conv in reference quantized model (#62488)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62488

Instead of attaching weight observer/fake_quant to the float linear and conv, we can
compute the quantization parameters and attach that as a dictionary to these modules so
that we can reduce the model size and make the reference module clearer

TODO: the numerics for linear and conv in the reference quantized model are still not correct since
we did not quantize the weight; we may explore things like parameterization to implement this support

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D30053979

fbshipit-source-id: b5f8497cf6cf65eec924df2d8fb10a9e154b8cab
2021-08-09 16:55:14 -07:00
59ac451ba3 Simplify the logic of running ci workflow codegen (#62853)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62853

Wanted to simplify the logic in `__post_init__` and delegate the settings back to individual workflows. This gives us more flexibility in changing individual workflows, as well as reducing the complexity of understanding the mutation conditions.

Test Plan: Imported from OSS

Reviewed By: walterddr, seemethere

Differential Revision: D30149190

Pulled By: zhouzhuojie

fbshipit-source-id: 44df5b1e14184f3a81cb8004151525d0e0fb20d9
2021-08-09 16:47:46 -07:00
8720369a48 irange-ify 12b (#62484)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62484

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D30015528

fbshipit-source-id: c4e1a5425a73f100102a97dcec1579f1049c9c1d
2021-08-09 16:40:47 -07:00
93e0f3a330 Shard Operators.cpp (#62185)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62185

This file can take 5 minutes on its own to compile, and is the single limiting
factor for compile time of `libtorch_cpu` on a 32-core threadripper. Instead,
sharding into 5 files that take around 1 minute each cuts a full minute off the
overall build time.

This also factors out the `.findSchemaOrThrow(...).typed` step so the code can
be shared between `call` and `redispatch`.

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D29962049

Pulled By: albanD

fbshipit-source-id: be5df05fbea09ada0d825855f1618c25a11abbd8
2021-08-09 16:19:49 -07:00
4b9ca72c7c irange-ify 13d (#62477)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62477

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D30001499

fbshipit-source-id: 993eb2b39f332ff0ae6c663792bd04734cfc262b
2021-08-09 16:16:58 -07:00
d16587f84d Enable rebuilds for Ninja on Windows (#62948)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/59859.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62948

Reviewed By: seemethere, tktrungna

Differential Revision: D30192246

Pulled By: janeyx99

fbshipit-source-id: af25cc4bf0db67a1304d9971cfa0ff6831bb3b48
2021-08-09 16:15:45 -07:00
a82b9ef1ff BFP16 quantization/dequantization (#62974)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62974

Testing the functionality of the `tensor.to` approach.
Comparing the `tensor.to` and `torch.ops.fb.FloatToBfloat16Quantized` approaches and testing whether they match for 2D tensors.
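`torch.ops.fb.FloatToBfloat16Quantized` is an internal op, but the `tensor.to` side of the comparison can be sketched in OSS PyTorch:

```python
import torch

x = torch.randn(4, 8)
x_bf16 = x.to(torch.bfloat16)      # "quantize": 8 exponent / 7 mantissa bits
x_back = x_bf16.to(torch.float32)  # "dequantize"
print((x - x_back).abs().max())    # error bounded by bfloat16 rounding
```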

Test Plan: buck test //torchrec/fb/distributed/tests:test_quantized_comms

Reviewed By: wanchaol

Differential Revision: D30079121

fbshipit-source-id: 612e92baeb2245449637faa9bc31686353d67033
2021-08-09 15:47:07 -07:00
c4aeecac75 Migrate Embedding thrust sort to cub sort (#62495)
Summary:
This PR only migrates sort. Other thrust operations will be migrated in followup PRs

Benchmark `num_embeddings` pulled from https://github.com/huggingface/transformers/tree/master/examples by
```
grep -P 'vocab_size.*(=|:)\s*[0-9]+' -r transformers/examples/
grep -P 'hidden_size.*(=|:)\s*[0-9]+' -r transformers/examples/
```
to get `vocab_size = 119547, 50265, 32000, 8000, 3052` (similar size omitted) and `hidden_size = 512, 768`

Code:
```python
import torch
import itertools

num_embeddings = (119547, 50265, 32000, 8000, 3052)
num_tokens = (4096, 16384)
hidden_sizes = (512, 768)

for ne, nt, nh in itertools.product(num_embeddings, num_tokens, hidden_sizes):
    print(f"Embedding size: {ne}, Tokens: {nt}, Hidden size: {nh}")
    embedding = torch.nn.Embedding(ne, nh).cuda()
    input_ = torch.randint(ne, (nt,), device='cuda')
    out = embedding(input_)
    torch.cuda.synchronize()
    %timeit out.backward(out, retain_graph=True); torch.cuda.synchronize()
```

## On CUDA 11.3.1

Before:
```
Embedding size: 119547, Tokens: 4096, Hidden size: 512
1.43 ms ± 11.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 119547, Tokens: 4096, Hidden size: 768
2.07 ms ± 56.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Embedding size: 119547, Tokens: 16384, Hidden size: 512
1.61 ms ± 2.29 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 119547, Tokens: 16384, Hidden size: 768
2.32 ms ± 8.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Embedding size: 50265, Tokens: 4096, Hidden size: 512
738 µs ± 1.38 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 50265, Tokens: 4096, Hidden size: 768
1.02 ms ± 1.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 50265, Tokens: 16384, Hidden size: 512
913 µs ± 3.89 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 50265, Tokens: 16384, Hidden size: 768
1.27 ms ± 1.09 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 32000, Tokens: 4096, Hidden size: 512
559 µs ± 860 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 32000, Tokens: 4096, Hidden size: 768
743 µs ± 630 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 32000, Tokens: 16384, Hidden size: 512
713 µs ± 969 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 32000, Tokens: 16384, Hidden size: 768
977 µs ± 884 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 8000, Tokens: 4096, Hidden size: 512
301 µs ± 8.02 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 8000, Tokens: 4096, Hidden size: 768
383 µs ± 4.36 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 8000, Tokens: 16384, Hidden size: 512
409 µs ± 1.39 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 8000, Tokens: 16384, Hidden size: 768
515 µs ± 766 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 3052, Tokens: 4096, Hidden size: 512
215 µs ± 1.16 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 3052, Tokens: 4096, Hidden size: 768
250 µs ± 320 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 3052, Tokens: 16384, Hidden size: 512
271 µs ± 888 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 3052, Tokens: 16384, Hidden size: 768
325 µs ± 1.14 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```

After:
```
Embedding size: 119547, Tokens: 4096, Hidden size: 512
1.42 ms ± 1.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 119547, Tokens: 4096, Hidden size: 768
2.05 ms ± 9.93 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Embedding size: 119547, Tokens: 16384, Hidden size: 512
1.6 ms ± 3.19 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 119547, Tokens: 16384, Hidden size: 768
2.3 ms ± 3.67 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Embedding size: 50265, Tokens: 4096, Hidden size: 512
730 µs ± 811 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 50265, Tokens: 4096, Hidden size: 768
1.01 ms ± 2.71 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 50265, Tokens: 16384, Hidden size: 512
887 µs ± 1.08 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 50265, Tokens: 16384, Hidden size: 768
1.25 ms ± 2.74 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 32000, Tokens: 4096, Hidden size: 512
556 µs ± 1.86 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 32000, Tokens: 4096, Hidden size: 768
744 µs ± 4.44 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 32000, Tokens: 16384, Hidden size: 512
691 µs ± 570 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 32000, Tokens: 16384, Hidden size: 768
957 µs ± 2.02 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 8000, Tokens: 4096, Hidden size: 512
309 µs ± 2.84 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 8000, Tokens: 4096, Hidden size: 768
376 µs ± 2.18 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 8000, Tokens: 16384, Hidden size: 512
381 µs ± 1.49 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 8000, Tokens: 16384, Hidden size: 768
487 µs ± 2.42 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 3052, Tokens: 4096, Hidden size: 512
202 µs ± 383 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 3052, Tokens: 4096, Hidden size: 768
239 µs ± 1.05 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 3052, Tokens: 16384, Hidden size: 512
243 µs ± 1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 3052, Tokens: 16384, Hidden size: 768
340 µs ± 2.28 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```

## On CUDA 11.1

Before:
```
Embedding size: 119547, Tokens: 4096, Hidden size: 512
1.41 ms ± 14.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 119547, Tokens: 4096, Hidden size: 768
2.05 ms ± 7.61 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Embedding size: 119547, Tokens: 16384, Hidden size: 512
1.61 ms ± 1.95 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 119547, Tokens: 16384, Hidden size: 768
2.32 ms ± 2.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Embedding size: 50265, Tokens: 4096, Hidden size: 512
743 µs ± 1.03 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 50265, Tokens: 4096, Hidden size: 768
1.02 ms ± 2.16 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 50265, Tokens: 16384, Hidden size: 512
912 µs ± 5.91 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 50265, Tokens: 16384, Hidden size: 768
1.28 ms ± 6.17 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 32000, Tokens: 4096, Hidden size: 512
555 µs ± 2.61 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 32000, Tokens: 4096, Hidden size: 768
743 µs ± 655 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 32000, Tokens: 16384, Hidden size: 512
714 µs ± 1.89 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 32000, Tokens: 16384, Hidden size: 768
980 µs ± 1.52 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 8000, Tokens: 4096, Hidden size: 512
312 µs ± 396 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 8000, Tokens: 4096, Hidden size: 768
386 µs ± 2.32 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 8000, Tokens: 16384, Hidden size: 512
413 µs ± 3.19 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 8000, Tokens: 16384, Hidden size: 768
512 µs ± 1.03 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 3052, Tokens: 4096, Hidden size: 512
209 µs ± 585 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 3052, Tokens: 4096, Hidden size: 768
271 µs ± 776 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 3052, Tokens: 16384, Hidden size: 512
297 µs ± 1.11 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 3052, Tokens: 16384, Hidden size: 768
377 µs ± 3.87 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```

After:
```
Embedding size: 119547, Tokens: 4096, Hidden size: 512
1.46 ms ± 12 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 119547, Tokens: 4096, Hidden size: 768
2.09 ms ± 4.31 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Embedding size: 119547, Tokens: 16384, Hidden size: 512
1.64 ms ± 4.48 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 119547, Tokens: 16384, Hidden size: 768
2.35 ms ± 2.54 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Embedding size: 50265, Tokens: 4096, Hidden size: 512
782 µs ± 2.12 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 50265, Tokens: 4096, Hidden size: 768
1.06 ms ± 596 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 50265, Tokens: 16384, Hidden size: 512
945 µs ± 2.19 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 50265, Tokens: 16384, Hidden size: 768
1.31 ms ± 553 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 32000, Tokens: 4096, Hidden size: 512
603 µs ± 856 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 32000, Tokens: 4096, Hidden size: 768
789 µs ± 500 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 32000, Tokens: 16384, Hidden size: 512
752 µs ± 7.56 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 32000, Tokens: 16384, Hidden size: 768
1.01 ms ± 4.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 8000, Tokens: 4096, Hidden size: 512
323 µs ± 7.23 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 8000, Tokens: 4096, Hidden size: 768
398 µs ± 765 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 8000, Tokens: 16384, Hidden size: 512
412 µs ± 544 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 8000, Tokens: 16384, Hidden size: 768
519 µs ± 614 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 3052, Tokens: 4096, Hidden size: 512
229 µs ± 1.17 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 3052, Tokens: 4096, Hidden size: 768
263 µs ± 417 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 3052, Tokens: 16384, Hidden size: 512
274 µs ± 576 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Embedding size: 3052, Tokens: 16384, Hidden size: 768
354 µs ± 1.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62495

Reviewed By: gchanan

Differential Revision: D30176833

Pulled By: ngimel

fbshipit-source-id: 44148ebb53a0abfc1e5ab8b986865555bf326ad1
2021-08-09 15:31:55 -07:00
084e92bb76 Use output memory format based on input for cudnn_convolution_relu (#62482)
Summary:
Currently when cudnn_convolution_relu is passed a channels last Tensor it will return a contiguous Tensor. This PR changes this behavior and bases the output format on the input format.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62482

Reviewed By: ngimel

Differential Revision: D30049905

Pulled By: cpuhrsch

fbshipit-source-id: 98521d14ee03466e7128a1912b9f754ffe10b448
2021-08-09 15:31:53 -07:00
4fdb9579fa irange-ify 12 (#62120)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62120

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D29879713

fbshipit-source-id: 3084a5eacb722f7fb0a630d47bf694f4d6831136
2021-08-09 15:31:51 -07:00
da9958c899 irange-ify 1 (#62193)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62193

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D29879504

fbshipit-source-id: adc86adcd1e7dcdfa2d7adf4d576f081430d52ec
2021-08-09 15:30:43 -07:00
161fb31893 Fix render_test_results if condition on always() (#62997)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62997

Fixes #62979; changed the condition to listen on the previous
job's result being either 'success' or 'failure'.

Notice that 'skipped' will also skip this job, which is what
we want.

Test Plan: Imported from OSS

Reviewed By: driazati, seemethere

Differential Revision: D30202598

Pulled By: zhouzhuojie

fbshipit-source-id: f3c0f715c39a5c8119b528b66e45f594a54b49d1
2021-08-09 15:27:40 -07:00
39ec1da935 [reland] Gate DistributedOptimizers on RPC availability (#62937)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62937

Reland after a Windows + CUDA failure; fixed by running it on gloo on Windows even with CUDA.
ghstack-source-id: 135306176

Test Plan: ci

Reviewed By: mrshenli

Differential Revision: D30177734

fbshipit-source-id: 7625746984c8f858648c1b3632394b98bd4518d2
2021-08-09 14:41:06 -07:00
5b8389e536 irange-ify 8d (#62505)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62505

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D29971891

fbshipit-source-id: 7dcbe27221788695f320c7238f5fe81e32823802
2021-08-09 13:18:38 -07:00
6286d33878 [fx] store Tracer class on Graph and GraphModule for package deserialization (#62497)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62497

Previously named: add support for custom tracer in __reduce_package__

Stores a Tracer class on a Graph created by Tracer, and copies the Tracer class into the GraphModule's state so that when a GraphModule is packaged by torch package, it can be reconstructed with the same Tracer and GraphModule class name.

Reviewed By: suo

Differential Revision: D30019214

fbshipit-source-id: eca09424ad30feb93524d481268b066ea55b892a
2021-08-09 13:07:30 -07:00
f82d4b8957 Mark unused functions with C10_UNUSED (#62929)
Summary:
Which fixes number of warnings

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62929

Reviewed By: walterddr, albanD

Differential Revision: D30171953

Pulled By: malfet

fbshipit-source-id: f82475289ff4aebb0c97794114e94a24d00d2ff4
2021-08-09 13:00:33 -07:00
08f6bc1da6 Stop exporting symbols in anonymous namespaces (#62952)
Summary:
These cases were found by compiling with clang on Windows.
Without this change, those functions would still be exported, which wastes space in the symbol table.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62952

Reviewed By: gchanan

Differential Revision: D30191291

Pulled By: ezyang

fbshipit-source-id: 3319b0ec4f5fb02e0fe1b81dbbcedcf12a0c795e
2021-08-09 12:52:12 -07:00
3dcd785cac [Static Runtime] Add tests for all aten ops (#62347)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62347

This diff includes tests for all `aten` ops that did not already have test coverage.

Test Plan: `buck test //caffe2/benchmarks/static_runtime/static_runtime:static_runtime_cpptest`

Reviewed By: hlu1

Differential Revision: D29968280

fbshipit-source-id: 768655ca535f9e37422711673168dce193de45d2
2021-08-09 12:09:59 -07:00
a01f832329 handle get_attr opearations in typechecker (#62682)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62682

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D30107789

Pulled By: migeed-z

fbshipit-source-id: 0b21b2893e2dc7cfaf5b5f5990f662e051a981b4
2021-08-09 11:49:04 -07:00
3eeaffc7c5 Linker version script to hide LLVM symbols (#62906)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62906

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D30193893

Pulled By: bertmaher

fbshipit-source-id: 9b189bfd8d4c52e8dc4296a4bed517ff44994ba0
2021-08-09 11:26:02 -07:00
1b1f1e36b4 Add `allow_empty_param_list` to functional optimizers (#62522)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62522

Addresses https://github.com/pytorch/pytorch/issues/62481

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D30072074

Pulled By: andwgu

fbshipit-source-id: 1a5da21f9636b8d74a6b00c0f029427f0edff0e3
2021-08-09 11:18:56 -07:00
710c419f11 [Vulkan] Added Hardshrink op (#62870)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62870

Added Hardshrink operator for Vulkan
Added tests for Hardshrink op

Reference: [Hardshrink](https://pytorch.org/docs/stable/generated/torch.nn.Hardshrink.html#torch.nn.Hardshrink)
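
For reference, the operator's semantics on any backend (the values here are illustrative):

```python
import torch

m = torch.nn.Hardshrink(lambd=0.5)
x = torch.tensor([-1.0, -0.2, 0.0, 0.3, 2.0])
print(m(x))  # tensor([-1., 0., 0., 0., 2.]) -- entries with |x| <= 0.5 zeroed
```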

Test Plan: Imported from OSS

Reviewed By: SS-JIA

Differential Revision: D30174950

Pulled By: beback4u

fbshipit-source-id: 3e192390eb9f92abecae966e84bbfae356bfd7c8
2021-08-09 10:54:11 -07:00
922710f9b9 Change output node handling for typechecker to deal with tuples (#62582)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62582

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D30050004

Pulled By: migeed-z

fbshipit-source-id: 9b81b10d24e1e8165cdc18c820ea314349b463cb
2021-08-09 10:47:12 -07:00
e55f271859 __torch_dispatch__: Populate kwargs dictionary with keyword-only arguments (#62822)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62822

This is BC breaking for people who were using the old integration,
although only if you had been writing bindings for functions with
keyword-only arguments (that includes functorch).  Other than that,
the patch was pretty straightforward.
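
A minimal sketch of what lands in `kwargs` now (real wrapper subclasses typically unwrap with `tree_map`; the probe class below is only for illustration):

```python
import torch

class KwargsProbe(torch.Tensor):
    @staticmethod
    def __new__(cls, t):
        return torch.Tensor._make_subclass(cls, t)

    @classmethod
    def __torch_dispatch__(cls, func, types, args=(), kwargs=None):
        print(func, "kwargs:", kwargs)  # keyword-only args arrive here now
        # unwrap to plain tensors before redispatching to avoid recursion
        plain = [a.as_subclass(torch.Tensor) if isinstance(a, KwargsProbe)
                 else a for a in args]
        return func(*plain, **(kwargs or {}))

x = KwargsProbe(torch.randn(2, 3))
x.sum(dim=[0], dtype=torch.float64)  # dtype shows up in the kwargs dict
```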

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D30134552

Pulled By: ezyang

fbshipit-source-id: a47f536fb030994a07c9386069b8f800ac86d731
2021-08-09 10:02:54 -07:00
2b83007ae2 Modify GHA CI to use PYTORCH_IGNORE_DISABLED_ISSUES based on PR body (#62851)
Summary:
Another step forward in fixing https://github.com/pytorch/pytorch/issues/62359

Disclaimer: this only works with GHA for now, as circleci would require changes in probot.

The test plan can be seen in a previous description, where I modified the description to include linked issues. I've removed them now since the actual PR doesn't fix any of them.

It works! In the [periodic 11.3 test1](https://github.com/pytorch/pytorch/pull/62851/checks?check_run_id=3263109970), we get this in the logs and we see that PYTORCH_IGNORE_DISABLED_ISSUES is properly set:
```
  test_jit_cuda_extension (__main__.TestCppExtensionJIT) ... Using /var/lib/jenkins/.cache/torch_extensions/py36_cu113 as PyTorch extensions root...
Creating extension directory /var/lib/jenkins/.cache/torch_extensions/py36_cu113/torch_test_cuda_extension...
Detected CUDA files, patching ldflags
Emitting ninja build file /var/lib/jenkins/.cache/torch_extensions/py36_cu113/torch_test_cuda_extension/build.ninja...
Building extension module torch_test_cuda_extension...
Using envvar MAX_JOBS (30) as the number of workers...
[1/3] c++ -MMD -MF cuda_extension.o.d -DTORCH_EXTENSION_NAME=torch_test_cuda_extension -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /opt/conda/lib/python3.6/site-packages/torch/include -isystem /opt/conda/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/lib/python3.6/site-packages/torch/include/TH -isystem /opt/conda/lib/python3.6/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/conda/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=1 -fPIC -std=c++14 -c /var/lib/jenkins/workspace/test/cpp_extensions/cuda_extension.cpp -o cuda_extension.o
[2/3] /usr/local/cuda/bin/nvcc  -DTORCH_EXTENSION_NAME=torch_test_cuda_extension -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /opt/conda/lib/python3.6/site-packages/torch/include -isystem /opt/conda/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/lib/python3.6/site-packages/torch/include/TH -isystem /opt/conda/lib/python3.6/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/conda/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=1 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_52,code=compute_52 -gencode=arch=compute_52,code=sm_52 --compiler-options '-fPIC' -O2 -std=c++14 -c /var/lib/jenkins/workspace/test/cpp_extensions/cuda_extension.cu -o cuda_extension.cuda.o
nvcc warning : The 'compute_35', 'compute_37', 'compute_50', 'sm_35', 'sm_37' and 'sm_50' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[3/3] c++ cuda_extension.o cuda_extension.cuda.o -shared -L/opt/conda/lib/python3.6/site-packages/torch/lib -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda_cu -ltorch_cuda_cpp -ltorch -ltorch_python -L/usr/local/cuda/lib64 -lcudart -o torch_test_cuda_extension.so
Loading extension module torch_test_cuda_extension...
ok (26.161s)
```

whereas on the latest master periodic 11.1 windows [test](https://github.com/pytorch/pytorch/runs/3263762478?check_suite_focus=true), we see
```
test_jit_cuda_extension (__main__.TestCppExtensionJIT) ... skip (0.000s)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62851

Reviewed By: walterddr, tktrungna

Differential Revision: D30192029

Pulled By: janeyx99

fbshipit-source-id: fd2ecc59d2b2bb5c31522a630dd805070d59f584
2021-08-09 09:48:56 -07:00
8b54b14f92 [Static Runtime] Added a cache for NNC generated code across different calls to the same ops (#62921)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62921

Added a cache for NNC generated code across different calls to the same ops.

Before this diff:
```
ProcessedNode time 13402.9 ms
Static Module initialization took 30964.8 ms
```

After this diff:
```
ProcessedNode time 85.4195 ms
Static Module initialization took 4348.42 ms
```

There is one global cache for all the ops. It is guarded with a reader-writer lock. This is necessary because we could have multiple threads loading different models in parallel. Note that this locking does not guarantee that code will be generated exactly once for each op: more than one thread may generate code for the same op simultaneously, and all of them will update the cache in some order. But that should be a small number, bounded by the number of threads. There is also no correctness issue, since the generated code is always the same; the version generated by the last thread is retained in the cache and reused later while running the model.
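The actual cache is C++ guarded by a reader-writer lock; a Python analogue of the same "racy read, last writer wins" scheme described above:

```python
import threading

_kernel_cache = {}
_cache_lock = threading.Lock()  # stand-in; Python has no stdlib RW lock

def lookup_or_generate(op_key, generate):
    # fast path: a lock-free read is safe because entries never change
    kernel = _kernel_cache.get(op_key)
    if kernel is None:
        kernel = generate(op_key)  # several threads may generate concurrently
        with _cache_lock:
            _kernel_cache[op_key] = kernel  # last writer wins; all identical
    return kernel
```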

Test Plan: Tested inline_cvr model

Reviewed By: hlu1

Differential Revision: D30104017

fbshipit-source-id: 32e9af43d7e724ed54b661dfe58a73a14e443ff7
2021-08-09 09:30:07 -07:00
3782f3eced Enable upper for torch.linalg.cholesky (#62434)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/61988
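A quick sketch of the new keyword (the SPD construction is illustrative):

```python
import torch

a = torch.randn(3, 3, dtype=torch.float64)
a = a @ a.T + 3 * torch.eye(3, dtype=torch.float64)  # symmetric positive-definite
l = torch.linalg.cholesky(a)              # default: lower-triangular factor
u = torch.linalg.cholesky(a, upper=True)  # upper-triangular, enabled here
assert torch.allclose(l @ l.T, u.T @ u)
```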

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62434

Reviewed By: seemethere, tktrungna

Differential Revision: D30079806

Pulled By: walterddr

fbshipit-source-id: 044efb96525155c9bc7953ac4ad47c1b7c12fb20
2021-08-09 09:28:33 -07:00
e54ee9bac1 [nnc] Updated IR cloning to create clones of expressions in addition to statements (#62833)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62833

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D30135980

Pulled By: navahgar

fbshipit-source-id: e557eedec7ecf596a4045756276d25a485fa66fb
2021-08-09 09:13:03 -07:00
5deeaab36a minor fixes in c10d for Windows (#62953)
Summary:
Found out by triggering builds against clang on Windows.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62953

Reviewed By: gchanan

Differential Revision: D30191300

Pulled By: ezyang

fbshipit-source-id: d929119768298084c41d70dbc3a78aacd64fb715
2021-08-09 09:05:09 -07:00
fff83f3f66 Add handling of list write to remove mutation (#62904)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62904

Test Plan: Imported from OSS

Reviewed By: SplitInfinity

Differential Revision: D30168493

Pulled By: eellison

fbshipit-source-id: 3b25982b235938cc7439dd3a5236dfce68254c05
2021-08-09 08:56:06 -07:00
254148ec7d Add tensor-scalar op (#62903)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62903

Test Plan: Imported from OSS

Reviewed By: pbelevich, SplitInfinity

Differential Revision: D30168338

Pulled By: eellison

fbshipit-source-id: 7dcb34ddd76c6aad4108a4073d3c8a93d974d0ef
2021-08-09 08:54:47 -07:00
4c4c5b14e4 Port sum.dim_IntList kernel to structured kernels. (#61642)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61642

Tracking issue: #55070

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D29783865

Pulled By: ezyang

fbshipit-source-id: 375d4cd5f915812108367601a610a428762e606d
2021-08-09 08:46:16 -07:00
c7db642a72 Adding collective quantization API (#62142)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62142

Created a wrapper that takes the collective op and a quantization type as arguments. It quantizes the input, performs the collective op, and then dequantizes the result.

Test Plan:
Tested through distributed_gloo_fork.
e.g., buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_all_to_all_quantized

Reviewed By: wanchaol

Differential Revision: D29682812

fbshipit-source-id: 79c39105ff11270008caa9f566361452fe82a92e
2021-08-09 08:11:22 -07:00
6ccedc7c1f Set mkl thread locally (#62891)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62891

Fixes #60469

We want to land this PR before the next release, so we are adopting the idea from raven38 in https://github.com/pytorch/pytorch/pull/60471 and adding a corresponding test to verify the result.

- Before this PR using this test:
![image](https://user-images.githubusercontent.com/68879799/128542334-1b899be5-2b6e-4c03-8ac0-568fb15470b8.png)
- After this PR the test passed without Error.

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D30161483

Pulled By: ejguan

fbshipit-source-id: 800f7204e0e1a19c492b2e556c92a91115f1b69b
2021-08-09 07:37:18 -07:00
30214aef2d [BE] irangefy (#62928)
Summary:
Replace raw for loops with `irange`-based loops. Also fix some unused-variable warnings in range-loop cases.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62928

Reviewed By: driazati

Differential Revision: D30171904

Pulled By: malfet

fbshipit-source-id: 1b437a0f7e3515f4a2e324f3450e93312f1933ae
2021-08-07 13:34:13 -07:00
9f7aba737b Make IMethod cache mutable so getArgument works on const IMethod (#62834)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62834

Test Plan: existing unit tests

Reviewed By: alanwaketan

Differential Revision: D30135939

fbshipit-source-id: e19c0ac1af6996e065a18318351265b5c4a01e70
2021-08-06 22:58:21 -07:00
b80dffd911 [TensorExpr] Remove more 'const' from IRVisitor methods for *Imm types. (#62932)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62932

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D30172961

Pulled By: ZolotukhinM

fbshipit-source-id: 9b7f45880d356f823364135fe29fc08f6565f827
2021-08-06 22:44:09 -07:00
b45cf9b81b Revert D30117838: [WIP] Gate DistributedOptimizers on RPC availability
Test Plan: revert-hammer

Differential Revision:
D30117838 (3f09485d7e)

Original commit changeset: e6365a910a3d

fbshipit-source-id: f276b2b2bdf5f7bd27df473fca0eebaee9f7aef2
2021-08-06 22:10:41 -07:00
e6a3154519 Allow broadcasting along non-reduction dimension for cosine similarity (#62912)
Summary:
Checks introduced by https://github.com/pytorch/pytorch/issues/58559 are too strict and disable correctly working cases that people were relying on.
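One of the re-enabled cases, sketched (the shapes are illustrative):

```python
import torch
import torch.nn.functional as F

a = torch.randn(2, 3, 5)
b = torch.randn(1, 3, 5)  # broadcasts along dim 0, a non-reduction dim
print(F.cosine_similarity(a, b, dim=1).shape)  # torch.Size([2, 5])
```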

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62912

Reviewed By: jbschlosser

Differential Revision: D30165827

Pulled By: ngimel

fbshipit-source-id: f9229a9fc70142fe08a42fbf2d18dae12f679646
2021-08-06 19:17:04 -07:00
6630d98ae5 Refactor codegen file sharding (#62184)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62184

File sharding is currently implemented twice, once for VariableType and once for
TraceType. This refactors the implementation into `FileManager` and also changes
it so template substitution is only done once and shared between the sharded
file and the "Everything" file.

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D29962050

Pulled By: albanD

fbshipit-source-id: 7858c3ca9f6e674ad036febd2d1a4ed2323a2861
2021-08-06 19:13:42 -07:00
44fad84bca [DDP] Add host-side time to CUDATimer (#62770)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62770

Adding timing of forward, backward comp, backward comm, etc will help
detect desynchronization issues.
ghstack-source-id: 135195680

Test Plan: CI

Reviewed By: SciPioneer

Differential Revision: D30115585

fbshipit-source-id: 509bf341c5c92dcc63bdacd3c1e414da4eb4f321
2021-08-06 18:41:40 -07:00
22e3cc21e5 Back out "Enable test_api IMethodTest in OSS" (#62893)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62893

Original commit changeset: 50eb3689cf84

Test Plan: Confirm pytorch_linux_xenial_cuda11_1_cudnn8_py3_gcc7_test2 passes in OSS

Reviewed By: seemethere, alanwaketan

Differential Revision: D30159999

fbshipit-source-id: 74ff8975328409a3dc8222d3e2707a1bb0ab930c
2021-08-06 16:43:50 -07:00
bbe2c8e6d2 Fix reshape for the Lazy key (#62846)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62846

Test Plan: CI

Reviewed By: zou3519

Differential Revision: D30162185

Pulled By: asuhan

fbshipit-source-id: d582dcef35ce7e8bebf161a5c93e470339891e29
2021-08-06 15:29:56 -07:00
6e24ce7a46 Revert D30138788: [pytorch][PR] OpInfo for adaptive_avg_pool2d
Test Plan: revert-hammer

Differential Revision:
D30138788 (5c431981b5)

Original commit changeset: 66735ceaa85b

fbshipit-source-id: 75eb241ef82d32d6480db069c035df0abc6753fe
2021-08-06 15:17:05 -07:00
d9154b9b26 [quant] Input-Weight Equalization - allow logical evaluation (#61603)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61603

Test Plan: Imported from OSS

Reviewed By: supriyar

Differential Revision: D29686878

fbshipit-source-id: 67ca4cab98b3d592ff2bb8db86499789b85bd582
2021-08-06 15:10:32 -07:00
43b087791c .github: Make sure to deep clone on windows (#62907)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62907

Deep clones allow us to use git commands on historical commits so that
we can do things like collect test times correctly

Should fix empty `.pytorch-test-times.json` files that walterddr was observing

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: janeyx99

Differential Revision: D30166414

Pulled By: seemethere

fbshipit-source-id: 1f9904eeb5a8ebaf0a02d1aa7291fffe1aecd57b
2021-08-06 15:06:56 -07:00
e3944ab00e Revert D30038175: Improve IMethod::getArgumentNames to deal with empty argument names list
Test Plan: revert-hammer

Differential Revision:
D30038175 (64b3ab6407)

Original commit changeset: 46f08dda9418

fbshipit-source-id: 604735d2300487a0b75890b330d7ba5b3e7145b2
2021-08-06 14:58:43 -07:00
7a3f1386ae Add GradBucket::parameters() to ddp_comm_hooks.rst (#62877)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62877

as title
ghstack-source-id: 135214612

Test Plan: N/A

Reviewed By: rohan-varma

Differential Revision: D30153490

fbshipit-source-id: d4cec434a53ef6e65b60c065804884d1a114aa0d
2021-08-06 14:50:47 -07:00
6d24a075cb Check contiguous to dispatch to NHWC cuda template (#62839)
Summary:
follow up of https://github.com/pytorch/pytorch/issues/62773

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62839

Reviewed By: H-Huang

Differential Revision: D30142906

Pulled By: ngimel

fbshipit-source-id: 600a7ad240a4a1827352eab8c8cbc98240d693f0
2021-08-06 14:11:10 -07:00
e6e579ce74 [FX] Add torch.memory_format as a BaseArgumentType (#62593)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/62498

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62593

Reviewed By: H-Huang

Differential Revision: D30104091

Pulled By: cpuhrsch

fbshipit-source-id: 25b7a4b308219860c969db54d7b1867b1aa4180a
2021-08-06 14:03:41 -07:00
97dc43beeb use test environment for test phase (#62824)
Summary:
Currently all tests generated in the test matrix share the same `BUILD_ENVIRONMENT` variable. We should distinguish them because some test scripts use `BUILD_ENVIRONMENT` to differentiate what to run.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62824

Reviewed By: zhouzhuojie

Differential Revision: D30162250

Pulled By: walterddr

fbshipit-source-id: 3a99a21e91e02ed8638feed102e7966af01dd175
2021-08-06 11:52:41 -07:00
786934902c Adds JOB_BASE_NAME to steps of CircleCI mac workflows (#62892)
Summary:
Upon noticing that we had a job entry named "None" in our S3 stats, I set out to find which test report had a JOB_BASE_NAME that wasn't set.

It turns out that all workflows other than Windows and Linux did not set JOB_BASE_NAME and instead used CIRCLE_JOB. This remedies the current issue by explicitly setting JOB_BASE_NAME in Mac workflows, but doesn't touch anything else, as the other jobs (like android) do not report test stats.

This also adds back the CIRCLE_JOB dependency in print_test_stats to be backwards compatible, but the goal is to move off the CIRCLE_JOB dependency to more CI-platform-agnostic variable naming.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62892

Test Plan:
Imported from GitHub, without a `Test Plan:` line.

{F639556801}
The "None" entry is now identified as the macOS job!

Reviewed By: walterddr

Differential Revision: D30160234

Pulled By: janeyx99

fbshipit-source-id: df868dec5f9b289d3837e927d2bb95acb2d9185b
2021-08-06 11:34:17 -07:00
c9b5d79d40 [hotfix] fix BC checker direction (#62901)
Summary:
Fixes the https://github.com/pytorch/pytorch/issues/62687 error: the checker should allow-list entries whose datetime is newer than today.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62901

Reviewed By: zhouzhuojie

Differential Revision: D30163202

Pulled By: walterddr

fbshipit-source-id: b882975a231249137cb2d252f41e98e133b6f337
2021-08-06 11:29:28 -07:00
59d09b148c BUG Fixes bug in no_batch_dim tests (#62726)
Summary:
The way that Python captures variables for lambdas meant that only the last `input_fn`, etc. were captured. This PR makes sure each local variable is captured by its lambda; see the sketch below.

REF: https://docs.python.org/3/faq/programming.html#why-do-lambdas-defined-in-a-loop-with-different-values-all-return-the-same-result
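
For illustration, a minimal sketch of the pitfall and the fix, using a toy loop rather than the actual test harness:

```
callbacks = [lambda: i for i in range(3)]
print([f() for f in callbacks])  # [2, 2, 2] -- every lambda sees the final i

# Binding the loop variable as a default argument captures its value at
# each iteration, which is the kind of fix this PR applies:
callbacks = [lambda i=i: i for i in range(3)]
print([f() for f in callbacks])  # [0, 1, 2]
```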

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62726

Reviewed By: zou3519

Differential Revision: D30159478

Pulled By: jbschlosser

fbshipit-source-id: cfef3d9776d2676b2f5bb6d39d569b8ca07b0fe5
2021-08-06 11:11:25 -07:00
a03604c610 Set JOB_BASE_NAME consistently for bazel (#62886)
Summary:
It was manually set incorrectly before to pytorch-linux-xenial-py3.6-gcc7-bazel-test-test, which is inconsistent with the rest of our naming scheme.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62886

Reviewed By: driazati

Differential Revision: D30159860

Pulled By: janeyx99

fbshipit-source-id: 4984ec04ee2bcf68b9a57e241ca9f979bfe6398a
2021-08-06 11:07:03 -07:00
3f09485d7e [WIP] Gate DistributedOptimizers on RPC availability (#62774)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62774

Gates DistributedOptimizer, which relies on RRef, based on whether RPC is available. This should enable ZeRO to work on Windows, as Windows should not try to import the DistributedOptimizer. If this works as expected, we can enable the Windows tests for functional/local SGD optimizers as well.
ghstack-source-id: 135216642

Test Plan: CI

Reviewed By: pbelevich

Differential Revision: D30117838

fbshipit-source-id: e6365a910a3d1ca40d95fa6777a7019c561957db
2021-08-06 10:59:00 -07:00
1dba329d20 Enable step_param for Adam functional optimizer (#62611)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62611

Enables optimizer overlap with backwards in DDP for Adam. Additional optimizers, especially Adagrad will be done in follow up diffs.

1. Implement a `step_param` method based on `step` in `_FunctionalAdam` (perf permitting, we can later dedupe `step` to call `step_param`); see the sketch below.
2. Modify tests to test all current functional optimizers.
ghstack-source-id: 135207143
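
As a rough illustration, here is a minimal sketch of what a per-parameter Adam step looks like; the names and signature are hypothetical, not the actual `_FunctionalAdam` code:

```
import torch

def step_param(param, grad, exp_avg, exp_avg_sq, step,
               lr=1e-3, betas=(0.9, 0.999), eps=1e-8):
    # Update biased first and second moment estimates in place.
    beta1, beta2 = betas
    exp_avg.mul_(beta1).add_(grad, alpha=1 - beta1)
    exp_avg_sq.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)
    # Bias correction, then the parameter update.
    bias1 = 1 - beta1 ** step
    bias2 = 1 - beta2 ** step
    denom = (exp_avg_sq / bias2).sqrt_().add_(eps)
    param.addcdiv_(exp_avg, denom, value=-lr / bias1)
```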

Test Plan: CI

Reviewed By: SciPioneer

Differential Revision: D29891783

fbshipit-source-id: 321915982afd5cb0a9c2e43d27550f433bff00d1
2021-08-06 10:53:55 -07:00
836b2431dc [quant] Input-Weight Equalization - selective equalization (#61916)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61916

Functions used to run selective equalization based on the SQNR obtained from running the Numeric Suite. After running the Numeric Suite between the equalized and float model, we will get the SQNR between the two models and construct an equalization_qconfig_dict that specifies to only equalize the layers with the highest quantization errors.

How to run:
```
layer_to_sqnr_dict = get_layer_sqnr_dict(float_model, equalized_model, input)
eq_qconfig_dict = get_equalization_qconfig_dict(layer_to_sqnr_dict, equalized_model, num_layers_to_equalize)

prepared = prepare_fx(float_model, qconfig_dict, eq_qconfig_dict)
...
```

Test Plan:
`python test/test_quantization.py TestEqualizeFx.test_selective_equalization`

Imported from OSS

Reviewed By: supriyar

Differential Revision: D29796950

fbshipit-source-id: 91f0f8427d751beaea32d8ffc2f3b8aa8ef7ea95
2021-08-06 09:29:03 -07:00
e6ef87001c [BF16] Add BF16 support to _aminmax and _aminmax_all operators (#62767)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62767

Add BF16 support to _aminmax_all and _aminmax operators.

Test Plan:
Added unit test:
https://www.internalfb.com/intern/testinfra/testconsole/testrun/2533274857208373/

Reviewed By: anjali411

Differential Revision: D30073837

fbshipit-source-id: 9cb4991e644cfdb2f0674ccaff161d223c174150
2021-08-06 08:56:12 -07:00
56ff996386 [vulkan] Add _reshape_alias (#62858)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62858

D29792126 (adb73d3dcf) changed the behaviour of `reshape()` such that it calls `_reshape_alias()` instead of `view()` in order to avoid duplicating some work such as computing strides.

Vulkan has not yet implemented `_reshape_alias()` so `reshape()` would fail with

```
C++ exception with description "Could not run 'aten::_reshape_alias' with arguments from the 'Vulkan' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions.
```

For Vulkan there is no concept of strides so it's fine to just have `_reshape_alias()` point to `view()`.

Test Plan:
```
cd ~/fbsource
buck build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 //xplat/caffe2:pt_vulkan_api_test_binAndroid\#android-arm64 --show-output
adb push buck-out/gen/xplat/caffe2/pt_vulkan_api_test_binAndroid\#android-arm64 /data/local/tmp/vulkan_api_test
adb shell "/data/local/tmp/vulkan_api_test"
cd -
```

Reviewed By: kimishpatel

Differential Revision: D30054706

fbshipit-source-id: 770979fa3a0f99bcc2ddaefa4674e5bd79b17c03
2021-08-06 08:44:15 -07:00
5f4207eb91 [vulkan] Throw an exception if device does not support Vulkan (#62859)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62859

If the Vulkan instance cannot be initialized successfully (i.e. no `vkPhysicalDevice` could be found due to missing drivers) then Vulkan ops will not be able to execute. However, currently `api::context()` which is used to access the global Vulkan context simply returns a null pointer if there is a problem initializing the Vulkan instance.

This leads to Segmentation Faults later on because Vulkan ops assume that `api::context()` will not return a `nullptr`. For instance: [this line](https://www.internalfb.com/code/fbsource/xplat/caffe2/aten/src/ATen/native/vulkan/ops/Persistent.cpp?lines=14) will frequently cause a Segmentation Fault when drivers are not present.

Instead of having `api::context()` returning a nullptr when Vulkan cannot be initialized, it should just throw an exception since ops cannot be executed anyway. This results in a more graceful failure as these exceptions can be caught instead of crashing the app with a Seg Fault down the line.

Test Plan:
```
cd ~/fbsource
buck build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 //xplat/caffe2:pt_vulkan_api_test_binAndroid\#android-arm64 --show-output
adb push buck-out/gen/xplat/caffe2/pt_vulkan_api_test_binAndroid\#android-arm64 /data/local/tmp/vulkan_api_test
adb shell "/data/local/tmp/vulkan_api_test"
cd -
```

On an Omni model portal, I can also remove the vulkan drivers in order to test the functionality when Vulkan is not supported.

Reviewed By: kimishpatel

Differential Revision: D30139891

fbshipit-source-id: 47fcc8dcd219cb78ab9bec0b6a85b2aa7320ab50
2021-08-06 08:42:26 -07:00
d3bdf345cb Introducing DataChunk for DataPipes batching (#62768)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62768

This is part of TorchArrow DF support preparation, separating it to multiple PRs to simplify review process.

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D30149090

Pulled By: VitalyFedyunin

fbshipit-source-id: a36b5ff56e2ac6b06060014d4cd41b487754acb8
2021-08-06 08:38:33 -07:00
5e5de75f4d Add getPyInterpreter() API (#62659)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62659

It turns out that it is occasionally useful to be able to access the
PyInterpreter object from other Python bindings (see next diff in the
stack).  Make it publicly available.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D30074926

Pulled By: ezyang

fbshipit-source-id: 2f745ab7c7a672ed7215231fdf9eef6af9705511
2021-08-06 08:23:24 -07:00
27135f86fd fix docstring default value of last_epoch for SWALR in torch/optim/… (#62799)
Summary:
…swa_utils

Fixes https://github.com/pytorch/pytorch/issues/62633

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62799

Reviewed By: zou3519

Differential Revision: D30131929

Pulled By: H-Huang

fbshipit-source-id: 741c077073bbe398492dff0761836acdbba7be78
2021-08-06 08:15:10 -07:00
9573e7a644 rename namespace f4d to velox (#61)
Summary:
Pull Request resolved: https://github.com/facebookexternal/torchdata/pull/61

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62860

Pull Request resolved: https://github.com/facebookexternal/presto_cpp/pull/453

Moving all namespace definitions, declarations, and references from 'f4d' to 'velox'

Test Plan:
```
buck build //f4d/...
buck test //f4d/...
```
Also monitor the signals from sandcastle

Reviewed By: pedroerp

Differential Revision: D30140136

fbshipit-source-id: 5b53ac768bb7e5cd07c93a9b04dfd6363080eb52
2021-08-05 21:04:36 -07:00
e1f81c9321 [torchelastic][multiprocessing] Print warning message only when child processes are stuck (#62823)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62823

The diff makes sure that the warning message is printed only when the child processes are stuck after sending termination code.

Test Plan:
sandcastle

    buck build mode/dev-nosan //caffe2:run
    buck-out/gen/caffe2/run.par --nnodes 1 --nproc_per_node 1 main.py
P435691445

Differential Revision: D30046695

fbshipit-source-id: c59170b297f4a0e530906fa5069234303deee938
2021-08-05 19:57:31 -07:00
f6c7081a16 Allow FractionalMaxPool 2D and 3D layers to accept 0 dim batch size tensors. (#62083)
Summary:
This issue fixes a part of https://github.com/pytorch/pytorch/issues/12013, which is summarized concretely in  https://github.com/pytorch/pytorch/issues/38115.

Allow `FractionalMaxPool` 2D and 3D layers to accept 0 dim batch sizes. Also make some minor corrections to error messages to make them more informative.
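
A small sketch of the newly accepted case (the output shape shown assumes the zero-sized batch dimension simply propagates):

```
import torch

m = torch.nn.FractionalMaxPool2d(kernel_size=2, output_size=(4, 4))
x = torch.randn(0, 3, 16, 16)  # zero-sized batch dimension
print(m(x).shape)              # torch.Size([0, 3, 4, 4])
```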

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62083

Reviewed By: H-Huang

Differential Revision: D30134461

Pulled By: jbschlosser

fbshipit-source-id: 0ec50875d36c2083a7f06d9ca6a110fb3ec4f2e2
2021-08-05 17:40:10 -07:00
8aa12cbf86 Add tutorial link (#62785)
Summary:
Addresses: https://github.com/pytorch/pytorch/pull/62605#discussion_r681380364

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62785

Test Plan: I checked the render, and the link redirects as desired.

Reviewed By: mrshenli

Differential Revision: D30133229

Pulled By: andwgu

fbshipit-source-id: baefe0d1f1b78ece44bb42e67629bc130dbf8e9a
2021-08-05 17:28:02 -07:00
64c54f92ca [opinfo] nn.functional.unfold (#62705)
Summary:
Reference: facebookresearch/functorch#78

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62705

Reviewed By: H-Huang

Differential Revision: D30138807

Pulled By: zou3519

fbshipit-source-id: 1d0b0e58feb13aec7b231c9f632a6d1694b9d272
2021-08-05 17:12:25 -07:00
9ac56ef0fc [DDP] log gradient ready order and bucket indices (#62751)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62751

This will help us determine whether gradient ready order and bucket indices are aligned amongst all the ranks. This should always be true for rank 0, as we determine the rebuilt bucket order by the gradient ready order on rank 0, but we would be interested to see this on different workloads for other ranks.
ghstack-source-id: 135104369

Test Plan: CI

Reviewed By: SciPioneer, wanchaol

Differential Revision: D30111833

fbshipit-source-id: a0ab38413a45022d953da76384800bee53cbcf9f
2021-08-05 16:36:25 -07:00
80091cb0f7 [DDP] Allow tuning of first bucket (#62748)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62748

Previously, after buckets were rebuilt, the first bucket size always defaulted to 1MB; this diff allows the first bucket to be tuned like the rest of the bucket sizes.

Setting `dist._DEFAULT_FIRST_BUCKET_BYTES = 1` results in the following logs as
expected:
I0804 12:31:47.592272 246736 reducer.cpp:1694] 3 buckets rebuilt with size
limits: 1, 1048, 1048 bytes.
ghstack-source-id: 135074696
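
A hedged usage sketch (`_DEFAULT_FIRST_BUCKET_BYTES` is an internal knob, so treat this as illustrative; it also assumes a process group has already been initialized):

```
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

dist._DEFAULT_FIRST_BUCKET_BYTES = 1  # default is 1 MiB
model = DDP(torch.nn.Linear(8, 8))    # rebuilt buckets honor the override
```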

Test Plan: CI

Reviewed By: SciPioneer, wanchaol

Differential Revision: D30110041

fbshipit-source-id: 96f76bec012de129d1645e7f50e266d4b255ec66
2021-08-05 16:35:07 -07:00
5c431981b5 OpInfo for adaptive_avg_pool2d (#62704)
Summary:
Please see https://github.com/facebookresearch/functorch/issues/78 and https://github.com/pytorch/pytorch/issues/54261.

Note regarding sample inputs for this function:

* Checks added for all relevant/interesting cases for `output_size`: `(None, None), (None, width), (height, None), (height, width)`.

cc: mruberry zou3519 Chillee

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62704

Reviewed By: H-Huang

Differential Revision: D30138788

Pulled By: zou3519

fbshipit-source-id: 66735ceaa85b9e6050d4ec27749fc3a8108cf557
2021-08-05 16:11:31 -07:00
eaaceea8d4 Bump protobuf version in CircleCI docker images (#62441)
Summary:
Needed to update ONNX to 1.10 (https://github.com/pytorch/pytorch/issues/62039) because that introduces uses
of the "reserved" protobuf feature.

Also:
* Remove protobuf install code from scripts where it was unused.
* Add `-j` flag to make commands to speed things up.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62441

Reviewed By: soulitzer

Differential Revision: D30072381

Pulled By: malfet

fbshipit-source-id: f55a4597baf95e3ed8ed987d6874388cab3426b0
2021-08-05 15:46:12 -07:00
e62189ad69 [jit] Better checking for overload function declarations. (#59956)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59956

Issue #50175. Basically two things need to be checked and are lacking currently:
1. Overload declarations should always have a single `pass` statement as the body.
2. There should always be an implementation provided for decls which don't
   have the torch.jit._overload decorator. So in this case we need to check
   whether we are actually compiling a function body with a decorator ahead of it.

Test Plan:
python test/test_jit.py TestScript.test_function_overloads

Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D29106555

fbshipit-source-id: 2d9d7df2fb51ab6db0e1b726f9644e4cfbf733d6
2021-08-05 14:21:48 -07:00
63fa53d37a Add batched model to torchdeploy examples (#62836)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62836

Used for upcoming diff that adds support for batching to torchdeploy

Test Plan: Models are used by later diffs, but generation script is verified by CI now and locally.

Reviewed By: gunchu

Differential Revision: D30135938

fbshipit-source-id: 566a32a3ede56833e41712025e9d47191dfc5f39
2021-08-05 14:01:40 -07:00
c8eda919a4 test, fix sparse * dense exceptions and corner case (#61723)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/59916

This fixes two problems with sparse multiplication
- 0d-dense * sparse was creating a non-sparse output and failing.
- dense * sparse or sparse * dense is not supported, but would emit an unhelpful error message
<details>
<summary> unhelpful error message </summary>
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NotImplementedError: Could not run 'aten::_nnz' with arguments from the 'CPU' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::_nnz' is only available for these backends: [SparseCPU, SparseCUDA, SparseCsrCPU, SparseCsrCUDA, BackendSelect, Python, Named, Conjugate, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradXPU, AutogradMLC, AutogradHPU, AutogradNestedTensor, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, Tracer, UNKNOWN_TENSOR_TYPE_ID, Autocast, Batched, VmapMode].

SparseCPU: registered at aten/src/ATen/RegisterSparseCPU.cpp:961 [kernel]
SparseCUDA: registered at aten/src/ATen/RegisterSparseCUDA.cpp:1092 [kernel]
SparseCsrCPU: registered at aten/src/ATen/RegisterSparseCsrCPU.cpp:202 [kernel]
SparseCsrCUDA: registered at aten/src/ATen/RegisterSparseCsrCUDA.cpp:229 [kernel]
BackendSelect: fallthrough registered at ../aten/src/ATen/core/BackendSelectFallbackKernel.cpp:3 [backend fallback]
Python: registered at ../aten/src/ATen/core/PythonFallbackKernel.cpp:38 [backend fallback]
Named: registered at ../aten/src/ATen/core/NamedRegistrations.cpp:7 [backend fallback]
Conjugate: registered at ../aten/src/ATen/ConjugateFallback.cpp:118 [backend fallback]
ADInplaceOrView: fallthrough registered at ../aten/src/ATen/core/VariableFallbackKernel.cpp:60 [backend fallback]
AutogradOther: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:11202 [autograd kernel]
AutogradCPU: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:11202 [autograd kernel]
AutogradCUDA: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:11202 [autograd kernel]
AutogradXLA: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:11202 [autograd kernel]
AutogradXPU: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:11202 [autograd kernel]
AutogradMLC: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:11202 [autograd kernel]
AutogradHPU: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:11202 [autograd kernel]
AutogradNestedTensor: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:11202 [autograd kernel]
AutogradPrivateUse1: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:11202 [autograd kernel]
AutogradPrivateUse2: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:11202 [autograd kernel]
AutogradPrivateUse3: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:11202 [autograd kernel]
Tracer: registered at ../torch/csrc/autograd/generated/TraceType_2.cpp:10254 [kernel]
UNKNOWN_TENSOR_TYPE_ID: fallthrough registered at ../aten/src/ATen/autocast_mode.cpp:446 [backend fallback]
Autocast: fallthrough registered at ../aten/src/ATen/autocast_mode.cpp:285 [backend fallback]
Batched: registered at ../aten/src/ATen/BatchingRegistrations.cpp:1016 [backend fallback]
VmapMode: fallthrough registered at ../aten/src/ATen/VmapModeRegistrations.cpp:33 [backend fallback]
</details>

Also added tests.
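
A short repro sketch of the 0-dim corner case, with the post-fix behavior in the comments:

```
import torch

s = torch.tensor([[0., 1.], [2., 0.]]).to_sparse()
d = torch.tensor(3.)  # 0-dim dense tensor
out = s * d           # previously produced a non-sparse output and failed
print(out.is_sparse)  # True
```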

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61723

Reviewed By: ezyang

Differential Revision: D29962639

Pulled By: cpuhrsch

fbshipit-source-id: 5455680ddfa91d5cc9925174d0fd3107c40f5b06
2021-08-05 11:27:12 -07:00
8d7786ada6 Simplify hardswish ONNX export graph. (#60080)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/58301

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60080

Reviewed By: suo

Differential Revision: D30002939

Pulled By: SplitInfinity

fbshipit-source-id: 8b4ca6f62d51b72e9d86534592e3c82ed6608c9d
2021-08-05 11:15:14 -07:00
7630f407cc add OpInfo for torch.nn.functional.grid_sample (#62311)
Summary:
Addresses facebookresearch/functorch#78.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62311

Reviewed By: malfet

Differential Revision: D30013388

Pulled By: zou3519

fbshipit-source-id: 0887ae9935923d928bfeb59054afe1aab954b40b
2021-08-05 10:43:54 -07:00
5dbcd5638b OpInfo for nn.functional.avg_pool2d (#62455)
Summary:
Please see https://github.com/facebookresearch/functorch/issues/78

cc: mruberry zou3519

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62455

Reviewed By: soulitzer

Differential Revision: D30096146

Pulled By: heitorschueroff

fbshipit-source-id: ef09abee9baa5a9aab403201226d1d9db5af100a
2021-08-05 10:28:52 -07:00
878943c64f Preserve memory layout when aten batchnorm is used (#62773)
Summary:
https://github.com/pytorch/pytorch/issues/62594

CC cpuhrsch

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62773

Reviewed By: H-Huang

Differential Revision: D30118658

Pulled By: cpuhrsch

fbshipit-source-id: bce9e92f5f8710c876a33cccbd1625155496ddea
2021-08-05 10:21:44 -07:00
d45291613c [pruner] generalize bias hook for conv2d (#62430)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62430

The bias hook is a forward hook that is part of the pruning parametrization; it is attached after the activation reconstruction forward hook, so adding the bias occurs after zeros are reinserted into the pruned activation.

This diff/PR amends the bias hook to work for Conv2d layers, in addition to Linear layers. The reshaping of the ._bias parameter ensures that it is added to the right dimension of the output.
ghstack-source-id: 135097700
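
To illustrate the reshaping idea, a hypothetical sketch of such a hook (names are illustrative, not the actual parametrization code):

```
def bias_hook(module, inputs, output):
    # For a Conv2d output of shape (N, C, H, W), reshape the stored bias
    # so it broadcasts along the channel dimension; a Linear output of
    # shape (N, C) can take the bias as-is.
    bias = module._bias
    if output.dim() == 4:
        bias = bias.reshape(1, -1, 1, 1)
    return output + bias
```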

Test Plan:
Added tests for `Conv2dB()`, a model with Conv2d layers that have `bias=True`.

`buck test mode/dev-nosan //caffe2/test:ao -- TestBasePruner`

https://pxl.cl/1MfgL

Reviewed By: jerryzh168

Differential Revision: D29979571

fbshipit-source-id: c1a7e9fabc8b3c9d0050bd6b6c6a631ddfdf2a68
2021-08-05 09:27:17 -07:00
b524a1101a ns for fx: add ref_node_target_type (#62685)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62685

Adds a `ref_node_target_type` field to hold the string type
of the base node. This is needed because in some cases
the previous node does not match ref_node (if we have observers,
or if we are logging inputs), and it is useful to know the type
of ref_node.

Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs
```

Imported from OSS

Reviewed By: hx89

Differential Revision: D30082947

fbshipit-source-id: 98ded7b25a5d8d5ea820e0ef62c3799b65c3fc77
2021-08-05 09:26:10 -07:00
b96acb7591 Allow disabled tests to be re-enabled with IGNORE_DISABLED_ISSUES (#62686)
Summary:
Part 1 of fixing https://github.com/pytorch/pytorch/issues/62359

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62686

Test Plan:
1. Check out this PR and run `python setup.py install`.
2. The test we will be running requires CUDA. If you don't have CUDA, you can try this on another device or simply comment out the skipIf statement before the `test_jit_cuda_extension` test in `test_cpp_extensions_jit.py`
3. Run: `IN_CI=1 python test/run_test.py -i test_cpp_extensions_jit -- -k test_jit_cuda_extension` and notice that it should skip. If it doesn't skip, edit test/.pytorch-disabled-tests.json: modify the platforms list of the first issue (61655) to include whatever platform you are on (macos or linux), and just run `python test/test_cpp_extensions_jit.py -v -k test_jit_cuda_extension --import-disabled-tests` to make sure it skips.
4. Now `export PYTORCH_IGNORE_DISABLED_ISSUES=61655` or `export PYTORCH_IGNORE_DISABLED_ISSUES=34952,61655`.
5. `rm test/.pytorch-*` to clear the cached files.
6. Run the same command as in step 3 and note that it SHOULDN'T skip. It should run.

Reviewed By: walterddr, samestep

Differential Revision: D30108773

Pulled By: janeyx99

fbshipit-source-id: dbf015a266f57577dc9283b0cdff720083b5c0cb
2021-08-05 09:05:40 -07:00
24a2681358 Revert D30094460: [profiler] Re-enable test on Windows
Test Plan: revert-hammer

Differential Revision:
D30094460 (5a1017be97)

Original commit changeset: 80521f6bc136

fbshipit-source-id: 7c01493ad078be7df1bbb81c08be6364d6ffaa4d
2021-08-05 08:34:15 -07:00
0c8ed042f2 Revert D30095246: [pytorch][PR] Enable ncclAvg for reductions
Test Plan: revert-hammer

Differential Revision:
D30095246 (a749180e4e)

Original commit changeset: d3a3475345fa

fbshipit-source-id: 34b5100b925859461296cae5a717a70e5eca6af6
2021-08-05 07:56:40 -07:00
6d896cb545 Update faq.rst so OOM section mentions checkpoint (#62709)
Summary:
This FAQ has a section for CUDA OOMs that lists lots of don'ts, which limits modeling options. Deep nets can blow up memory due to output caching during training.
It's a known problem with a known solution: trade off compute for memory via checkpointing.
The FAQ should mention it.
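
For concreteness, a minimal sketch of the checkpointing trade-off:

```
import torch
from torch.utils.checkpoint import checkpoint

block = torch.nn.Sequential(torch.nn.Linear(512, 512), torch.nn.ReLU())
x = torch.randn(4, 512, requires_grad=True)
y = checkpoint(block, x)  # activations are recomputed during backward
y.sum().backward()
```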

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62709

Reviewed By: nairbv

Differential Revision: D30103326

Pulled By: ezyang

fbshipit-source-id: 3a8b465a7fbe19aae88f83cc50fe82ebafcb56c9
2021-08-05 07:40:08 -07:00
b84885cc8b Add support for boxed functors (#62658)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62658

Boxed functors, like their unboxed brethren, support operators which
aren't just a function pointer, but a function pointer with some
associated global state that is allocated at registration time.

The use case I have in mind with this implementation is "dispatcher
API from Python", where the extra state kernel registrations need is
the PyObject callable we will invoke to do the actual invocation.
See next PR in this stack.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: bhosmer

Differential Revision: D30074925

Pulled By: ezyang

fbshipit-source-id: ee040edbbec1e607486d338d1ea78bb5c6b2ece9
2021-08-05 07:26:09 -07:00
e6a227465b Add serialization support for slots and subclass getstate/setstate (#62745)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62745

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D30113112

Pulled By: albanD

fbshipit-source-id: 6c562d0c060fb0280e5e3d432bb42fb833e6d500
2021-08-05 06:49:44 -07:00
056b147e10 clean torch_function handling in serialization (#62744)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62744

The `Tensor._reduce_ex_internal` function can only be called via the `Tensor.__reduce_ex__` function.
And that second function already properly handles the `__torch_function__` overwrites. So no need to handle them again in `Tensor._reduce_ex_internal`.

This PR also updates `Tensor.__reduce_ex__` to use the specialized unary API for `__torch_function__` that makes it nicer to read.

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D30113113

Pulled By: albanD

fbshipit-source-id: c94f5d2597ee3afe799d9de991f75615c3c172d6
2021-08-05 06:48:26 -07:00
ee82e7a14e [DDP Communication Hook] Renaming C++ calls to match python API closer (#62735)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62735

Renamed the following
1. getTensor -> getBuffer
2. getTensorRef -> getBufferRef
3. setTensor -> setBuffer
and all associated private variables as well

Reviewed By: SciPioneer

Differential Revision: D30069124

fbshipit-source-id: fa8f1f8a7f3255e6242973bc37b3f7b2731af55d
2021-08-05 05:06:29 -07:00
64b3ab6407 Improve IMethod::getArgumentNames to deal with empty argument names list (#62782)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62782

This diff improved IMethod::getArgumentNames to deal with empty argument names list.

Test Plan:
buck test mode/dev caffe2/caffe2/fb/predictor:pytorch_predictor_test -- PyTorchDeployPredictor.GetEmptyArgumentNamesValidationMode
buck test mode/dev caffe2/caffe2/fb/predictor:pytorch_predictor_test -- PyTorchDeployPredictor.GetEmptyArgumentNamesRealMode

Reviewed By: wconstab

Differential Revision: D30038175

fbshipit-source-id: 46f08dda94187160b4d6ee87600d1b46fe934222
2021-08-05 01:32:00 -07:00
019048b3b6 [PyTorch Edge] Simplify Exception Handling (Take-2) (module.cpp) (#62634)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62634

Apply the same set of changes as in D27688352 (d728491fc1) to `module.cpp` as instructed by xcheng16.

Basically, this simplifies exception handling and allows propagation of the original message undisturbed to the caller so that we can figure out the lineage of the exception in crash tasks such as t96812652
ghstack-source-id: 134877012

Test Plan: Build/Sandcastle

Reviewed By: raziel

Differential Revision: D30038867

fbshipit-source-id: 8dfd415c510bcd0ab49814f4eb559ec6fc8f72e5
2021-08-04 23:25:30 -07:00
4b68801c69 Enable test_api IMethodTest in OSS (#62521)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62521

This diff did the following few things to enable the tests:
1. Exposed IMethod as TORCH_API.
2. Linked torch_deploy to test_api if USE_DEPLOY == 1.

Test Plan:
./build/bin/test_api --gtest_filter=IMethodTest.*

To be noted, one needs to run `python torch/csrc/deploy/example/generate_examples.py` before the above command.

Reviewed By: ezyang

Differential Revision: D30055372

Pulled By: alanwaketan

fbshipit-source-id: 50eb3689cf84ed0f48be58cd109afcf61ecca508
2021-08-04 21:14:20 -07:00
a749180e4e Enable ncclAvg for reductions (#62303)
Summary:
[ncclAvg](https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/api/types.html?highlight=ncclavg#c.ncclAvg) is a new `ncclRedOp_t` that fuses a divide-by-world-size with ncclAllReduce, ncclReduce, or ncclReduceScatter. This PR adds support for it.

This PR and https://github.com/pytorch/pytorch/pull/62140 lay the foundation for DDP to allreduce and average grad tensors in place with a single NCCL call, without additional memory passes to flatten, average, or unflatten. I'll write the necessary DDP changes once this PR and https://github.com/pytorch/pytorch/pull/62140 land.
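
For a sense of the end goal, a hedged sketch of what the fused average might look like from Python once plumbed through (the `ReduceOp.AVG` name is an assumption here, and this assumes an initialized NCCL process group):

```
import torch
import torch.distributed as dist

tensor = torch.ones(4, device="cuda")
dist.all_reduce(tensor, op=dist.ReduceOp.AVG)  # sum and divide in one NCCL call
```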

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62303

Reviewed By: soulitzer

Differential Revision: D30095246

Pulled By: rohan-varma

fbshipit-source-id: d3a3475345fafb0ab265c11d36db74d7c5613a0a
2021-08-04 19:43:50 -07:00
4bd54cebe0 Refinement types and unification for symbolic shape inference (#61776)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61776

Test Plan: Imported from OSS

Reviewed By: iramazanli

Differential Revision: D29772537

Pulled By: migeed-z

fbshipit-source-id: 3555d43152a213087c64faa326432f1628eb3bb1
2021-08-04 17:34:29 -07:00
a27a0b1ef5 [SR] Disable NNC temporarily (#62746)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62746

Disable NNC temporarily until a code cache is implemented to reduce the compilation time.

Reviewed By: ajyu

Differential Revision: D30080326

fbshipit-source-id: ef8bb3ac3a6947614f4a03a3d52774b6933d3ea8
2021-08-04 17:33:07 -07:00
afc1d1b3d6 Fix lint errors in cuda_ReportMemoryUsage tests (#62778)
Summary:
Introduced in 8bbcef5096

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62778

Reviewed By: chaekit, driazati

Differential Revision: D30120245

Pulled By: malfet

fbshipit-source-id: 2cb5755b870182dd147a6685c74f7defcc10030a
2021-08-04 17:26:23 -07:00
658540f43f remove deprecated is_deterministic and set_deterministic (#62158)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/58096

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62158

Reviewed By: mruberry

Differential Revision: D29909634

Pulled By: ezyang

fbshipit-source-id: ccffbcf8f378e39bd2c7fbeace7ed1cbbe003981
2021-08-04 16:45:23 -07:00
a705b8f08f OpInfo for nn.functional.relu (#62076)
Summary:
See https://github.com/facebookresearch/functorch/issues/78 and https://github.com/pytorch/pytorch/issues/54261.

cc: mruberry

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62076

Reviewed By: soulitzer

Differential Revision: D30013262

Pulled By: zou3519

fbshipit-source-id: 7df5e930d1588146e09cf58c53c8860392da7348
2021-08-04 15:50:18 -07:00
123be6b261 Port addcdiv to structured kernels. (#62319)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62319

Tracking issue: #55070

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D29961996

Pulled By: bdhirsh

fbshipit-source-id: d38141476b41dbfd4bf029d631f81a32aff82a5e
2021-08-04 15:35:25 -07:00
693b0af996 Port addcmul kernels to structured kernels. (#62318)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62318

Tracking issue: #55070

This PR introduces the method `TensorIteratorBase::build_ternary_op` for building a
`TensorIteratorBase` for 3-input, 1-output kernels.

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D29961997

Pulled By: bdhirsh

fbshipit-source-id: 2208d24823bad6e74c8d508f363716d8125b8619
2021-08-04 15:34:01 -07:00
8bbcef5096 Report more information for memory profiling (#61282)
Summary:
Report pointed memory size, total allocated memory, total reserved size all in one report.

`ptr` and `alloc_size` will be used for associating with op trace.
`allocated_size`, `reserved_size` will be used for memory trace.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61282

Reviewed By: ejguan

Differential Revision: D29796282

Pulled By: chaekit

fbshipit-source-id: 5314c867632d3af1fa9a3811b35eaa5e931a5d87
2021-08-04 15:03:14 -07:00
0aee9c0ef8 [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D30097148

fbshipit-source-id: 514c22ea52f048bb048a53fa6b5ea57f3ac12250
2021-08-04 14:58:29 -07:00
aed01a991d Add hasattr to torch::deploy interface and hasMethod to PredictorContainer (#62669)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62669

Useful to avoid having to implement null checking on the application side.

Test Plan: Add unit tests

Reviewed By: suo, houseroad

Differential Revision: D30074406

fbshipit-source-id: 881aec735953b43cb24786c1a2d79e8e724928b8
2021-08-04 14:48:34 -07:00
281737ea6f [DDP Communication Hook] Rename 4 Methods of GradBucket Class
Summary:
1. getPerParameterTensors -> getGradients
2. getModelParamsForBucket -> getParameters
3. isTheLastBucketToAllreduce -> IsLast

Test Plan:
Test results for "buck test mode/dev-nosan caffe2/test/distributed:c10d":
https://pxl.cl/1Mrq8

Test results for "buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork":
https://pxl.cl/1MrtP

Reviewed By: SciPioneer

Differential Revision: D30076436

fbshipit-source-id: 0bd1e410186a318ea6328f4c1e830ea5632f8a47
2021-08-04 14:37:23 -07:00
7f1b672b7a Revert D29952381: [Static Runtime] Ensure that unittests only use out variants or native ops
Test Plan: revert-hammer

Differential Revision:
D29952381 (8737e17af2)

Original commit changeset: e60e70b80ccf

fbshipit-source-id: 59dc2f920b7ceaf94ba8f5f36024e7cc710f6645
2021-08-04 14:25:11 -07:00
491d89da1b .github: Fix --no-build-suffix (#62739)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62739

The flag didn't initially work correctly, so this change makes it actually
output the right thing

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: janeyx99

Differential Revision: D30107694

Pulled By: seemethere

fbshipit-source-id: 5ff28d6820b9cf7145dbb617b86a941bf7686b5c
2021-08-04 14:19:38 -07:00
de94034328 Fixes #62636 (#62670)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/62636.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62670

Reviewed By: ezyang

Differential Revision: D30102179

Pulled By: soulitzer

fbshipit-source-id: 38480463ef354f2c12ed83e6678aed26b0b96efe
2021-08-04 13:58:21 -07:00
8e35df0bf3 det_backward: return svd path for double backward (so that all ci tests pass) (#62570)
Summary:
Potentially fixes https://github.com/pytorch/pytorch/issues/62327 and fixes https://github.com/pytorch/pytorch/issues/62328.
This PR replaces the double backward of det from eig to svd. The latter is slower but should be more stable.

CC anjali411

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62570

Reviewed By: pbelevich

Differential Revision: D30072876

Pulled By: anjali411

fbshipit-source-id: c91b507dbfd6a3ec47dc6d0b0dcfa5f8c8228c30
2021-08-04 13:43:51 -07:00
6f0abba04c [fix] manual_seed{_all}: mem leak (#62534)
Summary:
Fixes: https://github.com/pytorch/pytorch/issues/55768

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62534

Reviewed By: nairbv

Differential Revision: D30103294

Pulled By: ezyang

fbshipit-source-id: d871ae869314dfd2d27544a51107ab752abfe452
2021-08-04 13:03:12 -07:00
89f898ebb5 Fix wrong command in README.md (#62472)
Summary:
If it is `[15^,16^)`, 16.10 is not included.
https://github.com/Microsoft/vswhere/wiki/Examples

Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62472

Reviewed By: nairbv

Differential Revision: D30103199

Pulled By: ezyang

fbshipit-source-id: 82085627ca53cd5a4e666848d27d4ab062de8352
2021-08-04 12:55:18 -07:00
b454275f47 Support eager mode use of torch.jit.isinstance with multiple types (#60465)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/60095
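
A small example of the eager-mode behavior this enables:

```
import torch
from typing import List, Tuple

x = [1, 2, 3]
print(torch.jit.isinstance(x, List[int]))        # True
print(torch.jit.isinstance(x, Tuple[int, int]))  # False
```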

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60465

Reviewed By: soulitzer

Differential Revision: D30093110

Pulled By: ansley

fbshipit-source-id: ee9c654bdb031e9eff4837f9f1d489c81e47cc06
2021-08-04 12:45:24 -07:00
5a1017be97 [profiler] Re-enable test on Windows (#62703)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62703

Re-enable test on Windows

Test Plan: CI

Reviewed By: ezyang

Differential Revision: D30094460

Pulled By: ilia-cher

fbshipit-source-id: 80521f6bc1365d2c252f20b5d0485fc062c8d9c3
2021-08-04 12:32:24 -07:00
8737e17af2 [Static Runtime] Ensure that unittests only use out variants or native ops (#62335)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62335

This change ensures that unittests only use out variants or native ops.

- Our unittests currently assume that in a graph fed to the static runtime, each interpreter op has been correctly replaced by its corresponding out variant / native op, but this isn't actually checked by the unittest. This change ensures that it is.

- We relied on manual inspection of log messages to see if an out variant is used for a specific workload even for unittesting. This change frees us from doing that.

- `aten::add` is excluded from this check since it's only enabled for an internal workload. Also, some unittests are excluded by using `expect_interpreter_op = true` since they are written to use interpreter ops by design.

Test Plan: Ran `buck run //caffe2/benchmarks/static_runtime:static_runtime_cpptest` successfully.

Reviewed By: mikeiovine, hlu1

Differential Revision: D29952381

fbshipit-source-id: e60e70b80ccf45e91c6654b4ad53f92ffd5ab702
2021-08-04 11:37:15 -07:00
de77c6a0eb [BE] fix bc check (#62687)
Summary:
A bug was discovered in https://github.com/pytorch/pytorch/issues/62434: for some reason, comparing just the schema name didn't match the allow_list item. So:
1. remove the duplicate regex compile
2. use the full schema string instead of just the name

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62687

Reviewed By: ezyang

Differential Revision: D30102437

Pulled By: walterddr

fbshipit-source-id: 541b2ed77948f24daebb08623cadabb034a241e0
2021-08-04 11:00:22 -07:00
0a66416767 Rename master to main for test-infra references (#62728)
Summary:
Reacting to the master->main switch in test-infra

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62728

Reviewed By: samestep

Differential Revision: D30104777

Pulled By: janeyx99

fbshipit-source-id: a7af7dfc69fd6e02c30ad6c15808a5b32a68c587
2021-08-04 10:45:47 -07:00
90ba71f841 Automated submodule update: FBGEMM (#62688)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).

New submodule commit: 10ec0d3388

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62688

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: dskhudia

Differential Revision: D30088109

fbshipit-source-id: da8a1e6232e489eac0384faadb71c2dfac5927f7
2021-08-04 10:40:50 -07:00
8bcf01631a [ROCm] update magma (#62502)
Summary:
Update magma to point to magma_ctrl_launch_bounds branch.
When the upstream magma branch is used, cholesky tests in test_ops.py and test_linalg.py
fail due to "Intel MKL ERROR: Parameter 4 was incorrect on entry to DPOTRF."
Suspect commit: [35325212b15c5baadd7493d61b19b2db2635cb68](35325212b1) in magma master.

Signed-off-by: Jagadish Krishnamoorthy <jagdish.krishna@gmail.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62502

Reviewed By: malfet

Differential Revision: D30089171

Pulled By: seemethere

fbshipit-source-id: b07234ce66d48e3af113640995f923ee586b3cd9
2021-08-04 10:19:55 -07:00
dfdc3069e7 Revert D30072994: [pytorch][PR] [6/n Update test rpc path
Test Plan: revert-hammer

Differential Revision:
D30072994 (ad4e1f1132)

Original commit changeset: 3217e764bd85

fbshipit-source-id: cf89df78a4e04ef03b04ec3c253c5cbb1a1f5f63
2021-08-04 10:14:31 -07:00
34c9f5a8da [DDP Communication Hook] Update get_tensor and set_tensor to be cleaner naming conventions (buffer() and set_buffer()) (#62662)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62662

Replaced the methods set_tensor(.) and get_tensor() in the Python API exposed from the C++ logic with buffer() and set_buffer(.), for a cleaner interface.

Reviewed By: SciPioneer

Differential Revision: D30012869

fbshipit-source-id: bd8efab583dd89c96f9aeb3dd48a12073f0b1482
2021-08-04 09:27:31 -07:00
4b47ea9446 adding a skip for ROCm for a flaky test (#62664)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62664

Skipping a test for ROCm because of issue #62602

Test Plan: Imported from OSS

Reviewed By: soulitzer

Differential Revision: D30079534

Pulled By: NivekT

fbshipit-source-id: a9cf35e5d3a8d218edc9c5a704d1f9599d2f38a6
2021-08-04 07:29:06 -07:00
d1c85d2c06 Move ASAN tests to clang-7 (#62663)
Summary:
This should avoid following false positives:
```
[ RUN      ] ProtoTest.Basic
/var/lib/jenkins/workspace/build/third_party/onnx/onnx/onnx_onnx_torch-ml.pb.h:7060:15: runtime error: member call on address 0x7fffffffdd80 which does not point to an object of type 'google::protobuf::MessageLite'
0x7fffffffdd80: note: object is of type 'onnx_torch::ModelProto'
 00 00 00 00  b0 b9 05 ef ff 7f 00 00  00 00 00 00 00 00 00 00  01 00 00 00 00 00 00 00  00 00 00 00
              ^~~~~~~~~~~~~~~~~~~~~~~
              vptr for 'onnx_torch::ModelProto'
 UndefinedBehaviorSanitizer: undefined-behavior /var/lib/jenkins/workspace/build/third_party/onnx/onnx/onnx_onnx_torch-ml.pb.h:7060:15 in
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62663

Reviewed By: tktrungna

Differential Revision: D30076315

Pulled By: malfet

fbshipit-source-id: 7bfc2c4b417307195e3c3379e4874eaceb4f3134
2021-08-04 06:26:03 -07:00
773a8eede4 [profiler][refactor] Refactor the usage of legacy profiler implementation (#61931)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61931

This PR consolidates the profiling code around a new C++ implementation
(profiler_kineto.h/cpp) and uses it unconditionally from
torch.autograd.profiler/torch.profiler:
1. Always use profiler_kineto.h/cpp as the C++ implementation
2. Simplify profiler.py to remove unneeded parts depending on legacy
impl
3. Move some of the legacy logic into profiler_legacy.py (to be fully
deleted later)

Test Plan:
USE_KINETO=1 USE_CUDA=1 USE_MKLDNN=1 BLAS=MKL BUILD_BINARY=1 python setup.py develop install --cmake
python test/test_profiler.py -v
USE_KINETO=0 USE_CUDA=1 USE_MKLDNN=1 BLAS=MKL BUILD_BINARY=1 python setup.py develop install --cmake
python test/test_profiler.py -v

Imported from OSS

Reviewed By: gdankel

Differential Revision: D29801599

fbshipit-source-id: 9794d29f2af38dddbcd90dbce4481fc8575fa29e
2021-08-03 18:51:29 -07:00
5830f122f1 Add docstrings for save_on_cpu hooks (#62410)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62410

This PR adds docstrings for CPU hooks introduced in #61928.

Also uncomments the warning about pinned memory in CUDA semantics docs.

Depends on: #62361.

For now docstrings are an orphan page at https://docs-preview.pytorch.org/62410/generated/torch.autograd.graph.set_save_on_cpu_hooks.html#torch-autograd-graph-set-save-on-cpu-hooks

Test Plan: Imported from OSS

Reviewed By: soulitzer

Differential Revision: D29990129

Pulled By: Varal7

fbshipit-source-id: 7a98eeee6a0abb11e2c2d9169cd1aa35ad7ba3f4
2021-08-03 17:53:45 -07:00
5542d590d4 [EZ] Fix type of functional.pad default value (#62095)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62095

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D29879898

Pulled By: jamesr66a

fbshipit-source-id: 903d32eca0040f176c60ace17cadd36cd710345b
2021-08-03 17:47:20 -07:00
d7d399f3df Exposes _aminmax as aminmax and makes it structured (#62401)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62401

This PR exposes the `torch._aminmax` operator as `torch.aminmax`.

**TODO**

- [x] add examples to documentation
- [x] add minmax to rst docs

fixes https://github.com/pytorch/pytorch/issues/62164
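
A short usage sketch of the exposed operator:

```
import torch

t = torch.tensor([1.0, -3.0, 2.0])
mn, mx = torch.aminmax(t)
print(mn, mx)  # tensor(-3.) tensor(2.)
```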

Test Plan: Imported from OSS

Reviewed By: soulitzer

Differential Revision: D30072246

Pulled By: heitorschueroff

fbshipit-source-id: 557d30af7c28ca6c238c59122367104036429ecd
2021-08-03 16:10:43 -07:00
92f470da08 Revert D30070707: [pytorch][PR] [5/n] Update test distribute path
Test Plan: revert-hammer

Differential Revision:
D30070707 (d8849bdb03)

Original commit changeset: c45f07b7b548

fbshipit-source-id: 867019e95b2898346ba2d918fa7a7291c8125efd
2021-08-03 16:00:56 -07:00
18eeccc7e8 [mypy] Fix Optional type check (#62668)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62668

Test Plan: Imported from OSS

Reviewed By: malfet, 842974287

Differential Revision: D30077960

Pulled By: IvanKobzarev

fbshipit-source-id: 5e423bfb65a65974ed848caa177330d6e61452e6
2021-08-03 16:00:55 -07:00
5a49abfaf1 Revert "Revert D29940705: [fx2trt] Dynamic shape inference support" (#62667)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62667

This reverts commit 053e11f1b39b50fcd7aa7cdd272f7775c7a5e1ba.

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D30077961

Pulled By: IvanKobzarev

fbshipit-source-id: a7e76b2d2fa79e6c42a6a87f0a13f62642591fee
2021-08-03 15:59:40 -07:00
34f50c6e35 [Static Runtime] testStaticRuntime verifies that # of nodes is at least 2 (#62622)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62622

This allows us to catch cases where an out variant is being tested but the test author forgot to call `.clone()` in the test script. Having more than 2 ops does not guarantee that the memory planner is being exercised, but having fewer than 2 guarantees that it is not being used.

Reviewed By: hlu1

Differential Revision: D30058050

fbshipit-source-id: 5bc053736f1cc6fd1ffcf8254bf38874ac18c34b
2021-08-03 15:55:57 -07:00
2bddaf6149 Revert D30072859: [pytorch][PR] [4/n] Update vulkan test path
Test Plan: revert-hammer

Differential Revision:
D30072859 (1630b86dd6)

Original commit changeset: bf75faabf6b6

fbshipit-source-id: 3e2672bd19544ed3f1e2a951eb02d58f5c2f9d52
2021-08-03 15:28:04 -07:00
ad4e1f1132 [6/n Update test rpc path (#62526)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/62380

* update the `test_rpc` function to call the wheel install folder {sitepackages}/torch instead of the build/ folder
* add IN_WHEEL_TEST to limit the change to linux CI GHA only

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62526

Test Plan: check if all ci workflows pass

Reviewed By: walterddr, seemethere

Differential Revision: D30072994

Pulled By: tktrungna

fbshipit-source-id: 3217e764bd859dc2db597d24a1abb5ec1d0e8c9e
2021-08-03 15:26:54 -07:00
c48dfe0d9f .github: Enable SSH to linux runners (#62280)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62280

Enables SSH to linux GHA runners for FB employees while on the FB VPN

SSH keys will be added to runners when the label "with-ssh" is applied to
your pull request.

Depnds on https://github.com/fairinternal/pytorch-gha-infra/pull/8

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: janeyx99, soulitzer

Differential Revision: D29941681

Pulled By: seemethere

fbshipit-source-id: 9d291f4291eb1d814d4a3473f7daf7f6951ad724
2021-08-03 15:15:39 -07:00
9beb279d84 Add context manager to save tensors on CPU (#61928)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61928

Fix #57100.
Creates a function `torch.autograd.graph.set_save_on_cpu_hooks()` which can be used to register default hooks under which all tensors saved during the forward pass are actually copied* to cpu, then copied back to the appropriate device for the backward pass.

*If the tensor was already on cpu, the entire operation is a no op.

If the tensor is on GPU, we copy the tensor to `pin_memory` during packing so that the unpacking can be done asynchronously.

See [benchmark](https://github.com/pytorch/pytorch/pull/61928#issuecomment-885089279) and [note about training large models](https://github.com/pytorch/pytorch/pull/61928#issuecomment-887009448)
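
A sketch of the pack/unpack pair such hooks install under the design described above; this is illustrative, not the PR's exact code:

```
import torch

def pack_to_cpu(tensor):
    # Stage GPU tensors through pinned memory so the copy back in unpack
    # can be asynchronous. (The PR's version skips the copy entirely for
    # tensors that are already on CPU.)
    packed = torch.empty(tensor.size(), dtype=tensor.dtype,
                         pin_memory=tensor.is_cuda)
    packed.copy_(tensor)
    return tensor.device, packed

def unpack_from_cpu(data):
    device, tensor = data
    return tensor.to(device, non_blocking=True)
```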

Test Plan: Imported from OSS

Reviewed By: soulitzer

Differential Revision: D29848526

Pulled By: Varal7

fbshipit-source-id: 3d289cddd4fa377bd4884ba0d569fa47c777d9e5
2021-08-03 13:08:37 -07:00
91ef19309e [quant] Input-weight equalization - branch support (#62366)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62366

In the case of models with branches, we are unable to equalize the branching part in the graph.

For example, given this graph:
```
     conv2
    /     \
x -> conv1 -> add
```

After prepare, we will ignore the branched layers (conv1 and conv2) and will not insert the equalization observers. A warning message will also be printed with the layers that are unable to be equalized.
```
                        conv2 -> out_quant_obs2
                       /                       \
x -> input_quant_obs -> conv1 -> out_quant_obs1 -> add
```

Test Plan:
`python test/test_quantization.py TestEqualizeFx.test_input_weight_equalization_prepare`

Imported from OSS

Reviewed By: malfet, supriyar

Differential Revision: D29982585

fbshipit-source-id: 706297e7f1861975998dfa83e7ca59af09d80618
2021-08-03 12:45:25 -07:00
62a90c227f Make _Join, _Joinable, _JoinHook public (#62605)
Summary:
**Overview:**
This removes the preceding `_` from `_Join`, `_Joinable`, and `_JoinHook` in preparation for adding the generic join context manager tutorial (see [here](https://github.com/pytorch/tutorials/pull/1610)). This also adds a docs page, which can be linked from the tutorial. [Here](https://github.com/pytorch/pytorch/files/6919475/render.pdf) is a render of the docs page.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62605

Test Plan:
`DistributedDataParallel.join()`:
```
touch /tmp/barrier && TEMP_DIR="/tmp" BACKEND="nccl" WORLD_SIZE="2" gpurun python test/distributed/test_distributed_fork.py -- TestDistBackendWithFork.test_ddp_uneven_inputs TestDistBackendWithFork.test_ddp_uneven_inputs_stop_iteration_sync_bn TestDistBackendWithFork.test_ddp_grad_div_uneven_inputs TestDistBackendWithFork.test_ddp_uneven_input_join_disable TestDistBackendWithFork.test_ddp_uneven_input_exception
```

`ZeroRedundancyOptimizer`:
```
gpurun4 python test/distributed/optim/test_zero_redundancy_optimizer.py
```
NOTE: DDP overlap tests are failing due to a landing race. See https://github.com/pytorch/pytorch/pull/62592. Once the fix is landed, I will rebase, and tests should be passing.

`Join`:
```
gpurun4 python test/distributed/algorithms/test_join.py
```

Reviewed By: mrshenli

Differential Revision: D30055544

Pulled By: andwgu

fbshipit-source-id: a5ce1f1d9f1904de3bdd4edd0b31b0a612d87026
2021-08-03 12:20:11 -07:00
053e11f1b3 Revert D29940705: [fx2trt] Dynamic shape inference support
Test Plan: revert-hammer

Differential Revision:
D29940705 (6b02ad5f82)

Original commit changeset: 1eab53a8cfd5

fbshipit-source-id: 68150a193df6f11389b14a0e8224e1489b29ff0b
2021-08-03 12:03:42 -07:00
ff31389c21 Cast a few vars to void that are otherwise unused
Summary:
llvm-13 marks this as an error when a variable is set but not used.
Evidently this macro doesn't always expand to using the var.  Work around that
here with void casts.

Test Plan: nfc

Reviewed By: drodriguez

Differential Revision: D30062462

fbshipit-source-id: ff868ec74116da99afd539142996d2ffffd399fb
2021-08-03 11:57:57 -07:00
59dd12042e [nnc] Removed const from all fields in IR. (#62336)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62336

This PR was generated by removing `const` for all types of nodes in NNC IR, and fixing compilation errors that were the result of this change.

This is the first step in making all NNC mutations in-place.

Test Plan: Imported from OSS

Reviewed By: iramazanli

Differential Revision: D30049829

Pulled By: navahgar

fbshipit-source-id: ed14e2d2ca0559ffc0b92ac371f405579c85dd63
2021-08-03 11:44:36 -07:00
474d7ec43b [Pytorch Edge] Black Box Compatibility API (#61477)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61477

It would be nice if the compatibility API were simply plug and play, with no need to care about its internals at all. That's what this diff aims to provide.

The general usage would be something like
  < On the Client >
  RuntimeCompatibilityInfo runtime_info = get_runtime_compatibility_info();

  ...
  < On the Server >
  ModelCompatibilityInfo model_info = get_model_compatibility_info(<model_path>);
  bool compatible = is_compatible(runtime_info, model_info);

Currently RuntimeCompatibilityInfo and ModelCompatibilityInfo are exactly the same, but it seemed feasible to me that they may end up diverging as more information is added to the API (such as a min supported bytecode version being exposed from the runtime).

Test Plan: unit test and ci

Reviewed By: dhruvbird, raziel

Differential Revision: D29624080

fbshipit-source-id: 43c1ce15531f6f1a92f357f9cde4e6634e561700
2021-08-03 11:27:28 -07:00
b7391f44df cast return of cudaGetLastError() to void when discarding (#62518)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/62511.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62518

Reviewed By: walterddr, janeyx99

Differential Revision: D30029858

Pulled By: malfet

fbshipit-source-id: d47ce4e507ac800b4e5a5e0a8d9a6fabdfd28e6d
2021-08-03 11:17:22 -07:00
d6048ecd6b Enable bazel builds on ciflow/default (#62649)
Summary:
Add `regenerate.sh` convenience script
Remove "TODO: Reenable on PR" label from workflows which are enabled on PRs

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62649

Reviewed By: seemethere

Differential Revision: D30071905

Pulled By: malfet

fbshipit-source-id: c82134cb676b273d23e225be21166588996a31d3
2021-08-03 11:05:41 -07:00
4d5607bb25 [Reland][DDP] log bucket sizes (#62625)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62625

reland of https://github.com/pytorch/pytorch/pull/62232 which ran into a land race.

Test Plan: ci

Reviewed By: SciPioneer

Differential Revision: D30058217

fbshipit-source-id: 1454dd481e630f3de9ec6111b3f2e18cd8976091
2021-08-03 10:55:46 -07:00
1630b86dd6 [4/n] Update vulkan test path (#62519)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/62380

* update the `test_vulkan` function to call the wheel install folder `{sitepackages}/torch` instead of the `build/` folder
* add `IN_WHEEL_TEST` to limit the change to `pytorch_linux_test` only
* add symbolic links for the shared libraries called by the tests (this is a bit hacky and should instead be fixed by setting the rpath before compiling -- similar to https://github.com/pytorch/pytorch/blob/master/.jenkins/pytorch/test.sh#L204-L208).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62519

Test Plan: check if all ci workflows pass

Reviewed By: walterddr

Differential Revision: D30072859

Pulled By: tktrungna

fbshipit-source-id: bf75faabf6b6070c366571a74834a1f58b2549d3
2021-08-03 10:24:47 -07:00
ddd916c210 [quant][refactor] Return the models in checkGraphModeFxOp for further checking (#62487)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62487

checkGraphModeFxOp is our utility test function that quantizes a given model with FX Graph Mode Quantization
and checks whether the resulting model contains the expected ops. Previously it only returned the quantized
model's result on the sample data; this PR changes it to return the prepared, quantized, and quantized_reference
models together with the result for the quantized model.

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: iramazanli

Differential Revision: D30053981

fbshipit-source-id: 31fbce48d138261d0b00ba24e1427fd0c6208990
2021-08-03 10:12:16 -07:00
76c447a730 Remove CUDA10.2 + gcc 9 in CI (#62609)
Summary:
This is an invalid combination because CUDA 10.2 does not support gcc > 8.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62609

Reviewed By: iramazanli

Differential Revision: D30057292

Pulled By: seemethere

fbshipit-source-id: 7cb0fa8401e80297846b0fcb5e0ecaa435b101be
2021-08-03 10:05:16 -07:00
d8849bdb03 [5/n] Update test distribute path (#62520)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/62380

* update the `test_distributed` function to call the wheel install folder `{sitepackages}/torch` instead of the `build/` folder
* add `IN_WHEEL_TEST` to limit the change to linux CI GHA only

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62520

Test Plan: check if all ci workflows pass

Reviewed By: soulitzer

Differential Revision: D30070707

Pulled By: tktrungna

fbshipit-source-id: c45f07b7b54857dc8e78405714d6d5b864c30868
2021-08-03 09:52:48 -07:00
6b02ad5f82 [fx2trt] Dynamic shape inference support
Summary:
Add a field called `shape_range` to `inputTensorSpec` which allows the user to indicate the range of the input shape.

Make all current converters work with dynamic shape except `layer_norm`. We need to make the layer_norm plugin an `IPluginV2Ext`.

Some ops only have limited dynamic shape support for now:
- `linear`: only supports at most 1 dynamic dim. We added full support, but I'm thinking of breaking linear down into matmul + add.
- `adaptive_avgpool`: right now we lower it to TRT avgpool, which means we need to know the last two dims to calculate parameters like kernel_size, strides, etc. A follow-up would be to make a plugin for adaptive avgpool; TRTorch already has one that we can borrow.

Test Plan: Added unit tests for dynamic shape inference for converter tests.

Reviewed By: jackm321

Differential Revision: D29940705

fbshipit-source-id: 1eab53a8cfd5e8db0be57845062e9794578165d1
2021-08-03 09:44:26 -07:00
b7ac286d0e CMake: Add optional precompiled header support (#61940)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61940

This adds a `USE_PRECOMPILED_HEADERS` option to the CMake build which
precompiles `ATen.h` and also `CUDAContext.h` for the cuda library.
After making a change in `native_functions.yaml`, this speeds up compilation
time by around 15% on my machine.

Test Plan: Imported from OSS

Reviewed By: heitorschueroff

Differential Revision: D29988775

Pulled By: malfet

fbshipit-source-id: a23c468c958a8b74ebaef052a5b2e5fa3836c64b
2021-08-03 09:13:47 -07:00
2cf4d8128d add OpInfo for torch.nn.functional.mse_loss (#62254)
Summary:
Addresses facebookresearch/functorch#78.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62254

Reviewed By: malfet

Differential Revision: D30013331

Pulled By: zou3519

fbshipit-source-id: e3242cb97d1f061b932e3e0ed589f1ee6a291512
2021-08-03 09:01:09 -07:00
ab8af15545 [Static Runtime] Enabled building Static Runtime tests and benchmarks in OSS CI (#62226)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62226

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D29923800

Pulled By: navahgar

fbshipit-source-id: 33cfe0e92a34c7140ea762e5715301cfbf401434
2021-08-03 08:52:36 -07:00
43327cc197 Refactor commonalities between two approaches (#62624)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62624

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D30058543

Pulled By: andwgu

fbshipit-source-id: 73c794062b75e011868fae264f592549eed67482
2021-08-03 08:43:14 -07:00
e6a3967c2a Add invariant check (bucket indices: 0, 1, ..., k-1) (#62623)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62623

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D30058544

Pulled By: andwgu

fbshipit-source-id: a56910f294c6a40118751eebe255b62700f42be9
2021-08-03 08:13:52 -07:00
87465a6e68 adding operator cumulative_trapezoid (#61615)
Summary:
Stack from [ghstack](https://github.com/ezyang/ghstack):
* https://github.com/pytorch/pytorch/issues/61616
* **https://github.com/pytorch/pytorch/issues/61615**
* https://github.com/pytorch/pytorch/issues/61475

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61615

Reviewed By: malfet, mruberry

Differential Revision: D29975064

Pulled By: NivekT

fbshipit-source-id: 4d4e98f3efb720fdc44eb238ecbf0fa157ac13d7
2021-08-03 08:04:00 -07:00
b37578b3c0 Make bazel output less verbose in CI (#62601)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/62600

Adds `bazel --config=no-tty`, which is useful for less verbose output in environments that don't implement a full tty, such as CI.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62601

Reviewed By: soulitzer

Differential Revision: D30070154

Pulled By: malfet

fbshipit-source-id: 5b89af8441c3c6c7ca7e9a0ebdfddee00c9ab576
2021-08-03 07:59:01 -07:00
3bda4ea842 Avoid unnecessary copying data in Saved Variable (#61927)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61927

This is a refactor of `SavedVariable.cpp` to prevent ever defining the `data_` tensor if default hooks are set.

Before the refactor:

```c++
data_ = variable.tensor_data(); // this is wasteful if hooks are defined
register_hooks(Engine::get_default_engine().get_default_saved_variable_hooks());
```

After the refactor:
```c++
if (get_default_hooks_()) {
  save_metadata_(variable);
  register_hooks_(get_default_hooks_(), variable);
  return;
}
save_metadata_(variable);
data_ = variable.tensor_data(); // only needed if hooks are not defined
```

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D29848524

Pulled By: Varal7

fbshipit-source-id: abca1eee37a17b47841e28d8a576490913fce1ce
2021-08-03 07:09:47 -07:00
7edb4f8761 Port cumprod kernel to structured kernels. (#61899)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61899

Tracking issue: #55070

This PR also removes `at::_cumprod`, which was the "backend" for `at::cumprod`.

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D29939489

Pulled By: ezyang

fbshipit-source-id: d5e4a6dfa6c79e4b135508ea13c2d11bd0684f63
2021-08-03 06:58:13 -07:00
e52325655a Port cumprod kernel to structured kernels. (#61899)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61899

Tracking issue: #55070

This PR also removes `at::_cumprod`, which was the "backend" for `at::cumprod`.

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D29939152

Pulled By: ezyang

fbshipit-source-id: b3379033a1ffe3c7bc8216d16d089d388ea559ba
2021-08-03 06:57:09 -07:00
c7a7c2b62f Enable Gelu fp32/bf16 in CPU path using Mkldnn implementation (#58525)
Summary:
Enable Gelu bf16/fp32 in the CPU path using the MKL-DNN implementation. Users don't need to call `to_mkldnn()` explicitly. The new Gelu fp32 performs better than the original one.

Add Gelu backward for https://github.com/pytorch/pytorch/pull/53615.
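
A minimal usage sketch, assuming the MKL-DNN path is selected automatically for dense CPU tensors as described above:

```python
import torch
import torch.nn.functional as F

x = torch.randn(128, 1024, dtype=torch.bfloat16, requires_grad=True)
y = F.gelu(x)        # runs on a plain dense CPU tensor, no to_mkldnn() needed
y.sum().backward()   # exercises the new Gelu backward as well
```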

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58525

Reviewed By: ejguan

Differential Revision: D29940369

Pulled By: ezyang

fbshipit-source-id: df9598262ec50e5d7f6e96490562aa1b116948bf
2021-08-03 06:52:23 -07:00
fd8004b42e add bfloat16 impl for nextafter (#61829)
Summary:
Add `BFloat16` support for `nextafter`.

* [x] Add OpInfo
* [x] Add Implementation Test (C++ tests)
* [x] Add credit
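
A minimal sketch of the new support (the printed value follows from bfloat16's 2^-7 ULP at 1.0):

```python
import torch

a = torch.tensor([1.0], dtype=torch.bfloat16)
b = torch.tensor([2.0], dtype=torch.bfloat16)
torch.nextafter(a, b)  # tensor([1.0078], dtype=torch.bfloat16), i.e. 1 + 2**-7
```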

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61829

Reviewed By: ejguan

Differential Revision: D29932498

Pulled By: mruberry

fbshipit-source-id: 89524531a4800569ba1addd08a4ace330a6f72a4
2021-08-02 23:16:58 -07:00
2888b7fec5 Fix sign comparison (#62483)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62483

Test Plan: Sandcastle

Reviewed By: albanD

Differential Revision: D30015385

fbshipit-source-id: eefc3208fb8c42ff46b9f4d910eb93c32595fa28
2021-08-02 22:50:39 -07:00
a77be16538 TensorAccessor::bounds_check should be a CPU-only function (#62628)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62628

This fixes the following errors when the ROCm compiler is used
```
caffe2/aten/src/ATen/core/TensorAccessor.h:160:5: error: throw is prohibited in AMP-restricted functions
    TORCH_CHECK_INDEX(
    ^
```

Test Plan: CI

Reviewed By: zhouzhuojie, seemethere

Differential Revision: D30059737

fbshipit-source-id: d094ee608768db41fcc91d044c2c6d7d29f33fe4
2021-08-02 22:46:24 -07:00
e0364ccc33 [caffe2] break one circular dependency between Caffe2 and ATen-cpu (#62632)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62632

Update the caffe2/core/context.h to directly use `at::mt19937` instead of the
`at::CPUGeneratorImpl` wrapper class from the ATen-cpu library.

Using `at::CPUGeneratorImpl` causes circular dependencies between the ATen and
caffe2 code.  In particular the `at::CPUGeneratorImpl::get_state()` logic
depends on CPU Tensor functionality that currently depends on code from
caffe2.

Test Plan:
The RNG behavior should be identical to the previous code (perhaps even
faster, since we now avoid virtual function calls).

  buck test //caffe2/caffe2:caffe2_test_cpu \
    //caffe2/caffe2/python: //caffe2/caffe2/fb/operators:

Differential Revision: D29915701

fbshipit-source-id: f9b2eab8d3b21b2224d30bcf52be9c0e7eb7cd0a
2021-08-02 22:40:56 -07:00
88af4d8441 Initialize RRefs only when explicitly asked for. (#62618)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62618

ShardedTensor implicitly initialized RRefs to remote shards if the
RPC framework was initialized. However, there are use cases where the RPC
framework might be initialized for a different purpose, and users would
prefer that ShardedTensor not initialize RRefs in that case.

As a result, I've made RRef initialization explicit in the ShardedTensor APIs.
ghstack-source-id: 134889287

Test Plan:
1) waitforbuildbot
2) unit tests.

Reviewed By: wanchaol

Differential Revision: D30056833

fbshipit-source-id: 9b2433a38dafa1888589c5b72ed93b6f0ee51639
2021-08-02 22:17:17 -07:00
b58e04f156 Make sure FindLAPACK finds the same BLAS library (#49647)
Summary:
The BLAS library is found by cmake/Dependencies.cmake, and then the
LAPACK library is found by FindLAPACK.cmake, which in turn calls
FindBLAS.cmake. This means that we search for BLAS twice, and the two
searches might find different libraries. By setting a few variables,
this can be avoided.

cc seemethere

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49647

Reviewed By: seemethere, ejguan

Differential Revision: D29943680

Pulled By: malfet

fbshipit-source-id: 3cbc350ea645a1a28dd92c19e5ee7f9eecdeff59
2021-08-02 20:41:00 -07:00
2d038b5dc8 Cast a var to void that is unused
Summary: The comment above makes it seem intentional, so just ignore it.

Test Plan: NFC

Reviewed By: smeenai

Differential Revision: D30057632

fbshipit-source-id: 45929b4eeeefdf22f5c7c4dd603229635f9da31b
2021-08-02 19:56:41 -07:00
c4196bee93 Save some memory in scatter (#62516)
Summary:
Also removes some redundant parenthesis for clarity.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62516

Reviewed By: andwgu

Differential Revision: D30030546

Pulled By: SciPioneer

fbshipit-source-id: e106486f70b9590bf3dcffb76d23f5725737542f
2021-08-02 18:41:58 -07:00
10d3a2c13a [tensorexpr] Added logging info for SimplifierUnderContext (#62138)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62138

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D29891257

Pulled By: huiguoo

fbshipit-source-id: c36b3d615fa2fe971d022111bef61ee843a9dbea
2021-08-02 18:38:55 -07:00
3a592730d5 [nnc] Simplify i%100 to i if i is less than 100; fixed #52580 (#60693)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60693

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D29375938

Pulled By: huiguoo

fbshipit-source-id: 1388729c5b93805cb156efa53e8823d5462885bf
2021-08-02 18:38:54 -07:00
8f7ae77040 [nnc] Add context-sensitive simplification for div/mod (#60688)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60688

Test Plan: Imported from OSS

Reviewed By: navahgar, ZolotukhinM

Differential Revision: D29373313

Pulled By: huiguoo

fbshipit-source-id: 90d7f2fbfce583b0ea3b0f1c7899e22b0210bd62
2021-08-02 18:37:39 -07:00
c07a123b26 Support saving and loading ShardedTensor. (#62242)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62242

1) Add a state_dict hook to ensure ShardedTensors are
added to a state_dict.
2) Add a pre load state_dict hook to ensure ShardedTensor are added back to a
module at load time.
3) Add a `with_load_process_group` context manager for load time.
4) Added ser-de capability to ShardedTensor.
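
Putting the list above together, a hypothetical round trip (the import path is an assumption, and `model` and `process_group` are placeholders):

```python
import torch
from torch.distributed._sharded_tensor import with_load_process_group  # path assumed

# saving: the state_dict hook ensures ShardedTensors end up in the state_dict
torch.save(model.state_dict(), "ckpt.pt")

# loading: supply the process group to use at load time, then let the
# pre-load hook re-install ShardedTensors on the module
with with_load_process_group(process_group):
    state = torch.load("ckpt.pt")
model.load_state_dict(state)
```
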
ghstack-source-id: 134860967

Test Plan:
1) unit tests
2) waitforbuildbot

Reviewed By: wanchaol

Differential Revision: D29927881

fbshipit-source-id: b1ef8872ed91e9cb0e2d5dd17d2764678ab89f0c
2021-08-02 18:33:19 -07:00
dd23372aa5 .circleci: Prefix intermediate build image tags (#62610)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62610

Prefixes intermediate build image tags with build- so that ECR lifecycle
policies can automatically clean them up

Policy to automatically cleanup images prefixed with `build-`: b02dd818f9

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: walterddr

Differential Revision: D30055952

Pulled By: seemethere

fbshipit-source-id: 328b9c94ffc02877d088d0118a19c732f580838b
2021-08-02 18:17:14 -07:00
525fa2f0b6 [reland] Catch saved tensors default hooks race condition (#62564)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62564

If the user runs code that registers default saved tensor hooks from
multiple threads, it will fail with a nice error message most of the
time. This commit handles the very rare case where a race condition
would have made it fail silently.

Relanding previous PR #61957

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D30045406

Pulled By: Varal7

fbshipit-source-id: d04f74c99affbbf655e53cfc2acd42f7c5b4e6eb
2021-08-02 18:00:37 -07:00
f5cf24a224 Fix lint in test_deploy_from_python.py (#62626)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62626

Reviewed By: walterddr, zhouzhuojie, seemethere

Differential Revision: D30059119

Pulled By: malfet

fbshipit-source-id: 2aff44c1585091d864ab7e02d69046204e5b5d17
2021-08-02 17:55:24 -07:00
615ac8e573 Added logic for notifying PTE webapp for Nightly and PR builds (#62512)
Summary:
This PR adds the logic to notify the PTE webapp for DevOps PyTorch Nightly and PR builds.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62512

Reviewed By: iramazanli

Differential Revision: D30046165

Pulled By: malfet

fbshipit-source-id: ef7e4848d4db9f38536a647fcd2d8e26cf64b960
2021-08-02 16:44:35 -07:00
db071ef005 [Reland][DDP Communication Hook] Rename 4 Methods of GradBucket Class (#62592)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62592

Reland #62510

`GradBucket` is an important class defined in both C++ and Python, used for PyTorch Distributed Training. We need to rename the following methods for simplicity:
1) get_index -> index
2) is_the_last_bucket_to_allreduce -> is_last,
3) get_per_parameter_tensors -> gradients,
4) get_model_params_for_bucket -> parameters.
ghstack-source-id: 134848352

Test Plan: unit test

Reviewed By: andwgu

Differential Revision: D30049431

fbshipit-source-id: 1bcac331aa30e529b7230e3891bc811c531b0ea9
2021-08-02 16:38:09 -07:00
d228a8fc94 [Vulkan] Softmax Along Channel Dim (#62239)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62239

Added naive implementation of vulkan softmax (not using shared memory)

Based off of naive implementation of mean, found here:

2565a33c98/aten/src/ATen/native/vulkan/glsl/mean.glsl

Test Plan:
After building:

```
build/bin/vulkan_api_test
```

{F637001190}

```
[ RUN      ] VulkanAPITest.softmax
[       OK ] VulkanAPITest.softmax (180 ms)
```

Reviewed By: SS-JIA

Differential Revision: D29793150

fbshipit-source-id: 4f9d8e1dae8a43cbcb7063b095fa4726df06c929
2021-08-02 16:20:44 -07:00
940cbbce76 Add BFloat16 support to CPU nansum (#61083)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61083

It's already supported on CUDA, so it seems reasonable to support it on CPU as
well. This also changes `test_nansum` to compare against `torch.sum`, since numpy
doesn't support BFloat16. Note that `test_nansum_vs_numpy` checks against
NumPy as well, so that's still being tested.
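
A quick illustrative sketch of the newly supported path:

```python
import torch

x = torch.tensor([1.0, float("nan"), 2.0], dtype=torch.bfloat16)
torch.nansum(x)  # tensor(3., dtype=torch.bfloat16): NaNs are treated as zero
```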

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D30006227

Pulled By: heitorschueroff

fbshipit-source-id: 1449730e1936417e7de1f8b3cf8cdcc15518873c
2021-08-02 16:03:57 -07:00
27d3d3a7d7 deploy in python fix to work in @opt mode
Summary: if we let torch_deploy get put in libomnibus, it hides the symbols we need to link against

Test Plan: buck test //caffe2/torch/csrc/deploy:test_deploy_from_python -- --exact 'caffe2/torch/csrc/deploy:test_deploy_from_python - test_deploy_from_python (caffe2.torch.csrc.deploy.test_deploy_from_python.TestDeployFromPython)' --run-disabled

Reviewed By: wconstab

Differential Revision: D30031134

fbshipit-source-id: e5c2f740f17abafec7d01c57c664bd71a00b6f61
2021-08-02 14:47:49 -07:00
a4af91b2fe Cleanup CUDA 10.1 and 10.0 support on CI (#62597)
Summary:
10.1 is removed in https://github.com/pytorch/pytorch/pull/56056

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62597

Reviewed By: walterddr

Differential Revision: D30053902

Pulled By: seemethere

fbshipit-source-id: deb148e5e44c12b08c267a36fbd4a1afa138e6e4
2021-08-02 14:42:25 -07:00
305d5fcc05 [Pytorch Edge] get_model_bytecode int -> uint (#62201)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62201

Change int to uint to match the type used by the runtime's bytecode. This only affects C++, since Python doesn't have uints, iirc. Also changed the behavior of the functions from returning -1 with a warning to throwing an exception. Wasn't sure what the proper behavior here would be (returning UINT_MAX seemed gross), so feedback is appreciated.

Test Plan: ci

Reviewed By: raziel

Differential Revision: D29914072

fbshipit-source-id: 1bb08702fc301d7c7612b5ad7205a6dbe855c890
2021-08-02 14:17:44 -07:00
0c4c37b01e Disable libtorch testing on MacOS (#62599)
Summary:
Fixes regression introduced by https://github.com/pytorch/pytorch/issues/62402

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62599

Reviewed By: walterddr, driazati

Differential Revision: D30051914

Pulled By: malfet

fbshipit-source-id: a07184b21cc4b2d0ae31fe385bb58a0f665595c6
2021-08-02 13:41:18 -07:00
093495d3f0 [fx] prevent implicit submodule inlining when submodule is a GraphModule (#62436)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62436

## Problem

Given two modules and a tracer that indiscriminately marks all modules as a leaf:
```
class InnerModule(torch.nn.Module):

    def forward(self, t):
        return t + t

class MyModule(torch.nn.Module):
    def __init__(self, inner):
        super().__init__()
        self.inner = inner

    def forward(self, t):
        x = self.inner(t)
        y = self.inner(t)
        return x + y

class MyTracer(torch.fx.Tracer):
    def is_leaf_module(self, module, name):
        return True
```

One might generally expect the following behavior (note call_module nodes):
```
print(">> Outer GraphModule (with inner module as nn.Module):")
inner = InnerModule()
m = MyModule(inner)
gm = torch.fx.GraphModule(m, MyTracer().trace(m))
print(gm.graph.print_tabular())

>> Outer GraphModule (with inner module as nn.Module):
opcode         name     target                   args              kwargs
-------------  -------  -----------------------  ----------------  --------
placeholder    t        t                        ()                {}
call_module    inner    inner                    (t,)              {}
call_module    inner_1  inner                    (t,)              {}
call_function  add      <built-in function add>  (inner, inner_1)  {}
output         output   output                   (add,)            {}
None
```

However, when the inner module is first symbolically traced, the symbolic trace of the outer module ignores `is_leaf_module` entirely, and traces through the whole module (note call_function nodes).
```
print(">> Inner module as GraphModule:")
inner = InnerModule()
inner_gm = torch.fx.GraphModule(inner, MyTracer().trace(inner))
print(inner_gm.graph.print_tabular())

print(">> Outer GraphModule (with inner module as GraphModule):")
m = MyModule(inner_gm)
gm = torch.fx.GraphModule(m, MyTracer().trace(m))
print(gm.graph.print_tabular())

>> Inner module as GraphModule:
opcode         name    target                   args    kwargs
-------------  ------  -----------------------  ------  --------
placeholder    t       t                        ()      {}
call_function  add     <built-in function add>  (t, t)  {}
output         output  output                   (add,)  {}
None

>> Outer GraphModule (with inner module as GraphModule):
opcode         name    target                   args          kwargs
-------------  ------  -----------------------  ------------  --------
placeholder    t       t                        ()            {}
call_function  add     <built-in function add>  (t, t)        {}
call_function  add_1   <built-in function add>  (t, t)        {}
call_function  add_2   <built-in function add>  (add, add_1)  {}
output         output  output                   (add_2,)      {}
None
```

This is surprising behavior and at first glance violates the tracer's intent. As I understand it, `torch.fx.symbolic_trace.Tracer.trace` intends to patch `torch.nn.Module.__call__` with a `module_call_wrapper()` that records a `call_module` node if the module is a leaf, and otherwise executes `torch.fx.symbolic_trace._orig_module_call`, which is set to `torch.nn.Module.__call__` at module load time.

**Every submodule should be a leaf, but no `call_module` nodes are created when that submodule is a `GraphModule`. Why?**

Upon further inspection, I found:

- The constructor for GraphModule includes a path to `GraphModule.recompile()` via the setter for a `fx.Graph`:
```
inner_gm = torch.fx.GraphModule(inner, MyTracer().trace(inner))

File "/torch/fx/graph_module.py", line 252, in __init__
self.graph = graph

File "/torch/nn/modules/module.py", line 1183, in __setattr__
object.__setattr__(self, name, value)

File "/torch/fx/graph_module.py", line 277, in graph
self.recompile()
```
- `recompile()` wraps the `__call__` method by holding a reference to the `__call__` method at the time of recompilation:
```
cls = type(self)
cls_call = cls.__call__
...
def wrapped_call(self, *args, **kwargs):
    try:
        return cls_call(self, *args, **kwargs)
    except Exception as e:
        ...
cls.__call__ = wrapped_call
```
- Recompilation of the inner GraphModule happens on initialization, before creation or tracing of the outer module. Adding some old-fashioned print debug statements gives:
```
Inner Module:
_orig_module_call: <function Module._call_impl at 0x7faaebfee8b0>
recompile: cls.__call__ now wraps _orig_module_call, <function Module._call_impl at 0x7faaebfee8b0>

Outer Module:
_orig_module_call: <function Module._call_impl at 0x7faaebfee8b0>
tracing: patching method <class 'torch.nn.modules.module.Module'>.__call__ <function Module._call_impl at 0x7faaebfee8b0> with <function Module._call_impl at 0x7fa9d42bce50>

outer module MRO before tracing:
(0) <class '__main__.MyModule'>: <function Module._call_impl at 0x7faaebfee8b0>
(1) <class 'torch.nn.modules.module.Module'>: <function Module._call_impl at 0x7faaebfee8b0>
(2) <class 'object'>: <method-wrapper '__call__' of type object at 0x7fac3cd15f00>

outer module MRO during tracing:
(0) <class '__main__.MyModule'>: <function Module._call_impl at 0x7fa9d42bce50>
(1) <class 'torch.nn.modules.module.Module'>: <function Module._call_impl at 0x7fa9d42bce50>
(2) <class 'object'>: <method-wrapper '__call__' of type object at 0x7fac3cd15f00>

inner module MRO before tracing:
(0) <class 'torch.fx.graph_module.GraphModule.__new__.<locals>.GraphModuleImpl'>: <function x.y.z.wrapped_call at 0x7fa9d42a8670>
(1) <class 'torch.fx.graph_module.GraphModule'>: <function Module._call_impl at 0x7faaebfee8b0>
(2) <class 'torch.nn.modules.module.Module'>: <function Module._call_impl at 0x7faaebfee8b0>
(3) <class 'object'>: <method-wrapper '__call__' of type object at 0x7fac3cd15f00>

inner module MRO during tracing:
(0) <class 'torch.fx.graph_module.GraphModule.__new__.<locals>.GraphModuleImpl'>: <function x.y.z.wrapped_call at 0x7fa9d42a8670>
(1) <class 'torch.fx.graph_module.GraphModule'>: <function Module._call_impl at 0x7fa9d42bce50>
(2) <class 'torch.nn.modules.module.Module'>: <function Module._call_impl at 0x7fa9d42bce50>
(3) <class 'object'>: <method-wrapper '__call__' of type object at 0x7fac3cd15f00>
```

- The outer module is patched correctly, but the inner module's first element in its MRO is the `wrapped_call` from `recompile` that still invokes `<function Module._call_impl at 0x7faaebfee8b0>` directly. Therefore, no call_module nodes are created.

## In Practice

In practice, this behavior affects the ability of `torch.package` to package `GraphModules` whose submodules are `GraphModules`. In our case, the `GraphModule` submodules are not passed through a constructor, but created separately and installed on the root `GraphModule` via `setattr`. This means that prior to packaging, there appear to be no issues with the module, since the root's graph was created before any call_module targets were replaced with `GraphModules`.

When unpackaging such a model with `torch.package`, `torch.fx.graph_module._deserialize_graph_module` uses an inline `KeepModules` tracer that sets all submodules to leaves; the unpackaged module is implicitly and surprisingly inlined in the process.

## Potential Solution

This behavior was previously not understood by us, and so the current workaround is a gnarly process of wrapping every submodule in an `nn.Module` with a manually installed forward method.

Changing `wrapped_call` to `return super(type(self), self).__call__(*args, **kwargs)` whenever `__call__` is inherited at least appears to solve the issue. Does this seem like an acceptable approach?
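
A sketch of that change, reusing `cls` and `cls_call` from the `recompile()` excerpt above:

```python
def wrapped_call(self, *args, **kwargs):
    try:
        if "__call__" in vars(cls):
            # cls defines its own __call__: keep the captured reference
            return cls_call(self, *args, **kwargs)
        # __call__ is inherited: resolve through the MRO so that a tracer's
        # patch of torch.nn.Module.__call__ actually takes effect
        return super(type(self), self).__call__(*args, **kwargs)
    except Exception as e:
        ...
```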

## Other Thoughts
- Repeated calls to `recompile` create nested `wrapped_calls`, all for the purpose of error handling. This seems probably unnecessary ¯\\_(ツ)\_/¯
- If a root module with an overridden `__call__` method is symbolically traced, the override is ignored

Test Plan:
```
buck test:
    ✓ ListingSuccess: caffe2/test:fx - main (12.570)
    ✓ Pass: caffe2/test:fx - test_tracing_graphmodules_as_leaf_submodules (test_fx.TestFX) (11.982)
```

Reviewed By: ansley

Differential Revision: D29997935

fbshipit-source-id: 1988fbb025b14188da26a3e73e94fb789c3c1f74
2021-08-02 13:37:08 -07:00
dc1bd6acee Remove PROCESS GROUP rpc backend (#62411)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62411

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D29990408

Pulled By: H-Huang

fbshipit-source-id: 183d3b316767b12993cebbe32b73c2850fd1cc42
2021-08-02 12:26:22 -07:00
2ec4f69b48 [DDP Comm Hook] Do not expose hook_then_optimizer as a public method (#62532)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62532

This method is not stable at this time, so avoid releasing it when the DDP communication hook feature is released as a stable feature.
ghstack-source-id: 134787831

Test Plan:
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_ddp_hook_with_optimizer_parity
buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_hook_then_optimizer_nccl

Reviewed By: rohan-varma

Differential Revision: D30031222

fbshipit-source-id: e03a8e13fee5116a5ddd724eb76316ee98f2a676
2021-08-02 12:25:01 -07:00
b161ac541d [reland] Add default Saved Variable hooks (#62563)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62563

Expose a pair of functions to Python users: torch.autograd.graph.set_saved_tensors_default_hooks(pack, unpack) and torch.autograd.graph.reset_saved_tensors_default_hooks().
These functions control the hooks applied to saved tensors: all tensors saved in that context will be packed using the pack function, then unpacked accordingly when needed.

Currently, this works by simply calling register_hooks (cf #60975) directly at the end of the constructor of a SavedVariable. This could be optimized further by not performing the copy before registering default hooks, but this would require a small refactor. Edit: the refactor is done in #61927.

A current limitation is that if users create tensors in this context, they will not be able to register additional hooks on the saved tensor.

For instance, to perform something like #28997, one could define a pack function that saves to disk whenever the tensor size is too big and returns a filename, then unpack simply reads the content of the file and outputs a tensor, e.g.:

```
def pack(x):
    name = os.path.join(tmp_dir, str(uuid.uuid4()))
    torch.save(x, name)
    return name

def unpack(name):
    return torch.load(name)
```
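
Continuing that example, a minimal sketch of wiring the pack/unpack pair through the new functions (`model` and `x` are placeholders):

```python
torch.autograd.graph.set_saved_tensors_default_hooks(pack, unpack)
y = model(x)       # every tensor saved for backward is routed through pack()
torch.autograd.graph.reset_saved_tensors_default_hooks()
loss = y.sum()
loss.backward()    # saved tensors are reloaded on demand via unpack()
```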

Relanding previous PR: https://github.com/pytorch/pytorch/pull/61834

Original PR led to timeout error in: https://www.internalfb.com/mast/job/yuguo-release_canary_offline_training-inlinecvrp_a-canary_offline_train_28a7ecfc

Now passing: https://www.internalfb.com/mast/job/quach-release_canary_offline_training-inlinecvrp_a-canary_offline_train_9bb57e98

The difference with the new version is we don't need to acquire the GIL when calling `PyDefaultSavedVariableHooks::get_hooks`.

Test Plan: Imported from OSS

Reviewed By: iramazanli

Differential Revision: D30045405

Pulled By: Varal7

fbshipit-source-id: 7f6c07af3a56fe8835d5edcc815c15ea4fb4e332
2021-08-02 11:30:26 -07:00
6f95850127 Revert D30024161: [DDP Communication Hook] Rename 4 Methods of GradBucket Class
Test Plan: revert-hammer

Differential Revision:
D30024161 (29c8b1db57)

Original commit changeset: 07e6072a2f7b

fbshipit-source-id: d571c2caadaf7b71fe2aba3c0597bd8074d153de
2021-08-02 10:26:54 -07:00
2e4f566d30 add OpInfo for torch.nn.functional.softplus (#62317)
Summary:
Addresses facebookresearch/functorch#78.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62317

Reviewed By: malfet

Differential Revision: D30013322

Pulled By: zou3519

fbshipit-source-id: e80affd10b81534234694c9e4326cc68c7efc7fe
2021-08-02 09:46:13 -07:00
cb626da145 [fix] mark non-differentiable ops (#62529)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/62506
Fixes https://github.com/pytorch/pytorch/issues/62504

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62529

Reviewed By: albanD

Differential Revision: D30032665

Pulled By: malfet

fbshipit-source-id: 90254c50fb4a873e3eda59c8484626137e01cb31
2021-08-02 09:40:45 -07:00
562b555a2b [CUDA] Fix typo in Normalization.cu (#62515)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62515

**Summary**
This commit fixes an obvious typo in `Normalization.cu` I found while
working on #62452. Since that PR will not be landed anytime soon, I
thought it would be prudent to land this fix.

**Test Plan**
Continuous integration.

Test Plan: Imported from OSS

Reviewed By: makslevental

Differential Revision: D30027324

Pulled By: SplitInfinity

fbshipit-source-id: 9d368a54c13f8e2bf6f6956dfb2bee974226f726
2021-08-02 09:38:46 -07:00
29c8b1db57 [DDP Communication Hook] Rename 4 Methods of GradBucket Class (#62510)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62510

`GradBucket` is an important class defined in both C++ and Python, used for PyTorch Distributed Training. We need to rename the following methods for simplicity:
1) get_index -> index
2) is_the_last_bucket_to_allreduce -> is_last,
3) get_per_parameter_tensors -> gradients,
4) get_model_params_for_bucket -> parameters.

Test Plan:
Local run comprehensive test with following results:
https://pxl.cl/1Ml8b
For two timeout failure test cases, most likely environment related and fail in my devserver.

Reviewed By: SciPioneer

Differential Revision: D30024161

fbshipit-source-id: 07e6072a2f7b81f731425d9b71f8c8b60d383b0f
2021-08-02 09:33:32 -07:00
34cb2b5d04 Update SobolEngine docstring w/ correct behavior (#62548)
Summary:
Sobol was modified to not drop the first point. This update reflects that behavior in the docstring.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62548

Reviewed By: qingfeng10

Differential Revision: D30035627

Pulled By: Balandat

fbshipit-source-id: 64c659ea30c0c929778da3b58041875834e25e9a
2021-08-02 09:04:38 -07:00
2445d5c60a Removed the hypothesis tests for adaptive_avg_pool (#60886)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60886

Remove all the hypothesis tests from test_adaptive_avg_pool2d_nhwc, test_adaptive_avg_pool, and test_adaptive_avg_pool3d_ndhwc.

Test Plan: I tested it with buck test //caffe2/test:quantization and all three tests passed. The tests that failed are test_conv2d_api (test_quantized_functional.py), test_conv3d_api (test_quantized_functional.py),

Reviewed By: wanchaol, jerryzh168

Differential Revision: D29432184

fbshipit-source-id: 2a4c540d2c169aec69cf8d143d5a155394885745
2021-08-02 08:57:14 -07:00
3dc588d577 Fix: no enough space for cu102 debug nightly build (#62465)
Summary:
Fixes #{issue number}
![image](https://user-images.githubusercontent.com/16190118/127632173-783630b7-c644-4239-b1dd-fb12b6bacf83.png)

verification:
https://app.circleci.com/pipelines/github/pytorch/pytorch/358483/workflows/a34a0123-54fe-418f-9211-4af75ee56a70/jobs/15120463

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62465

Reviewed By: iramazanli

Differential Revision: D30045280

Pulled By: janeyx99

fbshipit-source-id: f40090eb02fd1d86033971611d492c7b107cc4bd
2021-08-02 08:44:16 -07:00
51f687fd4b Add overlap with DDP to ZeRO (two approaches) (#62157)
Summary:
**Overview:**
This adds two approaches to overlapping `DistributedDataParallel.backward()` with `ZeroRedundancyOptimizer.step()` by providing two hook constructors: `hook_with_zero_step()` and `hook_with_zero_step_interleaved()`. The former waits for all backward computation to finish before starting optimizer computation, while the latter launches a partial optimizer computation using the contents of a gradient bucket once that bucket's all-reduce completes. The two approaches each suffer from their own weaknesses, and which one to use depends on the specific hardware configuration.

Both approaches can share changes to `ZeroRedundancyOptimizer`. A user should pass `overlap_with_ddp=True` to `ZeroRedundancyOptimizer`, construct a DDP communication hook using either `hook_with_zero_step()` or `hook_with_zero_step_interleaved()`, and register that communication hook. `ZeroRedundancyOptimizer.step()` should still be called in the training loop, though the optimizer computation and communication will be offloaded to originate from the communication hook. Currently, the first two iterations are vacuous, meaning they do not result in parameter updates and the inputs are ignored. This is required to finalize the DDP bucket strategy and to then initialize the `ZeroRedundancyOptimizer`'s local optimizer based on that bucketing.
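
A minimal sketch of this wiring (module paths and signatures are assumptions based on the description; `model`, `rank`, and `inputs` are placeholders):

```python
import torch
from torch.distributed.algorithms.ddp_comm_hooks.default_hooks import allreduce_hook
from torch.distributed.algorithms.ddp_comm_hooks.ddp_zero_hook import hook_with_zero_step
from torch.distributed.optim import ZeroRedundancyOptimizer

ddp_model = torch.nn.parallel.DistributedDataParallel(model, device_ids=[rank])
zero = ZeroRedundancyOptimizer(
    ddp_model.parameters(),
    optimizer_class=torch.optim.Adam,
    overlap_with_ddp=True,
    lr=1e-3,
)
ddp_model.register_comm_hook(None, hook_with_zero_step(allreduce_hook, ddp_model, zero))

for inp in inputs:
    ddp_model(inp).sum().backward()
    zero.step()  # work is driven from the hook; the first two iterations are vacuous
```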

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62157

Test Plan:
The existing `ZeroRedundancyOptimizer` tests pass, and new unit tests for both hooks pass:
- ~~`test_ddp_with_zero_step_parity_cpu`~~ (removed for now due to flakiness in CI -- under investigation, could possibly be similar Gloo issue as with `hook_with_zero_step_interleaved()`)
- `test_ddp_with_zero_step_parity_gpu`
- `test_ddp_with_zero_step_interleaved_parity_gpu`

These were tested on the AI AWS cluster.

An analogous `test_ddp_with_zero_step_interleaved_parity_cpu` is missing due to existing bugs with Gloo. See https://github.com/pytorch/pytorch/pull/62302.

Both approaches have been verified using an internal accuracy benchmark.

Reviewed By: mrshenli

Differential Revision: D29971046

Pulled By: andwgu

fbshipit-source-id: a7234c23c7ea253f144a698fd7e3c0fe039de5e8
2021-08-02 08:33:34 -07:00
ee482edf0a Callable activation function support for Transformer modules (C++) (#62342)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/60747

Enhances the C++ versions of `Transformer`, `TransformerEncoderLayer`, and `TransformerDecoderLayer` to support callables as their activation functions. The old way of specifying activation function still works as well.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62342

Reviewed By: malfet

Differential Revision: D30022592

Pulled By: jbschlosser

fbshipit-source-id: d3c62410b84b1bd8c5ed3a1b3a3cce55608390c4
2021-08-02 08:06:39 -07:00
c9d5325c52 [BE] shorten the name part 1 (#62402)
Summary:
This should address part of https://github.com/pytorch/pytorch/issues/62357.

1. rename all files 'generated-*' to make it clear, filename will not be in CI workflow name
2. remove all 'pytorch-' in names
3. make sure the build test shell scripts are adopted to new name

Next change should reduce more device related naming

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62402

Reviewed By: malfet

Differential Revision: D30021959

Pulled By: walterddr

fbshipit-source-id: 64b21a2020e25a507101c09c010cb593d8ac4146
2021-08-02 07:56:55 -07:00
7565039ee9 Support system-provided Intel TBB (#61934)
Summary:
This PR: (1) enables the use of a system-provided Intel TBB for building PyTorch, (2) removes `tbb:task_scheduler_init` references since it has been removed from TBB a while ago (3) marks the implementation of `_internal_set_num_threads` with a TODO as it requires a revision that fixes its thread allocation logic.

Tested with `test/run_test`; no new tests are introduced since there are no behavioral changes (removal of `tbb::task_scheduler_init` has no impact on the runtime behavior).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61934

Reviewed By: malfet

Differential Revision: D29805416

Pulled By: cbalioglu

fbshipit-source-id: 22042b428b57b8fede9dfcc83878d679a19561dd
2021-08-02 07:39:00 -07:00
bbf6131159 Add factory kwargs test to test_modules (#62340)
Summary:
Adds a new `ModuleInfo`-based test to `test_modules.py`.

The test passes `device` and `dtype` to each module during instantiation, ensuring that the kwargs are applied to any newly-created parameters or buffers. Note that the `device` and `dtype` kwargs should only be present when a module creates parameters or buffers; the test uses some mock magic to identify this.

Originally lifted from `test/test_module_init.py`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62340

Reviewed By: malfet

Differential Revision: D30022543

Pulled By: jbschlosser

fbshipit-source-id: 77e5d46d6b11c16dc39d19a1c650ee48c26c54c1
2021-08-02 06:53:00 -07:00
46b18aa294 [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D30039182

fbshipit-source-id: 3b38fc89585853bb9a5483a0de9ebd6852154a8d
2021-08-02 04:17:10 -07:00
aa5e3ad705 [quant] Support PerChannel quantization in FusedMovingAvgObsFakeQuantize (#62346)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62346

Update the operator code to resize the min/max tensors if per-channel quant is selected. We need to do this because by default the observer creates empty tensors for min/max and scale/zero_point values when per-channel quantization is enabled

Test Plan:
python test/test_quantization.py test_fused_mod_per_channel

Imported from OSS

Reviewed By: HDCharles

Differential Revision: D30003835

fbshipit-source-id: b5ec80261cb50ee543f21191a887e979dcde4667
2021-08-01 21:45:11 -07:00
7adb78017a [contbuild][xplat/caffe2] contbuild with sanitizers (#61724)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61724

To improve the stability of xplat/caffe2 code, we are enabling sanitizers (asan, tsan, ubsan) on contbuild.
ghstack-source-id: 134339882

Test Plan:
```
buck test --show-output --flagfile fbsource//fbcode/mode/dev-asan --config fbsource.sanitizer=address fbsource//xplat/pytorch_models/build/pytorch_model_test/v13:body_tracking_v124_test
clang-9: warning: argument unused during compilation: '-pthread' [-Wunused-command-line-argument]

Downloaded 0/7 artifacts, 0.00 bytes, 100.0% cache miss
Building: finished in 14.5 sec (100%) 4538/4538 jobs, 4 updated
  Total time: 14.5 sec
Testing: finished in 1.1 sec (1 PASS/0 FAIL)
RESULTS FOR //xplat/pytorch_models/build/pytorch_model_test/v13:body_tracking_v124_test
PASS      1.0s  1 Passed   0 Skipped   0 Failed   //xplat/pytorch_models/build/pytorch_model_test/v13:body_tracking_v124_test
TESTS PASSED
```

```
buck test --show-output --flagfile fbsource//fbcode/mode/dev-tsan --config fbsource.sanitizer=thread fbsource//xplat/pytorch_models/build/ads_mai_test_train/v4:model_test
clang-9: warning: argument unused during compilation: '-pthread' [-Wunused-command-line-argument]

Downloaded 3/19 artifacts, 88.30 Kbytes, 66.7% cache miss
Building: finished in 24.0 sec (100%) 4609/4609 jobs, 9 updated
  Total time: 24.9 sec
Testing: finished in 0.9 sec (1 PASS/0 FAIL)
RESULTS FOR //xplat/pytorch_models/build/ads_mai_test_train/v4:model_test
PASS     808ms  1 Passed   0 Skipped   0 Failed   //xplat/pytorch_models/build/ads_mai_test_train/v4:model_test
TESTS PASSED
````

Reviewed By: dhruvbird, albanD

Differential Revision: D29348099

fbshipit-source-id: 3d3255bff0464129745d2ed729d443f3e7470313
2021-08-01 12:02:30 -07:00
32b37ba246 [DDP Communication Hook] Update the typing info of comm hook output as well as some docstring (#62457)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62457

Specify `Future[torch.Tensor]` as the DDP communication hook return type, which is now explicitly a single tensor. The previous API took a list containing a single tensor.

Note that the typing info no longer accepts the internal type `torch._C.Future`, which does not support TorchScript and hence cannot express `Future[torch.Tensor]`.
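
For reference, a minimal sketch of a hook matching the new typing (the bucket accessor name is an assumption):

```python
import torch
from torch.futures import Future

def passthrough_hook(state, bucket) -> Future[torch.Tensor]:
    # wrap the bucket's flattened gradients in an already-completed
    # Future[torch.Tensor], matching the new return type
    fut: Future[torch.Tensor] = Future()
    fut.set_value(bucket.get_tensor())  # accessor name is an assumption
    return fut
```
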
ghstack-source-id: 134771419

Test Plan:
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_default_ddp_comm_hooks_nccl
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_ddp_invalid_comm_hook_return_type

Reviewed By: rohan-varma

Differential Revision: D30007390

fbshipit-source-id: 246667c9b575b4c6e617b0a5b373151f1bd81e7f
2021-07-30 20:51:34 -07:00
72295da6c3 Reformat (#62456)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62456

as title
ghstack-source-id: 134771417

Test Plan: N/A

Reviewed By: rohan-varma

Differential Revision: D30006493

fbshipit-source-id: 1d1dc9cfff69a9b4fa31470177c1f4fa206a94ef
2021-07-30 20:50:19 -07:00
c506769f19 irange-ify 8 (#62422)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62422

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D29879655

fbshipit-source-id: 69fdf0196091932f866bfaba707ad7643790fdd8
2021-07-30 20:31:58 -07:00
bd9f35313a Revert D29922299: [DDP] log bucket sizes
Test Plan: revert-hammer

Differential Revision:
D29922299 (5429f68f00)

Original commit changeset: 538b331c96e7

fbshipit-source-id: 3595fe04e8dea38bc9d05e8c70f2dcd2ad496ced
2021-07-30 20:27:36 -07:00
9df7ac7a94 Port nll_loss_backward to structured (#62144)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62144

Test Plan: Imported from OSS

Reviewed By: heitorschueroff

Differential Revision: D29945279

Pulled By: SplitInfinity

fbshipit-source-id: 2fee60e8424fc590a81767c9b0a8226a0c2fd69c
2021-07-30 19:43:10 -07:00
5429f68f00 [DDP] log bucket sizes (#62232)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62232

Logs the bucket sizes in DDP logging so that we know which workflow ran with what bucket size config. Will be used to verify how changing bucket sizes in DDP affects perf.

Based on the test, we can see an inconsistency in where the "first" bucket size actually comes from (last before rebuilding buckets, first after).
ghstack-source-id: 134663867

Test Plan: CI

Reviewed By: SciPioneer

Differential Revision: D29922299

fbshipit-source-id: 538b331c96e77048164ad130b377433be100a761
2021-07-30 18:07:04 -07:00
63d3da7961 Fix sign comparison (#62194)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62194

Reviewed By: albanD

Differential Revision: D29885396

Pulled By: r-barnes

fbshipit-source-id: 8092f3002474a48fc6b349b9e369c8d59e832fcc
2021-07-30 17:18:05 -07:00
2006dc6316 [3/N] Remove unittest.skip from torch/testing/_internal distributed files. (#61991)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61991

Continuation of https://github.com/pytorch/pytorch/pull/61887 and
removing unittest.skip as much as possible.
ghstack-source-id: 134759368

Test Plan: waitforbuildbot

Reviewed By: SciPioneer

Differential Revision: D29831860

fbshipit-source-id: fe57a7d56d4423924a2dec10bb670137ace0c9a4
2021-07-30 16:40:43 -07:00
7521addede [deploy] loader cleanup (#62223)
Summary:
Some refactoring of the custom loader logic:

* Make sure we unregister frames when they are deleted so that future exceptions do not attempt to read unallocated memory
* rename linker -> loader to make its name more correct
* move the build of the loader into lib deploy since it can be shared across interpreters
* unify the logic for finding the library symbol across ops and fbcode

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62223

Test Plan: Imported from OSS

Reviewed By: wconstab

Differential Revision: D29922002

Pulled By: zdevito

fbshipit-source-id: b7f8ee5812e29a5d098fcf1bd9f4cea7d30ecb4c
2021-07-30 16:34:13 -07:00
174433267c [dte] fastpath implementation for broadcast utility function (4/x) (#62493)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62493

This diff adds a broadcast fastpath for the caffe2 broadcast utility function, which just copies the contents of a smaller tensor into a larger one. We also update the tests to exercise the new functionality.

Test Plan: unit tests + let CI run

Differential Revision: D29938285

fbshipit-source-id: 543ecc548500380e307be91902696033454964a2
2021-07-30 16:15:10 -07:00
08539ca047 Add non-context manager usage support for profiler (#61690)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/60238, https://github.com/pytorch/kineto/issues/329

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61690

Reviewed By: malfet

Differential Revision: D30016561

Pulled By: ngimel

fbshipit-source-id: 93a578ffbb556f4b584213ac9cfafcc5cf0a9270
2021-07-30 15:54:36 -07:00
6441caeaa7 Use multi-dimensional cuFFT transforms to improve FFT performance (#61203)
Summary:
Benchmark and numerical accuracy tests on A100 and V100 are available at https://github.com/xwang233/code-snippet/tree/master/fft-61203.

I've checked the FFT results for different shapes/dims and different `dim` arg for `rfftn` and `irfftn` before and after this PR, and they all numerically matched.

With this PR, about 10%~15% performance gain is expected on commonly used shapes and dims.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61203

Reviewed By: heitorschueroff

Differential Revision: D29996244

Pulled By: zou3519

fbshipit-source-id: 02c9862eaa1ad8f2ae9c7f7448aeb9e23bcda276
2021-07-30 14:54:04 -07:00
73c46092f1 [pytorch] sort the output of the model_dump util (#62485)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62485

Make it easier to browse the code section by sorting the files by name.

Test Plan: Imported from OSS

Reviewed By: dhruvbird, malfet

Differential Revision: D30016245

Pulled By: ljk53

fbshipit-source-id: c9cb3c1ad9bcaa5337a6ad5c575ac0c240751f6c
2021-07-30 14:40:07 -07:00
49060aa81a Revert D29999785: Reland D29943356: .github: Migrate ecr_gc to github actions
Test Plan: revert-hammer

Differential Revision:
D29999785 (49dc827712)

Original commit changeset: bb9285076551

fbshipit-source-id: c26b39fb2d3c361015ce7f205d3f5f4232845289
2021-07-30 14:33:12 -07:00
43d4fe68cd [Foreach] support implicit broadcasting in slow path (#62167)
Summary:
This PR has foreach functions support implicit broadcasting via slow path.

rel: https://github.com/pytorch/pytorch/issues/58833

cc: ptrblck  ngimel

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62167

Reviewed By: mruberry

Differential Revision: D30005109

Pulled By: ngimel

fbshipit-source-id: f48c0a13e304411763541ffcfcfc6154adb26bac
2021-07-30 13:29:56 -07:00
70f57bcb1e [PyTorch] Fix quantized Conv1d module parameters (#62356)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62356

In `torch/nn/quantized/module/conv.py`, Conv1d turns a scalar `kernel_size` into a tuple of size 2 by repeating the `kernel_size` value. This logic breaks `Conv1d` because internally the input with shape N, C, L is unsqueezed to N, C, 1, L in [`qconv.cpp`](06dfaadfc6/aten/src/ATen/native/quantized/cpu/qconv.cpp (L841)). Applying the aforementioned kernel to this input shape produces a negative output shape in [`ConvUtils.h`](203f7ff6e0/include/fbgemm/ConvUtils.h (L118-L119)) if kernel_size > 1.

Here I'm modifying the processing logic for `kernel_size` and a few other parameters to follow the pattern of [`torch/nn/module/conv.py`](aae2a3c95e/torch/nn/modules/conv.py (L284-L287)).
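
A minimal sketch of the corrected parameter handling, following the referenced float `Conv1d` pattern:

```python
from torch.nn.modules.utils import _single

# a scalar kernel_size becomes a 1-tuple rather than a repeated pair; the
# internal N, C, L -> N, C, 1, L unsqueeze then sees the intended kernel
kernel_size = _single(3)  # (3,) -- previously (3, 3), which broke Conv1d
stride = _single(1)       # (1,)
padding = _single(0)      # (0,)
```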

Test Plan: Rely on unit test

Reviewed By: kimishpatel

Differential Revision: D29957556

fbshipit-source-id: ae13f7ca892d60b82cfffdf972cce422ebfaae8e
2021-07-30 12:27:52 -07:00
eac288ea77 [Pytorch Backend Delegation] Annotate function args with type information (#62433)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62433

Without type information, default type is Tensor which may conflict at runtime.

Test Plan: CI

Reviewed By: raziel

Differential Revision: D29990902

fbshipit-source-id: 0a38843d7d0612a458bb38fad7c86bad08c7197b
2021-07-30 11:34:40 -07:00
f16c73b9f3 Improve error messages of torch.testing.assert_close for sparse inputs (#61583)
Summary:
This utilizes the feature introduced in https://github.com/pytorch/pytorch/issues/60091 to modify the header of the error message.

Before:

```python
AssertionError: Tensor-likes are not equal!

Mismatched elements: 1 / 2 (50.0%)
Greatest absolute difference: 1 at index 1
Greatest relative difference: 0.3333333432674408 at index 1

The failure occurred for the values.
```

After:

```python
AssertionError: Sparse COO values of tensor-likes are not equal!

Mismatched elements: 1 / 2 (50.0%)
Greatest absolute difference: 1 at index 1
Greatest relative difference: 0.3333333432674408 at index 1
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61583

Reviewed By: malfet

Differential Revision: D30014797

Pulled By: cpuhrsch

fbshipit-source-id: 66e30645e94de5c8c96510822082ff9aabef5329
2021-07-30 11:23:26 -07:00
8a9dfa52e9 Delete an unused variable
Summary: This was set twice but never used. Delete it.

Test Plan: NFC

Reviewed By: smeenai

Differential Revision: D30000794

fbshipit-source-id: 084d16da914febec58c4cb5f452c37027275cd08
2021-07-30 11:10:38 -07:00
73ba166e2a fix(elastic-docs): Fix elastic launch doc (#62378)
Summary:
The documentation link should be https://pytorch.org/docs/stable/elastic/run.html

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62378

Reviewed By: aivanou

Differential Revision: D30002830

Pulled By: kiukchung

fbshipit-source-id: 34b434acaa10222561df43f6397a2420eef02015
2021-07-30 10:58:13 -07:00
635e63c53d irange-ify 15 (#62123)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62123

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D29879765

fbshipit-source-id: eda8e641e9fd06e16ad71b8144332f253537955a
2021-07-30 10:41:33 -07:00
3c0c1c4ecb Fix incorrectly sized tensors for svd when full_matrices=False (#62022)
Summary:
Before this PR, for an m x n input matrix the returned matrices were always allocated as m x m and n x n and then narrowed.
This unnecessarily requires a lot of memory that is then discarded.
With this PR, when `compute_uv=True` and `full_matrices=False`, correctly sized tensors are allocated. Moreover, if `compute_uv=False`, the U and V matrices are not allocated at all since they are not needed. However, cuSOLVER's gesvdj routines fail when these matrices are not allocated, which is a bug, so the allocation is done separately in the cuSOLVER-specific code path.
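
An illustrative sketch of the resulting shapes (not the PR's test code):

```python
import torch

a = torch.randn(10000, 64)
U, S, Vh = torch.linalg.svd(a, full_matrices=False)
print(U.shape, S.shape, Vh.shape)  # (10000, 64), (64,), (64, 64)

# when only singular values are needed, U and V are no longer materialized
_, S2, _ = torch.svd(a, compute_uv=False)
```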

MAGMA doesn't work for this input because it tries to allocate a large matrix internally (ROCm doesn't work as it uses MAGMA). Example error:
```
CUBLAS error: memory mapping error (11) in magma_sgelqf at /opt/conda/conda-bld/magma-cuda110_1598416697386/work/src/sgelqf.cpp:161
CUBLAS error: out of memory (3) in magma_sgeqrf2_gpu at /opt/conda/conda-bld/magma-cuda110_1598416697386/work/src/sgeqrf2_gpu.cpp:145
CUBLAS error: not initialized (1) in magma_sgeqrf2_gpu at /opt/conda/conda-bld/magma-cuda110_1598416697386/work/src/sgeqrf2_gpu.cpp:145
MAGMA error: function-specific error, see documentation (1) in magma_sgeqrf2_gpu at /opt/conda/conda-bld/magma-cuda110_1598416697386/work/src/sgeqrf2_gpu.cpp:145
MAGMA error: function-specific error, see documentation (1) in magma_sgeqrf2_gpu at /opt/conda/conda-bld/magma-cuda110_1598416697386/work/src/sgeqrf2_gpu.cpp:145
python: /opt/conda/conda-bld/magma-cuda110_1598416697386/work/interface_cuda/interface.cpp:806: void magma_queue_create_internal(magma_device_t, magma_queue**, const char*, const char*, int): Assertion `queue->dAarray__ != __null' failed.
Aborted (core dumped)
```

Fixes https://github.com/pytorch/pytorch/issues/61949.
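
For illustration, a minimal sketch (not from this PR; assumes a CUDA device, and `torch.linalg.svdvals` for the values-only case) of the shapes involved:

```python
import torch

A = torch.randn(10000, 32, device="cuda")  # m x n with m >> n
U, S, Vh = torch.linalg.svd(A, full_matrices=False)
# reduced factors only: (m, k), (k,), (k, n) with k = min(m, n)
print(U.shape, S.shape, Vh.shape)  # (10000, 32), (32,), (32, 32)

# when U and V are not needed, svdvals avoids allocating them
S_only = torch.linalg.svdvals(A)
```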

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62022

Reviewed By: heitorschueroff

Differential Revision: D29994429

Pulled By: ngimel

fbshipit-source-id: c3f7744d7adc5fd6787f6cbb1ec41405f89a6d4c
2021-07-30 10:27:13 -07:00
26d2f4acb2 Quick fix to make torch.tensor work with functorch (#62423)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62423

Fixes https://github.com/facebookresearch/functorch/issues/7.

functorch uses FuncTorchDynamicLayerBackMode as a mode key to wrap all
tensors returned from operators in special TensorWrapper tensor
extension.

The problem with this is that TensorWrapper does not have storage, so
accessing the data_ptr (for recursive_store) triggers an internal assert.

As a quick hack, the added guard prevents functorch from wrapping the
empty tensor in a TensorWrapper; instead, when `tensor.to` is called later,
the tensor gets wrapped. This is effectively what Ed proposed in
https://github.com/facebookresearch/functorch/issues/7#issuecomment-847501020

In the long term we probably want some better way of extending
`internal_new_from_data` for cases like this (where there is a
mode-based dispatch key for a C++ tensor extension -- the Python case
may be different).

Test Plan: - Verified that this fixes functorch's problem

Reviewed By: malfet

Differential Revision: D29992607

Pulled By: zou3519

fbshipit-source-id: 82b713156a37d7470f8fc46e3803ee7353689a33
2021-07-30 10:15:23 -07:00
8c4d8c29e4 [2/n] add test ATen to wheel test (#62341)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/62380

* This PR introduces the env variable IN_WHEEL_TEST to control the dependency on the `build/` folder
* updates the `test_aten` function to use the wheel install folder `{sitepackages}/torch` instead of the `build/` folder

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62341

Test Plan: check if all ci workflows pass

Reviewed By: walterddr

Differential Revision: D30004259

Pulled By: tktrungna

fbshipit-source-id: ccebd513a3530f1e5c8c9177d5f2daf14de3e853
2021-07-30 10:09:09 -07:00
d08165dfdf [fx2trt] Add op converters for ads 23x dense arch
Summary:
Adding 4 converters for
1. torch.addmm
2. torch.mul
3. torch.t
4. torch.sigmoid

Test Plan:
fx2trt unittests

Able to lower dense arch with fx2trt locally.

Reviewed By: ajtulloch, yinghai

Differential Revision: D29563962

fbshipit-source-id: 114c4e871efb25379043224f5f0116829cd7dc50
2021-07-30 09:26:11 -07:00
d783617216 enable warnings on cuda synchronization (#62092)
Summary:
This creates a `torch.cuda.set_warn_on_synchronization()` function that warns or errors when a synchronizing operation is performed. We could wrap it in a context manager for ease of use, but that would be a lie, because it sets global, not thread-local, state. Since it's intended for debugging, maybe that's ok though.
As with all `torch.cuda.*` functions, it goes through CPython, not pybind, so the argument is converted to a long before being passed to the c10 function. I'll make the Python argument a Python enum class, but without pybind it will still have to go through the long conversion.

For a test script
```
import torch
torch.cuda.set_warn_on_synchronization(1)
x=torch.randn(10, device="cuda")
x.nonzero()
y=torch.randn((), device="cuda")

if y:
    print("something")
torch.multinomial(x.abs(), 10, replacement=False)
torch.randperm(20000, device="cuda")
ind = torch.randint(10, (3,), device="cuda")
mask = torch.randint(2, (10,), device="cuda", dtype=torch.bool)
val = torch.randn((), device="cuda")
x[mask]=1.
x[mask] = val
torch.cuda.synchronize()
```
the output is
```
/../playground/sync_warn_test.py:4: UserWarning: called a synchronizing operation (Triggered internally at  ../c10/cuda/CUDAFunctions.cpp:145.)
  x.nonzero()
/../playground/sync_warn_test.py:7: UserWarning: called a synchronizing operation (Triggered internally at  ../c10/cuda/CUDAFunctions.cpp:145.)
  if y:
something
/../playground/sync_warn_test.py:9: UserWarning: called a synchronizing operation (Triggered internally at  ../c10/cuda/CUDAFunctions.cpp:145.)
  torch.multinomial(x.abs(), 10, replacement=False)
/../playground/sync_warn_test.py:15: UserWarning: called a synchronizing operation (Triggered internally at  ../c10/cuda/CUDAFunctions.cpp:145.)
  x[mask] = val
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62092

Reviewed By: mruberry

Differential Revision: D29968792

Pulled By: ngimel

fbshipit-source-id: cc6f817212c164727ed99ecf6ab050dc29631b9e
2021-07-30 09:13:01 -07:00
273188549f pass through *EXITCODE *EXITCODE__TRYRUN_OUTPUT variables (#49646)
Summary:
This is needed to allow cross compiling to work

There are some `try_run` statements in the CMake files used for building pytorch and its dependencies. Since we are cross compiling, there's no way to run the compiled executables to get the output for the `try_run` function. CMake provides a solution to this by requiring the user to manually provide the exit code and the output of the executable, which should be given by `*EXITCODE` and `*EXITCODE__TRYRUN_OUTPUT` respectively.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49646

Reviewed By: heitorschueroff

Differential Revision: D29960301

Pulled By: malfet

fbshipit-source-id: b10ab9c182d1220f7e1911f922e7db261d521145
2021-07-30 08:22:33 -07:00
b3781f0244 Remove faulty process group agent logic (#62409)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62409

This a reland of #61907 because removing process_group_agent.h / cpp broke facebook specific tests. I will remove the files and update the internal test code in a separate PR.

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D29990001

Pulled By: H-Huang

fbshipit-source-id: 2ee333322247d8b72691152308c3297e8c0c006d
2021-07-30 08:12:48 -07:00
ee7d19ac29 add OpInfo for torch.nn.functional.one_hot (#62253)
Summary:
Addresses facebookresearch/functorch#78.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62253

Reviewed By: heitorschueroff

Differential Revision: D29992924

Pulled By: zou3519

fbshipit-source-id: 1fc81edf3c8ca0722c5db0b32929a4cb3285f634
2021-07-30 07:05:29 -07:00
09d10c4329 OpInfo for nn.functional.softmax (#62077)
Summary:
This PR:

* Adds OpInfo for `softmax` and `nn.functional.softmax` (alias).
* Skip removal for `test_jit_alias_remapping` test of `log_softmax`.

Please see https://github.com/facebookresearch/functorch/issues/78 and https://github.com/pytorch/pytorch/issues/54261.

cc: mruberry zou3519 pmeier

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62077

Reviewed By: heitorschueroff

Differential Revision: D29990019

Pulled By: zou3519

fbshipit-source-id: 67476990b54a5dd824eed9d10236e118564f2501
2021-07-30 06:56:03 -07:00
9fdf7ec6a2 [docs] Update sphinx to 3.5.4 (#61601)
Summary:
Sphinx 4.x is out, but it seems that requires many more changes to
adopt. So instead use the latest version of 3.x, which includes
several nice features.

* Add some noindex directives to deal with warnings that would otherwise
  be triggered by this change due to conflicts between the docstrings
  declaring a function and the autodoc extension declaring the
  same function.
* Update distributions.utils.lazy_property to make it look like a
  regular property when sphinx autodoc inspects classes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61601

Reviewed By: ejguan

Differential Revision: D29801876

Pulled By: albanD

fbshipit-source-id: 544d2434a15ceb77bff236e934dbd8e4dbd9d160
2021-07-30 06:23:10 -07:00
e352585f67 Clean up running smoke tests logic for Windows GHA (#62344)
Summary:
Followup to https://github.com/pytorch/pytorch/issues/62288

Front-loads the logic and also forces smoke tests to run on only one shard.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62344

Test Plan: Note that for the windows cuda10 run on PR, we get only 1 shard with the smoke tests running: https://github.com/pytorch/pytorch/pull/62344/checks?check_run_id=3194294041

Reviewed By: seemethere, heitorschueroff

Differential Revision: D29991573

Pulled By: janeyx99

fbshipit-source-id: 263d7de72c7a82a7205932914c32d39892294cad
2021-07-30 05:00:56 -07:00
329426c249 Fix cppdoc example syntax (#62385)
Summary:
a simple fix

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62385

Reviewed By: suo

Differential Revision: D30000982

Pulled By: heitorschueroff

fbshipit-source-id: e2e1c9efba3734b58d9b5f358c01d12c2c8c91ff
2021-07-30 04:36:55 -07:00
d57ce8cf89 [Linalg] Add cusolver syevjBatched path for torch.linalg.eigh when cuda >= 11.3 U1 (#62003)
Summary:
This PR adds the `cusolverDn<T>SyevjBatched` function to the backend of `torch.linalg.eigh` (the eigenvalue solver for Hermitian matrices). Using the heuristics from https://github.com/pytorch/pytorch/pull/53040#issuecomment-788264724 and my local tests, the `syevj_batched` path is only used when `batch_size > 1` and `matrix_size <= 32`. This would give us a huge performance boost in those cases.

Since there were known numerical issues with cusolver `syevj_batched` before cuda 11.3 update 1, this PR only enables the dispatch when the cuda version is at least that.

See also https://github.com/pytorch/pytorch/issues/42666 #47953 https://github.com/pytorch/pytorch/issues/53040
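
For illustration, a minimal sketch (not from this PR) of the regime the new path targets, per the heuristics above (batch_size > 1 and matrix_size <= 32):

```python
import torch

A = torch.randn(4096, 16, 16, device="cuda")
A = A + A.transpose(-2, -1)   # make each matrix in the batch symmetric
w, v = torch.linalg.eigh(A)   # batched eigenvalues / eigenvectors
```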

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62003

Reviewed By: heitorschueroff

Differential Revision: D30006316

Pulled By: ngimel

fbshipit-source-id: 3a65c5fc9adbbe776524f8957df5442c3d3aeb8e
2021-07-30 00:35:21 -07:00
956c22b1f9 [dte] fastpath implementations for mulgrad / divgrad (3/x) (#62437)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62437

In this diff we add a broadcast fastpath for MulGradient and DivGradient ops, whose tests we update to exercise the new functionality.

Test Plan: Added test cases to elementwise ops (which will exercise the new MulGradient / DivGradient broadcast fastpath functionality) that will be run by CI. It's worth noting there's still no code (outside of the new test cases) that takes the new code paths added -- the user must explicitly request  allow_broadcast_fastpath=True, and nothing outside of the added tests currently does so.

Differential Revision: D29938273

fbshipit-source-id: 281c1a109e38c25b9bf9ff8d832de60ac3c231a9
2021-07-30 00:05:34 -07:00
607d720be1 Remove an unused variable
Summary: This is set but never used

Test Plan: NFC

Reviewed By: smeenai

Differential Revision: D30000830

fbshipit-source-id: 702d6f7b844b52bfe696725a6b0a055d494b739a
2021-07-29 23:10:03 -07:00
cfd0f5ebc9 [quant] update per-channel observer min/max_val attribute names (#62345)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62345

This PR updates the attribute names from min_vals to min_val. The motivation is to keep the attribute names consistent with per-tensor observers so that dependencies (like FusedMovingAvgObsFakeQuantize) don't need to differentiate between the two observer types to access the attributes.

It also adds some BC tests to make sure that observers saved earlier with min_vals/max_vals can be loaded depending on the state_dict version.
Note: Scriptability of the observers isn't fully supported yet, so we aren't testing for that in this PR.
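
For illustration, a minimal sketch (not from this PR) of the now-consistent attribute access:

```python
import torch
from torch.quantization.observer import MinMaxObserver, PerChannelMinMaxObserver

per_tensor = MinMaxObserver()
per_channel = PerChannelMinMaxObserver()
x = torch.randn(4, 8)
per_tensor(x)
per_channel(x)
# both observer types now expose min_val / max_val
# (previously min_vals / max_vals for the per-channel variant)
print(per_tensor.min_val, per_tensor.max_val)
print(per_channel.min_val, per_channel.max_val)
```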

Test Plan:
python test/test_quantization.py TestSerialization

Imported from OSS

Reviewed By: HDCharles

Differential Revision: D30003700

fbshipit-source-id: 20e673f1bb15e2b209551b6b9d5f8f3be3f85c0a
2021-07-29 22:28:53 -07:00
d92301dd02 [sharded_tensor] add new init_from_local_shards API (#60479)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60479

This added `init_from_local_shards` API to construct a ShardedTensor from local_shards and global sharded_tensor_metadata. It also refactors the utils in ShardingSpec to be able to be used by sharded_tensor for sanity check purpose.

Test Plan:
test_init_from_local_shards
test_init_from_local_shards_invalid_sharding

Reviewed By: pritamdamania87

Differential Revision: D29276777

fbshipit-source-id: 011c1d70426bc560a59b8d858c68f1aa12db8481
2021-07-29 22:04:13 -07:00
bc787f2402 Fix setArgumentNames and make Script/Python consistent (#62442)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62442

For PythonMethodWrapper::setArgumentNames, make sure to use the correct method
specified by method_name_ rather than using the parent model_ obj which itself
_is_ callable, but that callable is not the right signature to extract.

For Python vs Script, unify the behavior to avoid the 'self' parameter, so we only
list the argument names to the unbound arguments which is what we need in practice.

Test Plan: update unit test and it passes

Reviewed By: alanwaketan

Differential Revision: D29965283

fbshipit-source-id: a4e6a1d0f393f2a41c3afac32285548832da3fb4
2021-07-29 21:29:06 -07:00
725d98bab6 [Prototype] [PyTorch Edge] Speed up model loading by 12% by directly calling the C file API from FileAdapter (#61997)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61997

After profiling the model loading latency on AI Bench (Android Galaxy S8 US), it seems like a significant amount of time was spent reading data using FileAdapter, which internally calls IStreamAdapter. However, IStreamAdapter uses `std::istream` under the hood, which is not that efficient. This change reduces the model loading time from [~293ms](https://www.internalfb.com/intern/aibench/details/600870874797229) to [~254ms](https://www.internalfb.com/intern/aibench/details/163731416457694), which is a reduction of ~12%.
ghstack-source-id: 134634610

Test Plan: See the AI Bench links above.

Reviewed By: raziel

Differential Revision: D29812191

fbshipit-source-id: 57810fdc1ac515305f5504f88ac5e9e4319e9d28
2021-07-29 20:14:49 -07:00
693d8f2f07 [PyTorch Edge] Cache operator lambda during model loading [7% faster model loading] (#61996)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61996

A recent post https://fb.workplace.com/groups/pytorch.edge.users/posts/2012215235600341/ about slow model loading with an accompanying perf report (report.html) caused me to look at the report and find hot spots during model loading. This suggested that we spend quite a bit of time looking up operators from the dispatcher. This means that we can probably just cache the operator handler functions (instead of computing them every time the operator name shows up, since it potentially shows up multiple times in a given model).

This diff results in an approx 7% speedup in model loading time (from [315ms](https://www.internalfb.com/intern/aibench/details/45077128343028) to [293ms](https://www.internalfb.com/intern/aibench/details/600870874797229)) when run against an 87MB speech model that jiatongzhou provided.

See https://fb.workplace.com/groups/pytorch.dev/posts/855724575006024/ for the previous post from jiatongzhou.
ghstack-source-id: 134634612

Test Plan:
Run using AI Bench.

### Speech Transducer v25 model (87MiB)

Followed up with jiatongzhou and he gave me his speech model. For posterity, here's how to fetch it (you don't need to since I uploaded it to NMLML and now has a permanent Everstore Handle):

```
cd /tmp/
mkdir speech_model
cd speech_model
fbpkg fetch speech.stella.neural_transducer.on_device.en_us:25
cp pytorchmodel.pt ~/speech_transducer_v25_pytorchmodel.ptl
```

Here's how to build and run the benchmark using AI Bench:

```
buck run aibench:run_bench -- -b aibench/specifications/models/pytorch/speech_transducer/v25.json --framework pytorch --platform android/arm64 --devices "S8US" --force_profile --remote
```

Reviewed By: raziel

Differential Revision: D29826210

fbshipit-source-id: 134b67eb466e73f0e43447b9b966278f13c4b56f
2021-07-29 20:14:47 -07:00
0b3f42fa4f [PyTorch Edge] Add test for lite interpreter operator caching (#62306)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62306

Test to see if caching of operators works as expected. When caching operators during model load we look up using the operator name. This test ensures that even if there are multiple operators with the same name (in the same model), the caching distinguishes between the ones that have a different number of arguments specified during the call in the serialized bytecode.

In this specific test, there's a model with 3 methods, 2 of which return a `float32` tensor and one of which returns a tensor with `int64` dtype. Please see the comments in the diff for details.

ghstack-source-id: 134634613

Test Plan:
Test command:

```
cd fbsource/fbcode/
buck test mode/dev //caffe2/test/cpp/jit:jit -- --exact 'caffe2/test/cpp/jit:jit - LiteInterpreterTest.OperatorCacheDifferentiatesDefaultArgs'
```

```
cd fbsource/
buck test xplat/caffe2:test_lite_interpreter
```

Reviewed By: raziel

Differential Revision: D29929116

fbshipit-source-id: 1d42bd3e6d33128631e970c477344564b0337325
2021-07-29 20:14:45 -07:00
0bbdf0e1e3 [PyTorch Edge] Add test_lite_interpreter to fbsource xplat BUCK files (#62305)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62305

Currently, it's super time consuming to run a lite interpreter test from fbcode since it takes > 10 minutes to build. Recently, I haven't been able to do that either due to low disk space.

Having this test available in fbsource/xplat/ is a great win for productivity since I can re-run it in ~2 minutes even after significant changes!

I've had to disarm some tests that can only run in OSS or fbcode builds (since they need functionality that we don't include for on-device FB builds). They are disarmed using the macro `FB_XPLAT_BUILD`.

ghstack-source-id: 134634611

Test Plan: New test!

Reviewed By: raziel, JacobSzwejbka, cccclai

Differential Revision: D29954943

fbshipit-source-id: e55eab14309472ef6bc9b0afe0af126c561dbdb1
2021-07-29 20:13:06 -07:00
90977e10ed Remove an unused variable
Summary: This is defined and then set once but never actually used. Kill it here.

Test Plan: NFC

Reviewed By: smeenai

Differential Revision: D29994983

fbshipit-source-id: 0cb7383b3ec95f1aeed5210974bc95060cf10be5
2021-07-29 18:04:01 -07:00
74291c8347 [quant][graphmode][fx] Fix the calls to load_arg in quantization_patterns.py (#62376)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62376

`load_arg(quantized=...)` accepts a dictionary from index to dtype, not a list of
dtypes; the call is just to make sure the inputs are quantized with the correct
dtype

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: heitorschueroff

Differential Revision: D29979711

fbshipit-source-id: 8499976ac5df8eb2019c3beae573dec6c9a56247
2021-07-29 17:28:07 -07:00
eef85f89b9 [dte] broadcast fastpath implementations for reduce utility functions (2/x) (#62428)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62428

In this diff we add a broadcast fastpath for reduce utility functions. These functions are used by various elementwise ops, whose tests we update to exercise the new functionality.

Test Plan: Added test cases to elementwise ops (which will exercise the new reducer functionality) that will be run by CI. It's worth noting there's still no code (outside of the new test cases) that takes the new code paths added -- the user must explicitly request  `allow_broadcast_fastpath=True`, and nothing outside of the added tests currently does so.

Differential Revision: D29938264

fbshipit-source-id: 5d5542bd93afb85fd9f7a4073f766adc07eb3b65
2021-07-29 17:27:39 -07:00
219917706e [quant][graphmode] Add support for reference pattern for default ops (#62375)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62375

Default ops are ops that have one quantized input and one quantized output,
e.g. gelu, silu, leaky_relu, etc.; we need to insert an observer for the output

Test Plan:
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D29979712

fbshipit-source-id: ed88210a9d6f1ab5cdb9397b4ff7f1628162ef22
2021-07-29 17:27:37 -07:00
acba9b3104 [DDP Communication Hook] Simplify the implementation of parseHookResult of PythonCommHook (#62389)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62389

Simplify the implementation of `parseHookResult` of `PythonCommHook`, by not directly accepting the output of allreduce, which is a tensor list.

Address the comment on https://github.com/pytorch/pytorch/pull/62074#discussion_r675303280

Additionally, formatter is also applied to `OptimizerHookState` and `hook_then_optimizer`.
ghstack-source-id: 134626246

Test Plan:
buck test mode/dev-nosan caffe2/test/distributed:c10d
buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork

Reviewed By: rohan-varma

Differential Revision: D29982485

fbshipit-source-id: 5b27cc5ef09d2f87c1ade4c0feef7eacc1af3a9a
2021-07-29 17:27:35 -07:00
554daef820 Reformat test_c10d_nccl.py and distributed_test.py (#62388)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62388

as title
ghstack-source-id: 134626247

Test Plan: N/A

Reviewed By: rohan-varma

Differential Revision: D29984086

fbshipit-source-id: 0960e5acc379ccdf08813387e11d3fb0a5f0e4b0
2021-07-29 17:27:33 -07:00
9fee176be3 [Model Averaging] Fix docstring of PeriodicModelAverager (#62392)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62392

The constructor of `PeriodicModelAverager` does not need to accept parameters.
ghstack-source-id: 134626245

Test Plan: buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork --  test_periodic_model_averager

Reviewed By: rohan-varma

Differential Revision: D29986446

fbshipit-source-id: 6a8b709e4383a3c44b9e60955fbb067cd2868e76
2021-07-29 17:26:27 -07:00
8f519c5e07 [quant][graphmode] Add support for reference pattern for torch.cat (#62374)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62374

Test Plan:
python test/test_quantization.py TestQuantizeFxOps.test_cat

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D29979713

fbshipit-source-id: 2d38991f96fcca783169ffd306bc2b0fb7debc69
2021-07-29 16:31:09 -07:00
502823c201 Change torch::Tensor to at::Tensor to fix build failure (#62425)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62425

Fixes https://github.com/pytorch/pytorch/issues/62416

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D30000948

Pulled By: heitorschueroff

fbshipit-source-id: 07dfc88a01b7718bc32be4342f43bb2cf2842b60
2021-07-29 16:31:08 -07:00
49dc827712 Reland D29943356: .github: Migrate ecr_gc to github actions (#62438)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62438

Switches out BASH_ENV for GITHUB_ENV

This reverts commit 1f1d01df3ec06046880d0a92b930fbd051d60606.

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: heitorschueroff

Differential Revision: D29999785

Pulled By: seemethere

fbshipit-source-id: bb92850765518005a3f530264643959e5038e681
2021-07-29 16:31:06 -07:00
dc8b5db5f8 [quant][graphmode] relax the constraint for supported_dtypes for reference option (Linear and Conv) (#62348)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62348

Originally we had a supported_dtypes check for linear and conv, but it's only valid for the non-reference option.
This PR removes the constraint when is_reference=True and enables producing reference patterns for the dtype
combinations that are not supported by fbgemm/qnnpack, for example qint8 activation dtypes

Test Plan:
python test/test_quantization.py TestQuantizeFx.test_linear_qint8_activation

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D29968675

fbshipit-source-id: 2abe37940eb62e16fcf0cbb700c174de49719223
2021-07-29 16:31:04 -07:00
9f9244aabe [dte] scaffolding for c2 operator broadcasting fastpath (1/x) (#62369)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62369

This diff is a big no-op that just sets up scaffolding for passing the "allow_broadcast_fastpath" flag from caffe2 operator protos created in Python down to C++. To facilitate this, we create helper template wrappers that pass a flag for "allow_broadcast_fastpath" down to elementwise functors. This flag determines whether to try to take the broadcast fastpath, which we will add in subsequent diffs.

Test Plan: sandcastle + let github CI run

Differential Revision: D28154475

fbshipit-source-id: 15750a0bcd2994fbc6a61fb5653d8cae6b0177dd
2021-07-29 16:31:02 -07:00
5c47038d12 Back out D29792193 "Add default Saved Variable hooks" (#62415)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62415

test error

Differential Revision: D29990361

fbshipit-source-id: 99c87dec6c5be6496c9db5c9205c3cb72a953dd9
2021-07-29 16:31:00 -07:00
dcfcefcd0b Back out D29848525 "Catch saved tensors default hooks race condition" (#62414)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62414

test error

Differential Revision: D29990348

fbshipit-source-id: 1a7c668153ad7ad9e847dd1a74db678e787b6b0e
2021-07-29 16:29:46 -07:00
389380ffcc [reland] Refactor Tensor::to to call a primitive that is not copy_. (#62262)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62262

Context
-------
functorch is unable to vmap(grad(f)) when f contains a .to
call. This is because .to (when it is not a no-op) decomposes
to .copy_ under grad and the .copy_ is not compatible with vmap.

Fix
 ---
The fix for this is to have all Tensor::to variants call a new operator,
`_to_copy`, that always copies and is a primitive w.r.t. autograd so
that autograd decomposes Tensor::to into a call to `_to_copy`.
(This is related to https://github.com/pytorch/pytorch/issues/60956,
please let me know if you want to bikeshed the naming).

In order to get this done I had to do a bit of refactoring. All of the
`::to` implementations now call `to_impl` which may call `_to_copy`.

Autograd codegen changes
------------------------

The second thing I had to do was modify the autograd codegen. Right now,
autograd assumes that every output is either statically known to be
differentiable or not differentiable at codegen time. `_to_copy` is a
little special because its differentiability depends on the output
dtype. e.g. `torch.randn(3, requires_grad=True).to(torch.long)` is non
differentiable. To get this to work:
- I changed how `output_differentiability` in derivatives.yaml work.
- output_differentiability can now accept "conditions" for each of the
output arguments. A "condition" is some C++ code.
- We currently only support `output_differentiability` with conditions
if there is a single output. This is for convenience and can be changed
in the future.
- I added a new `output_differentiability_conditions` field to
DifferentiabilityInfo. This gets populated in load_derivatives.yaml
- forward-mode and reverse-mode AD take
`output_differentiability_conditions` into account.

Here's how the generated code for `VariableType::_to_copy`
[looks
like](https://gist.github.com/zou3519/93462df4bda1837acee345205b7cc849)
No other autogenerated code gets modified by this PR.

Performance benchmarking
------------------------
- I benchmarked [three
cases that demonstrate overhead](https://gist.github.com/zou3519/5b6985e6906b80eec5a0dd94ed5b6a1a).
- Case A: No-op .to(). Instruction count went from 50223 to 25623. I
have no clue why but this is a good thing.
- Case B: not-no-op .to(). Instruction count went from 665291 to 671961.
This is expected; `_to_copy` adds an additional dispatch.
- Case C: not-no-op .to() forward pass and backward pass. Instruction count
went from 4022841 to 4030057. This PR adds
an additional dispatch to .to() (so there should be one additional
dispatch in the forward pass) so this number looks reasonable.

Test Plan
---------
- test_torch.py has a test_to
- test_cuda.py has test_to*
- test_autograd has tests (test_type_conversions) that exercise the
reverse-mode path
- test_ops.py has some tests (like log_softmax) that exercise the
reverse-mode and forward-mode AD path.
- test_quantization, test_namedtensor all exercise tensor.to as well.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D29934998

Pulled By: zou3519

fbshipit-source-id: 820069acd66fd5af97b98f42edfca68572c9eb1c
2021-07-29 10:49:32 -07:00
7b6d569a2b [jit] Renamed prim::Concat as prim::VarConcat (#61983)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61983

Trial #2. The previous PR (https://github.com/pytorch/pytorch/pull/61498) was reverted because this caused a failure in `pytorch_linux_backward_compatibility_check_test`. Fixed that now by adding to the exception list in `check_backward_compatibility.py`.

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D29828830

Pulled By: navahgar

fbshipit-source-id: 947a7b1622ff6e3e575c051b8f34a789e105bcee
2021-07-29 10:28:59 -07:00
5ede826178 Fix alpine ecr image pull (#62413)
Summary:
Fixes alpine ecr image pull in the render_test_result step

![image](https://user-images.githubusercontent.com/658840/127527503-e88f198d-a8d5-4d3b-a064-096dca07d713.png)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62413

Reviewed By: malfet

Differential Revision: D29990844

Pulled By: zhouzhuojie

fbshipit-source-id: ff420f57d5e4b80d0ebf73508001a127649e9eb2
2021-07-29 10:20:13 -07:00
a42345adee Support for target with class probs in CrossEntropyLoss (#61044)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/11959

Alternative approach to creating a new `CrossEntropyLossWithSoftLabels` class. This PR simply adds support for "soft targets" AKA class probabilities to the existing `CrossEntropyLoss` and `NLLLoss` classes.

The implementation is dumb and simple right now, but future work can add higher-performance kernels for this case.
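
For illustration, a minimal sketch (not from this PR) of the new target form; shapes and values are arbitrary:

```python
import torch
import torch.nn as nn

# the target may now be class probabilities of shape (N, C)
# instead of class indices of shape (N,)
loss_fn = nn.CrossEntropyLoss()
logits = torch.randn(4, 5, requires_grad=True)           # (N, C)
soft_targets = torch.softmax(torch.randn(4, 5), dim=1)   # rows sum to 1
loss = loss_fn(logits, soft_targets)
loss.backward()
```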

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61044

Reviewed By: zou3519

Differential Revision: D29876894

Pulled By: jbschlosser

fbshipit-source-id: 75629abd432284e10d4640173bc1b9be3c52af00
2021-07-29 10:04:41 -07:00
dd0ef23a85 Delete .clang-tidy-oss (#62373)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62373

Internal clang-tidy can handle all the options after  D29863426 was deployed

Test Plan: CI

Reviewed By: 1ntEgr8

Differential Revision: D29978471

fbshipit-source-id: ea531734ab4fc3e0a26552bd24846b22c2e5c745
2021-07-29 09:30:18 -07:00
7157ad44bc Fix windows ci squid env (#62353)
Summary:
This is a re-land of https://github.com/pytorch/pytorch/pull/62244; notable changes are

- Use jinja2 variables to DRY the settings
- Added no_proxy for common destinations that don't fit into proxy (e.g. the magic settings from [aws link](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/http_proxy_config.html#windows-proxy))
- Try to trigger windows GHA CI flows
- Also went through the actionlint for github action linting errors

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62353

Reviewed By: driazati

Differential Revision: D29970842

Pulled By: zhouzhuojie

fbshipit-source-id: b9c457b0005bb1a64850949a56679d68fbb281d6
2021-07-29 09:20:30 -07:00
80a662e773 ENH Updates docs and tests for classification modules that already support no batch dims (#61874)
Summary:
Towards https://github.com/pytorch/pytorch/issues/60585

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61874

Reviewed By: heitorschueroff

Differential Revision: D29979977

Pulled By: jbschlosser

fbshipit-source-id: 82c19151aa7220e564337b05d7677d52981e0aa2
2021-07-29 09:14:52 -07:00
b9f02778b2 Forward fix mypy for #61820 (#62398)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62398

Test Plan: Imported from OSS

Reviewed By: malfet, anjali411

Differential Revision: D29988610

Pulled By: ejguan

fbshipit-source-id: 700dfa5b1c415bc058390bbe5727a739c8419b0f
2021-07-29 07:43:12 -07:00
2d103025a5 Adding warning on isend about modifying after send (#61875)
Summary:
This is a standard limitation on communication collective libraries. For example:

https://www.open-mpi.org/doc/v4.0/man3/MPI_Isend.3.php
```
A nonblocking send call indicates that the system may start copying data out of the send buffer. The sender should not modify any part of the send buffer after a nonblocking send operation is called, until the send completes.
```

http://openucx.github.io/ucx/api/latest/html/group___u_c_p___c_o_m_m.html#ga8323878b60f426c630d4ff8996ede3cc
```
The user should not modify any part of the buffer after this operation is called, until the operation completes.
```
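
For illustration, a minimal sketch (not from this PR; assumes an initialized process group) of the safe usage pattern:

```python
import torch
import torch.distributed as dist

t = torch.ones(4)
if dist.get_rank() == 0:
    req = dist.isend(t, dst=1)
    req.wait()   # do not modify `t` before the send completes
    t += 1       # safe only after wait()
else:
    req = dist.irecv(t, src=0)
    req.wait()
```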

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61875

Reviewed By: suo

Differential Revision: D29783720

Pulled By: mrshenli

fbshipit-source-id: 78fd047c74449f77b906f3766a6c2bc29499847d
2021-07-29 07:37:18 -07:00
945d40dca6 Also disable inplace fw AD for acos on windows (#62360)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/62304

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62360

Reviewed By: malfet, bdhirsh

Differential Revision: D29973310

Pulled By: albanD

fbshipit-source-id: 3b033e779f557724602c5a87f497698f2262a12e
2021-07-29 06:42:25 -07:00
1b147a52f5 Allow FX tracer to trace control flow (if/while) statements when parameter shapes are in the conditionals (#61820)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/61733

Allow FX tracer to trace control flow (if/while) statements when parameter shapes are in the condition.
If the user specifies the new "param_shapes_constant" option when constructing a tracer,  the model's parameter shape attribute will be evaluated and the resulting constant will be emitted into the IR during tracing.
Also added a new test

`
python test/fx/test_fx_param_shape_control_flow.py
`
The test also performs somewhat whitebox-style testing to check the Python code generated from the IR.
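
For illustration, a minimal sketch (not from this PR) of the new option:

```python
import torch
import torch.fx

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.w = torch.nn.Parameter(torch.randn(8, 4))

    def forward(self, x):
        # control flow on a parameter shape: evaluated as a constant
        # when param_shapes_constant=True
        if self.w.shape[0] > 4:
            return x + 1
        return x - 1

tracer = torch.fx.Tracer(param_shapes_constant=True)
graph = tracer.trace(M())
```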

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61820

Reviewed By: bdhirsh

Differential Revision: D29969299

Pulled By: jerryzhenleicai

fbshipit-source-id: 99aae824bdfec880be69258de7ead5c8cd59eddc
2021-07-28 23:48:44 -07:00
4ed8858817 Exclude time of waiting in queue from gloo communication prof… (#61342)
Summary:
Background:
    The gloo communication implementation is as follows:
        1. Construct communication workers and push them into a queue.
        2. Initialize a thread pool; each thread runs a loop to get a worker from the queue and execute it.
Issue:
        The recorded profiling time span starts at worker construction and ends at finish, so it includes the time the worker spends waiting in the queue. This results in multiple gloo communication time spans overlapping with each other in the same thread in the timeline:
![image](https://user-images.githubusercontent.com/62738430/124867273-5bc95b80-dff0-11eb-8664-6e5d4166fc39.png)
This is because while the next work is waiting in the queue, the last work has not finished.

Solution:
     This PR delays the profiling start time of gloo communication from worker construction to when the worker is actually executed, so the profiling span no longer includes the time spent waiting in the queue. Implementation as follows:
             1. First, disable the original record function by passing 'nullptr' to the 'profilingTitle' argument of ProcessGroup::Work.
             2. Construct a 'recordFunctionBeforeCallback_' and 'recordFunctionEndCallback_' and save them as members of the worker.
             3. When the worker is executed, invoke 'recordFunctionBeforeCallback_'.
             4. 'recordFunctionEndCallback_' is invoked at finish, as before.
      After this modification, the gloo profiling spans in the timeline no longer overlap with each other:
![image](https://user-images.githubusercontent.com/62738430/124868716-bb286b00-dff2-11eb-9cf0-d0494a356d0c.png)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61342

Reviewed By: albanD

Differential Revision: D29811656

Pulled By: gdankel

fbshipit-source-id: ff07e8906d90f21a072049998400b4a48791e441
2021-07-28 22:24:26 -07:00
35307b131d Callable activation function support for Transformer modules (Python) (#61355)
Summary:
Fixes Python part of https://github.com/pytorch/pytorch/issues/60747

Enhances the Python versions of `Transformer`, `TransformerEncoderLayer`, and `TransformerDecoderLayer` to support callables as their activation functions. The old way of specifying the activation function still works as well.
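
For illustration, a minimal sketch (not from this PR) showing both forms:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# old style: a string name; new style: any callable
layer_str = nn.TransformerEncoderLayer(d_model=32, nhead=4, activation="gelu")
layer_fn = nn.TransformerEncoderLayer(d_model=32, nhead=4, activation=F.gelu)
out = layer_fn(torch.randn(10, 2, 32))  # (seq, batch, d_model)
```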

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61355

Reviewed By: bdhirsh

Differential Revision: D29967302

Pulled By: jbschlosser

fbshipit-source-id: 8ee6f20083d49dcd3ab432a18e6ad64fe1e05705
2021-07-28 21:42:56 -07:00
1f2b96e7c4 [DDP] Make compute_bucket_assignment_by_size return per bucket sizes (#62231)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62231

`compute_bucket_assignment_by_size` is responsible for setting per-bucket size limits; return this information from the function so that we are aware of the size limit for each bucket.

This is currently not being consumed, but will be in the next diff when we log bucket size limits to DDP logging. This will help us run experiments under different bucket size configs and analyze the impact.
ghstack-source-id: 134480575

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D29919056

fbshipit-source-id: dd5a096fa23d22e5d9dc1602899270a110db4a19
2021-07-28 20:21:01 -07:00
c76daa6de3 [DDP][ez] Remove misleading comment (#62230)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62230

We don't iterate over model replicas anymore.
ghstack-source-id: 134475834

Test Plan: CI

Reviewed By: SciPioneer

Differential Revision: D29918760

fbshipit-source-id: 84bde670b4e91667a49f94f1b597fad364498467
2021-07-28 20:20:59 -07:00
842228fd0d [DDP] Save bucket size limits (#62229)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62229

First of a stack of diffs to save and log the bucket size limits to help debug/discover discrepancies and analyze impact of bucket size tuning
ghstack-source-id: 134475835

Test Plan: CI

Reviewed By: SciPioneer

Differential Revision: D29918629

fbshipit-source-id: b9b3f9a5658340a4c7fd72874c2254664e3c52e9
2021-07-28 20:19:56 -07:00
cac4aa71ca Provide option to pass module instance to _load_state_dict_pre_hooks. (#62070)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62070

We have a custom Tensor:
https://github.com/pytorch/pytorch/blob/master/torch/distributed/_sharded_tensor/api.py#L67,
which doesn't show up in state_dict for the module. This was resolved by
using the _register_state_dict_hook:
https://github.com/pytorch/pytorch/blob/master/torch/nn/modules/module.py#L1196
to parse and add custom tensors to state_dict.

However, the problem is that during load time, _register_load_state_dict_pre_hook:
https://github.com/pytorch/pytorch/blob/master/torch/nn/modules/module.py#L1272,
does not pass in the module instance, and as a result a ShardedTensor in the
state_dict cannot be appropriately added to a module at load time.

To resolve this issue, in this PR I've enhanced this hook to support two
variations, one which passes in the module instance (for the problem described
above) and one is the previous version for BC reasons.
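
For illustration, a hedged sketch of the new variant (the `with_module` flag name follows this diff's API; treat it as an assumption):

```python
import torch.nn as nn

def hook(module, state_dict, prefix, local_metadata, strict,
         missing_keys, unexpected_keys, error_msgs):
    # `module` is now available, so custom tensors (e.g. ShardedTensor)
    # can be attached to the module at load time
    pass

m = nn.Linear(4, 4)
m._register_load_state_dict_pre_hook(hook, with_module=True)
```
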
ghstack-source-id: 134541391

Test Plan:
1) unit tests
2) waitforbuildbot

Reviewed By: jbschlosser

Differential Revision: D29867142

fbshipit-source-id: bcb136ff51eedd0b508cfb419e8b8a6b7d95539c
2021-07-28 19:22:47 -07:00
2eaf71d749 [Model Averaging] Update model averager API to avoid the redundant params arg needed by post-localSGD optimizer (#62132)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62132

as title

Proposal: https://github.com/pytorch/pytorch/issues/59699
ghstack-source-id: 134560541

Test Plan:
buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_post_localSGD_optimizer_parity

buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_periodic_model_averager

Reviewed By: rohan-varma

Differential Revision: D29887751

fbshipit-source-id: 60dadb04790d800fdcc7cb8a08d060e411718739
2021-07-28 18:43:09 -07:00
55bee44951 [Model Averaging] Post-localSGD optimizer (#62131)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62131

Wrap `PeriodicModelAverager` as an optimizer.

Currently both the optimizer and averager require an input `params` arg, where the latter actually can read params from the optimizer wrapper. Will update averager class API in a follow-up PR.

Proposal: https://github.com/pytorch/pytorch/issues/59699
ghstack-source-id: 134560248

Test Plan: buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_post_localSGD_optimizer_parity

Reviewed By: rohan-varma

Differential Revision: D29881465

fbshipit-source-id: b9634972f4d8bffd3b3eb94f5dbbb19db2bcd759
2021-07-28 18:42:06 -07:00
58d45d950b [DDP] Log unused param names under DETAIL debug mode. (#62209)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62209

When `TORCH_DISTRIBUTED_DEBUG=DETAIL` is set, log names and indices of unused parameters when searching for them.

The motivation is that we have occasionally seen issues where errors relate to a parameter possibly being marked as unused when it shouldn't be; this can help narrow down the root cause by explicitly logging the param names that are marked as unused.
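
For illustration, a minimal sketch (not from this PR) of enabling the mode:

```python
import os

# set before constructing DistributedDataParallel so that unused parameter
# names and indices are logged during the search
os.environ["TORCH_DISTRIBUTED_DEBUG"] = "DETAIL"
```
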
ghstack-source-id: 134541461

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D29916085

fbshipit-source-id: d84cf637cbbd811521e6264ffd6c50ca8a79595b
2021-07-28 18:10:32 -07:00
24ed6e6b16 Add actionlint (#62364)
Summary:
This adds a linter for our GitHub actions. When a GitHub Actions workflow has an invalid definition, GitHub doesn't queue the job and doesn't report it as failed, so these can be hard to detect with the usual tools. This adds an explicit job to check if our workflow YAMLs are valid using [https://github.com/rhysd/actionlint](https://github.com/rhysd/actionlint). We deployed a similar check in pytorch/test-infra [here](https://github.com/pytorch/test-infra/pull/89).

This PR enables the linter and fixes all the issues it complained about (it did already catch one bug where we were leaving `CIRCLE_BRANCH` blank when uploading binary size)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62364

Reviewed By: zhouzhuojie

Differential Revision: D29973928

Pulled By: driazati

fbshipit-source-id: 83b365e98fd6cbdcd75eeb44daf1be1c89056f8d
2021-07-28 17:10:20 -07:00
fcc7fbe15f Split zeta_kernel out of BinaryMiscOpsKernel.cu (#62261)
Summary:
`BinaryMiscOpsKernel.cu` takes 4 m 30 s to compile on my machine, which is the second slowest after `PowKernel.cu`. Moving the zeta kernel into its own file takes 3 m 30 s, and reduces `BinaryMiscOpsKernel.cu` compile time to 1 m.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62261

Reviewed By: bdhirsh

Differential Revision: D29969350

Pulled By: ngimel

fbshipit-source-id: 37cad5775088b2f7d22948414e4bf0427f88e07d
2021-07-28 16:07:15 -07:00
f6e137598d ns for fx: fix nit in default qlinear weight extraction function (#62334)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62334

Removes the assert for node type in default qlinear weight extraction
function. Without the assert, user defined functions can now use
this util function without failing this check.

Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs

// further tests will be in follow-up fb-only diffs
```

Imported from OSS

Reviewed By: hx89

Differential Revision: D29963501

fbshipit-source-id: a634eabb5165375bde186438318ec52fa29c970f
2021-07-28 16:07:13 -07:00
72c943a2ac ns for fx: fix bug for user function in weight extraction (#62333)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62333

We incorrectly ignored any custom relationships the user specified
in the `extract_weights` API.  Fixing this and adding a test case.

Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_user_defined_function
```

Imported from OSS

Reviewed By: hx89

Differential Revision: D29963502

fbshipit-source-id: 33ce3d4df1acb6298b6c7dcb6674015c8d14bdf4
2021-07-28 16:05:51 -07:00
d98b1c400d [pruner] add cuda tests for pruner (#61993)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61993

Repeating `test_pruner` unit tests for Linear and Conv2d models with device = 'cuda' to confirm pruner will work on GPU
- set device to cuda
- move model to device
- assert that module.weight.device is cuda
ghstack-source-id: 134554382

Test Plan:
`buck test mode/dev-nosan //caffe2/test:ao -- TestBasePruner`

https://pxl.cl/1Md9c

Reviewed By: jerryzh168

Differential Revision: D29829293

fbshipit-source-id: 1f7250e45695d0ad634d0bb7582a34fd1324e765
2021-07-28 14:45:04 -07:00
b39b28ced3 irange-ify 10 (#62122)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62122

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D29879694

fbshipit-source-id: 87cd8ab17061129c835d9f961b67587c84d181d1
2021-07-28 13:35:23 -07:00
88f8f2ab94 irange-ify 6 (#62115)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62115

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D29879576

fbshipit-source-id: 63cbf0ab5a52325fa2c3dec0e8239e2eac1ecf72
2021-07-28 13:32:11 -07:00
9e77113e85 irange-ify 11 (#62121)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62121

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D29879701

fbshipit-source-id: 5c51879c88fa6a5790db241c8b33ec0dc4b177ca
2021-07-28 13:32:09 -07:00
b5867a1b34 irange-ify 7 (#62117)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62117

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D29879640

fbshipit-source-id: 189578a57301747a3421742e145bbcdf2ad75c49
2021-07-28 13:30:39 -07:00
59bb4f2dab Revert D29928698: [pytorch][PR] Use private squid proxy
Test Plan: revert-hammer

Differential Revision:
D29928698 (6da4a25509)

Original commit changeset: 4ee78be0abe3

fbshipit-source-id: 44679a2b247ba8163f09895d9d36ecf5df4390b8
2021-07-28 12:35:55 -07:00
3a2603bc68 Port slow_conv_transpose2d to structured (#55503)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55503

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D29945028

Pulled By: SplitInfinity

fbshipit-source-id: 0b696d104938287444210f1bc926afc13f899991
2021-07-28 12:03:03 -07:00
05b802d4e0 [pytorch] Bring back RemoveInplaceOps() (#62200)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62200

This commit brings back the `RemoveInplaceOps` pass removed in D29523283 (dec5aa2260) that apparently had a bunch of internal users.

Test Plan: danthe3rd

Reviewed By: danthe3rd

Differential Revision: D29833316

fbshipit-source-id: 6cf13d463ab0a5e50ba3eb3243f79a9c51623809
2021-07-28 12:00:38 -07:00
b91a917616 [Static Runtime] Fixed another build failure in OSS due to test_utils.h (#62338)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62338

Test Plan: Imported from OSS

Reviewed By: d1jang

Differential Revision: D29965744

Pulled By: navahgar

fbshipit-source-id: cf3e54ac13432ea8afc4b718fac6c9768743d01b
2021-07-28 11:41:33 -07:00
7c588d5d00 ENH Adds no_batch_dim support for pad 2d and 3d (#62183)
Summary:
Towards https://github.com/pytorch/pytorch/issues/60585

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62183

Reviewed By: ejguan

Differential Revision: D29942250

Pulled By: jbschlosser

fbshipit-source-id: d1df4ddcb90969332dc1a2a7937e66ecf46f0443
2021-07-28 11:10:44 -07:00
6da4a25509 Use private squid proxy (#62244)
Summary:
This PR adds a **private** squid proxy (note that the internal ELB is only accessible from the private VPC subnets of GitHub Runners) that's deployed dedicated for PyTorch CI for GitHub runners.

```
dig $SQUID_PROXY

10.0.x.x
10.0.x.x
```

http_proxy and https_proxy are compatible with the following http clients:

- curl
- wget
- python

Existing cache policy:

refresh_pattern -i .(7z|deb|rpm|exe|zip|tar|tgz|gz|ram|rar|bin|tiff|bz2|run|csv|sh)$ 1440 80% 2880
It uses the standard squid refresh_pattern for cache requests. In our setup, we tried
to cache for at least 1440 minutes (1 day) and at most 2880 minutes (2 days), with
a last-modified factor of 80% (squid doc). Please refer to pytorch/test-infra for details.

Right now, it only applies to the build and test step, to limit the scope and make sure build and test are more reliable with egress cache.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62244

Test Plan:
```
# first time, cache miss (4min20s)
http_proxy=$SQUID_PROXY https_proxy=$SQUID_PROXY curl -v -L http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz --output /tmp/tmp_mnist.zip
100 9680k  100 9680k    0     0  37836      0  0:04:21  0:04:21 --:--:-- 29908

# second time, cache hit (0s)
http_proxy=$SQUID_PROXY https_proxy=$SQUID_PROXY curl -v -L http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz --output /tmp/tmp_mnist.zip
100 9680k  100 9680k    0     0   103M      0 --:--:-- --:--:-- --:--:--  103M
```

Load Test Plan:
```
# ab load test with `-n 100` requests
ab -X $SQUID_PROXY -n 100 http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz

Concurrency Level:      1
Time taken for tests:   9.044 seconds
Complete requests:      100
Failed requests:        0
Total transferred:      991326300 bytes
HTML transferred:       991242200 bytes
Requests per second:    11.06 [#/sec] (mean)
Time per request:       90.442 [ms] (mean)
Time per request:       90.442 [ms] (mean, across all concurrent requests)
Transfer rate:          107040.50 [Kbytes/sec] received
```

Reviewed By: malfet

Differential Revision: D29928698

Pulled By: zhouzhuojie

fbshipit-source-id: 4ee78be0abe35411666c6121991b0addded57106
2021-07-28 10:37:42 -07:00
2581dfc249 [Model Averaging] Create a base class for model averaging (#62111)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62111

This base class will be passed to the post-localSGD optimizer in the next PR. This way, the same post-localSGD optimizer can choose different model averaging algorithms.

Proposal: https://github.com/pytorch/pytorch/issues/59699
ghstack-source-id: 134489187

Test Plan: buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_periodic_model_averager

Reviewed By: rohan-varma

Differential Revision: D29884954

fbshipit-source-id: 1dc5e35c58895902991567f633afd621c7108938
2021-07-28 10:15:36 -07:00
a15fff0a7f Revert D29794666: Remove faulty process group code
Test Plan: revert-hammer

Differential Revision:
D29794666 (afe3644321)

Original commit changeset: 0b35191cc072

fbshipit-source-id: 6467bc5100f4115f2fdb385e205740cd68c89743
2021-07-28 10:15:34 -07:00
71a6ef17a5 ENH Adds no_batch_dim tests/docs for Maxpool1d & MaxUnpool1d (#62206)
Summary:
Towards https://github.com/pytorch/pytorch/issues/60585

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62206

Reviewed By: ejguan

Differential Revision: D29942341

Pulled By: jbschlosser

fbshipit-source-id: a3fad774cee30478f7d6cdd49d2eec31be3fc518
2021-07-28 10:15:32 -07:00
cdf85a82ed [quant][graphmode][fx] Add reference pattern support for BatchNorm (#62215)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62215

including batchnorm2d, batchnorm3d, batchnormrelu2d and batchnormrelu3d

Test Plan:
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D29917524

fbshipit-source-id: 3a9520ff659cb21e6e2fe614973b3d08aa0af923
2021-07-28 10:14:16 -07:00
7443c90f15 optimize non lastdim softmax bf16 (#60371)
Summary:
Here is the PR to enable softmax calculation with the `bfloat16` data type when not along the last dim.
* Use a bf16 specialization for the forward calculation to reduce the bf16/fp32 casts in the vec template.
* Remove the bf16 limitation for the backward calculation.
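
For illustration, a minimal sketch (not from this PR) of the case being optimized:

```python
import torch

x = torch.randn(128, 256).to(torch.bfloat16)
y = torch.softmax(x, dim=0)  # bf16 softmax along a non-last dim
```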

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60371

Reviewed By: ejguan

Differential Revision: D29563109

Pulled By: cpuhrsch

fbshipit-source-id: f6b439fa3850a6c633f35db65ea3d735b747863e
2021-07-28 10:06:51 -07:00
68efa186cc [static runtime] Implement aten::full (#62227)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62227

Test Plan: Added `StaticRuntime.IndividualOps_Full` to cover the newly added code path.

Reviewed By: hlu1

Differential Revision: D29923649

fbshipit-source-id: 722950137c35ae325590a670b97f03b395e8eac3
2021-07-28 09:50:27 -07:00
10c6811a6b [DDP] Run test_ddp_new_tensor_in_fwd with static graph (#61992)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61992

This test previously was not enabled for static graph, but to ensure
this feature is supported with DDPSink, enable it for static graph, which
currently passes outputs to DDPSink.
ghstack-source-id: 134471406

Test Plan: CI

Reviewed By: zhaojuanmao

Differential Revision: D29830887

fbshipit-source-id: 2d3f750d9eb4289558ed21acccd172d83d9b82cc
2021-07-28 09:49:12 -07:00
acf8907e94 These should be equivalent per the previous formula but breaks xla (#62329)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62329

Reviewed By: ejguan

Differential Revision: D29961527

Pulled By: albanD

fbshipit-source-id: 46e46726591f4c0c8faf6ec0d7136a2d4ca976ea
2021-07-28 09:23:51 -07:00
f4baa83eae [bc-breaking] reference option for conv produce a pattern instead of reference conv module (#61942)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61942

This PR changes is_reference=True for conv to produce a pattern consisting of dequant - float conv - quant instead of a reference conv module. This is useful for future transformations to custom backends, and it also helps simplify the implementation of
convert in the future.

Test Plan:
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D29810656

fbshipit-source-id: 549237a62bfda4341a2a7474c124f5e33350e267
2021-07-28 09:13:40 -07:00
52d1ffb789 Teach pytrees about namedtuple (#62292)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62292

This PR adds pytree support for namedtuples. The challenge about namedtuple
is that each namedtuple class is actually different. This PR does the
following:
- it adds a namedtuple flatten/unflatten. The flatten function returns
a context that is the actual type of the namedtuple subclass. The
unflatten function uses that type to reconstruct the namedtuple
- Special cases all pytree logic to consider all namedtuples the same.
This is done by creating a `_get_node_type(pytree)` helper function that
returns `namedtuple` if `pytree` is any namedtuple subclass. The effect
of this is that all namedtuple subclasses will go through the namedtuple
flatten/unflatten functions
- Adds a `_namedtuple_flatten_spec` function for FX pytrees. This function
flattens the namedtuple based on the spec and is equivalent to the
`_tuple_flatten_spec`.
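
For illustration, a minimal sketch (not from this PR) of flattening and unflattening a namedtuple:

```python
import collections
from torch.utils._pytree import tree_flatten, tree_unflatten

Point = collections.namedtuple("Point", ["x", "y"])

leaves, spec = tree_flatten(Point(x=1, y=(2, 3)))
print(leaves)                        # [1, 2, 3]
print(tree_unflatten(leaves, spec))  # Point(x=1, y=(2, 3))
```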

Test Plan
- new tests in test/test_pytree.py and test/test_fx.py

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D29947302

Pulled By: zou3519

fbshipit-source-id: 19c00665b13546642c315df0f243ad99b8e7ff7c
2021-07-28 06:27:44 -07:00
c06b6e445f Build M1 binaries with PocketFFT (#62222)
Summary:
As MKL is only available on the x86_64 platform, clone the header-only PocketFFT
library and use it as the FFT provider

Fixes https://github.com/pytorch/pytorch/issues/62107

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62222

Reviewed By: ejguan

Differential Revision: D29938718

Pulled By: malfet

fbshipit-source-id: ac0bd98b5090d6c8a26c36c4e34a4d6e1d9f1a92
2021-07-27 22:41:29 -07:00
cb2b5f06c9 Revert D29816592: [pytorch][PR] [fix] polygamma n>=1
Test Plan: revert-hammer

Differential Revision:
D29816592 (b73d759708)

Original commit changeset: 2c020a6e4c32

fbshipit-source-id: 310c93ade300966366ef04f206a5908fb27745db
2021-07-27 22:14:10 -07:00
73f1e2d1dc [8/N] Nnapi backend delegation preprocess: New refactored design (#62225)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62225

Rewrote the preprocess function for Android NNAPI delegate.
Previously, `preprocess()` called `convert_model_to_nnapi()` using Pybind and returned a NnapiModule that is serialized for mobile. Now, `preprocess()` calls a sub-function of `convert_model_to_nnapi()` and returns several preprocessed items (that were previously components of NnapiModule).

The dictionary returned contains:
```
"shape_compute_module": torch::jit::Module,
"ser_model": torch::Tensor,
"weights": List[torch.Tensor],
"inp_mem_fmts": List[int],
"out_mem_fmts": List[int]
```

**Purpose and Future:**
The purpose of these changes is to move more of the implementation from bytecode and TorchScript to the delegate API, since bytecode is less efficient.
Now, only the shape computation uses bytecode. In the future, shape computation will be moved out of Torchscript as well.

**nnapi_backend_preprocess.cpp:** preprocess implementation
**prepare.py**: refactored a portion of `convert_model_to_nnapi()` to `process_for_nnapi()`, so preprocess can get components of NnapiModule

**Test:**
Ran `python test/test_jit.py TestNnapiBackend` and `python test/test_nnapi.py` on OSS successfully
ghstack-source-id: 134444190

Test Plan: Ran `python test/test_jit.py TestNnapiBackend` and `python test/test_nnapi.py` on OSS successfully

Reviewed By: raziel

Differential Revision: D29922279

fbshipit-source-id: cadcf8908d8a745dc7abbe286e97d6ead937d4ab
2021-07-27 18:52:48 -07:00
7aabda6d5d Update nccl to v2.10.3-1 (#62276)
Summary:
Which, at the time of creating this PR, points to 7e51592129

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62276

Reviewed By: ngimel

Differential Revision: D29940950

Pulled By: malfet

fbshipit-source-id: 59c6fda76a9023af3adbfb5a96b83ca50950df6c
2021-07-27 18:32:53 -07:00
1f1d01df3e Revert D29943356: .github: Migrate ecr_gc to github actions
Test Plan: revert-hammer

Differential Revision:
D29943356 (8e0622abf1)

Original commit changeset: 493592baf2f7

fbshipit-source-id: f0e604aab2b828561adc3e8fabf0f39221e15615
2021-07-27 18:14:31 -07:00
af0f083d42 [dist_optim] fix the bug of none grads on functional optimizers (#62249)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62249

Parameters and grads passed to torch.optim.functional should always match; we should skip the parameters that have None gradients to avoid a size mismatch
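
A minimal sketch of the fix described above (the list names below are illustrative; the actual functional optimizer call is omitted):

```
import torch

params = [torch.randn(2, requires_grad=True) for _ in range(3)]
params[0].grad = torch.randn(2)
params[2].grad = torch.randn(2)   # params[1].grad stays None (e.g. unused)

params_with_grad, grads = [], []
for p in params:
    if p.grad is not None:        # skip parameters with None gradients
        params_with_grad.append(p)
        grads.append(p.grad)

assert len(params_with_grad) == len(grads) == 2   # lists stay matched
```
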
ghstack-source-id: 134452467

Test Plan: test_dist_optim_none_grads

Reviewed By: mrshenli

Differential Revision: D29929653

fbshipit-source-id: 4ca6167fecdfe1db422236655edee3aa59b8b044
2021-07-27 18:10:51 -07:00
c0b806694f Do not use deprecated data accessor in IndexKernel.cu (#62268)
Summary:
Fixes repeated warnings like:
```
/var/lib/jenkins/workspace/aten/src/ATen/native/cuda/IndexKernel.cu: In lambda function:
/var/lib/jenkins/workspace/aten/src/ATen/native/cuda/IndexKernel.cu:354:683: warning: 'T* at::Tensor::data() const [with T = c10::BFloat16]' is deprecated: Tensor.data<T>() is deprecated. Please use Tensor.data_ptr<T>() instead. [-Wdeprecated-declarations]
   AT_DISPATCH_ALL_TYPES_AND_COMPLEX_AND3(at::ScalarType::Half, at::ScalarType::Bool, at::ScalarType::BFloat16, iter.dtype(), "take_cuda", [&] {
                                          ^
/var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:559:1: note: declared here
   T * data() const {
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62268

Reviewed By: walterddr

Differential Revision: D29937267

Pulled By: malfet

fbshipit-source-id: 6413deb9762b973880f4a7db47652eacd013214f
2021-07-27 17:58:19 -07:00
e3be185069 [PyTorch] Add KWargs support to script module forward (#62224)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62224

The underlying operator allows both args and kwargs, but we only exposed args in this convenience method. This brings them in line while not changing any existing programs.
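
A minimal Python-level sketch of the args-plus-kwargs calling convention this brings to the convenience method (the change itself is in the C++ wrapper; this example only illustrates the calling style):

```
import torch

class MyModule(torch.nn.Module):
    def forward(self, x, scale: float = 1.0):
        return x * scale

scripted = torch.jit.script(MyModule())
out = scripted.forward(torch.ones(2), scale=2.0)  # kwargs alongside args
```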

Test Plan: CI

Reviewed By: gunchu

Differential Revision: D29920830

fbshipit-source-id: f4b2aa88d4a679e33595625b7ef355e4d14e54c4
2021-07-27 17:02:57 -07:00
9776e1ff2f Migrate thnn_conv_depthwise2d from THC to ATen (#62281)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62281

Closes gh-24646, Closes gh-24647

There is no `TensorIterator` equivalent to these kernels so this is just
migrating the existing kernels over to the ATen style.

I've benchmarked for contiguous tensors with this script:
```
import torch
shape = (10, 10, 100, 100)
x = torch.randn(*shape, device='cuda')
w = torch.randn((10, 1, 5, 5), device='cuda')

for _ in range(100):
    torch.nn.functional.conv2d(x, w, groups=10)
```

and similarly for backwards. I see these as the same to within measurement error.

|                   |     Master (us)     |     This PR (us)     |
|------------------:|:-------------------:|:--------------------:|
|           Forward |        133.5        |         133.6        |
|  Backward (input) |        1,102        |         1,119        |
| Backward (weight) |        2,220        |         2,217        |

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D29943062

Pulled By: ngimel

fbshipit-source-id: fc5d16496eb733743face7c5a14e532d7b8ee26a
2021-07-27 16:51:23 -07:00
ba9423aa93 Fix forward ad for matrix power land race (#62291)
Summary:
Fix land race from https://github.com/pytorch/pytorch/pull/59993

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62291

Reviewed By: driazati, seemethere

Differential Revision: D29946599

Pulled By: albanD

fbshipit-source-id: 16411e1a0c298fad12a6a6788ec2427923b0112a
2021-07-27 16:17:51 -07:00
171e13fde9 Rework PowKernel.cu (#62260)
Summary:
PowKernel.cu is the single slowest file to compile in all of pytorch, taking
7 m 34 s on my machine. After investigating, I discovered that the case with
complex inputs and a cpu scalar for the first argument takes more than half that
time just on its own.

Noting that [`thrust::pow`] for complex is just `exp(log(base) * exponent)`,
we can improve this kernel by precomputing `log(base)` on cpu and computing
only the `exp` on CUDA. This is faster in both runtime and compile time.
For 1 million elements, master takes 61.6 us vs 56.9 us with this PR.

I also noticed that the constant exponent case is implemented twice, once in
`gpu_kernel_with_scalars` and again in `pow_tensor_scalar_kernel`. Further, the
`Pow.cpp` code detects cpu-scalar exponents and redispatches to the `tensor_scalar`
overload, making the `gpu_kernel_with_scalars` version dead code. Now instead,
we unconditionally run `tensor_tensor` and it will call into `tensor_scalar` if appropriate.

With these changes, PowKernel.cu takes just 2 m 30 s to compile.

[`thrust::pow`]: 368266e80e/thrust/detail/complex/cpow.h (L33)
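
A minimal sketch of the identity the rework exploits for a cpu-scalar base `b` and a complex exponent tensor `e`, namely pow(b, e) == exp(log(b) * e):

```
import cmath
import torch

b = 2.0 + 1.0j                                   # cpu-scalar base
e = torch.randn(1000, dtype=torch.complex128)    # complex exponents
reference = torch.pow(torch.tensor(b), e)
reworked = torch.exp(cmath.log(b) * e)           # log(b) computed once on CPU
torch.testing.assert_close(reworked, reference)
```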

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62260

Reviewed By: ejguan

Differential Revision: D29938789

Pulled By: ngimel

fbshipit-source-id: 7ab7d81ececc92a9e6e62e60b0a4f2e6e3146df8
2021-07-27 16:16:20 -07:00
7507aeded5 [reland][bc-breaking] reference option for linear produce a pattern instead of reference linear module (#61892) (#62277)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62277

This PR changes is_reference=True for linear to produce a pattern consisting of dequant - float linear - quant instead of a reference linear module. This is useful for future transformations to custom backends, and it also helps simplify the implementation of convert in the future.

Test Plan:
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: ejguan

Differential Revision: D29941079

fbshipit-source-id: 84bdfc0bb872c34fc345875e545c8b323e77c41e
2021-07-27 15:46:44 -07:00
24d94f5102 Limit smoke tests on PRs to just one config (#62288)
Summary:
When coming across the short runtime of a periodic job on this PR, I realized the current smoke-tests-on-PRs setup was flawed. Previously, as an attempt at better future compatibility, our conditional ran smoke tests only when USE_CUDA=1 on Windows.

This is BAD and has unintended consequences, such as misleading results when a ci/scheduled workflow is triggered but fails to test the full test suite. e.g., with PR https://github.com/pytorch/pytorch/issues/62266 https://github.com/pytorch/pytorch/actions/runs/1071698069

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62288

Reviewed By: seemethere, ejguan

Differential Revision: D29945540

Pulled By: janeyx99

fbshipit-source-id: 3cc91511c151f7348872b039c94d7752b6ea4692
2021-07-27 15:33:37 -07:00
8e0622abf1 .github: Migrate ecr_gc to github actions (#62284)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62284

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: malfet, zhouzhuojie

Differential Revision: D29943356

Pulled By: seemethere

fbshipit-source-id: 493592baf2f7abe206e1fb17438bac4e908b1251
2021-07-27 15:11:01 -07:00
d0e5ef5eba .circleci: Remove conda-package-handling pin (#62290)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62290

No longer needed anymore.

Fixes nightly failures that we're observing as well:

```
Jul 27 07:33:02 Found conflicts! Looking for incompatible packages.
Jul 27 07:33:02 This can take several minutes.  Press CTRL-C to abort.
Jul 27 07:33:02 failed
Jul 27 07:33:02
Jul 27 07:33:02 UnsatisfiableError: The following specifications were found
Jul 27 07:33:02 to be incompatible with the existing python installation in your environment:
Jul 27 07:33:02
Jul 27 07:33:02 Specifications:
Jul 27 07:33:02
Jul 27 07:33:02   - conda-package-handling=1.6.0 -> python[version='>=2.7,<2.8.0a0|>=3.6,<3.7.0a0|>=3.7,<3.8.0a0|>=3.8,<3.9.0a0']
Jul 27 07:33:02
Jul 27 07:33:02 Your python: python=3.9
```

From: https://app.circleci.com/pipelines/github/pytorch/pytorch/356478/workflows/2102acf1-c92a-4a59-919c-61d32d3bcd71/jobs/15027876

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: driazati

Differential Revision: D29946501

Pulled By: seemethere

fbshipit-source-id: 3e9182f4cbcf2aab185dbbc21b7a6171746e2281
2021-07-27 14:59:41 -07:00
8fe32c9c13 fix test-report uploading uniqueness issue (#62217)
Summary:
Should fix: https://github.com/pytorch/pytorch/issues/61978.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62217

Reviewed By: seemethere, ejguan

Differential Revision: D29944444

Pulled By: walterddr

fbshipit-source-id: 4b737d1535fd5cbfafb24245fad9ef67285f1dc0
2021-07-27 14:17:50 -07:00
190cdcb08c remove print for status on scribe sending (#62285)
Summary:
Following up on https://github.com/pytorch/pytorch/issues/61768.

Currently the printout is extremely long because each test case returns an OK status code without an exception.
This logging should be avoided when no exception was raised from send_to_scribe.

Remove the log printing when the response contains no error.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62285

Reviewed By: zhouzhuojie

Differential Revision: D29944461

Pulled By: walterddr

fbshipit-source-id: fc3c2b88bba27c68521cef7079ca2b6197d2d58b
2021-07-27 14:16:32 -07:00
e1bee3eb30 [Static Runtime] Add missing unit tests for static runtime ops (#62238)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62238

Added tests for the following ops:

* `aten::mul`
* `aten::nan_to_num`
* `aten::stack`
* `aten::relu`
* `aten::tanh`

Reviewed By: hlu1

Differential Revision: D29914217

fbshipit-source-id: 6a6c39629310e7131127e24fdce7253ccdf80340
2021-07-27 14:12:21 -07:00
4a15f4a902 Allow 0-dim batch sizes in Bilinear NN layer. (#47106)
Summary:
Part of the fix for https://github.com/pytorch/pytorch/issues/12013

Checks whether the inputs and outputs are non-empty in order to allow the Bilinear layer to accept 0-dim batch sizes. The if-check covers both input and output dim sizes, since the `_trilinear` function is written to work with both forward and backward for Bilinear.
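
A minimal sketch of the behavior this enables (shapes assumed for illustration):

```
import torch

m = torch.nn.Bilinear(4, 5, 6)
x1 = torch.randn(0, 4)      # batch size 0
x2 = torch.randn(0, 5)
out = m(x1, x2)
assert out.shape == (0, 6)  # empty output instead of an error
```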

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47106

Reviewed By: ejguan

Differential Revision: D29935589

Pulled By: jbschlosser

fbshipit-source-id: 607d3352bd4f88e2528c64408f04999960be049d
2021-07-27 13:59:42 -07:00
ab0354b650 All remaining linear/element-wise formulas (#59993)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59993

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D29914594

Pulled By: albanD

fbshipit-source-id: 2ffc5993cb66586e1458d7016774a03dfe786863
2021-07-27 13:06:46 -07:00
4c3eea26bd Fix out= variant forward grad detection (#60499)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60499

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D29914595

Pulled By: albanD

fbshipit-source-id: c51bb3aed91ab1f6ebc57936143b249590a43bd5
2021-07-27 13:06:45 -07:00
4a36e2a223 Add forward AD inplace check and fix codegen (#60498)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60498

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D29914593

Pulled By: albanD

fbshipit-source-id: bde649d5a03639a240dfe5fe027c6a3f758428a4
2021-07-27 13:04:55 -07:00
df18d05429 Make bytes_read available for OperatorCost (#62059)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62059

GetOperatorCost in Workspace exposes flops and bytes_written only. Make an additional piece, bytes_read, available from OperatorSchema::Cost.

Test Plan:
Added the two additional pieces in the unit test testGetOperatorCost in workspace_test

buck test caffe2/caffe2/python:workspace_test -- testGetOperatorCost

buck test //aml/ml_foundation/exp_platform/large_scale_training/distributed_hogwild/auto_device_placement/tests/...

buck test //aiplatform/training/autotuning/tests/...

buck test //aiplatform/training/pipelining/tests/...

buck test //deeplearning/fblsim/tests/...

Flow tests:

ADP Greedy: f288078287
ADP MILP: f288079278

Reviewed By: CrazySherman, xtaofb

Differential Revision: D29860676

fbshipit-source-id: 8b3a9f2bf17c0dae48cfe2800e8821bf441e0b03
2021-07-27 12:48:36 -07:00
bba7800933 Add logical op symbol (#62063)
Summary:
This is for the XLA-side [pr](https://github.com/pytorch/xla/pull/3054) to add logical op lowering

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62063

Reviewed By: ejguan

Differential Revision: D29937449

Pulled By: bdhirsh

fbshipit-source-id: ba421f6c2dad67395a383b5ed0b81ad9d59abe86
2021-07-27 12:19:56 -07:00
3bdee2bbed [jit] Rewrote DFS graph iterator to remove unnecessary local state (#61326) (#61980)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61980

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D29917766

Pulled By: laurencer

fbshipit-source-id: 536c4806636fe9e709e8bffdefa9320127064dea
2021-07-27 11:50:20 -07:00
fa52b4b922 .github: chown workspace for render_test_results (#62207)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62207

The workspace was getting held back due to permission-denied errors; let's
ensure we have a chown'd, clean workspace for all render_test_results
runs

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: walterddr, janeyx99

Differential Revision: D29915232

Pulled By: seemethere

fbshipit-source-id: dd9fcc9c00d9665569bd8cfa57e5d2d8da965aac
2021-07-27 11:44:15 -07:00
acaac70f63 Revert D29883676: Migrate thnn_conv_depthwise2d from THC to ATen
Test Plan: revert-hammer

Differential Revision:
D29883676 (de3a4eb583)

Original commit changeset: 9b2ac62cdd8a

fbshipit-source-id: d211d3cb7723b5d2e73de6941a7e649e5f78864f
2021-07-27 11:28:52 -07:00
82d81455ae [2/N] Remove unittest.skip across all of torch.distributed. (#61887)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61887

1) Introduced a `sandcastle_skip_if` decorator that ensures these
tests simply pass on sandcastle (see the sketch below).
2) Fixed all test files under `test/distributed` to not use `unittest.skip`

The overall goal is to avoid using skips, since sandcastle tags these tests as
continuously skipping.
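
A hedged sketch of such a decorator (an assumed implementation, not the exact one landed here):

```
import functools

def sandcastle_skip_if(condition, reason):
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if condition:
                print(f"Skipping {func.__name__}: {reason}")
                return  # silently pass instead of raising unittest.SkipTest
            return func(*args, **kwargs)
        return wrapper
    return decorator
```
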
ghstack-source-id: 134382237

Test Plan: waitforbuildbot

Reviewed By: SciPioneer

Differential Revision: D29784152

fbshipit-source-id: 17b4df6c5a55ff1d1e8e1de128fa679c3dfbcb7d
2021-07-27 10:53:23 -07:00
7fc96db45d fix typo errors in quantization-support.rst Line320 (#44447)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/44379

change
"`torch.per_channel_symmetric` — per tensor, symmetric"
to
 "`torch.per_channel_symmetric` — per channel, symmetric"

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44447

Reviewed By: mruberry

Differential Revision: D29909645

Pulled By: ezyang

fbshipit-source-id: e1505d070ec2b335dd6503b528e6a2f3bda2f1e3
2021-07-27 10:42:29 -07:00
5f7f08f498 Reenable AMP on XLA (#61861)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61861

Fixes https://github.com/pytorch/pytorch/issues/61804

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D29881903

Pulled By: ezyang

fbshipit-source-id: 91530c10fa37715bec33f477285da119415a9da9
2021-07-27 10:32:01 -07:00
a0c1c7e5d4 Fixing the case when starter nodes depend on get_attr node (#62234)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62234

There was a typo that went uncaught until recently; this fixes it.

Reviewed By: 842974287

Differential Revision: D29924190

fbshipit-source-id: ee6259fcd41358aefe9680b419acc87c0c2821cb
2021-07-27 10:29:53 -07:00
8cdf16d1de Revert D29810657: [bc-breaking] reference option for linear produce a pattern instead of reference linear module
Test Plan: revert-hammer

Differential Revision:
D29810657 (9df605133e)

Original commit changeset: 949615bbc017

fbshipit-source-id: 54597d1f9636b0f94ae01c66018ff2592e5c39fc
2021-07-27 10:10:13 -07:00
d7ddae8e4f det_backward: correct, more robust and with complex support [clone] (#61905)
Summary:
Clone of https://github.com/pytorch/pytorch/pull/58195 to ease the import. Done by request from anjali411

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61905

Reviewed By: albanD

Differential Revision: D29937920

Pulled By: anjali411

fbshipit-source-id: 025892a8e6147790825b20458986730ad8c5bb0f
2021-07-27 10:08:26 -07:00
de3a4eb583 Migrate thnn_conv_depthwise2d from THC to ATen (#62006)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62006

Closes gh-24646, gh-24647

There is no `TensorIterator` equivalent to these kernels so this is just
migrating the existing kernels over to the ATen style.

I've benchmarked for contiguous tensors with this script:
```
import torch
shape = (10, 10, 100, 100)
x = torch.randn(*shape, device='cuda')
w = torch.randn((10, 1, 5, 5), device='cuda')

for _ in range(100):
    torch.nn.functional.conv2d(x, w, groups=10)
```

and similarly for backwards. I see these as the same to within measurement error.

|                   |     Master (us)     |     This PR (us)     |
|------------------:|:-------------------:|:--------------------:|
|           Forward |        133.5        |         133.6        |
|  Backward (input) |        1,102        |         1,119        |
| Backward (weight) |        2,220        |         2,217        |

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D29883676

Pulled By: ngimel

fbshipit-source-id: 9b2ac62cdd8a84e1a23ffcd66035b2b2fe2374d8
2021-07-27 10:00:25 -07:00
9df605133e [bc-breaking] reference option for linear produce a pattern instead of reference linear module (#61892)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61892

This PR changes is_reference=True for linear to produce a pattern consisting of dequant - float linear - quant instead of a reference linear module. This is useful for future transformations to custom backends, and it also helps simplify the implementation of convert in the future.

Test Plan:
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D29810657

fbshipit-source-id: 949615bbc017bc454d81c8a6b2bdec53badaab19
2021-07-27 09:49:20 -07:00
6c6a9c73f2 [7/N] Nnapi backend delegation preprocess: compile_spec sanity check (#62213)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62213

Added sanity checks in preprocess function for Android NNAPI delegate.
`preprocess()` requires some input metadata passed through its `method_compile_spec` function argument.

`preprocess()` now throws specific error messages, if it cannot find the correct input arguments.
Example error message:
```
RuntimeError: method_compile_spec does not contain the "forward" key.
method_compile_spec should contain a Tensor or Tensor List which bundles input parameters: shape, dtype, quantization, and dimorder.
For input shapes, use 0 for run/load time flexible input.
method_compile_spec must use the following format: {"forward": {"inputs": at::Tensor}} OR {"forward": {"inputs": c10::List<at::Tensor>}}
```

nnapi_backend_preprocess.cpp: contains sanity check implementation
test_backend_nnapi.py: sanity check unit tests

Test: Ran `python test/test_jit.py TestNnapiBackend` in OSS successfully.

TODO: Using Tensors to pass input parameters is a temporary hack. When a dedicated object is implemented, update the sanity check error message.
ghstack-source-id: 134339282

Test Plan: Ran `python test/test_jit.py TestNnapiBackend` in OSS successfully.

Reviewed By: raziel, iseeyuan

Differential Revision: D29917004

fbshipit-source-id: 0d5c6b35889c556cda905ffc29c25c5422ae9ee4
2021-07-27 09:31:35 -07:00
2cbc0ede7d [DDP] Log if graph is static at end of training (#61871)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61871

When set_static_graph=False, the only type of dynamism we really
support in DDP is a dynamic set of unused parameters, which must be explicitly
enabled with find_unused_parameters=True. However, some workflows have a static
set of unused parameters; it would be good to detect this and add it to logging to
identify workflows that are candidates for static graph optimization.
ghstack-source-id: 134371429

Test Plan: CI

Reviewed By: zhaojuanmao

Differential Revision: D29773962

fbshipit-source-id: 1f741984c6e6f8e3e55cf69ca719b1e25a485b13
2021-07-27 09:23:43 -07:00
79eb8bb299 [Static Runtime] Enforce proper output dtype for many ops (re-land) (#62267)
Summary:
Re-land of D29935444
We previously had lots of ops with implementations like this:
```
if (p_node->Output(0).isNone()) {
  p_node->Output(0) = create_empty_like(input_0);
}
...
auto& out = p_node->Output(0);
some_func_out(inputs, out);
```
This would make the output have the correct shape. But it would
also take the dtype of `input_0`, which is not always correct.

This change transforms these blocks to:
```
if (p_node->Output(0).isNone()) {
  p_node->Output(0) = some_func(inputs);
} else {
  ...
  auto& out = p_node->Output(0);
  some_func_out(inputs, out);
}
```
This gives the output the correct shape and dtype.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62267

Reviewed By: ejguan

Differential Revision: D29937253

Pulled By: malfet

fbshipit-source-id: d91ca5d5703490d7d349a1de2ad3bb09b0c33967
2021-07-27 08:54:09 -07:00
2eef1f27f8 Disable ccache for nccl builds (#62208)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62208

reverts
https://github.com/pytorch/pytorch/pull/55814
which removed a workaround for:
https://github.com/pytorch/pytorch/issues/13362

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D29935472

Pulled By: nairbv

fbshipit-source-id: 7ce9cde1408f17153632036fd128814032739746
2021-07-27 08:07:26 -07:00
dc55d511d9 Forward fix mypy (#62263)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62263

Fixes current HUD Error: https://github.com/pytorch/pytorch/runs/3170342799

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D29935265

Pulled By: ejguan

fbshipit-source-id: 6f247833d24bff7aea42f6287493a85d62d73b96
2021-07-27 07:52:31 -07:00
3cd12448b4 Add forward mode differentiation for inverse and solve (#62160)
Summary:
This PR adds forward mode differentiation for `torch.linalg.inv`, `torch.linalg.inv_ex`, and `torch.linalg.solve` functions.
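
A minimal sketch of exercising the new forward-mode support for `torch.linalg.solve` (tangent values chosen arbitrarily):

```
import torch
import torch.autograd.forward_ad as fwAD

A = torch.randn(3, 3)
tangent_A = torch.randn(3, 3)   # direction for the JVP
b = torch.randn(3)
with fwAD.dual_level():
    dual_A = fwAD.make_dual(A, tangent_A)
    x = torch.linalg.solve(dual_A, b)
    primal, tangent = fwAD.unpack_dual(x)  # tangent is the JVP w.r.t. A
```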

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62160

Reviewed By: mruberry

Differential Revision: D29917213

Pulled By: albanD

fbshipit-source-id: b08bbc830f77f342cc7ca5b823d7ea4380f2aaa8
2021-07-27 07:51:22 -07:00
a0309f89f4 Initial ModuleInfo implementation (#61935)
Summary:
This PR contains the initial version of `ModuleInfo` for use in testing modules. The design philosophy taken here is to start small and simple and build out / refactor as needed when more test coverage or `ModuleInfo` entries are added. As such, it's not intended for general usage yet. The PR contains the following:

* (new file) `torch/testing/_internal/common_modules.py`
  * `ModuleInfo` definition - metadata for each module to use in testing
  * `module_db` - the actual `ModuleInfo` database; currently contains entries for two modules
  * `ModuleInput` - analogous to `SampleInput` from OpInfo; contains `FunctionInput`s for both constructor and forward pass inputs (see the sketch after this list)
      * Constructor and forward pass inputs are tied together within a `ModuleInput` because they are likely correlated
  * `FunctionInput` - just contains args and kwargs to pass to a function (is there a nicer way to do this?)
  * `modules` decorator - analogous to `ops`; specifies a set of modules to run a test over
  * Some constants used to keep track of all modules under torch.nn:
      * `MODULE_NAMESPACES` - list of all namespaces containing modules
      * `MODULE_CLASSES` - list of all module class objects
      * `MODULE_CLASS_NAMES` - dict from module class object to nice name (e.g. torch.nn.Linear -> "nn.Linear")
* (new file) `test/test_modules.py`
    * Uses the above to define tests over modules
    * Currently, there is one test for demonstration, `test_forward`, which instantiates a module, runs its forward pass, and compares it to a reference, if one is defined
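
A hedged sketch of the input bundling described above; the class names come from this summary, but the exact fields are assumptions:

```
from dataclasses import dataclass, field

@dataclass
class FunctionInput:
    args: tuple = ()
    kwargs: dict = field(default_factory=dict)

@dataclass
class ModuleInput:
    constructor_input: FunctionInput  # args/kwargs for the module constructor
    forward_input: FunctionInput      # args/kwargs for the forward pass
```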

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61935

Reviewed By: mruberry

Differential Revision: D29881832

Pulled By: jbschlosser

fbshipit-source-id: cc05c7d85f190a3aa42d55d4c8b01847d1efd57f
2021-07-27 07:42:07 -07:00
afe3644321 Remove faulty process group code (#61907)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61907

Removing the code for faulty process group agent since it was replaced by faulty tensorpipe agent

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D29794666

Pulled By: H-Huang

fbshipit-source-id: 0b35191cc07220b6774ecacc8d004f25fd2e87f0
2021-07-27 07:37:40 -07:00
a3be2ecc3a Revert D29887367: [Static Runtime] Enforce proper output dtype for many ops
Test Plan: revert-hammer

Differential Revision:
D29887367 (f4136c5efc)

Original commit changeset: cef04bfa52ec

fbshipit-source-id: 32e89f2b6381930559dd746b535904c3e90fd52b
2021-07-27 07:29:09 -07:00
b599c1e794 Create linalg and parametrizations codeowners (#62086)
Summary:
Added myself nikitaved  and IvanYashchuk

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62086

Reviewed By: mruberry

Differential Revision: D29920798

Pulled By: albanD

fbshipit-source-id: dcbd57bb2a438a1f04d4651447710fced83264d3
2021-07-27 06:50:41 -07:00
228b50e053 [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D29930232

fbshipit-source-id: e36dbc59a25d7f36d3bb7a02ad76696f299712cf
2021-07-27 04:13:15 -07:00
2d7c1e3fa8 [bc-breaking] Produce quantization pattern for add_scalar and mul_scalar (#61859)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61859

BC-breaking note:
Previously we did not add an observer/fake_quant for the output of add/mul for tensor-scalar operations. In this PR we added the observer/fake_quant instance (the same as the input's) to correctly model the behavior of the quantized add_scalar and mul_scalar ops (since quantized add/mul scalar assumes the output quantized tensor has the same quantization parameters as the input).

Test Plan:
python test/test_quantization.py TestQuantizeFxOps.test_add
python test/test_quantization.py TestQuantizeFxOps.test_mul

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D29770859

fbshipit-source-id: f43fcbfecd04c392467770b22c481bbbdaf43c25
2021-07-27 02:46:00 -07:00
b176feec1e Add device and key for lazy tensors (#61621)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61621

Test Plan: CI

Reviewed By: mruberry

Differential Revision: D29912934

Pulled By: asuhan

fbshipit-source-id: 493c32063a3e756d93cbf1d876563a35eaafb537
2021-07-26 23:00:22 -07:00
2945a73d90 Add option to skip GH validation for torch.hub (#62139)
Summary:
Split from https://github.com/pytorch/pytorch/pull/62072
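
Hypothetical usage of the new option (the keyword name `skip_validation` is assumed from the linked PR):

```
import torch

# keyword name assumed; validation of the GitHub repo is skipped entirely
entrypoints = torch.hub.list("pytorch/vision", skip_validation=True)
```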

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62139

Reviewed By: mthrok

Differential Revision: D29891497

Pulled By: malfet

fbshipit-source-id: 5c0baf53a2acf8f95062bd001457e1f936011529
2021-07-26 22:44:12 -07:00
64283fe146 [DDP/Functional Optim] Support kwarg arguments (#62079)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62079

Adds support for kwarg arguments into functional optimizer running as
hook.
ghstack-source-id: 134330379

Test Plan: CI

Reviewed By: SciPioneer

Differential Revision: D29838127

fbshipit-source-id: 2ab051ef5f0dff19c145ebe2260668b927ba47b2
2021-07-26 22:12:50 -07:00
c0ebeca1a8 [Functional Optim] Test kwargs parity for SGD (#62078)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62078

Ensure that kwarg arguments such as momentum and weight decay maintain
parity between optimizer.step and step_param.
ghstack-source-id: 134330377

Test Plan: CI

Reviewed By: SciPioneer

Differential Revision: D29837942

fbshipit-source-id: 1ae39648fc26aebd8aaef1a7ac0e03b598a8ed60
2021-07-26 22:11:40 -07:00
478098aaac Revert D29801652: Refactor Tensor::to to call a primitive that is not copy_.
Test Plan: revert-hammer

Differential Revision:
D29801652 (29bb3f4647)

Original commit changeset: bb01eb1acf3d

fbshipit-source-id: 93693bad8068d47a3a4c16f34f300e03ea573897
2021-07-26 19:37:17 -07:00
69adb21940 Parity tests for functional optimizer step_param (#61756)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61756

DDP will support running the optimizer as a communication hook with
optimizers that support a per-parameter/gradient step function `step_param`.
Add parity tests as we implement more optimizers that support step_param,
to ensure parity with regular optimizers.
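
A hedged sketch of the `step_param` contract (an illustrative toy class, not the real functional optimizer):

```
import torch

class TinyFunctionalSGD:
    """Toy optimizer exposing the per-parameter step used by the DDP hook."""
    def __init__(self, lr=0.1):
        self.lr = lr

    def step_param(self, param, grad):
        # step a single parameter with its gradient
        with torch.no_grad():
            param.add_(grad, alpha=-self.lr)

p = torch.randn(4, requires_grad=True)
g = torch.randn(4)
TinyFunctionalSGD().step_param(p, g)
```
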
ghstack-source-id: 134330378

Test Plan: Ci

Reviewed By: SciPioneer

Differential Revision: D29727549

fbshipit-source-id: 18977c896f12b8e478298488b298fd107affcf5f
2021-07-26 19:03:22 -07:00
b6d10a3a27 Fix infinite loop in _validate_not_a_forked_repo() (#62072)
Summary:
Increase `page_idx` inside the loop rather than outside of it.
Break from the loop when receiving an empty response, as that means there are no more items to fetch via pagination requests.
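
A minimal sketch of the corrected pagination loop (`fetch_page` is a stand-in for the paginated GitHub API request):

```
def fetch_page(idx, _pages=([1, 2], [3])):
    return _pages[idx - 1] if idx <= len(_pages) else []

page_idx, items = 1, []
while True:
    page = fetch_page(page_idx)
    if not page:          # empty response: no more items to fetch
        break
    items.extend(page)
    page_idx += 1         # incremented inside the loop, not outside
assert items == [1, 2, 3]
```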

Also, add options to use provided github token (via `GITHUB_TOKEN` environment variable)

Fixes failure with "Rate Limit Exceeded" when doing something like `torch.hub.list("pytorch/test-infra:dsf")`

Fixes https://github.com/pytorch/pytorch/issues/61755

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62072

Reviewed By: jbschlosser

Differential Revision: D29868539

Pulled By: malfet

fbshipit-source-id: 206082a0ba1208e9b15ff6c9c6cb71d2da74f1c3
2021-07-26 17:54:07 -07:00
d0f430927b [PyTorch][Edge] Serializing sub modules with same names (#61933)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61933

### Issue:

Submodules with the same name are not serialized correctly in bytecode format while using `_save_for_mobile`. These submodules are not distinguished as different modules, even though they have different forward, setstate, etc., if they have the same name.

### Fix:
The mangler creates unique names so that modules and submodules that have the same name can be uniquely identified while saving the module. iseeyuan rightly pointed out the underlying issue: the mangler is not used in the process of saving bytecode, and hence unique references for the submodules are not created. Please refer to the notebook to repro the issue: N777224

### Diff:
The above fix is implemented. The mangled names are used in bytecode, so the files in the `code/` directory now have the right references in `bytecode.pkl`

Will this maintain backward compatibility?
iseeyuan please feel free to correct or update this.
Yes. This fix impacts only modules with same-name submodules, which were not serialized correctly before. Existing modules should have correct references, and `_load_for_mobile` must not see any change. To confirm this, the existing test cases need to pass for the diff to be approved and shipped.
ghstack-source-id: 134242696

Test Plan:
```
~/fbsource/fbcode > buck test caffe2/test/cpp/jit:jit -- BackendTest.TestCompositeWithSetStates
Downloaded 0/5 artifacts, 0.00 bytes, 100.0% cache miss (for updated rules)
Building: finished in 19.2 sec (100%) 17619/17619 jobs, 3/17619 updated
  Total time: 19.5 sec
More details at https://www.internalfb.com/intern/buck/build/91542d50-25f2-434d-9e1a-b93117f4efe1
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: de9e27cf-4c6c-4980-8bc5-b830b7c9c534
Trace available for this run at /tmp/tpx-20210719-161607.659665/trace.log
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/844425127206388
    ✓ ListingSuccess: caffe2/test/cpp/jit:jit - main (8.140)
    ✓ Pass: caffe2/test/cpp/jit:jit - BackendTest.TestCompositeWithSetStates (0.528)
Summary
  Pass: 1
  ListingSuccess: 1
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/844425127206388
```

```
~/fbsource/fbcode > buck test caffe2/test/cpp/jit:jit -- BackendTest.TestConsistencyOfCompositeWithSetStates
Building: finished in 4.7 sec (100%) 6787/6787 jobs, 0/6787 updated
  Total time: 5.0 sec
More details at https://www.internalfb.com/intern/buck/build/63d6d871-1dd9-4c72-a63b-ed91900c4dc9
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: 81023cd2-c1a2-498b-81b8-86383d73d23b
Trace available for this run at /tmp/tpx-20210722-160818.436635/trace.log
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/8725724325952153
    ✓ ListingSuccess: caffe2/test/cpp/jit:jit - main (7.867)
    ✓ Pass: caffe2/test/cpp/jit:jit - BackendTest.TestConsistencyOfCompositeWithSetStates (0.607)
Summary
  Pass: 1
  ListingSuccess: 1
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/8725724325952153
```

To check the `bytecode.pkl` using module inspector please check:
N1007089

Reviewed By: iseeyuan

Differential Revision: D29669831

fbshipit-source-id: 504dfcb5f7446be5e1c9bd31f0bd9c986ce1a647
2021-07-26 16:31:48 -07:00
a13f714b6d DOC: remove git stamp from release documentation version (#58486)
Summary:
CI built the documentation for the recent 1.9.0rc1 tag, but left the git version in the `version`, so (as of now) going to https://pytorch.org/docs/1.9.0/index.html and looking at the version in the upper-left corner shows "1.9.0a0+git5f0bbb3" not "1.9.0". This PR should change that to cut off everything after and including the "a".
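
A minimal sketch of the intended string handling, assuming the version format shown above:

```
version = "1.9.0a0+git5f0bbb3"
release = version.partition("a")[0]  # cut off everything from the "a" on
assert release == "1.9.0"
```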

It should be cherry-picked to the release/1.9 branch so that the next rc will override the current documentation with a "cleaner" version.

brianjo

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58486

Reviewed By: zou3519

Differential Revision: D28640476

Pulled By: malfet

fbshipit-source-id: 9fd1063f4a2bc90fa8c1d12666e8c0de3d324b5c
2021-07-26 16:28:59 -07:00
60070982d2 [Static Runtime] Fixed build failure in OSS due to test_utils (#62216)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62216

Test Plan: Imported from OSS

Reviewed By: hlu1

Differential Revision: D29917514

Pulled By: navahgar

fbshipit-source-id: 379863e6cd0b157de3bfa1482f5519b26654b3d2
2021-07-26 16:10:10 -07:00
962841b532 Fix subnet counting and re-enable check for multiple onnxifi ops in AOT (#62033)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62033

Count the number of onnxifi ops rather than just the number of subnets, since when the subnet size < min_ops, a subnet isn't turned into an onnxifi op.

Test Plan:
Runs which ran into the "Did not find a partition with an SLS node" error now report "multiple onnxifi ops found"
From https://fb.workplace.com/groups/527892364588452/permalink/807802049930814/:
```
buck run mode/opt-clang -c python.package_style=inplace sigrid/predictor/scripts:rerun_aot -- --manifold_url="https://manifold.facebook.net/v0/read/tree/2021-06-30/onnxifi_caffe2_net_aot_input_arguments_01-55-32_711d9476?bucketName=dper3_job_meta&apiKey=dper3_job_meta-key&timeoutMsec=5000&withPayload=1"

```
Reran some failures from last week which now pass AOT:
From https://fb.workplace.com/groups/527892364588452/permalink/807802049930814/,
https://fb.workplace.com/groups/243933520351820/permalink/572715897473579/

```
buck run mode/opt-clang -c python.package_style=inplace sigrid/predictor/scripts:rerun_aot -- --manifold_url="https://manifold.facebook.net/v0/read/tree/2021-07-09/onnxifi_caffe2_net_aot_input_arguments_05-31-08_ef5393a6?bucketName=dper3_job_meta&apiKey=dper3_job_meta-key&timeoutMsec=5000&withPayload=1"
```
```
buck run mode/opt-clang -c python.package_style=inplace sigrid/predictor/scripts:rerun_aot -- --manifold_url="https://manifold.facebook.net/v0/read/tree/2021-07-12/onnxifi_caffe2_net_aot_input_arguments_14-44-34_cfdf3053?bucketName=dper3_job_meta&apiKey=dper3_job_meta-key&timeoutMsec=5000&withPayload=1"
```
```
buck run mode/opt-clang -c python.package_style=inplace sigrid/predictor/scripts:rerun_aot -- --manifold_url="https://manifold.facebook.net/v0/read/tree/2021-07-13/onnxifi_caffe2_net_aot_input_arguments_04-03-30_162e7e53?bucketName=dper3_job_meta&apiKey=dper3_job_meta-key&timeoutMsec=5000&withPayload=1"
```

Reviewed By: khabinov

Differential Revision: D29796893

fbshipit-source-id: e9de7529ef86745207d41643d0fbe932fa166437
2021-07-26 16:08:51 -07:00
037c4aa1d1 [fx2trt] flatten converter (#62202)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62202

Add acc_ops.flatten converter. Also migrate to oss acc tacer for trt interpreter.

Test Plan: unit test

Reviewed By: khabinov

Differential Revision: D29861555

fbshipit-source-id: dac88a703fdbf386f3f7fb27674a67951f3add49
2021-07-26 15:49:01 -07:00
f883ed9095 irange-ify 8b (#62195)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62195

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D29887946

fbshipit-source-id: e3bd44721cf06a34ced47994810212be8460a2bb
2021-07-26 15:38:54 -07:00
f7743e92bf irange-ify 9 (#62118)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62118

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D29879670

fbshipit-source-id: 99b86ac7d65dfa2a47d0e6b7d65433200d18081e
2021-07-26 15:13:02 -07:00
026cfe85b4 Fix InlinedCallStack annotation to account for module calling its own (#61791)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61791

methods from forward

During inlining we attach an InlinedCallStack to nodes being inlined. In
the process we attach module information as well, such that if a
CallMethod is being inlined we know which class instance and class type
the method belongs to. However, CallMethod can be calling a method of
the same object to which the graph belongs, e.g.:

```
def forward(self, input):
  x = input + 10
  return self.forward_impl_(x, input)
```
Here forward_impl_ is a method defined on the same class in which forward
is defined. The existing module hierarchy annotation will mislabel this as
an unknown instance, since the method is not associated with the output of
a GetAttr node (it would be if we had called self.conv.forward_impl_, for
example).
The change in this PR reconciles this by creating a placeholder name "SELF"
for the module instance, indicating that you can traverse the InlinedCallStack
backwards to find the first node with name != SELF, which would be the name
of the object.
e.g.:
TOP(ResNet)::forward.SELF(ResNet)::_forward_impl.layer1(Sequential)::forward.0(BasicBlock)::forward.conv1(Conv2d)::forward.SELF(Conv2d)::_conv_forward

Test Plan:
Add test

Imported from OSS

Reviewed By: larryliu0820

Differential Revision: D29745443

fbshipit-source-id: 1525e41df53913341c4c36a56772454782a0ba93
2021-07-26 15:00:57 -07:00
f16102f72a Revert D29892919: Add squid proxy as egress cache
Test Plan: revert-hammer

Differential Revision:
D29892919 (e63160d735)

Original commit changeset: ac17227f2553

fbshipit-source-id: b78313147d60f26c1df68a25293e6b571ba66919
2021-07-26 14:42:28 -07:00
cf1f59452b Hacky support for meta tensor serialization. (#62192)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62192

This support is hacky because it doesn't preserve meta tensor storage
sharing (e.g., if you serialize a model with shared storage, such as a
tensor and a view on that tensor, the viewing relationship will be broken
on deserialization and they become just different tensors). The hack is
also durable, in the sense that we will be on the hook for supporting
`_rebuild_meta_tensor_no_storage` in perpetuity, even if we change our
mind about the serialization format.
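
A minimal sketch of the round trip this enables (view/storage sharing is not preserved, per the note above):

```
import io
import torch

t = torch.empty(3, 4, device="meta")
buf = io.BytesIO()
torch.save(t, buf)
buf.seek(0)
t2 = torch.load(buf)
assert t2.is_meta and t2.shape == (3, 4)
```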

This unblocks an FB production use case. I didn't add C++ support to minimize
blast area of this patch.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D29910535

Pulled By: ezyang

fbshipit-source-id: d98dcdd0108dfc3ae730a071d3c583b6d0281d21
2021-07-26 14:33:45 -07:00
f0140a8c5f Disable cppcoreguidelines-non-private-member-variables-in-classes (#62212)
Summary:
This PR disables the `cppcoreguidelines-non-private-member-variables-in-classes` check. PyTorch makes use of `protected` members throughout the codebase, so we do not want to run this clang-tidy check in CI; disabling it improves the signal-to-noise ratio.

Relevant failure: https://github.com/pytorch/pytorch/pull/61871/checks?check_run_id=3146453417

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62212

Reviewed By: driazati

Differential Revision: D29917882

Pulled By: 1ntEgr8

fbshipit-source-id: f607c3d050a122e95136f9915060c4cda6694c9d
2021-07-26 14:14:05 -07:00
1343eea037 Fix clang-tidy line filtering logic (#62210)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62210

Fixes #62204

Test Plan: #62211 clang-tidy should only error on the added lines (and not on context/removals)

Reviewed By: driazati

Differential Revision: D29917897

Pulled By: 1ntEgr8

fbshipit-source-id: de91dbf34c1ad8405507cad91ab3dd0d6c61d82e
2021-07-26 14:12:53 -07:00
2a83f24027 Enable macos clang-tidy installs (#62214)
Summary:
This PR enables installing our custom MacOS clang-tidy binaries. It also updates related documentation.

The binaries are produced by [this CI job](https://github.com/pytorch/test-infra/blob/master/.github/workflows/clang-tidy-macos.yml), and are published to S3.

This PR does not handle versioning of the downloaded binaries as this is being worked on separately. See https://github.com/pytorch/test-infra/issues/73

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62214

Test Plan:
On a MacOS machine, run
```bash
python3 -m tools.linter.install.clang_tidy
.clang-tidy-bin/clang-tidy --checks="*" --list-checks | grep "misc-max-tokens"
```

Reviewed By: janeyx99, mruberry

Differential Revision: D29917728

Pulled By: 1ntEgr8

fbshipit-source-id: 98d0d8b7a57bdebf0ebcdc83228ef391e8c6629e
2021-07-26 13:43:29 -07:00
f4136c5efc [Static Runtime] Enforce proper output dtype for many ops
Summary:
We previously had lots of ops with implementations like this:
```
if (p_node->Output(0).isNone()) {
  p_node->Output(0) = create_empty_like(input_0);
}
...
auto& out = p_node->Output(0);
some_func_out(inputs, out);
```
This would make the output have the correct shape. But it would
also take the dtype of `input_0`, which is not always correct.

This change transforms these blocks to:
```
if (p_node->Output(0).isNone()) {
  p_node->Output(0) = some_func(inputs);
} else {
  ...
  auto& out = p_node->Output(0);
  some_func_out(inputs, out);
}
```
This gives the output the correct shape and dtype.

Test Plan: `buck test //caffe2/benchmarks/static_runtime:static_runtime_cpptest`

Reviewed By: hlu1

Differential Revision: D29887367

fbshipit-source-id: cef04bfa52ec082ad3a9a32aa27c44e275c6b24c
2021-07-26 13:27:02 -07:00
29bb3f4647 Refactor Tensor::to to call a primitive that is not copy_. (#61458)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61458

Context
-------
functorch is unable to vmap(grad(f)) when f contains a .to
call. This is because .to (when it is not a no-op) decomposes
to .copy_ under grad and the .copy_ is not compatible with vmap.

Fix
---
The fix for this is to have all Tensor::to variants call a new operator,
`_to_copy`, that always copies and is a primitive w.r.t. autograd so
that autograd decomposes Tensor::to into a call to `_to_copy`.
(This is related to https://github.com/pytorch/pytorch/issues/60956,
please let me know if you want to bikeshed the naming).

In order to get this done I had to do a bit of refactoring. All of the
`::to` implementations now call `to_impl` which may call `_to_copy`.

Autograd codegen changes
------------------------

The second thing I had to do was modify the autograd codegen. Right now,
autograd assumes that every output is either statically known to be
differentiable or not differentiable at codegen time. `_to_copy` is a
little special because its differentiability depends on the output
dtype, e.g. `torch.randn(3, requires_grad=True).to(torch.long)` is
non-differentiable. To get this to work:
- I changed how `output_differentiability` in derivatives.yaml works.
- output_differentiability can now accept "conditions" for each of the
output arguments. A "condition" is some C++ code.
- We currently only support `output_differentiability` with conditions
if there is a single output. This is for convenience and can be changed
in the future.
- I added a new `output_differentiability_conditions` field to
DifferentiabilityInfo. This gets populated in load_derivatives.yaml
- forward-mode and reverse-mode AD take
`output_differentiability_conditions` into account.

Here's how the generated code for `VariableType::_to_copy`
[looks
like](https://gist.github.com/zou3519/93462df4bda1837acee345205b7cc849)
No other autogenerated code gets modified by this PR.

Performance benchmarking
------------------------
- I benchmarked [three
cases that demonstrate overhead](https://gist.github.com/zou3519/5b6985e6906b80eec5a0dd94ed5b6a1a).
- Case A: No-op .to(). Instruction count went from 50223 to 25623. I
have no clue why but this is a good thing.
- Case B: not-no-op .to(). Instruction count went from 665291 to 671961.
This is expected; `_to_copy` adds an additional dispatch.
- Case C: not-no-op .to() forward pass and backward pass. Instruction count
went from 4022841 to 4030057. This PR adds
an additional dispatch to .to() (so there should be one additional
dispatch in the forward pass) so this number looks reasonable.

Test Plan
---------
- test_torch.py has a test_to
- test_cuda.py has test_to*
- test_autograd has tests (test_type_conversions) that exercise the
reverse-mode path
- test_ops.py has some tests (like log_softmax) that exercise the
reverse-mode and forward-mode AD path.
- test_quantization, test_namedtensor all exercise tensor.to as well.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D29801652

Pulled By: zou3519

fbshipit-source-id: bb01eb1acf3d79d84f284150d1be4be3b4ace351
2021-07-26 13:02:39 -07:00
e63160d735 Add squid proxy as egress cache (#62103)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62103

This PR adds a squid proxy that's deployed dedicated for PyTorch CI. Initially we only roll out to GHA, and if things are ok we will extend this to circleci tests if necessary.

`http_proxy` and `https_proxy` are compatible with the following http clients:

- curl
- wget
- python

Existing cache policy:

```
refresh_pattern -i \.(7z|deb|rpm|exe|zip|tar|tgz|gz|ram|rar|bin|tiff|bz2|run|csv|sh)$ 1440 80% 2880
```

It uses the standard squid refresh_pattern to cache requests. In our setup, we
cache for at least 1440 minutes (1 day) and at most 2880 minutes (2 days), with
a last-modified factor of 80% ([squid doc](http://www.squid-cache.org/Doc/config/refresh_pattern/)). Please refer to [pytorch/test-infra](https://github.com/pytorch/test-infra/tree/master/aws/websites/squid-proxy) for details.

Right now, it only applies to the `build` and `test` step, to limit the scope and make sure build and test are more reliable with egress cache.

Test Plan: Imported from OSS

Reviewed By: jbschlosser, malfet, seemethere, janeyx99

Differential Revision: D29892919

Pulled By: zhouzhuojie

fbshipit-source-id: ac17227f2553ca62881711b3e9943488dfd8defd
2021-07-26 13:01:34 -07:00
d2594fa538 irange-ify 3 (#62112)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62112

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D29879513

fbshipit-source-id: c01d18d34bb19014bf28d92c4d04b07e50a2770a
2021-07-26 12:56:58 -07:00
f5c6c3947e Remove Input Pointer Caching for XNNPack (#61959)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61959

We no longer need to cache the input pointer, as XNNPACK has implemented a more robust approach where the indirection buffer does not need to be recalculated even if the activation tensor pointer changes, as long as the tensor dimensions stay the same.

This reverses the changes in https://github.com/pytorch/pytorch/pull/42840/files

Reviewed By: kimishpatel

Differential Revision: D29777605

fbshipit-source-id: c1750538c17bce34f885c6f1bbb1f7164ebba25b
2021-07-26 12:02:15 -07:00
7ec6d1e857 irange-ify 2 (#62113)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62113

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D29879507

fbshipit-source-id: 1fb114e44afe8c1407f648b705db7fd4edb9d6e3
2021-07-26 12:00:52 -07:00
6dc2c07304 [Reland] [DDP] Implement a hook which performs FunctionalSGD step. (#62177)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62177

Reland of https://github.com/pytorch/pytorch/pull/61678
Fix CI failure by gating including torchvision model on whether torchvision is available or not.
ghstack-source-id: 134282165

Test Plan: CI

Reviewed By: SciPioneer

Differential Revision: D29904101

fbshipit-source-id: 47e799eb4a90acbbda91c5857ea00de3045d49f5
2021-07-26 11:56:56 -07:00
1dfb687f3c Fixed off-by-one bug in Adam Smart Decay (#62135)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62135

The initial implementation of Adam with Smart Decay had an off-by-one error.  This was in the summation of the geometric series used to calculate how much built-up momentum would have been discharged in skipped minibatches.
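
Illustrative only (the exact series in the fix may differ): an off-by-one in a closed-form geometric sum is easy to check against the direct sum.

```
beta, k = 0.9, 5
direct = sum(beta**i for i in range(1, k + 1))  # beta^1 + ... + beta^k
closed = beta * (1 - beta**k) / (1 - beta)      # closed form over k terms
assert abs(direct - closed) < 1e-12             # summing k-1 or k+1 terms would not match
```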

The unit tests should have caught this, but the testing strategy missed it because k, the number of skipped minibatches, was always either 0 or so high that the impact of the bug was too small to detect. The impact of the bug was proportional to 1/k. The testing strategy has also been adjusted to cover this bug.

Differential Revision: D29889309

fbshipit-source-id: b086c0efed5c27f621061e726533c73658daffc6
2021-07-26 11:55:38 -07:00
dcb3eadc1f [quant][fix] Update quantization c++ tests to not run if CPU_STATIC_DISPATCH is specified (#62197)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62197

For build configs with ATEN_CPU_STATIC_DISPATCH defined, quantization tests will fail since they
require QuantizedCPU dispatch to be enabled.
This will fix some internal test failures like https://www.internalfb.com/intern/test/844424941811803?ref_report_id=0 which are run under the `caffe2_aten_cpu_inference` project

Test Plan:
buck test mode/dev //caffe2/aten:quantized_test

Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D29912742

fbshipit-source-id: b117eb9f4afb51e0d0dd52fbe9d5c5be7dfafe85
2021-07-26 11:39:45 -07:00
0ca5dc7f03 irange-ify 5 (#62114)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62114

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D29879534

fbshipit-source-id: 0b1d6d2c9062a2fd7a55b00cb9f3d59ec941bad3
2021-07-26 11:07:54 -07:00
8e71f48f0a Handle simple NNAPI flatten NHWC case (#61796)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61796

We can easily handle NNAPI conversion for NHWC inputs
that have 1 channel or whose H and W are both 1

Test Plan:
pytest test/test_nnapi.py::TestNNAPI::test_flatten

Imported from OSS

Reviewed By: saketh-are

Differential Revision: D29827735

fbshipit-source-id: 65dee4b42fceef1b032bf5dd1c4cc6e020d01e14
2021-07-26 10:59:04 -07:00
b73d759708 [fix] polygamma n>=1 (#61641)
Summary:
Fixes: https://github.com/pytorch/pytorch/issues/55357

TODO:
* [x] Use proper casting to avoid confusing the compiler

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61641

Reviewed By: albanD

Differential Revision: D29816592

Pulled By: mruberry

fbshipit-source-id: 2c020a6e4c325c1b5d15499a77fb39f9ba93dd79
2021-07-26 10:52:20 -07:00
ef7d572afa Ensure ShardedTensor handles list/tuple appropriately as size parameter. (#62109)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62109

The `size` parameter only worked correctly for an *args-like invocation
(10, 20 passed as separate arguments) and not for a list [10, 20] or a
tuple (10, 20). This PR ensures this works similarly to `torch.empty`.
ghstack-source-id: 134246166

Test Plan:
1) unit tests
2) waitforbuildbot

Reviewed By: SciPioneer

Differential Revision: D29884768

fbshipit-source-id: 7a4a3c5ed5d7c081344f6ead3170905b97fc652d
2021-07-26 10:31:32 -07:00
f9dce598a5 Add some missing cuda guards (#62100)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62100

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D29880330

fbshipit-source-id: 7089000ccbcaa70a13f0ab4531b032bd5326e539
2021-07-26 10:26:22 -07:00
200b6ccdc0 Catch saved tensors default hooks race condition (#61957)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61957

If the user runs code that registers default saved tensor hooks from
multiple threads, it will fail with a nice error message most of the
time. This commit handles the very rare case where a race condition
would have made it fail silently.

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D29848525

Pulled By: Varal7

fbshipit-source-id: eb9bdcfbeed857a988834651246390ea14eedd33
2021-07-26 09:48:47 -07:00
f2369f12f9 Add logging for dynamic rendezvous (#61822)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61822

Added scuba logging to the following files:
- dynamic_rendezvous.py
- c10d_rendezvous_backend.py

NOTE: This diff introduces the use of python's inspect module to easily allow for obtaining the calling method name and filename when logging. This module can mess with python's garbage collector, so special care was taken to never store references to results from inspect.stack() longer than absolutely needed.
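
A minimal sketch of the pattern (illustrative, not the code in this diff):

```
import inspect

def _caller_info():
    # Read the caller's function name and filename without holding on to
    # the frame object, which could otherwise interfere with garbage
    # collection.
    frame = inspect.currentframe().f_back
    try:
        return frame.f_code.co_name, frame.f_code.co_filename
    finally:
        del frame  # drop the frame reference as soon as possible
```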

Test Plan:
The following tests can be run.
```
buck run mode/dev-nosan //caffe2/test/distributed/elastic/rendezvous:c10d_rendezvous_backend_test
```
```
buck run mode/dev-nosan //caffe2/test/distributed/elastic/rendezvous:dynamic_rendezvous_test
```
```
buck run mode/dev-nosan //caffe2/test/distributed/elastic/events:lib_test
```

Reviewed By: aivanou

Differential Revision: D29643774

fbshipit-source-id: f10cd5ebf8f6860856267bc2483c0b85faacb0fd
2021-07-26 09:39:09 -07:00
6007ad3529 [Static Runtime] Refactor fb op tests to use testStaticRuntime (#62064)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62064

`testStaticRuntime` was previously only available in `test_static_runtime.cc`. It has been moved to a common library `test_utils` to facilitate code re-use. This also lets us test dynamic shapes in `test_fb_operators`

Reviewed By: hlu1

Differential Revision: D29858928

fbshipit-source-id: 68a94760166ddb745972b0f1fc24bed594937d1c
2021-07-26 08:25:10 -07:00
be17d6eadf Add default Saved Variable hooks (#61834)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61834

Expose a pair of functions to Python users: torch.autograd.graph.set_saved_tensors_default_hooks(pack, unpack) and torch.autograd.graph.reset_saved_tensors_default_hooks().
These functions control the hooks applied to saved tensors: all tensors saved in that context will be packed using the pack function, then unpacked accordingly when needed.

Currently, this works by simply calling register_hooks (cf #60975) directly at the end of the constructor of a SavedVariable. This could be optimized further by not performing the copy before registering default hooks, but this would require a small refactor. Edit: the refactor is done in #61927.

A current limitation is that if users create tensors in this context, they will not be able to register additional hooks on the saved tensor.

For instance, to perform something like #28997, one could define a pack function that saves to disk whenever the tensor size is too big and returns a filename, then unpack simply reads the content of the file and outputs a tensor, e.g.:

```
import os
import uuid

import torch

def pack(x):
    # Save the tensor to a uniquely named file (assumes `tmp_dir` is
    # defined) and return the file name as the packed value.
    name = os.path.join(tmp_dir, str(uuid.uuid4()))
    torch.save(x, name)
    return name

def unpack(name):
    # Load the tensor back from disk when the backward pass needs it.
    return torch.load(name)
```
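
A usage sketch with the new API (assuming `model` and `x` are defined elsewhere):

```
torch.autograd.graph.set_saved_tensors_default_hooks(pack, unpack)
loss = model(x).sum()   # tensors saved for backward go through pack()
loss.backward()         # ...and are restored through unpack()
torch.autograd.graph.reset_saved_tensors_default_hooks()
```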

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D29792193

Pulled By: Varal7

fbshipit-source-id: 33e931230ef59faa3ec8b5d11ef7c05539bce77c
2021-07-26 08:14:32 -07:00
89ca638c18 ENH Adds no batch dim support for AdaptiveMaxPool*D (#61847)
Summary:
Towards https://github.com/pytorch/pytorch/issues/60585

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61847

Reviewed By: suo

Differential Revision: D29883887

Pulled By: jbschlosser

fbshipit-source-id: de3fcf1cc3878b138ab766d2a50cc59c52ec5a60
2021-07-26 07:35:36 -07:00
394dd391dd [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D29904940

fbshipit-source-id: 16ce87cc328f2950ed95a12710b50c444e363c79
2021-07-26 03:41:55 -07:00
e6e8745bea [nnc] Add simplifierUnderContext for simplification that needs context info: currently added for-stmt index var bounds info as context (#60687)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60687

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D29373315

Pulled By: huiguoo

fbshipit-source-id: 8729af60dd6d9735187b2118e3e83c75ef21789d
2021-07-25 23:30:13 -07:00
2299d6a013 Revert D29701447: [DDP] Implement a hook which performs FunctionalSGD step.
Test Plan: revert-hammer

Differential Revision:
D29701447 (bd95cf4473)

Original commit changeset: 183954593b82

fbshipit-source-id: 714e6a2b698147db9533a67783aed2a65d9d5bfe
2021-07-25 22:23:30 -07:00
457a3fb6d1 [bc-breaking][quant][graphmode][fx] Produce dequant - fp_op - quant pattern for copy nodes (#61763)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61763

This PR changes the is_reference=True option for convert_fx to produce a dequant - fp_op - quant
pattern for copy nodes like maxpool op.

Before the PR:
```
def forward(self, x):
    maxpool2d_input_scale_0 = self.maxpool2d_input_scale_0
    maxpool2d_input_zero_point_0 = self.maxpool2d_input_zero_point_0
    quantize_per_tensor = torch.quantize_per_tensor(x, maxpool2d_input_scale_0, maxpool2d_input_zero_point_0, torch.quint8);  x = maxpool2d_input_scale_0 = maxpool2d_input_zero_point_0 = None
    maxpool2d = self.maxpool2d(quantize_per_tensor);  quantize_per_tensor = None
    dequantize = maxpool2d.dequantize();  maxpool2d = None
    return dequantize
```

After (we expand the maxpool2d that works with quantized input to a "dequant - maxpool2d - quant" pattern):
```
def forward(self, x):
    maxpool2d_input_scale_0 = self.maxpool2d_input_scale_0
    maxpool2d_input_zero_point_0 = self.maxpool2d_input_zero_point_0
    quantize_per_tensor = torch.quantize_per_tensor(x, maxpool2d_input_scale_0, maxpool2d_input_zero_point_0, torch.quint8);  x = maxpool2d_input_scale_0 = maxpool2d_input_zero_point_0 = None
    dequantize = quantize_per_tensor.dequantize();  quantize_per_tensor = None
    maxpool2d = self.maxpool2d(dequantize);  dequantize = None
    maxpool2d_output_scale_0 = self.maxpool2d_output_scale_0
    maxpool2d_output_zero_point_0 = self.maxpool2d_output_zero_point_0
    quantize_per_tensor_1 = torch.quantize_per_tensor(maxpool2d, maxpool2d_output_scale_0, maxpool2d_output_zero_point_0, torch.quint8);  maxpool2d = maxpool2d_output_scale_0 = maxpool2d_output_zero_point_0 = None
    dequantize_1 = quantize_per_tensor_1.dequantize();  quantize_per_tensor_1 = None
    return dequantize_1
```

note that the call to self.maxpool2d is expanded to
```
    dequantize = quantize_per_tensor.dequantize();  quantize_per_tensor = None
    maxpool2d = self.maxpool2d(dequantize);  dequantize = None
    maxpool2d_output_scale_0 = self.maxpool2d_output_scale_0
    maxpool2d_output_zero_point_0 = self.maxpool2d_output_zero_point_0
    quantize_per_tensor_1 = torch.quantize_per_tensor(maxpool2d, maxpool2d_output_scale_0, maxpool2d_output_zero_point_0, torch.quint8);  maxpool2d = maxpool2d_output_scale_0 = maxpool2d_output_zero_point_0 = None
```

Test Plan:
```
python test/test_quantization.py TestQuantizeFx.test_copy_node_has_shared_actpp_instance
```

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D29728900

fbshipit-source-id: cf2c7f1f6659e3ba97cbb7c6204dd13983da10bd
2021-07-25 19:49:13 -07:00
76d3cdf9df [quant] Add from_blob_quantized_per_channel API (#62049)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62049

Adds a new function that accepts qint data blobs as input and creates a per-channel quantized tensor using the pre-allocated data and the provided scale and zero_point inputs.
Addresses issue #61777

Test Plan:
./build/bin/quantized_test --gtest_filter='TestQTensor.FromBlobQuantizedPerChannel'

Imported from OSS

Reviewed By: kimishpatel

Differential Revision: D29854136

fbshipit-source-id: da6ecd3fb59a6f40ae88430fdd5d895f93d5411c
2021-07-25 14:09:38 -07:00
7195b78a59 [quant] Add from_blob_quantized_per_tensor API (#61986)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61986

Adds a new function that accepts qint data blobs as input and creates a quantized tensor using the pre-allocated data and the provided scale and zero_point inputs.
Addresses issue https://github.com/pytorch/pytorch/issues/61777

Test Plan:
./build/bin/quantized_test --gtest_filter='TestQTensor.FromBlobQuantizedPerTensor'

Imported from OSS

Reviewed By: kimishpatel

Differential Revision: D29831135

fbshipit-source-id: b08299bbe9e939fedff98a585e6b12c14d31f17e
2021-07-25 14:08:25 -07:00
bd95cf4473 [DDP] Implement a hook which performs FunctionalSGD step. (#61678)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61678

This diff makes the following changes:
- Add `step_param` method to `_FunctionalSGD` class, written similar to `step` but for a single param
- Implement a communication hook wrapper that runs a given comm. hook and then applies the functional SGD step
- Verify that this is equal to regular allreduce + SGD optimizer
ghstack-source-id: 133567598
ghstack-source-id: 134263399

Test Plan: CI

Reviewed By: SciPioneer

Differential Revision: D29701447

fbshipit-source-id: 183954593b82a092414623292f9b10e675fef96e
2021-07-25 13:36:47 -07:00
8152433de2 [1/n] Update testing lib*.so path (#61960)
Summary:
### Issue

Build PyTorch wheel packages during build stage for pull requests and install during test stage.

### Fix
Update all tests which call lib*.so (under the `./build` folder); change them to call lib*.so in `{ent}/pytorch/lib/python3.8/site-packages/torch`

### Diff
This diff starts by updating test_fx, test_backend and test_torchbind to check whether the current CI passes

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61960

Test Plan: check that all CI workflows pass

Reviewed By: malfet, saketh-are

Differential Revision: D29823235

Pulled By: tktrungna

fbshipit-source-id: e7f652def698e303d4843fbaedf4859f5eca2fd9
2021-07-24 05:16:35 -07:00
956f1c981e fix a typo (#61061)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61061

Reviewed By: navahgar, Gamrix

Differential Revision: D29495806

Pulled By: Krovatkin

fbshipit-source-id: 510de724e3108c52af1b25b8ab53ae3c895b55f9
2021-07-24 00:35:58 -07:00
ee44d73e59 Modernize override (#61744)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61744

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D29717320

fbshipit-source-id: 6eea4295ee2e5572ab337620be412376fcc2f3cc
2021-07-23 23:04:46 -07:00
d2e03dc484 [fx2trt] Add support for explicit batch dimension (#62110)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62110

Add an option to opt in to an explicit batch dimension. Extend unit tests to cover both scenarios (implicit and explicit). Fix some converters that didn't work with an explicit batch dimension before.

Add broadcast support and a generic function for adding elementwise binary ops.

Follow ups:
1. Add dynamic shape support in explicit batch dimension mode, to allow at least a varying batch dimension.
2. Extend the layer_norm plugin to `PluginV2Ext` to make it work with an explicit batch dimension.

Test Plan: unit tests

Reviewed By: jackm321

Differential Revision: D29798239

fbshipit-source-id: 91d47c6155d2473ed4a6f8d2816715a32c61b869
2021-07-23 22:54:07 -07:00
cc263ef795 [bc-breaking][quant][graphmode][fx] Add observer/fake_quant for copy nodes (#61687)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61687

Previously we do not insert observer/fake_quant for output copy nodes (e.g. maxpool).
But to produce reference patterns we need to insert observer/fake_quant for the output and later convert that to a quantize
node.

Model:
```
class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.maxpool2d = torch.nn.MaxPool2d(kernel_size=3)

    def forward(self, x):
        x = self.maxpool2d(x)
        return x
```
result of prepare:

Before:
```
def forward(self, x):
    x_activation_post_process_0 = self.x_activation_post_process_0(x);  x = None
    maxpool2d = self.maxpool2d(x_activation_post_process_0);  x_activation_post_process_0 = None
    return maxpool2d
```

After:
```
def forward(self, x):
    x_activation_post_process_0 = self.x_activation_post_process_0(x);  x = None
    maxpool2d = self.maxpool2d(x_activation_post_process_0);  x_activation_post_process_0 = None
    maxpool2d_activation_post_process_0 = self.maxpool2d_activation_post_process_0(maxpool2d);  maxpool2d = None
    return maxpool2d_activation_post_process_0
```

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D29715566

fbshipit-source-id: 817df9b2933a35cad5331d8d8ce1c5ba0752e9df
2021-07-23 21:29:37 -07:00
78f7d8ccfa [Static Runtime] Remove wrappers for aten::cat (#62067)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62067

The wrapper for aten::cat is no longer needed after the variadic cat change in D29565344 (ae58a4c45d) .
Also added a simple test to test dynamic shapes, i.e., input tensors in args2 are larger than in args1.

Reviewed By: navahgar, mikeiovine

Differential Revision: D29864600

fbshipit-source-id: 44a712c2e776815c09e0bf5631412149b81274b2
2021-07-23 20:33:41 -07:00
7c09de8384 [torch deploy] add support for Python C extension modules (#58117)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58117

Previously it was not possible to load C extension modules with deploy because extension
modules need to link against the Python.h API functions. Since
each libtorchdeploy_interpreter.so has its own copy of these functions, it is not possible
to tell dlopen to resolve symbols in a loaded SO from one of these libraries without exposing
its symbols globally.

This patch adds a custom ELF loader that attaches C extension libraries
to the Python API of the interpreter that loaded the shared library. Simple use of the numpy and regex modules appears to work.

This diff has some limitations:

* 64-bit Linux only. OSX and windows use different formats for shared libraries. 32-bit ELF files are not supported.
* debug info is not immediately available to debuggers. A script for lldb is provided which can be loaded
so that lldb knows about the libraries as they are loaded.
* shared libraries can directly use the Python API, but libraries they depend on
  (via DT_NEEDED entries in their dynamic segment) may not use Python. In the future, we can
  try to detect whether a sub-library uses the Python API and load it with our custom loader.
* TLS initialization and library initialization may occur in a different order than what would happen with dlopen,
  potentially leading to some issues running destructors in TLS segments. Use of these C++ features is relatively rare.

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D28435305

Pulled By: zdevito

fbshipit-source-id: 10f046053dd1d250e3c73f2cce8eb945eeba31b6
2021-07-23 19:58:54 -07:00
e856a45283 [Model Averaging] Refactor averagers to accept parameters instead of a module (#62105)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62105

This prepares for wrapping the averager as an optimizer, which can accept only parameters rather than a module.

Proposal: https://github.com/pytorch/pytorch/issues/59699
ghstack-source-id: 134213572

Test Plan:
buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_periodic_model_averager

buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_average_parameters

Reviewed By: rohan-varma

Differential Revision: D29883693

fbshipit-source-id: 474ba924a0b05068b12f163fb74582bccf314964
2021-07-23 18:39:45 -07:00
41f7a9dac0 [profiler][refactor] Avoid using legacy event in profiler (#61721)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61721

Remove dependency on LegacyEvent from the profiler

Test Plan:
python test/test_profiler.py -v

Imported from OSS

Reviewed By: kimishpatel, gdankel

Differential Revision: D29716769

fbshipit-source-id: 2c2b48f2ee096adcbde09821e0cc7c0fcb94d19f
2021-07-23 18:28:08 -07:00
06a3b23971 [android] Lite interpreter module to load from assets (#61609)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61609

Test Plan: Imported from OSS

Reviewed By: cccclai

Differential Revision: D29688641

Pulled By: IvanKobzarev

fbshipit-source-id: 7857bad51e91eae7c90a1218d463f3767f4fae15
2021-07-23 17:51:18 -07:00
643e58466e [nnc] Rename IRSimplifierBase with PolynomialBase (#60686)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60686

Test Plan: Imported from OSS

Reviewed By: navahgar, soulitzer

Differential Revision: D29373316

Pulled By: huiguoo

fbshipit-source-id: bd44bff60455076d1c5291273989e9939a428f9a
2021-07-23 17:18:41 -07:00
046272f3e5 [6/N] Nnapi Backend Delegate: Comprehensive OSS Tests (#61782)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61782

This PR depends on https://github.com/pytorch/pytorch/pull/61787

### Summary:
Added more comprehensive tests for Android NNAPI delegate.
Previously, there was only one basic test for lowering a PReLU module with the NNAPI delegate. Now, more tests are inherited from `test_nnapi.py`, the file for testing NNAPI conversion and execution without the delegate.

**test_backend_nnapi.py**
Test file for Android NNAPI delegate.
- `TestNnapiBackend` class inherits tests from `test_nnapi.py` and overrides the model conversion to use the delegate API.
- Includes an extra test for passing input arguments as Tensors and Tensor Lists.
- Has extra setup for loading the NNAPI delegate library and for changing the default dtype from float64 to float32 (dtype is typically float32 by default, but not in delegate backend unit tests)

**test_nnapi.py**
Test file for Android NNAPI without the delegate.
- Some code was refactored to allow override of only the NNAPI conversion call.
- An extra function was added to allow the NNAPI delegate unit test to turn off the model execution step. Once the NNAPI delegate's execution implementation is complete, this may no longer be necessary.

### Test Plan:
I ran `python test/test_jit.py TestNnapiBackend` and `python test/test_nnapi.py` to run both test files.

Test Plan: Imported from OSS

Reviewed By: raziel, iseeyuan

Differential Revision: D29772005

fbshipit-source-id: 5d14067a4f6081835699b87a2ece5bd6bed00c6b
2021-07-23 17:04:07 -07:00
f03e7170f0 ENH Updates docs and tests for regression modules that already support no-batch-dims (#61461)
Summary:
Towards https://github.com/pytorch/pytorch/issues/60585

This PR does not use `check_sum_reduction` because I wanted to test every reduction option.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61461

Reviewed By: suo

Differential Revision: D29883744

Pulled By: jbschlosser

fbshipit-source-id: cdad0effb41f0484938caad0d4c9d6d83e2aec07
2021-07-23 16:40:17 -07:00
1ec6205bd0 ENH Adds no_batch_dim support for maxpool and unpool for 2d and 3d (#61984)
Summary:
Towards https://github.com/pytorch/pytorch/issues/60585

(Interesting how the maxpool tests are currently in `test/test_nn.py`)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61984

Reviewed By: suo

Differential Revision: D29883846

Pulled By: jbschlosser

fbshipit-source-id: 1e0637c96f8fa442b4784a9865310c164cbf61c8
2021-07-23 16:14:10 -07:00
f4ffaf0cde Fix type promotion for cosine_similarity() (#62054)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/61454

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62054

Reviewed By: suo

Differential Revision: D29881755

Pulled By: jbschlosser

fbshipit-source-id: 10499766ac07b0ae3c0d2f4c426ea818d1e77db6
2021-07-23 15:20:48 -07:00
e408af083f Improve MHA docs (#61977)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/60831
Also clarifies the relationship between `embed_dim` and `num_heads` (see https://github.com/pytorch/pytorch/issues/60853 and https://github.com/pytorch/pytorch/issues/60445).
Formatting was overhauled to remove some redundancy between the input docs and shape docs; suggestions / comments welcome!

Link to rendered docs here: https://14912919-65600975-gh.circle-artifacts.com/0/docs/generated/torch.nn.MultiheadAttention.html

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61977

Reviewed By: bhosmer

Differential Revision: D29876884

Pulled By: jbschlosser

fbshipit-source-id: a3e82083219cc4f8245c021d309ad9d92bf39196
2021-07-23 15:19:34 -07:00
cf3cc01f1d [Static Runtime] Add is_frozen to StaticModule ctor (#62020)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62020

Add is_frozen to StaticModule ctor so we can skip freezing in StaticModule.

Reviewed By: ajyu, mikeiovine

Differential Revision: D29807431

fbshipit-source-id: 7742e9f5c5ae9f442a9e4007c870a14fd8b4af20
2021-07-23 15:12:35 -07:00
fa11103c6a [clang-tidy] Fix unknown GNU flag error (#62128)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62128

Test Plan: Imported from OSS

Reviewed By: driazati

Differential Revision: D29888297

Pulled By: 1ntEgr8

fbshipit-source-id: 0657d5baa72c014a83c9def4a39338c52f4ef8d1
2021-07-23 14:46:51 -07:00
9730d91abd MAINT Migrates multilabel_margin_loss from THC to ATen (CUDA) (#60708)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/24603
Fixes https://github.com/pytorch/pytorch/issues/24602

<s>The implementation should be exactly the same, so it is strange that the benchmarks show such a significant improvement in this PR.</s>

The benchmarks are now the same.

<details>
 <summary>Benchmark script</summary>

```python
from itertools import product

import torch
import torch.nn as nn
import torch.nn.functional as F
import time

torch.manual_seed(0)
MS_PER_SECOND = 1000

def _time():
    torch.cuda.synchronize()
    return time.perf_counter() * MS_PER_SECOND

device = "cuda"
C = 30
n_runs = 100
reductions = ["none", "sum", "mean"]
Ns = [1_000, 10_000, 100_000]

for reduction, N in product(reductions, Ns):
    total_fwd_time = 0
    total_back_time = 0
    grad_out = torch.randn(N, device=device)
    if reduction != "none":
        grad_out = grad_out[0]

    for _ in range(n_runs):
        input = torch.randn(N, C, device=device, requires_grad=True)
        target = torch.randint(0, C, size=input.size(), device=device)

        # forward
        start = _time()
        result = F.multilabel_margin_loss(input, target, reduction=reduction)
        total_fwd_time += _time() - start

    result = F.multilabel_margin_loss(input, target, reduction=reduction)
    for _ in range(n_runs):
        # backward
        start = _time()
        result.backward(grad_out, retain_graph=True)
        total_back_time += _time() - start

    fwd_avg = total_fwd_time / n_runs
    bwd_avg = total_back_time / n_runs
    print(
        f"input size({N}, {C}), reduction: {reduction}, fwd: {fwd_avg:.2f} (ms), back: {bwd_avg:.2f} (ms)"
    )
```

</details>

## master

```
input size(1000, 30), reduction: none, fwd: 0.14 (ms), back: 0.41 (ms)
input size(10000, 30), reduction: none, fwd: 1.26 (ms), back: 3.58 (ms)
input size(100000, 30), reduction: none, fwd: 13.15 (ms), back: 34.68 (ms)
input size(1000, 30), reduction: sum, fwd: 0.14 (ms), back: 0.38 (ms)
input size(10000, 30), reduction: sum, fwd: 1.16 (ms), back: 3.53 (ms)
input size(100000, 30), reduction: sum, fwd: 13.04 (ms), back: 34.53 (ms)
input size(1000, 30), reduction: mean, fwd: 0.14 (ms), back: 0.38 (ms)
input size(10000, 30), reduction: mean, fwd: 1.17 (ms), back: 3.52 (ms)
input size(100000, 30), reduction: mean, fwd: 13.12 (ms), back: 34.54 (ms)
```

## this PR

```
input size(1000, 30), reduction: none, fwd: 0.14 (ms), back: 0.35 (ms)
input size(10000, 30), reduction: none, fwd: 1.22 (ms), back: 2.98 (ms)
input size(100000, 30), reduction: none, fwd: 12.90 (ms), back: 29.32 (ms)
input size(1000, 30), reduction: sum, fwd: 0.14 (ms), back: 0.32 (ms)
input size(10000, 30), reduction: sum, fwd: 1.16 (ms), back: 2.97 (ms)
input size(100000, 30), reduction: sum, fwd: 13.00 (ms), back: 29.17 (ms)
input size(1000, 30), reduction: mean, fwd: 0.14 (ms), back: 0.32 (ms)
input size(10000, 30), reduction: mean, fwd: 1.17 (ms), back: 2.97 (ms)
input size(100000, 30), reduction: mean, fwd: 13.09 (ms), back: 28.91 (ms)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60708

Reviewed By: saketh-are

Differential Revision: D29856579

Pulled By: ngimel

fbshipit-source-id: b6bbf27a71e5a04f61779f6fef4ed1c98baa2607
2021-07-23 13:45:28 -07:00
a6c6fd923e [profiler] Nvtx support (#61634)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61634

The legacy profiler supported NVTX, which was used by emit_nvtx. This PR
adds support for NVTX in the new profiler, to prepare for the eventual
deprecation of the legacy profiler.

Test Plan:
Verified that the profiles produced with nvprof are the same
```
import torch
import torchvision.models as models
from torch.autograd.profiler import emit_nvtx, load_nvprof

model = models.resnet18().cuda()
inputs = torch.randn(5, 3, 224, 224).cuda()

with emit_nvtx(record_shapes=True):
  model(inputs)
```
/usr/local/cuda/bin/nvprof  -o test_trace2.prof -f  -- python test_emit_nvtx.py
```
evt = load_nvprof("/home/iliacher/local/pytorch/test_trace.prof")
```

Imported from OSS

Reviewed By: kimishpatel, gdankel

Differential Revision: D29691316

fbshipit-source-id: 1e186cc072368f3e3987a2da0bfd90ed328817c5
2021-07-23 13:37:09 -07:00
812bc1dde6 Smart Decay for Adam - DPER3 (#62058)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62058

This is the second diff in this stack.  This diff includes the changes to DPER3; the first diff includes the changes to Caffe2.

We want to decay learning parameters properly.  Previously this was not done when a parameter was absent from a minibatch.  We fix this by keeping track of missed minibatches and making the decay catch up accordingly.

The exponential moving averages (EMAs) for the first and second moments used in Adam are updated only for parameters seen in a minibatch.  In principle, for the parameters that are not seen, 0 should be added to the EMAs and the EMAs should then be decayed by multiplying by beta1 and beta2, respectively.

To avoid the computational overhead of touching every parameter for every minibatch, we:
* keep track of the last time a parameter is seen
* instead of decaying the EMAs by multiplying by beta1 and beta2, we multiply by beta1^k and beta2^k, where k is the number of minibatches since the parameter was last seen.

We hope this will significantly improve the inconsistent learning parameter issue we have seen with Adam.
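
A minimal sketch of the idea (illustrative only; names are hypothetical and bias correction is omitted):

```
import torch

class SmartDecayAdamSketch:
    def __init__(self, beta1=0.9, beta2=0.999, lr=1e-3, eps=1e-8):
        self.beta1, self.beta2, self.lr, self.eps = beta1, beta2, lr, eps
        self.last_seen = {}      # minibatch index at which each param was last seen
        self.m, self.v = {}, {}  # first/second moment EMAs per param

    def step_param(self, pid, param, grad, t):
        # k = number of minibatches since this param was last seen; decay
        # the EMAs by beta**k once, instead of touching every param on
        # every minibatch.
        k = t - self.last_seen.get(pid, t - 1)
        m = self.m.get(pid, torch.zeros_like(param)) * self.beta1 ** k
        v = self.v.get(pid, torch.zeros_like(param)) * self.beta2 ** k
        m = m + (1 - self.beta1) * grad
        v = v + (1 - self.beta2) * grad * grad
        param -= self.lr * m / (v.sqrt() + self.eps)
        self.m[pid], self.v[pid], self.last_seen[pid] = m, v, t
```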

Differential Revision: D29638897

fbshipit-source-id: 18d8e227d72c2e23010ca81e0f6eeb78872c8d3c
2021-07-23 13:26:30 -07:00
5224490ae9 Implement NumPy-like frombuffer tensor constructor. (#59077)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59077

Fixes #58549

`frombuffer` constructs a tensor object from an already allocated buffer through
CPython's buffer protocol. Besides the standard `dtype`, `count`, and `offset` parameters,
this function also accepts:

- `device`: where the buffer lives
- `requires_grad`: should autograd record operations on the new tensor

A new test file _test_buffer_protocol.py_ was created. Currently, only CPU tests were
implemented. That's because neither PyTorch nor Numba implements CPython's buffer
protocol. Therefore, there's no way to create a CUDA buffer with the existing
dependencies (could use PyCUDA for that, though).

At the moment, if `device` differs from the device the buffer actually lives, two things
may happen:

- `RuntimeError`, if `device='cuda'`
- Segmentation fault (not tested -- see above), if `device='cpu'`
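
A CPU usage sketch (written against the `torch.frombuffer` name the feature eventually landed under; `offset` is in bytes):

```
import array
import torch

buf = array.array('f', [1.0, 2.0, 3.0, 4.0])  # exposes the buffer protocol
t = torch.frombuffer(buf, dtype=torch.float32, count=2, offset=4)
print(t)       # tensor([2., 3.]) -- shares memory with buf
buf[1] = 7.0
print(t[0])    # tensor(7.) -- mutations are visible through the tensor
```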

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D29870914

Pulled By: mruberry

fbshipit-source-id: 9fa8611aeffedfe39c9af74558178157a11326bb
2021-07-23 13:17:48 -07:00
ec4e6181e6 [Static Runtime] Fix broken test_static_runtime build (#62098)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62098

The build was broken by D29821533 (1d2ea76afb). The `clamp` overloads used in `deep_wide.h`
are no longer available in the `at::native` namespace.

Use `at::cpu::clamp` and `at::cpu::clip_out` (which should be an alias for
clamp) instead.

Reviewed By: hlu1

Differential Revision: D29880187

fbshipit-source-id: 210b6d2be8a8142e7af1a0ba07e55a95b1a77d25
2021-07-23 12:35:09 -07:00
b820493cf1 [skip ci] Refactor CIFlow init logic (#62102)
Summary:
This PR refactors the CIWorkflow post_init step to best account for how CIFlow interacts with everything.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62102

Test Plan: This PR did NOT produce any workflow changes. I ran mypy and flake8 on the changed file locally with no issues.

Reviewed By: jbschlosser

Differential Revision: D29883275

Pulled By: janeyx99

fbshipit-source-id: 6c5c1fc1878159e0de1bf8d9bd0cb32aa47af49a
2021-07-23 12:29:04 -07:00
71cfbc45b4 Remove redundant torch.cuda.set_device(self.rank) (#62097)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62097

as title
ghstack-source-id: 134196740

Test Plan: buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_ddp_profiling_autograd_profiler

Reviewed By: rohan-varma

Differential Revision: D29880040

fbshipit-source-id: 6a06fb2d87e9a7dfa1d7c81bf0c3fe115c1a1abb
2021-07-23 11:59:16 -07:00
5ef667a8b8 Remove duplicated movedim implementation (#61939)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61939

Test Plan: Imported from OSS

Reviewed By: saketh-are

Differential Revision: D29850798

Pulled By: zou3519

fbshipit-source-id: e803b235d8535a204515ff9f5d46b8c4d191b73c
2021-07-23 11:52:07 -07:00
10ccc5a81c remove randn? from torch.testing namespace (#61840)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61840

Redo of #60859.

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D29871017

Pulled By: mruberry

fbshipit-source-id: 47afed1dc6aa0bb1e826af616ef5d5aaabb8e5bb
2021-07-23 11:51:03 -07:00
cb47d1f9c8 OpInfo Ref: fmod, remainder (#61527)
Summary:
See https://github.com/pytorch/pytorch/issues/54261 for OpInfo tracker.

This PR:

* [x] Adds references to both `fmod` and `remainder` for testing.
* [x] Updates `remainder` documentation to add a note on divergence with `std::remainder`. (something similar to NumPy's note: https://numpy.org/doc/1.20/reference/generated/numpy.remainder.html), see: https://github.com/pytorch/pytorch/pull/61527#discussion_r670238788 for further discussion.

cc: mruberry

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61527

Reviewed By: albanD

Differential Revision: D29841266

Pulled By: mruberry

fbshipit-source-id: be99851a94f53ea2fc07b64fd7c947775129658c
2021-07-23 11:44:32 -07:00
c9b71549f2 don't allow alias dispatch keys to go in the DispatchKeySet (#61771)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61771

Test Plan: Imported from OSS

Reviewed By: asuhan

Differential Revision: D29736432

Pulled By: bdhirsh

fbshipit-source-id: 54bb716db1e41565b00f4f01ea0096f834087577
2021-07-23 11:29:46 -07:00
143ef016ee Throw RuntimeError when numpy() is called on a tensor with conjugate or negative bit set (#61925)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61925

Resolves https://github.com/pytorch/pytorch/issues/59945 and https://github.com/pytorch/pytorch/issues/59946

bc-breaking note: Unlike before, complex_tensor.conj().numpy(), complex_float_tensor.conj().view(torch.float64), and complex_float_tensor.conj().imag.view(torch.int32) no longer return views but instead error out
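
For example (a sketch of the new behavior; `resolve_conj` materializes the conjugation):

```
import torch

a = torch.tensor([1 + 1j])
b = a.conj()                    # lazy: only sets the conjugate bit
# b.numpy() now raises a RuntimeError instead of silently sharing memory.
arr = b.resolve_conj().numpy()  # materialize first, then convert
```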

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D29819288

Pulled By: anjali411

fbshipit-source-id: 4bebec721eb535f44ef4b728bdc75fa444e05d16
2021-07-23 11:28:36 -07:00
943ca5f6f7 [special] alias for mvlgamma (#61633)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/50345

Have added `out` variant for consistency.

TODO:
* [x] Check docs https://docs-preview.pytorch.org/61633/special.html#torch.special.multigammaln

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61633

Reviewed By: albanD

Differential Revision: D29815514

Pulled By: mruberry

fbshipit-source-id: 003c7b6a5938ecc7a96727310e8a39da0b3d7aca
2021-07-23 11:24:27 -07:00
0c55f1bdec [torchelastic] Improve process termination logic (#61602)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61602

The diff introduces signal handlers and a SignalException that is raised when the agent process receives SIGTERM or SIGINT.

When either of these signals is received, the termination handler raises the `SignalException`, which is then processed by the main agent loop. `shutdown(signum)` is invoked, propagating the received signal to the child processes. A default 30-second timeout is introduced: if the child processes cannot terminate gracefully within it, the agent kills them via SIGKILL.
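
A minimal sketch of the handler mechanism (illustrative; names are assumptions, not the exact diff):

```
import os
import signal

class SignalException(Exception):
    """Raised inside the agent process when SIGTERM/SIGINT arrives."""
    def __init__(self, msg, sigval):
        super().__init__(msg)
        self.sigval = sigval

def _terminate_handler(signum, frame):
    raise SignalException(f"Process {os.getpid()} got signal: {signum}", sigval=signum)

signal.signal(signal.SIGTERM, _terminate_handler)
signal.signal(signal.SIGINT, _terminate_handler)
```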

Test Plan: unittests, sandcastle

Reviewed By: cbalioglu

Differential Revision: D29671783

fbshipit-source-id: 3dbca2125676dc18d417cc3e3bb0301fdd42737a
2021-07-23 11:00:15 -07:00
e42360d56f Remove default arguments before calling to __torch_dispatch__ (#61123)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61123

This applies the design pattern of removing explicit arguments when they
coincide with the default arguments.  This simplifies argument patterns
that dispatch kernels receive and makes it easier for us to maintain BC
(as addition of a new default argument isn't immediately BC-breaking
for dispatch implementors).

There is an important extra API which I haven't implemented here yet,
which is to take an incomplete sequence of arguments and fill out their
defaults (in case the user did want normalization).  I plan on adding
that in a future PR.
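
A standalone sketch of the design pattern (not the actual dispatcher code):

```
# Drop kwargs that merely restate the schema's defaults before handing
# them to the __torch_dispatch__ implementor.
def strip_defaults(kwargs, schema_defaults):
    return {k: v for k, v in kwargs.items()
            if k not in schema_defaults or v != schema_defaults[k]}

print(strip_defaults({"alpha": 1}, {"alpha": 1}))  # {} -- default elided
print(strip_defaults({"alpha": 2}, {"alpha": 1}))  # {'alpha': 2} -- kept
```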

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: saketh-are

Differential Revision: D29853616

Pulled By: ezyang

fbshipit-source-id: 71c672cb3a7d4d01f838a1c7fcdb75a8ce7d058e
2021-07-23 10:41:35 -07:00
32d0c3e8ee Support for reference convert_fx working on gpu
Summary:
This PR enables gpu only quantization, best used with is_reference since
there are not many gpu kernels for ops as of now.

This PR mainly changes how qconfigs and their observer constructors operate once they
are on a module's qconfig. The function add_module_to_qconfig_obs_ctr takes the observer constructors on the original
qconfig and configures them so that, when invoked, the created observer will
be on whatever device the module occupies. (Once observers are created,
module.to(device) is already set up so that it moves any observers.) To do this,
a new method and a few small changes were added to the _PartialWrapper class that
our observers already use to create constructors (without changing the
existing functionality). These changes work in
concert with changes to the prepare flow such that, when the qconfigs are
propagated to the modules (in quantize.py and qconfig_utils.py), they are configured using add_module_to_qconfig_obs_ctr.

Ideally this would work on other models, but the is_reference support for
a lot of modules isn't there yet; those tests should be added in a
future PR.

Test Plan:
python test/test_quantization.py TestQuantizeFxModels.test_static_gpu_convert_basic

python test/test_quantization.py TestQuantizeFxModels.test_switch_device_prepare_convert

python test/test_quantization.py TestQuantizeFxModels.test_prepare_serialize_switch_device_convert

python test/test_quantization.py TestQuantizeFx.test_qconfig_precedence

Reviewed By: vkuzo

Differential Revision: D29684114

fbshipit-source-id: 19fefb8e1998eaf212723e836276ccf39467f2e7
2021-07-23 10:30:38 -07:00
0df1679e5c BatchNorm: fix mixed precision usage with affine=False (#61962)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/61924

The fused backward kernel was using the weight dtype to detect mixed precision usage, but the weights can be None while the `running_mean` and `running_var` can still be mixed precision. So, I updated the check to look at those variables as well.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61962

Reviewed By: albanD

Differential Revision: D29825516

Pulled By: ngimel

fbshipit-source-id: d087fbf3bed1762770cac46c0dcec30c03a86fda
2021-07-23 09:55:52 -07:00
e318058ffe Ignore LNK4099 for debug binary libtorch builds (#62060)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/61979

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62060

Test Plan:
This CI shouldn't break
and https://github.com/pytorch/pytorch/pull/62061

Reviewed By: driazati

Differential Revision: D29877487

Pulled By: janeyx99

fbshipit-source-id: 497f84caab3f9ae609644fd397ad87a6dc8a2a77
2021-07-23 09:31:41 -07:00
04c95a0638 ns for fx: expose hook to define custom weight extraction functions (#62047)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62047

Adds a hook for the user to define a weight extraction function for a
custom type.

Example usage:
```
op_to_type_to_weight_extraction_fn = \
    get_op_to_type_to_weight_extraction_fn()
op_to_type_to_weight_extraction_fn['call_function'][_wrapped_linear] = \
    torch.quantization.ns.weight_utils.get_linear_fun_weight

results = extract_weights_impl(
    'a', m1, 'b', m2,
    op_to_type_to_weight_extraction_fn=op_to_type_to_weight_extraction_fn)
```

Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_user_defined_function
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D29853625

fbshipit-source-id: 183916ef54ba303bc818e0eba00b52e33c4633ad
2021-07-23 09:31:37 -07:00
07c6a12008 ns for fx: fix typing issue in weight extraction (#62041)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62041

Before this PR, weights of conv and linear modules were extracted
as lists, in order to match the signature of LSTM weights.

After this PR, weight extraction preserves the type of the weights,
so extracted weights of conv and linear have a different type
from LSTM weights.  The comparison util functions are updated to
handle the LSTM weight type of `List[tensor]`.

Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs
python test/test_quantization.py TestFXNumericSuiteCoreAPIsModels
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D29853626

fbshipit-source-id: 93da5b9b0b174679c61528d02b6b902cb064444e
2021-07-23 09:31:33 -07:00
eaba16d665 ns for fx: change weight extraction to direct mapping (#62038)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62038

Updates the logic to extract weights from nodes to use a
direct mapping from type to weight extraction function.

This is needed for a future PR which will allow users to
specify custom weight extraction functions for user defined
types.

Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs
python test/test_quantization.py TestFXNumericSuiteCoreAPIsModels
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D29853627

fbshipit-source-id: 3ef90ef4bd7b28f6316c0af215a2bd3ff8a2aeca
2021-07-23 09:30:08 -07:00
8a2c525d3b Fix some sign comparisons (#61849)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61849

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D29736180

fbshipit-source-id: 1391b11e73725ee985b9aa768566ca77f44d04ae
2021-07-23 09:03:33 -07:00
9d4056468e Migrate scheduled jobs debuggability to GHA (#62056)
Summary:
This removes the debuggable-ci workflow in Circle and enables the same idea in GHA, to allow contributors to run scheduled GHA workflows by:
1. assigning the PR to pytorchbot.
2. labeling the PR with ciflow/scheduled
3. unassigning the PR.

This PR also adds the trigger_action_only logic to windows_ci_template yaml, as it was present on the linux template and seemed to be left out by mistake.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62056

Test Plan: Note that this periodic job https://github.com/pytorch/pytorch/pull/62056/checks?check_run_id=3138504471 ran later than other jobs (like [this one](https://github.com/pytorch/pytorch/pull/62056/checks?check_run_id=3138226668)), and its time is close to when unassigning happens.

Reviewed By: seemethere

Differential Revision: D29859079

Pulled By: janeyx99

fbshipit-source-id: cd5c6be415cfa8090e3cac90625f92b49fd453a8
2021-07-23 08:48:22 -07:00
b03b45afd9 [DDP Comm Hook] Use a single tensor instead of a tensor list as the comm hook result (#62074)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62074

Since SPMD mode is retired, the comm hook result will always be a single tensor.

This can improve the comm hook developer experience, as there is no need to add an extra `[0]` to the precursor future result.
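
A hook sketch under the new contract (illustrative; `get_tensor` is an assumption about the GradBucket API of this era):

```
import torch
import torch.distributed as dist

def fp16_compress_hook(state, bucket):
    compressed = bucket.get_tensor().to(torch.float16)
    fut = dist.all_reduce(compressed, async_op=True).get_future()

    def decompress(fut):
        val = fut.value()
        tensor = val[0] if isinstance(val, list) else val
        # The hook's future now resolves to a single tensor, not a
        # one-element list.
        return tensor.to(torch.float32)

    return fut.then(decompress)
```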

#Closes: https://github.com/pytorch/pytorch/issues/61914
ghstack-source-id: 134164593

Test Plan:
buck test mode/dev-nosan caffe2/test/distributed:c10d
buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork

Reviewed By: rohan-varma

Differential Revision: D29864732

fbshipit-source-id: 59fe6dd78b66214b1788514ad4d236039d9bda31
2021-07-23 03:32:05 -07:00
1d2ea76afb clamp: port to structured kernel (#61361)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61361

This PR ports the `clamp` kernel to the structured format. In addition, it introduces `OptionalScalarRef` as a replacement for `c10::optional<Scalar>&`. The latter, although it is a reference type, can still involve copying the contained `Scalar` (e.g. if the actual parameter is a `Scalar` or if a `c10::optional<Scalar>` is constructed just to call a kernel). `OptionalScalarRef` contains only a `const Scalar&`, and stores flag about whether the instance contains something inside the `Scalar` itself using a new tag.

For more information, see #55070.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D29821533

Pulled By: SplitInfinity

fbshipit-source-id: 88d55df5a4b2c14b68a57e4905d90eea1b088d99
2021-07-23 02:02:07 -07:00
b106b958eb preserve residual in transformer norm_first (#61692)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61692
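
For context, a minimal sketch of the pre-norm ("norm_first") pattern this preserves (names are illustrative):

```
def encoder_layer_norm_first(x, norm1, norm2, sa_block, ff_block):
    # The LayerNorm is applied inside each branch, so the residual path
    # carries x through unchanged.
    x = x + sa_block(norm1(x))
    x = x + ff_block(norm2(x))
    return x
```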

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D29706830

Pulled By: bhosmer

fbshipit-source-id: d9c9e88fb589d46189955a96909c6ca76d587f72
2021-07-22 23:49:08 -07:00
53222c59f0 Reformat (#62073)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62073

as title
ghstack-source-id: 134159445

Test Plan: N/A

Reviewed By: rohan-varma

Differential Revision: D29869185

fbshipit-source-id: 17a32d56860e9469bd26c4eb4ca2d483827d946e
2021-07-22 23:36:22 -07:00
3687bbb1ed [pruner] add Conv2d support (#61778)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61778

Adding Conv2d as supported modules for the pruner. Previously the pruner only supported Linear layers. This addition includes:
- adding a Conv2d activation reconstruction forward hook to match Conv2d weight shapes
- in `prepare`, checking the type of the module and using the corresponding activation forward hook
ghstack-source-id: 134143557

Test Plan:
Added conv2d tests
`buck test mode/dev-nosan //caffe2/test:ao -- TestBasePruner`

https://pxl.cl/1LLf3

Reviewed By: jerryzh168

Differential Revision: D29719045

fbshipit-source-id: 6a9f91b96992c552fff32f0e5a6e22f16eb7077b
2021-07-22 23:00:31 -07:00
a9b0a921d5 Disable avoid-non-const-global-variables lint check (#62008)
Summary:
As the GoogleTest `TEST` macro is non-compliant with it, as is `DEFINE_DISPATCH`

All changes but the ones to `.clang-tidy` are generated using following script:
```
for i in `find . -type f -iname "*.c*" -or -iname "*.h"|xargs grep cppcoreguidelines-avoid-non-const-global-variables|cut -f1 -d:|sort|uniq`;  do sed -i "/\/\/ NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables)/d" $i; done
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62008

Reviewed By: driazati, r-barnes

Differential Revision: D29838584

Pulled By: malfet

fbshipit-source-id: 1b2f8602c945bd4ce50a9bfdd204755556e31d13
2021-07-22 18:04:40 -07:00
260198d42c Disable bazel in CircleCI (#62055)
Summary:
As it has been running in GHA for a while

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62055

Reviewed By: zhouzhuojie, seemethere

Differential Revision: D29856620

Pulled By: malfet

fbshipit-source-id: 754e392442f68d4eee15811e2bd2cf147326c42a
2021-07-22 16:28:12 -07:00
a91be24e2d Modernize make pointers (#61741)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61741

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D29717385

fbshipit-source-id: 4452b77981e49175f744bdaab12cd225bf75b90e
2021-07-22 15:54:37 -07:00
f98fa5ea13 [skip ci] minor typo link fix (#62042)
Summary:
This is not a functional change but a typo fix where I forgot to update the link to windows_smoke_tests.csv in test_python_first_shard. The windows_smoke_tests.csv is currently the same in pytorch/test-infra and my fork, janeyx99/test-infra, but that will not be the case in the future.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62042

Reviewed By: seemethere

Differential Revision: D29851984

Pulled By: janeyx99

fbshipit-source-id: 9bafdf0ba006b9128463e3cf132fdfcddd3d10f2
2021-07-22 15:34:41 -07:00
1a64a5c0ba .github: Only run workflows on pytorch/pytorch (#62044)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62044

Downstream users have reported that they're seeing github workflows pop
up in their downstream forks which is not ideal. Let's make it so that
all of these generated workflows actually get skipped.

Also includes workflows related to automating pytorch/pytorch repository
maintenance

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: malfet, janeyx99

Differential Revision: D29852199

Pulled By: seemethere

fbshipit-source-id: bbc1684c06a50bb3597f3112cb65fe9c1a4d7c1f
2021-07-22 15:08:31 -07:00
414537ac99 DOC Fixes link in register_module_backward_hook (#61999)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/61580

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61999

Reviewed By: saketh-are

Differential Revision: D29847397

Pulled By: albanD

fbshipit-source-id: 3d9e1a5abac82d658b4f1746ace73e2fecb41725
2021-07-22 14:29:40 -07:00
b522f3be4c Svd docfix (#62028)
Summary:
moving back the variable names to match the python variable and remove unicode exponents.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62028

Reviewed By: saketh-are, mruberry

Differential Revision: D29848591

Pulled By: albanD

fbshipit-source-id: f86b8666cb5f86e300e214a6d59638d069018c50
2021-07-22 14:11:52 -07:00
d6e776d961 Add build/.ninja_log to artifacts for Windows (#62035)
Summary:
Being able to download the .ninja_log allows for better debugging. There may be a follow-up PR to convert this to a better tracefile.

This PR only handles windows as it is already handled for linux here:
https://github.com/pytorch/pytorch/blob/master/.jenkins/pytorch/build.sh#L248-L252

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62035

Test Plan: Check the artifacts for a windows job and see if we see .ninja_log

Reviewed By: malfet

Differential Revision: D29852228

Pulled By: janeyx99

fbshipit-source-id: a3a87b709cd0c84f5b3cdc274ac4a623771c2b5c
2021-07-22 13:04:29 -07:00
0309c5780d ENH Adds no batch dim support for AvgPool1d (#61860)
Summary:
Towards https://github.com/pytorch/pytorch/issues/60585

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61860

Reviewed By: albanD

Differential Revision: D29826382

Pulled By: jbschlosser

fbshipit-source-id: 47e12073d866f0604310fc1ff270cde9907e516d
2021-07-22 12:46:48 -07:00
5a00152a3d Warn about poor performance creating Tensor from list of numpy.array's (#51680)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/13918

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51680

Reviewed By: saketh-are

Differential Revision: D29847229

Pulled By: ezyang

fbshipit-source-id: 0519aad27f9ca1d8c06be5b9e6de382374d8b72b
2021-07-22 12:02:50 -07:00
2b0eddb0aa [Static Runtime] Implement prim::isinstance and prim::TypeCheck (#61783)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61783

Implement two new prim operators for static runtime: `isinstance` and `TypeCheck`. `isinstance` is very straightforward, but there were a few wrinkles with implementing `TypeCheck`:

1. There is no way to directly generate `TypeCheck` nodes from TorchScript, they are generated by the JIT at runtime. This makes testing a little difficult. I had to make some modifications to `testStaticRuntime` to allow for the use of IR and TorchScript tests.
2. The behavior of `prim::TypeCheck` as implemented here does not match up 1:1 with the version implemented in the interpreter! This is because grad mode is disabled in static runtime. Here's an example.

IR is the same as the one included in this test, but with `requires_grad == 1`
```
graph(%a.1 : Tensor,
      %b.1 : Tensor):
  %t0 : Float(2, 2, strides=[2, 1], device=cpu, requires_grad=1), %t1 : Float(3, 3, strides=[3, 1]), %type_matched : bool = prim::TypeCheck[types=[Float(2, 2, strides=[2, 1], device=cpu, requires_grad=1), Float(3, 3, strides=[3, 1])]](%a.1, %b.1)
  return (%t0, %t1, %type_matched)
```

And in the test setup:
```
auto a = at::zeros({2, 2}, at::kFloat);
a.to(at::kCPU);
a.set_requires_grad(true);
auto b = at::ones({3, 3}, at::kFloat);

std::vector<IValue> args_correct = {a, b};

// prim::TypeCheck should be true with args_correct,
// but we get false when using static runtime!
```

Reviewed By: hlu1

Differential Revision: D29743862

fbshipit-source-id: db1788f0f5de42bab42602e8cc24eee04cbcc280
2021-07-22 10:23:35 -07:00
e6339ee336 optimize imports (#61908)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61908

Reviewed By: suo

Differential Revision: D29800269

Pulled By: ejguan

fbshipit-source-id: 74ce4414eb6d2a5608df9ec1efdc71e2112aef70
2021-07-22 09:58:44 -07:00
554e04090f Add 11.3 conda nightly binaries (#61873)
Summary:
Adds conda 11.3 cuda binaries to our nightly matrix.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61873

Test Plan:
Tested by https://github.com/pytorch/pytorch/pull/61867 --> testing complete, showing all passing binaries.

THIS CAN ONLY BE MERGED _AFTER_ pytorch/builder#806 and pytorch/builder#807 are merged, which they now are.

Reviewed By: saketh-are

Differential Revision: D29848267

Pulled By: janeyx99

fbshipit-source-id: db04899418bd0b4116315fbbe36b06f772020c2e
2021-07-22 09:50:13 -07:00
e858f6eed9 torch.nn.utils.clip_grad_norm_: remove device syncs (#61042)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/60691

### Changes

Per the discussion in the above issue, this PR makes 2 changes:
1. When `error_if_nonfinite=False`, the NaN/Inf checks are truly skipped, and no device synchronization occurs.
    - Additionally, when performing the checks, the 2 results are combined with `torch.logical_or` to incur only a single sync (instead of 2 in the happy/finite path).
2. The `clip_coef` conditional is removed, in favor of a call to `clamp(..., max=1.0)` and an unconditional multiplication.
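
A sketch of change 2 (assuming `total_norm` is a device tensor, so no host read is needed):

```
import torch

def scale_grads_sync_free(parameters, max_norm, total_norm):
    # Instead of `if clip_coef < 1:` (which reads a device-side scalar and
    # forces a sync), clamp the coefficient and always multiply.
    clip_coef = max_norm / (total_norm + 1e-6)
    clip_coef_clamped = torch.clamp(clip_coef, max=1.0)
    for p in parameters:
        p.grad.detach().mul_(clip_coef_clamped)
```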

### Testing

- The existing unit tests for `clip_grad_norm_` pass.
- I have manually profiled the example program from https://github.com/pytorch/pytorch/issues/60691, and verified that:
    - No synchronizations occur when using `error_if_nonfinite=False`.
    - A single synchronization occurs when using `error_if_nonfinite=True`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61042

Reviewed By: mrshenli

Differential Revision: D29764096

Pulled By: jbschlosser

fbshipit-source-id: db594b24608d16374b91bcbb9469046dfeeb152d
2021-07-22 08:53:40 -07:00
9e53c823b8 Add AVX512 support in ATen & remove AVX support (#61903)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61903

### Remaining Tasks

- [ ] Collate results of benchmarks on two Intel Xeon machines (with & without CUDA, to check if CPU throttling causes issues with GPUs) - make graphs, including Roofline model plots (Intel Advisor can't make them with libgomp, though, but with Intel OpenMP).

### Summary

1. This draft PR produces binaries with 3 types of ATen kernels - default, AVX2, AVX512. Using the environment variable `ATEN_AVX512_256=TRUE` also results in 3 types of kernels, but the compiler can use 32 ymm registers for AVX2, instead of the default 16. ATen kernels for `CPU_CAPABILITY_AVX` have been removed.

2. `nansum` is not using an AVX512 kernel right now, as it has poorer accuracy for Float16 than AVX2 or DEFAULT, whose respective accuracies aren't very good either (#59415).
It was more convenient to disable AVX512 dispatch for all dtypes of `nansum` for now.

3. On Windows , ATen Quantized AVX512 kernels are not being used, as quantization tests are flaky. If `--continue-through-failure` is used, then `test_compare_model_outputs_functional_static` fails. But if this test is skipped, `test_compare_model_outputs_conv_static` fails. If both these tests are skipped, then a third one fails. These are hard to debug right now due to not having access to a Windows machine with AVX512 support, so it was more convenient to disable AVX512 dispatch of all ATen Quantized kernels on Windows for now.

4. One test is currently being skipped -
[`test_lstm` in `quantization.bc`](https://github.com/pytorch/pytorch/issues/59098) - it fails only on Cascade Lake machines, irrespective of the `ATEN_CPU_CAPABILITY` used, because FBGEMM uses `AVX512_VNNI` on machines that support it. The value of `reduce_range` should be used as `False` on such machines.

The list of the changes is at https://gist.github.com/imaginary-person/4b4fda660534f0493bf9573d511a878d.

Credits to ezyang for proposing `AVX512_256` - these use AVX2 intrinsics but benefit from 32 registers, instead of the 16 ymm registers that AVX2 uses.
Credits to limo1996 for the initial proposal, and for optimizing `hsub_pd` & `hadd_pd`, which didn't have direct AVX512 equivalents, and are being used in some kernels. He also refactored `vec/functional.h` to remove duplicated code.
Credits to quickwritereader for helping fix 4 failing complex multiplication & division tests.

### Testing
1. `vec_test_all_types` was modified to test basic AVX512 support, as tests already existed for AVX2.
Only one test had to be modified, as it was hardcoded for AVX2.
2.  `pytorch_linux_bionic_py3_8_gcc9_coverage_test1` & `pytorch_linux_bionic_py3_8_gcc9_coverage_test2` are now using `linux.2xlarge` instances, as they support AVX512. They were used for testing AVX512 kernels, as AVX512 kernels are being used by default in both of the CI checks. Windows CI checks had already been using machines with AVX512 support.

### Would the downclocking caused by AVX512 pose an issue?

I think it's important to note that AVX2 causes downclocking as well, and the additional downclocking caused by AVX512 may not hamper performance on some Skylake machines & beyond, because of the double vector-size. I think that [this post with verifiable references is a must-read](https://community.intel.com/t5/Software-Tuning-Performance/Unexpected-power-vs-cores-profile-for-MKL-kernels-on-modern-Xeon/m-p/1133869/highlight/true#M6450). Also, AVX512 would _probably not_ hurt performance on a high-end machine, [but measurements are recommended](https://lemire.me/blog/2018/09/07/avx-512-when-and-how-to-use-these-new-instructions/). In case it does, `ATEN_AVX512_256=TRUE` can be used for building PyTorch, as AVX2 can then use 32 ymm registers instead of the default 16. [FBGEMM uses `AVX512_256` only on Xeon D processors](https://github.com/pytorch/FBGEMM/pull/209), which are said to have poor AVX512 performance.

This [official data](https://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/xeon-scalable-spec-update.pdf) is for the Intel Skylake family, and the first link helps understand its significance. Cascade Lake & Ice Lake SP Xeon processors are said to be even better when it comes to AVX512 performance.

Here is the corresponding data for [Cascade Lake](https://cdrdv2.intel.com/v1/dl/getContent/338848) -

![CASCADE LAKE AVX2](https://user-images.githubusercontent.com/76181208/120666172-ffec3f80-c451-11eb-8ea1-8933ccc12a1b.PNG)
![CASCADE LAKE AVX512](https://user-images.githubusercontent.com/76181208/120666190-04b0f380-c452-11eb-9faa-38d233c874c8.PNG)

The corresponding data isn't publicly available for Intel Xeon SP 3rd gen (Ice Lake SP), but [Intel mentioned that the 3rd gen has frequency improvements pertaining to AVX512](https://newsroom.intel.com/wp-content/uploads/sites/11/2021/04/3rd-Gen-Intel-Xeon-Scalable-Platform-Press-Presentation-281884.pdf). Ice Lake SP machines also have 48 KB L1D caches, so that's another reason for AVX512 performance to be better on them.

### Is PyTorch always faster with AVX512?

No, but then PyTorch is not always faster with AVX2 either. Please refer to #60202. The benefit from vectorization is apparent with small tensors that fit in caches, or in kernels that are more compute-heavy. For instance, AVX512 or AVX2 would yield no benefit for adding two 64 MB tensors, but adding two 1 MB tensors would do well with AVX2, and even more so with AVX512.

It seems that memory-bound computations, such as adding two 64 MB tensors, can be slow with vectorization (depending upon the number of threads used), as the effects of downclocking can then be observed.
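A hedged sketch of how one might measure this with `torch.utils.benchmark` (sizes are illustrative, and actual results depend on the machine):

```python
import torch
from torch.utils import benchmark

small = torch.randn(256 * 1024)         # ~1 MB of float32, cache-resident
large = torch.randn(16 * 1024 * 1024)   # ~64 MB, memory-bound

for name, t in [("1 MB add", small), ("64 MB add", large)]:
    timer = benchmark.Timer(stmt="x + x", globals={"x": t})
    print(name, timer.blocked_autorange())
```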

Original pull request: https://github.com/pytorch/pytorch/pull/56992

Reviewed By: soulitzer

Differential Revision: D29266289

Pulled By: ezyang

fbshipit-source-id: 2d5e8d1c2307252f22423bbc14f136c67c3e6184
2021-07-22 08:51:49 -07:00
cyy
59d6e07ada fix forward_idx check (#59911)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59911

Reviewed By: dzhulgakov

Differential Revision: D29829020

Pulled By: albanD

fbshipit-source-id: f685063061dab499368a272d6b94a44e89f9a143
2021-07-22 08:37:33 -07:00
b60d1b713e Revert D26007050: add channels last support for thnn_conv2d (non-dilated)
Test Plan: revert-hammer

Differential Revision:
D26007050 (8b88c24670)

Original commit changeset: 1289e0687c24

fbshipit-source-id: 88b679efbcae572fe604d50e2199861cadbc3d4a
2021-07-22 08:31:15 -07:00
171598f0e3 [Refactoring] Fix imports order for torch/utils/data/dataset.py (#61328)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61328

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D29588897

Pulled By: VitalyFedyunin

fbshipit-source-id: 63df653fb471532819c83ebcee4f9dc951500ffb
2021-07-22 08:30:08 -07:00
1b02641bb1 add BFloat16 operators on CPU: arange, acosh, asinh, atanh, exp2, digamma, trigamma, polygamma (#60444)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60444

Reviewed By: ejguan

Differential Revision: D29800899

Pulled By: ezyang

fbshipit-source-id: 26d2c2ac3e7d3a2d49679508aad8c8bf0232cad5
2021-07-22 08:13:22 -07:00
f3f7e92be5 Manually call lazyInitCUDA in structured CUDA calls (#61882)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61882

If you directly call the native implementation, it bypasses the
initialization, which is bad! This probably slows things down a little,
though...

Fixes a problem uncovered by #61642

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: bhosmer

Differential Revision: D29783856

Pulled By: ezyang

fbshipit-source-id: 16857569a049e09c6ebd96ef04b0025403b254af
2021-07-22 07:50:05 -07:00
196679d3aa [Refactoring] Reordering imports in torch/utils/data/datapipes/iter/__init__.py (#61325)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61325

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D29588896

Pulled By: VitalyFedyunin

fbshipit-source-id: 8c0f3580f82083c43a590a18ecddb3e04ae93ca9
2021-07-22 07:46:08 -07:00
25be031c6e Add missing docker build to slow gradcheck label-triggered build (#61941)
Summary:
Currently, when adding the label, it fails like: https://app.circleci.com/pipelines/github/pytorch/pytorch/352569/workflows/d213cbad-edd6-4fe0-a79c-d46f8c0aae85/jobs/14856158

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61941

Reviewed By: suo

Differential Revision: D29827084

Pulled By: albanD

fbshipit-source-id: 134828d36e51324e6b6539dd4bc5f1eebfb89a03
2021-07-22 07:37:21 -07:00
5186fa2831 Fix c10d -> dist in test_ddp_hooks.py (#61864)
Summary:
**Overview:**
The existing `test_ddp_hooks.py` test file uses a prefix `c10d`, which is not defined in the file, meaning the test errors if left as is. This renames each `c10d` prefix to `dist`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61864

Test Plan:
All four tests pass when run:
```
gpurun python test/distributed/algorithms/ddp_comm_hooks/test_ddp_hooks.py
```

Reviewed By: ejguan

Differential Revision: D29783860

Pulled By: andwgu

fbshipit-source-id: 16bdd2dfcb76192964246148f14851a74f8907c8
2021-07-22 07:20:41 -07:00
109bd5e78a OpInfo: bitwise_and (#61349)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61349

Also adds a type promotion test for bugs found by PR #60813

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D29592840

Pulled By: ezyang

fbshipit-source-id: ee013b20e31baf6c6ebf2edb881ae6d8e215c7a6
2021-07-22 07:04:17 -07:00
2f3300f25f [docs] Correct torch.permute (#61833)
Summary:
Noted while reviewing https://github.com/pytorch/pytorch/issues/61830

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61833

Reviewed By: albanD

Differential Revision: D29816661

Pulled By: mruberry

fbshipit-source-id: 895607d7ddcbd4319218ab7719a2f57cbde2283c
2021-07-22 00:27:23 -07:00
5801431c9b OpInfo Ref: addbmm (#61832)
Summary:
See https://github.com/pytorch/pytorch/issues/54261. This PR:

* Adds reference wrapper using NumPy for reference function of `addbmm`
* Refines sample inputs (makes it more readable and avoids redundancy)

cc: mruberry

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61832

Reviewed By: albanD

Differential Revision: D29816024

Pulled By: mruberry

fbshipit-source-id: e0fea6dc923504169a13bfaa258c61fbbc5fa9f4
2021-07-22 00:26:10 -07:00
31beef009d Fix IMethodTest.GetArgumentNames after D29648756 (#61985)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61985

Fix IMethodTest.GetArgumentNames after D29648756 (641f6ef8a7).
ghstack-source-id: 134054637

Test Plan: buck test mode/dev caffe2/test/cpp/api:imethod -- IMethodTest.GetArgumentNames

Reviewed By: suo

Differential Revision: D29828807

fbshipit-source-id: b1411745b91e1b8c0ea0fd9e9666e22125dde333
2021-07-22 00:21:59 -07:00
07a91f1cfd fix graph deepcopy to propagate output type (#61747)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61747

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D29737565

Pulled By: migeed-z

fbshipit-source-id: 8583f0c87f2db27695e062f59a15de77f3b00fd6
2021-07-21 23:53:03 -07:00
8a2063e58a Foreach Test Refactor: Pointwise, Min/Max-imum (#61327)
Summary:
- rewrite pointwise unittests using `ops` decorator
- rewrite minimum&maximum unittests using `ops` decorator
- enable minimum/maximum fastpath for BFloat16
- remove _test_data method

https://github.com/pytorch/pytorch/issues/58833

cc: ptrblck ngimel

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61327

Reviewed By: albanD

Differential Revision: D29830209

Pulled By: ngimel

fbshipit-source-id: fa7805262b86c40fc32750b16629d80ad48ea4b5
2021-07-21 21:59:57 -07:00
d6899fe492 [Refactoring] Reordering imports in utils/data/__init__.py (#61324)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61324

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D29588895

Pulled By: VitalyFedyunin

fbshipit-source-id: 5e719c80f9cb5630c65187ac89773831777f368d
2021-07-21 21:38:28 -07:00
06efced177 .github: Specify directory to pull reports from (#61990)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61990

This adds more specificity to where to pull test reports from, since I
believe that actions/upload-artifact doesn't actually respect the
working-directory default.

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: albanD, zhouzhuojie

Differential Revision: D29831719

Pulled By: seemethere

fbshipit-source-id: cee5609f97338d44a484d85baa77f0167d81ce55
2021-07-21 20:57:07 -07:00
cc18654d66 [fx_acc] Refactoring acc_tracer (#61963)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61963

Test Plan: CI

Reviewed By: jfix71

Differential Revision: D29772522

fbshipit-source-id: 4b117735147624f9428b933ea798495823423a0e
2021-07-21 20:09:15 -07:00
6284d2a82b wrap cudaStreamSynchronize calls (#61889)
Summary:
This is a first step towards creating a context manager that errors out on synchronizing calls.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61889

Reviewed By: albanD

Differential Revision: D29805280

Pulled By: ngimel

fbshipit-source-id: b66400fbe0941b7daa51e6b30abe27b9cccd4e8a
2021-07-21 19:30:52 -07:00
3d6aa3a2f6 Enable torch.isclose to support bool tensors (#61271)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/60533
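A minimal illustration of the behavior this enables (previously, bool inputs errored):

```python
import torch

a = torch.tensor([True, False, True])
b = torch.tensor([True, True, True])
print(torch.isclose(a, b))  # tensor([ True, False,  True])
```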

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61271

Reviewed By: zhxchen17

Differential Revision: D29737618

Pulled By: SplitInfinity

fbshipit-source-id: 45314bc7e0b9a28c10700455b1e6267c0db3eefc
2021-07-21 18:50:14 -07:00
243c7079a1 add 3d input and output shapes to maxpool documentation (#61310)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61310

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D29737516

Pulled By: migeed-z

fbshipit-source-id: eb6964f6808b8ae05d4d3852a5162dc66930cd64
2021-07-21 18:27:27 -07:00
d00bb45846 [typing] suppress errors in fbcode/caffe2 - batch 2
Test Plan: Sandcastle

Differential Revision: D29827809

fbshipit-source-id: 7ca7c2a33d691ac57392945b78a320d253c84ed4
2021-07-21 17:56:26 -07:00
a0e381641b Remove relative paths for clang-tidy annotations (#62004)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62004

Some of the files checked by clang-tidy are compiled from a sibling directory, so the files all start with something like `../torch`. This ends up messing with `translate_annotations.py`, which runs from the repo root. This fixes it by chopping off any relative path prefixes in the clang-tidy output.
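A hypothetical sketch of the normalization described above (the function name is illustrative, not the actual code in `translate_annotations.py`):

```python
import re

def strip_relative_prefix(path: str) -> str:
    # Chop leading "../" segments so paths resolve from the repo root.
    return re.sub(r"^(\.\./)+", "", path)

assert strip_relative_prefix("../torch/csrc/Module.cpp") == "torch/csrc/Module.cpp"
```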

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D29835446

Pulled By: driazati

fbshipit-source-id: 2bd279370e41ed0a321e30f88fe38434105c75e8
2021-07-21 17:52:31 -07:00
e731a63e63 Silence clang-tidy linter for TorchpyTest.FxModule test (#62001)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62001

This will fix [this linter error](https://github.com/pytorch/pytorch/runs/3120335141) introduced with D29690088 (810e19979d).

Test Plan: N/A (just looked at other examples and tidy doc https://clang.llvm.org/extra/clang-tidy/)

Reviewed By: suo

Differential Revision: D29832654

fbshipit-source-id: 8cf69cb5551f3b1bd384a2553dc5c827beb0a68f
2021-07-21 17:40:46 -07:00
b6ff0fa8dd Enable dynamically ciflow/slow so that we can run GHA slow tests on PR (#61987)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61987

This PR enables us to run slow GHA tests on PR.

Steps to do (~may only take effect after this PR is merged~ works on this PR)
- Add label `ciflow/slow`
- Assign/unassign pytorchbot
- The job should be running .github/workflows/pytorch-linux-xenial-cuda10.2-cudnn7-py3.6-gcc7.yml

The above steps are manual; once probot can do the dispatch work, ciflow will be automated.

Related meta RFC issue: #61888

Test Plan: Imported from OSS

Reviewed By: driazati

Differential Revision: D29832758

Pulled By: zhouzhuojie

fbshipit-source-id: 64d31ef572502e62b80e6b7ac480ffcfa9f4e38b
2021-07-21 16:56:54 -07:00
9d6cdf34a4 Annotate generated files in .gitattributes (#61995)
Summary:
Mark CI yaml files generated from templates as linguist-generated
Fixes https://github.com/pytorch/pytorch/issues/61994

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61995

Reviewed By: seemethere

Differential Revision: D29832199

Pulled By: malfet

fbshipit-source-id: 86ad3a16b4d3e4f94c35b8f766a8556a07632419
2021-07-21 16:49:07 -07:00
ae58a4c45d [Static Runtime] Added a variadic cat operator (#61302)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61302

Test Plan: Imported from OSS

Reviewed By: hlu1

Differential Revision: D29565344

Pulled By: navahgar

fbshipit-source-id: 96f5f4546ec0e61eb7f87e016e026e7b62576248
2021-07-21 15:58:20 -07:00
b145889192 Modernize use make_unique (#61739)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61739

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D29717133

fbshipit-source-id: 70e3d81a48f7ae90cca3ef3c9587174ca15d81f4
2021-07-21 15:28:26 -07:00
2c0ecfbb20 [PyTorch] Expose bias() and unpack() API of LinearPackedParamsBase to Python layer (#61855)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61855

Exposing `bias()` and `unpack()` for `LinearPackedParamsBase`. This is useful for inspecting linear op attributes.
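A hedged sketch of inspecting packed linear params from Python after this change (exact op availability depends on the quantization backend built in):

```python
import torch

w = torch.quantize_per_tensor(torch.randn(4, 8), 0.1, 0, torch.qint8)
b = torch.zeros(4)
packed = torch.ops.quantized.linear_prepack(w, b)

weight, bias = packed.unpack()  # newly exposed to the Python layer
print(packed.bias())            # likewise newly exposed
```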

Test Plan:
See unit test passing:

```
[ (6c61a5eb4) | devvm1625 ~/fbsource/fbcode] buck test //caffe2/test:quantization -- test_linear_bias_unpack
Parsing buck files: finished in 2.8 sec
Building: finished in 9.9 sec (100%) 11973/55220 jobs, 0/55220 updated
  Total time: 12.8 sec
More details at https://www.internalfb.com/intern/buck/build/2d0ee210-c8f3-4994-ac2b-1dccf4c3ca6c
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: b7c6ea1b-8eef-430e-b83a-dad4033ecc87
Trace available for this run at /tmp/tpx-20210720-115423.031745/trace.log
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/5348024618459562
    ✓ ListingSuccess: caffe2/test:quantization - main (10.806)
    ✓ Pass: caffe2/test:quantization - test_linear_bias_unpack (quantization.core.test_quantized_op.TestQuantizedOps) (10.913)
Summary
  Pass: 1
  ListingSuccess: 1
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/5348024618459562
```

Reviewed By: kimishpatel

Differential Revision: D29767704

fbshipit-source-id: 716f43b61814b92094c0b08d4e63e1dddc352aa7
2021-07-21 15:13:40 -07:00
a02ccd6080 [ONNX] add supplement for standardOps low precision cast (#60731) (#61561)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61561

Addresses Gary's reply and adds a supplement to https://github.com/pytorch/pytorch/pull/53813.

- add more details for LowPrecisionCastNodeForStandardOps to make it more comprehensible.

- remove unused gemm test

Test Plan: Imported from OSS

Reviewed By: nikithamalgifb

Differential Revision: D29767991

Pulled By: SplitInfinity

fbshipit-source-id: d00032e13699f5b02fc619e64aa8fdd39f3a66b8

Co-authored-by: hwangdeyu <dejack953@outlook.com>
2021-07-21 15:10:36 -07:00
6f08ddfc28 [ONNX] Enable aten:normal op and add tests for aten:uniform op. (#60441) (#61560)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61560

1. Add a new symbolic function broadcast_tensors() to support exporting the torch.broadcast_tensors() function. This is required for exporting the torch.distribution.normal() function.
2. Add a new symbolic function normal() to support exporting the torch.distribution.normal() function.
3. Add relevant tests for the normal and uniform ops as well.
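For context, a small sketch of the eager-mode behavior the new `broadcast_tensors` symbolic mirrors (shapes are illustrative):

```python
import torch

a = torch.zeros(3, 1)
b = torch.ones(1, 4)
x, y = torch.broadcast_tensors(a, b)
print(x.shape, y.shape)  # torch.Size([3, 4]) torch.Size([3, 4])
```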

Test Plan: Imported from OSS

Reviewed By: nikithamalgifb

Differential Revision: D29767995

Pulled By: SplitInfinity

fbshipit-source-id: acfe5e7801d00c0df8ca46966bbd6015fed0045e

Co-authored-by: Jay Zhang <jiz@microsoft.com>
2021-07-21 15:10:35 -07:00
f0054e1a6e [ONNX] Update expand_as for dynamic shape (#61084) (#61559)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61559

Update expand_as for dynamic shape

Test Plan: Imported from OSS

Reviewed By: nikithamalgifb

Differential Revision: D29767990

Pulled By: SplitInfinity

fbshipit-source-id: 3f1e3f68fd17c5ffbd4a50fccff224fd9d6c84fb

Co-authored-by: Negin Raoof <neginmr@utexas.edu>
2021-07-21 15:10:33 -07:00
34075e2c8b [ONNX] Fix the issue of converting empty list to sequence. (#58651) (#61558)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61558

When we construct an empty list via a Python list comprehension, we need to avoid converting the input-less node to onnx::Concat in shape_type_inference.cpp and peephole.cpp, because doing so would create an invalid Concat node with no inputs.

In addition, update the code to avoid passing a Sequence input to an onnx::Cast node which doesn't accept Sequence data type as an input.

Add tests for the validation as well.

Test Plan: Imported from OSS

Reviewed By: nikithamalgifb

Differential Revision: D29767989

Pulled By: SplitInfinity

fbshipit-source-id: f97f172ff20eebda4c3744c7a934df36716f12a2

Co-authored-by: fatcat-z <jiz@microsoft.com>
2021-07-21 15:10:31 -07:00
22e60d77e7 [ONNX] Support tensor list as module attribute (#59685) (#61557)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61557

* Support tensor list as module attribute.
* Support exporting `torch.set_`.

Test Plan: Imported from OSS

Reviewed By: nikithamalgifb

Differential Revision: D29767992

Pulled By: SplitInfinity

fbshipit-source-id: 5ac5a09600d4dbe86b2fe354d240e46f1d1084ef
2021-07-21 15:08:35 -07:00
a8f6b5a80a [1/N] Avoid skipping tests in sandcastle. (#61876)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61876

In the sandcastle environment, avoid skipping tests and instead just
"pass" these tests to avoid a large number of tasks being created which are not
actionable.
ghstack-source-id: 133846232

Test Plan: Test with `SANDCASTLE=1 TW_JOB_USER=sandcastle`

Reviewed By: rohan-varma

Differential Revision: D29779699

fbshipit-source-id: add71008830dfa6f456ce2365a2d70436b7b7a31
2021-07-21 14:31:17 -07:00
adb73d3dcf Removed overhead from reshape() call if tensor doesn't need to be changed (#61466)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61466

## Goal

Per #55126, the performance of `reshape` is worse than that of `alias` in cases where they perform the same operation (i.e. where reshape returns a view), because `reshape` delegates to `view` and duplicates some of the operations (specifically `infer_size_dv` and `computeStride`).

The goal of this pull-request is to reduce or remove the additional overhead that `reshape` has.

### Proposed Implementation

Instead of using `view`, we implement a private/internal operator (`_reshape_alias`) that `reshape` dispatches to, which skips the relevant checks. This is functionally equivalent to `as_strided`; however, it is a lot simpler because it's specialized to this use case, and, importantly, the `backward` implementation is a lot faster.

Note that we have to dispatch (`reshape` is a composite operator) because `reshape` can return either a view or a copy of the Tensor depending on the parameters, and this complicates implementing a derivative/backward for `reshape`.
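A quick sketch of why that dispatch is needed, checking `data_ptr()` to see whether `reshape` aliased or copied (shapes are illustrative):

```python
import torch

x = torch.arange(6.).reshape(2, 3)
v = x.reshape(-1)                     # contiguous input -> view, no copy
print(v.data_ptr() == x.data_ptr())   # True

c = x.t().reshape(-1)                 # non-contiguous input -> must copy
print(c.data_ptr() == x.data_ptr())   # False
```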

### Why not `as_strided`?

Using `as_strided` directly slows down autograd. If we use a custom function equivalent to `_reshape_alias` but with a simpler backward function, then `view` has the same performance as `reshape`. If we delegate to `as_strided`, it is about 56% slower (and this holds against our custom function).

This is also the reason we add an internal operator named `_reshape_alias` instead of exposing a new operator, since this should only be used in the `reshape` case and it is effectively a more limited version of `view`, `alias`, and `as_strided`.

## Benchmarks
In a micro-benchmark for `backward` running:

```cpp
// Setup
at::Tensor x=torch::empty({2,2}, torch::requires_grad(true));

// Benchmark loop
// `reshape(-1)` replaced with a call to view(-1) for view baseline
x.pow(4).reshape(-1).mean().backward();
```

I also benchmarked simple operations without gradients using:

```cpp
// Setup
at::Tensor x=torch::empty({2,2}, torch::requires_grad(true));

// Benchmark loop
x.reshape(-1) // replaced with a call to view(-1) for view baseline
```

Baselined to `view`:

* Original `reshape`: `+3.3%` (without gradients `+20.8%`)
* Using `as_strided`: `+55.1%` (without gradients `+1.0%`)
* Using custom `_reshape_alias`: `-1.0%` (without gradients `+6.2%`)

In absolute terms (note the percentages above were generated comparing between runs/tests rather than to a single baseline):

* Original `view`: `53.66 us` (without gradients `582.78 ns`)
* Original `reshape`: `55.46 us` (without gradients `704.24 ns`)
* Using `as_strided`: `83.24 us` (without gradients `576.49 ns`)
* Using custom `_reshape_alias`: `53.13 us` (without gradients `536.01 ns`)

Note that these benchmarks perform a backward operation as well. When compared without any gradient computation, the performance differences are more pronounced, as the reshape call itself then takes up more of the total time.

### Original performance

<details>
  <summary>Benchmark results</summary>

```
[<torch.utils.benchmark.utils.common.Measurement object at 0x7f0e4d393160>
x.pow(4).view(-1).mean().backward();
setup: at::Tensor x=torch::empty({2,2}, torch::requires_grad(true));
  Median: 53.66 us
  IQR:    2.70 us (52.54 to 55.24)
  884 measurements, 100 runs per measurement, 1 thread]

[<torch.utils.benchmark.utils.common.Measurement object at 0x7f0e2ebd4fa0>
x.pow(4).reshape(-1).mean().backward();
setup: at::Tensor x=torch::empty({2,2}, torch::requires_grad(true));
  Median: 55.46 us
  IQR:    2.61 us (54.39 to 57.01)
  889 measurements, 100 runs per measurement, 1 thread]

2276116
2286256

<torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.FunctionCounts object at 0x7f0e5b2e3e20>
   2640  ???:at::detail::computeStride(c10::ArrayRef<long>, c10::ArrayRef<long>, c10::SmallVector<long, 5u> const&)
   1920  ???:at::native::reshape(at::Tensor const&, c10::ArrayRef<long>)
   1520  ???:at::_ops::reshape::call(at::Tensor const&, c10::ArrayRef<long>)
   1040  ???:c10::SmallVectorImpl<long>::operator=(c10::SmallVectorImpl<long>&&)
    980  ???:void at::infer_size_impl<c10::SmallVector<long, 5u> >(c10::ArrayRef<long>, long, c10::SmallVector<long, 5u>&)
    720  ???:__tls_get_addr
    520  ???:at::shouldRunRecordFunction(bool*)
    520  ???:__memcpy_avx_unaligned_erms
    200  ???:c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10:: ... g>)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>)
    100  ???:c10::TensorImpl::strides() const
    100  ???:c10::TensorImpl::sizes() const
    100  ???:at::(anonymous namespace)::manager()
     77  /tmp/benchmark_utils_jit_build__1626465284__8a34e7ff-cd37-4a82-be28-7f19e081e771/timer_cpp_7815557938202456331/timer_src.cpp:main
     40  ???:c10::TensorImpl::numel() const
    -77  /tmp/benchmark_utils_jit_build__1626465284__8a34e7ff-cd37-4a82-be28-7f19e081e771/timer_cpp_8055217880649990171/timer_src.cpp:main
   -260  ???:at::native::view(at::Tensor const&, c10::ArrayRef<long>)

Total: 10140
```

```
[<torch.utils.benchmark.utils.common.Measurement object at 0x7f850dd66c10>
x.view(-1);
setup: at::Tensor x=torch::empty({2,2});
  Median: 582.78 ns
  IQR:    33.80 ns (573.80 to 607.61)
  833 measurements, 10000 runs per measurement, 1 thread]

[<torch.utils.benchmark.utils.common.Measurement object at 0x7f850de31e20>
x.reshape(-1);
setup: at::Tensor x=torch::empty({2,2});
  Median: 704.24 ns
  IQR:    24.42 ns (697.20 to 721.62)
  679 measurements, 10000 runs per measurement, 1 thread]

56896
67036

<torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.FunctionCounts object at 0x7f84e1930bb0>
   2640  ???:at::detail::computeStride(c10::ArrayRef<long>, c10::ArrayRef<long>, c10::SmallVector<long, 5u> const&)
   1920  ???:at::native::reshape(at::Tensor const&, c10::ArrayRef<long>)
   1520  ???:at::_ops::reshape::call(at::Tensor const&, c10::ArrayRef<long>)
   1040  ???:c10::SmallVectorImpl<long>::operator=(c10::SmallVectorImpl<long>&&)
    980  ???:void at::infer_size_impl<c10::SmallVector<long, 5u> >(c10::ArrayRef<long>, long, c10::SmallVector<long, 5u>&)
    720  ???:__tls_get_addr
    520  ???:at::shouldRunRecordFunction(bool*)
    520  ???:__memcpy_avx_unaligned_erms
    200  ???:c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10:: ... g>)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>)
    100  ???:c10::TensorImpl::strides() const
    100  ???:c10::TensorImpl::sizes() const
    100  ???:at::(anonymous namespace)::manager()
     76  /tmp/benchmark_utils_jit_build__1626466038__15fbbac0-2072-4459-8f8e-08121a905b99/timer_cpp_547407365342278353/timer_src.cpp:main
     40  ???:c10::TensorImpl::numel() const
    -76  /tmp/benchmark_utils_jit_build__1626466038__15fbbac0-2072-4459-8f8e-08121a905b99/timer_cpp_3457873755756181226/timer_src.cpp:main
   -260  ???:at::native::view(at::Tensor const&, c10::ArrayRef<long>)

Total: 10140
```

</details>

### Using `as_strided`

<details>
  <summary>Benchmark results</summary>

```
[<torch.utils.benchmark.utils.common.Measurement object at 0x7f8b13bb5b50>
x.pow(4).view(-1).mean().backward();
setup: at::Tensor x=torch::empty({2,2}, torch::requires_grad(true));
  Median: 53.37 us
  IQR:    3.15 us (51.73 to 54.88)
  936 measurements, 100 runs per measurement, 1 thread]

[<torch.utils.benchmark.utils.common.Measurement object at 0x7f8af55f8490>
x.pow(4).reshape(-1).mean().backward();
setup: at::Tensor x=torch::empty({2,2}, torch::requires_grad(true));
  Median: 83.24 us
  IQR:    4.05 us (81.20 to 85.25)
  609 measurements, 100 runs per measurement, 1 thread]

2267916
2525061

<torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.FunctionCounts object at 0x7f8af55f8e50>
   31930  ???:_int_free
   15940  ???:malloc
   11595  ???:_int_malloc
   10100  ???:torch::autograd::generated::details::as_strided_backward(at::Tensor, at::TensorGeometry, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::optional<long>)
    9360  ???:__tls_get_addr
    8280  ???:free
    8100  ???:torch::autograd::VariableType::(anonymous namespace)::as_strided(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::optional<long>)
    4520  ???:c10::intrusive_ptr<c10::TensorImpl, c10::UndefinedTensorImpl>::reset_()
    4080  ???:operator new(unsigned long)
     ...
    -780  ???:at::_ops::view::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>)
    -920  ???:c10::SmallVectorImpl<long>::operator=(c10::SmallVectorImpl<long> const&)
   -1220  ???:torch::autograd::generated::ViewBackward::apply(std::vector<at::Tensor, std::allocator<at::Tensor> >&&)
   -1520  ???:at::_ops::view::call(at::Tensor const&, c10::ArrayRef<long>)
   -1580  ???:torch::ADInplaceOrView::(anonymous namespace)::view(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>)
   -1680  ???:at::Tensor at::native::alias_with_sizes_and_strides<c10::SmallVector<long, 5u> >(at::Tensor const&, c10::SmallVector<long, 5u> const&, c10::SmallVector<long, 5u> const&)
   -2560  ???:at::detail::computeStride(c10::ArrayRef<long>, c10::ArrayRef<long>, c10::SmallVector<long, 5u> const&)
   -2640  ???:at::native::view(at::Tensor const&, c10::ArrayRef<long>)
   -4860  ???:torch::autograd::VariableType::(anonymous namespace)::view(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>)

Total: 257145
```

```

[<torch.utils.benchmark.utils.common.Measurement object at 0x7f93176a0160>
x.view(-1);
setup: at::Tensor x=torch::empty({2,2});
  Median: 570.55 ns
  IQR:    32.69 ns (552.87 to 585.56)
  874 measurements, 10000 runs per measurement, 1 thread]

[<torch.utils.benchmark.utils.common.Measurement object at 0x7f92f8f29490>
x.reshape(-1);
setup: at::Tensor x=torch::empty({2,2});
  Median: 576.49 ns
  IQR:    37.95 ns (559.51 to 597.46)
  861 measurements, 10000 runs per measurement, 1 thread]

56896
58556

<torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.FunctionCounts object at 0x7f932556ca60>
    2140  ???:at::native::reshape(at::Tensor const&, c10::ArrayRef<long>)
    1940  ???:torch::autograd::VariableType::(anonymous namespace)::as_strided(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::optional<long>)
    1880  ???:torch::ADInplaceOrView::(anonymous namespace)::as_strided(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::optional<long>)
    1720  ???:at::_ops::as_strided::call(at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::optional<long>)
    1520  ???:at::_ops::reshape::call(at::Tensor const&, c10::ArrayRef<long>)
    1400  ???:at::native::as_strided_tensorimpl(at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::optional<long>)
    1260  ???:at::_ops::as_strided::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::optional<long>)'2
    1260  ???:at::_ops::as_strided::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::optional<long>)
     980  ???:void at::infer_size_impl<c10::SmallVector<long, 5u> >(c10::ArrayRef<long>, long, c10::SmallVector<long, 5u>&)
     ...
    -620  ???:at::Tensor c10::Dispatcher::redispatch<at::Tensor, at::Tensor const&, c10::ArrayRef<long ... ::ArrayRef<long>)> const&, c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>) const
    -780  ???:at::_ops::view::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>)'2
    -780  ???:at::_ops::view::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>)
    -920  ???:c10::SmallVectorImpl<long>::operator=(c10::SmallVectorImpl<long> const&)
   -1520  ???:at::_ops::view::call(at::Tensor const&, c10::ArrayRef<long>)
   -1580  ???:torch::ADInplaceOrView::(anonymous namespace)::view(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>)
   -1680  ???:at::Tensor at::native::alias_with_sizes_and_strides<c10::SmallVector<long, 5u> >(at::Tensor const&, c10::SmallVector<long, 5u> const&, c10::SmallVector<long, 5u> const&)
   -1740  ???:torch::autograd::VariableType::(anonymous namespace)::view(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>)
   -2640  ???:at::native::view(at::Tensor const&, c10::ArrayRef<long>)

Total: 1660

```

</details>

### Using custom function (`_reshape_alias`)

<details>
  <summary>Benchmark results</summary>

```
[<torch.utils.benchmark.utils.common.Measurement object at 0x7f16861d6b50>
x.pow(4).view(-1).mean().backward();
setup: at::Tensor x=torch::empty({2,2}, torch::requires_grad(true));
  Median: 53.50 us
  IQR:    2.64 us (52.32 to 54.96)
  906 measurements, 100 runs per measurement, 1 thread]

[<torch.utils.benchmark.utils.common.Measurement object at 0x7f1667b2ed60>
x.pow(4).reshape(-1).mean().backward();
setup: at::Tensor x=torch::empty({2,2}, torch::requires_grad(true));
  Median: 53.13 us
  IQR:    3.40 us (51.72 to 55.13)
  914 measurements, 100 runs per measurement, 1 thread]

2269736
2273236

<torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.FunctionCounts object at 0x7f1693f8dc10>
    5060  ???:torch::autograd::VariableType::(anonymous namespace)::_reshape_alias(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>)
    2000  ???:at::native::reshape(at::Tensor const&, c10::ArrayRef<long>)
    1780  ???:torch::ADInplaceOrView::(anonymous namespace)::_reshape_alias(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>)
    1660  ???:at::_ops::_reshape_alias::call(at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>)
    1600  ???:at::Tensor at::native::alias_with_sizes_and_strides<c10::ArrayRef<long> >(at::Tensor const&, c10::ArrayRef<long> const&, c10::ArrayRef<long> const&)
    1520  ???:at::_ops::reshape::call(at::Tensor const&, c10::ArrayRef<long>)
    1240  ???:at::_ops::_reshape_alias::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>)'2
    1240  ???:at::_ops::_reshape_alias::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>)
    1220  ???:torch::autograd::generated::AliasToShapeBackward::apply(std::vector<at::Tensor, std::allocator<at::Tensor> >&&)
     ...
    -780  ???:at::_ops::view::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>)'2
    -780  ???:at::_ops::view::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>)
    -920  ???:c10::SmallVectorImpl<long>::operator=(c10::SmallVectorImpl<long> const&)
   -1220  ???:torch::autograd::generated::ViewBackward::apply(std::vector<at::Tensor, std::allocator<at::Tensor> >&&)
   -1520  ???:at::_ops::view::call(at::Tensor const&, c10::ArrayRef<long>)
   -1580  ???:torch::ADInplaceOrView::(anonymous namespace)::view(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>)
   -1680  ???:at::Tensor at::native::alias_with_sizes_and_strides<c10::SmallVector<long, 5u> >(at::Tensor const&, c10::SmallVector<long, 5u> const&, c10::SmallVector<long, 5u> const&)
   -2640  ???:at::native::view(at::Tensor const&, c10::ArrayRef<long>)
   -4860  ???:torch::autograd::VariableType::(anonymous namespace)::view(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>)

Total: 3500
```

```

[<torch.utils.benchmark.utils.common.Measurement object at 0x7f5287adfb20>
x.view(-1);
setup: at::Tensor x=torch::empty({2,2});
  Median: 505.10 ns
  IQR:    20.04 ns (500.41 to 520.45)
  944 measurements, 10000 runs per measurement, 1 thread]

[<torch.utils.benchmark.utils.common.Measurement object at 0x7f526951b430>
x.reshape(-1);
setup: at::Tensor x=torch::empty({2,2});
  Median: 536.01 ns
  IQR:    17.81 ns (531.34 to 549.16)
  916 measurements, 10000 runs per measurement, 1 thread]

56896
60376

<torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.FunctionCounts object at 0x7f5295896c10>
    2000  ???:at::native::reshape(at::Tensor const&, c10::ArrayRef<long>)
    1860  ???:torch::autograd::VariableType::(anonymous namespace)::_reshape_alias(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>)
    1780  ???:torch::ADInplaceOrView::(anonymous namespace)::_reshape_alias(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>)
    1660  ???:at::_ops::_reshape_alias::call(at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>)
    1600  ???:at::Tensor at::native::alias_with_sizes_and_strides<c10::ArrayRef<long> >(at::Tensor const&, c10::ArrayRef<long> const&, c10::ArrayRef<long> const&)
    1520  ???:at::_ops::reshape::call(at::Tensor const&, c10::ArrayRef<long>)
    1240  ???:at::_ops::_reshape_alias::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>)'2
    1240  ???:at::_ops::_reshape_alias::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>)
     980  ???:void at::infer_size_impl<c10::SmallVector<long, 5u> >(c10::ArrayRef<long>, long, c10::SmallVector<long, 5u>&)
     ...
    -620  ???:at::Tensor c10::Dispatcher::redispatch<at::Tensor, at::Tensor const&, c10::ArrayRef<long ... ::ArrayRef<long>)> const&, c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>) const
    -780  ???:at::_ops::view::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>)'2
    -780  ???:at::_ops::view::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>)
    -920  ???:c10::SmallVectorImpl<long>::operator=(c10::SmallVectorImpl<long> const&)
   -1520  ???:at::_ops::view::call(at::Tensor const&, c10::ArrayRef<long>)
   -1580  ???:torch::ADInplaceOrView::(anonymous namespace)::view(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>)
   -1680  ???:at::Tensor at::native::alias_with_sizes_and_strides<c10::SmallVector<long, 5u> >(at::Tensor const&, c10::SmallVector<long, 5u> const&, c10::SmallVector<long, 5u> const&)
   -1740  ???:torch::autograd::VariableType::(anonymous namespace)::view(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>)
   -2640  ???:at::native::view(at::Tensor const&, c10::ArrayRef<long>)

Total: 3480

```

</details>

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D29792126

Pulled By: laurencer

fbshipit-source-id: f0519b45b65f868aa3e8651679354558bd761dfd
2021-07-21 14:05:35 -07:00
a8d99a28d7 Modernize avoid a C array (#61740)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61740

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D29717118

fbshipit-source-id: 70e73346b75deb4fe6b6399e06bd576f3b6e2b91
2021-07-21 13:52:54 -07:00
d7b31fe95d Add ciflow config and change jinja2 templates (#61886)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61886

This PR is rolling out at the `1. Manual Phase` stage.

```
#       Rollout Strategy:
#       1. Manual Phase
#          step 1. Add 'ciflow/default' label to the PR
#          step 2. Once there's an [unassigned] event from PR, it should rerun
#          step 3. Remove 'ciflow/default' label
#          step 4. Trigger the [unassigned] event again, it should not rerun
#       2. Probot Phase 1 (manual on 1 workflow)
#          step 1. Probot automatically add labels based on the context
#          step 2. Manually let probot trigger [unassigned] event
#       3. Probot Phase 2 (auto on 1 workflow)
#          step 1. Modify the workflows so that they only listen on [unassigned] events
#          step 2. Probot automatically adds labels based on the context
#          step 3. Probot automatically triggers [unassigned] event
#       4. Probot Phase 3 (auto on many workflows)
#          step 1. Enable it for all workflows
```

Test Plan: Imported from OSS

Reviewed By: seemethere

Differential Revision: D29808366

Pulled By: zhouzhuojie

fbshipit-source-id: c7e5009d839239df58825dec093ff0f1fd281697
2021-07-21 13:32:09 -07:00
2dab368d26 Refactor generate_ci_workflows (#61879)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61879

Refactor generate_ci_workflows to support CI dispatcher. This is the first step to refactor the workflow into a dataclass with some validation and OOP.

Verified that the output is the same:

```
.github/scripts/generate_ci_workflows.py
git status
```

Test Plan: Imported from OSS

Reviewed By: seemethere

Differential Revision: D29808365

Pulled By: zhouzhuojie

fbshipit-source-id: b8c5fd43f4bd6e17e06f3925a1a509084b790d95
2021-07-21 13:30:36 -07:00
e2acce373f Run Windows smoke tests with gflags in test dir (#61967)
Summary:
Previous testing yielded the torch.version ModuleNotFound error when I ran the smoke tests from the pytorch root directory.

This PR simply reorders the commands to run the smoke tests within the test directory, which passes in this series of runs:
https://github.com/seemethere/test-repo/actions/runs/1050734298 (the failures are due to missing credentials during uploading stats, which we don't need here)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61967

Reviewed By: samestep

Differential Revision: D29820985

Pulled By: janeyx99

fbshipit-source-id: 363ef321c32cfaf4446ceeb6117ea26abc311816
2021-07-21 12:06:34 -07:00
a03466cb07 Back out "Revert D29687143: [3/N] Nnapi Backend Delegate Preprocess: Basic OSS Test" (#61878)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61878

CMakeLists.txt
Android NNAPI delegate library was moved from test/cpp/jit/CMakeLists.txt to torch/CMakeLists.txt. This resolves the issue the original PR had, where the NNAPI delegate library was added to builds without Python (when it depends on Python).
Original PR: https://github.com/pytorch/pytorch/pull/61594

There's an error where the library cannot be built on macOS. This problem existed in the original PR as well, but an issue has now been created: https://github.com/pytorch/pytorch/issues/61930

test_backend_nnapi.py
Also changed the skip conditions for the unit tests so that they're a little cleaner. Now the unit tests are skipped if the NNAPI delegate library file is not found. Previously, the skip was based on the platform (only allowing Linux).

Test Plan:
To run NNAPI delegate unit tests: `python test/test_jit.py TestNnapiBackend`

Imported from OSS

Reviewed By: iseeyuan

Differential Revision: D29799895

fbshipit-source-id: b69a767b5cde3814b0853cfbc84d61ab4155f619
2021-07-21 11:58:45 -07:00
4532b3c4a9 Fix _C public bindings test (#61088)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61088

The test was previously a no-op since it was comparing the bindings with themselves. This fixes that to use the hardcoded list and adds the items that changed in the meantime.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D29510525

Pulled By: driazati

fbshipit-source-id: 3497023e5c8b3cd6fdd1d07d48b4f2650b203ded
2021-07-21 11:50:37 -07:00
8880f3d450 [fx] introduce __fx_create_arg__ dunder method for controlling custom classes are handled as node args (#61780)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61780

These changes allow objects to control, from within their own source, how they are handled when they are an argument to a torch.fx call_module node. Previously, we had been using a custom Tracer with an overridden create_arg() method, branching based on class name to handle unusual args (data classes, etc.).
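A hedged sketch of the hook (the `Config` class and its chosen graph representation are hypothetical):

```python
import torch.fx

class Config:
    def __init__(self, scale: float):
        self.scale = scale

    # torch.fx calls this when an instance appears as a node argument,
    # so the class controls its own representation instead of a custom
    # Tracer.create_arg() branching on class names.
    def __fx_create_arg__(self, tracer: torch.fx.Tracer):
        return tracer.create_node(
            "call_function", Config, args=(self.scale,), kwargs={}
        )
```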

Reviewed By: suo, houseroad

Differential Revision: D27976120

fbshipit-source-id: 0c5249c5f8398368ca0fbec0ad8a07ccf99b7da4
2021-07-21 11:27:09 -07:00
3c7bfa632a reland D29801875: .github: Clone pytorch to separate directory (#61972)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61972

This reverts commit 716567504c8b4da8d764d9674595c2095b62080c.

Also includes change to add the TEST_CONFIG env variable so that test
reports get uploaded correctly.

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D29821858

Pulled By: seemethere

fbshipit-source-id: 23602706446e0a95db6bd7cedfa665e8c4145168
2021-07-21 11:15:52 -07:00
810e19979d Torch deploy for fx.graph_module with non-torch dependencies (#61680)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61680

This diff enables torch deploy for fx.graph_module with non-torch dependencies. Here are the issues currently preventing this, which are fixed in this change:
- Pickle is used as an internal format to transmit objects between interpreters. It needs to serialize Python code, but to get the source code for imports from python_code.globals it needs access to the PackageImporter. Currently a regular `__reduce__` function is used, which has no notion of a custom importer.
- When deserializing pickled objects on an interpreter, empty globals are passed to exec, so it cannot resolve non-torch imports located in the package. We need to be able to point exec to our custom PackageImporter.
- Subclasses extending fx.graph_module should be able to optionally provide their own Tracer (extending fx.Tracer).

As a solution, a new reducer (`__reduce_deploy__`) is introduced for the torch deploy workflow. The reducer is registered in _deploy.py (the entry point for the C++ torch deploy API) when saving an object to transmit it between interpreters. This allows us to pass a proper PackageImporter to each interpreter for pickling/unpickling fx.graph_module, and it also defines an API for passing a custom fx.Tracer when needed.
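A toy analogy of the approach (plain `copyreg`, not the actual torch::deploy code): register a custom reducer so that unpickling re-execs module source against globals we control, rather than empty globals:

```python
import copyreg
import pickle

class FakeGraphModule:
    def __init__(self, src: str):
        self.src = src

def _rebuild(src: str):
    globs = {}          # in torch::deploy these globals would come from
    exec(src, globs)    # the destination interpreter's PackageImporter
    return FakeGraphModule(src)

def _reduce(gm: FakeGraphModule):
    # Return (callable, args); the callable runs on the destination side.
    return _rebuild, (gm.src,)

copyreg.pickle(FakeGraphModule, _reduce)
gm2 = pickle.loads(pickle.dumps(FakeGraphModule("y = 2 * 21")))
print(gm2.src)  # 'y = 2 * 21'
```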

Test Plan:
Added UT to cover changes.
```
buck test //caffe2/torch/csrc/deploy:test_deploy
```
```
buck test caffe2/test:fx
```

Reviewed By: suo

Differential Revision: D29690088

fbshipit-source-id: 3a8dbe02d5d7e085534aa61b7773c86f0f8c19b0
2021-07-21 10:29:48 -07:00
f41d3341b1 [pytorch] Support embedding_bag_4bit_rowwise_offsets in cuda (#61728)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61728

Templatize existing embedding_bag_byte_rowwise_offsets_kernel to support both 4 bits per dimension and 8 bits per dimension. Test rigorously using fb internal random testing vs CPU ops.

Reviewed By: hyuen

Differential Revision: D29706346

fbshipit-source-id: c9f4591a2cc6205e4b7e57a363ba0a6306fdddd5
2021-07-21 10:23:30 -07:00
716567504c Revert D29801875: .github: Clone pytorch to separate directory
Test Plan: revert-hammer

Differential Revision:
D29801875 (a152c12d7b)

Original commit changeset: 71a3c7c949e5

fbshipit-source-id: 85175a9933d1e33117b1461d5a760e1a79f60047
2021-07-21 10:19:28 -07:00
ea8abcf76e [quant] Remove calls to .item() for fake_quant_on (#61921)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61921

For GPU training, the fake_quant_on tensors live on the GPU, and the .item() calls incur a GPU->CPU copy to access the tensor element.
This can prove expensive and hurt performance during training, as the `item()` and `local_scalar_dense()` calls take up 11% of the total CPU execution time.
The solution here is to access the tensor on the GPU without a copy.
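A hypothetical sketch of the pattern being fixed (shapes and ops are illustrative, not the actual kernel code):

```python
import torch

if torch.cuda.is_available():
    fake_quant_on = torch.ones(1, device="cuda")
    x = torch.randn(1024, device="cuda")

    # Before: .item() forces a device-to-host copy and a sync per call.
    if fake_quant_on.item():
        y = torch.fake_quantize_per_tensor_affine(x, 0.1, 0, 0, 255)

    # After: keep the decision on the GPU, so no host round-trip is needed.
    y = torch.where(fake_quant_on.bool(),
                    torch.fake_quantize_per_tensor_affine(x, 0.1, 0, 0, 255),
                    x)
```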

Individual op benchmarks show a 33% speedup just by removing the `.item()` calls

Profiler Before
```
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                  aten::fused_moving_avg_obs_fake_quant         5.61%       1.538ms       100.00%      27.421ms     548.425us     978.208us         3.42%      28.575ms     571.501us            50
                  aten::_fused_moving_avg_obs_fq_helper        27.63%       7.576ms        94.39%      25.883ms     517.668us       6.536ms        22.87%      27.597ms     551.937us            50
aten::_fake_quantize_per_tensor_affine_cachemask_ten...        11.07%       3.037ms        21.54%       5.905ms     118.103us       9.549ms        33.42%       9.549ms     190.978us            50
                                         aten::_aminmax        19.39%       5.317ms        27.44%       7.524ms     150.484us       8.683ms        30.38%       8.683ms     173.651us            50
                                             aten::item         4.49%       1.232ms        11.12%       3.051ms      61.011us       1.058ms         3.70%       2.829ms      56.579us            50
                              aten::_local_scalar_dense         6.63%       1.818ms         6.63%       1.818ms      36.363us       1.771ms         6.20%       1.771ms      35.419us            50
                                            aten::empty         5.76%       1.579ms         5.76%       1.579ms      15.792us       0.000us         0.00%       0.000us       0.000us           100
                                       aten::as_strided         2.29%     628.399us         2.29%     628.399us       6.284us       0.000us         0.00%       0.000us       0.000us           100
                                       aten::empty_like         7.56%       2.073ms        17.13%       4.696ms      31.310us       0.000us         0.00%       0.000us       0.000us           150
                                    aten::empty_strided         9.57%       2.623ms         9.57%       2.623ms      17.489us       0.000us         0.00%       0.000us       0.000us           150
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 27.421ms
Self CUDA time total: 28.575ms
```
After
```
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                  aten::fused_moving_avg_obs_fake_quant         6.59%       1.240ms       100.00%      18.820ms     376.396us     490.272us         2.36%      20.745ms     414.901us            50
                  aten::_fused_moving_avg_obs_fq_helper        26.12%       4.916ms        93.41%      17.580ms     351.597us       2.033ms         9.80%      20.255ms     405.096us            50
aten::_fake_quantize_per_tensor_affine_cachemask_ten...        14.55%       2.738ms        31.09%       5.850ms     117.005us       9.968ms        48.05%       9.968ms     199.363us            50
                                         aten::_aminmax        25.28%       4.758ms        36.21%       6.814ms     136.278us       8.253ms        39.79%       8.253ms     165.069us            50
                                            aten::empty         7.94%       1.494ms         7.94%       1.494ms      14.944us       0.000us         0.00%       0.000us       0.000us           100
                                       aten::as_strided         2.99%     561.785us         2.99%     561.785us       5.618us       0.000us         0.00%       0.000us       0.000us           100
                                       aten::empty_like         8.36%       1.573ms        16.53%       3.112ms      31.118us       0.000us         0.00%       0.000us       0.000us           100
                                    aten::empty_strided         8.17%       1.538ms         8.17%       1.538ms      15.384us       0.000us         0.00%       0.000us       0.000us           100
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 18.820ms
Self CUDA time total: 20.745ms
```

Test Plan:
python test/test_quantization.py

Imported from OSS

Reviewed By: jingsh

Differential Revision: D29796533

fbshipit-source-id: 10abb93abd61c6ac25b8e8c114aa57b9db891918
2021-07-21 10:13:06 -07:00
b8386f5d72 [quant] Create FusedMovingAvgObsFakeQuantize for QAT (#61691)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61691

Create a new module for QAT that does a fused MovingAvgMinMaxObserver and FakeQuantize operation.
The module currently only supports per-tensor quantization (affine/symmetric). A follow-up PR will add support for per-channel.
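A hedged usage sketch (the import path is an assumption for this era and may differ across releases; the defaults give per-tensor affine fake-quant):

```python
import torch
from torch.quantization import FusedMovingAvgObsFakeQuantize  # assumed path

fq = FusedMovingAvgObsFakeQuantize()  # fused observe + fake-quantize
x = torch.randn(4, 8)
y = fq(x)  # one fused op instead of separate observer and fake_quant passes
print(fq.scale, fq.zero_point)
```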

Results on running QAT with MobileNetV2 (Obs enabled/fake_quant enabled)
Original FQ module
PyTorchObserver {"type": "_", "metric": "qnnpack_fp_latency_ms", "unit": "ms", "value": "242.80261993408203"}
PyTorchObserver {"type": "_", "metric": "qnnpack_qat0_latency_ms", "unit": "ms", "value": "505.7964324951172"}
PyTorchObserver {"type": "_", "metric": "fbgemm_fp_latency_ms", "unit": "ms", "value": "235.80145835876465"}
PyTorchObserver {"type": "_", "metric": "fbgemm_qat0_latency_ms", "unit": "ms", "value": "543.8144207000732"}

Fused FakeQuant module (~50% improvement in latency)
PyTorchObserver {"type": "_", "metric": "qnnpack_fp_latency_ms", "unit": "ms", "value": "232.1624755859375"}
PyTorchObserver {"type": "_", "metric": "qnnpack_qat0_latency_ms", "unit": "ms", "value": "263.8866901397705"}
PyTorchObserver {"type": "_", "metric": "fbgemm_fp_latency_ms", "unit": "ms", "value": "236.9832992553711"}
PyTorchObserver {"type": "_", "metric": "fbgemm_qat0_latency_ms", "unit": "ms", "value": "292.1590805053711"}

Individual module benchmark result (>5x improvement in latency)
===> Baseline FakeQuantize module
```
---------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                               Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
---------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
              aten::fake_quantize_per_tensor_affine         0.77%       1.210ms         4.92%       7.730ms     154.596us     718.528us         0.45%       9.543ms     190.862us            50
    aten::fake_quantize_per_tensor_affine_cachemask         2.41%       3.792ms         4.15%       6.520ms     130.402us       8.825ms         5.58%       8.825ms     176.492us            50
                                     aten::_aminmax         3.25%       5.105ms         4.43%       6.955ms     139.102us       8.193ms         5.18%       8.193ms     163.868us            50
                                   aten::zeros_like         1.87%       2.939ms         6.95%      10.922ms     109.218us       5.992ms         3.79%      10.844ms     108.442us           100
                                        aten::zeros         0.97%       1.527ms         3.11%       4.885ms      97.702us       2.383ms         1.51%       4.800ms      96.010us            50
                                         aten::rsub         1.34%       2.106ms         2.94%       4.614ms      92.277us       2.063ms         1.30%       4.559ms      91.173us            50
                                        aten::clamp         2.79%       4.381ms         5.42%       8.519ms      85.190us       5.385ms         3.41%       8.438ms      84.381us           100
                                           aten::eq        11.70%      18.384ms        21.31%      33.479ms      83.280us      22.465ms        14.21%      33.310ms      82.861us           402
                                         aten::ones         1.05%       1.656ms         2.57%       4.038ms      80.751us       2.494ms         1.58%       3.951ms      79.028us            50
                                           aten::le         2.52%       3.955ms         4.84%       7.607ms      76.071us       4.998ms         3.16%       7.702ms      77.016us           100
                                          aten::min         0.69%       1.087ms         2.32%       3.641ms      72.827us       1.017ms         0.64%       3.603ms      72.055us            50
                                          aten::max         1.40%       2.195ms         4.62%       7.260ms      72.597us       2.008ms         1.27%       7.140ms      71.404us           100
                                   aten::is_nonzero         2.68%       4.207ms        11.35%      17.829ms      71.033us       4.062ms         2.57%      17.225ms      68.625us           251
                                       aten::detach         1.17%       1.831ms         3.65%       5.736ms      57.360us       1.680ms         1.06%       5.634ms      56.340us           100
                                          aten::mul         3.36%       5.278ms         3.36%       5.278ms      53.862us       5.215ms         3.30%       5.215ms      53.216us            98
                                          aten::div         3.42%       5.376ms         3.42%       5.376ms      53.759us       5.320ms         3.36%       5.320ms      53.196us           100
                                          aten::sub         6.79%      10.672ms         6.79%      10.672ms      53.901us      10.504ms         6.64%      10.504ms      53.050us           198
                                         aten::item         4.06%       6.380ms        12.02%      18.883ms      53.798us       6.127ms         3.87%      18.322ms      52.198us           351
                                          aten::add         3.28%       5.147ms         3.28%       5.147ms      52.518us       5.113ms         3.23%       5.113ms      52.171us            98
                                      aten::minimum         1.63%       2.555ms         1.63%       2.555ms      51.092us       2.585ms         1.64%       2.585ms      51.708us            50
                                      aten::maximum         3.22%       5.065ms         3.22%       5.065ms      50.646us       5.133ms         3.25%       5.133ms      51.329us           100
                                        aten::round         1.61%       2.529ms         1.61%       2.529ms      50.578us       2.528ms         1.60%       2.528ms      50.552us            50
                                        aten::zero_         1.99%       3.125ms         4.72%       7.422ms      49.481us       2.835ms         1.79%       7.269ms      48.462us           150
                                        aten::copy_         6.62%      10.394ms         6.62%      10.394ms      41.576us      10.252ms         6.48%      10.252ms      41.010us           250
                                             detach         2.49%       3.905ms         2.49%       3.905ms      39.049us       3.954ms         2.50%       3.954ms      39.539us           100
                                       aten::select         2.01%       3.154ms         2.47%       3.876ms      38.759us       3.866ms         2.44%       3.866ms      38.658us           100
                          aten::_local_scalar_dense         7.96%      12.503ms         7.96%      12.503ms      35.621us      12.195ms         7.71%      12.195ms      34.743us           351
                                           aten::to         2.31%       3.625ms         4.16%       6.530ms      32.650us       4.320ms         2.73%       6.270ms      31.348us           200
                                        aten::fill_         3.70%       5.808ms         3.70%       5.808ms      29.039us       5.892ms         3.73%       5.892ms      29.459us           200
                                   aten::as_strided         0.79%       1.244ms         0.79%       1.244ms       6.221us       0.000us         0.00%       0.000us       0.000us           200
                                        aten::empty         3.55%       5.579ms         3.55%       5.579ms      11.137us       0.000us         0.00%       0.000us       0.000us           501
                                      aten::resize_         2.36%       3.712ms         2.36%       3.712ms      12.332us       0.000us         0.00%       0.000us       0.000us           301
                                   aten::empty_like         1.45%       2.284ms         3.68%       5.776ms      28.878us       0.000us         0.00%       0.000us       0.000us           200
                                aten::empty_strided         2.80%       4.398ms         2.80%       4.398ms      17.592us       0.000us         0.00%       0.000us       0.000us           250
---------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 157.108ms
Self CUDA time total: 158.122ms
```

===> FusedFakeQuant
```
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                   fb::fused_fake_quant        23.42%       6.408ms       100.00%      27.361ms     547.215us       7.887ms        27.20%      28.996ms     579.925us            50
                  aten::fake_quantize_per_tensor_affine         4.25%       1.162ms        27.65%       7.565ms     151.298us     686.176us         2.37%      10.217ms     204.336us            50
aten::_fake_quantize_per_tensor_affine_cachemask_ten...        14.11%       3.860ms        23.40%       6.403ms     128.068us       9.531ms        32.87%       9.531ms     190.612us            50
                                         aten::_aminmax        20.57%       5.628ms        27.47%       7.515ms     150.305us       8.218ms        28.34%       8.218ms     164.367us            50
                                             aten::item         3.65%     999.522us        10.27%       2.810ms      56.202us     931.904us         3.21%       2.674ms      53.481us            50
                              aten::_local_scalar_dense         6.62%       1.811ms         6.62%       1.811ms      36.212us       1.742ms         6.01%       1.742ms      34.843us            50
                                            aten::empty        10.85%       2.969ms        10.85%       2.969ms      14.843us       0.000us         0.00%       0.000us       0.000us           200
                                       aten::as_strided         1.92%     524.365us         1.92%     524.365us       5.244us       0.000us         0.00%       0.000us       0.000us           100
                                       aten::empty_like         6.48%       1.774ms        14.62%       4.000ms      26.670us       0.000us         0.00%       0.000us       0.000us           150
                                    aten::empty_strided         8.14%       2.226ms         8.14%       2.226ms      14.842us       0.000us         0.00%       0.000us       0.000us           150
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 27.361ms
Self CUDA time total: 28.996ms
```

Test Plan:
python test/test_quantization.py TestFusedObsFakeQuantModule

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D29706889

fbshipit-source-id: ae3f9fb1fc559920459bf6e8663e8299bf7d21e1
2021-07-21 10:13:04 -07:00
afdca41bab [quant] Add a new fused MovingAvg Obs + FakeQuant operator (GPU) (#61589)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61589

A custom GPU implementation that performs the observer and calculate-qparams steps on the GPU.
It calls the aten fake_quant_per_tensor/channel functions to perform the fake-quant step.

Test Plan:
python test/test_quantization.py TestFusedObsFakeQuant

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D29682761

fbshipit-source-id: 373a50f88481b7e5b4d9e65d84a6c174bb277dd4
2021-07-21 10:13:02 -07:00
92d3391fb1 [quant] Add a new fused MovingAvg Obs + FakeQuant operator(CPU) (#61570)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61570

A fused operator that computes moving-average min/max values of the input tensor (in-place) and fake-quantizes it.
It expects the qmin/qmax values to reflect the range of the quantized tensor (instead of reduce_range).

The motivation for adding this operator is performance: moving the computation from Python to C++/CUDA can speed up QAT.
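
A minimal eager-mode sketch of the unfused computation this operator replaces (the function name, the running min/max being 0-d tensors, and the averaging constant are illustrative assumptions, not the PR's exact kernel):

```python
import torch

def moving_avg_obs_fake_quant(x, running_min, running_max,
                              averaging_const=0.01, quant_min=0, quant_max=255):
    # Observer step: update the running min/max with an exponential moving average.
    running_min += averaging_const * (x.min() - running_min)
    running_max += averaging_const * (x.max() - running_max)
    # Compute affine qparams directly from the observed range (no reduce_range).
    scale = (running_max - running_min) / float(quant_max - quant_min)
    zero_point = (quant_min - torch.round(running_min / scale)).clamp(quant_min, quant_max)
    # Fake-quantize: round-trip through the integer grid without changing dtype.
    return torch.fake_quantize_per_tensor_affine(
        x, scale.item(), int(zero_point.item()), quant_min, quant_max)
```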

Test Plan:
python test/test_quantization.py TestFusedObsFakeQuant

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D29682762

fbshipit-source-id: 28e4c50e77236d6976fe4b326c9a12103ed95840
2021-07-21 10:11:41 -07:00
403f59701c Changes default DDP behavior to divide sparse grad by world size before allreduce, not after (#61814)
Summary:
I appreciate https://github.com/pytorch/pytorch/pull/61379, which restores the fusion of div-by-world-size and copy-to-allreduce-buffer for dense gradients. But I noticed in the wake of https://github.com/pytorch/pytorch/pull/61379 there's misaligned treatment of dense and sparse gradients. Specifically, dense gradients are divided by world size before the allreduce, while sparse gradients are divided by world size after the allreduce. On paper you wouldn't expect that to matter, but for cluster-scale DDP training with amp gradient scaling and allreduces of FP16 grads, we've noticed several cases where post-dividing grads by world size caused nonconvergence while pre-dividing worked. I'm not aware of any cases where the reverse was true.

This PR changes the treatment of sparse gradients to match that of dense gradients (both are now divided by world size before the allreduce).
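
To see why pre-dividing can matter for FP16 gradients, here is a toy illustration (the allreduce sum is simulated by a multiply, and the numbers are chosen to sit near the fp16 limit):

```python
import torch

world_size = 64
# Per-rank gradient values near the top of the fp16 range (max ~65504).
grad = torch.full((4,), 1024.0, dtype=torch.float16)

# Post-divide: the simulated allreduce sum overflows fp16 before the division.
post = (grad * world_size) / world_size
# Pre-divide: each rank divides first, so the summed values stay in range.
pre = (grad / world_size) * world_size

print(post)  # tensor([inf, inf, inf, inf], dtype=torch.float16)
print(pre)   # tensor([1024., 1024., 1024., 1024.], dtype=torch.float16)
```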

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61814

Reviewed By: mrshenli

Differential Revision: D29772444

Pulled By: rohan-varma

fbshipit-source-id: 033a17d5c019511889d908876282c6624fb26a2d
2021-07-21 09:54:53 -07:00
17d743ff04 ENH Adds test and docs for dropout for no batch dims (#61911)
Summary:
Towards https://github.com/pytorch/pytorch/issues/60585

I think `Dropout` is already tested in `test_Dropout` for no batch dims.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61911

Reviewed By: albanD

Differential Revision: D29810928

Pulled By: jbschlosser

fbshipit-source-id: 7716a1a808e9e34aae43573f38706212552afbb4
2021-07-21 09:07:10 -07:00
06df33857b fix adaptive_avg_pool (#61851)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61851

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D29812559

Pulled By: makslevental

fbshipit-source-id: ac54166aaec63992748ea3299c3144ee107b24f4
2021-07-21 08:42:26 -07:00
33db828e52 Revert D29647586: [jit] Renamed prim::Concat as prim::VarConcat
Test Plan: revert-hammer

Differential Revision:
D29647586 (db11619901)

Original commit changeset: cdd34ea5a3c9

fbshipit-source-id: bab5ac4ed67a00ac151fe39463aa3fb56897d7f4
2021-07-21 08:28:26 -07:00
48af9de92f ENH Enables No-batch for *Pad1d Modules (#61060)
Summary:
Toward https://github.com/pytorch/pytorch/issues/60585

This PR adds a `single_batch_reference_fn` that uses the single-batch implementation to check the no-batch behavior, as sketched below.
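
A minimal sketch of the idea (the helper below is an assumption about the shape of such a reference function, not the actual test code):

```python
import torch

def single_batch_ref(module, inp):
    # Run the unbatched input through the module by temporarily adding a
    # batch dimension of size 1, then removing it again.
    return module(inp.unsqueeze(0)).squeeze(0)

pad = torch.nn.ConstantPad1d(2, 0.0)
x = torch.randn(3, 5)  # (C, W) input with no batch dimension
assert torch.allclose(pad(x), single_batch_ref(pad, x))
```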

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61060

Reviewed By: mrshenli

Differential Revision: D29739823

Pulled By: jbschlosser

fbshipit-source-id: d90d88a3671177a647171801cc6ec7aa3df35482
2021-07-21 07:12:41 -07:00
bdf439a958 Adds _LazyInstanceNorm and LazyInstanceNormXd (#60982)
Summary:
Signed-off-by: Calvin McCarter <calvin@lightmatter.co>

Fixes https://github.com/pytorch/pytorch/issues/60981
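
A quick usage sketch of the new lazy modules (assuming the usual lazy-module semantics, where the number of features is inferred on the first forward pass):

```python
import torch

norm = torch.nn.LazyInstanceNorm2d(affine=True)
x = torch.randn(2, 8, 16, 16)
out = norm(x)  # materializes the affine weight/bias for 8 channels
print(out.shape)  # torch.Size([2, 8, 16, 16])
```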

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60982

Reviewed By: albanD

Differential Revision: D29810547

Pulled By: jbschlosser

fbshipit-source-id: d933d4c7fe5cf7be9b09a5ab93f740b94cf08cc1
2021-07-21 06:45:45 -07:00
db11619901 [jit] Renamed prim::Concat as prim::VarConcat (#61498)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61498

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D29647586

Pulled By: navahgar

fbshipit-source-id: cdd34ea5a3c986350a813be17e7d428844ea4cbf
2021-07-20 19:30:00 -07:00
7fbdc86aec [jit] Removed a local function to check for dominators and used the one added to Node class (#60909)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60909

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D29441864

Pulled By: navahgar

fbshipit-source-id: 362bd462fa70256dd1f8b05756a76da0cb3d4b76
2021-07-20 19:29:58 -07:00
429908e540 [jit] Updated the concat common inputs elimination pass to use the variadic cat op instead of aten::cat (#60908)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60908

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D29441865

Pulled By: navahgar

fbshipit-source-id: 2ab08168102eff1f43667ca418bdd94bb2df562a
2021-07-20 19:29:57 -07:00
53668f8bf6 [jit] Added an API to remove list mutations and replace with variadic cat until fixed point (#60776)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60776

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D29406099

Pulled By: navahgar

fbshipit-source-id: e2e69eb6ebff3bc6e25d80f46ce118e52f557fb6
2021-07-20 19:29:55 -07:00
0cfcf68aa5 [jit] Added special handling for prim::ListConstruct while checking for may alias inputs (#60775)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60775

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D29406101

Pulled By: navahgar

fbshipit-source-id: 9b8a4050167750610400637e7de48ffa8727051a
2021-07-20 19:29:53 -07:00
4dd04a8bbe [jit] Handled cases when input list to cat is mutated after cat using AliasDb (#60774)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60774

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D29406100

Pulled By: navahgar

fbshipit-source-id: af6afca65881c18c51b482eb63898a0f1c94d591
2021-07-20 19:28:42 -07:00
604f503d30 Revert D29794958 + compilation fix (#61937)
Summary:
This PR un-reverts https://github.com/pytorch/pytorch/issues/61475 and fixes compilation with MSVC, which does not recognize alternative operator spellings (i.e., using `or` instead of `||`).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61937

Reviewed By: albanD

Differential Revision: D29805941

Pulled By: malfet

fbshipit-source-id: 01e5963c6717c1b44b260300d87ba0bf57f26ce9
2021-07-20 18:14:45 -07:00
a152c12d7b .github: Clone pytorch to separate directory (#61932)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61932

Clones pytorch to a separate directory for each run so that runs do not
overlap with each other

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: zhouzhuojie

Differential Revision: D29801875

Pulled By: seemethere

fbshipit-source-id: 71a3c7c949e5aeacf033ae1fc9aaef13b42833b6
2021-07-20 17:30:30 -07:00
7cbb7c6d2e [vulkan] Make vulkan ops selective (#58332)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58332

Test Plan: Imported from OSS

Reviewed By: iseeyuan

Differential Revision: D28454976

Pulled By: IvanKobzarev

fbshipit-source-id: 445c1f326be76e3530a4884aa5fe749d636e0ae5
2021-07-20 16:26:55 -07:00
73fbf43684 [vulkan] Fix asserts (#61495)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61495

Test Plan: Imported from OSS

Reviewed By: SS-JIA

Differential Revision: D29647357

Pulled By: IvanKobzarev

fbshipit-source-id: cb4ba15f28625ea6e667883c9a2d31eba48b6f37
2021-07-20 16:07:13 -07:00
22fff61f06 Revert D29794958: [pytorch][PR] changing trapz to trapezoid
Test Plan: revert-hammer

Differential Revision:
D29794958 (95cec8f4fa)

Original commit changeset: 60b9c07efd47

fbshipit-source-id: 2dcda2d62e01c2521a86ae5ed8246cfb686d3f64
2021-07-20 16:00:46 -07:00
e067960243 lint_setup should not require elevated privileges (#61798)
Summary:
- s/pip/pip3/ (because unversioned pip can reference either pip2 or pip3,
depending on setup)
- Always invoke `pip install` with the `--user` option (otherwise, unless one is
using a conda environment, it will try to install into a system folder, which
should not be writable by regular users)

- Do not install shellcheck in `/usr/bin`; instead rely on `~/.local/bin` and add it to the PATH

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61798

Reviewed By: zhouzhuojie, seemethere

Differential Revision: D29747286

Pulled By: malfet

fbshipit-source-id: 30cb51fe60b5096b758f430d1c51465205532a19
2021-07-20 15:53:12 -07:00
994434ad16 Adding complex number support for all_to_all/scatter (#61299)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61299

Modifies all_to_all and scatter to support complex numbers in addition to floating-point numbers.
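
A rough usage sketch (assumes a process group has already been initialized, e.g. with the gloo backend):

```python
import torch
import torch.distributed as dist

# Complex tensors are now accepted by the collective; with world size N,
# the input is split into N equal chunks, one per rank.
t = torch.randn(4, dtype=torch.complex64)
out = torch.empty_like(t)
dist.all_to_all_single(out, t)
```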

Test Plan: buck run //caffe2/test/distributed:distributed_gloo_fork -- test_name --print-passing-details --run-disabled

Reviewed By: wanchaol

Differential Revision: D29563938

fbshipit-source-id: 59e436b3fa1aee3d5195cbcffd39587e642c76b9
2021-07-20 15:45:34 -07:00
457a0b63bf use torch.bucketize in to_sparse_csr implementation (+ additional tests) (#61340)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/57381
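
The commit body doesn't spell out the approach, but as a rough sketch of how torch.bucketize can build CSR `crow_indices` from sorted COO row indices (an illustration of the idea, not necessarily the PR's exact code):

```python
import torch

nrows = 4
rows = torch.tensor([0, 0, 1, 3])  # sorted row index of each nonzero
# crow_indices[i] = number of nonzeros with row index < i
crow = torch.bucketize(torch.arange(nrows + 1), rows)
print(crow)  # tensor([0, 2, 3, 3, 4])
```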

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61340

Reviewed By: bhosmer

Differential Revision: D29601393

Pulled By: cpuhrsch

fbshipit-source-id: 4ca1f013d96e8716f0e658e0cd685d9aa0d98a5c
2021-07-20 15:44:25 -07:00
95cec8f4fa changing trapz to trapezoid (#61475)
Summary:
This PR resolves issue https://github.com/pytorch/pytorch/issues/52606 while also adding support for complex numbers.

Stack from [ghstack](https://github.com/ezyang/ghstack):
* https://github.com/pytorch/pytorch/issues/61616
* https://github.com/pytorch/pytorch/issues/61615
* **https://github.com/pytorch/pytorch/issues/61475**
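
A quick usage sketch of the renamed op (trapezoidal rule over possibly non-uniform sample points):

```python
import torch

y = torch.tensor([1.0, 2.0, 3.0])
x = torch.tensor([0.0, 1.0, 3.0])
# (1 + 2)/2 * 1 + (2 + 3)/2 * 2 = 1.5 + 5.0
print(torch.trapezoid(y, x))  # tensor(6.5000)
```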

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61475

Reviewed By: mruberry

Differential Revision: D29794958

Pulled By: NivekT

fbshipit-source-id: 60b9c07efd47fd85b9c8178768fc7828d7b57d29
2021-07-20 15:25:55 -07:00
86715623dd Adding super calls to JIT test case setUp and tearDown (#61922)
Summary:
This issue surfaced when https://github.com/pytorch/pytorch/issues/61655 did not manage to skip the appropriate test case.

I then investigated and realized the setUp code that does the test disabling was not called, because another defined setUp overrode the parent class' setUp (a sketch of the pattern follows below).

I am not sure if that was intentional; if so, we would have to adapt the child class' code to call the check_if_enable function in common_utils.
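
A minimal sketch of the pattern being fixed (class names are stand-ins, not the actual test classes):

```python
import unittest

class BaseTestCase(unittest.TestCase):  # stand-in for the common_utils base class
    def setUp(self):
        super().setUp()
        # ... the test-disabling check (check_if_enable) would run here ...

class TestSomething(BaseTestCase):
    def setUp(self):
        # Without this super() call, BaseTestCase.setUp (and the test-disabling
        # logic it performs) is silently skipped.
        super().setUp()
        # ... test-specific setup ...
```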

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61922

Reviewed By: ejguan

Differential Revision: D29798716

Pulled By: janeyx99

fbshipit-source-id: d31b664e48507d69de14574ff5e6ecf1d41ae24d
2021-07-20 15:08:44 -07:00
7acb8b71e1 Remove AVX detection code that duplicates FindAVX.cmake (#61748)
Summary:
This PR deletes some code in `MiscCheck.cmake` that performs the exact
same functionality as `FindAVX.cmake`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61748

Reviewed By: ejguan

Differential Revision: D29791282

Pulled By: malfet

fbshipit-source-id: 6595fd1b61c8ae12b821fad8c9a34892dd52d213
2021-07-20 14:34:36 -07:00
e8d2916b84 Add faulty tensorpipe implementation (#61421)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61421

This PR adds the faulty tensorpipe agent implementation and replaces all faulty process group agent tests with it. The faulty tensorpipe agent code is very similar to that of the faulty process group agent: it allows the user to fail or delay certain types of RPC messages, which is used in the faulty agent tests. These changes are needed to deprecate the process group RPC backend.

Summary of changes:
- Add faulty tensorpipe agent class
- Update the tensorpipe pipeWrite function so it can be overridden and can add a delay
- Update test backend registry and faulty agent tests to use the FAULTY_TENSORPIPE_AGENT backend.

This affects all faulty agent tests; here are a few of them as sample commands:
`pytest test/distributed/rpc/test_faulty_agent.py -vs -k test_verify_backend_options`
`pytest test/distributed/rpc/test_faulty_agent.py -vs -k test_no_faulty_messages`
`pytest test/distributed/rpc/test_faulty_agent.py -vs -k test_builtin_remote_message_dropped_timeout`

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D29773739

Pulled By: H-Huang

fbshipit-source-id: 6b2bc366735d70b79943d4207f454bc9555bbf5f
2021-07-20 13:54:30 -07:00
d856914c57 Fix missing braces (#61745)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61745

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D29717538

fbshipit-source-id: ed0ff4fb6a72b701bf6d36ebde343672356a916a
2021-07-20 13:32:38 -07:00
f78142b68d Modernize emplace (#61742)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61742

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D29717433

fbshipit-source-id: 93996388780862e90ab4e697508407091e8e763b
2021-07-20 13:31:19 -07:00
2c2a084012 approx 100x acceleration for parse_kineto_results (#60432)
Summary:
Fixes https://github.com/pytorch/kineto/issues/308, https://github.com/pytorch/pytorch/issues/58983 maybe related

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60432

Reviewed By: ilia-cher

Differential Revision: D29715257

Pulled By: gdankel

fbshipit-source-id: 7c94d1bb00b609f502db7aa9d9a447ab09645e6a
2021-07-20 13:21:49 -07:00
4567a50b2a Enable clang-tidy on master (#61689)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61689

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D29767984

Pulled By: 1ntEgr8

fbshipit-source-id: 658355da274ada41e01ed2772a03a701b90fbbab
2021-07-20 12:55:12 -07:00
8b88c24670 add channels last support for thnn_conv2d (non-dilated) (#49582)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49582

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D26007050

Pulled By: VitalyFedyunin

fbshipit-source-id: 1289e0687c2459dd4eb8e4ba2efc8266397cfe5f
2021-07-20 12:50:24 -07:00
91bc285084 Fix clang-tidy error in pre-commit script (#61918)
Summary:
Fixes a clang-tidy error in the git-pre-commit script. See log below for the error it fixes.

```
Running pre-commit flake8
Running pre-commit clang-tidy
usage: clang_tidy [-h] [-e CLANG_TIDY_EXE] [-g GLOB] [-x REGEX] [-c COMPILE_COMMANDS_DIR] [--diff-file DIFF_FILE] [-p PATHS [PATHS ...]] [-n] [-v] [-q] [--config-file CONFIG_FILE] [--print-include-paths] [-I INCLUDE_DIR] [-s]
                  [--disable-progress-bar]
                  [extra_args [extra_args ...]]
clang_tidy: error: unrecognized arguments: -j
```

It gets rid of the redundant binary check because `tools.linter.clang_tidy` already does this.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61918

Test Plan: Run `tools/git-pre-commit`. It should not show a clang-tidy error.

Reviewed By: driazati

Differential Revision: D29796383

Pulled By: 1ntEgr8

fbshipit-source-id: b804b0170747f04e84d21e03d1c4985748d78cf2
2021-07-20 12:40:56 -07:00
f6446802c7 Revert D29783943: [pytorch][PR] add BFloat16 operators on CPU: arange, acosh, asinh, atanh, exp2, digamma, trigamma, polygamma
Test Plan: revert-hammer

Differential Revision:
D29783943 (513c40cb1a)

Original commit changeset: 40cebe829720

fbshipit-source-id: 5276dea572f1286dad7b7caa69ecc2f369ec13ff
2021-07-20 12:33:52 -07:00
c2cc6a9396 Add generic join unit tests (#61786)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61786

This adds unit tests for the generic join context manager.

```
gpurun python test/distributed/algorithms/test_join.py
```

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D29746646

Pulled By: andwgu

fbshipit-source-id: 2933d85783c2225574c4b77bfb90064690c6e668
2021-07-20 12:13:05 -07:00
1c80b5220b nll_loss_forward: port to structured kernel (#61443)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61443

For more information, see #55070.

This PR also adds a new type, `OptionalTensorRef`, as a replacement for `c10::optional<Tensor>&` in order to avoid the reference count manipulations that are inevitable with the latter. I have confirmed using Godbolt/Compiler Explorer that this class does indeed avoid manipulating the reference count of the `intrusive_ptr` inside the `Tensor` it refers to:

1. [P429709479](https://www.internalfb.com/phabricator/paste/view/P429709479) - Given a `const Tensor&` in scope, an `OptionalTensorRef` can be constructed without bumping refcount.
2. [P429709883](https://www.internalfb.com/phabricator/paste/view/P429709883) - Given an `OptionalTensorRef`, a `const Tensor&` can be produced without bumping refcount.
3. [P429710335](https://www.internalfb.com/phabricator/paste/view/P429710335) - When `OptionalTensorRef` is destructed, the refcount should not be decremented.
4. [P429769525](https://www.internalfb.com/phabricator/paste/view/P429769525) - `OptionalTensorRef` can be assigned without refcount manipulation.
5. [P429769882](https://www.internalfb.com/phabricator/paste/view/P429769882) - `OptionalTensorRef` can be move assigned without refcount manipulation.

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D29780666

Pulled By: SplitInfinity

fbshipit-source-id: 7af157215300e9254d635433cbd583f7329fe064
2021-07-20 11:45:44 -07:00
f0df0207ec [jit] Arithmetic simplification for integers. (#61444)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61444

Add a mini pass to merge arithmetic nodes like (((x - 1) + 2) * 1) - 1.
Issue #60913
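
For example (a sketch; the printed graph depends on which passes have run):

```python
import torch

@torch.jit.script
def f(x: int):
    # Constant-arithmetic chain the peephole pass can simplify:
    # (((x - 1) + 2) * 1) - 1 == x
    return (((x - 1) + 2) * 1) - 1

print(f.graph)  # the unoptimized graph still shows the sub/add/mul chain
```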

Test Plan:
python test/test_jit.py TestPeephole.test_peephole_arith

Imported from OSS

Reviewed By: eellison

Differential Revision: D29630614

fbshipit-source-id: 08ac64cee39070401f9ff9163d309f20ff53c5ac
2021-07-20 11:35:42 -07:00
d2abfc547b Add ShardedTensorMetadata for ShardedTensor. (#61683)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61683

This PR adds a consolidated metadata field (ShardedTensorMetadata)
which has all the necessary global metadata for a ShardedTensor.
ghstack-source-id: 133847517

Test Plan:
1) unit tests
2) waitforbuildbot

Reviewed By: wanchaol

Differential Revision: D29703719

fbshipit-source-id: 567279e46c787a88ef3310e4dce6fd2ad7631c62
2021-07-20 11:28:13 -07:00
87334c40a7 Remove torch._bmm and remove torch.bmm deterministic arg documentation (#61629)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/61571

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61629

Reviewed By: mrshenli

Differential Revision: D29774486

Pulled By: albanD

fbshipit-source-id: bfc9119c478f0244d5be681bcf4954a3eb97e542
2021-07-20 10:55:43 -07:00
513c40cb1a add BFloat16 operators on CPU: arange, acosh, asinh, atanh, exp2, digamma, trigamma, polygamma (#60444)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60444

Reviewed By: ejguan

Differential Revision: D29783943

Pulled By: ezyang

fbshipit-source-id: 40cebe8297207669d1ca430ed1d1e81dda5a0c45
2021-07-20 10:30:04 -07:00
45751e0b34 Support integral target for the backward of nn.SmoothL1Loss (#61112)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/58816

- enhance the backward of `nn.SmoothL1Loss` to allow integral `target`
- add test cases in `test_nn.py` to check that `input.grad` matches between an integral target and its floating counterpart (sketched below).
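
A minimal sketch of the property the new tests check:

```python
import torch

x1 = torch.randn(4, requires_grad=True)
x2 = x1.detach().clone().requires_grad_(True)
target = torch.randint(0, 5, (4,))

torch.nn.SmoothL1Loss()(x1, target).backward()          # integral target
torch.nn.SmoothL1Loss()(x2, target.float()).backward()  # floating counterpart

assert torch.allclose(x1.grad, x2.grad)
```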

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61112

Reviewed By: mrshenli

Differential Revision: D29775660

Pulled By: albanD

fbshipit-source-id: 544eabb6ce1ea13e1e79f8f18c70f148e92be508
2021-07-20 10:24:03 -07:00
59a5312ce6 Modernize fix deprecated header (#61736)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61736

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D29716965

fbshipit-source-id: 314c2b557c240ac16bbfab114ab764beb189e78a
2021-07-20 10:06:11 -07:00
5a04bd8723 Modernize some loops in torch (#61737)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61737

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D29716813

fbshipit-source-id: 21f9716bead4e0e913406e681c55d1956327e6af
2021-07-20 10:04:54 -07:00
65616184bc [Docs] Bundle of errata and small corrections / improvements for torch.linalg docs (#61578)
Summary:
This PR bundles a number of errata detected in the linalg docs over the last few weeks.

- Simpler Cholesky deprecation rule
- Remove repeated consecutive words
- Correct cond with rcond in lstsq
- Correct examples of lstsq
- More concise examples
- Use the names of the inputs / outputs in the variables of the examples

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61578

Reviewed By: mrshenli

Differential Revision: D29757988

Pulled By: mruberry

fbshipit-source-id: a740a64826c065c1d7c1b8b498364d147008d76d
2021-07-20 09:58:09 -07:00
a0c9d70fba bitwise_and: Port to structured (#60813)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60813

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D29449374

Pulled By: ezyang

fbshipit-source-id: d7e236ad841dcb9d5914352d117a34b10894bb91
2021-07-20 09:01:41 -07:00
875d63ed04 bitwise_xor: Port to structured (#60812)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60812

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D29449372

Pulled By: ezyang

fbshipit-source-id: 016d2012f64486c2490ff319e753b0d054dccf2c
2021-07-20 09:01:40 -07:00
ce8aeefbf4 bitwise_or: Port to strucutred (#60811)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60811

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D29449370

Pulled By: ezyang

fbshipit-source-id: ac176985b0141a55807ba909d7342eb35b1dc28f
2021-07-20 09:00:20 -07:00
f59ac5abc8 Add thread local state guards in autograd engine hooks. (#60067)
Summary:
The thread-local state of the backward thread is not aligned with the GraphTask's `thread_local_` when calling the hooks in backward.

This is required for profiling the statistics c10d operation of `DistributedDataParallel` module.

Is there any concern about adding the thread-local state guard when calling the hooks in backward? ezyang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60067

Reviewed By: ezyang

Differential Revision: D29654599

Pulled By: albanD

fbshipit-source-id: 656c4f91017184fd40f1a184de24757a13387e37
2021-07-20 07:41:49 -07:00
641f6ef8a7 Implement IMethod::getArgumentNames() (#61856)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61856

This diff does the following:
1. It implements IMethod::getArgumentNames() for all of IMethod's subclasses.
2. It refactors PyTorchDeployPredictor to use IMethod for model execution.

Test Plan:
[... ~/fbsource/fbcode/caffe2] buck test mode/dev caffe2/fb/predictor:pytorch_predictor_test -- PyTorchDeployPredictor
[... ~/fbsource/fbcode/caffe2] buck test mode/dev caffe2/fb/predictor:pytorch_predictor_test -- PyTorchPredictor

Reviewed By: wconstab

Differential Revision: D29648756

fbshipit-source-id: e047345f26ce495a5d74d8063f7f8edc32a1b13c
2021-07-19 23:16:48 -07:00
42d6543c7b [bc-breaking] Dispatch index_put with boolean mask argument to masked_fill (#61612)
Summary:
https://github.com/pytorch/pytorch/issues/57515

Based on ngimel's branch, with a few tweaks to determine when to copy value tensors to device memory, plus additional tests.
bc-breaking note: Previously, if in `x[index]=value` `value` was a 0-d tensor with device different from `x`'s device, it resulted in a RuntimeError. Now this case is handled by copying `value` to the correct device.
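
Illustrating the bc-breaking note (a sketch that assumes a CUDA device is available):

```python
import torch

x = torch.zeros(4, device="cuda")
mask = torch.tensor([True, False, True, False], device="cuda")
value = torch.tensor(1.0)  # 0-d tensor on the CPU

x[mask] = value  # previously a RuntimeError; now `value` is copied to x's device
print(x)         # tensor([1., 0., 1., 0.], device='cuda:0')
```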

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61612

Reviewed By: mrshenli

Differential Revision: D29753491

Pulled By: ngimel

fbshipit-source-id: 3fba14f4c2b9b136b50af020f9c1eda88f7373b0
2021-07-19 22:53:14 -07:00
018dc4193e Factor vector intrinsics out of SumKernel.cpp (#61483)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61483

This will make it simpler to support AVX512 which is upcoming in #56992, see https://github.com/pytorch/pytorch/pull/56992#discussion_r667060280 for reference.

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D29753536

Pulled By: ngimel

fbshipit-source-id: 03ae66cdc01a3679c67214468e2bdf93b15c3bc2
2021-07-19 21:49:01 -07:00
c44d9d9f70 Use cascade-summation to improve nansum accuracy (#61082)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61082

Fixes #59415

This implements nansum as a new `LoadPolicy` for the existing sum functions.
So, it's using the more accurate cascade-sum algorithm.

I've also expanded `test_nansum` to cover the four special cases of the sum
algorithm (inner/outer reduction; vectorized or scalar).
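
The semantics of the op are unchanged; for reference, nansum simply treats NaN entries as zero:

```python
import torch

t = torch.tensor([1.0, float("nan"), 2.0, float("nan")])
print(torch.nansum(t))                       # tensor(3.)
print(torch.nansum(t.reshape(2, 2), dim=1))  # tensor([1., 2.])
```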

Nansum performance comparison
-----------------------------
For float sums, contiguous reductions are as much as 10x faster and discontiguous sums are ~1.8x faster (more for small shapes due to TensorIterator overheads).

|        Shape | Dim | Master Contiguous (us) | This PR Contiguous (us) | Master Discontiguous (us) | This PR Discontiguous (us) |
|-------------:|-----|:----------------------:|:-----------------------:|:-------------------------:|:--------------------------:|
|     10, 1000 | 0   |          74.9          |           2.02          |            75.6           |            6.41            |
|              | 1   |          8.24          |           1.8           |            8.28           |            5.24            |
|    100, 1000 | 0   |           134          |           7.55          |            130            |            43.2            |
|              | 1   |          70.5          |           7.01          |            71.5           |            40.6            |
|   1000, 1000 | 0   |           726          |           69.2          |            737            |             403            |
|              | 1   |           702          |           51.0          |            709            |             404            |
|  10000, 1000 | 0   |         15,300         |          2,470          |           18,200          |           10,400           |
|              | 1   |          7,200         |          1,160          |           7,470           |            4,440           |
| 100000, 1000 | 0   |         163,000        |          28,000         |          199,000          |           131,000          |
|              | 1   |         70,700         |          13,500         |           75,700          |           44,200           |

Sum performance comparison
-------------------------

For float sums, performance is unchanged to within measurement precision:
|        Shape | Dim | Master Contiguous (us) | This PR Contiguous (us) | Master Discontiguous (us) | This PR Discontiguous (us) |
|-------------:|-----|:----------------------:|:-----------------------:|:-------------------------:|:--------------------------:|
|     10, 1000 | 0   |          1.92          |           2.01          |            4.2            |            4.49            |
|              | 1   |          1.68          |           1.68          |            2.79           |            2.75            |
|    100, 1000 | 0   |          6.52          |           7.07          |            26.9           |            27.3            |
|              | 1   |          5.91          |           5.66          |            16.8           |            16.9            |
|   1000, 1000 | 0   |          55.6          |           58.6          |            256            |             254            |
|              | 1   |          41.0          |           41.2          |            150            |             147            |
|  10000, 1000 | 0   |          1,370         |          1,650          |           8,070           |            8,020           |
|              | 1   |           908          |           845           |           3,100           |            2,980           |
| 100000, 1000 | 0   |         24,700         |          24,700         |           90,900          |           91,000           |
|              | 1   |         12,500         |          12,100         |           31,500          |           31,800           |

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D29753523

Pulled By: ngimel

fbshipit-source-id: 28095ac39e4a07ff878775c98f7a7815d9a4e457
2021-07-19 21:47:43 -07:00
bf1c9aaa79 logit_backward: Port to structured (#60817)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60817

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D29449376

Pulled By: ezyang

fbshipit-source-id: e6f793300488370f50a97db58f0400c557ee64e5
2021-07-19 21:23:05 -07:00
b8686b42d8 tanh_backward: Port to structured (#60816)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60816

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D29449375

Pulled By: ezyang

fbshipit-source-id: 93b70341fc6a2a42056fef74d6e5d81ec34e9da2
2021-07-19 21:23:03 -07:00
8c42d7ad07 sigmoid_backward: Port to structured (#60815)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60815

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D29449371

Pulled By: ezyang

fbshipit-source-id: e68c05cc90446e86d50b67d8346f145bf13ed207
2021-07-19 21:23:01 -07:00
11cc179366 xlogy: Port to structured (#60814)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60814

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D29449373

Pulled By: ezyang

fbshipit-source-id: a37499cd4fabff80f848627def7dd500364b8a22
2021-07-19 21:21:54 -07:00
9fb6b40f3e Makes a streaming backward test try gradient stealing more directly (#60065)
Summary:
Closes https://github.com/pytorch/pytorch/issues/59846.

https://github.com/pytorch/pytorch/issues/59846 is likely paranoia, and some of the test_streaming_backward_* tests in test_cuda.py already use gradient stealing (i.e., they start with `.grad`s as None before backward). Regardless, this PR augments one of the tests to stress gradient stealing a bit more directly.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60065

Reviewed By: mrshenli

Differential Revision: D29779518

Pulled By: ngimel

fbshipit-source-id: ccbf278543c3adebe5f4ba0365b1dace9a14da9b
2021-07-19 20:39:55 -07:00
873cc7a46d Support 3 argument variant of the getattr() call where the third arg is the default return value (#61599)
Summary:
Issue: https://github.com/pytorch/pytorch/issues/56909

Note that the emitted code for such a call will be either a) a getattr() call with the first two args, if the
attribute name (which must be a string literal) is determined to be valid based on the hasAttr() result,
or b) just the AST node for the default value (the 3rd arg) alone, with no getattr call at all.

Test code:

```
import torch
import numpy as np

class Shape:
    def __init__(self):
        self.center = 1.0

def f(x):
    s = Shape()
    return getattr(s, "missing", [])

y = torch.jit.script(f)
print(y.graph)
```
Output:
```
graph(%x : Tensor):
  %s.1 : __torch__.Shape = prim::CreateObject()
  %2 : NoneType = prim::CallMethod[name="__init__"](%s.1) # ts.py:10:8
  %4 : Tensor[] = prim::ListConstruct()
  return (%4)
```

Another example:
```
import torch
from typing import List

class Shape:
    def __init__(self):
        self.center = 1.0

def f(x):
    s = Shape()
    y = getattr(s, "center")
    w: List[float] = [1.0]
    z = getattr(s, "missing", w)
    z.append(y)
    return z

y = torch.jit.script(f)
print(y.graph)
 --- output ---

graph(%x : Tensor):
  %5 : float = prim::Constant[value=1.]() # ts.py:12:23
  %s.1 : __torch__.Shape = prim::CreateObject()
  %2 : NoneType = prim::CallMethod[name="__init__"](%s.1) # ts.py:10:8
  %center : float = prim::GetAttr[name="center"](%s.1)
  %w.1 : float[] = prim::ListConstruct(%5)
  %11 : float[] = aten::append(%w.1, %center) # ts.py:14:4
  return (%w.1)
```
Fixes #56969

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61599

Reviewed By: ZolotukhinM

Differential Revision: D29776058

Pulled By: jerryzhenleicai

fbshipit-source-id: 76333bd54002e08a064677c1f287115a80cc7c8e
2021-07-19 20:04:21 -07:00
ffd2e602f4 [CUDA graphs] Make sure graph mempool cudaMalloc_count decrement pairs with cudaFree for all allocations (#61567)
Summary:
Graph mempools aren't deleted until all of their allocations are cudaFreed. `PrivatePool::cudaMalloc_count` tracks the number of outstanding (not-yet-cudaFreed) allocations.

https://github.com/pytorch/pytorch/pull/44742 moves cudaFree to [release_block](https://github.com/pytorch/pytorch/pull/44742/files#diff-acc6337586bf9cdcf0a684380779300ec171897d05b8569bf439820dc8c93bd5R1160), while the `cudaMalloc_count` decrement (if needed) remains in a caller ([release_blocks](https://github.com/pytorch/pytorch/pull/44742/files#diff-acc6337586bf9cdcf0a684380779300ec171897d05b8569bf439820dc8c93bd5R1177)). But I noticed there's also a path ([release_available_cached_blocks](https://github.com/pytorch/pytorch/pull/44742/files#diff-acc6337586bf9cdcf0a684380779300ec171897d05b8569bf439820dc8c93bd5R1094)) that calls `release_block` without calling `release_blocks`, in other words, it calls cudaFree but dodges any potential `cudaMalloc_count` decrement.

In practice, the way the code is currently organized, I don't _think_ this second path can cause the pool to become a zombie whose `cudaMalloc_count` will never reach zero (I think this could only happen if you call `release_available_cached_blocks` on a private pool, and the only way it would be called on a private pool is if capture is underway, and if capture is underway, the cudaFree call will hard error). Regardless, I feel much more comfortable keeping the cudaMalloc_count decrement right next to the cudaFree.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61567

Reviewed By: mrshenli

Differential Revision: D29765198

Pulled By: ezyang

fbshipit-source-id: bcbeed656c3e0d101112aa470d8a098c73a011b1
2021-07-19 19:22:18 -07:00
208d06ca8c Port other comparison ops: ne, lt, gt, le, ge to structured kernels. (#60942)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60942

Tracking Issue: #55070

This PR applies the same transformation used for `eq` to the other comparison ops: `ne`, `lt`,
`gt`, `le`, and `ge`. Macros for creating the meta and impl functions are used (since the
checks they perform are the same).

Test Plan: Imported from OSS

Reviewed By: soulitzer

Differential Revision: D29509868

Pulled By: ezyang

fbshipit-source-id: 6a1ed1d93d08884c9e09d3f419037533a235d68c
2021-07-19 19:14:12 -07:00
97327137ba Port eq kernel to structured kernels. (#60177)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60177

Tracking issue: #55070

Test Plan: Imported from OSS

Reviewed By: soulitzer

Differential Revision: D29509871

Pulled By: ezyang

fbshipit-source-id: ad81bb49c46edc81c705d12108b98c5ffaaddf92
2021-07-19 19:13:09 -07:00
64ac428889 [vulkan] Adaptive local work group size (#61170)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61170

Instead of using a fixed local work group size of {4,4,4}, adjust the size based on the global size in order to minimize the number of inactive invocations.

## Perf improvements from this change
On aloha portal devices, in conjunction with the below diff that tweaks the conv2d_pw shader to calculate a 4x4 output, benchmark latency of the xirp14b model was reduced from ~8.7 ms to ~6.6 ms.

Test Plan:
```
cd ~/fbsource
buck build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 //xplat/caffe2:pt_vulkan_api_test_binAndroid\#android-arm64 --show-output
adb push buck-out/gen/xplat/caffe2/pt_vulkan_api_test_binAndroid\#android-arm64 /data/local/tmp/vulkan_api_test
adb shell "/data/local/tmp/vulkan_api_test"
cd -
```

Reviewed By: IvanKobzarev

Differential Revision: D28724591

fbshipit-source-id: ede896300b2be1a9578e492cb870121012886aa7
2021-07-19 18:52:19 -07:00
f324421d34 [vulkan] Calculate a 4x4 output tile for each invocation in conv2d_pw (#60760)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60760

A simple optimization to the `conv2d_pw` shader that makes each invocation calculate a 4x4 output tile instead of a single output texel. This results in better memory reuse and subsequently a pretty significant performance win for models similar to the MobileNets.

## Perf improvements from this change
On aloha portal devices, in conjunction with the above diff that introduces adaptive work group sizes, benchmark latency of the xirp14b model was reduced from ~8.7 ms to ~6.6 ms.

Test Plan:
Test vulkan ops:

```
cd ~/fbsource
buck build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 //xplat/caffe2:pt_vulkan_api_test_binAndroid\#android-arm64 --show-output
adb push buck-out/gen/xplat/caffe2/pt_vulkan_api_test_binAndroid\#android-arm64 /data/local/tmp/vulkan_api_test
adb shell "/data/local/tmp/vulkan_api_test"
cd -
```

Reviewed By: IvanKobzarev

Differential Revision: D28724590

fbshipit-source-id: e742286b01bf566dc6378677be55409b7faa8cfb
2021-07-19 18:52:18 -07:00
a1b5025ecd [vulkan] Convolution op cleanup (#60759)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60759

Remove unused convolution implementations and refactor convolution op code to make this file easier to maintain.

Test Plan:
Test vulkan ops:

```
cd ~/fbsource
buck build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 //xplat/caffe2:pt_vulkan_api_test_binAndroid\#android-arm64 --show-output
adb push buck-out/gen/xplat/caffe2/pt_vulkan_api_test_binAndroid\#android-arm64 /data/local/tmp/vulkan_api_test
adb shell "/data/local/tmp/vulkan_api_test"
cd -
```

Reviewed By: IvanKobzarev

Differential Revision: D28724592

fbshipit-source-id: cb509fa1cd68089f78188bfb3c866aabc9b0cbdb
2021-07-19 18:52:16 -07:00
cacab7e9d6 [vulkan] Reduce submission rate to save CPU cycles (#60758)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60758

Further tweak the submission rate of ops. Previously, in D28293756 (bc0965ac85), the submission rate was set as high as possible in order to prioritize performance. However, in practice (i.e., when running the model in an app) the high submission rate increases CPU usage and GPU contention, which may regress FPS.

In the future it would be beneficial to devise a scheme to adaptively set the GPU submission rate.

## Perf Improvements
This change doesn't really affect benchmark latency. However, through systraces it can be observed that CPU usage is reduced without too much impact on FPS/model latency.

Test Plan:
Test vulkan ops:

```

cd ~/fbsource
buck build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 //xplat/caffe2:pt_vulkan_api_test_binAndroid\#android-arm64 --show-output
adb push buck-out/gen/xplat/caffe2/pt_vulkan_api_test_binAndroid\#android-arm64 /data/local/tmp/vulkan_api_test
adb shell "/data/local/tmp/vulkan_api_test"
cd -
```

Reviewed By: IvanKobzarev

Differential Revision: D29062836

fbshipit-source-id: 1a0f42b49fecb80baee08cb3f1048bb35a1b5d5c
2021-07-19 18:51:04 -07:00
554038c2a2 [package] merge test_torchscript into test_package_script (#61807)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61807

These shouldn't be separate files; they test the same thing.

Differential Revision: D29748967

Test Plan: Imported from OSS

Reviewed By: Lilyjjo

Pulled By: suo

fbshipit-source-id: 177f40fa460d00d064dfd1f33a0b6656b214a296
2021-07-19 18:23:45 -07:00
f02cfcc802 ban PyTorchStreamWriter from writing the same file twice (#61805)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61805

Similar in spirit to https://github.com/pytorch/pytorch/pull/61371.
While writing two files with the same name is allowed by the ZIP format,
most tools (including our own) handle this poorly. Previously I banned
this within `PackageExporter`, but that doesn't cover other uses of the
zip format like TorchScript.

Given that there are no valid use cases, and debugging issues caused by
multiple file writes is fiendishly difficult, this PR bans this behavior entirely.

Differential Revision: D29748968

Test Plan: Imported from OSS

Reviewed By: Lilyjjo

Pulled By: suo

fbshipit-source-id: 0afee1506c59c0f283ef41e4be562f9c22f21023
2021-07-19 18:23:43 -07:00
04043d681e [package] fix storage serialization collision (#61806)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61806

Currently, if you do `save_pickle` on a ScriptModule and then `save_pickle`
on a tensor, a `0.storage` entry would be written *twice* to the zip
archive. This caused weird bugs on the deserializing side (it presented
as an ASAN-detected heap buffer overflow, because we tried to read more
memory from a tensor than we actually had).

Turns out this was because when we did:
```
self.storage_context = self.script_module_serializer.storage_context()
```
it returned a new copy of the storage context, so we weren't actually
assigning unique names to tensors!!

This PR fixes the issue by making `(De)SerializationStorageContext`
non-copyable and fixing up the parts of the bindings that returned it by
copy.

Differential Revision: D29748969

Test Plan: Imported from OSS

Reviewed By: Lilyjjo

Pulled By: suo

fbshipit-source-id: c2f89ab270e07e7a111fb35c545b5e07b804dc3c
2021-07-19 18:22:36 -07:00
c30048fccf add BFloat16 support for topk on CPU (#59547)
Summary:
Added BFloat16 support for topk on CPU, and collected benchmark data for the BFloat16 and Float32 data types using PyTorch's operator_benchmark tool on an Intel(R) Xeon(R) Platinum 8180 CPU @ 2.50GHz.

Input: 512x512, 512x1024, 1024x512, 1024x1024
K: 5
Number of cores: 1 core, 28 cores (1 socket)
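
For reference, a minimal usage sketch of the newly supported dtype:

```python
import torch

t = torch.randn(512, 512).to(torch.bfloat16)
values, indices = torch.topk(t, k=5, dim=1)
print(values.dtype, values.shape)  # torch.bfloat16 torch.Size([512, 5])
```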

For 1 core:

 ----------------------------------------
 PyTorch/Caffe2 Operator Micro-benchmarks
 ----------------------------------------
 Tag : all

 Benchmarking PyTorch: topk
 Mode: Eager
 Name: topk_H512_W512_k5_dtypetorch.float32_cpu
 Input: H: 512, W: 512, k: 5, dtype: torch.float32, device: cpu
Forward Execution Time (us) : 911.401

 Benchmarking PyTorch: topk
 Mode: Eager
 Name: topk_H512_W512_k5_dtypetorch.bfloat16_cpu
 Input: H: 512, W: 512, k: 5, dtype: torch.bfloat16, device: cpu
Forward Execution Time (us) : 911.700

 Benchmarking PyTorch: topk
 Mode: Eager
 Name: topk_H512_W1024_k5_dtypetorch.float32_cpu
 Input: H: 512, W: 1024, k: 5, dtype: torch.float32, device: cpu
Forward Execution Time (us) : 1506.927

 Benchmarking PyTorch: topk
 Mode: Eager
 Name: topk_H512_W1024_k5_dtypetorch.bfloat16_cpu
 Input: H: 512, W: 1024, k: 5, dtype: torch.bfloat16, device: cpu
Forward Execution Time (us) : 1492.036

 Benchmarking PyTorch: topk
 Mode: Eager
 Name: topk_H1024_W512_k5_dtypetorch.float32_cpu
 Input: H: 1024, W: 512, k: 5, dtype: torch.float32, device: cpu
Forward Execution Time (us) : 1825.634

 Benchmarking PyTorch: topk
 Mode: Eager
 Name: topk_H1024_W512_k5_dtypetorch.bfloat16_cpu
 Input: H: 1024, W: 512, k: 5, dtype: torch.bfloat16, device: cpu
Forward Execution Time (us) : 1819.872

 Benchmarking PyTorch: topk
 Mode: Eager
 Name: topk_H1024_W1024_k5_dtypetorch.float32_cpu
 Input: H: 1024, W: 1024, k: 5, dtype: torch.float32, device: cpu
Forward Execution Time (us) : 3001.459

 Benchmarking PyTorch: topk
 Mode: Eager
 Name: topk_H1024_W1024_k5_dtypetorch.bfloat16_cpu
 Input: H: 1024, W: 1024, k: 5, dtype: torch.bfloat16, device: cpu
Forward Execution Time (us) : 2970.718

For 28 cores(1 socket):

 ----------------------------------------
 PyTorch/Caffe2 Operator Micro-benchmarks
 ----------------------------------------
 Tag : all

 Benchmarking PyTorch: topk
 Mode: Eager
 Name: topk_H512_W512_k5_dtypetorch.float32_cpu
 Input: H: 512, W: 512, k: 5, dtype: torch.float32, device: cpu
Forward Execution Time (us) : 146.995

 Benchmarking PyTorch: topk
 Mode: Eager
 Name: topk_H512_W512_k5_dtypetorch.bfloat16_cpu
 Input: H: 512, W: 512, k: 5, dtype: torch.bfloat16, device: cpu
Forward Execution Time (us) : 123.423

 Benchmarking PyTorch: topk
 Mode: Eager
 Name: topk_H512_W1024_k5_dtypetorch.float32_cpu
 Input: H: 512, W: 1024, k: 5, dtype: torch.float32, device: cpu
Forward Execution Time (us) : 105.967

 Benchmarking PyTorch: topk
 Mode: Eager
 Name: topk_H512_W1024_k5_dtypetorch.bfloat16_cpu
 Input: H: 512, W: 1024, k: 5, dtype: torch.bfloat16, device: cpu
Forward Execution Time (us) : 101.498

 Benchmarking PyTorch: topk
 Mode: Eager
 Name: topk_H1024_W512_k5_dtypetorch.float32_cpu
 Input: H: 1024, W: 512, k: 5, dtype: torch.float32, device: cpu
Forward Execution Time (us) : 128.023

 Benchmarking PyTorch: topk
 Mode: Eager
 Name: topk_H1024_W512_k5_dtypetorch.bfloat16_cpu
 Input: H: 1024, W: 512, k: 5, dtype: torch.bfloat16, device: cpu
Forward Execution Time (us) : 125.172

 Benchmarking PyTorch: topk
 Mode: Eager
 Name: topk_H1024_W1024_k5_dtypetorch.float32_cpu
 Input: H: 1024, W: 1024, k: 5, dtype: torch.float32, device: cpu
Forward Execution Time (us) : 129.855

 Benchmarking PyTorch: topk
 Mode: Eager
 Name: topk_H1024_W1024_k5_dtypetorch.bfloat16_cpu
 Input: H: 1024, W: 1024, k: 5, dtype: torch.bfloat16, device: cpu
Forward Execution Time (us) : 124.556

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59547

Reviewed By: mrshenli

Differential Revision: D29763916

Pulled By: ezyang

fbshipit-source-id: 706c7d4349ac9ebd5d63f4844fca70febcb67023
2021-07-19 16:06:24 -07:00
15210f3b82 ignore and clear not ready errors (#61554)
Summary:
Follow-up to https://github.com/pytorch/pytorch/issues/18584. This PR covers the remaining places where an event or stream query might result in not-ready errors.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61554

Reviewed By: mrshenli

Differential Revision: D29763973

Pulled By: ezyang

fbshipit-source-id: 41d988d1826b2309cc6b01a81144094b353abdf9
2021-07-19 16:03:04 -07:00
e68c016871 Regenerate libtorch workflow files that got lost in merge conflict (#61872)
Summary:
Forward-fixes a merge conflict on master: https://github.com/pytorch/pytorch/runs/3106027618

for PR https://github.com/pytorch/pytorch/issues/61774

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61872

Reviewed By: dzhulgakov

Differential Revision: D29775595

Pulled By: janeyx99

fbshipit-source-id: 8194dd123f166fd5f3fd1e77417e865c188f40c8
2021-07-19 15:30:13 -07:00
0a6d88244b Fix grammatical errors on the PyTorch Contribution Guide (#61818)
Summary:
## What does the PR do?
- Fix grammatical errors on the PyTorch Contribution Guide page.

## Changes [Screenshots]
> Note:
> 1. The changes are highlighted in each screenshot.
> 2. Could not load CSS while testing locally; hopefully that is not an issue, as all the changes are to the content.

1.
![Change1](https://user-images.githubusercontent.com/20442648/126077764-39fd8b78-524f-407d-bc39-c93167bd10a7.PNG)

2.
![Change2](https://user-images.githubusercontent.com/20442648/126077766-9dd7dc61-ef06-41d0-a7e5-cfd179ece0cd.PNG)

3.
![Change3](https://user-images.githubusercontent.com/20442648/126077767-2c2e05e4-09fc-403a-a18e-9b108651a5f8.PNG)

4.
![Change4](https://user-images.githubusercontent.com/20442648/126077769-ad755db6-3afa-457b-b95c-9f6c6281f828.PNG)

5.
![Change5](https://user-images.githubusercontent.com/20442648/126077770-a7759dee-7f90-4b9e-a07c-4dec4ca934d0.PNG)

6.
![Change6](https://user-images.githubusercontent.com/20442648/126077772-0474e58d-c0c8-4156-b56f-808d225c38e7.PNG)

7.
![Change7](https://user-images.githubusercontent.com/20442648/126077774-d48382a7-5379-49a4-a8d2-b478fabf0bf0.PNG)

8.
![Change8](https://user-images.githubusercontent.com/20442648/126077777-fd743825-8dd7-4cb9-a22c-233e5fa085a6.PNG)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61818

Reviewed By: dzhulgakov

Differential Revision: D29775606

Pulled By: mrshenli

fbshipit-source-id: 3f3bfdeede341f784b72dfe55da9ba8bdce1192a
2021-07-19 15:06:22 -07:00
43c5dc40c5 Port signbit to structured kernel (#57936)
Summary:
Port signbit to structured kernel
Related https://github.com/pytorch/pytorch/issues/55070

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57936

Reviewed By: mrshenli

Differential Revision: D29764904

Pulled By: ezyang

fbshipit-source-id: 758f5f085d0cc84af612726f667cde15d615053b
2021-07-19 15:03:10 -07:00
44d3267103 Remove whitespace introduced by #61438 (#61863)
Summary:
Since it's a one-character change, it feels faster to fix than to revert.

Verified with `(! git --no-pager grep -In '[[:blank:]]$' -- . ':(exclude)**/contrib/**' ':(exclude)third_party' || (echo "The above lines have trailing spaces; please remove them"; false))` from the lint check

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61863

Reviewed By: ZolotukhinM

Differential Revision: D29772353

Pulled By: dzhulgakov

fbshipit-source-id: 33cb887f25e344b420f645a8e4dc8d0d7462e9ef
2021-07-19 14:57:10 -07:00
26d17ddc9f Exclude wrapper tensors from functorch in the native::resize_output fastpath (#61846)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61846

Related to #61485.

native::resize_output has a fast path that avoids dispatching.
Unfortunately, we have a number of CompositeImplicitAutograd operations
that directly call out= variants of operators. These
CompositeImplicitAutograd operators (e.g. torch.linalg.norm) end up
calling native::resize_output. That function, combined with how
functorch uses a mode-dispatch key to wrap tensors, causes silently
incorrect behavior in functorch (more details are available in #61485).

The very easy short-term fix is to have `native::resize_output` always
dispatch on a Tensor (and skip the fast-path) if a Tensor is a functorch
wrapped Tensor. More long-term fixes are proposed in the issue.

Test Plan:
- I checked that this change fixes torch.linalg.norm and other operators
with this problem in functorch.
- We're not testing functorch in pytorch/pytorch CI but we probably will
in the near future.
- wait for PyTorch tests.

Reviewed By: ezyang

Differential Revision: D29764293

Pulled By: zou3519

fbshipit-source-id: c7afcb0bd3bc77d2ba716d5b11f62830d8bdf0a9
2021-07-19 13:50:37 -07:00
f912889726 Remove unnecessary Ubuntu version checks (#61738)
Summary:
PR https://github.com/pytorch/pytorch/issues/5401 missed another Ubuntu version check in `cmake/MiscCheck.cmake`.

The checks for available functions added by https://github.com/pytorch/pytorch/issues/5401 are already present below the code snippet that this PR deletes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61738

Reviewed By: mrshenli

Differential Revision: D29757525

Pulled By: ezyang

fbshipit-source-id: 7f5f9312284973481a8b8a2b9c51cc09774722e9
2021-07-19 13:04:24 -07:00
1b0a7f3887 Always use fast gradcheck for LayerNorm 3d_no_affine_large_feature (#61848)
Summary:
Due to the introduction of a test from https://github.com/pytorch/pytorch/pull/59987/files, slow gradcheck has been failing intermittently (timing out/getting killed).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61848

Reviewed By: mrshenli

Differential Revision: D29765773

Pulled By: soulitzer

fbshipit-source-id: d78bee758cab76f26ba9f54925c42d4825db9449
2021-07-19 12:33:55 -07:00
094abf5fd0 [BE] Include a unit test for Save Operator with db_options
Summary: A test case that triggers db_options with the save operator is missing.

Test Plan: buck test

Differential Revision: D29642719

fbshipit-source-id: 72b7374d40430398abac26dfe91538550525384d
2021-07-19 12:22:59 -07:00
e389650f10 Upgrade CPUFallback for loops (#61722)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61722

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D29715862

fbshipit-source-id: 21e12c71e28e542abc649890f72938801d9d7d7a
2021-07-19 11:27:26 -07:00
04bd9d7577 [DDP] Add API to get model parameters in hook (#61637)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61637

To support running optimizer as a communication hook, add API to
retrieve the model parameters.

The API returns a `dict[idx -> tensor]` where `idx` is the intra-bucket index of the gradient tensor, and thus the same index as in `perParameterTensors`. The API can be used as follows to retrieve the model parameters:

```
per_param_grad_tensors = bucket.get_per_parameter_tensors()
idx_to_model_params = bucket.get_grad_index_to_variable_mapping()
for grad_tensor_idx, model_param in idx_to_model_params.items():
    self.assertEqual(model_param.grad, per_param_grad_tensors[grad_tensor_idx])
```

This provides a way for comm hook developers to retrieve model parameters within a hook. In the next diffs, we will use this to run an optimizer as a DDP comm hook.
ghstack-source-id: 133768666

Test Plan: CI

Reviewed By: SciPioneer

Differential Revision: D29691418

fbshipit-source-id: 4bfa824768a5850f73ee330017e2bcc29ceb7edc
2021-07-19 11:24:54 -07:00
66c8d21d7b Update progress and error reporting in clang-tidy (#61672)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61672

This PR adds a progress bar to clang-tidy, and updates how it threads error codes (when run in parallel). The progress bar is disabled on GHA because backspace escape codes are not supported.

It also adds a `--quiet` flag to the script.

Screenshot of progress bar:
<img width="955" alt="Screen Shot 2021-07-14 at 3 17 11 PM" src="https://user-images.githubusercontent.com/40111357/125686114-a8a7c154-3e65-43a8-aa8f-c1fb14d51d27.png">

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D29763848

Pulled By: 1ntEgr8

fbshipit-source-id: cbd352593b279f279911bc3bb8d5ed54abd5f1d5
2021-07-19 11:19:06 -07:00
24a6eb3fda ENH Adds tests and docs for 2d & 3d modules that already support no batch (#61262)
Summary:
Toward https://github.com/pytorch/pytorch/issues/60585

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61262

Reviewed By: mrshenli

Differential Revision: D29660554

Pulled By: jbschlosser

fbshipit-source-id: d5e3dc7096fcf8621bce4a1063d521b84092e0ca
2021-07-19 11:12:28 -07:00
4f46943e3d enable check trace when tracing a mkldnn model (#61241)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/43039. When tracing an MKLDNN model with **check_trace=True** set, an error occurs: **RuntimeError: unsupported memory format option Preserve**. This PR solves that problem.
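
A hedged repro sketch of the scenario described above (the Linear module and conversion calls are illustrative, not taken from the original issue):

```
import torch
from torch.utils import mkldnn as mkldnn_utils

# Convert a small model and its input to the MKLDNN layout.
model = torch.nn.Linear(4, 4).eval()
mkldnn_model = mkldnn_utils.to_mkldnn(model)
x = torch.randn(2, 4).to_mkldnn()

# Before this fix, check_trace=True raised
# "RuntimeError: unsupported memory format option Preserve".
traced = torch.jit.trace(mkldnn_model, x, check_trace=True)
```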

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61241

Reviewed By: anjali411

Differential Revision: D29737365

Pulled By: suo

fbshipit-source-id: e8f7f124bc6256f10b9d29969e0c65d332514625
2021-07-19 11:03:53 -07:00
75b68def63 fmin has been ported to the structured kernel, removing the old implementation (#60810)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60810

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D29449377

Pulled By: ezyang

fbshipit-source-id: 0b43562d0dfe81dfa401268f1d12e0d2c3c9f420
2021-07-19 10:20:06 -07:00
b526080d89 fmod: Port to structured (#60809)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60809

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D29449378

Pulled By: ezyang

fbshipit-source-id: 70f6fa95988f753eec4aefa60a60dddb7f3d744e
2021-07-19 10:18:57 -07:00
b65ddef000 for shared-memory handles, use an atomic counter, instead of potentially colliding random numbers (#60978)
Summary:
These handles, used for shared-memory tensors, can collide.

E.g. see https://github.com/pytorch/pytorch/issues/60626#issuecomment-869919018
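
A minimal sketch of the idea, assuming a "/torch_<pid>_<number>" handle format (the actual naming scheme lives in the C++ shared-memory code):

```
import itertools
import os

# Process-local monotonic counter; next() on itertools.count is
# effectively atomic under CPython's GIL.
_counter = itertools.count()

def next_shm_handle() -> str:
    # The pid disambiguates across processes; the counter disambiguates
    # within a process, so handles can no longer collide.
    return f"/torch_{os.getpid()}_{next(_counter)}"

print(next_shm_handle())  # e.g. /torch_12345_0
print(next_shm_handle())  # e.g. /torch_12345_1
```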

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60978

Reviewed By: mruberry

Differential Revision: D29479291

Pulled By: ezyang

fbshipit-source-id: 408ef1817768f007ad4795b286482809ea43467c
2021-07-19 09:56:43 -07:00
ac5a40e068 Fix benchmark's import module and remove its usage of tools.stats.scribe (#61808)
Summary:
There are a few pieces of convoluted logic here to fix the `benchmarks` module imports for pytest.

- On one hand, if we want to use `tools.stats.scribe` from `benchmarks`, we will need to add `benchmarks/__init__.py`
- On the other hand, if we add `benchmarks/__init__.py`, it breaks how `pytest` searches for the system-installed `torch` instead of the local source module `../torch`
  - That's why we are seeing errors like

```
ImportError while loading conftest '/var/lib/jenkins/workspace/benchmarks/fastrnns/conftest.py'.
benchmarks/fastrnns/__init__.py:1: in <module>
    from .cells import *  # noqa: F403
benchmarks/fastrnns/cells.py:1: in <module>
    import torch
torch/__init__.py:29: in <module>
    from .torch_version import __version__ as __version__
torch/torch_version.py:9: in <module>
    from .version import __version__ as internal_version
E   ModuleNotFoundError: No module named 'torch.version'
```

Instead, this PR changes the usage of `upload_scribe.py` back to its original form using an HTTP request; for now only CircleCI will continue down this path via `python benchmarks/upload_scribe.py`, which is gated by `if [[ -z "${GITHUB_ACTIONS}" ]];`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61808

Reviewed By: seemethere

Differential Revision: D29750188

Pulled By: zhouzhuojie

fbshipit-source-id: 3b842b21978f2159001e9c6c1cdc96c5a0515f2e
2021-07-19 09:45:05 -07:00
9c3346c8aa reduce max_num_threads for complex double ops in reduce_kernel (#61438)
Summary:
reduce_kernel currently has an all-purpose MAX_NUM_THREADS of 512, which causes register spilling in various kernel instantiations for the ops that use it as a template (ReduceLogicKernel, ReduceMinMaxKernel, ReduceMomentKernel, ReduceNormKernel, and ReduceSumProdKernel). This is a coarse first attempt at mitigating spillage by reducing max_num_threads to 256 for all complex double ops, which are by far the most common and egregious offenders, while keeping it at 512 for the other normal ops, the large majority of which are fine. Besides complex double ops, the remaining kernels which exhibit lmem usage are ReduceMinMax double, long, and BFloat16; ReduceMomentKernel BFloat16, Half, float, and double; and ReduceNorm double.

The proposed fix manages to eliminate lmem usage and massively improve runtime (by 3-5x) for complex double ops. All other ops are unaffected and have the same runtime; if they used lmem before, they still do now. We would still strongly recommend further testing of input shapes and ops, as well as looking into whether there's a cleaner approach to doing this.
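
A hedged sketch of the dtype-dependent cap described above (the values come from this description; the actual logic lives in the C++ kernel templates):

```
import torch

def max_num_threads(dtype: torch.dtype) -> int:
    # Complex double values occupy 16 bytes and need roughly twice the
    # registers, so halve the thread cap to avoid register spilling.
    if dtype == torch.complex128:
        return 256
    return 512

print(max_num_threads(torch.complex128))  # 256
print(max_num_threads(torch.float32))     # 512
```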

We tested the following ops for both complex double instantiations, as well as testing torch.max and torch.argmax with doubles to make sure they didn't break. We didn't include the double instantiations in the timing data, since they remain unchanged post-fix vs pre-fix. Timing data for the complex double ops below (all done on Nvidia Titan-V GPU):

torch.mean:
![MeanTimingData](https://user-images.githubusercontent.com/22803332/125005623-0f424800-e011-11eb-864e-8419485a9c76.PNG)

torch.linalg.norm:
![NormTimingData](https://user-images.githubusercontent.com/22803332/125005649-179a8300-e011-11eb-96e1-54e18c85a336.PNG)

torch.sum:
![SumTimingData](https://user-images.githubusercontent.com/22803332/125005655-1b2e0a00-e011-11eb-928e-ee5941608fb2.PNG)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61438

Reviewed By: mrshenli

Differential Revision: D29756863

Pulled By: ngimel

fbshipit-source-id: 4c4635df58af9313966ff1df1095f7e15a39bb07
2021-07-19 09:38:22 -07:00
d565b3e9ea Migrate libtorch to GHA (#61774)
Summary:
Makes progress on https://github.com/pytorch/pytorch/issues/57686

Tested in https://github.com/pytorch/pytorch/pull/61775:

periodic 11.3 libtorch: https://github.com/pytorch/pytorch/pull/61775/checks?check_run_id=3088529584?check_suite_focus=True
10.2: https://github.com/pytorch/pytorch/pull/61775/checks?check_run_id=3089965441
11.1: https://github.com/pytorch/pytorch/pull/61775/checks?check_run_id=3089965697

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61774

Reviewed By: samestep

Differential Revision: D29745793

Pulled By: janeyx99

fbshipit-source-id: a17f561051b1e5eccf4918137a4b5df19308a716
2021-07-19 09:21:52 -07:00
3e3acf8a9a Minor documentation fixes (#61785)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61785

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D29746648

Pulled By: andwgu

fbshipit-source-id: 435bbd8894f2ae5c814b9acd562673affea1daf6
2021-07-19 09:01:29 -07:00
813b887dad Fix indent (#61784)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61784

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D29746647

Pulled By: andwgu

fbshipit-source-id: f42d3a0864a8291941d695a0cf575a5737cbb35c
2021-07-19 09:00:25 -07:00
a26a9f8b75 zero initialize some members and other fixes (#59915)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59915

Reviewed By: soulitzer

Differential Revision: D29106684

Pulled By: ezyang

fbshipit-source-id: 713cbdf10866017ee715ee89ec82acb592c769b6
2021-07-19 07:36:26 -07:00
0263865bfe [Docs] Fix docs for torch.chunk (#61097)
Summary:
torch.chunk may silently return fewer than the requested number of chunks if some undocumented division constraints are not met. The functionality that users expect is provided by another function: torch.tensor_split

This has led to confusion countless times, and who knows how many systems out there are fragile because of this.
My changes describe the discrepancy, show an example, and direct users to the usually preferred function.

Issues mentioning this problem:
https://github.com/pytorch/pytorch/issues/9382
https://github.com/torch/torch7/issues/617

I considered documenting the constraint for when an unexpected number of chunks may be returned (it is chunks*chunks > input.size[dim]), so that users could quickly tell whether their code may be affected. Please let me know if you think this should be in the docs or not.
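
A small illustration of the discrepancy (values chosen to trigger the constraint above):

```
import torch

t = torch.arange(5)

# chunk silently returns fewer pieces: 5 elements into 4 requested
# chunks yields only 3 chunks of ceil(5/4) = 2 elements each.
print([c.tolist() for c in t.chunk(4)])
# [[0, 1], [2, 3], [4]]

# tensor_split always returns exactly the requested number of pieces.
print([c.tolist() for c in torch.tensor_split(t, 4)])
# [[0, 1], [2], [3], [4]]
```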

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61097

Reviewed By: heitorschueroff

Differential Revision: D29660280

Pulled By: ezyang

fbshipit-source-id: 675086bc8a8882c1685a50a2c083ae8dd1854384
2021-07-19 06:13:04 -07:00
552eab7935 [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D29758833

fbshipit-source-id: e07673bb19f15865bf5810910224f3f37a759db7
2021-07-19 04:12:20 -07:00
593e8f41ca [jit] Fixed a bug in the pass that replaces cat with the variadic op (#61795)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61795

Test Plan: Imported from OSS

Reviewed By: nikithamalgifb

Differential Revision: D29748785

Pulled By: navahgar

fbshipit-source-id: df5b84c35f007718c92a21a0b44a231e6d346918
2021-07-18 21:38:30 -07:00
ff82394fc0 Apply saved tensor hooks (#60975)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60975

Fixes #58512

Test Plan: Imported from OSS

Reviewed By: soulitzer

Differential Revision: D29466227

fbshipit-source-id: c1498d52173aceb29638b5c4f521ac05356a5958
2021-07-18 08:42:51 -07:00
eefbff773b ns for fx: add utils for l2 error and cosine similarity (#61380)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61380

Adds convenience wrappers for l2 error and cosine similarity
to NS utils.
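
A hedged sketch of what such wrappers compute (the actual NS utils differ in names and signatures):

```
import torch
import torch.nn.functional as F

def compute_l2_error(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # L2 norm of the elementwise difference between two activations.
    return torch.norm(x - y)

def compute_cosine_similarity(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # Cosine similarity between the flattened activations.
    return F.cosine_similarity(x.flatten(), y.flatten(), dim=0)
```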

Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_extend_logger_results_with_comparison
```

Imported from OSS

Reviewed By: hx89

Differential Revision: D29600354

fbshipit-source-id: 670c44a44df7f345884cacf26ed3c885edbe9977
2021-07-17 20:53:43 -07:00
2a2bc1fc8a ns for fx: add fqn to results, when present (#61377)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61377

Both the quantization tracer and the NS tracer record
`_node_name_to_scope`, which contains the mapping from
node name to FQN.

This PR adds the FQN information to the NS results, so that it is
more convenient for users to attribute a NS result to the corresponding
module in their model.

Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_extract_weights_fqn
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_match_activations_fqn
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_shadow_activations_fqn
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D29600349

fbshipit-source-id: df489e03daff97dd380f59c83ffdc2b0012a0a53
2021-07-17 20:53:41 -07:00
7449f49a4c ns for fx: return results in execution order (#61360)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61360

By default, NS graph matching matches from the end of the graph
to the start.  This PR reverses the returned results so that
the outputs of the NS APIs are in the order of execution, making
it easier to analyze.

Test Plan:
```
python test/test_quantization.py TestFXGraphMatcher.test_results_order
```

Imported from OSS

Reviewed By: hx89

Differential Revision: D29600348

fbshipit-source-id: c9fa4a3748db27c1788eebf803f35221e6fc8701
2021-07-17 20:53:39 -07:00
2b2928c5ca ns for fx: improve error messages for graph matching (#61359)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61359

Makes the error messages from graph matching easier for users to read.

Test Plan:
```
// inspect the exceptions in the following two tests and verify
// that they are easier to read than before
python test/test_quantization.py TestFXGraphMatcher.test_matching_failure_node_count
python test/test_quantization.py TestFXGraphMatcher.test_matching_failure_node_type
```

Imported from OSS

Reviewed By: hx89

Differential Revision: D29600353

fbshipit-source-id: ec6640fe6cab7b62a697e4ee385be182f2918fd4
2021-07-17 20:53:38 -07:00
ddf6d6cc14 ns for fx: clean up override_qengines and copy TODO in tests (#61358)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61358

1. changes override_qengines to require fbgemm instead; these tests do not
exercise any qengine-specific logic, so it is better to just run them once
2. removes a TODO about copy.deepcopy which we do not plan to address

Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs
```

Imported from OSS

Reviewed By: hx89

Differential Revision: D29600352

fbshipit-source-id: 4db08f0080233ff46d7679928c83e41c5ba21ec8
2021-07-17 20:53:36 -07:00
cf6f5efb39 ns for fx: test case for comparing fp32 vs fp32_prepared shadow (#61357)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61357

Adds a test case for comparing fp32 vs fp32_prepared in a shadow model.

Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs
```

Imported from OSS

Reviewed By: hx89

Differential Revision: D29600350

fbshipit-source-id: ff7518ce8a789ab7469cb22044f1d7c697e2cd04
2021-07-17 20:53:34 -07:00
4acd14da02 ns for fx: preserve observers and fake_quants through passes (#61323)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61323

Before this PR, all observers and fake quants were silently removed
when adding loggers with NS. This is problematic for QAT models because
we need the fake quants to run in order to properly capture intermediate
outputs.

This PR fixes the issue by preserving the observers throughout
the passes which add loggers.  In detail:
* for each quantization module or fusion, add additional patterns with that fusion and an observer/fake_quant at the end
* remove the places in the logger model creation code which removed observers
* add unit testing that QAT numerics do not change after adding loggers

Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_loggers_preserve_qat_numerics
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_shadow_loggers_preserve_qat_numerics
```

Imported from OSS

Reviewed By: hx89

Differential Revision: D29600351

fbshipit-source-id: 5f25118b79eb47860c49bca882de6a8eae7a4456
2021-07-17 20:53:33 -07:00
a70505cdbd ns for fx: support comparing fp32 vs fp32_prepared, except shadowed (#61129)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61129

Adds support for comparing a fp32 model (without quantization) to a
fp32 model prepared with quantization. The main missing feature was
handling conv-bn fusion, since this fusion for PTQ happens outside
of quantization patterns.

Adds testing for this case for comparing weights and comparing
activations

Adds a TODO for also handling this for shadow activations; we need to
first stop removing observers in graph passes before we can add
this support, which will be in a future PR.

Test Plan:
```
python test/test_quantization.py TestFXGraphMatcherModels.test_mobilenet_v2
python test/test_quantization.py TestFXGraphMatcherModels.test_mobilenet_v2_qat
python test/test_quantization.py TestFXNumericSuiteCoreAPIsModels.test_compare_activations_conv
```

Imported from OSS

Reviewed By: raghuramank100

Differential Revision: D29520009

fbshipit-source-id: f63484a998f1424bd9cacf5d823b82b2edfea1ae
2021-07-17 20:52:23 -07:00
e117d94e21 Wrapped create_type_hint in try/except block so that NormalizeArgs doesn't fail if create_type_hint fails (#61524)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61524

Test Plan: Imported from OSS

Reviewed By: bhosmer

Differential Revision: D29746106

Pulled By: Chillee

fbshipit-source-id: d08c0030f40b504e8f7a61fc0ee432f1515a0e6d
2021-07-17 16:13:17 -07:00
59ca89dca8 Fix scribe logs again (#61768)
Summary:
Reverts the revert of 3624d75 with an additional fix in https://github.com/pytorch/pytorch/pull/61764

Got the correct logs sent to lambda

```
...
,"21721":"OK","21722":"OK","21723":"OK","21724":"OK","21725":"OK","21726":"OK","21727":"OK","21728":"OK","21729":"OK","21730":"OK","21731":"OK","21732":"OK","21733":"OK","21734":"OK","21735":"OK","21736":"OK","21737":"OK","21738":"OK","21739":"OK","21740":"OK","21741":"OK","21742":"OK","21743":"OK","21744":"OK","21745":"OK","21746":"OK","21747":"OK","21748":"OK","21749":"OK","21750":"OK","21751":"OK","21752":"OK","21753":"OK","21754":"OK","21755":"OK","21756":"OK","21757":"OK","21758":"OK","21759":"OK","21760":"OK","21761":"OK","21762":"OK","21763":"OK","21764":"OK","21765":"OK","21766":"OK","21767":"OK","21768":"OK","21769":"OK","21770":"OK","21771":"OK","21772":"OK","21773":"OK","21774":"OK","21775":"OK","21776":"OK","21777":"OK","21778":"OK","21779":"OK","21780":"OK","21781":"OK","21782":"OK","21783":"OK","21784":"OK","21785":"OK","21786":"OK","21787":"OK","21788":"OK","21789":"OK","21790":"OK","21791":"OK","21792":"OK","21793":"OK","21794":"OK","21795":"OK","21796":"OK","21797":"OK","21798":"OK","21799":"OK","21800":"OK","21801":"OK","21802":"OK","21803":"OK","21804":"OK","21805":"OK","21806":"OK","21807":"OK","21808":"OK","21809":"OK","21810":"OK","21811":"OK","21812":"OK","21813":"OK","21814":"OK","21815":"OK","21816":"OK","21817":"OK","21818":"OK","21819":"OK","21820":"OK","21821":"OK","21822":"OK","21823":"OK","21824":"OK","21825":"OK","21826":"OK"}}

class StartProcessesTest:
    tests: 14 failed: 0 skipped: 0 errored: 0
    run_time: 4.86 seconds
    avg_time: 0.35 seconds
    median_time: 0.01 seconds
    3 longest tests:
        test_function_large_ret_val time: 1.55 seconds
        test_pcontext_wait time: 1.11 seconds
        test_void_function time: 1.03 seconds

...
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61768

Reviewed By: janeyx99

Differential Revision: D29735781

Pulled By: zhouzhuojie

fbshipit-source-id: 6882e334f5108d20773ad66d5300cd37eb509ded
2021-07-16 17:56:16 -07:00
311f1f275a Update clang-tidy-linux64 (#61797)
Summary:
Update the clang-tidy Linux hash to match the one built for 7ae60a49ac by https://github.com/pytorch/test-infra/runs/3090057893

Fixes `The downloaded binary is not what was expected!`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61797

Reviewed By: zhouzhuojie

Differential Revision: D29746840

Pulled By: malfet

fbshipit-source-id: a7388952b04ba12f250003c32629d57b8d5ffed8
2021-07-16 17:23:21 -07:00
4337650c91 Fixing a bug in .to for qtensors so scale/zp move too (#61576)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61576

This also fixed an issue in the
empty_quantized_per_channel_affine function where specifying a device
that was different from the device of scale/zp would result in a
mismatched qtensor

Test Plan:
python test/test_quantization.py
testquantizedtensor.test_per_channel_to_device

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D29675461

fbshipit-source-id: 0e2ff20f0f581dae94ee01d3ceead2a620cd26b9
2021-07-16 17:16:24 -07:00
cb6841b263 Fix ConnectionError in download_mnist (#61789)
Summary:
Fixes issues like the following error. Note that `ConnectionResetError` is a subclass of `ConnectionError`.

```
+ python tools/download_mnist.py --quiet -d test/cpp/api/mnist
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz ...
Traceback (most recent call last):
  File "tools/download_mnist.py", line 93, in <module>
    main()
  File "tools/download_mnist.py", line 86, in main
    download(path, resource, options.quiet)
  File "tools/download_mnist.py", line 42, in download
    urlretrieve(url, destination_path, reporthook=hook)
  File "/opt/conda/lib/python3.6/urllib/request.py", line 277, in urlretrieve
    block = fp.read(bs)
  File "/opt/conda/lib/python3.6/http/client.py", line 463, in read
    n = self.readinto(b)
  File "/opt/conda/lib/python3.6/http/client.py", line 507, in readinto
    n = self.fp.readinto(b)
  File "/opt/conda/lib/python3.6/socket.py", line 586, in readinto
    return self._sock.recv_into(b)
ConnectionResetError: [Errno 104] Connection reset by peer
```
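
Since `ConnectionResetError` subclasses `ConnectionError`, catching the base class also covers resets. A minimal sketch of the pattern (the retry loop is illustrative, not the exact code in download_mnist.py):

```
from urllib.request import urlretrieve

def download_with_retry(url: str, destination: str, tries: int = 3) -> None:
    for attempt in range(tries):
        try:
            urlretrieve(url, destination)
            return
        except ConnectionError:
            # Also catches ConnectionResetError ([Errno 104]).
            if attempt == tries - 1:
                raise
```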

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61789

Reviewed By: dreiss

Differential Revision: D29745459

Pulled By: zhouzhuojie

fbshipit-source-id: 2deb668bd74478f32bd01704d4362e8a4d95087b
2021-07-16 17:02:13 -07:00
4e2fe9718d flatten operation (resnet50) (#61265)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61265

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D29626383

Pulled By: migeed-z

fbshipit-source-id: 107769fc14f1fad295a93a10e84235f25ae17357
2021-07-16 16:06:10 -07:00
4479aa8838 Remove all the code that constructs metadata.pkl file (#61760)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61760

Remove all code related to metadata.pkl creation, including creating metadata.pkl and converting data from extra/mobile_info.json and extra/producer_info.json to the metadata.pkl file.

Test Plan:
## Run buck commands:
  - `cd` into `fbcode` then `buck build //caffe2/caffe2/fb/init:init`
  - `cd` into `fbcode` then `buck build //caffe2/torch/fb/init:init`
  - `buck build //xplat/caffe2:torch_mobile_core`

## Export a PyTorch lite/mobile model
- Run: `flow-cli canary users.xcheng16.pytorch_trainer.TestWorkflow --run-as-secure-group ai_mobile_platform --buck-target //fblearner/flow/projects/users/xcheng16:workflow` under `fbcode` on devserver.
- Resulting model: metadata.pkl no longer exists
{F632063134}

Reviewed By: guangy10

Differential Revision: D29702943

fbshipit-source-id: ec7964f4aa3a8e09ccc20b1a7e2232f85931dd81
2021-07-16 15:39:07 -07:00
7ac8054d5a Use better defaults in the clang-tidy wrapper script (#61651)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61651

This PR sets some QOL defaults to the clang-tidy wrapper script and refactors how defaults are set.

- Runs in parallel
- Custom executable (prints an error message to users asking them to install our custom build)
- `generate_build_files` can now be run as a script

Test Plan: Imported from OSS

Reviewed By: malfet, zhouzhuojie

Differential Revision: D29743661

Pulled By: 1ntEgr8

fbshipit-source-id: 256617d006a03e4ab96091593f5bb80c9b31a2d1
2021-07-16 14:58:19 -07:00
dc0d1612e1 ENH Updates docs and tests for activation modules for no-batch dims (#61300)
Summary:
Towards https://github.com/pytorch/pytorch/issues/60585

This PR updates docs and tests for activation modules that already support no-batch dims.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61300

Reviewed By: heitorschueroff

Differential Revision: D29660543

Pulled By: jbschlosser

fbshipit-source-id: 5edad45f7e9995aca6c3403469668e6e1cbb94b6
2021-07-16 14:42:18 -07:00
6a085648d8 add aten symbols for amin and amax (#61550)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61550

Test Plan: Imported from OSS

Reviewed By: asuhan

Differential Revision: D29668123

Pulled By: bdhirsh

fbshipit-source-id: b111e1c6c6d2beddb220cad70d95954756a3ee9d
2021-07-16 14:06:00 -07:00
4e94e84f65 Type annotate torch.nn.Module ctor (#61334)
Summary:
Annotate generic types
Fix some type violations
Override `_modules` and `_parameters` in `Sequential`, `ModuleList`, `ModuleDict`, etc

Fixes https://github.com/pytorch/pytorch/issues/45497

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61334

Reviewed By: albanD

Differential Revision: D29579533

Pulled By: malfet

fbshipit-source-id: 5cd8ca918b260ca35cfdd873dee8851d39d17de2
2021-07-16 13:59:06 -07:00
ee2f2ec9a5 Revert D29687143: [3/N] Nnapi Backend Delegate Preprocess: Basic OSS Test
Test Plan: revert-hammer

Differential Revision:
D29687143 (5798a00aa4)

Original commit changeset: 9ba9e57f7f85

fbshipit-source-id: 6a672c76a04366b35c492698ae5b39fd4dd1785f
2021-07-16 13:32:51 -07:00
a07d3dc34c Pin macos mkl conda version to fix the cmake build (#61773)
Summary:
Fixes macos build error in master, recently mkl had a upgrade.

CircleCI error:
https://app.circleci.com/pipelines/github/pytorch/pytorch/351645/workflows/d22421c1-bb8f-48fd-9efd-7c0d77f0b083/jobs/14815607

```
Jul 16 11:43:05 CMake Error at /Users/distiller/workspace/miniconda3/lib/cmake/mkl/MKLConfig.cmake:456 (list):
Jul 16 11:43:05   list does not recognize sub-command PREPEND
Jul 16 11:43:05 Call Stack (most recent call first):
Jul 16 11:43:05   /Users/distiller/workspace/miniconda3/lib/python3.7/site-packages/torch/share/cmake/Caffe2/public/mkl.cmake:1 (find_package)
Jul 16 11:43:05   /Users/distiller/workspace/miniconda3/lib/python3.7/site-packages/torch/share/cmake/Caffe2/Caffe2Config.cmake:109 (include)
Jul 16 11:43:05   /Users/distiller/workspace/miniconda3/lib/python3.7/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:68 (find_package)
Jul 16 11:43:05   CMakeLists.txt:5 (find_package)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61773

Reviewed By: soulitzer

Differential Revision: D29736742

Pulled By: zhouzhuojie

fbshipit-source-id: 68c5244196f7f7562a6c202157c4ccdcfcb64337
2021-07-16 13:15:04 -07:00
8ad584823f add shortcircuit in isclose for zero tolerances (#61529)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/61412.

Large integers gave false positives because the comparison always takes place in floating point dtypes. This happens because the integer precision of a floating point dtype is lower than the range of an integer dtype with the same number of bits.

For non-extremal values, `isclose` is defined by the following equation:

```python
abs(a - b) <= atol + rtol * abs(b)
```

For `rtol == 0 and atol == 0`, this is equivalent to `a == b`. This PR goes for the low-hanging fruit and adds a shortcut for this case that falls back to an actual equality check.
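
A small demonstration of why the shortcut matters, assuming int64 inputs whose values cannot be distinguished in float64 (53-bit mantissa):

```
import torch

a = torch.tensor(2**62)
b = torch.tensor(2**62 + 1)

# Both values round to the same float64, so the floating point formula
# reports equality even with zero tolerances:
print(float(a) == float(b))  # True

# With the shortcut, zero tolerances fall back to exact equality:
print(torch.isclose(a, b, rtol=0, atol=0))  # tensor(False)
```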

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61529

Reviewed By: gchanan

Differential Revision: D29707534

Pulled By: mruberry

fbshipit-source-id: 71b8c4901e9cd4f366442437e52032b0d3002b4a
2021-07-16 12:48:16 -07:00
612632556d Fix torch.median crash on empty tensor (#61698)
Summary:
`torch.tensor([]).median()` now returns `nan`, which mimics the behavior of `np.median`.
Add test to `TestReductions.test_median_corner_cases`
Fixes https://github.com/pytorch/pytorch/issues/61656
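
A quick check of the fixed behavior:

```
import torch
import numpy as np

print(torch.tensor([]).median())  # tensor(nan) instead of crashing
print(np.median(np.array([])))    # nan (NumPy also warns about the empty slice)
```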

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61698

Reviewed By: heitorschueroff

Differential Revision: D29706912

Pulled By: malfet

fbshipit-source-id: ea5f58327fbff371f3fb8786b269430c7a10d05f
2021-07-16 12:36:18 -07:00
3fd9dcf934 Move non-libtorch scheduled linux CI to GHA (#61732)
Summary:
Move non-libtorch Linux 11.3 scheduled CI job to GHA.
Libtorch builds will be migrated here: https://github.com/pytorch/pytorch/pull/61774

Successful run: https://github.com/pytorch/pytorch/actions/runs/1035592487

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61732

Reviewed By: seemethere

Differential Revision: D29735637

Pulled By: janeyx99

fbshipit-source-id: dce13370b218ae7833483fdaa00137db95e27c98
2021-07-16 12:16:58 -07:00
287603f51c Revert D29698486: [pytorch][PR] Remove torch._bmm and remove torch.bmm deterministic arg documentation
Test Plan: revert-hammer

Differential Revision:
D29698486 (328606699f)

Original commit changeset: 5af2d3803ab1

fbshipit-source-id: ce954c13196b1fb8277d61a686ac351d3bf13903
2021-07-16 11:02:09 -07:00
5798a00aa4 [3/N] Nnapi Backend Delegate Preprocess: Basic OSS Test (#61594)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61594

### Summary:
Added a unit test for the Nnapi delegate's preprocess() function. The
function was previously tested locally, but now a basic test is
added for OSS.

See https://github.com/pytorch/pytorch/pull/61499 for preprocess
implementation. See D29647123 for local testing.

**TODO:**
Add more comprehensive tests.
Add tests for model execution, after the Nnapi delegate's initialization
and execution is implemented T91991928.

**CMakeLists.txt:**
Added a library for the Nnapi delegate
- Explicit linking of torch_python is necessary for the Nnapi delegate's use of pybind

**test_backends.py:**
Added a test for lowering to Nnapi
- Based off https://github.com/pytorch/pytorch/blob/master/test/test_nnapi.py
- Only differences are the loading of the nnapi backend library and the need to change dtype from float64 to float32

### Test Plan:
Running `python test/test_jit.py TestBackendsWithCompiler -v` succeeds. Also saved and examined the model file locally.

Test Plan: Imported from OSS

Reviewed By: iseeyuan

Differential Revision: D29687143

fbshipit-source-id: 9ba9e57f7f856e5ac15e13527f6178d613b32802
2021-07-16 11:00:38 -07:00
349f2f767c Modernize to default constructor and nullptr in torch (#61735)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61735

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D29716659

fbshipit-source-id: ec2a0a0b7e55d2e50b1d35f0b651bd40675ae7e8
2021-07-16 10:51:13 -07:00
736bb26746 use rand over empty in flaky test (#61710)
Summary:
Fixes https://github.com/pytorch/pytorch/pull/61694#issuecomment-880641635. cc krshrimali.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61710

Reviewed By: anjali411

Differential Revision: D29719660

Pulled By: mruberry

fbshipit-source-id: 589574a039ad431acc7d095d452f0b3e52260208
2021-07-16 10:50:05 -07:00
efeacc0779 [Static Runtime] Fixed visibility of ProcessedNode class and a newly added function (#61729)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61729

Test Plan: Imported from OSS

Reviewed By: hlu1

Differential Revision: D29719644

Pulled By: navahgar

fbshipit-source-id: 27a77b2a281d1a8a48e2a9df1c254f62c0e2e7ef
2021-07-16 10:42:02 -07:00
6fa80f7f9f Refactor embedded_interpreter registration to be friendly to python case (#59991)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59991

Add a registration mechanism whereby, on loading the embedded interpreter library, a registration function is called that links up the symbols it provides with torch::deploy.

Test Plan: local and CI deploy tests pass

Reviewed By: suo

Differential Revision: D28764436

fbshipit-source-id: 88416bd098be306f887cc9fd2d65d29199439bc4
2021-07-16 10:33:58 -07:00
6349bde572 [4/N] Nnapi backend delegation preprocess: List Tensors & Comment Updates (#61752)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61752

Updated Android NNAPI preprocess, so that it can accept both single Tensor inputs and Tensor List inputs.
- The inputs are not real data, but input parameters for shape, dtype, quantization, and dimorder that are bundled as a Tensor. Comments were updated to make this clearer.
- In the future, preprocess will also accept a dedicated NnapiArg object.

Compile_spec should have the following format:
{"forward": {"inputs": at::Tensor}} OR {"forward": {"inputs": c10::List< at::Tensor >}}
Example input Tensor:
torch.tensor([[1.0, -1.0, 2.0, -2.0]]).unsqueeze(-1).unsqueeze(-1)
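
Putting the format and the example tensor together, a compile spec might look like this (the second tensor in the list variant is a hypothetical extra input for illustration):

```
import torch

example_input = torch.tensor([[1.0, -1.0, 2.0, -2.0]]).unsqueeze(-1).unsqueeze(-1)

# Single-Tensor input variant:
compile_spec = {"forward": {"inputs": example_input}}

# Tensor List input variant:
compile_spec_list = {
    "forward": {
        "inputs": [example_input, torch.tensor([[0.5, -0.5]]).unsqueeze(-1).unsqueeze(-1)]
    }
}
```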

### Testing
OSS testing is blocked by https://github.com/pytorch/pytorch/pull/61594. Testing was done locally in D29726948
TODO: Add OSS tests for single Tensor and Tensor List inputs.
ghstack-source-id: 133683735

Test Plan:
OSS testing is blocked by https://github.com/pytorch/pytorch/pull/61594. Testing was done locally in D29726948.
TODO: Add OSS tests for single Tensor and Tensor List inputs.

Reviewed By: iseeyuan

Differential Revision: D29726432

fbshipit-source-id: 08de70578f37681bda365f9776a1c96030257e7a
2021-07-16 10:17:56 -07:00
328606699f Remove torch._bmm and remove torch.bmm deterministic arg documentation (#61629)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/61571

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61629

Reviewed By: zou3519

Differential Revision: D29698486

Pulled By: albanD

fbshipit-source-id: 5af2d3803ab1eb093616bcfc7e074d8b57ef6958
2021-07-16 09:18:34 -07:00
28150fd0c8 [static_runtime] Implement aten::linear (#61595)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61595

Add out variant wrapper for `aten::linear` in the static runtime

Test Plan: `buck test //caffe2/benchmarks/static_runtime:static_runtime_cpptest`

Reviewed By: hlu1

Differential Revision: D29684236

fbshipit-source-id: 94df6d7267b3f269b2cadf065f207648777147df
2021-07-16 08:55:43 -07:00
3624d75864 Revert D29703523: [pytorch][PR] Fix scribe logs
Test Plan: revert-hammer

Differential Revision:
D29703523 (eb5a56fb74)

Original commit changeset: 829ad3630d35

fbshipit-source-id: 2b2196d58791b995a008b6d810b3248ed27e7d94
2021-07-16 08:50:13 -07:00
b963607d50 [nnc] Insert alloc/free at global scope (#61725)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61725

Alloc/free inside a loop isn't really an optimization, and furthermore
it breaks some attempted optimization in the llvm backend: we use alloca for
small allocations, which is efficient since alloca is on the stack, but there's
no corresponding free, so we leak tons of stack.  I hit this while building an
rfactor buffer inside a very deeply nested loop.
ghstack-source-id: 133627310

Test Plan:
Unit test which simulates use of a temp buffer in a deeply nested
loop.

Reviewed By: navahgar

Differential Revision: D29533364

fbshipit-source-id: c321f4cb05304cfb9146afe32edc4567b623412e
2021-07-16 08:42:24 -07:00
4c3d9cfe03 [BE] Fix flaky test_ddp_model_diff_across_ranks test (#61546)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61546

Closes https://github.com/pytorch/pytorch/issues/60661

Fixes this flaky test by using blocking wait instead of async error handling, and performs a gloo-based barrier with a higher timeout at the end of the test, which avoids issues with Barrier.sync. This also allows us to remove this test from the `skip_return_code_checks` list.
ghstack-source-id: 133657107

Test Plan: CI

Reviewed By: zhaojuanmao

Differential Revision: D29663884

fbshipit-source-id: 9f0df085b1968f6a7e2c7ae2f06b6dcd4838a87e
2021-07-16 08:37:02 -07:00
f1114364ad [DDP] Enhance comm hook docs (#61677)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61677

1) Specify return type more clearly, 2) misc fixes
ghstack-source-id: 133657895

Test Plan: CI

Reviewed By: zhaojuanmao

Differential Revision: D29701384

fbshipit-source-id: 7f77b99065bd2977153f397745e07b75bbdd7a94
2021-07-16 08:35:49 -07:00
39ce29efe0 Refactor metadata_map with flattened key/value pair (#61731)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61731

In the previous diff, metadata_map contains mobile_info.json and producer_info.json, and we need to parse the JSON each time we log the required information. This diff flattens the content of those files into key/value pairs, which allows the logger to loop directly through metadata_map and log the information.

Test Plan:
Since 3D Photo is disabled in the current FB app, testing is only performed on the CC scanner.

# Test On CC Scanner
**Test content with LOG(WARNING)**
{P429123273}

**Scuba Logger Output**

1. MOBILE_MODULE_LOAD_STATS

{F631884673}

2.  MOBILE_MODULE_STATS

{F631884787}

Reviewed By: xcheng16

Differential Revision: D29690702

fbshipit-source-id: 1db5a1f5c25e98e5b2f1cc254fd880dfdfa025e2
2021-07-16 00:37:17 -07:00
00a7f55b6e Apply for MOBILE_MODULE_STATS Logging (#61600)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61600

This diff changes the module.h constructor and removes metadata_. It refactors all constructor call sites and creates a getter & setter for metadata_. MOBILE_MODULE_STATS reads the metadata from mobile::Module and passes it into the logger.

Test Plan:
Since 3D Photo is disabled in the current FB app, testing is only performed on the CC scanner.

# Test On CC Scanner
**Test content with LOG(WARNING)**
{P428930572}

**Scuba Logger Output**

{F631761194}

Reviewed By: xcheng16

Differential Revision: D29673184

fbshipit-source-id: 962e0d7b06a07caaa0c695a4ac58b885fd1505ea
2021-07-16 00:37:15 -07:00
fc710eecc0 Apply for MOBILE_MODULE_LOAD_STATS Logging (#61480)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61480

Append mobile_info.json and producer_info.json to extra_files and parse the JSONs from “model_info.json” in onExitLoadModel.
ghstack-source-id: 133327912

Test Plan:
# Test On CC Scanner
**Test content with LOG(WARNING)**
{P428339274}

**Scuba Logger Output**
{F631024095}

# Test On 3D Photo
**Test content with LOG(WARNING)**
{P428340927}

**Scuba Logger Output**

{F631026739}

Reviewed By: xcheng16, guangy10

Differential Revision: D29608014

fbshipit-source-id: abc39c44b947632fd4349de8a432649e84284a87
2021-07-16 00:36:09 -07:00
56d562e790 [DDP] fix test_ddp_inference (#61666)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61666

Closes https://github.com/pytorch/pytorch/issues/61481. Fixes this
test by removing the section that uses only torch.no_grad() and doesn't call
model.eval(). For SyncBN, we need to call model.eval(); otherwise SyncBN
assumes it is in training mode, which performs collective calls in the forward pass
and does not work for inference.
ghstack-source-id: 133657549

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D29699444

fbshipit-source-id: 03ccb296dd9cb56729cd23e91c7f50b72fcf3adf
2021-07-16 00:25:02 -07:00
7e1f01d4c0 Alias for polygamma (#59691)
Summary:
See https://github.com/pytorch/pytorch/issues/50345

cc: mruberry kshitij12345

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59691

Reviewed By: gchanan

Differential Revision: D29707514

Pulled By: mruberry

fbshipit-source-id: 40c15e1fda3d9f7013977b0f36a77b228dda6aa5
2021-07-16 00:06:27 -07:00
f008e8d32d Remove test_out, test_variant_consistency_eager skips for addmv; fixed before (#61579)
Summary:
This PR:

1. Removes `test_out` skip: it's not needed anymore after it was fixed in https://github.com/pytorch/pytorch/pull/55746. This should also close https://github.com/pytorch/pytorch/issues/55589.
2. Removes `test_variant_consistency_eager` skip, it was added by mistake in https://github.com/pytorch/pytorch/issues/55771.
3. Refines the `sample_inputs_addmv` function; the updated function should now be cleaner and easier to read.

cc: mruberry

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61579

Reviewed By: gchanan

Differential Revision: D29709674

Pulled By: mruberry

fbshipit-source-id: 9b975c024777efdd33c6b9444b0b36e0eab85c03
2021-07-15 22:35:03 -07:00
843c42ffd8 [nnc] Refactored test macros and updated compress buffer tests to use them (#61716)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61716

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D29715754

Pulled By: navahgar

fbshipit-source-id: c400a58b7f393c0f93e5a25f118403124f8834b0
2021-07-15 21:17:14 -07:00
d01837081d [nnc] Cleaned up compress buffer tests to use BufHandle instead of Buf (#61715)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61715

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D29715755

Pulled By: navahgar

fbshipit-source-id: 453adac8f5b13263c39d96b6b4086425a01bae54
2021-07-15 21:15:23 -07:00
eb5a56fb74 Fix scribe logs (#61675)
Summary:
Related to https://github.com/pytorch/pytorch/issues/61632

This PR adds
- refactoring of scribe related code to scribe.py
- changed the `render_test_results` job to always use the `linux.2xlarge` runner
- if SCRIBE_GRAPHQL_ACCESS_TOKEN is empty, try boto3 instead

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61675

Reviewed By: seemethere

Differential Revision: D29703523

Pulled By: zhouzhuojie

fbshipit-source-id: 829ad3630d3500a498b41aa458ce6539aaeae938
2021-07-15 19:27:58 -07:00
127562a0ed Fix some sign comparisons (#61618)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61618

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D29688193

fbshipit-source-id: ea7a6b6be8b25d4a0668e744688f96bbbb144dc7
2021-07-15 18:28:41 -07:00
e6860ba508 Fix some sign comparisons and a loop (#61663)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61663

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D29696766

fbshipit-source-id: eb5a77bd0cfafeb6209d274f121f10dca20d461a
2021-07-15 18:27:42 -07:00
9d955abcdb Fix test_reductions when no SciPy is installed (#61699)
Summary:
Also, use skipIfNoSciPy decorator instead of implicit `unittest.skipIf`

This fixes regression introduced by https://github.com/pytorch/pytorch/pull/52565

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61699

Reviewed By: seemethere

Differential Revision: D29706938

Pulled By: malfet

fbshipit-source-id: 0b63c3ddadfa7f68bed994b71cadf68976d3b396
2021-07-15 15:57:11 -07:00
968a01a94a [special] migrate xlogy (#60641)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/50345

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60641

Reviewed By: gchanan

Differential Revision: D29709306

Pulled By: mruberry

fbshipit-source-id: e8a5f64009a895a25618637de40b55cf36b8f794
2021-07-15 15:32:09 -07:00
1ce3281a6d Revert D29361872: [pytorch][PR] det_backward: more robust and with complex support
Test Plan: revert-hammer

Differential Revision:
D29361872 (fce85480b9)

Original commit changeset: b1f0fec7e3ac

fbshipit-source-id: feffa74ad65b0b294e0a9b0ee72d245393421f70
2021-07-15 15:26:00 -07:00
3a0801f960 [skip ci] Fix "arugment" typos (#61459)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/61455.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61459

Reviewed By: soulitzer

Differential Revision: D29636559

Pulled By: samestep

fbshipit-source-id: 9ad65265c0491d9e81bb303abe3a07c6843bfa4a
2021-07-15 15:20:18 -07:00
e5fcc903d6 torch: Make __version__ better with comparisons (#61556)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61556

Prior to 1.10.0, `torch.__version__` was stored as a str, and many users
compared against `torch.__version__` as if it were a str. In order not to
break them, we have TorchVersion, which masquerades as a str while also
having the ability to compare against both packaging.version.Version and
tuples of values, e.g. (1, 2, 1)

Examples:
  Comparing a TorchVersion object to a Version object
```
TorchVersion('1.10.0a') > Version('1.10.0a')
```
  Comparing a TorchVersion object to a Tuple object
```
TorchVersion('1.10.0a') > (1, 2)    # 1.2
TorchVersion('1.10.0a') > (1, 2, 1) # 1.2.1
```

  Comparing a TorchVersion object against a string
```
TorchVersion('1.10.0a') > '1.2'
TorchVersion('1.10.0a') > '1.2.1'
```

Resolves https://github.com/pytorch/pytorch/issues/61540

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D29671234

Pulled By: seemethere

fbshipit-source-id: 6044805918723b4aca60bbec4b5aafc1189eaad7
2021-07-15 15:12:09 -07:00
0ea29a6ccb Analysing time taken by gradgrad checks for Spectral Functions (#60435)
Summary:
**Description:** `SpectralFuncInfo` defines a decorator mentioning: "gradgrad is quite slow". This PR re-analyzes that statement, since things have changed with gradient tests.

**Test times:** https://github.com/pytorch/pytorch/pull/60435#issuecomment-865658177

**Follow-up** of https://github.com/pytorch/pytorch/pull/57802

cc: mruberry

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60435

Reviewed By: gchanan

Differential Revision: D29707444

Pulled By: mruberry

fbshipit-source-id: 444b4863bac8556c7e8fcc8ff58d81a91bd96a21
2021-07-15 14:02:03 -07:00
4ff121f58d Add complex64 dtype for OpInfo Reference testing (#61627)
Summary:
This PR adds `complex64` dtype testing, following conversation from: pytorch/xla#3019 ([comment](https://github.com/pytorch/xla/pull/3019#discussion_r666754943)). Original PR that added OpInfo reference testing: https://github.com/pytorch/pytorch/pull/59369.

cc: mruberry kshitij12345

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61627

Reviewed By: gchanan

Differential Revision: D29710560

Pulled By: mruberry

fbshipit-source-id: 55b2e5ff47f031069335a0c75a45d4f4885ef9ac
2021-07-15 13:40:37 -07:00
e2c3049e2a Delete stable-sort-only-works-on-cpu warning (#61685)
Summary:
Stable GPU sorting is implemented by https://github.com/pytorch/pytorch/pull/56821
Fixes https://github.com/pytorch/pytorch/issues/61682

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61685

Reviewed By: gchanan

Differential Revision: D29704864

Pulled By: malfet

fbshipit-source-id: 3a5aa24bf6507be63844fe6016fb9e3c682f4d84
2021-07-15 13:34:41 -07:00
e098e9000b Compare DDP static graph (C++ core) with legacy DDP forward and backward delay. (#61507)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61507

Benchmark Python-only DDP vs the production C++-based DistributedDataParallel.
- Implemented a pure Python DDP, PythonDDP, with support for SYNC and ASYNC reduction
- Added compare_ddp to measure the difference in the forward and backward steps

Kudos to Shen and Yi for the great idea.

Test Plan:
Test on DevGPUS with 2 CUDA devices.

$python compare_ddp.py

Python-only DDP has slightly better (-1%) forward performance and slightly slower (2%-20%) backward performance.
This suggests that we need to keep the C++ core, since the maximum latency increase can be 20%. See README.md for details.
Imported from OSS

Differential Revision:
D29685364

Reviewed By: mrshenli

Pulled By: bowangbj

fbshipit-source-id: 429e4473fac0ec4c70d6db12d946d2636dd6477a
2021-07-15 12:52:22 -07:00
7a3b05ea6d Fix hardswish inplace op for strided tensor with skipped elements (#61622)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61622

The hardswish inplace op would return incorrect results for strided tensor inputs that skip elements, such as a slice. Create a contiguous tensor and copy elements back to return the correct answer.
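
A hedged Python sketch of the workaround described above (the actual fix lives in the C++ op):

```
import torch
import torch.nn.functional as F

def hardswish_(t: torch.Tensor) -> torch.Tensor:
    if not t.is_contiguous():
        # For strided inputs that skip elements (e.g. a slice), compute on
        # a contiguous copy and write the results back.
        tmp = F.hardswish(t.contiguous(), inplace=True)
        return t.copy_(tmp)
    return F.hardswish(t, inplace=True)

x = torch.randn(4, 4)
hardswish_(x[:, ::2])  # strided view that previously produced wrong values
```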

Test Plan: Internal CI tests

Reviewed By: kimishpatel

Differential Revision: D29689745

fbshipit-source-id: 11618a8d865f550f6b70637345f9ebc3e5676f11
2021-07-15 11:50:27 -07:00
fce85480b9 det_backward: more robust and with complex support (#58195)
Summary:
As per title.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58195

Reviewed By: albanD

Differential Revision: D29361872

Pulled By: anjali411

fbshipit-source-id: b1f0fec7e3ac52acd1481bcc878cc0c1d07c1852
2021-07-15 11:04:42 -07:00
bd360ebe6f [nnc] Added a new API to distribute loop and all its parents (#61293)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61293

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D29560008

Pulled By: navahgar

fbshipit-source-id: e4e459184f20b1872bc242ba8626d0a6df29e810
2021-07-15 10:28:20 -07:00
76f097466e [nnc] Added a new API to compress all buffers in a given statement (#61087)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61087

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D29506677

Pulled By: navahgar

fbshipit-source-id: 63583fd5a0e42c0096ddf08d5b96bc680ea8a44e
2021-07-15 10:28:18 -07:00
2908d3eb45 [nnc] Modified the semantics of reorder in using permutation (#61085)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61085

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D29506679

Pulled By: navahgar

fbshipit-source-id: f674aedff8175b9947404fd2164a0b4f57a71e93
2021-07-15 10:28:16 -07:00
7177509380 Revert [DDP] Support not all outputs used in loss calculation (#61497)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61497

Reverts [DDP] Support not all outputs used in loss calculation
ghstack-source-id: 133589153

Test Plan: CI, ping authors to run their workflow on this diff

Reviewed By: zhaojuanmao

Differential Revision: D29642892

fbshipit-source-id: 81a15b9ab3329602f34d3758bb0799005a053d4f
2021-07-15 10:28:14 -07:00
25f9c35dd7 Revert [DDP] Support for multiple backwards (#61401)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61401

Reverts https://github.com/pytorch/pytorch/pull/59359, which is causing a few internal issues in DDP training. We will evaluate the internal use cases and reland it after reconsidering the design.

Also moves `prepare_for_backward` back into the forward pass instead of DDPSink for `find_unused_parameters`. This ensures that hooks will always fire in the backwards pass, which is behavior that internal training workloads rely on. Calling `prepare_for_backward` in the DDPSink autograd function is not the best solution, since other autograd threads may have been executing, which can cause races.

ghstack-source-id: 133589152

Test Plan: CI

Reviewed By: pritamdamania87

Differential Revision: D29608948

fbshipit-source-id: f060f41cd103573ddff8da50cdbb6c56768dab46
2021-07-15 10:28:13 -07:00
38ac9e69aa Back out "[DDP] Disable reducer hooks from running outside of DDP backwards." (#61399)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61399

Reverts https://github.com/pytorch/pytorch/pull/60921
Original commit changeset: fef76a0dd295
ghstack-source-id: 133581300

Test Plan: CI

Differential Revision: D29594262

fbshipit-source-id: a308d3f10dbbb2169d9a7f60f2f28f139185ed1f
2021-07-15 10:27:02 -07:00
a50a389ca6 Revert D29701479: [pytorch][PR] Remove _broadcast_object() from ZeroRedundancyOptimizer
Test Plan: revert-hammer

Differential Revision:
D29701479 (9b5d9b4049)

Original commit changeset: c8d5f9057b32

fbshipit-source-id: 35ab1f399513fb9d1c4e73b1fa906e559d2a6994
2021-07-15 10:03:08 -07:00
aa01a7a61c Fix for get_buffer(): check buffers by name instead of value (#61429)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/61242

The previous code wrongly checked whether a tensor is a buffer in a module by comparing values; the fix compares names instead.
Docs need some updating as well; the current plan is to bump that to a separate PR, but I'm happy to do it here as well if preferred.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61429

Reviewed By: gchanan

Differential Revision: D29712341

Pulled By: jbschlosser

fbshipit-source-id: 41f29ab746505e60f13de42a9053a6770a3aac22
2021-07-15 09:55:09 -07:00
5407108533 CopyBackward: Remove redundant src_device and unnecessary copy=True (#60025)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60025

`to` already copies unconditionally if `src.device() != options.device()` so
specifying the copy argument is unnecessary.

`src.device()` is also completely equivalent to `src.options().device()` so
storing both is redundant.

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D29698627

Pulled By: albanD

fbshipit-source-id: eb091d39b71db688e6bcbb33a227c01b94b432bb
2021-07-15 09:48:03 -07:00
da667e2d5f Add .github for CODEOWNERS (#61598)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61598

I'd like to be notified of changes to the GitHub Actions workflows; add
this so I can be notified.

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: janeyx99, samestep

Differential Revision: D29685783

Pulled By: seemethere

fbshipit-source-id: 865a1360a24633ef5074e43b8277838a0eef94f6
2021-07-15 09:39:12 -07:00
8afb65b6c5 changed launch bounds for upsample_linear1d fwd, bwd from 1024 to 512 (#61307)
Summary:
Changed launch bounds for upsample_linear1d_out_frame and upsample_linear1d_backward_out_frame from 1024 to 512. This shows a performance improvement, as seen below. It does not completely eliminate lmem usage (usage goes from 40-48 bytes to 8-16 bytes); we're not sure why.

Timing data (using Nvidia Titan-V GPU):
![UpsampleLinear1dTimingData](https://user-images.githubusercontent.com/22803332/124677708-e20d6280-de75-11eb-8187-fb50ec89dc50.PNG)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61307

Reviewed By: heitorschueroff

Differential Revision: D29662137

Pulled By: ngimel

fbshipit-source-id: 9653672ee17f25b75a02f295f388a78327091431
2021-07-15 09:19:16 -07:00
ee5a97de11 Register Saved Tensors hooks (#60663)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60663

Test Plan: Imported from OSS

Reviewed By: soulitzer

Differential Revision: D29466223

fbshipit-source-id: 65dc3a935c18a0e6b93a37e24543c696e6ae0321
2021-07-15 08:09:55 -07:00
94965212e5 [static runtime] Use at::allclose to test NNC sigmoid (#61566)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61566

This change uses `at::allclose` to compare results from the sigmoid implementations (CPU vs. NNC) instead of `Tensor::equal`, since small numerical differences between them make exact comparison flaky.
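
For illustration, a tiny example of the difference (the perturbation stands in for kernel-level numerical noise):

```python
import torch

a = torch.sigmoid(torch.tensor([0.5, 1.0]))
b = a + 1e-7  # stand-in for a tiny CPU-vs-NNC numerical difference

print(torch.equal(a, b))     # False: requires exactly identical values
print(torch.allclose(a, b))  # True: compares within rtol/atol tolerances
```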

Test Plan:
I confirmed that the flakiness of `StaticRuntime.Sigmoid` is gone with this change:

```
[djang@devvm1999.ftw0 ~/fbsource/fbcode] buck-out/gen/caffe2/benchmarks/static_runtime/static_runtime_cpptest -v 3 --gtest_filter=StaticRuntime.Sigmoid --gtest_repeat=100 &> output.txt
[djang@devvm1999.ftw0 ~/fbsource/fbcode] grep PASSED output.txt  | wc
    100     500    2100
```

Reviewed By: bertmaher

Differential Revision: D29671203

fbshipit-source-id: 99a7b16d18ea047c9aad444f36d8368f9d0b088d
2021-07-14 19:48:00 -07:00
9b5d9b4049 Remove _broadcast_object() from ZeroRedundancyOptimizer (#61539)
Summary:
Revised version of https://github.com/pytorch/pytorch/issues/60573.

**Overview:**
This makes two changes:
- It introduces a `map_location` argument to `broadcast_object_list()`. The argument specifies the device to load tensors contained in objects received from the broadcast. This change requires modifying the implementation of `_object_to_tensor()` and `_tensor_to_object()` to use `torch.save()` and `torch.load()` respectively.
- It removes all calls to `_broadcast_object()` in `ZeroRedundancyOptimizer` and the corresponding test file in favor of `broadcast_object_list()`.

The default value of `map_location` is `None`, in which case `_object_to_tensor()` and hence `broadcast_object_list()` preserve their original behavior. Namely, contained tensors are loaded to their original device.

In `consolidate_state_dict()`, I specify `map_location=torch.device("cpu")` instead of `self._default_device`. This slightly changes the behavior from before when using `_broadcast_object()`. The reason I do so is that it saves one GPU to CPU data transfer since the action immediately after receiving the broadcasted `local_state_dict` is to copy it to CPU.

Explicitly, if `map_location=self._default_device`, then the data transfer path assuming NCCL backend is as follows:
`source GPU --[before serialize]--> source CPU --[before broadcast]--> source GPU --[broadcast]--> destination GPU --[before deserialize]--> destination CPU --[deserialize]--> destination GPU --[copy]--> destination CPU`
Hence, by setting `map_location=torch.device("cpu")` instead, the suffix becomes:
`destination CPU --[deserialize]--> destination CPU --[copy]--> destination CPU`
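
A usage sketch of the new keyword as described above (process-group initialization is assumed and omitted):

```python
import torch
import torch.distributed as dist

# Sketch only: assumes dist.init_process_group("nccl", ...) has already run.
objs = [{"w": torch.randn(2, device="cuda")}] if dist.get_rank() == 0 else [None]

# On non-source ranks, tensors contained in the received objects are
# loaded to CPU instead of their original source-side device:
dist.broadcast_object_list(objs, src=0, map_location=torch.device("cpu"))
```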

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61539

Test Plan:
I added a test `test_broadcast_object_list_map_location()` that checks, for `map_location` as both CPU and GPU, that (1) tensors contained in broadcasted objects are appropriately loaded onto the specified device and (2) the contents of the tensors are correct.

The existing `ZeroRedundancyOptimizer` tests pass.
```
gpurun4 python test/distributed/optim/test_zero_redundancy_optimizer.py
```

The existing `broadcast_object_list()` test passes:
```
touch /tmp/barrier && TEMP_DIR="/tmp" BACKEND="nccl" WORLD_SIZE="2" gpurun python test/distributed/test_distributed_fork.py -- TestDistBackendWithFork.test_broadcast_object_list
```

Reviewed By: zou3519

Differential Revision: D29701479

Pulled By: andwgu

fbshipit-source-id: c8d5f9057b32e5e9f40e8edc5b2cc25fb21414a9
2021-07-14 17:36:30 -07:00
e3d5619ff0 [pytorch][profiler] Fix division by 0 in computeFlops (#61676)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61676

Reviewed By: ilia-cher

Differential Revision: D29646067

fbshipit-source-id: d872221bbde5384a9e397e68c1e5b0664d913b42
2021-07-14 16:38:19 -07:00
70e94bb1dd Avoid redefining __BYTE_ORDER (#60346)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60346

Introduction:
In order to support the Intel SGX platform, we have to avoid redefining __BYTE_ORDER.
Solution:
Check if the platform is SGX and avoid the redefinition.

Test Plan: Run the PyTorch tests.

Reviewed By: h397wang, malfet

Differential Revision: D29022626

fbshipit-source-id: 801c3a75c202d192a3808eb5d54b875094499996
2021-07-14 14:55:04 -07:00
a9c3580080 Grammatical update of tech docs (#61547)
Summary:
Added some minor grammatical updates to the 'Complex Numbers' docs.

![Screenshot (180)](https://user-images.githubusercontent.com/75036632/125342884-0b952500-e373-11eb-9e63-410ff31e6c21.png)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61547

Reviewed By: zou3519

Differential Revision: D29677361

Pulled By: H-Huang

fbshipit-source-id: 78222310a755911192905a8f52aa0ae325900006
2021-07-14 14:01:59 -07:00
5a5c7f563d add trainer hook functions (#60785)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60785

This pr adds hook functions for the trainers.

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D29697299

Pulled By: gcramer23

fbshipit-source-id: cc3b991aad0d32503fbfc5acd4fca8b404e74c0f
2021-07-14 13:19:17 -07:00
304c02ee44 refactor ps benchmark (#60784)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60784

This pr refactors the ps benchmark for modular trainers.

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D29697291

Pulled By: gcramer23

fbshipit-source-id: 64579a1f5326d3cd9f32936dcf53bc243d54b71d
2021-07-14 13:19:13 -07:00
7d2ea9a8f7 Release GIL as much as possible in dist_autograd pybind. (#61593)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61593

Following the pattern in https://github.com/pytorch/pytorch/pull/61588
to avoid deadlocks as much as possible.
ghstack-source-id: 133497897

Test Plan: waitforbuildbot

Reviewed By: mrshenli

Differential Revision: D29683451

fbshipit-source-id: 1951622eb964f57a551a9c0d46ad0ab24b66c458
2021-07-14 13:19:10 -07:00
5ebc7c9f97 Avoid holding GIL while calling retrieveContext. (#61588)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61588

As part of debugging https://github.com/pytorch/pytorch/issues/60290,
we discovered the following deadlock:

```
Thread 79 (Thread 0x7f52ff7fe700 (LWP 205437)):
#0  pthread_cond_timedwait@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225
#1  0x0000564880199152 in PyCOND_TIMEDWAIT (cond=0x564880346080 <gil_cond>, mut=0x564880346100 <gil_mutex>, us=5000) at /home/builder/ktietz/cos6/ci_cos6/python_1622833237666/work/Python/condvar.h:103
#2  take_gil (tstate=0x7f5254005ef0) at /home/builder/ktietz/cos6/ci_cos6/python_1622833237666/work/Python/ceval_gil.h:224
#3  0x0000564880217b62 in PyEval_AcquireThread (tstate=0x7f5254005ef0) at /home/builder/ktietz/cos6/ci_cos6/python_1622833237666/work/Python/ceval.c:278
#4  0x00007f557d54aabd in pybind11::gil_scoped_acquire::gil_scoped_acquire() () from /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_python.so
#5  0x00007f557da7792f in (anonymous namespace)::concrete_decref_fn(c10::impl::PyInterpreter const*, _object*) () from /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_python.so
#6  0x00007f5560dadba6 in c10::TensorImpl::release_resources() () from /opt/conda/lib/python3.6/site-packages/torch/lib/libc10.so
#7  0x00007f5574c885bc in std::_Sp_counted_ptr_inplace<torch::distributed::autograd::DistAutogradContext, std::allocator<torch::distributed::autograd::DistAutogradContext>, (__gnu_cxx::_Lock_policy)2>::_M_dispose() () from /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so
#8  0x00007f5574c815e9 in std::__detail::_Hashtable_alloc<std::allocator<std::__detail::_Hash_node<std::pair<long const, std::shared_ptr<torch::distributed::autograd::DistAutogradContext> >, false> > >::_M_deallocate_node(std::__detail::_Hash_node<std::pair<long const, std::shared_ptr<torch::distributed::autograd::DistAutogradContext> >, false>*) [clone .isra.325] () from /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so
#9  0x00007f5574c81bf1 in torch::distributed::autograd::DistAutogradContainer::eraseContextIdAndReset(torch::distributed::autograd::DistAutogradContainer::ContextsShard&, long) () from /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so
#10 0x00007f5574c86e83 in torch::distributed::autograd::DistAutogradContainer::releaseContextIfPresent(long) () from /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so
#11 0x00007f5574cc6395 in torch::distributed::rpc::RequestCallbackNoPython::processCleanupAutogradContextReq(torch::distributed::rpc::RpcCommandBase&) const () from /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so
#12 0x00007f5574cccf15 in torch::distributed::rpc::RequestCallbackNoPython::processRpc(torch::distributed::rpc::RpcCommandBase&, torch::distributed::rpc::MessageType const&, std::vector<c10::Stream, std::allocator<c10::Stream> >) const () from /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so

Thread 72 (Thread 0x7f53077fe700 (LWP 205412)):
#0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x00007f55bc62adbd in __GI___pthread_mutex_lock (mutex=0x564884396440) at ../nptl/pthread_mutex_lock.c:80
#2  0x00007f5574c82a2f in torch::distributed::autograd::DistAutogradContainer::retrieveContext(long) () from /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so
#3  0x00007f557de9bb2f in pybind11::cpp_function::initialize<torch::distributed::autograd::(anonymous namespace)::dist_autograd_init(_object*, _object*)::{lambda(long)#11}, pybind11::dict, long, pybind11::name, pybind11::scope, pybind11::sibling, char [931], pybind11::arg>(torch::distributed::autograd::(anonymous namespace)::dist_autograd_init(_object*, _object*)::{lambda(long)#11}&&, pybind11::dict (*)(long), pybind11::name const&, pybind11::scope const&, pybind11::sibling const&, char const (&) [931], pybind11::arg const&)::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call) () from /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_python.so

```

Basically Thread 72, holds GIL and tries to acquire the lock for
DistAutogradContainer to perform a lookup on a map. On the other hand,
Thread 79 holds the lock on DistAutogradContainer to remove a Tensor and as
part of TensorImpl destructor, concrete_decref_fn is called which waits for
GIL. As a result, we have a deadlock.

To fix this issue, I've ensured we release GIL when we call `retrieveContext`
and acquire it later when needed.
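
The bug and the fix follow the classic lock-ordering pattern; a minimal Python sketch for illustration, modeling the GIL and the container mutex as plain locks:

```python
import threading

gil = threading.Lock()        # stands in for the Python GIL
container = threading.Lock()  # stands in for DistAutogradContainer's mutex

def retrieve_context():
    # Deadlock-prone order: hold `gil`, then take `container`, while the
    # destructor path takes `container` first and then waits on `gil`.
    # Fixed pattern: drop the GIL before taking the container lock and
    # reacquire it only when needed afterwards.
    gil.release()
    with container:
        result = "context"
    gil.acquire()
    return result

gil.acquire()
print(retrieve_context())
gil.release()
```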
ghstack-source-id: 133493659

Test Plan: waitforbuildbot

Reviewed By: mrshenli

Differential Revision: D29682624

fbshipit-source-id: f68a1fb39040ca0447a26e456a97bce64af6b79c
2021-07-14 13:17:16 -07:00
f2adbff36e [Metal] Do not use read/write textures in concat shaders (#61074)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61074

`read_write` textures are not available on some devices, such as iPhone 7. This prevents the concat op from functioning on those devices.

This diff rewrites the concat shaders such that they do not depend on `read_write` textures.

Test Plan:
Test on device: run squeezenet and/or the operator tests
```
arc focus2 pp-ios
```

Test on Mac
```
buck test pp-macos
```

Test specifically on iPhone7, either device or simulator.

Reviewed By: xta0

Differential Revision: D29501656

fbshipit-source-id: de4a059953ab4b0abf38b6ecb3f665323dcdbea1
2021-07-14 13:03:48 -07:00
80bdfd64c5 Skip Bfloat16 support when building for VSX (#61630)
Summary:
Copy-paste ifdef guard from vec256/vec256.h
Probably fixes https://github.com/pytorch/pytorch/issues/61575

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61630

Reviewed By: janeyx99

Differential Revision: D29690676

Pulled By: malfet

fbshipit-source-id: f6d91eadab74bcbcb1dc9854ae1b98a0dccacd14
2021-07-14 13:02:29 -07:00
43a2f7c26a [TensorExpr] Do not fuse float16 values. (#61569)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61569

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D29672564

Pulled By: ZolotukhinM

fbshipit-source-id: fe64ec38209d43f8246bcb6c397b64a28cbd86fa
2021-07-14 12:53:59 -07:00
ab27399566 Make broadcast_object_list accept a device parameter. (#61305)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61305

Part I (this PR): Add a dist_device argument to the broadcast_object_list API.
Part II: andwgu@ will deprecate _broadcast_object with the newly introduced API, and also include the changes to _object_to_tensor()/_tensor_to_object() from PR 60573.

Context: https://github.com/pytorch/pytorch/issues/60062

Test Plan:
Run the following on DevGpus with two cuda devices

$python setup.py develop    --- run this build on DevGPU
$BACKEND='nccl' WORLD_SIZE=2 with-proxy  python test/distributed/test_distributed_fork.py  TestDistBackendWithFork.test_broadcast_object_list --v
$BACKEND='gloo' WORLD_SIZE=2 with-proxy  python test/distributed/test_distributed_fork.py  TestDistBackendWithFork.test_broadcast_object_list --v

Build with distributed on: USE_DISTRIBUTED=1 python setup.py develop
Test on CPU devvm:

$ with-proxy python test/distributed/optim/test_zero_redundancy_optimizer.py

Imported from OSS

Differential Revision:
D29566538
D29566538

Reviewed By: iramazanli, mrshenli

Pulled By: bowangbj

fbshipit-source-id: 0bea52442551c5194acba85eadda16ba2ec4b6ef
2021-07-14 11:43:17 -07:00
9b3cbeaf7d [pruner] fix activation handles logic (#61592)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61592

Add activation handles for each layer (stored in a list), so they can each be removed.

We don't remove them in the `convert` in eager mode because we aren't modifying output/input layer dimensions. We will need this in Fx mode though.
ghstack-source-id: 133497376

Test Plan:
Added some tests to make sure `model(x)` runs without error.

`buck test mode/dev-nosan //caffe2/test:ao --
TestBasePruner`

https://pxl.cl/1LBf4

Reviewed By: z-a-f

Differential Revision: D29682789

fbshipit-source-id: 9185702736e5f7f4320754ffef441610738ac154
2021-07-14 11:07:23 -07:00
343cb276b0 [pytorch] Add broadcasting support to add_relu kernel (#61584)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61584

add_relu does not work with broadcasting. This registers a scalar version of add_relu in native_functions that casts the scalar to a tensor before calling the regular function. TensorIterator then handles broadcasting, analogously to the existing add.
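
For reference, the broadcasting semantics being matched, shown via the equivalent unfused computation (the fused kernel itself is internal):

```python
import torch

x = torch.randn(4, 3)

out_scalar = torch.relu(x + 1.0)            # scalar other now works
out_bcast = torch.relu(x + torch.randn(3))  # (3,) broadcasts against (4, 3)
```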
ghstack-source-id: 133480068

Test Plan: python3 test/test_nn.py TestAddRelu

Reviewed By: kimishpatel

Differential Revision: D29641768

fbshipit-source-id: 1b0ecfdb7eaf44afed83c9e9e74160493c048cbc
2021-07-14 10:32:20 -07:00
c23db9327a Smart Decay for Adam - Caffe2 (#61548)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61548

We want to decay learning parameters properly. Previously this was not done when a parameter was absent from a minibatch. We fix this by keeping track of missed minibatches and making the decay catch up accordingly (see the sketch after the list below).

The exponential moving averages (EMA) for the first and second moments used in Adam are updated only for parameters seen in a minibatch. For the parameters that are missed, 0 should still be added to the EMAs, and the EMAs should then be decayed by multiplying by beta1 and beta2 respectively.

To avoid the computational overhead of touching every parameter for every minibatch, we:
* keep track of the last time a parameter is seen
* instead of decaying the EMAs by multiplying by beta1 and beta2, we multiply by beta1^k and beta2^k, where k is the number of minibatches since the parameter was last seen
* we calculate the amount of momentum that would have been discharged over the missed minibatches and update the weight accordingly.
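
A minimal sketch of the catch-up decay for a single EMA value (names are illustrative; the actual Caffe2 operator is not shown):

```python
def catch_up(ema, beta, last_seen_step, current_step):
    # Each missed minibatch contributes a 0 gradient, so the correct update
    # per missed step is ema = beta * ema + (1 - beta) * 0, and k missed
    # steps collapse into a single multiply by beta**k.
    k = current_step - last_seen_step
    return ema * beta ** k

# Parameter absent for 3 minibatches: one multiply replaces three updates.
assert abs(catch_up(1.0, 0.9, last_seen_step=7, current_step=10) - 0.9 ** 3) < 1e-12
```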

Differential Revision: D29654246

fbshipit-source-id: 7a6cd7966eb1f31116d99dfce79a78b2d3ee9e3e
2021-07-14 10:22:38 -07:00
58adaaba60 Enable C2 load rate limiter [2/n] (#61551)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61551

We aim to enable the rate limiter in C2 load, with a fixed bandwidth limit.
This diff updates LoadOp to pass down the Manifold DB options.

Test Plan:
```
buck test mode/opt caffe2/caffe2/python/operator_test:load_save_test
```

Differential Revision: D29639102

fbshipit-source-id: cf69549adadf4c7f12a8a2b7f3ca39092cab4b99
2021-07-14 08:27:05 -07:00
57feb35474 Refactor non-joined process computation (#61555)
Summary:
**Overview:**
This refactors the computation on non-joined processes relating to the join context manager. The concept was inspired by a comment from pritamdamania.

**Changes:**
This introduces a `_Joinable` abstract base class, which requires a `_join_hook()` method and `_join_device()` and `_join_process_group()` property methods. Any class that we want to be compatible with the generic join context manager should inherit from `_Joinable` and implement `_join_hook()`, `_join_device()`, and `_join_process_group()`. (The `device` and `process_group` information has been moved from `_JoinHook` to `_Joinable`.)

The generic join context manager now takes in a `List[_Joinable]` instead of `List[_JoinHook]`. The motivation for this is that previously, by passing the `_JoinHook`s into the context manager, the class providing a `_JoinHook` can modify the context manager's behavior, but the context manager cannot modify the class's behavior. This is solved by giving the context manager a reference to the class's instance.

This implementation reserves the field `_join_config` in every `_Joinable` to store a `_JoinConfig` instance, which holds all dynamic fields needed from the `_Joinable` for the join context manager: `enable`, `throw_on_early_termination`, and `is_first_joinable`. ("dynamic" here means that for a given `_Joinable` instance, the values for those fields may change across different join context usages.) In particular, these fields are needed to implement a method `notify_join_context()`, which encapsulates the computation performed on non-joined processes relating to the join context manager --- (1) the all-reduce to indicate that the process has not yet joined and (2) the all-reduce to check whether to throw an exception if `throw_on_uneven_inputs=True`. The idea is that every `_Joinable` class only needs to make a call to `notify_join_context()` before its per-iteration collective communications; it is a simple one-line addition.

Only the first `_Joinable` instance passed into the context manager actually performs the collective communications in `notify_join_context()`. In that case, the method returns an async work handle for the initial all-reduce indicating that the process not yet joined. Otherwise, the method returns `None`. This conditional logic is handled internally without additional input from the user.

**New API:**
Now, the example usage would look like:
```
ddp_model = DistributedDataParallel(...)
zero_optim = ZeroRedundancyOptimizer(ddp_model.parameters(), ...)
with _Join([ddp_model, zero_optim]):
    ...
```
Any arguments meant for a join hook (e.g. `divide_by_initial_world_size`) must be specified as keyword arguments. For example:
```
with _Join([ddp_model, zero_optim], divide_by_initial_world_size=False):
    ...
```
They will be forwarded to every `_join_hook()` function via `**kwargs`. This creates a clear separation between the variables needed by the context manager (`enable` and `throw_on_early_termination`) and those needed by the `_Joinable` class (e.g. `divide_by_initial_world_size`).

**Recap:**
After this change, the relevant information to use the generic join context manager looks like the following (omitting prefix `_` from names):
- Suppose we have a class `C` (e.g. `DistributedDataParallel`) that we want to be able to use the `Join` context.
- We make `C` inherit from `Joinable` and implement `join_hook() -> JoinHook`, `join_device()`, and `join_process_group()`.
- To implement `join_hook()`, we define a `CJoinHook` class inheriting from `JoinHook` and implement `main_hook()` and `post_hook()` as needed.
- We locate a place before `C`'s per-iteration collective communications and add a call to `Join.notify_join_context()`.
- We call `Joinable.__init__(self)` in `C`'s constructor.
- The `C.join_config` field will be used internally by the context manager. This does not affect `C`'s serializability.
- Run time arguments for `C`'s join hook can be passed in as keyword arguments to the context manager: `with Join([C()], arg1=..., arg2=...):`.
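
A minimal, self-contained sketch of this protocol follows. The stand-in base classes mirror the interface described above; real code would subclass torch's private `_JoinHook`/`_Joinable`, whose import path is not shown in this summary and is left out here.

```python
class JoinHook:  # stand-in for torch's _JoinHook
    def main_hook(self):
        """Shadow one training iteration's collectives on a joined process."""

    def post_hook(self, is_last_joiner: bool):
        """Run once on every process after all processes have joined."""


class Joinable:  # stand-in for torch's _Joinable
    def _join_hook(self, **kwargs) -> JoinHook:
        raise NotImplementedError


class AllReducer(Joinable):
    """Toy class made compatible with the generic join context manager."""

    def _join_hook(self, **kwargs) -> JoinHook:
        # kwargs carries hook-specific options forwarded by the context
        # manager (e.g. divide_by_initial_world_size for DDP).
        return JoinHook()

    @property
    def _join_device(self):
        return "cpu"  # device for the context manager's collective comms

    @property
    def _join_process_group(self):
        return None  # real code returns its ProcessGroup
```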

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61555

Test Plan:
I ran the existing DDP join tests:
```
touch /tmp/barrier && TEMP_DIR="/tmp" BACKEND="nccl" WORLD_SIZE="2" gpurun python test/distributed/test_distributed_fork.py -- TestDistBackendWithFork.test_ddp_uneven_inputs TestDistBackendWithFork.test_ddp_uneven_inputs_stop_iteration_sync_bn TestDistBackendWithFork.test_ddp_grad_div_uneven_inputs TestDistBackendWithFork.test_ddp_uneven_input_join_disable TestDistBackendWithFork.test_ddp_uneven_input_exception
```
I ran the ZeRO join tests:
```
gpurun4 python test/distributed/optim/test_zero_redundancy_optimizer.py TestZeroRedundancyOptimizerDistributed.test_zero_join_gpu TestZeroRedundancyOptimizerDistributed.test_zero_join_cpu
```

Reviewed By: zou3519

Differential Revision: D29690359

Pulled By: andwgu

fbshipit-source-id: 2950f78de755eb5fb13b95b803dd7c705879a9c7
2021-07-14 08:20:40 -07:00
03a79f43e3 adding support for index_select on quantized tensors (#61406)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61406

Only a few select functions really needed fixing so that they work for quantized tensors; primarily, creation and resizing of tensors required a branch for quantized tensors. This does not work for per-channel quantized tensors.
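
A usage sketch of the newly supported path (per-tensor quantization only, per the note above):

```python
import torch

x = torch.randn(4, 3)
qx = torch.quantize_per_tensor(x, scale=0.1, zero_point=0, dtype=torch.quint8)

# index_select along dim 0 of a per-tensor-quantized tensor:
out = torch.index_select(qx, 0, torch.tensor([0, 2]))
```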

Test Plan:
```python test/test_quantization.py TestQuantizedTensor.test_qtensor_index_select_cuda```

```python test/test_quantization.py TestQuantizedTensor.test_qtensor_index_select_cpu```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D29654446

fbshipit-source-id: 8fde9b2dd2c3e380cc330bbad71d6c4d2aeec0ab
2021-07-14 05:38:00 -07:00
a07b08136f [Static Runtime] Check unsupported up when enabling static runtime (#61613)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61613

Reviewed By: ajyu, movefast1990

Differential Revision: D29663466

fbshipit-source-id: d819903b7227f534c0a4fffa5eeea2b5c0c04750
2021-07-14 02:13:51 -07:00
ac64a41e8a [FX][docs] Add note about python set pitfall (#61597)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61597

Test Plan: Imported from OSS

Reviewed By: Chillee

Differential Revision: D29685735

Pulled By: jamesr66a

fbshipit-source-id: b5c5b53ff94fac1022f69b7c0ad4e4055b116029
2021-07-13 20:09:13 -07:00
9ade039593 fix test file not found issue (#61610)
Summary:
It should not error out if the file is not found.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61610

Reviewed By: samestep

Differential Revision: D29687958

Pulled By: walterddr

fbshipit-source-id: 17cacba8daa131df9bfb37fd58d6e4870ff75198
2021-07-13 17:50:50 -07:00
2ab8126e36 Add NewLib support (#60345)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60345

Add support for NewLib, an embedded libc variant, by re-using the existing Android library stubs plus a few NewLib-specific guards

Problem:
Newlib is a C standard library intended for embedded use, similar to how Android uses bionic. This causes some incompatibility with math functions that are present in glibc but not in Newlib (and some versions of bionic), and makes porting PyTorch to environments such as SGX hard.

Solution:
Subscribe Newlib to the same fixes present for older versions of Android, and add fixes specific to Newlib

Test Plan: Run the PyTorch tests.

Reviewed By: malfet

Differential Revision: D29022623

fbshipit-source-id: 028dd7ff9b3ee394371c275642c90c9ef108e639
2021-07-13 17:26:45 -07:00
8e6d8991b2 [torch/elastic] Fix the agent store key prefix used by workers (#61590)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61590

This PR fixes a bug where the state of the first run of a failed training job leaks into secondary runs due to a constant worker key prefix.
ghstack-source-id: 133494239

Test Plan: Run the existing integ tests.

Reviewed By: SciPioneer

Differential Revision: D29682743

fbshipit-source-id: d96ecadcfe5b6563225ee19f5d0776c7f935393a
2021-07-13 14:57:27 -07:00
523d6fe27c [PyTorch] Remove unnecessary std::string in Device.cpp (#61502)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61502

No reason not to use string literals here.
ghstack-source-id: 133449808

Test Plan: buildsizebot

Reviewed By: dhruvbird

Differential Revision: D29648079

fbshipit-source-id: 74ecf12283c2f196b4b3edb75c6bb1eeed51322e
2021-07-13 14:36:13 -07:00
72394aaf68 Bump addressable from 2.7.0 to 2.8.0 in /ios/TestApp (#61573)
Summary:
Bumps [addressable](https://github.com/sporkmonger/addressable) from 2.7.0 to 2.8.0.
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a href="https://github.com/sporkmonger/addressable/blob/main/CHANGELOG.md">addressable's changelog</a>.</em></p>
<blockquote>
<h1>Addressable 2.8.0</h1>
<ul>
<li>fixes ReDoS vulnerability in Addressable::Template#match</li>
<li>no longer replaces <code>+</code> with spaces in queries for non-http(s) schemes</li>
<li>fixed encoding ipv6 literals</li>
<li>the <code>:compacted</code> flag for <code>normalized_query</code> now dedupes parameters</li>
<li>fix broken <code>escape_component</code> alias</li>
<li>dropping support for Ruby 2.0 and 2.1</li>
<li>adding Ruby 3.0 compatibility for development tasks</li>
<li>drop support for <code>rack-mount</code> and remove Addressable::Template#generate</li>
<li>performance improvements</li>
<li>switch CI/CD to GitHub Actions</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a href="6469a232c0"><code>6469a23</code></a> Updating gemspec again</li>
<li><a href="24336385de"><code>2433638</code></a> Merge branch 'main' of github.com:sporkmonger/addressable into main</li>
<li><a href="e9c76b8897"><code>e9c76b8</code></a> Merge pull request <a href="https://github-redirect.dependabot.com/sporkmonger/addressable/issues/378">https://github.com/pytorch/pytorch/issues/378</a> from ashmaroli/flat-map</li>
<li><a href="56c5cf7ece"><code>56c5cf7</code></a> Update the gemspec</li>
<li><a href="c1fed1ca0a"><code>c1fed1c</code></a> Require a non-vulnerable rake</li>
<li><a href="0d8a3127e3"><code>0d8a312</code></a> Adding note about ReDoS vulnerability</li>
<li><a href="89c76130ce"><code>89c7613</code></a> Merge branch 'template-regexp' into main</li>
<li><a href="cf8884f815"><code>cf8884f</code></a> Note about alias fix</li>
<li><a href="bb03f7112e"><code>bb03f71</code></a> Merge pull request <a href="https://github-redirect.dependabot.com/sporkmonger/addressable/issues/371">https://github.com/pytorch/pytorch/issues/371</a> from charleystran/add_missing_encode_component_doc_entry</li>
<li><a href="6d1d8094a6"><code>6d1d809</code></a> Adding note about :compacted normalization</li>
<li>Additional commits viewable in <a href="https://github.com/sporkmonger/addressable/compare/addressable-2.7.0...addressable-2.8.0">compare view</a></li>
</ul>
</details>
<br />

[![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=addressable&package-manager=bundler&previous-version=2.7.0&new-version=2.8.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

 ---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `dependabot rebase` will rebase this PR
- `dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `dependabot merge` will merge this PR after your CI passes on it
- `dependabot squash and merge` will squash and merge this PR after your CI passes on it
- `dependabot cancel merge` will cancel a previously requested merge and block automerging
- `dependabot reopen` will reopen this PR if it is closed
- `dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- `dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- `dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- `dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
- `dependabot use these labels` will set the current labels as the default for future PRs for this repo and language
- `dependabot use these reviewers` will set the current reviewers as the default for future PRs for this repo and language
- `dependabot use these assignees` will set the current assignees as the default for future PRs for this repo and language
- `dependabot use this milestone` will set the current milestone as the default for future PRs for this repo and language

You can disable automated security fix PRs for this repo from the [Security Alerts page](https://github.com/pytorch/pytorch/network/alerts).

</details>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61573

Reviewed By: xta0

Differential Revision: D29685329

Pulled By: seemethere

fbshipit-source-id: a43008155144a358950dc3ed1934fcc470b73c02
2021-07-13 14:30:33 -07:00
0751a41ab1 [quant] Input-Weight Equalization - ConvReLU support (#61350)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61350

Applied changes in convert to allow for ConvReLU2d layers

Initial Model: `x -> conv1 -> relu`

After fusion: `x -> convRelu2d`

After prepare: `x -> input_quant_obs -> input_eq_obs1 -> convRelu2d -> output_quant_obs1`

After equalization functions: `x -> mul -> input_quant_obs (scaled) -> convRelu2d -> output_quant_obs`

After convert: `x -> mul -> quantize_per_tensor -> quantized::convRelu2d -> dequantize`

Test Plan:
`python test/test_quantization.py TestEqualizeFx`

Initial Model:
```
ConvReluModel(
  (fc): Conv2d(3, 5, kernel_size=(3, 3), stride=(1, 1))
  (relu): ReLU()
)
```

After prepare:
```
GraphModule(
  (x_activation_post_process_0): MinMaxObserver(min_val=5.960464477539063e-08, max_val=0.9999999403953552)
  (x_activation_post_process_0_equalization_process_0): _InputEqualizationObserver(
    (input_obs): PerChannelMinMaxObserver(min_val=tensor([1.1921e-07, 3.3379e-06, 5.9605e-08]), max_val=tensor([1.0000, 1.0000, 1.0000]))
  )
  (fc): ConvReLU2d(
    (0): Conv2d(3, 5, kernel_size=(3, 3), stride=(1, 1))
    (1): ReLU()
  )
  (fc_activation_post_process_0): MinMaxObserver(min_val=0.0, max_val=1.2341605424880981)
)

graph():
    %x : [#users=1] = placeholder[target=x]
    %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_0](args = (%x,), kwargs = {})
    %x_activation_post_process_0_equalization_process_0 : [#users=1] = call_module[target=x_activation_post_process_0_equalization_process_0](args = (%x_activation_post_process_0,), kwargs = {})
    %fc : [#users=1] = call_module[target=fc](args = (%x_activation_post_process_0_equalization_process_0,), kwargs = {})
    %fc_activation_post_process_0 : [#users=1] = call_module[target=fc_activation_post_process_0](args = (%fc,), kwargs = {})
    return fc_activation_post_process_0
```

After equalization functions:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_equalization_scale0 : [#users=1] = get_attr[target=x_equalization_scale0]
    %mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_equalization_scale0), kwargs = {})
    %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_0](args = (%mul,), kwargs = {})
    %fc : [#users=1] = call_module[target=fc](args = (%x_activation_post_process_0,), kwargs = {})
    %fc_activation_post_process_0 : [#users=1] = call_module[target=fc_activation_post_process_0](args = (%fc,), kwargs = {})
    return fc_activation_post_process_0
```

After convert:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_equalization_scale0 : [#users=1] = get_attr[target=x_equalization_scale0]
    %mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_equalization_scale0), kwargs = {})
    %fc_input_scale_0 : [#users=1] = get_attr[target=fc_input_scale_0]
    %fc_input_zero_point_0 : [#users=1] = get_attr[target=fc_input_zero_point_0]
    %quantize_per_tensor : [#users=1] = call_function[target=torch.quantize_per_tensor](args = (%mul, %fc_input_scale_0, %fc_input_zero_point_0, torch.quint8), kwargs = {})
    %fc : [#users=1] = call_module[target=fc](args = (%quantize_per_tensor,), kwargs = {})
    %dequantize : [#users=1] = call_method[target=dequantize](args = (%fc,), kwargs = {})
    return dequantize
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D29638275

fbshipit-source-id: 40d4666a4451e132612ea38fdfeaaec177a1defb
2021-07-13 14:00:40 -07:00
b3e4dab45a [quant] Input-Weight Equalization - Conv convert support (#61287)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61287

Modifications to functions during convert() to support equalization. Note that this implementation does not work for connected F.conv2d layers yet.

Initial:
```
      w
      |
x -> conv -> y
```

After prepare:
```
                                         w
                                         |
                                  weight_quant_obs
                                         |
                                    weight_eq_obs
                                         |
x -> input_quant_obs -> input_eq_obs -> conv -> out_quant_obs -> y
```

After convert:
```
                scale, zero_point             w (scaled)
                       |                           |
x -> mul -> quantize_per_tensor (scaled) -> quantized::conv -> dequant -> y
      |
   eq_scale
```

Test Plan:
`python test/test_quantization.py TestEqualizeFx`

Initial model:
```
ConvModel(
  (conv): Conv2d(3, 5, kernel_size=(3, 3), stride=(1, 1), bias=False)
)
```

After prepare:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_0](args = (%x,), kwargs = {})
    %x_activation_post_process_0_equalization_process_0 : [#users=1] = call_module[target=x_activation_post_process_0_equalization_process_0](args = (%x_activation_post_process_0,), kwargs = {})
    %conv : [#users=1] = call_module[target=conv](args = (%x_activation_post_process_0_equalization_process_0,), kwargs = {})
    %conv_activation_post_process_0 : [#users=1] = call_module[target=conv_activation_post_process_0](args = (%conv,), kwargs = {})
    return conv_activation_post_process_0
```

After equalization functions:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_equalization_scale0 : [#users=1] = get_attr[target=x_equalization_scale0]
    %mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_equalization_scale0), kwargs = {})
    %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_0](args = (%mul,), kwargs = {})
    %conv : [#users=1] = call_module[target=conv](args = (%x_activation_post_process_0,), kwargs = {})
    %conv_activation_post_process_0 : [#users=1] = call_module[target=conv_activation_post_process_0](args = (%conv,), kwargs = {})
    return conv_activation_post_process_0
```

After convert:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_equalization_scale0 : [#users=1] = get_attr[target=x_equalization_scale0]
    %mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_equalization_scale0), kwargs = {})
    %conv_input_scale_0 : [#users=1] = get_attr[target=conv_input_scale_0]
    %conv_input_zero_point_0 : [#users=1] = get_attr[target=conv_input_zero_point_0]
    %quantize_per_tensor : [#users=1] = call_function[target=torch.quantize_per_tensor](args = (%mul, %conv_input_scale_0, %conv_input_zero_point_0, torch.quint8), kwargs = {})
    %conv : [#users=1] = call_module[target=conv](args = (%quantize_per_tensor,), kwargs = {})
    %dequantize : [#users=1] = call_method[target=dequantize](args = (%conv,), kwargs = {})
    return dequantize
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D29557055

fbshipit-source-id: dc9f44182e31fa362c43ad2dfe224e6f4e4a730e
2021-07-13 14:00:38 -07:00
77d36b657a [quant] Input-Weight Equalization - Conv prepare support (#61286)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61286

Modifies the prepare step to support conv layers during input-weight equalization and adds tests to make sure that the results are as expected.

Initial:
```
      w
      |
x -> conv -> y
```

After prepare:

```
                                         w
                                         |
                                  weight_quant_obs
                                         |
                                    weight_eq_obs
                                         |
x -> input_quant_obs -> input_eq_obs -> conv -> out_quant_obs -> y
```

Test Plan:
`python test/test_quantization.py TestEqualizeFx.test_input_weight_equalization_prepare`

Initial:
```
ConvModel(
  (conv): Conv2d(3, 5, kernel_size=(3, 3), stride=(1, 1), bias=False)
)
```

After prepare:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_0](args = (%x,), kwargs = {})
    %x_activation_post_process_0_equalization_process_0 : [#users=1] = call_module[target=x_activation_post_process_0_equalization_process_0](args = (%x_activation_post_process_0,), kwargs = {})
    %conv : [#users=1] = call_module[target=conv](args = (%x_activation_post_process_0_equalization_process_0,), kwargs = {})
    %conv_activation_post_process_0 : [#users=1] = call_module[target=conv_activation_post_process_0](args = (%conv,), kwargs = {})
    return conv_activation_post_process_0
```

Imported from OSS

Reviewed By: supriyar

Differential Revision: D29557051

fbshipit-source-id: 25d1531645dfaf565f5c615e2ee850fcf96c7eb9
2021-07-13 14:00:36 -07:00
ce9cedd119 [quant] Input-Weight Equalization - Conv observer support (#61285)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61285

Modifies observers to support conv layers and tests to make sure that the observers are returning the expected values for conv inputs.

Test Plan:
`python test/test_quantization.py TestEqualizeFx.test_input_weight_eq_observer`

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D29557041

fbshipit-source-id: 5e43329f189ba352eb8b991f38bf37752eebb6e6
2021-07-13 13:59:23 -07:00
30e48bbeae Add neg bit (#56058)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56058

User facing changes:
1. Adds a negative bit and corresponding new API (`is_neg()`,`resolve_neg()`)
2. `tensor.conj().imag` now returns a floating point tensor with neg bit set to 1 instead of a tensor with no notion of negative bit. Note that imag is still a view and all the view properties still hold for imag.

Non user facing changes:
1. Added a new Negative dispatch key and a backend fallback to handle it
2. Updated copy kernel to handle negative bit
3. Merged conjugate and negative bit fallback kernel
4. fixed https://github.com/pytorch/pytorch/issues/60478 (caused due to https://github.com/pytorch/pytorch/pull/54987)

Testing:
1. Added a new OpInfo based test `test_neg_view` (verifies that out-of-place and in-place operations work correctly for all operations when the input is a neg view tensor by checking the result against an actually negated tensor, verifies that autograd returns the same output for both neg view and actually negated tensors as well as it works fine when grad_out is a neg view).
2. Added a new test class containing `test_conj_view`, `test_neg_view`.

Test Plan: Imported from OSS

Reviewed By: soulitzer

Differential Revision: D29636403

fbshipit-source-id: 12214c9dc4806c51850f4a72a109db9527c0ca63
2021-07-13 13:50:42 -07:00
60382de455 [torch] Set nproc_per_node to 1 (#61552)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61552

Set `nproc_per_node` to 1

Test Plan: unittests

Reviewed By: cbalioglu

Differential Revision: D29667056

fbshipit-source-id: 6601f66fec5e018c7737d909f8c71642451abb29
2021-07-13 13:35:25 -07:00
437e7d9fc9 codegen_backend_module() now passes correct type designators to isinstance in the generated script
Summary: For methods returning complex (i.e. container) types, the existing code attempted to pass type designators with unsupported syntax (e.g. `Tensor[]`) into `isinstance`. It now uses the correct syntax supported by TorchScript (i.e. `List[Tensor]`).

Test Plan:
Unfortunately, a backend supporting methods returning container types has not yet been identified so the functionality cannot be tested end-to-end.

Adding a printout of `method_ct.format(method_te)` before https://fburl.com/code/4619d12g lets one inspect the difference in the generated method body, e.g.:

```
assert isinstance(_0, List[Tensor])
```
vs
```
assert isinstance(_0, Tensor[])
```

Reviewed By: allwu

Differential Revision: D29537358

fbshipit-source-id: 3356f3c1477aa9304e1f070711f480441579414d
2021-07-13 12:18:17 -07:00
b42cc19c88 Fix broken assertion error test in NNAPI convertor (#61586)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61586

Error message was changed

Test Plan:
pytest test/test_nnapi.py:

Imported from OSS

Reviewed By: iseeyuan

Differential Revision: D29682319

fbshipit-source-id: 52a96d79633ee9aae1de2056c7583311edc92353
2021-07-13 11:46:32 -07:00
2ade4d2a92 .github: Ensure clean workspaces before checkout (#61565)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61565

I was noticing the checkout step failing a lot for me; this adds a
cleaning step to fully remove the GitHub workspace before attempting
the checkout

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: zhouzhuojie

Differential Revision: D29671074

Pulled By: seemethere

fbshipit-source-id: 43a8f9a9a272c6bdbfffa9c6263443aac37f4b89
2021-07-13 11:13:48 -07:00
d5204064dc [BE] Fix flaky ProcessGroupGloo tests (#61396)
Summary:
A hypothesis as to why tests such as https://github.com/pytorch/pytorch/issues/57469 may be flaky is that `c10d = ProcessGroupGloo(...)` is not actually guaranteed to be a synchronization point, so some ranks may create the PG, run all the error checking (which does not actually call into gloo APIs and so doesn't require synchronization), and then exit, all before other ranks have created the gloo PG.

This can result in the following error:
```
File "distributed/test_c10d_gloo.py", line 1037, in test_reduce_checks
May 03 06:42:34     pg = c10d.ProcessGroupGloo(store, self.rank, self.world_size, self.opts())
May 03 06:42:34 RuntimeError: [/var/lib/jenkins/workspace/third_party/gloo/gloo/transport/tcp/pair.cc:598] Connection closed by peer [127.0.0.1]:35521
```

which indicates that the remote end has hung up. Furthermore all the flaky tests in this file only do error checking and don't call into the gloo APIs, further indicating that this issue may be the root cause. Not 100% sure this PR will fix it because I haven't been able to actually repro the issue even after 10000+ runs, but it happens regularly in CI.

To fix this, we add a `dist.barrier(group=pg)` call after creating the pg to enforce synchronization (sketched below). It would be good to land this and observe whether it helps with the flakiness.
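
Sketched as a function, with `store`, `rank`, `world_size`, and `opts` coming from the test harness shown above:

```python
import torch.distributed as dist

def make_gloo_pg(store, rank, world_size, opts):
    # Constructing the PG is not guaranteed to synchronize ranks, so fast
    # ranks can finish error-checking-only tests and hang up before their
    # peers have connected. An explicit barrier forces the rendezvous.
    pg = dist.ProcessGroupGloo(store, rank, world_size, opts)
    dist.barrier(group=pg)
    return pg
```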

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61396

Reviewed By: mrshenli

Differential Revision: D29664189

Pulled By: rohan-varma

fbshipit-source-id: bc046d5d816fe6cb426522b85312383bfa3f90b7
2021-07-13 10:34:59 -07:00
3e5d2b539d Replace deprecated comment with C10_DEPRECATED in linalg.h (#60374)
Summary:
Replace // DEPRECATED comment with C10_DEPRECATED.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60374

Reviewed By: H-Huang

Differential Revision: D29661630

Pulled By: heitorschueroff

fbshipit-source-id: fc086276fd7d3ddfb8d17c67ade456377ef0e990
2021-07-13 08:21:22 -07:00
9679fa7f30 Update cpp_extension.py (#61484)
Summary:
By default, the majority of Python 3.[6-9] installations come with `pkg_resources.packaging` version 16.8 (or `setuptools` older than 49.6.0), which does not have major/minor properties on the Version class, as one can observe in https://github.com/pypa/setuptools/blob/v49.5.0/pkg_resources/_vendor/packaging/version.py.
On the other hand, the comparison operators do exist, so we use them to check for version equality instead (see the sketch below).
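
A sketch of the pattern (the version strings here are placeholders):

```python
from pkg_resources import packaging  # vendored; old copies lack Version.major/minor

installed = packaging.version.parse("11.1")
required = packaging.version.parse("11.3")

# .major / .minor are missing on the old vendored packaging (e.g. 16.8),
# but the comparison operators have always been available:
if installed != required:
    print(f"version mismatch: {installed} != {required}")
```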

Fixes https://github.com/pytorch/pytorch/issues/61036

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61484

Reviewed By: walterddr, seemethere

Differential Revision: D29643883

Pulled By: malfet

fbshipit-source-id: 3db9168c1b009ac3a278709083ea8c5b417471b8
2021-07-13 07:11:58 -07:00
0afbb9e81e PYTHON_LIBRARY may be set to empty or NOTFOUND. (#61230)
Summary:
Not sure why (maybe from dependencies?) but it can certainly break package lookup upon re-entry of cmake.
So instead of checking whether they are defined, we should check whether there is any meaningful value inside.

Fixes https://github.com/pytorch/pytorch/issues/59887

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61230

Reviewed By: H-Huang

Differential Revision: D29668766

Pulled By: malfet

fbshipit-source-id: 79a59578740c4434327aff4f9a22eba9c4bf48d1
2021-07-13 07:09:31 -07:00
ac6ec0efa1 [ROCM] fix bug in #60313 (#61073)
Summary:
This PR fixes a bug in https://github.com/pytorch/pytorch/issues/60313, where the tensors generated by `_generate_valid_rocfft_input` were on the CPU instead of the GPU. This was due to using numpy to generate tensors and converting them to PyTorch via `torch.from_numpy`, which leaves the generated tensors on the CPU. We now generate the tensors using PyTorch itself, which carries over the device of the input tensors to the generated tensor.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61073

Reviewed By: H-Huang

Differential Revision: D29668418

Pulled By: malfet

fbshipit-source-id: ce2025c26d079c15603a89b9bf7878f48d73155e
2021-07-13 07:08:17 -07:00
2e49c5dc37 Move GetArgumentNamesModule registration to InterpreterManager() (#61549)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61549

Move GetArgumentNamesModule registration to InterpreterManager() such that the module is a permanent part of the interpreters and can be used by InterpreterSession.global() freely.

Test Plan: [... ~/fbsource/fbcode/caffe2] buck test mode/dev caffe2/fb/predictor:pytorch_predictor_test -- PyTorchDeployPredictor.GetArgumentNames

Reviewed By: wconstab

Differential Revision: D29643460

fbshipit-source-id: cf132d4795cbb334ce164ac715d590a105535508
2021-07-13 00:57:01 -07:00
5144381b1d [pytorch][JIT] Widen exception caught by ScriptList casting (#61520)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61520

This commit widens the exception caught by the try-catch block that checks if
an object passed to a scripted function is a `ScriptList`. It turns out that
there are internal tests that do not throw a `py::cast_error` so catching only
that is not sufficient.

Test Plan: Ran the failing tests in T94889011.

Reviewed By: Chillee

Differential Revision: D29560815

fbshipit-source-id: 442258f8997146d833a9d5db923e1f6359f2bfdd
2021-07-12 23:20:58 -07:00
94840969e4 SGX can not read from /dev/urandom (#60368)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60368

Problem:
The SGX secure enclave does not support reading from /dev/urandom, as it is isolated from the OS for greater security. The SGX API provides a way to generate random numbers as a replacement.
Solution:
Conditionally enable the SGX API for random number generation when building for it.

Test Plan: Run the PyTorch tests

Reviewed By: malfet, LiJihang

Differential Revision: D29022616

fbshipit-source-id: 1c7115457a2abde682df4d55fa4a8446fc5f8613
2021-07-12 20:43:23 -07:00
8a2c7d902f [static runtime] Add DCHECK to ensure that outputs do not overlap with immutable inputs (#61301)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61301

This change adds a `DCHECK` to ensure that outputs do not overlap with immutable inputs.

Test Plan:
Added unittests as follows:

- `ProcessedNode.VerifyOutputsNotOverlappingWithImmutableInputsWithImmutableArguments`
- `ProcessedNode.VerifyOutputsNotOverlappingWithImmutableInputsWithMutableArguments`

Reviewed By: hlu1

Differential Revision: D29564158

fbshipit-source-id: bf14b4978ab544af79010cf724ed28202b4521cc
2021-07-12 18:04:05 -07:00
4ef640d6f6 Sort imports of test_datapipe.py (#61312)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61312

Sorting according to isort output. Alphabetically ordered, one-per-line imports help merging.

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D29588833

Pulled By: VitalyFedyunin

fbshipit-source-id: 4c80c3086132b50894e734ad6c5799d78d689e42
2021-07-12 15:33:20 -07:00
fd13e925ec Adding backward compatibility for sharding support in old DataLoader (#61237)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61237

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D29588832

Pulled By: VitalyFedyunin

fbshipit-source-id: 3bfa4417f6a04450f656ecf28fc95322d2cf076a
2021-07-12 14:53:45 -07:00
d3cb065b2f Implement usage of is_shardable and apply_sharding (#61236)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61236

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D29588835

Pulled By: VitalyFedyunin

fbshipit-source-id: 00c3042f96af498637b2dcf6e3f842c1fc05ddd8
2021-07-12 14:23:20 -07:00
4d842d909b Revert FC workaround for ReflectionPad3d (#61308)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/61248

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61308

Reviewed By: iramazanli

Differential Revision: D29566849

Pulled By: jbschlosser

fbshipit-source-id: 8ab443ffef7fd9840d64d71afc2f2d2b8a410ddb
2021-07-12 14:19:07 -07:00
2fd37a830e Revert D29642893: .github: Add force_on_cpu tests for windows
Test Plan: revert-hammer

Differential Revision:
D29642893 (a52de0dfec)

Original commit changeset: 2dd2b295c71d

fbshipit-source-id: c01c421689f6d01cdfb3fe60a8c6428253249c5f
2021-07-12 14:01:44 -07:00
7fdce39a4b Revert D29642891: .circleci: Remove force_on_cpu jobs from circleci
Test Plan: revert-hammer

Differential Revision:
D29642891 (2aedd17661)

Original commit changeset: d51bb859bc28

fbshipit-source-id: a39a2d57d6e68961d94d4137a57bdc280f9b1b5b
2021-07-12 13:59:39 -07:00
58df01c3b8 clarify default value of requires_grad for tensors (#61038)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61038

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D29491984

Pulled By: dagitses

fbshipit-source-id: 7e6b7f8e81d77f38c881b86a68c17d3cf5483dad
2021-07-12 12:57:37 -07:00
5897a60480 warn about SVD outputs not supporting backprop (#61037)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61037

* **#61037**

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D29491985

Pulled By: dagitses

fbshipit-source-id: 6322e7c86cade52671062ee97d2fcb8c15d8aa86
2021-07-12 12:55:37 -07:00
65ab861ec6 fix mm not correctly report TORCH_CHECK failure issue (#61394)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/61291.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61394

Reviewed By: zhouzhuojie, seemethere

Differential Revision: D29614208

Pulled By: walterddr

fbshipit-source-id: f49a15dde708e30b06059b47fae1cda7c2c3571c
2021-07-12 12:50:51 -07:00
68f9819df4 Typo fix (#41121)
Summary:
Description:
- Typo fix in the docstring

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41121

Reviewed By: heitorschueroff

Differential Revision: D29660228

Pulled By: ezyang

fbshipit-source-id: fc2b55683ec5263ff55c3b6652df3e6313e02be2
2021-07-12 12:43:47 -07:00
255a324258 add nesting_level as attribute to pickle for map datapipe (#61534)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61534

Currently, the attribute `nesting_level` on `MapIterDataPipe` is not pickled. This yields `AttributeError` exceptions when multiprocessing with `DataLoader`.

This diff adds it as an attribute to pickle.
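
A self-contained sketch of the failure mode; the `Mapper` class here is hypothetical, not the real datapipe:

```python
import pickle

class Mapper:
    def __init__(self, fn, nesting_level=0):
        self.fn = fn
        self.nesting_level = nesting_level

    def __getstate__(self):
        return {"fn": self.fn}  # bug: nesting_level omitted from pickled state

restored = pickle.loads(pickle.dumps(Mapper(len)))
try:
    restored.nesting_level
except AttributeError as err:
    print("unpickled object lost the attribute:", err)
```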

Test Plan: confirmed errors go away after change

Reviewed By: ejguan

Differential Revision: D29648655

fbshipit-source-id: 943b57eaff9712eb7ce92f43cb360acdb3111f2b
2021-07-12 11:41:01 -07:00
5144cc029e Bump docker image tag for clang-tidy (#61545)
Summary:
Fixes recent `clang-diagnostic-errors` on clang-tidy runs

See https://github.com/pytorch/test-infra/pull/59

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61545

Reviewed By: malfet, seemethere

Differential Revision: D29664061

Pulled By: 1ntEgr8

fbshipit-source-id: cca482a8774e34e61919f2298846ae0b479bf224
2021-07-12 11:32:39 -07:00
a5a10fe353 Move all downloading logic out of common_utils.py (#61479)
Summary:
and into tools/ folder

Currently run_tests.py invokes tools/test_selections.py, which
1. downloads and analyzes which test files to run, and
2. downloads and parses S3 stats and passes the info to local files.
3. common_utils.py then uses the downloaded S3 stats to determine which test cases to run.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61479

Reviewed By: janeyx99

Differential Revision: D29661986

Pulled By: walterddr

fbshipit-source-id: bebd8c474bcc2444e135bfd2fa4bdd1eefafe595
2021-07-12 11:23:22 -07:00
2aedd17661 .circleci: Remove force_on_cpu jobs from circleci (#61473)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61473

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: walterddr

Differential Revision: D29642891

Pulled By: seemethere

fbshipit-source-id: d51bb859bc28efe15618d1e65f1a1cee64d60508
2021-07-12 11:17:33 -07:00
a52de0dfec .github: Add force_on_cpu tests for windows (#61472)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61472

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: walterddr

Differential Revision: D29642893

Pulled By: seemethere

fbshipit-source-id: 2dd2b295c71d79593ad7f71d6160de4042c08b80
2021-07-12 11:16:17 -07:00
51d18369c3 [1/N] Nnapi backend delegation preprocess (#61499)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61499

Added a preprocess function for the delegate to the Nnapi backend (internal and external files).

In the past we had functions and classes for converting to the Nnapi backend. Now, these functions and classes will be wrapped by the delegate API.

### nnapi_backend_preprocess.cpp:

Contains the preprocess function, which uses Pybind to call an existing python function, `convert_model_to_nnapi()`.
- The model is wrapped by a `RecursiveScriptModule`, so that `convert_model_to_nnapi()` can run correctly, since when jumping from Python to C++ to Python, the model loses its original wrapper.
- A tensor, which includes shape, data type, and quantization information, is passed through preprocess's compile_spec to `convert_model_to_nnapi()`.
- Finally, the Nnapi model is serialized for mobile and returned as a string.
### nnapi_backend_lib.cpp:
Contains stub functions for compile and execute, and is necessary for the Nnapi backend to be registered correctly. These will be implemented in a future PR.

**TODO:** implement execute and compile for the delegate API; throw exceptions for an incorrect compile_spec; add OSS tests
**Testing:** Tests were done locally (see D29647123). A simple module was lowered to Nnapi, saved locally, and examined.

ghstack-source-id: 133415234

Test Plan:
Tests were done locally (see D29647123).
TODO: add test in OSS in test_backends.py after CMake is ready.
I ran buck run caffe2:nnapi_backend_example. The model files are saved as nnapi_model.ptl and mobile_model.ptl. I checked that both zip files have expected contents.

Reviewed By: iseeyuan

Differential Revision: D29563351

fbshipit-source-id: 642e349356e38aecc1b9973c285569650c02668c
2021-07-12 11:13:05 -07:00
3faf6a715d [special] migrate log_softmax (#60512)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/50345

Rendered Docs: https://14335157-65600975-gh.circle-artifacts.com/0/docs/special.html#torch.special.log_softmax

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60512

Reviewed By: iramazanli

Differential Revision: D29626262

Pulled By: mruberry

fbshipit-source-id: c42d4105531ffb004f11f1ba6ae50be19bc02c91
2021-07-12 11:01:25 -07:00
f2857883c4 Add DataPipes Graph Functions (#61235)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61235

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D29588834

Pulled By: VitalyFedyunin

fbshipit-source-id: e0331d6e1fc2a3f8b6211aac83965bcf13165161
2021-07-12 10:28:35 -07:00
25a705610f ENH Adds support for no-batch dim in AdaptiveAvgPool1d (#61264)
Summary:
Towards https://github.com/pytorch/pytorch/issues/60585

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61264

Reviewed By: iramazanli

Differential Revision: D29615292

Pulled By: jbschlosser

fbshipit-source-id: 826d1c87d67261a7211270e90e3a1022bbbe37bd
2021-07-12 10:24:37 -07:00
583b045fc3 Make .contiguous(memory_format) call .clone(memory_format) (#61456)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61456

functorch is unable to `vmap(grad(f))` when `f` contains a `.contiguous`
call. This is because `.contiguous` (when it is not a no-op) decomposes
to `.copy_` under grad and the `.copy_` is not compatible with vmap.

The fix for this is to have `.contiguous` call `.clone` instead of
`.copy_`. `clone` is a primitive w.r.t. to autograd, so `grad`
decomposes contiguous into clone.
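
A minimal sketch, assuming functorch is installed, of the composition this change unblocks (`f` contains a `.contiguous()` call that is not a no-op):
```python
import torch
from functorch import grad, vmap

def f(x):
    return x.t().contiguous().sum()  # .t() makes .contiguous() a real copy

x = torch.randn(2, 3, 4)
per_sample_grads = vmap(grad(f))(x)  # previously failed via the .copy_ decomposition
```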

Perf testing (forward pass)
- [script and
output](https://gist.github.com/zou3519/294f583b9c5d7bdf234d5295f97fb02e)
- The instruction count increased from 774479 to 781379. This is because
we're now calling .clone(), which does an additional dispatch. We could
optimize the implementation of clone() to not dispatch on .copy_() in
the future if we really care about this.

Perf testing (backward pass)
- [script and
output](https://gist.github.com/zou3519/6fbdb121de6342334192d55c8a72276a)
- The instruction count decreased from 5402648 to 5335977. This is
because the [backward for
.clone](9b908ab0d0/tools/autograd/derivatives.yaml (L383))
is a lot simpler than the [backward for
copy_](9b908ab0d0/torch/csrc/autograd/functions/tensor.cpp (L37-L41))
- The backward for .clone() and .copy_() end up doing the same thing for
contiguous (from reading the code above, they both do no-op copies).

Test Plan:
- wait for existing tests (test_view_ops have the tests)
- functorch isn't tested in PyTorch CI yet.
- Taking suggestions on how to write a test for this. I'm thinking we
could use LoggingTensor from #59760 (because it logs underneath
autograd) and test that clone is called instead of copy_ but I didn't
want to refactor it into a utility

Reviewed By: soulitzer

Differential Revision: D29636859

Pulled By: zou3519

fbshipit-source-id: 97eb56bfae1c4bb31612dc9d06536019f21d69a6
2021-07-12 10:19:33 -07:00
5a20c56ebc [static runtime] Remove hasOperation() check (#61496)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61496

glow::FusionGroup is a JitOnlyOperator that produces an Operation when passed a Node*: https://fburl.com/ybwfn3bl

hasOperation() doesn't return true in that case: https://fburl.com/19wd10aw

By removing the hasOperation() check, the Operation gets successfully materialized, and static runtime enables successfully and runs OK. Will check that the outputs match the JIT interpreter.

Test Plan:
Test with 281805158_2
```
./buck-out/gen/admarket/lib/ranking/prediction_replayer/replayer --model_inference_type_target=DISAGG_ACCELERATOR --prediction_replayer_force_model_type=inline_cvr_post_imp_model --prediction_replayer_force_model=281805158_2 --prediction_replayer_target_tier=127.0.0.1:7447 --prediction_replayer_input_stream_filename=/data/users/ansha/tmp/adfinder/filter_requests_inline_cvr_post_imp_model_1000_2021_04_29 --ignore_model_id_mismatch --check_performance --fully_remote_sr_connection_options="overall_timeout:10000000,processing_timeout:10000000" --use_new_encoding_for_ads_services --use_new_encoding_from_model_id_to_shard_id --sigrid_force_model_dir=/data/users/ansha/tmp/adfinder/281805158_2/ --sigrid_predictor_model_suffix=.predictor.disagg.local --use_new_encoding_from_model_id_to_shard_id=true --prediction_replayer_force_model_kind=19 --pytorch_predictor_static_runtime_enable=true --prediction_replayer_target_qps=1
```

```
NNPI_LOG_LEVEL=0 USE_INF_API=1 ./buck-out/gen/sigrid/predictor/sigrid_remote_predictor_glow_nnpi \
  --force_models=281805158_2 \
  --sigrid_predictor_model_suffix=.predictor.disagg.remote_other \
  --gflags_config_path=sigrid/predictor/gflags/predictor_gflags_ads_perf_glow_nnpi_pyper_v1 \
  --smc_server_port=7447 \
  --sigrid_predictor_tier_name=sigrid.predictor.perf.dianshi_staticruntime_debug_0604.test.storage \
  --predictor_storage_smc_tier=sigrid.predictor.perf.dianshi_staticruntime_debug_0604.test.storage \
  --predictor_storage_smc_tier_v2=sigrid.predictor.perf.dianshi_staticruntime_debug_0604.test.storage \
  --torch_glow_min_fusion_group_size=30 \
  --glow_enable_sanitize_inputs=100 \
  --sigrid_force_model_dir=/data/users/ansha/tmp/adfinder/281805158_2/ \
  --pytorch_predictor_static_runtime_enable=true \
  --pytorch_predictor_glow_enable=true \
  --pytorch_predictor_enable_loading_xl_format_on_cpu=false \
  --pytorch_disagg_acc_input_dump_path=/tmp/
```

Reviewed By: hlu1

Differential Revision: D29647043

fbshipit-source-id: 8ce6dc0f4f0464b65ca6a8c9d42e3d8bb392e66e
2021-07-12 10:09:33 -07:00
99959fe3f5 [DataLoader] Adding demux and mux DataPipe-s (#61234)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61234

* **#61234 [WIP] Adding demux and mux DataPipe API examples**

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D29588836

Pulled By: VitalyFedyunin

fbshipit-source-id: 523d12ea6be7507d706b4c6d8827ec1ac4ccabc3
2021-07-12 10:04:03 -07:00
d46689a201 OpInfo reference tests for add and sub (#61169)
Summary:
This PR adds OpInfo reference checks for `add, sub`. See https://github.com/pytorch/pytorch/issues/54261

cc: mruberry pmeier

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61169

Reviewed By: iramazanli

Differential Revision: D29625702

Pulled By: mruberry

fbshipit-source-id: c5e536ab52865890990353c5c862b44b5a16ed20
2021-07-12 09:27:22 -07:00
c18017190b Relax some linalg test tolerances (#61101)
Summary:
We are seeing some test failures on an A100 machine, though TF32 matmul is not involved in these cases.

I tried the `svd_lowrank` test. It passed when run on its own but failed when I ran the whole test suite; it's probably a random-seed issue. Relaxing the test tolerance is much easier to do.

Some SVD tests failed when comparing CPU float32 against GPU float32. Since linear algebra routines are somewhat unstable at single precision, comparing two single-precision results may give false positives, so we now compute the CPU reference in float64 or complex128, which is much more accurate.
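
A sketch of the double-precision reference pattern described above (the choice of op is illustrative):
```python
import torch

a = torch.randn(50, 50, device="cuda")
_, s_gpu, _ = torch.svd(a)                 # float32 result under test
_, s_ref, _ = torch.svd(a.cpu().double())  # float64 CPU reference
torch.testing.assert_allclose(s_gpu.cpu(), s_ref.float(), rtol=1e-4, atol=1e-5)
```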

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61101

Reviewed By: ngimel

Differential Revision: D29593483

Pulled By: mruberry

fbshipit-source-id: 3df651e3cca1b0effc1a4ae29d4f26b1cb4082ed
2021-07-12 09:17:59 -07:00
bacf8ecbd1 Make pin_memory/is_pinned use BackendSelect (#60547)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60547

These now dispatch on the optional Device argument, which specifies
what device you want to pin for.  We now directly register pinned
memory implementations for CUDA specifically, eliminating the need
for extra virtual methods.

This makes it possible for other backends to override the behavior
of pinned memory, c.f. https://github.com/pytorch/pytorch/pull/59291
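
Typical usage is unchanged; a quick sketch of pinning for the default (CUDA) backend:
```python
import torch

x = torch.empty(1024)
if torch.cuda.is_available():
    xp = x.pin_memory()  # pins host memory for CUDA by default
    assert xp.is_pinned()
    y = xp.to("cuda", non_blocking=True)  # pinned source allows an async copy
```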

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: albanD, bdhirsh

Differential Revision: D29331881

Pulled By: ezyang

fbshipit-source-id: db3b4e2c872ba1caa0243fecc60a4da65179ce28
2021-07-12 09:13:14 -07:00
7136a62b56 Add expecttest to CONTRIBUTING.md (#61163)
Summary:
expecttest is now an independent library, but `CONTRIBUTING.md` and `requirements.txt` do not mention that it is needed.

Related: https://github.com/pytorch/pytorch/pull/60658

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61163

Reviewed By: heitorschueroff

Differential Revision: D29660296

Pulled By: ezyang

fbshipit-source-id: e2e86d42526c83bec7cdf7221e19fe83d9686103
2021-07-12 09:11:12 -07:00
8754238410 torch._utils.ExceptionWrapper: fix for Exceptions with multiple args (#58131)
Summary:
Here's an example of what this PR should fix:
```
from torch._utils import ExceptionWrapper

class TwoArgException(Exception):
    def __init__(self, msg, count): ...

# If you need a "real world" exception with two args, here's one from the stdlib:
# import asyncio
# TwoArgException = asyncio.exceptions.LimitOverrunError
# or if on Python 3.7, try:
# TwoArgException = asyncio.streams.LimitOverrunError

try:
    raise TwoArgException("oh no", 0)
except Exception as e:
    data = ExceptionWrapper(where="in a test case")

data.reraise()
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58131

Reviewed By: heitorschueroff

Differential Revision: D29660248

Pulled By: ezyang

fbshipit-source-id: cbcecfee9cac183354542e147ee3d956038c8986
2021-07-12 09:04:36 -07:00
93d98ecef7 update the pytorch-gdb example so that it works on current master (#61175)
Summary:
As pointed out by https://github.com/pytorch/pytorch/pull/54339#issuecomment-872827580, the `pytorch-gdb` example is currently broken because the code has been refactored.

This PR updates the example so that it works again.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61175

Reviewed By: heitorschueroff

Differential Revision: D29660336

Pulled By: ezyang

fbshipit-source-id: 8bcd32fc583c0b28a705ef37203ce7ad4d636732
2021-07-12 08:57:18 -07:00
cyy
0de35fe039 fix return local reference (#59913)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59913

Reviewed By: soulitzer

Differential Revision: D29107110

Pulled By: ezyang

fbshipit-source-id: c0f9888867c7dfeb05f6a3b9d2067df35e1e3ffb
2021-07-12 08:29:32 -07:00
d4549ba5dc Add VS_VERSION to Circle (#61532)
Summary:
Fixes current HUD 10.1 failure https://app.circleci.com/pipelines/github/pytorch/pytorch/349359/workflows/ead2904b-3f37-4c9d-b271-a8e772046523/jobs/14713215

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61532

Test Plan: The new 10.1 CI run: https://app.circleci.com/pipelines/github/pytorch/pytorch/349677/workflows/b7143b56-e8e7-4f85-8bdf-0ce50788f3c0/jobs/14727686

Reviewed By: walterddr

Differential Revision: D29661179

Pulled By: janeyx99

fbshipit-source-id: 5023c41fe6ddce4113116b07d8f0fd7d66c864a8
2021-07-12 08:21:02 -07:00
cyy
00c4897c51 use make_unique (#61272)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61272

Reviewed By: pbelevich

Differential Revision: D29660354

Pulled By: ezyang

fbshipit-source-id: f0aba1ea6983aec415915ed9b7dbced2e2b3b171
2021-07-12 08:09:46 -07:00
ac086ca15b Update version.txt file path (#61177)
Summary:
The file version.txt is located one directory above generate_torch_version; some platforms are unable to find this file unless given an explicit path.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61177

Reviewed By: pbelevich

Differential Revision: D29660334

Pulled By: ezyang

fbshipit-source-id: f66105f782aaff031e373f96a69baabb13c89337
2021-07-12 07:30:10 -07:00
09679af260 Delete dead code in Tensor::to implementation (#61435)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61435

Deleted the following:
- I couldn't find the NOTE mentioned so I deleted the reference to it
- The memory_format check (because it always passes)
- The requires_grad check (because it always passes)

Test Plan: - run tests

Reviewed By: soulitzer

Differential Revision: D29636872

Pulled By: zou3519

fbshipit-source-id: 48a32c1821b72c512d337becf2398ce7f4cf01a2
2021-07-12 07:10:27 -07:00
60086ab39b Remove export PYTHONPATH hacks (#61487)
Summary:
Remove `export PYTHONPATH=$PWD` in favor of `-m`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61487

Test Plan: Let's see if CI passes

Reviewed By: 1ntEgr8

Differential Revision: D29645544

Pulled By: janeyx99

fbshipit-source-id: 841aea8ebed2cb1c7dbc68754b5fbdee932559c2
2021-07-12 06:59:50 -07:00
5c1505076b [Codemod][FBSourceBlackLinter] Daily arc lint --take BLACK
Reviewed By: zertosh

Differential Revision: D29656934

fbshipit-source-id: c40bbc8e4512b145050ee47db2c8dc781f3c36e9
2021-07-12 04:15:21 -07:00
666dff381d add AdaptiveAvgPooling2D (#61239)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61239

Test Plan: Imported from OSS

Reviewed By: iramazanli

Differential Revision: D29626359

Pulled By: migeed-z

fbshipit-source-id: b7cd4ce4176e2d6e7a853974443affd23a49d3d9
2021-07-10 20:07:14 -07:00
93ef40bd83 add linear operation and modify one of the tests (#61238)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61238

Test Plan: Imported from OSS

Reviewed By: iramazanli

Differential Revision: D29626333

Pulled By: migeed-z

fbshipit-source-id: d4303918e380d64ba8ab678f249db6674e89357a
2021-07-10 20:07:12 -07:00
292ee65261 add maxpool2D, add more tests, handle integer parameters for maxpool2D (#61188)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61188

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D29626303

Pulled By: migeed-z

fbshipit-source-id: 32309cd1eb1189beaba63017653b3aeccdf2761d
2021-07-10 20:06:07 -07:00
7a15576a65 [quant] update FakeQuant modules to use tensor qparams (#61318)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61318

Remove the `float()` and `int()` calls in the forward function so that we can directly use the tensor qparams in the fake_quantize operator.

Calling `float()/int()` internally calls `item()`, which can trigger a GPU -> CPU copy if the original tensors reside on the GPU.
Local benchmark P427668213
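
For illustration, the sync being avoided is the implicit device-to-host copy that `item()` performs:
```python
import torch

if torch.cuda.is_available():
    scale = torch.tensor([0.1], device="cuda")
    s = scale.item()  # forces a GPU -> CPU copy (and a device sync)
```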

Before this change
```
                                               Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
---------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                     aten::_aminmax         2.57%       1.507ms         3.10%       1.819ms      36.371us       2.872ms         4.81%       2.872ms      57.446us            50
              aten::fake_quantize_per_tensor_affine         1.04%     610.915us         3.60%       2.114ms      42.276us     472.896us         0.79%       2.698ms      53.962us            50
    aten::fake_quantize_per_tensor_affine_cachemask         1.69%     993.626us         2.56%       1.503ms      30.058us       2.225ms         3.73%       2.225ms      44.504us            50
                                   aten::is_nonzero         3.85%       2.258ms        19.68%      11.540ms      46.161us       2.168ms         3.63%      11.084ms      44.336us           250
                                   aten::zeros_like         1.82%       1.064ms         6.65%       3.901ms      39.007us       1.531ms         2.57%       3.905ms      39.045us           100
                                           aten::eq        13.80%       8.093ms        25.90%      15.189ms      37.972us       9.580ms        16.05%      15.566ms      38.914us           400
                                         aten::item         5.67%       3.323ms        21.50%      12.607ms      36.019us       3.233ms         5.42%      12.167ms      34.762us           350
                                        aten::zeros         0.94%     549.208us         2.93%       1.717ms      34.343us     688.928us         1.15%       1.695ms      33.894us            50
                                           aten::le         2.52%       1.478ms         4.50%       2.641ms      26.411us       1.753ms         2.94%       2.845ms      28.448us           100
                                         aten::rsub         1.04%     608.715us         2.44%       1.433ms      28.667us     532.000us         0.89%       1.418ms      28.353us            50
                                          aten::max         1.54%     905.401us         4.62%       2.711ms      27.106us     847.488us         1.42%       2.697ms      26.969us           100
                                         aten::ones         0.92%     542.159us         2.16%       1.266ms      25.324us     661.856us         1.11%       1.301ms      26.017us            50
                                          aten::min         0.82%     479.167us         2.15%       1.258ms      25.160us     407.808us         0.68%       1.276ms      25.530us            50
                          aten::_local_scalar_dense        15.83%       9.284ms        15.83%       9.284ms      26.526us       8.934ms        14.97%       8.934ms      25.524us           350
                                        aten::clamp         2.35%       1.378ms         4.21%       2.467ms      24.669us       1.546ms         2.59%       2.461ms      24.612us           100
                                        aten::zero_         2.53%       1.482ms         5.65%       3.316ms      22.108us       1.326ms         2.22%       3.380ms      22.531us           150
                                      aten::maximum         3.08%       1.805ms         3.08%       1.805ms      18.052us       1.849ms         3.10%       1.849ms      18.494us           100
                                      aten::minimum         1.33%     778.854us         1.33%     778.854us      15.577us     868.672us         1.46%     868.672us      17.373us            50
                                        aten::round         1.36%     799.910us         1.36%     799.910us      15.998us     809.568us         1.36%     809.568us      16.191us            50
                                        aten::copy_         6.61%       3.878ms         6.61%       3.878ms      15.513us       4.036ms         6.76%       4.036ms      16.143us           250
                                          aten::div         2.53%       1.483ms         2.53%       1.483ms      14.833us       1.535ms         2.57%       1.535ms      15.353us           100
                                          aten::mul         2.44%       1.431ms         2.44%       1.431ms      14.314us       1.478ms         2.48%       1.478ms      14.782us           100
                                       aten::detach         1.46%     855.670us         2.41%       1.411ms      14.110us     832.448us         1.39%       1.395ms      13.949us           100
                                          aten::add         2.22%       1.301ms         2.22%       1.301ms      13.008us       1.383ms         2.32%       1.383ms      13.828us           100
                                        aten::fill_         4.18%       2.452ms         4.18%       2.452ms      12.262us       2.693ms         4.51%       2.693ms      13.463us           200
                                          aten::sub         5.06%       2.967ms         5.06%       2.967ms      14.837us       2.675ms         4.48%       2.675ms      13.374us           200
                                           aten::to         2.10%       1.230ms         3.65%       2.140ms      10.701us       1.310ms         2.20%       2.062ms      10.310us           200
                                       aten::select         1.28%     749.144us         1.49%     874.227us       8.742us     863.232us         1.45%     863.232us       8.632us           100
                                             detach         0.95%     555.326us         0.95%     555.326us       5.553us     562.496us         0.94%     562.496us       5.625us           100
                                   aten::as_strided         0.40%     232.289us         0.40%     232.289us       1.161us       0.000us         0.00%       0.000us       0.000us           200
                                        aten::empty         2.93%       1.720ms         2.93%       1.720ms       3.439us       0.000us         0.00%       0.000us       0.000us           500
                                      aten::resize_         1.04%     611.313us         1.04%     611.313us       2.038us       0.000us         0.00%       0.000us       0.000us           300
                                   aten::empty_like         0.75%     438.585us         1.77%       1.036ms       5.180us       0.000us         0.00%       0.000us       0.000us           200
                                aten::empty_strided         1.36%     799.442us         1.36%     799.442us       3.198us       0.000us         0.00%       0.000us       0.000us           250
---------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 58.645ms
Self CUDA time total: 59.674ms
```

After this change
```

test_fake_quant_profiler (scripts.supriyar.benchmark.module_bench.ProfilerBench) ... -------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                  aten::fake_quantize_per_tensor_affine         0.98%     505.210us         4.38%       2.259ms      45.187us     419.424us         0.78%       3.218ms      64.367us            50
                                         aten::_aminmax         2.78%       1.434ms         3.42%       1.766ms      35.321us       2.825ms         5.27%       2.825ms      56.505us            50
aten::fake_quantize_per_tensor_affine_cachemask_tens...         2.38%       1.229ms         3.40%       1.754ms      35.083us       2.799ms         5.22%       2.799ms      55.979us            50
                                             aten::rsub         0.94%     485.040us         5.02%       2.590ms      51.793us     458.976us         0.86%       2.587ms      51.747us            50
                                       aten::is_nonzero         3.78%       1.952ms        23.64%      12.196ms      48.786us       2.055ms         3.83%      11.986ms      47.944us           250
                                             aten::item         6.92%       3.572ms        19.86%      10.244ms      40.977us       3.670ms         6.85%       9.931ms      39.724us           250
                                       aten::zeros_like         1.65%     848.874us         6.64%       3.426ms      34.260us       1.397ms         2.61%       3.572ms      35.717us           100
                                            aten::zeros         0.85%     436.691us         3.00%       1.549ms      30.984us     551.936us         1.03%       1.576ms      31.516us            50
                                               aten::eq        10.60%       5.467ms        20.26%      10.452ms      26.130us       7.018ms        13.09%      10.832ms      27.079us           400
                                               aten::le         2.58%       1.332ms         4.67%       2.407ms      24.074us       1.580ms         2.95%       2.614ms      26.144us           100
                              aten::_local_scalar_dense        12.93%       6.673ms        12.93%       6.673ms      26.691us       6.261ms        11.68%       6.261ms      25.046us           250
                                            aten::clamp         2.43%       1.253ms         4.37%       2.256ms      22.560us       1.431ms         2.67%       2.273ms      22.725us           100
                                             aten::ones         0.89%     460.133us         2.18%       1.123ms      22.467us     570.496us         1.06%       1.128ms      22.551us            50
                                              aten::min         0.74%     383.132us         2.06%       1.065ms      21.296us     377.536us         0.70%       1.091ms      21.824us            50
                                            aten::zero_         2.36%       1.219ms         5.87%       3.029ms      20.194us       1.261ms         2.35%       3.199ms      21.327us           150
                                              aten::max         1.51%     779.081us         4.06%       2.096ms      20.960us     791.680us         1.48%       2.130ms      21.295us           100
                                              aten::sub         7.97%       4.111ms         7.97%       4.111ms      20.556us       3.847ms         7.18%       3.847ms      19.234us           200
                                              aten::div         2.94%       1.516ms         2.94%       1.516ms      15.158us       1.580ms         2.95%       1.580ms      15.798us           100
                                            aten::round         1.45%     750.445us         1.45%     750.445us      15.009us     756.064us         1.41%     756.064us      15.121us            50
                                            aten::copy_         6.88%       3.548ms         6.88%       3.548ms      14.190us       3.701ms         6.90%       3.701ms      14.803us           250
                                          aten::minimum         1.32%     681.654us         1.32%     681.654us      13.633us     713.664us         1.33%     713.664us      14.273us            50
                                          aten::maximum         2.55%       1.317ms         2.55%       1.317ms      13.169us       1.338ms         2.50%       1.338ms      13.378us           100
                                              aten::mul         2.63%       1.358ms         2.63%       1.358ms      13.581us       1.328ms         2.48%       1.328ms      13.283us           100
                                           aten::detach         1.34%     688.820us         2.35%       1.211ms      12.110us     772.800us         1.44%       1.278ms      12.779us           100
                                            aten::fill_         4.53%       2.338ms         4.53%       2.338ms      11.692us       2.495ms         4.65%       2.495ms      12.473us           200
                                              aten::add         2.32%       1.197ms         2.32%       1.197ms      11.968us       1.240ms         2.31%       1.240ms      12.405us           100
                                               aten::to         2.07%       1.069ms         3.66%       1.889ms       9.443us       1.224ms         2.28%       1.975ms       9.874us           200
                                           aten::select         1.44%     743.042us         1.64%     848.207us       8.482us     641.600us         1.20%     641.600us       6.416us           100
                                                 detach         1.01%     522.155us         1.01%     522.155us       5.222us     505.088us         0.94%     505.088us       5.051us           100
                                       aten::as_strided         0.44%     227.884us         0.44%     227.884us       1.139us       0.000us         0.00%       0.000us       0.000us           200
                                            aten::empty         3.20%       1.652ms         3.20%       1.652ms       3.304us       0.000us         0.00%       0.000us       0.000us           500
                                          aten::resize_         1.25%     646.711us         1.25%     646.711us       2.156us       0.000us         0.00%       0.000us       0.000us           300
                                       aten::empty_like         0.79%     407.768us         2.07%       1.067ms       5.334us       0.000us         0.00%       0.000us       0.000us           200
                                    aten::empty_strided         1.52%     785.788us         1.52%     785.788us       3.143us       0.000us         0.00%       0.000us       0.000us           250
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 51.590ms
Self CUDA time total: 53.609ms
ghstack-source-id: 133370215

Test Plan: buck test mode/dev-nosan caffe2/test/:quantization

Reviewed By: raghuramank100

Differential Revision: D29566512

fbshipit-source-id: 1aefca51f99949da7334bcfe504848275c9f952c
2021-07-10 19:43:02 -07:00
99848c7269 [quant] Add tensor_qparam variant to fake_quantize_per_tensor (#61317)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61317

Add an overload to fake_quantize_per_tensor that accepts scale/zero_point as input. The reasons to do this are:

* required for the fused observer + fake_quant operator on GPU, where the scale/zero_point will be calculated by the observer on device. Passing tensor inputs enables us to directly access the scale/zero-point values in the CUDA kernel and avoid extra copies/mallocs
* enables us to pass in float as scale dtype and int32 as zero_point dtype (which is consistent with what the quantize call actually uses) https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/quantized/affine_quantizer_base.cpp#L52-L53
* overload consistent with `quantizer_per_tensor.tensor_qparams`
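
A sketch of calling the new overload with tensor qparams (float scale, int32 zero_point, as described above); the quant range here is an arbitrary uint8 choice:
```python
import torch

x = torch.randn(4)
scale = torch.tensor(0.1)
zero_point = torch.tensor(0, dtype=torch.int32)
y = torch.fake_quantize_per_tensor_affine(x, scale, zero_point, 0, 255)
```
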
ghstack-source-id: 133370216

Test Plan:
buck test mode/dev-nosan caffe2/test/:quantization -- test_backward_per_tensor_cachemask
buck test mode/dev-nosan caffe2/test/:quantization -- test_forward_per_tensor_cachemask

Reviewed By: raghuramank100

Differential Revision: D29552727

fbshipit-source-id: cbb9af40fc575ad27a29c646b760d5ee52cc923d
2021-07-10 19:41:55 -07:00
57676ce128 Migrate multi_margin_loss to ATen (CUDA) (#61426)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61426

Closes gh-24600, closes gh-24601

These operators use custom kernels that aren't well suited to the `TensorIterator` style, so this just migrates the CUDA code and cleans up the style.

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D29648015

Pulled By: ngimel

fbshipit-source-id: cadf1890cdc2199d57f4533370e554613efeb54a
2021-07-10 18:48:58 -07:00
5a17cb6f44 Add channels-last support for bilinear and nearest 2d interpolation on CUDA (#56322)
Summary:
Add channels-last support for bilinear and nearest 2d interpolation on CUDA

Benchmark (on 2070 Super) is available at

- nearest 2d: https://github.com/xwang233/code-snippet/tree/master/interpolate-channels-last/nearest-2d
- bilinear: https://github.com/xwang233/code-snippet/tree/master/interpolate-channels-last/bilinear

Some regressions are seen for tensors with small channel sizes. We may add a heuristic to dispatch between the contiguous and channels-last paths if needed.

Close https://github.com/pytorch/pytorch/issues/60137

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56322

Reviewed By: mruberry

Differential Revision: D29645980

Pulled By: ngimel

fbshipit-source-id: c36dff4ee4789bec9b01da4029f326d30067c6b7
2021-07-10 18:00:50 -07:00
df00c636d2 [Model Averaging] Skip model averaging for the first K steps (#61207)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61207

The model averager must now be combined with the post-localSGD DDP communication hook. It will skip model averaging for the first K steps, because the post-localSGD communication hook runs global gradient averaging during that phase.

Proposal: https://github.com/pytorch/pytorch/issues/59699
ghstack-source-id: 133371335

Test Plan: buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_periodic_model_averager

Reviewed By: pritamdamania87

Differential Revision: D29523738

fbshipit-source-id: 3fa9611046e1c0afa4bda78aa3ba200fa2a5fa4b
2021-07-10 17:12:16 -07:00
0f6876d721 [Model Averaging] Create a post-localSGD communication hook (#61206)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61206

Create a communication hook to run post-local SGD. This will be combined with model averager component to better support local SGD.

In contrast to the previous approach, which ran local gradient averaging + global model averaging at each of the first K steps, we now plan to run only global gradient averaging at each of the first K steps, just like normal DDP. This gives us two advantages (see the sketch after this list):
1) For some optimizers, model averaging can cause discrepancies in optimizer states. If we still do global gradient averaging for the first K steps, we can defer such discrepancies until we actually start local SGD.
2) Gradient averaging during the first K steps runs only one allreduce that overlaps with the backward pass, so it should also be more efficient.
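
A minimal sketch of wiring the hook together with the averager from the previous commit; the module paths and argument names follow later public releases and should be treated as assumptions here:
```python
import torch.distributed.algorithms.ddp_comm_hooks.post_localSGD_hook as post_localSGD
import torch.distributed.algorithms.model_averaging.averagers as averagers

# ddp_model is an existing torch.nn.parallel.DistributedDataParallel instance
state = post_localSGD.PostLocalSGDState(
    process_group=None, subgroup=None, start_localSGD_iter=100)
ddp_model.register_comm_hook(state, post_localSGD.post_localSGD_hook)
averager = averagers.PeriodicModelAverager(period=4, warmup_steps=100)
```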

Proposal: https://github.com/pytorch/pytorch/issues/59699
ghstack-source-id: 133371322

Test Plan: buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_ddp_hook_parity_post_localSGD

Reviewed By: pritamdamania87

Differential Revision: D29523292

fbshipit-source-id: 3f215f7150f2917c2781278fad759530c685ea2c
2021-07-10 17:11:10 -07:00
a46d4212bf Allow dims=0 in torch.tensordot call (#61331)
Summary:
In one of my previous PRs that rewrote the `tensordot` implementation, I mistakenly treated empty `dims_a` and `dims_b` as illegal values. That turns out not to be true: empty `dims_a` and `dims_b` are supported, and in fact common when `dims` is passed as an integer. This PR removes the unnecessary check.

Fixes https://github.com/pytorch/pytorch/issues/61096
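
A quick check of the now-allowed case; `dims=0` performs no contraction, i.e. an outer product:
```python
import torch

a, b = torch.randn(2, 3), torch.randn(4)
out = torch.tensordot(a, b, dims=0)
assert out.shape == (2, 3, 4)
```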

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61331

Reviewed By: eellison

Differential Revision: D29578910

Pulled By: gmagogsfm

fbshipit-source-id: 96e58164491a077ddc7a1d6aa6ccef8c0c9efda2
2021-07-10 17:05:20 -07:00
7d7b7abb3b [Static Runtime] Separate function for getting always_alive values (#61506)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61506

Separate out the logic of GetAlwaysAliveValues from GetLivenessMap to simplify the code structure. This also means you don't need to run GetLivenessMap if optimize_memory is turned off.

Reviewed By: ajyu

Differential Revision: D29423534

fbshipit-source-id: dbdeeb10f7bcad86a24aa12f741f7c9ab946bb3b
2021-07-10 16:59:29 -07:00
7fdc5f9e08 model_dump: Fix non-counting and double-counting bugs in tensor memory (#60702)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60702

- Instead of traversing and counting all tensor memory, collect a map
  from storage key to storage info while traversing.  Add up sizes at
  the end to avoid double counting.
- Count tensor memory from constants as well.
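
A sketch (hypothetical helper, not the actual model_dump code) of the dedup-by-storage idea: tensors sharing a storage are counted once.
```python
import torch

def total_tensor_bytes(tensors):
    storages = {}
    for t in tensors:
        s = t.storage()
        storages[s.data_ptr()] = s.size() * s.element_size()
    return sum(storages.values())

base = torch.zeros(1000)
print(total_tensor_bytes([base, base[:10], base.view(10, 100)]))  # 4000, counted once
```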

Test Plan: Ran webdriver test.

Reviewed By: dhruvbird

Differential Revision: D29380396

Pulled By: dreiss

fbshipit-source-id: 6d0fd66f677fe23c851aa218387aa4dc59502b1e
2021-07-10 15:16:34 -07:00
158d351517 model_dump: Add webdriver test (#60701)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60701

The unit test previously only tested that the dump could complete
successfully.  It was not able to verify that any JS worked properly.
Now we can test the JS as long as webdriver is installed.

Tweaked the implementation of Hider a bit to make it easier for tests to
find and open them.

I disabled the tests by default since I don't want to deal with
webdriver in CI.  Enable them with the environment variable
RUN_WEBDRIVER=1.

We could make the tests use headless mode, but it's kind of fun to watch
them run.

Add a test to verify that tensor memory computation is working for the
simple model.

Test Plan: Ran the test.

Reviewed By: dhruvbird

Differential Revision: D29380398

Pulled By: dreiss

fbshipit-source-id: f19d0b05d79ad5a8231e85422976f1889e021c89
2021-07-10 15:16:32 -07:00
cc78c463c0 model_dump: Render constants.pkl similar to data.pkl (#60700)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60700

Test Plan:
Dumped a model with a lot of constants (qconvs produced by optimizing).
Was able to see them rendered nicely.

Reviewed By: dhruvbird

Differential Revision: D29380400

Pulled By: dreiss

fbshipit-source-id: c951508b92bb2717591dd173282157e1a40a30bd
2021-07-10 15:16:31 -07:00
e292f34def model_dump: Make stdout argument for main a keyword-only argument (#60699)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60699

Also add a unit test for main, which brings the test coverage up to
~98%.  Also factor out the "needs importlib.resources" check into a
function for easier reuse.

Test Plan: CI

Reviewed By: dhruvbird

Differential Revision: D29380397

Pulled By: dreiss

fbshipit-source-id: bba16da85bf7bfb4370308e38c844694d01b47eb
2021-07-10 15:16:29 -07:00
2942e9aa80 model_dump: update maintainer comment (#60698)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60698

... to reflect that the Python command should be re-run when changing
the model.

Test Plan: CI

Reviewed By: dhruvbird

Differential Revision: D29380399

Pulled By: dreiss

fbshipit-source-id: 1ec464da4ebe6ddf400eb4a3b14da683369c0039
2021-07-10 15:15:15 -07:00
f5c10fdbd3 Allow for heterogenous List and Dict values + Improve container typing algorithm (#57137)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57137

This PR corrects and expands our typing algorithm for unannotated, non-empty dicts and lists. Previously, to verify type correctness for an unannotated, non-empty container, we had gotten the type of the first element in the container, then checked if each following element was a subtype of the first type. That's too restrictive--what if the first element were a subtype of the second element? Instead, we should type the container by getting the smallest common supertype of all the given elements.

We need slightly different rules for keys and values in dicts, though: because the set of key types is restricted, finding two key types that cannot be unified should cause an error. On the other hand, the set of value types is not restricted, so we should be able to use `Any` as a valid supertype. We need to keep the set of keys restricted since the keys are used to generate and match schemas.

This does not break backwards compatibility, because the default element type is the smallest supertype of all the given types. So, if someone creates an unannotated dict where the keys are all `str` and the values are all `torch.Tensor`, the dict will be inferred to `Dict[str, Tensor]` just like it was before. Empty lists are still typed as `List[torch.Tensor],` and empty dicts are still typed as `Dict[str, Tensor]`.
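
A sketch of the new rule in action; the exact inferred types are my reading of the algorithm above, not taken from the PR's tests:
```python
import torch

@torch.jit.script
def fn(x: torch.Tensor):
    vals = [None, x]      # unifies to List[Optional[Tensor]] under the new rule
    d = {"a": x, "b": 1}  # value types don't unify, so values fall back to Any
    return len(vals), len(d)
```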

This PR unblocks three engineers on an FB-internal team and improves FX-TorchScript compatibility.

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D28231839

Pulled By: ansley

fbshipit-source-id: 7297bf239749daa54895add708185c75e6ca5999
2021-07-10 14:29:05 -07:00
ccd0977060 [Static Runtime] Support prim::GetAttr/SetAttr (#61505)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61505

The handling of `self` in static runtime was previously incorrect. This diff fixes that issue, since `self` is essential to prim::GetAttr/SetAttr. After all, most of the time we're getting and setting attributes on `self`, the TorchScript module.

Reviewed By: ajyu

Differential Revision: D29350173

fbshipit-source-id: 6e62add4cda517ef8cd6c315d4cb0595e7d531fb
2021-07-10 14:06:06 -07:00
f291b1899f Revert D27978269: Smart Decay for Adam - Caffe2
Test Plan: revert-hammer

Differential Revision:
D27978269 (aaa1e07609)

Original commit changeset: e47524101ddf

fbshipit-source-id: 334824bbf9a6ed788e75af9c292754081f70a19b
2021-07-10 13:09:58 -07:00
8bcf24b37a [TCPStore] enhance connect timeout error message (#61390)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61390

Enhances this error message for better debuggability.
ghstack-source-id: 133185482

Test Plan: CI

Reviewed By: H-Huang

Differential Revision: D29601528

fbshipit-source-id: f7aaf4d67ac96e6ed0b535e0200f918dd01e42f9
2021-07-10 03:57:23 -07:00
336970c03e Add note on torch.distributed backends on ROCm (#58975)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58975

Reviewed By: soulitzer

Differential Revision: D29595510

Pulled By: rohan-varma

fbshipit-source-id: 384bb67fcd003d65b76e957a474406b2a38099b9
2021-07-10 03:51:19 -07:00
73b86c9f9c Add getMethod to PytorchPredictorContainer (#61052)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61052

Implement getMethod in the container in a similar way to getPredictor,
using either Deploy or Script functionality depending on how the container
was initialized and how the gflag deploy overrides are set.

Test Plan: Add new unit test

Reviewed By: houseroad

Differential Revision: D29346969

fbshipit-source-id: 08e95ee96d533f5a7cc9c8f9b1c53751715c9181
2021-07-09 22:27:40 -07:00
677313b670 ReLU (#61150)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61150

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D29625826

Pulled By: migeed-z

fbshipit-source-id: 10e0662e33ccd4342cedd51579a10651755b633f
2021-07-09 19:32:08 -07:00
a556c1c4dc [profiler] Update Kineto submodule (ci-all) (#61478)
Summary:
Update Kineto submodule

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61478

Test Plan:
CI

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61432

Reviewed By: gdankel

Differential Revision: D29646019

Pulled By: ilia-cher

fbshipit-source-id: 02ecb0a2a6b457f6537c7d6b3c475e1e0ace3b6f
2021-07-09 19:32:06 -07:00
06166a13e0 Remove VS install step unless necessary from GHA Windows workflows (#60791)
Summary:
~~This should only be merged after our AMI has been deployed after https://github.com/fairinternal/pytorch-gha-infra/pull/1. (And will likely fail our current windows jobs)~~

I have revised this PR to install VS only when it's not already installed.

This should save ~5min per Windows workflow.
![image](https://user-images.githubusercontent.com/31798555/125141598-7e886c80-e0e3-11eb-9fe0-bb9e6bcc14f1.png)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60791

Reviewed By: soulitzer

Differential Revision: D29643876

Pulled By: janeyx99

fbshipit-source-id: 4bcfaf5bcad9e5636a1624c3e799e7cc97a87660
2021-07-09 19:32:04 -07:00
9b2b45919a Revert D29639797: [package] error if we try to mock a module in 3.6
Test Plan: revert-hammer

Differential Revision:
D29639797

Original commit changeset: 775ed78638fb

fbshipit-source-id: 9d2f6dae7ee35c6b37338e36ec7ade9d9e2ccbc2
2021-07-09 19:31:04 -07:00
aaa1e07609 Smart Decay for Adam - Caffe2 (#61488)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61488

We want to decay learning parameters properly. Previously this was not done when a parameter was absent from a minibatch. We fix this by keeping track of missed minibatches and making the decay catch up accordingly.

The exponential moving averages (EMAs) for the first and second moments used in Adam are updated only for parameters seen in a minibatch. In fact, for parameters absent from a minibatch, 0 should be added to the EMAs and the EMAs should then be decayed by multiplying by beta1 and beta2 respectively.

To avoid the computational overhead of touching every parameter for every minibatch, we:
* keep track of the last time a parameter is seen
* instead of decaying the EMAs by multiplying by beta1 and beta2, we multiply by beta1^k and beta2^k, where k is the number of minibatches since the parameter was last seen.
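
A minimal sketch (standalone Python, not the Caffe2 kernel) of the catch-up decay the bullets above describe:
```python
def smart_decay_step(m, v, grad, step, last_seen, beta1=0.9, beta2=0.999):
    k = step - last_seen                     # minibatches since param was last seen
    m = beta1 ** k * m + (1 - beta1) * grad  # decay EMA by beta1^k, then update
    v = beta2 ** k * v + (1 - beta2) * grad * grad
    return m, v, step                        # caller records `step` as new last_seen
```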

Differential Revision: D27978269

fbshipit-source-id: e47524101ddfcb281c46c505b9b7a8f0835bc64a
2021-07-09 18:28:21 -07:00
b52909d861 [TensorExpr] Add python bindings for ArgValue class and TensorExprKernel constructor accepting custom lowerings. (#61385)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61385

The bindings coverage might not be full yet, but this already allows us
to register custom lowerings from Python.

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D29623487

Pulled By: ZolotukhinM

fbshipit-source-id: b97ee420a57fd887e204c021b9e098764b2ee232
2021-07-09 18:27:14 -07:00
dec5aa2260 [JIT] clean up (#60390)
Summary:
* Minor: spelling, grammar.
* Add calls to `GRAPH_DUMP()` where they were missing.
* Add or expand a few comments.
* Move a few comments to seemingly more appropriate spots.
* In canonicalize_graph_fuser_ops.cpp inline `runnableInputs()` since it
  was only called in one place and had a misleading comment and
  confusing name.
* In `PeepholeOptimizeImpl::optimizeBlock()`, set `changed = true;` when
  removing `aten::is_complex`. Pretty sure its absence was a bug.
* Delete unused `_jit_pass_remove_inplace_ops` and its
  implementation `RemoveInplaceOps()`.
* In `preprocessCaffe2Ops()`, remove redundant check for nested optional
  types. It was already checked in `checkONNXCompatibility()`.
* In `EncoderBase::AddAttribute`, log the unexpected attribute kind.
  I don't remember the repro case now but I did hit this error at some
  point and this additional logging made it easier to understand.
* In `fuseConvBatchNorm()` in eval_peephole.cpp, consistently use
  camelCase instead of snake_case for local variables.
* Add curly braces around the bodies of if and loops.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60390

Reviewed By: Krovatkin

Differential Revision: D29523283

Pulled By: SplitInfinity

fbshipit-source-id: 4e16c5648616f53da07d68dab7fdf252e06a0752
2021-07-09 16:28:27 -07:00
54ea7d33ba [package] error if we try to mock a module in 3.6 (#61469)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61469

This feature is not supported, error out early.

Differential Revision:
D29639797

Test Plan: Imported from OSS

Reviewed By: Lilyjjo

Pulled By: suo

fbshipit-source-id: 775ed78638fb6da8f830b632726b00c0533ed176
2021-07-09 16:26:38 -07:00
a3670ba377 Add option to specify custom NNAPI serializer (#61025)
Summary:
To add a serializer for custom ops, we can subclass the default serializer
and update ADDER_MAP.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61025

Test Plan:
* pytest test/test_nnapi.py::TestNNAPI for current serializer
* Custom serializers to be tested with custom ops

Imported from OSS

Reviewed By: anshuljain1

Differential Revision: D29480745

fbshipit-source-id: 37e3f8de3c97f6c8a486f9879ce11430ea89af34
2021-07-09 15:27:10 -07:00
cbb6ab6d88 [package] ignore dunder import errors (#61148)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61148

Changes `__import__` processing to silently skip cases where the `__import__` statement cannot be parsed. Adds failed imports to a list retrievable by `PackageExporter.failed_dunder_import_list()`.

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D29559680

Pulled By: Lilyjjo

fbshipit-source-id: 2513d0b9ef271c85cadc3f5a013fbd8c8de80b46
2021-07-09 15:27:08 -07:00
12772c8dd8 [package] PackageExporter visualization methods (#61147)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61147

Basic tooling to enable users to see what is inside of a PackageExporter. Added methods:
- `externed/interned/mocked/denied_list()`: returns list of modules which are currently in the specified category
- `relied_on_by(module_name)`: returns list of modules which rely on `module_name`
- `dependency_graph_str()`: returns string format of graph for users. Example of output:
```
digraph G {
rankdir = LR;
node [shape=box];
"<res.foo.pkl>" -> "foo";
"foo" -> "torch.package";
"foo" -> "time";
"foo" -> "sentencepiece";
"foo" -> "package_top";
}
```
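
A hypothetical usage sketch of the methods listed above (the packaged object and patterns are made up for illustration):
```python
from torch.package import PackageExporter

with PackageExporter("res.pkg") as pe:
    pe.extern("torch")  # keep torch outside the package
    pe.intern("**")     # intern everything else
    pe.save_pickle("res", "foo.pkl", {"answer": 42})
    print(pe.externed_list())         # modules currently matched as extern
    print(pe.dependency_graph_str())  # DOT-format string like the example above
```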

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D29559683

Pulled By: Lilyjjo

fbshipit-source-id: 5dff4d04af911a9c9fdd0d100420f1382eaef46e
2021-07-09 15:27:06 -07:00
b5f0576278 [package] Modify Digraph to track predecessors (#61146)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61146

Track predecessors of nodes in DiGraph in order to enable cleaner dependency visualization code.

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D29559682

Pulled By: Lilyjjo

fbshipit-source-id: 06f51b1108423aece5bdd72a7b82ab736e5e4f94
2021-07-09 15:27:04 -07:00
ae65f63971 Make nnapi flatten converter accept flex inputs (#61024)
Summary:
As title

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61024

Test Plan: pytest test/test_nnapi.py::TestNNAPI::test_flatten

Reviewed By: anshuljain1

Differential Revision: D29480748

fbshipit-source-id: c334b09600a64d3e552cec843d6da3de28e7d27c
2021-07-09 15:27:02 -07:00
028e438d6c [torchelastic] Make sure rdzv_configs[timeout] is not getting overwritten (#61471)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61471

Make sure `rdzv_configs[timeout]` is not getting overwritten

Test Plan: sandcastle

Differential Revision: D29638606

fbshipit-source-id: e164cdddaed77e7e35412ed58ac1ee312e9d489d
2021-07-09 15:27:00 -07:00
1f4bba77b6 [fx] fix subgraph API call_module warning about no owning module (#61463)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61463

This seems like a small oversight(?): the current test fails when warnings are recorded. Discovered this when calling `graph.call_module(existing_call_module_node.target)` and it raised a warning.

Test Plan: `buck test //caffe2/test:fx`

Reviewed By: ansley

Differential Revision: D29637799

fbshipit-source-id: 2305629863230235f76a926fe2e4de480cbf853c
2021-07-09 15:25:44 -07:00
76c0f223d3 Make nnapi cat converter accept flex inputs
Summary: As title

Test Plan: pytest test/test_nnapi.py::TestNNAPI::test_cat

Reviewed By: anshuljain1

Differential Revision: D29480747

fbshipit-source-id: 161803054ff1a4c2c750fc30a5f0fc6d8a24b2c9
2021-07-09 14:27:53 -07:00
9e81d3d869 Make NNAPI linear converter accept flex inputs (#61022)
Summary:
As title

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61022

Test Plan: pytest test/test_nnapi.py::TestNNAPI::test_linear

Reviewed By: anshuljain1

Differential Revision: D29480749

fbshipit-source-id: 35975861740298c9e16f866c939e7ee3c2151710
2021-07-09 14:27:51 -07:00
35b950ea98 [package] properly handle case where we are re-packaging mocked modules (#61434)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61434

Mocking is the only time we introduce a "special" module to a
torch.package of our own creation. This interacts poorly with
re-packaging, since if we treat `_mock` as a regular module and try to
package it normally we will produce a broken package.

This PR teaches PackageExporter to recognize `_mock` modules and treat
them specially during the dependency walking process, thus avoiding the
issue.

Test Plan: Imported from OSS

Reviewed By: jdonald, Lilyjjo

Differential Revision: D29638283

Pulled By: suo

fbshipit-source-id: 37a7ffa34da8bb665f679fbd72aa3d71154b2209
2021-07-09 14:27:49 -07:00
4f4beb8286 Add Model Parallel Support to ZeRO (#61370)
Summary:
**Overview:**
The existing `ZeroRedundancyOptimizer` implementation assumes that all model parameters are stored on the same device (due to the recent [refactor](https://github.com/pytorch/pytorch/pull/59834)). This change allows model parameters to be sharded across multiple devices, as in the DDP with Model Parallelism example [here](https://pytorch.org/tutorials/intermediate/ddp_tutorial.html).

The only logic affected is the bucketing strategy used when `parameters_as_bucket_view=True`. Let `n` denote the world size and `k` denote the number of devices per process.
- Previously, `k = 1`, and `self._buckets` was a `List[torch.Tensor]`, where `self._buckets[j]` is a tensor (i.e. bucket) containing the parameters assigned to rank `j` for `j = 0, ..., n - 1`.
- Now, `self._buckets` is a `List[List[torch.Tensor]]`, where `self._buckets[i][j]` is a tensor containing the parameters stored on device `i` assigned to rank `j` for `i = 0, ..., k - 1` and `j = 0, ..., n - 1`.

This bucket construction uses an auxiliary data structure `self._device_to_per_rank_params`, which is a `Dict[torch.device, List[List[torch.Tensor]]]`. It maps:
- `dev_0` to `[rank 0's assigned parameters on dev_0, rank 1's assigned parameters on dev_1, ...]`,
- `...`
- `dev_{k-1}` to `[rank 0's assigned parameters on dev_{k-1}, rank 1's assigned parameters on dev_{k-1}, ...]`

I removed the invariant checker `_verify_same_param_device()` and its corresponding test since it is no longer an invariant.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61370

Test Plan: I added a new test `test_zero_model_parallel()` that checks for parity between a DDP model with model parallelism using `ZeroRedundancyOptimizer` and a local model with the same architecture using a local optimizer. I also verified that the existing tests still pass.

Reviewed By: soulitzer

Differential Revision: D29637132

Pulled By: andwgu

fbshipit-source-id: 07112959fa4e94a3f40e67e88cbb58ce3cd1e033
2021-07-09 14:27:47 -07:00
fb7ed24f6e [PyTorch] Try using ExclusivelyOwned in LinearAlgebra (#59420)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59420

This is a sample of how we might use ExclusivelyOwned on an opt-in basis.
ghstack-source-id: 133089540

Test Plan:
1) CI to run regression tests
2) Spot-checked assembly for linalg_det_out. Rather than calling the intrusive_ptr dtor, we get the ExclusivelyOwned dtor inline. In particular, we do not get any atomic refcount decrement instructions emitted.
3) TODO: some kind of perf profiling; advice welcome

Reviewed By: ezyang

Differential Revision: D28885313

fbshipit-source-id: ae4b39ed738c41d0c4a4509a5199c040ba9aa63a
2021-07-09 14:27:45 -07:00
a5c5b56cf5 gen ExclusivelyOwned in structured kernels (#59827)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59827

ghstack-source-id: 133089541

Test Plan: existing CI

Reviewed By: ezyang, janeyx99

Differential Revision: D28965922

fbshipit-source-id: ffbc1d43e5d3ab3abfad3b0830b4da1ce899f505
2021-07-09 14:26:37 -07:00
711ded688d Add a script to codemod max_tokens_total pragmas to C/C++ files (#61369)
Summary:
This PR adds a new script: `max_tokens_pragmas.py`

This is a utility script that can add/remove `max_tokens_total` pragmas from the codebase.

- [x] Implement script and test manually
- [x] Write test script

Examples:
First, change directories
```bash
cd tools/linter/clang_tidy
```

Then run the following:
```bash
cat << EOF > test/test1.cpp
// File without any prior pragmas

int main() {
    for (int i = 0; i < 10; i++);
    return 0;
}
EOF

cat << EOF > test/test2.cpp
// File with prior pragmas

#pragma clang max_tokens_total 1

int main() {
    for (int i = 0; i < 10; i++);
    return 0;
}
EOF

cat << EOF > test/test3.cpp
// File with multiple prior pragmas

#pragma clang max_tokens_total 1

// Different pragma; script should ignore this
#pragma clang max_tokens_here 20

int main() {
    for (int i = 0; i < 10; i++);
    return 0;
}

#pragma clang max_tokens_total 1
EOF

# Add pragmas to some files
python3 max_tokens_pragma.py --num-max-tokens 42 test/*.cpp
grep "#pragma clang max_tokens_total 42" test/*.cpp

# Remove pragmas from files
python3 max_tokens_pragma.py --strip test/*.cpp
grep "#pragma clang max_tokens_total 42" test/*.cpp # should fail

# Ignore files
python3 max_tokens_pragma.py --num-max-tokens 42 test/*.cpp --ignores test/test2.cpp
grep "#pragma clang max_tokens_total 42" test/*.cpp # should not list `test/test2.cpp`
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61369

Test Plan: `tools/linter/clang_tidy/test/test_max_tokens_pragma.py`

Reviewed By: malfet

Differential Revision: D29604291

Pulled By: 1ntEgr8

fbshipit-source-id: 3efe52573583769041a07e6776161d4d5bbf16a7
2021-07-09 13:30:52 -07:00
3b004aed3a Enable local clang-tidy lint (#61121)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61121

This change enables the make target to run clang-tidy locally

Test Plan:
Run this command
```
make clang-tidy
```
This should run `clang-tidy` on the paths and filters specified in `tools/linter/clang_tidy/__main__.py`

Quicklint
```
make quicklint
```
This should report "No files detected" if no c/cpp files are altered.

Reviewed By: soulitzer

Differential Revision: D29598927

Pulled By: 1ntEgr8

fbshipit-source-id: aa443030494fed92c313da4b203a5450be09fa38
2021-07-09 13:30:50 -07:00
8296cb37c7 [torchelastic] Set the correct maximum border width
Summary: The diff sets the correct maximum width for the border delimiters between error sections

Test Plan: Example of the uncontrolled border: https://www.internalfb.com/intern/testinfra/diagnostics/7599824415964133.844424970500348.1625590344/

Reviewed By: kiukchung

Differential Revision: D29636814

fbshipit-source-id: 95465d3150066bff82dc7499bb1c63ea4f5ebc2d
2021-07-09 13:29:23 -07:00
6bb33d93ab disable the format library in C10 (#60052)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60052

Introduction:
We would like to use a minimal implementation of C10 for our SGX port of PyTorch. This includes disabling signal handlers and the fmt library.

Problem:
When C10_SUPPORTS_SIGNAL_HANDLER is disabled, there is no reason to have fmt enabled, as it is used only in stacktraceSignalHandler. The problem is that fmt/format.h is included regardless of whether C10_SUPPORTS_SIGNAL_HANDLER is defined.

Solution:
Move the #include <fmt/format.h> inside the #ifdef section of code where C10_SUPPORTS_SIGNAL_HANDLER is checked.

Test Plan: Run the pytorch unit tests.

Reviewed By: h397wang, LiJihang

Differential Revision: D29022628

fbshipit-source-id: 638cf98381585cd6059129d9c5a65d9e6a841575
2021-07-09 12:28:19 -07:00
b01329b164 [xplat] Update XNNPACK to github revision 79cd5f9 (#61400)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61400

allow-large-files Update XNNPACK to github revision 79cd5f9.

Test Plan:
Spark apps build works.

Hand tracking works:

https://pxl.cl/1L76g

Reviewed By: dreiss

Differential Revision: D29385882

fbshipit-source-id: 6be920a68b876faedf7e86e33df43f8b1db14a4d
2021-07-09 12:28:16 -07:00
86463a8d02 Save some little memory in default_collate (#61424)
Summary:
The memory savings can be significant when there are many workers and a large batch size.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61424

Reviewed By: soulitzer

Differential Revision: D29635477

Pulled By: ejguan

fbshipit-source-id: 1fc48b5964e873bd8833ad81bed9d51b0b6d137e
2021-07-09 12:27:07 -07:00
c830db0265 Raise error in CMake for CUDA <9.2 (#61462)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61462

Anything before CUDA 9.2 is not supported (see https://github.com/pytorch/pytorch/pull/36848), and perhaps not even that.
ghstack-source-id: 133312018

Test Plan: CI

Reviewed By: samestep

Differential Revision: D29637251

fbshipit-source-id: 4300169b7298274b2074649342902a34bd2220b5
2021-07-09 11:28:38 -07:00
b5c464d5ef Make Future store weak pointers to storages (#60943)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60943

In https://github.com/pytorch/pytorch/pull/60470 we made Future store Storages rather than store references to their DataPtrs (because these references could go stale...). However this meant that the Future could keep the Storage alive, and thus keep its memory allocated, even after the user was done with it. We fix it here by instead storing a weak ptr to that Storage (well, in fact to the StorageImpl, but it's the same).
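
A Python analogy of the semantics (the real code stores a weak pointer to the C++ `StorageImpl`; this sketch only illustrates the lifetime behavior):
```python
import weakref

class StorageStandIn:
    """Stand-in for a StorageImpl, just to have something weakly referable."""

s = StorageStandIn()
w = weakref.ref(s)   # roughly what the Future now holds
assert w() is s      # storage still alive: the Future can use it
del s                # the user is done with the storage
assert w() is None   # the Future no longer keeps the memory allocated
```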
ghstack-source-id: 133295799

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D29454104

fbshipit-source-id: d36dee00a4841c087bb7b3f5bc39e0459f209cdb
2021-07-09 11:28:36 -07:00
962c9fbf85 [pruner] add handles for hooks (#61425)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61425

Adding handles for the activation-reconstruction and bias forward hooks so they can be removed later
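
The underlying pattern, sketched on a toy module (not the pruner code itself): keep the `RemovableHandle` returned at registration so the hook can be detached later.
```python
import torch

m = torch.nn.Linear(2, 2)
handle = m.register_forward_hook(lambda mod, inp, out: out)  # returns a RemovableHandle
m(torch.rand(1, 2))  # hook fires on this forward pass
handle.remove()      # hook is detached and will not fire again
```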
ghstack-source-id: 133244536

Test Plan:
This change should not affect behavior yet, but to double check:

`buck test mode/dev-nosan //caffe2/test:ao -- TestBasePruner`

https://pxl.cl/1LpM9

Reviewed By: z-a-f

Differential Revision: D29619720

fbshipit-source-id: c7428d2d0325cd11ce7919e0b67321e8cc196041
2021-07-09 11:28:35 -07:00
682ebc1dd1 remove UsageError in favor of ValueError (#61031)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61031

See https://github.com/pytorch/pytorch/pull/58916#issuecomment-868519515.

Test Plan: Imported from OSS

Reviewed By: iramazanli

Differential Revision: D29626810

Pulled By: mruberry

fbshipit-source-id: 25ddf26815f9ef82b8234d7dac811a6a13a53c54
2021-07-09 11:28:33 -07:00
5401dd2f9a change language from array to tensor (#60639)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60639

Test Plan: Imported from OSS

Reviewed By: iramazanli

Differential Revision: D29626812

Pulled By: mruberry

fbshipit-source-id: 1b0e78426fd08d7b72d890adc9811d31afd805fe
2021-07-09 11:28:31 -07:00
09c90b3589 relax type equality constraint (#60638)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60638

Initial proposal in https://github.com/pytorch/pytorch/pull/58981#issuecomment-866690334. Opposed to the proposal, this PR only allows relaxing the type equality constraint to a common superclass constraint, for example `torch.Tensor` vs `torch.nn.Parameter`. Inputs that do not share a common superclass will still fail.
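
For instance (a minimal illustration of the relaxed rule, not the actual implementation):
```python
import torch

t = torch.rand(2)
p = torch.nn.Parameter(torch.rand(2))
assert isinstance(p, torch.Tensor)  # Parameter subclasses Tensor
# Under the relaxed constraint the pair (t, p) passes the type check because
# both are torch.Tensor instances; inputs with no common superclass still fail.
```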

Test Plan: Imported from OSS

Reviewed By: soulitzer

Differential Revision: D29626811

Pulled By: mruberry

fbshipit-source-id: 1916c3b710d38889de7ce57eb0770c76cbbb8166
2021-07-09 11:27:32 -07:00
24a8915534 Relax use-count check to allow for 0 (#61414)
Summary:
Previously we required the tensor use count to be exactly 1. We should actually allow the use count to be zero as well. The use count is zero when an undefined tensor is returned, and this is common in backward functions that have multiple outputs.

In this PR I also remove some entries from the skip list that should be covered by this change: they return multiple tensors AND are backward functions. Batch norm is also known to return undefined tensors when `training=False`.
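
At the Python level an undefined gradient surfaces as `None` (a minimal illustration, not the test itself):
```python
import torch

x = torch.rand(3, requires_grad=True)
w = torch.rand(3)  # requires_grad=False
(x * w).sum().backward()
print(x.grad)  # a defined tensor
print(w.grad)  # None: the backward produced an undefined tensor for w
```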

Related issue: https://github.com/pytorch/pytorch/issues/60426

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61414

Reviewed By: albanD

Differential Revision: D29614687

Pulled By: soulitzer

fbshipit-source-id: ab0892aed4bd1346b50b0a9552ffcc3287ac96af
2021-07-09 10:28:12 -07:00
9e533a62f6 Make conv2d nnapi converter accept flexible batch (#61021)
Summary:
Same as title

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61021

Test Plan: pytest test/test_nnapi.py::TestNNAPI

Reviewed By: anshuljain1

Differential Revision: D29480746

fbshipit-source-id: 7217c8f3a811db8c3c373f3e7ca31caf9502ef22
2021-07-09 10:28:10 -07:00
64d61901eb [ROCm] Skip test_masked_scatter_large_tensor_cuda (#61313)
Summary:
Refer https://github.com/pytorch/pytorch/issues/60190. Skipping unit test until hipcub issue is fixed.

Signed-off-by: Jagadish Krishnamoorthy <jagdish.krishna@gmail.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61313

Reviewed By: iramazanli

Differential Revision: D29626664

Pulled By: malfet

fbshipit-source-id: db2a390d2a3e28ec05a5032a50aa9a35c86b96ca
2021-07-09 10:27:08 -07:00
ee2dd35ef4 Resolving native dependency and try_run for cross compile (#59764)
Summary:
This is a PR on build system that provides support for cross compiling on Jetson platforms.

The major change is:

1. Disable try-runs for cross compiling in `COMPILER_WORKS`, `BLAS`, and `CUDA`, since try-runs cannot be performed in a cross-compile setup.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59764

Reviewed By: soulitzer

Differential Revision: D29524363

Pulled By: malfet

fbshipit-source-id: f06d1ad30b704c9a17d77db686c65c0754db07b8
2021-07-09 09:29:21 -07:00
8bd3e52e00 Add conv2d transpose NNAPI converter (#59529)
Summary:
* Conv2d transpose support
* Quantize WIP

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59529

Test Plan: pytest test/test_nnapi.py::TestNNAPI::test_conv2d_transpose

Reviewed By: anshuljain1

Differential Revision: D28926335

fbshipit-source-id: 8f90182f96cee0a13c4f38331d421e1e8ac618de
2021-07-09 09:29:20 -07:00
c19adfff54 [DataLoader] Introduce ConcatMapDataPipe functional datapipe (#61010)
Summary:
As part of https://github.com/pytorch/pytorch/issues/57031, this PR adds the ConcatMapDataPipe functional datapipe for the MapDataPipe class.

We may need to discuss how to treat datapipes with no valid length. For now, I just treat them as if they have infinite length, so `__getitem__` can never index past them into a subsequent datapipe.
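
A minimal sketch of that indexing semantics (a hypothetical class, not the actual `ConcatMapDataPipe` implementation):
```python
class ConcatMapSketch:
    def __init__(self, *dps):
        self.dps = dps

    def __getitem__(self, idx):
        for dp in self.dps:
            try:
                length = len(dp)
            except TypeError:
                return dp[idx]  # no valid length: treated as infinite
            if idx < length:
                return dp[idx]
            idx -= length  # move past this finite datapipe
        raise IndexError("index out of range")

    def __len__(self):
        return sum(len(dp) for dp in self.dps)

assert ConcatMapSketch([0, 1], [2, 3])[2] == 2  # second pipe's first element
```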

Thank you for your time reviewing this~

cc ejguan

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61010

Reviewed By: soulitzer

Differential Revision: D29587679

Pulled By: ejguan

fbshipit-source-id: 5eb97fa727209bec6c534520057c64a78000626e
2021-07-09 09:29:18 -07:00
2bbcc80de3 Enable disabling test cases on specific platforms (#61427)
Summary:
This adds functionality to our common_utils.py to allow disabling test cases for platforms Mac, Windows, and Linux.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61427

Test Plan:
CI should not change as no issues currently have the line "Platforms:..."

I tested locally by making sure `test_async_script` is skipped while running `python test/test_jit.py -k TestAsync.test_async_script` with a cached modified `.pytorch-disabled-tests.json`:
```
{
  "total_count": 32,
  "incomplete_results": false,
  "items": [
    {
      "url": "https://api.github.com/repos/pytorch/pytorch/issues/60652",
      "repository_url": "https://api.github.com/repos/pytorch/pytorch",
      "labels_url": "https://api.github.com/repos/pytorch/pytorch/issues/60652/labels{/name}",
      "comments_url": "https://api.github.com/repos/pytorch/pytorch/issues/60652/comments",
      "events_url": "https://api.github.com/repos/pytorch/pytorch/issues/60652/events",
      "html_url": "https://github.com/pytorch/pytorch/issues/60652",
      "id": 929288995,
      "node_id": "MDU6SXNzdWU5MjkyODg5OTU=",
      "number": 60652,
      "title": "DISABLED test_async_script (jit.test_async.TestAsync)",
      "user": {
        "login": "ezyang",
        "id": 13564,
        "node_id": "MDQ6VXNlcjEzNTY0",
        "avatar_url": "https://avatars.githubusercontent.com/u/13564?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/ezyang",
        "html_url": "https://github.com/ezyang",
        "followers_url": "https://api.github.com/users/ezyang/followers",
        "following_url": "https://api.github.com/users/ezyang/following{/other_user}",
        "gists_url": "https://api.github.com/users/ezyang/gists{/gist_id}",
        "starred_url": "https://api.github.com/users/ezyang/starred{/owner}{/repo}",
        "subscriptions_url": "https://api.github.com/users/ezyang/subscriptions",
        "organizations_url": "https://api.github.com/users/ezyang/orgs",
        "repos_url": "https://api.github.com/users/ezyang/repos",
        "events_url": "https://api.github.com/users/ezyang/events{/privacy}",
        "received_events_url": "https://api.github.com/users/ezyang/received_events",
        "type": "User",
        "site_admin": false
      },
      "labels": [
        {
          "id": 1301397902,
          "node_id": "MDU6TGFiZWwxMzAxMzk3OTAy",
          "url": "https://api.github.com/repos/pytorch/pytorch/labels/module:%20flaky-tests",
          "name": "module: flaky-tests",
          "color": "f7e101",
          "default": false,
          "description": "Problem is a flaky test in CI"
        },
        {
          "id": 679953883,
          "node_id": "MDU6TGFiZWw2Nzk5NTM4ODM=",
          "url": "https://api.github.com/repos/pytorch/pytorch/labels/oncall:%20distributed",
          "name": "oncall: distributed",
          "color": "f7e101",
          "default": false,
          "description": "Add this issue/PR to distributed oncall triage queue"
        }
      ],
      "state": "open",
      "locked": false,
      "assignee": {
        "login": "rohan-varma",
        "id": 8039770,
        "node_id": "MDQ6VXNlcjgwMzk3NzA=",
        "avatar_url": "https://avatars.githubusercontent.com/u/8039770?v=4",
        "gravatar_id": "",
        "url": "https://api.github.com/users/rohan-varma",
        "html_url": "https://github.com/rohan-varma",
        "followers_url": "https://api.github.com/users/rohan-varma/followers",
        "following_url": "https://api.github.com/users/rohan-varma/following{/other_user}",
        "gists_url": "https://api.github.com/users/rohan-varma/gists{/gist_id}",
        "starred_url": "https://api.github.com/users/rohan-varma/starred{/owner}{/repo}",
        "subscriptions_url": "https://api.github.com/users/rohan-varma/subscriptions",
        "organizations_url": "https://api.github.com/users/rohan-varma/orgs",
        "repos_url": "https://api.github.com/users/rohan-varma/repos",
        "events_url": "https://api.github.com/users/rohan-varma/events{/privacy}",
        "received_events_url": "https://api.github.com/users/rohan-varma/received_events",
        "type": "User",
        "site_admin": false
      },
      "assignees": [
        {
          "login": "rohan-varma",
          "id": 8039770,
          "node_id": "MDQ6VXNlcjgwMzk3NzA=",
          "avatar_url": "https://avatars.githubusercontent.com/u/8039770?v=4",
          "gravatar_id": "",
          "url": "https://api.github.com/users/rohan-varma",
          "html_url": "https://github.com/rohan-varma",
          "followers_url": "https://api.github.com/users/rohan-varma/followers",
          "following_url": "https://api.github.com/users/rohan-varma/following{/other_user}",
          "gists_url": "https://api.github.com/users/rohan-varma/gists{/gist_id}",
          "starred_url": "https://api.github.com/users/rohan-varma/starred{/owner}{/repo}",
          "subscriptions_url": "https://api.github.com/users/rohan-varma/subscriptions",
          "organizations_url": "https://api.github.com/users/rohan-varma/orgs",
          "repos_url": "https://api.github.com/users/rohan-varma/repos",
          "events_url": "https://api.github.com/users/rohan-varma/events{/privacy}",
          "received_events_url": "https://api.github.com/users/rohan-varma/received_events",
          "type": "User",
          "site_admin": false
        }
      ],
      "milestone": null,
      "comments": 0,
      "created_at": "2021-06-24T14:28:33Z",
      "updated_at": "2021-06-24T16:40:42Z",
      "closed_at": null,
      "author_association": "CONTRIBUTOR",
      "active_lock_reason": null,
      "body": "Platforms:Mac, windows, Linux\r\n```\r\nJun 24 00:59:14 ======================================================================\r\nJun 24 00:59:14 ERROR [0.477s]: test_async_script (__main__.ProcessGroupGlooWrapperTest)\r\nJun 24 00:59:14 ----------------------------------------------------------------------\r\nJun 24 00:59:14 Traceback (most recent call last):\r\nJun 24 00:59:14   File \"/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_distributed.py\", line 398, in wrapper\r\nJun 24 00:59:14     self._join_processes(fn)\r\nJun 24 00:59:14   File \"/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_distributed.py\", line 590, in _join_processes\r\nJun 24 00:59:14     self._check_return_codes(elapsed_time)\r\nJun 24 00:59:14   File \"/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_distributed.py\", line 633, in _check_return_codes\r\nJun 24 00:59:14     raise RuntimeError(error)\r\nJun 24 00:59:14 RuntimeError: Process 0 exited with error code 10 and exception:\r\nJun 24 00:59:14 RuntimeError: [/var/lib/jenkins/workspace/third_party/gloo/gloo/transport/tcp/pair.cc:598] Connection closed by peer [172.17.0.2]:21400\r\nJun 24 00:59:14 \r\nJun 24 00:59:14 During handling of the above exception, another exception occurred:\r\nJun 24 00:59:14 \r\nJun 24 00:59:14 Traceback (most recent call last):\r\nJun 24 00:59:14   File \"/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_distributed.py\", line 516, in run_test\r\nJun 24 00:59:14     getattr(self, test_name)()\r\nJun 24 00:59:14   File \"/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_distributed.py\", line 400, in wrapper\r\nJun 24 00:59:14     fn()\r\nJun 24 00:59:14   File \"distributed/test_pg_wrapper.py\", line 270, in test_collective_hang\r\nJun 24 00:59:14     self._test_collective_hang(pg)\r\nJun 24 00:59:14   File \"distributed/test_pg_wrapper.py\", line 52, in _test_collective_hang\r\nJun 24 00:59:14     wrapper_pg.allreduce([tensor])\r\nJun 24 00:59:14   File \"/opt/conda/lib/python3.6/unittest/case.py\", line 217, in __exit__\r\nJun 24 00:59:14     expected_regex.pattern, str(exc_value)))\r\nJun 24 00:59:14   File \"/opt/conda/lib/python3.6/unittest/case.py\", line 135, in _raiseFailure\r\nJun 24 00:59:14     raise self.test_case.failureException(msg)\r\nJun 24 00:59:14 AssertionError: \"Ranks 1 failed to pass monitoredBarrier\" does not match \"[/var/lib/jenkins/workspace/third_party/gloo/gloo/transport/tcp/pair.cc:598] Connection closed by peer [172.17.0.2]:21400\"\r\n```\r\n\r\nhttps://www.internalfb.com/intern/opensource/ci/job/log/225221175921058/\n\ncc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse agolynski SciPioneer H-Huang mrzzd cbalioglu gcramer23",
      "performed_via_github_app": null,
      "score": 0.0
    }
  ]
}
```

Reviewed By: iramazanli

Differential Revision: D29627799

Pulled By: janeyx99

fbshipit-source-id: 5ef79127cbe0055c4f41766048e66f98cf80d2c4
2021-07-09 09:29:16 -07:00
e9a40de1af Add other Linux GPU auxiliary test jobs (#61055)
Summary:
- [x] add the jobs to the matrix
  - [x] `jit_legacy`
  - [x] `nogpu_NO_AVX`
  - [x] `nogpu_NO_AVX2`
  - [x] `slow`
- [x] use the test config properly to enable the different test conditions
- [x] validate that it works
- [x] disable on pull requests before merging

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61055

Test Plan: CI. Example run: https://github.com/pytorch/pytorch/actions/runs/1013240987

Reviewed By: walterddr

Differential Revision: D29594080

Pulled By: samestep

fbshipit-source-id: 02c531ebc42feae81ecaea0785915f95e0f53ed7
2021-07-09 09:29:15 -07:00
c966ce6933 Fix several test_ops cuda dtypes tests (#60922)
Summary:
Close https://github.com/pytorch/pytorch/issues/60443

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60922

Reviewed By: jdonald, iramazanli

Differential Revision: D29630122

Pulled By: mruberry

fbshipit-source-id: 441f79828860282e5849a2565facf9e7f72912e8
2021-07-09 09:29:13 -07:00
5e9bcf9101 fix: support removing hook in the hook (#61250)
Summary:
Fixes: https://github.com/pytorch/pytorch/issues/58354

Problem:
Once a hook is called
05c1e5b655/torch/csrc/autograd/python_hook.cpp (L51-L54)

If the hook calls `handle.remove()` while executing, and there are no other references to the hook function object, then Python is free to garbage-collect it.

At the subsequent call to
05c1e5b655/torch/csrc/autograd/python_hook.cpp (L54)

we have `hook` pointing to invalid memory

Thus when we try to fetch the name for `hook` from `check_single_result` with
05c1e5b655/torch/csrc/autograd/python_hook.cpp (L175-L177)
we get a segfault.

Solution:
Temporarily extend the lifetime of the hook with `Py_INCREF` until we have verified the result.
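
The pattern this makes safe (a sketch of the scenario from the issue; runnable on builds with the fix):
```python
import torch

t = torch.ones(2, requires_grad=True)

def hook(grad):
    handle.remove()  # drops what may be the last strong reference to `hook`
    return grad * 2

handle = t.register_hook(hook)
del hook  # autograd now holds the only reference to the hook function
t.sum().backward()  # previously could segfault; the hook is now kept alive
```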

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61250

Reviewed By: iramazanli

Differential Revision: D29623826

Pulled By: soulitzer

fbshipit-source-id: c71322311f19066cafb7203980668868c59d4e5e
2021-07-09 09:27:58 -07:00
179249084b Refactor DDP join() API, adding hooks (#60757)
Summary:
Targets https://github.com/pytorch/pytorch/issues/54318.

**Overview:**
DDP offers a `join()` context manager to accommodate training on uneven inputs. This PR creates a new generic `_Join()` API permitting custom hooks, refactors DDP `join()` to call this generic `_Join()`, and implements a hook for ZeRO. (For now, the generic `_Join()` is implemented as private, but this may change after design discussions are settled.)

There are two classes introduced: `_JoinHook`, the class defining the customizable join hook, and `_Join`, the generic join context manager.

The `_JoinHook` provides two entry points: `main_hook()`, which is called repeatedly while there exists a non-joined process, and `post_hook()`, which is called once all processes have joined, with the additional `bool` argument `is_last_joiner`. The class also requires `process_group` and `device` information by defining corresponding abstract property methods. Thus, to implement a join hook, (1) inherit from `_JoinHook`, (2) override `main_hook()` and `post_hook()` as appropriate, and (3) override `process_group()` and `device()` to provide the process group and device information to be used by the join context manager implementation for collective communications.

The `_Join` constructor requires `join_hooks: List[_JoinHook]` and optionally `enable: bool = True` and `throw_on_early_termination: bool = False`. A training loop only needs to be wrapped with `with _Join(join_hooks):` (using the appropriate `join_hooks`) to be able to train on uneven inputs without hanging/erroring. The context manager requires a `dist.all_reduce(torch.ones(1))` to be called on every non-joined process each time before it performs its collective communications in order to indicate that the process has not yet joined. It also requires that all `process_group` attributes in the `_JoinHook` objects are the same.
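
A self-contained sketch of that interface, mirroring the description above (the real classes live in `torch.distributed` and are private):
```python
from abc import ABC, abstractmethod

class JoinHookSketch(ABC):
    @abstractmethod
    def main_hook(self):
        """Shadow the collective communication once per iteration while at
        least one other process has not yet joined."""

    @abstractmethod
    def post_hook(self, is_last_joiner: bool):
        """Run once after all processes have joined; `is_last_joiner` can
        identify an authoritative rank to synchronize from."""

    @property
    @abstractmethod
    def process_group(self):
        """Process group used for the context manager's collectives."""

    @property
    @abstractmethod
    def device(self):
        """Device used for the context manager's collectives."""
```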

**Notes:**
- The argument `is_last_joiner` to `post_hook()` may be useful for finding an authoritative rank when synchronizing.
- `enable` is a flag that can be set to `False` if the user knows the current training loop will not have uneven inputs. This may be used to disable join-related computation in the classes providing join hooks.
- `throw_on_early_termination` is a flag that can be set to `True` to notify processes to terminate upon detecting uneven inputs (i.e. upon the first process joining when there exists a non-joined process). Notably, the notification requires an all-reduce, so to prevent hanging/erroring, non-joined processes must participate in the all-reduce. The first-joining process raises a `RuntimeError`, and the other processes are expected (but not required) to do the same. This may be used to implement training on uneven inputs in cases that do not conform to the generic join context manager (e.g. `SyncBatchNorm`).
- Classes providing a join hook should do so via a `_join_hook()` method that returns a `_JoinHook` instance with the methods appropriately overridden.
- If there are multiple join hooks, the device specified by the first is used by the join context manager implementation to perform its collective communications.
- If there are multiple join hooks, both the main and post-hooks are iterated in the order in which the `_JoinHook` objects are passed into the context manager constructor.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60757

Test Plan:
The current implementation preserves backward compatibility by not changing the existing DDP `join()` API at all. To check this, I ran through the uneven input tests (`test_ddp_grad_div_uneven_inputs`, `test_ddp_uneven_inputs_stop_iteration_sync_bn`, `test_ddp_uneven_inputs`, `test_ddp_uneven_input_join_disable`, `test_ddp_uneven_input_exception`) on the AI AWS cluster:
```
touch /tmp/barrier && TEMP_DIR="/tmp" BACKEND="nccl" WORLD_SIZE="2" gpurun python test/distributed/test_distributed_fork.py --
```

Because the existing DDP join logic does not provide correct gradients to the joined processes if `gradient_as_bucket_view=False` and a joined process requires those gradients to correctly update its shard of the parameters in `ZeroRedundancyOptimizer.step()`, DDP and ZeRO are not fully compatible at the moment. To work around this and to test ZeRO's join hook separately, I added a test `_test_zero_join()` (with `test_zero_join_gpu()` and `test_zero_join_cpu()` flavors), which compares DDP with a local optimizer on uneven inputs against ZeRO on uneven inputs with the gradients set manually.

Reviewed By: iramazanli, mrshenli

Differential Revision: D29624636

Pulled By: andwgu

fbshipit-source-id: ec70a290e02518b0d8b683f9fed2126705b896c7
2021-07-09 08:29:20 -07:00
8423ab4f99 Fix CosineAnnealingWarmRestart annotation (#61106)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/44770.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61106

Reviewed By: 1ntEgr8

Differential Revision: D29635764

Pulled By: walterddr

fbshipit-source-id: ddc45a7f04532a76d033ae7774706da1fa8608f7
2021-07-09 08:28:18 -07:00
9b908ab0d0 [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D29631829

fbshipit-source-id: 6cef1a3a091bdf0e10838d05b2e82fc0760ebe48
2021-07-09 05:28:44 -07:00
819bac63ff [Codemod][FBSourceBlackLinter] Daily arc lint --take BLACK
Reviewed By: zertosh

Differential Revision: D29632524

fbshipit-source-id: 3eccc1804a7bf953480b9754f68ea56a2a8e3fd8
2021-07-09 05:27:29 -07:00
14f63763c1 Avoid using mp.Manager to report #GPUs needed in dist tests (#61409)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61409

We used a multiprocessing.Manager in order to share TEST_SKIPS between the parent and the child processes. TEST_SKIPS is a global variable that defines a unique error code for each "error type", so that the parent can figure out the reason a child exited. While originally this mapping was immutable, at some point we allowed children to modify the parent's value of that mapping so they could update the message for the `multi-gpu` error to make it reflect how many GPUs were really needed. This occurred in D23285790 (2a4d312027). Since then this Manager proved to be quite problematic, especially around thread safety, races, TSAN, ... (see D22753459 (f0c46878c6), D23641618 (567c51cce9), D28490129, D28794321 (0128eb9a85) and D29585862). This seems like an awful lot of trouble for such a small functionality. Here I propose we drop Manager and instead get the same result by using separate error codes for each number of GPUs. It should be much simpler and thus more robust.
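
A sketch of the simpler scheme (illustrative names and exit codes, not the actual test-suite values):
```python
# one distinct exit code per required GPU count, so the parent can decode the
# skip reason from the child's exit status without any shared state
MULTI_GPU_EXIT_CODES = {n: 80 + n for n in range(2, 9)}

def decode_child_exit(code):
    for ngpus, c in MULTI_GPU_EXIT_CODES.items():
        if c == code:
            return f"skipped: test requires {ngpus} GPUs"
    return None

assert decode_child_exit(84) == "skipped: test requires 4 GPUs"
```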
ghstack-source-id: 133236447

Test Plan: CI

Reviewed By: pritamdamania87

Differential Revision: D29612614

fbshipit-source-id: 8ad0fedcb7796e5832a0eb196f8fdc147e02b3df
2021-07-09 01:29:35 -07:00
905cd6733e [DDP Comm Hook] Re-enable the optimization of fusing copy and division when no comm hook is specified (#61379)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61379

The optimization was accidentally removed in https://github.com/pytorch/pytorch/pull/59574

This optimization can help save a scan over all the input parameters, by fusing copy and div operations.

Now the default temporary hook is allreduce by sum, and no extra division is done inside the hook.
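
A toy illustration of the fused scan (assumed shapes; the real kernel lives in the C++ reducer):
```python
import torch

world_size = 4
grads = [torch.ones(4), torch.ones(8)]
bucket = torch.empty(sum(g.numel() for g in grads))

offset = 0
for g in grads:
    n = g.numel()
    # divide while writing into the flat bucket, instead of copying everything
    # first and then making a second full pass to divide
    torch.div(g.reshape(-1), world_size, out=bucket[offset:offset + n])
    offset += n
assert torch.allclose(bucket, torch.full((12,), 0.25))
```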
ghstack-source-id: 133288529

Test Plan:
buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_accumulate_gradients_no_sync
buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_ddp_grad_div_uneven_inputs
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_fp16
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_fp16_grad_is_view
buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork --  test_DistributedDataParallel_non_default_stream

buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_sparse_gradient

buck test mode/dev-nosan caffe2/test/distributed:c10 -- test_ddp_checkpointing_once
buck test mode/dev-nosan caffe2/test/distributed:c10 -- test_ddp_checkpointing_twice

Reviewed By: rohan-varma

Differential Revision: D29597614

fbshipit-source-id: 2434e4fd4e6abad7871cfe47886fe97b6e4ba28f
2021-07-09 01:29:33 -07:00
8f61d94610 Fix a variable initialization (#60896)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60896

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D29431625

fbshipit-source-id: 076d5ed350507b3ab1f14c1a5c7700de0427eefc
2021-07-09 01:29:31 -07:00
15010bf223 Make some downcast issues explicit (#60412)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60412

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D29243195

fbshipit-source-id: c508b729d6a0e6f8a591521bce788e6cfd8531f8
2021-07-09 01:29:29 -07:00
6a3170dba1 [package] minor cleanups to internal APIs (#61428)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61428

I was reading this code again after a while and didn't understand as
quickly as I would have liked. Some of the function names are no longer
accurate, etc.

This PR renames these functions to use the same language of
"dependencies" that the rest of the API uses. I think the resulting
usage of the APIs is clearer than before.

Test Plan: Imported from OSS

Reviewed By: Chillee

Differential Revision: D29620946

Pulled By: suo

fbshipit-source-id: 7df640a7ffbd43998063b9ee3955c9dfcbc42cfb
2021-07-09 01:28:24 -07:00
d52ebf2b1b conv2d (#61093)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61093

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D29562478

Pulled By: migeed-z

fbshipit-source-id: d41f3a9526ee52a9571cb861be03bf9ae176a373
2021-07-08 20:29:32 -07:00
5fbc853c5f [package] PackageExporter remove verbose mode (#61145)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61145

Remove 'verbose' mode from PackageExporter as people have complained that it is not useful.

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D29559681

Pulled By: Lilyjjo

fbshipit-source-id: eadb1a3a25fadc64119334a09bf1fa4b355b1edd
2021-07-08 18:26:43 -07:00
a74516d699 [static runtime] implement aten::log (#61393)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61393

Test Plan:
Added `StaticRuntime.IndividualOps_Log`

```
...
[ RUN      ] StaticRuntime.IndividualOps_Log
V0701 12:10:50.829100 3708165 impl.cpp:455] StaticModuleOptions: cleanup_activations 1, enable_out_variant 1, optimize_memory1, optimize_graph_output_memory0
V0701 12:10:50.888468 3708165 impl.cpp:1279] Switch to out variant for node: %3 : Tensor = aten::log(%inp.1)
V0701 12:10:50.889098 3708165 impl.cpp:1279] Switch to out variant for node: %a.1 : Tensor = aten::clone(%3, %2)
```

Reviewed By: hlu1

Differential Revision: D29511622

fbshipit-source-id: 819fd7d90c084609a060efeadb3015e35acac517
2021-07-08 18:25:35 -07:00
06dfaadfc6 update internal function names that apply to both cpu and cuda (#59701)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59701

These functions have been updated to work for cpu and cuda, their names are now changed to reflect that

quantize_per_channel_cpu -> quantize_per_channel
dequantize_quantized_cpu -> dequantize_quantized

(Note: this ignores all push blocking failures!)

Test Plan:
python test/test_quantization.py TestQuantizedTensor

Imported from OSS

Reviewed By: raghuramank100

Differential Revision: D29018270

fbshipit-source-id: 3a0da8d5e3f357dcf19119bcdbc6172d41f2b0c1
2021-07-08 17:26:25 -07:00
8726f08e15 [ONNX] Update documentation (#58712) (#60249)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60249

* Add introductory paragraph explaining what ONNX is and what the
  torch.onnx module does.
* In "Tracing vs Scripting" and doc-string for torch.onnx.export(),
  clarify that exporting always happens on ScriptModules and that
  tracing and scripting are the two ways to produce a ScriptModule.
* Remove examples of using Caffe2 to run exported models.
  Caffe2's website says it's deprecated, so it's probably best not to
  encourage people to use it by including it in examples.
* Remove a lot of content that's redundant:
  * The example of how to mix tracing and scripting, and instead
    link to Introduction to TorchScript, which includes very similar
    content.
  * "Type annotations" section. Link to TorchScript docs which explain
    that in more detail.
  * "Using dictionaries to handle Named Arguments as model inputs"
    section. It's redundant with the description of the `args` argument
    to `export()`, which appears on the same page once the HTML
    is generated.
  * Remove the list of supported Tensor indexing patterns. If it's not
    in the list of unsupported patterns, users can assume it's
    supported, so having both is redundant.
  * Remove the list of supported operators and models.
    I think the list of supported operators is not very useful.
    A list of supported model architectures may be useful, but in
    reality it's already very out of date. We should add it back if
    / when we have a system for keeping it up to date.
  * "Operator Export Type" section. It's redundant with the description
    of the `operator_export_type` arg to to `export()`, which appears on
    the same page once the HTML is generated.
  * "Use external data format" section. It's redundant with the
    description of the `use_external_data_format` arg to `export()`.
  * "Training" section.  It's redundant with the
    description of the `training` arg to `export()`.
* Move the content about different operator implementations producing
  different results from the "Limitations" section into the doc for the
  `operator_export_type` arg.
* Document "quantized" -> "caffe2" behavior of
  OperatorExportTypes.ONNX_ATEN_FALLBACK.
* Combining the text about using torch.Tensor.item() and the text about
  using NumPy types into a section titled
  "Avoid NumPy and built-in Python types", since they're both
  fundamentally about the same issue.
* Rename "Write PyTorch model in Torch way" to "Avoiding Pitfalls".
* Lots of minor fixes: spelling, grammar, brevity, fixing links, adding
  links.
* Clarify limitation on input and output types. Phrasing it in terms of
  PyTorch types is much more accessible than in terms of TorchScript
  types. Also clarify what actually happens when dict and str are used
  as inputs and outputs.
* In Supported operators, use torch function and class names and link
  to them. This is more user friendly than using the internal aten
  op names.
* Remove references to VariableType.h, which doesn't appear to contain
  the information that it once did. Instead refer to the generated
  .pyi files.
* Remove the text in the FAQ about appending to lists within loops.
  I think this limitation is no longer present
  (perhaps since https://github.com/pytorch/pytorch/pull/51577).
* Minor fixes to some code I read along the way.
* Explain the current rationale for the weird ::prim_PythonOp op name.

Test Plan: Imported from OSS

Reviewed By: zou3519, ZolotukhinM

Differential Revision: D29494912

Pulled By: SplitInfinity

fbshipit-source-id: 7756c010b2320de0692369289604403d28877719

Co-authored-by: Gary Miguel <garymiguel@microsoft.com>
2021-07-08 16:29:32 -07:00
00b0d826a1 [ONNX] shape type inference fixes for control flow (#59319) (#60248)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60248

* ~~Allow shape inference to skip for blocks by checking unsupported cases recursively. Currently onnx::Identity would trigger a shape inference failure.~~ Fixed in onnx submodule 1.9.
* Remove previous special post process for if op, since that was for constant folding, and now it is handled elsewhere. Update new post process for control flow nodes to copy assign node shape from subblock output shape correctly.

Test Plan: Imported from OSS

Reviewed By: zou3519, ZolotukhinM

Differential Revision: D29494913

Pulled By: SplitInfinity

fbshipit-source-id: de274a388df86e86403981e1b89b8b4a0d1e26d1

Co-authored-by: BowenBao <bowbao@microsoft.com>
2021-07-08 16:29:30 -07:00
81f95cce59 [ONNX] Extend chunk for dynamic chunk values (#59644) (#60247)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60247

Related to #42785

Test Plan: Imported from OSS

Reviewed By: zou3519, ZolotukhinM

Differential Revision: D29494914

Pulled By: SplitInfinity

fbshipit-source-id: 51ddb876d00185e59cfe54a8af5a9c8dd073a09f

Co-authored-by: Shubham Bhokare <shubhambhokare@gmail.com>
2021-07-08 16:29:28 -07:00
d9dc94406f [ONNX] Add linspace symbolic (#58854) (#60246)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60246

* Adds support for linspace op
* Modifies the arange symbolic in opset 9 to replicate the dtype-determination behavior of opset 11, as documented at https://pytorch.org/docs/stable/generated/torch.arange.html
* Enables some arange unit tests which were disabled for opset 9

Test Plan: Imported from OSS

Reviewed By: zou3519, ZolotukhinM

Differential Revision: D29494911

Pulled By: SplitInfinity

fbshipit-source-id: bddff18a90f8a78121c8ecdd1dafc15c69962d66

Co-authored-by: Shubham Bhokare <shubhambhokare@gmail.com>
2021-07-08 16:29:26 -07:00
4ccfa3ffeb [ONNX] Fix sum export with attribute keepdims (#59316) (#60245)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60245

Fix after b9bdb07a0261ab5a0b1038f290fa03af6ce0415f. This improves the previous fix in two ways:
* Check all dimensions for zero, not only the first, when detecting an empty tensor.
* Do not assume an empty tensor when the shape is not accessible.

Test Plan: Imported from OSS

Reviewed By: zou3519, ZolotukhinM

Differential Revision: D29494917

Pulled By: SplitInfinity

fbshipit-source-id: 02587c3c3be0510312c1a1959f28cab12d81812d

Co-authored-by: BowenBao <bowbao@microsoft.com>
2021-07-08 16:29:24 -07:00
95a7f3ccfe [ONNX] Fix shape inference for large model (#59320) (#60244)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60244

Perform the 2GB size check for protocol buffer serialization at a later stage, to avoid false alarms for cases like shape inference where no serialization actually happens.

Test Plan: Imported from OSS

Reviewed By: zou3519, ZolotukhinM

Differential Revision: D29494910

Pulled By: SplitInfinity

fbshipit-source-id: 4c36d26de9a94e5d6cf78f332d4dffc46588ebf0

Co-authored-by: BowenBao <bowbao@microsoft.com>
2021-07-08 16:29:22 -07:00
9636c077c3 [ONNX] Handle onnx::Size in ComputeConstant folding (#59122) (#60243)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60243

Handle onnx::Size in ComputeConstant folding

Test Plan: Imported from OSS

Reviewed By: zou3519, ZolotukhinM

Differential Revision: D29494915

Pulled By: SplitInfinity

fbshipit-source-id: 9782e356f5e36ae1dd2819412f970010360e9cc0

Co-authored-by: jiafatom <jiafa@microsoft.com>
2021-07-08 16:29:21 -07:00
38c48e42c6 [Reland][BE] add test wall time report (#61389)
Summary:
This is a reland of https://github.com/pytorch/pytorch/issues/61322.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61389

Reviewed By: malfet

Differential Revision: D29601573

Pulled By: walterddr

fbshipit-source-id: dfb2bdc7d72d493c01b9dbac50ef9b79c1782054
2021-07-08 16:29:19 -07:00
7481c6fc02 Bump googletest version to v1.11.0 (#61395)
Summary:
This PR bumps the `googletest` version to v1.11.0.

To facilitate this change, `CAFFE2_ASAN_FLAG` and `CAFFE2_TSAN_FLAG` are divided into corresponding compiler and linker variants. This is required because `googletest v1.11.0` sets the `-Werror` flag. The `-pie` flag is a linker flag, and passing it to a compiler invocation results in a `-Wunused-command-line-argument` warning, which in turn will cause `googletest` to fail to build with ASAN.

Fixes https://github.com/pytorch/pytorch/issues/60865

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61395

Reviewed By: iramazanli

Differential Revision: D29620970

Pulled By: 1ntEgr8

fbshipit-source-id: cdb1d3d12e0fff834c2e62971e42c03f8c3fbf1b
2021-07-08 16:29:17 -07:00
13658b10bb [torch] Various improvements to torch.distributed.launch and torch.distributed.run (#61294)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61294

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60925

* Set the number of restarts for `torch.distributed.launch` to 0
* Remove the unnecessary `--use_env` warning and move the remaining `--use_env` warnings to `torch.distributed.launch`
* Make default log level WARNING
* Add new doc section around transitioning to `torch.distributed.run`
* Make `torch.distributed.launch` not use error-propagation
* Set default events handler to `null` that does not print events to console
* Add reference from `torch.distributed.launch` to `torch.distributed.run`
* Set correct preexec function that sends SIGTERM to child processes when parent dies

Issues resolved:

https://github.com/pytorch/pytorch/issues/60716
https://github.com/pytorch/pytorch/issues/60754

Test Plan:
sandcastle

    python -m torch.distributed.launch --nproc_per_node 2 main.py -> uses 0 restarts
    python -m torch.distributed.run --nproc_per_node 2 main.py -> uses default for torchelastic, 0 restarts

    python -m torch.distributed.launch --nproc_per_node=4  --use_env --no_python  main.py -> produces error
    python -m torch.distributed.launch --nproc_per_node=4  --use_env main.py -> no warning
    python -m torch.distributed.launch --nproc_per_node=4  --no_python  main.py ->warning

Output of running torch.distributed.launch without --use_env:

    $path/torch/distributed/launch.py:173: FutureWarning: The module torch.distributed.launch is deprecated
    and will be removed in future. Use torch.distributed.run.
    Note that --use_env is set by default in torch.distributed.run.
    If your script expects `--local_rank` argument to be set, please
    change it to read from `os.environ('LOCAL_RANK')` instead.

New section:

{F628923078}

{F628974089}

Reviewed By: cbalioglu

Differential Revision: D29559553

fbshipit-source-id: 03ed9ba638bf154354e1530ffc964688431edf6b
2021-07-08 16:28:06 -07:00
10f372601d Support RRefs that contain torch.cuda.Event (#61354)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61354

Test Plan: Imported from OSS

Reviewed By: iramazanli

Differential Revision: D29617155

Pulled By: pbelevich

fbshipit-source-id: 6e56b3fd0a0f93ecec048b58c90f2a47b4cba688
2021-07-08 15:33:08 -07:00
8bc2ba3fe3 detect missing kernels from external backends in codegen (#60737)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60737

Test Plan: Imported from OSS

Reviewed By: ezyang, jdonald

Differential Revision: D29392615

Pulled By: bdhirsh

fbshipit-source-id: d49d013243dbc8c8b55fbdb0b9b3eed38df52255
2021-07-08 15:33:04 -07:00
7318747a3b move all external kernels into a class for better compiler error messages (#59839)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59839

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D29047680

Pulled By: bdhirsh

fbshipit-source-id: 18cf4124be440a0a343b5983e1a4165db808e7c1
2021-07-08 15:31:02 -07:00
86eac5b456 [caffe2] Check for number of created subnets and optionally throw an error (#57366)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57366

We often get error messages such as
```
Model failed AOT (glow ahead-of-time compilation) with exception: Error during AOT optimization (non-provisioned addNetwork):
Non-recoverable device error when adding network:
Error code: PARTITIONER_ERROR
Error message: Did not find a partition with an SLS node

Error return stack:
--------------------------------------------------------------------------------
glow/glow/lib/Partitioner/Partitioner.cpp:1244
--------------------------------------------------------------------------------
glow/glow/lib/Runtime/HostManager/HostManager.cpp:375
--------------------------------------------------------------------------------
```
This makes the error message clearer by checking the number of `OnnxifiOp`s created before going into Glow. The check is enabled with the `verify_only_single_subnet` flag and is disabled by default.

Test Plan: Unit tests pass

Reviewed By: khabinov

Differential Revision: D28097674

fbshipit-source-id: 0eefd8f6ec1a82546b759be8e541256bf271a673
2021-07-08 14:29:03 -07:00
0fc110cdd1 [CUDA graphs] Don't sync between replays for cuda driver version 11.4+ (#61063)
Summary:
The bug in libcuda.so that required https://github.com/pytorch/pytorch/pull/57556 is fixed for libcuda.so versions >= 11.4.

This PR changes replay() to sync after each launch only if the process's in-use libcuda.so is < 11.4.

With all the "enhanced" and "forward" compatibility promises flying around, and the fact that "driver" sometimes means the kernel-mode driver and sometimes the user-mode driver (libcuda.so), I wasn't sure if this PR's check suffices to trigger the sync iff the in-use libcuda.so is < 11.4, but CUDA people say what I wrote is reasonable.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61063

Reviewed By: mruberry

Differential Revision: D29600907

Pulled By: ngimel

fbshipit-source-id: 71bf0bcbde43091e29f3812440abeb7a95d161e2
2021-07-08 13:26:07 -07:00
80797d03e0 Simplify lambda syntax in SegmentReduce.cpp (#61416)
Summary:
Fixes the Windows build by dismantling a combination of nested lambdas and preprocessor magic into explicit templates

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61416

Reviewed By: pbelevich

Differential Revision: D29616449

Pulled By: malfet

fbshipit-source-id: 687ef73b8b37bc272f82d44fc690448e403e3a0c
2021-07-08 12:30:35 -07:00
cdc027679b Add compare_set in distributed docs (#61351)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61351

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D29588206

Pulled By: H-Huang

fbshipit-source-id: 9db48e7b6de29503275f10616470ad2d66b075f9
2021-07-08 12:30:32 -07:00
f01a4e3b02 .github: Ensure build-results per job is unique (#61005)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61005

build-results have the potential to be tainted between jobs since runs
are not ephemeral

Signed-off-by: Eli Uriegas <seemethere101@gmail.com>

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D29526747

Pulled By: seemethere

fbshipit-source-id: f8c5bc5f647b771a059cbe380d694ce6dc535ae4
2021-07-08 12:30:28 -07:00
4beb5f9ad6 [DDP Comm Hook] Fix some comments (#61376)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61376

After SPMD is retired, the API of `get_tensors` becomes `get_tensor`. Fix some comments that refer to the obsolete API.

The `allreduce` hook example does not perform the division inside the hook, which is actually incorrect.
ghstack-source-id: 133174272

Test Plan: N/A

Reviewed By: rohan-varma

Differential Revision: D29596857

fbshipit-source-id: 2046b185225cd6d1d104907b5f9b4009b6e87c99
2021-07-08 12:30:24 -07:00
dfe25069a8 [ROCm] Skip test_*_stress_cuda test for ROCm (#60490)
Summary:
Skipping test_*_stress_cuda tests because they sometimes fail for ROCm

Signed-off-by: Kyle Chen <kylechen@amd.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60490

Reviewed By: SciPioneer

Differential Revision: D29595552

Pulled By: rohan-varma

fbshipit-source-id: fee18204775211747337985c472ab1084a71f2f1
2021-07-08 12:28:06 -07:00
9310f6bac1 Use our own statically stored vs_buildtools.exe (#61372)
Summary:
We might be getting rate-limited on our VS installer download requests, leading to HUD failures. This PR moves the download to curl from our own S3 bucket, so we won't get rate-limited.

This PR also upgrades our vs_install from 16.8.5 to 16.8.6, as moving to S3 alone didn't help, but moving to the newer installer did.

The CI passes the VS install now, but fails on a build error that I don't think is relevant: https://github.com/pytorch/pytorch/pull/61372/checks?check_run_id=3013140957

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61372

Reviewed By: iramazanli

Differential Revision: D29597204

Pulled By: janeyx99

fbshipit-source-id: 3eb52da308451271ea80120bbf2e511fb781b5dc
2021-07-08 11:27:02 -07:00
ac5b910600 clang-tidy patch (#60714)
Summary:
Three changes are made here:
1. Set `LANG=C.UTF-8` for clang-tidy so we can properly decode symbols in comments;
2. In case a file is removed, `end` could be null, and we should skip the chunk/file;
3. A tiny bug fix for the loop indentation.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60714

Reviewed By: iramazanli

Differential Revision: D29617171

Pulled By: 1ntEgr8

fbshipit-source-id: b1603929333529a174105baf51e18246d09c012e
2021-07-08 11:16:00 -07:00
074c776011 Force mypy colors in CI (#61391)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61391

Both the [GitHub Actions log viewer](https://github.community/t/ansi-color-output-in-webview/17621) and the HUD PR page log viewer support ANSI color codes so turn those on via this [secret env variable](https://github.com/python/mypy/issues/7771)

Test Plan: Imported from OSS

Reviewed By: walterddr

Differential Revision: D29602686

Pulled By: driazati

fbshipit-source-id: e8f4cd71572cc068927e6719534e64773cb16c7f
2021-07-08 11:08:38 -07:00
c76eba650a [bootcamp][pytorch][WIP] Support embedding_bag_byte_rowwise_offsets in cuda (#61075)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61075

Completed the implementation of `embedding_bag_byte_rowwise_offsets` and wrote a randomized test comparing GPU and CPU kernel outputs.

Test Plan:
```
buck build mode/opt --show-full-output  //caffe2/torch/fb/sparsenn:gpu_test
/data/users/johnsonpaul/fbsource/fbcode/buck-out/gen/caffe2/torch/fb/sparsenn/gpu_test#binary.par -r test_embedding_bag_byte_rowwise_offsets
```

Reviewed By: hyuen

Differential Revision: D29218597

fbshipit-source-id: 786260466ab4e8e3d89540496bd8a38be14c5c1b
2021-07-08 10:51:50 -07:00
9ef1c64907 [PyTorch][Edge] Tests for QuantizationFx API on lite modules (#60476)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60476

# Context
Add tests for Lite modules that are quantized using fx API

Read these posts for details about why we need a test bench for quantized Lite modules:
https://fb.workplace.com/groups/2322282031156145/permalink/4289792691071726/

https://github.com/pytorch/pytorch/pull/60226#discussion_r654615851

moved common code to `caffe2/torch/testing/_internal/common_quantization.py`

ghstack-source-id: 133144292

Test Plan:
```
~/fbsource/fbcode] buck test caffe2/test:fx_quantization_lite
Downloaded 0/2 artifacts, 0.00 bytes, 100.0% cache miss
Building: finished in 8.3 sec (100%) 11892/11892 jobs, 2 updated
  Total time: 8.6 sec
More details at https://www.internalfb.com/intern/buck/build/ffb7d517-d85e-4c8f-9531-5e5d9ca1d34c
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: d79a5713-bd29-4bbf-ae76-33a413869a09
Trace available for this run at /tmp/tpx-20210630-105547.675980/trace.log
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/3096224749578707
    ✓ ListingSuccess: caffe2/test:fx_quantization_lite - main (9.423)
    ✓ Pass: caffe2/test:fx_quantization_lite - test_embedding (mobile.test_quantize_fx_lite_script_module.TestFuseFx) (10.630)
    ✓ Pass: caffe2/test:fx_quantization_lite - test_submodule (mobile.test_quantize_fx_lite_script_module.TestFuseFx) (12.464)
    ✓ Pass: caffe2/test:fx_quantization_lite - test_conv2d (mobile.test_quantize_fx_lite_script_module.TestFuseFx) (12.728)
Summary
  Pass: 3
  ListingSuccess: 1
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/3096224749578707
```

Reviewed By: iseeyuan

Differential Revision: D29306402

fbshipit-source-id: aa481e0f696b7e9b04b9dcc6516e8a390f7dc1be
2021-07-08 10:40:08 -07:00
179b3ab88c [cuDNN] Enable cudnn_batchnorm_spatial_persistent for BatchNorm3d channels_last_3d (#59129)
Summary:
This PR enables the use of the cuDNN spatial persistent BatchNorm algorithm for BatchNorm3d (5-D tensors) in channels_last_3d format, aka NDHWC; a usage sketch follows the checklist below. Performance and numerical accuracy have been tested.

- [x] Performance check for common shapes.
- [x] Numerical accuracy check for (1 million) random shapes
    https://github.com/xwang233/code-snippet/tree/master/batchnorm3d-channels-last/A100
    https://github.com/xwang233/code-snippet/tree/master/batchnorm3d-channels-last/V100
- [ ] Convergence check for common 3D models
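
The configuration in question, as a usage illustration (requires a CUDA build; not the benchmark script):
```python
import torch

bn = torch.nn.BatchNorm3d(8).cuda()
x = torch.randn(2, 8, 4, 16, 16, device="cuda").contiguous(
    memory_format=torch.channels_last_3d)  # NDHWC layout
out = bn(x)  # now eligible for the cuDNN spatial persistent path
```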

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59129

Reviewed By: mruberry

Differential Revision: D29593309

Pulled By: ngimel

fbshipit-source-id: 2caf282c6cf2f426aa14a24f94e6bddada68ddac
2021-07-07 21:28:29 -07:00
0222291544 Fix docs for ShardMetadata. (#61388)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61388

The doc for the `placement` argument was outdated and is now fixed.
ghstack-source-id: 133184441

Test Plan: waitforbuildbot

Reviewed By: wanchaol

Differential Revision: D29601316

fbshipit-source-id: a0817f799382bf91a5192c54dfeea4d253eb0d56
2021-07-07 21:27:30 -07:00
7011513d23 Enable sparse_csr.to_dense() for bool, float16, bfloat16 and complex (#60657)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60657

Fixes https://github.com/pytorch/pytorch/issues/60648

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D29408102

Pulled By: cpuhrsch

fbshipit-source-id: 406505c1c52c0eada934833f9723f58fa67e9256
2021-07-07 19:29:19 -07:00
5054cb8934 fix torch.cat bug with boxed CPUFallback (#60993)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60993

Fixes https://github.com/pytorch/pytorch/issues/60902

The boxed fallback was written to assume that there was at least one tensor argument, which it used to figure out what device to move the cpu tensors to. That fails with an op like `torch.cat()`, which doesn't have any tensor arguments, but instead has a single `TensorList` argument.

I also added handling to gracefully deal with the case where you have an empty list of tensors - in that case we don't know what device to move everything to, but that doesn't matter because an empty list of tensors implies that we have no tensors to move anyway.
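
A Python sketch of the fixed device-inference logic (the actual fallback is C++; the helper name here is illustrative):
```python
import torch

def infer_target_device(args):
    for a in args:
        if isinstance(a, torch.Tensor):
            return a.device
        if isinstance(a, (list, tuple)):  # e.g. torch.cat's TensorList argument
            for t in a:
                if isinstance(t, torch.Tensor):
                    return t.device
    return None  # empty TensorList: nothing to move anyway

assert infer_target_device([[torch.ones(1)]]) == torch.device("cpu")
assert infer_target_device([[]]) is None
```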

I tested it out though and noticed that `torch.cat(())` doesn't handle empty lists well anyway (erroring out in the dispatcher). I'm not sure that it's a huge issue, and not even sure that we want to fix it (default to CPU? add an extra codegen'd check into every op that only takes TensorList args?) but I'll file a separate bug for that: https://github.com/pytorch/pytorch/issues/60997

I tested it by running the pytorch/xla suite after removing `cat` from `xla_native_functions.yaml`, and confirming that we don't segfault anymore.

Test Plan: Imported from OSS

Reviewed By: asuhan

Differential Revision: D29471577

Pulled By: bdhirsh

fbshipit-source-id: 58c96e8d48d993785b8d15cfa846ec745a34e623
2021-07-07 19:29:17 -07:00
141bfbef86 [iOS GPU] Add tanh and clamp to support GAN (#61383)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61383

Since we already support hardtanh, it's easy to add support for clamp. The GPU is ~40% faster.
ghstack-source-id: 133113272

Test Plan:
- CI
- buck test pp-macos

Reviewed By: dhruvbird

Differential Revision: D29572933

fbshipit-source-id: d22ec09e18d02456440f552067c9a8aea9a1a8ab
2021-07-07 19:29:16 -07:00
4937d9fd6f Fix Dispatching not considering List[Optional[Tensor]] for dispatch (#60787)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60787

Fixes #60461.

Previously, when one called `self.index(indices)` with a regular `self` Tensor and `BatchedTensor` indices, the dispatcher would not dispatch to the Batched key. This is because the dispatcher did not extract dispatch keys from `indices`.

Similar #58283 and #58296, this PR modifies the dispatcher to extract
dispatch keys from List[Optional[Tensor]] arguments. We do this for both
boxed and unboxed kernels.
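As a rough illustration, here is the kind of call whose `indices` argument is a `List[Optional[Tensor]]` (a sketch; the BatchedTensor path itself requires functorch):

```
import torch

x = torch.randn(3, 4)
idx = torch.tensor([0, 2])
# Advanced indexing lowers to aten::index.Tensor, whose `indices` argument
# is a List[Optional[Tensor]]; the dispatcher now extracts dispatch keys
# from the tensors inside that list as well.
y = x[idx]
```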

Test Plan:
- run the test case in
https://gist.github.com/zou3519/4421df7c5271376a0ef53ca857b18740
(requires functorch). After this PR, it raises `RuntimeError: Batching
rule not implemented for aten::index.Tensor. We could not generate a
fallback.`, which shows that dispatch happened on the Batched key.
- Taking suggestions for how to write a test for this in core

Reviewed By: jbschlosser

Differential Revision: D29438611

Pulled By: zou3519

fbshipit-source-id: 77e182f763e18aa3fa857eebafa8b7f83384db71
2021-07-07 19:28:07 -07:00
426c42ba45 [package] ensure we don't write files twice to the archive. (#61371)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61371

The ZIP format allows for writing multiple files with the same name. But
this is handled poorly by most tooling (including our own), so doing so
produces weird behavior depending on the implementation of the ZIP
reader.

Since we have no valid use case for writing multiple files with the same
name to a `torch.package`, just ban it.
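A minimal sketch of what is now rejected (the package and resource names are made up for illustration):

```
from torch.package import PackageExporter

with PackageExporter("out.package") as exporter:
    exporter.save_text("my_pkg", "notes.txt", "first")
    # Writing the same name twice now raises instead of silently producing
    # a duplicate entry in the ZIP archive:
    exporter.save_text("my_pkg", "notes.txt", "second")
```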

Differential Revision: D29595518

Test Plan: Imported from OSS

Reviewed By: Lilyjjo

Pulled By: suo

fbshipit-source-id: b9f5263ab47572abde233745c102af3d6143946e
2021-07-07 18:28:42 -07:00
1d1d5acbb0 [RPC] Ensure _wait_all_workers doesn't swallow exception. (#61094)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61094

`_wait_all_workers` was swallowing exceptions, so if any error occurred it would still continue with `rpc_agent.join()`, which would hang since something had already failed earlier.

To fix this, I've ensured that `_wait_all_workers` throws, and in that case we just proceed with an ungraceful shutdown without joining.
ghstack-source-id: 133160706

Test Plan:
1) Added unit test.
2) waitforbuildbot

Reviewed By: rohan-varma

Differential Revision: D29509286

fbshipit-source-id: 7c3f1c68d712ae2f63e10e0216580db8e9bcc29d
2021-07-07 18:28:41 -07:00
7b6ddb6793 [nnapi] add log_softmax (#61378)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61378

Test Plan: Imported from OSS

Reviewed By: axitkhurana

Differential Revision: D29597355

Pulled By: IvanKobzarev

fbshipit-source-id: 55124749f8eeffa2b2713f7cffd5ccf965561de1
2021-07-07 18:28:39 -07:00
eb82a88d85 Add a type for test fixture world_size (#61363)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61363

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D29561360

fbshipit-source-id: 821217e33adc483b1810585a2b91a2ee416513bd
2021-07-07 18:27:37 -07:00
d51b437b74 Cuda quantized tensors, support for quantize per channel (#58245)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58245

This adds support for per_channel quantization.

(Note: this ignores all push blocking failures!)
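A minimal sketch of what this enables, assuming a CUDA device is available:

```
import torch

x = torch.randn(2, 3, device="cuda")
scales = torch.tensor([0.1, 0.2, 0.3], device="cuda")
zero_points = torch.zeros(3, dtype=torch.int64, device="cuda")
# Per-channel quantization of a CUDA tensor, previously CPU-only:
q = torch.quantize_per_channel(x, scales, zero_points, axis=1, dtype=torch.qint8)
back = q.dequantize()
```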

Test Plan:
python test/test_quantization.py TestQuantizedTensors
python test/test_quantization.py TestQuantizedTensors.test_compare_quant_dequant_device_numerics
python test/test_quantization.py TestQuantizedTensors.test_qtensor_to_device

Imported from OSS

Reviewed By: raghuramank100

Differential Revision: D29018271

fbshipit-source-id: 4f59aed98f2f8ff607154250e4e3f85592e17854
2021-07-07 17:36:53 -07:00
b1dc9c3946 Skip _cudnn_rnn_backward in codegen check (#61386)
Summary:
Fixes a test failure encountered internally.

For context see: https://github.com/pytorch/pytorch/issues/60426

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61386

Reviewed By: malfet

Differential Revision: D29601031

Pulled By: soulitzer

fbshipit-source-id: 3592ca45a01e7bbaa804ab5404338191154f0fbc
2021-07-07 17:36:51 -07:00
b25c65b4f3 Revert D29589020: [pytorch][PR] adding a build_start_time_epoch to build meta info
Test Plan: revert-hammer

Differential Revision:
D29589020 (d33066ab3f)

Original commit changeset: 309fc3b01cbc

fbshipit-source-id: 9b50c1e8dd63e59ab4e593d250dfd5eeb623f0af
2021-07-07 17:35:29 -07:00
9dd1824741 Fix dispatch keys for eigh, lu_solve (#60945)
Summary:
I added a test to `test_ops.py` that verifies that the op can run correctly from different cuda devices. This test revealed that `linalg_eigh`, `linalg_eigvalsh`, `linalg_matrix_rank`, `linalg_pinv` were failing. `matrix_rank` and `pinv` are calling `eigh` internally.

`linalg_eigh` and `lu_solve` internally use dispatch stubs, so they should be registered with `CPU, CUDA` dispatch keys. The generated code includes device guards in this case and the problem is not present.

Implemented a better out variant for `eigvalsh` and registered it with `CPU, CUDA` dispatch keys.

~I added a device guard to `linalg_eigh_kernel` as a fix for `eigvalsh` function. This function needs to be registered as CompositeImplicitAutograd, because it calls `at::linalg_eigh` if `at::GradMode::is_enabled()`.~

Fixes https://github.com/pytorch/pytorch/issues/60892.
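A sketch of the scenario the new test covers (requires at least two CUDA devices):

```
import torch

a = torch.randn(4, 4, device="cuda:1")
a = a + a.transpose(-2, -1)  # make it symmetric
# Before this fix, running on a non-default device could fail because
# no device guard was emitted for the stub-based kernels:
w, v = torch.linalg.eigh(a)
```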

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60945

Reviewed By: mruberry

Differential Revision: D29589580

Pulled By: ngimel

fbshipit-source-id: 5851605958bdfc3a1a1768263934619449957168
2021-07-07 16:28:22 -07:00
fb00194030 Fix typo in common_utils.py (#61365)
Summary:
Missed this in review of https://github.com/pytorch/pytorch/pull/57953. I don't think this has affected much, though.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61365

Reviewed By: walterddr

Differential Revision: D29593764

Pulled By: janeyx99

fbshipit-source-id: 2c6f6aa961eabca0d8b8a7607aaae979667cca3b
2021-07-07 16:28:20 -07:00
6107cf3750 Add --jobs 0 for git submodule update (#61311)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61311

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61152

Some related docs about `submodule.fetchJobs`
https://git-scm.com/docs/git-config#Documentation/git-config.txt-submodulefetchJobs

```
time git submodule update --init --recursive
________________________________________________________
Executed in  243.20 secs    fish           external
   usr time   49.64 secs  213.00 micros   49.64 secs
   sys time   29.27 secs  795.00 micros   29.27 secs
```

```
time git submodule update --init --recursive --jobs 4
________________________________________________________
Executed in  143.04 secs    fish           external
   usr time   51.06 secs  246.00 micros   51.06 secs
   sys time   30.96 secs  742.00 micros   30.96 secs
```

```
time git submodule update --init --recursive --jobs 8
________________________________________________________
Executed in  124.64 secs    fish           external
   usr time   51.76 secs  264.00 micros   51.76 secs
   sys time   30.49 secs  739.00 micros   30.49 secs

```

```
time git submodule update --init --recursive --jobs 0 # use all online cpus
 ________________________________________________________
Executed in  129.75 secs    fish           external
   usr time   51.64 secs  181.00 micros   51.64 secs
   sys time   31.49 secs  781.00 micros   31.49 secs

```

Test Plan: Imported from OSS

Reviewed By: 1ntEgr8

Differential Revision: D29560875

Pulled By: zhouzhuojie

fbshipit-source-id: 556027dffe744c66428075a8a1bf64683930aaaf
2021-07-07 16:28:18 -07:00
d33066ab3f adding a build_start_time_epoch to build meta info (#61322)
Summary:
Adding a `build_start_time_epoch` as a normal field in scribe reporting.
This should fix https://github.com/pytorch/pytorch/issues/60591.

The decision was made because:
- we would like only one build (test CI job) start time to serve as the partition key string
  - the alternative is to report the duration on each test case individually, which would result in duplicate numeric value uploads.
- we can then easily calculate the wall-time of a test job from `MAX('time') - build_start_time_epoch` for all reporting messages with the same normal keys.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61322

Test Plan:
CI should report the extra normal field.

See: https://fburl.com/scuba/pytorch_test_times/pm6chz9w

Reviewed By: driazati

Differential Revision: D29589020

Pulled By: walterddr

fbshipit-source-id: 309fc3b01cbce76cd62f8ccd2eb0ecad27782b88
2021-07-07 16:27:13 -07:00
429436edbd Avoid complex-to-real cast warning in CopyBackward (#60021)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60021

Dropping the imaginary component is expected and gives the correct gradient
formula, so silencing the warning is appropriate.

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D29589371

Pulled By: mruberry

fbshipit-source-id: 73e1511cae69207dc9abe576e2769ee1d03f1bbd
2021-07-07 15:28:38 -07:00
10b2a24508 Migrate log_sigmoid (forward and backward) to ATen (CUDA) (#60881)
Summary:
Fixes gh-24591, fixes gh-24590, closes gh-39642

Benchmarks were run with nvprof using contiguous inputs; they show improvement across the board.

#### Forward benchmarks

| Num Elements | Master (us) | This PR (us) |
|:------------:|:-----------:|:------------:|
|     10^4     |    2.5840   |    2.5230    |
|     10^5     |    4.6410   |    3.9280    |
|     10^6     |    33.772   |    23.025    |
|     10^7     |    299.67   |    206.35    |
|     10^8     |    3001.9   |    2052.8    |

#### Backward benchmarks

| Num Elements | Master (us) | This PR (us) |
|:------------:|:-----------:|:------------:|
|     10^4     |    2.7750   |    2.7080    |
|     10^5     |    5.2430   |    3.9010    |
|     10^6     |    46.198   |    32.878    |
|     10^7     |    447.18   |    296.18    |
|     10^8     |    4393.2   |    2938.0    |
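The user-facing entry point is unchanged; a minimal sketch, assuming a CUDA device:

```
import torch
import torch.nn.functional as F

x = torch.randn(10**6, device="cuda", requires_grad=True)
y = F.logsigmoid(x)   # forward now runs the ATen CUDA kernel
y.sum().backward()    # backward likewise
```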

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60881

Reviewed By: mruberry

Differential Revision: D29589455

Pulled By: ngimel

fbshipit-source-id: 70cd5db244bf6292e9ca367462640530a1d85f7d
2021-07-07 15:28:36 -07:00
f86460a352 Add coverage files to .gitignore (#61144)
Summary:
Fixes failures when coverage is turned on: https://github.com/pytorch/pytorch/runs/2966295169 https://github.com/pytorch/pytorch/runs/2964409741

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61144

Test Plan:
```bash
$ echo hi > test/.coverage.jit.1625168654.4504092
$ git status
$
```

Reviewed By: zhouzhuojie

Differential Revision: D29530709

Pulled By: driazati

fbshipit-source-id: 0e6a1cb217c4d48f14c0c58a546f98393d2b0392
2021-07-07 15:28:35 -07:00
5e83fefdf8 [sparsity] sparsifier step tests (#60107)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60107

Unit tests for sparsifier `step`

Test Plan:
`buck test mode/dev-nosan //caffe2/test:ao -- TestWeightNormSparsifier`

https://pxl.cl/1LhQP

Reviewed By: z-a-f

Differential Revision: D29167029

fbshipit-source-id: 053027ca92701097406372ef0f81d79ef28380aa
2021-07-07 15:28:33 -07:00
8881b9d852 [sparsity] sparsifier convert tests (#60105)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60105

Unit tests for sparsifier `convert`

Test Plan:
`buck test mode/dev-nosan //caffe2/test:ao -- TestWeightNormSparsifier`

https://pxl.cl/1LhQ8

Reviewed By: z-a-f

Differential Revision: D29145450

fbshipit-source-id: b87b8f0d44751a7dae19d454a11b2d207a7286e2
2021-07-07 15:28:31 -07:00
ec200a60bd [sparsity] sparsifier prepare tests (#60042)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60042

Unit tests for sparsifier `prepare`

Test Plan:
`buck test mode/dev-nosan //caffe2/test:ao -- TestWeightNormSparsifier`

https://pxl.cl/1LhR1

Reviewed By: z-a-f

Differential Revision: D29140945

fbshipit-source-id: 73cbf27f278ce849e3930ba6caa82bb2f64f1321
2021-07-07 15:28:30 -07:00
21ad978d4f [sparsity] rename sparsity_pattern to sparse_block_shape (#59898)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59898

In `weight_norm_sparsifier`, the name of the argument `sparsity_pattern` is not intuitive for an argument describing the shape of the sparse block. It has been changed to `sparse_block_shape`.
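A one-line sketch of the rename (assuming the `torch.ao.sparsity` import path used in these diffs):

```
from torch.ao.sparsity import WeightNormSparsifier

# Previously: WeightNormSparsifier(sparsity_level=0.5, sparsity_pattern=(1, 4))
sparsifier = WeightNormSparsifier(sparsity_level=0.5, sparse_block_shape=(1, 4))
```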

Test Plan:
`buck test mode/dev-nosan //caffe2/test:ao -- TestWeightNormSparsifier`
https://pxl.cl/1LhRM

Reviewed By: z-a-f

Differential Revision: D29077045

fbshipit-source-id: 0cf9c5387d41ca8e839ee050d71f4fe477374143
2021-07-07 15:27:16 -07:00
aa6a8a6d21 [nnc] Add LoopNest::unsafe_fuseLoops to let users apply fusion on stmts that may violate our correctness checks (#60601)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60601

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D29346128

Pulled By: huiguoo

fbshipit-source-id: 0eb143e97dc57224adeedf99981036ad836e5a03
2021-07-07 14:27:18 -07:00
8fd90f7cfd Implementing transpose for PackedTensorAccessor (#61114)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61114

Matching the functionality of THCDeviceTensor::transpose. This
is the same as PR 60968 (https://github.com/pytorch/pytorch/pull/60968)
which was already approved; the state of the PR got messed up so
creating a fresh one.
ghstack-source-id: 133050553

Test Plan:
Unit tests at aten/src/ATen/test/packedtensoraccessor_test.cpp

Imported from OSS

Reviewed By: ezyang

Differential Revision: D29516530

fbshipit-source-id: 91d5bcc38381c00420825646b1c352c0d6bc8b3f
2021-07-07 14:27:16 -07:00
39a76fe73c BatchNorm2D (#61012)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61012

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D29562337

Pulled By: migeed-z

fbshipit-source-id: 2b848d0af607bd4f36cea2436ab2278ac4bc28d7
2021-07-07 14:26:07 -07:00
357c4d9cc4 Add a test case for findDanglingImpls (#61104)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61104

This patch added a new test case for findDanglingImpls. The test case introduces a C++ extension which has a dangling impl such that findDanglingImpls can find it and output its information.

Test Plan:
python test/test_dispatch.py TestDispatch.test_find_dangling_impls_ext

Imported from OSS

Reviewed By: ezyang

Differential Revision: D29512520

fbshipit-source-id: 6883fb8f065f2c0ae0e7a1adf6fd298591497e2b
2021-07-07 13:34:16 -07:00
4d9fd8958b Support __rand__, __ror__ and __rxor__ (#59240)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/58120.

This PR implements `torch.Tensor.{__rand__/__ror__/__rxor__}` for the compatibility with NumPy’s interface.
(cc: mruberry, rgommers, emcastillo, kmaehashi)
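A minimal sketch of the newly supported reflected operators:

```
import torch

t = torch.tensor([True, False])
print(True & t)  # Tensor.__rand__ -> tensor([ True, False])
print(True | t)  # Tensor.__ror__  -> tensor([ True,  True])
print(True ^ t)  # Tensor.__rxor__ -> tensor([False,  True])
```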

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59240

Reviewed By: ngimel

Differential Revision: D29482304

Pulled By: mruberry

fbshipit-source-id: 13789202c1d8dddf8658a45381aeedcc31e2f603
2021-07-07 13:34:14 -07:00
9547e57643 Create SECURITY.md (#61356)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61356

Reviewed By: samestep

Differential Revision: D29589904

Pulled By: malfet

fbshipit-source-id: 5d79d25e35af9cb258fd6843559955360dc0cc4e
2021-07-07 13:34:12 -07:00
f84a441718 [torch][segment_reduce] Update default values when initial value is not set (#61266)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61266

Same as title.
This mostly concludes the initially planned features for the op. The only missing functionality is reduction along an arbitrary axis (currently only axis 0 is supported).
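A minimal sketch of the op (it was exposed as `torch.segment_reduce` around this time and has since been made private as `torch._segment_reduce`):

```
import torch

data = torch.tensor([1.0, 2.0, 3.0, 4.0])
lengths = torch.tensor([2, 2])
# Reduce each length-2 segment along axis 0; with no `initial` given,
# the defaults updated in this diff are used:
out = torch.segment_reduce(data, "max", lengths=lengths, axis=0)
# -> tensor([2., 4.])
```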

Test Plan: Updated unit test.

Reviewed By: ngimel

Differential Revision: D29552037

fbshipit-source-id: 023c7cbf750a0671f76082708f14c05739dda07a
2021-07-07 13:34:10 -07:00
a78ad5dc4c [torch][segment_reduce] Add support for int lengths as well (#61141)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61141

Currently only long is supported. This diff adds support for other index types.

Next Steps:
- Update default, refactor unit test and test non_initial value as well
- Cleanup (more tests, benchmark, update documentation)

Test Plan: updated unit test. rely on CI.

Reviewed By: ngimel

Differential Revision: D29526308

fbshipit-source-id: b4043603483851ef7e0e93b0bb02ac7849c6449d
2021-07-07 13:34:09 -07:00
423523d8bb Alias for logsumexp to special namespace (#58838)
Summary:
See https://github.com/pytorch/pytorch/issues/50345

cc: kshitij12345 Lezcano mruberry
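A minimal sketch of the new alias:

```
import torch

x = torch.randn(3, 4)
assert torch.allclose(torch.special.logsumexp(x, dim=1),
                      torch.logsumexp(x, dim=1))
```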

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58838

Reviewed By: malfet

Differential Revision: D29565033

Pulled By: mruberry

fbshipit-source-id: 9b715ea00c78f47b6f183357ee3c7d4c3abe4d01
2021-07-07 13:32:15 -07:00
c03f99f3ef Remove pyproject.toml (#61367)
Summary:
This reverts https://github.com/pytorch/pytorch/issues/60408, since it doesn't really give much benefit, and it ended up breaking things:

- https://github.com/pytorch/pytorch/issues/60665
- https://github.com/pytorch/pytorch/pull/60408#issuecomment-873979383

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61367

Reviewed By: malfet, janeyx99

Differential Revision: D29593886

Pulled By: samestep

fbshipit-source-id: b1ba0ac7695e3eacf66a35e293080e8a1240efca
2021-07-07 12:47:45 -07:00
994ce7dbd9 Cuda quantized tensors, support for quantize per tensor (#59700)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59700

Implements quantized tensors in CUDA for per_tensor quantization, along with several necessary functions.

(Note: this ignores all push blocking failures!)
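A minimal sketch of what this enables, assuming a CUDA device is available:

```
import torch

x = torch.randn(4, device="cuda")
# Per-tensor quantization of a CUDA tensor, previously CPU-only:
q = torch.quantize_per_tensor(x, scale=0.1, zero_point=0, dtype=torch.qint8)
back = q.dequantize()
```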

Test Plan:
python test/test_quantization.py TestQuantizedTensors
python test/test_quantization.py
TestQuantizedTensors.test_compare_quant_dequant_device_numerics
python test/test_quantization.py
TestQuantizedTensors.test_qtensor_to_device

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D29018272

fbshipit-source-id: e07d19d6d67729c46324c2bb5946d959e6e6db8e
2021-07-07 12:40:51 -07:00
baa518e2f6 Add Int32 support for NNAPI (#59365)
Summary:
Support Int32 tensors in NNAPI converter

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59365

Test Plan: Local testing with FB prod models

Reviewed By: anshuljain1

Differential Revision: D28881040

fbshipit-source-id: 2dacceffd322a21d91bfefcf2fb2ea400d952d0d
2021-07-07 12:40:49 -07:00
cf285d8eea Add aten::slice NNAPI converter (#59364)
Summary:
Add support for aten::slice op in the NNAPI model converter

* If start = 0; end = max -> identity
* Flexible shapes can be passed through
* Flexible shapes can't be sliced over

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59364

Test Plan: pytest test/test_nnapi.py::TestNNAPI::test_slice

Reviewed By: anshuljain1

Differential Revision: D28881039

fbshipit-source-id: 3c1c630ff27b5bba6eda403d87570c61d43ae90e
2021-07-07 12:40:47 -07:00
d26372794a Add aten::detach NNAPI converter (#58543)
Summary:
* Add support for aten::detach op in the NNAPI model converter as a no-op
* Also add flexible op support for add_pointwise_simple_unary_op

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58543

Test Plan: pytest test/test_nnapi.py::TestNNAPI::test_detatch

Reviewed By: anshuljain1

Differential Revision: D28531942

fbshipit-source-id: 4387dbbbadd8ce6b690841f3a903e68a380b849d
2021-07-07 12:40:46 -07:00
0be228dd5f Add aten::flatten NNAPI converter (#60885)
Summary:
Add support for the aten::flatten op in the NNAPI model converter. Startup-time
variable size support isn't included, since shapes are passed as inputs to the NNAPI op.

Runtime variable size support is to be added soon.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60885

Test Plan: pytest test/test_nnapi.py::TestNNAPI::test_flatten

Reviewed By: anshuljain1

Differential Revision: D29451725

fbshipit-source-id: 8902745f7758c8cc88ad4b4ce02b8301ff894bd4
2021-07-07 12:40:44 -07:00
b297f65b66 Add aten::div NNAPI converter (#58541)
Summary:
Add support for aten::div op in the NNAPI model converter. Add variable
size input test as well.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58541

Test Plan: pytest test/test_nnapi.py::TestNNAPI::test_div

Reviewed By: anshuljain1

Differential Revision: D28531943

fbshipit-source-id: e96342146f6de216f7b88443618edfc54963747c
2021-07-07 12:40:42 -07:00
eab18a9a40 Add aten::to NNAPI converter (#58540)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58540

Add support for aten::to op in the NNAPI model converter for simple
cases like to("cpu"), to("gpu")

Test Plan: pytest test/test_nnapi.py::TestNNAPI::test_to

Reviewed By: anshuljain1

Differential Revision: D28531941

fbshipit-source-id: 0c934f7aceaff2669307c3426efe32046d8c44f3
2021-07-07 12:40:41 -07:00
14d604a13e Add aten::softmax NNAPI converter (#58539)
Summary:
Add support for aten::softmax op in the NNAPI model converter with
flexible size

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58539

Test Plan: pytest test/test_nnapi.py::TestNNAPI::test_softmax

Reviewed By: anshuljain1

Differential Revision: D28531946

fbshipit-source-id: 8633f3e3f7f52795f9866ff16ad0867ea36a19e8
2021-07-07 12:39:31 -07:00
45ce26c397 Port isposinf & isneginf kernel to structured kernels (#60633)
Summary:
Porting `torch.isposinf` & `torch.isneginf` to structured kernel
Related https://github.com/pytorch/pytorch/issues/55070
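User-visible behavior is unchanged; a minimal sketch, including the `out=` variant that the structured-kernel codegen provides:

```
import torch

x = torch.tensor([float("inf"), -float("inf"), 1.0])
print(torch.isposinf(x))    # tensor([ True, False, False])
out = torch.empty(3, dtype=torch.bool)
torch.isneginf(x, out=out)  # out -> tensor([False,  True, False])
```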

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60633

Reviewed By: saketh-are

Differential Revision: D29517528

Pulled By: bdhirsh

fbshipit-source-id: f8f62e4c203e0c54790437b5e512024bfabdddfc
2021-07-07 12:33:41 -07:00
c2b0af2560 [static runtime] Implement aten::sign (#61154)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61154

Test Plan:
Added `StaticRuntime.IndividualOps_Sign`

```
[djang@devvm861.prn0 ~/local/fbsource/fbcode/caffe2] buck run //caffe2/benchmarks/static_runtime:static_runtime_cpptest -- -v 1
...
[ RUN      ] StaticRuntime.IndividualOps_Sign
V0701 12:05:31.836099 3679080 impl.cpp:455] StaticModuleOptions: cleanup_activations 1, enable_out_variant 1, optimize_memory1, optimize_graph_output_memory0
V0701 12:05:31.898192 3679080 impl.cpp:1279] Switch to out variant for node: %3 : Tensor = aten::sign(%input.1)
V0701 12:05:31.898849 3679080 impl.cpp:1279] Switch to out variant for node: %4 : Tensor = aten::clone(%3, %2)
```

Reviewed By: hlu1

Differential Revision: D29518603

fbshipit-source-id: e47b96d037fea639c41052f3849c82bbfa5f482a
2021-07-07 12:29:25 -07:00
1262b2c4c6 fix torch.futures docstring examples (#61029)
Summary:
Trying to run the doctests for the complete documentation hangs when it reaches the examples of `torch.futures`. It turns out they are only syntax errors, which are normally just reported; my guess is that `doctest` doesn't work well for failures within async code.

Anyway, while debugging this, I fixed the syntax.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61029

Reviewed By: mruberry

Differential Revision: D29571923

Pulled By: mrshenli

fbshipit-source-id: bb8112be5302c6ec43151590b438b195a8f30a06
2021-07-07 11:47:55 -07:00
376dc500a9 Minor bug fix in the warning message (#61127)
Summary:
The current example code does not work. The correct one is like this: cb7d813275/torch/distributed/run.py (L266)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61127

Reviewed By: cbalioglu

Differential Revision: D29572003

Pulled By: mrshenli

fbshipit-source-id: 05b470230f3d70f8a6164edb5f92894a1112069f
2021-07-07 11:42:51 -07:00
90241d254f Automated submodule update: FBGEMM (#59968)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).

New submodule commit: a2257d9471

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59968

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: r-barnes

Differential Revision: D29109045

fbshipit-source-id: 386b28b28275e728ee229d4baf1ff192635d49c3
2021-07-07 11:33:57 -07:00
29ecb9f90b Don't check stride by default (#60637)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60637

We now have ~three out of three~  four out of four datapoints that `check_stride` will be `partial`'ed to `False`:

- `torch` test suite: https://github.com/pytorch/pytorch/pull/58981#discussion_r639514081
- `torchvision` test suite: https://github.com/pytorch/pytorch/issues/56544#issuecomment-845352605
- `kornia`: 9041c42b41/test/utils.py (L25)
- `torch.fft`: https://github.com/pytorch/pytorch/pull/60304#pullrequestreview-687882323

Given that strides are in most cases an implementation detail, IMO we should change the default to `False`. In cases where matching strides is a requirement for closeness / equality, it can always be set to `True` manually.
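A minimal sketch of the new default:

```
import torch

a = torch.randn(2, 3)
b = a.t().contiguous().t()  # same values, different strides
torch.testing.assert_close(a, b)                     # now passes by default
torch.testing.assert_close(a, b, check_stride=True)  # opt back in: raises
```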

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D29556355

Pulled By: mruberry

fbshipit-source-id: 0029a44280d8f4369fbdb537dce3202eeee4b1d9
2021-07-07 09:55:36 -07:00
e2a3f4b560 Use maximum of tolerances in case of mismatching dtypes (#60636)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60636

See https://github.com/pytorch/pytorch/pull/58981#issuecomment-866654600.
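A minimal sketch (the tolerance values are the documented per-dtype defaults; treat the exact numbers as an assumption):

```
import torch

# float16's looser default tolerances now apply to the whole comparison
# instead of the stricter float32 defaults:
torch.testing.assert_close(
    torch.tensor(1.0, dtype=torch.float16),
    torch.tensor(1.0005, dtype=torch.float32),
    check_dtype=False,
)
```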

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D29556352

Pulled By: mruberry

fbshipit-source-id: 36e97e0f338df5d17a94af078f172c668ef51ecb
2021-07-07 09:55:34 -07:00
5f18ba7075 upcast to most precise dtype within their category before the comparison (#60536)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60536

`torch.isclose` does not do this for bool tensors, which results in a test failure since subtraction (`abs(actual - expected)`) is not supported for them (see #58981). Since the `dtype` is already checked at this point, we can safely move the upcasting before `torch.isclose` is invoked.
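A minimal sketch of the previously failing case:

```
import torch

# Bool tensors are upcast before the closeness check, so the unsupported
# subtraction `abs(actual - expected)` is never attempted on them:
torch.testing.assert_close(torch.tensor([True, False]),
                           torch.tensor([True, False]))
```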

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D29556356

Pulled By: mruberry

fbshipit-source-id: 4c65fad4f06cf402d6aab9dde5b127235766d5e0
2021-07-07 09:55:32 -07:00
5ac87cde30 tests for diagnostics in callable msg in torch.testing.assert_close (#60254)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60254

Before we only tested that the correct error message is returned if `msg` is passed as callable. This adds tests that make sure that

- the inputs passed to the callable are the same inputs passed to `torch.assert_close` and
- the `diagnostics` namespace has the same attributes and types as documented.

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D29556354

Pulled By: mruberry

fbshipit-source-id: 9793c6d86fda842b6329381fc03b945eee878464
2021-07-07 09:55:30 -07:00
76d9e680d7 update docstring examples of torch.testing.assert_close (#60163)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60163

Changes to the default error message in case of mismatching values need to be reflected in the examples given in the docstring. Normally this should be enforced by a [`doctest`](https://docs.python.org/3/library/doctest.html). mruberry do you know why we don't have such a check?

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D29556353

Pulled By: mruberry

fbshipit-source-id: 8dbc3f566f429618811b542a059d9abde9a6530b
2021-07-07 09:55:29 -07:00
9979289037 Improve error messages of torch.testing.assert_close in case of mismatching values (#60091)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60091

Closes #58383. (1) and (2) are implemented. (3) was rejected. No consensus was reached on (4) and (5).

Improvements:

- Instead of calling everything "Tensors" we now use "Scalars" and "Tensor-likes" depending on the shape. Plus, we now internally have the option to adapt this identifier for example to report "Imaginary components of complex tensor-likes", which is even more expressive.
- The reported conditions "not close" and "not equal" are now determined based on `rtol` and `atol`.
- The number of mismatched elements and the offending indices are only reported in case the inputs are not scalar
- The allowed `rtol` and `atol` is only reported if `> 0`

**Example 1**

```python
torch.testing.assert_close(1, 3, rtol=0, atol=1)
```

Before:

```
AssertionError: Tensors are not close!

Mismatched elements: 1 / 1 (100.0%)
Greatest absolute difference: 2 at 0 (up to 1 allowed)
Greatest relative difference: 0.6666666865348816 at 0 (up to 0 allowed)
```

After:

```
AssertionError: Scalars are not close!

Absolute difference: 2 (up to 1 allowed)
Relative difference: 0.6666666865348816
```

**Example 2**

```python
torch.manual_seed(0)
t = torch.rand((2, 2), dtype=torch.complex64)
torch.testing.assert_close(t, t + complex(0, 1))
```

Before:

```
AssertionError: Tensors are not close!

Mismatched elements: 4 / 4 (100.0%)
Greatest absolute difference: 1.0000000596046448 at (0, 0) (up to 1e-05 allowed)
Greatest relative difference: 0.8833684352411922 at (0, 1) (up to 1.3e-06 allowed)

The failure occurred for the imaginary part.
```

After:

```
AssertionError: Imaginary components of tensor-likes are not close!

Mismatched elements: 4 / 4 (100.0%)
Greatest absolute difference: 1.0000000596046448 at index (0, 0) (up to 1e-05 allowed)
Greatest relative difference: 0.8833684352411922 at index (0, 1) (up to 1.3e-06 allowed)
```

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D29556357

Pulled By: mruberry

fbshipit-source-id: 559d4a19ad4fc069b2b4f8cb5fc2f6058621e33d
2021-07-07 09:54:09 -07:00
e1338016dd cuSOLVER path for LU factorization in CUDA. (#56887)
Summary:
This PR adds cuSOLVER path for `torch.lu`.

Performance comparison results: https://github.com/pytorch/pytorch/issues/53879#issuecomment-830635381

Code for reproducing performance results: https://github.com/pytorch/pytorch/pull/56887#issuecomment-843212868

The following heuristics are used for choosing cuSOLVER over MAGMA:
* If batch size == 1 OR (batch size <= 8 AND shape <= 16), choose cuSOLVER over MAGMA.
* For all other cases use MAGMA.

See also https://github.com/pytorch/pytorch/issues/47953.
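A sketch of calls that fall on either side of the heuristic, assuming a CUDA device:

```
import torch

small = torch.randn(4, 16, 16, device="cuda")     # batch <= 8, shape <= 16 -> cuSOLVER
large = torch.randn(64, 512, 512, device="cuda")  # everything else -> MAGMA
LU_s, piv_s = torch.lu(small)
LU_l, piv_l = torch.lu(large)
```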

Following are the performance results between the MASTER branch and the current changes:

<details>

```
[-------------------------- LU factorization (ATen) torch.float64 ---------------------------]
                                     |  lu_factorize CURRENT |  lu_factorize MASTER
1 threads: -----------------------------------------------------------------------------------
      torch.Size([1, 1, 1])          |              363.9          |             284.1
      torch.Size([2, 1, 1])          |              354.8          |             271.8
      torch.Size([4, 1, 1])          |              393.7          |             278.0
      torch.Size([8, 1, 1])          |              459.3          |             279.1
      torch.Size([16, 1, 1])         |              524.2          |             288.9
      torch.Size([32, 1, 1])         |              525.1          |             281.2
      torch.Size([64, 1, 1])         |              524.5          |             281.7
      torch.Size([128, 1, 1])        |              522.8          |             285.2
      torch.Size([1, 2, 2])          |              360.4          |             277.7
      torch.Size([2, 2, 2])          |              372.9          |             279.2
      torch.Size([4, 2, 2])          |              419.4          |             278.3
      torch.Size([8, 2, 2])          |              475.7          |             279.2
      torch.Size([16, 2, 2])         |              530.0          |             299.5
      torch.Size([32, 2, 2])         |              530.0          |             294.5
      torch.Size([64, 2, 2])         |              531.0          |             291.5
      torch.Size([128, 2, 2])        |              544.4          |             292.3
      torch.Size([1, 8, 8])          |              372.6          |             292.8
      torch.Size([2, 8, 8])          |              380.9          |             296.2
      torch.Size([4, 8, 8])          |              420.0          |             293.4
      torch.Size([8, 8, 8])          |              490.6          |             294.6
      torch.Size([16, 8, 8])         |              535.6          |             296.5
      torch.Size([32, 8, 8])         |              534.7          |             302.1
      torch.Size([64, 8, 8])         |              539.1          |             305.5
      torch.Size([128, 8, 8])        |              540.7          |             296.5
      torch.Size([1, 16, 16])        |              345.0          |             303.2
      torch.Size([2, 16, 16])        |              405.0          |             306.3
      torch.Size([4, 16, 16])        |              482.8          |             305.6
      torch.Size([8, 16, 16])        |              596.3          |             305.9
      torch.Size([16, 16, 16])       |              539.6          |             304.4
      torch.Size([32, 16, 16])       |              542.2          |             305.8
      torch.Size([64, 16, 16])       |              556.1          |             311.0
      torch.Size([128, 16, 16])      |              545.1          |             308.1
      torch.Size([1, 32, 32])        |              432.7          |             342.4
      torch.Size([2, 32, 32])        |              582.6          |             341.8
      torch.Size([4, 32, 32])        |              580.4          |             344.4
      torch.Size([8, 32, 32])        |              586.5          |             343.8
      torch.Size([16, 32, 32])       |              582.9          |             346.0
      torch.Size([32, 32, 32])       |              574.4          |             343.7
      torch.Size([64, 32, 32])       |              562.8          |             350.8
      torch.Size([128, 32, 32])      |              568.3          |             349.8
      torch.Size([1, 64, 64])        |              537.1          |             518.4
      torch.Size([2, 64, 64])        |              766.5          |             539.1
      torch.Size([4, 64, 64])        |              771.6          |             551.9
      torch.Size([8, 64, 64])        |              783.4          |             556.0
      torch.Size([16, 64, 64])       |              798.8          |             555.3
      torch.Size([32, 64, 64])       |              795.6          |             548.6
      torch.Size([64, 64, 64])       |              804.2          |             580.4
      torch.Size([128, 64, 64])      |              837.6          |             616.9
      torch.Size([1, 128, 128])      |              844.7          |             848.9
      torch.Size([2, 128, 128])      |             1096.7          |             873.3
      torch.Size([4, 128, 128])      |             1117.9          |             884.8
      torch.Size([8, 128, 128])      |             1138.1          |             903.6
      torch.Size([16, 128, 128])     |             1169.1          |             943.9
      torch.Size([32, 128, 128])     |             1204.8          |             981.4
      torch.Size([64, 128, 128])     |             1336.6          |            1105.8
      torch.Size([128, 128, 128])    |             1639.4          |            1473.3
      torch.Size([1, 512, 512])      |             3714.3          |            3928.6
      torch.Size([2, 512, 512])      |             4388.3          |            4179.7
      torch.Size([4, 512, 512])      |             4765.4          |            4536.9
      torch.Size([8, 512, 512])      |             5615.2          |            5441.1
      torch.Size([16, 512, 512])     |             7203.6          |            7130.2
      torch.Size([32, 512, 512])     |            10580.5          |           10503.9
      torch.Size([64, 512, 512])     |            17374.8          |           17349.6
      torch.Size([128, 512, 512])    |            32542.3          |           32548.8
      torch.Size([1, 1024, 1024])    |            10041.5          |           14292.3
      torch.Size([2, 1024, 1024])    |            17126.6          |           16971.0
      torch.Size([4, 1024, 1024])    |            20591.0          |           20490.8
      torch.Size([8, 1024, 1024])    |            27682.8          |           27560.7
      torch.Size([16, 1024, 1024])   |            41035.2          |           41035.8
      torch.Size([32, 1024, 1024])   |            67091.8          |           67345.9
      torch.Size([64, 1024, 1024])   |           119612.3          |          119782.3
      torch.Size([128, 1024, 1024])  |           230095.5          |          230766.2

Times are in microseconds (us).

```
</details>

The main reason a performance regression can be seen is related to this issue (https://github.com/pytorch/pytorch/issues/55122), and there seems to be no easy way to fix it (at least in this PR).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56887

Reviewed By: ngimel

Differential Revision: D29482342

Pulled By: mruberry

fbshipit-source-id: 4fdedf21b0d5597b289e168dff61d5f5d7727fb1
2021-07-07 09:45:23 -07:00
4a544df00d Implement and benchmark a torch.optim.multi_tensor.adagrad implementation (#59155)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59155

Test Plan: Imported from OSS

Reviewed By: iramazanli

Differential Revision: D29525213

Pulled By: ramvenkat98

fbshipit-source-id: 6d7e8da91c965d1f4e955a084ed875bab641dc9a
2021-07-07 08:08:32 -07:00
8bec478a9e MaxPool2d: use channels_last format for both output and indice when input is channels_last (#61245)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61245
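The title describes the whole change; a minimal sketch of the behavior:

```
import torch

x = torch.randn(1, 3, 8, 8).to(memory_format=torch.channels_last)
pool = torch.nn.MaxPool2d(kernel_size=2, return_indices=True)
out, indices = pool(x)
# With this change, both outputs stay in channels_last:
print(out.is_contiguous(memory_format=torch.channels_last))      # True
print(indices.is_contiguous(memory_format=torch.channels_last))  # True
```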

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D29557884

Pulled By: ezyang

fbshipit-source-id: 0d2b8cbaaf13411eefd7d867021bd6028d40e5cc
2021-07-07 07:50:28 -07:00
66158a6e90 Enable AutogradXPU DispatchKey for Intel heterogeneous computation platform. (#61105)
Summary:
Add a string wrapper for AutogradXPU to enable this DispatchKey.
We are going to use AutogradXPU as a custom autograd backend, which needs this DispatchKey.
The string wrapper is used to map "AutogradXPU" to the corresponding DispatchKey.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61105

Reviewed By: malfet

Differential Revision: D29557697

Pulled By: ezyang

fbshipit-source-id: f0c8155decc8e2fd90741650a05de5a8b5a70121
2021-07-07 07:47:01 -07:00
a69e947ffd avg_pool3d_backward: Port to structured (#59084)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59084

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D28802619

Pulled By: ezyang

fbshipit-source-id: 89a0fcdcf8976ca7c21da7a40fd26a1cba180faa
2021-07-07 07:44:17 -07:00
e4c450a4e8 The dispatch order for custom function (#60251)
Summary:
Hi, I am working on developing some custom ops.

And I found this issue:

The cause lies in the logic here: https://github.com/pytorch/pytorch/compare/master...zhuhaozhe:customer-op-trace?expand=1#diff-d7ade8589773904745c0cf965a19f24c940f1d36038f4c0ce85af2f3d89991dcL173-L177.
For all custom ops, the "Tracer" dispatch key gets the highest priority.

This makes custom ops and non-custom ops behave differently during dispatch. I do not understand whether there is some special reason to let custom ops "trace" first and then begin to "dispatch".

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60251

Reviewed By: malfet

Differential Revision: D29577131

Pulled By: ezyang

fbshipit-source-id: a8e824029cf934f09f29638b127961a6a5c332de
2021-07-07 06:31:43 -07:00
a6fea03a8a Skip codegen checks for dequantize_self, lu_unpack, _cudnn_rnn, and .*conv.*_backward.* (#61139)
Summary:
Temporary fix for fb-internal tests. This and similar failures are being discussed here:
https://github.com/pytorch/pytorch/issues/60426

Applies the below changes:
 - This may seem counterintuitive because the storage check comes before the tensor check, but if TensorImpl use count is not enforced, we should also not enforce storage use count. If an op returns one of its inputs as-is, it is possible for this input to already be aliased with another tensor, and hence it would have a StorageImpl use count greater than one.
 - Also clarify in the description that use_count is not necessarily > 1: an op may, but does not necessarily, return one of its inputs as-is.
 - Allow usage of regex in skip list

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61139

Reviewed By: malfet, Varal7

Differential Revision: D29564917

Pulled By: soulitzer

fbshipit-source-id: 806b7177117a573dd12f161cc80dcadac892f9d0
2021-07-07 05:21:19 -07:00
6f1455440b task 3: typecheck (#60805)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60805

Test Plan: Imported from OSS

Reviewed By: jamesr66a, VitalyFedyunin

Differential Revision: D29522885

Pulled By: migeed-z

fbshipit-source-id: 559a8a495a16e517af77fd5a0785a82e1ebb3bd7
2021-07-06 23:51:49 -07:00
9813b9bc0d Fix mypy.ini (#61333)
Summary:
Fixes CI regression caused by https://github.com/pytorch/pytorch/issues/61119
Unlike Python, `.ini` string lists cannot end with a trailing comma.

Fixes CI on master

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61333

Reviewed By: bhosmer

Differential Revision: D29578696

Pulled By: malfet

fbshipit-source-id: b81e5f4c0a553299c4d4bee0a9bb73748910795f
2021-07-06 22:46:09 -07:00
f0316ec0b6 Revert D24068202: [pytorch][PR] Add typing return value to init in nn.Module
Test Plan: revert-hammer

Differential Revision:
D24068202 (506397a809)

Original commit changeset: 4cd9b6ca12b5

fbshipit-source-id: f45fcf7ee6ee9198ed6f3f34956ce68a64378c32
2021-07-06 22:15:31 -07:00
98119bfce9 task 2: ast rewrite (#60622)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60622

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D29493747

Pulled By: migeed-z

fbshipit-source-id: 684fcdfd3dd441e72c77bb7a4d64c18b9849a198
2021-07-06 20:15:30 -07:00
0dc40474fe Migrate glu from the THC to ATen (CUDA) (#61153)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61153

Fixes gh-24571, fixes gh-24572
Closes gh-39586, closes gh-39586

Benchmarks
----------

The benchmarks were run with nvprof calling the operator in a loop. They show
reliable improvements for large tensors, but the TH implementation seems to fare
better for smaller tensors. For sufficiently large tensors, the ATen
implementation does win, though.

|        Shape | Dim | Master Forward (us) | This PR Forward (us) | Master Backward (us) | This PR Backward (us) |
|-------------:|-----|:-------------------:|:--------------------:|:--------------------:|:---------------------:|
|    128, 1000 | 0   |        2.4770       |        2.0820        |        3.0440        |         3.4680        |
|              | 1   |        2.7060       |        4.4850        |        3.3380        |         3.6250        |
|   128, 10000 | 0   |        26.531       |        21.366        |        38.083        |         34.623        |
|              | 1   |        27.680       |        30.465        |        38.943        |         35.204        |
|  128, 100000 | 0   |        292.09       |        219.56        |        355.57        |         324.49        |
|              | 1   |        260.43       |        243.08        |        332.25        |         323.37        |
| 128, 1000000 | 0   |        2475.7       |        1874.6        |        3810.1        |         3215.7        |
|              | 1   |        2586.3       |        2380.9        |        3349.9        |         3207.8        |
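The user-facing API is unchanged; a minimal sketch, assuming a CUDA device:

```
import torch
import torch.nn.functional as F

x = torch.randn(128, 1000, device="cuda", requires_grad=True)
y = F.glu(x, dim=1)   # forward now uses the ATen CUDA kernel
y.sum().backward()    # backward likewise
```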

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D29538093

Pulled By: ngimel

fbshipit-source-id: 1f66b45ec7c46fb8e680b50110a5fde6fe7faab7
2021-07-06 19:06:51 -07:00
7a4ffbd1da [FX] s/IS_SANDCASTLE/IS_FBCODE/ in tests (#61304)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61304

Previously, tests were unrunnable on the devserver. This fixes that.
ghstack-source-id: 133051811

Test Plan: waitforsadcastle

Reviewed By: Chillee

Differential Revision: D29561806

fbshipit-source-id: 6020e5b4ba72d6de1ea2563e70fdb0e604bee1a5
2021-07-06 17:20:53 -07:00
506397a809 Add typing return value to init in nn.Module (#45654)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/45497

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45654

Reviewed By: driazati

Differential Revision: D24068202

Pulled By: malfet

fbshipit-source-id: 4cd9b6ca12b531311302e3cdeeab39bc45d86c94
2021-07-06 17:09:30 -07:00
9f3167ebdf task 1: annotate (#60621)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60621

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D29493619

Pulled By: migeed-z

fbshipit-source-id: 1bd3fb02c90ae5b394869a474b2e6b06af0d4791
2021-07-06 16:48:11 -07:00
a1ad28da10 Refactor clang_tidy.py (#61119)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61119

This change splits the clang-tidy CI job into smaller steps and uses a
refactored version of the clang_tidy.py script.

The new folder structure is as follows:
```
tools/linter/clang_tidy
|_ __main__py
|_ requirements.txt
|_ run.py
|_ setup.sh
```

`__main__.py`

This script will run `tools/linter/clang_tidy/setup.sh` if a `build`
directory doesn't exist, mimicking what used to be done as a separate
step in the CI job.

After that, it will invoke `clang-tidy` with default arguments being
declared in the script itself (as opposed to declaring them in
lint.yml).

The reasoning behind this approach is two-fold:

- Make it easier to run `clang-tidy` locally using this script
- De-duplicate the option passing

`requirements.txt`

Contains a list of additional python dependencies needed by the
`clang-tidy` script.

`setup.sh`

If a build directory doesn't exist, this command will run the necessary
codegen and build commands for running `clang-tidy`

Example usage:
```
python3 tools/linter/clang_tidy --parallel
```
Notice that we don't have to put the `.py` at the end of `clang_tidy`.

Test Plan:
Run the following command:
```
python3 tools/linter/clang_tidy --paths torch/csrc/fx --parallel
```

Reviewed By: walterddr, janeyx99

Differential Revision: D29568582

Pulled By: 1ntEgr8

fbshipit-source-id: cd6d11c5cb8ba9f1344a87c35647a1cd8dd45b04
2021-07-06 16:02:11 -07:00
81e36d02a6 Improve error message on invalid values to Distribution methods (#61056)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/18133
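A minimal sketch of a call that now produces a more informative error (the exact wording of the message is an assumption):

```
import torch

d = torch.distributions.Bernoulli(probs=torch.tensor(0.5), validate_args=True)
# 2.0 is outside the Boolean support; the raised ValueError now names the
# offending value instead of only the failed constraint:
d.log_prob(torch.tensor(2.0))
```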

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61056

Reviewed By: jbschlosser

Differential Revision: D29510173

Pulled By: neerajprad

fbshipit-source-id: 205ec7de6c8576a73e77ee4bf01c30e99b38a52e
2021-07-06 15:44:55 -07:00
45cc207a88 Fix breakpad build + add test canary (#60990)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60990

This makes the breakpad build more explicit in its messaging and hints to cmake where to look for the library (it wasn't able to find it without `PATHS` on CI even though that works locally). This also adds a smoke test that will fail if breakpad isn't present on a CI job where it is expected (e.g. binary builds).

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D29514316

Pulled By: driazati

fbshipit-source-id: 79514363334788f311ba5d4f25deed3452f0c3eb
2021-07-06 14:15:07 -07:00
b6024b9d12 More loop transforms 2
Summary: Exact duplicate of D29410111 to fix land issues.

Test Plan: Sandcastle

Reviewed By: walterddr

Differential Revision: D29538335

fbshipit-source-id: 6a4f9ac4a505339ed242af60fe7fd4ba1fda3b32
2021-07-06 13:38:10 -07:00
c74c0c5718 add thrust/host_vector.h header for cuda 11.4 build (#61004)
Summary:
needed for cuda 11.4 build

Close https://github.com/pytorch/pytorch/issues/61011

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61004

Reviewed By: ngimel

Differential Revision: D29523896

Pulled By: malfet

fbshipit-source-id: acb11bdd19c0cc240696be21e5c492f8976fea65
2021-07-06 12:44:56 -07:00
5da507b57b Add bazel actions workflow (#61039)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61039

- Added a new template for bazel GH Actions workflow
- Simplified the workflow based on malfet's suggestion by combining build and test jobs into one as we only run a small subset of tests for bazel
- Tested the run to make sure it succeeds
- Build step takes 4 minutes, test step takes 7 minutes

The downside of this approach is that I duplicated some of the jobs in a new template file. An alternative solution would be to use something like template inheritance (https://jinja.palletsprojects.com/en/3.0.x/templates/#template-inheritance); however, that is better done in a separate PR, as the linux and windows workflows would need to be changed. Another solution is to use a bunch of if/else statements in a linux workflow template to accommodate the bazel build as part of it, but this seems less clean than template inheritance with jinja.

Here is a link to the latest bazel run with this change https://github.com/pytorch/pytorch/actions/runs/1004656584

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D29562260

Pulled By: rsemenov

fbshipit-source-id: a7d7d3a0b8092f52929fb109820bfad4574f5602
2021-07-06 12:18:43 -07:00
fac744e116 Foreach Binary Test Refactor (#59907)
Summary:
Related: https://github.com/pytorch/pytorch/issues/58833

## Changes I'm a bit concerned
- binary ops with one tensorlist and one scalarlist support complex dtypes. To realize this, I added a specialization of [`TensorListScalarListMetadata<c10::complex<double>, 1>` ](https://github.com/pytorch/pytorch/pull/59907/files#diff-131eb9b310905b15b3528da6a23e542a3a3aa952bc88f7423c98a23a8a28cca1R49). This might be out of the scope of this pull request.

cc ptrblck ngimel mcarilli
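A sketch of the tensorlist + scalarlist combination mentioned above, now exercised with complex dtypes:

```
import torch

xs = [torch.randn(2, dtype=torch.complex64) for _ in range(3)]
scalars = [1 + 1j, 2.0, 3]
ys = torch._foreach_mul(xs, scalars)  # one scalar per tensor in the list
```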

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59907

Reviewed By: mruberry

Differential Revision: D29551001

Pulled By: ngimel

fbshipit-source-id: 46b25fdba85dd4d6332a77b27376fe96cd422384
2021-07-06 11:49:38 -07:00
5503a4ac6e DOC Improves shape documentation for *Flatten (#60980)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/60841

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60980

Reviewed By: VitalyFedyunin

Differential Revision: D29526650

Pulled By: jbschlosser

fbshipit-source-id: 2b4b0b84e0652c4cf3e9a48debb3b1bfe4e04b05
2021-07-06 10:47:11 -07:00
95cada8810 Make breakpad depdendencies private (#61183)
Summary:
Otherwise, it will result in the following errors for people developing extensions:
```
CMake Error in frontends/pytorch/csrc/CMakeLists.txt:
  Imported target "torch" includes non-existent path

    "/usr/local/include/breakpad"
```

Fixes different issue reported in https://github.com/pytorch/pytorch/issues/60485

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61183

Reviewed By: driazati

Differential Revision: D29538332

Pulled By: malfet

fbshipit-source-id: e83cfd0b335e9b0b1ba5715789b09765db671346
2021-07-06 10:02:34 -07:00
635d864b26 Fix modernize-use-equals-default nolint failures in torch/csrcs (#61142)
Summary:
Test-plan: Compile + clang-tidy

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61142

Reviewed By: VitalyFedyunin

Differential Revision: D29529372

Pulled By: malfet

fbshipit-source-id: 2ccde7712a51c28243b16bbb4d1d68086e0414a6
2021-07-06 09:46:46 -07:00
718db968b8 move CI related functions out of run_test.py (#61124)
Summary:
run_test.py currently does a lot of downloading and test file/suite/case parsing, and it doesn't work well outside of the CI environment.

Restructured run_test.py, created tools/test/test_selections.py, and moved all test selection logic (reordering, categorizing slow tests, creating shards) there.

Follow up PRs should:
- refactor those file read/write logic entangled inside test_selections.py into stats/ folder
- restructure and add network independent test logics to test_test_selections.py

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61124

Test Plan:
- tools/test
- CI

Related PR:
This follows the refactoring example in: https://github.com/pytorch/pytorch/issues/60373

Reviewed By: malfet

Differential Revision: D29558981

Pulled By: walterddr

fbshipit-source-id: 7f0fd9b4720a918d82918766c002295e8df04169
2021-07-06 09:06:42 -07:00
864dcbb2cc Set sccache bucket on test runs to save some run minutes (#61140)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61140

While working on the bazel port to GitHub Actions I noticed that we do not set the sccache bucket for test runs, which causes cache misses while running test jobs. For example, in https://github.com/pytorch/pytorch/runs/2965919198?check_suite_focus=true, test run 1 uses the local cache and has 44 cache misses; with an average 9-second read per miss, that is 44*9/60 = ~7 minutes per run.

Here is another example
https://github.com/pytorch/pytorch/runs/2966210127?check_suite_focus=true

Open to feedback if there is a downside of using AWS cache.

Test Plan: Imported from OSS

Reviewed By: 1ntEgr8

Differential Revision: D29557292

Pulled By: rsemenov

fbshipit-source-id: e8fb000850ec4627d7cccf690e8f5743999fdf36
2021-07-06 07:29:57 -07:00
05c1e5b655 [sparsity] Lambda Scheduler (#59771)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59771

Implements a specific sparsity scheduler that uses a user-provided lambda to change the sparsity levels.
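A hypothetical sketch of the scheduler in use (class and argument names assumed from the surrounding diffs; treat them as illustrative):

```
import torch
from torch.ao.sparsity import WeightNormSparsifier, LambdaSL

model = torch.nn.Sequential(torch.nn.Linear(8, 8))
sparsifier = WeightNormSparsifier()
sparsifier.prepare(model, config=None)
# Scale the sparsity level by a user-provided lambda each epoch:
scheduler = LambdaSL(sparsifier, sl_lambda=lambda epoch: 0.95 ** epoch)
for _ in range(10):
    sparsifier.step()
    scheduler.step()
```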

Test Plan:
```
python test/test_ao_sparsity.py
```
Imported from OSS

Differential Revision: D29070604

Reviewed By: raghuramank100

Pulled By: z-a-f

fbshipit-source-id: c7ccbe63fe4cd6a0c3563541b7fcf93a99d0e62f
2021-07-02 21:39:38 -07:00
37ebf2e3cd [sparsity] Base sparsity level scheduler class (#59770)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59770

Implements the base scheduler class for changing the sparsity levels in the sparsifier.

Test Plan:
```
python test/test_ao_sparsity.py
```
Imported from OSS

Differential Revision: D29070603

Reviewed By: raghuramank100

Pulled By: z-a-f

fbshipit-source-id: 0b160e4eb0a2a303d2d19e6a3beb4784002b2cb7
2021-07-02 21:38:24 -07:00
ed63fb5225 Fix some more loops (#60895)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60895

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D29431572

fbshipit-source-id: fbcf48696bf2c90cc0973a767d83bb526f6ccd7f
2021-07-02 19:17:08 -07:00
43fb39c3eb [DDP] Make uneven inputs work with comm. hook (#61020)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61020

Makes uneven input support with the `join` context manager work with
custom communication hooks, ensuring that the two features work well
together. Added relevant unittests covering the allreduce and powerSGD hooks.

Instead of calling `allreduce`, the join manager now calls into `_run_reduction_hook` which will automatically run whatever hook is installed.
ghstack-source-id: 132950108
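A sketch of the combined usage (assumes an initialized process group, a DDP-wrapped `model`, and a placeholder `uneven_inputs` iterable):

```
import torch.distributed.algorithms.ddp_comm_hooks.default_hooks as default_hooks

# `model` is a torch.nn.parallel.DistributedDataParallel instance.
model.register_comm_hook(state=None, hook=default_hooks.allreduce_hook)
# Ranks may run different numbers of iterations; the join manager now runs
# the installed hook (rather than a hard-coded allreduce) for the shadow
# reductions performed on already-joined ranks.
with model.join():
    for inp in uneven_inputs:
        model(inp).sum().backward()
```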

Test Plan: CI

Reviewed By: SciPioneer

Differential Revision: D29480028

fbshipit-source-id: c91dc467a62c5f1e0ec702a2944ae3deb10f93f4
2021-07-02 18:48:21 -07:00
94b730681f [DDP] Refactor uneven inputs to take GradBucket (#61019)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61019

Changes the uneven-input logic from running allreduce directly to using the `GradBucket` structure. This is to enable support for comm hooks with join in the next diff.
ghstack-source-id: 132950107

Test Plan: ci

Reviewed By: SciPioneer

Differential Revision: D29480027

fbshipit-source-id: 7c42c53653052f71b86a75e14a5fc7ae656433f7
2021-07-02 18:47:23 -07:00
512448a425 CTCLoss: Remove dispatching in parallel region (#60599)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60599

Ref #56794

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D29446190

Pulled By: ngimel

fbshipit-source-id: eb01783c8c32a1405b58e1364fc3d71c0f054e0a
2021-07-02 17:55:56 -07:00
d42f1751d4 [sparsity] WeightNormSparsifier (#58955)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58955

Implements the weight-norm sparsifier.
This type of sparsifier computes the norms of the weights, sorts them, and zeroes out the target fraction of them.

The main implemented method is `update_mask`, which holds the main logic for changing the masks.

Test Plan:
```
python test/test_ao_sparsity.py
```
Imported from OSS

Differential Revision: D28970960

Reviewed By: raghuramank100

Pulled By: z-a-f

fbshipit-source-id: 8f2a4360ad877f430cdc1065c6777106938b58d5
2021-07-02 17:35:27 -07:00
7ab2729481 [sparsity][refactor] Import factoring out (#58707)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58707

Minor refactor that changes the format of the import.
This is done to avoid accidental circular dependencies.

Test Plan:
```
python test/test_ao_sparsity.py
```
Imported from OSS

Differential Revision: D28970961

Reviewed By: raghuramank100

Pulled By: z-a-f

fbshipit-source-id: c312742f5e218c435a1a643532f5842116bfcfff
2021-07-02 16:32:39 -07:00
973e9266ff [sparsity] Sparsifier class (#58704)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58704

Implements the base sparsifier class based on the #59835 RFC document.

This PR implements the base class for sparsification; specifically, the `prepare` method is implemented.

Test Plan:
```
python test/test_ao_sparsity.py
```
Imported from OSS

Differential Revision: D28970958

Reviewed By: raghuramank100

Pulled By: z-a-f

fbshipit-source-id: 0ef98a445c0a0aca22ce5708e34a9f94606d0e2b
2021-07-02 16:31:21 -07:00
80cab10534 [sparsity] Sparsity parametrization (#58705)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58705

The basic demo for this particular implementation can be found here:
https://gist.github.com/z-a-f/1d06ae8d5a509d3c9c1596dcb924afe0

Test Plan:
```
python test/test_ao_sparsity.py
```
Imported from OSS

Differential Revision: D28970959

Reviewed By: raghuramank100

Pulled By: z-a-f

fbshipit-source-id: 2a0bea1e0a81816690e05f83051d607c90925d32
2021-07-02 11:12:31 -07:00
5d34b7955b [sparsity][refactor] Changing linear row/col control (#60850)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60850

Test Plan:
```
python test/test_ao_sparsity.py
```

Differential Revision: D29465900

Reviewed By: raghuramank100

Pulled By: z-a-f

fbshipit-source-id: 412f50da857f377898fea79d378ae54a049b81fe
2021-07-02 11:12:30 -07:00
509b1ef9d5 [sparsity] Add sparsity tests to run_test.py (#60887)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60887

Test Plan:
```
./test/run_test.py -i test_ao_sparsity
```

Differential Revision: D29465834

Reviewed By: mruberry

Pulled By: z-a-f

fbshipit-source-id: 144f940363a20dd65c2bbfe70924c266d8791dc7
2021-07-02 11:11:20 -07:00
54673fc944 Sparse: Remove dispatch in parallel region (#60598)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60598

Ref #56794

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D29446192

Pulled By: ngimel

fbshipit-source-id: 1a11f3aa847e4ce83fc6f50cee362b7d0cb61eae
2021-07-01 21:56:17 -07:00
11b722c063 [DDP] Refactor hook running logic (#61018)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61018

Extract logic of hook running to a function `run_reduction_hook` that takes in a `GradBucket` and runs the hook/allreduce. This is mainly to prepare for join to support comm. hook in follow up diffs.
ghstack-source-id: 132924220

Test Plan: CI

Reviewed By: SciPioneer

Differential Revision: D29477143

fbshipit-source-id: 87e8e563e71821fd462d6b259c98a6a0afbcd7b4
2021-07-01 20:41:55 -07:00
b21df03f3b [DDP] Remove SPMD from get_bucket_tensors (#61017)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61017

Removes SPMD nested vector logic from this codepath. This is mostly in preparation for the next diffs in this stack which enable support for join with comm. hook.
ghstack-source-id: 132924223

Test Plan: CI

Reviewed By: SciPioneer

Differential Revision: D29477360

fbshipit-source-id: f8132a94b1abfe28586aa78ac47e13a7ce6bb137
2021-07-01 20:40:53 -07:00
4a2e8b53bb [JIT] Add `torch._C.ScriptList` (#52832)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52832

**Summary**
This commit adds `torch._C.ScriptList`, a list type that has reference
semantics across the Python/TorchScript boundary. That is, modifications
made in TorchScript to instances of `torch._C.ScriptList`
are visible in Python even when the list is not returned from the function.

`torch._C.ScriptList` is implemented using a modified version of pybind's
`stl_bind.h`-style bindings attached to `ScriptList` and `ScriptListIterator`,
wrapper classes around `c10::impl::GenericList` and
`c10::impl::GenericList::iterator`. These bindings allow instances of
`torch._C.ScriptList` to be used as if they were regular `list`s in Python. Reference semantics are achieved by simply
retrieving the `IValue` contained in `ScriptList` in `toIValue` (invoked
when converting Python arguments to `IValues` before calling TorchScript
code).

**Test Plan**
This commit adds `TestScriptList` to `test_list_dict.py`, a set of tests
that check that all of the common list operations are supported
and that instances have reference semantics across the
Python/TorchScript boundary.

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D29478121

Pulled By: SplitInfinity

fbshipit-source-id: 652cc25cfa37debe28db9527504846f22abd8b54
2021-07-01 20:28:13 -07:00
6e9e30cc1d Ignore notebooks when checking for newlines (#61156)
Summary:
Fix lint on master (these files should be considered "generated" so don't lint them)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61156

Reviewed By: malfet

Differential Revision: D29532211

Pulled By: driazati

fbshipit-source-id: a1e47f45bedf441613bdc2bd60fbf8299e5c962f
2021-07-01 18:11:43 -07:00
a4d86e0d53 [quant][fx][perf] improve runtime of prepare step for large models (#61132)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61132

For large models, the `insert_observers_for_model` function was taking a long time, especially in the case where not all of the nodes are being quantized.

For example, for a model with 21000 nodes of which only ~50 are being quantized, the breakdown of prepare_fx vs. convert_fx was:

prepare_fx 979 seconds
convert_fx 9 seconds

The main reason was that we were doing some unnecessary computation for all nodes in this function; this PR just moves that work to where it is actually used.

After this PR:
prepare_fx 26 seconds
convert_fx 9 seconds

Test Plan:
Existing tests

Imported from OSS

Reviewed By: raghuramank100

Differential Revision: D29522303

fbshipit-source-id: 7ce12582a859d02ff763abebf4a592d28e0764ca
2021-07-01 17:17:10 -07:00
277b310edb [DataLoader] Add notebook with DataPipes API example (#60680)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60680

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D29461079

Pulled By: VitalyFedyunin

fbshipit-source-id: 6532bf77113ab89a50f8bb022daf80f8477e9297
2021-07-01 16:39:28 -07:00
ca2702a776 [pruner] Make bias hook stateless (#61077)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61077

Removes the `BiasHook` class, using a function instead.
ghstack-source-id: 132899223

Test Plan:
` buck test mode/dev-nosan //caffe2/test:ao -- TestBasePruner`

https://pxl.cl/1L7Tg

Reviewed By: z-a-f

Differential Revision: D29504119

fbshipit-source-id: 6dd9689d18b17ac64e8a461f466e2c9018bc530b
2021-07-01 14:59:00 -07:00
0a7875231b [pruner] Add bias support (#60970)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60970

Support adding bias in eager mode
ghstack-source-id: 132695883

Test Plan:
`buck test mode/dev-nosan //caffe2/test:ao -- TestBasePruner`

https://pxl.cl/1L3K3

Reviewed By: z-a-f

Differential Revision: D29441499

fbshipit-source-id: 47e0fff5b3014612bd021e145160ea54e2645e24
2021-07-01 14:57:09 -07:00
87dbdef65d MAINT Adds test and docs for Linear with no batch dims (#60992)
Summary:
Towards https://github.com/pytorch/pytorch/issues/60585

This PR updates docs for `Linear` and adds a non-batch test case to `common_nn.py`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60992

Reviewed By: VitalyFedyunin

Differential Revision: D29518451

Pulled By: jbschlosser

fbshipit-source-id: 6dd79c0f21ac5b6f693e3e1ba954379d2606d4e0
2021-07-01 14:49:24 -07:00
369802a504 Add aten::avgpool2d NNAPI converter (#58538)
Summary:
Add support for the aten::avgpool2d op in the NNAPI model converter, with
variable input size support

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58538

Test Plan: pytest test/test_nnapi.py::TestNNAPI::test_avgpool2d

Reviewed By: anshuljain1

Differential Revision: D28531944

fbshipit-source-id: 43ff8c9389365698c282f204042b49c7ec84d824
2021-07-01 14:07:14 -07:00
19b6ee4d4e model_dump working with delegate models (#61043)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61043

Trying to make model_dump work with delegate models
ghstack-source-id: 132809755

Test Plan:
N509022.

The data.pkl in the lowered model:
```
bash-3.2$ python -m torch.utils.show_pickle /Users/myuan/models/backend/lowered_model.pt@*/data.pkl
torch.jit.backend_with_compiler_demo.LoweredModule.__torch__.___torch_mangle_5.ModuleAdd()(state=
 (torch.jit._pickle.restore_type_tag({'forward': torch.jit._pickle.restore_type_tag({'input_shapes': '((1, 1, 320, 240), (1, 3))',
                   'some_other_option': 'True'},
                  'Dict[str, str]')},
    'Dict[str, Any]'),
  torch.jit._pickle.restore_type_tag({'forward': 'prim::Constant#1<debug_handle>271,aten::add<debug_handle>272'},
    'Dict[str, str]'),
  True))
```
Compared to the data.pkl in scripted_model.pt:
```
__torch__.___torch_mangle_7.ModuleAdd()(state=
 {'_is_full_backward_hook': None, 'training': True})
```

Reviewed By: Amyh11325

Differential Revision: D29464860

fbshipit-source-id: d738e98ea518339465f8e3375207cf83e3dac532
2021-07-01 13:39:56 -07:00
374278f431 Improved sparse CSR tensor sampling method (#60283)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/59379

The improved sparse CSR tensor sampling method is described in https://pearu.github.io/csr_sampling.html; it features the following (a construction sketch follows the list):
- for specified `nnz`, one gets a CSR sample with the same `nnz`
- variability of the number of specified columns per row is maximized
- `crow_indices` content is randomized
- each row's `col_indices` content is sorted and filled with unique values (see also https://github.com/pytorch/pytorch/issues/60277)
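
For illustration, a hand-built CSR tensor with these properties (assuming the prototype `torch.sparse_csr_tensor` constructor; the values are arbitrary):

```python
import torch

# nnz == 4 exactly; each row's col_indices are sorted and unique.
t = torch.sparse_csr_tensor(
    crow_indices=torch.tensor([0, 1, 3, 4]),  # rows hold 1, 2, and 1 entries
    col_indices=torch.tensor([2, 0, 3, 1]),
    values=torch.ones(4),
    size=(3, 4),
)
```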

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60283

Reviewed By: bhosmer

Differential Revision: D29492605

Pulled By: cpuhrsch

fbshipit-source-id: 8d875b7c2b0573a9ab37047c6d8fe8b540295ce1
2021-07-01 13:26:19 -07:00
6ecc1a4c4f Make pytorch clang-tidy clean (#60649)
Summary:
This PR suppresses clang-tidy warnings in the codebase (for now) so that we can re-enable clang-tidy checks on master.

I ran this script to add the `NOLINTNEXTLINE` comments (on a devserver):
```bash
python3 setup.py develop

# Uses same script that's run on CI and adds the -j (parallel), -s (add comments), -k (continue if diagnostic errors are found) options
python3 tools/clang_tidy.py \
  -j \
  -s \
  -k \
  -v \
  --paths torch/csrc/ \
  -g"-torch/csrc/jit/passes/onnx/helper.cpp" \
  -g"-torch/csrc/jit/passes/onnx/shape_type_inference.cpp" \
  -g"-torch/csrc/jit/serialization/onnx.cpp" \
  -g"-torch/csrc/jit/serialization/export.cpp" \
  -g"-torch/csrc/jit/serialization/import.cpp" \
  -g"-torch/csrc/jit/serialization/import_legacy.cpp" \
  -g"-torch/csrc/onnx/init.cpp" \
  -g"-torch/csrc/cuda/nccl.*" \
  -g"-torch/csrc/cuda/python_nccl.cpp" \
  -g"-torch/csrc/autograd/FunctionsManual.cpp" \
  -g"-torch/csrc/generic/*.cpp" \
  -g"-torch/csrc/jit/codegen/cuda/runtime/*" \
  -g"-torch/csrc/deploy/interpreter/interpreter.cpp" \
  -g"-torch/csrc/deploy/interpreter/interpreter.h" \
  -g"-torch/csrc/deploy/interpreter/interpreter_impl.h" \
  -g"-torch/csrc/deploy/interpreter/test_main.cpp"
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60649

Test Plan: Verified changes by re-running the script (without the `-s` option) and seeing no warnings/errors.

Reviewed By: walterddr, janeyx99

Differential Revision: D29504258

Pulled By: 1ntEgr8

fbshipit-source-id: 78310b30ee8213b73ddb4771ad874665323e7a4e
2021-07-01 12:21:07 -07:00
a0a9ea6598 Fix documentation preview instructions (#61080)
Summary:
People don't need to self-host these anymore, since we do it automatically in PRs

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61080

Reviewed By: VitalyFedyunin, janeyx99

Differential Revision: D29506465

Pulled By: driazati

fbshipit-source-id: 45875cb229f8cc565a9a1405f52cef198ee0e687
2021-07-01 12:17:34 -07:00
60509f8921 Update DDP documentation to mention outputs not used in loss is supported (#60275)
Summary:
We recently landed a change to ensure that when running under ``find_unused_parameters=True``, not all module outputs have to be used in the loss computation, and DDP will still work as expected. Mention this update in the documentation and add some additional clarification.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60275

Reviewed By: SciPioneer

Differential Revision: D29502609

Pulled By: rohan-varma

fbshipit-source-id: ddb3129cff9492018e61813413b30711af212309
2021-07-01 11:56:53 -07:00
0128eb9a85 Fix TSAN issue in distributed tests (#59238)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59238

Creating a `multiprocessing.Manager()` launches a new process using the `fork` method (because it's the default one), and then in that subprocess it launches a new thread. TSAN really doesn't like this (and rightly so!) because we already had threads in the superprocess, and intermixing threads and forks is dangerous. The proper way to deal with this is to `exec` inside the child process or, in other words, use the `spawn` method.

Note that the method used to launch the Manager is entirely unrelated to the method used to launch our "own" subprocesses; hence we were using `fork` for the Manager even though we were using `spawn` for our own subprocesses.
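
A minimal sketch of the resulting pattern (the surrounding test harness is assumed):

```python
import multiprocessing as mp

if __name__ == "__main__":
    # An explicit spawn context makes the Manager's helper process exec
    # a fresh interpreter instead of forking a multithreaded parent.
    ctx = mp.get_context("spawn")
    with ctx.Manager() as manager:
        shared = manager.dict()
        shared["ok"] = True
```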
ghstack-source-id: 130240724

Test Plan: Reverted the silencing introduced in D28490129, ran the `test_init_rpc_then_pg` test from the TensorPipe suite and saw the original TSAN failure. Then applied my fix, re-ran the test, and the failure was gone.

Reviewed By: zhaojuanmao

Differential Revision: D28794321

fbshipit-source-id: 12242e69be399a7f02a40a0ebb3d92f92e00ce73
2021-07-01 11:53:01 -07:00
5b44d817fb Expose raw saved tensors for codegen functions (#60565)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60565

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D29466225

fbshipit-source-id: 77eb4214a1baecc501282413d99d55f8935dc01f
2021-07-01 11:25:21 -07:00
3f0f860a1c Condense JIT/Quantization triage into one workflow (#61130)
Summary:
The `.github/workflows/{jit,quantization}_triage.yml` workflows are nearly identical, so this PR consolidates them into a single GitHub Actions workflow to reduce code duplication. It also renames the workflow so it starts with a capital letter, so that it will show up alongside all our other GitHub Actions workflows on [the HUD](https://hud.pytorch.org/build2/pytorch-master).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61130

Reviewed By: walterddr

Differential Revision: D29520022

Pulled By: samestep

fbshipit-source-id: 673789762e08c2c77d72e7c20eb16d6beec573ba
2021-07-01 10:50:26 -07:00
6f92f10c94 Use a leaky singleton for CublasHandlePool. (#60987)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60987

We were seeing deadlocks as follows during shutdown:

```
Thread 1 (LWP 2432101):
#0  0x00007efca470190b in __pause_nocancel () from /lib64/libc.so.6
#1  0x00007efca49de485 in __pthread_mutex_lock_full () from /lib64/libpthread.so.0
#2  0x00007ef91d4c42c6 in __cuda_CallJitEntryPoint () from /lib64/libnvidia-ptxjitcompiler.so.1
#3  0x00007efc651ac8f1 in ?? () from /lib64/libcuda.so
#4  0x00007efc651aee03 in ?? () from /lib64/libcuda.so
#5  0x00007efc64f76b84 in ?? () from /lib64/libcuda.so
#6  0x00007efc64f77f5d in ?? () from /lib64/libcuda.so
#7  0x00007efc64eac858 in ?? () from /lib64/libcuda.so
#8  0x00007efc64eacfbc in ?? () from /lib64/libcuda.so
#9  0x00007efc7810a924 in ?? () from /usr/local/cuda/lib64/libcublas.so.11
#10 0x00007efc780fa2be in ?? () from /usr/local/cuda/lib64/libcublas.so.11
#11 0x00007efc78111044 in ?? () from /usr/local/cuda/lib64/libcublas.so.11
#12 0x00007efc7811580a in ?? () from /usr/local/cuda/lib64/libcublas.so.11
#13 0x00007efc78115aa4 in ?? () from /usr/local/cuda/lib64/libcublas.so.11
#14 0x00007efc781079ec in ?? () from /usr/local/cuda/lib64/libcublas.so.11
#15 0x00007efc780e6a7a in ?? () from /usr/local/cuda/lib64/libcublas.so.11
#16 0x00007efc7811cfa5 in ?? () from /usr/local/cuda/lib64/libcublas.so.11
#17 0x00007efc777ea98c in ?? () from /usr/local/cuda/lib64/libcublas.so.11
#18 0x00007efc777ebd80 in ?? () from /usr/local/cuda/lib64/libcublas.so.11
#19 0x00007efc777ea2c9 in ?? () from /usr/local/cuda/lib64/libcublas.so.11
#20 0x00007efc778c2e2d in cublasDestroy_v2 () from /usr/local/cuda/lib64/libcublas.so.11
#21 0x00007efc51a3fb56 in std::_Sp_counted_ptr_inplace<at::cuda::(anonymous namespace)::DeviceThreadHandlePool<cublasContext*, &at::cuda::(anonymous namespace)::createCublasHandle, &at::cuda::(anonymous namespace)::destroyCublasHandle>, std::allocator<at::cuda::(anonymous namespace)::DeviceThreadHandlePool<cublasContext*, &at::cuda::(anonymous namespace)::createCublasHandle, &at::cuda::(anonymous namespace)::destroyCublasHandle> >, (__gnu_cxx::_Lock_policy)2>::_M_dispose() () from /data/users/pritam/pytorch/torch/lib/libtorch_cuda.so
#22 0x00007efc51a3fc5f in std::shared_ptr<at::cuda::(anonymous namespace)::DeviceThreadHandlePool<cublasContext*, &at::cuda::(anonymous namespace)::createCublasHandle, &at::cuda::(anonymous namespace)::destroyCublasHandle> >::~shared_ptr() () from /data/users/pritam/pytorch/torch/lib/libtorch_cuda.so
#23 0x00007efca4648b0c in __run_exit_handlers () from /lib64/libc.so.6
#24 0x00007efca4648c40 in exit () from /lib64/libc.so.6
#25 0x0000558c8852e5f9 in Py_Exit (sts=0) at /tmp/build/80754af9/python_1614362349910/work/Python/pylifecycle.c:2292
#26 0x0000558c8852e6a7 in handle_system_exit () at /tmp/build/80754af9/python_1614362349910/work/Python/pythonrun.c:636
#27 0x0000558c8852e742 in PyErr_PrintEx (set_sys_last_vars=<optimized out>, set_sys_last_vars=<optimized out>) at /tmp/build/80754af9/python_1614362349910/work/Python/pythonrun.c:646
#28 0x0000558c88540dd6 in PyRun_SimpleStringFlags (command=0x7efca4dc9050 "from multiprocessing.spawn import spawn_main; spawn_main(tracker_fd=9, pipe_handle=13)\n", flags=0x7ffe3a986110) at /tmp/build/80754af9/python_1614362349910/work/Python/pythonrun.c:457
#29 0x0000558c88540ead in pymain_run_command (cf=0x7ffe3a986110, command=<optimized out>) at /tmp/build/80754af9/python_1614362349910/work/Modules/main.c:420
#30 pymain_run_python (pymain=0x7ffe3a986220) at /tmp/build/80754af9/python_1614362349910/work/Modules/main.c:2907
#31 pymain_main (pymain=0x7ffe3a986220) at /tmp/build/80754af9/python_1614362349910/work/Modules/main.c:3460
#32 0x0000558c8854122c in _Py_UnixMain (argc=<optimized out>, argv=<optimized out>) at /tmp/build/80754af9/python_1614362349910/work/Modules/main.c:3495
#33 0x00007efca4632493 in __libc_start_main () from /lib64/libc.so.6
#34 0x0000558c884e5e90 in _start () at ../sysdeps/x86_64/elf/start.S:103
```

This was likely caused by a static singleton that wasn't leaky. Following
the guidance in https://isocpp.org/wiki/faq/ctors#construct-on-first-use-v2 to
use a leaky singleton instead.
ghstack-source-id: 132847448

Test Plan: Verified locally.

Reviewed By: malfet

Differential Revision: D29468866

fbshipit-source-id: 89250594c5cd2643417b1da584c658b742dc5a5c
2021-07-01 10:23:07 -07:00
d2fef350f2 add embedding bag skeleton take 2 (#61126)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61126

adding skeleton implementations of quantized embedding tables with zeroes

Test Plan:
compilation and farm tests pass; also ran test_find_dangling_impls, which passed

did a manual negative test and verified the message is printed properly
```
======================================================================
FAIL: test_find_dangling_impls (test_dispatch.TestPythonDispatcher)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/data/users/hyz/fbsource/fbcode/buck-out/opt/gen/caffe2/test/others#binary,link-tree/test_dispatch.py", line 892, in test_find_dangling_impls
    self.assertEqual(
  File "/data/users/hyz/fbsource/fbcode/buck-out/opt/gen/caffe2/test/others#binary,link-tree/torch/testing/_internal/common_utils.py", line 1498, in assertEqual
    super().assertTrue(result, msg=self._get_assert_msg(msg, debug_msg=debug_msg))
AssertionError: False is not true : Scalars failed to compare as equal! 0 != 1
Expect zero dangling impls, but found: ['name: quantized::qembedding_bag_4bit_unpack\nschema: (none)\nCUDA: registered at caffe2/aten/src/ATen/native/quantized/cuda/embedding_bag.cu:394 :: (Tensor _0) -> (Tensor _0) [ boxed unboxed ]\n']
```

Reviewed By: walterddr

Differential Revision: D29518274

fbshipit-source-id: d0cb81c8bf51cdc4b83038758131ccf61e4360f5
2021-07-01 10:11:45 -07:00
e5ae0e652d [jit] Allow instance overrides of ignored methods (#61076)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61076

Previously we would always retrieve ignored methods from the
type, which doesn't work when the user has overridden the ignored method
for a specific instance.

This PR changes things up so we retrieve the ignored method as a bound
method from the object being scripted, unwrap it, then re-bind it to the
scriptmodule.
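
A hedged sketch of the pattern this enables (module and method names are illustrative):

```python
import types
import torch

class M(torch.nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * self.scale()

    @torch.jit.ignore
    def scale(self) -> int:
        return 1

def patched_scale(self) -> int:
    return 2

m = M()
# Override the ignored method on this instance only; scripting now picks
# up the instance-bound override instead of the method on the type.
m.scale = types.MethodType(patched_scale, m)
scripted = torch.jit.script(m)
print(scripted(torch.ones(2)))  # tensor([2., 2.])
```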

Test Plan: Imported from OSS

Differential Revision: D29504421

Pulled By: suo

fbshipit-source-id: 14649863ea69a8d2180dd2c4341ec9a826039de1
2021-07-01 09:26:30 -07:00
ccfdb30644 Revert D29413019: [torch] Various improvements to torch.distributed.launch and torch.distributed.run
Test Plan: revert-hammer

Differential Revision:
D29413019 (4e181dfc35)

Original commit changeset: 323bfbad9d0e

fbshipit-source-id: 1f8ae4b3d0a23f3eaff28c37e9148efff25fafe2
2021-07-01 08:44:51 -07:00
48bfc0e51c [DataLoader] Add Example Only fork DataPipe (#60679)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60679

This is an example-only DataPipe, not intended to be used in production; it is used for tutorials, tests, and documentation.
It will have to be replaced by a real `fork` upon the DataLoader update.

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D29461084

Pulled By: VitalyFedyunin

fbshipit-source-id: a7e435f055f040e358f5465092b8daa07f8e29b7
2021-07-01 08:41:26 -07:00
62b2dc2059 [DataLoader] Decorate ZipDataPipe as zip (#60678)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60678

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D29461085

Pulled By: VitalyFedyunin

fbshipit-source-id: f2037fbc67369aae10b07ef80a19e2a0ea7bf530
2021-07-01 08:41:25 -07:00
8e21ff91e2 [DataLoader] Add simple groupby DataPipe (#60675)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60675

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D29461082

Pulled By: VitalyFedyunin

fbshipit-source-id: ded5a3a1555bfd8457d64b7e61ab6729fff9cb75
2021-07-01 08:40:20 -07:00
cb7d813275 Revert D28836794: SumKernel (BFloat16): use float as accumulation type
Test Plan: revert-hammer

Differential Revision:
D28836794 (4f5c68857f)

Original commit changeset: 46ed3a862c2b

fbshipit-source-id: 3b586eeb752b7cdee909fa97a4c78876a6014770
2021-07-01 08:12:31 -07:00
11dca2e5f3 Fix some integer comparisons (#60894)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60894

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D29431512

fbshipit-source-id: b0ef7656806f378ad823e503e7c27cc563d3dc7d
2021-07-01 08:08:39 -07:00
7017dc101f Revert D29313058: add embedding bag skeleton operators
Test Plan: revert-hammer

Differential Revision:
D29313058 (ae21357ada)

Original commit changeset: b05df6ff9a7c

fbshipit-source-id: ef422aedad71dee6cb2824c58aceb66104376a65
2021-07-01 07:37:02 -07:00
d6521c2249 [pyper][emb][quantization] Support emb trained in FP16 (#60736)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60736

Add support of embedding with input data type as float16, utilize new kernel functions added in fbgemm https://github.com/pytorch/FBGEMM/pull/616

Test Plan: `buck test caffe2/test/:quantization -- test_embedding_bag`

Reviewed By: supriyar

Differential Revision: D29392320

fbshipit-source-id: 0a120b3a58b6cf1d84961831097e9581ffd2b591
2021-07-01 07:35:59 -07:00
d42aa176e4 Bump docker image tag for clang-tidy (#61115)
Summary:
The new tag should fix the "Missing <omp.h>" error message on clang-tidy runs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61115

Test Plan:
Ran the clang-tidy job using the diff from https://github.com/pytorch/pytorch/issues/60976.

Expected Output:
There should be no clang diagnostic errors.

Reviewed By: walterddr

Differential Revision: D29516845

Pulled By: 1ntEgr8

fbshipit-source-id: 554229904db67eb7a7b93b3def434b30de6a43b0
2021-07-01 07:30:28 -07:00
46595a9623 [Static Runtime] Add gflag to disable nnc and caffe2 math library (#61090)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61090

Reviewed By: ajyu

Differential Revision: D29479860

fbshipit-source-id: 2b53405f41d319f074c75d8923d97fd6a45fee4b
2021-07-01 00:01:37 -07:00
c1499a9933 Enable jit tracing to parametrization and add jit tests (#60969)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60969

This PR fixes the tracing in the parametrizations.
The current resolution is that when tracing is performed while caching is enabled, we throw an error.
Without caching, the tracing should work properly (tests added).

Currently, the parametrizations don't support scripting.
This PR introduces the same logic as with the tracing (throw error if caching).
However, scripting itself cannot be enabled due to the use of generator expressions in the parametrizations.
Added TODO to fix it.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D29462887

Pulled By: z-a-f

fbshipit-source-id: 49721d3059be58f36055d1c374080df41a748d66
2021-06-30 23:54:02 -07:00
4e181dfc35 [torch] Various improvements to torch.distributed.launch and torch.distributed.run (#60925)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60925

* Set the number of restarts for `torch.distributed.launch` to 0
* Remove the unnecessary `--use_env` warning and move the remaining `--use_env` warnings to `torch.distributed.launch`
* Make default log level WARNING
* Add new doc section around transitioning to `torch.distributed.run`
* Make `torch.distributed.launch` not use error-propagation
* Set default events handler to `null` that does not print events to console
* Add reference from `torch.distributed.launch` to `torch.distributed.run`
* Set correct preexec function that sends SIGTERM to child processes when parent dies

Issues resolved:

https://github.com/pytorch/pytorch/issues/60716
https://github.com/pytorch/pytorch/issues/60754

Test Plan:
sandcastle

    python -m torch.distributed.launch --nproc_per_node 2 main.py -> uses 0 restarts
    python -m torch.distributed.run --nproc_per_node 2 main.py -> uses default for torchelastic, 0 restarts

    python -m torch.distributed.launch --nproc_per_node=4  --use_env --no_python  main.py -> produces error
    python -m torch.distributed.launch --nproc_per_node=4  --use_env main.py -> no warning
    python -m torch.distributed.launch --nproc_per_node=4  --no_python  main.py ->warning

Output of running torch.distributed.launch without --use_env:

    $path/torch/distributed/launch.py:173: FutureWarning: The module torch.distributed.launch is deprecated
    and will be removed in future. Use torch.distributed.run.
    Note that --use_env is set by default in torch.distributed.run.
    If your script expects `--local_rank` argument to be set, please
    change it to read from `os.environ('LOCAL_RANK')` instead.

New section:

{F628923078}

{F628974089}

Reviewed By: kiukchung, cbalioglu

Differential Revision: D29413019

fbshipit-source-id: 323bfbad9d0e4aba3b10ddd7a243ca6e48169630
2021-06-30 23:31:02 -07:00
ae21357ada add embedding bag skeleton operators (#60491)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60491

Basic reference embedding bag operators; these are not going to be performant, but can be used for functionality enablement.

These operators output the right shape, but the implementation is empty (a toy illustration follows).
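
The toy below illustrates the idea (it is not one of the actual operators):

```python
import torch

def skeleton_embedding_bag(weight, indices, offsets):
    # Correct output shape, empty (zero-filled) implementation.
    return torch.zeros(offsets.numel(), weight.size(1), dtype=weight.dtype)

w = torch.randn(10, 3)
out = skeleton_embedding_bag(w, torch.tensor([1, 2, 4]), torch.tensor([0, 2]))
print(out.shape)  # torch.Size([2, 3])
```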

Test Plan: tbd

Reviewed By: vkuzo

Differential Revision: D29313058

fbshipit-source-id: b05df6ff9a7c0c6ac46ef64a42464988453bd460
2021-06-30 23:09:11 -07:00
db1dd9e7e0 add support for quantized tensors in torch.testing.assert_close (#58926)
Summary:
This adds support for quantized tensors in the same way that torch.testing._internal.common_utils.TestCase.assertEqual does (a usage sketch follows the list below):

bf269fdc98/torch/testing/_internal/common_utils.py (L1314-L1341)

- `.qscheme()` is checked for equality
- `.q_scale()` and `.q_zero_point()` are checked for equality (see comment below) for `.qscheme() == torch.per_tensor_affine`
- `.q_per_channel_scales()`, `.q_per_channel_zero_points()`, and `.q_per_channel_axis()` are checked for equality (see comment below) for `.qscheme() == torch.per_channel_affine`
- values are checked with the default checks after a `.int_repr().to(torch.int32)` call
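
A small usage sketch of the resulting behavior (scale/zero-point values are arbitrary):

```python
import torch
from torch.testing import assert_close

t = torch.tensor([0.5, 1.0, 1.5])
a = torch.quantize_per_tensor(t, scale=0.5, zero_point=0, dtype=torch.qint8)
b = torch.quantize_per_tensor(t, scale=0.5, zero_point=0, dtype=torch.qint8)
assert_close(a, b)  # same qscheme, same qparams, same int representation

c = torch.quantize_per_tensor(t, scale=0.25, zero_point=0, dtype=torch.qint8)
# assert_close(a, c) would now raise, since the scales differ.
```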

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58926

Reviewed By: jerryzh168

Differential Revision: D29483532

Pulled By: mruberry

fbshipit-source-id: 003fde7e21cf844778a879c3de0a7c84d13877bd
2021-06-30 21:43:02 -07:00
06fc637b41 Check native_function's outputs' TensorImpl and StorageImpl (#60286)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/25927

Does some checks described in https://github.com/pytorch/pytorch/issues/25927#issuecomment-589354373:
If function does not modify its inputs (non-inplace and has no out arg):
- Check TensorImpl has use_count of 1. (This should make us aware of functions that return self.)
- If function is a view function check that StorageImpl is same as that of the aliased input, otherwise, StorageImpl's use_count is 1.

Detected a couple functions that failed the check that returned TensorImpl should have use_count of 1: 'native_batch_norm', 'native_batch_norm_backward', '_embedding_bag'. (Filing issues).

Examples of generated code:
We did not update checks for in-place ops (this includes in-place views).

Example of a view:
- Check that outputs StorageImpl of `result` is the same as that of `self`.
- Check TensorImpl has use_count of 1
```cpp
at::Tensor as_strided(c10::DispatchKeySet ks, const at::Tensor & self, at::IntArrayRef size, at::IntArrayRef stride, c10::optional<int64_t> storage_offset) {
  auto& self_ = unpack(self, "self", 0);
  auto _any_requires_grad = compute_requires_grad( self );
  (void)_any_requires_grad;
  std::shared_ptr<AsStridedBackward> grad_fn;
  if (_any_requires_grad) {
    grad_fn = std::shared_ptr<AsStridedBackward>(new AsStridedBackward(), deleteNode);
    grad_fn->set_next_edges(collect_next_edges( self ));
    grad_fn->self_geometry = TensorGeometry(self);
    grad_fn->size = size.vec();
    grad_fn->stride = stride.vec();
    grad_fn->storage_offset = storage_offset;
  }
  #ifndef NDEBUG
  c10::optional<Storage> self__storage_saved =
    self_.has_storage() ? c10::optional<Storage>(self_.storage()) : c10::nullopt;
  c10::intrusive_ptr<TensorImpl> self__impl_saved;
  if (self_.defined()) self__impl_saved = self_.getIntrusivePtr();
  #endif
  auto _tmp = ([&]() {
    at::AutoDispatchBelowAutograd guard;
    return at::redispatch::as_strided(ks & c10::after_autograd_keyset, self_, size, stride, storage_offset);
  })();
  auto result = std::move(_tmp);
  #ifndef NDEBUG
  if (self__storage_saved.has_value())
    AT_ASSERT(self__storage_saved.value().is_alias_of(self_.storage()));
  if (self__impl_saved) AT_ASSERT(self__impl_saved == self_.getIntrusivePtr());
  if (self__storage_saved.has_value())
    AT_ASSERT(self__storage_saved.value().is_alias_of(result.storage())); <<<<<<<<<<<<<<<<<<<<<<<<
  AT_ASSERT(result.use_count() == 1); <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
  #endif
  if (grad_fn) {
      set_history(flatten_tensor_args( result ), grad_fn);
  }
  TORCH_CHECK_NOT_IMPLEMENTED(!(isFwGradDefined(self)), "Trying to use forward AD with as_strided that does not support it.");
  return result;
}
```
Example of non-view:
- Check that output's StorageImpl has use_count of 1.
- Check that output's TensorImpl has use_count of 1.
```cpp
at::Tensor asin(c10::DispatchKeySet ks, const at::Tensor & self) {
  auto& self_ = unpack(self, "self", 0);
  auto _any_requires_grad = compute_requires_grad( self );
  (void)_any_requires_grad;
  std::shared_ptr<AsinBackward> grad_fn;
  if (_any_requires_grad) {
    grad_fn = std::shared_ptr<AsinBackward>(new AsinBackward(), deleteNode);
    grad_fn->set_next_edges(collect_next_edges( self ));
    grad_fn->self_ = SavedVariable(self, false);
  }
  #ifndef NDEBUG
  c10::optional<Storage> self__storage_saved =
    self_.has_storage() ? c10::optional<Storage>(self_.storage()) : c10::nullopt;
  c10::intrusive_ptr<TensorImpl> self__impl_saved;
  if (self_.defined()) self__impl_saved = self_.getIntrusivePtr();
  #endif
  auto _tmp = ([&]() {
    at::AutoDispatchBelowADInplaceOrView guard;
    return at::redispatch::asin(ks & c10::after_autograd_keyset, self_);
  })();
  auto result = std::move(_tmp);
  #ifndef NDEBUG
  if (self__storage_saved.has_value())
    AT_ASSERT(self__storage_saved.value().is_alias_of(self_.storage()));
  if (self__impl_saved) AT_ASSERT(self__impl_saved == self_.getIntrusivePtr());
  if (result.has_storage()) AT_ASSERT(result.storage().use_count() == 1); <<<<<<<<<<<<<<<<<<<<<<<<<<
  AT_ASSERT(result.use_count() == 1); <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
  #endif
  if (grad_fn) {
      set_history(flatten_tensor_args( result ), grad_fn);
  }
  if (isFwGradDefined(self)) {
      auto self_t_raw = toNonOptFwGrad(self);
      auto self_t = self_t_raw.defined() ? self_t_raw : at::zeros_like(toNonOptTensor(self));
      auto self_p = toNonOptPrimal(self);
      auto result_new_fw_grad = (self_t.conj() * (-self_p * self_p + 1).rsqrt().conj()).conj();
      if (result_new_fw_grad.defined()) {
        // The hardcoded 0 here will need to be updated once we support multiple levels.
        result._set_fw_grad(result_new_fw_grad, /* level */ 0, /* is_inplace_op */ false);
      }
  }
  return result;
}
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60286

Reviewed By: jbschlosser

Differential Revision: D29402253

Pulled By: soulitzer

fbshipit-source-id: b90f34c455b8767f95a52c329db351dbbb495397
2021-06-30 19:19:01 -07:00
03b5a225a7 Test parametrization for instantiated device-specific tests (#60233)
Summary:
The `ops` decorator provides a way to parameterize a test across a given list of ops. This would be useful for modules as well (e.g. a `modules` decorator), but the mechanism by which this is accomplished is specific to ops. In the details, the `ops` decorator tags a test function with the metadata needed (list of ops, `dtypes`) and the actual tests are generated according to this metadata during the call to `instantiate_device_type_tests()`.

This PR makes this mechanism more generic, allowing for test parameterization across arbitrary dimensions. This makes a `modules` decorator (or any similar type of decorator) straightforward to implement without changes to the device-specific test instantiation logic.

One caveat is that, since this is implemented where the old `ops` decorator was (within `instantiate_device_type_tests()`), this only works for tests instantiated using the device-specific instantiation logic. Longer term, even device-specific test instantiation could be treated as an optional parameterization across device types, but this PR takes a low-risk approach for now. In practice, this just means that a `device` kwarg is required for all test signatures used with the mechanism.

The `ops` decorator has been refactored to use the generic mechanism and works the same as before, with one difference: when `OpDTypes.none` is specified, the test signature no longer needs an unused `dtype` kwarg. This is a nice bonus that demonstrates the added flexibility of a generic parameterization mechanism. The refactored form also has the bonus that all op-specific test generation logic is contained within the `ops` decorator class, improving readability.

Behind the scenes, the generic mechanism is a base decorator class (`_TestParameterizer`) from which `ops` derives. The core functionality is in the `_parameterize_test()` method, which takes in a test function and returns a generator that produces parameterized tests, including names and parameter kwargs to pass to them. Using the `ops` decorator results in a set of op-specific tests from a given generic test.
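
For context, a hedged sketch of the `ops` usage pattern that the generic mechanism preserves (imports are from the internal test framework; treat the exact signatures as assumptions):

```python
from torch.testing._internal.common_device_type import (
    instantiate_device_type_tests, ops)
from torch.testing._internal.common_methods_invocations import op_db
from torch.testing._internal.common_utils import TestCase, run_tests

class TestExample(TestCase):
    @ops(op_db)  # parameterizes the test across ops, devices, and dtypes
    def test_smoke(self, device, dtype, op):
        sample = op.sample_inputs(device, dtype)[0]
        op(sample.input, *sample.args, **sample.kwargs)

instantiate_device_type_tests(TestExample, globals())

if __name__ == "__main__":
    run_tests()
```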

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60233

Reviewed By: iramazanli

Differential Revision: D29494995

Pulled By: jbschlosser

fbshipit-source-id: a14446488c106094fafcaa75ccf8e9e3faf33bfc
2021-06-30 18:50:22 -07:00
6643df2680 [jit] Use computed loop to dispatch to next instruction in interpreter. (#60211)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60211

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D29211283

fbshipit-source-id: 2f87b5a78d4fc00ce11ed509fc15db35332690b6
2021-06-30 17:44:26 -07:00
357a21bc92 Fix numerical issue of rowwise normalization in Caffe2 and internal tests. (#60880)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60880

Fix numerical issue of rowwise normalization in Caffe2 and internal tests.

Test Plan: buck test mode/opt //dper3/dper3/modules/tests:xdeepint_test -- --exact 'dper3/dper3/modules/tests:xdeepint_test - test_xdeepint_with_full_features_with_interactions_3 (dper3.dper3.modules.tests.xdeepint_test.XdeepInt_Test)'

Reviewed By: esqu1

Differential Revision: D29431597

fbshipit-source-id: 72df52fdcbb29ad3de7b9472f25fde26cf804a76
2021-06-30 17:31:04 -07:00
0824b919ec [BE] move general script out of .circleci/ into tools/ (#60973)
Summary:
Second step in https://github.com/pytorch/pytorch/issues/60373.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60973

Reviewed By: samestep

Differential Revision: D29499385

Pulled By: walterddr

fbshipit-source-id: 22df22f78f6b9af6221917a10188218773245009
2021-06-30 17:20:05 -07:00
4036820506 Add PocketFFT support (#60976)
Summary:
Needed on platforms that do not have MKL, such as aarch64 and M1:
- Add `AT_POCKETFFT_ENABLED()` to Config.h.in
- Introduce `torch._C.has_spectral`, which is true if PyTorch was compiled with either MKL or PocketFFT
- Modify the spectral tests to use skipCPUIfNoFFT instead of skipCPUIfNoMKL

Share the implementation of the `_out` functions, as well as fft_fill_with_conjugate_symmetry_stub, between the MKL and PocketFFT implementations

Fixes https://github.com/pytorch/pytorch/issues/41592
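
A minimal sketch of the new capability check (`torch._C.has_spectral` is named in this commit; `torch.fft.rfft` is just an arbitrary spectral op):

```python
import torch

if torch._C.has_spectral:
    print(torch.fft.rfft(torch.arange(4.0)))
else:
    print("PyTorch was built without an FFT backend")
```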

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60976

Reviewed By: walterddr, driazati, janeyx99, samestep

Differential Revision: D29466530

Pulled By: malfet

fbshipit-source-id: ac5edb3d40e7c413267825f92a5e8bc4bb249caf
2021-06-30 16:28:20 -07:00
2d0c6e60a7 going back to use packaging.version.parse instead (#61053)
Summary:
I think this may be related to https://app.circleci.com/pipelines/github/pytorch/vision/9352/workflows/9c8afb1c-6157-4c82-a5c8-105c5adac57d/jobs/687003

Apparently `pkg_resources.parse_version` returns a `pkg_resources.extern.packaging.version.Version` instead of a `packaging.version.Version`, and on some older versions of setuptools it doesn't support the `.major`/`.minor` attributes. This changes it back to using `packaging.version.parse`.
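
For reference, a minimal sketch of the attribute access the change relies on:

```python
from packaging.version import parse

v = parse("1.9.0")
print(v.major, v.minor)  # 1 9 -- attributes the vendored copy may lack
```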

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61053

Test Plan: CI

Reviewed By: samestep

Differential Revision: D29494322

Pulled By: walterddr

fbshipit-source-id: 294572a10b167677440d7404e5ebe007ab59d299
2021-06-30 16:23:59 -07:00
a2ad84afbb Send test reports to S3 (#61071)
Summary:
This sends the test reports zip to S3 in addition to the GitHub artifact store. This makes it easier to query in the PR HUD since we don't have to deal with the GitHub API's rate limits / download speeds. The impact on S3 storage should be minimal since it's only 500 KB or so per run.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61071

Reviewed By: nikithamalgifb

Differential Revision: D29498941

Pulled By: driazati

fbshipit-source-id: 74bfbe7fa7d1d97fd8a6938c98dfe0caff0ab6eb
2021-06-30 16:00:01 -07:00
812ed47caa [Static runtime] Add unit tests to ops bmm and addmm (#61000)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61000

Add unit tests to bmm and addmm operators in static runtime.

Test Plan:
buck test //caffe2/benchmarks/static_runtime:static_runtime_cpptest

{F628935117}

Reviewed By: hlu1

Differential Revision: D29459679

fbshipit-source-id: 5c7fa5c9b0675c1c84f3ae3110204d663255009c
2021-06-30 15:55:58 -07:00
4ff81ab112 Escape backslashes in Windows stack traces by converting them to forward slashes (#60842)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/60187

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60842

Reviewed By: gdankel

Differential Revision: D29498498

Pulled By: malfet

fbshipit-source-id: 78e1b25a2e6bdfd3ba0c988d023c7a7f79a22cf4
2021-06-30 15:32:03 -07:00
6c1c1111de [JIT] Add reference semantics to TorchScript classes (#44324)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44324

**Summary**
This commit adds reference semantics to TorchScript class types;
modifications made to them within TorchScript will be visible in Python.
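
A minimal sketch of the behavior this enables (class and function names are illustrative):

```python
import torch

@torch.jit.script
class Counter:
    def __init__(self):
        self.n = 0

@torch.jit.script
def bump(c: Counter) -> None:
    c.n += 1

c = Counter()
bump(c)
print(c.n)  # 1 -- the mutation made inside TorchScript is visible here
```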

**Test Plan**
This commit adds a unit test to `TestClassType` that checks that
modifications made to a class type instance passed into TorchScript are
visible in Python after executing the scripted function or module.

**Fixes**
This commit closes #41421.

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D24912807

Pulled By: SplitInfinity

fbshipit-source-id: d64ac6211012425b040b987e3358253016e84ca0
2021-06-30 14:27:17 -07:00
aa728dc335 Fix fx patch module name (#61062)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61062

Instead of being 'patch', this should be the importable name of the module (it's defined as `_fx` on the `torch._C` module, so the full name should be `torch._C._fx`). This now works correctly:

```python
>>> import torch._C._fx
>>> dir(torch._C._fx)
['__doc__', '__loader__', '__name__', '__package__', '__spec__', 'patch_function']
```

Test Plan: Imported from OSS

Reviewed By: Chillee

Differential Revision: D29497018

Pulled By: driazati

fbshipit-source-id: 093aa0552b48feb0aabe47bdf72776dddd5a3b8f
2021-06-30 14:23:35 -07:00
dabadd7e20 [quant] Added reset_min_max_vals() function to observers (#60883)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60883

As per this [comment](https://github.com/pytorch/pytorch/pull/59964#discussion_r659064270), I created a `reset_min_max_vals()` function inside the observers, which will be called during input-weight equalization. This way we do not expose the implementation of the observers in the equalization code.

Test Plan:
`python test/test_quantization.py TestEqualizeFx`

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D29491848

fbshipit-source-id: 00e91959ceb3b4f3688175a1a7ba11823e929b2f
2021-06-30 14:22:08 -07:00
1a0195db49 [quant] Input-Weight Equalization - support for LinearReLU layers (#60653)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60653

Special casing was needed to get the weight attribute in the linear layers of fused LinearReLU layers.

Initial Model: `x -> linear1 -> relu`

After fusion: `x -> linearRelu`

After prepare: `x -> input_quant_obs -> input_eq_obs1 -> linearRelu -> output_quant_obs1`

After equalization functions: `x -> mul -> input_quant_obs (scaled) -> linearRelu -> output_quant_obs`

After convert: `x -> mul -> quantize_per_tensor -> quantized::linearRelu -> dequantize`

More step-throughs here: https://fb.quip.com/A9J3AsBxkykR

Test Plan:
`python test/test_quantization.py TestEqualizeFx`

Original model:
```
LinearReluModel(
  (fc): Linear(in_features=5, out_features=5, bias=True)
  (relu): ReLU()
)
```

Graph after `prepare_fx`:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_0](args = (%x,), kwargs = {})
    %x_activation_post_process_0_equalization_process_0 : [#users=1] = call_module[target=x_activation_post_process_0_equalization_process_0](args = (%x_activation_post_process_0,), kwargs = {})
    %fc : [#users=1] = call_module[target=fc](args = (%x_activation_post_process_0_equalization_process_0,), kwargs = {})
    %fc_activation_post_process_0 : [#users=1] = call_module[target=fc_activation_post_process_0](args = (%fc,), kwargs = {})
    return fc_activation_post_process_0
```

Graph after equalization functions:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_equalization_scale0 : [#users=1] = get_attr[target=x_equalization_scale0]
    %mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_equalization_scale0), kwargs = {})
    %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_0](args = (%mul,), kwargs = {})
    %fc : [#users=1] = call_module[target=fc](args = (%x_activation_post_process_0,), kwargs = {})
    %fc_activation_post_process_0 : [#users=1] = call_module[target=fc_activation_post_process_0](args = (%fc,), kwargs = {})
    return fc_activation_post_process_0
```

Graph after `convert_fx`:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_equalization_scale0 : [#users=1] = get_attr[target=x_equalization_scale0]
    %mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_equalization_scale0), kwargs = {})
    %fc_input_scale_0 : [#users=1] = get_attr[target=fc_input_scale_0]
    %fc_input_zero_point_0 : [#users=1] = get_attr[target=fc_input_zero_point_0]
    %quantize_per_tensor : [#users=1] = call_function[target=torch.quantize_per_tensor](args = (%mul, %fc_input_scale_0, %fc_input_zero_point_0, torch.quint8), kwargs = {})
    %fc : [#users=1] = call_module[target=fc](args = (%quantize_per_tensor,), kwargs = {})
    %dequantize : [#users=1] = call_method[target=dequantize](args = (%fc,), kwargs = {})
    return dequantize
```

Imported from OSS

Reviewed By: supriyar

Differential Revision: D29406999

fbshipit-source-id: add38e8e7fb84a241c3b10bfb8451b50103effd4
2021-06-30 14:22:06 -07:00
546102e161 Fix overflow in quantize_val_arm (#60079)
Summary:
By using `__builtin_add_overflow` to detect integer overflows when `zero_point` is added to the rounded integral value.
Also fixes a small typo.

After this PR, `python3 -c "import torch;print(torch.torch.quantize_per_tensor(torch.ones(10) * 2**32, 0.5, 1, torch.quint8))"` returns the same vector of `127`s on both x86_64 and aarch64 platforms.

This change merely mitigates the overflow bug; a more proper (and perhaps performance-impacting) fix would be to add `zero_point` to the floating-point values in both the serial and vectorized code. Filed https://github.com/pytorch/pytorch/issues/61047 to track this.

Also filed https://github.com/pytorch/pytorch/issues/61046 to clarify intended use of `__ARM_NEON__` define

Fixes https://github.com/pytorch/pytorch/issues/60077

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60079

Reviewed By: kimishpatel

Differential Revision: D29157883

Pulled By: malfet

fbshipit-source-id: 6f75d93e6d3d4d0d5a5eab545cb27773086b9768
2021-06-30 14:20:56 -07:00
cef0851223 Make torch.utils.benchmark numpy-free (#60564)
Summary:
PyTorch core does not depend on numpy, so the benchmark utilities should not depend on it either

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60564

Reviewed By: robieta

Differential Revision: D29497375

Pulled By: malfet

fbshipit-source-id: d9566e5b2e48868cef5568cd62f691af19ccf1f1
2021-06-30 14:17:32 -07:00
d1a4c9e682 [ROCm] allow user to override PYTORCH_ROCM_ARCH (#60602)
Summary:
Restores the ability of a user to call .jenkins/pytorch/build.sh while
also setting PYTORCH_ROCM_ARCH. Otherwise, with IN_CI=1 as the new
default, it will forcibly ignore user settings when build.sh is used
outside of CI.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60602

Reviewed By: samestep

Differential Revision: D29490791

Pulled By: janeyx99

fbshipit-source-id: b5e8a529b8e0b5020b260b4bf027a37e0c1df8d5
2021-06-30 13:35:11 -07:00
14cc234a8a Fix some comparison warnings (#60875)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60875

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D29406593

fbshipit-source-id: 0eb070ef05c1cd343c9e835786b42014d0553aa5
2021-06-30 13:09:41 -07:00
74692f3ada Loop transformation (#60874)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60874

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D29406474

fbshipit-source-id: c994361e9fdafb7c4519ce2f1c40288a9ef025be
2021-06-30 13:09:39 -07:00
a8b56ea58b Remove another for-loop in SoftMax (#60873)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60873

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D29406429

fbshipit-source-id: 3b5710ed9e5d1d14379f64670638ab119d0d78e3
2021-06-30 13:09:38 -07:00
850ff82edc Remove for-loop for getting number of elements in favour of abstraction (#60872)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60872

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D29406199

fbshipit-source-id: ae49672cf1bb370d574d0c21231477bb17dea0ca
2021-06-30 13:08:25 -07:00
95e77e0af2 [Delegate] A more specific prefix for lowered module name. (#61007)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61007

Test Plan: Imported from OSS

Reviewed By: kimishpatel, raziel

Differential Revision: D29477733

Pulled By: iseeyuan

fbshipit-source-id: 94a7a784d98a41ff7ba255955acf74bd26297c9f
2021-06-30 12:37:09 -07:00
f32f85e6da Implemented torch.corrcoef (#60420)
Summary:
Implements `torch.corrcoef` similar to [`np.corrcoef`](https://numpy.org/doc/stable/reference/generated/numpy.corrcoef.html) using `torch.cov` implemented in https://github.com/pytorch/pytorch/pull/58311.

closes https://github.com/pytorch/pytorch/issues/1254
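
A brief usage sketch (the values are chosen so the result is obvious):

```python
import torch

x = torch.tensor([[0., 1., 2.],
                  [2., 1., 0.]])
print(torch.corrcoef(x))
# tensor([[ 1., -1.],
#         [-1.,  1.]])  # the two rows are perfectly anti-correlated
```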

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60420

Reviewed By: mruberry

Differential Revision: D29474687

Pulled By: heitorschueroff

fbshipit-source-id: f3c7c5610363aebd88274a51fc77e3cf879cb611
2021-06-30 12:36:02 -07:00
d5be67a338 Expose findDanglingImpls to Python (#60827)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60827

This diff exposed Dispatcher.findDanglingImpls to Python as _C._dispatch_find_dangling_impls.
ghstack-source-id: 132799970

Test Plan: buck test mode/dev //caffe2/test:others -- test_find_dangling_impls

Reviewed By: ezyang

Differential Revision: D29416330

fbshipit-source-id: d2f26054b6e247be1bb9e818eaa7cb9e68a4a913
2021-06-30 12:31:19 -07:00
3cf267bfa6 Embedding: Remove dispatch in parallel region (#60597)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60597

Ref #56794

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D29446191

Pulled By: ngimel

fbshipit-source-id: d6ff010104ae621d5e3d9c269ed2b48407e71d67
2021-06-30 12:30:15 -07:00
4f5c68857f SumKernel (BFloat16): use float as accumulation type (#55217)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55217

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D28836794

Pulled By: VitalyFedyunin

fbshipit-source-id: 46ed3a862c2bb4c6325c78ecfc5d01761f7a113a
2021-06-30 12:27:42 -07:00
4d5edef8d4 Python composite module execution unit tests on delegation of backend_with_compiler_demo (#60801)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60801

Added unit tests for the execution of a simple composite module with a compiler (backend_with_compiler_demo).

Test Plan:
Running `python test/test_jit.py TestBackendsWithCompiler -v` succeeds.

Imported from OSS

Reviewed By: iseeyuan

Differential Revision: D29409958

fbshipit-source-id: b02e58bdcc25a2997b70ecae41a019b8596323c1
2021-06-30 12:23:32 -07:00
3957ed41a9 [DDP] Disable reducer hooks from running outside of DDP backwards. (#60921)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60921

Sometimes local modules can fire hooks (such as when the user calls
backward after using `ddp_module.module` explicitly). This isn't supported
behavior and can cause issues with various state and gradient reduction we run
in DDP, so it's best to disable this entirely.
ghstack-source-id: 132739311

Test Plan: CI

Reviewed By: SciPioneer

Differential Revision: D29435737

fbshipit-source-id: fef76a0dd2955c432131632fb81dde4a4982ad91
2021-06-30 12:19:18 -07:00
5a4282d06b fix typo in binary_build_script (#61016)
Summary:
resolve comments in https://github.com/pytorch/pytorch/issues/60849

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61016

Reviewed By: samestep

Differential Revision: D29487908

Pulled By: janeyx99

fbshipit-source-id: 32feb6c6e1009324201e3d2c6fcd9a7388791401
2021-06-30 11:52:38 -07:00
d44515c418 Fix lint (#61058)
Summary:
https://github.com/pytorch/pytorch/issues/61003 broke Lint / shellcheck because of a race condition with https://github.com/pytorch/pytorch/issues/60221. This PR fixes it.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61058

Test Plan: CI.

Reviewed By: walterddr

Differential Revision: D29494727

Pulled By: samestep

fbshipit-source-id: e6c5ea6daa47db13eb6a42cc2b5bf9c938c1839d
2021-06-30 11:45:23 -07:00
a25e6370e5 Add IMethod interface
Summary:
Expose IMethod interface, which provides a unified interface to either script or python methods backed by torchscript or torchdeploy.

IMethod provides a way to depend on a torch method without depending on a particular runtime implementation such as torchscript or python/deploy.

Test Plan: add unit tests.

Reviewed By: suo

Differential Revision: D29463455

fbshipit-source-id: 903391d9af9fbdd8fcdb096c1a136ec6ac153b7c
2021-06-30 11:28:24 -07:00
dace860008 Migrate pytorch-linux-bionic-py3.8-gcc9-coverage to GHA (#61050)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/59166

`pytorch-linux-bionic-py3.8-gcc9-coverage` build & tests can be run on `linux.2xlarge` instances on GHA,
which have AVX512 support.

Thanks

cc malfet seemethere samestep zhouzhuojie

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61050

Reviewed By: walterddr, 1ntEgr8

Differential Revision: D29493335

Pulled By: samestep

fbshipit-source-id: de79e61f13c537ef7ff30a1e04d1bbc625a06dd1
2021-06-30 11:02:57 -07:00
b4496df7d3 mkl_scsrmm needs to be disabled when MKL is not used (#60051)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60051

Introduction:
We want to minimize the number of dependencies for the SGX port. Therefore we need the ability to disable MKL when it is not used.

Problem:
There is a call to mkl_scsrmm that is enabled when CAFFE2_USE_MKL is not defined. This causes a compile error.

Solution:
Surround the call with preprocessor checks on CAFFE2_USE_MKL

Test Plan: Run the pytorch tests.

Reviewed By: LiJihang

Differential Revision: D29022635

fbshipit-source-id: 94ae9fdfe53399b64d8c2d4089eebe93d1d260e8
2021-06-30 10:40:18 -07:00
5644c31ec0 Move windows periodic jobs to GHA (#61003)
Summary:
Moves the periodic CUDA 11.3 Windows jobs to GHA.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61003

Test Plan:
https://github.com/pytorch/pytorch/pull/61003/checks?check_run_id=2947910829

Does NOT move the debuggable CI part yet

Reviewed By: malfet

Differential Revision: D29488761

Pulled By: janeyx99

fbshipit-source-id: b16b23b40fe1f6ae189292c6f2c561e5e70f122b
2021-06-30 10:25:10 -07:00
9b5e1e0734 [DataLoader] Make batch DataPipe sensitive to unbatch_level argument (#60672)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60672

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D29461086

Pulled By: VitalyFedyunin

fbshipit-source-id: efc6b3b567323defe64d3f1b30a5708107e62dd4
2021-06-30 10:04:32 -07:00
66de50cc11 [DataLoader] Make shuffle DataPipe sensitive to unbatch_level argument (#60671)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60671

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D29461083

Pulled By: VitalyFedyunin

fbshipit-source-id: 3d371017d5ce948a1e5b8182ae91033190f64da7
2021-06-30 10:03:29 -07:00
a652398465 [DataLoader] Rename transform DataPipe to legacy_transform (#60670)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60670

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D29461081

Pulled By: VitalyFedyunin

fbshipit-source-id: 57f53a91db9032a6126e86243ddea9149c473060
2021-06-30 09:49:14 -07:00
abb4ed7412 Move clang-format to lint.yml (#60918)
Summary:
Refactor and consolidate the location of lint-related workflows

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60918

Reviewed By: mruberry

Differential Revision: D29459605

Pulled By: zhouzhuojie

fbshipit-source-id: c2993cfd037a03b733a414897bd53cf407c7c268
2021-06-30 09:45:35 -07:00
0b8a7daa2a Enable multigpu_test in GHA (#60221)
Summary:
- [x] add to test matrix
- [x] enable on PRs for testing
- [x] modify the scripts so it actually runs the multigpu tests
- [x] put `num_shards` after `shard` number
- [x] use a separate test-reports artifact
- [x] run on `linux.16xlarge.nvidia.gpu`
- [x] validate that it works
- [x] disable on PRs before merging

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60221

Test Plan: CI. Example run: https://github.com/pytorch/pytorch/actions/runs/984347177

Reviewed By: malfet

Differential Revision: D29430567

Pulled By: samestep

fbshipit-source-id: 09f8e208e524579b603611479ca00515c8a1b5aa
2021-06-30 08:52:38 -07:00
5576c7bdd1 ns for fx: initial support for int8 shadows fp32 (#60419)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60419

Adds support for NS for FX shadowed activations pass to handle int8
modules shadowing fp32 modules. The difficulty here is that in order
to insert the dtype cast, we need the qparams of the input.

For the current PR, we only handle the easy cases where the previous
node is either a `quantize_per_tensor` or an OSS quantized module.
A future PR can handle more complicated cases such as various functions.
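As a hedged illustration of the cast itself (made-up qparams standing in for the ones the pass reads off the previous node):

```python
import torch

# Hypothetical qparams; in the pass they come from the previous
# quantize_per_tensor node's (or quantized module's) scale and zero_point.
scale, zero_point = 0.05, 64

fp32_act = torch.randn(2, 4)          # activation feeding the fp32 module
int8_act = torch.quantize_per_tensor( # the inserted dtype cast for the int8 shadow
    fp32_act, scale, zero_point, torch.quint8)
```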

Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_int8_shadows_fp32_simple
```

Imported from OSS

Reviewed By: hx89

Differential Revision: D29280050

fbshipit-source-id: 465257c9f82a34fa91b48ae8887355c68e00edc6
2021-06-30 08:08:46 -07:00
a5e2ea4345 Add noop register hook (#60685)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60685

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D29466224

fbshipit-source-id: 68c8aa022ccffeefd45062f1443d15c9a6824f3d
2021-06-30 07:46:34 -07:00
1fd65967e5 Revert D29312809: add quantized_resize and dequantize for some cuda backends
Test Plan: revert-hammer

Differential Revision:
D29312809 (c4cc26f26a)

Original commit changeset: c5c5eabb98bc

fbshipit-source-id: 565e215513b68eae0dacdd1660b1a01759215511
2021-06-30 07:37:09 -07:00
bfe03120ee [PyPer] Fix schema of fb::equally_split (#60852)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60852

Reviewed By: ajyu

Differential Revision: D29423425

fbshipit-source-id: 4525db1f268ca65d6851a5ec846a6ae2f710ec6b
2021-06-30 03:18:15 -07:00
af5a0df1d0 Prefer linalg::qr over qr in the C++ API (#60529)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/60060

Also adds `torch::linalg::qr` to the C++ API, as it was missing.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60529

Reviewed By: ngimel

Differential Revision: D29353133

Pulled By: mruberry

fbshipit-source-id: e18feaffca91c13940ad3d6bd1f40bb57dc101ae
2021-06-30 02:48:04 -07:00
b39770c461 Fix degenerate shape behavior for ord=+/-2 (#60273)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/59198

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60273

Reviewed By: jbschlosser

Differential Revision: D29422907

Pulled By: mruberry

fbshipit-source-id: 609cd640b0477f90bebca20865e34cbe182d3909
2021-06-30 02:17:26 -07:00
10fc58620e [PyTorch][NASProfiler] Add moduleHierarchy Python API to print out hierarchical information about a Node (#60384)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60384

Currently, inlining the module graph drops the module hierarchy info on the Python side. Here we retrieve the module hierarchy from the C++ side and expose it via a new Python API on Node called `moduleHierarchy()`.

Test Plan:
Usage:
```
torch._C._jit_pass_inline(module.graph)
torch._C._jit_pass_propagate_shapes_on_graph(module.graph)
node = module.graph.findNode("quantized::conv2d_relu")
'top(' + module.original_name + ').' + node.moduleHierarchy() + '.' + node.kind()
```
Output:
```
'top(QuantWrapper).module(FBNetHR).0(Sequential).xif0_0(ConvBNRelu).conv(ConvReLU2d).quantized::conv2d_relu'
```

Reviewed By: kimishpatel

Differential Revision: D29252169

fbshipit-source-id: 74163a87f919e061e5e75dfebc4c5cdbe8489d93
2021-06-30 01:32:31 -07:00
44b3dc4eac resolve conjugate bit in torch.testing.assert_close (#60522)
Summary:
We need to resolve the conjugate bit for complex tensors, because otherwise we may not be able to access the imaginary component:

```python
>>> torch.tensor(complex(1, 1)).conj().imag
RuntimeError: view_as_real doesn't work on unresolved conjugated tensors.  To resolve the conjugate tensor so you can view it as real, use self.resolve_conj(); however, be warned that the resulting tensor will NOT alias the original.
```
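As the error message suggests, materializing the conjugation first makes the imaginary part accessible, at the cost of a copy that no longer aliases the original:

```python
import torch

t = torch.tensor(complex(1, 1)).conj()
r = t.resolve_conj()   # materializes the conjugation; does NOT alias t
print(r.imag)          # tensor(-1.)
```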

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60522

Reviewed By: ngimel

Differential Revision: D29353095

Pulled By: mruberry

fbshipit-source-id: c36eaf883dd55041166f692f7b1d35cd2a34acfb
2021-06-30 01:31:30 -07:00
c4cc26f26a add quantized_resize and dequantize for some cuda backends (#60489)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60489

Adds entries to native_functions.yaml to enable these functions,
since the code is common between CUDA and CPU.

Test Plan: tested with a full model, unit tests on the way

Reviewed By: ezyang

Differential Revision: D29312809

fbshipit-source-id: c5c5eabb98bc192343ec78980dc4e3fc3f41d3db
2021-06-30 00:33:12 -07:00
4adc5eb6c5 [Caffe2][Testing] Check for equality first in assertTensorEqualsWithType<float> (#61006)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61006

Test Plan: Modified existing unit test to test for eps = 0. It would fail without the equality test first.

Reviewed By: ajyu

Differential Revision: D29423770

fbshipit-source-id: 168e7de00d8522c4b646a8335d0120700915f260
2021-06-29 23:31:37 -07:00
287c0ab170 [FX] Add requires_grad to TensorMetadata (#60972)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60972

For PyTorch model memory requirement calculation, requires_grad is needed. Output tensors with requires_grad are saved in the module context and increase memory during the forward pass.
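A short sketch of reading the new field, assuming the standard `ShapeProp` pass is what populates `tensor_meta`:

```python
import torch
from torch.fx import symbolic_trace
from torch.fx.passes.shape_prop import ShapeProp

class M(torch.nn.Module):
    def forward(self, x):
        return x.relu()

gm = symbolic_trace(M())
ShapeProp(gm).propagate(torch.randn(2, requires_grad=True))
for node in gm.graph.nodes:
    tm = node.meta.get('tensor_meta')
    if tm is not None:
        print(node.name, tm.requires_grad)  # field added by this PR
```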

Test Plan: Existing test cases

Reviewed By: jamesr66a

Differential Revision: D29024932

fbshipit-source-id: def990f8c6ff6fa4537bfc377c646b9d44464ebd
2021-06-29 23:07:27 -07:00
ce232e7847 [ROCM] enable fft tests (#60313)
Summary:
This PR enables fft tests on ROCm. It contains a function that generates a valid input for fft tests that call hipfftExecC2R or hipfftExecZ2D. With this helper function we are able to fix a number of fft tests. This brings to a close the series of fft PRs enabling fft tests on ROCm.
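The underlying constraint is that C2R transforms assume Hermitian-symmetric input; as a hedged illustration (not the PR's actual helper), starting from a real signal's rfft guarantees a valid input:

```python
import torch

signal = torch.randn(16)
hermitian = torch.fft.rfft(signal)           # Hermitian-symmetric by construction
restored = torch.fft.irfft(hermitian, n=16)  # C2R (e.g. hipfftExecC2R on ROCm)
assert torch.allclose(signal, restored, atol=1e-5)
```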

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60313

Reviewed By: mruberry

Differential Revision: D29463487

Pulled By: malfet

fbshipit-source-id: d0903fbf12d24ba95a42c8b7589714fdb63353ed
2021-06-29 22:43:29 -07:00
e2b42c6f52 [ROCm] Update the magma build to new commit (#60900)
Summary:
Magma master branch is updated with all the fixes required for ROCm, so updating the magma build to the new commit for ROCm pyTorch builds.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60900

Reviewed By: jbschlosser

Differential Revision: D29440587

Pulled By: malfet

fbshipit-source-id: 2ccdf48441dfff3d19c4a478e03ac11a843f8419
2021-06-29 22:38:58 -07:00
93772792e3 [nnc] Get rid of fuser trigger counters (#57334)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57334

Here's a possibly controversial PR.  These counters got in the way of
generalizing the fuser tests to handle arbitrary devices, and I guess I'm just
generally skeptical that they provide much value.  While it's true that they let us
observe whether fusion groups were created, we already have assertions based on
the shape of the graph, and I'm not sure that I trust those any less than these
counters.

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D29471484

Pulled By: bertmaher

fbshipit-source-id: f6d76f6e72dbfb581acff1d834b0c74500941b57
2021-06-29 22:22:15 -07:00
c4f718cb72 [nnc] Serialize initialization of LLVM targets (#60996)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60996

We've had a bug report of weird LLVM initialization errors, e.g.,
```
Unexpected failure in LLVM JIT: Cannot choose between targets "x86-64" and "x86-64"
```

While I haven't repro'ed that exact message, I did run a stress-test that
compiles on many threads simultaneously, and it deadlocks in
TargetRegistry::lookupTarget.  And in fact I remember debugging this before in
a different system, and finding "Clients are responsible for avoiding race
conditions in registration" in
https://llvm.org/doxygen/TargetRegistry_8cpp_source.html.

So yeah, let's lock this thing.
ghstack-source-id: 132719018

Test Plan: Heavy multithreaded compilation.  Not sure if it's suitable for landing.

Reviewed By: ZolotukhinM

Differential Revision: D29471343

fbshipit-source-id: b495e468b57e77796a08b627884d3efeca2d1f7c
2021-06-29 22:21:00 -07:00
5bc28c897e fixed launch bounds for gamma_cuda_kernel (#60393)
Summary:
Changed launch bounds for gamma_cuda_kernel from 512 to 256.

Timing data (using Nvidia Titan-V):
![GammaTimingData](https://user-images.githubusercontent.com/22803332/122821464-bc873300-d291-11eb-9be6-2fb690f0d5c7.PNG)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60393

Reviewed By: jbschlosser

Differential Revision: D29447926

Pulled By: ngimel

fbshipit-source-id: c2112f9be8ede3bb07cb72f301393f24d17e0c01
2021-06-29 19:22:07 -07:00
b3ec92cf66 BatchNorm: Remove dispatch in parallel region (#60596)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60596

Ref #56794

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D29446193

Pulled By: ngimel

fbshipit-source-id: 3ebf44a5f1e001e7dc42cd5963752b7e5b9bcbd9
2021-06-29 18:28:46 -07:00
28dc02fe9f Accumulate 16-bit float sums in 32-bit accumulators (#60387)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60387

Fixes gh-59489

Using 32-bit accumulators is a win-win: improved precision and improved
performance, since the half-precision types needed to be converted back and forth
to 32-bit float to do the arithmetic anyway.

Note that on multi-threaded or discontiguous sums, there can be partial sums
stored in the output, so those are necessarily truncated to 16-bit. Fixing this
would require a rework of TensorIterator reductions.
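A small illustration of the precision half of that win, with a plain Python loop standing in for the kernel's accumulator:

```python
import torch

x = torch.ones(4096, dtype=torch.float16)

# fp16 running sum: once the accumulator reaches 2048, adding 1.0 rounds
# back to 2048 (the fp16 spacing there is 2), so the naive sum stalls.
acc = torch.zeros((), dtype=torch.float16)
for v in x:
    acc = acc + v
print(acc)  # tensor(2048., dtype=torch.float16)

# Accumulating in fp32 and truncating once at the end is exact here.
print(x.to(torch.float32).sum().to(torch.float16))  # tensor(4096., dtype=torch.float16)
```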

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D29447187

Pulled By: ngimel

fbshipit-source-id: d0619e0ca2fe116d101460142b79ca56fd6d0840
2021-06-29 17:52:30 -07:00
f54290fd72 Expose raw saved tensors for custom functions (#60551)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60551

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D29466228

fbshipit-source-id: 7565f6cc3f2488c7e444cf81c7eb37a60c75b0e8
2021-06-29 17:21:52 -07:00
a469298707 Free space in windows libtorch build (#60849)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/60856
Removes more unneeded pre-installed software from the CI image

verification links
https://app.circleci.com/pipelines/github/pytorch/pytorch/342992/workflows/3f52cacc-ba1c-4093-804f-d4c1b1c0b806/jobs/14436533
https://app.circleci.com/pipelines/github/pytorch/pytorch/342992/workflows/3f52cacc-ba1c-4093-804f-d4c1b1c0b806/jobs/14437351

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60849

Reviewed By: mruberry

Differential Revision: D29473637

Pulled By: seemethere

fbshipit-source-id: f33dd98de32a79ba1195481f1bd9f2d5362fe16e
2021-06-29 16:53:10 -07:00
af66356d47 [skip-ci] Bump docker image tag (#60988)
Summary:
This PR bumps the docker image tag for clang-tidy. The new image runs ubuntu-20.04 (and therefore has python3.8 by default).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60988

Reviewed By: malfet

Differential Revision: D29469941

Pulled By: 1ntEgr8

fbshipit-source-id: 7268bdb23edff0bc26f275689bf4b1f1ca129df7
2021-06-29 15:23:06 -07:00
8780f8fc3c Remove extraneous process group agent test code (#60903)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60903

RPC tests using the process group backend were disabled for CI internally / externally. This removes the code for process-group-only tests. Faulty agent tests, which also use process group, will be handled in a later PR.

Test Plan: Imported from OSS

Reviewed By: jbschlosser, mrshenli

Differential Revision: D29440674

Pulled By: H-Huang

fbshipit-source-id: 4724c189a110ac821c3f4f6f1f8a5c98e057a2a4
2021-06-29 14:21:56 -07:00
d3de37609f Support fused_dropout with XPU backend (#60231)
Summary:
## Motivation
Enable the fused dropout optimization on XPU devices.

## Solution
Add the XPU device to the fused dropout eligibility check.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60231

Reviewed By: jbschlosser

Differential Revision: D29437659

Pulled By: ezyang

fbshipit-source-id: b77245bb53d3ac93ab30a2a85994376ae5928c34
2021-06-29 14:20:17 -07:00
b4a4a8434d [1/n]support double for Caffe2 ScatterWeightedSum (#60402)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60402

Add float64 data type support for ScatterWeightedSum, for cases where 10^7 precision is not sufficient.

Test Plan: buck test caffe2/caffe2/python/operator_test:sparse_ops_test -- testScatterWeightedSum

Reviewed By: jianyuh

Differential Revision: D29190324

fbshipit-source-id: 871a60744694e901a2c7685a67350860745d6729
2021-06-29 14:17:04 -07:00
5f51406a51 Modify error message when atol=0 and rtol=0 (#60897)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60897

Fixes #56377
Example output: #60898

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D29461107

Pulled By: 1ntEgr8

fbshipit-source-id: c6e15b299290aab6f8d5a19011c1d39279673f74
2021-06-29 14:17:02 -07:00
6d952dbaf0 [nnc] Fixed checking for loop carried dependence while fusing 2D reduction loops (#60609)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60609

Fixes #60310

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D29386144

Pulled By: navahgar

fbshipit-source-id: 230df4f59d6196a250ea57ff649b117d096fcdbc
2021-06-29 14:17:01 -07:00
b099f5429c Port argmin kernel to structured kernels. (#60364)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60364

Tracking issue: #55070

This PR was opened to solve the CI failures in main when merging: #59371 #59372 #59373 #59937 #59938.

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D29265855

Pulled By: ezyang

fbshipit-source-id: ccee3810940542f8b370596105826c96b32231ec
2021-06-29 14:16:59 -07:00
3e2233841f Port argmax to structured kernels. (#60363)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60363

Tracking issue: #55070

This PR was opened to solve the CI failures in main when merging: #59371 #59372 #59373 #59937 #59938.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D29265857

Pulled By: ezyang

fbshipit-source-id: 586914d2aa79028c56988896093945755a2b9781
2021-06-29 14:16:57 -07:00
df47fa5bdc Using meta checks for unary torch.all and torch.any. (#60362)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60362

This PR makes use of the newly implemented unified `at::meta::check_reduction` for
validating the inputs and configuring its `TensorIterator`.

This PR was opened to solve the CI failures in main when merging: #59371 #59372 #59373 #59937 #59938.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D29265858

Pulled By: ezyang

fbshipit-source-id: e8961b7da65a31acfed5ac3f5c1f5985ae81ec37
2021-06-29 14:16:56 -07:00
0dd90cceaf [package] track storages across lifetime of PackageExporter (#59735)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59735

1. Fixes the ABA storage-identity problem during serialization for `torch.package` by keeping references to serialized storages for the lifetime of `PackageExporter`, preventing reuse of a memory address (see the sketch below). This extends the logic used to solve the same issue on mobile.
2. Adds determinism to the naming scheme of serialized storages in export code paths which utilize `tensor_cdata_naming_scheme` (introduces a 2nd mapping in `StorageContext`, which now maps `storage cdata ptr` -> `unique id`, `unique id` -> `c10::Storage`)
3. Additionally uses the presence of a storage in the `StorageContext` instance as a marker for whether a storage has been serialized, removing the need to scan the `PythonStreamWriter` for the storage's serialization file
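A toy illustration of the ABA hazard from point 1 (hedged; not the exporter's code): without a live reference, a freed storage's address can be recycled for a brand-new storage, fooling any address-keyed bookkeeping.

```python
import torch

a = torch.ones(3)
addr = a.storage().data_ptr()   # the key an address-based cache would use
del a                           # storage freed; the allocator may reuse the address
b = torch.zeros(3)
print(addr == b.storage().data_ptr())  # may print True: same key, different data
```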

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D29075276

Pulled By: Lilyjjo

fbshipit-source-id: 15a5c30b1de99c5bd7079388f2db9b6ece2eca12
2021-06-29 14:16:54 -07:00
eb2f535689 c10::Storage python to cpp converter and typecast (#59734)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59734

Adds typecast logic to allow c10::Storage objects to cross the Python/C++ barrier with pybind11

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D29075279

Pulled By: Lilyjjo

fbshipit-source-id: 3e67b8525d308c5bccc64438ebac82b4d17ba462
2021-06-29 14:16:52 -07:00
93eba7471b Remove fetch in clang-tidy setup (#60974)
Summary:
This was necessary previously since we'd have to diff against upstream to figure out what to run in clang-tidy, but now we pull this from GitHub (https://github.com/pytorch/pytorch/issues/60045), so we can delete this part of the workflow

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60974

Reviewed By: mruberry

Differential Revision: D29466036

Pulled By: driazati

fbshipit-source-id: a9d619ab731e77bc69ab32b37cfb2c249e22a477
2021-06-29 14:15:34 -07:00
91c076eadc Add TorchVitals for DataLoader (#60959)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60959

Add TorchVitals for DataLoader; this indicates that the data loader was enabled.

This is a no-op if TORCH_VITALS environment variable is not set.

Test Plan: buck test mode/dbg caffe2/test:torch -- --regex vitals

Reviewed By: VitalyFedyunin

Differential Revision: D29445146

fbshipit-source-id: d5778fff3dafb3c0463fec7a498bff4905597518
2021-06-29 14:08:32 -07:00
652d911f81 add BFloat16 support for LayerNorm CPU (#55210)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55210

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D28836793

Pulled By: VitalyFedyunin

fbshipit-source-id: 998298deedd7a18e45fb761a0a4e0d88b65f2e0c
2021-06-29 14:08:30 -07:00
89d0e31fe5 [torch][repeat_interleave] Remove stream sync when output_size is given for scalar repeats (#60965)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60965

Same as title. A simple change to tensor creation.

Test Plan: Rely on existing signals and verify manually that sync is not happening.

Reviewed By: ngimel

Differential Revision: D29461773

fbshipit-source-id: 21d6ebfba08449da39fc7f109958f6c6978a4f32
2021-06-29 14:08:28 -07:00
086f6e557e Fix divide by zero error in the ASAN test (#60723)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/60722

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60723

Reviewed By: jbschlosser

Differential Revision: D29432147

Pulled By: albanD

fbshipit-source-id: c82cd0df8e4a04ee561ca26ae821a8b61c13a698
2021-06-29 14:07:26 -07:00
ec9c03c234 Implemented torch.cov (#58311)
Summary:
Based on https://github.com/pytorch/pytorch/pull/50466

Adds the initial implementation of `torch.cov`, similar to `numpy.cov`. For simplicity, we removed support for several `numpy.cov` parameters that are either redundant, such as `bias`, or have simple workarounds, such as `y` and `rowvar`.
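Basic usage mirrors `numpy.cov` with variables as rows, and the dropped parameters have one-line workarounds:

```python
import torch

x = torch.randn(3, 100)                  # 3 variables, 100 observations each
print(torch.cov(x))                      # 3x3 covariance matrix

y = torch.randn(1, 100)
print(torch.cov(torch.vstack([x, y])))   # instead of np.cov(x, y)

obs = torch.randn(100, 3)
print(torch.cov(obs.T))                  # instead of np.cov(obs, rowvar=False)
```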

cc PandaBoi

closes https://github.com/pytorch/pytorch/issues/19037

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58311

Reviewed By: jbschlosser

Differential Revision: D29431651

Pulled By: heitorschueroff

fbshipit-source-id: 167dea880f534934b145ba94291a9d634c25b01b
2021-06-29 14:02:39 -07:00
8f658d537d Improved JIT support for torch.einsum (#59265)
Summary:
Added JIT support for the vararg version of `torch.einsum`. Note that JIT does not support Python's Ellipsis object (`...`).
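For instance, the vararg form below can now be used inside a scripted function, sticking to explicit subscripts since `...` is unsupported in JIT:

```python
import torch

@torch.jit.script
def matmul(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    # vararg form: operands passed directly rather than packed into a list
    return torch.einsum('ij,jk->ik', a, b)

a, b = torch.randn(3, 4), torch.randn(4, 5)
assert torch.allclose(matmul(a, b), a @ b)
```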

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59265

Reviewed By: VitalyFedyunin

Differential Revision: D29328469

Pulled By: heitorschueroff

fbshipit-source-id: 5e4b177fda93255251f45d735b00c08220f0f124
2021-06-29 14:01:21 -07:00
d46eb77b04 Improve CUDA extension building error/warning messages (#59665)
Summary:
See https://github.com/pytorch/pytorch/issues/55267

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59665

Reviewed By: mruberry

Differential Revision: D29462248

Pulled By: ezyang

fbshipit-source-id: 9de13a284a14a7cd24200b9684151ce652e1eb1e
2021-06-29 13:03:30 -07:00
12b63f4046 [DDP] Fix case where new tensors with no grad_fn are returned in DDP forward. (#60882)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60882

Fixes https://github.com/pytorch/pytorch/issues/60733, which
identified an issue with a previous PR that resulted in DDP no longer
supporting cases where newly created tensors are returned that don't have a
grad_fn. As a result, the grad_fn was set to that of the `DDPSink`
custom backward, which caused errors during the backward pass.

This PR fixes the issue by ensuring we don't touch the `grad_fn` of the tensors
if it is `None`. Added relevant tests as well.
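In spirit, the guard is as simple as this sketch (hypothetical helper, not DDP's actual code):

```python
import torch

def rewire_output(t: torch.Tensor) -> torch.Tensor:
    # Hypothetical stand-in for DDP's output handling: tensors freshly
    # created in forward carry no autograd history (grad_fn is None) and
    # must be left untouched instead of being routed through DDPSink.
    if t.grad_fn is None:
        return t
    return t.clone()  # placeholder for attaching the DDPSink custom backward
```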
ghstack-source-id: 132632515

Test Plan: CI

Reviewed By: zhaojuanmao

Differential Revision: D29423822

fbshipit-source-id: a9e01046c7be50aa43ffb955f6e0f48fef4bc881
2021-06-29 12:50:48 -07:00
1db2d9b0a8 [ProcessGroupNCCL] change WARNING to INFO (#60901)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60901

Short-term fix to address
https://github.com/pytorch/pytorch/issues/60752. A longer-term fix is tracked here:
https://github.com/pytorch/pytorch/issues/53658 and will involve detecting
whether the user has called `torch.cuda.set_device` in their script and
respecting that device if so, otherwise falling back to our current approach.
ghstack-source-id: 132637336

Test Plan: CI

Reviewed By: zhaojuanmao

Differential Revision: D29439322

fbshipit-source-id: 92a18fadbb514b1c029332b60fd48075874906ff
2021-06-29 12:46:47 -07:00
150c828803 Add lint rule to keep collect_env.py python2 compliant (#60946)
Summary:
Fixes T94400857

- [x] Add lint rule
- [x] Verify lint rule works
- [x] Fix torch/utils/collect_env.py

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60946

Reviewed By: malfet, mruberry

Differential Revision: D29457294

Pulled By: rsemenov

fbshipit-source-id: 3c0670408d7aee1479e1de335291deb13a04ace9
2021-06-29 11:57:53 -07:00
808d0e3353 [caffe2] update make_mnist_db and make_image_db to move strings into DB::Put() (#60919)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60919

Update make_mnist_db.cc and make_image_db.cc to work with the DB API changes
in D29204425 (00896cb9ed).  This is similar to the changes to make_cifar_db.cc landed in
D29374754 (394f60b0fc).
ghstack-source-id: 132621346

Test Plan: buck build caffe2/binaries/...

Reviewed By: valmikir

Differential Revision: D29447314

fbshipit-source-id: 33aff85c24d8b785211287de23d46704c7eb0726
2021-06-29 11:52:43 -07:00
fab1b6cc70 .github: Increase test shards for linux GPU (#60914)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60914

Linux GPU tests are taking almost 4 hours to execute, so let's increase
the number of test shards for these jobs so they finish in a more timely fashion

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: samestep

Differential Revision: D29461968

Pulled By: seemethere

fbshipit-source-id: a1eab08f9cd3abd8ceca48871fe702d0bccd8a3f
2021-06-29 10:44:01 -07:00
5fbca0d281 Use cpu docker image for cpu builds (#60920)
Summary:
This was set to use the [CUDA 10.0 image](https://hub.docker.com/r/pytorch/manylinux-cuda100) which hasn't been updated in quite a while, so fix it to use the up-to-date [cpu image](https://hub.docker.com/r/pytorch/manylinux-cpu) instead

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60920

Reviewed By: janeyx99

Differential Revision: D29447897

Pulled By: driazati

fbshipit-source-id: 6e89091110361d0ddda859bb266e229c6cf83c2d
2021-06-29 10:11:55 -07:00
10b929bbfb Make Jeff and Jithun .circleci/docker code owners (#60958)
Summary:
Following up on https://github.com/pytorch/pytorch/pull/60658#issuecomment-870681027.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60958

Reviewed By: 1ntEgr8

Differential Revision: D29460721

Pulled By: samestep

fbshipit-source-id: 74badff6c4a17b3ff48dc2fc27d1faa9edeae097
2021-06-29 09:47:58 -07:00
53489bc385 fix for #60319 , forcing to use fork as start method in test/test_dat… (#60868)
Summary:
Fix for https://github.com/pytorch/pytorch/issues/60319, forcing the use of fork as the start method in test/test_dataloader.py

Fixes #60319

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60868

Reviewed By: mruberry

Differential Revision: D29432876

Pulled By: ejguan

fbshipit-source-id: 5da25f7cfaf8ea0803c0b1aacf2badd656799e16
2021-06-29 09:30:37 -07:00
4310044fec update unsafe flag documentation (#60899)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60899

Modify the documentation for the `unsafe` flag in `parametrize.py`
ghstack-source-id: 132591862

Test Plan:
shouldn't modify code behavior but as a double check,
`buck test mode/dev-nosan //caffe2/test:nn -- --exact 'caffe2/test:nn - test_register_and_remove_parametrization (test_nn.TestNN)'`

https://pxl.cl/1L1fw

Reviewed By: albanD

Differential Revision: D29436688

fbshipit-source-id: 85499ad22b49ad992507b9ed5e7def8231cbfeba
2021-06-29 09:25:37 -07:00
5b6818f08a [Model Averaging] Enforce a synchronization before allreduce parameters (#60891)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60891

This fix is particularly useful for local SGD when the averaging period is very small, which may cause a conflict between the gradient allreduce within the per-machine subgroup and the global parameter allreduce across the communication world.
ghstack-source-id: 132564252

Test Plan:
f281873295 (#Try1) failed due to the conflict between global process group and subgroup.
```
<Thread(configerator-monitor-singleton, started 139839806633728)>
  File "/usr/local/fbcode/platform009/lib/python3.8/threading.py", line 890, in _bootstrap
    self._bootstrap_inner()
  File "/usr/local/fbcode/platform009/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/usr/local/fbcode/platform009/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/tmp/jetter.gson7tr3/configerator/client.py", line 348, in _monitor_loop
    self._parent_thread.join(self._interval_ms / 1000)
  File "/usr/local/fbcode/platform009/lib/python3.8/threading.py", line 1015, in join
    self._wait_for_tstate_lock(timeout=max(timeout, 0))
  File "/usr/local/fbcode/platform009/lib/python3.8/threading.py", line 1027, in _wait_for_tstate_lock
    elif lock.acquire(block, timeout):
```

Fixed after adding an explicit sync: f282044866, f282241800

Reviewed By: rohan-varma

Differential Revision: D29434597

fbshipit-source-id: a4f777fc26f379639f85fda32de425cd3b337b33
2021-06-29 01:39:40 -07:00
fbd4cb1cd7 Fix error logging in common_distributed. (#60917)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60917

The second line of the error log didn't use an f-string properly.

Before fix:
```
exiting process with exit code: {MultiProcessTestCase.TEST_ERROR_EXIT_CODE}
```

After fix:
```
exiting process 3 with exit code: 10
```
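The bug pattern is the classic missing `f` prefix:

```python
exit_code = 10
print("exiting with exit code: {exit_code}")   # bug: braces printed literally
print(f"exiting with exit code: {exit_code}")  # fix: value interpolated
```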
ghstack-source-id: 132618199

Test Plan: waitforbuildbot

Reviewed By: rohan-varma

Differential Revision: D29446574

fbshipit-source-id: f806ef0470cb6aa86fe3c404e1c895514abb6488
2021-06-28 19:32:17 -07:00
d71e7ae740 [PyTorch][vulkan] Unify vtensor_from_vulkan to always return non-const ref (#59996)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59996

Just like D28811477 (dce8697aea), there's no reason we can't give it this signature.
ghstack-source-id: 132566618

Test Plan: CI

Reviewed By: AshkanAliabadi

Differential Revision: D29119070

fbshipit-source-id: d049d49c38099eef6c96e8f69909827e64376097
2021-06-28 19:25:13 -07:00
7eef78597e fixed launch bounds for grid sampler 3d (#60385)
Summary:
Changed launch bounds for grid_sampler_3d from 1024 to 512 and grid_sampler_3d_backward from 1024 to 256.

Timing data (using Nvidia Titan-V):
![GridSampler3dTimingData](https://user-images.githubusercontent.com/22803332/122813457-d3c12300-d287-11eb-99c1-6572f539660f.PNG)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60385

Reviewed By: jbschlosser

Differential Revision: D29433741

Pulled By: ngimel

fbshipit-source-id: 7f475d0c2e854ae65dd0f1fb0167dfae7e506ec9
2021-06-28 19:01:38 -07:00
d36ce61a5e use explicitly non-returning GPU atomics (#60607)
Summary:
Enables an important performance optimization for ROCm, in light of the discussion in https://github.com/pytorch/pytorch/issues/41028.

CC jithunnair-amd sunway513

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60607

Reviewed By: jbschlosser

Differential Revision: D29409894

Pulled By: ngimel

fbshipit-source-id: effca258a0f37eaefa35674a7fd19459ca7dc95b
2021-06-28 18:17:29 -07:00
d62c3ea354 [skip ci] Add GitHub Actions label for g3.16xlarge (#60888)
Summary:
Prerequisite for https://github.com/pytorch/pytorch/issues/60221.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60888

Reviewed By: seemethere

Differential Revision: D29436592

Pulled By: samestep

fbshipit-source-id: b3254139ec9c46c533f8f951a9ede3b372a65536
2021-06-28 15:49:52 -07:00
d5a44f9f12 Use expecttest from PyPI (#60658)
Summary:
This PR removes `torch/testing/_internal/expecttest.py` in favor of https://github.com/ezyang/expecttest. See also https://github.com/ezyang/ghstack/pull/71.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60658

Test Plan: CI.

Reviewed By: ezyang

Differential Revision: D29430763

Pulled By: samestep

fbshipit-source-id: b7cdc7ba37330176149fd465312118e2254ae92e
2021-06-28 15:43:34 -07:00
ddb1f293b6 Fix the NNC-disabled path in static runtime for perf comparisons
Summary:
The path which has NNC/LLVM disabled still constructs a tensor
expression, even though `supports()` will always return false, so a
`KernelScope` is necessary to manage those memory allocations.

I guess we could avoid building the TEs at all in this case, but it's pretty
clean this way.

Test Plan:
```
scripts/bertrand/static_runtime/run.sh
```

Reviewed By: hlu1

Differential Revision: D29415909

fbshipit-source-id: dde43de8516b9a2cf9f5f7f3699962bf9ccd8c30
2021-06-28 15:39:07 -07:00
9b94aa5356 [quant][fx][fix] Fused modules with object_type in qconfig (#60779)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60779

When we do fusion, we replace certain modules (such as Linear + ReLU) with fused versions (such as LinearReLU) by calling `_fuse_fx` in prepare_fx. However, when we try to look up the fused module type in qconfig_dict, we cannot find a match anymore, since the qconfig dict contains the original module types. An example is here [N882873](https://fburl.com/anp/azenjx3v).

So we now update the qconfig_dict to map the fused module types to the qconfigs used for their constituent modules. If the constituent modules are not mapped to the same qconfig, we raise an error (see the sketch below).
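A hedged sketch of that update (hypothetical helper and fusion map; the real logic lives in the FX quantization pass):

```python
import torch.nn as nn
import torch.nn.intrinsic as nni

def extend_qconfig_for_fusion(qconfig_dict, fusion_map):
    """Hypothetical helper: also key qconfig_dict by fused types, requiring
    all constituent modules of a fusion to share one qconfig."""
    obj_type = dict(qconfig_dict.get("object_type", []))
    for fused, parts in fusion_map.items():
        qconfigs = [obj_type.get(p) for p in parts]
        if any(q is not qconfigs[0] for q in qconfigs):
            raise ValueError(f"modules fused into {fused} use different qconfigs")
        obj_type[fused] = qconfigs[0]
    qconfig_dict["object_type"] = list(obj_type.items())
    return qconfig_dict

# e.g. extend_qconfig_for_fusion(qconfig_dict,
#                                {nni.LinearReLU: (nn.Linear, nn.ReLU)})
```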

Test Plan:
`python test/test_quantization.py TestFuseFx.test_qconfig_fused_module`

Imported from OSS

Reviewed By: supriyar

Differential Revision: D29406941

fbshipit-source-id: 74b5db89f4998aeb02b2bf7c37bf97326580c654
2021-06-28 15:22:22 -07:00
cadce14e02 don't return in __init__ functions (#60830)
Summary:
Fix some warnings from a code analyzer

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60830

Reviewed By: jbschlosser

Differential Revision: D29433638

Pulled By: albanD

fbshipit-source-id: 148df1d8a0a79778f18e8b6abffbddef36c5031c
2021-06-28 14:56:13 -07:00
9af8aecd00 [caffe2/libtorch] Remove already-owned source
Summary:
This source is already owned by a more fine-grained rule, so avoid a
package boundary violation by having it also be owned by an outer
rule.

Test Plan: CI

Reviewed By: aniketmathur

Differential Revision: D29422794

fbshipit-source-id: 432accc969abcb4d56bd97341a07029926939ea0
2021-06-28 14:45:34 -07:00
eeea696c02 [caffe2] Fix include of corresponding header
Summary:
AFAICT, this include was a typo, and meant to be the corresponding
header for this .cpp, but instead pulled in an unrelated header.

Test Plan: CI

Reviewed By: igorsugak

Differential Revision: D29422993

fbshipit-source-id: cc9bb29ee1f1007b68c6666ea8e389f6f39928af
2021-06-28 14:45:32 -07:00
c3977bf3da [caffe2/utils] Add some fine-grained rules to avoid package boundary violations
Test Plan: CI

Reviewed By: igorsugak

Differential Revision: D29401295

fbshipit-source-id: e921e5578c1fcc8df6bd670ae9f95722b8e32d85
2021-06-28 14:45:30 -07:00
03de807d81 [caffe2/utils] Add explicit rule to avoid package boundary violation (#60677)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60677

Add a rule to wrap conversions.h and depend on that, rather than
relying on a glob which violates package boundaries.

Test Plan: `buck2 build fbcode//caffe2/caffe2:caffe2_core`

Reviewed By: mzlee

Differential Revision: D29370841

fbshipit-source-id: d4dd383eb8457d4f5118574e34e6f17c32fde647
2021-06-28 14:43:30 -07:00
41c380e649 Enable bionic-cuda10.2-cudnn7-py3.9-gcc7 in GHA (#60204)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60204

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D29430679

Pulled By: samestep

fbshipit-source-id: 9380f5535cd370ec7aabf609a6170c8cb4df505d
2021-06-28 13:08:36 -07:00
971cdafd15 Upgrade benchmark to v1.5.5 (#60750)
Summary:
This fixes the build for gcc 11.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60750

Test Plan: CI

Reviewed By: larryliu0820

Differential Revision: D29394541

Pulled By: dreiss

fbshipit-source-id: 61557431b52a3e898ffcc32f97133b3ea94a838f
2021-06-28 13:03:03 -07:00
007ba37c9a [pruning] Speedup activation reconstruction (#60683)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60683

Vectorized reconstruction without for loops

Test Plan:
`buck test mode/dev-nosan //caffe2/test:ao -- TestBasePruner`

https://pxl.cl/1KSJQ

Reviewed By: z-a-f

Differential Revision: D29370805

fbshipit-source-id: 75402437654a0b6f6391c8590bbe3f6fe3f43d8f
2021-06-28 12:58:21 -07:00
f302e0c781 [pruning] Additional pruning tests (#60681)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60681

Adding additional pruning tests for more complex models and more pruned rows

Test Plan:
`buck test mode/dev-nosan //caffe2/test:ao -- TestBasePruner`

https://pxl.cl/1KQ2Z

Reviewed By: z-a-f

Differential Revision: D29347546

fbshipit-source-id: cb65e564dd46d24f4aca1b00dd915ee8d64f8318
2021-06-28 12:58:20 -07:00
8d4a6ef962 [pruning] Activation reconstruction (#60292)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60292

Added activation reconstruction in the `reconstruct` method

Test Plan:
`buck test mode/dev-nosan //caffe2/test:ao -- TestBasePruner`

https://pxl.cl/1KLl1

Reviewed By: z-a-f

Differential Revision: D29236569

fbshipit-source-id: 1ad085f4143eb9fa3efca51e00d810e0fdb7e9b1
2021-06-28 12:58:18 -07:00
965dad25a5 Allow resizing of parametrized tensors (#60418)
Summary:
Modify `parametrize.py` to allow resizing of parametrized tensors

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60418

Test Plan:
`buck test mode/dev-nosan //caffe2/test:nn -- --exact 'caffe2/test:nn - test_register_and_remove_parametrization (test_nn.TestNN)'`

https://pxl.cl/1L0wh

Reviewed By: z-a-f

Differential Revision: D29279442

Pulled By: kazhou

fbshipit-source-id: 4d94915748f896e7761a40ad18f4c6444f505c3a
2021-06-28 12:57:11 -07:00
956faea585 [fix] cauchy sampling inf on cuda (#60186)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/59144

As pointed out by ngimel, the issue is indeed with calling `tan`.

However, the C++ `std::tan` [documentation](https://en.cppreference.com/w/cpp/numeric/math/tan) states that

```
The function has mathematical poles at π(1/2 + n); however no common floating-point representation
is able to represent π/2 exactly, thus there is no value of the argument for which a pole error occurs.
```

All of `torch.tan`, `numpy.tan` and `math.tan` are compliant with the above statement.

<details>

```python
import torch
import math
import numpy as np

# Single Precision
print(torch.tan(torch.tensor(math.pi, device='cuda', dtype=torch.float32) * 0.5))
print(np.tan(np.array(np.pi, dtype=np.float32) * 0.5))

# Double Precision
print(math.tan(math.pi * 0.5))
print(torch.tan(torch.tensor(math.pi, device='cuda', dtype=torch.double) * 0.5))
print(np.tan(np.array(np.pi, dtype=np.float64) * 0.5))
```

Output
```
tensor(-22877334., device='cuda:0')
-22877332.42885646
1.633123935319537e+16
tensor(1.6331e+16, device='cuda:0', dtype=torch.float64)
1.633123935319537e+16
```

</details>

So this issue stems from the use of `__tanf`, the faster approximation of tan from the CUDA library (used for float16, bfloat16 and float).

8a839c5478/aten/src/ATen/NumericUtils.h (L91-L100)

The fix in the PR is to use the **slower** but more correct version.

Benchmark::
```
[ cauchy : input dtype torch.float16 device cuda ]
                             |  Before  |  After
1 threads: -------------------------------------
      (128,)                 |    3.8   |    4.3
      (256, 128)             |    3.8   |    4.2
      (2, 512, 256)          |    3.8   |    4.2
      (2, 64, 256, 128)      |   22.8   |   29.6
      (4, 2, 512, 256, 128)  |  649.6   |  869.3

Times are in microseconds (us).

[ cauchy : input dtype torch.bfloat16 device cuda ]
                             |  Before  |  After
1 threads: -------------------------------------
      (128,)                 |    3.8   |    4.3
      (256, 128)             |    3.8   |    4.3
      (2, 512, 256)          |    3.8   |    4.3
      (2, 64, 256, 128)      |   23.8   |   30.8
      (4, 2, 512, 256, 128)  |  682.5   |  904.2

Times are in microseconds (us).

[ cauchy : input dtype torch.float32 device cuda ]
                             |  Before  |  After
1 threads: --------------------------------------
      (128,)                 |     3.8  |     4.2
      (256, 128)             |     3.7  |     4.2
      (2, 512, 256)          |     3.7  |     4.2
      (2, 64, 256, 128)      |    35.3  |    37.1
      (4, 2, 512, 256, 128)  |  1020.0  |  1058.3

Times are in microseconds (us).

[- cauchy : input dtype torch.float64 device cuda ]
                             |   Before  |   After
1 threads: ----------------------------------------
      (128,)                 |      3.8  |      4.2
      (256, 128)             |      8.0  |      8.0
      (2, 512, 256)          |     46.0  |     46.0
      (2, 64, 256, 128)      |    669.2  |    669.4
      (4, 2, 512, 256, 128)  |  21255.0  |  21262.1

Times are in microseconds (us).
```

<details>

Benchmark Script:
```python
import torch
import itertools
import time
from torch.utils.benchmark import Timer
from torch.utils.benchmark import Compare
import sys
import pickle

print('Using pytorch %s' % (torch.__version__))

cuda_shapes = [(128,), (256, 128), (2, 512, 256), (2, 64, 256, 128), (4, 2, 512, 256, 128)]
cuda_dtypes = [torch.half, torch.bfloat16, torch.float, torch.double]
results = []
repeats = 10

for device in ['cuda']:
    dtypes = cuda_dtypes
    shapes = cuda_shapes

    for dtype in dtypes:
        for shape in shapes:
            t = torch.randn(shape, device=device, dtype=dtype) * 10

            tasks = [("t.cauchy_()", "After", "")]
            timers = [Timer(stmt=stmt, label=f"cauchy : input dtype {dtype} device {device}", sub_label=f"{(shape)}", description=desc, globals=globals()) for stmt, desc, label in tasks]

            for i, timer in enumerate(timers * repeats):
                results.append(
                    timer.blocked_autorange()
                )
                print(f"\r{i + 1} / {len(timers) * repeats}", end="")
                sys.stdout.flush()

with open('after-pr.pkl', 'wb') as f:
    pickle.dump(results, f)

comparison = Compare(results)
comparison.print()
```

Compare Script:
```
import torch
import itertools
import time
from torch.utils.benchmark import Timer
from torch.utils.benchmark import Compare
import sys
import pickle

with open('before-pr.pkl', 'rb') as f:
    after_results = pickle.load(f)

with open('after-pr.pkl', 'rb') as f:
    before_results = pickle.load(f)

comparison = Compare(after_results + before_results)
comparison.print()
```

</details>

TODO:
* [x] Add comment

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60186

Reviewed By: jbschlosser

Differential Revision: D29433897

Pulled By: ngimel

fbshipit-source-id: 9c5f14b83e3372bed72369f70eed9256c04385c6
2021-06-28 12:49:30 -07:00
70e205a2ab Use the new URL for docs preview link (#60893)
Summary:
This is all set up on CloudFront now with a custom domain, so we don't need the long default cloudfront domain anymore

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60893

Reviewed By: malfet

Differential Revision: D29437300

Pulled By: driazati

fbshipit-source-id: 6f5ffd1b10c5167b0022b7e64b2164508624ca91
2021-06-28 12:45:04 -07:00
f5e5ced202 Enable parallel clang-tidy on ec2 runner (#60870)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60870

This PR makes `clang-tidy` run on our self-hosted runner in a parallel fashion.

Fixes #60867

Test Plan: #60871

Reviewed By: jbschlosser

Differential Revision: D29434240

Pulled By: 1ntEgr8

fbshipit-source-id: cead30ed718ddf5e14b13afe70cb209aa16b44a0
2021-06-28 11:45:44 -07:00
c8fb785857 Print stdout and stderr to console on parallel runs (#60869)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60869

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D29434155

Pulled By: 1ntEgr8

fbshipit-source-id: 925c9d832775dbb710af9367c07962f3367fda38
2021-06-28 11:29:12 -07:00
a8057e7ef1 docs: add permute in torch docs (#60821)
Summary:
fix https://github.com/pytorch/pytorch/issues/60181

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60821

Reviewed By: VitalyFedyunin

Differential Revision: D29431949

Pulled By: jbschlosser

fbshipit-source-id: 2353afceaa188315cde1f0c955897c4750809c8e
2021-06-28 11:20:35 -07:00
d7c58e5a04 [vulkan] Implement tanh activation function (#60695)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60695

As title. Implement tanh in Vulkan.

Test Plan:
Build Pytorch repository with the build command in P425131222.

Run test command `pytorch/build/bin/vulkan_api_test`

Output:

{F627752306}

Reviewed By: SS-JIA

Differential Revision: D29375071

fbshipit-source-id: 2d613a9542774719dd78524757a677e3b2450c74
2021-06-28 10:58:44 -07:00
da70dd199d [quant] Input-Weight Equalization - tests (#60378)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60378

Created the following unit-tests to check that our equalization algorithm is as expected:
- Check the equalization scales calculated and stored in the graph are as expected
- Check the scaled weights and biases are as expected
- Check that the min/max values in the quantization observers are as expected
- Check that the graphs with equalization are structured in the same way as graphs without equalization (except that equalized graphs have additional equalization scale and mul nodes) before and after quantization

Test Plan:
`python test/test_quantization TestEqualizeFx.test_input_weight_equalization_equalization_scales`
`python test/test_quantization TestEqualizeFx.test_input_weight_equalization_weights_bias`
`python test/test_quantization TestEqualizeFx.test_input_activation_values`
`python test/test_quantization TestEqualizeFx.test_input_weight_equalization_graphs`

Imported from OSS

Reviewed By: supriyar

Differential Revision: D29406942

fbshipit-source-id: 518208546ae5835c1ebb2af217507e90af66fbe4
2021-06-28 10:44:29 -07:00
dfb9c0bae8 [quant] Input-Weight Equalization - support for connected F.linear layer (#60272)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60272

Test Plan:
`python test/test_quantization.py TestEqualizeFx`

Original model:
```
FunctionalLinear2Module(
  (linear1): Linear()
  (linear2): Linear()
)
```

Graph after `prepare_fx`:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_0](args = (%x,), kwargs = {})
    %x_activation_post_process_0_equalization_process_0 : [#users=1] = call_module[target=x_activation_post_process_0_equalization_process_0](args = (%x_activation_post_process_0,), kwargs = {})
    %linear1_w : [#users=1] = get_attr[target=linear1.w]
    %linear1_w_activation_post_process_0 : [#users=1] = call_module[target=linear1_w_activation_post_process_0](args = (%linear1_w,), kwargs = {})
    %linear1_w_activation_post_process_0_equalization_process_0 : [#users=1] = call_module[target=linear1_w_activation_post_process_0_equalization_process_0](args = (%linear1_w_activation_post_process_0,), kwargs = {})
    %linear1_b : [#users=1] = get_attr[target=linear1.b]
    %linear : [#users=1] = call_function[target=torch.nn.functional.linear](args = (%x_activation_post_process_0_equalization_process_0, %linear1_w_activation_post_process_0_equalization_process_0), kwargs = {bias: %linear1_b})
    %linear_activation_post_process_0 : [#users=1] = call_module[target=linear_activation_post_process_0](args = (%linear,), kwargs = {})
    %linear_activation_post_process_0_equalization_process_0 : [#users=1] = call_module[target=linear_activation_post_process_0_equalization_process_0](args = (%linear_activation_post_process_0,), kwargs = {})
    %linear2_w : [#users=1] = get_attr[target=linear2.w]
    %linear2_w_activation_post_process_0 : [#users=1] = call_module[target=linear2_w_activation_post_process_0](args = (%linear2_w,), kwargs = {})
    %linear2_w_activation_post_process_0_equalization_process_0 : [#users=1] = call_module[target=linear2_w_activation_post_process_0_equalization_process_0](args = (%linear2_w_activation_post_process_0,), kwargs = {})
    %linear2_b : [#users=1] = get_attr[target=linear2.b]
    %linear_1 : [#users=1] = call_function[target=torch.nn.functional.linear](args = (%linear_activation_post_process_0_equalization_process_0, %linear2_w_activation_post_process_0_equalization_process_0), kwargs = {bias: %linear2_b})
    %linear_1_activation_post_process_0 : [#users=1] = call_module[target=linear_1_activation_post_process_0](args = (%linear_1,), kwargs = {})
    return linear_1_activation_post_process_0
```

Graph after equalization steps:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_equalization_scale0 : [#users=1] = get_attr[target=x_equalization_scale0]
    %mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_equalization_scale0), kwargs = {})
    %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_0](args = (%mul,), kwargs = {})
    %linear1_w : [#users=1] = get_attr[target=linear1.w]
    %linear1_w_activation_post_process_0 : [#users=1] = call_module[target=linear1_w_activation_post_process_0](args = (%linear1_w,), kwargs = {})
    %linear1_b : [#users=1] = get_attr[target=linear1.b]
    %linear : [#users=1] = call_function[target=torch.nn.functional.linear](args = (%x_activation_post_process_0, %linear1_w_activation_post_process_0), kwargs = {bias: %linear1_b})
    %linear_activation_post_process_0 : [#users=1] = call_module[target=linear_activation_post_process_0](args = (%linear,), kwargs = {})
    %linear2_w : [#users=1] = get_attr[target=linear2.w]
    %linear2_w_activation_post_process_0 : [#users=1] = call_module[target=linear2_w_activation_post_process_0](args = (%linear2_w,), kwargs = {})
    %linear2_b : [#users=1] = get_attr[target=linear2.b]
    %linear_1 : [#users=1] = call_function[target=torch.nn.functional.linear](args = (%linear_activation_post_process_0, %linear2_w_activation_post_process_0), kwargs = {bias: %linear2_b})
    %linear_1_activation_post_process_0 : [#users=1] = call_module[target=linear_1_activation_post_process_0](args = (%linear_1,), kwargs = {})
    return linear_1_activation_post_process_0
```

Graph after `convert_fx`:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_equalization_scale0 : [#users=1] = get_attr[target=x_equalization_scale0]
    %mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_equalization_scale0), kwargs = {})
    %linear1_input_scale_0 : [#users=1] = get_attr[target=linear1_input_scale_0]
    %linear1_input_zero_point_0 : [#users=1] = get_attr[target=linear1_input_zero_point_0]
    %quantize_per_tensor : [#users=1] = call_function[target=torch.quantize_per_tensor](args = (%mul, %linear1_input_scale_0, %linear1_input_zero_point_0, torch.quint8), kwargs = {})
    %linear1_packed_weight_0 : [#users=1] = get_attr[target=linear1_packed_weight_0]
    %linear1_scale_0 : [#users=1] = get_attr[target=linear1_scale_0]
    %linear1_zero_point_0 : [#users=1] = get_attr[target=linear1_zero_point_0]
    %linear : [#users=1] = call_function[target=torch.ops.quantized.linear](args = (%quantize_per_tensor, %linear1_packed_weight_0, %linear1_scale_0, %linear1_zero_point_0), kwargs = {})
    %linear2_packed_weight_0 : [#users=1] = get_attr[target=linear2_packed_weight_0]
    %linear2_scale_0 : [#users=1] = get_attr[target=linear2_scale_0]
    %linear2_zero_point_0 : [#users=1] = get_attr[target=linear2_zero_point_0]
    %linear_1 : [#users=1] = call_function[target=torch.ops.quantized.linear](args = (%linear, %linear2_packed_weight_0, %linear2_scale_0, %linear2_zero_point_0), kwargs = {})
    %dequantize : [#users=1] = call_method[target=dequantize](args = (%linear_1,), kwargs = {})
    return dequantize
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D29267218

fbshipit-source-id: 6b97bed1a307f1d0b1f5efcbecf41f35418242f7
2021-06-28 10:44:27 -07:00
ddf2ce03bb [quant] Input-Weight Equalization - support for connected linear layers (#60034)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60034

Added support for equalizing models with connected linear
layers. To account for connected linear layers, we will additionally
multiply the previous weight values (row-wise) by the next equalization
scale, and remove the input equalization observer between the two linear
layers. We also want to scale the bias by the next equalization scale.
The math is shown here: https://fb.quip.com/fK8rA9aRM4ca .

Original Model: `x -> linear1 -> linear2`
After `prepare_fx`: `x -> InpEqObs -> InpQuantObs -> linear1 ->
OutQuantObs -> InpEqObs -> linear2`
After equalization: `x -> mul -> InpQuantObs -> linear1 -> OutQuantObs
-> linear2`
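A numeric sketch of that folding (assumed shapes; `s` plays the role of the next layer's per-channel equalization scale):

```python
import torch

x = torch.randn(5, 3)
W1, b1 = torch.randn(4, 3), torch.randn(4)
W2, b2 = torch.randn(2, 4), torch.randn(2)
s = torch.rand(4) + 0.5                 # next equalization scale

ref = (x @ W1.t() + b1) @ W2.t() + b2   # linear1 -> linear2, unequalized

W1f = W1 * s.unsqueeze(1)               # scale linear1's weight row-wise
b1f = b1 * s                            # and its bias
W2f = W2 / s.unsqueeze(0)               # linear2 absorbs the inverse
out = (x @ W1f.t() + b1f) @ W2f.t() + b2
assert torch.allclose(ref, out, atol=1e-5)
```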

Test Plan:
`python test/test_quantization.py
TestEqualizeFx.test_input_weight_equalization_convert`

Original Model:
```
Linear2Module(
  (linear1): Linear(in_features=2, out_features=2, bias=True)
  (linear2): Linear(in_features=2, out_features=2, bias=True)
)
```

Graph after `prepare_fx`:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_0](args = (%x,), kwargs = {})
    %x_activation_post_process_0_equalization_process_0 : [#users=1] = call_module[target=x_activation_post_process_0_equalization_process_0](args = (%x_activation_post_process_0,), kwargs = {})
    %linear1 : [#users=1] = call_module[target=linear1](args = (%x_activation_post_process_0_equalization_process_0,), kwargs = {})
    %linear1_activation_post_process_0 : [#users=1] = call_module[target=linear1_activation_post_process_0](args = (%linear1,), kwargs = {})
    %linear1_activation_post_process_0_equalization_process_0 : [#users=1] = call_module[target=linear1_activation_post_process_0_equalization_process_0](args = (%linear1_activation_post_process_0,), kwargs = {})
    %linear2 : [#users=1] = call_module[target=linear2](args = (%linear1_activation_post_process_0_equalization_process_0,), kwargs = {})
    %linear2_activation_post_process_0 : [#users=1] = call_module[target=linear2_activation_post_process_0](args = (%linear2,), kwargs = {})
    return linear2_activation_post_process_0
```

Graph after equalization functions:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_activation_post_process_0_equalization_process_0_scale : [#users=1] = get_attr[target=x_activation_post_process_0_equalization_process_0_scale]
    %mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_activation_post_process_0_equalization_process_0_scale), kwargs = {})
    %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_0](args = (%mul,), kwargs = {})
    %linear1 : [#users=1] = call_module[target=linear1](args = (%x_activation_post_process_0,), kwargs = {})
    %linear1_activation_post_process_0 : [#users=1] = call_module[target=linear1_activation_post_process_0](args = (%linear1,), kwargs = {})
    %linear2 : [#users=1] = call_module[target=linear2](args = (%linear1_activation_post_process_0,), kwargs = {})
    %linear2_activation_post_process_0 : [#users=1] = call_module[target=linear2_activation_post_process_0](args = (%linear2,), kwargs = {})
    return linear2_activation_post_process_0
```

Graph after `convert_fx`:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_activation_post_process_0_equalization_process_0_scale : [#users=1] = get_attr[target=x_activation_post_process_0_equalization_process_0_scale]
    %mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_activation_post_process_0_equalization_process_0_scale), kwargs = {})
    %linear1_input_scale_0 : [#users=1] = get_attr[target=linear1_input_scale_0]
    %linear1_input_zero_point_0 : [#users=1] = get_attr[target=linear1_input_zero_point_0]
    %quantize_per_tensor : [#users=1] = call_function[target=torch.quantize_per_tensor](args = (%mul, %linear1_input_scale_0, %linear1_input_zero_point_0, torch.quint8), kwargs = {})
    %linear1 : [#users=1] = call_module[target=linear1](args = (%quantize_per_tensor,), kwargs = {})
    %linear2 : [#users=1] = call_module[target=linear2](args = (%linear1,), kwargs = {})
    %dequantize : [#users=1] = call_method[target=dequantize](args = (%linear2,), kwargs = {})
    return dequantize
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D29204347

fbshipit-source-id: 6bb9e25e2468f50df523885ded2edc731f002ac1
2021-06-28 10:44:25 -07:00
7917318917 [quant] Input-Weight Equalization - support for F.linear layers (#59964)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59964

Input-Weight Equalization support for functional layers

Test Plan:
`python test/test_quantization.py
TestEqualizeFx.test_input_weight_equalization_convert`

Original model:
```
FunctionalLinearModule(
  (linear1): Linear()
)
```

Graph after `prepare_fx`:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_equalization_process_0 : [#users=1] = call_module[target=x_equalization_process_0](args = (%x,), kwargs = {})
    %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_00](args = (%x_equalization_process_0,), kwargs = {})
    %linear1_w : [#users=1] = get_attr[target=linear1.w]
    %linear1_w_equalization_process_0 : [#users=1] = call_module[target=linear1_w_equalization_process_0](args = (%linear1_w,), kwargs = {})
    %linear1_w_activation_post_process_0 : [#users=1] = call_module[target=linear1_w_activation_post_process_00](args = (%linear1_w_equalization_process_0,), kwargs = {})
    %linear1_b : [#users=1] = get_attr[target=linear1.b]
    %linear : [#users=1] = call_function[target=torch.nn.functional.linear](args = (%x_activation_post_process_0, %linear1_w_activation_post_process_0), kwargs = {bias: %linear1_b})
    %linear_activation_post_process_0 : [#users=1] = call_module[target=linear_activation_post_process_0](args = (%linear,), kwargs = {})
    return linear_activation_post_process_0
```

Graph after equalization functions:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_equalization_process_0_scale : [#users=1] = get_attr[target=x_equalization_process_0_scale]
    %mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_equalization_process_0_scale), kwargs = {})
    %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_00](args = (%mul,), kwargs = {})
    %linear1_w : [#users=1] = get_attr[target=linear1.w]
    %linear1_w_equalization_process_0 : [#users=1] = call_module[target=linear1_w_equalization_process_0](args = (%linear1_w,), kwargs = {})
    %linear1_w_activation_post_process_0 : [#users=1] = call_module[target=linear1_w_activation_post_process_00](args = (%linear1_w_equalization_process_0,), kwargs = {})
    %linear1_b : [#users=1] = get_attr[target=linear1.b]
    %linear : [#users=1] = call_function[target=torch.nn.functional.linear](args = (%x_activation_post_process_0, %linear1_w_activation_post_process_0), kwargs = {bias: %linear1_b})
    %linear_activation_post_process_0 : [#users=1] = call_module[target=linear_activation_post_process_0](args = (%linear,), kwargs = {})
    return linear_activation_post_process_0
```

Graph after `convert_fx`:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_equalization_process_0_scale : [#users=1] = get_attr[target=x_equalization_process_0_scale]
    %mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_equalization_process_0_scale), kwargs = {})
    %linear1_input_scale_0 : [#users=1] = get_attr[target=linear1_input_scale_0]
    %linear1_input_zero_point_0 : [#users=1] = get_attr[target=linear1_input_zero_point_0]
    %quantize_per_tensor : [#users=1] = call_function[target=torch.quantize_per_tensor](args = (%mul, %linear1_input_scale_0, %linear1_input_zero_point_0, torch.quint8), kwargs = {})
    %linear1_packed_weight_0 : [#users=1] = get_attr[target=linear1_packed_weight_0]
    %linear1_scale_0 : [#users=1] = get_attr[target=linear1_scale_0]
    %linear1_zero_point_0 : [#users=1] = get_attr[target=linear1_zero_point_0]
    %linear : [#users=1] = call_function[target=torch.ops.quantized.linear](args = (%quantize_per_tensor, %linear1_packed_weight_0, %linear1_scale_0, %linear1_zero_point_0), kwargs = {})
    %dequantize : [#users=1] = call_method[target=dequantize](args = (%linear,), kwargs = {})
    return dequantize
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D29135459

fbshipit-source-id: 1e69bfbb82a0c89538e55b64968effd0b11b2fde
2021-06-28 10:44:24 -07:00
387289d4a5 support non-contiguous tensor in bilinear (#38409)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/38409

Reviewed By: anjali411

Differential Revision: D29361043

Pulled By: albanD

fbshipit-source-id: 05147a9b0f7a47204bcd5ff70e281a464e8de1e6
2021-06-28 10:43:21 -07:00
f118d20bea Make requires grad check run only when grad mode is enabled (#60740)
Summary:
As per title.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60740

Reviewed By: ngimel

Differential Revision: D29405934

Pulled By: albanD

fbshipit-source-id: 35c537939a3871f5a0d2146543506e4d07465724
2021-06-28 10:40:30 -07:00
3ad3f20bff Add an optional Device parameter to pin_memory/is_pinned that does nothing (#60201)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60201

This is to flush out BC/FC problems with adding this parameter.  A later
PR will actually add the desired functionality.
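
A small sketch of the API shape (the optional device argument is a no-op placeholder per this PR; the exact signature is an assumption, and a CUDA build is required for pinning):
```
import torch

if torch.cuda.is_available():
    x = torch.randn(4)
    y = x.pin_memory()               # existing behavior, unchanged
    print(y.is_pinned())             # True
    z = x.pin_memory(device="cuda")  # new optional arg, currently ignored
    print(z.is_pinned("cuda"))       # True
```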

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D29331880

Pulled By: ezyang

fbshipit-source-id: 6036716d6ae55e6ea7ef2348b6c34a39613c8dd5
2021-06-28 10:38:52 -07:00
85af24f52b Remove some unnecessary functions from CUDAHooks (#59655)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59655

CUDAHooks is to be used solely when you need to call into CUDA
functionality from a context where you cannot directly link to
CUDA libraries.  Neither hasPrimaryContext nor
getDevceIndexWithPrimaryContext (sic) needs to be used in such
contexts.  By moving them out of CUDAHooks and calling them
directly, a dynamic dispatch can be skipped.

I also fixed the typo in getDev(i)ceIndexWithPrimaryContext

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D28972946

Pulled By: ezyang

fbshipit-source-id: edcd7a7b62aec97928f07fbf3bf413b9fb027517
2021-06-28 10:38:51 -07:00
b52849b589 Port silu_backward to structured (#58661)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58661

I removed the dispatch entry `CompositeImplicitAutograd: math_silu_backward`.
This is definitely not right, but I don't know how it works with structured core.
Keeping it triggers the following assertion failure:

```
assert dispatch.keys() != {DispatchKey.CompositeImplicitAutograd}, \
    f"unexpected name for singleton CompositeImplicitAutograd dispatch entry: expected {cpp.name(func)} " \
    f"but got {dispatch[DispatchKey.CompositeImplicitAutograd]}.  Rename your implementation to the expected " \
    "name, then delete the dispatch table"
```
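
For reference, the math silu_backward computes follows from silu(x) = x * sigmoid(x); a minimal sketch of the derivative, checked against autograd:
```
import torch
import torch.nn.functional as F

def silu_backward_ref(grad_out, x):
    # d/dx [x * sigmoid(x)] = sigmoid(x) * (1 + x * (1 - sigmoid(x)))
    s = torch.sigmoid(x)
    return grad_out * s * (1 + x * (1 - s))

x = torch.randn(5, requires_grad=True)
F.silu(x).backward(torch.ones_like(x))
print(torch.allclose(x.grad, silu_backward_ref(torch.ones_like(x), x.detach())))
```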

Test Plan: Imported from OSS

Reviewed By: soulitzer

Differential Revision: D28572530

Pulled By: ezyang

fbshipit-source-id: 410f03bddf79cda7c9f0fd66f697383ee2925d32
2021-06-28 10:37:45 -07:00
66f01db36c Make some comparisons explicit (#60505)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60505

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D29313240

fbshipit-source-id: 3f558e68cbb0328326d7540e2b3bd0c2e12ba3e2
2021-06-28 10:33:59 -07:00
f5341bd5e6 Enhance ProcessGroupWrapper with additional checks + refactor (#60237)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60237

Closes https://github.com/pytorch/pytorch/issues/58711

This diff refactors the collective consistency checking in `ProcessGroupWrapper` as described in the above issue. In particular, we no longer run separate verification checks (`all_gather`s) for shapes, op type, etc. Instead, we implement a function `serialize_fingerprint` to serialize all this data into a single tensor and only verify that.

This has the benefit of being a lot more extensible: the developer does not need to add separate `all_gather` calls in order to verify additional data in the future. We could also provide some mechanism for data that needs to be verified to be "registered" in the `CollectiveFingerPrint` struct, making it even easier to add additional data; we can consider doing this if there are significant additions to `ProcessGroupWrapper`.

We now also begin to check tensor `dtypes` and device types for consistency as well. Tests are refactored/added accordingly.
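
A minimal sketch of the single-tensor idea (names and the encoding here are illustrative assumptions, not the actual `CollectiveFingerPrint` code):
```
import torch

_DTYPE_IDS = {dt: i for i, dt in enumerate(
    [torch.float32, torch.float16, torch.int64, torch.int32, torch.uint8])}

def serialize_fingerprint(op_type: int, tensors) -> torch.Tensor:
    # Flatten everything to be verified (op type, per-tensor dtype id,
    # device index, and shape) into one int64 tensor, so a single
    # all_gather suffices to compare it across ranks.
    data = [op_type]
    for t in tensors:
        data += [_DTYPE_IDS.get(t.dtype, -1), t.device.index or 0, t.dim(), *t.shape]
    return torch.tensor(data, dtype=torch.int64)

# Each rank would all_gather this tensor and assert all copies are equal.
```
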
ghstack-source-id: 132520261

Test Plan: CI

Reviewed By: cbalioglu

Differential Revision: D28597287

fbshipit-source-id: b09f14f628df9e2457623ba81fc13fd4e214f3c9
2021-06-28 10:24:11 -07:00
aaea81e3fb [torch/distributed] remove outdated FutureWarning in distributed/elastic/util/store.py (#60807)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60807

Addresses: https://github.com/pytorch/pytorch/issues/60717

This warning should have been removed since this code is no longer in "experimental" mode.

Test Plan: N/A - just removing experimental warning that should've been removed.

Reviewed By: H-Huang, aivanou

Differential Revision: D29412972

fbshipit-source-id: 16a8a98abde70a4ae0c1ac1b14bda339cb44863a
2021-06-28 10:22:16 -07:00
94cdbbf48d Paren-matching kernel launch check without external deps (#60778)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60778

Matches parens and the opening `<<<` to make a more accurate kernel launch check.
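
A rough Python sketch of the approach (illustrative only, not the actual linter; it assumes well-formed source):
```
import re

CHECK = "C10_CUDA_KERNEL_LAUNCH_CHECK();"

def has_unchecked_launch(src: str) -> bool:
    for m in re.finditer(r"<<<", src):
        # Skip past the launch config, then match the parens of the
        # kernel's argument list to find the end of the launch statement.
        i = src.index("(", src.index(">>>", m.end()) + 3)
        depth = 0
        while True:
            if src[i] == "(":
                depth += 1
            elif src[i] == ")":
                depth -= 1
                if depth == 0:
                    break
            i += 1
        # The statement following the launch should be the check macro.
        end = src.index(";", i) + 1
        if not src[end:].lstrip().startswith(CHECK):
            return True
    return False
```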

Test Plan:
```
buck test //caffe2/test:kernel_launch_checks
```

Reviewed By: ngimel

Differential Revision: D29401624

fbshipit-source-id: 8649af7c33e67dbb24044af0134b1cea6f2e5dc3
2021-06-28 10:18:04 -07:00
88b0518a83 Python error unit tests on delegation of backend_with_compiler_demo (#60689)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60689

Added a test for errors that occur with a compiler, specifically when an
operator is not supported by the backend.
ghstack-source-id: 132485207

Test Plan:
Running python test/test_jit.py TestBackendsWithCompiler -v returns a
success.

Imported from OSS

Reviewed By: iseeyuan

Differential Revision: D29374513

fbshipit-source-id: ac52b315a01719eaa4985680939239ae058d277b
2021-06-28 09:33:03 -07:00
e63db3ae46 ENH Adds byte support for nll_loss (CUDA) (#60650)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/59765

This PR adds byte support for nll_loss on CUDA.
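
A small sketch of the newly supported case (assuming a CUDA build):
```
import torch
import torch.nn.functional as F

if torch.cuda.is_available():
    log_probs = F.log_softmax(torch.randn(4, 3, device="cuda"), dim=1)
    target = torch.tensor([0, 2, 1, 0], dtype=torch.uint8, device="cuda")
    print(F.nll_loss(log_probs, target))  # byte targets now work on CUDA
```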

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60650

Reviewed By: albanD

Differential Revision: D29429456

Pulled By: jbschlosser

fbshipit-source-id: 894c969ed6bfc6117dee8e844a7cb5b99977247c
2021-06-28 08:20:13 -07:00
7f6b2bc2d0 Add -I<directory> option to tools/linter/clang_tidy.py (#60745)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60745

Fixes #60739

Test Plan:
Run this command:
```
python3 tools/linter/clang_tidy.py --paths torch/csrc/fx -I/usr/include/path -I/usr/include/another/path --print-include-paths
```

Output:

If the paths don't exist, you should see this:
```
ignoring nonexistent directory "/usr/include/path"
ignoring nonexistent directory "/usr/include/another/path"
```

If the paths exist, you should see them listed.

Reviewed By: ngimel

Differential Revision: D29395227

Pulled By: 1ntEgr8

fbshipit-source-id: c89650546d45887cd39e574da07f08bcfec686e0
2021-06-28 06:56:02 -07:00
5b118a7f23 Don't reference reflection_pad3d in functional.py (#60837)
Summary:
To work around FC issue

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60837

Reviewed By: jbschlosser

Differential Revision: D29421142

Pulled By: ngimel

fbshipit-source-id: f5c1d9c324173b628e286f9005edf7109162066f
2021-06-27 20:54:32 -07:00
f0e972a481 To add Nesterov Adam algorithm for multi-tensor optimizers API (#59165)
Summary:
Previously, in PR https://github.com/pytorch/pytorch/issues/59009, we added NAdam to the optimizers. Here we are proposing a multi-tensor version of NAdam for PyTorch.

NAdam was proposed in the paper https://openreview.net/forum?id=OM0jvwB8jIp57ZJjtNEZ and the report http://cs229.stanford.edu/proj2015/054_report.pdf by Timothy Dozat.

It has been one of the most widely used algorithms in the deep learning community.

It is worth noting that the implementation of NAdam is inspired by the Keras implementation:
f9d3868495/tensorflow/python/keras/optimizer_v2/nadam.py
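
A short usage sketch (assuming the private module path `torch.optim._multi_tensor` that this series of PRs extends):
```
import torch
from torch.optim._multi_tensor import NAdam

model = torch.nn.Linear(10, 1)
opt = NAdam(model.parameters(), lr=2e-3)

loss = model(torch.randn(4, 10)).sum()
loss.backward()
opt.step()       # parameter updates run as fused multi-tensor ops
opt.zero_grad()
```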

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59165

Reviewed By: vincentqb

Differential Revision: D29360577

Pulled By: iramazanli

fbshipit-source-id: 0fe14016303b2df2cb8cc31912a2674acf63d1e5
2021-06-27 17:00:41 -07:00
3bfe15085d [TensorExpr] Add a mechanism to register custom TS->NNC lowerings in TensorExprKernel. (#60804)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60804

The lowerings are stored as a map c10::Symbol -> std::function, and the
signature of those functions matches the signature of
`computeOperandValue`. Custom lowerings have higher priority than the
standard ones, i.e. we can redefine how a given op is lowered.

In general this feature is aimed at unblocking users whose models
contain ops that are not yet supported by NNC - it allows users to quickly
add a custom lowering for a given op.

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D29409580

Pulled By: ZolotukhinM

fbshipit-source-id: e8e8dc9d3cb9155cfbf5c08a4216ba1b5b791a60
2021-06-27 15:27:22 -07:00
5563f4bda0 To add Rectified Adam algorithm for multi-tensor optimizers API (#59161)
Summary:
Previously, in PR https://github.com/pytorch/pytorch/issues/58968, we added RAdam to the optimizers. Here we are proposing a multi-tensor version of RAdam for PyTorch.

RAdam was proposed in the paper https://arxiv.org/pdf/1908.03265.pdf by Liyuan Liu et al.

It has been one of the most widely used algorithms in the deep learning community.

Differing from the paper, we selected the variance tractability cut-off as 5 instead of 4, as is common practice.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59161

Reviewed By: vincentqb

Differential Revision: D29360576

Pulled By: iramazanli

fbshipit-source-id: 7ccdbf12b1ee7f12e66f7d7992123a70cc818b6b
2021-06-27 13:01:20 -07:00
0fbc471d10 Support default values on NamedTuple fields (#54682)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/54682
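
A minimal sketch of what this enables in TorchScript (illustrative):
```
from typing import NamedTuple
import torch

class Point(NamedTuple):
    x: float
    y: float = 0.0  # field default, now usable from TorchScript

@torch.jit.script
def make_point() -> float:
    p = Point(1.0)  # y falls back to its default value
    return p.x + p.y

print(make_point())  # 1.0
```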

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D27327241

Pulled By: ansley

fbshipit-source-id: 76546f1770d50ebc3435bba3b74540e3c6be8a1c
2021-06-26 15:18:21 -07:00
6b53792f18 fix cuda mem leak check not properly run on master_builds (#60742)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60742

Improved the CI_MASTER flag check logic, since the flag can be unset, true, or false.

Test Plan:
search for `PYTORCH_TEST_SKIP_CUDA_MEM_LEAK_CHECK` in logs below:

- Before adding ci/master:
  - build workflow (`PYTORCH_TEST_SKIP_CUDA_MEM_LEAK_CHECK=1`): https://circleci.com/api/v1.1/project/github/pytorch/pytorch/14394913/output/107/0?file=true&allocation-id=60d5fd2fa55ae50282aec997-0-build%2F10295B30
- After adding ci/master label:
  - build workflow (`PYTORCH_TEST_SKIP_CUDA_MEM_LEAK_CHECK=0`): https://circleci.com/api/v1.1/project/github/pytorch/pytorch/14398213/output/107/0?file=true&allocation-id=60d61cf8bb9d097afc7a11aa-0-build%2F400138F1
  - master build workflow (`PYTORCH_TEST_SKIP_CUDA_MEM_LEAK_CHECK=0`): https://circleci.com/api/v1.1/project/github/pytorch/pytorch/14398198/output/107/0?file=true&allocation-id=60d61ca3467438480c963290-0-build%2F2999C909

Reviewed By: ngimel

Differential Revision: D29405732

Pulled By: walterddr

fbshipit-source-id: 09dd653cbb47ca61b1f8872851bda6db8db671b9
2021-06-26 07:05:32 -07:00
e3abccec8a [Static Runtime] Remove output type constraints (#60669)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60669

Test Plan: Added unit test to check for nested outputs.

Reviewed By: ajyu

Differential Revision: D29322025

fbshipit-source-id: a3c8d3c5f0bb7cf7fda4bc5f579adb8fa7bc3724
2021-06-26 02:36:27 -07:00
dae25c2002 Fix missing spaces in error of constant_pad_nd (#60729)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60729

Reviewed By: ZolotukhinM

Differential Revision: D29404422

Pulled By: ngimel

fbshipit-source-id: c40458c7a6ae33f61c680bff8de778a80658c250
2021-06-25 19:20:03 -07:00
9a08e87d8b Modernize for-loops in aten (#59598)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59598

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D28946826

fbshipit-source-id: 9f3f7e38833c2bc33d27243cef16ab0118c65f3a
2021-06-25 19:02:00 -07:00
7e3a694b23 supports non-leaf inputs for autograd.backward() function (#60521)
Summary:
Close https://github.com/pytorch/pytorch/issues/60268
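
A small sketch of the newly supported call (accessing `.grad` on a non-leaf may still emit a warning in some builds):
```
import torch

x = torch.randn(3, requires_grad=True)
y = x * 2                      # y is a non-leaf tensor
z = (y ** 2).sum()

# Passing a non-leaf via `inputs` no longer errors; the gradient is
# accumulated into y.grad rather than only into leaf tensors.
torch.autograd.backward(z, inputs=[y])
print(y.grad)                  # equals 2 * y
```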

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60521

Reviewed By: ngimel

Differential Revision: D29393586

Pulled By: albanD

fbshipit-source-id: 2dd2de427ecfecca8d544237bacf690e0b7c918c
2021-06-25 18:57:26 -07:00
056a8e0d5c Remove un-used parameter in _trilinear backward (#60673)
Summary:
This argument only affects speed and memory usage, so it is OK to ignore it during the backward.
As discussed, we might want to change this to speed up the backward in the future.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60673

Reviewed By: soulitzer

Differential Revision: D29370125

Pulled By: albanD

fbshipit-source-id: ad50b3ea530aeb194f5a51845523b517a50f2c71
2021-06-25 17:47:10 -07:00
f262217101 [Model Averaging] Move step out of model averaging API (#60632)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60632

Address the comment https://github.com/pytorch/pytorch/pull/60320#discussion_r654845062
ghstack-source-id: 132340278

Test Plan: buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_periodic_model_averager

Reviewed By: rohan-varma

Differential Revision: D29355609

fbshipit-source-id: 50a6f13ed70b5a5b5b92ead2f3d7082c11277af5
2021-06-25 17:20:52 -07:00
c5f0692b6e Sparse CSR: increase dtype test coverage (#60656)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60656

This PR uses `torch.testing.get_all_dtypes()` for dtype parametrisation
of tests in `test_sparse_csr.py`. It adds bool, half, bfloat16, and
complex dtypes that were previously excluded from the tests.
`torch.complex32` is omitted due to lack of coverage and lack of a
specialized `AT_DISPATCH...`.
The process of adding more dtypes to the tests revealed that `.to_dense()`
doesn't work for all dtypes.

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D29408058

Pulled By: cpuhrsch

fbshipit-source-id: 319b6f51b9786d6957d508f51657657a6d00267a
2021-06-25 17:11:21 -07:00
dd045ab540 add channels last for AdapativeMaxPool2d (#48920)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48920
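
A small usage sketch of the channels-last path this adds (whether the output keeps the NHWC layout is an assumption here):
```
import torch
import torch.nn.functional as F

x = torch.randn(2, 3, 8, 8).to(memory_format=torch.channels_last)
out = F.adaptive_max_pool2d(x, output_size=(4, 4))
print(out.is_contiguous(memory_format=torch.channels_last))
```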

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D25399467

Pulled By: VitalyFedyunin

fbshipit-source-id: d9d2cc728cc7a18a26983e96d3c3e81a23659e89
2021-06-25 16:36:20 -07:00
367aff91d8 Fix missing #pragma once in jit/method.h
Summary: it seems to be accidentally missing

Test Plan: run CI

Reviewed By: suo

Differential Revision: D29335990

fbshipit-source-id: 2790bc10d141f9484a0807ff7800024a02fd9cfa
2021-06-25 16:32:54 -07:00
8b6487c650 Add CUDA Vital (#58059)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58059

Add the CUDA.used vital sign, which is true only if CUDA was "used", i.e. a CUDA context was created.

Also adds the following features:
- Force vitals to be written even if vitals are disabled, to enable testing when the env variable is not set from the start of execution
- Add a read_vitals call for python to read existing vital signs.

Test Plan: buck test mode/dbg caffe2/test:torch -- --regex basic_vitals

Reviewed By: xuzhao9

Differential Revision: D28357615

fbshipit-source-id: 681bf9ef63cb1458df9f1c241d301a3ddf1e5252
2021-06-25 16:31:11 -07:00
9134b0e42f add a boxed CPU fallback kernel (#58065)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58065

This PR replaces the existing code-generated CPU fallback kernels that XLA uses with a single boxed CPU fallback.

Current state: there are a couple different design ideas that I want to point out, but the logic for the actually kernel is mostly done and passing tests.

### Design

To preface, I'm not 100% tied to the current design and I'm putting the PR up now for opinions and totally open to alternatives, some of which I listed below. Actually after writing this description, I'm leaning toward the following changes:
* Confirm whether or not we can remove all C++ logging info directly in the yaml.

**Current Design**

All of the CPU fallback codegen is deleted. In its place, XLA (and other external backends, later) can choose to opt into a CPU fallback by adding the following code in a C++ file. I have an corresponding [xla-side PR with the xla changes](https://github.com/pytorch/xla/pull/2945/files#diff-1a005c10039f0cb11130a3b740f5de716d2f10acaea121017016025861886798R1).

There's no actual requirement to split up the code into a .h and .cpp file, but that's necessary in the XLA case because they sometimes need to call the fallback directly from their handcrafted kernels.

```
// xla_cpu_fallback.h
#include <ATen/native/CPUFallback.h>
...
void xla_cpu_fallback(const c10::OperatorHandle& op, torch::jit::Stack* stack);
...
```
```
// xla_cpu_fallback.cpp
#include "torch_xla/csrc/aten_cpu_fallback.h"
...
void xla_cpu_fallback(const c10::OperatorHandle& op, torch::jit::Stack* stack) {
  // Do custom logging here
  ...
  // Call the actual boxed CPU fallback.
  at::native::cpu_fallback(op, stack);
}

TORCH_LIBRARY_IMPL(_, XLA, m) {
  m.fallback(torch::CppFunction::makeFromBoxedFunction<&xla_cpu_fallback>());
}
```

Now that the fallback is exposed in the backend, they can call it directly. Doing so requires converting from an unboxed to a boxed context, for which we provide a utility function. E.g.:
```
#include <ATen/native/CPUFallback.h>

at::Tensor addmm(const at::Tensor& self,const at::Tensor& mat1,const at::Tensor& mat2,const at::Scalar& beta,const at::Scalar& alpha) {
  ....
  if (...call_fallback...) {
    return at::native::call_fallback_fn<&xla_cpu_fallback, decltype(at::addmm)>::call("aten::addmm", self, mat1, mat2, beta, alpha);
  }
  ...
}
```

That `decltype(at::addmm)` logic isn't actually used everywhere in the xla-side PR yet, since you hit issues with overloads. I could use it everywhere once #58092 lands.

**Alternatives: The API for calling the CPU fallback directly is ugly, can we make it nicer?**
We could change the api to use `at::redispatch`, which would make it look something like this:
```
at::Tensor addmm(const at::Tensor& self,const at::Tensor& mat1,const at::Tensor& mat2,const at::Scalar& beta,const at::Scalar& alpha) {
  ....
  if (...call_fallback...) {
    return at::redispatch::addmm(c10::DispatchKeySet(c10::DispatchKey::CPUFallback), self, mat1, mat2, beta, alpha);
  }
  ...
}
```
Which definitely feels cleaner, but also requires adding a new DispatchKey just for this use case. Conditionally calling the CPU fallback doesn't sound like a hugely important use case, so I don't know if giving up one of our 64 dispatch key slots is worth the API improvement. Totally open to other opinions though!

Another more mild improvement that would avoid having to pass operator string names (including overloads) around would be to codegen (yet another) namespaced API. Something like this:
```
at::Tensor addmm(const at::Tensor& self,const at::Tensor& mat1,const at::Tensor& mat2,const at::Scalar& beta,const at::Scalar& alpha) {
  ....
  if (...call_fallback...) {
    return at::fallback::addmm<&xla_cpu_fallback>(self, mat1, mat2, beta, alpha);
  }
  ...
}
```

Writing that out, I actually like it more (I think it'll let us get rid of `decltype(...)`). Maybe that is nice enough to warrant a new codegen API - I haven't tried adding that yet, but if people like it I'm happy to try it out.

**More alternatives**
The current design also involves the backend manually writing and registering the boxed fallback themselves, but an alternative would be for us to do it in codegen too: they would just need to pass in all of the C++ logging that they want done in the fallback, directly through the yaml. The main downsides:
* Backend code that wants to call the fallback needs to abide by whatever convention our codegen uses to name the generated boxed fallback.
* Passing custom C++ logging through yaml is just more fragile: right now xla uses an `iostream` to log each tensor arg in the operator, so we'd have to either force other backends into the same convention or figure something else out later.

To be fair, we actually already do that: XLA has custom per-tensor-arg logging for all of the generated `out` wrappers in the codegen, which we do by passing their C++ logging info through the yaml. This seems unnecessary though, since `out` wrappers just call into a functional kernel, which is hand written with its own custom logging. So my take is: try to remove custom C++ logging from the yaml, and if it turns out to be really necessary, then we may as well take advantage of that to codegen the fallback.

### Performance impact

While ops that fall back to CPU aren't exactly hot path, we probably don't want to use a boxed fallback if it turns out to be an absolute perf killer.

I ran my benchmarks using callgrind, benchmarking both `at::add` and `at::add_out` run on XLA. My callgrind benchmark for `at::add` can be found here (the add_out benchmark looks basically the same): https://www.internalfb.com/phabricator/paste/view/P415418587. I created the benchmark by hacking the existing xla C++ test build scripts and throwing in a reference to callgrind.

I also attached the full callgrind output for each benchmark; the full output is actually pretty noisy and hard to parse, but I focused on everything underneath the `at::add()` call in the output, which was much more stable. My guess is that the noise is due to some heavyweight async startup processing that xla does.

`at::add`:
before: 88,505,130 instructions. Full output: https://www.internalfb.com/phabricator/paste/view/P415421001
after: 102,185,654 instructions. Full output: https://www.internalfb.com/phabricator/paste/view/P415421273
delta: ~15.5% increase

`at::add_out`:
before: 63,897,395 instructions. Full output: https://www.internalfb.com/intern/everpaste/?handle=GBrrKwtAPlix9wUEAOZtrFXpdO5UbsIXAAAz
after: 73,170,346 instructions. Full output: https://www.internalfb.com/phabricator/paste/view/P415423227
delta: ~14.5% increase

High level takeaway: A framework overhead increase of 10-20% doesn't seem too horrible for the CPU fallback use case.

For structured, functional ops that require a CPU fallback, we're actually in an unfortunate situation: we're doing even more work than necessary. Our codegen automatically creates a `CompositeExplicitAutograd` kernel which calls into the `out` operator. So the extra work that we end up doing is:
* An extra dispatcher hop: (at::add -> CompositeExplicitAutograd -> CPUFallback -> at::native::add) instead of (at::add -> CPUFallback -> at::native::add)
* An unnecessary tensor allocation (the CompositeExplicitAutograd kernel uses at::empty() to create an output tensor, which is immediately overwritten by the CPU fallback)
* An unnecessary meta() call (the CompositeExplicitAutograd kernel calls it to create the output tensor, but we call it again in the CPU kernel).
* unboxing->boxing->unboxing logic (this is the only strictly required piece)

There are definitely ways to avoid the unnecessary work explained above: one would be to give the boxed fallback higher priority than composite keys (there's [an issue for it here](https://github.com/pytorch/pytorch/issues/55104)), and codegen fallthroughs for all composite ops. It'll require more infra to set up, so I see it as more of a perf knob that we can apply if we need it later.

Unfortunately I couldn't dig much deeper into the differences aside from the aggregate change in instructions, since it looks like callgrind fudged some of the instruction attribution (`at::to_cpu` takes up a ton of instructions, but I don't see any attribution for the `at::native::add` kernel anywhere).

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D28833085

Pulled By: bdhirsh

fbshipit-source-id: 537ebd5d7fb5858f1158764ff47132d503c3b92b
2021-06-25 16:26:50 -07:00
ad69e2fd11 [torch] Module fix on the support of LazyModule on bug #60132 (#60517)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60517

This fixes the module support for LazyModuleMixin described in bug issue #60132.
Check the link: https://github.com/pytorch/pytorch/issues/60132

We will have to update lazy_extension given the dependency on module.py and update the unit test as well.

Test Plan:
Unit test passes

torchrec test passes

Reviewed By: albanD

Differential Revision: D29274068

fbshipit-source-id: 1c20f7f0556e08dc1941457ed20c290868346980
2021-06-25 16:20:19 -07:00
cab926b2c0 faster generate_square_subsequent_mask in nn.Transformer (#60631)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60631

Per #48360, speed up `Transformer.generate_square_subsequent_mask`. The new impl is informally ~5x faster, though the absolute difference is probably small.

PR includes Python and C++ versions as well as a couple of places where the previous impl had been copied around.
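
For reference, a sketch of the kind of single-expression formulation the speedup points at (the exact new impl is an assumption; the sketch just shows it matches the old construction):
```
import torch

sz = 5

# Old approach: build a bool mask, then two masked_fill passes.
m = (torch.triu(torch.ones(sz, sz)) == 1).transpose(0, 1)
old = m.float().masked_fill(~m, float("-inf")).masked_fill(m, 0.0)

# Faster: fill with -inf and keep only the strict upper triangle.
new = torch.triu(torch.full((sz, sz), float("-inf")), diagonal=1)

print(torch.equal(old, new))  # True
```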

Test Plan: Imported from OSS

Reviewed By: jbschlosser, albanD

Differential Revision: D29356673

Pulled By: bhosmer

fbshipit-source-id: 4c062ba0ead61a445aeef451c78777bf0b3a631e
2021-06-25 16:07:01 -07:00
7585783b8d Remove Optional[None] annotations (#60704)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60704

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D29380281

Pulled By: ansley

fbshipit-source-id: 055c17329a35375de4ebd058ee6d127475aad373
2021-06-25 15:53:58 -07:00
5ed7400b75 Fix doc preview source directory (#60792)
Summary:
`merge` is the directory with the actual changes, not `master`. Verified by downloading artifacts from https://github.com/pytorch/pytorch/pull/60777/checks and searching through the result.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60792

Reviewed By: walterddr

Differential Revision: D29405288

Pulled By: driazati

fbshipit-source-id: 419c943727c00429945c1f116645bfa22fb12456
2021-06-25 15:46:30 -07:00
7b933cd9ea configurable pre/post LayerNorm in nn.Transformer (#60593)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60593

Per #55270, this PR makes it configurable whether to run LayerNorm before or after other operations in Transformer layers.

However, it leaves for a separate PR the removal of the LayerNorm performed after the final encoder/decoder layer has run, which is redundant when a LayerNorm has been run after other in-layer operations (problem described in #24930 #50086 #51447).

Note: this means that transformers built with `nn.Transformer()` are now configurable, but will still contain a redundant LayerNorm when configured as before. However, callers of the `TransformerEncoder` and `TransformerDecoder` classes have always been able to avoid this redundancy.
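
A usage sketch, assuming the configuration is exposed as the `norm_first` constructor argument:
```
import torch.nn as nn

# Pre-LN ("norm first") encoder layer; the default stays post-LN as before.
layer = nn.TransformerEncoderLayer(d_model=512, nhead=8, norm_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=6)
```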

Reviewer notes:
1. Ran across this during other work, don't know if anybody's working on it already (most recent conversation in issues seems to be from early April). Happy to abandon if so.
2. Was looking for a quick way to add tests but it looks like the existing ones in test_nn just compare against snapshots. I could add something similar, but curious if there's any prepackaged way to add a test that LayerNorm-first (the new option) yields model that trains properly, etc.
3. New code in the `forward`s was written to minimize diff churn rather than maximize beauty :P happy to pretty it up if desired.

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D29356590

Pulled By: bhosmer

fbshipit-source-id: 308669326990b8923aab5fcd96e03b582fb21f24
2021-06-25 15:43:35 -07:00
e13a9587b4 Revert "Revert D29135358: [quant] Input-Weight Equaliaztion - convert modifications" (#60646)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60646

This reverts commit e60f9cfc58fb2fe3e2e7f65fcdbbf350e5b55a75.

Test Plan: Imported from OSS

Reviewed By: supriyar

Differential Revision: D29361191

Pulled By: angelayi

fbshipit-source-id: 275d8691d8e47da4ab80bb21b51d77ec25a0f714
2021-06-25 15:37:05 -07:00
7188d84ccf [Tools] Update path in clang_format_utils after #60473 (#60782)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60782

PR #60473 introduced a new folder nesting level; this change updates
clang_format_utils.py to adjust the way it sets up the root path
accordingly.

Test Plan: Imported from OSS

Reviewed By: zhxchen17

Differential Revision: D29403622

Pulled By: ZolotukhinM

fbshipit-source-id: 6404271615c2d263834cf538ab0153c4d41cc5c3
2021-06-25 14:30:45 -07:00
394f60b0fc [caffe2] update make_cifar_db to move the string into DB::Put() (#60692)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60692

Update make_cifar_db.cc to work with the DB API changes in D29204425 (00896cb9ed).

Test Plan: buck build caffe2/binaries:make_cifar_db

Differential Revision: D29374754

fbshipit-source-id: 23d2acd24031d11071791e398433b537215ffd38
2021-06-25 14:02:24 -07:00
e1bd4963e2 To introduce Functional API for multi-tensor (#60735)
Summary:
In this PR we change the multi-tensor optimizers to a functional API.

In the file https://github.com/pytorch/pytorch/blob/master/torch/optim/_functional.py, a functional API has been defined for most optimizers. However, we do not have a similar file / functionality for the multi-tensor optimizers:
https://github.com/pytorch/pytorch/tree/master/torch/optim/_multi_tensor

Therefore we are adding it in this PR.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60735

Reviewed By: vincentqb

Differential Revision: D29392253

Pulled By: iramazanli

fbshipit-source-id: cebc8e7b07ab11156370f5297cfb419cd9f20b46
2021-06-25 13:09:26 -07:00
8f16a38067 Add missing kernel checks (#60635)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60635

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D29355747

fbshipit-source-id: 20bae292703a54b2895a33c11e6f1b8b9a9d8195
2021-06-25 12:54:40 -07:00
dfc8247d33 Faster cumsum and cumprod backwards (#60642)
Summary:
Piggybacking on https://github.com/pytorch/pytorch/pull/58747, we can now implement the backwards of `cumsum` and `cumprod` without tricks. This minimises the number of kernels that are launched on GPU, so we see a reasonable speed-up there. We should also get better stability for ill-conditioned inputs, as we do not perform any numerical tricks to get the result.

Note that the benchmarks test forward + backward, so the true speed-up of the backward alone should be even larger. Even more so for `cumsum`, as its backward requires fewer operations than that of `cumprod`.

<details>
<summary>
Test Script
</summary>

```python
from itertools import product

import torch
from torch.utils.benchmark import Compare, Timer

def get_timer(ndims, prod_dim, dim, num_threads, device):
    size = [500]*ndims
    size[dim] = prod_dim

    x = torch.rand(*size, device=device, requires_grad=True)
    # Make sure there are no zeros, as the backward formula
    # that we are testing assumes the input has no zeros
    with torch.no_grad():
        x.add_(1e-3)
    grad = torch.ones_like(x)

    timer = Timer(
        "torch.autograd.grad([x.cumprod(dim)], [x], grad_outputs=[grad])",
        globals={"x": x, "dim": dim, "grad": grad},
        label=f"Cumprod + Backwards {device}",
        description=f"dim: {dim}",
        sub_label=f"prod_dim: {prod_dim}",
        num_threads=num_threads,
    )

    return timer.blocked_autorange(min_run_time=5)

def get_params():
    ndims = 3
    dims = range(ndims)
    prod_dims = [10, 100, 500]
    for dim, prod_dim, device in product(dims, prod_dims, ("cpu", "cuda")):
        threads = (1, 2, 4) if device == "cpu" else (1,)
        for num_threads in threads:
            yield ndims, prod_dim, dim, num_threads, device

compare = Compare([get_timer(*params) for params in get_params()])
compare.trim_significant_figures()
compare.print()
```

</details>

<details>
<summary>
Benchmark PR
</summary>

```
[------------ Cumprod + Backwards cpu -------------]
                     |  dim: 0  |  dim: 1  |  dim: 2
1 threads: -----------------------------------------
      prod_dim: 10   |     11   |     14   |     12
      prod_dim: 100  |    260   |    270   |    260
      prod_dim: 500  |   1400   |   1550   |   1360
2 threads: -----------------------------------------
      prod_dim: 10   |      6   |      6   |      6
      prod_dim: 100  |    170   |    166   |    167
      prod_dim: 500  |    902   |    950   |    858
4 threads: -----------------------------------------
      prod_dim: 10   |      4   |      3   |      3
      prod_dim: 100  |    110   |    108   |    106
      prod_dim: 500  |    576   |    590   |    547

Times are in milliseconds (ms).

[------------ Cumprod + Backwards cuda ------------]
                     |  dim: 0  |  dim: 1  |  dim: 2
1 threads: -----------------------------------------
      prod_dim: 10   |    562   |    566   |   1075
      prod_dim: 100  |   5388   |   5394   |   6697
      prod_dim: 500  |  28170   |  27580   |  30740

Times are in microseconds (us).
```

</details>

<details>
<summary>
Benchmark master
</summary>

```
[------------ Cumprod + Backwards cpu -------------]
                     |  dim: 0  |  dim: 1  |  dim: 2
1 threads: -----------------------------------------
      prod_dim: 10   |     11   |     13   |     12
      prod_dim: 100  |    270   |    270   |    256
      prod_dim: 500  |   1500   |   1590   |   1300
2 threads: -----------------------------------------
      prod_dim: 10   |      6   |      6   |      6
      prod_dim: 100  |    170   |    170   |    164
      prod_dim: 500  |    911   |    940   |    840
4 threads: -----------------------------------------
      prod_dim: 10   |      4   |      4   |      4
      prod_dim: 100  |    111   |    109   |    105
      prod_dim: 500  |    570   |    590   |    536

Times are in milliseconds (ms).

[------------ Cumprod + Backwards cuda ------------]
                     |  dim: 0  |  dim: 1  |  dim: 2
1 threads: -----------------------------------------
      prod_dim: 10   |    616   |    597   |   1109
      prod_dim: 100  |   5976   |   5723   |   7017
      prod_dim: 500  |  31110   |  29160   |  32320

Times are in microseconds (us).
```

</details>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60642

Reviewed By: ngimel

Differential Revision: D29366368

Pulled By: albanD

fbshipit-source-id: b0d692ce030352965c2f152e0f92fbb61fc5ebde
2021-06-25 12:44:12 -07:00
d3bec9f4d2 Use S3 for documentation previews (#60711)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60711

We already build the docs on each PR; this adds a step to push the relevant folder of the docs (we build the entire website for pytorch.github.io, which clocks in at around 500 MB, but we really only need the "master" docs, not every version; the master docs by themselves are around 50 MB, which is more reasonable). It uses the same S3 bucket as the artifacts but places the items at the `pytorch/pytorch/pr-previews/<pr number>` prefix. The bucket has a rule to expire resources in that prefix after 1 month.

On the AWS side the bucket has static hosting enabled with CloudFront directing to the docs preview prefix, so you can see the output at `https://d28slxzaq48q8t.cloudfront.net/<pr number>/`, e.g. https://d28slxzaq48q8t.cloudfront.net/60711/. For advertising we could link this on the HUD PR page as well as in the Dr. CI comment. We could add a CNAME on CloudFront to make this be `pr-preview.pytorch.org/<pr number>` or something but having random PRs be able to host content on the pytorch.org domain seems sketchy.

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D29398818

Pulled By: driazati

fbshipit-source-id: 24032854d83815853b3650d8e54f60b684707f76
2021-06-25 12:12:26 -07:00
aacc722aec Dispatch to Python via __torch_dispatch__ (#59760)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59760

See https://github.com/pytorch/pytorch/issues/59049

There are some moving parts to this PR; I'll structure this explanation so the straightforward parts go first, and then the less straightforward parts.

**The actual dispatch to Python.** The core logic of dispatch to Python lives in `concrete_dispatch_fn` in `torch/csrc/autograd/python_variable.cpp`. It takes the input IValue stack, scans all the arguments for Tensor arguments, and defers most of the heavy lifting to `handle_torch_function_no_python_arg_parser` which actually does all of the logic for calling out to torch dispatch (in particular, this function handles multiple dispatch situations for you). Because we have a different function name than regular `__torch_function__` handling, `handle_torch_function_no_python_arg_parser` is generalized to accept a magic method name to look for when testing if Tensors have custom handling or not. Unlike `__torch_function__`, by default there is no `__torch_dispatch__` on Tensor classes.

**Maintaining the Python dispatch key.** In order to get to the dispatch to Python logic, we must tag Tensors that have the `__torch_dispatch__` magic method with the newly added Python dispatch key (separated from PythonFuncTorch to allow for a transitional period while they migrate to this mechanism). We expose a new private property `_is_python_dispatch` that assists in debugging if a Tensor is participating in Python dispatch or not. We apply the Python dispatch key the first time a PyObject for a Tensor is constructed (THPVariable_NewWithVar), testing if `__torch_dispatch__` exists with the newly added `check_has_torch_dispatch`.

**Shallow copy and detach.** For the simple examples tested in this PR, most creations of Tensor route through the dispatcher. The exception to this is `shallow_copy_and_detach`, which bypasses the dispatcher and is used when saving tensors for backwards. When a Tensor participates in Python dispatch, we override the behavior of `shallow_copy_and_detach` to instead directly call into `__torch_dispatch__` to perform a `detach` operation (in the same way it would be invoked if you called `detach` directly). Because this Python call is triggered directly from c10::TensorImpl, it must be indirected through `PyInterpreter::detach`, which is the general mechanism for dynamic dispatching to the Python interpreter associated with a TensorImpl.

**torchdeploy compatibility.** The dispatch to Python logic cannot be directly registered to the dispatcher as it is compiled in the Python library, which will get loaded multiple times per torchdeploy interpreter. Thus, we must employ a two phase process. First, we register a fallback inside a non-Python library (aten/src/ATen/core/PythonFallbackKernel.cpp). Its job is to determine the appropriate PyInterpreter to handle the Python dispatch by going through all of the arguments and finding the first argument that has a PyObject/PyInterpreter. With this PyInterpreter, it makes another dynamic dispatch via "dispatch" which will go to the correct torchdeploy interpreter to handle dispatching to actual Python.

**Testing.** We provide a simple example of a LoggingTensor for testing, which can be used to generate TorchScript-like traces to observe what operations are being called when a Tensor is invoked. Although a LoggingTensor would be better implemented via an is-a relationship rather than a has-a relationship (as is done in the test), we've done it this way to show that arbitrarily complex compositions of tensors inside a tensor work properly.
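
A condensed sketch of such a has-a wrapper (illustrative; details such as the meta-device trick are assumptions, not a copy of the test):
```
import torch
from torch.utils._pytree import tree_map

class LoggingTensor(torch.Tensor):
    __torch_function__ = torch._C._disabled_torch_function_impl

    @staticmethod
    def __new__(cls, elem):
        # has-a: the wrapper carries the real tensor in self.elem
        r = torch.Tensor._make_subclass(cls, elem.to("meta"), elem.requires_grad)
        r.elem = elem
        return r

    @classmethod
    def __torch_dispatch__(cls, func, types, args=(), kwargs=None):
        unwrap = lambda t: t.elem if isinstance(t, cls) else t
        wrap = lambda t: cls(t) if isinstance(t, torch.Tensor) else t
        print(f"op: {func}")  # log each aten op hit at dispatch time
        out = func(*tree_map(unwrap, args), **tree_map(unwrap, kwargs or {}))
        return tree_map(wrap, out)

x = LoggingTensor(torch.randn(2))
y = x + x  # prints the op(s) dispatched for the addition
```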

**Known limitations.**

* We haven't adjusted any operator code, so some patterns may not work (as they lose the Python subclass in an unrecoverable way)
* `__torch_function__` must be explicitly disabled with `_disabled_torch_function_impl` otherwise things don't work quite correctly (in particular, what is being disabled is default subclass preservation behavior.)
* We don't ever populate kwargs, even when an argument is kwarg-only

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision:
D29017912

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Pulled By: ezyang

fbshipit-source-id: a67714d9e541d09203a8cfc85345b8967db86238
2021-06-25 11:50:32 -07:00
a53d7f8f7c Remove test linalg test skips from MAGMA integration (#58232)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/55552; majority of cases in https://github.com/pytorch/pytorch/issues/51303

Tests in torch/testing/_internal/common_methods_invocations.py  (tested through test_ops) cannot be fully removed, since the machines seem to be running out of gpu memory during the test, and needs further analysis

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58232

Reviewed By: ngimel

Differential Revision: D29394021

Pulled By: malfet

fbshipit-source-id: f108a70af33beec908ac1c0b58467f8744e6fe87
2021-06-25 11:44:49 -07:00
8216da1f23 Use python3.6 compatible APIs in clang_tidy.py (#60659)
Summary:
This PR makes `tools/clang_tidy.py` use Python 3.6 APIs for `asyncio` and `shlex`.

I ran into some issues when running this script with the `-j` flag inside of the clang-tidy docker image (which uses Python 3.6). Specifically, the functions `asyncio.run` and `shlex.join` are only available in Python >= 3.8.
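
A sketch of the 3.6-compatible substitutions (the coroutine here is a hypothetical stand-in):
```
import asyncio
import shlex

async def run_lint():  # hypothetical stand-in for the script's coroutine
    return "ok"

# Instead of asyncio.run(run_lint()), which is 3.8+:
loop = asyncio.get_event_loop()
result = loop.run_until_complete(run_lint())

# Instead of shlex.join(args), which is 3.8+:
args = ["clang-tidy", "-p", "build dir"]
cmd = " ".join(shlex.quote(a) for a in args)
```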

This change does not affect CI because we do not run the clang-tidy job in parallel.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60659

Reviewed By: albanD

Differential Revision: D29377851

Pulled By: 1ntEgr8

fbshipit-source-id: 92ab7ee6782b78d40ffccd03f1718ede4204d948
2021-06-25 10:35:03 -07:00
6322f66878 Add python version and cuda-specific folder to store extensions (#60592)
Summary:
See https://github.com/pytorch/pytorch/issues/55267

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60592

Reviewed By: albanD

Differential Revision: D29353368

Pulled By: ezyang

fbshipit-source-id: 1fbcd021f1030132c0f950f33ce4a3a2fef351e0
2021-06-25 10:27:04 -07:00
a404cc9a7b CUDA addcmul and addcdiv do math in float for 16 bits I/O (#60715)
Summary:
Currently foreach `addcmul` and `addcdiv` cast the scalar to float so that the actual math is done in FP32 when the tensor dtype is Float16/BFloat16, while the regular `addcmul` and `addcdiv` do not.

### Reproducible steps to see the behavioral difference
```ipython
In [1]: import torch; torch.__version__
Out[1]: '1.9.0'

In [2]: a, b, c = torch.tensor([60000.0], device='cuda', dtype=torch.half), torch.tensor([60000.0], device='cuda', dtype=torch.half), torch.tensor([-1.0], device='cuda', dtype=torch.half)

In [4]: torch.addcmul(a, b, c, value=2)
Out[4]: tensor([-inf], device='cuda:0', dtype=torch.float16)

In [5]: torch._foreach_addcmul([a], [b], [c], value=2)[0]
Out[5]: tensor([-60000.], device='cuda:0', dtype=torch.float16)
```

### How foreach casts?
Foreach addcmul and addcdiv cast scalar to `opmath_t` (almost equivalent to acc_type) here: 42c8439b6e/aten/src/ATen/native/cuda/ForeachPointwiseOp.cu (L30) and cast inputs and results here:
42c8439b6e/aten/src/ATen/native/cuda/ForeachFunctors.cuh (L133-L135)

Related to https://github.com/pytorch/pytorch/issues/58833 #60227 https://github.com/pytorch/pytorch/issues/60454
cc ptrblck mcarilli ngimel

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60715

Reviewed By: albanD

Differential Revision: D29385715

Pulled By: ngimel

fbshipit-source-id: 8bb2db19ab66fc99d686de056a6ee60f9f71d603
2021-06-25 10:21:35 -07:00
0be65cd52a [c10d] Fix test_collective_hang flakiness (#60662)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60662

Fixes this flaky test. Basically, sometimes a rank can exit the test
early before rank 0 calls into allreduce. In this case Gloo will throw a
connection reset error on all other ranks.
ghstack-source-id: 132363151

Test Plan: CI

Reviewed By: zhaojuanmao

Differential Revision: D29364806

fbshipit-source-id: ce0c292a2166edad57ea0dbb76df12cfd560a10d
2021-06-25 10:15:18 -07:00
474bdaf54d Add --print-include-paths option to tools/linter/clang_tidy.py (#60744)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60744

Fixes #60739

Test Plan:
Run this command:
```
python3 tools/linter/clang_tidy.py --paths torch/csrc/fx --print-include-paths
```

Output (varies from machine to machine):
```
(clang-tidy output)
.
.
.

clang -cc1 version 11.0.0 based upon LLVM 11.0.0 default target x86_64-unknown-linux-gnu
ignoring nonexistent directory "nccl/include"
ignoring nonexistent directory "/include"
ignoring duplicate directory ".."
ignoring duplicate directory "../aten/src"
ignoring duplicate directory "../third_party/onnx"
ignoring duplicate directory ".."
ignoring duplicate directory ".."
ignoring duplicate directory "../torch/lib"
ignoring duplicate directory "../torch/../third_party/gloo"
  as it is a non-system directory that duplicates a system directory
ignoring duplicate directory "../third_party/ideep/mkl-dnn/src/../include"
  as it is a non-system directory that duplicates a system directory
#include "..." search starts here:
#include <...> search starts here:
 aten/src
 ../aten/src
 .
 ..
 ../cmake/../third_party/benchmark/include
 caffe2/contrib/aten
 ../third_party/onnx
 third_party/onnx
 ../third_party/foxi
 third_party/foxi
 ../torch/../aten/src/TH
 caffe2/aten/src
 third_party
 ../torch/../third_party/valgrind-headers
 ../torch/csrc
 ../torch/csrc/api/include
 ../torch/lib
 ../torch/lib/libshm
 ../torch/csrc/api
 third_party/ideep/mkl-dnn/include
 ../third_party/fmt/include
 third_party/gloo
 ../torch/../third_party/gloo
 ../cmake/../third_party/googletest/googlemock/include
 ../cmake/../third_party/googletest/googletest/include
 ../third_party/protobuf/src
 /data/users/eltonpinto/miniconda3/envs/pytorch/include
 ../third_party/gemmlowp
 ../third_party/neon2sse
 ../third_party/XNNPACK/include
 ../third_party
 ../cmake/../third_party/eigen
 /home/eltonpinto/local/miniconda3/envs/pytorch/include/python3.8
 /home/eltonpinto/local/miniconda3/envs/pytorch/lib/python3.8/site-packages/numpy/core/include
 ../cmake/../third_party/pybind11/include
 /usr/local/cuda-11.3/include
 ../third_party/ideep/mkl-dnn/src/../include
 ../third_party/ideep/include
 /usr/lib/gcc/x86_64-redhat-linux/8/../../../../include/c++/8
 /usr/lib/gcc/x86_64-redhat-linux/8/../../../../include/c++/8/x86_64-redhat-linux
 /usr/lib/gcc/x86_64-redhat-linux/8/../../../../include/c++/8/backward
 /usr/local/include
 /usr/lib64/clang/11.0.0/include
 /usr/include

.
.
.
(more clang-tidy output)
```

Imported from OSS

Reviewed By: ngimel

Differential Revision: D29395398

fbshipit-source-id: e92077a9c4e9dee7f9d7e05df180d552e3763540
2021-06-25 10:12:15 -07:00
608f12b818 Fix --dry-run option in tools/linter/clang_tidy.py (#60744)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60744

Fixes #60741

Test Plan:
Run this command:
```
python3 tools/linter/clang_tidy.py --paths torch/csrc/fx --dry-run
```
Output:
```
clang-tidy -p build -config '{"InheritParentConfig": true, "Checks": " bugprone-*, -bugprone-forward-declaration-namespace, -bugprone-macro-parentheses, -bugprone-lambda-function-name, -bugprone-reserved-identifier, cppcoreguidelines-*, -cppcoreguidelines-avoid-magic-numbers, -cppcoreguidelines-interfaces-global-init, -cppcoreguidelines-macro-usage, -cppcoreguidelines-owning-memory, -cppcoreguidelines-pro-bounds-array-to-pointer-decay, -cppcoreguidelines-pro-bounds-constant-array-index, -cppcoreguidelines-pro-bounds-pointer-arithmetic, -cppcoreguidelines-pro-type-cstyle-cast, -cppcoreguidelines-pro-type-reinterpret-cast, -cppcoreguidelines-pro-type-static-cast-downcast, -cppcoreguidelines-pro-type-union-access, -cppcoreguidelines-pro-type-vararg, -cppcoreguidelines-special-member-functions, -facebook-hte-RelativeInclude, hicpp-exception-baseclass, hicpp-avoid-goto, modernize-*, -modernize-concat-nested-namespaces, -modernize-return-braced-init-list, -modernize-use-auto, -modernize-use-default-member-init, -modernize-use-using, -modernize-use-trailing-return-type, performance-*, -performance-noexcept-move-constructor, -performance-unnecessary-value-param, ", "HeaderFilterRegex": "torch/csrc/.*", "AnalyzeTemporaryDtors": false, "CheckOptions": null}' torch/csrc/fx/fx_init.cpp
```

Reviewed By: ngimel

Differential Revision: D29394538

Pulled By: 1ntEgr8

fbshipit-source-id: b824bc2aa63631f074e9ad17092e4e063d347395
2021-06-25 09:53:29 -07:00
3a838e4ce3 Parametrizations depending on several inputs (#60530)
Summary:
Resubmit of https://github.com/pytorch/pytorch/pull/58488

There was a line that had been changed in `test_nn.py` as caught in https://github.com/pytorch/pytorch/pull/58488#discussion_r651267668

I reverted that line, which should never have been changed. I reckon that should solve the issue.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60530

Reviewed By: ngimel

Differential Revision: D29329865

Pulled By: albanD

fbshipit-source-id: 8dfd0cd968fe26a3924dae7ca366af2c8a8639b3
2021-06-25 09:16:57 -07:00
8cba365378 Fix incorrect doc about the dtype for torch.randint described in issue #56347 (#60507)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60507

Fix incorrect documentation about the dtype for `torch.randint` described in issue #56347

Test Plan: Review documentation to make sure formatting is right

Reviewed By: bdhirsh

Differential Revision: D29321181

fbshipit-source-id: caae69a9bbb30052da518a3f5d22a7ed3504cdd2
2021-06-25 07:51:36 -07:00
d8c3d555e4 [Delegate] Support composite of lowered sub modules of the same backend (#59921)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59921

Test Plan: Imported from OSS

Reviewed By: raziel

Differential Revision: D29091143

Pulled By: iseeyuan

fbshipit-source-id: 9ffcd18681917ece8ec73a34866c53701bdee1bc
2021-06-25 07:18:32 -07:00
7c2938bf67 To refactor Sparse Adam algorithm for functional form (#59171)
Summary:
Adds Functional Interface for Sparse Adam Optimizer.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59171

Reviewed By: vincentqb

Differential Revision: D29360582

Pulled By: iramazanli

fbshipit-source-id: 5ceffd7f4b7abd1e0b758a5b8445abdf5555eba0
2021-06-25 06:35:39 -07:00
963c983366 Improve numerical stability of LayerNorm (#59987)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59987

Similar to GroupNorm, improve the numerical stability of LayerNorm with the Welford algorithm and pairwise summation.
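
For reference, a minimal sketch of Welford's online mean/variance update (the general algorithm, not the actual kernel):
```
def welford_update(count, mean, m2, new_value):
    # One streaming update; avoids the catastrophic cancellation of the
    # naive E[x^2] - E[x]^2 formula.
    count += 1
    delta = new_value - mean
    mean += delta / count
    m2 += delta * (new_value - mean)
    return count, mean, m2  # variance = m2 / count
```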

Test Plan: buck test mode/dev-nosan //caffe2/test:nn -- "LayerNorm"

Reviewed By: ngimel

Differential Revision: D29115235

fbshipit-source-id: 5183346c3c535f809ec7d98b8bdf6d8914bfe790
2021-06-25 02:22:42 -07:00
5b1f5c8f17 When creating a single partition skip the output nodes, but process possible nodes after it. (#60370)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60370

When creating a single partition, skip the output nodes but process possible nodes after them.

Test Plan: Run all CI tests.

Reviewed By: jfix71

Differential Revision: D29265278

fbshipit-source-id: 2242009973a54498d8027cce5a294558a1206fdf
2021-06-24 23:50:30 -07:00
2b51a8a935 [BackwardCompatibility] Remove aten::to from allow_list (#60147)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60147

Remove aten::to from allow_list now that the aten::to schema change has landed (D29121620 (eda2ddb5b0)).

Test Plan: CI

Reviewed By: iseeyuan

Differential Revision: D29187314

fbshipit-source-id: abdb5a560287a861f3858732f7b3da342ee4aa55
2021-06-24 22:57:57 -07:00
3ca28656fa [special] erfcx cuda support (#60519)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/50345
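
erfcx(x) = exp(x^2) * erfc(x) is the scaled complementary error function; it stays finite where the naive form overflows. A small sketch (assuming a CUDA build):
```
import torch

if torch.cuda.is_available():
    x = torch.linspace(-2.0, 30.0, 5, device="cuda")
    print(torch.special.erfcx(x))             # finite across the range
    print(torch.exp(x ** 2) * torch.erfc(x))  # naive form: inf * 0 -> nan
```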

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60519

Reviewed By: ngimel

Differential Revision: D29353105

Pulled By: mruberry

fbshipit-source-id: 2f525a347a22f96411739a16e354c7291e863f95
2021-06-24 21:50:37 -07:00
46d27a53fe cuda rpc backward sparse tensor fix (#59609)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59609

quick fix for https://github.com/pytorch/pytorch/issues/58755

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D29335722

Pulled By: gcramer23

fbshipit-source-id: 0de7e0399b30f0934320f1e9abb1b92a45bcf929
2021-06-24 21:40:43 -07:00
561132f902 Revert D29330585: [pytorch][PR] add BFloat16 support for arange on CPU
Test Plan: revert-hammer

Differential Revision:
D29330585

Original commit changeset: b8a04cee0c3f

fbshipit-source-id: dc138f9613becd083848e82d15c138d3883493c8
2021-06-24 20:57:43 -07:00
d63c236fb3 Introduce quantized convolution serialization format 3 (#60241)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60241

We're going to make a forward-incompatible change to this serialization
format soon, so I'm taking the opportunity to do a little cleanup.

- Use int for version.  This was apparently not possible when V2
  was introduced, but it works fine now as long as we use int64_t.
  (Note that the 64 bits are only used in memory.  The serializer will
  use 1 byte for small non-negative ints.)
- Remove the "packed params" tensor and replace it with a list of ints.
- Replace the "transpose" field with "flags" to allow more binary flags
  to be packed in (see the sketch after this list).
- Unify required and optional tensors.  I just made them all optional
  and added an explicit assertion for the one we require.
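
As a sketch of the idea behind the "flags" field (the bit name below is hypothetical, not the format's actual layout):

```python
TRANSPOSE_BIT = 1 << 0  # hypothetical bit assignment

def pack_flags(transpose: bool) -> int:
    # Pack independent boolean options into a single integer field.
    flags = 0
    if transpose:
        flags |= TRANSPOSE_BIT
    return flags

assert pack_flags(True) & TRANSPOSE_BIT
assert not (pack_flags(False) & TRANSPOSE_BIT)
```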

A bit of a hack: I added an always-absent tensor to the front of the
tensor list.  Without this, when passing unpacked params from Python to
the ONNX JIT pass, the type would be inferred as `List[Tensor]` if all
tensors were present, making it impossible to cast to
`std::vector<c10::optional<at::Tensor>>` without jumping through hoops.

The plan is to ship this, along with another diff that adds a flag to
indicate numerical requirements, wait a few weeks for an FC grace
period, then flip the serialization version.

Test Plan: CI.  BC tests.

Reviewed By: vkuzo, dhruvbird

Differential Revision: D29349782

Pulled By: dreiss

fbshipit-source-id: cfef5d006e940ac1b8e09dc5b4c5ecf906de8716
2021-06-24 20:52:43 -07:00
42c8439b6e TH: Clean up dead code (#60655)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60655

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D29371717

Pulled By: ngimel

fbshipit-source-id: faa71b1d4a15450c78e12aa917daec853057bce9
2021-06-24 19:42:16 -07:00
4a7d281119 Migrate THAllocator to ATen (#60325)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60325

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D29371715

Pulled By: ngimel

fbshipit-source-id: 78ec8368a48e1a4690d0664a0b02d2a235af98ff
2021-06-24 19:42:14 -07:00
d586248544 Migrate THStorage_resizeBytes to ATen (CPU) (#60324)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60324

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D29371716

Pulled By: ngimel

fbshipit-source-id: 056aee0ec87722090c133777b6948c28b03b37e4
2021-06-24 19:41:02 -07:00
ddec2e0ef4 tentative fix for adaptiveavgpool gradient computation (#60630)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/60524

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60630

Reviewed By: jbschlosser

Differential Revision: D29374257

Pulled By: ngimel

fbshipit-source-id: be05f0ceb53e6f0f0a59a83b710dafde469d4e8a
2021-06-24 19:02:32 -07:00
40a7c317bc Run BLAS F2C checks on host architecture (#60703)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/60351

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60703

Reviewed By: driazati

Differential Revision: D29379727

Pulled By: malfet

fbshipit-source-id: dadbb1d39373887f07d59d0a05e093a5d070b016
2021-06-24 18:44:41 -07:00
7bc86458e1 Revert "Revert D28833086: beef up at::_ops API" (#60214)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60214

Relanding this PR, but with a fix for windows cuda builds (example failure in master here: https://github.com/pytorch/pytorch/runs/2852662871)

This is identical to the original PR except for one change in `tools/codegen/gen.py`: `static constexpr` -> `static CONSTEXPR_EXCEPT_WIN_CUDA`

This actually took a while to figure out, until I tracked down a previous pytorch PR that encountered a similar issue: https://github.com/pytorch/pytorch/pull/40675

This reverts commit 6d0fb85a623f5ef3f3f1a2afc3660cb71fa70511.

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D29213932

Pulled By: bdhirsh

fbshipit-source-id: b90c7c10e5a51f8d6173ddca673b418e5774c248
2021-06-24 18:08:54 -07:00
9c4eec2a2d Adjust path to distributed cpp tests (#60705)
Summary:
After https://github.com/pytorch/pytorch/issues/60543 they are installed in the same folder as the rest of the tests

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60705

Reviewed By: driazati

Differential Revision: D29380670

Pulled By: malfet

fbshipit-source-id: a432d26c731e9220e00d8c800b1429b37d51655b
2021-06-24 17:42:36 -07:00
8395fdde46 Increase tolerance for some distributed tests to 5e-5 (#60462)
Summary:
On A100 GPUs 10 tests fail due to slightly higher deviations.
This fixes those.

Note that rtol is still the default; atol was increased by a factor of 5 (from 1e-5 to 5e-5).
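
For illustration, the effect of the looser atol (using `torch.testing.assert_close` as a stand-in for the test framework's `assertEqual`):

```python
import torch

a = torch.tensor([1.00000])
b = torch.tensor([1.00004])  # off by ~4e-5: fails at the default atol=1e-5
# Passes once the absolute tolerance is raised to 5e-5:
torch.testing.assert_close(a, b, rtol=1.3e-6, atol=5e-5)
```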

The failing tests were:

- test_accumulate_gradients_module
- test_accumulate_gradients_module_with_grad_is_view
- test_ddp_checkpointing_once
- test_ddp_checkpointing_twice
- test_ddp_checkpointing_unused_params
- test_ddp_checkpointing_weight_sharing
- test_nccl_backend_1gpu_module_device_ids_integer_list
- test_nccl_backend_1gpu_module_device_ids_torch_device_list
- test_nccl_backend_single_device_module_device_ids_None
- test_nccl_backend_single_device_module_empty_device_id

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60462

Reviewed By: albanD

Differential Revision: D29366145

Pulled By: zhaojuanmao

fbshipit-source-id: c3e34c007363dfebf75ccb82004a67e4d2e6f3cd
2021-06-24 17:38:54 -07:00
2fa6c7627e [CUDA graphs][BC-breaking] Removes post-backward syncs on default stream (#60421)
Summary:
Before https://github.com/pytorch/pytorch/pull/57833, calls to backward() or grad() synced only the calling thread's default stream with autograd leaf streams at the end of backward. This made the following weird pattern safe:
```python
with torch.cuda.stream(s):
    # imagine forward used many streams, so backward leaf nodes may run on many streams
    loss.backward()
# no sync
use grads
```

but a more benign-looking pattern was unsafe:
```python
with torch.cuda.stream(s):
    # imagine forward used a lot of streams, so backward leaf nodes may run on many streams
    loss.backward()
    # backward() syncs the default stream with all the leaf streams, but does not sync s with anything,
    # so counterintuitively (even though we're in the same stream context as backward()!)
    # it is NOT SAFE to use grads here, and there's no easy way to make it safe,
    # unless you manually sync on all the streams you used in forward,
    # or move "use grads" back to default stream outside the context.
    use grads
```
mruberry, ngimel, and I decided backward() should have the [same user-facing stream semantics as any cuda op](https://pytorch.org/docs/master/notes/cuda.html#stream-semantics-of-backward-passes).** In other words, the weird pattern should be unsafe, and the benign-looking pattern should be safe. Implementation-wise, this meant backward() should sync its calling thread's current stream, not the default stream, with the leaf streams.
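
Under the new semantics, the benign-looking pattern from above becomes safe (a sketch, in the same pseudocode style as the snippets above):

```python
with torch.cuda.stream(s):
    loss.backward()
    # now SAFE: backward() syncs the current stream (s) with all leaf streams
    use grads
```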

After https://github.com/pytorch/pytorch/pull/57833, backward syncs the calling thread's current stream AND default stream with all leaf streams at the end of backward. The default stream syncs were retained for temporary backward compatibility.

This PR finishes https://github.com/pytorch/pytorch/pull/57833's work by deleting syncs on the default stream.

With this PR, graph-capturing an entire backward() call should be possible (see the [test_graph_grad_scaling diffs](https://github.com/pytorch/pytorch/compare/master...mcarilli:streaming_backwards_remove_default_syncs?expand=1#diff-893b1eea27352f336f4cd832919e48d721e4e90186e63400b8596db6b82e7450R3641-R3642)).

** The first paragraph has a formatting error, which this PR should also fix.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60421

Reviewed By: albanD

Differential Revision: D29370344

Pulled By: ngimel

fbshipit-source-id: 3248bc5fb92fc517db0c15c897e5d7250f67d7fe
2021-06-24 17:34:02 -07:00
d90aefe380 Improve error message for non-differentiable inputs (#60610)
Summary:
Improve the error message when inputs should not requires_grad=True.

For example, we now get
```
RuntimeError: The function 'binary_cross_entropy' is not differentiable with respect to argument 'weight'. This input cannot have requires_grad True.
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60610

Reviewed By: anjali411

Differential Revision: D29361424

Pulled By: albanD

fbshipit-source-id: 38163ce11ae1b8df326424e95ca20e55fea2a99a
2021-06-24 17:29:16 -07:00
4ed2d5d9bb ps sparse rpc (#58003)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58003

- adds trainer class DdpTrainer
- adds trainer class DdpSparseRpcTrainer
- adds server class ParameterServerBase
- adds server class AverageParameterServer
- adds experiment ddp_cpu_sparse_rpc_nccl_allreduce
- adds experiment ddp_cuda_sparse_rpc_nccl_allreduce

quip document https://fb.quip.com/iQUtAeKIxWpF

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D29379696

Pulled By: gcramer23

fbshipit-source-id: 9cf5fb7398ba2fa3eb694afbddc4ed00d97f205f
2021-06-24 17:21:49 -07:00
fadaa52f64 [caffe2] add an EstimateAllBlobSizes operator (#59775)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59775

This operator is similar to `GetAllBlobNames` but also returns the estimated
size required to serialize each node.

One goal of this operator is to allow checkpoint saving logic to estimate the
amount of space/bandwidth required to save a checkpoint when first starting
training, without actually serializing any blobs yet.  Currently the
checkpointing logic uses `GetAllBlobNames` to determine the blobs to
checkpoint.  It can instead be updated to use `EstimateAllBlobSizes` to also
get an estimate for how much space will be required for the checkpoint.
ghstack-source-id: 132275153

Test Plan: Included a new unit test.

Reviewed By: mraway

Differential Revision: D29020227

fbshipit-source-id: 811e5d86c4b59183e84e6424c48c97739be09043
2021-06-24 16:55:22 -07:00
fe4ded01f7 [package] typing.io/re edge case hack (#60666)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60666

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D29367847

Pulled By: Lilyjjo

fbshipit-source-id: 2c38140fbb3eab61ae3de60ab475243f0338c547
2021-06-24 14:53:46 -07:00
375d201086 add BFloat16 support for arange on CPU (#60444)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60444

Reviewed By: VitalyFedyunin

Differential Revision: D29330585

Pulled By: ezyang

fbshipit-source-id: b8a04cee0c3f2ff5544e2b821324ce8fc4e9d0f2
2021-06-24 14:38:47 -07:00
7fc4e67771 ns for fx: fix shadow logger error for resnet18 (#60559)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60559

Adds `resnet18` to the integration test, and fixes the error so that
creating the shadow model works.

Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIsModels.test_resnet18
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D29336236

fbshipit-source-id: 9425aa096162d80ef3a7c98144b2301cfbccc1ea
2021-06-24 13:42:18 -07:00
4ddb2b43b7 ns for fx: expose function to add comparisons between logged values (#60311)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60311

Adds a user facing utility function to FX Numeric Suite Core APIs
for comparing the values extracted by the loggers to each other.
This is needed for any kind of analysis, so it is useful to
provide an example implementation.

Example:

```
// code

m = nn.Sequential(nn.Conv2d(1, 1, 1), nn.Conv2d(1, 1, 1)).eval()
qconfig_dict = {'': torch.quantization.default_qconfig}
mp = torch.quantization.quantize_fx.prepare_fx(m, qconfig_dict)
mq = torch.quantization.quantize_fx.convert_fx(copy.deepcopy(mp))
results = extract_weights('fp32', mp, 'int8', mq)
extend_logger_results_with_comparison(
    results, 'fp32', 'int8', compute_sqnr, 'sqnr_int8_vs_fp32')

print(results)

// results

{
  '_1': {'weight': {
    'fp32': [
      {'type': 'weight', 'values': [tensor([[[[-0.3284]]]])], 'prev_node_name': '_1', 'prev_node_target_type': "<class 'torch.nn.modules.conv.Conv2d'>", 'ref_node_name': '_1', 'index_within_arg': 0, 'index_of_arg': 0}
    ],
    'int8': [
      {'type': 'weight', 'values': [tensor([[[[-0.3297]]]], size=(1, 1, 1, 1), dtype=torch.qint8,
       quantization_scheme=torch.per_tensor_affine, scale=0.002575645223259926,
       zero_point=0)], 'prev_node_name': '_1', 'prev_node_target_type': "<class 'torch.nn.quantized.modules.conv.Conv2d'>", 'ref_node_name': '_1', 'index_within_arg': 0, 'index_of_arg': 0, 'sqnr_int8_vs_fp32': [tensor(48.1308)]}
    ]
  }},
  '_0': {'weight': {
    'fp32': [{'type': 'weight', 'values': [tensor([[[[0.5205]]]])], 'prev_node_name': '_0', 'prev_node_target_type': "<class 'torch.nn.modules.conv.Conv2d'>", 'ref_node_name': '_0', 'index_within_arg': 0, 'index_of_arg': 0}],
    'int8': [{'type': 'weight', 'values': [tensor([[[[0.5184]]]], size=(1, 1, 1, 1), dtype=torch.qint8,
       quantization_scheme=torch.per_tensor_affine, scale=0.004082232713699341,
       zero_point=0)], 'prev_node_name': '_0', 'prev_node_target_type': "<class 'torch.nn.quantized.modules.conv.Conv2d'>", 'ref_node_name': '_0', 'index_within_arg': 0, 'index_of_arg': 0, 'sqnr_int8_vs_fp32': [tensor(48.1309)]}]
  }}
}

```

Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_extend_logger_results_with_comparison
```

Imported from OSS

Reviewed By: hx89

Differential Revision: D29244715

fbshipit-source-id: a5547b449ea54e046c752119559be49bd738beea
2021-06-24 13:42:16 -07:00
31fe1c1323 ns for fx: rekey results by model node names (#60305)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60305

Adjusts the NS for FX weight and activation extraction APIs
to require a model name, and rekeys the results of these APIs
to use the node names of the specified model as layer keys.

For example, before

```
// API call
results = ns.extract_logger_info(
  model_a, model_b, ns.OutputLogger)

// results
{'base_op_1_0': {'node_output':
  {'model_a': [{'ref_node_name': 'linear1', ...}]}}}
```

and after

```
// API call
results = ns.extract_logger_info(
  model_a, model_b, ns.OutputLogger, 'model_b_name')

// results
// note: instead of `base_op_1_0`, the layer is named `linear1`
{'linear1': {'node_output':
  {'model_a': [{'ref_node_name': 'linear1', ...}]}}}
```

Note: we cannot use these names while collecting data because
node names are not guaranteed to be consistent across graphs.
This is why we only rekey as the very last step.

Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_layer_names
```

Imported from OSS

Reviewed By: hx89

Differential Revision: D29243045

fbshipit-source-id: d39ecdfdd18b07291e3ecefed2ede287b100b7d0
2021-06-24 13:41:01 -07:00
0ba4044b9d Increase some tolerances for tf32 for Conv3d tests (#60451)
Summary:
Allow those tests to pass on A100 GPUs which support tf32

Basically follow-up to https://github.com/pytorch/pytorch/pull/52871 which also increased some precisions to 0.05

For reference, these are the failures I see (the only ones in test_nn with 1.9.0):
```
FAIL: test_Conv3d_pad_same_cuda_tf32 (__main__.TestNN)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/tmp/easybuild-tmp/eb-ED4 (1f47a80e88)M3d/tmpqOhUjN/lib/python3.8/site-packages/torch/testing/_internal/common_utils.py", line 1033, in wrapper
    method(*args, **kwargs)
  File "/tmp/easybuild-tmp/eb-ED4 (1f47a80e88)M3d/tmpqOhUjN/lib/python3.8/site-packages/torch/testing/_internal/common_utils.py", line 1033, in wrapper
    method(*args, **kwargs)
  File "test_nn.py", line 11296, in with_tf32_on
    test.test_cuda(self, **kwargs)
  File "/tmp/easybuild-tmp/eb-ED4 (1f47a80e88)M3d/tmpqOhUjN/lib/python3.8/site-packages/torch/testing/_internal/common_nn.py", line 5103, in test_cuda
    test_case.assertEqualIgnoreType(cpu_d_i, gpu_d_i, atol=self.precision, rtol=0)
  File "/tmp/easybuild-tmp/eb-ED4 (1f47a80e88)M3d/tmpqOhUjN/lib/python3.8/site-packages/torch/testing/_internal/common_utils.py", line 1254, in assertEqualIgnoreType
    return self.assertEqual(*args, exact_dtype=False, **kwargs)
  File "/tmp/easybuild-tmp/eb-ED4 (1f47a80e88)M3d/tmpqOhUjN/lib/python3.8/site-packages/torch/testing/_internal/common_utils.py", line 1355, in assertEqual
    super().assertTrue(result, msg=self._get_assert_msg(msg, debug_msg=debug_msg))
AssertionError: False is not true : Tensors failed to compare as equal!With rtol=0 and atol=0.005, found 161 element(s) (out of 288) whose difference(s) exceeded the margin of error (including 0 nan compariso
ns). The greatest difference was 0.032408137116391345 (-33.45570601919647 vs. -33.42329788208008), which occurred at index (2, 0, 0, 1, 0).

======================================================================
FAIL: test_Conv3d_pad_same_dilated_cuda_tf32 (__main__.TestNN)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/tmp/easybuild-tmp/eb-ED4 (1f47a80e88)M3d/tmpqOhUjN/lib/python3.8/site-packages/torch/testing/_internal/common_utils.py", line 1033, in wrapper
    method(*args, **kwargs)
  File "/tmp/easybuild-tmp/eb-ED4 (1f47a80e88)M3d/tmpqOhUjN/lib/python3.8/site-packages/torch/testing/_internal/common_utils.py", line 1033, in wrapper
    method(*args, **kwargs)
  File "test_nn.py", line 11296, in with_tf32_on
    test.test_cuda(self, **kwargs)
  File "/tmp/easybuild-tmp/eb-ED4 (1f47a80e88)M3d/tmpqOhUjN/lib/python3.8/site-packages/torch/testing/_internal/common_nn.py", line 5103, in test_cuda
    test_case.assertEqualIgnoreType(cpu_d_i, gpu_d_i, atol=self.precision, rtol=0)
  File "/tmp/easybuild-tmp/eb-ED4 (1f47a80e88)M3d/tmpqOhUjN/lib/python3.8/site-packages/torch/testing/_internal/common_utils.py", line 1254, in assertEqualIgnoreType
    return self.assertEqual(*args, exact_dtype=False, **kwargs)
  File "/tmp/easybuild-tmp/eb-ED4 (1f47a80e88)M3d/tmpqOhUjN/lib/python3.8/site-packages/torch/testing/_internal/common_utils.py", line 1355, in assertEqual
    super().assertTrue(result, msg=self._get_assert_msg(msg, debug_msg=debug_msg))
AssertionError: False is not true : Tensors failed to compare as equal!With rtol=0 and atol=0.005, found 111 element(s) (out of 288) whose difference(s) exceeded the margin of error (including 0 nan compariso
ns). The greatest difference was 0.024654212557543076 (35.104286017977465 vs. 35.07963180541992), which occurred at index (3, 0, 0, 0, 2).

======================================================================
FAIL: test_Conv3d_pad_valid_cuda_tf32 (__main__.TestNN)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/tmp/easybuild-tmp/eb-ED4 (1f47a80e88)M3d/tmpqOhUjN/lib/python3.8/site-packages/torch/testing/_internal/common_utils.py", line 1033, in wrapper
    method(*args, **kwargs)
  File "/tmp/easybuild-tmp/eb-ED4 (1f47a80e88)M3d/tmpqOhUjN/lib/python3.8/site-packages/torch/testing/_internal/common_utils.py", line 1033, in wrapper
    method(*args, **kwargs)
  File "test_nn.py", line 11296, in with_tf32_on
    test.test_cuda(self, **kwargs)
  File "/tmp/easybuild-tmp/eb-ED4 (1f47a80e88)M3d/tmpqOhUjN/lib/python3.8/site-packages/torch/testing/_internal/common_nn.py", line 5103, in test_cuda
    test_case.assertEqualIgnoreType(cpu_d_i, gpu_d_i, atol=self.precision, rtol=0)
  File "/tmp/easybuild-tmp/eb-ED4 (1f47a80e88)M3d/tmpqOhUjN/lib/python3.8/site-packages/torch/testing/_internal/common_utils.py", line 1254, in assertEqualIgnoreType
    return self.assertEqual(*args, exact_dtype=False, **kwargs)
  File "/tmp/easybuild-tmp/eb-ED4 (1f47a80e88)M3d/tmpqOhUjN/lib/python3.8/site-packages/torch/testing/_internal/common_utils.py", line 1355, in assertEqual
    super().assertTrue(result, msg=self._get_assert_msg(msg, debug_msg=debug_msg))
AssertionError: False is not true : Tensors failed to compare as equal!With rtol=0 and atol=0.005, found 41 element(s) (out of 288) whose difference(s) exceeded the margin of error (including 0 nan comparisons). The greatest difference was 0.010903167642320355 (8.074376869119371 vs. 8.06347370147705), which occurred at index (0, 0, 1, 0, 0).

```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60451

Reviewed By: albanD

Differential Revision: D29353255

Pulled By: ngimel

fbshipit-source-id: 155a02242be5a11dcbd9dd40ab63f15c6757ae1b
2021-06-24 13:36:27 -07:00
a3ebc40bab Update intro doc for derivatives.yaml (#60614)
Summary:
Clarify some phrasing and document the findings on the different non differentiable states.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60614

Reviewed By: anjali411

Differential Revision: D29362740

Pulled By: albanD

fbshipit-source-id: 5bc2e8b8dde57ba5a9247d7c28b83c793703e35f
2021-06-24 13:20:40 -07:00
48509b1a9b Add exclusion list to _check_kernel_launches.py (#60562)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60562

Test Plan:
```
buck test //caffe2/test:kernel_launch_checks
```

Reviewed By: ngimel

Differential Revision: D29336561

fbshipit-source-id: 0cc101143d24e887e852bd6a9ab34ac43155eb63
2021-06-24 13:18:07 -07:00
a016150163 Move torch/lib/c10d to torch/csrc/distributed/c10d (#60543)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60543

Now that c10d is part of libtorch, it would also be nice if the sources all lived in one place.
ghstack-source-id: 132306292

Test Plan: It builds

Reviewed By: cbalioglu

Differential Revision: D29062002

fbshipit-source-id: d9e1301e9d73e1643fa0f0119cd2d618f1ad52e6
2021-06-24 12:38:51 -07:00
b8d7db3b31 Turn default kernels into Meyer singletons (#60568)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60568

https://github.com/pytorch/pytorch/pull/58661 induced a static
initialization order fiasco as flagged by ASAN strict_init_order=true.
On further inspection, it became clear that it was not necessary for
these to actually be globals initialized at module load time; so
I converted them into Meyer singletons which ensures they get loaded
immediately when another compilation unit requests them.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D29338019

Pulled By: ezyang

fbshipit-source-id: 282846118df6867277404a1830d0ce39fccaa769
2021-06-24 12:30:26 -07:00
4c00df12ec Include full Python version in collect_env.py output (#59632)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59632

Before:

```
Python version: 3.7 (64-bit runtime)
```

After:

```
Python version: 3.7.7 (default, Mar 23 2020, 17:31:31)  [Clang 4.0.1 (tags/RELEASE_401/final)] (64-bit runtime)
```

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D28961500

Pulled By: ezyang

fbshipit-source-id: 0f95a49abf6977941f09a64243916576a820679f
2021-06-24 12:11:01 -07:00
d52ef2497a Python basic module execution unit test on delegation of backend_with_compiler_demo (#60468)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60468

Added a unit test for the execution of a basic module with a compiler
ghstack-source-id: 132307488

Test Plan:
Running python test/test_jit.py TestBackendsWithCompiler -v returns a successful test

Imported from OSS

Reviewed By: iseeyuan

Differential Revision: D29306225

fbshipit-source-id: bf1ff075ebc63acbbe46d6ea030086405e29d7d3
2021-06-24 11:43:45 -07:00
b7298f499d Annotate NoneType as Optional[type] (#60383)
Summary:
------------
Infer NoneType as Optional[torch.Tensor] for monkeytype type inference

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60383

Test Plan:
------
python test/test_jit.py -k TestPDT.test_nonetype_as_optional_of_type

Reviewed By: gmagogsfm

Differential Revision: D29341513

Pulled By: nikithamalgifb

fbshipit-source-id: 9a96670cb5cf2560cd4e19962faef5fecea8b24a
2021-06-24 11:00:26 -07:00
5a077bb10b Optimize some reduction operators on CPU BFloat16 (#55202)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55202

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D28836790

Pulled By: VitalyFedyunin

fbshipit-source-id: f3a29633d85eb5a614652e568140e9b19509f959
2021-06-24 10:50:24 -07:00
4aff267072 Fix Windows error in distributed (#60167)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60167

We were getting errors such as this on Windows in our c10d ProcessGroup test suite:
```
  test_send_recv_all_to_all (__main__.ProcessGroupGlooTest) ... Exception in thread Thread-1:
Traceback (most recent call last):
  File "C:\Jenkins\Miniconda3\lib\threading.py", line 932, in _bootstrap_inner
    self.run()
  File "C:\Jenkins\Miniconda3\lib\threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\circleci\project\build\win_tmp\build\torch\testing\_internal\common_distributed.py", line 471, in _event_listener
    if pipe.poll(None):
  File "C:\Jenkins\Miniconda3\lib\multiprocessing\connection.py", line 257, in poll
    return self._poll(timeout)
  File "C:\Jenkins\Miniconda3\lib\multiprocessing\connection.py", line 330, in _poll
    return bool(wait([self], timeout))
  File "C:\Jenkins\Miniconda3\lib\multiprocessing\connection.py", line 883, in wait
    ov.cancel()
OSError: [WinError 6] The handle is invalid
Fatal Python error: could not acquire lock for <_io.BufferedWriter name='<stderr>'> at interpreter shutdown, possibly due to daemon threads
Python runtime state: finalizing (tstate=000001EFDF228CE0)

Thread 0x00001f68 (most recent call first):
  File "C:\Jenkins\Miniconda3\lib\threading.py", line 1202 in invoke_excepthook
  File "C:\Jenkins\Miniconda3\lib\threading.py", line 934 in _bootstrap_inner
  File "C:\Jenkins\Miniconda3\lib\threading.py", line 890 in _bootstrap

Current thread 0x00000f94 (most recent call first):
<no Python frame>
FAIL (5.009s)
```
And the process would then exit with error code 3221226505.
See: https://app.circleci.com/pipelines/github/pytorch/pytorch/337351/workflows/ad919a3e-fe9a-4566-8ad6-8b0a252f730c/jobs/14170191/steps

By looking at [the code of `_event_listener` in `common_distributed.py`](eb36f67dcc/torch/testing/_internal/common_distributed.py (L467-L489)) I think that the first exception (the one about the handle being invalid) is "expected" as it results from another thread purposely closing the pipe while that thread is polling it.

The relevant part of the problem seems to be the "could not acquire lock" one. I think this stems from the event listener thread being launched as a daemon thread, which means the interpreter will not wait for that thread to complete before shutting down. When the interpreter shuts down it instantly aborts all other threads. If the event listener thread was aborted _while_ it was logging to stdout, then that thread was holding the lock but never got to release it. This is probably what the error is complaining about. This seems to be intended/expected behavior for CPython: https://bugs.python.org/issue42717.

The solution is thus simple: don't make that thread a daemon thread, and explicitly wait for it to terminate before shutting down.
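
A minimal sketch of that idea (illustrative, not the actual test-suite code):

```python
import multiprocessing
import threading

def event_listener(pipe):
    # Block until a message arrives; a None sentinel requests shutdown.
    while True:
        if pipe.poll(None) and pipe.recv() is None:
            return

parent_conn, child_conn = multiprocessing.Pipe()
listener = threading.Thread(target=event_listener, args=(parent_conn,),
                            daemon=False)  # NOT a daemon thread
listener.start()
child_conn.send(None)  # ask the listener to exit...
listener.join()        # ...and wait for it before interpreter shutdown
```
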
ghstack-source-id: 132293710

Test Plan: Will see...

Reviewed By: pritamdamania87

Differential Revision: D29193014

fbshipit-source-id: 4aabe1fc74bf9c54ca605e7a702ac99655489780
2021-06-24 10:35:38 -07:00
f2f2f5bf20 .github: Zip test reports before uploading (#60475)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60475

Uploading many artifacts can cause issues with GHA backend leading to
errors on our side. To be safe let's zip our artifacts into one archive
so that we avoid uploading too many files at once.

See: https://github.com/actions/upload-artifact#too-many-uploads-resulting-in-429-responses

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: samestep

Differential Revision: D29307205

Pulled By: seemethere

fbshipit-source-id: da8c9957f88bdcc758969157ee696205db5d4dff
2021-06-24 10:30:51 -07:00
7e619b9588 First step to rearrange files in tools folder (#60473)
Summary:
Changes including:
- introduced `linter/`, `testing/`, `stats/` folders in `tools/`
- move appropriate scripts into these folders
- change grepped references in the pytorch/pytorch repo

Next step
- introduce `build/` folder for build scripts

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60473

Test Plan:
- CI (this is important b/c pytorch/test-infra also relies on some script references)
- tools/tests/

Reviewed By: albanD

Differential Revision: D29352716

Pulled By: walterddr

fbshipit-source-id: bad40b5ce130b35dfd9e59b8af34f9025f3285fd
2021-06-24 10:13:58 -07:00
40d2fe1053 correct filename issue for test_cpp_extensions_aot (#60604)
Summary:
Uses a file copy to create actual ninja- vs. no_ninja-suffixed Python test files.
This tricks xmlrunner into reporting test cases in the correct folder.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60604

Test Plan:
- CI reports correctly into the corresponding folders
- If download the test statistics, calculate shards now doesn't need custom logic to handle `test_cpp_extensions_aot`

CI result shown it is working properly:
https://github.com/pytorch/pytorch/pull/60604/checks?check_run_id=2900038654 vs
https://github.com/pytorch/pytorch/pull/60604/checks?check_run_id=2900038673

Reviewed By: albanD

Differential Revision: D29349562

Pulled By: walterddr

fbshipit-source-id: e86e6bc0db288a2a57bea3c5f8edf03be1773944
2021-06-24 09:20:19 -07:00
9cab894367 Fix build_only for libtorch (#60615)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/60605

We have `build_only` defined, but config.yml doesn't have the parameter; this PR fixes that. As a result, the docker image push will be skipped.

```
// in config.yml

if [ -z "${BUILD_ONLY}" ]; then
```

```
            ("11.1", [
                ("3.8", [
                    ("shard_test", [XImportant(True)]),
                    ("libtorch", [
                        (True, [
                            ('build_only', [X(True)]),
                        ]),
                    ]),
                ]),
            ]),
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60615

Reviewed By: albanD

Differential Revision: D29351567

Pulled By: zhouzhuojie

fbshipit-source-id: dab78bb91f62e8bed47739377987167fea1602cb
2021-06-24 09:11:54 -07:00
eddc5f40f9 Added GLU and FeatureAlphaDropout to nn docs (#60590)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/60563 and https://github.com/pytorch/pytorch/issues/60570

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60590

Reviewed By: albanD

Differential Revision: D29352372

Pulled By: jbschlosser

fbshipit-source-id: f81dd65deab1848a68dc202df252c416ce5214d0
2021-06-24 08:00:18 -07:00
204da12592 Reduce number of CEX when passing Tensors to Python (#60546)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60546

Before, we assumed conservatively that any Tensor passed to
THPVariable_Wrap could be aliased in another thread and therefore race.
However, THPVariable_Wrap takes its Variable by value, so if
use_count() <= 1 it is impossible for another thread to have a
reference to it.  So we can conclude that it is definitely uninitialized
if the quick test fails!

Thanks bdhirsh for pointing out the optimization opportunity here.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D29331718

Pulled By: ezyang

fbshipit-source-id: e100796fbc55a0af2c6565c6fbc9ddc8ae7ceb42
2021-06-24 07:40:39 -07:00
bdb964f89f Support RRefs that contain threading.Locks (#57943)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57943

This is a common scenario (our own tutorials propose it), hence we should ensure it works.

A more generic solution is desirable, but this should fix the immediate concern.
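
A sketch of the now-supported pattern (the worker name is hypothetical; assumes an initialized RPC agent):

```python
import threading
import torch.distributed.rpc as rpc

class ParameterServer:
    def __init__(self):
        # Holding a threading.Lock inside an object behind an RRef now works.
        self.lock = threading.Lock()

# With RPC initialized on a worker named "ps" (hypothetical):
# ps_rref = rpc.remote("ps", ParameterServer)
```
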
ghstack-source-id: 132289683

Test Plan: Added a test

Reviewed By: mrshenli

Differential Revision: D28316076

fbshipit-source-id: 64e9766189f40474298876227ea247ce5b699d97
2021-06-24 06:36:09 -07:00
4e347f1242 [docs] Fix backticks in docs (#60474)
Summary:
There is a very common error when writing docs: One forgets to write a matching `` ` ``, and something like ``:attr:`x`` is rendered in the docs. This PR fixes most (all?) of these errors (and a few others).

I found these running ``grep -r ">[^#<][^<]*\`"`` on the `docs/build/html/generated` folder. The regex finds an HTML tag that does not start with `#` (as python comments in example code may contain backticks) and that contains a backtick in the rendered HTML.

This regex has not given any false positive in the current codebase, so I am inclined to suggest that we should add this check to the CI. Would this be possible / reasonable / easy to do malfet ?

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60474

Reviewed By: mrshenli

Differential Revision: D29309633

Pulled By: albanD

fbshipit-source-id: 9621e0e9f87590cea060dd084fa367442b6bd046
2021-06-24 06:27:41 -07:00
bb9e1150ea Revert D29342234: [pytorch][PR] [CUDA graphs][BC-breaking] Removes post-backward syncs on default stream
Test Plan: revert-hammer

Differential Revision:
D29342234

Original commit changeset: 98e6be7fdd85

fbshipit-source-id: 84022973248b2254210eee57402df2c4f4bc43c6
2021-06-24 04:49:28 -07:00
2b72068a68 Make Future store Storages instead of references to DataPtrs (#60470)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60470

A Future needs to know what DataPtrs are used by its value, but it isn't always able to extract them (and even when it is, that's expensive), so they're cached. DataPtrs are kinda like unique_ptrs (movable only, cannot be copied), hence the Future can only hold _references_ to them. The Future's value, however, is unfortunately mutable (we wish that weren't the case, but we don't think we can prevent it), which means the tensor/storage that owned a DataPtr might be deleted and thus the DataPtr could be freed. This means our cached reference becomes stale, which leads to all kinds of disasters, like reading garbage data or segfaulting.

Luckily all the DataPtrs we were dealing with were held inside Storages, which have shared_ptr semantics, allowing us to hold a strong pointer to them that ensures they're kept alive.

ghstack-source-id: 132177396

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D29303570

fbshipit-source-id: d814754806fa58b24e45269e97d768485ef972ba
2021-06-24 03:56:04 -07:00
06e6d63187 Use a no-warning registry for TensorPipe backends (#60457)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60457

The "without warning" variants of the registry were introduced in https://github.com/pytorch/pytorch/pull/31126 to be used in Gloo for the exact same reason: we use a registry precisely so that backends can be overridden, no need to scare users with a warning.
ghstack-source-id: 132051268

Test Plan: Rebuilt and re-run

Reviewed By: mrshenli

Differential Revision: D29293840

fbshipit-source-id: 3450e547056b2c534166972e8266dab5479d5e43
2021-06-24 03:27:04 -07:00
d3a8505ee1 [jit] Added a pass to transform aten::cat ops to prim::Concat op with variable number of inputs (#59881)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59881

This pass is not included in the JIT flow or anywhere else at this point. The idea is that, once this lands, everyone can use it to test their workflow with this transformation, and once we are convinced it is useful and/or improves performance, we can include it in the appropriate workflow.

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D29277876

Pulled By: navahgar

fbshipit-source-id: b5be7bdcc98dced59295bd7b8f6627619cb58d41
2021-06-24 01:27:41 -07:00
c35a3dd6f2 [jit] Added a new operator for concat that takes in variadic parameters (#59880)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59880

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D29277877

Pulled By: navahgar

fbshipit-source-id: 6db24e7432f683a1d1466f9778201e0aa5d3b1ad
2021-06-24 01:26:22 -07:00
dfd2edc025 [special] add zeta (#59623)
Summary:
Reference https://github.com/pytorch/pytorch/issues/50345

`zeta` was already present in the codebase to support computation of `polygamma`.

However, `zeta` only had a `double(double, double)` signature **for CPU** before this PR (which meant that `polygamma` computations were always upcast to `double` for the zeta part).

With this PR, float computations will take place in float and double in double.

I have also refactored the code and moved the duplicated code from `Math.cuh` to `Math.h`.

**Note**: For scipy, q is optional, and if it is `None` it defaults to `1`, which corresponds to the Riemann zeta function. However, for `torch.special.zeta` I made it mandatory, because to me it feels odd that without `q` this is the Riemann zeta and with `q` it is the general Hurwitz zeta. I think sticking to just the general form made more sense, as passing `1` for q is trivial.
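
For reference, the Hurwitz zeta function generalizes the Riemann zeta function, which is recovered at q = 1:

```latex
\zeta(s, q) = \sum_{n=0}^{\infty} \frac{1}{(n + q)^{s}}, \qquad \zeta(s) = \zeta(s, 1)
```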

Verify:
* [x] Docs https://14234587-65600975-gh.circle-artifacts.com/0/docs/special.html#torch.special.zeta

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59623

Reviewed By: ngimel

Differential Revision: D29348269

Pulled By: mruberry

fbshipit-source-id: a3f9ebe1f7724dbe66de2b391afb9da1cfc3e4bb
2021-06-24 00:00:12 -07:00
26cdec6ce4 Support torch.bitwise_{left/right}_shift and __rlshift__, __rrshift__ (#59544)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/58121

This PR implements `torch.bitwise_left_shift`, `torch.bitwise_right_shift`, and `torch.Tensor.{__rlshift__/__rrshift__}` for compatibility with the Python array API standard.
(cc: mruberry, rgommers, emcastillo, kmaehashi)
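
A quick usage sketch of the new ops:

```python
import torch

a = torch.tensor([1, 2, 4])
print(torch.bitwise_left_shift(a, 1))   # tensor([2, 4, 8])
print(torch.bitwise_right_shift(a, 1))  # tensor([0, 1, 2])
print(2 >> torch.tensor([1]))           # via Tensor.__rrshift__ -> tensor([1])
```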

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59544

Reviewed By: ngimel

Differential Revision: D29348869

Pulled By: mruberry

fbshipit-source-id: 329aee296cf890735e8a9f858bccfe87c03d06ca
2021-06-23 23:57:16 -07:00
b82453cbd4 Run dist_autograd backward RPCs on appropriate CUDA streams. (#60606)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60606

TensorPipe receives tensors over the wire on custom streams and these
streams are passed to some RPC callbacks but not to `BACKWARD_AUTOGRAD_REQ`. As a
result, `BACKWARD_AUTOGRAD_REQ` ran on the default stream while still using
tensors from the custom stream. This resulted in downstream autograd operations
running on the incorrect stream.

To fix this, I've passed the streams to `BACKWARD_AUTOGRAD_REQ` as well and
added an appropriate guard.

Closes: https://github.com/pytorch/pytorch/issues/59793
ghstack-source-id: 132252069

Test Plan: Test https://github.com/pytorch/pytorch/issues/59793

Reviewed By: mrshenli

Differential Revision: D29347244

fbshipit-source-id: 8ff8b150763c970ab15c2cac8dccf56e66e9ef5d
2021-06-23 23:52:22 -07:00
675cea1adb [CUDA graphs][BC-breaking] Removes post-backward syncs on default stream (#60421)
Summary:
Before https://github.com/pytorch/pytorch/pull/57833, calls to backward() or grad() synced only the calling thread's default stream with autograd leaf streams at the end of backward. This made the following weird pattern safe:
```python
with torch.cuda.stream(s):
    # imagine forward used many streams, so backward leaf nodes may run on many streams
    loss.backward()
# no sync
use grads
```

but a more benign-looking pattern was unsafe:
```python
with torch.cuda.stream(s):
    # imagine forward used a lot of streams, so backward leaf nodes may run on many streams
    loss.backward()
    # backward() syncs the default stream with all the leaf streams, but does not sync s with anything,
    # so counterintuitively (even though we're in the same stream context as backward()!)
    # it is NOT SAFE to use grads here, and there's no easy way to make it safe,
    # unless you manually sync on all the streams you used in forward,
    # or move "use grads" back to default stream outside the context.
    use grads
```
mruberry, ngimel, and I decided backward() should have the [same user-facing stream semantics as any cuda op](https://pytorch.org/docs/master/notes/cuda.html#stream-semantics-of-backward-passes).** In other words, the weird pattern should be unsafe, and the benign-looking pattern should be safe. Implementation-wise, this meant backward() should sync its calling thread's current stream, not the default stream, with the leaf streams.

After https://github.com/pytorch/pytorch/pull/57833, backward syncs the calling thread's current stream AND default stream with all leaf streams at the end of backward. The default stream syncs were retained for temporary backward compatibility.

This PR finishes https://github.com/pytorch/pytorch/pull/57833's work by deleting syncs on the default stream.

With this PR, graph-capturing an entire backward() call should be possible (see the [test_graph_grad_scaling diffs](https://github.com/pytorch/pytorch/compare/master...mcarilli:streaming_backwards_remove_default_syncs?expand=1#diff-893b1eea27352f336f4cd832919e48d721e4e90186e63400b8596db6b82e7450R3641-R3642)).

** The first paragraph has a formatting error, which this PR should also fix.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60421

Reviewed By: VitalyFedyunin, albanD

Differential Revision: D29342234

Pulled By: ngimel

fbshipit-source-id: 98e6be7fdd8550872f0a78f9a66cb8dfe75abf63
2021-06-23 23:35:24 -07:00
00896cb9ed [caffe2] update db::Transaction::Put() to accept the value by rvalue reference (#60208)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60208

Update the DB APIs so that `db::Transaction::Put()` accepts the value by
rvalue reference.  This allows DB implementations to write data asynchronously
without being forced to make an additional copy of the data in memory.
`Put()` implementations can now use the string move constructor or assignment
operator to get the string data and continue performing the write
asynchronously after returning from `Put()`.

Note that I chose to entirely replace the existing `Put()`, removing the
ability for callers to call `Put()` with a `const std::string&` argument for
the value, rather than simply adding another overloaded version of `Put()`.

This was done because in practice there were no call sites using `Put()` that
cannot move their data in.  Eliminating the `const std::string&` API entirely
simplifies the DB implementations: DBs that wish to support move semantics do
not have to implement both the move and the copy versions of `Put()`.

Test Plan:
Searched through fbcode to try and make sure I found all `db::Transaction`
subclasses, and will check sandcastle results to help confirm.

Ran the modelstore checkpointing unit tests.

Differential Revision: D29204425

fbshipit-source-id: 28be6646e92e5df71954d4bb3dc0c8add30ed041
2021-06-23 22:12:53 -07:00
b09c0b6550 [caffe2] update the BlobSerializer acceptor to allow moving in the data (#60207)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60207

Update the `BlobSerializerBase` API so that the serizialized blob data is
passed as a `std::string&&` rather than `const std::string&`.  This allows the
acceptor to take ownership of the string data.  This allows the acceptor to do
things like queue it for storing asynchronously, rather than having to make a
copy of the data if they need it to remain valid after returning.

All existing `BlobSerializerBase` implementations already pass in a valid
rvalue reference to the data, so this change did not require updating any of
the existing serializer implementations.
ghstack-source-id: 132216750

Test Plan:
Examined all ~46 `BlobSerializerBase` subclasses in fbsource to confirm they
already pass in an rvalue reference for this argument.  Also searched for
`BlobSerializerBase` on google and did not find any external references to
this class in other open source projects that might be affected.

Differential Revision: D29204426

fbshipit-source-id: b1d567e52a5c17a01d651c70bbfa2fddbaea6cd9
2021-06-23 22:11:42 -07:00
6ea22672c4 add support for sparse tensors in torch.testing.assert_close (#58844)
Summary:
This adds support for sparse tensors the same way `torch.testing._internal.common_utils.TestCase.assertEqual` does:

5c7dace309/torch/testing/_internal/common_utils.py (L1287-L1313)

- Tensors are coalesced before comparison.
- Indices and values are compared individually.
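
A small illustration of that behavior (sketch):

```python
import torch

i = torch.tensor([[0, 1, 0], [1, 0, 1]])
v = torch.tensor([1.0, 2.0, 3.0])
a = torch.sparse_coo_tensor(i, v, (2, 2))  # holds a duplicate index (0, 1)
b = a.coalesce()                           # duplicates summed: (0, 1) -> 4.0
torch.testing.assert_close(a, b)           # passes: a is coalesced first
```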

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58844

Reviewed By: zou3519

Differential Revision: D29160250

Pulled By: mruberry

fbshipit-source-id: b0955656c2c7ff3db37a1367427ca54ca14f2e87
2021-06-23 21:59:01 -07:00
80f40b172f [Model Averaging] Periodic model averager (#60320)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60320

This averager can be used for post-local SGD.
ghstack-source-id: 131908011

Test Plan: buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_periodic_model_averager

Reviewed By: rohan-varma

Differential Revision: D29249850

fbshipit-source-id: 09675d6bb1edfb8ffbeb94510d91962532d8ca3e
2021-06-23 20:23:04 -07:00
4e51503b1f DOC Improves input and target docstring for loss functions (#60553)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/56581

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60553

Reviewed By: VitalyFedyunin

Differential Revision: D29343797

Pulled By: jbschlosser

fbshipit-source-id: cafc29d60a204a21deff56dd4900157d2adbd91e
2021-06-23 20:20:29 -07:00
6d1b4642f0 DOC Describes parameters/buffers registered as None in load_state_dict (#60549)
Summary:
Related to https://github.com/pytorch/pytorch/issues/8104

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60549

Reviewed By: VitalyFedyunin

Differential Revision: D29343732

Pulled By: jbschlosser

fbshipit-source-id: ef5ba3094c8eaf2f9c8efeba6a9d9ab52ebf8b2c
2021-06-23 20:15:22 -07:00
1e31d26b1d [Static Runtime] Fix bugs in static_runtime::to_copy (#60503)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60503

Fixed a few issues in the static_runtime::to_copy impl:
- fixed a bug with memory_format
- copy strides when appropriate. This is necessary to make sure that the fbgemm path in the copy kernel gets hit.
- fix the schema in the `ReplaceWithCopy` pass
- add registration of `static_runtime::to_copy.other`

Add more unit tests:
- test dynamic shapes
- test strided input tensor to `aten::to`
- test alias case (same input/output)
- test `to.other`

Reviewed By: ajyu

Differential Revision: D26838933

fbshipit-source-id: ec0d1a2deebe998fcfe8858e772e1ef429cb4522
2021-06-23 19:57:17 -07:00
d200e9de26 [Static Runtime] Test for dynamic shapes in SR unit tests (#60579)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60579

- Modify testStaticRuntime to take two sets of inputs so that, if the second set of inputs has bigger shapes, it triggers memory allocations in resize_ calls.
- Modify test scripts so that the output of the test op is managed by the memory planner, as explained in comments.

Reviewed By: ajyu

Differential Revision: D29221452

fbshipit-source-id: 09f0f7eb384dc8ca67594f1fa76e1e31392ee6ca
2021-06-23 19:56:05 -07:00
99b641169b Migrates nll_loss_forward from TH to Aten (CUDA) (#60097)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/24610
Aten Umbrella issue https://github.com/pytorch/pytorch/issues/24507
Related to https://github.com/pytorch/pytorch/issues/59765

The performance does not change between this PR and master with the following benchmark script:

<details>
 <summary>Benchmark script</summary>

```python
import torch
import torch.nn as nn
import time

torch.manual_seed(0)

def _time():
    torch.cuda.synchronize()
    MS_PER_SECOND = 1000
    return time.perf_counter() * MS_PER_SECOND

device = "cuda"
C = 30
softmax = nn.LogSoftmax(dim=1)
n_runs = 250

for reduction in ["none", "mean", "sum"]:
    for N in [100_000, 500_000, 1_000_000]:
        fwd_t = 0
        bwd_t = 0
        data = torch.randn(N, C, device=device)
        target = torch.empty(N, dtype=torch.long, device=device).random_(0, C)
        loss = nn.NLLLoss(reduction=reduction)
        input = softmax(data)

        for i in range(n_runs):
            t1 = _time()
            result = loss(input, target)
            t2 = _time()
            fwd_t = fwd_t + (t2 - t1)
        fwd_avg = fwd_t / n_runs
        print(
            f"input size({N}, {C}), reduction: {reduction} "
            f"forward time is {fwd_avg:.2f} (ms)"
        )
    print()
```

</details>

## master

```
input size(100000, 30), reduction: none forward time is 0.02 (ms)
input size(500000, 30), reduction: none forward time is 0.08 (ms)
input size(1000000, 30), reduction: none forward time is 0.15 (ms)

input size(100000, 30), reduction: mean forward time is 1.81 (ms)
input size(500000, 30), reduction: mean forward time is 8.24 (ms)
input size(1000000, 30), reduction: mean forward time is 16.46 (ms)

input size(100000, 30), reduction: sum forward time is 1.66 (ms)
input size(500000, 30), reduction: sum forward time is 8.24 (ms)
input size(1000000, 30), reduction: sum forward time is 16.46 (ms)
```

## this PR

```
input size(100000, 30), reduction: none forward time is 0.02 (ms)
input size(500000, 30), reduction: none forward time is 0.08 (ms)
input size(1000000, 30), reduction: none forward time is 0.15 (ms)

input size(100000, 30), reduction: mean forward time is 1.80 (ms)
input size(500000, 30), reduction: mean forward time is 8.24 (ms)
input size(1000000, 30), reduction: mean forward time is 16.46 (ms)

input size(100000, 30), reduction: sum forward time is 1.66 (ms)
input size(500000, 30), reduction: sum forward time is 8.24 (ms)
input size(1000000, 30), reduction: sum forward time is 16.46 (ms)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60097

Reviewed By: mrshenli

Differential Revision: D29303099

Pulled By: ngimel

fbshipit-source-id: fc0d636543a79ea81158d286dcfb84043bec079a
2021-06-23 19:47:01 -07:00
ef84bcfee6 Convert floating-point constants to T in Bessel functions (#59416)
Summary:
If T is float, many of the computations are more expensive than
expected. Compilers may be reluctant to optimize because such optimizations
often lead to a different outcome. Convert many constants to T before using
them to clear any doubt.

Benchmark: (Debian 11, no turbo, Release build, Intel(R) Xeon(R) E-2136 CPU @ 3.30GHz, gcc 10.2.1)

```python
import timeit
for dtype in ('torch.float',):
    for func in ('i0', 'i0e', 'i1', 'i1e'):
        for n, t in [(10_000, 10000),
                    (100_000, 1000)]:
            print(f'torch.special.{func}(torch.arange(n, dtype=torch.float32)), n = {n} for {t} times, dtype={dtype}')
            print(timeit.timeit(f'torch.special.{func}(a)', setup=f'import torch; a = torch.arange({n}, dtype=torch.float32)', number=t))
```

Before:

```
torch.special.i0(torch.arange(n, dtype=torch.float32)), n = 10000 for 10000 times, dtype=torch.float
1.539132010017056
torch.special.i0(torch.arange(n, dtype=torch.float32)), n = 100000 for 1000 times, dtype=torch.float
0.9613071230123751
torch.special.i0e(torch.arange(n, dtype=torch.float32)), n = 10000 for 10000 times, dtype=torch.float
4.32450835997588
torch.special.i0e(torch.arange(n, dtype=torch.float32)), n = 100000 for 1000 times, dtype=torch.float
1.5751779029960744
torch.special.i1(torch.arange(n, dtype=torch.float32)), n = 10000 for 10000 times, dtype=torch.float
1.0810036820184905
torch.special.i1(torch.arange(n, dtype=torch.float32)), n = 100000 for 1000 times, dtype=torch.float
0.5314770240220241
torch.special.i1e(torch.arange(n, dtype=torch.float32)), n = 10000 for 10000 times, dtype=torch.float
0.41711462699458934
torch.special.i1e(torch.arange(n, dtype=torch.float32)), n = 100000 for 1000 times, dtype=torch.float
0.1759720179834403
```

After:

```
torch.special.i0(torch.arange(n, dtype=torch.float32)), n = 10000 for 10000 times, dtype=torch.float
1.337154256994836
torch.special.i0(torch.arange(n, dtype=torch.float32)), n = 100000 for 1000 times, dtype=torch.float
0.8640981369826477
torch.special.i0e(torch.arange(n, dtype=torch.float32)), n = 10000 for 10000 times, dtype=torch.float
4.308618158014724
torch.special.i0e(torch.arange(n, dtype=torch.float32)), n = 100000 for 1000 times, dtype=torch.float
1.5217605629877653
torch.special.i1(torch.arange(n, dtype=torch.float32)), n = 10000 for 10000 times, dtype=torch.float
0.9398589830088895
torch.special.i1(torch.arange(n, dtype=torch.float32)), n = 100000 for 1000 times, dtype=torch.float
0.4667845010117162
torch.special.i1e(torch.arange(n, dtype=torch.float32)), n = 10000 for 10000 times, dtype=torch.float
0.3658539849857334
torch.special.i1e(torch.arange(n, dtype=torch.float32)), n = 100000 for 1000 times, dtype=torch.float
0.15680673700990155
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59416

Reviewed By: anjali411

Differential Revision: D29249897

Pulled By: mruberry

fbshipit-source-id: c170e78f2ab47176ea95b8442c6279d7ec1d75c2
2021-06-23 19:43:27 -07:00
08020220f3 [Testing] Adding reference tests to OpInfo class (#59369)
Summary:
This PR will ideally add a `ref` argument to the `OpInfo` base class. The idea is to add reference checks for all the _eligible_ ops. For more discussion, please check https://github.com/pytorch/pytorch/issues/58294

* [x] Migrate (but not removing yet) and modify helper functions from `UnaryUfuncOpInfo` class to `OpInfo` base class.
* [x] Test the reference checks for multiple ops. (also decide a list of different and eligible ops for this)
* [x] Handle possible edge cases (for example: `uint64` isn't implemented in PyTorch but is there in NumPy, and this needs to be handled -- more on this later) -- _Update_: We decided that these reference tests should only test for values and not types.
* [x] Create a sample PR for a single (of all different categories?) on adding reference functions to the eligible ops. -- _Update_: This is being done in this PR only.
* [x] ~Remove reference tests from `test_unary_ufuncs.py` and test to make sure that nothing breaks.~ (*Update*: We won't be touching Unary Ufunc reference tests in this PR)
* [x] Add comments, remove unnecessary prints/comments (added for debugging).

Note: To keep the PR description short, examples of edge cases encountered have been mentioned in the comments below.

cc: mruberry pmeier kshitij12345

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59369

Reviewed By: ngimel

Differential Revision: D29347252

Pulled By: mruberry

fbshipit-source-id: 69719deddb1d23c53db45287a7e66c1bfe7e65bb
2021-06-23 19:26:08 -07:00
236d3afd82 manual revert of 57575 (#60572)
Summary:
manually reverting 57575 while keeping 57574 since it's fixing a bug: https://github.com/pytorch/pytorch/issues/55609
Sandcastle couldn't do it automatically

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60572

Reviewed By: driazati

Differential Revision: D29342473

Pulled By: Krovatkin

fbshipit-source-id: 66ad7d316984a13d203158ceba9706a5f451f9b2
2021-06-23 19:21:48 -07:00
9e773ea7d5 Use accscalar_t for CUDA add/sub with Tensor and Scalar (#60454)
Summary:
Follow up of https://github.com/pytorch/pytorch/issues/60227, related to https://github.com/pytorch/pytorch/issues/59907 & https://github.com/pytorch/pytorch/issues/58833

With this pull request, `torch.add` & `torch.sub` use `acc_type` for `Scalar` if either of two arguments is `Scalar`.
This mimics the behavior of [`torch.mul`](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/cuda/BinaryMulDivKernel.cu#L18), `torch._foreach_(add|sub).Scalar` and `torch._foreach_(add|sub).ScalarList`.
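
Illustratively, the Python-level call is unchanged; only the precision of the internal scalar math moves (a minimal sketch, assuming a CUDA device):

```python
import torch

x = torch.randn(1024, dtype=torch.float16, device="cuda")
# The 0.1 scalar now participates in the CUDA kernel at accscalar_t
# (float32) precision instead of scalar_t (float16).
y = torch.add(x, 0.1)
z = torch.sub(x, 0.1)
```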

 ---

**reference**
- torch.mul CUDA kernel: b0c9762e2d/aten/src/ATen/native/cuda/BinaryMulDivKernel.cu (L17-L25)
- `torch._foreach_(add|sub).Scalar`: cast scalar b0c9762e2d/aten/src/ATen/native/cuda/ForeachBinaryOpScalar.cu (L27)
- `torch._foreach_(add|sub).ScalarList`: `BinaryOpScalarListFunctor` b0c9762e2d/aten/src/ATen/native/cuda/ForeachFunctors.cuh (L180-L182), and multi_tensor_apply handles `scalar_t` and computes in `opmath_t` (almost equivalent to `accscalar_t`) b0c9762e2d/aten/src/ATen/native/cuda/MultiTensorApply.cuh (L60-L68). `BinaryOpScalarListFunctor` is used in b0c9762e2d/aten/src/ATen/native/cuda/ForeachBinaryOpScalarList.cu (L24)

cc ngimel ptrblck mcarilli

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60454

Reviewed By: VitalyFedyunin

Differential Revision: D29345035

Pulled By: ngimel

fbshipit-source-id: 5dbafbdfe029a9544ec2e58f17d547928e017a04
2021-06-23 18:59:22 -07:00
af66824c1f [torch][segment_reduce] Add support for sum and min reductions (#60379)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60379

This concludes the support for all reduction types initially planned (min, max, mean, sum).

Next Steps:
- Cleanups
  - update default values when length is 0 and initial is not given
  - templatize the code to avoid branching on every item (and other known improvements)
- more unit tests, verification
- benchmarking

Test Plan: updated unit tests.

Reviewed By: ngimel

Differential Revision: D29268218

fbshipit-source-id: c77d91671e01dcf96c18c758fa3ea522b2e13db9
2021-06-23 18:51:44 -07:00
63219f1f9f To add Rectified Adam Algorithm to Optimizers (#58968)
Summary:
Fixes : https://github.com/pytorch/pytorch/issues/24892

In the paper https://arxiv.org/pdf/1908.03265.pdf, Liyuan Liu et al. suggested a new optimization algorithm in a similar spirit to the Adam algorithm.

The paper discusses how, without a warmup heuristic, adaptive optimization algorithms can exhibit undesirably large variance in the early stage of training, which can slow the overall convergence process.

The authors propose rectifying the variance of the adaptive learning rate when it is expected to be high.

Differing from the paper, we selected the variance tractability cut-off as 5 instead of 4. This adjustment is common practice and can be found in the author's code repository as well as in the TensorFlow Swift optimizer library:

2f03dd1970/radam/radam.py (L156)

f51ee4618d/Sources/TensorFlow/Optimizers/MomentumBased.swift (L638)
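
A minimal usage sketch, assuming the new optimizer is exposed as `torch.optim.RAdam` with the usual optimizer interface:

```python
import torch

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.RAdam(model.parameters(), lr=1e-3, betas=(0.9, 0.999))

optimizer.zero_grad()
loss = model(torch.randn(4, 10)).sum()
loss.backward()
optimizer.step()  # Adam-style step with variance rectification
```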

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58968

Reviewed By: vincentqb

Differential Revision: D29310601

Pulled By: iramazanli

fbshipit-source-id: b7bd487f72f1074f266687fd9c0c6be264a748a9
2021-06-23 18:27:57 -07:00
5a2f41a2db [torch/distributed.elastic] Fix utils.distributed_test.test_create_store_timeout_on_server to be dual-stack ip compatible (#60558)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60558

Fixes 1/2 flaky tests as described in: https://github.com/pytorch/pytorch/issues/60260

`test_create_store_timeout_on_server` tests whether trying to create a `c10d::TCPStore` server on an already taken port actually fails with an `IOError`. Prior to this change the `utils.get_socket_with_port()` util method was used to synthetically reserve a port, then try creating the `TCPStore` on that port to validate the `IOError`. The issue with this is that on a dual stack ip setup, `get_socket_with_port()` (since it uses `socket.AF_UNSPEC`) reserves an ipv6 port, while `TCPStore` will try binding to an ipv4 port, so an `IOError` is not observed.

Changing the logic of the test to create two `TCPStore` servers. The first chooses a free port (by passing `server_port=0`) while the second tries to create a `TCPStore` server on the port that the first store is already running on. This would induce an `IOError` on the second store's constructor.
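
A rough sketch of the new test logic (assuming `TCPStore` exposes its bound port via a `port` attribute, as the description implies):

```python
from datetime import timedelta

import torch.distributed as dist

# The first store binds an OS-assigned free port (server_port=0).
store1 = dist.TCPStore("localhost", 0, 1, True)
try:
    # A second server on the same port must fail, regardless of whether
    # the host prefers ipv4 or ipv6.
    dist.TCPStore("localhost", store1.port, 1, True,
                  timeout=timedelta(seconds=1))
except IOError:
    pass  # expected: the port is already taken
```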

NOTE: this change does not solve another broader issue with `TCPStore` where the server and workers can listen and connect on ipv4 vs ipv6 when they are running on dual-stack ip hosts without an ipv4 DNS entry and/or a `/etc/gai.conf` specifying the preferred bind ordering. See: https://github.com/pytorch/pytorch/pull/49124

Test Plan:
```
buck test //caffe2/test/distributed/elastic/utils:distributed_test
```

Reviewed By: cbalioglu

Differential Revision: D29334947

fbshipit-source-id: 76b998c59082cb04c0e86b7a1f3b509367fa0136
2021-06-23 17:12:18 -07:00
1a0058f593 [nnc] Merge inconsistent profiling information (#60510)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60510

We encountered a situation where loop unrolling caused us to duplicate
profiled tensor types in a manner that wasn't logically consistent (see the
attached test case).  When applying this profiling information, we need to
merge the profiled types so that we use a conservative (unspecialized) type.
ghstack-source-id: 132160002

Test Plan: new unit test, plus local predictor using P424983338

Reviewed By: Krovatkin

Differential Revision: D29322487

fbshipit-source-id: 4c18ee69c71bb0622c2e6f6aa361ab5613cbaca4
2021-06-23 17:05:32 -07:00
b5b42d4ce2 [iOS GPU] Add tests for RoIAlign (#60595)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60595

ghstack-source-id: 132245331

Test Plan: CI

Reviewed By: husthyc

Differential Revision: D29345400

fbshipit-source-id: 7406edee232a0ab7b40a4820e3ff9ac07871cdd4
2021-06-23 16:26:53 -07:00
1120a1b92e [quant][fx][fix] QAT with object_type in qconfig (#60555)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60555

When we do QAT, we swap the FP32 modules with their corresponding quantized module counterparts by calling `qat_swap_modules` in prepare.
However, when we try to look up the swapped module type in qconfig_dict, we can no longer find a match, since the qconfig dict contains the original
module type.

In this PR we update the qconfig_dict to include the modules swapped for QAT.
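
For context, a minimal sketch of the kind of `object_type` entry involved (illustrative; the float type key must keep matching after `nn.Linear` is swapped for its QAT counterpart):

```python
import torch

qconfig = torch.quantization.get_default_qat_qconfig("fbgemm")
qconfig_dict = {
    # Keyed on the original float module type; after the QAT swap the
    # prepared model contains torch.nn.qat.Linear instead.
    "object_type": [(torch.nn.Linear, qconfig)],
}
```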

Test Plan:
python test/test_quantization.py TestQuantizeFx.test_qconfig_qat_module_type

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D29337036

fbshipit-source-id: 60212eec3ee252a2445c1b58874cb36048c9f7dd
2021-06-23 15:55:25 -07:00
d867340c7b [nnc] Add LoopNest::getLoopAt to retrieve a specified inner For-stmt (#60569)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60569

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D29337767

Pulled By: huiguoo

fbshipit-source-id: e3ae23c1b290739c03d1fa5d7da25de878eb1d4c
2021-06-23 15:53:29 -07:00
c0d08dc10f [NNC] Add tile transformation in loopnest (fixed #52785) (#57758)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57758

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D28260744

Pulled By: huiguoo

fbshipit-source-id: 6b5591850aaf46455bf3c2d776fa930654839a63
2021-06-23 15:52:19 -07:00
aeea5bf4a1 [Model Averaging] Provide a util function for model averaging (#60303)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60303

The util function can be used for averaging parameters.

More optimizations can be done in the future.
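
A minimal sketch of what parameter averaging amounts to (illustrative only; the merged util lives under `torch.distributed` and its exact API may differ):

```python
import torch.distributed as dist

def average_parameters(params, group=None):
    # Sum each parameter across ranks, then divide by the world size.
    world_size = dist.get_world_size(group)
    for p in params:
        dist.all_reduce(p.data, op=dist.ReduceOp.SUM, group=group)
        p.data.div_(world_size)
```
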
ghstack-source-id: 132214212

Test Plan:
buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_average_parameters
buck test mode/dev-nosan caffe2/test/distributed:distributed_gloo_fork -- test_average_parameters

Reviewed By: rohan-varma

Differential Revision: D29242806

fbshipit-source-id: 76fb5a92adb4bdc6151a9f411e366a0ed2a31f47
2021-06-23 15:41:15 -07:00
b770c4b61a Fix ZeRO sort to be by numel (#60556)
Summary:
**Overview:**
This is a follow-up to [this PR](https://github.com/pytorch/pytorch/pull/59586) and corrects the ZeRO partitioning algorithm to sort by the number of elements in the tensor rather than the size of the first dimension. As context, that PR was meant to migrate from using a _naive greedy_ algorithm to a _sorted-greedy_ algorithm when partitioning parameters in ZeRO.
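
A minimal sketch of sorted-greedy partitioning under this correction (assumed from the description, not the exact `ZeroRedundancyOptimizer` code):

```python
def partition_parameters(params, world_size):
    sizes = [0] * world_size
    partitions = [[] for _ in range(world_size)]
    # Sort by numel (not by the first dimension), largest first, then
    # greedily assign each tensor to the currently least-loaded rank.
    for p in sorted(params, key=lambda t: t.numel(), reverse=True):
        rank = sizes.index(min(sizes))
        partitions[rank].append(p)
        sizes[rank] += p.numel()
    return partitions
```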

**Updated Results:**
The updated table for the partitions can be found [here](https://github.com/pytorch/pytorch/pull/59410#issuecomment-865203219). There, I also considered a third algorithm (sometimes known as multifit), which is more computationally expensive than the greedy and sorted-greedy algorithms but cannot perform worse. However, because of its increased complexity and lack of improved results, I chose to settle with the simpler sorted-greedy algorithm.

The `step()` latencies show slight improvements, but the improvements may be in the noise. The values below are in seconds and were generated using NCCL backend (unlike in the previous PR which used Gloo):

Two processes:
| Model | Max `optimizer.step()` Time - Greedy (Std.) | Max `optimizer.step()` Time - Sorted-Greedy (Std.) |
| --- | --- | --- |
| ResNet-50 | 0.047 (0.00142) | **0.044 (0.00025)** |
| ResNet-152 | 0.057 (0.00034) | **0.054 (0.00022)** |
| BERT | 0.021 (0.00008) | **0.020 (0.00008)** |

Four processes:
| Model | Max `optimizer.step()` Time - Greedy (Std.) | Max `optimizer.step()` Time - Sorted-Greedy (Std.) |
| --- | --- | --- |
| ResNet-50 | 0.019 (0.00065) | **0.013 (0.00040)** |
| ResNet-152 | 0.045 (0.00024) | 0.045 (0.00025) |
| BERT | 0.019 (0.00022) | **0.018 (0.00016)** |

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60556

Test Plan:
I verified that the ZeRO tests pass (via the AI AWS cluster):
```
srun -p $DEV_QUEUE --cpus-per-task=16 -t 5:00:00 --gpus-per-node=4 python test/distributed/optim/test_zero_redundancy_optimizer.py
```

Reviewed By: VitalyFedyunin

Differential Revision: D29335260

Pulled By: andwgu

fbshipit-source-id: 469d1c6e029b77c1b300a94cd1fd94b633cd28dd
2021-06-23 15:22:36 -07:00
1054ad5af3 Add back smoke tests for windows shard 1 for CircleCI (#60571)
Summary:
The reason I removed the smoke tests here was that we didn't have gflags on our GHA runners and we wanted to get sharding done sooner rather than later.

However, we shouldn't remove these tests for windows as they are important for debugging linker issues with torch. Thus, this is step 1 in adding the tests back.

Next step:
- add gflags to base ami
- remove the exist check

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60571

Test Plan: CI shouldn't break

Reviewed By: walterddr

Differential Revision: D29341850

Pulled By: janeyx99

fbshipit-source-id: 7e0c98887534d096f867e28a5482b32aa493b132
2021-06-23 14:52:14 -07:00
555c154df5 Use asyncio in tools/clang_tidy.py (#60495)
Summary:
This replaces Ninja for parallel builds with asyncio, which is more idiomatic Python and easier to debug when things go wrong, since the data never leaves Python.
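
The core pattern is roughly the following (a sketch, not the actual `tools/clang_tidy.py` code; the command strings are placeholders):

```python
import asyncio

async def run_one(cmd):
    proc = await asyncio.create_subprocess_shell(
        cmd, stdout=asyncio.subprocess.PIPE, stderr=asyncio.subprocess.PIPE)
    stdout, _ = await proc.communicate()
    return proc.returncode, stdout.decode()

async def run_all(commands, max_jobs=8):
    sem = asyncio.Semaphore(max_jobs)  # bound concurrency like ninja -j
    async def bounded(cmd):
        async with sem:
            return await run_one(cmd)
    return await asyncio.gather(*(bounded(c) for c in commands))

results = asyncio.run(run_all(["clang-tidy --version", "clang-tidy --help"]))
```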

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60495

Reviewed By: bhosmer

Differential Revision: D29315526

Pulled By: driazati

fbshipit-source-id: 196b1807fe4ee6db432d5fef146e52f96939b44d
2021-06-23 14:18:03 -07:00
2dedd96dd2 cmake: Prefer CMAKE_CURRENT_SOURCE_DIR to TORCH_SRC_DIR (#60493)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60493

TORCH_SRC_DIR appears to be a bit bugged when it comes to identifying
include directories, so let's try using CMAKE_CURRENT_SOURCE_DIR
instead.

<details>
<summary>Logs for builds with torchaudio</summary>

```
-- Building version 0.10.0a0+9e36281
running bdist_wheel
running build
running build_py
copying torchaudio/version.py -> build/lib.linux-x86_64-3.6/torchaudio
running build_ext
-- Configuring done
-- Generating done
-- Build files have been written to: /home/eliuriegas/work/audio/build/temp.linux-x86_64-3.6
[1/11] /usr/lib64/ccache/c++ -DUSE_C10D_GLOO -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -I../../third_party/kaldi/src -I../../third_party/kaldi/submodule/src -isystem /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/include -isystem /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/include/breakpad -Wall -D_GLIBCXX_USE_CXX11_ABI=1 -fvisibility=hidden -O3 -DNDEBUG -fPIC -D_GLIBCXX_USE_CXX11_ABI=1 -std=gnu++14 -MD -MT third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/base/kaldi-error.cc.o -MF third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/base/kaldi-error.cc.o.d -o third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/base/kaldi-error.cc.o -c ../../third_party/kaldi/submodule/src/base/kaldi-error.cc
[2/11] /usr/lib64/ccache/c++ -DUSE_C10D_GLOO -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -I../../third_party/kaldi/src -I../../third_party/kaldi/submodule/src -isystem /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/include -isystem /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/include/breakpad -Wall -D_GLIBCXX_USE_CXX11_ABI=1 -fvisibility=hidden -O3 -DNDEBUG -fPIC -D_GLIBCXX_USE_CXX11_ABI=1 -std=gnu++14 -MD -MT third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/base/kaldi-math.cc.o -MF third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/base/kaldi-math.cc.o.d -o third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/base/kaldi-math.cc.o -c ../../third_party/kaldi/submodule/src/base/kaldi-math.cc
[3/11] /usr/lib64/ccache/c++ -DUSE_C10D_GLOO -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -I../../third_party/kaldi/src -I../../third_party/kaldi/submodule/src -isystem /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/include -isystem /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/include/breakpad -Wall -D_GLIBCXX_USE_CXX11_ABI=1 -fvisibility=hidden -O3 -DNDEBUG -fPIC -D_GLIBCXX_USE_CXX11_ABI=1 -std=gnu++14 -MD -MT third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/feat/feature-functions.cc.o -MF third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/feat/feature-functions.cc.o.d -o third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/feat/feature-functions.cc.o -c ../../third_party/kaldi/submodule/src/feat/feature-functions.cc
[4/11] /usr/lib64/ccache/c++ -DUSE_C10D_GLOO -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -I../../third_party/kaldi/src -I../../third_party/kaldi/submodule/src -isystem /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/include -isystem /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/include/breakpad -Wall -D_GLIBCXX_USE_CXX11_ABI=1 -fvisibility=hidden -O3 -DNDEBUG -fPIC -D_GLIBCXX_USE_CXX11_ABI=1 -std=gnu++14 -MD -MT third_party/kaldi/CMakeFiles/kaldi.dir/src/matrix/kaldi-matrix.cc.o -MF third_party/kaldi/CMakeFiles/kaldi.dir/src/matrix/kaldi-matrix.cc.o.d -o third_party/kaldi/CMakeFiles/kaldi.dir/src/matrix/kaldi-matrix.cc.o -c ../../third_party/kaldi/src/matrix/kaldi-matrix.cc
[5/11] /usr/lib64/ccache/c++ -DUSE_C10D_GLOO -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -I../../third_party/kaldi/src -I../../third_party/kaldi/submodule/src -isystem /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/include -isystem /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/include/breakpad -Wall -D_GLIBCXX_USE_CXX11_ABI=1 -fvisibility=hidden -O3 -DNDEBUG -fPIC -D_GLIBCXX_USE_CXX11_ABI=1 -std=gnu++14 -MD -MT third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/feat/resample.cc.o -MF third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/feat/resample.cc.o.d -o third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/feat/resample.cc.o -c ../../third_party/kaldi/submodule/src/feat/resample.cc
[6/11] /usr/lib64/ccache/c++ -DUSE_C10D_GLOO -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -I../../third_party/kaldi/src -I../../third_party/kaldi/submodule/src -isystem /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/include -isystem /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/include/breakpad -Wall -D_GLIBCXX_USE_CXX11_ABI=1 -fvisibility=hidden -O3 -DNDEBUG -fPIC -D_GLIBCXX_USE_CXX11_ABI=1 -std=gnu++14 -MD -MT third_party/kaldi/CMakeFiles/kaldi.dir/src/matrix/kaldi-vector.cc.o -MF third_party/kaldi/CMakeFiles/kaldi.dir/src/matrix/kaldi-vector.cc.o.d -o third_party/kaldi/CMakeFiles/kaldi.dir/src/matrix/kaldi-vector.cc.o -c ../../third_party/kaldi/src/matrix/kaldi-vector.cc
[7/11] /usr/lib64/ccache/c++ -DINCLUDE_KALDI -DTORCH_API_INCLUDE_EXTENSION_H -DUSE_C10D_GLOO -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -D_torchaudio_EXPORTS -I../../ -I/tmp/tmp.GKeM3KKcFi/include/python3.6m -I../../third_party/kaldi/src -I../../third_party/kaldi/submodule/src -isystem /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/include -isystem /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/include/breakpad -Wall -D_GLIBCXX_USE_CXX11_ABI=1 -O3 -DNDEBUG -fPIC -D_GLIBCXX_USE_CXX11_ABI=1 -std=gnu++14 -MD -MT torchaudio/csrc/CMakeFiles/_torchaudio.dir/kaldi.cpp.o -MF torchaudio/csrc/CMakeFiles/_torchaudio.dir/kaldi.cpp.o.d -o torchaudio/csrc/CMakeFiles/_torchaudio.dir/kaldi.cpp.o -c ../../torchaudio/csrc/kaldi.cpp
[8/11] /usr/lib64/ccache/c++ -DUSE_C10D_GLOO -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -I../../third_party/kaldi/src -I../../third_party/kaldi/submodule/src -isystem /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/include -isystem /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/include/breakpad -Wall -D_GLIBCXX_USE_CXX11_ABI=1 -fvisibility=hidden -O3 -DNDEBUG -fPIC -D_GLIBCXX_USE_CXX11_ABI=1 -std=gnu++14 -MD -MT third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/feat/pitch-functions.cc.o -MF third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/feat/pitch-functions.cc.o.d -o third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/feat/pitch-functions.cc.o -c ../../third_party/kaldi/submodule/src/feat/pitch-functions.cc
../../third_party/kaldi/submodule/src/feat/pitch-functions.cc: In member function ‘void kaldi::OnlinePitchFeatureImpl::UpdateRemainder(const kaldi::VectorBase<float>&)’:
../../third_party/kaldi/submodule/src/feat/pitch-functions.cc:814:11: warning: unused variable ‘full_frame_length’ [-Wunused-variable]
  814 |     int32 full_frame_length = opts_.NccfWindowSize() + nccf_last_lag_;
      |           ^~~~~~~~~~~~~~~~~
../../third_party/kaldi/submodule/src/feat/pitch-functions.cc: In member function ‘void kaldi::OnlineProcessPitch::UpdateNormalizationStats(kaldi::int32)’:
../../third_party/kaldi/submodule/src/feat/pitch-functions.cc:1504:35: warning: comparison of integer expressions of different signedness: ‘std::vector<kaldi::OnlineProcessPitch::NormalizationStats>::size_type’ {aka ‘long unsigned int’} and ‘kaldi::int32’ {aka ‘int’} [-Wsign-compare]
 1504 |   if (normalization_stats_.size() <= frame)
      |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~
[9/11] : && /usr/bin/cmake -E rm -f third_party/kaldi/libkaldi.a && /usr/bin/ar qc third_party/kaldi/libkaldi.a  third_party/kaldi/CMakeFiles/kaldi.dir/src/matrix/kaldi-vector.cc.o third_party/kaldi/CMakeFiles/kaldi.dir/src/matrix/kaldi-matrix.cc.o third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/base/kaldi-error.cc.o third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/base/kaldi-math.cc.o third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/feat/feature-functions.cc.o third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/feat/pitch-functions.cc.o third_party/kaldi/CMakeFiles/kaldi.dir/submodule/src/feat/resample.cc.o && /usr/bin/ranlib third_party/kaldi/libkaldi.a && :
[10/11] : && /usr/lib64/ccache/c++ -fPIC -Wall -D_GLIBCXX_USE_CXX11_ABI=1 -O3 -DNDEBUG   -shared -Wl,-soname,_torchaudio.so -o torchaudio/csrc/_torchaudio.so torchaudio/csrc/CMakeFiles/_torchaudio.dir/pybind.cpp.o torchaudio/csrc/CMakeFiles/_torchaudio.dir/lfilter.cpp.o torchaudio/csrc/CMakeFiles/_torchaudio.dir/overdrive.cpp.o torchaudio/csrc/CMakeFiles/_torchaudio.dir/utils.cpp.o torchaudio/csrc/CMakeFiles/_torchaudio.dir/kaldi.cpp.o  -Wl,-rpath,/tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/lib:  /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/lib/libc10.so  /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/lib/libtorch_python.so  third_party/kaldi/libkaldi.a  /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/lib/libtorch.so  -Wl,--no-as-needed,"/tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so" -Wl,--as-needed  /usr/local/lib/libbreakpad_client.a  /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/lib/libc10.so  -lpthread  -Wl,--no-as-needed,"/tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/lib/libtorch.so" -Wl,--as-needed  /tmp/tmp.GKeM3KKcFi/lib/python3.6/site-packages/torch/lib/libc10.so && :
[10/11] cd /home/eliuriegas/work/audio/build/temp.linux-x86_64-3.6 && /usr/bin/cmake -P cmake_install.cmake
-- Install configuration: "Release"
-- Installing: /home/eliuriegas/work/audio/build/lib.linux-x86_64-3.6/torchaudio/./_torchaudio.so
-- Set runtime path of "/home/eliuriegas/work/audio/build/lib.linux-x86_64-3.6/torchaudio/./_torchaudio.so" to ""
installing to build/bdist.linux-x86_64/wheel
running install
running install_lib
creating build/bdist.linux-x86_64/wheel
creating build/bdist.linux-x86_64/wheel/torchaudio
copying build/lib.linux-x86_64-3.6/torchaudio/kaldi_io.py -> build/bdist.linux-x86_64/wheel/torchaudio
copying build/lib.linux-x86_64-3.6/torchaudio/transforms.py -> build/bdist.linux-x86_64/wheel/torchaudio
copying build/lib.linux-x86_64-3.6/torchaudio/__init__.py -> build/bdist.linux-x86_64/wheel/torchaudio
creating build/bdist.linux-x86_64/wheel/torchaudio/compliance
copying build/lib.linux-x86_64-3.6/torchaudio/compliance/__init__.py -> build/bdist.linux-x86_64/wheel/torchaudio/compliance
copying build/lib.linux-x86_64-3.6/torchaudio/compliance/kaldi.py -> build/bdist.linux-x86_64/wheel/torchaudio/compliance
creating build/bdist.linux-x86_64/wheel/torchaudio/datasets
copying build/lib.linux-x86_64-3.6/torchaudio/datasets/cmuarctic.py -> build/bdist.linux-x86_64/wheel/torchaudio/datasets
copying build/lib.linux-x86_64-3.6/torchaudio/datasets/librispeech.py -> build/bdist.linux-x86_64/wheel/torchaudio/datasets
copying build/lib.linux-x86_64-3.6/torchaudio/datasets/libritts.py -> build/bdist.linux-x86_64/wheel/torchaudio/datasets
copying build/lib.linux-x86_64-3.6/torchaudio/datasets/vctk.py -> build/bdist.linux-x86_64/wheel/torchaudio/datasets
copying build/lib.linux-x86_64-3.6/torchaudio/datasets/__init__.py -> build/bdist.linux-x86_64/wheel/torchaudio/datasets
copying build/lib.linux-x86_64-3.6/torchaudio/datasets/commonvoice.py -> build/bdist.linux-x86_64/wheel/torchaudio/datasets
copying build/lib.linux-x86_64-3.6/torchaudio/datasets/gtzan.py -> build/bdist.linux-x86_64/wheel/torchaudio/datasets
copying build/lib.linux-x86_64-3.6/torchaudio/datasets/ljspeech.py -> build/bdist.linux-x86_64/wheel/torchaudio/datasets
copying build/lib.linux-x86_64-3.6/torchaudio/datasets/speechcommands.py -> build/bdist.linux-x86_64/wheel/torchaudio/datasets
copying build/lib.linux-x86_64-3.6/torchaudio/datasets/tedlium.py -> build/bdist.linux-x86_64/wheel/torchaudio/datasets
copying build/lib.linux-x86_64-3.6/torchaudio/datasets/utils.py -> build/bdist.linux-x86_64/wheel/torchaudio/datasets
copying build/lib.linux-x86_64-3.6/torchaudio/datasets/yesno.py -> build/bdist.linux-x86_64/wheel/torchaudio/datasets
creating build/bdist.linux-x86_64/wheel/torchaudio/_internal
copying build/lib.linux-x86_64-3.6/torchaudio/_internal/__init__.py -> build/bdist.linux-x86_64/wheel/torchaudio/_internal
copying build/lib.linux-x86_64-3.6/torchaudio/_internal/fft.py -> build/bdist.linux-x86_64/wheel/torchaudio/_internal
copying build/lib.linux-x86_64-3.6/torchaudio/_internal/module_utils.py -> build/bdist.linux-x86_64/wheel/torchaudio/_internal
creating build/bdist.linux-x86_64/wheel/torchaudio/backend
copying build/lib.linux-x86_64-3.6/torchaudio/backend/__init__.py -> build/bdist.linux-x86_64/wheel/torchaudio/backend
copying build/lib.linux-x86_64-3.6/torchaudio/backend/common.py -> build/bdist.linux-x86_64/wheel/torchaudio/backend
copying build/lib.linux-x86_64-3.6/torchaudio/backend/no_backend.py -> build/bdist.linux-x86_64/wheel/torchaudio/backend
copying build/lib.linux-x86_64-3.6/torchaudio/backend/soundfile_backend.py -> build/bdist.linux-x86_64/wheel/torchaudio/backend
copying build/lib.linux-x86_64-3.6/torchaudio/backend/sox_io_backend.py -> build/bdist.linux-x86_64/wheel/torchaudio/backend
copying build/lib.linux-x86_64-3.6/torchaudio/backend/utils.py -> build/bdist.linux-x86_64/wheel/torchaudio/backend
creating build/bdist.linux-x86_64/wheel/torchaudio/extension
copying build/lib.linux-x86_64-3.6/torchaudio/extension/__init__.py -> build/bdist.linux-x86_64/wheel/torchaudio/extension
copying build/lib.linux-x86_64-3.6/torchaudio/extension/extension.py -> build/bdist.linux-x86_64/wheel/torchaudio/extension
creating build/bdist.linux-x86_64/wheel/torchaudio/models
copying build/lib.linux-x86_64-3.6/torchaudio/models/__init__.py -> build/bdist.linux-x86_64/wheel/torchaudio/models
copying build/lib.linux-x86_64-3.6/torchaudio/models/conv_tasnet.py -> build/bdist.linux-x86_64/wheel/torchaudio/models
copying build/lib.linux-x86_64-3.6/torchaudio/models/deepspeech.py -> build/bdist.linux-x86_64/wheel/torchaudio/models
copying build/lib.linux-x86_64-3.6/torchaudio/models/wav2letter.py -> build/bdist.linux-x86_64/wheel/torchaudio/models
copying build/lib.linux-x86_64-3.6/torchaudio/models/wavernn.py -> build/bdist.linux-x86_64/wheel/torchaudio/models
creating build/bdist.linux-x86_64/wheel/torchaudio/models/wav2vec2
copying build/lib.linux-x86_64-3.6/torchaudio/models/wav2vec2/__init__.py -> build/bdist.linux-x86_64/wheel/torchaudio/models/wav2vec2
copying build/lib.linux-x86_64-3.6/torchaudio/models/wav2vec2/components.py -> build/bdist.linux-x86_64/wheel/torchaudio/models/wav2vec2
copying build/lib.linux-x86_64-3.6/torchaudio/models/wav2vec2/model.py -> build/bdist.linux-x86_64/wheel/torchaudio/models/wav2vec2
creating build/bdist.linux-x86_64/wheel/torchaudio/models/wav2vec2/utils
copying build/lib.linux-x86_64-3.6/torchaudio/models/wav2vec2/utils/__init__.py -> build/bdist.linux-x86_64/wheel/torchaudio/models/wav2vec2/utils
copying build/lib.linux-x86_64-3.6/torchaudio/models/wav2vec2/utils/import_fairseq.py -> build/bdist.linux-x86_64/wheel/torchaudio/models/wav2vec2/utils
copying build/lib.linux-x86_64-3.6/torchaudio/models/wav2vec2/utils/import_huggingface.py -> build/bdist.linux-x86_64/wheel/torchaudio/models/wav2vec2/utils
creating build/bdist.linux-x86_64/wheel/torchaudio/sox_effects
copying build/lib.linux-x86_64-3.6/torchaudio/sox_effects/__init__.py -> build/bdist.linux-x86_64/wheel/torchaudio/sox_effects
copying build/lib.linux-x86_64-3.6/torchaudio/sox_effects/sox_effects.py -> build/bdist.linux-x86_64/wheel/torchaudio/sox_effects
creating build/bdist.linux-x86_64/wheel/torchaudio/utils
copying build/lib.linux-x86_64-3.6/torchaudio/utils/__init__.py -> build/bdist.linux-x86_64/wheel/torchaudio/utils
copying build/lib.linux-x86_64-3.6/torchaudio/utils/sox_utils.py -> build/bdist.linux-x86_64/wheel/torchaudio/utils
creating build/bdist.linux-x86_64/wheel/torchaudio/functional
copying build/lib.linux-x86_64-3.6/torchaudio/functional/__init__.py -> build/bdist.linux-x86_64/wheel/torchaudio/functional
copying build/lib.linux-x86_64-3.6/torchaudio/functional/filtering.py -> build/bdist.linux-x86_64/wheel/torchaudio/functional
copying build/lib.linux-x86_64-3.6/torchaudio/functional/functional.py -> build/bdist.linux-x86_64/wheel/torchaudio/functional
creating build/bdist.linux-x86_64/wheel/torchaudio/prototype
copying build/lib.linux-x86_64-3.6/torchaudio/prototype/__init__.py -> build/bdist.linux-x86_64/wheel/torchaudio/prototype
copying build/lib.linux-x86_64-3.6/torchaudio/prototype/rnnt_loss.py -> build/bdist.linux-x86_64/wheel/torchaudio/prototype
copying build/lib.linux-x86_64-3.6/torchaudio/version.py -> build/bdist.linux-x86_64/wheel/torchaudio
copying build/lib.linux-x86_64-3.6/torchaudio/_torchaudio.so -> build/bdist.linux-x86_64/wheel/torchaudio
running install_egg_info
running egg_info
writing torchaudio.egg-info/PKG-INFO
writing dependency_links to torchaudio.egg-info/dependency_links.txt
writing requirements to torchaudio.egg-info/requires.txt
writing top-level names to torchaudio.egg-info/top_level.txt
reading manifest file 'torchaudio.egg-info/SOURCES.txt'
writing manifest file 'torchaudio.egg-info/SOURCES.txt'
Copying torchaudio.egg-info to build/bdist.linux-x86_64/wheel/torchaudio-0.10.0a0+9e36281-py3.6.egg-info
running install_scripts
adding license file "LICENSE" (matched pattern "LICEN[CS]E*")
creating build/bdist.linux-x86_64/wheel/torchaudio-0.10.0a0+9e36281.dist-info/WHEEL
creating 'dist/torchaudio-0.10.0a0+9e36281-cp36-cp36m-linux_x86_64.whl' and adding 'build/bdist.linux-x86_64/wheel' to it
adding 'torchaudio/__init__.py'
adding 'torchaudio/_torchaudio.so'
adding 'torchaudio/kaldi_io.py'
adding 'torchaudio/transforms.py'
adding 'torchaudio/version.py'
adding 'torchaudio/_internal/__init__.py'
adding 'torchaudio/_internal/fft.py'
adding 'torchaudio/_internal/module_utils.py'
adding 'torchaudio/backend/__init__.py'
adding 'torchaudio/backend/common.py'
adding 'torchaudio/backend/no_backend.py'
adding 'torchaudio/backend/soundfile_backend.py'
adding 'torchaudio/backend/sox_io_backend.py'
adding 'torchaudio/backend/utils.py'
adding 'torchaudio/compliance/__init__.py'
adding 'torchaudio/compliance/kaldi.py'
adding 'torchaudio/datasets/__init__.py'
adding 'torchaudio/datasets/cmuarctic.py'
adding 'torchaudio/datasets/commonvoice.py'
adding 'torchaudio/datasets/gtzan.py'
adding 'torchaudio/datasets/librispeech.py'
adding 'torchaudio/datasets/libritts.py'
adding 'torchaudio/datasets/ljspeech.py'
adding 'torchaudio/datasets/speechcommands.py'
adding 'torchaudio/datasets/tedlium.py'
adding 'torchaudio/datasets/utils.py'
adding 'torchaudio/datasets/vctk.py'
adding 'torchaudio/datasets/yesno.py'
adding 'torchaudio/extension/__init__.py'
adding 'torchaudio/extension/extension.py'
adding 'torchaudio/functional/__init__.py'
adding 'torchaudio/functional/filtering.py'
adding 'torchaudio/functional/functional.py'
adding 'torchaudio/models/__init__.py'
adding 'torchaudio/models/conv_tasnet.py'
adding 'torchaudio/models/deepspeech.py'
adding 'torchaudio/models/wav2letter.py'
adding 'torchaudio/models/wavernn.py'
adding 'torchaudio/models/wav2vec2/__init__.py'
adding 'torchaudio/models/wav2vec2/components.py'
adding 'torchaudio/models/wav2vec2/model.py'
adding 'torchaudio/models/wav2vec2/utils/__init__.py'
adding 'torchaudio/models/wav2vec2/utils/import_fairseq.py'
adding 'torchaudio/models/wav2vec2/utils/import_huggingface.py'
adding 'torchaudio/prototype/__init__.py'
adding 'torchaudio/prototype/rnnt_loss.py'
adding 'torchaudio/sox_effects/__init__.py'
adding 'torchaudio/sox_effects/sox_effects.py'
adding 'torchaudio/utils/__init__.py'
adding 'torchaudio/utils/sox_utils.py'
adding 'torchaudio-0.10.0a0+9e36281.dist-info/LICENSE'
adding 'torchaudio-0.10.0a0+9e36281.dist-info/METADATA'
adding 'torchaudio-0.10.0a0+9e36281.dist-info/WHEEL'
adding 'torchaudio-0.10.0a0+9e36281.dist-info/top_level.txt'
adding 'torchaudio-0.10.0a0+9e36281.dist-info/RECORD'
removing build/bdist.linux-x86_64/wheel

```

</details>

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: walterddr

Differential Revision: D29316372

Pulled By: seemethere

fbshipit-source-id: 02be64df6197c0d4bad5a5bfb3cef336c11f53ed
2021-06-23 14:08:19 -07:00
ad1041576a Fix loop types (#60504)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60504

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D29313197

fbshipit-source-id: bc86622b587e4fdb73431c2ff27300404c9693ae
2021-06-23 13:26:22 -07:00
da030c59e7 ENH Adds Byte support for nll_loss (CPU) (#60308)
Summary:
Addresses a part of https://github.com/pytorch/pytorch/issues/59765

This PR adds byte support for nll_loss on the CPU for `input.dim() == 2`.

CUDA support will be implemented when `nll_loss` migration to CUDA is completed in https://github.com/pytorch/pytorch/pull/60299 and https://github.com/pytorch/pytorch/pull/60097
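
Assuming "byte support" refers to `uint8` (Byte) class-index targets, usage looks like this (a sketch, CPU only):

```python
import torch
import torch.nn.functional as F

x = torch.log_softmax(torch.randn(4, 3), dim=1)          # input.dim() == 2
target = torch.tensor([0, 2, 1, 0], dtype=torch.uint8)   # Byte targets
loss = F.nll_loss(x, target)
```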

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60308

Reviewed By: VitalyFedyunin

Differential Revision: D29329458

Pulled By: jbschlosser

fbshipit-source-id: d3585c4966030bc61e451f8aa817406a8a3acf47
2021-06-23 12:16:45 -07:00
7bf195f360 fix kernel launch check in cross kernel
Summary: per title

Test Plan: buck test mode/opt //caffe2/test:kernel_launch_checks -- --exact 'caffe2/test:kernel_launch_checks - test_check_cuda_launches (test_kernel_launch_checks.AlwaysCheckCudaLaunchTest)' --run-disabled

Reviewed By: r-barnes

Differential Revision: D29335739

fbshipit-source-id: 385c66b1806886deba35f7fd83e29e0885999119
2021-06-23 11:47:50 -07:00
308d238377 add SequenceMask op (#60235)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60235

This diff
- added SequenceMask op in Dper3 (caffe2 & pytorch)
- added shape inference functions for SequenceMask op

Test Plan:
```
buck test dper3/dper3/modules/low_level_modules/tests:single_operators_test -- test_sequence_mask
```

Differential Revision: D29210097

fbshipit-source-id: cab3460e0fd6c49bec6d0c5c624bd4652de7604b
2021-06-23 11:33:00 -07:00
e60f9cfc58 Revert D29135358: [quant] Input-Weight Equaliaztion - convert modifications
Test Plan: revert-hammer

Differential Revision:
D29135358 (3de79b7757)

Original commit changeset: 2d0005672904

fbshipit-source-id: cac30c1202ebbce4f22e50ed920340c7b4c6849f
2021-06-23 11:23:24 -07:00
03ab5b72c9 Fix parallel tbb build (#60532)
Summary:
Small typo in https://github.com/pytorch/pytorch/issues/60183

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60532

Reviewed By: walterddr

Differential Revision: D29336173

Pulled By: ngimel

fbshipit-source-id: 57d753f21d484bbae26a23cb3eb35e497e25118a
2021-06-23 11:16:36 -07:00
bea83e2e46 Add NoChunk wrapper for pipeline args. (#57325)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57325

As per the design outlined in
https://github.com/pytorch/pytorch/issues/53952, adding a `NoChunk` wrapper for
pipeline parallelism inputs.

If a Tensor is wrapped with this wrapper, the pipeline implementation does not
split this Tensor across micro-batches and instead just replicates this tensor
as-is, similar to non-tensor arguments.
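
A rough usage sketch (import path and setup assumed; a real `Pipe` additionally needs RPC initialization and device placement):

```python
import torch
from torch.distributed.pipeline.sync import Pipe, NoChunk

x = torch.randn(16, 10)         # split into micro-batches along dim 0
mask = NoChunk(torch.ones(10))  # replicated as-is to every micro-batch

# model = Pipe(torch.nn.Sequential(...), chunks=4)
# output = model(x, mask)
```
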
ghstack-source-id: 132009305

Test Plan:
1) unit tests.
2) waitforbuildbot.

Reviewed By: SciPioneer

Differential Revision: D28109277

fbshipit-source-id: ee78c814c715d207d2796aba40b756a8e1834898
2021-06-23 11:13:14 -07:00
6385621003 Use JOB_BASE_NAME throughout code--consolidate CIRCLE_JOB (#60425)
Summary:
This PR is a first step in unifying our environment variables across CI (so that we don't have `CIRCLE_BLAH` in our GHA workflows, for example), though I'd like for this PR to be more for discussion about how best to consolidate these variables.

This small change converts most CIRCLE_JOB references in our code to JOB_BASE_NAME, as that seems the closest GHA (and ROCm) equivalent. Currently, JOB_BASE_NAME is defined as:
- in Circle: CIRCLE_JOB (name of the job, like `pytorch_linux_bionic_py3_8_gcc9_coverage_test1`)
- in GHA: the build_environment with a `-build` or `-test` tacked to the end , e.g., `pytorch-linux-xenial-cuda10.2-cudnn7-py3.6-gcc7-test`
- in ROCm: I don't actually know, but it's important for ROCm test sharding as shown in https://github.com/pytorch/pytorch/pull/60409

I am not sure if this is the intention for JOB_BASE_NAME so it is open to discussion what variable we should use if not JOB_BASE_NAME. I also don't know if it's worth the effort consolidating all these variables, so discussion is also highly encouraged there!

Next steps:
- Consolidate more CIRCLE_* references, maybe into CI_* equivalents?
- We use BUILD_ENVIRONMENT everywhere in Circle though the variable is inconsistent across binary vs CI jobs and across platforms. For example, for linux tests and builds, BUILD_ENVIRONMENT contains the `_test` and `_build` suffixes, but the windows jobs don't. In GHA, BUILD_ENVIRONMENT is similar to how it's defined in windows jobs on Circle. This inconsistency is confusing, and we can probably do something about it. I'm thinking of switching out BUILD_ENVIRONMENT for JOB_BASE_NAME in our test scripts where we actually mean JOB_BASE_NAME.
- We should probably document the meaning of the variables we consolidate somewhere, preferably in a README in some unified `ci/` folder. For example, it seems BUILD_ENVIRONMENT is supposed to capture the build environment, whereas JOB_BASE_NAME is supposed to capture the environment _and_ whether we're building or testing.

Notes:
- I did not replace CIRCLE_JOB references in third_party directories
- Previously, print_test_stats reported CIRCLE_JOB as only the build environment for GHA workflows, and I think tacking on the `build` or `test` will not harm anything, though I may be wrong.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60425

Reviewed By: seemethere, samestep

Differential Revision: D29333882

Pulled By: janeyx99

fbshipit-source-id: a82080e6205a03a1183035011ce59698eca06748
2021-06-23 11:11:21 -07:00
ff3678eec2 Disable group group backend rpc tests from running on CI (#60407)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60407

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D29278179

Pulled By: H-Huang

fbshipit-source-id: ee78085eeb04d81842c95236b8c3a33de7142a3a
2021-06-23 10:58:31 -07:00
109f831409 Support non-Tensor args in the Pipe API (#57226)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57226

As per the design outlined in
https://github.com/pytorch/pytorch/issues/53952, this PR adds support for
non-Tensor args in the pipeline.

The `NoChunk` wrapper hasn't been implemented yet and will be implemented in a
follow up PR.
ghstack-source-id: 132008356

Test Plan:
1) unit tests
2) waitforbuildbot

Reviewed By: SciPioneer

Differential Revision: D28083564

fbshipit-source-id: 5f09da238eec0167feff76fe98916dedb0a9ae4e
2021-06-23 10:53:37 -07:00
10e11dbdcd Reland D29190420: [nnc][tests] Tests and benchmarks for computeSum (#60550)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60550

Original commit changeset: ed655497a981

Whatever gcc version OSS Bazel uses wasn't happy move-constructing the
SimpleIREvaluator, so use a unique_ptr instead.

Test Plan:
CI.  Hope that the gcc version used by OSS Bazel build is
happier with this (it should be), since actually testing it locally is
an intractable pain.

Reviewed By: navahgar

Differential Revision: D29333116

fbshipit-source-id: c3e4b5d8c91eb96a43ae5315a01ca0c0f4d4a99d
2021-06-23 10:50:03 -07:00
5fd45b8089 Port any kernel to structured kernels. (#60361)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60361

Tracking issue: #55070

This PR was opened to solve the CI failures in main when merging: #59371 #59372 #59373 #59937 #59938.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D29265859

Pulled By: ezyang

fbshipit-source-id: 0cca0431569f38a168473b5cc572ced473799961
2021-06-23 10:44:24 -07:00
a5aa940f5e Port all kernel to structured kernels. (#60360)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60360

Tracking issue: #55070

This PR was opened to solve the CI failures in main when merging: #59371 #59372 #59373 #59937 #59938.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D29265856

Pulled By: ezyang

fbshipit-source-id: 6e9b45ad3fc3852bb142ae2e3d58fc5d0a911aed
2021-06-23 10:43:25 -07:00
7b2d375148 Fix convolution_depthwise3x3_winograd for multichannel output (#60460)
Summary:
Before this change it was implemented with the assumption that the number of groups, input channels, and output channels are all the same, which is not always the case.
This change extends the implementation to support any number of output channels, as long as the number of groups equals the number of input channels (i.e. kernel.size(1) == 1).

Fixes https://github.com/pytorch/pytorch/issues/60176
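
For instance, a depthwise convolution with a channel multiplier (here 2) exercises the fixed case:

```python
import torch

# groups == in_channels == 8 but out_channels == 16, so kernel.size(1) == 1
# while the old groups == in_channels == out_channels assumption is violated.
conv = torch.nn.Conv2d(8, 16, kernel_size=3, padding=1, groups=8)
y = conv(torch.randn(1, 8, 32, 32))
```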

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60460

Reviewed By: albanD

Differential Revision: D29299693

Pulled By: malfet

fbshipit-source-id: 31130c71ce86535ccfba2f4929eee3e2e287b2f0
2021-06-23 10:38:14 -07:00
c63a0d0cfe Adding windows CUDA smoke tests on PRs (#59686)
Summary:
Adding windows CUDA smoke tests on PRs (master should run the full suite).

Next step:
- Automate data update so we get a new smoke test list without manual effort

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59686

Test Plan: https://github.com/pytorch/pytorch/actions/runs/958296267 The sharded smoke tests still take long because of dependency installation

Reviewed By: walterddr

Differential Revision: D29243533

Pulled By: janeyx99

fbshipit-source-id: dde7ba127fa15c95bda0e833cc5311598fb85e2b
2021-06-23 10:13:50 -07:00
8162439cbd [DDP] Remove python GradBucket construction (#60301)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60301

`GradBucket` is not meant to be constructed by Python users; it is only
consumed as part of a communication hook.
ghstack-source-id: 131860243

Test Plan: CI

Reviewed By: SciPioneer

Differential Revision: D29239320

fbshipit-source-id: f1631a16e7d66b7e4a9b4a44698e2319005d10b2
2021-06-23 10:05:34 -07:00
e8690dacb2 To add Nesterov Adam Algorithm to Optimizers (#59009)
Summary:
Fixes : https://github.com/pytorch/pytorch/issues/5804

In the paper https://openreview.net/forum?id=OM0jvwB8jIp57ZJjtNEZ, Timothy Dozat suggested a new optimization algorithm that is essentially a combination of the NAG and Adam algorithms.

It is known that momentum in optimization algorithms can be improved with Nesterov acceleration, and Dozat investigates applying this idea to the momentum component of the Adam algorithm. The author provides experimental evidence of the idea's effectiveness in their work.

In this PR we implement the NAdam algorithm proposed in the paper. In preliminary work, http://cs229.stanford.edu/proj2015/054_report.pdf, the author shows that the decay base constant should be taken as 0.96; we follow the same convention here, as Keras does. Implementation and coding practice also follow Keras in some other places:

f9d3868495/tensorflow/python/keras/optimizer_v2/nadam.py
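
A minimal usage sketch, assuming the optimizer is exposed as `torch.optim.NAdam` with a `momentum_decay` parameter controlling the 0.96-based schedule mentioned above:

```python
import torch

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.NAdam(model.parameters(), lr=2e-3,
                              betas=(0.9, 0.999), momentum_decay=4e-3)

optimizer.zero_grad()
model(torch.randn(4, 10)).sum().backward()
optimizer.step()
```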

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59009

Reviewed By: gchanan, vincentqb

Differential Revision: D29220375

Pulled By: iramazanli

fbshipit-source-id: 4b4bb4b15f7e16f7527f368bbf4207ed345751aa
2021-06-23 08:21:43 -07:00
a2525b035c Remove unused sample input argument from functions to resolve issue #55737 (#60486)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60486

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D29311875

Pulled By: NivekT

fbshipit-source-id: 4bf451c4f8e78290398e0514860a14a335a51fa7
2021-06-23 08:02:04 -07:00
265f0e5321 Add device runtime API for the plug-in to register platform python module into torch (#59857)
Summary:
## Motivation
Allow out-of-tree PyTorch plug-ins for device types other than CUDA to add their runtime interface to the `torch` module. The runtime interface of a device can then be referred to by its device type name in the `torch` module, e.g., `torch.cuda` or `torch.xpu`.

## Solution
- Add a registration interface for a plug-in to add its platform Python module into the `torch` module under the device type name. E.g., `torch.xpu` can be used to refer to the XPU runtime interface after the XPU runtime module is registered with `torch._register_device_module('xpu', xpu_module)` in Intel's XPU plug-in.
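
A minimal sketch of the registration flow; `xpu_module` here is a stand-in for a real plug-in's runtime module:

```python
import types

import torch

xpu_module = types.ModuleType("xpu")
xpu_module.is_available = lambda: False  # placeholder runtime API

torch._register_device_module("xpu", xpu_module)
print(torch.xpu.is_available())  # the plug-in is now reachable as torch.xpu
```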

## Additional Context
More details about runtime has been discussed in https://github.com/pytorch/pytorch/issues/53707.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59857

Reviewed By: mrshenli

Differential Revision: D29309320

Pulled By: ezyang

fbshipit-source-id: b9802a5f937ddef9e0bdaf2f7692dfe463912fbe
2021-06-23 07:54:45 -07:00
c97d4d5a34 Fix test failures with some glibc libraries (#60450)
Summary:
Large complex values lead to nan/inf results when using some glibc
implementations of atanh/acos
- Skip test_reference_numerics_hard instead of "normal"
- Test the edge values only for cdouble where the stdlib/glibc implementations support those large values

Fixes https://github.com/pytorch/pytorch/issues/60259

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60450

Reviewed By: mrshenli

Differential Revision: D29304834

Pulled By: ezyang

fbshipit-source-id: d6b97456847c5573b9d2cb447bfc62abba73cb2a
2021-06-23 07:49:27 -07:00
f0e4e4be72 Clean Up ZeRO (#60285)
Summary:
**Overview:**
Being relatively new to PyTorch and ZeRO, I found parts of the code slightly hard to follow. This change strives to clean up the `ZeroRedundancyOptimizer` code in `zero_redundancy_optimizer.py` by reorganizing some computations, making variable names more explicit and consistent, and unifying terminology in the documentation. The goal is for the code to be easier to extend afterwards.

**Changes:**
1) `state_dict()`: The [logic](85517a2b70/torch/distributed/optim/zero_redundancy_optimizer.py (L510)) for updating the global `state_dict` with each rank's local `state_dict` is simplified and made more explicit. Notably, the `dict` [`local_index_to_param_id`](85517a2b70/torch/distributed/optim/zero_redundancy_optimizer.py (L513)) is unneeded. It maps `local_pg["params"][i]` to `id(global_pg["params"][i])`, so it is equivalent to make a single pass over both lists in tandem, effectively iterating over `i`, without a need for the explicit `dict`.
2) `_update_trainable()`: The function [initializes](85517a2b70/torch/distributed/optim/zero_redundancy_optimizer.py (L597)) the local optimizer if it does not exist. I am unaware of any reason for the local optimizer to be destroyed after initialization, so I moved that logic to its own function `_init_local_optimizer()`, which is called once in the constructor.
After [discussion](https://github.com/pytorch/pytorch/pull/60285#discussion_r654706728), I removed the function `_update_trainable()` itself in favor of adding a check for `parameters_as_bucket_view` in `build_param_buckets()` directly.
3) `rank_local_state_dict()`: This [function](85517a2b70/torch/distributed/optim/zero_redundancy_optimizer.py (L528)) is currently broken. It appears to be legacy and relies on the input `state_dict` to have the key `"partitions"`. For now, I have removed it and added an [issue](https://github.com/pytorch/pytorch/issues/60284). Is it a notable use case to want to access another rank's `state_dict` in particular (as opposed to consolidating the entire state and then accessing)?
4) `local_state_dict():` After [discussion](https://github.com/pytorch/pytorch/pull/60285#discussion_r655571043), I removed the function.
5) `partition_parameters()`: After [discussion](https://github.com/pytorch/pytorch/pull/60285#discussion_r654708183), I renamed the function to `_partition_parameters()` to mark it as private.
6) `_param_to_index`: After [discussion](https://github.com/pytorch/pytorch/pull/60285#discussion_r654828100), I changed the key to be the parameter itself rather than its integer ID.
7) `buckets`: I renamed the data structure to `_buckets` to mark it as private.
8) Terminology: I tried to reduce the set of terms being used instead of juggling a number of synonyms. In particular, I made an effort to distinguish between "local" and "global" and to make names more indicative of typing.
9) Style: Per the [PyTorch contributing guide](https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md#writing-documentation), I made all docstrings abide by the 80 character limit, except for the one [line](554891f6fa/torch/distributed/optim/zero_redundancy_optimizer.py (L142)) showing the example ZeRO usage. Some code lines violate the limit for readability. Also, I unified some of the minor stylistic usages out of habit.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60285

Test Plan:
The test suite passes as expected (on the AI AWS cluster):
```
gpurun python test/distributed/optim/test_zero_redundancy_optimizer.py
```
I visually inspected the generated HTML doc (as generated following [this](https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md#writing-documentation)).

Reviewed By: mrshenli

Differential Revision: D29320726

Pulled By: andwgu

fbshipit-source-id: 23f69a19ecc5e877a38fe1df0da11329428311dd
2021-06-23 07:21:40 -07:00
56481f9762 Ensure proper syncs for out-of-place grad creation (torch.autograd.grad) when backward ops run on side streams (#60127)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/59844.

Streaming backwards collects "leaf streams" for AccumulateGrad functions that stash or accumulate .grad attributes for autograd leaf tensors, and syncs those streams with some ambient stream(s) so later ops can safely consume the grads on the ambient stream(s).

But, currently, streaming backwards does not collect leaf streams for grads produced out-of-place (ie, not stashed onto a .grad attribute) by `torch.autograd.grad`, because these out-of-place grads are "captured" and returned before they reach an AccumulateGrad function. Some out-of-place grads might not even have an AccumulateGrad function to go to, because `torch.autograd.grad` can be told to make grads for non-leaf temporaries.[1]

The upshot is, when streaming backwards makes ops that produce out-of-place gradients run on side streams, no ambient stream is told to sync on these side streams, so `torch.autograd.grad` doesn't offer the same post-call safe-use guarantees for grads as the leaf accumulation of `torch.autograd.backward`.

This PR ensures `torch.autograd.grad` gives the same safe-use guarantees as `torch.autograd.backward` by also stashing leaf streams for grads created out-of-place.

I augmented a streaming backwards test to include a torch.autograd.grad attempt. The test fails on current master[2] and passes with the engine.cpp diffs.

I have no idea if this bug or its fix matter to distributed autograd. pritamdamania mrshenli should take a look before it's merged.

[1] example:
```python
leaf = torch.tensor(..., requires_grad=True)
tmp = leaf * 2
loss = tmp.sum()
torch.autograd.grad(loss, inputs=(tmp, leaf))
```
Technically, because `torch.autograd.grad` can be told to produce grads for non-leaf temporaries, these streams might NOT be "leaf streams". Maybe I should rename `leaf_streams`?

[2] the way the test currently fails is fun: it reports
```
AssertionError: False is not true : Tensors failed to compare as equal!With rtol=1.3e-06 and atol=1e-05, found 0 element(s) (out of 25) whose difference(s) exceeded the margin of error (including 0 nan comparisons). The greatest difference was 0.0 (5.0 vs. 5.0), which occurred at index (0, 0).
```
I suspect this [kafka trap](https://en.wiktionary.org/wiki/Kafkatrap) happens because assertEqual does a comparison test on the device, syncs on some bool result, sees failure, and prints the tensors post-sync, at which point it IS safe to access the values.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60127

Reviewed By: mrshenli

Differential Revision: D29276581

Pulled By: albanD

fbshipit-source-id: a9f797e2fd76e2f884cce5a32ecf5d9b704c88ee
2021-06-23 07:14:01 -07:00
b14f19b6fe Revert D29190420: [nnc][tests] Tests and benchmarks for computeSum
Test Plan: revert-hammer

Differential Revision:
D29190420 (21479ad20c)

Original commit changeset: 86246df82098

fbshipit-source-id: ed655497a981783da4c8f13e2d7fec104e3cb184
2021-06-23 06:59:37 -07:00
90cd57ee16 To add edge_order=2 and documentation for gradient operator (#58165)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/56036
Fixes https://github.com/pytorch/pytorch/issues/56130

* All interior points are computed with the second-order-accurate central differences method for the gradient operator. However, we currently only have a first-order method for the edge points. In this PR we add second-order methods for the edge points as well.

* Currently, there is no detailed description of how the gradient operator is computed with the second-order method, or of how to use its parameters correctly. We add a detailed explanation of the meaning of each parameter and of the gradient operator's return value, along with a description of the second-order computation.
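
For example, with `edge_order=2` the one-sided edge stencils are exact for quadratics (a small sketch using `torch.gradient`):

```python
import torch

t = torch.arange(5, dtype=torch.float64)
(grad,) = torch.gradient(t ** 2, spacing=1.0, edge_order=2)
# d(t^2)/dt = 2t, now recovered exactly at the edges as well:
# grad == tensor([0., 2., 4., 6., 8.], dtype=torch.float64)
```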

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58165

Reviewed By: mruberry

Differential Revision: D29305321

Pulled By: iramazanli

fbshipit-source-id: 0e0e418eed801c8510b8babe2ad3d064479fb4d6
2021-06-23 03:35:15 -07:00
7ed07e2a7d [NormalizeArgs] Retain node.meta (#60449)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60449

After normalizing args, still retain each node's `meta`

Test Plan: Added unit test.

Reviewed By: gcatron

Differential Revision: D29293179

fbshipit-source-id: 432b409790041fa4d6e759f7b46a8bee363497b0
2021-06-23 03:31:53 -07:00
66452e0a8c Ensure num_threads is initialized before calling omp_get_max_threads (#60185)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60185

`get_num_threads` is usually called before `parallel_for`, so there's no
guarantee we've initialized `num_threads` properly.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D29287814

Pulled By: ngimel

fbshipit-source-id: 7e9c86fc32d63889a57a9b1d2b7d8f3863481dce
2021-06-23 01:18:24 -07:00
19553438ed OpenMP: Refactor parallel_reduce to share code with parallel_for (#60184)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60184

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D29287817

Pulled By: ngimel

fbshipit-source-id: 734a33a8d965208662989e2497b345b68c132498
2021-06-23 01:18:22 -07:00
c75714e594 Ensure thread id is valid in nested parallel regions (#60183)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60183

Fixes https://github.com/pytorch/pytorch/pull/59149#issuecomment-863287331

`parallel_for` will call the function directly if it would have run on only a
single thread anyway. This is great for performance, but causes an issue in
nested parallel regions because `get_thread_num` will reflect the parent
parallel region instead of the current `parallel_for` call.

I fix this by using a `thread_local` variable for the current thread id and
manually setting it before each call to the user-provided function.

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D29287816

Pulled By: ngimel

fbshipit-source-id: 777f771a0900750c7f22eb1dd185d84d19282108
2021-06-23 01:17:09 -07:00
3f3fd57044 Migrate crossKernel from THC to ATen (CUDA) (#60039)
Summary:
Ref  https://github.com/pytorch/pytorch/issues/24507 (There doesn't seem to be an actual issue for cross)

This also moves the remaining operator functors in `THCTensorMathPointwise.cuh`  to `SparseCUDATensorMath.cu` which is the only file using them.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60039

Reviewed By: mrshenli

Differential Revision: D29314638

Pulled By: ngimel

fbshipit-source-id: aa7b57f6e11a933fb44f044e26945bb4a9e3de5f
2021-06-23 00:37:55 -07:00
f590cceacb [BE] Fix Convolution.cpp build warnings (#60463)
Summary:
Use `c10::irange` and `auto` to get rid of narrowing cast and signed-unsigned compilation warnings

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60463

Reviewed By: samestep

Differential Revision: D29300415

Pulled By: malfet

fbshipit-source-id: 4d7f519e2e3ebaa754364f60af762658c1b4a62e
2021-06-23 00:02:33 -07:00
3846cef2d7 Increase tolerance for test_grad_scaling_clipping (#60458)
Summary:
This makes it pass on A100 and with e.g. torch.manual_seed(6) called before running this test.

Fixes https://github.com/pytorch/pytorch/issues/60455

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60458

Reviewed By: mrshenli

Differential Revision: D29309618

Pulled By: ngimel

fbshipit-source-id: 72584087bcc949f7bc96b0644b701e69ae1fa025
2021-06-22 23:43:25 -07:00
40de03fc55 topk on CUDA supports bfloat16 (#59977)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/56176 via https://github.com/pytorch/pytorch/issues/58196
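
Usage is the standard `topk` call, now also accepting `bfloat16` on CUDA:

```python
import torch

x = torch.randn(8, device="cuda", dtype=torch.bfloat16)
values, indices = torch.topk(x, k=3)
```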

CC zasdfgbnm ngimel ptrblck

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59977

Reviewed By: mrshenli

Differential Revision: D29315018

Pulled By: ngimel

fbshipit-source-id: 0a87e7f155a97225fc6b2ec5dc0dc38a23156b41
2021-06-22 23:39:24 -07:00
21479ad20c [nnc][tests] Tests and benchmarks for computeSum (#60160)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60160

Adds a few simple tests and benchmarks for the `computeSum` op
(equivalent to `at::sum`).

The benchmarks test 1D reduction and 2D row and column reduction. Performance
is in the ballpark of ATen (14-15 GB/s) on my skylake devserver for all cases,
and occasionally better (e.g. 256k * 64 row reduction goes from 9 GB/s to 13 GB/s).

Results (on my skylake-avx512, with turbo disabled):
```
------------------------------------------------------------------------------------------
Benchmark                                   Time           CPU Iterations UserCounters...
------------------------------------------------------------------------------------------
Reduce1D/Torch/16777216               4746995 ns    4746722 ns        150 BYTES=14.1379G/s
Reduce1D/Naive/16777216              34063215 ns   34061388 ns         21 BYTES=1.97023G/s
Reduce1D/NativeRfactor/16777216       5057175 ns    5057167 ns        139 BYTES=13.2701G/s
Reduce1D/TeNaive/16777216            33868945 ns   33868851 ns         21 BYTES=1.98143G/s
Reduce1D/TeSplitTail/16777216        33902786 ns   33900436 ns         21 BYTES=1.97959G/s
Reduce1D/TeSplitMask/16777216        33922509 ns   33920604 ns         21 BYTES=1.97841G/s
Reduce1D/TeRfactorV1/16777216         5141150 ns    5141002 ns        135 BYTES=13.0537G/s
Reduce1D/Op/16777216                  5140390 ns    5140091 ns        135 BYTES=13.056G/s
Reduce2DCol/Torch/8/2097152          12824403 ns   12823563 ns         55 BYTES=5.8874G/s
Reduce2DCol/Torch/64/262144           8306873 ns    8306743 ns         83 BYTES=8.20507G/s
Reduce2DCol/Torch/4096/4096           7992364 ns    7992239 ns         87 BYTES=8.3988G/s
Reduce2DCol/OpSchedule/8/2097152/0    4866144 ns    4865766 ns        138 BYTES=15.5161G/s
Reduce2DCol/OpSchedule/64/262144/0   36668978 ns   36666415 ns         19 BYTES=1.85885G/s
Reduce2DCol/OpSchedule/4096/4096/0  155862459 ns  155801266 ns          4 BYTES=430.839M/s
Reduce2DCol/OpSchedule/8/2097152/1    8067683 ns    8061117 ns         85 BYTES=9.36563G/s
Reduce2DCol/OpSchedule/64/262144/1    7496686 ns    7496562 ns         93 BYTES=9.09183G/s
Reduce2DCol/OpSchedule/4096/4096/1    5262821 ns    5262186 ns        131 BYTES=12.7562G/s
Reduce2DCol/OpSchedule/8/2097152/2    6237899 ns    6237210 ns        109 BYTES=12.1044G/s
Reduce2DCol/OpSchedule/64/262144/2    5258012 ns    5257655 ns        127 BYTES=12.9635G/s
Reduce2DCol/OpSchedule/4096/4096/2    5231686 ns    5228241 ns        132 BYTES=12.839G/s
Reduce2DCol/OpSchedule/8/2097152/3   11088573 ns   11087557 ns         62 BYTES=6.80921G/s
Reduce2DCol/OpSchedule/64/262144/3    5338843 ns    5338326 ns        127 BYTES=12.7676G/s
Reduce2DCol/OpSchedule/4096/4096/3    4311617 ns    4308102 ns        162 BYTES=15.5812G/s
Reduce2DRow/Torch/8/2097152           4642244 ns    4641794 ns        151 BYTES=14.4575G/s
Reduce2DRow/Torch/64/262144           4628311 ns    4628245 ns        151 BYTES=14.4999G/s
Reduce2DRow/Torch/4096/4096           4894012 ns    4893316 ns        143 BYTES=13.7177G/s
Reduce2DRow/Torch/262144/64          10469098 ns   10468027 ns         68 BYTES=6.51101G/s
Reduce2DRow/Hand/262144/64            5554380 ns    5554059 ns        126 BYTES=12.2716G/s
Reduce2DRow/OpSchedule/8/2097152/0   33890363 ns   33888931 ns         21 BYTES=1.98026G/s
Reduce2DRow/OpSchedule/64/262144/0   33901317 ns   33899436 ns         21 BYTES=1.97965G/s
Reduce2DRow/OpSchedule/4096/4096/0   33500358 ns   33498815 ns         21 BYTES=2.00381G/s
Reduce2DRow/OpSchedule/262144/64/0   13132231 ns   13131049 ns         53 BYTES=5.19056G/s
Reduce2DRow/OpSchedule/8/2097152/1    5200423 ns    5200025 ns        134 BYTES=12.9055G/s
Reduce2DRow/OpSchedule/64/262144/1    5204428 ns    5204327 ns        133 BYTES=12.8949G/s
Reduce2DRow/OpSchedule/4096/4096/1    8724355 ns    8723370 ns         80 BYTES=7.69488G/s
Reduce2DRow/OpSchedule/262144/64/1 1811861280 ns 1811352083 ns          1 BYTES=37.6279M/s
Reduce2DRow/OpSchedule/8/2097152/2    9169829 ns    9168946 ns         76 BYTES=7.31915G/s
Reduce2DRow/OpSchedule/64/262144/2    9159901 ns    9158560 ns         76 BYTES=7.32747G/s
Reduce2DRow/OpSchedule/4096/4096/2    9217398 ns    9215557 ns         76 BYTES=7.28391G/s
Reduce2DRow/OpSchedule/262144/64/2   10820450 ns   10818998 ns         66 BYTES=6.29979G/s
Reduce2DRow/OpSchedule/8/2097152/3    5227921 ns    5226544 ns        133 BYTES=12.84G/s
Reduce2DRow/OpSchedule/64/262144/3    5194362 ns    5194082 ns        133 BYTES=12.9203G/s
Reduce2DRow/OpSchedule/4096/4096/3    5196080 ns    5195349 ns        134 BYTES=12.9203G/s
Reduce2DRow/OpSchedule/262144/64/3    5235189 ns    5234728 ns        133 BYTES=13.0202G/s
```

ghstack-source-id: 131753875

Test Plan: these tests

Reviewed By: navahgar

Differential Revision: D29190420

fbshipit-source-id: 86246df82098da4f5493d6c4f34a40016d95a9f0
2021-06-22 23:04:09 -07:00
fbeb8b4992 [nnc] Speed up batchnorm benchmark
Summary:
Use better scheduling: fuse and parallelize NC, fuse and
vectorize HW.

```
-----------------------------------------------
 N/C/H/W               ATen               NNC
-----------------------------------------------
1/64/112/112          45449 ns         36672 ns
1/256/14/14           15555 ns          7116 ns
1/128/28/28           15737 ns          8560 ns
1/64/56/56            20766 ns         12153 ns
1/512/7/7             16985 ns          8182 ns

5/64/112/112        2532475 ns       2069668 ns
5/256/14/14           24507 ns         12228 ns
5/128/28/28           29352 ns         20146 ns
5/64/56/56            44786 ns         38784 ns
5/512/7/7             22307 ns         20505 ns
```

Test Plan: benchmark results above

Reviewed By: navahgar

Differential Revision: D29288658

fbshipit-source-id: dd05efa4b7d26b6ad94f54a9ef6c8c47adb160b5
2021-06-22 22:57:43 -07:00
b0c9762e2d [pytorch][nnc] external function call to xnnpack ops (#59525)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59525

This PR added NNC external function call binding for two XNNPack ops:
- prepacked::linear_clamp_run
- prepacked::conv2d_clamp_run

Both ops take two arguments: a regular input tensor and a prepacked context
object that contains other parameters like weights/bias/etc. The prepacked
context object's type is a custom class.

NNC doesn't generate assembly code that reads the content of the prepacked
object directly. It simply passes it into the XNNPack ops wrapper, so both
NNC and the generated assembly code don't need to know the custom class type.

At compilation time, we use a size-1 dummy tensor as the placeholder for the
prepacked XNNPack context object.

At runtime, we pass in the raw pointer of the XNNPack context object as if it
were a regular tensor storage data pointer.

Inside the external function call wrapper, we reinterpret_cast the raw pointer
back to the custom class type before dispatching to the XNNPack ops.
ghstack-source-id: 132135512

Test Plan: unit test

Reviewed By: bertmaher

Differential Revision: D28924934

fbshipit-source-id: 15326b35dc6c022f4c3f247a2037c361e06e80b4
2021-06-22 21:29:31 -07:00
79dc500a99 Add error message for sequence length to be equal to 0 case for RNNs (#60269)
Summary:
Fixes #https://github.com/pytorch/pytorch/issues/50192

As discussed in the issue, the RNN APIs currently do not support inputs with `seq_len=0`, and the error message does not reflect this clearly. This PR addresses the issue by adding a clearer error message stating that none of the RNN APIs (nn.RNN, nn.GRU and nn.LSTM) support `seq_len=0`, for either one-directional or bi-directional layers.

```
import torch

input_size = 5
hidden_size = 6
rnn = torch.nn.GRU(input_size, hidden_size)

for seq_len in reversed(range(4)):
    output, h_n = rnn(torch.zeros(seq_len, 10, input_size))
    print('{}, {}'.format(output.shape, h_n.shape))
```

Previously, this gave the following output:

```
torch.Size([3, 10, 6]), torch.Size([1, 10, 6])
torch.Size([2, 10, 6]), torch.Size([1, 10, 6])
torch.Size([1, 10, 6]), torch.Size([1, 10, 6])
Traceback (most recent call last):
  File "test.py", line 8, in <module>
    output, h_n = rnn(torch.zeros(seq_len, 10, input_size))
  File "/opt/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/miniconda3/lib/python3.8/site-packages/torch/nn/modules/rnn.py", line 739, in forward
    result = _VF.gru(input, hx, self._flat_weights, self.bias, self.num_layers,
RuntimeError: stack expects a non-empty TensorList
```

However, with this PR, the error message changes for any combination of
[RNN, GRU and LSTM] x [one-directional, bi-directional].

Let's illustrate the change with the following code snippet:

```
import torch

input_size = 5
hidden_size = 6
rnn = torch.nn.LSTM(input_size, hidden_size, bidirectional=True)
output, h_n = rnn(torch.zeros(0, 10, input_size))
```

now gives the following output:

```
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/fsx/users/iramazanli/pytorch/torch/nn/modules/module.py", line 1054, in _call_impl
    return forward_call(*input, **kwargs)
  File "/fsx/users/iramazanli/pytorch/torch/nn/modules/rnn.py", line 837, in forward
    result = _VF.gru(input, hx, self._flat_weights, self.bias, self.num_layers,
RuntimeError: Expected sequence length to be larger than 0 in RNN
```

***********************************

A change for PackedSequence did not seem necessary, because the error message from the following code snippet is already clear about the issue:

```
import torch
import torch.nn.utils.rnn as rnn_utils
import torch.nn as nn
packed = rnn_utils.pack_sequence([])
```

returns:

```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/fsx/users/iramazanli/pytorch/torch/nn/utils/rnn.py", line 398, in pack_sequence
    return pack_padded_sequence(pad_sequence(sequences), lengths, enforce_sorted=enforce_sorted)
  File "/fsx/users/iramazanli/pytorch/torch/nn/utils/rnn.py", line 363, in pad_sequence
    return torch._C._nn.pad_sequence(sequences, batch_first, padding_value)
RuntimeError: received an empty list of sequences
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60269

Reviewed By: mrshenli

Differential Revision: D29299914

Pulled By: iramazanli

fbshipit-source-id: 5ca98faa28d4e6a5a2f7600a30049de384a3b132
2021-06-22 21:25:05 -07:00
dc9aa7b960 Add custom code filter for TS (#60309)
Summary:
-----------

Adds custom code filter for Torchscript to include tracing of forward calls.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60309

Reviewed By: zhxchen17

Differential Revision: D29317150

Pulled By: nikithamalgifb

fbshipit-source-id: d49e4dc74a2b8cc98b0d4967980d819908b7ea7b
2021-06-22 20:55:57 -07:00
3de79b7757 [quant] Input-Weight Equalization - convert modifications (#59963)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59963

When converting, before quantizing the nodes, we call
`update_obs_for_equalization()` and `convert_eq_obs()`.

`update_obs_for_equalization`:
1. For each InputEqualizationObserver, we find the corresponding
WeightEqualizationObserver.
2. For nn.Linear layers, we will create an instance of the
WeightEqualizationObserver, run forward on the observer with the given
weights.
3. Calculate the equalization scale between the
InputEqualizationObserver and WeightEqualizationObserver.

`convert_eq_obs`:
For every InputEqualizationObserver, we will do the following:
1. Create a node (ex. `x0_activation_post_process_scale`) containing the
equalization scale constant.
2. Create another node containing a `mul` operator multiplying the
equalization scale and the input.
3. Remove the current InputEqualizationObserver node, and replace it
with the `mul` node.

For every WeightEqualizationObserver, we will do the following:
1. Get the next equalization scale (we may need this for equalizing
connected linear layers).
2. Scale the weights by multiplying them by the reciprocal of the
current equalization scale and by the next equalization scale.

Currently, this supports models with `nn.Linear` layers, but does not yet
support connected linear layers.
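
A rough sketch of step 3 of `update_obs_for_equalization` (the attribute names and exact formula here are assumptions about the observers, not the actual code):

```python
import torch

def calculate_equalization_scale(input_obs, weight_obs):
    # Balance the per-channel input and weight ranges: scaling the input by s
    # and the weight columns by 1/s leaves the linear output unchanged, but
    # makes both ranges easier to quantize.
    x_range = input_obs.max_val - input_obs.min_val    # InputEqualizationObserver
    w_range = weight_obs.max_val - weight_obs.min_val  # WeightEqualizationObserver
    return torch.sqrt(w_range / x_range)
```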

Test Plan:
`python test/test_quantization.py
TestEqualizeFx.test_input_weight_equalization_convert`

Original Model:
```
.LinearModule(
  (linear): Linear(in_features=2, out_features=2, bias=True)
)
```

Graph after `prepare_fx`:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_equalization_process_0 : [#users=1] = call_module[target=x_equalization_process_0](args = (%x,), kwargs = {})
    %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_00](args = (%x_equalization_process_0,), kwargs = {})
    %linear : [#users=1] = call_module[target=linear](args = (%x_activation_post_process_0,), kwargs = {})
    %linear_activation_post_process_0 : [#users=1] = call_module[target=linear_activation_post_process_0](args = (%linear,), kwargs = {})
    return linear_activation_post_process_0
```

Graph after equalization functions:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_equalization_process_0_scale : [#users=1] = get_attr[target=x_equalization_process_0_scale]
    %mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_equalization_process_0_scale), kwargs = {})
    %x_activation_post_process_0 : [#users=1] = call_module[target=x_activation_post_process_00](args = (%mul,), kwargs = {})
    %linear : [#users=1] = call_module[target=linear](args = (%x_activation_post_process_0,), kwargs = {})
    %linear_activation_post_process_0 : [#users=1] = call_module[target=linear_activation_post_process_0](args = (%linear,), kwargs = {})
    return linear_activation_post_process_0
```

Graph after `convert_fx`:
```
graph():
    %x : [#users=1] = placeholder[target=x]
    %x_equalization_process_0_scale : [#users=1] = get_attr[target=x_equalization_process_0_scale]
    %mul : [#users=1] = call_function[target=torch.mul](args = (%x, %x_equalization_process_0_scale), kwargs = {})
    %linear_input_scale_0 : [#users=1] = get_attr[target=linear_input_scale_0]
    %linear_input_zero_point_0 : [#users=1] = get_attr[target=linear_input_zero_point_0]
    %quantize_per_tensor : [#users=1] = call_function[target=torch.quantize_per_tensor](args = (%mul, %linear_input_scale_0, %linear_input_zero_point_0, torch.quint8), kwargs = {})
    %linear : [#users=1] = call_module[target=linear](args = (%quantize_per_tensor,), kwargs = {})
    %dequantize : [#users=1] = call_method[target=dequantize](args = (%linear,), kwargs = {})
    return dequantize
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D29135358

fbshipit-source-id: 2d00056729041318463de61841483490b6bfeee5
2021-06-22 20:43:30 -07:00
7589d9c58b Enable rcb lookup for typing (#60413)
Summary:
-----------

For FX-traced models, types from the typing module are not available during the lookup for the function to be traced, so resolving the type yields a None object. By enabling lookup of the `typing` module in `_jit_internal.py`, we can mitigate this issue for FX tracing and scripting.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60413

Test Plan:
--------
with-proxy python test/test_jit.py -k TestPDT.test_fx_tracing_with_typing

Reviewed By: bhosmer

Differential Revision: D29314531

Pulled By: nikithamalgifb

fbshipit-source-id: 1aa651430b1074c7e6fa74ba02bbcc4e1b00b01b
2021-06-22 18:53:19 -07:00
135e203e5e avoid unnecessary copies in MultiDispatchKeySet (#60093)
Summary:
The code would previously pass Generator & optional<Tensor> by value.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60093

Reviewed By: swolchok

Differential Revision: D29310624

Pulled By: bhosmer

fbshipit-source-id: fb4a9740a57ef319aaf7c778d51430907a7c0cc5
2021-06-22 18:44:06 -07:00
4887c6e401 [quant] avoid resize calls in observer/fake_quant (#60386)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60386

During QAT we sometimes encounter errors with scripted models
`RuntimeError: cannot resize variables that require grad`

For per-tensor cases we don't need to resize some buffers, so this PR removes the extra resize ops where applicable.

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D29271905

fbshipit-source-id: 01a484a9559a3a4180490f9476d0cd3044ba0d1b
2021-06-22 17:41:43 -07:00
d3ae3e07aa parse_reports() should include hidden files (#60404)
Summary:
Not sure why there are report files starting with `.`, but in that case
`glob('**/*.xml')` should not be used, as it will silently skip them.
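
A minimal sketch of the difference (`*` in glob patterns does not match a leading dot):

```python
import glob
import os

# glob skips files whose names start with '.', e.g. '.report.xml':
visible_only = glob.glob('**/*.xml', recursive=True)

# Walking the tree picks up hidden reports as well:
all_reports = [
    os.path.join(root, name)
    for root, _, files in os.walk('.')
    for name in files
    if name.endswith('.xml')
]
```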

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60404

Reviewed By: samestep

Differential Revision: D29276459

Pulled By: malfet

fbshipit-source-id: 8e131c38013425ad786e0a9ca0c0a43e57b1679a
2021-06-22 15:53:00 -07:00
986a88056c Remove some unused variables (#60411)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60411

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D29221207

fbshipit-source-id: da6ad44036291a98f0b36b260062d077a7c2691b
2021-06-22 15:44:33 -07:00
36d4062a62 Fix some variable types (#60414)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60414

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D29221183

fbshipit-source-id: f855efca2fd08844de65d0f9ef73bcceffee657e
2021-06-22 15:44:31 -07:00
7d779f84a3 Fix some loop types (#60415)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60415

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D29221174

fbshipit-source-id: 9bc56655f198f6eb95e6b2e7a4f0573a2cd2f9a1
2021-06-22 15:43:10 -07:00
6e926f1303 Fix lint (#60472)
Summary:
This PR fixes the `mypy` failure introduced by [`numpy` 1.21.0](https://github.com/numpy/numpy/releases/tag/v1.21.0) (by pinning `numpy` to 1.20, at least for now) and the `quick-checks` failure introduced by https://github.com/pytorch/pytorch/issues/60405.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60472

Test Plan: The Lint workflow in GitHub Actions.

Reviewed By: walterddr

Differential Revision: D29313009

Pulled By: driazati

fbshipit-source-id: 53fd0e0549c26be5fc5d3c502c5891c56c83a32c
2021-06-22 14:48:07 -07:00
0c916c8a4e up the priority of numpy array comparisons in self.assertEqual (#59067)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/58988.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59067

Reviewed By: jbschlosser

Differential Revision: D28986642

Pulled By: heitorschueroff

fbshipit-source-id: 3ef2d26b4010fc3519d0a1a020ea446ffeb46ba0
2021-06-22 13:07:07 -07:00
82c52fd417 Do not wrap Tensor.{grad,_base} by default (#60464)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60464

Fixes https://github.com/szagoruyko/pytorchviz/issues/65

An alternate implementation of this PR would be to remove the
__torch_function__ interposition points for these accessors entirely.
In the end, I decided to opt for extra expressivity.  See
torch.overrides for the criterion on how I decided which accessors
should get the nowrap treatment.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D29302835

Pulled By: ezyang

fbshipit-source-id: fbe0ac4530a6cc9d6759a3fdf5514d4d7b1f7690
2021-06-22 12:49:23 -07:00
f42140cb8a Disable warn_unused_ignores again (#60480)
Summary:
Fixes https://github.com/pytorch/pytorch/pull/60006#issuecomment-866130657.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60480

Test Plan: Run `mypy --config mypy-strict.ini` with [`ruamel.yaml`](https://pypi.org/project/ruamel.yaml/) installed.

Reviewed By: zhouzhuojie

Differential Revision: D29307823

Pulled By: samestep

fbshipit-source-id: 97fa4b7dad0465c269411c48142b22ce751bf830
2021-06-22 12:42:37 -07:00
6a87e8d087 Implement erfcx() (#58194)
Summary:
Implement erfcx() https://github.com/pytorch/pytorch/issues/31945

Reference: https://github.com/pytorch/pytorch/issues/50345
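
For reference, `erfcx(x) = exp(x^2) * erfc(x)`, the scaled complementary error function. A small usage sketch, assuming the op is exposed under `torch.special` like the other functions tracked in that issue:

```python
import torch

x = torch.linspace(-2.0, 2.0, steps=5)
print(torch.special.erfcx(x))

# For large positive x, erfc(x) underflows to 0 while erfcx(x) stays
# well-scaled (asymptotically ~ 1 / (x * sqrt(pi))):
print(torch.erfc(torch.tensor([30.0])))           # tensor([0.])
print(torch.special.erfcx(torch.tensor([30.0])))  # roughly 0.0188
```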

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58194

Reviewed By: ngimel

Differential Revision: D29285979

Pulled By: mruberry

fbshipit-source-id: 5bcfe77fddfabbeb8c8068658ba6d9fec6430399
2021-06-22 12:38:38 -07:00
b34965435d Improve testing of inplace views (#59891)
Summary:
Partially addresses https://github.com/pytorch/pytorch/issues/49825 by improving the testing
 - Rename some of the old tests that had "inplace_view" in their names, but actually mean "inplace_[update_]on_view" so there is no confusion with the naming
 - Adds some tests in test_view_ops that verify basic behavior
 - Add tests that creation meta is properly handled for no-grad, multi-output, and custom function cases
 - Add test that verifies that in the cross dtype view case, the inplace views won't be accounted in the backward graph on rebase as mentioned in the issue.
 - Update inference mode tests to also check in-place

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59891

Reviewed By: albanD

Differential Revision: D29272546

Pulled By: soulitzer

fbshipit-source-id: b12acf5f0e3f788167ebe268423cdb58481b56f6
2021-06-22 12:28:09 -07:00
20bda0057e [caffe2/utils] Add explicit rule to avoid package boundary violation
Summary:
Add a rule to wrap proto_utils.h and depend on that, rather than
relying on a glob which violates package boundaries.

Reviewed By: igorsugak

Differential Revision: D29273453

fbshipit-source-id: 08f198a03d06ee2fdf61f5dbe1d0087db22aec8b
2021-06-22 12:22:24 -07:00
7c1bca9e94 [caffe2/utils] Add explicit rule to avoid package boundary violation
Summary:
Add a rule to wrap simple_queue.h and depend on that, rather than
relying on a glob which violates package boundaries.

Test Plan: `buck2 build fbcode//caffe2/caffe2:caffe2_core`

Reviewed By: igorsugak

Differential Revision: D29273415

fbshipit-source-id: f2b62a82cd6478bd71a8194d661d1c8b023c0953
2021-06-22 12:21:08 -07:00
7f2592195d Adds stream recording for cross-stream uses of gradients in streaming backward (#60230)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/33909.

I _think_ the two recordDataPtrOnStreams I added are necessary and sufficient. They're the ones that worked for dmitrivainbrand's intricate multistream pipelining in https://github.com/pytorch/pytorch/issues/33909, and I can more or less convince myself they're enough, but it's hard to be sure (and hard to test).

PRing without a test now for visibility. I'll try to come up with something.

input_buffer.cpp needs to compile in CUDA or CPU-only builds, so I can't call `c10::cuda::CUDACachingAllocator::recordStream` directly. I planned to work around this by adding a binding in VirtualGuardImpl, but https://github.com/pytorch/pytorch/pull/57047 spared me the trouble; thanks, lw.

Recording a usage stream on a generic tensor was uglier than I expected, see https://github.com/pytorch/pytorch/issues/60306. Up to you guys if adding a unified way to record streams on a tensor backed by any TensorImpl should block this PR (and if so, whether it should happen in a separate PR or as part of this PR).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60230

Reviewed By: mrshenli

Differential Revision: D29289392

Pulled By: albanD

fbshipit-source-id: 1339d382b7d238a461b082597b3962847b5201fe
2021-06-22 12:16:07 -07:00
c7d0e9da0a Add pyproject.toml (#60408)
Summary:
This makes PyTorch conform to [PEP 517](https://www.python.org/dev/peps/pep-0517/) and [PEP 518](https://www.python.org/dev/peps/pep-0518/) by explicitly stating that we use [`setuptools`](https://setuptools.readthedocs.io/). It also follows up on https://github.com/pytorch/pytorch/pull/60119#pullrequestreview-685791812 by moving our [`isort`](https://pycqa.github.io/isort/) config into the new `pyproject.toml` file. I didn't move any of our other tool configs into `pyproject.toml` in this PR because:

- `.flake8` is assumed to exist in its current format for `tools/actions_local_runner.py` to work
- `mypy.ini` is not our only `mypy` config
- `pytest.ini` has detailed comments on `addopts` which [would have to be removed](https://github.com/toml-lang/toml/issues/340#issuecomment-122164501) in TOML because that setting is [a string, not an array](https://docs.pytest.org/en/6.2.x/customize.html#pyproject-toml)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60408

Reviewed By: 1ntEgr8

Differential Revision: D29277327

Pulled By: samestep

fbshipit-source-id: 3f2e63f6cf9024f8c534cb13a0d854a75609c5ba
2021-06-22 12:12:36 -07:00
1abf45e37f Revert D29241736: [pytorch][PR] To add Rectified Adam Algorithm to Optimizers
Test Plan: revert-hammer

Differential Revision:
D29241736 (0d2a936176)

Original commit changeset: 288b9b1f3125

fbshipit-source-id: 56c4ec98647c6f1822b130726741a1c9ca193670
2021-06-22 12:08:31 -07:00
99ca2c5b4b Migrates nll_loss_backward from TH to Aten (CUDA) (#60299)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/24609
Aten Umbrella issue https://github.com/pytorch/pytorch/issues/24507
Related to https://github.com/pytorch/pytorch/issues/59765

There are no performance differences when running the following benchmark:

<details>
 <summary>Benchmark script</summary>

```python
import torch
import torch.nn as nn
import time

torch.manual_seed(0)

def _time():
    torch.cuda.synchronize()
    MS_PER_SECOND = 1000
    return time.perf_counter() * MS_PER_SECOND

device = "cuda"
C = 30
softmax = nn.LogSoftmax(dim=1)
n_runs = 250

for reduction in ["none", "mean", "sum"]:
    for N in [100_000, 500_000, 1_000_000]:
        elapsed = 0
        for i in range(n_runs):
            data = torch.randn(N, C, device=device, requires_grad=True)
            target = torch.empty(N, dtype=torch.long, device=device).random_(0, C)
            loss = nn.NLLLoss(reduction=reduction)
            input = softmax(data)
            result = loss(input, target)

            if reduction == "none":
                gradient = torch.randn(N, device=device)
            else:
                gradient = torch.randn(1, device=device).squeeze()

            t1 = _time()
            result.backward(gradient)
            t2 = _time()
            elapsed = elapsed + (t2 - t1)
        elapsed_avg = elapsed / n_runs
        print(
            f"input size({N}, {C}), reduction: {reduction} "
            f"elapsed time is {elapsed_avg:.2f} (ms)"
        )
    print()

```

</details>

## master

```
input size(100000, 30), reduction: none elapsed time is 0.19 (ms)
input size(500000, 30), reduction: none elapsed time is 0.83 (ms)
input size(1000000, 30), reduction: none elapsed time is 1.66 (ms)

input size(100000, 30), reduction: mean elapsed time is 1.50 (ms)
input size(500000, 30), reduction: mean elapsed time is 7.19 (ms)
input size(1000000, 30), reduction: mean elapsed time is 14.35 (ms)

input size(100000, 30), reduction: sum elapsed time is 1.49 (ms)
input size(500000, 30), reduction: sum elapsed time is 7.17 (ms)
input size(1000000, 30), reduction: sum elapsed time is 14.21 (ms)
```

## this PR

```
input size(100000, 30), reduction: none elapsed time is 0.19 (ms)
input size(500000, 30), reduction: none elapsed time is 0.83 (ms)
input size(1000000, 30), reduction: none elapsed time is 1.66 (ms)

input size(100000, 30), reduction: mean elapsed time is 1.48 (ms)
input size(500000, 30), reduction: mean elapsed time is 7.16 (ms)
input size(1000000, 30), reduction: mean elapsed time is 14.29 (ms)

input size(100000, 30), reduction: sum elapsed time is 1.49 (ms)
input size(500000, 30), reduction: sum elapsed time is 7.15 (ms)
input size(1000000, 30), reduction: sum elapsed time is 14.18 (ms)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60299

Reviewed By: albanD

Differential Revision: D29287613

Pulled By: ngimel

fbshipit-source-id: 21e15f2c518087e9fb797a379e1e0a3508c98509
2021-06-22 12:04:07 -07:00
fca931d181 List striding with arbitrary step size (#58537)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58537

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D28531721

Pulled By: tugsbayasgalan

fbshipit-source-id: 8c8ed32ca00366603bfb5086e87dfa62736ff4b2
2021-06-22 11:25:23 -07:00
df8a8fbc1b Improve code and documentation clarity for DataPipes APIs (#60423)
Summary:
Fixes issues that are discussed with ezyang in the comments of PR https://github.com/pytorch/pytorch/issues/59498

Improved code and documentation clarity, and refactored .filter to nesting_level directly

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60423

Reviewed By: ezyang

Differential Revision: D29281599

Pulled By: NivekT

fbshipit-source-id: a9bbaf52f492db0741c00f3ceb4022b08ddb1506
2021-06-22 11:19:08 -07:00
71b83c27e2 [pruning] Move pruning directory into experimental folder (#60395)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60395

Experimental folder so other developers know this is work in progress

Test Plan:
`buck test mode/dev-nosan //caffe2/test:ao -- TestBasePruner`

https://pxl.cl/1KGJD

Reviewed By: z-a-f

Differential Revision: D29272319

fbshipit-source-id: 93eeeceba0376753efc9a5bb69a155278ceb2fca
2021-06-22 11:08:48 -07:00
f75ea51e67 [pruning] Move pruning files to their own directory (#60293)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60293

Move pruning files to their own directory

Test Plan:
`buck test mode/dev-nosan //caffe2/test:ao -- TestBasePruner`
https://pxl.cl/1KCfz

Reviewed By: z-a-f

Differential Revision: D29238159

fbshipit-source-id: 0173a278b39ff5ee4cbd54f333f558b6fe412be5
2021-06-22 11:08:47 -07:00
b25db5251a [pruning] Base pruner class (#60278)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60278

Implemented `PruningParametrization`, which removes pruned rows, and `BasePruner`, which is the base class for structured pruning.

Test Plan:
`buck test mode/dev-nosan //caffe2/test:ao -- TestBasePruner`

https://pxl.cl/1KC2n

Reviewed By: z-a-f

Differential Revision: D29208349

fbshipit-source-id: f34e8e258bf13fa80292c2bd64d56f5ad1e72b6a
2021-06-22 11:07:31 -07:00
31a884987d Remove some TH includes from ATen (#60323)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60323

Test Plan: Imported from OSS

Reviewed By: malfet, anjali411

Differential Revision: D29252862

Pulled By: ngimel

fbshipit-source-id: 9ea13495d382c04dfd52b8dd63314f53b7e83936
2021-06-22 10:55:17 -07:00
0d2a936176 To add Rectified Adam Algorithm to Optimizers (#58968)
Summary:
Fixes : https://github.com/pytorch/pytorch/issues/24892

In the paper : https://arxiv.org/pdf/1908.03265.pdf  Liyuan Liu et al. suggested a new optimization algorithm with an essence of similar to Adam Algorithm.

It has been discussed in the paper that, without warmup heuristic, in the early stage of adaptive optimization / learning algorithms sometimes we can get undesirable large variance which can slow overall convergence process.

Authors proposed the idea of rectification of variance of adaptive learning rate when it is expected to be high.

Differing from the paper, we selected variance tractability cut-off as 5 instead of 4. This adjustment is common practice, and could be found in the code-repository and also tensorflow swift optim library as well :

2f03dd1970/radam/radam.py (L156)

f51ee4618d/Sources/TensorFlow/Optimizers/MomentumBased.swift (L638)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58968

Reviewed By: gchanan

Differential Revision: D29241736

Pulled By: iramazanli

fbshipit-source-id: 288b9b1f3125fdc6c7a7bb23fde1ea5c201c0448
2021-06-22 10:38:41 -07:00
0126f42841 [complex] torch.sigmoid: CUDA support and complex autograd support (#48647)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/48552

**Changes**

* Complex support for `torch.sigmoid` CUDA (CPU support already exists)
* Complex autograd support for `torch.sigmoid` (CUDA and CPU)
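
A minimal usage sketch (reducing to a real scalar before `backward()`, following PyTorch's complex-autograd convention):

```python
import torch

z = torch.tensor([0.5 + 0.5j], requires_grad=True)  # CPU; CUDA now works too
y = torch.sigmoid(z)
y.abs().sum().backward()  # gradients follow the conjugate (Wirtinger) convention
print(z.grad)
```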

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48647

Reviewed By: H-Huang

Differential Revision: D29163012

Pulled By: anjali411

fbshipit-source-id: 0cac0412355312675bee1cc46e090be7351d5dac
2021-06-22 10:35:00 -07:00
567e6d3a87 Remove Caffe2 thread-pool leak warning (#60318)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/57273.

Some users reported that they dislike the Caffe2 thread-pool leak warning, as it floods their logs, and have requested disabling it, or have asked for a way to filter it.

It seems caffe2 pthreadpool already exists because of some dependency in the binary distribution, so `torch.set_num_threads()` invocation isn't required to reproduce the issue (as is otherwise the case when building from the master branch).

https://github.com/pytorch/pytorch/issues/60171's test script does have a `set_num_threads` invocation, which is why I was able to reproduce the issue after building from the master branch's source code.

cc malfet & ejguan, who have the authority to make a decision.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60318

Reviewed By: albanD

Differential Revision: D29265771

Pulled By: ezyang

fbshipit-source-id: 26f678af2fec45ef8f7e1d39a57559790eb9e94b
2021-06-22 10:26:55 -07:00
91451369ed require non-empty inputs to grad() calls in the API (#52016)
Summary:
The grad() function needs to return the updated values, and hence
needs non-empty `inputs` to populate.
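
A small sketch of why: `grad()` returns one gradient per entry of `inputs`, so empty `inputs` would have nothing to populate:

```python
import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)
y = (x * x).sum()

(gx,) = torch.autograd.grad(y, inputs=[x])
print(gx)  # tensor([2., 4.])

# torch.autograd.grad(y, inputs=[])  # now raises instead of silently
#                                    # returning nothing useful
```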

Pull Request resolved: https://github.com/pytorch/pytorch/pull/52016

Test Plan:
Passes Python and C++ unit tests, and added new tests to catch this behavior.

Fixes https://github.com/pytorch/pytorch/issues/47061

Reviewed By: albanD

Differential Revision: D26406444

Pulled By: dagitses

fbshipit-source-id: 023aeca9a40cd765c5bad6a1a2f8767a33b75a1a
2021-06-22 10:10:58 -07:00
729f7cd52f Implement histogram operator on CPU (#58780)
Summary:
The existing [torch.histc](https://pytorch.org/docs/stable/generated/torch.histc.html) operator is limited in comparison to [numpy.histogram](https://numpy.org/doc/stable/reference/generated/numpy.histogram.html). This PR adds torch.histogram on CPU. The new operator replicates numpy.histogram's behavior, including support for caller-specified bin edges and weights. It was motivated by previous community requests for histogram.

The implementation was [benchmarked](https://docs.google.com/spreadsheets/d/1xCR0jODchVvwdVSAjiLsNCkmyictA6j1LNfDpWOafjw/edit?usp=sharing) against numpy.histogram as well as torch.histc. This implementation is weakly faster than numpy.histogram across all types of inputs tested, and performs in line with torch.histc for the limited inputs histc supports.
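
A short usage sketch mirroring `numpy.histogram`:

```python
import torch

x = torch.randn(1000)

# Uniform bins over an explicit range, as with numpy.histogram:
hist, bin_edges = torch.histogram(x, bins=10, range=(-3.0, 3.0))

# Caller-specified (possibly non-uniform) bin edges, plus weights:
edges = torch.tensor([-3.0, -1.0, 0.0, 1.0, 3.0])
w = torch.rand(1000)
hist2, _ = torch.histogram(x, bins=edges, weight=w)
```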

mruberry

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58780

Test Plan:
Added unit tests, OpInfo for the new torch.histogram operator.

Tested execution time on a variety of input sizes and compared to numpy.histogram performance: https://docs.google.com/spreadsheets/d/1xCR0jODchVvwdVSAjiLsNCkmyictA6j1LNfDpWOafjw/edit?usp=sharing

Reviewed By: ezyang

Differential Revision: D29134626

Pulled By: saketh-are

fbshipit-source-id: f2773085de1697f6bc6ffdeffe9a81267f51bdfc
2021-06-22 10:06:04 -07:00
3a56758e1f changed launch bound to fix col2im kernel (#60315)
Summary:
Changed launch bound for col2im kernel from 1024 to 512 to fix register spilling into local memory.

Perf comparison (using Nvidia Titan-V):

![Col2ImTimingData](https://user-images.githubusercontent.com/22803332/122627527-e0b1fc80-d064-11eb-83df-f2a1165cefcc.PNG)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60315

Reviewed By: albanD

Differential Revision: D29288113

Pulled By: ngimel

fbshipit-source-id: f78eb90941835700a1aef8e08fac6aff86dedfe9
2021-06-22 09:29:34 -07:00
926bb5d6be changed launch bounds, unrolled for loop for grid sampler 2d fwd and bwd (#60405)
Summary:
Changed launch bounds for grid sampler 2d fwd and bwd from 1024 to 256, added loop unrolling to fix register spilling into local memory.

Timing Data: (using Nvidia Titan-V)
Interpolation mode 2, padding 0, align corners False

![GridSampler2dTimingData](https://user-images.githubusercontent.com/22803332/122830305-01fd2d80-d29d-11eb-9cd3-7da533a03f33.PNG)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60405

Reviewed By: albanD

Differential Revision: D29288075

Pulled By: ngimel

fbshipit-source-id: 5e060f0c2d1cc0a3086718e6be263413dfa29689
2021-06-22 09:22:41 -07:00
23bb2ed00a Improve documentation for torch.set_rng_state (#60422)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/59974 by improving documentation for the function torch.set_rng_state

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60422

Test Plan: Only a comment is being changed.

Reviewed By: bdhirsh

Differential Revision: D29281578

Pulled By: NivekT

fbshipit-source-id: 2c160f782438b7f91f16c44f06c342e8b8b8437b
2021-06-22 07:10:50 -07:00
700df82881 [PyTorch Edge] Update iOS readme to use lite interpreter (#59841)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59841

As lite interpreter moves to beta, it's recommended to let users start using it.
ghstack-source-id: 131766778

Test Plan: CI

Reviewed By: husthyc

Differential Revision: D29048350

fbshipit-source-id: 54d2ad09b4e9475304522c80b358647bcea79b14
2021-06-22 02:17:04 -07:00
15dc320cae Fix lint build (#60438)
Summary:
per title

Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60438

Reviewed By: ngimel

Differential Revision: D29288175

Pulled By: mruberry

fbshipit-source-id: f59b579b1793fdb1d298109c2bef0a70badb37b4
2021-06-22 00:11:55 -07:00
0585daae83 fixed launch bounds for gathertopk kernel (#60314)
Summary:
Changed launch bounds for gatherTopK kernel to fix register spilling into local memory.

Comparison (Nvidia Titan-V GPU):

Args: Input size as below, k=32, dim=None

![TopKTimingData](https://user-images.githubusercontent.com/22803332/122624922-46978780-d057-11eb-9b52-d5786da432c0.PNG)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60314

Reviewed By: mruberry

Differential Revision: D29267789

Pulled By: ngimel

fbshipit-source-id: 4056efb2e44e5527786167af66a127504980a3af
2021-06-21 22:24:44 -07:00
45ae2e7863 Set TORCH_WARN_ONCE to always warn inside of assertNotWarn (#60020)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60020

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D29249909

Pulled By: mruberry

fbshipit-source-id: 10a8d5c05bd8d4aec345f70b132efd3623601f6a
2021-06-21 21:35:54 -07:00
5d476f5b95 Fix FFT documentation examples and run doctests in the test suite (#60304)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/59514

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60304

Reviewed By: anjali411

Differential Revision: D29253980

Pulled By: mruberry

fbshipit-source-id: 0654f00197e5fae338aa8edf0b61ef5692cdaa7e
2021-06-21 20:47:25 -07:00
5921b5480a ensure xml report paths are relative to */pytorch/test (#60380)
Summary:
Changes the approach.

Root cause: for some reason, `inspect.getfile` returns an absolute path instead of a path relative to `os.getcwd()` in newer Python versions. We sanitize this by stripping the CI prefix where it applies.

See:
https://app.circleci.com/pipelines/github/pytorch/pytorch/339568/workflows/43cac71c-759e-471f-83c2-d59c152dcd8a/jobs/14278585 vs. https://app.circleci.com/pipelines/github/pytorch/pytorch/339568/workflows/43cac71c-759e-471f-83c2-d59c152dcd8a/jobs/14278285

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60380

Test Plan:
CI

Plot twist:

windows tests are actually launched via
```
pushd test
python run_test.py
```
while linux/macos tests are
```
python test/run_test.py
```
This might cause problems when using `os.getcwd()`; we will see from the PR CI results.

Reviewed By: malfet

Differential Revision: D29276969

Pulled By: walterddr

fbshipit-source-id: 336c2805d0c92733e0ff4c309ff2044dc2ed4e21
2021-06-21 20:47:23 -07:00
9b30fb8528 add support for constant (#60166)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/58739. Adds support for constants as stipulated by the Python array API.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60166

Reviewed By: anjali411

Differential Revision: D29253958

Pulled By: mruberry

fbshipit-source-id: 0bc86b74d3a4eb3ec4a65c941ec2710747402db1
2021-06-21 20:47:21 -07:00
1764aa79b9 restore JOB_BASE_NAME for test1 and test2 in test.sh (#60409)
Summary:
JOB_BASE_NAME for test1 and test2 were removed by https://github.com/pytorch/pytorch/issues/60124.  This caused the ROCm CI to run all tests for both test1 and test2.  Restore the use of JOB_BASE_NAME.

Fixes https://github.com/pytorch/pytorch/issues/60377.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60409

Reviewed By: anjali411

Differential Revision: D29277560

Pulled By: walterddr

fbshipit-source-id: ddf01466492a9a626ce1b6adf87cd102d8f1fe35
2021-06-21 20:46:17 -07:00
7d39608a29 split TestAsserts by functionality (#58919)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58919

Instead of having one large TestAsserts test case, we split off tests for
self-contained functionality, like container or complex checking, into
separate test cases. That makes it a lot easier to keep an overview of
what is tested.

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D29259407

Pulled By: mruberry

fbshipit-source-id: 9769cb6d56c1a3790280542db398cb247986b09a
2021-06-21 20:44:23 -07:00
14b0191d1f make assert_equal an example how to partial torch.testing.assert_close (#58918)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58918

~Instead of a distinct `torch.testing.assert_close` and `torch.testing.assert_equal`, this makes `torch.testing.assert_equal` a special case of `torch.testing.assert_close` for `rtol=atol=0`. In this case the closeness definition `abs(actual - expected) <= atol + rtol * abs(expected)` boils down to `abs(actual - expected) <= 0`. Since `abs(x)` can never be `<0`, this is equivalent to `abs(a - b) == 0` and this again boils down to `a == b`.~

Following https://github.com/pytorch/pytorch/pull/58918#issuecomment-860642057 and some offline discussions, we opted to use `assert_equal` as an example of how to `partial` it.

This makes maintaining the module a lot easier, because we don't need to keep two functions in sync.
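
A sketch of the idea:

```python
import functools
import torch.testing

# assert_equal is just assert_close with zero tolerances:
assert_equal = functools.partial(torch.testing.assert_close, rtol=0, atol=0)

assert_equal(torch.tensor([1.0]), torch.tensor([1.0]))  # passes
```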

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D29259404

Pulled By: mruberry

fbshipit-source-id: fa1a1fa93672a7ed1c5f0e4beb0dcd45b5c14fce
2021-06-21 20:44:21 -07:00
583f072778 introduce TestingErrorMeta for internal use (#58917)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58917

In #54780 we opted to return `Optional[Exception]` from all internal
helper functions. Since then multiple PRs added functionality that needs
to amend the error message. For this we recreate the error

09a1b1cf87/torch/testing/_asserts.py (L417-L430)

To untangle this a little, this PR introduces the `_TestingErrorMeta`,
which carries the exception type and the message. The idiom

```python
exc = check_foo():
if exc:
    return exc
```

is still valid although `exc` should be renamed to `error_meta` to
reflect the new nature. In the top-level functions
`assert_(equal|close)`

```python
exc = check_foo():
if exc:
    raise exc
```

changes to

```python
error_meta = check_foo():
if error_meta:
    raise error_meta.to_error()
```

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D29259405

Pulled By: mruberry

fbshipit-source-id: 9078fe326283d5aa3d0cf256bf007887df9bfbfb
2021-06-21 20:44:20 -07:00
cf789b9941 remove pytest.UsageError (#58916)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58916

Using `pytest.UsageError` in case `pytest` is available adds almost
nothing as observed in
https://github.com/pytorch/pytorch/pull/53820#discussion_r593868752, but
makes it harder to maintain: due to the conditional import, `mypy` is
not able to handle `UsageError` in a type annotation.

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D29259409

Pulled By: mruberry

fbshipit-source-id: 82b00d13fa47db77383996d0caa69177804a48b6
2021-06-21 20:44:18 -07:00
9fffd05e54 hide top-level test functions from pytest's traceback (#58915)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58915

History:

- It was included for internal helper functions in the initial proposal
  in #53820
- It was removed in #54780, since it is not honored when used with
  `pytest`'s `--tb=native`, which is the default for PyTorch

Since PyTorch shouldn't be the only user of `assert_(equal|close)` we
add it here to the top-level functions `assert_(equal|close)`. If
`pytest` is used without `--tb=native`, the traceback for

```python
assert torch.eq(actual, expected), "Tensors are not equal!"
torch.testing.assert_equal(actual, expected)
```

looks the same, making it more concise.

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D29259406

Pulled By: mruberry

fbshipit-source-id: acee47b30b7f14def27433f7d56a4b19d77393c0
2021-06-21 20:44:16 -07:00
18d45b960b remove rogue raise in helper function (#58914)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58914

Only the top-level functions `assert_(equal|close)` should raise the
exception to keep the traceback manageable.

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D29259408

Pulled By: mruberry

fbshipit-source-id: 40dd52eec6f9e8166b3b239d5172ee44b749e8dc
2021-06-21 20:43:06 -07:00
dca97b4394 Weighted decay with frequency (count-based) (#60382)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60382

Instead of setting weight decay `w` uniformly for all ids, for each row `i` in the sparse embedding table the actual weight decay `w_i` becomes `w * freq_i`, where `freq_i = halflife / counter_i ∈ [log(2), halflife]`. The counter comes from `rowwise_counter`, with the update `counter_i = 1 + exp(-iter_delta * rho) * counter_i`.
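
A sketch of the per-row computation described above (the function names and the clamping are illustrative renderings of the formulas, not the Caffe2 operator code):

```python
import math

def update_counter(counter: float, iter_delta: float, rho: float) -> float:
    # counter_i = 1 + exp(-iter_delta * rho) * counter_i
    return 1.0 + math.exp(-iter_delta * rho) * counter

def effective_weight_decay(w: float, counter: float, halflife: float) -> float:
    # freq_i = halflife / counter_i, kept within [log(2), halflife]
    freq = min(max(halflife / counter, math.log(2)), halflife)
    return w * freq
```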

Test Plan:
buck test //caffe2/caffe2/python/operator_test:adagrad_test -- test_row_wise_sparse_adagrad

buck test caffe2/caffe2/fb/dper/layer_models/tests/split_1:sparse_nn_test_weight_decay

Reviewed By: 0x10cxR1

Differential Revision: D25581030

fbshipit-source-id: 54b3831b20516c76c559b13d8deb809e2ee3b446
2021-06-21 18:46:35 -07:00
8f03018980 [pytorch] Move signal handler test to internal codebase (#60394)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60394

Move signal handler test to internal codebase

Github issue: https://github.com/pytorch/pytorch/issues/60260

Test Plan:
buck test mode/dev-nosan //caffe2/test/distributed/elastic/multiprocessing:api_test

    buck test mode/dev-nosan //caffe2/torch/distributed/elastic/multiprocessing/fb/test:api_test

Reviewed By: cbalioglu

Differential Revision: D29273160

fbshipit-source-id: e4ae72f7f6d54cbba324119fce7446a30a6c37c9
2021-06-21 18:26:41 -07:00
af3f7a210a add BFloat16 support for kthvalue and median on CPU (#60074)
Summary:
Add BFloat16 support for kthvalue and median on CPU

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60074

Reviewed By: gchanan

Differential Revision: D29230348

Pulled By: heitorschueroff

fbshipit-source-id: fa9c086758d51069acf270faa526e4b141b0ef68
2021-06-21 17:52:18 -07:00
2606022d01 [package] fix for edge case os and os.path importing (#60276)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60276

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D29234143

Pulled By: Lilyjjo

fbshipit-source-id: 4d96dde4ef1d84f9966f9f58c883ab9bb92fe728
2021-06-21 16:54:02 -07:00
25e077bce1 [Issue 59296] added VE device (#59620)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/59296

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59620

Reviewed By: zou3519

Differential Revision: D29196830

Pulled By: ezyang

fbshipit-source-id: 7bb49f776dc755804a0ba0bc3a7dbdab9c93914e
2021-06-21 16:44:52 -07:00
9d1d799034 Added API to change logging levels for JIT (#58821)
Summary:
Description:
- Before this, the logging level could only be changed via the env
variable "PYTORCH_JIT_LOG_LEVEL"
    - The level can now be changed from Python
- Have not added stream configuration for now
- Configuration is stored in a singleton class managing the options

Issue Link: https://github.com/pytorch/pytorch/issues/54188

Gotchas:
- Created separate functions
`::torch::jit::get_jit_logging_levels/set_jit_logging_levels` instead of
using the singleton class's method directly
    - This is because when running test cases, two different instances
    of the singleton are created for the test suite and the actual code
    (`jit_log.cpp`)
    - On using these methods directly, `is_enabled` calls the singleton
    in `jit_log.cpp` while we are setting the config using another
    singleton
    - See: https://stackoverflow.com/questions/55467246/my-singleton-can-be-called-multiple-times

API:
- To set the level: `torch._C._jit_set_logging_option("level")`
- To get the level: `torch._C._jit_get_logging_option()`
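
A small usage sketch (these are private `torch._C` bindings, so treat the exact spelling as subject to change):

```python
import torch

# Equivalent to launching with PYTORCH_JIT_LOG_LEVEL=">dead_code_elimination",
# but switchable at runtime from Python:
torch._C._jit_set_logging_option(">dead_code_elimination")
print(torch._C._jit_get_logging_option())
```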

Testing:
- UTs were added for C++
- A very simple UT was added for python to just check if the API is
being called correctly
- The API was checked by running trace in a sample python file
    - Set env variable to "" and used `_jit_set_logging_option` in python to set the variable to `>dead_code_elimination`
    - The error output had logs of form [DUMP..] [UPDATE...] etc

Fixes https://github.com/pytorch/pytorch/issues/54188

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58821

Reviewed By: soulitzer

Differential Revision: D29116712

Pulled By: ZolotukhinM

fbshipit-source-id: 8f2861ee2bd567fb63b405953d035ca657a3200f
2021-06-21 16:10:49 -07:00
82a6574d89 cmake: Use BUILD_INTERFACE with TORCH_SRC_DIR (#60403)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60403

TORCH_SRC_DIR has the potential to be hardcoded thus breaking downstream
cmake extensions. Prefer CMAKE_CURRENT_SOURCE_DIR with BUILD_INTERFACE
to make it magically work together

See https://cmake.org/cmake/help/latest/command/target_include_directories.html

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: samestep

Differential Revision: D29276503

Pulled By: seemethere

fbshipit-source-id: 6ec0754de6a02cdc35a4a453d6271ac4fdfc5ee3
2021-06-21 15:37:27 -07:00
8dd1dc89cb [PyTorch][Edge] Adding tests for lite quantized models (#60226)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60226

# Context
Read this posts for details about why we need a test bench for quantized lite modules
https://fb.workplace.com/groups/2322282031156145/permalink/4289792691071726/

# This Diff
Adds test cases for Quantized Lite modules
ghstack-source-id: 131859101

Test Plan:
```
[ ~/fbsource/fbcode] buck test caffe2/test:mobile -- mobile.test_lite_script_module.TestLiteScriptQuantizedModule
Unable to connect to Buck daemon, restarting it...

Running with tpx session id: 44cf0b2f-0905-444a-95df-4a2eec774163
Trace available for this run at /tmp/tpx-20210618-093849.343917/trace.log
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/7036874461151326
    ✓ ListingSuccess: caffe2/test:mobile - main (16.736)
    ✓ Pass: caffe2/test:mobile - test_two_layer (mobile.test_lite_script_module.TestLiteScriptQuantizedModule) (14.836)
    ✓ Pass: caffe2/test:mobile - test_annotated_nested (mobile.test_lite_script_module.TestLiteScriptQuantizedModule) (15.073)
    ✓ Pass: caffe2/test:mobile - test_quantization_example (mobile.test_lite_script_module.TestLiteScriptQuantizedModule) (16.286)
    ✓ Pass: caffe2/test:mobile - test_single_layer (mobile.test_lite_script_module.TestLiteScriptQuantizedModule) (18.360)
Summary
  Pass: 4
  ListingSuccess: 1
```

https://www.internalfb.com/intern/testinfra/testconsole/testrun/7036874461151326/

Reviewed By: iseeyuan

Differential Revision: D29212232

fbshipit-source-id: 8d0b61b3f414e31720f1e3ce681ec8fa716555c1
2021-06-21 15:09:42 -07:00
5bd49c3396 fix workflow id usage in GHA (#60376)
Summary:
This fixes: https://github.com/pytorch/pytorch/issues/60139

The GHA workflow ID was previously set to `run_id`, which doesn't change across re-runs;
see: https://docs.github.com/en/actions/reference/environment-variables#default-environment-variables

Using GITHUB_RUN_NUMBER to report workflow ID instead.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60376

Test Plan:
CI
see these [with-rerun](https://github.com/pytorch/pytorch/actions/runs/952508536) and [without-rerun](https://github.com/pytorch/pytorch/actions/runs/955665324) examples: both reported everything under the same run ID, but the first one actually ran twice as many test cases as reported in Scuba. This shouldn't occur after this PR.

Reviewed By: samestep

Differential Revision: D29267455

Pulled By: walterddr

fbshipit-source-id: 00fc6b75b84861e2f7d3e21698a5f840c3c21dcd
2021-06-21 14:54:49 -07:00
1f50dc6e46 Fix ignoring Tensor properties in torch.overrides (#60050)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60050

It doesn't work to put torch.Tensor.prop.__get__ in the ignored
list.  Now it does.  (Not exercised here, see next diff in stack).

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D29171464

Pulled By: ezyang

fbshipit-source-id: e7354668b481f9275f2eb5bb3a6228d1815fecea
2021-06-21 14:49:51 -07:00
65f33ec85c Follow-up fix for compilation error on CUDA92 (#60287)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60287

Follow up of #60017

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D29236208

Pulled By: ejguan

fbshipit-source-id: f1acf9630b45fea8cbdf7d64e47661643d0a52b8
2021-06-21 13:29:11 -07:00
01e0296eb7 [special] migrate log1p, sinc, round to special namespace (#55878)
Summary:
Reference : https://github.com/pytorch/pytorch/issues/50345

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55878

Reviewed By: zou3519, janeyx99

Differential Revision: D29160593

Pulled By: mruberry

fbshipit-source-id: f3ca9c541382bab33fb85d7817ce8ddc117c6826
2021-06-21 12:34:29 -07:00
769c299dcf [caffe2] add tests for inplace elementwise ops (#60106)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60106

In Caffe2, some elementwise in-place compatible ops lack coverage for the in-place case. We add tests for a subset of them here and thereby increase coverage.

Test Plan:
```
buck test //caffe2/caffe2/python/operator_test:elementwise_ops_test
```
Let CI run.

Reviewed By: clrfb

Differential Revision: D29143189

fbshipit-source-id: 83138ad8eff8fe95c40aece53714da3577396a23
2021-06-21 12:04:18 -07:00
f66b53e8b2 Ignore unsupported attribute checker pass for torch.jit.trace (#60200)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60200

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D29207583

Pulled By: tugsbayasgalan

fbshipit-source-id: 241620209dbafc94ebdb83d99257e341b11e999b
2021-06-21 11:55:12 -07:00
b505adbb09 Fix typo in ChainDataset docs (#60336)
Summary:
* chainning -> chaining

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60336

Reviewed By: bdhirsh

Differential Revision: D29265236

Pulled By: anjali411

fbshipit-source-id: 17a9b73af9e094550bd1ee25bc9439fb8d455e2b
2021-06-21 11:47:21 -07:00
2f3be2735f Don't split oversize cached blocks (#44742)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/35901

This change is designed to prevent fragmentation in the Caching Allocator. Permissive block splitting in the allocator allows very large blocks to be split into many pieces. Once split too finely, it is unlikely all pieces will be 'free' at the same time, so the original allocation can never be returned. Anecdotally, we've seen a model run out of memory failing to alloc a 50 MB block on a 32 GB card while the caching allocator was holding 13 GB of 'split free blocks'.

Approach:

- Large blocks above a certain size are designated "oversize".  This limit is currently set one decade above "large", at 200 MB
- Oversize blocks cannot be split
- Oversize blocks must closely match the requested size (e.g. a 200 MB request will match an existing 205 MB block, but not a 300 MB block)
- In lieu of splitting oversize blocks, there is a mechanism to quickly free a single oversize block (to the system allocator) so an appropriately sized block can be allocated.  This is activated under memory pressure and prevents _release_cached_blocks()_ from triggering.  A sketch of the matching policy follows this list.
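
A minimal Python sketch of the matching policy described above (the constants and function names are my assumptions for illustration; the real logic lives in the C++ caching allocator):

```python
OVERSIZE_THRESHOLD = 200 * 1024 * 1024  # blocks above this are "oversize" (assumed)
OVERSIZE_TOLERANCE = 20 * 1024 * 1024   # max slack when matching oversize requests (assumed)

def block_is_usable(requested_size: int, block_size: int) -> bool:
    if block_size < requested_size:
        return False
    if requested_size < OVERSIZE_THRESHOLD:
        return True  # normal blocks may be split, so any large-enough block works
    # Oversize requests must closely match an existing block.
    return block_size - requested_size <= OVERSIZE_TOLERANCE

def may_split(block_size: int) -> bool:
    # Oversize blocks are never split, preventing fine-grained fragmentation.
    return block_size < OVERSIZE_THRESHOLD
```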

Initial performance tests show this is similar to or quicker than the original strategy.  Additional tests are ongoing.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44742

Reviewed By: zou3519

Differential Revision: D29186394

Pulled By: ezyang

fbshipit-source-id: c88918836db3f51df59de6d1b3e03602ebe306a9
2021-06-21 11:46:08 -07:00
eaa36ee679 Enable sharding for Windows GHA CI (#59970)
Summary:
Enables sharding for Windows on CI. To make that possible, we currently remove the smoke tests run in shard 1, which don't seem all that important as they are
1. tested on nightlies
2. seemingly tested anyway by running the test suite

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59970

Reviewed By: seemethere

Differential Revision: D29268484

Pulled By: janeyx99

fbshipit-source-id: 7f90d73037cfeb2c267b28714550316eb471b4dd
2021-06-21 11:42:22 -07:00
023907a6fe Allow Docker build on macOS (#60375)
Summary:
This PR allows developers using macOS to build Docker images locally. The `basename $(mktemp -u)` part was suggested by seemethere; I modified it slightly to appease ShellCheck and because [Docker doesn't allow uppercase characters in tags](https://stackoverflow.com/a/54291205).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60375

Test Plan:
On a Mac:
```
cd .circleci/docker
./build.sh pytorch-linux-xenial-py3.6-gcc5.4
```

Reviewed By: driazati

Differential Revision: D29267025

Pulled By: samestep

fbshipit-source-id: ba27d2fb108f573a50db069cf9ddea0414ed6074
2021-06-21 11:27:49 -07:00
27e34f731a Re-enable clang-tidy on PRs (#60297)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60297

This switches clang-tidy to the fresh tag from https://github.com/pytorch/test-infra/runs/2860763986 which has a fix for the missing OMP headers we were seeing. Along with #60225 this should restore clang-tidy to normal functionality and we shouldn't see any spurious warnings.

Test Plan: Imported from OSS

Reviewed By: seemethere, 1ntEgr8

Differential Revision: D29239783

Pulled By: driazati

fbshipit-source-id: b1893256fdb27436af03d6c5279e81f64b47fe6b
2021-06-21 11:04:09 -07:00
c16f87949f ENH Adds nn.ReflectionPad3d (#59791)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/27655

This PR adds a C++ and Python version of ReflectionPad3d with structured kernels. The implementation uses lambdas extensively to better share code between the backward and forward passes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59791

Reviewed By: gchanan

Differential Revision: D29242015

Pulled By: jbschlosser

fbshipit-source-id: 18e692d3b49b74082be09f373fc95fb7891e1b56
2021-06-21 10:53:14 -07:00
f89ae9cb8d Moves grid_sampler to autocast promote list (#58618)
Summary:
Should close https://github.com/pytorch/pytorch/issues/42218

Numerically, `grid_sampler` is fine in fp16 or fp32, but takes several inputs and expects their dtypes to match, so it belongs on the autocast promote list.
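
A small sketch of the user-visible effect (my example; requires CUDA):

```python
import torch
import torch.nn.functional as F

inp = torch.randn(1, 1, 4, 4, device='cuda', dtype=torch.half)
grid = torch.rand(1, 2, 2, 2, device='cuda', dtype=torch.float) * 2 - 1

with torch.cuda.amp.autocast():
    # With grid_sampler on the promote list, the mismatched fp16/fp32 inputs
    # are promoted to the widest dtype instead of raising a dtype error.
    out = F.grid_sample(inp, grid, align_corners=False)

print(out.dtype)  # torch.float32
```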

`grid_sampler` currently uses `gpuAtomicAdd`, notoriously slow in fp16 because it calls cuda's atomicAdd __half overload which uses a software compare-and-swap loop internally. To allow good performance if both inputs happen to be FP16, the PR also modifies `grid_sampler_[2,3]d_backward_kernel`s to use `fastAtomicAdd` instead.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58618

Reviewed By: mruberry

Differential Revision: D29257199

Pulled By: ngimel

fbshipit-source-id: 3cc7505945b480427f2fc1beb36bee80bf3853b3
2021-06-21 10:22:36 -07:00
61e0bc1955 [nnc] Remove check on initializer in compressBuffer (#60194)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60194

Test Plan: Imported from OSS

Reviewed By: bertmaher, huiguoo

Differential Revision: D29206255

Pulled By: navahgar

fbshipit-source-id: 0a68ec4067c37f06ca1ea9ddeeb5ad5e0dcb0639
2021-06-21 09:57:37 -07:00
f2bb0932da [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D29259226

fbshipit-source-id: 15fd79f6fed38d6ed2d84018852806683d5a09fa
2021-06-21 03:57:10 -07:00
5ff407df67 Skips failing MacOS tests (#60348)
Summary:
Mitigates, but does not fix https://github.com/pytorch/pytorch/issues/60347.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60348

Reviewed By: ngimel

Differential Revision: D29257917

Pulled By: mruberry

fbshipit-source-id: de9be93ddeda1ca27ea2ff4650162f886d10f1e2
2021-06-21 01:35:36 -07:00
1dee99c973 LU Solve using cublas and cusolver (#59148)
Summary:
This PR introduces cuSOLVER and cuBLAS for the `lu_solve` routine. Solves a part of https://github.com/pytorch/pytorch/issues/47953.

Since usage of cuSOLVER with MAGMA introduces performance regressions in MAGMA (https://github.com/pytorch/pytorch/issues/56590), we use heuristics for determining when to call cuSOLVER, cuBLAS or MAGMA depending on the batch and matrix sizes. The 64-bit cuSOLVER API is not introduced in this PR since there are several problems with the LU factorization using cusolver (https://github.com/pytorch/pytorch/pull/59148).

The following are performance benchmarks using various configurations:

<details>

```
[--------------------------------------------------------- LU solve CUDA torch.float64 ----------------------------------------------------------]
                                     |  lu_solve CUSOLVER  |  lu_solve MAGMA  |  lu_solve CUBLAS  |  lu_solve cuSOLVER/MAGMA  |  lu_solve TEST ALL
1 threads: ---------------------------------------------------------------------------------------------------------------------------------------
      torch.Size([1, 1, 1])          |          703.4      |        489.8     |         511.8     |             710.1         |          487.1
      torch.Size([2, 1, 1])          |          738.9      |        504.1     |         513.0     |             958.2         |          494.4
      torch.Size([4, 1, 1])          |          790.7      |        514.7     |         506.8     |             983.9         |          540.2
      torch.Size([8, 1, 1])          |          865.3      |        496.4     |         514.7     |             975.2         |          520.0
      torch.Size([16, 1, 1])         |          955.5      |        483.9     |         508.3     |             937.6         |          526.5
      torch.Size([32, 1, 1])         |         1167.7      |        495.2     |         511.2     |             934.0         |          528.7
      torch.Size([64, 1, 1])         |         1730.0      |        492.1     |         537.8     |             936.4         |          533.2
      torch.Size([128, 1, 1])        |         2748.4      |        499.7     |         526.5     |             982.9         |          540.8
      torch.Size([1, 2, 2])          |          724.6      |        498.2     |         541.7     |             715.0         |          504.7
      torch.Size([2, 2, 2])          |          737.0      |        514.3     |         527.6     |             934.5         |          524.5
      torch.Size([4, 2, 2])          |          750.5      |        524.1     |         537.4     |             935.5         |          543.0
      torch.Size([8, 2, 2])          |          844.8      |        513.7     |         538.9     |             953.3         |          534.4
      torch.Size([16, 2, 2])         |         1013.1      |        521.9     |         530.0     |             932.2         |          537.9
      torch.Size([32, 2, 2])         |         1335.8      |        515.1     |         544.4     |             939.9         |          559.5
      torch.Size([64, 2, 2])         |         1819.6      |        511.8     |         534.1     |             973.9         |          540.0
      torch.Size([128, 2, 2])        |         3018.7      |        526.3     |         546.1     |             979.3         |          543.5
      torch.Size([1, 8, 8])          |          732.5      |        524.9     |         532.9     |             762.4         |          516.8
      torch.Size([2, 8, 8])          |          771.2      |        514.9     |         538.7     |            1007.5         |          531.1
      torch.Size([4, 8, 8])          |          811.3      |        507.7     |         534.6     |            1002.2         |          548.5
      torch.Size([8, 8, 8])          |          866.6      |        530.0     |         532.0     |            1016.1         |          562.9
      torch.Size([16, 8, 8])         |          991.8      |        533.6     |         548.0     |            1022.6         |          548.5
      torch.Size([32, 8, 8])         |         1271.7      |        541.2     |         534.7     |            1013.8         |          545.6
      torch.Size([64, 8, 8])         |         1817.2      |        530.2     |         520.6     |            1008.7         |          566.3
      torch.Size([128, 8, 8])        |         2678.7      |        531.6     |         552.2     |            1006.2         |          555.0
      torch.Size([1, 16, 16])        |          738.2      |        546.1     |         536.6     |             775.6         |          540.1
      torch.Size([2, 16, 16])        |          782.6      |        543.5     |         539.6     |            1010.9         |          541.1
      torch.Size([4, 16, 16])        |          815.2      |        546.1     |         560.9     |            1012.5         |          553.1
      torch.Size([8, 16, 16])        |          877.7      |        543.0     |         547.9     |            1012.8         |          551.5
      torch.Size([16, 16, 16])       |         1008.7      |        549.2     |         562.7     |            1016.6         |          546.8
      torch.Size([32, 16, 16])       |         1291.9      |        540.8     |         560.3     |            1055.8         |          539.3
      torch.Size([64, 16, 16])       |         1846.3      |        553.5     |         556.0     |            1010.8         |          551.9
      torch.Size([128, 16, 16])      |         2953.8      |        562.7     |         547.5     |            1026.2         |          555.8
      torch.Size([1, 32, 32])        |          789.1      |        590.6     |         590.9     |             790.5         |          579.0
      torch.Size([2, 32, 32])        |          806.9      |        596.6     |         600.2     |            1085.6         |          573.8
      torch.Size([4, 32, 32])        |          852.0      |        597.9     |         588.2     |            1098.9         |          574.7
      torch.Size([8, 32, 32])        |          914.2      |        597.8     |         591.4     |            1090.3         |          585.7
      torch.Size([16, 32, 32])       |         1063.0      |        604.6     |         597.3     |            1094.0         |          580.5
      torch.Size([32, 32, 32])       |         1302.0      |        602.0     |         598.9     |            1090.3         |          583.6
      torch.Size([64, 32, 32])       |         1861.7      |        601.1     |         599.8     |            1113.4         |          588.6
      torch.Size([128, 32, 32])      |         3251.0      |        619.6     |         595.3     |            1106.8         |          608.9
      torch.Size([1, 64, 64])        |          978.6      |        842.7     |         778.6     |            1071.4         |          825.8
      torch.Size([2, 64, 64])        |         1072.3      |        845.7     |         785.4     |            1400.6         |          829.0
      torch.Size([4, 64, 64])        |         1051.9      |        842.9     |         796.1     |            1352.2         |          788.2
      torch.Size([8, 64, 64])        |         1090.3      |        834.1     |         805.2     |            1382.6         |          804.7
      torch.Size([16, 64, 64])       |         1206.9      |        835.7     |         802.2     |            1365.6         |          801.2
      torch.Size([32, 64, 64])       |         1671.2      |        846.5     |         794.5     |            1345.1         |          814.2
      torch.Size([64, 64, 64])       |         2759.3      |        848.5     |         795.4     |            1409.7         |          832.9
      torch.Size([128, 64, 64])      |         4928.6      |        877.4     |         848.3     |            1439.0         |          883.9
      torch.Size([1, 128, 128])      |         1315.6      |       1158.4     |        1130.0     |            1301.3         |         1177.1
      torch.Size([2, 128, 128])      |         1334.7      |       1198.2     |        1186.6     |            1703.9         |         1209.5
      torch.Size([4, 128, 128])      |         1374.6      |       1200.7     |        1266.2     |            1640.6         |         1272.3
      torch.Size([8, 128, 128])      |         1453.6      |       1215.9     |        1287.3     |            1669.1         |         1288.7
      torch.Size([16, 128, 128])     |         1882.1      |       1244.9     |        1337.6     |            1698.8         |         1347.1
      torch.Size([32, 128, 128])     |         2789.0      |       1284.5     |        1398.6     |            1747.6         |         1396.3
      torch.Size([64, 128, 128])     |         4763.0      |       1425.2     |        1581.7     |            1921.0         |         1584.1
      torch.Size([128, 128, 128])    |         8835.9      |       1808.9     |        1968.7     |            2197.6         |         1961.8
      torch.Size([1, 512, 512])      |         4369.9      |       4577.6     |        4804.0     |            4331.4         |         4599.0
      torch.Size([2, 512, 512])      |         4635.9      |       4850.1     |        5159.1     |            5315.4         |         4845.5
      torch.Size([4, 512, 512])      |         5367.5      |       5261.6     |        6134.7     |            5807.8         |         5345.2
      torch.Size([8, 512, 512])      |         7025.2      |       6184.5     |        7065.6     |            6711.6         |         6303.9
      torch.Size([16, 512, 512])     |        10221.3      |       7849.7     |        8820.1     |            8323.6         |         7992.1
      torch.Size([32, 512, 512])     |        16574.8      |      11208.4     |       12284.3     |           11704.7         |        11394.4
      torch.Size([64, 512, 512])     |        29500.1      |      18043.1     |       19249.3     |           18744.0         |        18242.1
      torch.Size([128, 512, 512])    |        56783.3      |      33903.9     |       34713.5     |           33893.8         |        34041.8
      torch.Size([1, 1024, 1024])    |        14864.5      |      15714.6     |       16128.1     |           14726.7         |        14992.6
      torch.Size([2, 1024, 1024])    |        17891.0      |      18553.3     |       19111.6     |           19271.5         |        19283.0
      torch.Size([4, 1024, 1024])    |        22143.4      |      21909.2     |       23667.1     |           22698.9         |        22713.8
      torch.Size([8, 1024, 1024])    |        30621.1      |      28669.9     |       30822.9     |           29725.0         |        29760.8
      torch.Size([16, 1024, 1024])   |        47045.9      |      41900.0     |       44353.8     |           43215.6         |        43237.5
      torch.Size([32, 1024, 1024])   |        79245.5      |      68316.9     |       70959.0     |           69506.4         |        69876.7
      torch.Size([64, 1024, 1024])   |       147973.9      |     121120.6     |      124601.1     |          122084.4         |       122578.7
      torch.Size([128, 1024, 1024])  |       295586.2      |     232871.8     |      237421.8     |          233765.3         |       234704.6

Times are in microseconds (us).
```

</details>

Here's the details of how the tests were performed:
* CUSOLVER - Only call `cusolver` for all problem sizes.
* MAGMA - Only call `magma` for all problem sizes (this is the current master branch).
* CUBLAS - Only call `cublas` for all problem sizes.
* cuSOLVER / MAGMA - Use cusolver for `batch_size == 1` and magma for all others.
* TEST ALL - Employ heuristics to switch between cublas/cusolver/magma. This yields the best overall results (this PR).

Script for reproducing the results:

<details>

``` python

import torch
import pickle
import itertools
from torch.utils.benchmark import Timer
import sys

shapes = [1, 2, 8, 16, 32, 64, 128, 512, 1024]
batches = [(1,), (2,), (4,), (8,), (16,), (32,), (64,), (128,)]
results = []
num_threads = 1
dtype = torch.float64
repeats = 2

from torch.testing._internal.common_utils import random_hermitian_pd_matrix

def lu_factorize_solve(mat, b):
    lu_data = torch.lu(mat)
    x = torch.lu_solve(b, *lu_data)

for shape, batch in itertools.product(shapes, batches):
    mat = torch.randn(*batch, shape, shape, dtype=dtype, device='cuda')
    b = torch.randn(*batch, shape, 1, dtype=dtype, device='cuda')

    tasks = [("lu_factorize_solve(mat, b)", "lu_solve CUSOLVER")]

    print("shape: ", shape, " batch: ", batch)

    timers = [Timer(stmt=stmt, num_threads=num_threads, label=f"LU solve CUDA {dtype}",
                    sub_label=f"{mat.shape}", description=label, globals=globals()) for stmt, label in tasks]
    for i, timer in enumerate(timers * repeats):
        results.append(
            pickle.dumps(timer.blocked_autorange())
        )
        print(f"\r{i + 1} / {len(timers) * repeats}", end="")
        sys.stdout.flush()

f = open("cusolver_lu_solve.pickle", "wb")
pickle.dump(results, f)
f.close()
```

</details>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59148

Reviewed By: H-Huang

Differential Revision: D29160609

Pulled By: mruberry

fbshipit-source-id: 7280f25db1e66aa650ea15608a6dc5d688fb4db2
2021-06-20 21:27:35 -07:00
4a3eea9a6a [quant][graphmode][fx] Produce reference linear module in convert (#60152)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60152

Test Plan:
python test/test_quantization.py TestQuantizeFx

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D29188263

fbshipit-source-id: f7bbbef5d4d747eadf7a627a4e77a5ec9bb0bc94
2021-06-20 20:08:12 -07:00
510334f34b [BE] clean up IS_PYTORCH_CI and IN_CI (#60279)
Summary:
`IS_PYTORCH_CI` and `IN_CI` are used interchangeably; however, in some cases IN_CI is not currently set because it only exists in .circleci/scripts/setup_ci_environment.sh. This cleans up the two flags and uses only IN_CI.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60279

Test Plan: CI

Reviewed By: seemethere

Differential Revision: D29239545

Pulled By: walterddr

fbshipit-source-id: a069424a2bb8790a3adfdaf0dc460301026bf8c7
2021-06-20 19:45:07 -07:00
2293ab4e53 [quant][graphmode][fx] Refactor convert for linear to use get_static_module_mapping and get_dynamic_module_mapping (#60151)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60151

Test Plan:
```
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
```

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D29188264

fbshipit-source-id: d2b77ffcf4b7446fc6c43248e43218092d2a6aea
2021-06-20 19:41:16 -07:00
a516424a70 Update internal code for torch.linalg.solve (#56613)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56613

Replace linalg_solve_helper with `lu_stub` + `lu_solve_stub`.
Once `lu_stub` and `lu_solve_stub` have a cuSOLVER-based codepath,
`torch.linalg.solve` will have it as well.
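
The user-facing API is unchanged; for reference, a quick sketch of the op being refactored (my example):

```python
import torch

A = torch.randn(3, 3, dtype=torch.float64, device='cuda')
b = torch.randn(3, 2, dtype=torch.float64, device='cuda')
x = torch.linalg.solve(A, b)  # internally an LU factorization + LU solve
assert torch.allclose(A @ x, b)
```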

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D28627408

Pulled By: mruberry

fbshipit-source-id: b95bbdf35f845a56a1489c04b53742a01b36e789
2021-06-20 19:37:12 -07:00
47d727fe1b [quant][graphmode][fx] Produce conv reference static quant modules (#60138)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60138

Test Plan:
python test/test_quantization.py TestQuantizeFx

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D29184791

fbshipit-source-id: 971a40012dbba0cf687c62a3a4af9358513c253b
2021-06-20 19:25:45 -07:00
b298013cd5 [add/sub] Cast alpha to acc_type (#60227)
Summary:
This PR lets the `torch.add` & `torch.sub` CUDA kernels cast `alpha` to `acc_type`, not `scalar_t`.
I do not remove the `cast`s from `test/test_foreach.py` because I'll do this in https://github.com/pytorch/pytorch/issues/59907 or a follow-up to it.

Current upstream `torch._foreach_add` & `torch._foreach_sub` upcast the `alpha` parameter to `acc_type<scalar_t>` while `torch.add` & `torch.sub` do not. This is problematic because the outputs of `torch.add` and `torch.sub` differ from those of `torch._foreach_add` and `torch._foreach_sub`, respectively, when the dtype of the input tensors is either `torch.half` or `torch.bfloat16`. The discrepancy is roughly proportional to `abs(alpha)`, except when `alpha` is exactly representable in 16 bits.
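
A small sketch of the discrepancy this fixes (my example; requires CUDA, and the exact pre-fix difference depends on `alpha` and the values involved):

```python
import torch

a = torch.full((4,), 1.0, dtype=torch.half, device='cuda')
b = torch.full((4,), 1.0, dtype=torch.half, device='cuda')
alpha = 0.1  # not exactly representable in 16 bits

eager = torch.add(a, b, alpha=alpha)
foreach = torch._foreach_add([a], [b], alpha=alpha)[0]
# Before this PR the two could differ, since only the foreach kernel
# upcast alpha to acc_type; after it, both do, so the results match.
print((eager - foreach).abs().max())
```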

ref:
- `torch._foreach_add` & `torch._foreach_sub` cast `alpha`: 6d0fb85a62/aten/src/ATen/native/cuda/ForeachBinaryOpList.cu (L21-L28), `BinaryOpListAlphaFunctor` is defined here: 6d0fb85a62/aten/src/ATen/native/cuda/ForeachFunctors.cuh (L202)

related: https://github.com/pytorch/pytorch/issues/58833, https://github.com/pytorch/pytorch/pull/59907

cc ngimel ptrblck mcarilli

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60227

Reviewed By: mruberry

Differential Revision: D29252759

Pulled By: ngimel

fbshipit-source-id: 847f3b9493ae30a900f7445af00aef1abcc1ab21
2021-06-20 19:05:22 -07:00
0131a5972d [DDP] Test inference works with eval() and no_grad() (#59666)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59666

Tests that inference with a DDP model won't hang when the user sets eval()
or no_grad(). Note that if the model has a SyncBN layer, both eval() and
no_grad() are needed, since eval() makes SyncBN work like a regular BN layer.
ghstack-source-id: 131906625

Test Plan: CI

Reviewed By: SciPioneer

Differential Revision: D28974146

fbshipit-source-id: 137f8245b1c303beb2416518476e70fe67c73376
2021-06-20 12:02:43 -07:00
69b2bf70f9 [pytorch] fix tools/code_analyzer for llvm 11 (#60322)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60322

Test Plan: Imported from OSS

Reviewed By: iseeyuan

Differential Revision: D29250420

Pulled By: ljk53

fbshipit-source-id: ff7f9cbacd1d9518ed81c06fc843a90d6948f760
2021-06-20 00:39:11 -07:00
c19acf816f Replace TensorRT's deprecated API in caffe2/python/trt/test_pt_onnx_trt.py (#60236)
Summary:
TensorRT v8 is going to remove some functions/methods that are used in this test.

ref:
- getMaxWorkspaceSize deprecation: b2d60b6e10/include/NvInfer.h (L6984-L6993)
- buildCudaEngine deprecation: b2d60b6e10/include/NvInfer.h (L7079-L7087)

cc ptrblck

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60236

Reviewed By: gchanan

Differential Revision: D29232376

Pulled By: ngimel

fbshipit-source-id: 2b8a48787bf61c68a81568b6026d6afd5a83e751
2021-06-19 19:56:30 -07:00
5ec4ad7f54 [special] Add special.ndtri (#58650)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/50345

TODO
* [x] Add docs https://13865352-65600975-gh.circle-artifacts.com/0/docs/special.html#torch.special.ndtri
* [x] Add comments on implementation
* [x] Clean-up
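
A quick usage sketch (my example):

```python
import torch

p = torch.tensor([0.025, 0.5, 0.975], dtype=torch.float64)
# ndtri is the inverse of the standard normal CDF (torch.special.ndtr):
torch.special.ndtri(p)  # ~ tensor([-1.9600, 0.0000, 1.9600])
```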

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58650

Reviewed By: H-Huang

Differential Revision: D29160170

Pulled By: mruberry

fbshipit-source-id: 50e4ea663920e97b8437d03d5b52bcd9dedc1a8d
2021-06-19 18:36:54 -07:00
5824a866b7 [pytorch][nnc] support custom class parameters (#59466)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59466

Change saved parameter type from at::Tensor to at::IValue to support custom
class parameters, e.g. `__torch__.torch.classes.xnnpack.Conv2dOpContext`.

The NNC produced kernels won't deal with custom class parameters directly.
They simply pass through to the external operators that take these custom
class parameters, e.g. `prepacked::conv2d_clamp_run`.

It will reuse the `__getstate__` and `__setstate__` methods on the custom class
to persist and restore the state of the parameters.

When calling into the kernel, it passes the untyped raw pointers of the custom
class objects as `void*`. This is similar to the regular tensor parameters,
for which it passes in the raw data pointer of the tensor storage. The generated
kernel needs to hardcode the expected type for each parameter and cast before
calling the external ops.
ghstack-source-id: 131897904

Test Plan: - unit tests

Reviewed By: kimishpatel

Differential Revision: D28902496

fbshipit-source-id: 4b2c0895dd28f0b7d344aa08183d42ad6a355dae
2021-06-19 06:11:01 -07:00
cac9ae1506 [iOS GPU][BE][3/n] Give MPSImage objects a label for better debugging experience (#60282)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60282

1. Add a label to the MPSImage objects; the label describes the size of the image.
2. Remove `[image markRead]`.
3. Rename two APIs for a better naming convention.
ghstack-source-id: 131839557

Test Plan:
1. CircleCI
2. buck test pp-mac

Reviewed By: SS-JIA

Differential Revision: D29232975

fbshipit-source-id: 075175c4b5a1c5b79e795f4860e1694d7c06d4f2
2021-06-18 18:47:05 -07:00
b9cd97c94b [iOS GPU][BE][2/n] Remove unused APIs (#60281)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60281

1. Remove unused APIs from MPSImageUtils.
2. Move tensor-related APIs from MetalUtils to MetalTensorUtils; delete MetalUtils.h/mm.
3. Move Metal buffer-related APIs to MetalContext.
ghstack-source-id: 131839559

Test Plan:
1. CircleCI
2. buck test pp-mac

Reviewed By: SS-JIA

Differential Revision: D29232973

fbshipit-source-id: a4c0c848883b8ef615eeb2936c1f3d18cddcb318
2021-06-18 18:47:04 -07:00
80e6e3f1da [iOS GPU][BE][1/n] Rename MPSCNNContext to MetalContext (#60280)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60280

No significant changes besides renaming the class. In the future, we'll convert this Objective-C class to C++.
ghstack-source-id: 131827490

Test Plan:
- CircleCI
- buck test pp-mac

Reviewed By: SS-JIA

Differential Revision: D29231824

fbshipit-source-id: a0d1327a55a0414011c78a7144d3b05f1579cf42
2021-06-18 18:45:24 -07:00
319890b1b2 Support *args in Pipe.forward API. (#55441)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55441

This is the first step towards supporting the proposal outlined in
https://github.com/pytorch/pytorch/issues/53952.

In this PR I've ensured Pipe.forward() accepts a *inputs argument instead of
just a single input as before. This lays the groundwork for supporting
non-Tensors and generic arguments to the Pipe API. In this PR we still only
support Tensors; non-Tensor support will come in future PRs.

For backward compatibility I've ensured a single Tuple[Tensor] input still
works as it did previously.
ghstack-source-id: 130767499

Test Plan: waitforbuildbot

Reviewed By: SciPioneer

Differential Revision: D27613887

fbshipit-source-id: 05e19e537e6d7fe4999745fc4ba9941ac54906de
2021-06-18 17:53:32 -07:00
a8430f1076 Remove PlacementSpec from ShardingSpecs. (#59990)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59990

ShardingSpecs accepted a Device/PlacementSpec and were initially
written this way for flexibility. However, this is slightly confusing given
there is no general use case for it. As a result, to keep things simple, I've
ensured that both specs accept only devices for now.

We can always extend this to include a general PlacementSpec later on.
ghstack-source-id: 131842525

Test Plan: waitforbuildbot

Reviewed By: SciPioneer, rohan-varma

Differential Revision: D29116463

fbshipit-source-id: a6f2b3f1346ac6afab91c9595d4cae4f4da04fda
2021-06-18 17:37:43 -07:00
1c97c3e3a4 DOC Adds LSTM docs for defined variables when bidirectional=True (#60120)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/59332

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60120

Reviewed By: gchanan

Differential Revision: D29240245

Pulled By: jbschlosser

fbshipit-source-id: acad9c24f41f7253a7d42cd940e54bb66e083ecf
2021-06-18 17:28:44 -07:00
aae2a3c95e Clarify ConvTransposeNd + reference links (#60291)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/56873

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60291

Reviewed By: gchanan

Differential Revision: D29239199

Pulled By: jbschlosser

fbshipit-source-id: 9b2de1a8b1a7444797f82c73195c5efc929562eb
2021-06-18 17:18:11 -07:00
e8e3394ea8 Recognize transposed dense tensors as a form of partial overlap (#59014)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59014

Fixes #48401

`assert_no_overlap` currently has a false negative where it recognizes
the transpose of a contiguous tensor as fully overlapping. This happens because
the memory regions do fully overlap, but the strides are different, so the
actual elements don't all overlap.

This PR goes slightly in the other direction: by requiring strides to match
exactly, we get false positives for some unusual situations, e.g.
```
torch.add(a, a, out=a.view([1, *a.shape]))
```
Or replacing strides of length-1 dimensions, etc. However, I think these are
sufficiently obscure that it's okay to error, and the common cases like
in-place operations still work as before.
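
For illustration, a minimal sketch of the newly rejected transposed case (my example):

```python
import torch

a = torch.ones(3, 3)
# The memory regions fully overlap, but the strides differ, so the element
# mapping is not one-to-one; with this change the overlap check raises
# instead of silently treating this as a full overlap.
torch.add(a, a, out=a.t())  # raises a RuntimeError after this PR
```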

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D29040928

Pulled By: ngimel

fbshipit-source-id: 5a636c67536a3809c83f0d3117d2fdf49c0a45e6
2021-06-18 16:29:25 -07:00
47bbc01e0b [nnc] Added micro-benchmark to show perf improvement with cat subgraph optimization (#59581)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59581

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D28955317

Pulled By: navahgar

fbshipit-source-id: 53bb3dbfafbd3b146063f305523c2e6ec96cf6b8
2021-06-18 14:32:09 -07:00
d0c4ace00f [jit] Added a tranformation to move consumers of aten::cat to its inputs, in the fused subgraphs (#59580)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59580

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D28955318

Pulled By: navahgar

fbshipit-source-id: 7504d5aea441920f4eb9234cdfa17077161ab13c
2021-06-18 14:32:07 -07:00
d4c626a346 [jit] Exported a method to get the supported list of elementwise ops (#60162)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60162

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D29190841

Pulled By: navahgar

fbshipit-source-id: bb786a653441c5b586509e25cc80d357d2223af3
2021-06-18 14:32:05 -07:00
55755edc60 [jit] Made a list for element-wise ops. (#59579)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59579

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D28955319

Pulled By: navahgar

fbshipit-source-id: 605531aedf9250a226b0401d55fda3427bdc6f33
2021-06-18 14:30:47 -07:00
a029422cae [quant][graphmode][fx][refactor] Change the env map to add dtype as a key (#60054)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60054

Previously, env in convert was Dict[str, Tuple[Node, torch.dtype]]; that is, at a given time each node can only have one dtype.
This causes a problem for the following case:
```
# original model
class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(1, 1, 1)

    def forward(self, x):
        x = self.conv(x)
        x1 = x.expand_as(x)
        x2 = torch.add(x, x1)
        return x2

# observed graph
def forward(self, x):
    x = self.activation_post_process_0(x)
    x = self.conv(x)
    x = self.activation_post_process_1(x)
    x1 = x.expand_as(x)
    x1 = self.activation_post_process_2(x1)
    x2 = torch.add(x, x1)
    x2 = self.activation_post_process_3(x2)
    return x2

# quantized graph
def forward(self, x):
    x = torch.quantize_per_tensor(x, ...)
    x = self.conv(x)  # quantized conv
    x = torch.dequantize(x)
    x1 = x.expand_as(x)
    x1 = torch.quantize_per_tensor(x1, ...)
    # Error: x is dequantized
    x2 = torch.ops.quantized.add(x, x1)
    return x2
```

Currently we have an env that is a map from node name in the observed graph to the Node in the quantized graph. The problem is that the quantized conv is followed by two operators: one expecting float input (expand_as) and one expecting quantized input (quantized add). In the quantized graph, expand_as should ideally consume the dequantized output and quantized add should consume the quantized output:

    quantized_conv - dequantize - expand_as
      \ ------- quantized_add

But currently each node in env must be either quantized or not quantized. Therefore we change env to include dtype as a key, `env: Dict[str, Dict[torch.dtype, Node]]`, e.g. `{'x': {torch.float: dequantized_node, torch.quint8: quantized_node}}`. When we load from env, we will also need to provide the dtype of the Node we want to load. We can have a separate pass to figure out this information for each node.

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D29149408

fbshipit-source-id: c9e4b7d65444ab6a6f573929bae1db5037629892
2021-06-18 13:31:43 -07:00
c0f8cad0f0 [BE] Fix shard imbalance (#60206)
Summary:
First step to address https://github.com/pytorch/pytorch/issues/60136

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60206

Reviewed By: janeyx99

Differential Revision: D29215237

Pulled By: walterddr

fbshipit-source-id: ec25beb57366ef2eaf37878cdea391b245de9bef
2021-06-18 12:49:30 -07:00
d9e7df707b [TensorExpr] Add NNC lowerings for aten::mean, aten::addmm, and aten::adaptive_avg_pool2d. (#59347)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59347

We had external call wrappers for them, but they were not used in NNC.
This PR adds lowerings using these ext calls and fixes some bugs in
them.

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D28853832

Pulled By: ZolotukhinM

fbshipit-source-id: 1718400368e1a9cf3f19180ee2290a4ed9c99d41
2021-06-18 11:56:32 -07:00
c6bb9409b8 [TensorExpr] Handle not-specified dtypes and strides. (#59346)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59346

Currently JIT has a pass to propagate shapes, but doesn't have the
capability to fill in strides and dtypes. This PR works around that by
assuming the default dtype to be Float and strides corresponding to a
contiguous layout, unless otherwise specified. Ideally we won't need
this; it is done simply as a workaround until the corresponding
features are implemented on the JIT side.

This is required for AOT compilation of mobilenet v3 with NNC.

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D28853831

Pulled By: ZolotukhinM

fbshipit-source-id: 81adb59409684f39b444909ab8ec58ee4a39d496
2021-06-18 11:56:30 -07:00
f042455a8d [JIT] ShapeProp: add missing ops from mobilenet v3. (#59163)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59163

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D28853833

Pulled By: ZolotukhinM

fbshipit-source-id: 451fb9ee848968049d26fb5623a904d8fa7bd6fc
2021-06-18 11:55:00 -07:00
3870e68644 TF32 threshold twiddling for tests (#60209)
Summary:
Following https://github.com/pytorch/pytorch/issues/59624, I observed some straggling failing tests on Ampere due to TF32 thresholds. This PR twiddles some more thresholds to fix the six failing tests I saw on A100.

CC Flamefire ptrblck ngimel

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60209

Reviewed By: gchanan

Differential Revision: D29220508

Pulled By: ngimel

fbshipit-source-id: 7c83187a246e1b3a24b181334117c0ccf2baf311
2021-06-18 11:41:33 -07:00
5f010c066f [package] Bring back save_source_file (#59962)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59962

This reverts commit 44b021d21b5681c105529881bdbaefb6d3e335f6.

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D29113224

Pulled By: zhxchen17

fbshipit-source-id: 55d42acc421c5f4abbbad9d9ed4d32b615939463
2021-06-18 11:13:35 -07:00
5a45103139 ns for fx: add API usage logging (#60103)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60103

Adds internal logging for NS for FX API usage.

Test Plan: CI

Reviewed By: jerryzh168

Differential Revision: D29166710

fbshipit-source-id: 2a1bf2f6038b0c6c5945b57b2db2de25c585a04a
2021-06-18 10:25:59 -07:00
0baad214b0 [static runtime][fix] resize to the input tensor size for full_like (#60229)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60229

Fix a bug where we did not resize to the input tensor size, causing
the output to be incorrect.

Test Plan:
Test on replayer, rebased on D29217781, with model 278203319_26.

Verify with jit outputs (D28583950)

`./buck-out/gen/admarket/lib/ranking/prediction_replayer/replayer --model_inference_type_target=DISAGG_ACCELERATOR --prediction_replayer_force_model_type=inline_cvr_post_imp_model --prediction_replayer_force_model=278203319_26 --prediction_replayer_target_tier=sigrid.predictor.perf.dianshi_staticruntime_debug_0604.test --prediction_replayer_input_stream_filename=/data/users/ansha/tmp/adfinder/filtered_requests_inline_cvr_100 --ignore_model_id_mismatch --check_performance --fully_remote_sr_connection_options="overall_timeout:10000000,processing_timeout:10000000" --use_new_encoding_for_ads_services --use_new_encoding_from_model_id_to_shard_id --sigrid_force_model_dir=/data/users/ansha/tmp/adfinder/278203319_26/ --sigrid_predictor_model_suffix=.predictor.disagg.local —use_new_encoding_from_model_id_to_shard_id=true --prediction_replayer_force_model_kind=19 --pytorch_predictor_static_runtime_enable=true --prediction_replayer_target_qps=1`

Reviewed By: hlu1, movefast1990

Differential Revision: D29218918

fbshipit-source-id: dab4bbbabeaa8367174ed90edca43d6204c65409
2021-06-18 09:56:25 -07:00
d5df274ea5 [DDP] Support for multiple backwards (#59359)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59359

Move `prepare_for_backward` into the `_DDPSink` backward instead of calling it in the DDP forward pass, so that we can run multiple backwards in DDP with `retain_graph=True`.
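
A minimal single-process sketch of what this enables (my example, assuming a gloo setup):

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

model = DDP(torch.nn.Linear(4, 4))
loss = model(torch.randn(2, 4)).sum()
loss.backward(retain_graph=True)  # first backward
loss.backward()                   # second backward, enabled by this change

dist.destroy_process_group()
```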

ghstack-source-id: 131774159

Test Plan: CI

Reviewed By: zhaojuanmao

Differential Revision: D28855226

fbshipit-source-id: 6b7b25d75b7696f5b5629078233433f97663d61c
2021-06-18 09:23:57 -07:00
3815a013ed Enable xenial-cuda11.1-cudnn8-py3.6-gcc7 in GHA (#60196)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60196

Test Plan:
https://github.com/pytorch/pytorch/issues/60198: https://github.com/pytorch/pytorch/actions/runs/947796763

I should have used `ghstack` but I forgot; will do that in the future.

Reviewed By: walterddr

Differential Revision: D29231161

Pulled By: samestep

fbshipit-source-id: 8299a248ca9c1d36c3845d1c8a10ca9bf7101124
2021-06-18 09:18:53 -07:00
d5988c5eca remove unused type: ignore directives (#60006)
Summary:
During development it is common practice to put `type: ignore` comments on lines that are correct, but `mypy` doesn't recognize this. This often stems from the fact that the `mypy` version in use wasn't able to handle the pattern.

With every new release `mypy` gets better at handling complex code. In addition to fixing all the previously accepted but now failing patterns, we should also revisit all `type: ignore` comments to see if they are still needed. Fortunately, we don't need to do it manually: by adding `warn_unused_ignores = True` to the configuration, `mypy` will error out when it encounters a `type: ignore` that is no longer needed.
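
For illustration (hypothetical line), an ignore that no longer suppresses anything is then reported:

```python
# With warn_unused_ignores = True, mypy reports for the line below:
#   error: Unused "type: ignore" comment
x: int = 1 + 1  # type: ignore
```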

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60006

Reviewed By: jbschlosser, malfet

Differential Revision: D29133237

Pulled By: albanD

fbshipit-source-id: 41e82edc5cd5affa7ccedad044b59b94dad4425a
2021-06-18 07:23:31 -07:00
7c29ca7f2b Fix Subset of a Subset not sliceable issue (#59513)
Summary:
A Dataset can be indexed by a list, but a list cannot be indexed by a list. This caused an error when slicing a Subset that was initialized with another Subset instead of a dataset.

Fixed the issue by changing the indices to a Tensor, which can be indexed by a list.
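
A minimal repro sketch of the failure (my example):

```python
import torch
from torch.utils.data import TensorDataset, Subset

ds = TensorDataset(torch.arange(10))
sub = Subset(ds, [0, 2, 4, 6, 8])
subsub = Subset(sub, [0, 1, 2])

# Slicing resolves to sub[[0, 1]], so the inner Subset tried to index its
# list of indices with a list, raising a TypeError before this fix.
print(subsub[0:2])
```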

Fixes https://github.com/pytorch/pytorch/issues/59512

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59513

Reviewed By: zou3519

Differential Revision: D29196891

Pulled By: ejguan

fbshipit-source-id: ccde6e474fbcbddd2e9c7c107bc8b5de1307cdb9
2021-06-18 07:07:34 -07:00
08ce5eedf5 [reland] Move RPC agents to libtorch (#60170)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60170

Reland of #59939.

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D29193234

fbshipit-source-id: ee2a90d6be961c10f91361512bdd4cadca43dd60
2021-06-18 05:15:09 -07:00
958b881d70 [reland] Add some TORCH_API annotations to RPC (#60169)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60169

Reland of #59939.
ghstack-source-id: 131706861

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D29193233

fbshipit-source-id: 91d3ef9003b9da7b99e1b9310b7f5a6c505d3b99
2021-06-18 05:15:07 -07:00
83fde5d981 [reland] Pass RequestCallback to FaultyPG RPC agent (#60168)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60168

Reland of #59939.
ghstack-source-id: 131706860

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D29193235

fbshipit-source-id: 170108956a041f6a91b2b21c76ab1a0e0cdd34a2
2021-06-18 05:13:57 -07:00
8a839c5478 Fix saved variable unpacking version counter (#60195)
Summary:
We only set the value and not the actual VC.
This means that in the context of double backward, if that saved tensor is saved again and the original Tensor is modified inplace, we would not detect it.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60195

Reviewed By: Varal7

Differential Revision: D29208766

Pulled By: albanD

fbshipit-source-id: 81175f8e3f111f89524f8e46f47577b2ea4fc945
2021-06-18 04:36:46 -07:00
5609c2e59c Adds an OpInfo note (#57428)
Summary:
Like the title says. The OpInfo pattern can be confusing when first encountered, so this note links the Developer Wiki and tracking issue, plus elaborates on the goals and structure of the OpInfo pattern.

cc imaginary-person, who I can't add as a reviewer, unfortunately

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57428

Reviewed By: SplitInfinity

Differential Revision: D29221874

Pulled By: mruberry

fbshipit-source-id: aa73228748c9c96eadf2b2397a8b2ec31383971e
2021-06-18 03:40:42 -07:00
ecc37184a5 Fix clang-tidy path filtering (#60225)
Summary:
PR https://github.com/pytorch/pytorch/issues/60048 neglected to include the `--paths` option for file filtering, so it ended up passing every changed file in the diff to clang-tidy (cpp files outside `torch/csrc/`, yaml/sh files, etc.). This adds that back in to make the filtering work properly again.

Tested it manually by printing out the files to lint and running

```bash
curl -L https://github.com/pytorch/pytorch/pull/60018.diff > diff
python tools/clang_tidy.py --diff-file diff --paths torch/csrc/

curl -L https://github.com/pytorch/pytorch/pull/60222.diff > diff
python tools/clang_tidy.py --diff-file diff --paths torch/csrc/
```

Should fix https://github.com/pytorch/pytorch/issues/60192 and https://github.com/pytorch/pytorch/issues/60193; the files tripping errors there shouldn't have been passed to clang-tidy in the first place (supporting aten/ for clang-tidy is a separate task)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60225

Reviewed By: zhouzhuojie

Differential Revision: D29216251

Pulled By: driazati

fbshipit-source-id: b5d7fb7161d33eb7958a6f1ccc25809942045209
2021-06-17 23:03:59 -07:00
38c3116813 [hierarchical sharding 5/n] enable table-wise -> col-wise sharding in embedding table lookup
Summary:
This diff adds table-wise -> column-wise sharding support in GroupedShardedEmbeddingBag. Changes include:
1. Add necessary member variable setup.
2. Create a new fast kernel and add fast-kernel lookup support.
3. Add intra-host all2all and cross-host all2all logic.

Test Plan:
UT
```
buck test mode/dev-nosan //caffe2/torch/fb/training_toolkit/backend/tests:test_model_materializer_full_sync_spawn
```
```
buck test caffe2/torch/fb/hpc/tests:model_sharder_test
```
QPS check:
```
buck run mode/dev-nosan -c python.package_style=inplace caffe2/torch/fb/training_toolkit/examples:sync_sgd_local_driver -- prod-preset --num-trainers 32 --use-shrunk-model false --model-version=inline_cvr_dec_2020 --fast-kernel table_batched --max-batches 10000 --num-dpp-worker-threads 16 --num-readers 100 --hpc-identity ads_model_platform --table-partition hierarchical_based --hierarchical-options "["table_based", "column_based"]" --flow-entitlement ads_global_qps
```
with diff:
dec inline_cvr:
table-wise -> table-wise (82K):
https://www.internalfb.com/mast/job/tsm_ruilinchen-SparseNNApplication_d0a0cba5?version=0&tab=status&env=PRODUCTION

table-wise -> column-wise (80k):
https://www.internalfb.com/mast/job/tsm_ruilinchen-SparseNNApplication_b1ac5873

column-wise:
dec inline_cvr:
gpu trace: https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree%2Ftraces%2Fdynocli%2F0%2F1623827677%2F127.0.0.1%2Flibkineto_activities_4550.json.gz&bucket=gpu_traces

https://www.internalfb.com/mast/job/tsm_ruilinchen-SparseNNApplication_a79e1522 (81k)

https://www.internalfb.com/mast/job/tsm_ruilinchen-SparseNNApplication_2dacc13e (88k)

row-wise(62k):
https://www.internalfb.com/mast/job/tsm_ruilinchen-SparseNNApplication_4e349cab

table-wise(90k):
https://www.internalfb.com/mast/job/tsm_ruilinchen-SparseNNApplication_5d51b608

10x ctr_mbl_feed:
```
buck run mode/dev-nosan -c python.package_style=inplace caffe2/torch/fb/training_toolkit/examples:sync_sgd_local_driver -- prod-preset --num-trainers 128 --use-shrunk-model false --model-version=ctr_mbl_oct_2020_10x_3tb --num-dpp-worker-threads 16 --num-readers 200 --fast-kernel table_batched --max-batches 5000000 --hpc-identity ads_model_platform --table-partition column_based --flow-entitlement ads_global_tc_mimo
```
column-wise:
https://www.internalfb.com/mast/job/tsm_ruilinchen-SparseNNApplication_f05fb306?version=0&tab=status&env=PRODUCTION (290k)

w/o diff:
dec inline_cvr:
column-wise (87K):
gpu trace: https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree%2Ftraces%2Fdynocli%2F0%2F1623864444%2F127.0.0.1%2Flibkineto_activities_4451.json.gz&bucket=gpu_traces
https://www.internalfb.com/mast/job/tsm_ruilinchen-SparseNNApplication_e1315f14

row-wise (60k):
https://www.internalfb.com/mast/job/tsm_ruilinchen-SparseNNApplication_8fcc0adf

table-wise (91k):
https://www.internalfb.com/mast/job/tsm_ruilinchen-SparseNNApplication_cb94ff41

10x ctr_mbl_feed:
https://www.internalfb.com/mast/job/tsm_ruilinchen-SparseNNApplication_203ef35b?version=0&tab=status&env=PRODUCTION (281k)

NE check(use deterministic reading D28711400)
```
buck run mode/dev-nosan -c python.package_style=inplace caffe2/torch/fb/training_toolkit/examples:sync_sgd_local_driver -- prod-preset --num-trainers 32 --use-shrunk-model false --model-version=inline_cvr_dec_2020 --fast-kernel table_batched --max-batches 100000 --num-dpp-worker-threads 16 --num-readers 64 --hpc-identity ads_model_platform --table-partition hierarchical_based --hierarchical-options "[table_based, column_based]" --flow-entitlement ads_global_qps --use-deterministic-model --use-deterministic-reading --model-entity-id 995557193
```
w/o this diff:
```
I0611 12:19:18.766000 647 print_publisher.py:33  master      ] Publishing batch metrics: ne-ne|lifetime_ne 0.8660048340401448
I0611 12:19:18.766000 647 print_publisher.py:33  master      ] Publishing batch metrics: ne-ne|window_ne 0.8660048340401447
I0611 12:19:18.766000 647 print_publisher.py:33  master      ] Publishing batch metrics: qps-qps|total_examples 1867776.0
I0611 12:19:18.766000 647 print_publisher.py:33  master      ] Publishing batch metrics: qps-qps|window_qps 491.5199890136719
```
https://www.internalfb.com/mast/job/tsm_ruilinchen-SparseNNApplication_15bc6243?version=0&tab=status&env=PRODUCTION

w this diff:
```
I0611 12:19:18.766000 647 print_publisher.py:33  master      ] Publishing batch metrics: ne-ne|lifetime_ne 0.8660048340401448
I0611 12:19:18.766000 647 print_publisher.py:33  master      ] Publishing batch metrics: ne-ne|window_ne 0.8660048340401447
I0611 12:19:18.766000 647 print_publisher.py:33  master      ] Publishing batch metrics: qps-qps|total_examples 1867776.0
```
https://www.internalfb.com/mast/job/tsm_ruilinchen-SparseNNApplication_15bc6243?version=0&tab=status&env=PRODUCTION

Reviewed By: JadeNie

Differential Revision: D28689126

fbshipit-source-id: 1c7879d4e3ee2b90aaf2a89e87f7b827d54173b3
2021-06-17 22:25:25 -07:00
8b55e9feaf removed cat, equal, and stack from autocast promote list (#59497)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59497

Reviewed By: zou3519

Differential Revision: D29185909

Pulled By: ngimel

fbshipit-source-id: db96239106d9e46a2704b8f457fd0463dacc1f5c
2021-06-17 21:13:22 -07:00
faf459f13e [Profiler] Fix memory profiler merge issue (#60037)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60037

The memory profiler was broken due to a mis-merge during rebase. Add the lost line back.

Reviewed By: ezyang

Differential Revision: D29143469

fbshipit-source-id: c3bf0088ca12e7535eeddbede24e28201eccd5f4
2021-06-17 21:05:23 -07:00
bcf8752fb2 updated launch bounds for trilinear 3d (#59999)
Summary:
Updates launch bounds for the upsample_trilinear_3d forward and backward kernels to remove register spilling into local memory. Improves forward-pass runtime by a 3-4x factor; the backward pass has the same runtime (probably a different bottleneck).

Timing data: (Using Nvidia Titan-V GPU)
![TrilinearTimingData](https://user-images.githubusercontent.com/22803332/121979658-72f19200-cd3f-11eb-9363-c00e2c4eea6d.PNG)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59999

Reviewed By: zou3519

Differential Revision: D29185976

Pulled By: ngimel

fbshipit-source-id: 0b2313e70e45c53938cd7262464d3aa4fab8da4a
2021-06-17 21:02:12 -07:00
7e032f18cf DOC Describes behavior for None in module.register_* (#60125)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/45834
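
A brief sketch of the behavior being documented (my illustration, not the doc text itself):

```python
import torch.nn as nn

m = nn.Module()
# Registering None reserves the name without storing a tensor; such entries
# are skipped by parameters() and buffers().
m.register_parameter("weight", None)
m.register_buffer("running_mean", None)
print(list(m.parameters()), list(m.buffers()))  # [] []
```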

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60125

Reviewed By: zou3519

Differential Revision: D29196138

Pulled By: jbschlosser

fbshipit-source-id: af736c0d66005ec33412860f00b233a5d2922137
2021-06-17 19:18:23 -07:00
047925dac1 .github: Run Windows CUDA build on pull requests (#60215)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60215

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: samestep

Differential Revision: D29214519

Pulled By: seemethere

fbshipit-source-id: 58df5ee49cc5cd46f48938f023f87a6da958f3b6
2021-06-17 16:30:31 -07:00
6af5d00e4b [torch][segment_reduce] Add support for multi-dimensional input (cuda) (#60018)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60018

Same as title. This diff finishes cuda support for currently implemented reductions and input parameters.

Next Steps:
- Add support for sum/min
- More testing and benchmarking
- Cleanup
    - Update default values when length is 0
    - Use TensorIterator
    - Update documentation

Test Plan: Unit test to cover cuda forward path.

Reviewed By: ngimel

Differential Revision: D29135373

fbshipit-source-id: d070727eeb660f56782e7ac8a5b0798be688480a
2021-06-17 16:30:30 -07:00
a727f655c8 [torch][segment_reduce] Support for multi dimension (cpu only) (#59951)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59951

Add support for multi-d input for cpu forward/backward implementation.

Next step: Adding cuda support for multi-d input.

Test Plan: Added unit tests.

Reviewed By: ngimel

Differential Revision: D29105457

fbshipit-source-id: a389ba4cc10f02434a336b8e7d36259f32552e11
2021-06-17 16:29:14 -07:00
8e67981995 .github: Disable clang-tidy for now (#60219)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60219

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: driazati

Differential Revision: D29214928

Pulled By: seemethere

fbshipit-source-id: 20cf38ebfe77ed646e25293c577937c56bd930d3
2021-06-17 16:26:31 -07:00
acf04cdedf Fix default DEFAULT_FILE_PATTERN in clang-tidy (#60212)
Summary:
Without the change, clang-tidy also checks folders like `.circleci/...`.

Example of a clang-tidy run that looked into `.circleci` changes:
https://github.com/pytorch/pytorch/runs/2844682644?check_suite_focus=true

[skip ci]

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60212

Reviewed By: seemethere

Differential Revision: D29214728

Pulled By: zhouzhuojie

fbshipit-source-id: fd53f7b2f7d88936264db1effdc06cc4fc271ca4
2021-06-17 16:25:18 -07:00
9c03de1dde Use mirrors for ubuntu apt source (#60216)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/60135

Experimented on circleci
https://app.circleci.com/pipelines/github/zhouzhuojie/gha-ci-playground/7/workflows/965c95b8-2186-434a-92ca-9cd9c8aaafdc/jobs/7

Sample logs
```
Need to get 1,389 kB of archives.
After this operation, 5,495 kB of additional disk space will be used.
Get:1 http://mirrors.ubuntu.com/mirrors.txt Mirrorlist [3,270 B]
Get:2 http://mirror.lstn.net/ubuntu focal/main amd64 libtcl8.6 amd64 8.6.10+dfsg-1 [902 kB]
Get:7 http://ubuntu.securedservers.com focal/main amd64 libipc-run-perl all 20180523.0-2 [89.7 kB]
Get:5 http://mirrors.edge.kernel.org/ubuntu focal/universe amd64 expect amd64 5.45.4-2build1 [137 kB]
Get:4 http://mirror.pnl.gov/ubuntu focal/universe amd64 tcl-expect amd64 5.45.4-2build1 [105 kB]
Get:6 http://mirror.lstn.net/ubuntu focal/main amd64 libio-pty-perl amd64 1:1.12-1 [32.4 kB]
Get:9 https://mirrors.bloomu.edu/ubuntu focal/main amd64 libtimedate-perl all 2.3200-1 [34.0 kB]
Get:8 http://la-mirrors.evowise.com/ubuntu focal/universe amd64 libtime-duration-perl all 1.21-1 [13.1 kB]
Get:3 http://mirrors.ocf.berkeley.edu/ubuntu focal/main amd64 tcl8.6 amd64 8.6.10+dfsg-1 [14.8 kB]
Get:10 http://mirrors.ocf.berkeley.edu/ubuntu focal/universe amd64 moreutils amd64 0.63-1 [60.5 kB]
Fetched 1,392 kB in 3s (464 kB/s)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60216

Reviewed By: seemethere

Differential Revision: D29214661

Pulled By: zhouzhuojie

fbshipit-source-id: ed2d85f8c0c23af4bcf33558c57472fcf9d913e8
2021-06-17 16:19:27 -07:00
3995fb1840 Add new_ones symbolic (#59255) (#59539)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59539

Add new_ones symbolic in PT-ONNX exporter

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D29046603

Pulled By: SplitInfinity

fbshipit-source-id: e7420c7b543c33e3640e62461d08ff4d5843eda7

Co-authored-by: Shubham Bhokare <shubhambhokare@gmail.com>
2021-06-17 15:49:24 -07:00
ef1c107be5 [vulkan] Do not use memcmp to compare structs (#60199)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60199

It isn't safe to use `memcmp` to determine the equality of structs because of potential padding between fields of the struct, whose byte values are indeterminate. This can cause overloaded equality operators to return false when comparing structs with equivalent fields.
This bug appears to be responsible for the Vulkan backend crashing on WorkVC release builds.

Test Plan:
Run Vulkan unit tests:

```
cd ~/fbsource
buck build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 //xplat/caffe2:pt_vulkan_api_test_binAndroid\#android-arm64 --show-output
adb push buck-out/gen/xplat/caffe2/pt_vulkan_api_test_binAndroid\#android-arm64 /data/local/tmp/vulkan_api_test
adb shell "/data/local/tmp/vulkan_api_test"
cd -
```

To test on the WorkVC RDK build, first ensure you are receiving the Vulkan models.
```
buck install fbsource//fbandroid/mode/opt fbsource//fbandroid/mode/aloha_build_rdk fbsource//fbandroid/mode/no_obfuscation fbandroid/buck-configs/buckconfig.caffe2_pkg_snpe_libs_android aloha_workvc_rdk --deep --show-full-output
```

Reviewed By: IvanKobzarev

Differential Revision: D29203177

fbshipit-source-id: e0ee79d4e635174e165b250f2cee842a09092df9
2021-06-17 15:20:30 -07:00
6d0fb85a62 Revert D28833086: beef up at::_ops API
Test Plan: revert-hammer

Differential Revision:
D28833086 (e2129d1c06)

Original commit changeset: 55f322a8378c

fbshipit-source-id: e55bf812ec411bb6bee87654f1d65ff10c046106
2021-06-17 14:28:32 -07:00
0cbb5e15d7 Correct backend in pipe_with_ddp_test (#60123)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60123

All of the tests would run with gloo, but some tests specify a
different backend param, which we should respect.
ghstack-source-id: 131688188

Test Plan: CI

Reviewed By: cbalioglu

Differential Revision: D29171549

fbshipit-source-id: 3e306060df189c0e38d5ca6dd34f4b4fbca052b9
2021-06-17 13:43:01 -07:00
acd914f039 Fix Pipe + DDP for unused parameters, static graph (#60118)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60118

Pipe + DDP has a few issues:

1) With static graph, gradients are not synchronized on the first backward pass (i.e. the delayed allreduce is not run); broken since https://github.com/pytorch/pytorch/pull/55248
2) When find_unused_parameters=True, gradient synchronization also does not happen; broken since https://github.com/pytorch/pytorch/pull/57081

The reason for both cases is that calling `DDPSink.apply(output_tensor)` does not call the custom `backward` of `DDPSink` when the `output_tensor` is actually an `OwnerRRef`, which is the case when running DDP in `Pipe`. This is because we do `backward` on the `rref.local_value()` which does not have this autograd recording.

To fix, we unwrap the RRef and reconstruct it as needed, similar to the fix in https://github.com/pytorch/pytorch/pull/49908.

To test:
All tests in pipe_with_ddp_test pass.
The reason these tests did not catch the errors earlier is that all ranks received the same model inputs, so if gradient synchronization did not occur, the grads would still be the same because the model is the same on all ranks (guaranteed by DDP). Fixed the tests to use different inputs across ranks.
ghstack-source-id: 131688187

Test Plan: CI

Reviewed By: pritamdamania87

Differential Revision: D29167283

fbshipit-source-id: fe62310db2dc6de8519eb361b1df8ae4dfce3ab8
2021-06-17 13:41:51 -07:00
2062cafaa5 [iOS GPU][MaskRCNN] Implement RoIAlign in Metal shaders using Sampler (#56075)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56075

Inspired by the CUDA implementation - https://fburl.com/diffusion/e90tabkj. The main difference is the way we implement bilinear interpolation: CUDA does this manually by iterating over every point in each bin box, whereas Metal does it by calling the sampler's sample function, which is a bit easier and faster. The result is almost identical to the result from CPU - P365102522.

We'll do another round of refactor once we have figured out how to support custom ops on GPU.
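For reference, a minimal sketch of the per-sampling-point bilinear interpolation that a CUDA-style kernel computes by hand and that Metal offloads to the sampler hardware (illustrative only, not the actual kernel code):

```
// data: row-major height x width feature map.
float bilinear(const float* data, int height, int width, float y, float x) {
  // Out-of-range sampling points contribute zero.
  if (y < -1.0f || y > height || x < -1.0f || x > width) return 0.0f;
  if (y <= 0) y = 0;
  if (x <= 0) x = 0;
  int y_low = (int)y, x_low = (int)x;
  int y_high, x_high;
  if (y_low >= height - 1) { y_high = y_low = height - 1; y = (float)y_low; }
  else { y_high = y_low + 1; }
  if (x_low >= width - 1) { x_high = x_low = width - 1; x = (float)x_low; }
  else { x_high = x_low + 1; }
  float ly = y - y_low, lx = x - x_low;
  float hy = 1.0f - ly, hx = 1.0f - lx;
  // Weighted sum of the four neighboring texels.
  return hy * hx * data[y_low * width + x_low] +
         hy * lx * data[y_low * width + x_high] +
         ly * hx * data[y_high * width + x_low] +
         ly * lx * data[y_high * width + x_high];
}
```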
ghstack-source-id: 131720620

Test Plan:
1. Circle CI
2. Sandcastle

Reviewed By: ajtulloch

Differential Revision: D27485068

fbshipit-source-id: 31e831aead9d3799a3fde96e99dd677d96bd3da1
2021-06-17 13:29:42 -07:00
e2129d1c06 beef up at::_ops API (#59115)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59115

This PR beefs up the `at::_ops::` API as a source of truth for compile-time information about each operator.

### Changes
Previously, for every op defined in native_functions.yaml, `at::_ops::` exposed an unambiguous function (e.g. `at::_ops::add_Tensor`): effectively an unambiguously named version of the C++ API that you could decltype() successfully because it had no overloads, along with a user-facing macro: `decltype(ATEN_FN2(add, Tensor)) // expands to decltype(at::_ops::add_Tensor)`.

Now, `at::_ops::add_Tensor` is a struct containing a few static fields and methods (declared in `Operators.h`, defined in `Operators.cpp`):
```
struct TORCH_API add_Tensor {
  using schema = at::Tensor (const at::Tensor &, const at::Tensor &, const at::Scalar &);
  using ptr_schema = at::Tensor (*)(const at::Tensor &, const at::Tensor &, const at::Scalar &);
  static constexpr const char* name = "aten::add";
  static constexpr const char* overload_name = "Tensor";
  static constexpr const char* schema_str = "add.Tensor(Tensor self, Tensor other, *, Scalar alpha=1) -> Tensor";
  static at::Tensor call(const at::Tensor & self, const at::Tensor & other, const at::Scalar & alpha);
  static at::Tensor redispatch(c10::DispatchKeySet dispatchKeySet, const at::Tensor & self, const at::Tensor & other, const at::Scalar & alpha);
};
```

What used to be the function `at::_ops::add_Tensor` can now be accessed as `at::_ops::add_Tensor::call`, and I've added a new macro to access the entire struct (naming suggestions welcome) - `ATEN_OP2(add, Tensor)`.
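As a hedged sketch of what this enables at compile time (assuming the `Operators.h` header introduced here; usage is illustrative, not part of the PR):

```
#include <type_traits>
#include <ATen/Operators.h>

using AddOp = at::_ops::add_Tensor;

// The schema type is available without touching the dispatcher.
static_assert(
    std::is_same<
        AddOp::schema,
        at::Tensor(const at::Tensor&, const at::Tensor&, const at::Scalar&)>::value,
    "decltype-style introspection works on the struct members");

// String metadata is likewise available as compile-time constants.
constexpr const char* kName = AddOp::name;               // "aten::add"
constexpr const char* kOverload = AddOp::overload_name;  // "Tensor"
```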

### Motivation

There were two motivations for this change:

**Codegen refactor**
The `at::_ops::` API as it exists now is (yet another) C++ entry point into the dispatcher, in addition to the Function, Method, and Redispatch APIs. Instead, after this PR, the existing three APIs are all inline-able wrapper APIs that call into the `at::_ops` API to do the real work. The function and method APIs call into `at::_ops::{op}::call`, while the redispatch API calls into `at::_ops::{op}::redispatch`.

This will hopefully make it easier to pile in any future C++ APIs that we want to code-generate. It also means that stuff like the string name, overload name, and schema of each operator is consolidated in a single place, rather than having the codegen hardcode various strings in multiple output files.

**Extra compile-time metadata**
In the [boxed CPU fallback PR](https://github.com/pytorch/pytorch/pull/58065/files#diff-c9b55f0d692a9bea8019c6f19bc46877f1efa0f9d4fc2086cf299b52768343b4R31) above this in the stack, I added a new API that external backends can use to call directly into their boxed fallback from an unboxed context. Adding extra metadata to `at::_ops` means that XLA's usage of that API doesn't require passing in the string name and overload name of each op as arguments; we can just infer them.

The updated API looks like this (see [the XLA-side PR](https://github.com/pytorch/xla/pull/2945/files#diff-5e65c3c1d847191cb691d1874732e971f09fa1aad7a980a555c3b0504a5b6470R250) for more examples):
```
return at::native::call_fallback_fn<&xla_cpu_fallback, ATEN_OP2(add, Tensor)>::call(a, b, 1.0);
```

**Characteristics of the `at::_ops` API**
(I also commented this in the codegen)

(1) It follows the Dispatcher API.

This means, e.g., that it takes in the expanded arguments rather than `TensorOptions`. This is kind of necessary for perf if we want `at::_ops` to serve as the main implementation of the existing C++ APIs. For example: if it followed the C++ API, then all of the faithful C++ factory functions would need to wrap their arguments into TensorOptions only to unwrap them again.

(2) Overload names are disambiguated.

This is the same as before; it's helpful for pytorch extenders who would like to decltype() an aten operator that has overloads, e.g. decltype(at::_ops::mul_Tensor::call).

(3) No argument defaulting is allowed.

This is more of an implementation detail to avoid #include cycles, since TensorBody.h (which defines the Tensor class) needs to include this file. The #include situation is precarious though!

(4) manual_cpp_bindings and faithful names are not included in the API.

I think this is one where we have a choice. It applies to stuff like __dispatch__is_complex() and add_outf(). These aren't "real native_functions.yaml ops"; they're just additional functions provided by the C++ API. They're implemented as wrappers in Functions.h that call into the actual operators defined here, i.e. at::_ops::is_complex::call() and at::_ops::add_out::call(). This means that ATEN_OP(is_complex) will not fastpath, and will go through the dispatcher. It also means that `ATEN_OP2(add, out)` is automatically faithful and takes its out argument at the end (this is just because it follows the dispatcher API).

**Details**

Instead of codegen'ing the existing 3 APIs in `Functions.cpp`, `TensorMethods.cpp` and `RedispatchFunctions.cpp`, I codegen them directly into the headers: `Functions.h`, `TensorBody.h`, and `RedispatchFunctions.h`. I mostly did this for perf, since we want to avoid introducing an extra function call in the hot path of every operator. These functions are also now all one-liners that call into `at::_ops`, so the compiler should just inline them all anyway.

The main downside in doing that though was that I had to bend over backwards in a few cases to avoid cyclical #include statements. The issue is that `TensorBody.h` now includes `Operators.h` (because the codegen'd method API is implemented by calling into `at::_ops`), but `TensorBody.h` also includes the definition of the Tensor class. That means that `Operators.h` can't be aware of the Tensor class; it needs to forward declare everything and avoid using the Tensor class directly. To fix cyclic includes, I had to:
- Not allow defaulting in the `at::_ops` API
- Move some code that was called when translating from C++ to Dispatcher API's directly into the codegen template (`check_tensor_options_and_extract_memory_format`)

It's not great, but I don't think this specific include cycle will break down in the near future; the only code that we need to call before getting to `Operators.cpp` is the translations from various APIs to the dispatcher API; there aren't many of them, and there's no major reason for them to live in an external utils file somewhere.
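A hedged sketch of the forward-declaration pattern that keeps the cycle from forming (file and struct names simplified, not the real headers):

```
// operators_sketch.h -- stands in for Operators.h (illustrative only).
// Forward-declare Tensor instead of including the header that defines it,
// so the Tensor header can in turn include this one without a cycle.
namespace at {

class Tensor;  // forward declaration; the full definition is not needed here

struct add_Tensor_sketch {
  // A declaration may use the incomplete Tensor type; only the definition
  // (in the .cpp file) needs the complete type.
  static Tensor call(const Tensor& self, const Tensor& other);
};

}  // namespace at
```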

Moving the code into the headers also meant that the codegen no longer needs to deal with `Functions.cpp`/`TensorMethods.cpp`/`RedispatchFunctions.cpp`. All of the functions that used to be defined in `TensorMethods.cpp` seemed small enough for me to lump into `TensorBody.h`, but some of the functions in `Functions.cpp` looked pretty big to put in a header, so I moved the file to `aten/src/ATen/native/Functions.cpp`.

It might be worth keeping `TensorMethods.cpp` there and leaving it too, in case we have any beefy hand-written tensor methods that we don't want to put in a header.

**Perf**
I ran a few benchmarks in callgrind, and didn't see a noticeable instruction count change when calling `at::add()`. I also saw in the output that `at::add()` was successfully getting inlined.

There's also probably a light risk of binary size increase; I think that there's a binary size regression test that I can run in phabricator (going to try it). I can also try inspecting `libtorch.so` directly and seeing if it's any bigger, but my hope is that the inline-ing means that we aren't generating separate symbols for `at::add` and `at::_ops::add_Tensor::call`.

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D28833086

Pulled By: bdhirsh

fbshipit-source-id: 55f322a8378cb9a3cb6642f72aa291be381dd95b
2021-06-17 13:09:46 -07:00
462448f07a Enable GHA sharding on linux (#60124)
Summary:
This is a branch off of https://github.com/pytorch/pytorch/issues/59970 that only shards on Linux so far (we're running into issues with Windows gflags).

This would enable sharding of tests on a few Linux jobs on GHA, allowing TTS to be essentially halved.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60124

Reviewed By: zou3519

Differential Revision: D29204211

Pulled By: janeyx99

fbshipit-source-id: 1cc31d1eccd564d96e2aef14c0acae96a3f0fcd0
2021-06-17 13:00:23 -07:00
bbedfd913d Run a dummy rpc._all_gather in init_rpc to avoid shutdown timeout (#59801)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59801

Fixes https://github.com/pytorch/pytorch/issues/59795.

The RPC calls in shutdown are no longer able to finish within 5s if
there are no other RPCs before `rpc.shutdown()` in that process,
because agent initialization can take longer than 5s. We didn't
have this problem previously, because TensorPipe's backend
registry used to use RPC to communicate CUDA devices in `init_rpc`.
However, after #58753, `init_rpc` uses ProcessGroup to communicate
devices, and hence the channels/transport could be uninitialized
after `init_rpc`.

Differential Revision: D29039238

Test Plan: Imported from OSS

Reviewed By: rohan-varma

Pulled By: mrshenli

fbshipit-source-id: 46f89b01a058a51d271ddef9084a67b220a067b7
2021-06-17 11:47:54 -07:00
ebafd2aadf Stop warning on .names() access in max_pool2d and max_pool2d_backward (#60059)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60059

Fixes #60053.

The problem is that `.names()` always triggers the named tensor warning.
To not trigger it, one has to guard it with has_names:
`x.has_names() ? x.names() : DimnameList{}`
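A hedged sketch of the guarded access as a helper (illustrative, not the exact max_pool2d code):

```
#include <ATen/ATen.h>

// Only read names() when the tensor actually has names; this keeps the
// named-tensor warning from firing on unnamed tensors.
at::DimnameList maybe_names(const at::Tensor& x) {
  return x.has_names() ? x.names() : at::DimnameList{};
}
```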

This is not the first time this has happened; we should probably
make it so that .names() doesn't raise a warning unless it is actually
populated with names. That's a little tricky to implement so I'm leaving
it for the future.

Test Plan:
- New test, also run `python test/test_nn.py -v -k "max_pool"` and
confirm there are no warnings.

Reviewed By: gchanan

Differential Revision: D29152737

Pulled By: zou3519

fbshipit-source-id: 89a2fdbe6a6064a7044b5b75f7d0c58e51e57509
2021-06-17 10:34:41 -07:00
ef09428804 Revert D29104399: Port all kernel to structured kernels.
Test Plan: revert-hammer

Differential Revision:
D29104399 (7809494c68)

Original commit changeset: 18bb747b7a19

fbshipit-source-id: f57043df5646f1e675e8a555cb4fa0e436953751
2021-06-17 10:32:23 -07:00
3ff5507fb0 Revert D29104395: Port any kernel to structured kernels.
Test Plan: revert-hammer

Differential Revision:
D29104395 (519698362d)

Original commit changeset: 0cfde57c22ba

fbshipit-source-id: ac5ebdc4b9d3aeb4c5eeab55c92ac931599d39d1
2021-06-17 10:32:21 -07:00
81baa7fb0d Revert D29104398: Using meta checks for unary torch.all and torch.any.
Test Plan: revert-hammer

Differential Revision:
D29104398 (c078cefa7d)

Original commit changeset: 6771b80130c9

fbshipit-source-id: 10e5a34370113fcd2f87aea2c2e76108fa9328d8
2021-06-17 10:32:20 -07:00
873dac4b5a Revert D29104397: Port argmax to structured kernels.
Test Plan: revert-hammer

Differential Revision:
D29104397 (6f3da4f4bf)

Original commit changeset: 580355cf3b4e

fbshipit-source-id: e51fb79329066bc1a6364cfa44a8732908a684ed
2021-06-17 10:32:18 -07:00
6b5e77904f Revert D29104396: Port argmin kernel to structured kernels.
Test Plan: revert-hammer

Differential Revision:
D29104396 (226d745a0b)

Original commit changeset: 39c59bcc0446

fbshipit-source-id: 82de26f925a885f65572a785fa45a9980d3a974b
2021-06-17 10:31:06 -07:00
3dc8112187 [NNC] Handle int64 indices and loop bounds (#59769)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59769

Allow loop bounds and tensor indices to be either int32 or int64, and avoid unnecessary cast ops.
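As a hedged illustration (hand-written C++, not actual NNC output) of the casts this avoids: with mismatched integer widths every index expression needs a widening cast, while matching widths need none.

```
#include <cstdint>
#include <vector>

// Mismatched widths force a cast in the index expression on every iteration.
float sum_mixed(const std::vector<float>& buf, std::int32_t n, std::int64_t stride) {
  float s = 0.f;
  for (std::int32_t i = 0; i < n; ++i) {
    s += buf[static_cast<std::int64_t>(i) * stride];
  }
  return s;
}

// Matching widths keep the index arithmetic entirely in int64.
float sum_wide(const std::vector<float>& buf, std::int64_t n, std::int64_t stride) {
  float s = 0.f;
  for (std::int64_t i = 0; i < n; ++i) {
    s += buf[i * stride];
  }
  return s;
}
```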

Test Plan:
```
build/bin/test_tensorexpr
```

Reviewed By: H-Huang

Differential Revision: D29173970

Pulled By: desertfire

fbshipit-source-id: 859a876ddb1b41535b2266089aa1222884295c78
2021-06-17 09:35:59 -07:00
96b3537e71 [NNC] Add a dtypeToCppString virtual method in IRPrinter (#59449)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59449

Make dtypeToCppString a virtual method so that a child
class can easily override the dtype string generation rule. This is
needed as preparation for making loop and tensor indices int64_t.
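A hedged sketch of the pattern (simplified names, not the real NNC classes):

```
#include <string>

struct IRPrinterSketch {
  virtual ~IRPrinterSketch() = default;
  // Child classes override this hook to change how a dtype is rendered.
  virtual std::string dtypeToCppString(bool is_index_type) const {
    return is_index_type ? "int" : "float";
  }
};

struct WideIndexPrinter : IRPrinterSketch {
  // Widen loop/index variables to int64_t without touching any other logic.
  std::string dtypeToCppString(bool is_index_type) const override {
    return is_index_type ? "int64_t"
                         : IRPrinterSketch::dtypeToCppString(is_index_type);
  }
};
```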

Test Plan:
```
build/bin/test_tensorexpr
```

Reviewed By: H-Huang

Differential Revision: D29173969

Pulled By: desertfire

fbshipit-source-id: a447badba76788354da1c79f80c834c99f105776
2021-06-17 09:34:58 -07:00
ed1da5be21 PG NCCL cleanup: remove usage of completed_ in WorkNCCL copies (#59899)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59899

Test Plan: Imported from OSS

Reviewed By: cbalioglu, osalpekar

Differential Revision: D29080299

Pulled By: agolynski

fbshipit-source-id: 9ae368f91e81f19471e0a20fc913d8e9df1b9dec
2021-06-17 09:05:35 -07:00
010f4b6f2d Add .isort.cfg (#60119)
Summary:
This adds the `.isort.cfg` file from https://github.com/pytorch/pytorch/issues/55928, but doesn't try to enforce it in CI because, as that PR showed, that is currently difficult to do. We could use this to gradually sort the codebase according to this configuration (enforcing bits and pieces in CI), but I don't do that here.

The advantage of including this file (even if we don't enforce it) is that it affects how certain tools work, thus encouraging a specific import style for people who happen to use those tools.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60119

Test Plan: Open `test/run_test.py` in VS Code and run the **Python Refactor: Sort Imports** command. Compare with and without this PR.

Reviewed By: 1ntEgr8

Differential Revision: D29199504

Pulled By: samestep

fbshipit-source-id: 83e937b0f517c60e3e7dedb6c0306173908fbbb0
2021-06-17 09:04:25 -07:00
226d745a0b Port argmin kernel to structured kernels. (#59938)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59938

Tracking issue: #55070

Test Plan: Imported from OSS

Reviewed By: soulitzer

Differential Revision: D29104396

Pulled By: ezyang

fbshipit-source-id: 39c59bcc044649c1ec9c9685366c4dda87f76aa7
2021-06-17 08:18:13 -07:00
6f3da4f4bf Port argmax to structured kernels. (#59937)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59937

Tracking issue: #55070

Test Plan: Imported from OSS

Reviewed By: soulitzer

Differential Revision: D29104397

Pulled By: ezyang

fbshipit-source-id: 580355cf3b4e9e5c934b4e51a16196087bcb3459
2021-06-17 08:18:12 -07:00
c078cefa7d Using meta checks for unary torch.all and torch.any. (#59373)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59373

This PR makes use of the newly implemented unified `at::meta::check_reduction` for
validating the inputs and configuring its `TensorIterator`.

Test Plan: Imported from OSS

Reviewed By: soulitzer

Differential Revision: D29104398

Pulled By: ezyang

fbshipit-source-id: 6771b80130c91c2f1360853127de0acebcfff183
2021-06-17 08:18:10 -07:00
519698362d Port any kernel to structured kernels. (#59372)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59372

Tracking issue: #55070

Test Plan: Imported from OSS

Reviewed By: soulitzer

Differential Revision: D29104395

Pulled By: ezyang

fbshipit-source-id: 0cfde57c22ba88607945c98f28b18df7709becd0
2021-06-17 08:18:08 -07:00
7809494c68 Port all kernel to structured kernels. (#59371)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59371

Tracking issue: #55070

Test Plan: Imported from OSS

Reviewed By: soulitzer

Differential Revision: D29104399

Pulled By: ezyang

fbshipit-source-id: 18bb747b7a19d873427d52c1145ef7cede333a0e
2021-06-17 08:16:41 -07:00
b8ab98626b only runs mem leak check on master (#60023)
Summary:
Sets an environment variable so that the CUDA mem leak check only runs on master CI jobs.

See discussion in https://github.com/pytorch/pytorch/pull/59402#issuecomment-860773034

See stats before/after disabling mem leak check: https://github.com/pytorch/pytorch/pull/59942#issuecomment-860947095

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60023

Test Plan:
https://github.com/pytorch/pytorch/issues/60108
https://github.com/pytorch/pytorch/issues/60116

Reviewed By: janeyx99

Differential Revision: D29164182

Pulled By: walterddr

fbshipit-source-id: dfe88c2c1275b6eb35f18b58aacdc220f34ccb59
2021-06-17 07:56:26 -07:00
59b10036d5 Unifies OpInfo dtype tests (#60157)
Summary:
Simplifies the OpInfo dtype tests and produces nicer error messages, like:

```
AssertionError: Items in the first set but not the second:
torch.bfloat16
Items in the second set but not the first:
torch.int64 : Attempted to compare [set] types: Expected: {torch.float64, torch.float32, torch.float16, torch.bfloat16}; Actual: {torch.float64, torch.float32, torch.float16, torch.int64}.
The supported dtypes for logcumsumexp on cuda according to its OpInfo are
        {torch.float64, torch.float32, torch.float16, torch.int64}, but the detected supported dtypes are {torch.float64, torch.float32, torch.float16, torch.bfloat16}.
        The following dtypes should be added to the OpInfo: {torch.bfloat16}. The following dtypes should be removed from the OpInfo: {torch.int64}.
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60157

Reviewed By: ngimel

Differential Revision: D29188665

Pulled By: mruberry

fbshipit-source-id: e84c9892c6040ea47adb027cfef3a6c0fd2f9f3c
2021-06-17 06:34:54 -07:00
4caca7a15b Improved torch.einsum testing and fixed bug (#59731)
Summary:
Improved torch.einsum testing and fixed a bug where lower case letters appeared before upper case letters in the sorted output order, which is inconsistent with NumPy (NumPy sorts upper case letters first).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59731

Reviewed By: SplitInfinity, ansley

Differential Revision: D29183078

Pulled By: heitorschueroff

fbshipit-source-id: a33980d273707da2d60a387a2af2fa41527ddb68
2021-06-17 04:48:47 -07:00
3698 changed files with 186605 additions and 89948 deletions

View File

@ -44,7 +44,7 @@ jobs:
is_official_build: ${{ parameters.is_official_build}}
# Sync and update PyTorch submodules
- bash: git submodule update --init --recursive
- bash: git submodule update --init --recursive --jobs 0
displayName: Update PyTorch submodules
# Build PyTorch and run unit tests - no packaging

View File

@ -47,7 +47,7 @@ jobs:
is_official_build: ${{ parameters.is_official_build}}
# Sync and update PyTorch submodules
- script: git submodule update --init --recursive
- script: git submodule update --init --recursive --jobs 0
displayName: Update PyTorch submodules
# Build PyTorch and run unit tests - no packaging

View File

@ -0,0 +1,26 @@
parameters:
  name: ''
  pool: ''
  customMatrixes: ''

jobs:
- job: ${{parameters.name}}
  timeoutInMinutes: 600
  strategy:
    matrix:
      ${{ insert }}: ${{parameters.customMatrixes}}
  pool:
    name: ${{ parameters.pool}}

  steps:
  # Clone PyTorch Tests repository
  - bash: |
      B64_PAT=$(echo -n ":$_ADOTOKEN" | base64)
      git -c http.extraHeader="Authorization: Basic ${B64_PAT}" clone $(AZURE_DEVOPS_PYTORCH_TESTS_REPO_URL)
      cd pytorch_tests
      git checkout $(PYTORCH_TESTS_CHECKOUT_BRANCH)
    env:
      _ADOTOKEN: $(AZURE_DEVOPS_CLI_PAT)
    displayName: Clone PyTorch Tests repo
  - bash: |
      bash $(Build.SourcesDirectory)/pytorch_tests/webapp/notify_webapp.sh
    displayName: Notify Webapp

View File

@ -33,7 +33,7 @@ jobs:
# Clone PyTorch Tests repository
- bash: |
B64_PAT=$(printf "%s"":$_ADOTOKEN" | base64)
B64_PAT=$(echo -n ":$_ADOTOKEN" | base64)
git -c http.extraHeader="Authorization: Basic ${B64_PAT}" clone $(AZURE_DEVOPS_PYTORCH_TESTS_REPO_URL)
cd pytorch_tests
git checkout $(PYTORCH_TESTS_CHECKOUT_BRANCH)

View File

@ -8,7 +8,7 @@ steps:
connectionType: 'connectedServiceName'
serviceConnection: circleciconn
method: 'POST'
headers: '{"Content-Type":"application/json", "BranchName":"$(_TARGET_BRANCH_TO_CHECK)", "JobName":"$(TARGET_CIRCLECI_BUILD_PR)", "PRNumber":"$(_NUMBER_BUILD_PR)", "TargetCommit":"$(_TARGET_COMMIT)", "PlanUrl":"$(System.CollectionUri)", "ProjectId":"$(System.TeamProjectId)", "HubName":"$(System.HostType)", "PlanId":"$(System.PlanId)", "JobId":"$(System.JobId)", "TimelineId":"$(System.TimelineId)", "TaskInstanceId":"$(System.TaskInstanceId)", "AuthToken":"$(System.AccessToken)"}'
headers: '{"Content-Type":"application/json", "BranchName":"$(_TARGET_BRANCH_TO_CHECK)", "JobName":"$(TARGET_CIRCLECI_BUILD_PR)", "PRNumber":"$(_TARGET_PR_NUMBER)", "TargetCommit":"$(_TARGET_COMMIT)", "PlanUrl":"$(System.CollectionUri)", "ProjectId":"$(System.TeamProjectId)", "HubName":"$(System.HostType)", "PlanId":"$(System.PlanId)", "JobId":"$(System.JobId)", "TimelineId":"$(System.TimelineId)", "TaskInstanceId":"$(System.TaskInstanceId)", "AuthToken":"$(System.AccessToken)"}'
body: ''
urlSuffix: 'api/JobStatus'
waitForCompletion: true

View File

@ -48,3 +48,13 @@ stages:
_PYTHON_VERSION: $(PYTHON_VERSION_WIN_2)
_CUDA_BUILD_VERSION: $(CUDA_BUILD_VERSION_WIN_2)
_RUN_TESTS: $(RUN_TESTS_WIN)
- stage: 'NotifyWebapp'
displayName: 'Notify Webapp that pipeline is finished'
dependsOn: NightlyCustomTests
condition: succeededOrFailed()
jobs:
- template: job_templates/notify-webapp-template.yml
parameters:
name: ubuntu_1804_CPU
pool: $(BUILD_POOL_LIN_1)

View File

@ -22,7 +22,7 @@ stages:
- template: job_templates/wheel-wait-template.yml
variables:
_TARGET_BRANCH_TO_CHECK: ${{parameters.GitHubPyTorchPRTrigger.TARGET_BRANCH_TO_CHECK_AZ_DEVOPS_PR}}
_NUMBER_BUILD_PR: ${{parameters.GitHubPyTorchPRTrigger.PR_NUMBER}}
_TARGET_PR_NUMBER: ${{parameters.GitHubPyTorchPRTrigger.PR_NUMBER}}
_TARGET_COMMIT: ${{parameters.GitHubPyTorchPRTrigger.TARGET_COMMIT}}
- stage: 'PRCustomTests'
@ -40,7 +40,23 @@ stages:
_CUDA_BUILD_VERSION: $(CUDA_BUILD_VERSION_PR)
_TARGET_CIRCLECI_BUILD: $(TARGET_CIRCLECI_BUILD_PR)
_TARGET_BRANCH_TO_CHECK: ${{parameters.GitHubPyTorchPRTrigger.TARGET_BRANCH_TO_CHECK_AZ_DEVOPS_PR}}
_NUMBER_BUILD_PR: ${{parameters.GitHubPyTorchPRTrigger.PR_NUMBER}}
_TARGET_PR_NUMBER: ${{parameters.GitHubPyTorchPRTrigger.PR_NUMBER}}
_TARGET_COMMIT: ${{parameters.GitHubPyTorchPRTrigger.TARGET_COMMIT}}
_DOCKER_IMAGE: $(DOCKER_IMAGE_PR)
_RUN_TESTS: $(RUN_TESTS_PR)
- stage: 'NotifyWebapp'
displayName: 'Notify Webapp that pipeline is finished'
dependsOn: PRCustomTests
condition: succeededOrFailed()
jobs:
- template: job_templates/notify-webapp-template.yml
parameters:
name: ubuntu_1804_CPU
pool: $(BUILD_POOL_LIN_1)
customMatrixes:
PR_Notify_WebApp:
_TARGET_CIRCLECI_BUILD: $(TARGET_CIRCLECI_BUILD_PR)
_TARGET_BRANCH_TO_CHECK: ${{parameters.GitHubPyTorchPRTrigger.TARGET_BRANCH_TO_CHECK_AZ_DEVOPS_PR}}
_TARGET_PR_NUMBER: ${{parameters.GitHubPyTorchPRTrigger.PR_NUMBER}}
_TARGET_COMMIT: ${{parameters.GitHubPyTorchPRTrigger.TARGET_COMMIT}}

View File

@ -1,3 +1,13 @@
build --copt=--std=c++14
build --copt=-I.
build --copt=-isystem --copt bazel-out/k8-fastbuild/bin
# Configuration to disable tty features for environments like CI
build:no-tty --curses no
build:no-tty --progress_report_interval 10
build:no-tty --show_progress_rate_limit 10
# Configuration to build with GPU support
build:gpu --define=cuda=true
# define a separate build folder for faster switching between configs
build:gpu --platform_suffix=-gpu

View File

@ -1 +1 @@
3.1.0
4.2.1

View File

@ -343,7 +343,6 @@ All linux builds occur in docker images. The docker images are
* Has ALL CUDA versions installed. The script pytorch/builder/conda/switch_cuda_version.sh sets /usr/local/cuda to a symlink to e.g. /usr/local/cuda-10.0 to enable different CUDA builds
* Also used for cpu builds
* pytorch/manylinux-cuda90
* pytorch/manylinux-cuda92
* pytorch/manylinux-cuda100
* Also used for cpu builds

View File

@ -126,9 +126,6 @@ class PackageFormatConfigNode(ConfigNode):
self.props["python_versions"] = python_versions
self.props["package_format"] = package_format
# XXX Disabling conda for 11.3 as there's currently no appropriate cudatoolkit available
if package_format == "conda":
self.props["gpu_versions"] = filter(lambda x: x != "cuda113", self.find_prop("gpu_versions"))
def get_children(self):
if self.find_prop("os_name") == "linux":

View File

@ -124,9 +124,9 @@ class Conf(object):
Output looks similar to:
- binary_upload:
name: binary_linux_manywheel_3_7m_cu92_devtoolset7_nightly_upload
name: binary_linux_manywheel_3_7m_cu113_devtoolset7_nightly_upload
context: org-member
requires: binary_linux_manywheel_3_7m_cu92_devtoolset7_nightly_test
requires: binary_linux_manywheel_3_7m_cu113_devtoolset7_nightly_test
filters:
branches:
only:
@ -134,7 +134,7 @@ class Conf(object):
tags:
only: /v[0-9]+(\\.[0-9]+)*-rc[0-9]+/
package_type: manywheel
upload_subfolder: cu92
upload_subfolder: cu113
"""
return {
"binary_upload": OrderedDict({

View File

@ -7,26 +7,19 @@ CONFIG_TREE_DATA = [
("5.4", [ # All this subtree rebases to master and then build
("3.6", [
("important", [X(True)]),
("parallel_tbb", [X(True)]),
("parallel_native", [X(True)]),
("pure_torch", [X(True)]),
]),
]),
# TODO: bring back libtorch test
("7", [X("3.6")]),
]),
("clang", [
("5", [
("7", [
("3.6", [
("asan", [
(True, [
("shard_test", [XImportant(True)]),
]),
]),
]),
]),
("7", [
("3.6", [
("onnx", [XImportant(True)]),
]),
]),
@ -34,38 +27,28 @@ CONFIG_TREE_DATA = [
("cuda", [
("10.2", [
("3.6", [
("shard_test", [X(True)]),
# Build are needed for slow_gradcheck
('build_only', [X(True)]),
("slow_gradcheck", [
# If you update this slow gradcheck, you should
# also update docker_definitions.py to make sure
# the docker image match the config used here
(True, [
('shard_test', [XImportant(True)]),
]),
]),
("libtorch", [
(True, [
('build_only', [X(True)]),
]),
]),
]),
]),
("11.1", [
("3.8", [
("shard_test", [XImportant(True)]),
("libtorch", [
(True, [
('build_only', [X(True)]),
]),
]),
# UNCOMMENT THE BELOW TO REENABLE LIBTORCH
# ("libtorch", [
# (True, [
# ('build_only', [X(True)]),
# ]),
# ]),
]),
]),
]),
]),
("bionic", [
("clang", [
("9", [
("3.6", [
("noarch", [XImportant(True)]),
]),
]),
("9", [
("3.6", [
("xla", [XImportant(True)]),
@ -73,31 +56,14 @@ CONFIG_TREE_DATA = [
]),
]),
]),
("cuda", [
("10.2", [
("3.9", [
("shard_test", [XImportant(True)]),
]),
]),
]),
("gcc", [
("9", [
("3.8", [
("coverage", [
(True, [
("shard_test", [XImportant(True)]),
]),
]),
]),
]),
]),
("rocm", [
("3.9", [
("3.6", [
('build_only', [XImportant(True)]),
]),
]),
]),
# @jithunnair-amd believes Jenkins builds are sufficient
# ("rocm", [
# ("3.9", [
# ("3.6", [
# ('build_only', [XImportant(True)]),
# ]),
# ]),
# ]),
]),
]

View File

@ -31,6 +31,7 @@ class Conf:
is_libtorch: bool = False
is_important: bool = False
parallel_backend: Optional[str] = None
build_only: bool = False
@staticmethod
def is_test_phase(phase):
@ -112,6 +113,8 @@ class Conf:
parameters["resource_class"] = "xlarge"
if hasattr(self, 'filters'):
parameters['filters'] = self.filters
if self.build_only:
parameters['build_only'] = miniutils.quote(str(int(True)))
return parameters
def gen_workflow_job(self, phase):
@ -175,35 +178,6 @@ class DocPushConf(object):
}
}
# TODO Convert these to graph nodes
def gen_dependent_configs(xenial_parent_config):
extra_parms = [
(["multigpu"], "large"),
(["nogpu", "NO_AVX2"], None),
(["nogpu", "NO_AVX"], None),
(["slow"], "medium"),
]
configs = []
for parms, gpu in extra_parms:
c = Conf(
xenial_parent_config.distro,
["py3"] + parms,
pyver=xenial_parent_config.pyver,
cuda_version=xenial_parent_config.cuda_version,
restrict_phases=["test"],
gpu_resource=gpu,
parent_build=xenial_parent_config,
is_important=False,
)
configs.append(c)
return configs
def gen_docs_configs(xenial_parent_config):
configs = []
@ -211,7 +185,7 @@ def gen_docs_configs(xenial_parent_config):
HiddenConf(
"pytorch_python_doc_build",
parent_build=xenial_parent_config,
filters=gen_filter_dict(branches_list=r"/.*/",
filters=gen_filter_dict(branches_list=["master", "nightly"],
tags_list=RC_PATTERN),
)
)
@ -227,7 +201,7 @@ def gen_docs_configs(xenial_parent_config):
HiddenConf(
"pytorch_cpp_doc_build",
parent_build=xenial_parent_config,
filters=gen_filter_dict(branches_list=r"/.*/",
filters=gen_filter_dict(branches_list=["master", "nightly"],
tags_list=RC_PATTERN),
)
)
@ -238,13 +212,6 @@ def gen_docs_configs(xenial_parent_config):
branch="master",
)
)
configs.append(
HiddenConf(
"pytorch_doc_test",
parent_build=xenial_parent_config
)
)
return configs
@ -369,6 +336,7 @@ def instantiate_configs(only_slow_gradcheck):
is_libtorch=is_libtorch,
is_important=is_important,
parallel_backend=parallel_backend,
build_only=build_only,
)
# run docs builds on "pytorch-linux-xenial-py3.6-gcc5.4". Docs builds
@ -389,19 +357,19 @@ def instantiate_configs(only_slow_gradcheck):
tags_list=RC_PATTERN)
c.dependent_tests = gen_docs_configs(c)
if cuda_version == "10.2" and python_version == "3.6" and not is_libtorch and not is_slow_gradcheck:
c.dependent_tests = gen_dependent_configs(c)
if (
compiler_name == "gcc"
and compiler_version == "5.4"
compiler_name != "clang"
and not rocm_version
and not is_libtorch
and not is_vulkan
and not is_pure_torch
and parallel_backend is None
and not is_noarch
and not is_slow_gradcheck
and not only_slow_gradcheck
and not build_only
):
bc_breaking_check = Conf(
"backward-compatibility-check",
distributed_test = Conf(
c.gen_build_name("") + "distributed",
[],
is_xla=False,
restrict_phases=["test"],
@ -409,7 +377,7 @@ def instantiate_configs(only_slow_gradcheck):
is_important=True,
parent_build=c,
)
c.dependent_tests.append(bc_breaking_check)
c.dependent_tests.append(distributed_test)
config_list.append(c)

View File

@ -6,35 +6,39 @@ from cimodel.data.simple.util.branch_filters import gen_filter_dict, RC_PATTERN
# TODO: make this generated from a matrix rather than just a static list
IMAGE_NAMES = [
"pytorch-linux-bionic-cuda10.2-cudnn7-py3.8-gcc9",
"pytorch-linux-bionic-cuda10.2-cudnn7-py3.9-gcc7",
"pytorch-linux-bionic-py3.6-clang9",
"pytorch-linux-bionic-cuda10.2-cudnn7-py3.6-clang9",
"pytorch-linux-bionic-py3.8-gcc9",
"pytorch-linux-xenial-cuda10-cudnn7-py3-gcc7",
"pytorch-linux-xenial-cuda10.1-cudnn7-py3-gcc7",
"pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7",
"pytorch-linux-xenial-cuda11.1-cudnn8-py3-gcc7",
"pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7",
"pytorch-linux-xenial-py3-clang5-android-ndk-r19c",
"pytorch-linux-xenial-py3-clang5-asan",
"pytorch-linux-xenial-py3-clang7-asan",
"pytorch-linux-xenial-py3-clang7-onnx",
"pytorch-linux-xenial-py3.8",
"pytorch-linux-xenial-py3.6-clang7",
"pytorch-linux-xenial-py3.6-gcc5.4", # this one is used in doc builds
"pytorch-linux-xenial-py3.6-gcc7.2",
"pytorch-linux-xenial-py3.6-gcc7",
"pytorch-linux-bionic-rocm3.9-py3.6",
"pytorch-linux-bionic-rocm4.0.1-py3.6",
"pytorch-linux-bionic-rocm4.1-py3.6",
"pytorch-linux-bionic-rocm4.2-py3.6",
"pytorch-linux-bionic-rocm4.3.1-py3.6",
]
# This entry should be an element from the list above
# This should contain the image matching the "slow_gradcheck" entry in
# pytorch_build_data.py
SLOW_GRADCHECK_IMAGE_NAME = "pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7"
def get_workflow_jobs():
def get_workflow_jobs(only_slow_gradcheck=False):
"""Generates a list of docker image build definitions"""
ret = []
for image_name in IMAGE_NAMES:
if only_slow_gradcheck and image_name is not SLOW_GRADCHECK_IMAGE_NAME:
continue
parameters = OrderedDict({
"name": quote(f"docker-{image_name}"),
"image_name": quote(image_name),

View File

@ -1,78 +0,0 @@
import cimodel.lib.miniutils as miniutils
from cimodel.data.simple.util.versions import MultiPartVersion, CudaVersion
from cimodel.data.simple.util.docker_constants import DOCKER_IMAGE_BASIC, DOCKER_IMAGE_CUDA_10_2
class GeConfigTestJob:
def __init__(self,
py_version,
gcc_version,
cuda_version,
variant_parts,
extra_requires,
use_cuda_docker=False,
build_env_override=None):
self.py_version = py_version
self.gcc_version = gcc_version
self.cuda_version = cuda_version
self.variant_parts = variant_parts
self.extra_requires = extra_requires
self.use_cuda_docker = use_cuda_docker
self.build_env_override = build_env_override
def get_all_parts(self, with_dots):
maybe_py_version = self.py_version.render_dots_or_parts(with_dots) if self.py_version else []
maybe_gcc_version = self.gcc_version.render_dots_or_parts(with_dots) if self.gcc_version else []
maybe_cuda_version = self.cuda_version.render_dots_or_parts(with_dots) if self.cuda_version else []
common_parts = [
"pytorch",
"linux",
"xenial",
] + maybe_cuda_version + maybe_py_version + maybe_gcc_version
return common_parts + self.variant_parts
def gen_tree(self):
resource_class = "gpu.medium" if self.use_cuda_docker else "large"
docker_image = DOCKER_IMAGE_CUDA_10_2 if self.use_cuda_docker else DOCKER_IMAGE_BASIC
full_name = "_".join(self.get_all_parts(False))
build_env = self.build_env_override or "-".join(self.get_all_parts(True))
props_dict = {
"name": full_name,
"build_environment": build_env,
"requires": self.extra_requires,
"resource_class": resource_class,
"docker_image": docker_image,
}
if self.use_cuda_docker:
props_dict["use_cuda_docker_runtime"] = miniutils.quote(str(1))
return [{"pytorch_linux_test": props_dict}]
WORKFLOW_DATA = [
GeConfigTestJob(
MultiPartVersion([3, 6], "py"),
MultiPartVersion([5, 4], "gcc"),
None,
["jit_legacy", "test"],
["pytorch_linux_xenial_py3_6_gcc5_4_build"]),
GeConfigTestJob(
None,
None,
CudaVersion(10, 2),
["cudnn7", "py3", "jit_legacy", "test"],
["pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_build"],
use_cuda_docker=True,
),
]
def get_workflow_jobs():
return [item.gen_tree() for item in WORKFLOW_DATA]

View File

@ -1,7 +1,7 @@
from cimodel.data.simple.util.versions import MultiPartVersion
import cimodel.lib.miniutils as miniutils
XCODE_VERSION = MultiPartVersion([12, 0, 0])
XCODE_VERSION = MultiPartVersion([12, 5, 1])
class ArchVariant:

View File

@ -1,4 +1,5 @@
import cimodel.data.simple.ios_definitions as ios_definitions
import cimodel.lib.miniutils as miniutils
class IOSNightlyJob:
@ -43,6 +44,8 @@ class IOSNightlyJob:
props_dict["ios_arch"] = self.variant
props_dict["ios_platform"] = ios_definitions.get_platform(self.variant)
props_dict["name"] = self.gen_job_name()
props_dict["use_metal"] = miniutils.quote(str(int(True)))
props_dict["use_coreml"] = miniutils.quote(str(int(True)))
template_name = "_".join([
"binary",

View File

@ -58,7 +58,7 @@ class WindowsJob:
self.cudnn_version = 8 if self.cuda_version.major == 11 else 7
arch_env_elements = (
["cuda" + str(self.cuda_version.major), "cudnn" + str(self.cudnn_version)]
["cuda" + str(self.cuda_version.major) + "." + str(self.cuda_version.minor)]
if self.cuda_version
else ["cpu"]
)
@ -78,6 +78,7 @@ class WindowsJob:
props_dict = {
"build_environment": build_environment_string,
"python_version": miniutils.quote(python_version),
"vs_version": miniutils.quote("16.8.6"),
"vc_version": miniutils.quote(self.vscode_spec.dotted_version()),
"vc_year": miniutils.quote(str(self.vscode_spec.year)),
"vc_product": self.vscode_spec.get_product(),
@ -145,10 +146,10 @@ class VcSpec:
_VC2019 = VcSpec(2019)
WORKFLOW_DATA = [
# VS2019 CUDA-10.1
WindowsJob(None, _VC2019, CudaVersion(10, 1), master_only=True),
# VS2019 CUDA-10.1 force on cpu
WindowsJob(1, _VC2019, CudaVersion(10, 1), force_on_cpu=True, master_only=True),
# VS2019 CUDA-10.2
WindowsJob(None, _VC2019, CudaVersion(10, 2), master_only=True),
# VS2019 CUDA-10.2 force on cpu
WindowsJob(1, _VC2019, CudaVersion(10, 2), force_on_cpu=True, master_only=True),
# TODO: This test is disabled due to https://github.com/pytorch/pytorch/issues/59724
# WindowsJob('_azure_multi_gpu', _VC2019, CudaVersion(11, 1), multi_gpu=True, master_and_nightly=True),

.circleci/config.yml (generated): file diff suppressed because it is too large (1217 changed lines).

View File

@ -27,5 +27,5 @@ Docker builds are now defined with `.circleci/cimodel/data/simple/docker_definit
./build.sh pytorch-linux-bionic-py3.8-gcc9 -t myimage:latest
# Set flags (see build.sh) and build image
sudo bash -c 'BREAKPAD=1 ./build.sh pytorch-linux-bionic-py3.8-gcc9 -t myimage:latest
sudo bash -c 'PROTOBUF=1 ./build.sh pytorch-linux-bionic-py3.8-gcc9 -t myimage:latest
```

View File

@ -78,119 +78,108 @@ TRAVIS_DL_URL_PREFIX="https://s3.amazonaws.com/travis-python-archives/binaries/u
case "$image" in
pytorch-linux-xenial-py3.8)
ANACONDA_PYTHON_VERSION=3.8
CMAKE_VERSION=3.10.3
GCC_VERSION=7
# Do not install PROTOBUF, DB, and VISION as a test
;;
pytorch-linux-xenial-py3.6-gcc5.4)
ANACONDA_PYTHON_VERSION=3.6
CMAKE_VERSION=3.10.3
GCC_VERSION=5
PROTOBUF=yes
DB=yes
VISION=yes
KATEX=yes
BREAKPAD=yes
;;
pytorch-linux-xenial-py3.6-gcc7.2)
ANACONDA_PYTHON_VERSION=3.6
CMAKE_VERSION=3.10.3
GCC_VERSION=7
# Do not install PROTOBUF, DB, and VISION as a test
;;
pytorch-linux-xenial-py3.6-gcc7)
ANACONDA_PYTHON_VERSION=3.6
CMAKE_VERSION=3.10.3
GCC_VERSION=7
PROTOBUF=yes
DB=yes
VISION=yes
BREAKPAD=yes
;;
pytorch-linux-xenial-cuda10-cudnn7-py3-gcc7)
CUDA_VERSION=10.0
CUDNN_VERSION=7
ANACONDA_PYTHON_VERSION=3.6
GCC_VERSION=7
PROTOBUF=yes
DB=yes
VISION=yes
BREAKPAD=yes
;;
pytorch-linux-xenial-cuda10.1-cudnn7-py3-gcc7)
CUDA_VERSION=10.1
CUDNN_VERSION=7
ANACONDA_PYTHON_VERSION=3.6
GCC_VERSION=7
PROTOBUF=yes
DB=yes
VISION=yes
KATEX=yes
BREAKPAD=yes
;;
pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7)
CUDA_VERSION=10.2
CUDNN_VERSION=7
ANACONDA_PYTHON_VERSION=3.6
CMAKE_VERSION=3.10.3
GCC_VERSION=7
PROTOBUF=yes
DB=yes
VISION=yes
KATEX=yes
BREAKPAD=yes
;;
pytorch-linux-xenial-cuda11.1-cudnn8-py3-gcc7)
CUDA_VERSION=11.1
CUDNN_VERSION=8
ANACONDA_PYTHON_VERSION=3.6
CMAKE_VERSION=3.10.3
GCC_VERSION=7
PROTOBUF=yes
DB=yes
VISION=yes
KATEX=yes
BREAKPAD=yes
;;
pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7)
CUDA_VERSION=11.3.0 # Deviating from major.minor to conform to nvidia's Docker image names
CUDNN_VERSION=8
ANACONDA_PYTHON_VERSION=3.6
CMAKE_VERSION=3.10.3
GCC_VERSION=7
PROTOBUF=yes
DB=yes
VISION=yes
KATEX=yes
BREAKPAD=yes
;;
pytorch-linux-xenial-py3-clang5-asan)
ANACONDA_PYTHON_VERSION=3.6
CLANG_VERSION=5.0
CMAKE_VERSION=3.10.3
PROTOBUF=yes
DB=yes
VISION=yes
;;
pytorch-linux-xenial-py3-clang7-asan)
ANACONDA_PYTHON_VERSION=3.6
CLANG_VERSION=7
CMAKE_VERSION=3.10.3
PROTOBUF=yes
DB=yes
VISION=yes
BREAKPAD=yes
;;
pytorch-linux-xenial-py3-clang7-onnx)
ANACONDA_PYTHON_VERSION=3.6
CLANG_VERSION=7
CMAKE_VERSION=3.10.3
PROTOBUF=yes
DB=yes
VISION=yes
BREAKPAD=yes
;;
pytorch-linux-xenial-py3-clang5-android-ndk-r19c)
ANACONDA_PYTHON_VERSION=3.6
CLANG_VERSION=5.0
CMAKE_VERSION=3.10.3
LLVMDEV=yes
PROTOBUF=yes
ANDROID=yes
ANDROID_NDK_VERSION=r19c
GRADLE_VERSION=6.8.3
CMAKE_VERSION=3.7.0
NINJA_VERSION=1.9.0
;;
pytorch-linux-xenial-py3.6-clang7)
ANACONDA_PYTHON_VERSION=3.6
CMAKE_VERSION=3.10.3
CLANG_VERSION=7
PROTOBUF=yes
DB=yes
VISION=yes
BREAKPAD=yes
;;
pytorch-linux-bionic-py3.6-clang9)
ANACONDA_PYTHON_VERSION=3.6
@ -198,7 +187,6 @@ case "$image" in
PROTOBUF=yes
DB=yes
VISION=yes
BREAKPAD=yes
VULKAN_SDK_VERSION=1.2.162.1
SWIFTSHADER=yes
;;
@ -208,8 +196,6 @@ case "$image" in
PROTOBUF=yes
DB=yes
VISION=yes
BREAKPAD=yes
BREAKPAD=yes
;;
pytorch-linux-bionic-cuda10.2-cudnn7-py3.6-clang9)
CUDA_VERSION=10.2
@ -219,17 +205,6 @@ case "$image" in
PROTOBUF=yes
DB=yes
VISION=yes
BREAKPAD=yes
;;
pytorch-linux-bionic-cuda10.2-cudnn7-py3.8-gcc9)
CUDA_VERSION=10.2
CUDNN_VERSION=7
ANACONDA_PYTHON_VERSION=3.8
GCC_VERSION=9
PROTOBUF=yes
DB=yes
VISION=yes
BREAKPAD=yes
;;
pytorch-linux-bionic-cuda10.2-cudnn7-py3.9-gcc7)
CUDA_VERSION=10.2
@ -239,7 +214,6 @@ case "$image" in
PROTOBUF=yes
DB=yes
VISION=yes
BREAKPAD=yes
;;
pytorch-linux-bionic-cuda11.0-cudnn8-py3.6-gcc9)
CUDA_VERSION=11.0
@ -249,25 +223,14 @@ case "$image" in
PROTOBUF=yes
DB=yes
VISION=yes
BREAKPAD=yes
ROCM_VERSION=3.9
;;
pytorch-linux-bionic-rocm4.0.1-py3.6)
ANACONDA_PYTHON_VERSION=3.6
GCC_VERSION=9
PROTOBUF=yes
DB=yes
VISION=yes
BREAKPAD=yes
ROCM_VERSION=4.0.1
;;
pytorch-linux-bionic-rocm4.1-py3.6)
ANACONDA_PYTHON_VERSION=3.6
GCC_VERSION=9
PROTOBUF=yes
DB=yes
VISION=yes
BREAKPAD=yes
ROCM_VERSION=4.1
;;
pytorch-linux-bionic-rocm4.2-py3.6)
@ -276,16 +239,25 @@ case "$image" in
PROTOBUF=yes
DB=yes
VISION=yes
BREAKPAD=yes
ROCM_VERSION=4.2
;;
pytorch-linux-bionic-rocm4.3.1-py3.6)
ANACONDA_PYTHON_VERSION=3.6
GCC_VERSION=9
PROTOBUF=yes
DB=yes
VISION=yes
ROCM_VERSION=4.3.1
;;
*)
# Catch-all for builds that are not hardcoded.
PROTOBUF=yes
DB=yes
VISION=yes
BREAKPAD=yes
echo "image '$image' did not match an existing build configuration"
if [[ "$image" == *xenial* ]]; then
CMAKE_VERSION=3.10.3
fi
if [[ "$image" == *py* ]]; then
extract_version_from_image_name py ANACONDA_PYTHON_VERSION
fi
@ -320,7 +292,7 @@ if [ -n "${JENKINS:-}" ]; then
JENKINS_GID=$(id -g jenkins)
fi
tmp_tag="tmp-$(cat /dev/urandom | tr -dc 'a-z' | head -c 32)"
tmp_tag=$(basename "$(mktemp -u)" | tr '[:upper:]' '[:lower:]')
# Build image
# TODO: build-arg THRIFT is not turned on for any image, remove it once we confirm
@ -348,7 +320,6 @@ docker build \
--build-arg "GCC_VERSION=${GCC_VERSION}" \
--build-arg "CUDA_VERSION=${CUDA_VERSION}" \
--build-arg "CUDNN_VERSION=${CUDNN_VERSION}" \
--build-arg "BREAKPAD=${BREAKPAD}" \
--build-arg "ANDROID=${ANDROID}" \
--build-arg "ANDROID_NDK=${ANDROID_NDK_VERSION}" \
--build-arg "GRADLE_VERSION=${GRADLE_VERSION}" \

View File

@ -1,25 +0,0 @@
#!/bin/bash
set -ex
git clone https://github.com/driazati/breakpad.git
pushd breakpad
# breakpad has no actual releases, so this is pinned to the top commit from
# main when this was forked (including the one patch commit). This uses a fork
# of the breakpad mainline that automatically daisy-chains out to any previously
# installed signal handlers (instead of overwriting them).
git checkout 5485e473ed46d065e05489e50dfc59d90dfd7e22
git clone https://chromium.googlesource.com/linux-syscall-support src/third_party/lss
pushd src/third_party/lss
# same as with breakpad, there are no real releases for this repo so use a
# commit as the pin
git checkout e1e7b0ad8ee99a875b272c8e33e308472e897660
popd
./configure
make
make install
popd
rm -rf breakpad

View File

@ -4,6 +4,9 @@ set -ex
[ -n "$CMAKE_VERSION" ]
# Remove system cmake install so it won't get used instead
apt-get remove cmake -y
# Turn 3.6.3 into v3.6
path=$(echo "${CMAKE_VERSION}" | sed -e 's/\([0-9].[0-9]\+\).*/v\1/')
file="cmake-${CMAKE_VERSION}-Linux-x86_64.tar.gz"

View File

@ -69,8 +69,8 @@ if [ -n "$ANACONDA_PYTHON_VERSION" ]; then
}
# Install PyTorch conda deps, as per https://github.com/pytorch/pytorch README
# DO NOT install cmake here as it would install a version newer than 3.5, but
# we want to pin to version 3.5.
# DO NOT install cmake here as it would install a version newer than 3.10, but
# we want to pin to version 3.10.
SCIPY_VERSION=1.1.0
if [ "$ANACONDA_PYTHON_VERSION" = "3.9" ]; then
# Install llvm-8 as it is required to compile llvmlite-0.30.0 from source
@ -86,11 +86,7 @@ if [ -n "$ANACONDA_PYTHON_VERSION" ]; then
conda_install numpy=1.18.5 astunparse pyyaml mkl mkl-include setuptools cffi future six dataclasses typing_extensions
fi
if [[ "$CUDA_VERSION" == 10.0* ]]; then
conda_install magma-cuda100 -c pytorch
elif [[ "$CUDA_VERSION" == 10.1* ]]; then
conda_install magma-cuda101 -c pytorch
elif [[ "$CUDA_VERSION" == 10.2* ]]; then
if [[ "$CUDA_VERSION" == 10.2* ]]; then
conda_install magma-cuda102 -c pytorch
elif [[ "$CUDA_VERSION" == 11.0* ]]; then
conda_install magma-cuda110 -c pytorch
@ -116,6 +112,7 @@ if [ -n "$ANACONDA_PYTHON_VERSION" ]; then
boto3==1.16.34 \
coverage==5.5 \
hypothesis==4.53.2 \
expecttest==0.1.3 \
mypy==0.812 \
tb-nightly

View File

@ -2,23 +2,6 @@
set -ex
# This function installs protobuf 2.6
install_protobuf_26() {
pb_dir="/usr/temp_pb_install_dir"
mkdir -p $pb_dir
# On the nvidia/cuda:9-cudnn7-devel-centos7 image we need this symlink or
# else it will fail with
# g++: error: ./../lib64/crti.o: No such file or directory
ln -s /usr/lib64 "$pb_dir/lib64"
curl -LO "https://github.com/google/protobuf/releases/download/v2.6.1/protobuf-2.6.1.tar.gz"
tar -xvz -C "$pb_dir" --strip-components 1 -f protobuf-2.6.1.tar.gz
pushd "$pb_dir" && ./configure && make && make check && sudo make install && sudo ldconfig
popd
rm -rf $pb_dir
}
install_ubuntu() {
apt-get update
apt-get install -y --no-install-recommends \

View File

@ -1,4 +0,0 @@
#!/bin/bash
sudo apt-get -qq update
sudo apt-get -qq install --allow-downgrades --allow-change-held-packages libnccl-dev=2.5.6-1+cuda10.1 libnccl2=2.5.6-1+cuda10.1

View File

@ -1,4 +1,10 @@
#!/bin/bash
sudo apt-get update
# also install ssh to avoid error of:
# --------------------------------------------------------------------------
# The value of the MCA parameter "plm_rsh_agent" was set to a path
# that could not be found:
# plm_rsh_agent: ssh : rsh
sudo apt-get install -y ssh
sudo apt-get install -y --allow-downgrades --allow-change-held-packages openmpi-bin libopenmpi-dev

View File

@ -2,8 +2,8 @@
set -ex
# This function installs protobuf 2.6
install_protobuf_26() {
# This function installs protobuf 3.17
install_protobuf_317() {
pb_dir="/usr/temp_pb_install_dir"
mkdir -p $pb_dir
@ -12,37 +12,32 @@ install_protobuf_26() {
# g++: error: ./../lib64/crti.o: No such file or directory
ln -s /usr/lib64 "$pb_dir/lib64"
curl -LO "https://github.com/google/protobuf/releases/download/v2.6.1/protobuf-2.6.1.tar.gz"
tar -xvz -C "$pb_dir" --strip-components 1 -f protobuf-2.6.1.tar.gz
pushd "$pb_dir" && ./configure && make && make check && sudo make install && sudo ldconfig
curl -LO "https://github.com/protocolbuffers/protobuf/releases/download/v3.17.3/protobuf-all-3.17.3.tar.gz"
tar -xvz -C "$pb_dir" --strip-components 1 -f protobuf-all-3.17.3.tar.gz
# -j2 to balance memory usage and speed.
# naked `-j` seems to use too much memory.
pushd "$pb_dir" && ./configure && make -j2 && make -j2 check && sudo make -j2 install && sudo ldconfig
popd
rm -rf $pb_dir
}
install_ubuntu() {
# Ubuntu 14.04 ships with protobuf 2.5, but ONNX needs protobuf >= 2.6
# so we install that here if on 14.04
# Ubuntu 14.04 also has cmake 2.8.12 as the default option, so we will
# Ubuntu 14.04 has cmake 2.8.12 as the default option, so we will
# install cmake3 here and use cmake3.
apt-get update
if [[ "$UBUNTU_VERSION" == 14.04 ]]; then
apt-get install -y --no-install-recommends cmake3
install_protobuf_26
else
apt-get install -y --no-install-recommends \
libprotobuf-dev \
protobuf-compiler
fi
# Cleanup
apt-get autoclean && apt-get clean
rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
install_protobuf_317
}
install_centos() {
# Centos7 ships with protobuf 2.5, but ONNX needs protobuf >= 2.6
# so we always install install that here
install_protobuf_26
install_protobuf_317
}
# Install base packages depending on the base OS

View File

@ -4,9 +4,13 @@ set -ex
install_magma() {
# "install" hipMAGMA into /opt/rocm/magma by copying after build
git clone https://bitbucket.org/icl/magma.git
git clone https://bitbucket.org/icl/magma.git -b magma_ctrl_launch_bounds
pushd magma
git checkout 878b1ce02e9cfe4a829be22c8f911e9c0b6bd88f
# The branch "magma_ctrl_launch_bounds" is having a fix over the below commit, so keeping the below comment for reference.
#git checkout 878b1ce02e9cfe4a829be22c8f911e9c0b6bd88f
# Work around non-asii characters in certain magma sources; remove this after upstream magma fixes this.
perl -i.bak -pe 's/[^[:ascii:]]//g' sparse/control/magma_zfree.cpp
perl -i.bak -pe 's/[^[:ascii:]]//g' sparse/control/magma_zsolverinfo.cpp
cp make.inc-examples/make.inc.hip-gcc-mkl make.inc
echo 'LIBDIR += -L$(MKLROOT)/lib' >> make.inc
echo 'LIB += -Wl,--enable-new-dtags -Wl,--rpath,/opt/rocm/lib -Wl,--rpath,$(MKLROOT)/lib -Wl,--rpath,/opt/rocm/magma/lib' >> make.inc
@ -15,7 +19,7 @@ install_magma() {
sed -i 's/^FOPENMP/#FOPENMP/g' make.inc
export PATH="${PATH}:/opt/rocm/bin"
make -f make.gen.hipMAGMA -j $(nproc)
LANG=C.UTF-8 make lib/libmagma.so -j $(nproc) MKLROOT=/opt/conda
make lib/libmagma.so -j $(nproc) MKLROOT=/opt/conda
make testing/testing_dgemm -j $(nproc) MKLROOT=/opt/conda
popd
mv magma /opt/rocm

View File

@ -2,23 +2,6 @@
set -ex
# This function installs protobuf 2.6
install_protobuf_26() {
pb_dir="/usr/temp_pb_install_dir"
mkdir -p $pb_dir
# On the nvidia/cuda:9-cudnn7-devel-centos7 image we need this symlink or
# else it will fail with
# g++: error: ./../lib64/crti.o: No such file or directory
ln -s /usr/lib64 "$pb_dir/lib64"
curl -LO "https://github.com/google/protobuf/releases/download/v2.6.1/protobuf-2.6.1.tar.gz"
tar -xvz -C "$pb_dir" --strip-components 1 -f protobuf-2.6.1.tar.gz
pushd "$pb_dir" && ./configure && make && make check && sudo make install && sudo ldconfig
popd
rm -rf $pb_dir
}
install_ubuntu() {
apt-get update
apt-get install -y --no-install-recommends \

View File

@ -61,6 +61,16 @@ RUN if [ -n "${VISION}" ]; then bash ./install_vision.sh; fi
RUN rm install_vision.sh
ENV INSTALLED_VISION ${VISION}
ADD ./common/install_openssl.sh install_openssl.sh
ENV OPENSSL_ROOT_DIR /opt/openssl
RUN bash ./install_openssl.sh
# (optional) Install non-default CMake version
ARG CMAKE_VERSION
ADD ./common/install_cmake.sh install_cmake.sh
RUN if [ -n "${CMAKE_VERSION}" ]; then bash ./install_cmake.sh; fi
RUN rm install_cmake.sh
# Install ccache/sccache (do this last, so we get priority in PATH)
ADD ./common/install_cache.sh install_cache.sh
ENV PATH /opt/cache/bin:$PATH
@ -72,11 +82,6 @@ ADD ./common/install_jni.sh install_jni.sh
ADD ./java/jni.h jni.h
RUN bash ./install_jni.sh && rm install_jni.sh
# Install NCCL for when CUDA is version 10.1
ADD ./common/install_nccl.sh install_nccl.sh
RUN if [ "${CUDA_VERSION}" = 10.1 ]; then bash ./install_nccl.sh; fi
RUN rm install_nccl.sh
# Install Open MPI for CUDA
ADD ./common/install_openmpi.sh install_openmpi.sh
RUN if [ -n "${CUDA_VERSION}" ]; then bash install_openmpi.sh; fi
@ -93,9 +98,5 @@ ENV TORCH_NVCC_FLAGS "-Xfatbin -compress-all"
# Install LLVM dev version (Defined in the pytorch/builder github repository)
COPY --from=pytorch/llvm:9.0.1 /opt/llvm /opt/llvm
ADD ./common/install_openssl.sh install_openssl.sh
ENV OPENSSL_ROOT_DIR /opt/openssl
RUN bash ./install_openssl.sh
USER jenkins
CMD ["bash"]

View File

@ -82,13 +82,6 @@ RUN rm AndroidManifest.xml
RUN rm build.gradle
ENV INSTALLED_ANDROID ${ANDROID}
# (optional) Install breakpad
ARG BREAKPAD
ADD ./common/install_breakpad.sh install_breakpad.sh
RUN if [ -n "${BREAKPAD}" ]; then bash ./install_breakpad.sh; fi
RUN rm install_breakpad.sh
ENV INSTALLED_BREAKPAD ${BREAKPAD}
# (optional) Install Vulkan SDK
ARG VULKAN_SDK_VERSION
ADD ./common/install_vulkan_sdk.sh install_vulkan_sdk.sh
@ -113,6 +106,10 @@ ADD ./common/install_ninja.sh install_ninja.sh
RUN if [ -n "${NINJA_VERSION}" ]; then bash ./install_ninja.sh; fi
RUN rm install_ninja.sh
ADD ./common/install_openssl.sh install_openssl.sh
RUN bash ./install_openssl.sh
ENV OPENSSL_ROOT_DIR /opt/openssl
# Install ccache/sccache (do this last, so we get priority in PATH)
ADD ./common/install_cache.sh install_cache.sh
ENV PATH /opt/cache/bin:$PATH
@ -130,9 +127,5 @@ ENV BUILD_ENVIRONMENT ${BUILD_ENVIRONMENT}
# Install LLVM dev version (Defined in the pytorch/builder github repository)
COPY --from=pytorch/llvm:9.0.1 /opt/llvm /opt/llvm
ADD ./common/install_openssl.sh install_openssl.sh
RUN bash ./install_openssl.sh
ENV OPENSSL_ROOT_DIR /opt/openssl
USER jenkins
CMD ["bash"]

View File

@ -13,10 +13,8 @@ from collections import namedtuple
import cimodel.data.binary_build_definitions as binary_build_definitions
import cimodel.data.pytorch_build_definitions as pytorch_build_definitions
import cimodel.data.simple.android_definitions
import cimodel.data.simple.bazel_definitions
import cimodel.data.simple.binary_smoketest
import cimodel.data.simple.docker_definitions
import cimodel.data.simple.ge_config_tests
import cimodel.data.simple.ios_definitions
import cimodel.data.simple.macos_definitions
import cimodel.data.simple.mobile_definitions
@ -135,8 +133,6 @@ def gen_build_workflows_tree():
cimodel.data.simple.android_definitions.get_workflow_jobs,
cimodel.data.simple.ios_definitions.get_workflow_jobs,
cimodel.data.simple.mobile_definitions.get_workflow_jobs,
cimodel.data.simple.ge_config_tests.get_workflow_jobs,
cimodel.data.simple.bazel_definitions.get_workflow_jobs,
cimodel.data.simple.binary_smoketest.get_workflow_jobs,
cimodel.data.simple.nightly_ios.get_workflow_jobs,
cimodel.data.simple.nightly_android.get_workflow_jobs,
@ -154,7 +150,10 @@ def gen_build_workflows_tree():
binary_build_definitions.get_nightly_uploads,
]
slow_gradcheck_jobs = pytorch_build_definitions.get_workflow_jobs(only_slow_gradcheck=True)
slow_gradcheck_jobs = [
pytorch_build_definitions.get_workflow_jobs,
cimodel.data.simple.docker_definitions.get_workflow_jobs,
]
return {
"workflows": {
@ -172,7 +171,7 @@ def gen_build_workflows_tree():
},
"slow_gradcheck_build": {
"when": r"<< pipeline.parameters.run_slow_gradcheck_build >>",
"jobs": slow_gradcheck_jobs,
"jobs": [f(only_slow_gradcheck=True) for f in slow_gradcheck_jobs],
},
}
}

View File

@ -55,13 +55,13 @@ else
echo "Can't tell what to checkout"
exit 1
fi
retry git submodule update --init --recursive
retry git submodule update --init --recursive --jobs 0
echo "Using Pytorch from "
git --no-pager log --max-count 1
popd
# Clone the Builder master repo
retry git clone -q https://github.com/pytorch/builder.git "$BUILDER_ROOT"
retry git clone -q https://github.com/pytorch/builder.git -b release/1.10 "$BUILDER_ROOT"
pushd "$BUILDER_ROOT"
echo "Using builder from "
git --no-pager log --max-count 1


@ -22,7 +22,7 @@ export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}
# sync submodules
cd ${PROJ_ROOT}
git submodule sync
git submodule update --init --recursive
git submodule update --init --recursive --jobs 0
# run build script
chmod a+x ${PROJ_ROOT}/scripts/build_ios.sh
@ -31,8 +31,12 @@ cat ${PROJ_ROOT}/scripts/build_ios.sh
echo "########################################################"
echo "IOS_ARCH: ${IOS_ARCH}"
echo "IOS_PLATFORM: ${IOS_PLATFORM}"
echo "USE_PYTORCH_METAL: ${USE_PYTORCH_METAL}"
echo "USE_COREML_DELEGATE: ${USE_COREML_DELEGATE}"
export IOS_ARCH=${IOS_ARCH}
export IOS_PLATFORM=${IOS_PLATFORM}
export USE_PYTORCH_METAL=${USE_PYTORCH_METAL}
export USE_COREML_DELEGATE=${USE_COREML_DELEGATE}
unbuffer ${PROJ_ROOT}/scripts/build_ios.sh 2>&1 | ts
#store the binary


@ -8,16 +8,17 @@ cd ${PROJ_ROOT}/ios/TestApp
# install fastlane
sudo gem install bundler && bundle install
# install certificates
echo "${IOS_CERT_KEY}" >> cert.txt
echo "${IOS_CERT_KEY_2022}" >> cert.txt
base64 --decode cert.txt -o Certificates.p12
rm cert.txt
bundle exec fastlane install_cert
bundle exec fastlane install_root_cert
bundle exec fastlane install_dev_cert
# install the provisioning profile
PROFILE=PyTorch_CI_2021.mobileprovision
PROFILE=PyTorch_CI_2022.mobileprovision
PROVISIONING_PROFILES=~/Library/MobileDevice/Provisioning\ Profiles
mkdir -pv "${PROVISIONING_PROFILES}"
cd "${PROVISIONING_PROFILES}"
echo "${IOS_SIGN_KEY}" >> cert.txt
echo "${IOS_SIGN_KEY_2022}" >> cert.txt
base64 --decode cert.txt -o ${PROFILE}
rm cert.txt
# run the ruby build script
@ -25,5 +26,5 @@ if ! [ -x "$(command -v xcodebuild)" ]; then
echo 'Error: xcodebuild is not installed.'
exit 1
fi
PROFILE=PyTorch_CI_2021
ruby ${PROJ_ROOT}/scripts/xcode_build.rb -i ${PROJ_ROOT}/build_ios/install -x ${PROJ_ROOT}/ios/TestApp/TestApp.xcodeproj -p ${IOS_PLATFORM} -c ${PROFILE} -t ${IOS_DEV_TEAM_ID}
PROFILE=PyTorch_CI_2022
ruby ${PROJ_ROOT}/scripts/xcode_build.rb -i ${PROJ_ROOT}/build_ios/install -x ${PROJ_ROOT}/ios/TestApp/TestApp.xcodeproj -p ${IOS_PLATFORM} -c ${PROFILE} -t ${IOS_DEV_TEAM_ID} -f Accelerate,MetalPerformanceShaders,CoreML


@ -27,11 +27,14 @@ lipo -i ${ZIP_DIR}/install/lib/*.a
cp ${PROJ_ROOT}/ios/LibTorch-Lite.h ${ZIP_DIR}/src/
cp ${PROJ_ROOT}/LICENSE ${ZIP_DIR}/
# zip the library
ZIPFILE=libtorch_ios_nightly_build.zip
export DATE="$(date -u +%Y%m%d)"
export IOS_NIGHTLY_BUILD_VERSION="1.10.0.${DATE}"
# libtorch_lite_ios_nightly_1.10.0.20210810.zip
ZIPFILE="libtorch_lite_ios_nightly_${IOS_NIGHTLY_BUILD_VERSION}.zip"
cd ${ZIP_DIR}
#for testing
touch version.txt
echo $(date +%s) > version.txt
echo "${IOS_NIGHTLY_BUILD_VERSION}" > version.txt
zip -r ${ZIPFILE} install src version.txt LICENSE
# upload to aws
# Install conda then 'conda install' awscli
@ -48,3 +51,14 @@ set +x
# echo "AWS KEY: ${AWS_ACCESS_KEY_ID}"
# echo "AWS SECRET: ${AWS_SECRET_ACCESS_KEY}"
aws s3 cp ${ZIPFILE} s3://ossci-ios-build/ --acl public-read
# create a new LibTorch-Lite-Nightly.podspec from the template
echo "cp ${PROJ_ROOT}/ios/LibTorch-Lite-Nightly.podspec.template ${PROJ_ROOT}/ios/LibTorch-Lite-Nightly.podspec"
cp ${PROJ_ROOT}/ios/LibTorch-Lite-Nightly.podspec.template ${PROJ_ROOT}/ios/LibTorch-Lite-Nightly.podspec
# update pod version
sed -i '' -e "s/IOS_NIGHTLY_BUILD_VERSION/${IOS_NIGHTLY_BUILD_VERSION}/g" ${PROJ_ROOT}/ios/LibTorch-Lite-Nightly.podspec
cat ${PROJ_ROOT}/ios/LibTorch-Lite-Nightly.podspec
# push the new LibTorch-Lite-Nightly.podspec to CocoaPods
pod trunk push --verbose --allow-warnings --use-libraries --skip-import-validation ${PROJ_ROOT}/ios/LibTorch-Lite-Nightly.podspec
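The template above carries a literal `IOS_NIGHTLY_BUILD_VERSION` placeholder that `sed` rewrites in place; `-i ''` is the BSD/macOS form, which requires an explicit (here empty) backup suffix. A tiny reproduction with a hypothetical file and version:

```
printf 's.version = "IOS_NIGHTLY_BUILD_VERSION"\n' > Demo.podspec
# BSD/macOS sed: the empty string after -i is the backup suffix
sed -i '' -e "s/IOS_NIGHTLY_BUILD_VERSION/1.10.0.20210810/g" Demo.podspec
cat Demo.podspec   # -> s.version = "1.10.0.20210810"
```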


@ -9,10 +9,6 @@ python_nodot="\$(echo $DESIRED_PYTHON | tr -d m.u)"
# Set up Python
if [[ "$PACKAGE_TYPE" == conda ]]; then
# There was a bug that was introduced in conda-package-handling >= 1.6.1 that makes archives
# above a certain size fail out when attempting to extract
# see: https://github.com/conda/conda-package-handling/issues/71
conda install -y conda-package-handling=1.6.0
retry conda create -qyn testenv python="$DESIRED_PYTHON"
source activate testenv >/dev/null
elif [[ "$PACKAGE_TYPE" != libtorch ]]; then


@ -14,6 +14,10 @@ chmod +x "$build_script"
# Build
cat >"$build_script" <<EOL
export PATH="$workdir/miniconda/bin:$PATH"
if [[ "$CIRCLE_BRANCH" == "nightly" ]]; then
export USE_PYTORCH_METAL_EXPORT=1
export USE_COREML_DELEGATE=1
fi
if [[ "$PACKAGE_TYPE" == conda ]]; then
"$workdir/builder/conda/build_pytorch.sh"
else


@ -62,7 +62,7 @@ if [[ -z "$DOCKER_IMAGE" ]]; then
if [[ "$PACKAGE_TYPE" == conda ]]; then
export DOCKER_IMAGE="pytorch/conda-cuda"
elif [[ "$DESIRED_CUDA" == cpu ]]; then
export DOCKER_IMAGE="pytorch/manylinux-cuda100"
export DOCKER_IMAGE="pytorch/manylinux-cpu"
else
export DOCKER_IMAGE="pytorch/manylinux-cuda${DESIRED_CUDA:2}"
fi


@ -8,15 +8,45 @@ export CUDA_VERSION="${DESIRED_CUDA/cu/}"
export USE_SCCACHE=1
export SCCACHE_BUCKET=ossci-compiler-cache-windows
export NIGHTLIES_PYTORCH_ROOT="$PYTORCH_ROOT"
if [[ "$CUDA_VERSION" == "92" || "$CUDA_VERSION" == "100" ]]; then
export VC_YEAR=2017
else
export VC_YEAR=2019
fi
export VC_YEAR=2019
if [[ "${DESIRED_CUDA}" == "cu111" || "${DESIRED_CUDA}" == "cu113" ]]; then
export BUILD_SPLIT_CUDA="ON"
export BUILD_SPLIT_CUDA="ON"
fi
echo "Free Space for CUDA DEBUG BUILD"
if [[ "$CIRCLECI" == 'true' ]]; then
if [[ -d "C:\\Program Files (x86)\\Microsoft Visual Studio\\2019\\Community" ]]; then
rm -rf "C:\\Program Files (x86)\\Microsoft Visual Studio\\2019\\Community"
fi
if [[ -d "C:\\Program Files (x86)\\Microsoft Visual Studio 14.0" ]]; then
rm -rf "C:\\Program Files (x86)\\Microsoft Visual Studio 14.0"
fi
if [[ -d "C:\\Program Files (x86)\\Microsoft.NET" ]]; then
rm -rf "C:\\Program Files (x86)\\Microsoft.NET"
fi
if [[ -d "C:\\Program Files\\dotnet" ]]; then
rm -rf "C:\\Program Files\\dotnet"
fi
if [[ -d "C:\\Program Files (x86)\\dotnet" ]]; then
rm -rf "C:\\Program Files (x86)\\dotnet"
fi
if [[ -d "C:\\Program Files (x86)\\Microsoft SQL Server" ]]; then
rm -rf "C:\\Program Files (x86)\\Microsoft SQL Server"
fi
if [[ -d "C:\\Program Files (x86)\\Xamarin" ]]; then
rm -rf "C:\\Program Files (x86)\\Xamarin"
fi
if [[ -d "C:\\Program Files (x86)\\Google" ]]; then
rm -rf "C:\\Program Files (x86)\\Google"
fi
fi
set +x
@ -32,7 +62,8 @@ if [[ "$CIRCLECI" == 'true' && -d "C:\\ProgramData\\Microsoft\\VisualStudio\\Pac
fi
if [[ "$CIRCLECI" == 'true' && -d "C:\\Microsoft" ]]; then
rm -rf "C:\\Microsoft\\Android*"
# don't use quotes here
rm -rf /c/Microsoft/AndroidNDK*
fi
echo "Free space on filesystem before build:"


@ -4,13 +4,7 @@ set -eux -o pipefail
source "/c/w/env"
export CUDA_VERSION="${DESIRED_CUDA/cu/}"
export VC_YEAR=2017
if [[ "$CUDA_VERSION" == "92" || "$CUDA_VERSION" == "100" ]]; then
export VC_YEAR=2017
else
export VC_YEAR=2019
fi
export VC_YEAR=2019
pushd "$BUILDER_ROOT"


@ -10,18 +10,27 @@ pt_checkout="/var/lib/jenkins/workspace"
# Since we're cat-ing this file, we need to escape all $'s
echo "cpp_doc_push_script.sh: Invoked with $*"
# Argument 1: Where to copy the built documentation for Python API to
# (pytorch.github.io/$install_path)
install_path="$1"
if [ -z "$install_path" ]; then
echo "error: cpp_doc_push_script.sh: install_path (arg1) not specified"
# for statements like ${1:-${DOCS_INSTALL_PATH:-docs/}}
# the order of operations goes:
# 1. Check if there's an argument $1
# 2. If no argument check for environment var DOCS_INSTALL_PATH
# 3. If no environment var fall back to default 'docs/'
# NOTE: It might seem odd to gather the second argument before the first,
# but since DOCS_INSTALL_PATH can be derived from DOCS_VERSION it's probably better
# to gather it first, just so we don't potentially break people who rely on this script
# Argument 2: What version of the Python API docs we are building.
version="${2:-${DOCS_VERSION:-master}}"
if [ -z "$version" ]; then
echo "error: cpp_doc_push_script.sh: version (arg2) not specified"
exit 1
fi
# Argument 2: What version of the Python API docs we are building.
version="$2"
if [ -z "$version" ]; then
echo "error: cpp_doc_push_script.sh: version (arg2) not specified"
# Argument 1: Where to copy the built documentation for Python API to
# (pytorch.github.io/$install_path)
install_path="${1:-${DOCS_INSTALL_PATH:-docs/${DOCS_VERSION}}}"
if [ -z "$install_path" ]; then
echo "error: cpp_doc_push_script.sh: install_path (arg1) not specified"
exit 1
fi
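As a quick illustration of the nested-default resolution described in the comments above, here is a minimal, self-contained sketch (the demo function is hypothetical; the variable names mirror the script's convention):

```
#!/bin/bash
# Resolution order of ${1:-${DOCS_INSTALL_PATH:-docs/}}:
#   1. the positional argument, if given
#   2. otherwise the DOCS_INSTALL_PATH environment variable
#   3. otherwise the literal default "docs/"
demo() {
  local install_path="${1:-${DOCS_INSTALL_PATH:-docs/}}"
  echo "install_path=${install_path}"
}

demo explicit/                              # install_path=explicit/
(export DOCS_INSTALL_PATH=from-env/; demo)  # install_path=from-env/
demo                                        # install_path=docs/
```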


@ -13,18 +13,27 @@ echo "python_doc_push_script.sh: Invoked with $*"
set -ex
# Argument 1: Where to copy the built documentation to
# (pytorch.github.io/$install_path)
install_path="$1"
if [ -z "$install_path" ]; then
echo "error: python_doc_push_script.sh: install_path (arg1) not specified"
# for statements like ${1:-${DOCS_INSTALL_PATH:-docs/}}
# the order of operations goes:
# 1. Check if there's an argument $1
# 2. If no argument check for environment var DOCS_INSTALL_PATH
# 3. If no environment var fall back to default 'docs/'
# NOTE: It might seem odd to gather the second argument before the first,
# but since DOCS_INSTALL_PATH can be derived from DOCS_VERSION it's probably better
# to gather it first, just so we don't potentially break people who rely on this script
# Argument 2: What version of the docs we are building.
version="${2:-${DOCS_VERSION:-master}}"
if [ -z "$version" ]; then
echo "error: python_doc_push_script.sh: version (arg2) not specified"
exit 1
fi
# Argument 2: What version of the docs we are building.
version="$2"
if [ -z "$version" ]; then
echo "error: python_doc_push_script.sh: version (arg2) not specified"
# Argument 1: Where to copy the built documentation to
# (pytorch.github.io/$install_path)
install_path="${1:-${DOCS_INSTALL_PATH:-docs/${DOCS_VERSION}}}"
if [ -z "$install_path" ]; then
echo "error: python_doc_push_script.sh: install_path (arg1) not specified"
exit 1
fi
@ -34,7 +43,7 @@ if [ "$version" == "master" ]; then
fi
# Argument 3: The branch to push to. Usually is "site"
branch="$3"
branch="${3:-${DOCS_BRANCH:-site}}"
if [ -z "$branch" ]; then
echo "error: python_doc_push_script.sh: branch (arg3) not specified"
exit 1


@ -7,6 +7,9 @@ sudo rm -f /etc/apt/heroku.list
sudo rm -f /etc/apt/openjdk-r-ubuntu-ppa-xenial.list
sudo rm -f /etc/apt/partner.list
# To increase the network reliability, let apt decide which mirror is best to use
sudo sed -i -e 's/http:\/\/.*archive/mirror:\/\/mirrors/' -e 's/\/ubuntu\//\/mirrors.txt/' /etc/apt/sources.list
retry () {
$* || $* || $* || $* || $*
}
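The retry helper above leans on `||` short-circuiting: each subsequent attempt runs only if the previous one failed, so a flaky command is tried at most five times. An equivalent, more explicit sketch (note that `"$@"` preserves argument quoting, which the unquoted `$*` in the original does not):

```
retry_loop () {
  local attempt
  # Try up to five times; return as soon as one attempt succeeds.
  for attempt in 1 2 3 4 5; do
    "$@" && return 0
  done
  return 1
}

retry_loop sudo apt-get update -qq
```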
@ -40,9 +43,9 @@ if [ -n "${USE_CUDA_DOCKER_RUNTIME:-}" ]; then
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L "https://nvidia.github.io/nvidia-docker/${distribution}/nvidia-docker.list" | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update -qq
retry sudo apt-get update -qq
# Necessary to get the `--gpus` flag to function within docker
sudo apt-get install -y nvidia-container-toolkit
retry sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker
else
# Explicitly remove nvidia docker apt repositories if not building for cuda
@ -64,6 +67,7 @@ add_to_env_file() {
}
add_to_env_file IN_CI 1
add_to_env_file CI_MASTER "${CI_MASTER:-}"
add_to_env_file COMMIT_SOURCE "${CIRCLE_BRANCH:-}"
add_to_env_file BUILD_ENVIRONMENT "${BUILD_ENVIRONMENT}"
add_to_env_file CIRCLE_PULL_REQUEST "${CIRCLE_PULL_REQUEST}"


@ -1,8 +1,8 @@
# https://developercommunity.visualstudio.com/t/install-specific-version-of-vs-component/1142479
# https://docs.microsoft.com/en-us/visualstudio/releases/2019/history#release-dates-and-build-numbers
# Where to find the links: https://docs.microsoft.com/en-us/visualstudio/releases/2019/history#release-dates-and-build-numbers
# 16.8.5 BuildTools
$VS_DOWNLOAD_LINK = "https://download.visualstudio.microsoft.com/download/pr/20130c62-1bc8-43d6-b4f0-c20bb7c79113/145a319d79a83376915d8f855605e152ef5f6fa2b2f1d2dca411fb03722eea72/vs_BuildTools.exe"
# BuildTools from S3
$VS_DOWNLOAD_LINK = "https://s3.amazonaws.com/ossci-windows/vs${env:VS_VERSION}_BuildTools.exe"
$COLLECT_DOWNLOAD_LINK = "https://aka.ms/vscollect.exe"
$VS_INSTALL_ARGS = @("--nocache","--quiet","--wait", "--add Microsoft.VisualStudio.Workload.VCTools",
"--add Microsoft.Component.MSBuild",
@ -18,32 +18,41 @@ if (${env:INSTALL_WINDOWS_SDK} -eq "1") {
$VS_INSTALL_ARGS += "--add Microsoft.VisualStudio.Component.Windows10SDK.19041"
}
if (Test-Path "${env:ProgramFiles(x86)}\Microsoft Visual Studio\Installer\vswhere.exe") {
$VS_VERSION_major = [int] ${env:VS_VERSION}.split(".")[0]
$existingPath = & "${env:ProgramFiles(x86)}\Microsoft Visual Studio\Installer\vswhere.exe" -products "Microsoft.VisualStudio.Product.BuildTools" -version "[${env:VS_VERSION}, ${env:VS_VERSION_major + 1})" -property installationPath
if (($existingPath -ne $null) -and (!${env:CIRCLECI})) {
echo "Found correctly versioned existing BuildTools installation in $existingPath"
exit 0
}
$pathToRemove = & "${env:ProgramFiles(x86)}\Microsoft Visual Studio\Installer\vswhere.exe" -products "Microsoft.VisualStudio.Product.BuildTools" -property installationPath
}
echo "Downloading VS installer from S3."
curl.exe --retry 3 -kL $VS_DOWNLOAD_LINK --output vs_installer.exe
if ($LASTEXITCODE -ne 0) {
echo "Download of the VS 2019 Version 16.8.5 installer failed"
echo "Download of the VS 2019 Version ${env:VS_VERSION} installer failed"
exit 1
}
if (Test-Path "${env:ProgramFiles(x86)}\Microsoft Visual Studio\Installer\vswhere.exe") {
$existingPath = & "${env:ProgramFiles(x86)}\Microsoft Visual Studio\Installer\vswhere.exe" -products "Microsoft.VisualStudio.Product.BuildTools" -version "[16, 17)" -property installationPath
if ($existingPath -ne $null) {
echo "Found existing BuildTools installation in $existingPath"
$VS_UNINSTALL_ARGS = @("uninstall", "--installPath", "`"$existingPath`"", "--quiet","--wait")
$process = Start-Process "${PWD}\vs_installer.exe" -ArgumentList $VS_UNINSTALL_ARGS -NoNewWindow -Wait -PassThru
$exitCode = $process.ExitCode
if (($exitCode -ne 0) -and ($exitCode -ne 3010)) {
echo "Original BuildTools uninstall failed with code $exitCode"
exit 1
}
echo "Original BuildTools uninstalled"
if ($pathToRemove -ne $null) {
echo "Uninstalling $pathToRemove."
$VS_UNINSTALL_ARGS = @("uninstall", "--installPath", "`"$pathToRemove`"", "--quiet","--wait")
$process = Start-Process "${PWD}\vs_installer.exe" -ArgumentList $VS_UNINSTALL_ARGS -NoNewWindow -Wait -PassThru
$exitCode = $process.ExitCode
if (($exitCode -ne 0) -and ($exitCode -ne 3010)) {
echo "Original BuildTools uninstall failed with code $exitCode"
exit 1
}
echo "Other versioned BuildTools uninstalled."
}
echo "Installing Visual Studio version ${env:VS_VERSION}."
$process = Start-Process "${PWD}\vs_installer.exe" -ArgumentList $VS_INSTALL_ARGS -NoNewWindow -Wait -PassThru
Remove-Item -Path vs_installer.exe -Force
$exitCode = $process.ExitCode
if (($exitCode -ne 0) -and ($exitCode -ne 3010)) {
echo "VS 2017 installer exited with code $exitCode, which should be one of [0, 3010]."
echo "VS 2019 installer exited with code $exitCode, which should be one of [0, 3010]."
curl.exe --retry 3 -kL $COLLECT_DOWNLOAD_LINK --output Collect.exe
if ($LASTEXITCODE -ne 0) {
echo "Download of the VS Collect tool failed."
@ -51,6 +60,6 @@ if (($exitCode -ne 0) -and ($exitCode -ne 3010)) {
}
Start-Process "${PWD}\Collect.exe" -NoNewWindow -Wait -PassThru
New-Item -Path "C:\w\build-results" -ItemType "directory" -Force
Copy-Item -Path "C:\Users\circleci\AppData\Local\Temp\vslogs.zip" -Destination "C:\w\build-results\"
Copy-Item -Path "${env:TEMP}\vslogs.zip" -Destination "C:\w\build-results\"
exit 1
}


@ -1,70 +1,74 @@
#!/bin/bash
set -eux -o pipefail
cuda_major_version=${CUDA_VERSION%.*}
if [[ "$cuda_major_version" == "10" ]]; then
cuda_installer_name="cuda_10.1.243_426.00_win10"
msbuild_project_dir="CUDAVisualStudioIntegration/extras/visual_studio_integration/MSBuildExtensions"
cuda_install_packages="nvcc_10.1 cuobjdump_10.1 nvprune_10.1 cupti_10.1 cublas_10.1 cublas_dev_10.1 cudart_10.1 cufft_10.1 cufft_dev_10.1 curand_10.1 curand_dev_10.1 cusolver_10.1 cusolver_dev_10.1 cusparse_10.1 cusparse_dev_10.1 nvgraph_10.1 nvgraph_dev_10.1 npp_10.1 npp_dev_10.1 nvrtc_10.1 nvrtc_dev_10.1 nvml_dev_10.1"
elif [[ "$cuda_major_version" == "11" ]]; then
if [[ "${CUDA_VERSION}" == "11.1" ]]; then
case ${CUDA_VERSION} in
10.1)
cuda_installer_name="cuda_10.1.243_426.00_win10"
cuda_install_packages="nvcc_10.1 cuobjdump_10.1 nvprune_10.1 cupti_10.1 cublas_10.1 cublas_dev_10.1 cudart_10.1 cufft_10.1 cufft_dev_10.1 curand_10.1 curand_dev_10.1 cusolver_10.1 cusolver_dev_10.1 cusparse_10.1 cusparse_dev_10.1 nvgraph_10.1 nvgraph_dev_10.1 npp_10.1 npp_dev_10.1 nvrtc_10.1 nvrtc_dev_10.1 nvml_dev_10.1"
;;
10.2)
cuda_installer_name="cuda_10.2.89_441.22_win10"
cuda_install_packages="nvcc_10.2 cuobjdump_10.2 nvprune_10.2 cupti_10.2 cublas_10.2 cublas_dev_10.2 cudart_10.2 cufft_10.2 cufft_dev_10.2 curand_10.2 curand_dev_10.2 cusolver_10.2 cusolver_dev_10.2 cusparse_10.2 cusparse_dev_10.2 nvgraph_10.2 nvgraph_dev_10.2 npp_10.2 npp_dev_10.2 nvrtc_10.2 nvrtc_dev_10.2 nvml_dev_10.2"
;;
11.1)
cuda_installer_name="cuda_11.1.0_456.43_win10"
msbuild_project_dir="visual_studio_integration/CUDAVisualStudioIntegration/extras/visual_studio_integration/MSBuildExtensions"
cuda_install_packages="nvcc_11.1 cuobjdump_11.1 nvprune_11.1 nvprof_11.1 cupti_11.1 cublas_11.1 cublas_dev_11.1 cudart_11.1 cufft_11.1 cufft_dev_11.1 curand_11.1 curand_dev_11.1 cusolver_11.1 cusolver_dev_11.1 cusparse_11.1 cusparse_dev_11.1 npp_11.1 npp_dev_11.1 nvrtc_11.1 nvrtc_dev_11.1 nvml_dev_11.1"
elif [[ "${CUDA_VERSION}" == "11.3" ]]; then
;;
11.3)
cuda_installer_name="cuda_11.3.0_465.89_win10"
msbuild_project_dir="visual_studio_integration/CUDAVisualStudioIntegration/extras/visual_studio_integration/MSBuildExtensions"
cuda_install_packages="thrust_11.3 nvcc_11.3 cuobjdump_11.3 nvprune_11.3 nvprof_11.3 cupti_11.3 cublas_11.3 cublas_dev_11.3 cudart_11.3 cufft_11.3 cufft_dev_11.3 curand_11.3 curand_dev_11.3 cusolver_11.3 cusolver_dev_11.3 cusparse_11.3 cusparse_dev_11.3 npp_11.3 npp_dev_11.3 nvrtc_11.3 nvrtc_dev_11.3 nvml_dev_11.3"
else
echo "This should not happen! ABORT."
;;
*)
echo "CUDA_VERSION $CUDA_VERSION is not supported yet"
exit 1
fi
;;
esac
if [[ -f "/c/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v${CUDA_VERSION}/bin/nvcc.exe" ]]; then
echo "Existing CUDA v${CUDA_VERSION} installation found, skipping install"
else
echo "CUDA_VERSION $CUDA_VERSION is not supported yet"
exit 1
tmp_dir=$(mktemp -d)
(
# no need to popd after, the subshell shouldn't affect the parent shell
pushd "${tmp_dir}"
cuda_installer_link="https://ossci-windows.s3.amazonaws.com/${cuda_installer_name}.exe"
curl --retry 3 -kLO $cuda_installer_link
7z x ${cuda_installer_name}.exe -o${cuda_installer_name}
pushd ${cuda_installer_name}
mkdir cuda_install_logs
set +e
# This breaks for some reason if you quote cuda_install_packages
# shellcheck disable=SC2086
./setup.exe -s ${cuda_install_packages} -loglevel:6 -log:"$(pwd -W)/cuda_install_logs"
set -e
if [[ ! -f "/c/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v${CUDA_VERSION}/bin/nvcc.exe" ]]; then
echo "CUDA installation failed"
mkdir -p /c/w/build-results
7z a "c:\\w\\build-results\\cuda_install_logs.7z" cuda_install_logs
exit 1
fi
)
rm -rf "${tmp_dir}"
fi
if [[ "$cuda_major_version" == "11" && "${JOB_EXECUTOR:-}" == "windows-with-nvidia-gpu" ]]; then
cuda_install_packages="${cuda_install_packages} Display.Driver"
fi
cuda_installer_link="https://ossci-windows.s3.amazonaws.com/${cuda_installer_name}.exe"
curl --retry 3 -kLO $cuda_installer_link
7z x ${cuda_installer_name}.exe -o${cuda_installer_name}
cd ${cuda_installer_name}
mkdir cuda_install_logs
set +e
./setup.exe -s ${cuda_install_packages} -loglevel:6 -log:"$(pwd -W)/cuda_install_logs"
set -e
if [[ "${VC_YEAR}" == "2017" ]]; then
cp -r ${msbuild_project_dir}/* "C:/Program Files (x86)/Microsoft Visual Studio/2017/${VC_PRODUCT}/Common7/IDE/VC/VCTargets/BuildCustomizations/"
if [[ -f "/c/Program Files/NVIDIA Corporation/NvToolsExt/bin/x64/nvToolsExt64_1.dll" ]]; then
echo "Existing nvtools installation found, skipping install"
else
cp -r ${msbuild_project_dir}/* "C:/Program Files (x86)/Microsoft Visual Studio/2019/${VC_PRODUCT}/MSBuild/Microsoft/VC/v160/BuildCustomizations/"
# create tmp dir for download
tmp_dir=$(mktemp -d)
(
# no need to popd after, the subshell shouldn't affect the parent shell
pushd "${tmp_dir}"
curl --retry 3 -kLO https://ossci-windows.s3.amazonaws.com/NvToolsExt.7z
7z x NvToolsExt.7z -oNvToolsExt
mkdir -p "C:/Program Files/NVIDIA Corporation/NvToolsExt"
cp -r NvToolsExt/* "C:/Program Files/NVIDIA Corporation/NvToolsExt/"
)
rm -rf "${tmp_dir}"
fi
if ! ls "/c/Program Files/NVIDIA Corporation/NvToolsExt/bin/x64/nvToolsExt64_1.dll"
then
curl --retry 3 -kLO https://ossci-windows.s3.amazonaws.com/NvToolsExt.7z
7z x NvToolsExt.7z -oNvToolsExt
mkdir -p "C:/Program Files/NVIDIA Corporation/NvToolsExt"
cp -r NvToolsExt/* "C:/Program Files/NVIDIA Corporation/NvToolsExt/"
export NVTOOLSEXT_PATH="C:\\Program Files\\NVIDIA Corporation\\NvToolsExt\\"
fi
if ! ls "/c/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v${CUDA_VERSION}/bin/nvcc.exe"
then
echo "CUDA installation failed"
mkdir -p /c/w/build-results
7z a "c:\\w\\build-results\\cuda_install_logs.7z" cuda_install_logs
exit 1
fi
cd ..
rm -rf ./${cuda_installer_name}
rm -f ./${cuda_installer_name}.exe


@ -1,32 +1,46 @@
#!/bin/bash
set -eux -o pipefail
cuda_major_version=${CUDA_VERSION%.*}
# This is typically blank but for CUDA 10* it'll be set to 10
windows_version_qualifier=""
if [[ "$cuda_major_version" == "10" ]]; then
cudnn_installer_name="cudnn-${CUDA_VERSION}-windows10-x64-v7.6.4.38"
elif [[ "$cuda_major_version" == "11" ]]; then
if [[ "${CUDA_VERSION}" == "11.1" ]]; then
cudnn_installer_name="cudnn-${CUDA_VERSION}-windows-x64-v8.0.5.39"
elif [[ "${CUDA_VERSION}" == "11.3" ]]; then
cudnn_installer_name="cudnn-${CUDA_VERSION}-windows-x64-v8.2.0.53"
else
echo "This should not happen! ABORT."
case ${CUDA_VERSION} in
10.1)
archive_version="v7.6.4.38"
windows_version_qualifier="10"
;;
10.2)
archive_version="v7.6.5.32"
windows_version_qualifier="10"
;;
11.1)
archive_version="v8.0.5.39"
;;
11.3)
archive_version="v8.2.0.53"
;;
*)
echo "CUDA_VERSION: ${CUDA_VERSION} not supported yet"
exit 1
fi
else
echo "CUDNN for CUDA_VERSION $CUDA_VERSION is not supported yet"
exit 1
fi
;;
esac
cudnn_installer_link="https://ossci-windows.s3.amazonaws.com/${cudnn_installer_name}.zip"
cudnn_installer_name="cudnn_installer.zip"
cudnn_installer_link="https://ossci-windows.s3.amazonaws.com/cudnn-${CUDA_VERSION}-windows${windows_version_qualifier}-x64-${archive_version}.zip"
cudnn_install_folder="C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v${CUDA_VERSION}/"
curl --retry 3 -O "$cudnn_installer_link"
7z x "${cudnn_installer_name}.zip" -ocudnn
# shellcheck recommends using '${var:?}/*' to avoid potentially expanding to '/*'
# Remove all of the directories before attempting to copy files
rm -rf "${cudnn_install_folder:?}/*"
cp -rf cudnn/cuda/* "${cudnn_install_folder}"
rm -rf cudnn
rm -f "${cudnn_installer_name}.zip"
if [[ -f "${cudnn_install_folder}/include/cudnn.h" ]]; then
echo "Existing cudnn installation found, skipping install..."
else
tmp_dir=$(mktemp -d)
(
pushd "${tmp_dir}"
curl --retry 3 -o "${cudnn_installer_name}" "$cudnn_installer_link"
7z x "${cudnn_installer_name}" -ocudnn
# Use '${var:?}/*' to avoid potentially expanding to '/*'
# Remove all of the directories before attempting to copy files
rm -rf "${cudnn_install_folder:?}/*"
cp -rf cudnn/cuda/* "${cudnn_install_folder}"
)
rm -rf "${tmp_dir}"
fi
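The `${cudnn_install_folder:?}` expansion above is a guard: bash aborts if the variable is unset or empty, so the `rm -rf` can never collapse toward the filesystem root. A minimal sketch of the idiom (hypothetical helper; note that the `/*` glob only expands when it sits outside the quotes):

```
#!/bin/bash
set -eu

cleanup () {
  local dir="$1"
  # ${dir:?} makes bash exit with an error if dir is empty or unset.
  rm -rf "${dir:?}"/*
}

cleanup /tmp/scratch   # removes the directory's contents
cleanup ""             # aborts: "dir: parameter null or not set"
```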


@ -15,11 +15,15 @@ pytorch_params: &pytorch_params
build_only:
type: string
default: ""
ci_master:
type: string
default: ""
environment:
BUILD_ENVIRONMENT: << parameters.build_environment >>
DOCKER_IMAGE: << parameters.docker_image >>
USE_CUDA_DOCKER_RUNTIME: << parameters.use_cuda_docker_runtime >>
BUILD_ONLY: << parameters.build_only >>
CI_MASTER: << pipeline.parameters.run_master_build >>
resource_class: << parameters.resource_class >>
pytorch_android_params: &pytorch_android_params
@ -60,6 +64,9 @@ pytorch_ios_params: &pytorch_ios_params
lite_interpreter:
type: string
default: "1"
use_coreml:
type: string
default: "0"
environment:
BUILD_ENVIRONMENT: << parameters.build_environment >>
IOS_ARCH: << parameters.ios_arch >>
@ -67,6 +74,7 @@ pytorch_ios_params: &pytorch_ios_params
SELECTED_OP_LIST: << parameters.op_list >>
USE_PYTORCH_METAL: << parameters.use_metal >>
BUILD_LITE_INTERPRETER: << parameters.lite_interpreter >>
USE_COREML_DELEGATE: << parameters.use_coreml >>
pytorch_windows_params: &pytorch_windows_params
parameters:
@ -85,6 +93,9 @@ pytorch_windows_params: &pytorch_windows_params
python_version:
type: string
default: "3.8"
vs_version:
type: string
default: "16.8.6"
vc_version:
type: string
default: "14.16"
@ -102,6 +113,7 @@ pytorch_windows_params: &pytorch_windows_params
SCCACHE_BUCKET: "ossci-compiler-cache"
CUDA_VERSION: <<parameters.cuda_version>>
PYTHON_VERSION: <<parameters.python_version>>
VS_VERSION: <<parameters.vs_version>>
VC_VERSION: <<parameters.vc_version>>
VC_YEAR: <<parameters.vc_year>>
VC_PRODUCT: <<parameters.vc_product>>


@ -171,4 +171,4 @@ commands:
cd ~/project
export ANDROID_BUILD_TYPE="<< parameters.build_type >>"
export COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0)
python3 .circleci/scripts/upload_binary_size_to_scuba.py android
python3 -m tools.stats.upload_binary_size_to_scuba android


@ -29,7 +29,7 @@
cd /pytorch && export COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0)
python3 -mpip install requests && \
SCRIBE_GRAPHQL_ACCESS_TOKEN=${SCRIBE_GRAPHQL_ACCESS_TOKEN} \
python3 /pytorch/.circleci/scripts/upload_binary_size_to_scuba.py || exit 0
python3 -m tools.stats.upload_binary_size_to_scuba || exit 0
- persist_to_workspace:
root: /
paths: final_pkgs
@ -239,7 +239,7 @@
binary_ios_build:
<<: *pytorch_ios_params
macos:
xcode: "12.0"
xcode: "12.5.1"
steps:
- attach_workspace:
at: ~/workspace
@ -266,7 +266,7 @@
binary_ios_upload:
<<: *pytorch_ios_params
macos:
xcode: "12.0"
xcode: "12.5.1"
steps:
- attach_workspace:
at: ~/workspace


@ -41,7 +41,7 @@
no_output_timeout: "1h"
command: |
set -ex
export COMMIT_DOCKER_IMAGE=${DOCKER_IMAGE}:${DOCKER_TAG}-${CIRCLE_SHA1}
export COMMIT_DOCKER_IMAGE=${DOCKER_IMAGE}:build-${DOCKER_TAG}-${CIRCLE_SHA1}
echo "DOCKER_IMAGE: "${COMMIT_DOCKER_IMAGE}
tag=${CIRCLE_TAG:1:5}
target=${tag:-master}
@ -86,7 +86,7 @@
no_output_timeout: "1h"
command: |
set -ex
export COMMIT_DOCKER_IMAGE=${DOCKER_IMAGE}:${DOCKER_TAG}-${CIRCLE_SHA1}
export COMMIT_DOCKER_IMAGE=${DOCKER_IMAGE}:build-${DOCKER_TAG}-${CIRCLE_SHA1}
echo "DOCKER_IMAGE: "${COMMIT_DOCKER_IMAGE}
tag=${CIRCLE_TAG:1:5}
target=${tag:-master}
@ -126,6 +126,7 @@
set -e
export IN_CI=1
export CROSS_COMPILE_ARM64=1
export JOB_BASE_NAME=$CIRCLE_JOB
# Install sccache
sudo curl --retry 3 https://s3.amazonaws.com/ossci-macos/sccache_v2.15 --output /usr/local/bin/sccache
@ -162,6 +163,7 @@
command: |
set -e
export IN_CI=1
export JOB_BASE_NAME=$CIRCLE_JOB
# Install sccache
sudo curl --retry 3 https://s3.amazonaws.com/ossci-macos/sccache_v2.15 --output /usr/local/bin/sccache
@ -198,6 +200,7 @@
command: |
set -e
export IN_CI=1
export JOB_BASE_NAME=$CIRCLE_JOB
chmod a+x .jenkins/pytorch/macos-test.sh
unbuffer .jenkins/pytorch/macos-test.sh 2>&1 | ts
@ -208,12 +211,14 @@
set -ex
source /Users/distiller/workspace/miniconda3/bin/activate
pip install boto3
export PYTHONPATH="$PWD"
export IN_CI=1
export JOB_BASE_NAME=$CIRCLE_JOB
# Using the same IAM user to write stats to our OSS bucket
export AWS_ACCESS_KEY_ID=${CIRCLECI_AWS_ACCESS_KEY_FOR_SCCACHE_S3_BUCKET_V4}
export AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_SCCACHE_S3_BUCKET_V4}
python tools/print_test_stats.py --upload-to-s3 --compare-with-s3 test
python -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test
when: always
- store_test_results:
path: test/test-reports
@ -235,6 +240,7 @@
set -e
export IN_CI=1
export BUILD_LITE_INTERPRETER=1
export JOB_BASE_NAME=$CIRCLE_JOB
chmod a+x ${HOME}/project/.jenkins/pytorch/macos-lite-interpreter-build-test.sh
unbuffer ${HOME}/project/.jenkins/pytorch/macos-lite-interpreter-build-test.sh 2>&1 | ts
- store_test_results:
@ -258,7 +264,7 @@
no_output_timeout: "1h"
command: |
set -eux
docker_image_commit=${DOCKER_IMAGE}:${DOCKER_TAG}-${CIRCLE_SHA1}
docker_image_commit=${DOCKER_IMAGE}:build-${DOCKER_TAG}-${CIRCLE_SHA1}
docker_image_libtorch_android_x86_32=${docker_image_commit}-android-x86_32
docker_image_libtorch_android_x86_64=${docker_image_commit}-android-x86_64
@ -347,7 +353,7 @@
no_output_timeout: "1h"
command: |
set -eux
docker_image_commit=${DOCKER_IMAGE}:${DOCKER_TAG}-${CIRCLE_SHA1}
docker_image_commit=${DOCKER_IMAGE}:build-${DOCKER_TAG}-${CIRCLE_SHA1}
docker_image_libtorch_android_x86_32_gradle=${docker_image_commit}-android-x86_32-gradle
@ -384,7 +390,7 @@
no_output_timeout: "1h"
command: |
set -e
docker_image_libtorch_android_x86_32=${DOCKER_IMAGE}:${DOCKER_TAG}-${CIRCLE_SHA1}-android-x86_32
docker_image_libtorch_android_x86_32=${DOCKER_IMAGE}:build-${DOCKER_TAG}-${CIRCLE_SHA1}-android-x86_32
echo "docker_image_libtorch_android_x86_32: "${docker_image_libtorch_android_x86_32}
# x86
@ -431,7 +437,7 @@
echo "DOCKER_IMAGE: ${DOCKER_IMAGE}:${DOCKER_TAG}"
time docker pull ${DOCKER_IMAGE}:${DOCKER_TAG} >/dev/null
git submodule sync && git submodule update -q --init --recursive --depth 1
git submodule sync && git submodule update -q --init --recursive --depth 1 --jobs 0
VOLUME_MOUNTS="-v /home/circleci/project/:/var/lib/jenkins/workspace"
export id=$(docker run --env-file "${BASH_ENV}" ${VOLUME_MOUNTS} --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins ${DOCKER_IMAGE}:${DOCKER_TAG})
@ -447,7 +453,7 @@
pytorch_ios_build:
<<: *pytorch_ios_params
macos:
xcode: "12.0"
xcode: "12.5.1"
steps:
- checkout
- run_brew_for_ios_build
@ -461,16 +467,17 @@
# install fastlane
sudo gem install bundler && bundle install
# install certificates
echo ${IOS_CERT_KEY} >> cert.txt
echo ${IOS_CERT_KEY_2022} >> cert.txt
base64 --decode cert.txt -o Certificates.p12
rm cert.txt
bundle exec fastlane install_cert
bundle exec fastlane install_root_cert
bundle exec fastlane install_dev_cert
# install the provisioning profile
PROFILE=PyTorch_CI_2021.mobileprovision
PROFILE=PyTorch_CI_2022.mobileprovision
PROVISIONING_PROFILES=~/Library/MobileDevice/Provisioning\ Profiles
mkdir -pv "${PROVISIONING_PROFILES}"
cd "${PROVISIONING_PROFILES}"
echo ${IOS_SIGN_KEY} >> cert.txt
echo ${IOS_SIGN_KEY_2022} >> cert.txt
base64 --decode cert.txt -o ${PROFILE}
rm cert.txt
- run:
@ -500,7 +507,7 @@
# sync submodules
cd ${PROJ_ROOT}
git submodule sync
git submodule update --init --recursive --depth 1
git submodule update --init --recursive --depth 1 --jobs 0
# export
export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}
@ -528,12 +535,8 @@
no_output_timeout: "30m"
command: |
set -e
if [ ${BUILD_LITE_INTERPRETER} == 0 ]; then
echo "Run Build Test is not for full jit, skipping."
exit 0
fi
PROJ_ROOT=/Users/distiller/project
PROFILE=PyTorch_CI_2021
PROFILE=PyTorch_CI_2022
# run the ruby build script
if ! [ -x "$(command -v xcodebuild)" ]; then
echo 'Error: xcodebuild is not installed.'
@ -557,21 +560,28 @@
if [ ${IOS_PLATFORM} != "SIMULATOR" ]; then
echo "not SIMULATOR build, skip it."
exit 0
elif [ ${BUILD_LITE_INTERPRETER} == 0 ]; then
echo "Run Simulator Tests is not for full jit, skipping."
exit 0
fi
WORKSPACE=/Users/distiller/workspace
PROJ_ROOT=/Users/distiller/project
source ~/anaconda/bin/activate
pip install torch torchvision --progress-bar off
#run unit test
# use the pytorch nightly build to generate models
conda install pytorch torchvision -c pytorch-nightly --yes
# generate models for different backends
cd ${PROJ_ROOT}/ios/TestApp/benchmark
mkdir -p ../models
python trace_model.py
ruby setup.rb
if [ ${BUILD_LITE_INTERPRETER} == 1 ]; then
ruby setup.rb --lite 1
else
ruby setup.rb
fi
cd ${PROJ_ROOT}/ios/TestApp
instruments -s -devices
fastlane scan
if [ ${BUILD_LITE_INTERPRETER} == 1 ]; then
fastlane scan --only_testing TestAppTests/TestAppTests/testLiteInterpreter
else
fastlane scan --only_testing TestAppTests/TestAppTests/testFullJIT
fi
pytorch_linux_bazel_build:
<<: *pytorch_params
machine:
@ -593,7 +603,7 @@
echo "Do NOT merge master branch into $CIRCLE_BRANCH in environment $BUILD_ENVIRONMENT"
git submodule sync && git submodule update -q --init --recursive --depth 1
git submodule sync && git submodule update -q --init --recursive --depth 1 --jobs 0
docker cp /home/circleci/project/. $id:/var/lib/jenkins/workspace
@ -604,7 +614,7 @@
# Push intermediate Docker image for next phase to use
if [ -z "${BUILD_ONLY}" ]; then
# Augment our output image name with bazel to avoid collisions
output_image=${DOCKER_IMAGE}:${DOCKER_TAG}-bazel-${CIRCLE_SHA1}
output_image=${DOCKER_IMAGE}:build-${DOCKER_TAG}-bazel-${CIRCLE_SHA1}
export COMMIT_DOCKER_IMAGE=$output_image
docker commit "$id" ${COMMIT_DOCKER_IMAGE}
time docker push ${COMMIT_DOCKER_IMAGE}
@ -624,7 +634,7 @@
no_output_timeout: "90m"
command: |
set -e
output_image=${DOCKER_IMAGE}:${DOCKER_TAG}-bazel-${CIRCLE_SHA1}
output_image=${DOCKER_IMAGE}:build-${DOCKER_TAG}-bazel-${CIRCLE_SHA1}
export COMMIT_DOCKER_IMAGE=$output_image
echo "DOCKER_IMAGE: "${COMMIT_DOCKER_IMAGE}
@ -684,7 +694,7 @@
no_output_timeout: "30m"
command: |
set -ex
export COMMIT_DOCKER_IMAGE=${DOCKER_IMAGE}:${DOCKER_TAG}-${CIRCLE_SHA1}
export COMMIT_DOCKER_IMAGE=${DOCKER_IMAGE}:build-${DOCKER_TAG}-${CIRCLE_SHA1}
echo "DOCKER_IMAGE: "${COMMIT_DOCKER_IMAGE}
time docker pull ${COMMIT_DOCKER_IMAGE} >/dev/null
export id=$(docker run --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins ${COMMIT_DOCKER_IMAGE})


@ -30,11 +30,11 @@ jobs:
time docker pull ${DOCKER_IMAGE}:${DOCKER_TAG} >/dev/null
export id=$(docker run --env-file "${BASH_ENV}" --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins ${DOCKER_IMAGE}:${DOCKER_TAG})
git submodule sync && git submodule update -q --init --recursive --depth 1
git submodule sync && git submodule update -q --init --recursive --depth 1 --jobs 0
docker cp /home/circleci/project/. $id:/var/lib/jenkins/workspace
export COMMAND='((echo "sudo chown -R jenkins workspace && export CIRCLE_JOB="$CIRCLE_JOB" && cd workspace && .jenkins/pytorch/build.sh && find ${BUILD_ROOT} -type f -name "*.a" -or -name "*.o" -delete") | docker exec -u jenkins -i "$id" bash) 2>&1'
export COMMAND='((echo "sudo chown -R jenkins workspace && export JOB_BASE_NAME="$CIRCLE_JOB" && cd workspace && .jenkins/pytorch/build.sh && find ${BUILD_ROOT} -type f -name "*.a" -or -name "*.o" -delete") | docker exec -u jenkins -i "$id" bash) 2>&1'
echo ${COMMAND} > ./command.sh && unbuffer bash ./command.sh | ts
@ -47,7 +47,7 @@ jobs:
# The xla build uses the same docker image as
# pytorch_linux_bionic_py3_6_clang9_build. In the push step, we have to
# distinguish between them so the test can pick up the correct image.
output_image=${DOCKER_IMAGE}:${DOCKER_TAG}-${CIRCLE_SHA1}
output_image=${DOCKER_IMAGE}:build-${DOCKER_TAG}-${CIRCLE_SHA1}
if [[ ${BUILD_ENVIRONMENT} == *"xla"* ]]; then
export COMMIT_DOCKER_IMAGE=$output_image-xla
elif [[ ${BUILD_ENVIRONMENT} == *"libtorch"* ]]; then
@ -81,7 +81,7 @@ jobs:
cd /pytorch && export COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0)
python3 -mpip install requests && \
SCRIBE_GRAPHQL_ACCESS_TOKEN=${SCRIBE_GRAPHQL_ACCESS_TOKEN} \
python3 .circleci/scripts/upload_binary_size_to_scuba.py || exit 0
python3 -m tools.stats.upload_binary_size_to_scuba || exit 0
- store_artifacts:
path: /home/circleci/project/dist
@ -105,7 +105,7 @@ jobs:
export DOCKER_TAG="f3d89a32912f62815e4feaeed47e564e887dffd6"
fi
# See Note [Special build images]
output_image=${DOCKER_IMAGE}:${DOCKER_TAG}-${CIRCLE_SHA1}
output_image=${DOCKER_IMAGE}:build-${DOCKER_TAG}-${CIRCLE_SHA1}
if [[ ${BUILD_ENVIRONMENT} == *"xla"* ]]; then
export COMMIT_DOCKER_IMAGE=$output_image-xla
elif [[ ${BUILD_ENVIRONMENT} == *"libtorch"* ]]; then
@ -158,13 +158,14 @@ jobs:
}
if is_vanilla_build; then
echo "apt-get update && apt-get install -y qemu-user gdb" | docker exec -u root -i "$id" bash
echo "apt-get update || apt-get install libgnutls30" | docker exec -u root -i "$id" bash
echo "apt-get install -y qemu-user gdb" | docker exec -u root -i "$id" bash
echo "cd workspace/build; qemu-x86_64 -g 2345 -cpu Broadwell -E ATEN_CPU_CAPABILITY=default ./bin/basic --gtest_filter=BasicTest.BasicTestCPU & gdb ./bin/basic -ex 'set pagination off' -ex 'target remote :2345' -ex 'continue' -ex 'bt' -ex='set confirm off' -ex 'quit \$_isvoid(\$_exitcode)'" | docker exec -u jenkins -i "$id" bash
else
echo "Skipping for ${BUILD_ENVIRONMENT}"
fi
- run:
name: Run tests
name: Test
no_output_timeout: "90m"
command: |
set -e
@ -173,7 +174,16 @@ jobs:
# =================== The following code will be executed inside Docker container ===================
set -ex
export SCRIBE_GRAPHQL_ACCESS_TOKEN="${SCRIBE_GRAPHQL_ACCESS_TOKEN}"
export CIRCLE_JOB="$CIRCLE_JOB"
export JOB_BASE_NAME="$CIRCLE_JOB"
# temporary fix for https://github.com/pytorch/pytorch/issues/60746
if [ -z "$CIRCLE_PR_NUMBER" ]; then
if [[ $CIRCLE_BRANCH =~ .*pull.* ]]; then
export PR_NUMBER="$(echo $CIRCLE_BRANCH | sed 's/[^0-9]//g')"
export CIRCLE_PR_NUMBER="$PR_NUMBER"
fi
else
export PR_NUMBER="$CIRCLE_PR_NUMBER"
fi
${PARALLEL_FLAGS}
cd workspace
EOL
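The temporary fix above recovers the PR number from branch names such as `pull/60746` by deleting every non-digit character. The same extraction in isolation:

```
#!/bin/bash
branch="pull/60746"
if [[ $branch =~ .*pull.* ]]; then
  # sed 's/[^0-9]//g' strips everything but digits: "pull/60746" -> "60746"
  pr_number="$(echo "$branch" | sed 's/[^0-9]//g')"
  echo "PR_NUMBER=${pr_number}"   # PR_NUMBER=60746
fi
```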
@ -220,11 +230,10 @@ jobs:
export CIRCLE_SHA1="$CIRCLE_SHA1"
export CIRCLE_PR_NUMBER="${CIRCLE_PR_NUMBER:-}"
export CIRCLE_BRANCH="$CIRCLE_BRANCH"
export CIRCLE_JOB="$CIRCLE_JOB"
export JOB_BASE_NAME="$CIRCLE_JOB"
export CIRCLE_WORKFLOW_ID="$CIRCLE_WORKFLOW_ID"
cd workspace
export PYTHONPATH="\${PWD}"
python tools/print_test_stats.py --upload-to-s3 --compare-with-s3 test
python -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test
EOL
echo "(cat docker_commands.sh | docker exec -u jenkins -e LANG=C.UTF-8 -i "$id" bash) 2>&1" > command.sh
unbuffer bash command.sh | ts
@ -254,6 +263,9 @@ jobs:
python_version:
type: string
default: "3.8"
vs_version:
type: string
default: "16.8.6"
vc_version:
type: string
default: "14.16"
@ -321,6 +333,9 @@ jobs:
python_version:
type: string
default: "3.8"
vs_version:
type: string
default: "16.8.6"
vc_version:
type: string
default: "14.16"
@ -376,9 +391,8 @@ jobs:
set -ex
export AWS_ACCESS_KEY_ID=${CIRCLECI_AWS_ACCESS_KEY_FOR_WIN_BUILD_V1}
export AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_WIN_BUILD_V1}
export PYTHONPATH="$PWD"
pip install typing_extensions boto3
python tools/print_test_stats.py --upload-to-s3 --compare-with-s3 test
python -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test
when: always
- store_test_results:
path: test/test-reports


@ -1,169 +1,3 @@
scheduled-ci:
triggers:
- schedule:
# runs every 4 hours on the 45th minute
cron: "45 0,4,8,12,16,20 * * *"
filters:
branches:
only:
- master
jobs:
- docker_build_job:
name: "docker-pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7"
image_name: "pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7"
- pytorch_linux_build:
name: periodic_pytorch_xenial_cuda11_3_cudnn8_gcc7_build
requires:
- "docker-pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7"
build_environment: "pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7-build"
docker_image: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7"
- pytorch_linux_test:
name: periodic_pytorch_xenial_cuda11_3_cudnn8_gcc7_test
requires:
- periodic_pytorch_xenial_cuda11_3_cudnn8_gcc7_build
build_environment: "pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7-test"
docker_image: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7"
use_cuda_docker_runtime: "1"
resource_class: gpu.medium
- pytorch_linux_build:
name: periodic_libtorch_xenial_cuda11_3_cudnn8_gcc7_build
requires:
- "docker-pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7"
build_environment: "pytorch-libtorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7-build"
docker_image: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7"
- pytorch_windows_build:
build_environment: pytorch-win-vs2019-cuda11-cudnn8-py3
cuda_version: "11.3"
name: periodic_pytorch_windows_cuda11.3_build
python_version: "3.8"
use_cuda: "1"
vc_product: BuildTools
vc_version: "14.28.29333"
vc_year: "2019"
- pytorch_windows_test:
build_environment: pytorch-win-vs2019-cuda11-cudnn8-py3
cuda_version: "11.3"
executor: windows-with-nvidia-gpu
name: periodic_pytorch_windows_cuda11.3_test1
python_version: "3.8"
requires:
- periodic_pytorch_windows_cuda11.3_build
test_name: pytorch-windows-test1
use_cuda: "1"
vc_product: BuildTools
vc_version: "14.28.29333"
vc_year: "2019"
- pytorch_windows_test:
build_environment: pytorch-win-vs2019-cuda11-cudnn8-py3
cuda_version: "11.3"
executor: windows-with-nvidia-gpu
name: periodic_pytorch_windows_cuda11.3_test2
python_version: "3.8"
requires:
- periodic_pytorch_windows_cuda11.3_build
test_name: pytorch-windows-test2
use_cuda: "1"
vc_product: BuildTools
vc_version: "14.28.29333"
vc_year: "2019"
# The following allows these jobs to run on ci-all and release branches
debuggable-scheduled-ci:
jobs:
- docker_build_job:
name: "docker-pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7"
image_name: "pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7"
filters:
branches:
only:
- /ci-all\/.*/
- /release\/.*/
- pytorch_linux_build:
name: pytorch_linux_xenial_cuda11_3_cudnn8_py3_gcc7_build
requires:
- "docker-pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7"
build_environment: "pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7-build"
docker_image: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7"
filters:
branches:
only:
- /ci-all\/.*/
- /release\/.*/
- pytorch_linux_test:
name: pytorch_linux_xenial_cuda11_3_cudnn8_py3_gcc7_test
requires:
- pytorch_linux_xenial_cuda11_3_cudnn8_py3_gcc7_build
build_environment: "pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7-test"
docker_image: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7"
use_cuda_docker_runtime: "1"
resource_class: gpu.medium
filters:
branches:
only:
- /ci-all\/.*/
- /release\/.*/
- pytorch_linux_build:
name: pytorch_libtorch_linux_xenial_cuda11_3_cudnn8_py3_gcc7_build
requires:
- "docker-pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7"
build_environment: "pytorch-libtorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7-build"
docker_image: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7"
filters:
branches:
only:
- /ci-all\/.*/
- /release\/.*/
- pytorch_windows_build:
build_environment: pytorch-win-vs2019-cuda11-cudnn8-py3
cuda_version: "11.3"
name: pytorch_windows_vs2019_py38_cuda11.3_build
python_version: "3.8"
use_cuda: "1"
vc_product: BuildTools
vc_version: "14.28.29333"
vc_year: "2019"
filters:
branches:
only:
- /ci-all\/.*/
- /release\/.*/
- pytorch_windows_test:
build_environment: pytorch-win-vs2019-cuda11-cudnn8-py3
cuda_version: "11.3"
executor: windows-with-nvidia-gpu
name: pytorch_windows_vs2019_py38_cuda11.3_test1
python_version: "3.8"
requires:
- pytorch_windows_vs2019_py38_cuda11.3_build
test_name: pytorch-windows-test1
use_cuda: "1"
vc_product: BuildTools
vc_version: "14.28.29333"
vc_year: "2019"
filters:
branches:
only:
- /ci-all\/.*/
- /release\/.*/
- pytorch_windows_test:
build_environment: pytorch-win-vs2019-cuda11-cudnn8-py3
cuda_version: "11.3"
executor: windows-with-nvidia-gpu
name: pytorch_windows_vs2019_py38_cuda11.3_test2
python_version: "3.8"
requires:
- pytorch_windows_vs2019_py38_cuda11.3_build
test_name: pytorch-windows-test2
use_cuda: "1"
vc_product: BuildTools
vc_version: "14.28.29333"
vc_year: "2019"
filters:
branches:
only:
- /ci-all\/.*/
- /release\/.*/
# the following clones pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7's tests but enables
# slow tests and sets an environment variable so gradcheck runs with fast_mode=False
slow-gradcheck-scheduled-ci:


@ -9,6 +9,7 @@ bugprone-*,
-bugprone-reserved-identifier,
cppcoreguidelines-*,
-cppcoreguidelines-avoid-magic-numbers,
-cppcoreguidelines-avoid-non-const-global-variables,
-cppcoreguidelines-interfaces-global-init,
-cppcoreguidelines-macro-usage,
-cppcoreguidelines-owning-memory,
@ -21,6 +22,7 @@ cppcoreguidelines-*,
-cppcoreguidelines-pro-type-union-access,
-cppcoreguidelines-pro-type-vararg,
-cppcoreguidelines-special-member-functions,
-cppcoreguidelines-non-private-member-variables-in-classes,
-facebook-hte-RelativeInclude,
hicpp-exception-baseclass,
hicpp-avoid-goto,
@ -37,5 +39,6 @@ performance-*,
'
HeaderFilterRegex: 'torch/csrc/.*'
AnalyzeTemporaryDtors: false
WarningsAsErrors: '*'
CheckOptions:
...

.gitattributes vendored

@ -1 +1,4 @@
*.bat text eol=crlf
*.bat text eol=crlf
.circleci/config.yml linguist-generated=true
.github/workflows/generated-*.yml linguist-generated=true
.github/generated-* linguist-generated=true


@ -1,5 +1,5 @@
---
name: "\U0001F680Feature Request"
name: "\U0001F680 Feature Request"
about: Submit a proposal/request for a new PyTorch feature
---

.github/actionlint.yaml vendored Normal file

@ -0,0 +1,8 @@
self-hosted-runner:
labels:
- linux.2xlarge
- linux.8xlarge.nvidia.gpu
- linux.16xlarge.nvidia.gpu
- windows.4xlarge
- windows.8xlarge.nvidia.gpu
- bm-runner

.github/generated-ciflow-ruleset.json generated vendored Normal file

@ -0,0 +1,102 @@
{
"__comment": "@generated DO NOT EDIT MANUALLY, Generation script: .github/scripts/generate_ci_workflows.py",
"label_rules": {
"ciflow/all": [
"libtorch-linux-xenial-cuda10.2-py3.6-gcc7",
"libtorch-linux-xenial-cuda11.3-py3.6-gcc7",
"linux-bionic-cuda10.2-py3.9-gcc7",
"linux-bionic-py3.6-clang9",
"linux-bionic-py3.8-gcc9-coverage",
"linux-xenial-cuda10.2-py3.6-gcc7",
"linux-xenial-cuda11.3-py3.6-gcc7",
"linux-xenial-py3.6-gcc5.4",
"linux-xenial-py3.6-gcc7-bazel-test",
"parallelnative-linux-xenial-py3.6-gcc5.4",
"periodic-libtorch-linux-xenial-cuda11.1-py3.6-gcc7",
"periodic-linux-xenial-cuda11.1-py3.6-gcc7",
"periodic-win-vs2019-cuda11.1-py3",
"puretorch-linux-xenial-py3.6-gcc5.4",
"win-vs2019-cpu-py3",
"win-vs2019-cuda10.2-py3",
"win-vs2019-cuda11.3-py3"
],
"ciflow/bazel": [
"linux-xenial-py3.6-gcc7-bazel-test"
],
"ciflow/coverage": [
"linux-bionic-py3.8-gcc9-coverage"
],
"ciflow/cpu": [
"linux-bionic-py3.6-clang9",
"linux-bionic-py3.8-gcc9-coverage",
"linux-xenial-py3.6-gcc5.4",
"linux-xenial-py3.6-gcc7-bazel-test",
"parallelnative-linux-xenial-py3.6-gcc5.4",
"puretorch-linux-xenial-py3.6-gcc5.4",
"win-vs2019-cpu-py3"
],
"ciflow/cuda": [
"libtorch-linux-xenial-cuda10.2-py3.6-gcc7",
"libtorch-linux-xenial-cuda11.3-py3.6-gcc7",
"linux-bionic-cuda10.2-py3.9-gcc7",
"linux-xenial-cuda10.2-py3.6-gcc7",
"linux-xenial-cuda11.3-py3.6-gcc7",
"periodic-libtorch-linux-xenial-cuda11.1-py3.6-gcc7",
"periodic-linux-xenial-cuda11.1-py3.6-gcc7",
"periodic-win-vs2019-cuda11.1-py3",
"win-vs2019-cuda10.2-py3",
"win-vs2019-cuda11.3-py3"
],
"ciflow/default": [
"linux-bionic-py3.6-clang9",
"linux-bionic-py3.8-gcc9-coverage",
"linux-xenial-cuda11.3-py3.6-gcc7",
"linux-xenial-py3.6-gcc5.4",
"linux-xenial-py3.6-gcc7-bazel-test",
"win-vs2019-cpu-py3",
"win-vs2019-cuda11.3-py3"
],
"ciflow/libtorch": [
"libtorch-linux-xenial-cuda10.2-py3.6-gcc7",
"libtorch-linux-xenial-cuda11.3-py3.6-gcc7",
"periodic-libtorch-linux-xenial-cuda11.1-py3.6-gcc7"
],
"ciflow/linux": [
"libtorch-linux-xenial-cuda10.2-py3.6-gcc7",
"libtorch-linux-xenial-cuda11.3-py3.6-gcc7",
"linux-bionic-cuda10.2-py3.9-gcc7",
"linux-bionic-py3.6-clang9",
"linux-bionic-py3.8-gcc9-coverage",
"linux-xenial-cuda10.2-py3.6-gcc7",
"linux-xenial-cuda11.3-py3.6-gcc7",
"linux-xenial-py3.6-gcc5.4",
"linux-xenial-py3.6-gcc7-bazel-test",
"parallelnative-linux-xenial-py3.6-gcc5.4",
"periodic-libtorch-linux-xenial-cuda11.1-py3.6-gcc7",
"periodic-linux-xenial-cuda11.1-py3.6-gcc7",
"puretorch-linux-xenial-py3.6-gcc5.4"
],
"ciflow/noarch": [
"linux-bionic-py3.6-clang9"
],
"ciflow/scheduled": [
"periodic-libtorch-linux-xenial-cuda11.1-py3.6-gcc7",
"periodic-linux-xenial-cuda11.1-py3.6-gcc7",
"periodic-win-vs2019-cuda11.1-py3"
],
"ciflow/slow": [
"linux-bionic-cuda10.2-py3.9-gcc7",
"linux-xenial-cuda10.2-py3.6-gcc7"
],
"ciflow/win": [
"periodic-win-vs2019-cuda11.1-py3",
"win-vs2019-cpu-py3",
"win-vs2019-cuda10.2-py3",
"win-vs2019-cuda11.3-py3"
],
"ciflow/xla": [
"linux-bionic-py3.6-clang9"
]
},
"version": "v1"
}


@ -13,5 +13,9 @@ labels_to_circle_params:
- run_build
ci/master:
parameter: run_master_build
set_to_false:
- run_build
ci/slow-gradcheck:
parameter: run_slow_gradcheck_build
set_to_false:
- run_build


@ -1 +1,2 @@
tracking_issue: 24422
ciflow_tracking_issue: 64124

.github/regenerate.sh vendored Executable file

@ -0,0 +1,6 @@
#!/bin/bash -e
# Allows this script to be invoked from any directory:
cd "$(dirname "$0")"
python3 scripts/generate_ci_workflows.py


@ -27,6 +27,11 @@ runner_types:
os: linux
max_available: 50
disk_size: 150
linux.16xlarge.nvidia.gpu:
instance_type: g3.16xlarge
os: linux
max_available: 10
disk_size: 150
windows.4xlarge:
instance_type: c5d.4xlarge
os: windows


@ -13,7 +13,10 @@ WORKFLOWS = REPO_ROOT / ".github" / "workflows"
def concurrency_key(filename: Path) -> str:
workflow_name = filename.with_suffix("").name.replace("_", "-")
return f"{workflow_name}-${{{{ github.event.pull_request.number || github.sha }}}}"
if workflow_name.startswith("generated-"):
workflow_name = workflow_name[len("generated-"):]
return f"{workflow_name}-${{{{ github.event.pull_request.number || github.sha }}}}" \
"-${{ github.event_name == 'workflow_dispatch' }}"
def should_check(filename: Path) -> bool:


@ -1,222 +1,586 @@
#!/usr/bin/env python3
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Any, Dict
from typing import Dict, Set
import jinja2
import json
import os
import sys
from typing_extensions import Literal
YamlShellBool = Literal["''", 1]
Arch = Literal["windows", "linux"]
DOCKER_REGISTRY = "308535385114.dkr.ecr.us-east-1.amazonaws.com"
GITHUB_DIR = Path(__file__).parent.parent
# it would be nice to statically specify that build_environment must be
# present, but currently Python has no easy way to do that
# https://github.com/python/mypy/issues/4617
PyTorchWorkflow = Dict[str, Any]
GITHUB_DIR = Path(__file__).resolve().parent.parent
WINDOWS_CPU_TEST_RUNNER = "windows.4xlarge"
WINDOWS_CUDA_TEST_RUNNER = "windows.8xlarge.nvidia.gpu"
def PyTorchWindowsWorkflow(
*,
build_environment: str,
test_runner_type: str,
cuda_version: str,
on_pull_request: bool = False
) -> PyTorchWorkflow:
return {
"build_environment": build_environment,
"test_runner_type": test_runner_type,
"cuda_version": cuda_version,
"on_pull_request": on_pull_request,
}
WINDOWS_RUNNERS = {
WINDOWS_CPU_TEST_RUNNER,
WINDOWS_CUDA_TEST_RUNNER,
}
LINUX_CPU_TEST_RUNNER = "linux.2xlarge"
LINUX_CUDA_TEST_RUNNER = "linux.8xlarge.nvidia.gpu"
LINUX_RUNNERS = {
LINUX_CPU_TEST_RUNNER,
LINUX_CUDA_TEST_RUNNER,
}
CUDA_RUNNERS = {
WINDOWS_CUDA_TEST_RUNNER,
LINUX_CUDA_TEST_RUNNER,
}
CPU_RUNNERS = {
WINDOWS_CPU_TEST_RUNNER,
LINUX_CPU_TEST_RUNNER,
}
LABEL_CIFLOW_ALL = "ciflow/all"
LABEL_CIFLOW_BAZEL = "ciflow/bazel"
LABEL_CIFLOW_COVERAGE = "ciflow/coverage"
LABEL_CIFLOW_CPU = "ciflow/cpu"
LABEL_CIFLOW_CUDA = "ciflow/cuda"
LABEL_CIFLOW_DEFAULT = "ciflow/default"
LABEL_CIFLOW_LIBTORCH = "ciflow/libtorch"
LABEL_CIFLOW_LINUX = "ciflow/linux"
LABEL_CIFLOW_SCHEDULED = "ciflow/scheduled"
LABEL_CIFLOW_SLOW = "ciflow/slow"
LABEL_CIFLOW_WIN = "ciflow/win"
LABEL_CIFLOW_XLA = "ciflow/xla"
LABEL_CIFLOW_NOARCH = "ciflow/noarch"
def PyTorchLinuxWorkflow(
*,
build_environment: str,
docker_image_base: str,
test_runner_type: str,
on_pull_request: bool = False,
enable_doc_jobs: bool = False,
) -> PyTorchWorkflow:
return {
"build_environment": build_environment,
"docker_image_base": docker_image_base,
"test_runner_type": test_runner_type,
"on_pull_request": on_pull_request,
"enable_doc_jobs": enable_doc_jobs,
}
@dataclass
class CIFlowConfig:
enabled: bool = False
# Used to enable workflows to run on pytorch/pytorch-canary
run_on_canary: bool = False
labels: Set[str] = field(default_factory=set)
trigger_action: str = 'unassigned'
trigger_actor: str = 'pytorchbot'
root_job_name: str = 'ciflow_should_run'
root_job_condition: str = ''
# trigger_action_only controls whether we listen only for the trigger_action of a pull_request.
# If it's False, we listen on all default pull_request actions; this is useful while
# ciflow (via probot) is not yet automated.
trigger_action_only: bool = False
def gen_root_job_condition(self) -> None:
# TODO: Make conditions strict
# At the beginning of the ciflow rollout we keep everything the same as before
# Once fully rolled out, we can enforce strict constraints
# e.g. ADD env.GITHUB_ACTOR == '{self.trigger_actor}
# REMOVE github.event.action !='{self.trigger_action}'
label_conditions = [
f"contains(github.event.pull_request.labels.*.name, '{label}')" for label in sorted(self.labels)]
if self.run_on_canary:
self.root_job_condition = "(github.repository_owner == 'pytorch') && "
else:
self.root_job_condition = "(github.repository == 'pytorch/pytorch') && "
self.root_job_condition += f"((github.event_name != 'pull_request') || (github.event.assignee.login != 'pytorchbot' ) || " \
f"(github.event.action !='{self.trigger_action}') || " \
f"({' || '.join(label_conditions)}))"
def reset_root_job(self) -> None:
self.root_job_name = ''
self.root_job_condition = ''
def __post_init__(self) -> None:
if not self.enabled:
self.reset_root_job()
return
self.labels.add(LABEL_CIFLOW_ALL)
self.gen_root_job_condition()
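To see what gen_root_job_condition assembles, this standalone fragment re-runs its string construction for a hypothetical non-canary config with the default trigger_action of 'unassigned':

```
labels = sorted({"ciflow/all", "ciflow/cpu", "ciflow/win"})
trigger_action = "unassigned"

label_conditions = [
    f"contains(github.event.pull_request.labels.*.name, '{label}')"
    for label in labels
]
condition = "(github.repository == 'pytorch/pytorch') && "
condition += (
    "((github.event_name != 'pull_request') || "
    "(github.event.assignee.login != 'pytorchbot' ) || "
    f"(github.event.action !='{trigger_action}') || "
    f"({' || '.join(label_conditions)}))"
)
# One long GitHub Actions `if:` expression gating the root ciflow job.
print(condition)
```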
def generate_workflow_file(
*,
workflow: PyTorchWorkflow,
workflow_template: jinja2.Template,
) -> Path:
output_file_path = GITHUB_DIR / f"workflows/{workflow['build_environment']}.yml"
with open(output_file_path, "w") as output_file:
GENERATED = "generated"
output_file.writelines([f"# @{GENERATED} DO NOT EDIT MANUALLY\n"])
output_file.write(workflow_template.render(**workflow))
output_file.write("\n")
return output_file_path
@dataclass
class CIFlowRuleset:
version = 'v1'
output_file = f'{GITHUB_DIR}/generated-ciflow-ruleset.json'
label_rules: Dict[str, Set[str]] = field(default_factory=dict)
def add_label_rule(self, labels: Set[str], workflow_name: str) -> None:
for label in labels:
if label in self.label_rules:
self.label_rules[label].add(workflow_name)
else:
self.label_rules[label] = {workflow_name}
def generate_json(self) -> None:
GENERATED = "generated"  # Note: please keep the variable GENERATED, otherwise Phabricator will hide the whole file
output = {
"__comment": f"@{GENERATED} DO NOT EDIT MANUALLY, Generation script: .github/scripts/generate_ci_workflows.py",
"version": self.version,
"label_rules": {
label: sorted(list(workflows))
for label, workflows in self.label_rules.items()
}
}
with open(self.output_file, 'w') as outfile:
json.dump(output, outfile, indent=2, sort_keys=True)
outfile.write('\n')
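A short usage sketch of the ruleset accumulator defined above, assuming the CIFlowRuleset class is in scope (workflow names taken from the generated ruleset earlier in this changeset):

```
ruleset = CIFlowRuleset()
ruleset.add_label_rule({"ciflow/all", "ciflow/cpu"}, "linux-xenial-py3.6-gcc5.4")
ruleset.add_label_rule({"ciflow/all", "ciflow/cuda"}, "win-vs2019-cuda11.3-py3")

# label_rules now maps each label to the set of workflows it triggers:
# "ciflow/all" -> both workflows, "ciflow/cpu"/"ciflow/cuda" -> one each.
ruleset.generate_json()  # writes .github/generated-ciflow-ruleset.json
```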
@dataclass
class CIWorkflow:
# Required fields
arch: Arch
build_environment: str
test_runner_type: str
# Optional fields
ciflow_config: CIFlowConfig = field(default_factory=CIFlowConfig)
cuda_version: str = ''
docker_image_base: str = ''
enable_doc_jobs: bool = False
exclude_test: bool = False
is_coverage: bool = False
is_libtorch: bool = False
is_scheduled: str = ''
num_test_shards: int = 1
on_pull_request: bool = False
only_build_on_pull_request: bool = False
only_run_smoke_tests_on_pull_request: bool = False
num_test_shards_on_pull_request: int = -1
distributed_test: bool = True
# The following variables will be set as environment variables,
# so it's easier for both shell and Python scripts to consume them when false is represented as the empty string.
enable_jit_legacy_test: YamlShellBool = "''"
enable_distributed_test: YamlShellBool = "''"
enable_multigpu_test: YamlShellBool = "''"
enable_nogpu_no_avx_test: YamlShellBool = "''"
enable_nogpu_no_avx2_test: YamlShellBool = "''"
enable_slow_test: YamlShellBool = "''"
enable_docs_test: YamlShellBool = "''"
enable_backwards_compat_test: YamlShellBool = "''"
enable_xla_test: YamlShellBool = "''"
enable_noarch_test: YamlShellBool = "''"
def __post_init__(self) -> None:
if self.is_libtorch:
self.exclude_test = True
if not self.on_pull_request:
self.only_build_on_pull_request = False
if self.distributed_test:
self.enable_distributed_test = 1
# If num_test_shards_on_pull_request is not user-defined, default to num_test_shards unless we are
# only running smoke tests on the pull request.
if self.num_test_shards_on_pull_request == -1:
# Don't waste resources on runner spinup and cooldown for another shard if we are only running a few tests
if self.only_run_smoke_tests_on_pull_request:
self.num_test_shards_on_pull_request = 1
else:
self.num_test_shards_on_pull_request = self.num_test_shards
self.assert_valid()
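# Illustration of the sharding default (hypothetical values): with
# num_test_shards=2 and only_run_smoke_tests_on_pull_request=True, the PR run
# collapses to a single shard; without the smoke-test flag it inherits all 2 shards.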
def assert_valid(self) -> None:
err_message = f"invalid test_runner_type for {self.arch}: {self.test_runner_type}"
if self.arch == 'linux':
assert self.test_runner_type in LINUX_RUNNERS, err_message
if self.arch == 'windows':
assert self.test_runner_type in WINDOWS_RUNNERS, err_message
if self.ciflow_config.enabled:
# if LABEL_CIFLOW_DEFAULT is in the labels, trigger_action_only must be False (and vice versa)
assert self.ciflow_config.trigger_action_only != (LABEL_CIFLOW_DEFAULT in self.ciflow_config.labels)
assert self.on_pull_request
assert LABEL_CIFLOW_ALL in self.ciflow_config.labels
assert LABEL_CIFLOW_ALL in self.ciflow_config.root_job_condition
if self.arch == 'linux':
assert LABEL_CIFLOW_LINUX in self.ciflow_config.labels
if self.arch == 'windows':
assert LABEL_CIFLOW_WIN in self.ciflow_config.labels
if self.test_runner_type in CUDA_RUNNERS:
assert LABEL_CIFLOW_CUDA in self.ciflow_config.labels
if self.test_runner_type in CPU_RUNNERS:
assert LABEL_CIFLOW_CPU in self.ciflow_config.labels
def generate_workflow_file(self, workflow_template: jinja2.Template) -> None:
output_file_path = GITHUB_DIR / f"workflows/generated-{self.build_environment}.yml"
with open(output_file_path, "w") as output_file:
GENERATED = "generated" # Note that please keep the variable GENERATED otherwise phabricator will hide the whole file
output_file.writelines([f"# @{GENERATED} DO NOT EDIT MANUALLY\n"])
try:
content = workflow_template.render(asdict(self))
except Exception as e:
print(f"Failed on template: {workflow_template}", file=sys.stderr)
raise e
output_file.write(content)
if content[-1] != "\n":
output_file.write("\n")
print(output_file_path)
WINDOWS_WORKFLOWS = [
CIWorkflow(
arch="windows",
build_environment="win-vs2019-cpu-py3",
cuda_version="cpu",
test_runner_type=WINDOWS_CPU_TEST_RUNNER,
on_pull_request=True,
num_test_shards=2,
ciflow_config=CIFlowConfig(
enabled=True,
run_on_canary=True,
labels={LABEL_CIFLOW_DEFAULT, LABEL_CIFLOW_CPU, LABEL_CIFLOW_WIN}
),
),
CIWorkflow(
arch="windows",
build_environment="win-vs2019-cuda10.2-py3",
cuda_version="10.2",
test_runner_type=WINDOWS_CUDA_TEST_RUNNER,
on_pull_request=True,
num_test_shards=2,
ciflow_config=CIFlowConfig(
enabled=True,
trigger_action_only=True,
labels={LABEL_CIFLOW_CUDA, LABEL_CIFLOW_WIN}
),
),
CIWorkflow(
arch="windows",
build_environment="win-vs2019-cuda11.3-py3",
cuda_version="11.3",
test_runner_type=WINDOWS_CUDA_TEST_RUNNER,
num_test_shards=2,
on_pull_request=True,
only_run_smoke_tests_on_pull_request=True,
ciflow_config=CIFlowConfig(
enabled=True,
run_on_canary=True,
labels={LABEL_CIFLOW_DEFAULT, LABEL_CIFLOW_CUDA, LABEL_CIFLOW_WIN}
),
),
CIWorkflow(
arch="windows",
build_environment="periodic-win-vs2019-cuda11.1-py3",
cuda_version="11.1",
test_runner_type=WINDOWS_CUDA_TEST_RUNNER,
num_test_shards=2,
is_scheduled="45 0,4,8,12,16,20 * * *",
on_pull_request=True,
ciflow_config=CIFlowConfig(
enabled=True,
trigger_action_only=True,
labels={LABEL_CIFLOW_SCHEDULED, LABEL_CIFLOW_WIN, LABEL_CIFLOW_CUDA}
),
),
]
LINUX_WORKFLOWS = [
CIWorkflow(
arch="linux",
build_environment="linux-xenial-py3.6-gcc5.4",
docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3.6-gcc5.4",
test_runner_type=LINUX_CPU_TEST_RUNNER,
on_pull_request=True,
enable_jit_legacy_test=1,
enable_doc_jobs=True,
enable_docs_test=1,
enable_backwards_compat_test=1,
num_test_shards=2,
ciflow_config=CIFlowConfig(
enabled=True,
run_on_canary=True,
labels={LABEL_CIFLOW_DEFAULT, LABEL_CIFLOW_LINUX, LABEL_CIFLOW_CPU}
),
),
# ParallelTBB does not have a maintainer and is currently flaky
# CIWorkflow(
# arch="linux",
# build_environment="paralleltbb-linux-xenial-py3.6-gcc5.4",
# docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3.6-gcc5.4",
# test_runner_type=LINUX_CPU_TEST_RUNNER,
# # This is a master-only job even though on_pull_request is set to True
# on_pull_request=True,
# ciflow_config=CIFlowConfig(
# enabled=True,
# trigger_action_only=True,
# labels={LABEL_CIFLOW_LINUX, LABEL_CIFLOW_CPU},
# ),
# ),
CIWorkflow(
arch="linux",
build_environment="parallelnative-linux-xenial-py3.6-gcc5.4",
docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3.6-gcc5.4",
test_runner_type=LINUX_CPU_TEST_RUNNER,
# This is a master-only job even though on_pull_request is set to True
on_pull_request=True,
ciflow_config=CIFlowConfig(
enabled=True,
trigger_action_only=True,
labels={LABEL_CIFLOW_LINUX, LABEL_CIFLOW_CPU},
),
),
# Build PyTorch with BUILD_CAFFE2=OFF
CIWorkflow(
arch="linux",
build_environment="puretorch-linux-xenial-py3.6-gcc5.4",
docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3.6-gcc5.4",
test_runner_type=LINUX_CPU_TEST_RUNNER,
exclude_test=True,
# This is a master-only job even though on_pull_request is set to True
on_pull_request=True,
ciflow_config=CIFlowConfig(
enabled=True,
trigger_action_only=True,
labels={LABEL_CIFLOW_LINUX, LABEL_CIFLOW_CPU},
),
),
# CIWorkflow(
# arch="linux",
# build_environment="linux-xenial-py3.6-gcc7",
# docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3.6-gcc7",
# test_runner_type=LINUX_CPU_TEST_RUNNER,
# ),
# CIWorkflow(
# arch="linux",
# build_environment="linux-xenial-py3.6-clang5-asan",
# docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3-clang5-asan",
# test_runner_type=LINUX_CPU_TEST_RUNNER,
# ),
# CIWorkflow(
# arch="linux",
# build_environment="linux-xenial-py3.6-clang7-onnx",
# docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3-clang7-onnx",
# test_runner_type=LINUX_CPU_TEST_RUNNER,
# ),
CIWorkflow(
arch="linux",
build_environment="linux-bionic-cuda10.2-py3.9-gcc7",
docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-bionic-cuda10.2-cudnn7-py3.9-gcc7",
test_runner_type=LINUX_CUDA_TEST_RUNNER,
num_test_shards=2,
on_pull_request=True,
ciflow_config=CIFlowConfig(
enabled=True,
run_on_canary=True,
trigger_action_only=True,
labels={LABEL_CIFLOW_SLOW, LABEL_CIFLOW_LINUX, LABEL_CIFLOW_CUDA}
),
),
CIWorkflow(
arch="linux",
build_environment="linux-xenial-cuda10.2-py3.6-gcc7",
docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7",
test_runner_type=LINUX_CUDA_TEST_RUNNER,
enable_jit_legacy_test=1,
enable_multigpu_test=1,
enable_nogpu_no_avx_test=1,
enable_nogpu_no_avx2_test=1,
enable_slow_test=1,
num_test_shards=2,
on_pull_request=True,
ciflow_config=CIFlowConfig(
enabled=True,
trigger_action_only=True,
labels=set([LABEL_CIFLOW_SLOW, LABEL_CIFLOW_LINUX, LABEL_CIFLOW_CUDA]),
),
),
CIWorkflow(
arch="linux",
build_environment="libtorch-linux-xenial-cuda10.2-py3.6-gcc7",
docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7",
test_runner_type=LINUX_CUDA_TEST_RUNNER,
is_libtorch=True,
on_pull_request=True,
ciflow_config=CIFlowConfig(
enabled=True,
trigger_action_only=True,
labels=set([LABEL_CIFLOW_LIBTORCH, LABEL_CIFLOW_LINUX, LABEL_CIFLOW_CUDA]),
),
),
CIWorkflow(
arch="linux",
build_environment="linux-xenial-cuda11.3-py3.6-gcc7",
docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7",
test_runner_type=LINUX_CUDA_TEST_RUNNER,
num_test_shards=2,
on_pull_request=True,
ciflow_config=CIFlowConfig(
enabled=True,
labels=set([LABEL_CIFLOW_DEFAULT, LABEL_CIFLOW_LINUX, LABEL_CIFLOW_CUDA]),
),
),
CIWorkflow(
arch="linux",
build_environment="libtorch-linux-xenial-cuda11.3-py3.6-gcc7",
docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7",
test_runner_type=LINUX_CUDA_TEST_RUNNER,
is_libtorch=True,
on_pull_request=True,
ciflow_config=CIFlowConfig(
enabled=True,
trigger_action_only=True,
labels=set([LABEL_CIFLOW_LIBTORCH, LABEL_CIFLOW_LINUX, LABEL_CIFLOW_CUDA]),
),
),
CIWorkflow(
arch="linux",
build_environment="periodic-linux-xenial-cuda11.1-py3.6-gcc7",
docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-cuda11.1-cudnn8-py3-gcc7",
test_runner_type=LINUX_CUDA_TEST_RUNNER,
num_test_shards=2,
is_scheduled="45 0,4,8,12,16,20 * * *",
on_pull_request=True,
ciflow_config=CIFlowConfig(
enabled=True,
trigger_action_only=True,
labels={LABEL_CIFLOW_SCHEDULED, LABEL_CIFLOW_LINUX, LABEL_CIFLOW_CUDA}
),
),
CIWorkflow(
arch="linux",
build_environment="periodic-libtorch-linux-xenial-cuda11.1-py3.6-gcc7",
docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-cuda11.1-cudnn8-py3-gcc7",
test_runner_type=LINUX_CUDA_TEST_RUNNER,
is_libtorch=True,
is_scheduled="45 0,4,8,12,16,20 * * *",
on_pull_request=True,
ciflow_config=CIFlowConfig(
enabled=True,
trigger_action_only=True,
labels={LABEL_CIFLOW_SCHEDULED, LABEL_CIFLOW_LINUX, LABEL_CIFLOW_LIBTORCH, LABEL_CIFLOW_CUDA},
),
),
CIWorkflow(
arch="linux",
build_environment="linux-bionic-py3.8-gcc9-coverage",
docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-bionic-py3.8-gcc9",
test_runner_type=LINUX_CPU_TEST_RUNNER,
on_pull_request=True,
is_coverage=True,
num_test_shards=2,
ciflow_config=CIFlowConfig(
enabled=True,
labels={LABEL_CIFLOW_DEFAULT, LABEL_CIFLOW_COVERAGE, LABEL_CIFLOW_LINUX, LABEL_CIFLOW_CPU},
),
),
CIWorkflow(
arch="linux",
build_environment="linux-bionic-py3.6-clang9",
docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-bionic-py3.6-clang9",
test_runner_type=LINUX_CPU_TEST_RUNNER,
on_pull_request=True,
num_test_shards=2,
distributed_test=False,
enable_noarch_test=1,
ciflow_config=CIFlowConfig(
enabled=True,
labels={LABEL_CIFLOW_DEFAULT, LABEL_CIFLOW_LINUX, LABEL_CIFLOW_CPU, LABEL_CIFLOW_XLA, LABEL_CIFLOW_NOARCH},
),
),
# CIWorkflow(
# arch="linux",
# build_environment="linux-bionic-rocm3.9-py3.6",
# docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-bionic-rocm3.9-py3.6",
# test_runner_type=LINUX_CPU_TEST_RUNNER,
# ),
# CIWorkflow(
# arch="linux",
# build_environment="linux-xenial-py3.6-clang5-android-ndk-r19c-x86_32",
# docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c",
# test_runner_type=LINUX_CPU_TEST_RUNNER,
# ),
# CIWorkflow(
# arch="linux",
# build_environment="linux-xenial-py3.6-clang5-android-ndk-r19c-x86_64",
# docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c",
# test_runner_type=LINUX_CPU_TEST_RUNNER,
# ),
# CIWorkflow(
# arch="linux",
# build_environment="linux-xenial-py3.6-clang5-android-ndk-r19c-arm-v7a",
# docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c",
# test_runner_type=LINUX_CPU_TEST_RUNNER,
# ),
# CIWorkflow(
# arch="linux",
# build_environment="linux-xenial-py3.6-clang5-android-ndk-r19c-arm-v8a",
# docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c",
# test_runner_type=LINUX_CPU_TEST_RUNNER,
# ),
# CIWorkflow(
# arch="linux",
# build_environment="linux-xenial-py3.6-clang5-mobile",
# docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3-clang5-asan",
# test_runner_type=LINUX_CPU_TEST_RUNNER,
# ),
# CIWorkflow(
# arch="linux",
# build_environment="linux-xenial-py3.6-clang5-mobile-custom-dynamic",
# docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c",
# test_runner_type=LINUX_CPU_TEST_RUNNER,
# ),
# CIWorkflow(
# arch="linux",
# build_environment="linux-xenial-py3.6-clang5-mobile-custom-static",
# docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c",
# test_runner_type=LINUX_CPU_TEST_RUNNER,
# ),
# CIWorkflow(
# arch="linux",
# build_environment="linux-xenial-py3.6-clang5-mobile-code-analysis",
# docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c",
# test_runner_type=LINUX_CPU_TEST_RUNNER,
# ),
]
BAZEL_WORKFLOWS = [
CIWorkflow(
arch="linux",
build_environment="linux-xenial-py3.6-gcc7-bazel-test",
docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-bionic-cuda10.2-cudnn7-py3.9-gcc7",
test_runner_type=LINUX_CPU_TEST_RUNNER,
on_pull_request=True,
ciflow_config=CIFlowConfig(
enabled=True,
labels={LABEL_CIFLOW_DEFAULT, LABEL_CIFLOW_BAZEL, LABEL_CIFLOW_CPU, LABEL_CIFLOW_LINUX},
),
),
]
if __name__ == "__main__":
jinja_env = jinja2.Environment(
variable_start_string="!{{",
loader=jinja2.FileSystemLoader(str(GITHUB_DIR.joinpath("templates"))),
undefined=jinja2.StrictUndefined,
)
template_and_workflows = [
(jinja_env.get_template("linux_ci_workflow.yml.j2"), LINUX_WORKFLOWS),
(jinja_env.get_template("windows_ci_workflow.yml.j2"), WINDOWS_WORKFLOWS)
(jinja_env.get_template("windows_ci_workflow.yml.j2"), WINDOWS_WORKFLOWS),
(jinja_env.get_template("bazel_ci_workflow.yml.j2"), BAZEL_WORKFLOWS),
]
# Delete the existing generated files first; this should align with the .gitattributes file description.
existing_workflows = GITHUB_DIR.glob("workflows/generated-*")
for w in existing_workflows:
try:
os.remove(w)
except Exception as e:
print(f"Error occurred when deleting file {w}: {e}")
ciflow_ruleset = CIFlowRuleset()
for template, workflows in template_and_workflows:
for workflow in workflows:
workflow.generate_workflow_file(workflow_template=template)
if workflow.ciflow_config.enabled:
ciflow_ruleset.add_label_rule(workflow.ciflow_config.labels, workflow.build_environment)
elif workflow.on_pull_request:
# If ciflow is disabled but the workflow still runs on_pull_request, record it
# under the special LABEL_CIFLOW_DEFAULT label in the ruleset; this is later
# turned into an actual LABEL_CIFLOW_DEFAULT label on the workflow.
# During the rollout phase it behaves the same as LABEL_CIFLOW_DEFAULT.
ciflow_ruleset.add_label_rule({LABEL_CIFLOW_DEFAULT}, workflow.build_environment)
ciflow_ruleset.generate_json()
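# Typical invocation (a sketch; run from the repository root with jinja2 installed):
#   python3 .github/scripts/generate_ci_workflows.py
# This regenerates .github/workflows/generated-*.yml and
# .github/generated-ciflow-ruleset.json.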

View File

@ -0,0 +1,94 @@
#!/usr/bin/env python3
"""Generates a matrix to be utilized through github actions
Will output a matrix to represent our testing configurations, which is currently
dictated by just sharding.
"""
import json
import os
import re
from typing import Dict
from typing_extensions import TypedDict
class Config(TypedDict):
num_shards: int
runner: str
def get_disabled_issues() -> str:
pr_body = os.getenv('PR_BODY', '')
# The below regex is meant to match all *case-insensitive* keywords that
# GitHub has delineated would link PRs to issues, more details here:
# https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue.
# E.g., "Close #62851", "fixES #62851" and "RESOLVED #62851" would all match, but not
# "closes #62851" --> extra space, "fixing #62851" --> not a keyword, nor "fix 62851" --> no #
regex = '(?i)(Close(d|s)?|Resolve(d|s)?|Fix(ed|es)?) #([0-9]+)'
issue_numbers = [x[4] for x in re.findall(regex, pr_body)]
return ','.join(issue_numbers)
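# A quick sketch of the matching behavior (hypothetical PR body): for
#   "This fixes #123 and Closes #456, but see #789."
# the function returns "123,456" -- "fixes #123" and "Closes #456" use linking
# keywords, while the bare "#789" does not.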
def main() -> None:
TEST_RUNNER_TYPE = os.getenv('TEST_RUNNER_TYPE')
assert TEST_RUNNER_TYPE is not None
ON_PULL_REQUEST = os.getenv('GITHUB_HEAD_REF')
NUM_TEST_SHARDS_ON_PULL_REQUEST = os.getenv('NUM_TEST_SHARDS_ON_PULL_REQUEST')
NUM_TEST_SHARDS = int(os.getenv('NUM_TEST_SHARDS', '1'))
if ON_PULL_REQUEST and NUM_TEST_SHARDS_ON_PULL_REQUEST:
NUM_TEST_SHARDS = int(NUM_TEST_SHARDS_ON_PULL_REQUEST)
MULTIGPU_RUNNER_TYPE = os.getenv('MULTIGPU_RUNNER_TYPE')
NOGPU_RUNNER_TYPE = os.getenv('NOGPU_RUNNER_TYPE')
configs: Dict[str, Config] = {}
if os.getenv('ENABLE_JIT_LEGACY_TEST'):
configs['jit_legacy'] = {'num_shards': 1, 'runner': TEST_RUNNER_TYPE}
if MULTIGPU_RUNNER_TYPE is not None and os.getenv('ENABLE_MULTIGPU_TEST'):
configs['multigpu'] = {'num_shards': 1, 'runner': MULTIGPU_RUNNER_TYPE}
if NOGPU_RUNNER_TYPE is not None and os.getenv('ENABLE_NOGPU_NO_AVX_TEST'):
configs['nogpu_NO_AVX'] = {'num_shards': 1, 'runner': NOGPU_RUNNER_TYPE}
if NOGPU_RUNNER_TYPE is not None and os.getenv('ENABLE_NOGPU_NO_AVX2_TEST'):
configs['nogpu_NO_AVX2'] = {'num_shards': 1, 'runner': NOGPU_RUNNER_TYPE}
if os.getenv('ENABLE_DISTRIBUTED_TEST'):
configs['distributed'] = {'num_shards': 1, 'runner': TEST_RUNNER_TYPE}
if os.getenv('ENABLE_SLOW_TEST'):
configs['slow'] = {'num_shards': 1, 'runner': TEST_RUNNER_TYPE}
if os.getenv('ENABLE_DOCS_TEST'):
configs['docs_test'] = {'num_shards': 1, 'runner': TEST_RUNNER_TYPE}
if os.getenv('ENABLE_BACKWARDS_COMPAT_TEST'):
configs['backwards_compat'] = {'num_shards': 1, 'runner': TEST_RUNNER_TYPE}
if os.getenv('ENABLE_XLA_TEST'):
configs['xla'] = {'num_shards': 1, 'runner': TEST_RUNNER_TYPE}
if os.getenv('ENABLE_NOARCH_TEST'):
configs['noarch'] = {'num_shards': 1, 'runner': TEST_RUNNER_TYPE}
matrix = {
'include': [
{
'config': 'default',
'shard': shard,
'num_shards': NUM_TEST_SHARDS,
'runner': TEST_RUNNER_TYPE,
}
for shard in range(1, NUM_TEST_SHARDS + 1)
] + [
{
'config': name,
'shard': shard,
'num_shards': config['num_shards'],
'runner': config['runner'],
}
for name, config in configs.items()
for shard in range(1, config['num_shards'] + 1)
]
}
render_matrix = {'config': list(dict.fromkeys(x['config'] for x in matrix['include']))}
print(json.dumps({'matrix': matrix, 'render-matrix': render_matrix}, indent=2))
print(f'::set-output name=matrix::{json.dumps(matrix)}')
print(f'::set-output name=render-matrix::{json.dumps(render_matrix)}')
print(f'::set-output name=ignore-disabled-issues::{get_disabled_issues()}')
if __name__ == "__main__":
main()
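# A hypothetical output sketch for TEST_RUNNER_TYPE=linux.2xlarge,
# NUM_TEST_SHARDS=2, and ENABLE_DISTRIBUTED_TEST set (all other toggles unset):
# {
#   "include": [
#     {"config": "default", "shard": 1, "num_shards": 2, "runner": "linux.2xlarge"},
#     {"config": "default", "shard": 2, "num_shards": 2, "runner": "linux.2xlarge"},
#     {"config": "distributed", "shard": 1, "num_shards": 1, "runner": "linux.2xlarge"}
#   ]
# }
# plus a render-matrix of {"config": ["default", "distributed"]} and the
# ::set-output lines consumed by downstream jobs.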

View File

@ -65,6 +65,8 @@ class PytorchVersion:
self.no_build_suffix = no_build_suffix
def get_post_build_suffix(self) -> str:
if self.no_build_suffix:
return ""
if self.gpu_arch_type == "cuda":
return f"+cu{self.gpu_arch_version.replace('.', '')}"
return f"+{self.gpu_arch_type}{self.gpu_arch_version}"
@ -87,9 +89,9 @@ def main() -> None:
)
parser.add_argument(
"--no-build-suffix",
action="store_true",
help="Whether or not to add a build suffix typically (+cpu)",
default=os.environ.get("NO_BUILD_SUFFIX", False)
default=strtobool(os.environ.get("NO_BUILD_SUFFIX", "False"))
)
parser.add_argument(
"--gpu-arch-type",

View File

@ -0,0 +1,11 @@
function Get-SSH-Sessions {
Get-Process sshd -IncludeUserName |
Where-Object UserName -notLike "*SYSTEM*" |
Select-Object Id
}
$runningSessions = Get-SSH-Sessions
foreach ($session in $runningSessions) {
Stop-Process -id $session.Id
}

View File

@ -13,6 +13,7 @@ Testing environment:
# 1. Does not reuse the build artifact in other CI workflows
# 2. CI jobs are serialized because there is only one worker
import os
import git # type: ignore[import]
import pathlib
import argparse
import subprocess
@ -23,6 +24,7 @@ CUDA_VERSION = "cu102"
PYTHON_VERSION = "3.7"
TORCHBENCH_CONFIG_NAME = "config.yaml"
MAGIC_PREFIX = "RUN_TORCHBENCH:"
MAGIC_TORCHBENCH_PREFIX = "TORCHBENCH_BRANCH:"
ABTEST_CONFIG_TEMPLATE = """# This config is automatically generated by run_torchbench.py
start: {control}
end: {treatment}
@ -57,7 +59,7 @@ def extract_models_from_pr(torchbench_path: str, prbody_file: str) -> List[str]:
lines = map(lambda x: x.strip(), pf.read().splitlines())
magic_lines = list(filter(lambda x: x.startswith(MAGIC_PREFIX), lines))
if magic_lines:
# Only the first magic line will be recognized.
model_list = list(map(lambda x: x.strip(), magic_lines[0][len(MAGIC_PREFIX):].split(",")))
# Shortcut: if model_list is ["ALL"], run all the tests
if model_list == ["ALL"]:
@ -71,6 +73,26 @@ def extract_models_from_pr(torchbench_path: str, prbody_file: str) -> List[str]:
return []
return model_list
def identify_torchbench_branch(torchbench_path: str, prbody_file: str) -> None:
branch_name: str = ""
with open(prbody_file, "r") as pf:
lines = map(lambda x: x.strip(), pf.read().splitlines())
magic_lines = list(filter(lambda x: x.startswith(MAGIC_TORCHBENCH_PREFIX), lines))
if magic_lines:
# Only the first magic line will be recognized.
branch_name = magic_lines[0][len(MAGIC_TORCHBENCH_PREFIX):].strip()
# If not specified, directly return without the branch checkout
if not branch_name:
return
try:
print(f"Checking out the TorchBench branch: {branch_name} ...")
repo = git.Repo(torchbench_path)
origin = repo.remotes.origin
origin.fetch(branch_name)
repo.create_head(branch_name, origin.refs[branch_name]).checkout()
except git.exc.GitCommandError:
raise RuntimeError(f'{branch_name} doesn\'t exist in the pytorch/benchmark repository. Please double check.')
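# Example PR body lines this script reacts to (hypothetical values):
#   RUN_TORCHBENCH: resnet18, mobilenet_v2   -> benchmark only these two models
#   RUN_TORCHBENCH: ALL                      -> benchmark every model
#   TORCHBENCH_BRANCH: my-feature-branch     -> check out this TorchBench branch first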
def run_torchbench(pytorch_path: str, torchbench_path: str, output_dir: str) -> None:
# Copy system environment so that we will not override
env = dict(os.environ)
@ -96,6 +118,12 @@ if __name__ == "__main__":
if not models:
print("Can't parse the model filter from the pr body. Currently we only support allow-list.")
exit(1)
# Identify the specified TorchBench branch, verify that it exists, and check it out
try:
identify_torchbench_branch(args.torchbench_path, args.pr_body)
except RuntimeError as e:
print(f"Identify TorchBench branch failed: {str(e)}")
exit(1)
print(f"Ready to run TorchBench with benchmark. Result will be saved in the directory: {output_dir}.")
# Run TorchBench with the generated config
torchbench_config = gen_abtest_config(args.pr_base_sha, args.pr_head_sha, models)

View File

@ -0,0 +1,17 @@
function Get-SSH-Users {
# Gets ssh sessions for all users not named SYSTEM
Get-CimInstance -ClassName Win32_Process -Filter "Name = 'sshd.exe'" |
Get-CimAssociatedInstance -Association Win32_SessionProcess |
Get-CimAssociatedInstance -Association Win32_LoggedOnUser |
Where-Object {$_.Name -ne 'SYSTEM'} |
Measure-Object
}
$usersLoggedOn = Get-SSH-Users
Write-Output "Holding runner until all ssh sessions have logged out"
while ($usersLoggedOn.Count -gt 0) {
$usersLoggedOn = Get-SSH-Users
Write-Output "."
Start-Sleep -s 5
}

.github/scripts/wait_for_ssh_to_drain.sh vendored Executable file
View File

@ -0,0 +1,13 @@
#!/usr/bin/env bash
set -eou pipefail
echo "Holding runner for 2 hours until all ssh sessions have logged out"
for _ in $(seq 1440); do
# Break if no ssh session exists anymore
if [ "$(who)" = "" ]; then
break
fi
echo "."
sleep 5
done

View File

@ -0,0 +1,137 @@
{%- extends "linux_ci_workflow.yml.j2" -%}
{%- set exclude_test = true -%}
{% block name -%}
# Template is at: .github/templates/bazel_ci_workflow.yml.j2
# Generation script: .github/scripts/generate_ci_workflows.py
name: !{{ build_environment }}
{%- endblock %}
on:
{%- if on_pull_request %}
pull_request:
{%- if ciflow_config.enabled %}
{%- if ciflow_config.trigger_action_only %}
types: [!{{ ciflow_config.trigger_action }}]
{%- else %}
types: [opened, synchronize, reopened, !{{ ciflow_config.trigger_action }}]
{%- endif %}
{%- endif %}
{%- else %}
# TODO: Enable pull_request builds when we can verify capacity can be met by auto-scalers
{%- endif %}
{% block build +%}
# building and testing in a single job since bazel runs only a small subset of tests
build-and-test:
runs-on: !{{ test_runner_type }}
needs: [calculate-docker-image, !{{ ciflow_config.root_job_name }}]
env:
DOCKER_IMAGE: ${{ needs.calculate-docker-image.outputs.docker_image }}
JOB_BASE_NAME: !{{ build_environment }}-build-and-test
NUM_TEST_SHARDS: !{{ num_test_shards }}
CONTINUE_THROUGH_ERROR: ${{ github.repository == 'pytorch/pytorch' && (github.event_name == 'push' || github.event_name == 'schedule') }}
steps:
!{{ common.setup_ec2_linux() }}
!{{ common.checkout_pytorch("recursive") }}
- name: Pull docker image
run: |
docker pull "${DOCKER_IMAGE}"
- name: Determine shm-size
run: |
shm_size="1g"
case "${BUILD_ENVIRONMENT}" in
*cuda*)
shm_size="2g"
;;
*rocm*)
shm_size="8g"
;;
esac
echo "SHM_SIZE=${shm_size}" >> "${GITHUB_ENV}"
- name: Output disk space left
run: |
sudo df -H
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Build
run: |
# detached container should get cleaned up by teardown_ec2_linux
container_name=$(docker run \
-e BUILD_ENVIRONMENT \
-e JOB_BASE_NAME \
-e MAX_JOBS="$(nproc --ignore=2)" \
-e SCCACHE_BUCKET \
-e CUSTOM_TEST_ARTIFACT_BUILD_DIR \
-e PR_LABELS \
-e SKIP_SCCACHE_INITIALIZATION=1 \
-e TORCH_CUDA_ARCH_LIST \
-e http_proxy="!{{ common.squid_proxy }}" -e https_proxy="!{{ common.squid_proxy }}" -e no_proxy="!{{ common.squid_no_proxy }}" \
--env-file="/tmp/github_env_${GITHUB_RUN_ID}" \
--security-opt seccomp=unconfined \
--cap-add=SYS_PTRACE \
--tty \
--detach \
--user jenkins \
-v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \
-w /var/lib/jenkins/workspace \
"${DOCKER_IMAGE}"
)
docker exec -t "${container_name}" sh -c 'sudo chown -R jenkins . && sudo chown -R jenkins /dev && .jenkins/pytorch/build.sh'
!{{ common.parse_ref() }}
- name: Display and upload binary build size statistics (Click Me)
# temporary hack: set CIRCLE_* vars, until we update
# tools/stats/print_test_stats.py to natively support GitHub Actions
env:
AWS_DEFAULT_REGION: us-east-1
SCRIBE_GRAPHQL_ACCESS_TOKEN: ${{ secrets.SCRIBE_GRAPHQL_ACCESS_TOKEN }}
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
CIRCLE_TAG: ${{ steps.parse-ref.outputs.tag }}
CIRCLE_WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
run: |
COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0)
export COMMIT_TIME
pip3 install requests==2.26
python3 -m tools.stats.upload_binary_size_to_scuba || exit 0
- name: Test
run: |
# detached container should get cleaned up by teardown_ec2_linux
export SHARD_NUMBER=0
# TODO: Stop building test binaries as part of the build phase
# Make sure we copy test results from bazel-testlogs symlink to
# a regular directory ./test/test-reports
container_name=$(docker run \
-e BUILD_ENVIRONMENT \
-e CUSTOM_TEST_ARTIFACT_BUILD_DIR \
-e GITHUB_ACTIONS \
-e IN_CI \
-e SHARD_NUMBER \
-e JOB_BASE_NAME \
-e MAX_JOBS="$(nproc --ignore=2)" \
-e SCCACHE_BUCKET \
-e CONTINUE_THROUGH_ERROR \
-e PR_LABELS \
-e http_proxy="!{{ common.squid_proxy }}" -e https_proxy="!{{ common.squid_proxy }}" -e no_proxy="!{{ common.squid_no_proxy }}" \
--env-file="/tmp/github_env_${GITHUB_RUN_ID}" \
--security-opt seccomp=unconfined \
--cap-add=SYS_PTRACE \
--shm-size="${SHM_SIZE}" \
--tty \
--detach \
--user jenkins \
-v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \
-w /var/lib/jenkins/workspace \
"${DOCKER_IMAGE}"
)
docker exec -t "${container_name}" sh -c 'sudo chown -R jenkins . && sudo chown -R jenkins /dev && .jenkins/pytorch/test.sh && cp -Lr ./bazel-testlogs ./test/test-reports'
- name: Chown workspace
if: always()
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
!{{ common.upload_test_reports(name='bazel') }}
!{{ common.upload_test_statistics(build_environment) }}
!{{ common.teardown_ec2_linux() }}
{%- endblock %}

.github/templates/common.yml.j2 vendored Normal file
View File

@ -0,0 +1,186 @@
{%- set upload_artifact_s3_action = "seemethere/upload-artifact-s3@v3" -%}
{# squid_proxy is a private ELB that is only available to GHA custom runners #}
{%- set squid_proxy = "http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -%}
{# squid_no_proxy is a common set of fixed domains and IPs that we don't need to proxy. See https://docs.aws.amazon.com/AmazonECS/latest/developerguide/http_proxy_config.html#windows-proxy #}
{%- set squid_no_proxy = "localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" -%}
{%- macro concurrency(build_environment) -%}
concurrency:
group: !{{ build_environment }}-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }}
cancel-in-progress: true
{%- endmacro -%}
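{#- A hedged illustration: for a hypothetical build_environment of
    "linux-xenial-py3.6-gcc5.4", the macro above renders to:
      concurrency:
        group: linux-xenial-py3.6-gcc5.4-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }}
        cancel-in-progress: true
-#}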
{%- macro display_ec2_information() -%}
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
{%- endmacro -%}
{%- macro parse_ref() -%}
- name: Parse ref
id: parse-ref
run: .github/scripts/parse_ref.py
{%- endmacro -%}
{%- macro upload_test_statistics(build_environment) -%}
- name: Display and upload test statistics (Click Me)
if: always()
# temporary hack: set CIRCLE_* vars, until we update
# tools/stats/print_test_stats.py to natively support GitHub Actions
env:
AWS_DEFAULT_REGION: us-east-1
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
JOB_BASE_NAME: !{{ build_environment }}-test
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
CIRCLE_TAG: ${{ steps.parse-ref.outputs.tag }}
CIRCLE_WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
shell: bash
run: |
python3 -m pip install -r requirements.txt
python3 -m pip install boto3==1.16.34
python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test
{%- endmacro -%}
{%- macro setup_ec2_linux() -%}
!{{ display_ec2_information() }}
- name: Log in to ECR
env:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
# Ensure the working directory gets chowned back to the current user
retry docker pull "${ALPINE_IMAGE}"
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
rm -rf "${GITHUB_WORKSPACE:?}/*"
rm -f ~/.ssh/authorized_keys
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
{%- endmacro -%}
{%- macro teardown_ec2_linux() -%}
- name: Hold runner for 2 hours or until ssh sessions have drained
# Always hold for active ssh sessions
if: always()
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Chown workspace
if: always()
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Kill containers, clean up images
if: always()
run: |
# ignore expansion of "docker ps -q" since it could be empty
# shellcheck disable=SC2046
docker stop $(docker ps -q) || true
# Prune all of the docker images
docker system prune -af
{%- endmacro -%}
{%- macro checkout_pytorch(submodules) -%}
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: !{{ submodules }}
{%- endmacro -%}
{%- macro upload_test_reports(name) -%}
- name: Zip test reports for upload
if: always()
env:
{%- if name == 'linux' or name == 'windows' %}
FILE_SUFFIX: '${{ github.job }}-${{ matrix.config }}-${{ matrix.shard }}-${{ matrix.num_shards }}-${{ matrix.runner }}'
{%- else %}
FILE_SUFFIX: '!{{ name }}-${{ github.job }}'
{%- endif %}
{%- if name == 'windows' %}
shell: powershell
run: |
# -ir => recursive include all files in pattern
7z a "test-reports-$Env:FILE_SUFFIX.zip" -ir'!test\*.xml'
{%- else %}
run: |
# Remove any previous test reports if they exist
rm -f test-reports-*.zip
zip -r "test-reports-${FILE_SUFFIX}.zip" test -i '*.xml'
{%- endif %}
- uses: actions/upload-artifact@v2
name: Store Test Reports
if: always()
with:
{%- if name == 'linux' or name == 'windows' %}
name: test-reports-${{ matrix.config }}
{%- else %}
name: test-reports-!{{ name }}
{%- endif %}
retention-days: 14
if-no-files-found: error
path:
{%- if name == 'windows' %}
pytorch-${{ github.run_id }}/test-reports-*.zip
{%- else %}
test-reports-*.zip
{%- endif %}
- uses: !{{ upload_artifact_s3_action }}
name: Store Test Reports on S3
if: always()
with:
retention-days: 14
if-no-files-found: error
path:
{%- if name == 'windows' %}
pytorch-${{ github.run_id }}/test-reports-*.zip
{%- else %}
test-reports-*.zip
{%- endif %}
{%- endmacro -%}
{%- macro render_test_results() -%}
- name: Install render_test_results dependencies
if: always()
shell: bash
run: |
python3 -m pip install junitparser==2.1.1 rich==10.9.0
- name: "[[ Click me for rendered test results (useful for finding failing tests) ]]"
if: always()
shell: bash
# Encoding is weird on Windows; just try to default to utf-8 if possible
env:
PYTHONIOENCODING: "utf-8"
run: |
python3 tools/render_junit.py test/
{%- endmacro -%}

View File

@ -1,55 +1,79 @@
{% import 'common.yml.j2' as common %}
{%- block name -%}
# Template is at: .github/templates/linux_ci_workflow.yml.j2
# Generation script: .github/scripts/generate_ci_workflows.py
name: !{{ build_environment }}
{%- endblock %}
on:
{%- if on_pull_request %}
pull_request:
{%- if ciflow_config.enabled %}
{%- if ciflow_config.trigger_action_only %}
types: [!{{ ciflow_config.trigger_action }}]
{%- else %}
types: [opened, synchronize, reopened, !{{ ciflow_config.trigger_action }}]
{%- endif %}
{%- endif %}
{%- else %}
# TODO: Enable pull_request builds when we can verify capacity can be met by auto-scalers
{%- endif %}
{%- if is_scheduled %}
schedule:
- cron: !{{ is_scheduled }}
{%- else %}
push:
branches:
- master
- release/*
{%- endif %}
workflow_dispatch:
env:
BUILD_ENVIRONMENT: !{{ build_environment }}
DOCKER_IMAGE_BASE: !{{ docker_image_base }}
SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2
XLA_CLANG_CACHE_S3_BUCKET_NAME: ossci-compiler-clang-cache-circleci-xla
TORCH_CUDA_ARCH_LIST: 5.2
IN_CI: 1
# This is used only during the phase of adding wheel tests; it will be removed once completed
IN_WHEEL_TEST: 1
# Used for custom_operator, jit_hooks, custom_backend, see .jenkins/pytorch/build.sh
CUSTOM_TEST_ARTIFACT_BUILD_DIR: build/custom_test_artifacts
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
!{{ common.concurrency(build_environment) }}
jobs:
{%- if ciflow_config.enabled %}
!{{ ciflow_config.root_job_name }}:
runs-on: ubuntu-18.04
if: ${{ !{{ ciflow_config.root_job_condition }} }}
env:
LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
steps:
- name: noop
run: echo running !{{ ciflow_config.root_job_name }}
- name: print labels
run: echo "${LABELS}"
{%- endif %}
calculate-docker-image:
runs-on: linux.2xlarge
{%- if ciflow_config.enabled %}
needs: [!{{ ciflow_config.root_job_name }}]
{%- endif %}
env:
DOCKER_BUILDKIT: 1
timeout-minutes: 90
outputs:
docker_image: ${{ steps.calculate-tag.outputs.docker_image }}
steps:
!{{ common.setup_ec2_linux() }}
!{{ common.checkout_pytorch("false") }}
- name: Calculate docker image tag
id: calculate-tag
run: |
@ -89,93 +113,78 @@ jobs:
fi
echo ::set-output name=rebuild::yes
- name: Build and push docker image
if: ${{ steps.check.outputs.rebuild }}
env:
DOCKER_TAG: ${{ steps.calculate-tag.outputs.docker_tag }}
DOCKER_SKIP_S3_UPLOAD: 1
run: |
export IMAGE_NAME=${DOCKER_IMAGE_BASE#308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/}
cd .circleci/docker && ./build_docker.sh
{% block build +%}
build:
runs-on: linux.2xlarge
needs: [calculate-docker-image, !{{ ciflow_config.root_job_name }}]
env:
DOCKER_IMAGE: ${{ needs.calculate-docker-image.outputs.docker_image }}
JOB_BASE_NAME: !{{ build_environment }}-build
steps:
!{{ common.setup_ec2_linux() }}
!{{ common.checkout_pytorch("recursive") }}
- name: Pull docker image
run: |
docker pull "${DOCKER_IMAGE}"
- name: Build
run: |
# detached container should get cleaned up by teardown_ec2_linux
container_name=$(docker run \
-e BUILD_ENVIRONMENT \
-e JOB_BASE_NAME \
-e MAX_JOBS="$(nproc --ignore=2)" \
-e SCCACHE_BUCKET \
-e XLA_CLANG_CACHE_S3_BUCKET_NAME \
-e CUSTOM_TEST_ARTIFACT_BUILD_DIR \
-e SKIP_SCCACHE_INITIALIZATION=1 \
-e TORCH_CUDA_ARCH_LIST \
-e PR_LABELS \
-e http_proxy="!{{ common.squid_proxy }}" -e https_proxy="!{{ common.squid_proxy }}" -e no_proxy="!{{ common.squid_no_proxy }}" \
--env-file="/tmp/github_env_${GITHUB_RUN_ID}" \
--security-opt seccomp=unconfined \
--cap-add=SYS_PTRACE \
--tty \
--detach \
--user jenkins \
-v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \
-w /var/lib/jenkins/workspace \
"${DOCKER_IMAGE}" \
sh -c 'sudo chown -R jenkins . && .jenkins/pytorch/build.sh'
"${DOCKER_IMAGE}"
)
docker exec -t "${container_name}" sh -c 'sudo chown -R jenkins . && .jenkins/pytorch/build.sh'
!{{ common.parse_ref() }}
- name: Display and upload binary build size statistics (Click Me)
# temporary hack: set CIRCLE_* vars, until we update
# tools/stats/print_test_stats.py to natively support GitHub Actions
env:
AWS_DEFAULT_REGION: us-east-1
IS_GHA: 1
SCRIBE_GRAPHQL_ACCESS_TOKEN: ${{ secrets.SCRIBE_GRAPHQL_ACCESS_TOKEN }}
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
CIRCLE_TAG: ${{ steps.parse-ref.outputs.tag }}
CIRCLE_WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
run: |
COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0)
export COMMIT_TIME
pip3 install requests==2.26 boto3==1.16.34
python3 -m tools.stats.upload_binary_size_to_scuba || exit 0
- name: Chown workspace
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
{%- if not is_libtorch %}
- name: Archive artifacts into zip
run: |
zip -1 -r artifacts.zip dist/ build/custom_test_artifacts build/lib build/bin .pytorch-test-times.json
- uses: !{{ common.upload_artifact_s3_action }}
name: Store PyTorch Build Artifacts on S3
with:
name: ${{ env.BUILD_ENVIRONMENT }}
@ -183,38 +192,78 @@ jobs:
if-no-files-found: error
path:
artifacts.zip
{%- endif %}
!{{ common.teardown_ec2_linux() }}
{%- endblock %}
{%- if not exclude_test %}
{% block test +%}
generate-test-matrix:
runs-on: ubuntu-18.04
{%- if ciflow_config.enabled %}
needs: [!{{ ciflow_config.root_job_name }}]
{%- endif %}
env:
TEST_RUNNER_TYPE: !{{ test_runner_type }}
ENABLE_DISTRIBUTED_TEST: !{{ enable_distributed_test }}
ENABLE_JIT_LEGACY_TEST: !{{ enable_jit_legacy_test }}
ENABLE_MULTIGPU_TEST: !{{ enable_multigpu_test }}
ENABLE_NOGPU_NO_AVX_TEST: !{{ enable_nogpu_no_avx_test }}
ENABLE_NOGPU_NO_AVX2_TEST: !{{ enable_nogpu_no_avx2_test }}
ENABLE_SLOW_TEST: !{{ enable_slow_test }}
ENABLE_DOCS_TEST: !{{ enable_docs_test }}
ENABLE_BACKWARDS_COMPAT_TEST: !{{ enable_backwards_compat_test }}
ENABLE_XLA_TEST: !{{ enable_xla_test }}
ENABLE_NOARCH_TEST: !{{ enable_noarch_test }}
NUM_TEST_SHARDS: !{{ num_test_shards }}
MULTIGPU_RUNNER_TYPE: linux.16xlarge.nvidia.gpu
NOGPU_RUNNER_TYPE: linux.2xlarge
PR_BODY: ${{ github.event.pull_request.body }}
outputs:
matrix: ${{ steps.set-matrix.outputs.matrix }}
render-matrix: ${{ steps.set-matrix.outputs.render-matrix }}
ignore-disabled-issues: ${{ steps.set-matrix.outputs.ignore-disabled-issues }}
container:
image: python:3.9
steps:
- name: Install dependencies
run: pip install typing-extensions==3.10
- name: Clone pytorch/pytorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
- name: Generating test matrix
id: set-matrix
run: .github/scripts/generate_pytorch_test_matrix.py
test:
needs: [calculate-docker-image, build, generate-test-matrix, !{{ ciflow_config.root_job_name }}]
strategy:
matrix: ${{ fromJson(needs.generate-test-matrix.outputs.matrix) }}
fail-fast: false
runs-on: ${{ matrix.runner }}
env:
DOCKER_IMAGE: ${{ needs.calculate-docker-image.outputs.docker_image }}
JOB_BASE_NAME: !{{ build_environment }}-test
TEST_CONFIG: ${{ matrix.config }}
SHARD_NUMBER: ${{ matrix.shard }}
NUM_TEST_SHARDS: ${{ matrix.num_shards }}
PYTORCH_IGNORE_DISABLED_ISSUES: ${{ needs.generate-test-matrix.outputs.ignore-disabled-issues }}
CONTINUE_THROUGH_ERROR: ${{ github.repository == 'pytorch/pytorch' && (github.event_name == 'push' || github.event_name == 'schedule') }}
steps:
!{{ common.setup_ec2_linux() }}
!{{ common.checkout_pytorch("recursive") }}
- name: Pull docker image
run: |
docker pull "${DOCKER_IMAGE}"
- name: Install nvidia driver, nvidia-docker runtime, set GPU_FLAG
if: ${{ contains(env.BUILD_ENVIRONMENT, 'cuda') && !contains(matrix.config, 'nogpu') }}
run: |
bash .github/scripts/install_nvidia_utils_linux.sh
echo "GPU_FLAG=--gpus all" >> "${GITHUB_ENV}"
@ -240,134 +289,98 @@ jobs:
- name: Output disk space left
run: |
sudo df -H
!{{ common.parse_ref() }}
- name: Test
env:
PR_NUMBER: ${{ github.event.pull_request.number }}
IS_GHA: 1
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
AWS_DEFAULT_REGION: us-east-1
run: |
if [[ $TEST_CONFIG == 'multigpu' ]]; then
TEST_COMMAND=.jenkins/pytorch/multigpu-test.sh
else
TEST_COMMAND=.jenkins/pytorch/test.sh
fi
if [[ $NUM_TEST_SHARDS -ne 2 ]]; then
export SHARD_NUMBER=0
fi
# detached container should get cleaned up by teardown_ec2_linux
# TODO: Stop building test binaries as part of the build phase
# Used for GPU_FLAG since that doesn't play nice
# shellcheck disable=SC2086
container_name=$(docker run \
${GPU_FLAG:-} \
-e BUILD_ENVIRONMENT \
-e PR_NUMBER \
-e CUSTOM_TEST_ARTIFACT_BUILD_DIR \
-e GITHUB_ACTIONS \
-e IN_CI \
-e IS_GHA \
-e CIRCLE_BRANCH \
-e CIRCLE_SHA1 \
-e CIRCLE_PR_NUMBER \
-e AWS_DEFAULT_REGION \
-e IN_WHEEL_TEST \
-e SHARD_NUMBER \
-e JOB_BASE_NAME \
-e TEST_CONFIG \
-e NUM_TEST_SHARDS \
-e PYTORCH_IGNORE_DISABLED_ISSUES \
-e PR_LABELS \
-e CONTINUE_THROUGH_ERROR \
-e MAX_JOBS="$(nproc --ignore=2)" \
-e SCCACHE_BUCKET \
-e http_proxy="!{{ common.squid_proxy }}" -e https_proxy="!{{ common.squid_proxy }}" -e no_proxy="!{{ common.squid_no_proxy }}" \
-e XLA_CLANG_CACHE_S3_BUCKET_NAME \
--env-file="/tmp/github_env_${GITHUB_RUN_ID}" \
--security-opt seccomp=unconfined \
--cap-add=SYS_PTRACE \
--shm-size="${SHM_SIZE}" \
--tty \
--detach \
--name="${container_name}" \
--user jenkins \
-v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \
-w /var/lib/jenkins/workspace \
"${DOCKER_IMAGE}" \
sh -c 'sudo chown -R jenkins . && pip install dist/*.whl && .jenkins/pytorch/test.sh'
"${DOCKER_IMAGE}"
)
docker exec -t "${container_name}" sh -c "sudo chown -R jenkins . && pip install dist/*.whl && ${TEST_COMMAND}"
- name: Chown workspace
if: always()
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
!{{ common.render_test_results() }}
{%- if is_coverage %}
- name: Report coverage
run: |
python3 -mpip install codecov==2.1.12
python3 -mcodecov
{%- endif %}
!{{ common.upload_test_reports(name='linux') }}
!{{ common.upload_test_statistics(build_environment) }}
!{{ common.teardown_ec2_linux() }}
{% endblock %}
{%- endif -%}
{%- if enable_doc_jobs %}
build-docs:
runs-on: linux.2xlarge
strategy:
matrix:
docs_type: [cpp, python]
needs: [calculate-docker-image, build, !{{ ciflow_config.root_job_name }}]
env:
DOCKER_IMAGE: ${{ needs.calculate-docker-image.outputs.docker_image }}
DOCS_TYPE: ${{ matrix.docs_type }}
steps:
!{{ common.setup_ec2_linux() }}
!{{ common.checkout_pytorch("recursive") }}
- name: Pull docker image
run: |
docker pull "${DOCKER_IMAGE}"
- uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b
name: Download PyTorch Build Artifacts
with:
@ -375,45 +388,64 @@ jobs:
- name: Unzip artifacts
run: |
unzip -o artifacts.zip
- name: Build ${{ matrix.docs_type }} docs
run: |
set -ex
time docker pull "${DOCKER_IMAGE}" > /dev/null
echo "${GITHUB_REF}"
ref=${GITHUB_REF##*/}
target=${ref//v}
# detached container should get cleaned up by teardown_ec2_linux
container_name=$(docker run \
-e BUILD_ENVIRONMENT \
-e CUSTOM_TEST_ARTIFACT_BUILD_DIR \
-e IN_CI \
-e MAX_JOBS="$(nproc --ignore=2)" \
-e CIRCLE_SHA1="$GITHUB_SHA" \
-e DOCS_VERSION="${target}" \
-e DOCS_TYPE \
-e PR_LABELS \
--env-file="/tmp/github_env_${GITHUB_RUN_ID}" \
--security-opt seccomp=unconfined \
--cap-add=SYS_PTRACE \
--name="$GITHUB_SHA" \
--tty \
--detach \
--user jenkins \
-v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \
-w /var/lib/jenkins/workspace \
"${DOCKER_IMAGE}" \
bash -c "sudo chown -R jenkins . && pip install dist/*.whl && ./.circleci/scripts/python_doc_push_script.sh docs/$target $target site"
"${DOCKER_IMAGE}"
)
docker exec -t "${container_name}" bash -c "sudo chown -R jenkins . && pip install dist/*.whl && ./.circleci/scripts/${DOCS_TYPE}_doc_push_script.sh"
- name: Chown workspace
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- uses: !{{ common.upload_artifact_s3_action }}
name: Upload Python Docs Preview
if: ${{ github.event_name == 'pull_request' && matrix.docs_type == 'python' }}
with:
retention-days: 14
s3-bucket: doc-previews
if-no-files-found: error
path: pytorch.github.io/docs/merge/
s3-prefix: pytorch/${{ github.event.pull_request.number }}
- uses: !{{ common.upload_artifact_s3_action }}
name: Upload C++ Docs Preview
if: ${{ github.event_name == 'pull_request' && matrix.docs_type == 'cpp' }}
with:
retention-days: 14
if-no-files-found: error
s3-bucket: doc-previews
path: cppdocs/
s3-prefix: pytorch/${{ github.event.pull_request.number }}/cppdocs
- name: Archive artifacts into zip
run: |
zip -r "docs_${DOCS_TYPE}.zip" "${GITHUB_WORKSPACE}/pytorch.github.io" "${GITHUB_WORKSPACE}/cppdocs"
- uses: actions/upload-artifact@v2
name: Store PyTorch Build Artifacts
with:
name: docs_${{ matrix.docs_type }}
path: docs_${{ matrix.docs_type }}.zip
if-no-files-found: error
!{{ common.teardown_ec2_linux() }}
{%- endif -%}

View File

@ -1,15 +1,43 @@
{% import 'common.yml.j2' as common %}
{%- macro wait_and_kill_ssh() -%}
- name: Wait until all sessions have drained
shell: powershell
if: always()
timeout-minutes: 120
run: |
.github\scripts\wait_for_ssh_to_drain.ps1
- name: Kill active ssh sessions if still around (Useful if workflow was cancelled)
shell: powershell
if: always()
run: |
.github\scripts\kill_active_ssh_sessions.ps1
{%- endmacro -%}
# Template is at: .github/templates/windows_ci_workflow.yml.j2
# Generation script: .github/scripts/generate_ci_workflows.py
name: Windows CI (!{{ build_environment }})
name: !{{ build_environment }}
on:
{%- if on_pull_request %}
pull_request:
{%- if ciflow_config.enabled %}
{%- if ciflow_config.trigger_action_only %}
types: [!{{ ciflow_config.trigger_action }}]
{%- else %}
types: [opened, synchronize, reopened, !{{ ciflow_config.trigger_action }}]
{%- endif %}
{%- endif %}
{%- endif %}
{%- if is_scheduled %}
schedule:
- cron: !{{ is_scheduled }}
{%- else %}
push:
branches:
- master
- release/*
{%- endif %}
workflow_dispatch:
env:
@ -18,33 +46,56 @@ env:
CUDA_VERSION: "!{{ cuda_version }}"
IN_CI: 1
INSTALL_WINDOWS_SDK: 1
JOB_BASE_NAME: test
PYTHON_VERSION: "3.8"
PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
SCCACHE_BUCKET: "ossci-compiler-cache"
VC_PRODUCT: "BuildTools"
VC_VERSION: ""
VS_VERSION: "16.8.6"
VC_YEAR: "2019"
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
no_proxy: !{{ common.squid_no_proxy }}
{%- if cuda_version != "cpu" %}
TORCH_CUDA_ARCH_LIST: "7.0"
USE_CUDA: 1
{%- endif %}
concurrency:
group: !{{ build_environment }}-${{ github.event.pull_request.number || github.sha }}
cancel-in-progress: true
!{{ common.concurrency(build_environment) }}
jobs:
{%- if ciflow_config.enabled %}
!{{ ciflow_config.root_job_name }}:
runs-on: ubuntu-18.04
if: ${{ !{{ ciflow_config.root_job_condition }} }}
steps:
- name: noop
run: echo running !{{ ciflow_config.root_job_name }}
{%- endif %}
build:
runs-on: "windows.4xlarge"
defaults:
run:
working-directory: pytorch-${{ github.run_id }}
{%- if ciflow_config.enabled %}
needs: [!{{ ciflow_config.root_job_name }}]
{%- endif %}
env:
JOB_BASE_NAME: !{{ build_environment }}-build
http_proxy: "!{{ common. squid_proxy }}"
https_proxy: "!{{ common.squid_proxy }}"
steps:
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Checkout PyTorch
uses: actions/checkout@v2
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
submodules: recursive
- name: Clean workspace (including things in .gitignore)
shell: bash
run: |
git clean -xdf
path: pytorch-${{ github.run_id }}
# deep clone, to allow use of git merge-base
fetch-depth: 0
!{{ common.display_ec2_information() }}
- name: Install Visual Studio 2019 toolchain
shell: powershell
run: |
@ -61,6 +112,8 @@ jobs:
{%- endif %}
- name: Build
shell: bash
env:
PYTORCH_FINAL_PACKAGE_DIR: /c/${{ github.run_id }}/build-results/
run: |
.jenkins/pytorch/win-build.sh
# Upload to github so that people can click and download artifacts
@ -73,31 +126,86 @@ jobs:
retention-days: 14
if-no-files-found: error
name: ${{ env.BUILD_ENVIRONMENT }}
path: C:\w\build-results
path: C:\${{ github.run_id }}\build-results
- name: Upload artifacts to s3
if: always()
uses: seemethere/upload-artifact-s3@9d7ceb0ab39c2c88d93ef7792b27425b27d59162
uses: !{{ common.upload_artifact_s3_action }}
with:
retention-days: 14
if-no-files-found: error
name: ${{ env.BUILD_ENVIRONMENT }}
path: C:\w\build-results
path: C:\${{ github.run_id }}\build-results
!{{ wait_and_kill_ssh() }}
- name: Cleanup build-results and workspaces
if: always()
shell: bash
env:
PYTORCH_FINAL_PACKAGE_DIR: /c/${{ github.run_id }}/build-results/
# Should remove the entirety of pytorch-${{ github.run_id }}
run: |
rm -rf "${PYTORCH_FINAL_PACKAGE_DIR}"
rm -rf ./*
generate-test-matrix:
{%- if ciflow_config.enabled %}
needs: [!{{ ciflow_config.root_job_name }}]
{%- endif %}
runs-on: ubuntu-18.04
env:
TEST_RUNNER_TYPE: !{{ test_runner_type }}
NUM_TEST_SHARDS: !{{ num_test_shards }}
NUM_TEST_SHARDS_ON_PULL_REQUEST: !{{ num_test_shards_on_pull_request }}
PR_BODY: ${{ github.event.pull_request.body }}
outputs:
matrix: ${{ steps.set-matrix.outputs.matrix }}
render-matrix: ${{ steps.set-matrix.outputs.render-matrix }}
ignore-disabled-issues: ${{ steps.set-matrix.outputs.ignore-disabled-issues }}
container:
image: python:3.9
steps:
- name: Install dependencies
run: pip install typing-extensions==3.10
- name: Clone pytorch/pytorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
- name: Generating test matrix
id: set-matrix
run: .github/scripts/generate_pytorch_test_matrix.py
test:
runs-on: !{{ test_runner_type }}
{%- if only_build_on_pull_request %}
if: ${{ github.event_name == 'push' }}
{%- endif %}
env:
JOB_BASE_NAME: !{{ build_environment }}-test
needs:
- build
SHARD_NUMBER: ${{ matrix.shard }}
NUM_TEST_SHARDS: ${{ matrix.num_shards }}
TEST_CONFIG: ${{ matrix.config }}
http_proxy: "!{{ common.squid_proxy }}"
https_proxy: "!{{ common.squid_proxy }}"
RUN_SMOKE_TESTS_ONLY_ON_PR: !{{ only_run_smoke_tests_on_pull_request }}
PYTORCH_IGNORE_DISABLED_ISSUES: ${{ needs.generate-test-matrix.outputs.ignore-disabled-issues }}
CONTINUE_THROUGH_ERROR: ${{ github.repository == 'pytorch/pytorch' && (github.event_name == 'push' || github.event_name == 'schedule') }}
needs: [build, generate-test-matrix, !{{ ciflow_config.root_job_name }}]
strategy:
matrix: ${{ fromJson(needs.generate-test-matrix.outputs.matrix) }}
fail-fast: false
runs-on: ${{ matrix.runner }}
defaults:
run:
working-directory: pytorch-${{ github.run_id }}
steps:
- name: Checkout PyTorch
uses: actions/checkout@v2
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
submodules: recursive
- name: Clean workspace (including things in .gitignore)
shell: bash
run: |
git clean -xdf
path: pytorch-${{ github.run_id }}
# deep clone, to allow use of git merge-base
fetch-depth: 0
!{{ common.display_ec2_information() }}
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Install Visual Studio 2019 toolchain
shell: powershell
run: |
@ -126,71 +234,26 @@ jobs:
name: Setup Python3
with:
python-version: '3.x'
- name: Run test scripts
- name: Test
shell: bash
env:
PYTORCH_FINAL_PACKAGE_DIR: /c/${{ github.run_id }}/build-results/
run: |
if [[ $NUM_TEST_SHARDS -ne 2 ]]; then
export SHARD_NUMBER=0
fi
if [[ -n $GITHUB_HEAD_REF && "$RUN_SMOKE_TESTS_ONLY_ON_PR" == "true" ]]; then
export RUN_SMOKE_TESTS_ONLY=1
fi
.jenkins/pytorch/win-test.sh
- uses: actions/upload-artifact@v2
name: Store PyTorch Test Reports
!{{ common.upload_test_reports(name='windows') }}
!{{ common.render_test_results() }}
!{{ wait_and_kill_ssh() }}
!{{ common.parse_ref() }}
!{{ common.upload_test_statistics(build_environment) }}
- name: Cleanup workspace
if: always()
with:
name: test-reports
retention-days: 14
if-no-files-found: error
path:
test/**/*.xml
# this is a separate step from test because the log files from test are too
# long: GitHub tries to render all of the log files when you click through
# an action, causing extreme slowdown on actions that contain too many logs
# (like test); we could always fold this back into the test job, but that
# wouldn't create the best experience
render_test_results:
if: always()
needs:
- test
runs-on: ubuntu-18.04
# TODO: Make this into a composite step
steps:
- name: Checkout PyTorch
uses: actions/checkout@v2
with:
# deep clone, to allow tools/print_test_stats.py to use Git commands
fetch-depth: 0
- uses: actions/download-artifact@v2
name: Download PyTorch Test Reports
with:
name: test-reports
path: test/test-reports
- uses: actions/setup-python@v2
with:
python-version: 3.9
- name: Install dependencies
# boto3 version copied from .circleci/docker/common/install_conda.sh
shell: bash
# Should remove the entirety of pytorch-${{ github.run_id }}
run: |
pip install -r requirements.txt
pip install boto3==1.16.34 junitparser rich
- name: Output Test Results (Click Me)
run: |
python tools/render_junit.py test
- name: Parse ref
id: parse-ref
run: .github/scripts/parse_ref.py
- name: Display and upload test statistics (Click Me)
# temporary hack: set CIRCLE_* vars, until we update
# tools/print_test_stats.py to natively support GitHub Actions
env:
SCRIBE_GRAPHQL_ACCESS_TOKEN: ${{ secrets.SCRIBE_GRAPHQL_ACCESS_TOKEN }}
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_OSSCI_METRICS_ACCESS_KEY_ID }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_OSSCI_METRICS_SECRET_ACCESS_KEY }}
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
CIRCLE_JOB: !{{ build_environment }}
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
CIRCLE_TAG: ${{ steps.parse-ref.outputs.tag }}
CIRCLE_WORKFLOW_ID: ${{ github.run_id }} # dunno if this corresponds
run: |
export PYTHONPATH=$PWD
python tools/print_test_stats.py --upload-to-s3 --compare-with-s3 test
rm -rf ./*
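
The Windows test step above gates sharding and smoke tests with plain environment checks: anything other than the expected two shards falls back to the unsharded mode (shard 0), and on pull requests (where `GITHUB_HEAD_REF` is non-empty) the run can be restricted to smoke tests. A standalone sketch of that gating, with placeholder values for the variables the matrix would normally supply:

```
#!/usr/bin/env bash
set -eu

NUM_TEST_SHARDS=1                  # placeholder; comes from the test matrix
GITHUB_HEAD_REF="my-feature"       # non-empty only on pull requests
RUN_SMOKE_TESTS_ONLY_ON_PR=true    # placeholder workflow setting

# With anything other than the expected two shards, fall back to
# the unsharded mode the test script understands (shard 0).
if [[ $NUM_TEST_SHARDS -ne 2 ]]; then
  export SHARD_NUMBER=0
fi

# On PRs (HEAD ref set), optionally restrict the run to smoke tests.
if [[ -n $GITHUB_HEAD_REF && "$RUN_SMOKE_TESTS_ONLY_ON_PR" == "true" ]]; then
  export RUN_SMOKE_TESTS_ONLY=1
fi

echo "SHARD_NUMBER=${SHARD_NUMBER:-unset} RUN_SMOKE_TESTS_ONLY=${RUN_SMOKE_TESTS_ONLY:-0}"
```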


@ -1,66 +0,0 @@
name: Add annotations
on:
workflow_run:
types:
- completed
workflows:
- Lint
jobs:
annotate:
strategy:
fail-fast: false
matrix:
name:
- flake8-py3
- clang-tidy
runs-on: ubuntu-18.04
steps:
- name: Download artifact
uses: actions/github-script@v3
env:
RUN_ID: ${{ github.event.workflow_run.id }}
LINT_NAME: ${{ matrix.name }}
with:
# https://securitylab.github.com/research/github-actions-preventing-pwn-requests/
script: |
const artifacts = await github.actions.listWorkflowRunArtifacts({
owner: context.repo.owner,
repo: context.repo.repo,
run_id: process.env.RUN_ID,
});
const filteredArtifacts = artifacts.data.artifacts.filter(artifact => {
return artifact.name == process.env.LINT_NAME;
});
if (filteredArtifacts.length > 0) {
const matchArtifact = filteredArtifacts[0];
const download = await github.actions.downloadArtifact({
owner: context.repo.owner,
repo: context.repo.repo,
artifact_id: matchArtifact.id,
archive_format: 'zip',
});
const fs = require('fs');
fs.writeFileSync(
`${process.env.GITHUB_WORKSPACE}/linter-output.zip`,
Buffer.from(download.data),
);
}
- name: Unzip artifact
id: unzip
run: |
if unzip linter-output.zip annotations.json commit-sha.txt; then
echo ::set-output \
name=sha::"$(grep -Em1 '^[[:xdigit:]]{40}$' commit-sha.txt)"
fi
- if: steps.unzip.outputs.sha
name: Add annotations
uses: pytorch/add-annotations-github-action@master
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
with:
check_name: ${{ matrix.name }}
linter_output_path: annotations.json
commit_sha: ${{ steps.unzip.outputs.sha }}
mode: json
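
Because `workflow_run` artifacts are produced by untrusted PR code (see the securitylab link above), the annotate job only trusts a commit SHA that is exactly 40 hex digits on its own line. A standalone sketch of that validation against a simulated `commit-sha.txt`:

```
#!/usr/bin/env bash
set -eu

# Simulate the artifact's commit-sha.txt with attacker-controllable content.
printf 'not-a-sha\n0123456789abcdef0123456789abcdef01234567\n' > commit-sha.txt

# -E: extended regex, -m1: first match only; the ^...$ anchors reject
# anything that is not exactly 40 hex digits on its own line.
sha=$(grep -Em1 '^[[:xdigit:]]{40}$' commit-sha.txt)
echo "validated sha: ${sha}"
```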


@ -6,8 +6,15 @@ on:
pull_request_target:
types: [edited, opened, synchronize, reopened]
concurrency:
group: auto-label-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }}
cancel-in-progress: true
jobs:
auto-label-rocm:
if: ${{ github.repository == 'pytorch/pytorch' }}
runs-on: ubuntu-18.04
steps:
- name: Retrieve information


@ -16,7 +16,7 @@ jobs:
image: python:3.9
steps:
- name: Clone pytorch/pytorch
uses: actions/checkout@v2
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
- name: Generating build matrix
id: set-matrix
run: |
@ -57,12 +57,12 @@ jobs:
- name: Clean runner workspace
run: rm -rf "$GITHUB_WORKSPACE"
- name: Clone pytorch/pytorch
uses: actions/checkout@v2
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
path: pytorch
submodules: recursive
- name: Clone pytorch/builder
uses: actions/checkout@v2
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
repository: pytorch/builder
path: builder
@ -91,23 +91,25 @@ jobs:
with:
name: pytorch-conda-py${{ matrix.python_version }}-${{matrix.gpu_arch_type}}-${{ matrix.gpu_arch_version }}
path: /remote/**/*.bz2
- name: Parse ref
id: parse-ref
run: .github/scripts/parse_ref.py
- name: Display and upload binary build size statistics (Click Me)
# temporary hack: set CIRCLE_* vars, until we update
# tools/print_test_stats.py to natively support GitHub Actions
# tools/stats/print_test_stats.py to natively support GitHub Actions
env:
SCRIBE_GRAPHQL_ACCESS_TOKEN: ${{ secrets.SCRIBE_GRAPHQL_ACCESS_TOKEN }}
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
CIRCLE_TAG: ${{ steps.parse-ref.outputs.tag }}
CIRCLE_WORKFLOW_ID: ${{ github.run_id }} # dunno if this corresponds
CIRCLE_WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
run: |
export PYTHONPATH=$PWD
COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0)
export COMMIT_TIME
pip3 install requests
python3 .circleci/scripts/upload_binary_size_to_scuba.py || exit 0
pip3 install requests==2.26
python3 -m tools.stats.upload_binary_size_to_scuba || exit 0
concurrency:
group: build-linux-conda-${{ github.event.pull_request.number || github.sha }}
group: build-linux-conda-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }}
cancel-in-progress: true
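
The binary-size step follows a pattern repeated across these workflows: put the repo root on `PYTHONPATH`, capture the commit timestamp with a `|| echo 0` fallback, pin the one dependency, and let the upload fail soft with `|| exit 0` so stats never break a build. A sketch of that skeleton, assuming it is run from a PyTorch checkout (the module path is the one from the diff above):

```
#!/usr/bin/env bash
set -u

# Make first-party tooling importable as modules from the repo root.
export PYTHONPATH=$PWD

# Commit timestamp in epoch seconds, falling back to 0 outside a repo.
COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0)
export COMMIT_TIME

# Pin the dependency; never let a stats upload fail the job.
pip3 install requests==2.26
python3 -m tools.stats.upload_binary_size_to_scuba || exit 0
```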


@ -16,7 +16,7 @@ jobs:
image: python:3.9
steps:
- name: Clone pytorch/pytorch
uses: actions/checkout@v2
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
- name: Generating build matrix
id: set-matrix
run: |
@ -51,12 +51,12 @@ jobs:
- name: Clean runner workspace
run: rm -rf "$GITHUB_WORKSPACE"
- name: Clone pytorch/pytorch
uses: actions/checkout@v2
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
path: pytorch
submodules: recursive
- name: Clone pytorch/builder
uses: actions/checkout@v2
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
repository: pytorch/builder
path: builder
@ -90,23 +90,25 @@ jobs:
with:
name: pytorch-libtorch-${{ matrix.libtorch_variant }}-${{ matrix.devtoolset }}-${{matrix.gpu_arch_type}}-${{ matrix.gpu_arch_version }}
path: /remote/**/*.zip
- name: Parse ref
id: parse-ref
run: .github/scripts/parse_ref.py
- name: Display and upload binary build size statistics (Click Me)
# temporary hack: set CIRCLE_* vars, until we update
# tools/print_test_stats.py to natively support GitHub Actions
# tools/stats/print_test_stats.py to natively support GitHub Actions
env:
SCRIBE_GRAPHQL_ACCESS_TOKEN: ${{ secrets.SCRIBE_GRAPHQL_ACCESS_TOKEN }}
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
CIRCLE_TAG: ${{ steps.parse-ref.outputs.tag }}
CIRCLE_WORKFLOW_ID: ${{ github.run_id }} # dunno if this corresponds
CIRCLE_WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
run: |
export PYTHONPATH=$PWD
COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0)
export COMMIT_TIME
pip3 install requests
python3 .circleci/scripts/upload_binary_size_to_scuba.py || exit 0
pip3 install requests==2.26
python3 -m tools.stats.upload_binary_size_to_scuba || exit 0
concurrency:
group: build-linux-libtorch-${{ github.event.pull_request.number || github.sha }}
group: build-linux-libtorch-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }}
cancel-in-progress: true


@ -16,7 +16,7 @@ jobs:
image: python:3.9
steps:
- name: Clone pytorch/pytorch
uses: actions/checkout@v2
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
- name: Generating build matrix
id: set-matrix
run: |
@ -46,12 +46,12 @@ jobs:
- name: Clean runner workspace
run: rm -rf "$GITHUB_WORKSPACE"
- name: Clone pytorch/pytorch
uses: actions/checkout@v2
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
path: pytorch
submodules: recursive
- name: Clone pytorch/builder
uses: actions/checkout@v2
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
repository: pytorch/builder
path: builder
@ -89,23 +89,25 @@ jobs:
with:
name: pytorch-wheel-py${{ matrix.python_version }}-${{matrix.gpu_arch_type}}-${{ matrix.gpu_arch_version }}
path: /remote/**/*.whl
- name: Parse ref
id: parse-ref
run: .github/scripts/parse_ref.py
- name: Display and upload binary build size statistics (Click Me)
# temporary hack: set CIRCLE_* vars, until we update
# tools/print_test_stats.py to natively support GitHub Actions
# tools/stats/print_test_stats.py to natively support GitHub Actions
env:
SCRIBE_GRAPHQL_ACCESS_TOKEN: ${{ secrets.SCRIBE_GRAPHQL_ACCESS_TOKEN }}
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
CIRCLE_TAG: ${{ steps.parse-ref.outputs.tag }}
CIRCLE_WORKFLOW_ID: ${{ github.run_id }} # dunno if this corresponds
CIRCLE_WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
run: |
export PYTHONPATH=$PWD
COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0)
export COMMIT_TIME
pip3 install requests
python3 .circleci/scripts/upload_binary_size_to_scuba.py || exit 0
pip3 install requests==2.26
python3 -m tools.stats.upload_binary_size_to_scuba || exit 0
concurrency:
group: build-linux-wheels-${{ github.event.pull_request.number || github.sha }}
group: build-linux-wheels-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }}
cancel-in-progress: true


@ -1,48 +0,0 @@
name: clang-format
on:
pull_request:
jobs:
clang-format:
runs-on: ubuntu-18.04
steps:
- name: Setup Python
uses: actions/setup-python@v2
with:
python-version: 3.x
architecture: x64
- name: Fetch PyTorch
uses: actions/checkout@v2
with:
fetch-depth: 0 # deep clone, to allow us to use git merge-base
- name: Run clang-format
env:
BASE_SHA: ${{ github.event.pull_request.base.sha }}
run: |
set -eu
# This is necessary to get the same results regardless of whether the
# PR was opened directly or from a forked repo. See: `9f890a92` for more info.
git remote add upstream https://github.com/pytorch/pytorch
git fetch upstream "$GITHUB_BASE_REF"
# only run clang-format on allowlisted files
echo "~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~"
echo "| clang-format failures found! Run: "
echo "| tools/clang_format_ci.sh ${BASE_SHA} "
echo "| to fix this error. "
echo "| For more info, see: https://github.com/pytorch/pytorch/wiki/clang-format "
echo "~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~"
tools/clang_format_ci.sh "${BASE_SHA}"
GIT_DIFF=$(git diff)
if [[ -z $GIT_DIFF ]]; then
exit 0
fi
echo "$GIT_DIFF"
exit 1
concurrency:
group: clang-format-${{ github.event.pull_request.number || github.sha }}
cancel-in-progress: true
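
The deleted clang-format workflow used a standard formatter-in-CI pattern: run the formatter in place, then fail the job exactly when `git diff` is non-empty, echoing the diff so the author sees what to fix. A minimal sketch, assuming a git checkout and using a placeholder formatter invocation in place of `tools/clang_format_ci.sh`:

```
#!/usr/bin/env bash
set -eu

# Placeholder formatter invocation; tools/clang_format_ci.sh "$BASE_SHA"
# played this role in the deleted workflow.
clang-format -i ./*.cpp 2>/dev/null || true

# Fail exactly when the formatter changed something, and show the diff.
GIT_DIFF=$(git diff)
if [[ -z $GIT_DIFF ]]; then
  exit 0
fi
echo "$GIT_DIFF"
exit 1
```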

.github/workflows/create_release.yml (new file, 53 lines)

@ -0,0 +1,53 @@
name: Create Release
on:
push:
tags: ['v*']
branches: [master]
release:
types: [published]
pull_request:
paths: [.github/workflows/create_release.yml]
jobs:
release:
if: ${{ github.repository == 'pytorch/pytorch' }}
name: Create Release
runs-on: ubuntu-latest
steps:
- uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
submodules: 'recursive'
- name: Fake name for PRs
if: ${{ github.event_name == 'pull_request' }}
run: echo "PT_GITHUB_REF=refs/tags/pr-tag" >> "$GITHUB_ENV"
- name: Real name for non-PRs
if: ${{ github.event_name != 'pull_request' }}
run: echo "PT_GITHUB_REF=$GITHUB_REF" >> "$GITHUB_ENV"
- name: Set filenames
run: |
tag_or_branch="${PT_GITHUB_REF#refs/tags/}"
tag_or_branch="${tag_or_branch#refs/heads/}"
echo "PT_RELEASE_NAME=pytorch-$tag_or_branch" >> "$GITHUB_ENV"
echo "PT_RELEASE_FILE=pytorch-$tag_or_branch.tar.gz" >> "$GITHUB_ENV"
- name: Create source distribution
run: |
# Create a new folder with the specified name so that extracting the archive yields that folder
rm -rf "/tmp/$PT_RELEASE_NAME"
cp -r "$PWD" "/tmp/$PT_RELEASE_NAME"
mv "/tmp/$PT_RELEASE_NAME" .
# Cleanup
rm -r "$PT_RELEASE_NAME"/{.azure_pipelines,.circleci,.jenkins}
find "$PT_RELEASE_NAME" -name '.git*' -exec rm -rv {} \; || true
# Create archive
tar -czf "$PT_RELEASE_FILE" "$PT_RELEASE_NAME"
echo "Created source archive $PT_RELEASE_FILE with content: $(ls -a "$PT_RELEASE_NAME")"
- name: Upload source distribution
if: ${{ github.event_name == 'release' }}
uses: softprops/action-gh-release@v1
with:
files: ${{env.PT_RELEASE_FILE}}
concurrency:
group: create-release-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }}
cancel-in-progress: true
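
The release job above maps a ref such as `refs/tags/v1.10.0` or `refs/heads/master` to artifact names with `${var#prefix}` stripping, which is a harmless no-op whenever the prefix is absent. A standalone sketch over both ref shapes:

```
#!/usr/bin/env bash
set -eu

for PT_GITHUB_REF in refs/tags/v1.10.0 refs/heads/master; do
  tag_or_branch="${PT_GITHUB_REF#refs/tags/}"     # strips the tag prefix if present
  tag_or_branch="${tag_or_branch#refs/heads/}"    # strips the branch prefix if present
  echo "PT_RELEASE_NAME=pytorch-$tag_or_branch"
  echo "PT_RELEASE_FILE=pytorch-$tag_or_branch.tar.gz"
done
```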


@ -0,0 +1,283 @@
# @generated DO NOT EDIT MANUALLY
# Template is at: .github/templates/linux_ci_workflow.yml.j2
# Generation script: .github/scripts/generate_ci_workflows.py
name: libtorch-linux-xenial-cuda10.2-py3.6-gcc7
on:
pull_request:
types: [unassigned]
push:
branches:
- master
- release/*
workflow_dispatch:
env:
BUILD_ENVIRONMENT: libtorch-linux-xenial-cuda10.2-py3.6-gcc7
DOCKER_IMAGE_BASE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7
SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2
XLA_CLANG_CACHE_S3_BUCKET_NAME: ossci-compiler-clang-cache-circleci-xla
TORCH_CUDA_ARCH_LIST: 5.2
IN_CI: 1
# Temporary flag for the phase of adding wheel tests; it will be removed once that work is completed
IN_WHEEL_TEST: 1
# Used for custom_operator, jit_hooks, custom_backend, see .jenkins/pytorch/build.sh
CUSTOM_TEST_ARTIFACT_BUILD_DIR: build/custom_test_artifacts
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
concurrency:
group: libtorch-linux-xenial-cuda10.2-py3.6-gcc7-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }}
cancel-in-progress: true
jobs:
ciflow_should_run:
runs-on: ubuntu-18.04
if: ${{ (github.repository == 'pytorch/pytorch') && ((github.event_name != 'pull_request') || (github.event.assignee.login != 'pytorchbot' ) || (github.event.action !='unassigned') || (contains(github.event.pull_request.labels.*.name, 'ciflow/all') || contains(github.event.pull_request.labels.*.name, 'ciflow/cuda') || contains(github.event.pull_request.labels.*.name, 'ciflow/libtorch') || contains(github.event.pull_request.labels.*.name, 'ciflow/linux'))) }}
env:
LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
steps:
- name: noop
run: echo running ciflow_should_run
- name: print labels
run: echo "${LABELS}"
calculate-docker-image:
runs-on: linux.2xlarge
needs: [ciflow_should_run]
env:
DOCKER_BUILDKIT: 1
timeout-minutes: 90
outputs:
docker_image: ${{ steps.calculate-tag.outputs.docker_image }}
steps:
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Log in to ECR
env:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
# Ensure the working directory gets chowned back to the current user
retry docker pull "${ALPINE_IMAGE}"
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
rm -rf "${GITHUB_WORKSPACE:?}/*"
rm -f ~/.ssh/authorized_keys
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: false
- name: Calculate docker image tag
id: calculate-tag
run: |
DOCKER_TAG=$(git rev-parse HEAD:.circleci/docker)
echo "::set-output name=docker_tag::${DOCKER_TAG}"
echo "::set-output name=docker_image::${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"
- name: Check if image should be built
id: check
env:
DOCKER_TAG: ${{ steps.calculate-tag.outputs.docker_tag }}
BASE_REVISION: ${{ github.event.pull_request.base.sha || github.sha }}
run: |
set -x
# Check if image already exists, if it does then skip building it
if docker manifest inspect "${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"; then
exit 0
fi
if [[ "$BASE_REVISION" = "$(git rev-parse HEAD)" ]]; then
# if we're on the base branch then use the parent commit
MERGE_BASE=$(git rev-parse HEAD~)
else
# otherwise we're on a PR, so use the most recent base commit
MERGE_BASE=$(git merge-base HEAD "$BASE_REVISION")
fi
# Covers the case where a previous tag doesn't exist for the tree
# this is only really applicable on trees that don't have `.circleci/docker` at their merge base, i.e. nightly
if ! git rev-parse "$MERGE_BASE:.circleci/docker"; then
echo "Directory '.circleci/docker' not found in commit $MERGE_BASE, you should probably rebase onto a more recent commit"
exit 1
fi
PREVIOUS_DOCKER_TAG=$(git rev-parse "$MERGE_BASE:.circleci/docker")
# If no image exists but the hash is the same as the previous hash then we should error out here
if [[ "${PREVIOUS_DOCKER_TAG}" = "${DOCKER_TAG}" ]]; then
echo "ERROR: Something has gone wrong and the previous image isn't available for the merge-base of your branch"
echo " contact the PyTorch team to restore the original images"
exit 1
fi
echo ::set-output name=rebuild::yes
- name: Build and push docker image
if: ${{ steps.check.outputs.rebuild }}
env:
DOCKER_TAG: ${{ steps.calculate-tag.outputs.docker_tag }}
DOCKER_SKIP_S3_UPLOAD: 1
run: |
export IMAGE_NAME=${DOCKER_IMAGE_BASE#308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/}
cd .circleci/docker && ./build_docker.sh
build:
runs-on: linux.2xlarge
needs: [calculate-docker-image, ciflow_should_run]
env:
DOCKER_IMAGE: ${{ needs.calculate-docker-image.outputs.docker_image }}
JOB_BASE_NAME: libtorch-linux-xenial-cuda10.2-py3.6-gcc7-build
steps:
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Log in to ECR
env:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
# Ensure the working directory gets chowned back to the current user
retry docker pull "${ALPINE_IMAGE}"
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
rm -rf "${GITHUB_WORKSPACE:?}/*"
rm -f ~/.ssh/authorized_keys
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: recursive
- name: Pull docker image
run: |
docker pull "${DOCKER_IMAGE}"
- name: Build
run: |
# detached container should get cleaned up by teardown_ec2_linux
container_name=$(docker run \
-e BUILD_ENVIRONMENT \
-e JOB_BASE_NAME \
-e MAX_JOBS="$(nproc --ignore=2)" \
-e SCCACHE_BUCKET \
-e XLA_CLANG_CACHE_S3_BUCKET_NAME \
-e CUSTOM_TEST_ARTIFACT_BUILD_DIR \
-e SKIP_SCCACHE_INITIALIZATION=1 \
-e TORCH_CUDA_ARCH_LIST \
-e PR_LABELS \
-e http_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e https_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e no_proxy="localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" \
--env-file="/tmp/github_env_${GITHUB_RUN_ID}" \
--security-opt seccomp=unconfined \
--cap-add=SYS_PTRACE \
--tty \
--detach \
--user jenkins \
-v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \
-w /var/lib/jenkins/workspace \
"${DOCKER_IMAGE}"
)
docker exec -t "${container_name}" sh -c 'sudo chown -R jenkins . && .jenkins/pytorch/build.sh'
- name: Parse ref
id: parse-ref
run: .github/scripts/parse_ref.py
- name: Display and upload binary build size statistics (Click Me)
# temporary hack: set CIRCLE_* vars, until we update
# tools/stats/print_test_stats.py to natively support GitHub Actions
env:
AWS_DEFAULT_REGION: us-east-1
IS_GHA: 1
SCRIBE_GRAPHQL_ACCESS_TOKEN: ${{ secrets.SCRIBE_GRAPHQL_ACCESS_TOKEN }}
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
CIRCLE_TAG: ${{ steps.parse-ref.outputs.tag }}
CIRCLE_WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
run: |
COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0)
export COMMIT_TIME
pip3 install requests==2.26 boto3==1.16.34
python3 -m tools.stats.upload_binary_size_to_scuba || exit 0
- name: Chown workspace
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Hold runner for 2 hours or until ssh sessions have drained
# Always hold for active ssh sessions
if: always()
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Chown workspace
if: always()
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Kill containers, clean up images
if: always()
run: |
# ignore expansion of "docker ps -q" since it could be empty
# shellcheck disable=SC2046
docker stop $(docker ps -q) || true
# Prune all of the docker images
docker system prune -af
- name: Hold runner for 2 hours or until ssh sessions have drained
# Always hold for active ssh sessions
if: always()
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Clean up docker images
if: always()
run: |
# Prune all of the docker images
docker system prune -af
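
The `calculate-docker-image` job above keys the image tag on the git *tree* hash of `.circleci/docker`, so the tag changes only when that directory's contents change; comparing against the merge-base tag then distinguishes "needs rebuild" from "the image should already exist". A hedged sketch of the derivation, assuming a clone with that directory and an `origin/master` ref (the registry name is a placeholder):

```
#!/usr/bin/env bash
set -euo pipefail

DOCKER_IMAGE_BASE="example.registry/pytorch-linux"   # placeholder registry

# Tree hash of the directory at HEAD: stable across unrelated commits.
DOCKER_TAG=$(git rev-parse "HEAD:.circleci/docker")
echo "image would be: ${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"

# The same hash at the merge base means no rebuild should be needed --
# if the image is missing anyway, something has gone wrong upstream.
MERGE_BASE=$(git merge-base HEAD origin/master)
PREVIOUS_DOCKER_TAG=$(git rev-parse "${MERGE_BASE}:.circleci/docker")
if [[ "${PREVIOUS_DOCKER_TAG}" = "${DOCKER_TAG}" ]]; then
  echo "tag unchanged since merge-base; expect the image to already exist"
fi
```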


@ -0,0 +1,283 @@
# @generated DO NOT EDIT MANUALLY
# Template is at: .github/templates/linux_ci_workflow.yml.j2
# Generation script: .github/scripts/generate_ci_workflows.py
name: libtorch-linux-xenial-cuda11.3-py3.6-gcc7
on:
pull_request:
types: [unassigned]
push:
branches:
- master
- release/*
workflow_dispatch:
env:
BUILD_ENVIRONMENT: libtorch-linux-xenial-cuda11.3-py3.6-gcc7
DOCKER_IMAGE_BASE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7
SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2
XLA_CLANG_CACHE_S3_BUCKET_NAME: ossci-compiler-clang-cache-circleci-xla
TORCH_CUDA_ARCH_LIST: 5.2
IN_CI: 1
# Temporary flag for the phase of adding wheel tests; it will be removed once that work is completed
IN_WHEEL_TEST: 1
# Used for custom_operator, jit_hooks, custom_backend, see .jenkins/pytorch/build.sh
CUSTOM_TEST_ARTIFACT_BUILD_DIR: build/custom_test_artifacts
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
concurrency:
group: libtorch-linux-xenial-cuda11.3-py3.6-gcc7-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }}
cancel-in-progress: true
jobs:
ciflow_should_run:
runs-on: ubuntu-18.04
if: ${{ (github.repository == 'pytorch/pytorch') && ((github.event_name != 'pull_request') || (github.event.assignee.login != 'pytorchbot' ) || (github.event.action !='unassigned') || (contains(github.event.pull_request.labels.*.name, 'ciflow/all') || contains(github.event.pull_request.labels.*.name, 'ciflow/cuda') || contains(github.event.pull_request.labels.*.name, 'ciflow/libtorch') || contains(github.event.pull_request.labels.*.name, 'ciflow/linux'))) }}
env:
LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
steps:
- name: noop
run: echo running ciflow_should_run
- name: print labels
run: echo "${LABELS}"
calculate-docker-image:
runs-on: linux.2xlarge
needs: [ciflow_should_run]
env:
DOCKER_BUILDKIT: 1
timeout-minutes: 90
outputs:
docker_image: ${{ steps.calculate-tag.outputs.docker_image }}
steps:
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Log in to ECR
env:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
# Ensure the working directory gets chowned back to the current user
retry docker pull "${ALPINE_IMAGE}"
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
rm -rf "${GITHUB_WORKSPACE:?}/*"
rm -f ~/.ssh/authorized_keys
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: false
- name: Calculate docker image tag
id: calculate-tag
run: |
DOCKER_TAG=$(git rev-parse HEAD:.circleci/docker)
echo "::set-output name=docker_tag::${DOCKER_TAG}"
echo "::set-output name=docker_image::${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"
- name: Check if image should be built
id: check
env:
DOCKER_TAG: ${{ steps.calculate-tag.outputs.docker_tag }}
BASE_REVISION: ${{ github.event.pull_request.base.sha || github.sha }}
run: |
set -x
# Check if image already exists, if it does then skip building it
if docker manifest inspect "${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"; then
exit 0
fi
if [[ "$BASE_REVISION" = "$(git rev-parse HEAD)" ]]; then
# if we're on the base branch then use the parent commit
MERGE_BASE=$(git rev-parse HEAD~)
else
# otherwise we're on a PR, so use the most recent base commit
MERGE_BASE=$(git merge-base HEAD "$BASE_REVISION")
fi
# Covers the case where a previous tag doesn't exist for the tree
# this is only really applicable on trees that don't have `.circleci/docker` at their merge base, i.e. nightly
if ! git rev-parse "$MERGE_BASE:.circleci/docker"; then
echo "Directory '.circleci/docker' not found in commit $MERGE_BASE, you should probably rebase onto a more recent commit"
exit 1
fi
PREVIOUS_DOCKER_TAG=$(git rev-parse "$MERGE_BASE:.circleci/docker")
# If no image exists but the hash is the same as the previous hash then we should error out here
if [[ "${PREVIOUS_DOCKER_TAG}" = "${DOCKER_TAG}" ]]; then
echo "ERROR: Something has gone wrong and the previous image isn't available for the merge-base of your branch"
echo " contact the PyTorch team to restore the original images"
exit 1
fi
echo ::set-output name=rebuild::yes
- name: Build and push docker image
if: ${{ steps.check.outputs.rebuild }}
env:
DOCKER_TAG: ${{ steps.calculate-tag.outputs.docker_tag }}
DOCKER_SKIP_S3_UPLOAD: 1
run: |
export IMAGE_NAME=${DOCKER_IMAGE_BASE#308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/}
cd .circleci/docker && ./build_docker.sh
build:
runs-on: linux.2xlarge
needs: [calculate-docker-image, ciflow_should_run]
env:
DOCKER_IMAGE: ${{ needs.calculate-docker-image.outputs.docker_image }}
JOB_BASE_NAME: libtorch-linux-xenial-cuda11.3-py3.6-gcc7-build
steps:
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Log in to ECR
env:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
# Ensure the working directory gets chowned back to the current user
retry docker pull "${ALPINE_IMAGE}"
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
rm -rf "${GITHUB_WORKSPACE:?}/*"
rm -f ~/.ssh/authorized_keys
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: recursive
- name: Pull docker image
run: |
docker pull "${DOCKER_IMAGE}"
- name: Build
run: |
# detached container should get cleaned up by teardown_ec2_linux
container_name=$(docker run \
-e BUILD_ENVIRONMENT \
-e JOB_BASE_NAME \
-e MAX_JOBS="$(nproc --ignore=2)" \
-e SCCACHE_BUCKET \
-e XLA_CLANG_CACHE_S3_BUCKET_NAME \
-e CUSTOM_TEST_ARTIFACT_BUILD_DIR \
-e SKIP_SCCACHE_INITIALIZATION=1 \
-e TORCH_CUDA_ARCH_LIST \
-e PR_LABELS \
-e http_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e https_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e no_proxy="localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" \
--env-file="/tmp/github_env_${GITHUB_RUN_ID}" \
--security-opt seccomp=unconfined \
--cap-add=SYS_PTRACE \
--tty \
--detach \
--user jenkins \
-v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \
-w /var/lib/jenkins/workspace \
"${DOCKER_IMAGE}"
)
docker exec -t "${container_name}" sh -c 'sudo chown -R jenkins . && .jenkins/pytorch/build.sh'
- name: Parse ref
id: parse-ref
run: .github/scripts/parse_ref.py
- name: Display and upload binary build size statistics (Click Me)
# temporary hack: set CIRCLE_* vars, until we update
# tools/stats/print_test_stats.py to natively support GitHub Actions
env:
AWS_DEFAULT_REGION: us-east-1
IS_GHA: 1
SCRIBE_GRAPHQL_ACCESS_TOKEN: ${{ secrets.SCRIBE_GRAPHQL_ACCESS_TOKEN }}
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
CIRCLE_TAG: ${{ steps.parse-ref.outputs.tag }}
CIRCLE_WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
run: |
COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0)
export COMMIT_TIME
pip3 install requests==2.26 boto3==1.16.34
python3 -m tools.stats.upload_binary_size_to_scuba || exit 0
- name: Chown workspace
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Hold runner for 2 hours or until ssh sessions have drained
# Always hold for active ssh sessions
if: always()
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Chown workspace
if: always()
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Kill containers, clean up images
if: always()
run: |
# ignore expansion of "docker ps -q" since it could be empty
# shellcheck disable=SC2046
docker stop $(docker ps -q) || true
# Prune all of the docker images
docker system prune -af
- name: Hold runner for 2 hours or until ssh sessions have drained
# Always hold for active ssh sessions
if: always()
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Clean up docker images
if: always()
run: |
# Prune all of the docker images
docker system prune -af
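
Several steps above wrap flaky network calls such as `docker pull` in a tiny inline retry helper: try once, then retry after 1s and again after 2s, letting the last attempt's exit status stand. The same helper extracted as a standalone sketch, with a placeholder command in place of the real pull:

```
#!/usr/bin/env bash

# Try once, then after 1s, then after 2s; the last attempt's exit
# status is what the caller sees.
retry () {
  "$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}

# Placeholder flaky command; the workflows use: retry docker pull "${ALPINE_IMAGE}"
retry curl -fsSL -o /dev/null https://example.com
```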


@ -0,0 +1,562 @@
# @generated DO NOT EDIT MANUALLY
# Template is at: .github/templates/linux_ci_workflow.yml.j2
# Generation script: .github/scripts/generate_ci_workflows.py
name: linux-bionic-cuda10.2-py3.9-gcc7
on:
pull_request:
types: [unassigned]
push:
branches:
- master
- release/*
workflow_dispatch:
env:
BUILD_ENVIRONMENT: linux-bionic-cuda10.2-py3.9-gcc7
DOCKER_IMAGE_BASE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-bionic-cuda10.2-cudnn7-py3.9-gcc7
SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2
XLA_CLANG_CACHE_S3_BUCKET_NAME: ossci-compiler-clang-cache-circleci-xla
TORCH_CUDA_ARCH_LIST: 5.2
IN_CI: 1
# Temporary flag for the phase of adding wheel tests; it will be removed once that work is completed
IN_WHEEL_TEST: 1
# Used for custom_operator, jit_hooks, custom_backend, see .jenkins/pytorch/build.sh
CUSTOM_TEST_ARTIFACT_BUILD_DIR: build/custom_test_artifacts
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
concurrency:
group: linux-bionic-cuda10.2-py3.9-gcc7-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }}
cancel-in-progress: true
jobs:
ciflow_should_run:
runs-on: ubuntu-18.04
if: ${{ (github.repository_owner == 'pytorch') && ((github.event_name != 'pull_request') || (github.event.assignee.login != 'pytorchbot' ) || (github.event.action !='unassigned') || (contains(github.event.pull_request.labels.*.name, 'ciflow/all') || contains(github.event.pull_request.labels.*.name, 'ciflow/cuda') || contains(github.event.pull_request.labels.*.name, 'ciflow/linux') || contains(github.event.pull_request.labels.*.name, 'ciflow/slow'))) }}
env:
LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
steps:
- name: noop
run: echo running ciflow_should_run
- name: print labels
run: echo "${LABELS}"
calculate-docker-image:
runs-on: linux.2xlarge
needs: [ciflow_should_run]
env:
DOCKER_BUILDKIT: 1
timeout-minutes: 90
outputs:
docker_image: ${{ steps.calculate-tag.outputs.docker_image }}
steps:
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Log in to ECR
env:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
# Ensure the working directory gets chowned back to the current user
retry docker pull "${ALPINE_IMAGE}"
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
rm -rf "${GITHUB_WORKSPACE:?}/*"
rm -f ~/.ssh/authorized_keys
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: false
- name: Calculate docker image tag
id: calculate-tag
run: |
DOCKER_TAG=$(git rev-parse HEAD:.circleci/docker)
echo "::set-output name=docker_tag::${DOCKER_TAG}"
echo "::set-output name=docker_image::${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"
- name: Check if image should be built
id: check
env:
DOCKER_TAG: ${{ steps.calculate-tag.outputs.docker_tag }}
BASE_REVISION: ${{ github.event.pull_request.base.sha || github.sha }}
run: |
set -x
# Check if image already exists, if it does then skip building it
if docker manifest inspect "${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"; then
exit 0
fi
if [[ "$BASE_REVISION" = "$(git rev-parse HEAD)" ]]; then
# if we're on the base branch then use the parent commit
MERGE_BASE=$(git rev-parse HEAD~)
else
# otherwise we're on a PR, so use the most recent base commit
MERGE_BASE=$(git merge-base HEAD "$BASE_REVISION")
fi
# Covers the case where a previous tag doesn't exist for the tree
# this is only really applicable on trees that don't have `.circleci/docker` at their merge base, i.e. nightly
if ! git rev-parse "$MERGE_BASE:.circleci/docker"; then
echo "Directory '.circleci/docker' not found in commit $MERGE_BASE, you should probably rebase onto a more recent commit"
exit 1
fi
PREVIOUS_DOCKER_TAG=$(git rev-parse "$MERGE_BASE:.circleci/docker")
# If no image exists but the hash is the same as the previous hash then we should error out here
if [[ "${PREVIOUS_DOCKER_TAG}" = "${DOCKER_TAG}" ]]; then
echo "ERROR: Something has gone wrong and the previous image isn't available for the merge-base of your branch"
echo " contact the PyTorch team to restore the original images"
exit 1
fi
echo ::set-output name=rebuild::yes
- name: Build and push docker image
if: ${{ steps.check.outputs.rebuild }}
env:
DOCKER_TAG: ${{ steps.calculate-tag.outputs.docker_tag }}
DOCKER_SKIP_S3_UPLOAD: 1
run: |
export IMAGE_NAME=${DOCKER_IMAGE_BASE#308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/}
cd .circleci/docker && ./build_docker.sh
build:
runs-on: linux.2xlarge
needs: [calculate-docker-image, ciflow_should_run]
env:
DOCKER_IMAGE: ${{ needs.calculate-docker-image.outputs.docker_image }}
JOB_BASE_NAME: linux-bionic-cuda10.2-py3.9-gcc7-build
steps:
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Log in to ECR
env:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
# Ensure the working directory gets chowned back to the current user
retry docker pull "${ALPINE_IMAGE}"
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
rm -rf "${GITHUB_WORKSPACE:?}/*"
rm -f ~/.ssh/authorized_keys
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: recursive
- name: Pull docker image
run: |
docker pull "${DOCKER_IMAGE}"
- name: Build
run: |
# detached container should get cleaned up by teardown_ec2_linux
container_name=$(docker run \
-e BUILD_ENVIRONMENT \
-e JOB_BASE_NAME \
-e MAX_JOBS="$(nproc --ignore=2)" \
-e SCCACHE_BUCKET \
-e XLA_CLANG_CACHE_S3_BUCKET_NAME \
-e CUSTOM_TEST_ARTIFACT_BUILD_DIR \
-e SKIP_SCCACHE_INITIALIZATION=1 \
-e TORCH_CUDA_ARCH_LIST \
-e PR_LABELS \
-e http_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e https_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e no_proxy="localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" \
--env-file="/tmp/github_env_${GITHUB_RUN_ID}" \
--security-opt seccomp=unconfined \
--cap-add=SYS_PTRACE \
--tty \
--detach \
--user jenkins \
-v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \
-w /var/lib/jenkins/workspace \
"${DOCKER_IMAGE}"
)
docker exec -t "${container_name}" sh -c 'sudo chown -R jenkins . && .jenkins/pytorch/build.sh'
- name: Parse ref
id: parse-ref
run: .github/scripts/parse_ref.py
- name: Display and upload binary build size statistics (Click Me)
# temporary hack: set CIRCLE_* vars, until we update
# tools/stats/print_test_stats.py to natively support GitHub Actions
env:
AWS_DEFAULT_REGION: us-east-1
IS_GHA: 1
SCRIBE_GRAPHQL_ACCESS_TOKEN: ${{ secrets.SCRIBE_GRAPHQL_ACCESS_TOKEN }}
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
CIRCLE_TAG: ${{ steps.parse-ref.outputs.tag }}
CIRCLE_WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
run: |
COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0)
export COMMIT_TIME
pip3 install requests==2.26 boto3==1.16.34
python3 -m tools.stats.upload_binary_size_to_scuba || exit 0
- name: Chown workspace
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Archive artifacts into zip
run: |
zip -1 -r artifacts.zip dist/ build/custom_test_artifacts build/lib build/bin .pytorch-test-times.json
- uses: seemethere/upload-artifact-s3@v3
name: Store PyTorch Build Artifacts on S3
with:
name: ${{ env.BUILD_ENVIRONMENT }}
retention-days: 14
if-no-files-found: error
path:
artifacts.zip
- name: Hold runner for 2 hours or until ssh sessions have drained
# Always hold for active ssh sessions
if: always()
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Chown workspace
if: always()
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Kill containers, clean up images
if: always()
run: |
# ignore expansion of "docker ps -q" since it could be empty
# shellcheck disable=SC2046
docker stop $(docker ps -q) || true
# Prune all of the docker images
docker system prune -af
- name: Hold runner for 2 hours or until ssh sessions have drained
# Always hold for active ssh sessions
if: always()
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Clean up docker images
if: always()
run: |
# Prune all of the docker images
docker system prune -af
generate-test-matrix:
runs-on: ubuntu-18.04
needs: [ciflow_should_run]
env:
TEST_RUNNER_TYPE: linux.8xlarge.nvidia.gpu
ENABLE_DISTRIBUTED_TEST: 1
ENABLE_JIT_LEGACY_TEST: ''
ENABLE_MULTIGPU_TEST: ''
ENABLE_NOGPU_NO_AVX_TEST: ''
ENABLE_NOGPU_NO_AVX2_TEST: ''
ENABLE_SLOW_TEST: ''
ENABLE_DOCS_TEST: ''
ENABLE_BACKWARDS_COMPAT_TEST: ''
ENABLE_XLA_TEST: ''
ENABLE_NOARCH_TEST: ''
NUM_TEST_SHARDS: 2
MULTIGPU_RUNNER_TYPE: linux.16xlarge.nvidia.gpu
NOGPU_RUNNER_TYPE: linux.2xlarge
PR_BODY: ${{ github.event.pull_request.body }}
outputs:
matrix: ${{ steps.set-matrix.outputs.matrix }}
render-matrix: ${{ steps.set-matrix.outputs.render-matrix }}
ignore-disabled-issues: ${{ steps.set-matrix.outputs.ignore-disabled-issues }}
container:
image: python:3.9
steps:
- name: Install dependencies
run: pip install typing-extensions==3.10
- name: Clone pytorch/pytorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
- name: Generating test matrix
id: set-matrix
run: .github/scripts/generate_pytorch_test_matrix.py
test:
needs: [calculate-docker-image, build, generate-test-matrix, ciflow_should_run]
strategy:
matrix: ${{ fromJson(needs.generate-test-matrix.outputs.matrix) }}
fail-fast: false
runs-on: ${{ matrix.runner }}
env:
DOCKER_IMAGE: ${{ needs.calculate-docker-image.outputs.docker_image }}
JOB_BASE_NAME: linux-bionic-cuda10.2-py3.9-gcc7-test
TEST_CONFIG: ${{ matrix.config }}
SHARD_NUMBER: ${{ matrix.shard }}
NUM_TEST_SHARDS: ${{ matrix.num_shards }}
PYTORCH_IGNORE_DISABLED_ISSUES: ${{ needs.generate-test-matrix.outputs.ignore-disabled-issues }}
CONTINUE_THROUGH_ERROR: ${{ github.repository == 'pytorch/pytorch' && (github.event_name == 'push' || github.event_name == 'schedule') }}
steps:
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Log in to ECR
env:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
# Ensure the working directory gets chowned back to the current user
retry docker pull "${ALPINE_IMAGE}"
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
rm -rf "${GITHUB_WORKSPACE:?}/*"
rm -f ~/.ssh/authorized_keys
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: recursive
- name: Pull docker image
run: |
docker pull "${DOCKER_IMAGE}"
- name: Install nvidia driver, nvidia-docker runtime, set GPU_FLAG
if: ${{ contains(env.BUILD_ENVIRONMENT, 'cuda') && !contains(matrix.config, 'nogpu') }}
run: |
bash .github/scripts/install_nvidia_utils_linux.sh
echo "GPU_FLAG=--gpus all" >> "${GITHUB_ENV}"
- name: Determine shm-size
run: |
shm_size="1g"
case "${BUILD_ENVIRONMENT}" in
*cuda*)
shm_size="2g"
;;
*rocm*)
shm_size="8g"
;;
esac
echo "SHM_SIZE=${shm_size}" >> "${GITHUB_ENV}"
- uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b
name: Download PyTorch Build Artifacts
with:
name: ${{ env.BUILD_ENVIRONMENT }}
- name: Unzip artifacts
run: |
unzip -o artifacts.zip
- name: Output disk space left
run: |
sudo df -H
- name: Parse ref
id: parse-ref
run: .github/scripts/parse_ref.py
- name: Test
env:
PR_NUMBER: ${{ github.event.pull_request.number }}
IS_GHA: 1
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
AWS_DEFAULT_REGION: us-east-1
run: |
if [[ $TEST_CONFIG == 'multigpu' ]]; then
TEST_COMMAND=.jenkins/pytorch/multigpu-test.sh
else
TEST_COMMAND=.jenkins/pytorch/test.sh
fi
if [[ $NUM_TEST_SHARDS -ne 2 ]]; then
export SHARD_NUMBER=0
fi
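# Assumption: SHARD_NUMBER=0 signals the test scripts to run the full,
# unsharded suite when the matrix isn't the expected 2-shard layout; see
# .jenkins/pytorch/test.sh for the authoritative behavior.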
# detached container should get cleaned up by teardown_ec2_linux
# TODO: Stop building test binaries as part of the build phase
# Used for GPU_FLAG since that doesn't play nice with quoting
# shellcheck disable=SC2086
container_name=$(docker run \
${GPU_FLAG:-} \
-e BUILD_ENVIRONMENT \
-e PR_NUMBER \
-e CUSTOM_TEST_ARTIFACT_BUILD_DIR \
-e GITHUB_ACTIONS \
-e IN_CI \
-e IS_GHA \
-e CIRCLE_BRANCH \
-e CIRCLE_SHA1 \
-e CIRCLE_PR_NUMBER \
-e AWS_DEFAULT_REGION \
-e IN_WHEEL_TEST \
-e SHARD_NUMBER \
-e JOB_BASE_NAME \
-e TEST_CONFIG \
-e NUM_TEST_SHARDS \
-e PYTORCH_IGNORE_DISABLED_ISSUES \
-e PR_LABELS \
-e CONTINUE_THROUGH_ERROR \
-e MAX_JOBS="$(nproc --ignore=2)" \
-e SCCACHE_BUCKET \
-e http_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e https_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e no_proxy="localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" \
-e XLA_CLANG_CACHE_S3_BUCKET_NAME \
--env-file="/tmp/github_env_${GITHUB_RUN_ID}" \
--security-opt seccomp=unconfined \
--cap-add=SYS_PTRACE \
--shm-size="${SHM_SIZE}" \
--tty \
--detach \
--name="${container_name}" \
--user jenkins \
-v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \
-w /var/lib/jenkins/workspace \
"${DOCKER_IMAGE}"
)
docker exec -t "${container_name}" sh -c "sudo chown -R jenkins . && pip install dist/*.whl && ${TEST_COMMAND}"
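# Pattern: the container is started detached above so the teardown logic
# (teardown_ec2_linux, per the comment above) can stop it even if this
# step fails; the tests themselves run via docker exec in that container.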
- name: Chown workspace
if: always()
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Install render_test_results dependencies
if: always()
shell: bash
run: |
python3 -m pip install junitparser==2.1.1 rich==10.9.0
- name: "[[ Click me for rendered test results (useful for finding failing tests) ]]"
if: always()
shell: bash
# Encoding is weird on Windows; just try to default to UTF-8 if possible
env:
PYTHONIOENCODING: "utf-8"
run: |
python3 tools/render_junit.py test/
- name: Zip test reports for upload
if: always()
env:
FILE_SUFFIX: '${{ github.job }}-${{ matrix.config }}-${{ matrix.shard }}-${{ matrix.num_shards }}-${{ matrix.runner }}'
run: |
# Remove any previous test reports if they exist
rm -f test-reports-*.zip
zip -r "test-reports-${FILE_SUFFIX}.zip" test -i '*.xml'
- uses: actions/upload-artifact@v2
name: Store Test Reports
if: always()
with:
name: test-reports-${{ matrix.config }}
retention-days: 14
if-no-files-found: error
path:
test-reports-*.zip
- uses: seemethere/upload-artifact-s3@v3
name: Store Test Reports on S3
if: always()
with:
retention-days: 14
if-no-files-found: error
path:
test-reports-*.zip
- name: Display and upload test statistics (Click Me)
if: always()
# temporary hack: set CIRCLE_* vars, until we update
# tools/stats/print_test_stats.py to natively support GitHub Actions
env:
AWS_DEFAULT_REGION: us-east-1
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
JOB_BASE_NAME: linux-bionic-cuda10.2-py3.9-gcc7-test
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
CIRCLE_TAG: ${{ steps.parse-ref.outputs.tag }}
CIRCLE_WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
shell: bash
run: |
python3 -m pip install -r requirements.txt
python3 -m pip install boto3==1.16.34
python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test
- name: Hold runner for 2 hours or until ssh sessions have drained
# Always hold for active ssh sessions
if: always()
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Chown workspace
if: always()
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Kill containers, clean up images
if: always()
run: |
# ignore expansion of "docker ps -q" since it could be empty
# shellcheck disable=SC2046
docker stop $(docker ps -q) || true
# Prune all of the docker images
docker system prune -af


@@ -0,0 +1,562 @@
# @generated DO NOT EDIT MANUALLY
# Template is at: .github/templates/linux_ci_workflow.yml.j2
# Generation script: .github/scripts/generate_ci_workflows.py
name: linux-bionic-py3.6-clang9
on:
pull_request:
types: [opened, synchronize, reopened, unassigned]
push:
branches:
- master
- release/*
workflow_dispatch:
env:
BUILD_ENVIRONMENT: linux-bionic-py3.6-clang9
DOCKER_IMAGE_BASE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-bionic-py3.6-clang9
SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2
XLA_CLANG_CACHE_S3_BUCKET_NAME: ossci-compiler-clang-cache-circleci-xla
TORCH_CUDA_ARCH_LIST: 5.2
IN_CI: 1
# This is used only during the phase of adding wheel tests; it will be removed once completed
IN_WHEEL_TEST: 1
# Used for custom_operator, jit_hooks, custom_backend, see .jenkins/pytorch/build.sh
CUSTOM_TEST_ARTIFACT_BUILD_DIR: build/custom_test_artifacts
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
concurrency:
group: linux-bionic-py3.6-clang9-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }}
cancel-in-progress: true
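# The concurrency group keys on PR number (or commit SHA for pushes), so a
# new push to the same PR cancels the in-flight run; the trailing
# workflow_dispatch boolean keeps manual runs in their own group.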
jobs:
ciflow_should_run:
runs-on: ubuntu-18.04
if: ${{ (github.repository == 'pytorch/pytorch') && ((github.event_name != 'pull_request') || (github.event.assignee.login != 'pytorchbot' ) || (github.event.action !='unassigned') || (contains(github.event.pull_request.labels.*.name, 'ciflow/all') || contains(github.event.pull_request.labels.*.name, 'ciflow/cpu') || contains(github.event.pull_request.labels.*.name, 'ciflow/default') || contains(github.event.pull_request.labels.*.name, 'ciflow/linux') || contains(github.event.pull_request.labels.*.name, 'ciflow/noarch') || contains(github.event.pull_request.labels.*.name, 'ciflow/xla'))) }}
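# Gating logic: outside pytorch/pytorch this job (and everything that
# `needs` it) is skipped; for pull requests it is skipped only on
# pytorchbot-driven "unassigned" events that carry none of the listed
# ciflow/* labels. Pushes and manual dispatches always pass.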
env:
LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
steps:
- name: noop
run: echo running ciflow_should_run
- name: print labels
run: echo "${LABELS}"
calculate-docker-image:
runs-on: linux.2xlarge
needs: [ciflow_should_run]
env:
DOCKER_BUILDKIT: 1
timeout-minutes: 90
outputs:
docker_image: ${{ steps.calculate-tag.outputs.docker_image }}
steps:
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Log in to ECR
env:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
# Ensure the working directory gets chowned back to the current user
retry docker pull "${ALPINE_IMAGE}"
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
rm -rf "${GITHUB_WORKSPACE:?}/*"
rm -f ~/.ssh/authorized_keys
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: false
- name: Calculate docker image tag
id: calculate-tag
run: |
DOCKER_TAG=$(git rev-parse HEAD:.circleci/docker)
echo "::set-output name=docker_tag::${DOCKER_TAG}"
echo "::set-output name=docker_image::${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"
- name: Check if image should be built
id: check
env:
DOCKER_TAG: ${{ steps.calculate-tag.outputs.docker_tag }}
BASE_REVISION: ${{ github.event.pull_request.base.sha || github.sha }}
run: |
set -x
# Check if image already exists, if it does then skip building it
if docker manifest inspect "${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"; then
exit 0
fi
if [[ "$BASE_REVISION" = "$(git rev-parse HEAD)" ]]; then
# if we're on the base branch then use the parent commit
MERGE_BASE=$(git rev-parse HEAD~)
else
# otherwise we're on a PR, so use the most recent base commit
MERGE_BASE=$(git merge-base HEAD "$BASE_REVISION")
fi
# Covers the case where a previous tag doesn't exist for the tree
# this is only really applicable to trees that don't have `.circleci/docker` at their merge base, i.e. nightly
if ! git rev-parse "$MERGE_BASE:.circleci/docker"; then
echo "Directory '.circleci/docker' not found in commit $MERGE_BASE, you should probably rebase onto a more recent commit"
exit 1
fi
PREVIOUS_DOCKER_TAG=$(git rev-parse "$MERGE_BASE:.circleci/docker")
# If no image exists but the hash is the same as the previous hash then we should error out here
if [[ "${PREVIOUS_DOCKER_TAG}" = "${DOCKER_TAG}" ]]; then
echo "ERROR: Something has gone wrong and the previous image isn't available for the merge-base of your branch"
echo " contact the PyTorch team to restore the original images"
exit 1
fi
echo ::set-output name=rebuild::yes
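# Summary of the checks above: skip the rebuild if the image already
# exists in ECR; hard-fail if the tag is unchanged from the merge-base
# yet no image exists (the image was lost); otherwise emit rebuild=yes.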
- name: Build and push docker image
if: ${{ steps.check.outputs.rebuild }}
env:
DOCKER_TAG: ${{ steps.calculate-tag.outputs.docker_tag }}
DOCKER_SKIP_S3_UPLOAD: 1
run: |
export IMAGE_NAME=${DOCKER_IMAGE_BASE#308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/}
cd .circleci/docker && ./build_docker.sh
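# `${DOCKER_IMAGE_BASE#<prefix>}` is bash prefix stripping; with the value
# set in this workflow it yields IMAGE_NAME=pytorch-linux-bionic-py3.6-clang9.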
build:
runs-on: linux.2xlarge
needs: [calculate-docker-image, ciflow_should_run]
env:
DOCKER_IMAGE: ${{ needs.calculate-docker-image.outputs.docker_image }}
JOB_BASE_NAME: linux-bionic-py3.6-clang9-build
steps:
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Log in to ECR
env:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
# Ensure the working directory gets chowned back to the current user
retry docker pull "${ALPINE_IMAGE}"
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
rm -rf "${GITHUB_WORKSPACE:?}/*"
rm -f ~/.ssh/authorized_keys
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: recursive
- name: Pull docker image
run: |
docker pull "${DOCKER_IMAGE}"
- name: Build
run: |
# detached container should get cleaned up by teardown_ec2_linux
container_name=$(docker run \
-e BUILD_ENVIRONMENT \
-e JOB_BASE_NAME \
-e MAX_JOBS="$(nproc --ignore=2)" \
-e SCCACHE_BUCKET \
-e XLA_CLANG_CACHE_S3_BUCKET_NAME \
-e CUSTOM_TEST_ARTIFACT_BUILD_DIR \
-e SKIP_SCCACHE_INITIALIZATION=1 \
-e TORCH_CUDA_ARCH_LIST \
-e PR_LABELS \
-e http_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e https_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e no_proxy="localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" \
--env-file="/tmp/github_env_${GITHUB_RUN_ID}" \
--security-opt seccomp=unconfined \
--cap-add=SYS_PTRACE \
--tty \
--detach \
--user jenkins \
-v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \
-w /var/lib/jenkins/workspace \
"${DOCKER_IMAGE}"
)
docker exec -t "${container_name}" sh -c 'sudo chown -R jenkins . && .jenkins/pytorch/build.sh'
- name: Parse ref
id: parse-ref
run: .github/scripts/parse_ref.py
- name: Display and upload binary build size statistics (Click Me)
# temporary hack: set CIRCLE_* vars, until we update
# tools/stats/print_test_stats.py to natively support GitHub Actions
env:
AWS_DEFAULT_REGION: us-east-1
IS_GHA: 1
SCRIBE_GRAPHQL_ACCESS_TOKEN: ${{ secrets.SCRIBE_GRAPHQL_ACCESS_TOKEN }}
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
CIRCLE_TAG: ${{ steps.parse-ref.outputs.tag }}
CIRCLE_WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
run: |
COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0)
export COMMIT_TIME
pip3 install requests==2.26 boto3==1.16.34
python3 -m tools.stats.upload_binary_size_to_scuba || exit 0
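# The trailing `|| exit 0` makes this upload best-effort: a failure in the
# size-stats script never fails the build job itself.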
- name: Chown workspace
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Archive artifacts into zip
run: |
zip -1 -r artifacts.zip dist/ build/custom_test_artifacts build/lib build/bin .pytorch-test-times.json
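# zip -1 picks the fastest compression level; the archive carries the
# built wheels (dist/), custom test artifacts, build libs/binaries and
# .pytorch-test-times.json for the downstream test job to unzip.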
- uses: seemethere/upload-artifact-s3@v3
name: Store PyTorch Build Artifacts on S3
with:
name: ${{ env.BUILD_ENVIRONMENT }}
retention-days: 14
if-no-files-found: error
path:
artifacts.zip
- name: Hold runner for 2 hours or until ssh sessions have drained
# Always hold for active ssh sessions
if: always()
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Chown workspace
if: always()
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Kill containers, clean up images
if: always()
run: |
# ignore expansion of "docker ps -q" since it could be empty
# shellcheck disable=SC2046
docker stop $(docker ps -q) || true
# Prune all of the docker images
docker system prune -af
- name: Hold runner for 2 hours or until ssh sessions have drained
# Always hold for active ssh sessions
if: always()
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Clean up docker images
if: always()
run: |
# Prune all of the docker images
docker system prune -af
generate-test-matrix:
runs-on: ubuntu-18.04
needs: [ciflow_should_run]
env:
TEST_RUNNER_TYPE: linux.2xlarge
ENABLE_DISTRIBUTED_TEST: ''
ENABLE_JIT_LEGACY_TEST: ''
ENABLE_MULTIGPU_TEST: ''
ENABLE_NOGPU_NO_AVX_TEST: ''
ENABLE_NOGPU_NO_AVX2_TEST: ''
ENABLE_SLOW_TEST: ''
ENABLE_DOCS_TEST: ''
ENABLE_BACKWARDS_COMPAT_TEST: ''
ENABLE_XLA_TEST: ''
ENABLE_NOARCH_TEST: 1
NUM_TEST_SHARDS: 2
MULTIGPU_RUNNER_TYPE: linux.16xlarge.nvidia.gpu
NOGPU_RUNNER_TYPE: linux.2xlarge
PR_BODY: ${{ github.event.pull_request.body }}
outputs:
matrix: ${{ steps.set-matrix.outputs.matrix }}
render-matrix: ${{ steps.set-matrix.outputs.render-matrix }}
ignore-disabled-issues: ${{ steps.set-matrix.outputs.ignore-disabled-issues }}
container:
image: python:3.9
steps:
- name: Install dependencies
run: pip install typing-extensions==3.10
- name: Clone pytorch/pytorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
- name: Generating test matrix
id: set-matrix
run: .github/scripts/generate_pytorch_test_matrix.py
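# The generated matrix is consumed below via fromJson(); a purely
# illustrative (not authoritative) shape would be:
#   {"include": [{"config": "default", "shard": 1, "num_shards": 2, "runner": "linux.2xlarge"}]}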
test:
needs: [calculate-docker-image, build, generate-test-matrix, ciflow_should_run]
strategy:
matrix: ${{ fromJson(needs.generate-test-matrix.outputs.matrix) }}
fail-fast: false
runs-on: ${{ matrix.runner }}
env:
DOCKER_IMAGE: ${{ needs.calculate-docker-image.outputs.docker_image }}
JOB_BASE_NAME: linux-bionic-py3.6-clang9-test
TEST_CONFIG: ${{ matrix.config }}
SHARD_NUMBER: ${{ matrix.shard }}
NUM_TEST_SHARDS: ${{ matrix.num_shards }}
PYTORCH_IGNORE_DISABLED_ISSUES: ${{ needs.generate-test-matrix.outputs.ignore-disabled-issues }}
CONTINUE_THROUGH_ERROR: ${{ github.repository == 'pytorch/pytorch' && (github.event_name == 'push' || github.event_name == 'schedule') }}
steps:
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Log in to ECR
env:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
# Ensure the working directory gets chowned back to the current user
retry docker pull "${ALPINE_IMAGE}"
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
rm -rf "${GITHUB_WORKSPACE:?}/*"
rm -f ~/.ssh/authorized_keys
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: recursive
- name: Pull docker image
run: |
docker pull "${DOCKER_IMAGE}"
- name: Install nvidia driver, nvidia-docker runtime, set GPU_FLAG
if: ${{ contains(env.BUILD_ENVIRONMENT, 'cuda') && !contains(matrix.config, 'nogpu') }}
run: |
bash .github/scripts/install_nvidia_utils_linux.sh
echo "GPU_FLAG=--gpus all" >> "${GITHUB_ENV}"
- name: Determine shm-size
run: |
shm_size="1g"
case "${BUILD_ENVIRONMENT}" in
*cuda*)
shm_size="2g"
;;
*rocm*)
shm_size="8g"
;;
esac
echo "SHM_SIZE=${shm_size}" >> "${GITHUB_ENV}"
- uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b
name: Download PyTorch Build Artifacts
with:
name: ${{ env.BUILD_ENVIRONMENT }}
- name: Unzip artifacts
run: |
unzip -o artifacts.zip
- name: Output disk space left
run: |
sudo df -H
- name: Parse ref
id: parse-ref
run: .github/scripts/parse_ref.py
- name: Test
env:
PR_NUMBER: ${{ github.event.pull_request.number }}
IS_GHA: 1
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
AWS_DEFAULT_REGION: us-east-1
run: |
if [[ $TEST_CONFIG == 'multigpu' ]]; then
TEST_COMMAND=.jenkins/pytorch/multigpu-test.sh
else
TEST_COMMAND=.jenkins/pytorch/test.sh
fi
if [[ $NUM_TEST_SHARDS -ne 2 ]]; then
export SHARD_NUMBER=0
fi
# detached container should get cleaned up by teardown_ec2_linux
# TODO: Stop building test binaries as part of the build phase
# Used for GPU_FLAG since that doesn't play nice with quoting
# shellcheck disable=SC2086
container_name=$(docker run \
${GPU_FLAG:-} \
-e BUILD_ENVIRONMENT \
-e PR_NUMBER \
-e CUSTOM_TEST_ARTIFACT_BUILD_DIR \
-e GITHUB_ACTIONS \
-e IN_CI \
-e IS_GHA \
-e CIRCLE_BRANCH \
-e CIRCLE_SHA1 \
-e CIRCLE_PR_NUMBER \
-e AWS_DEFAULT_REGION \
-e IN_WHEEL_TEST \
-e SHARD_NUMBER \
-e JOB_BASE_NAME \
-e TEST_CONFIG \
-e NUM_TEST_SHARDS \
-e PYTORCH_IGNORE_DISABLED_ISSUES \
-e PR_LABELS \
-e CONTINUE_THROUGH_ERROR \
-e MAX_JOBS="$(nproc --ignore=2)" \
-e SCCACHE_BUCKET \
-e http_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e https_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e no_proxy="localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" \
-e XLA_CLANG_CACHE_S3_BUCKET_NAME \
--env-file="/tmp/github_env_${GITHUB_RUN_ID}" \
--security-opt seccomp=unconfined \
--cap-add=SYS_PTRACE \
--shm-size="${SHM_SIZE}" \
--tty \
--detach \
--name="${container_name}" \
--user jenkins \
-v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \
-w /var/lib/jenkins/workspace \
"${DOCKER_IMAGE}"
)
docker exec -t "${container_name}" sh -c "sudo chown -R jenkins . && pip install dist/*.whl && ${TEST_COMMAND}"
- name: Chown workspace
if: always()
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Install render_test_results dependencies
if: always()
shell: bash
run: |
python3 -m pip install junitparser==2.1.1 rich==10.9.0
- name: "[[ Click me for rendered test results (useful for finding failing tests) ]]"
if: always()
shell: bash
# Encoding is weird on Windows; just try to default to UTF-8 if possible
env:
PYTHONIOENCODING: "utf-8"
run: |
python3 tools/render_junit.py test/
- name: Zip test reports for upload
if: always()
env:
FILE_SUFFIX: '${{ github.job }}-${{ matrix.config }}-${{ matrix.shard }}-${{ matrix.num_shards }}-${{ matrix.runner }}'
run: |
# Remove any previous test reports if they exist
rm -f test-reports-*.zip
zip -r "test-reports-${FILE_SUFFIX}.zip" test -i '*.xml'
- uses: actions/upload-artifact@v2
name: Store Test Reports
if: always()
with:
name: test-reports-${{ matrix.config }}
retention-days: 14
if-no-files-found: error
path:
test-reports-*.zip
- uses: seemethere/upload-artifact-s3@v3
name: Store Test Reports on S3
if: always()
with:
retention-days: 14
if-no-files-found: error
path:
test-reports-*.zip
- name: Display and upload test statistics (Click Me)
if: always()
# temporary hack: set CIRCLE_* vars, until we update
# tools/stats/print_test_stats.py to natively support GitHub Actions
env:
AWS_DEFAULT_REGION: us-east-1
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
JOB_BASE_NAME: linux-bionic-py3.6-clang9-test
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
CIRCLE_TAG: ${{ steps.parse-ref.outputs.tag }}
CIRCLE_WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
shell: bash
run: |
python3 -m pip install -r requirements.txt
python3 -m pip install boto3==1.16.34
python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test
- name: Hold runner for 2 hours or until ssh sessions have drained
# Always hold for active ssh sessions
if: always()
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Chown workspace
if: always()
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Kill containers, clean up images
if: always()
run: |
# ignore expansion of "docker ps -q" since it could be empty
# shellcheck disable=SC2046
docker stop $(docker ps -q) || true
# Prune all of the docker images
docker system prune -af


@@ -0,0 +1,566 @@
# @generated DO NOT EDIT MANUALLY
# Template is at: .github/templates/linux_ci_workflow.yml.j2
# Generation script: .github/scripts/generate_ci_workflows.py
name: linux-bionic-py3.8-gcc9-coverage
on:
pull_request:
types: [opened, synchronize, reopened, unassigned]
push:
branches:
- master
- release/*
workflow_dispatch:
env:
BUILD_ENVIRONMENT: linux-bionic-py3.8-gcc9-coverage
DOCKER_IMAGE_BASE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-bionic-py3.8-gcc9
SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2
XLA_CLANG_CACHE_S3_BUCKET_NAME: ossci-compiler-clang-cache-circleci-xla
TORCH_CUDA_ARCH_LIST: 5.2
IN_CI: 1
# This is used only during the phase of adding wheel tests; it will be removed once completed
IN_WHEEL_TEST: 1
# Used for custom_operator, jit_hooks, custom_backend, see .jenkins/pytorch/build.sh
CUSTOM_TEST_ARTIFACT_BUILD_DIR: build/custom_test_artifacts
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
concurrency:
group: linux-bionic-py3.8-gcc9-coverage-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }}
cancel-in-progress: true
jobs:
ciflow_should_run:
runs-on: ubuntu-18.04
if: ${{ (github.repository == 'pytorch/pytorch') && ((github.event_name != 'pull_request') || (github.event.assignee.login != 'pytorchbot' ) || (github.event.action !='unassigned') || (contains(github.event.pull_request.labels.*.name, 'ciflow/all') || contains(github.event.pull_request.labels.*.name, 'ciflow/coverage') || contains(github.event.pull_request.labels.*.name, 'ciflow/cpu') || contains(github.event.pull_request.labels.*.name, 'ciflow/default') || contains(github.event.pull_request.labels.*.name, 'ciflow/linux'))) }}
env:
LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
steps:
- name: noop
run: echo running ciflow_should_run
- name: print labels
run: echo "${LABELS}"
calculate-docker-image:
runs-on: linux.2xlarge
needs: [ciflow_should_run]
env:
DOCKER_BUILDKIT: 1
timeout-minutes: 90
outputs:
docker_image: ${{ steps.calculate-tag.outputs.docker_image }}
steps:
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Log in to ECR
env:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
# Ensure the working directory gets chowned back to the current user
retry docker pull "${ALPINE_IMAGE}"
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
rm -rf "${GITHUB_WORKSPACE:?}/*"
rm -f ~/.ssh/authorized_keys
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: false
- name: Calculate docker image tag
id: calculate-tag
run: |
DOCKER_TAG=$(git rev-parse HEAD:.circleci/docker)
echo "::set-output name=docker_tag::${DOCKER_TAG}"
echo "::set-output name=docker_image::${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"
- name: Check if image should be built
id: check
env:
DOCKER_TAG: ${{ steps.calculate-tag.outputs.docker_tag }}
BASE_REVISION: ${{ github.event.pull_request.base.sha || github.sha }}
run: |
set -x
# Check if image already exists, if it does then skip building it
if docker manifest inspect "${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"; then
exit 0
fi
if [[ "$BASE_REVISION" = "$(git rev-parse HEAD)" ]]; then
# if we're on the base branch then use the parent commit
MERGE_BASE=$(git rev-parse HEAD~)
else
# otherwise we're on a PR, so use the most recent base commit
MERGE_BASE=$(git merge-base HEAD "$BASE_REVISION")
fi
# Covers the case where a previous tag doesn't exist for the tree
# this is only really applicable to trees that don't have `.circleci/docker` at their merge base, i.e. nightly
if ! git rev-parse "$MERGE_BASE:.circleci/docker"; then
echo "Directory '.circleci/docker' not found in commit $MERGE_BASE, you should probably rebase onto a more recent commit"
exit 1
fi
PREVIOUS_DOCKER_TAG=$(git rev-parse "$MERGE_BASE:.circleci/docker")
# If no image exists but the hash is the same as the previous hash then we should error out here
if [[ "${PREVIOUS_DOCKER_TAG}" = "${DOCKER_TAG}" ]]; then
echo "ERROR: Something has gone wrong and the previous image isn't available for the merge-base of your branch"
echo " contact the PyTorch team to restore the original images"
exit 1
fi
echo ::set-output name=rebuild::yes
- name: Build and push docker image
if: ${{ steps.check.outputs.rebuild }}
env:
DOCKER_TAG: ${{ steps.calculate-tag.outputs.docker_tag }}
DOCKER_SKIP_S3_UPLOAD: 1
run: |
export IMAGE_NAME=${DOCKER_IMAGE_BASE#308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/}
cd .circleci/docker && ./build_docker.sh
build:
runs-on: linux.2xlarge
needs: [calculate-docker-image, ciflow_should_run]
env:
DOCKER_IMAGE: ${{ needs.calculate-docker-image.outputs.docker_image }}
JOB_BASE_NAME: linux-bionic-py3.8-gcc9-coverage-build
steps:
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Log in to ECR
env:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
# Ensure the working directory gets chowned back to the current user
retry docker pull "${ALPINE_IMAGE}"
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
rm -rf "${GITHUB_WORKSPACE:?}/*"
rm -f ~/.ssh/authorized_keys
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: recursive
- name: Pull docker image
run: |
docker pull "${DOCKER_IMAGE}"
- name: Build
run: |
# detached container should get cleaned up by teardown_ec2_linux
container_name=$(docker run \
-e BUILD_ENVIRONMENT \
-e JOB_BASE_NAME \
-e MAX_JOBS="$(nproc --ignore=2)" \
-e SCCACHE_BUCKET \
-e XLA_CLANG_CACHE_S3_BUCKET_NAME \
-e CUSTOM_TEST_ARTIFACT_BUILD_DIR \
-e SKIP_SCCACHE_INITIALIZATION=1 \
-e TORCH_CUDA_ARCH_LIST \
-e PR_LABELS \
-e http_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e https_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e no_proxy="localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" \
--env-file="/tmp/github_env_${GITHUB_RUN_ID}" \
--security-opt seccomp=unconfined \
--cap-add=SYS_PTRACE \
--tty \
--detach \
--user jenkins \
-v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \
-w /var/lib/jenkins/workspace \
"${DOCKER_IMAGE}"
)
docker exec -t "${container_name}" sh -c 'sudo chown -R jenkins . && .jenkins/pytorch/build.sh'
- name: Parse ref
id: parse-ref
run: .github/scripts/parse_ref.py
- name: Display and upload binary build size statistics (Click Me)
# temporary hack: set CIRCLE_* vars, until we update
# tools/stats/print_test_stats.py to natively support GitHub Actions
env:
AWS_DEFAULT_REGION: us-east-1
IS_GHA: 1
SCRIBE_GRAPHQL_ACCESS_TOKEN: ${{ secrets.SCRIBE_GRAPHQL_ACCESS_TOKEN }}
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
CIRCLE_TAG: ${{ steps.parse-ref.outputs.tag }}
CIRCLE_WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
run: |
COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0)
export COMMIT_TIME
pip3 install requests==2.26 boto3==1.16.34
python3 -m tools.stats.upload_binary_size_to_scuba || exit 0
- name: Chown workspace
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Archive artifacts into zip
run: |
zip -1 -r artifacts.zip dist/ build/custom_test_artifacts build/lib build/bin .pytorch-test-times.json
- uses: seemethere/upload-artifact-s3@v3
name: Store PyTorch Build Artifacts on S3
with:
name: ${{ env.BUILD_ENVIRONMENT }}
retention-days: 14
if-no-files-found: error
path:
artifacts.zip
- name: Hold runner for 2 hours or until ssh sessions have drained
# Always hold for active ssh sessions
if: always()
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Chown workspace
if: always()
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Kill containers, clean up images
if: always()
run: |
# ignore expansion of "docker ps -q" since it could be empty
# shellcheck disable=SC2046
docker stop $(docker ps -q) || true
# Prune all of the docker images
docker system prune -af
- name: Hold runner for 2 hours or until ssh sessions have drained
# Always hold for active ssh sessions
if: always()
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Clean up docker images
if: always()
run: |
# Prune all of the docker images
docker system prune -af
generate-test-matrix:
runs-on: ubuntu-18.04
needs: [ciflow_should_run]
env:
TEST_RUNNER_TYPE: linux.2xlarge
ENABLE_DISTRIBUTED_TEST: 1
ENABLE_JIT_LEGACY_TEST: ''
ENABLE_MULTIGPU_TEST: ''
ENABLE_NOGPU_NO_AVX_TEST: ''
ENABLE_NOGPU_NO_AVX2_TEST: ''
ENABLE_SLOW_TEST: ''
ENABLE_DOCS_TEST: ''
ENABLE_BACKWARDS_COMPAT_TEST: ''
ENABLE_XLA_TEST: ''
ENABLE_NOARCH_TEST: ''
NUM_TEST_SHARDS: 2
MULTIGPU_RUNNER_TYPE: linux.16xlarge.nvidia.gpu
NOGPU_RUNNER_TYPE: linux.2xlarge
PR_BODY: ${{ github.event.pull_request.body }}
outputs:
matrix: ${{ steps.set-matrix.outputs.matrix }}
render-matrix: ${{ steps.set-matrix.outputs.render-matrix }}
ignore-disabled-issues: ${{ steps.set-matrix.outputs.ignore-disabled-issues }}
container:
image: python:3.9
steps:
- name: Install dependencies
run: pip install typing-extensions==3.10
- name: Clone pytorch/pytorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
- name: Generating test matrix
id: set-matrix
run: .github/scripts/generate_pytorch_test_matrix.py
test:
needs: [calculate-docker-image, build, generate-test-matrix, ciflow_should_run]
strategy:
matrix: ${{ fromJson(needs.generate-test-matrix.outputs.matrix) }}
fail-fast: false
runs-on: ${{ matrix.runner }}
env:
DOCKER_IMAGE: ${{ needs.calculate-docker-image.outputs.docker_image }}
JOB_BASE_NAME: linux-bionic-py3.8-gcc9-coverage-test
TEST_CONFIG: ${{ matrix.config }}
SHARD_NUMBER: ${{ matrix.shard }}
NUM_TEST_SHARDS: ${{ matrix.num_shards }}
PYTORCH_IGNORE_DISABLED_ISSUES: ${{ needs.generate-test-matrix.outputs.ignore-disabled-issues }}
CONTINUE_THROUGH_ERROR: ${{ github.repository == 'pytorch/pytorch' && (github.event_name == 'push' || github.event_name == 'schedule') }}
steps:
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Log in to ECR
env:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
# Ensure the working directory gets chowned back to the current user
retry docker pull "${ALPINE_IMAGE}"
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
rm -rf "${GITHUB_WORKSPACE:?}/*"
rm -f ~/.ssh/authorized_keys
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: recursive
- name: Pull docker image
run: |
docker pull "${DOCKER_IMAGE}"
- name: Install nvidia driver, nvidia-docker runtime, set GPU_FLAG
if: ${{ contains(env.BUILD_ENVIRONMENT, 'cuda') && !contains(matrix.config, 'nogpu') }}
run: |
bash .github/scripts/install_nvidia_utils_linux.sh
echo "GPU_FLAG=--gpus all" >> "${GITHUB_ENV}"
- name: Determine shm-size
run: |
shm_size="1g"
case "${BUILD_ENVIRONMENT}" in
*cuda*)
shm_size="2g"
;;
*rocm*)
shm_size="8g"
;;
esac
echo "SHM_SIZE=${shm_size}" >> "${GITHUB_ENV}"
- uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b
name: Download PyTorch Build Artifacts
with:
name: ${{ env.BUILD_ENVIRONMENT }}
- name: Unzip artifacts
run: |
unzip -o artifacts.zip
- name: Output disk space left
run: |
sudo df -H
- name: Parse ref
id: parse-ref
run: .github/scripts/parse_ref.py
- name: Test
env:
PR_NUMBER: ${{ github.event.pull_request.number }}
IS_GHA: 1
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
AWS_DEFAULT_REGION: us-east-1
run: |
if [[ $TEST_CONFIG == 'multigpu' ]]; then
TEST_COMMAND=.jenkins/pytorch/multigpu-test.sh
else
TEST_COMMAND=.jenkins/pytorch/test.sh
fi
if [[ $NUM_TEST_SHARDS -ne 2 ]]; then
export SHARD_NUMBER=0
fi
# detached container should get cleaned up by teardown_ec2_linux
# TODO: Stop building test binaries as part of the build phase
# Used for GPU_FLAG since that doesn't play nice with quoting
# shellcheck disable=SC2086
container_name=$(docker run \
${GPU_FLAG:-} \
-e BUILD_ENVIRONMENT \
-e PR_NUMBER \
-e CUSTOM_TEST_ARTIFACT_BUILD_DIR \
-e GITHUB_ACTIONS \
-e IN_CI \
-e IS_GHA \
-e CIRCLE_BRANCH \
-e CIRCLE_SHA1 \
-e CIRCLE_PR_NUMBER \
-e AWS_DEFAULT_REGION \
-e IN_WHEEL_TEST \
-e SHARD_NUMBER \
-e JOB_BASE_NAME \
-e TEST_CONFIG \
-e NUM_TEST_SHARDS \
-e PYTORCH_IGNORE_DISABLED_ISSUES \
-e PR_LABELS \
-e CONTINUE_THROUGH_ERROR \
-e MAX_JOBS="$(nproc --ignore=2)" \
-e SCCACHE_BUCKET \
-e http_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e https_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e no_proxy="localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" \
-e XLA_CLANG_CACHE_S3_BUCKET_NAME \
--env-file="/tmp/github_env_${GITHUB_RUN_ID}" \
--security-opt seccomp=unconfined \
--cap-add=SYS_PTRACE \
--shm-size="${SHM_SIZE}" \
--tty \
--detach \
--name="${container_name}" \
--user jenkins \
-v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \
-w /var/lib/jenkins/workspace \
"${DOCKER_IMAGE}"
)
docker exec -t "${container_name}" sh -c "sudo chown -R jenkins . && pip install dist/*.whl && ${TEST_COMMAND}"
- name: Chown workspace
if: always()
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Install render_test_results dependencies
if: always()
shell: bash
run: |
python3 -m pip install junitparser==2.1.1 rich==10.9.0
- name: "[[ Click me for rendered test results (useful for finding failing tests) ]]"
if: always()
shell: bash
# Encoding is weird on Windows; just try to default to UTF-8 if possible
env:
PYTHONIOENCODING: "utf-8"
run: |
python3 tools/render_junit.py test/
- name: Report coverage
run: |
python3 -m pip install codecov==2.1.12
python3 -m codecov
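# Note: unlike its neighbors this step has no `if: always()`, so coverage
# is only reported when the test step succeeds; codecov auto-detects the
# coverage reports produced by the run (an assumption, not verified here).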
- name: Zip test reports for upload
if: always()
env:
FILE_SUFFIX: '${{ github.job }}-${{ matrix.config }}-${{ matrix.shard }}-${{ matrix.num_shards }}-${{ matrix.runner }}'
run: |
# Remove any previous test reports if they exist
rm -f test-reports-*.zip
zip -r "test-reports-${FILE_SUFFIX}.zip" test -i '*.xml'
- uses: actions/upload-artifact@v2
name: Store Test Reports
if: always()
with:
name: test-reports-${{ matrix.config }}
retention-days: 14
if-no-files-found: error
path:
test-reports-*.zip
- uses: seemethere/upload-artifact-s3@v3
name: Store Test Reports on S3
if: always()
with:
retention-days: 14
if-no-files-found: error
path:
test-reports-*.zip
- name: Display and upload test statistics (Click Me)
if: always()
# temporary hack: set CIRCLE_* vars, until we update
# tools/stats/print_test_stats.py to natively support GitHub Actions
env:
AWS_DEFAULT_REGION: us-east-1
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
JOB_BASE_NAME: linux-bionic-py3.8-gcc9-coverage-test
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
CIRCLE_TAG: ${{ steps.parse-ref.outputs.tag }}
CIRCLE_WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
shell: bash
run: |
python3 -m pip install -r requirements.txt
python3 -m pip install boto3==1.16.34
python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test
- name: Hold runner for 2 hours or until ssh sessions have drained
# Always hold for active ssh sessions
if: always()
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Chown workspace
if: always()
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Kill containers, clean up images
if: always()
run: |
# ignore expansion of "docker ps -q" since it could be empty
# shellcheck disable=SC2046
docker stop $(docker ps -q) || true
# Prune all of the docker images
docker system prune -af


@@ -0,0 +1,562 @@
# @generated DO NOT EDIT MANUALLY
# Template is at: .github/templates/linux_ci_workflow.yml.j2
# Generation script: .github/scripts/generate_ci_workflows.py
name: linux-xenial-cuda10.2-py3.6-gcc7
on:
pull_request:
types: [unassigned]
push:
branches:
- master
- release/*
workflow_dispatch:
env:
BUILD_ENVIRONMENT: linux-xenial-cuda10.2-py3.6-gcc7
DOCKER_IMAGE_BASE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7
SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2
XLA_CLANG_CACHE_S3_BUCKET_NAME: ossci-compiler-clang-cache-circleci-xla
TORCH_CUDA_ARCH_LIST: 5.2
IN_CI: 1
# This is used only during the phase of adding wheel tests; it will be removed once completed
IN_WHEEL_TEST: 1
# Used for custom_operator, jit_hooks, custom_backend, see .jenkins/pytorch/build.sh
CUSTOM_TEST_ARTIFACT_BUILD_DIR: build/custom_test_artifacts
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
concurrency:
group: linux-xenial-cuda10.2-py3.6-gcc7-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }}
cancel-in-progress: true
jobs:
ciflow_should_run:
runs-on: ubuntu-18.04
if: ${{ (github.repository == 'pytorch/pytorch') && ((github.event_name != 'pull_request') || (github.event.assignee.login != 'pytorchbot' ) || (github.event.action !='unassigned') || (contains(github.event.pull_request.labels.*.name, 'ciflow/all') || contains(github.event.pull_request.labels.*.name, 'ciflow/cuda') || contains(github.event.pull_request.labels.*.name, 'ciflow/linux') || contains(github.event.pull_request.labels.*.name, 'ciflow/slow'))) }}
env:
LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
steps:
- name: noop
run: echo running ciflow_should_run
- name: print labels
run: echo "${LABELS}"
calculate-docker-image:
runs-on: linux.2xlarge
needs: [ciflow_should_run]
env:
DOCKER_BUILDKIT: 1
timeout-minutes: 90
outputs:
docker_image: ${{ steps.calculate-tag.outputs.docker_image }}
steps:
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Log in to ECR
env:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
# Ensure the working directory gets chowned back to the current user
retry docker pull "${ALPINE_IMAGE}"
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
rm -rf "${GITHUB_WORKSPACE:?}/*"
rm -f ~/.ssh/authorized_keys
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: false
- name: Calculate docker image tag
id: calculate-tag
run: |
DOCKER_TAG=$(git rev-parse HEAD:.circleci/docker)
echo "::set-output name=docker_tag::${DOCKER_TAG}"
echo "::set-output name=docker_image::${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"
- name: Check if image should be built
id: check
env:
DOCKER_TAG: ${{ steps.calculate-tag.outputs.docker_tag }}
BASE_REVISION: ${{ github.event.pull_request.base.sha || github.sha }}
run: |
set -x
# Check if image already exists, if it does then skip building it
if docker manifest inspect "${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"; then
exit 0
fi
if [[ "$BASE_REVISION" = "$(git rev-parse HEAD)" ]]; then
# if we're on the base branch then use the parent commit
MERGE_BASE=$(git rev-parse HEAD~)
else
# otherwise we're on a PR, so use the most recent base commit
MERGE_BASE=$(git merge-base HEAD "$BASE_REVISION")
fi
# Covers the case where a previous tag doesn't exist for the tree
# this is only really applicable to trees that don't have `.circleci/docker` at their merge base, i.e. nightly
if ! git rev-parse "$MERGE_BASE:.circleci/docker"; then
echo "Directory '.circleci/docker' not found in commit $MERGE_BASE, you should probably rebase onto a more recent commit"
exit 1
fi
PREVIOUS_DOCKER_TAG=$(git rev-parse "$MERGE_BASE:.circleci/docker")
# If no image exists but the hash is the same as the previous hash then we should error out here
if [[ "${PREVIOUS_DOCKER_TAG}" = "${DOCKER_TAG}" ]]; then
echo "ERROR: Something has gone wrong and the previous image isn't available for the merge-base of your branch"
echo " contact the PyTorch team to restore the original images"
exit 1
fi
echo ::set-output name=rebuild::yes
- name: Build and push docker image
if: ${{ steps.check.outputs.rebuild }}
env:
DOCKER_TAG: ${{ steps.calculate-tag.outputs.docker_tag }}
DOCKER_SKIP_S3_UPLOAD: 1
run: |
export IMAGE_NAME=${DOCKER_IMAGE_BASE#308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/}
cd .circleci/docker && ./build_docker.sh
build:
runs-on: linux.2xlarge
needs: [calculate-docker-image, ciflow_should_run]
env:
DOCKER_IMAGE: ${{ needs.calculate-docker-image.outputs.docker_image }}
JOB_BASE_NAME: linux-xenial-cuda10.2-py3.6-gcc7-build
steps:
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Log in to ECR
env:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
# Ensure the working directory gets chowned back to the current user
retry docker pull "${ALPINE_IMAGE}"
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
rm -rf "${GITHUB_WORKSPACE:?}/*"
rm -f ~/.ssh/authorized_keys
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: recursive
- name: Pull docker image
run: |
docker pull "${DOCKER_IMAGE}"
- name: Build
run: |
# detached container should get cleaned up by teardown_ec2_linux
container_name=$(docker run \
-e BUILD_ENVIRONMENT \
-e JOB_BASE_NAME \
-e MAX_JOBS="$(nproc --ignore=2)" \
-e SCCACHE_BUCKET \
-e XLA_CLANG_CACHE_S3_BUCKET_NAME \
-e CUSTOM_TEST_ARTIFACT_BUILD_DIR \
-e SKIP_SCCACHE_INITIALIZATION=1 \
-e TORCH_CUDA_ARCH_LIST \
-e PR_LABELS \
-e http_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e https_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e no_proxy="localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" \
--env-file="/tmp/github_env_${GITHUB_RUN_ID}" \
--security-opt seccomp=unconfined \
--cap-add=SYS_PTRACE \
--tty \
--detach \
--user jenkins \
-v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \
-w /var/lib/jenkins/workspace \
"${DOCKER_IMAGE}"
)
docker exec -t "${container_name}" sh -c 'sudo chown -R jenkins . && .jenkins/pytorch/build.sh'
- name: Parse ref
id: parse-ref
run: .github/scripts/parse_ref.py
- name: Display and upload binary build size statistics (Click Me)
# temporary hack: set CIRCLE_* vars, until we update
# tools/stats/print_test_stats.py to natively support GitHub Actions
env:
AWS_DEFAULT_REGION: us-east-1
IS_GHA: 1
SCRIBE_GRAPHQL_ACCESS_TOKEN: ${{ secrets.SCRIBE_GRAPHQL_ACCESS_TOKEN }}
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
CIRCLE_TAG: ${{ steps.parse-ref.outputs.tag }}
CIRCLE_WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
run: |
COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0)
export COMMIT_TIME
pip3 install requests==2.26 boto3==1.16.34
python3 -m tools.stats.upload_binary_size_to_scuba || exit 0
- name: Chown workspace
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Archive artifacts into zip
run: |
zip -1 -r artifacts.zip dist/ build/custom_test_artifacts build/lib build/bin .pytorch-test-times.json
- uses: seemethere/upload-artifact-s3@v3
name: Store PyTorch Build Artifacts on S3
with:
name: ${{ env.BUILD_ENVIRONMENT }}
retention-days: 14
if-no-files-found: error
path:
artifacts.zip
- name: Hold runner for 2 hours or until ssh sessions have drained
# Always hold for active ssh sessions
if: always()
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Chown workspace
if: always()
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Kill containers, clean up images
if: always()
run: |
# ignore expansion of "docker ps -q" since it could be empty
# shellcheck disable=SC2046
docker stop $(docker ps -q) || true
# Prune all of the docker images
docker system prune -af
- name: Hold runner for 2 hours or until ssh sessions have drained
# Always hold for active ssh sessions
if: always()
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Clean up docker images
if: always()
run: |
# Prune all of the docker images
docker system prune -af
generate-test-matrix:
runs-on: ubuntu-18.04
needs: [ciflow_should_run]
env:
TEST_RUNNER_TYPE: linux.8xlarge.nvidia.gpu
ENABLE_DISTRIBUTED_TEST: 1
ENABLE_JIT_LEGACY_TEST: 1
ENABLE_MULTIGPU_TEST: 1
ENABLE_NOGPU_NO_AVX_TEST: 1
ENABLE_NOGPU_NO_AVX2_TEST: 1
ENABLE_SLOW_TEST: 1
ENABLE_DOCS_TEST: ''
ENABLE_BACKWARDS_COMPAT_TEST: ''
ENABLE_XLA_TEST: ''
ENABLE_NOARCH_TEST: ''
NUM_TEST_SHARDS: 2
MULTIGPU_RUNNER_TYPE: linux.16xlarge.nvidia.gpu
NOGPU_RUNNER_TYPE: linux.2xlarge
PR_BODY: ${{ github.event.pull_request.body }}
outputs:
matrix: ${{ steps.set-matrix.outputs.matrix }}
render-matrix: ${{ steps.set-matrix.outputs.render-matrix }}
ignore-disabled-issues: ${{ steps.set-matrix.outputs.ignore-disabled-issues }}
container:
image: python:3.9
steps:
- name: Install dependencies
run: pip install typing-extensions==3.10
- name: Clone pytorch/pytorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
- name: Generating test matrix
id: set-matrix
run: .github/scripts/generate_pytorch_test_matrix.py
test:
needs: [calculate-docker-image, build, generate-test-matrix, ciflow_should_run]
strategy:
matrix: ${{ fromJson(needs.generate-test-matrix.outputs.matrix) }}
fail-fast: false
runs-on: ${{ matrix.runner }}
env:
DOCKER_IMAGE: ${{ needs.calculate-docker-image.outputs.docker_image }}
JOB_BASE_NAME: linux-xenial-cuda10.2-py3.6-gcc7-test
TEST_CONFIG: ${{ matrix.config }}
SHARD_NUMBER: ${{ matrix.shard }}
NUM_TEST_SHARDS: ${{ matrix.num_shards }}
PYTORCH_IGNORE_DISABLED_ISSUES: ${{ needs.generate-test-matrix.outputs.ignore-disabled-issues }}
CONTINUE_THROUGH_ERROR: ${{ github.repository == 'pytorch/pytorch' && (github.event_name == 'push' || github.event_name == 'schedule') }}
steps:
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Log in to ECR
env:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
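# retry: up to three attempts with 1s/2s pauses in between; presumably guards
# against transient ECR pull failures right after login.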
# Ensure the working directory gets chowned back to the current user
retry docker pull "${ALPINE_IMAGE}"
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
rm -rf "${GITHUB_WORKSPACE:?}/*"
rm -f ~/.ssh/authorized_keys
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: recursive
- name: Pull docker image
run: |
docker pull "${DOCKER_IMAGE}"
- name: Install nvidia driver, nvidia-docker runtime, set GPU_FLAG
if: ${{ contains(env.BUILD_ENVIRONMENT, 'cuda') && !contains(matrix.config, 'nogpu') }}
run: |
bash .github/scripts/install_nvidia_utils_linux.sh
echo "GPU_FLAG=--gpus all" >> "${GITHUB_ENV}"
- name: Determine shm-size
run: |
shm_size="1g"
case "${BUILD_ENVIRONMENT}" in
*cuda*)
shm_size="2g"
;;
*rocm*)
shm_size="8g"
;;
esac
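# Docker's default /dev/shm is only 64MB; CUDA and ROCm test runs need more
# shared memory (e.g. for DataLoader worker IPC), hence the larger sizes
# (assumption based on the build environments matched above).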
echo "SHM_SIZE=${shm_size}" >> "${GITHUB_ENV}"
- uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b
name: Download PyTorch Build Artifacts
with:
name: ${{ env.BUILD_ENVIRONMENT }}
- name: Unzip artifacts
run: |
unzip -o artifacts.zip
- name: Output disk space left
run: |
sudo df -H
- name: Parse ref
id: parse-ref
run: .github/scripts/parse_ref.py
- name: Test
env:
PR_NUMBER: ${{ github.event.pull_request.number }}
IS_GHA: 1
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
AWS_DEFAULT_REGION: us-east-1
run: |
if [[ $TEST_CONFIG == 'multigpu' ]]; then
TEST_COMMAND=.jenkins/pytorch/multigpu-test.sh
else
TEST_COMMAND=.jenkins/pytorch/test.sh
fi
if [[ $NUM_TEST_SHARDS -ne 2 ]]; then
export SHARD_NUMBER=0
fi
# detached container should get cleaned up by teardown_ec2_linux
# TODO: Stop building test binaries as part of the build phase
# SC2086 is disabled below because GPU_FLAG must expand unquoted (it may be empty)
# shellcheck disable=SC2086
container_name=$(docker run \
${GPU_FLAG:-} \
-e BUILD_ENVIRONMENT \
-e PR_NUMBER \
-e CUSTOM_TEST_ARTIFACT_BUILD_DIR \
-e GITHUB_ACTIONS \
-e IN_CI \
-e IS_GHA \
-e CIRCLE_BRANCH \
-e CIRCLE_SHA1 \
-e CIRCLE_PR_NUMBER \
-e AWS_DEFAULT_REGION \
-e IN_WHEEL_TEST \
-e SHARD_NUMBER \
-e JOB_BASE_NAME \
-e TEST_CONFIG \
-e NUM_TEST_SHARDS \
-e PYTORCH_IGNORE_DISABLED_ISSUES \
-e PR_LABELS \
-e CONTINUE_THROUGH_ERROR \
-e MAX_JOBS="$(nproc --ignore=2)" \
-e SCCACHE_BUCKET \
-e http_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e https_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e no_proxy="localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" \
-e XLA_CLANG_CACHE_S3_BUCKET_NAME \
--env-file="/tmp/github_env_${GITHUB_RUN_ID}" \
--security-opt seccomp=unconfined \
--cap-add=SYS_PTRACE \
--shm-size="${SHM_SIZE}" \
--tty \
--detach \
--name="${container_name}" \
--user jenkins \
-v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \
-w /var/lib/jenkins/workspace \
"${DOCKER_IMAGE}"
)
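# The container above was started detached with a tty so it outlives this
# step's shell; the actual test run happens via exec, and the cleanup steps
# below (or teardown_ec2_linux) remove it afterwards.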
docker exec -t "${container_name}" sh -c "sudo chown -R jenkins . && pip install dist/*.whl && ${TEST_COMMAND}"
- name: Chown workspace
if: always()
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Install render_test_results dependencies
if: always()
shell: bash
run: |
python3 -m pip install junitparser==2.1.1 rich==10.9.0
- name: "[[ Click me for rendered test results (useful for finding failing tests) ]]"
if: always()
shell: bash
# Encoding is weird on Windows; default to UTF-8 where possible
env:
PYTHONIOENCODING: "utf-8"
run: |
python3 tools/render_junit.py test/
- name: Zip test reports for upload
if: always()
env:
FILE_SUFFIX: '${{ github.job }}-${{ matrix.config }}-${{ matrix.shard }}-${{ matrix.num_shards }}-${{ matrix.runner }}'
run: |
# Remove any previous test reports if they exist
rm -f test-reports-*.zip
zip -r "test-reports-${FILE_SUFFIX}.zip" test -i '*.xml'
- uses: actions/upload-artifact@v2
name: Store Test Reports
if: always()
with:
name: test-reports-${{ matrix.config }}
retention-days: 14
if-no-files-found: error
path:
test-reports-*.zip
- uses: seemethere/upload-artifact-s3@v3
name: Store Test Reports on S3
if: always()
with:
retention-days: 14
if-no-files-found: error
path:
test-reports-*.zip
- name: Display and upload test statistics (Click Me)
if: always()
# temporary hack: set CIRCLE_* vars, until we update
# tools/stats/print_test_stats.py to natively support GitHub Actions
env:
AWS_DEFAULT_REGION: us-east-1
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
JOB_BASE_NAME: linux-xenial-cuda10.2-py3.6-gcc7-test
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
CIRCLE_TAG: ${{ steps.parse-ref.outputs.tag }}
CIRCLE_WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
shell: bash
run: |
python3 -m pip install -r requirements.txt
python3 -m pip install boto3==1.16.34
python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test
- name: Hold runner for 2 hours or until ssh sessions have drained
# Always hold for active ssh sessions
if: always()
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Chown workspace
if: always()
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Kill containers, clean up images
if: always()
run: |
# ignore expansion of "docker ps -q" since it could be empty
# shellcheck disable=SC2046
docker stop $(docker ps -q) || true
# Prune all of the docker images
docker system prune -af


@@ -0,0 +1,562 @@
# @generated DO NOT EDIT MANUALLY
# Template is at: .github/templates/linux_ci_workflow.yml.j2
# Generation script: .github/scripts/generate_ci_workflows.py
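# To regenerate after editing the template (a minimal sketch, assuming the
# script takes no arguments and rewrites the generated workflows in place):
#   python3 .github/scripts/generate_ci_workflows.py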
name: linux-xenial-cuda11.3-py3.6-gcc7
on:
pull_request:
types: [opened, synchronize, reopened, unassigned]
push:
branches:
- master
- release/*
workflow_dispatch:
env:
BUILD_ENVIRONMENT: linux-xenial-cuda11.3-py3.6-gcc7
DOCKER_IMAGE_BASE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7
SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2
XLA_CLANG_CACHE_S3_BUCKET_NAME: ossci-compiler-clang-cache-circleci-xla
TORCH_CUDA_ARCH_LIST: 5.2
IN_CI: 1
# Used only while wheel tests are being phased in; will be removed once that work completes
IN_WHEEL_TEST: 1
# Used for custom_operator, jit_hooks, custom_backend, see .jenkins/pytorch/build.sh
CUSTOM_TEST_ARTIFACT_BUILD_DIR: build/custom_test_artifacts
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
concurrency:
group: linux-xenial-cuda11.3-py3.6-gcc7-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }}
cancel-in-progress: true
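# Runs keyed to the same PR number (or commit SHA for pushes) cancel each
# other; the workflow_dispatch flag in the group key keeps manual runs in a
# separate group so CI pushes never cancel them (assumption).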
jobs:
ciflow_should_run:
runs-on: ubuntu-18.04
if: ${{ (github.repository == 'pytorch/pytorch') && ((github.event_name != 'pull_request') || (github.event.assignee.login != 'pytorchbot' ) || (github.event.action !='unassigned') || (contains(github.event.pull_request.labels.*.name, 'ciflow/all') || contains(github.event.pull_request.labels.*.name, 'ciflow/cuda') || contains(github.event.pull_request.labels.*.name, 'ciflow/default') || contains(github.event.pull_request.labels.*.name, 'ciflow/linux'))) }}
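# Gate: run only in the upstream repo, and for pull_request events only when
# the event isn't pytorchbot's "unassigned" trigger or a matching ciflow/*
# label is present (assumption about the ciflow labeling protocol).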
env:
LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
steps:
- name: noop
run: echo running ciflow_should_run
- name: print labels
run: echo "${LABELS}"
calculate-docker-image:
runs-on: linux.2xlarge
needs: [ciflow_should_run]
env:
DOCKER_BUILDKIT: 1
timeout-minutes: 90
outputs:
docker_image: ${{ steps.calculate-tag.outputs.docker_image }}
steps:
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Log in to ECR
env:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
# Ensure the working directory gets chowned back to the current user
retry docker pull "${ALPINE_IMAGE}"
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
rm -rf "${GITHUB_WORKSPACE:?}/*"
rm -f ~/.ssh/authorized_keys
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: false
- name: Calculate docker image tag
id: calculate-tag
run: |
DOCKER_TAG=$(git rev-parse HEAD:.circleci/docker)
echo "::set-output name=docker_tag::${DOCKER_TAG}"
echo "::set-output name=docker_image::${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"
- name: Check if image should be built
id: check
env:
DOCKER_TAG: ${{ steps.calculate-tag.outputs.docker_tag }}
BASE_REVISION: ${{ github.event.pull_request.base.sha || github.sha }}
run: |
set -x
# Check if image already exists, if it does then skip building it
if docker manifest inspect "${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"; then
exit 0
fi
if [[ "$BASE_REVISION" = "$(git rev-parse HEAD)" ]]; then
# if we're on the base branch then use the parent commit
MERGE_BASE=$(git rev-parse HEAD~)
else
# otherwise we're on a PR, so use the most recent base commit
MERGE_BASE=$(git merge-base HEAD "$BASE_REVISION")
fi
# Covers the case where a previous tag doesn't exist for the tree
# this is only really applicable to trees that don't have `.circleci/docker` at their merge base, e.g. nightly
if ! git rev-parse "$MERGE_BASE:.circleci/docker"; then
echo "Directory '.circleci/docker' not found in commit $MERGE_BASE, you should probably rebase onto a more recent commit"
exit 1
fi
PREVIOUS_DOCKER_TAG=$(git rev-parse "$MERGE_BASE:.circleci/docker")
# If no image exists but the hash is the same as the previous hash then we should error out here
if [[ "${PREVIOUS_DOCKER_TAG}" = "${DOCKER_TAG}" ]]; then
echo "ERROR: Something has gone wrong and the previous image isn't available for the merge-base of your branch"
echo " contact the PyTorch team to restore the original images"
exit 1
fi
echo ::set-output name=rebuild::yes
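# Net effect: rebuild only when no image exists for this tag and the
# .circleci/docker tree actually changed since the merge-base; a missing
# image with an unchanged tree is treated as an error above.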
- name: Build and push docker image
if: ${{ steps.check.outputs.rebuild }}
env:
DOCKER_TAG: ${{ steps.calculate-tag.outputs.docker_tag }}
DOCKER_SKIP_S3_UPLOAD: 1
run: |
export IMAGE_NAME=${DOCKER_IMAGE_BASE#308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/}
cd .circleci/docker && ./build_docker.sh
build:
runs-on: linux.2xlarge
needs: [calculate-docker-image, ciflow_should_run]
env:
DOCKER_IMAGE: ${{ needs.calculate-docker-image.outputs.docker_image }}
JOB_BASE_NAME: linux-xenial-cuda11.3-py3.6-gcc7-build
steps:
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Log in to ECR
env:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
# Ensure the working directory gets chowned back to the current user
retry docker pull "${ALPINE_IMAGE}"
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
rm -rf "${GITHUB_WORKSPACE:?}/*"
rm -f ~/.ssh/authorized_keys
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: recursive
- name: Pull docker image
run: |
docker pull "${DOCKER_IMAGE}"
- name: Build
run: |
# detached container should get cleaned up by teardown_ec2_linux
container_name=$(docker run \
-e BUILD_ENVIRONMENT \
-e JOB_BASE_NAME \
-e MAX_JOBS="$(nproc --ignore=2)" \
-e SCCACHE_BUCKET \
-e XLA_CLANG_CACHE_S3_BUCKET_NAME \
-e CUSTOM_TEST_ARTIFACT_BUILD_DIR \
-e SKIP_SCCACHE_INITIALIZATION=1 \
-e TORCH_CUDA_ARCH_LIST \
-e PR_LABELS \
-e http_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e https_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e no_proxy="localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" \
--env-file="/tmp/github_env_${GITHUB_RUN_ID}" \
--security-opt seccomp=unconfined \
--cap-add=SYS_PTRACE \
--tty \
--detach \
--user jenkins \
-v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \
-w /var/lib/jenkins/workspace \
"${DOCKER_IMAGE}"
)
docker exec -t "${container_name}" sh -c 'sudo chown -R jenkins . && .jenkins/pytorch/build.sh'
- name: Parse ref
id: parse-ref
run: .github/scripts/parse_ref.py
- name: Display and upload binary build size statistics (Click Me)
# temporary hack: set CIRCLE_* vars, until we update
# tools/stats/print_test_stats.py to natively support GitHub Actions
env:
AWS_DEFAULT_REGION: us-east-1
IS_GHA: 1
SCRIBE_GRAPHQL_ACCESS_TOKEN: ${{ secrets.SCRIBE_GRAPHQL_ACCESS_TOKEN }}
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
CIRCLE_TAG: ${{ steps.parse-ref.outputs.tag }}
CIRCLE_WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
run: |
COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0)
export COMMIT_TIME
pip3 install requests==2.26 boto3==1.16.34
python3 -m tools.stats.upload_binary_size_to_scuba || exit 0
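# "|| exit 0" makes the upload best-effort: a stats failure never fails the build.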
- name: Chown workspace
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Archive artifacts into zip
run: |
zip -1 -r artifacts.zip dist/ build/custom_test_artifacts build/lib build/bin .pytorch-test-times.json
- uses: seemethere/upload-artifact-s3@v3
name: Store PyTorch Build Artifacts on S3
with:
name: ${{ env.BUILD_ENVIRONMENT }}
retention-days: 14
if-no-files-found: error
path:
artifacts.zip
- name: Hold runner for 2 hours or until ssh sessions have drained
# Always hold for active ssh sessions
if: always()
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Chown workspace
if: always()
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Kill containers, clean up images
if: always()
run: |
# ignore expansion of "docker ps -q" since it could be empty
# shellcheck disable=SC2046
docker stop $(docker ps -q) || true
# Prune all of the docker images
docker system prune -af
- name: Hold runner for 2 hours or until ssh sessions have drained
# Always hold for active ssh sessions
if: always()
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Clean up docker images
if: always()
run: |
# Prune all of the docker images
docker system prune -af
generate-test-matrix:
runs-on: ubuntu-18.04
needs: [ciflow_should_run]
env:
TEST_RUNNER_TYPE: linux.8xlarge.nvidia.gpu
ENABLE_DISTRIBUTED_TEST: 1
ENABLE_JIT_LEGACY_TEST: ''
ENABLE_MULTIGPU_TEST: ''
ENABLE_NOGPU_NO_AVX_TEST: ''
ENABLE_NOGPU_NO_AVX2_TEST: ''
ENABLE_SLOW_TEST: ''
ENABLE_DOCS_TEST: ''
ENABLE_BACKWARDS_COMPAT_TEST: ''
ENABLE_XLA_TEST: ''
ENABLE_NOARCH_TEST: ''
NUM_TEST_SHARDS: 2
MULTIGPU_RUNNER_TYPE: linux.16xlarge.nvidia.gpu
NOGPU_RUNNER_TYPE: linux.2xlarge
PR_BODY: ${{ github.event.pull_request.body }}
outputs:
matrix: ${{ steps.set-matrix.outputs.matrix }}
render-matrix: ${{ steps.set-matrix.outputs.render-matrix }}
ignore-disabled-issues: ${{ steps.set-matrix.outputs.ignore-disabled-issues }}
container:
image: python:3.9
steps:
- name: Install dependencies
run: pip install typing-extensions==3.10
- name: Clone pytorch/pytorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
- name: Generating test matrix
id: set-matrix
run: .github/scripts/generate_pytorch_test_matrix.py
test:
needs: [calculate-docker-image, build, generate-test-matrix, ciflow_should_run]
strategy:
matrix: ${{ fromJson(needs.generate-test-matrix.outputs.matrix) }}
fail-fast: false
runs-on: ${{ matrix.runner }}
env:
DOCKER_IMAGE: ${{ needs.calculate-docker-image.outputs.docker_image }}
JOB_BASE_NAME: linux-xenial-cuda11.3-py3.6-gcc7-test
TEST_CONFIG: ${{ matrix.config }}
SHARD_NUMBER: ${{ matrix.shard }}
NUM_TEST_SHARDS: ${{ matrix.num_shards }}
PYTORCH_IGNORE_DISABLED_ISSUES: ${{ needs.generate-test-matrix.outputs.ignore-disabled-issues }}
CONTINUE_THROUGH_ERROR: ${{ github.repository == 'pytorch/pytorch' && (github.event_name == 'push' || github.event_name == 'schedule') }}
steps:
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Log in to ECR
env:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
# Ensure the working directory gets chowned back to the current user
retry docker pull "${ALPINE_IMAGE}"
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
rm -rf "${GITHUB_WORKSPACE:?}/*"
rm -f ~/.ssh/authorized_keys
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: recursive
- name: Pull docker image
run: |
docker pull "${DOCKER_IMAGE}"
- name: Install nvidia driver, nvidia-docker runtime, set GPU_FLAG
if: ${{ contains(env.BUILD_ENVIRONMENT, 'cuda') && !contains(matrix.config, 'nogpu') }}
run: |
bash .github/scripts/install_nvidia_utils_linux.sh
echo "GPU_FLAG=--gpus all" >> "${GITHUB_ENV}"
- name: Determine shm-size
run: |
shm_size="1g"
case "${BUILD_ENVIRONMENT}" in
*cuda*)
shm_size="2g"
;;
*rocm*)
shm_size="8g"
;;
esac
echo "SHM_SIZE=${shm_size}" >> "${GITHUB_ENV}"
- uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b
name: Download PyTorch Build Artifacts
with:
name: ${{ env.BUILD_ENVIRONMENT }}
- name: Unzip artifacts
run: |
unzip -o artifacts.zip
- name: Output disk space left
run: |
sudo df -H
- name: Parse ref
id: parse-ref
run: .github/scripts/parse_ref.py
- name: Test
env:
PR_NUMBER: ${{ github.event.pull_request.number }}
IS_GHA: 1
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
AWS_DEFAULT_REGION: us-east-1
run: |
if [[ $TEST_CONFIG == 'multigpu' ]]; then
TEST_COMMAND=.jenkins/pytorch/multigpu-test.sh
else
TEST_COMMAND=.jenkins/pytorch/test.sh
fi
if [[ $NUM_TEST_SHARDS -ne 2 ]]; then
export SHARD_NUMBER=0
fi
# detached container should get cleaned up by teardown_ec2_linux
# TODO: Stop building test binaries as part of the build phase
# SC2086 is disabled below because GPU_FLAG must expand unquoted (it may be empty)
# shellcheck disable=SC2086
container_name=$(docker run \
${GPU_FLAG:-} \
-e BUILD_ENVIRONMENT \
-e PR_NUMBER \
-e CUSTOM_TEST_ARTIFACT_BUILD_DIR \
-e GITHUB_ACTIONS \
-e IN_CI \
-e IS_GHA \
-e CIRCLE_BRANCH \
-e CIRCLE_SHA1 \
-e CIRCLE_PR_NUMBER \
-e AWS_DEFAULT_REGION \
-e IN_WHEEL_TEST \
-e SHARD_NUMBER \
-e JOB_BASE_NAME \
-e TEST_CONFIG \
-e NUM_TEST_SHARDS \
-e PYTORCH_IGNORE_DISABLED_ISSUES \
-e PR_LABELS \
-e CONTINUE_THROUGH_ERROR \
-e MAX_JOBS="$(nproc --ignore=2)" \
-e SCCACHE_BUCKET \
-e http_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e https_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e no_proxy="localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" \
-e XLA_CLANG_CACHE_S3_BUCKET_NAME \
--env-file="/tmp/github_env_${GITHUB_RUN_ID}" \
--security-opt seccomp=unconfined \
--cap-add=SYS_PTRACE \
--shm-size="${SHM_SIZE}" \
--tty \
--detach \
--name="${container_name}" \
--user jenkins \
-v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \
-w /var/lib/jenkins/workspace \
"${DOCKER_IMAGE}"
)
docker exec -t "${container_name}" sh -c "sudo chown -R jenkins . && pip install dist/*.whl && ${TEST_COMMAND}"
- name: Chown workspace
if: always()
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Install render_test_results dependencies
if: always()
shell: bash
run: |
python3 -m pip install junitparser==2.1.1 rich==10.9.0
- name: "[[ Click me for rendered test results (useful for finding failing tests) ]]"
if: always()
shell: bash
# Encoding is weird on Windows; default to UTF-8 where possible
env:
PYTHONIOENCODING: "utf-8"
run: |
python3 tools/render_junit.py test/
- name: Zip test reports for upload
if: always()
env:
FILE_SUFFIX: '${{ github.job }}-${{ matrix.config }}-${{ matrix.shard }}-${{ matrix.num_shards }}-${{ matrix.runner }}'
run: |
# Remove any previous test reports if they exist
rm -f test-reports-*.zip
zip -r "test-reports-${FILE_SUFFIX}.zip" test -i '*.xml'
- uses: actions/upload-artifact@v2
name: Store Test Reports
if: always()
with:
name: test-reports-${{ matrix.config }}
retention-days: 14
if-no-files-found: error
path:
test-reports-*.zip
- uses: seemethere/upload-artifact-s3@v3
name: Store Test Reports on S3
if: always()
with:
retention-days: 14
if-no-files-found: error
path:
test-reports-*.zip
- name: Display and upload test statistics (Click Me)
if: always()
# temporary hack: set CIRCLE_* vars, until we update
# tools/stats/print_test_stats.py to natively support GitHub Actions
env:
AWS_DEFAULT_REGION: us-east-1
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
JOB_BASE_NAME: linux-xenial-cuda11.3-py3.6-gcc7-test
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
CIRCLE_TAG: ${{ steps.parse-ref.outputs.tag }}
CIRCLE_WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
shell: bash
run: |
python3 -m pip install -r requirements.txt
python3 -m pip install boto3==1.16.34
python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test
- name: Hold runner for 2 hours or until ssh sessions have drained
# Always hold for active ssh sessions
if: always()
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Chown workspace
if: always()
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Kill containers, clean up images
if: always()
run: |
# ignore expansion of "docker ps -q" since it could be empty
# shellcheck disable=SC2046
docker stop $(docker ps -q) || true
# Prune all of the docker images
docker system prune -af


@@ -0,0 +1,709 @@
# @generated DO NOT EDIT MANUALLY
# Template is at: .github/templates/linux_ci_workflow.yml.j2
# Generation script: .github/scripts/generate_ci_workflows.py
name: linux-xenial-py3.6-gcc5.4
on:
pull_request:
types: [opened, synchronize, reopened, unassigned]
push:
branches:
- master
- release/*
workflow_dispatch:
env:
BUILD_ENVIRONMENT: linux-xenial-py3.6-gcc5.4
DOCKER_IMAGE_BASE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3.6-gcc5.4
SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2
XLA_CLANG_CACHE_S3_BUCKET_NAME: ossci-compiler-clang-cache-circleci-xla
TORCH_CUDA_ARCH_LIST: 5.2
IN_CI: 1
# Used only while wheel tests are being phased in; will be removed once that work completes
IN_WHEEL_TEST: 1
# Used for custom_operator, jit_hooks, custom_backend, see .jenkins/pytorch/build.sh
CUSTOM_TEST_ARTIFACT_BUILD_DIR: build/custom_test_artifacts
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
concurrency:
group: linux-xenial-py3.6-gcc5.4-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }}
cancel-in-progress: true
jobs:
ciflow_should_run:
runs-on: ubuntu-18.04
if: ${{ (github.repository_owner == 'pytorch') && ((github.event_name != 'pull_request') || (github.event.assignee.login != 'pytorchbot' ) || (github.event.action !='unassigned') || (contains(github.event.pull_request.labels.*.name, 'ciflow/all') || contains(github.event.pull_request.labels.*.name, 'ciflow/cpu') || contains(github.event.pull_request.labels.*.name, 'ciflow/default') || contains(github.event.pull_request.labels.*.name, 'ciflow/linux'))) }}
env:
LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
steps:
- name: noop
run: echo running ciflow_should_run
- name: print labels
run: echo "${LABELS}"
calculate-docker-image:
runs-on: linux.2xlarge
needs: [ciflow_should_run]
env:
DOCKER_BUILDKIT: 1
timeout-minutes: 90
outputs:
docker_image: ${{ steps.calculate-tag.outputs.docker_image }}
steps:
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Log in to ECR
env:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
# Ensure the working directory gets chowned back to the current user
retry docker pull "${ALPINE_IMAGE}"
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
rm -rf "${GITHUB_WORKSPACE:?}/*"
rm -f ~/.ssh/authorized_keys
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: false
- name: Calculate docker image tag
id: calculate-tag
run: |
DOCKER_TAG=$(git rev-parse HEAD:.circleci/docker)
echo "::set-output name=docker_tag::${DOCKER_TAG}"
echo "::set-output name=docker_image::${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"
- name: Check if image should be built
id: check
env:
DOCKER_TAG: ${{ steps.calculate-tag.outputs.docker_tag }}
BASE_REVISION: ${{ github.event.pull_request.base.sha || github.sha }}
run: |
set -x
# Check if image already exists, if it does then skip building it
if docker manifest inspect "${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"; then
exit 0
fi
if [[ "$BASE_REVISION" = "$(git rev-parse HEAD)" ]]; then
# if we're on the base branch then use the parent commit
MERGE_BASE=$(git rev-parse HEAD~)
else
# otherwise we're on a PR, so use the most recent base commit
MERGE_BASE=$(git merge-base HEAD "$BASE_REVISION")
fi
# Covers the case where a previous tag doesn't exist for the tree
# this is only really applicable to trees that don't have `.circleci/docker` at their merge base, e.g. nightly
if ! git rev-parse "$MERGE_BASE:.circleci/docker"; then
echo "Directory '.circleci/docker' not found in commit $MERGE_BASE, you should probably rebase onto a more recent commit"
exit 1
fi
PREVIOUS_DOCKER_TAG=$(git rev-parse "$MERGE_BASE:.circleci/docker")
# If no image exists but the hash is the same as the previous hash then we should error out here
if [[ "${PREVIOUS_DOCKER_TAG}" = "${DOCKER_TAG}" ]]; then
echo "ERROR: Something has gone wrong and the previous image isn't available for the merge-base of your branch"
echo " contact the PyTorch team to restore the original images"
exit 1
fi
echo ::set-output name=rebuild::yes
- name: Build and push docker image
if: ${{ steps.check.outputs.rebuild }}
env:
DOCKER_TAG: ${{ steps.calculate-tag.outputs.docker_tag }}
DOCKER_SKIP_S3_UPLOAD: 1
run: |
export IMAGE_NAME=${DOCKER_IMAGE_BASE#308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/}
cd .circleci/docker && ./build_docker.sh
build:
runs-on: linux.2xlarge
needs: [calculate-docker-image, ciflow_should_run]
env:
DOCKER_IMAGE: ${{ needs.calculate-docker-image.outputs.docker_image }}
JOB_BASE_NAME: linux-xenial-py3.6-gcc5.4-build
steps:
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Log in to ECR
env:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
# Ensure the working directory gets chowned back to the current user
retry docker pull "${ALPINE_IMAGE}"
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
rm -rf "${GITHUB_WORKSPACE:?}/*"
rm -f ~/.ssh/authorized_keys
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: recursive
- name: Pull docker image
run: |
docker pull "${DOCKER_IMAGE}"
- name: Build
run: |
# detached container should get cleaned up by teardown_ec2_linux
container_name=$(docker run \
-e BUILD_ENVIRONMENT \
-e JOB_BASE_NAME \
-e MAX_JOBS="$(nproc --ignore=2)" \
-e SCCACHE_BUCKET \
-e XLA_CLANG_CACHE_S3_BUCKET_NAME \
-e CUSTOM_TEST_ARTIFACT_BUILD_DIR \
-e SKIP_SCCACHE_INITIALIZATION=1 \
-e TORCH_CUDA_ARCH_LIST \
-e PR_LABELS \
-e http_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e https_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e no_proxy="localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" \
--env-file="/tmp/github_env_${GITHUB_RUN_ID}" \
--security-opt seccomp=unconfined \
--cap-add=SYS_PTRACE \
--tty \
--detach \
--user jenkins \
-v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \
-w /var/lib/jenkins/workspace \
"${DOCKER_IMAGE}"
)
docker exec -t "${container_name}" sh -c 'sudo chown -R jenkins . && .jenkins/pytorch/build.sh'
- name: Parse ref
id: parse-ref
run: .github/scripts/parse_ref.py
- name: Display and upload binary build size statistics (Click Me)
# temporary hack: set CIRCLE_* vars, until we update
# tools/stats/print_test_stats.py to natively support GitHub Actions
env:
AWS_DEFAULT_REGION: us-east-1
IS_GHA: 1
SCRIBE_GRAPHQL_ACCESS_TOKEN: ${{ secrets.SCRIBE_GRAPHQL_ACCESS_TOKEN }}
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
CIRCLE_TAG: ${{ steps.parse-ref.outputs.tag }}
CIRCLE_WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
run: |
COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0)
export COMMIT_TIME
pip3 install requests==2.26 boto3==1.16.34
python3 -m tools.stats.upload_binary_size_to_scuba || exit 0
- name: Chown workspace
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Archive artifacts into zip
run: |
zip -1 -r artifacts.zip dist/ build/custom_test_artifacts build/lib build/bin .pytorch-test-times.json
- uses: seemethere/upload-artifact-s3@v3
name: Store PyTorch Build Artifacts on S3
with:
name: ${{ env.BUILD_ENVIRONMENT }}
retention-days: 14
if-no-files-found: error
path:
artifacts.zip
- name: Hold runner for 2 hours or until ssh sessions have drained
# Always hold for active ssh sessions
if: always()
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Chown workspace
if: always()
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Kill containers, clean up images
if: always()
run: |
# ignore expansion of "docker ps -q" since it could be empty
# shellcheck disable=SC2046
docker stop $(docker ps -q) || true
# Prune all of the docker images
docker system prune -af
- name: Hold runner for 2 hours or until ssh sessions have drained
# Always hold for active ssh sessions
if: always()
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Clean up docker images
if: always()
run: |
# Prune all of the docker images
docker system prune -af
generate-test-matrix:
runs-on: ubuntu-18.04
needs: [ciflow_should_run]
env:
TEST_RUNNER_TYPE: linux.2xlarge
ENABLE_DISTRIBUTED_TEST: 1
ENABLE_JIT_LEGACY_TEST: 1
ENABLE_MULTIGPU_TEST: ''
ENABLE_NOGPU_NO_AVX_TEST: ''
ENABLE_NOGPU_NO_AVX2_TEST: ''
ENABLE_SLOW_TEST: ''
ENABLE_DOCS_TEST: 1
ENABLE_BACKWARDS_COMPAT_TEST: 1
ENABLE_XLA_TEST: ''
ENABLE_NOARCH_TEST: ''
NUM_TEST_SHARDS: 2
MULTIGPU_RUNNER_TYPE: linux.16xlarge.nvidia.gpu
NOGPU_RUNNER_TYPE: linux.2xlarge
PR_BODY: ${{ github.event.pull_request.body }}
outputs:
matrix: ${{ steps.set-matrix.outputs.matrix }}
render-matrix: ${{ steps.set-matrix.outputs.render-matrix }}
ignore-disabled-issues: ${{ steps.set-matrix.outputs.ignore-disabled-issues }}
container:
image: python:3.9
steps:
- name: Install dependencies
run: pip install typing-extensions==3.10
- name: Clone pytorch/pytorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
- name: Generating test matrix
id: set-matrix
run: .github/scripts/generate_pytorch_test_matrix.py
test:
needs: [calculate-docker-image, build, generate-test-matrix, ciflow_should_run]
strategy:
matrix: ${{ fromJson(needs.generate-test-matrix.outputs.matrix) }}
fail-fast: false
runs-on: ${{ matrix.runner }}
env:
DOCKER_IMAGE: ${{ needs.calculate-docker-image.outputs.docker_image }}
JOB_BASE_NAME: linux-xenial-py3.6-gcc5.4-test
TEST_CONFIG: ${{ matrix.config }}
SHARD_NUMBER: ${{ matrix.shard }}
NUM_TEST_SHARDS: ${{ matrix.num_shards }}
PYTORCH_IGNORE_DISABLED_ISSUES: ${{ needs.generate-test-matrix.outputs.ignore-disabled-issues }}
CONTINUE_THROUGH_ERROR: ${{ github.repository == 'pytorch/pytorch' && (github.event_name == 'push' || github.event_name == 'schedule') }}
steps:
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Log in to ECR
env:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
# Ensure the working directory gets chowned back to the current user
retry docker pull "${ALPINE_IMAGE}"
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
rm -rf "${GITHUB_WORKSPACE:?}/*"
rm -f ~/.ssh/authorized_keys
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: recursive
- name: Pull docker image
run: |
docker pull "${DOCKER_IMAGE}"
- name: Install nvidia driver, nvidia-docker runtime, set GPU_FLAG
if: ${{ contains(env.BUILD_ENVIRONMENT, 'cuda') && !contains(matrix.config, 'nogpu') }}
run: |
bash .github/scripts/install_nvidia_utils_linux.sh
echo "GPU_FLAG=--gpus all" >> "${GITHUB_ENV}"
- name: Determine shm-size
run: |
shm_size="1g"
case "${BUILD_ENVIRONMENT}" in
*cuda*)
shm_size="2g"
;;
*rocm*)
shm_size="8g"
;;
esac
echo "SHM_SIZE=${shm_size}" >> "${GITHUB_ENV}"
- uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b
name: Download PyTorch Build Artifacts
with:
name: ${{ env.BUILD_ENVIRONMENT }}
- name: Unzip artifacts
run: |
unzip -o artifacts.zip
- name: Output disk space left
run: |
sudo df -H
- name: Parse ref
id: parse-ref
run: .github/scripts/parse_ref.py
- name: Test
env:
PR_NUMBER: ${{ github.event.pull_request.number }}
IS_GHA: 1
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
AWS_DEFAULT_REGION: us-east-1
run: |
if [[ $TEST_CONFIG == 'multigpu' ]]; then
TEST_COMMAND=.jenkins/pytorch/multigpu-test.sh
else
TEST_COMMAND=.jenkins/pytorch/test.sh
fi
if [[ $NUM_TEST_SHARDS -ne 2 ]]; then
export SHARD_NUMBER=0
fi
# detached container should get cleaned up by teardown_ec2_linux
# TODO: Stop building test binaries as part of the build phase
# SC2086 is disabled below because GPU_FLAG must expand unquoted (it may be empty)
# shellcheck disable=SC2086
container_name=$(docker run \
${GPU_FLAG:-} \
-e BUILD_ENVIRONMENT \
-e PR_NUMBER \
-e CUSTOM_TEST_ARTIFACT_BUILD_DIR \
-e GITHUB_ACTIONS \
-e IN_CI \
-e IS_GHA \
-e CIRCLE_BRANCH \
-e CIRCLE_SHA1 \
-e CIRCLE_PR_NUMBER \
-e AWS_DEFAULT_REGION \
-e IN_WHEEL_TEST \
-e SHARD_NUMBER \
-e JOB_BASE_NAME \
-e TEST_CONFIG \
-e NUM_TEST_SHARDS \
-e PYTORCH_IGNORE_DISABLED_ISSUES \
-e PR_LABELS \
-e CONTINUE_THROUGH_ERROR \
-e MAX_JOBS="$(nproc --ignore=2)" \
-e SCCACHE_BUCKET \
-e http_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e https_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e no_proxy="localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" \
-e XLA_CLANG_CACHE_S3_BUCKET_NAME \
--env-file="/tmp/github_env_${GITHUB_RUN_ID}" \
--security-opt seccomp=unconfined \
--cap-add=SYS_PTRACE \
--shm-size="${SHM_SIZE}" \
--tty \
--detach \
--name="${container_name}" \
--user jenkins \
-v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \
-w /var/lib/jenkins/workspace \
"${DOCKER_IMAGE}"
)
docker exec -t "${container_name}" sh -c "sudo chown -R jenkins . && pip install dist/*.whl && ${TEST_COMMAND}"
- name: Chown workspace
if: always()
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Install render_test_results dependencies
if: always()
shell: bash
run: |
python3 -m pip install junitparser==2.1.1 rich==10.9.0
- name: "[[ Click me for rendered test results (useful for finding failing tests) ]]"
if: always()
shell: bash
# Encoding is weird on Windows; default to UTF-8 where possible
env:
PYTHONIOENCODING: "utf-8"
run: |
python3 tools/render_junit.py test/
- name: Zip test reports for upload
if: always()
env:
FILE_SUFFIX: '${{ github.job }}-${{ matrix.config }}-${{ matrix.shard }}-${{ matrix.num_shards }}-${{ matrix.runner }}'
run: |
# Remove any previous test reports if they exist
rm -f test-reports-*.zip
zip -r "test-reports-${FILE_SUFFIX}.zip" test -i '*.xml'
- uses: actions/upload-artifact@v2
name: Store Test Reports
if: always()
with:
name: test-reports-${{ matrix.config }}
retention-days: 14
if-no-files-found: error
path:
test-reports-*.zip
- uses: seemethere/upload-artifact-s3@v3
name: Store Test Reports on S3
if: always()
with:
retention-days: 14
if-no-files-found: error
path:
test-reports-*.zip
- name: Display and upload test statistics (Click Me)
if: always()
# temporary hack: set CIRCLE_* vars, until we update
# tools/stats/print_test_stats.py to natively support GitHub Actions
env:
AWS_DEFAULT_REGION: us-east-1
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
JOB_BASE_NAME: linux-xenial-py3.6-gcc5.4-test
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
CIRCLE_TAG: ${{ steps.parse-ref.outputs.tag }}
CIRCLE_WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
shell: bash
run: |
python3 -m pip install -r requirements.txt
python3 -m pip install boto3==1.16.34
python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test
- name: Hold runner for 2 hours or until ssh sessions have drained
# Always hold for active ssh sessions
if: always()
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Chown workspace
if: always()
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Kill containers, clean up images
if: always()
run: |
# ignore expansion of "docker ps -q" since it could be empty
# shellcheck disable=SC2046
docker stop $(docker ps -q) || true
# Prune all of the docker images
docker system prune -af
build-docs:
runs-on: linux.2xlarge
strategy:
matrix:
docs_type: [cpp, python]
needs: [calculate-docker-image, build, ciflow_should_run]
env:
DOCKER_IMAGE: ${{ needs.calculate-docker-image.outputs.docker_image }}
DOCS_TYPE: ${{ matrix.docs_type }}
steps:
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Log in to ECR
env:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
# Ensure the working directory gets chowned back to the current user
retry docker pull "${ALPINE_IMAGE}"
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
rm -rf "${GITHUB_WORKSPACE:?}/*"
rm -f ~/.ssh/authorized_keys
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: recursive
- name: Pull docker image
run: |
docker pull "${DOCKER_IMAGE}"
- uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b
name: Download PyTorch Build Artifacts
with:
name: ${{ env.BUILD_ENVIRONMENT }}
- name: Unzip artifacts
run: |
unzip -o artifacts.zip
- name: Build ${{ matrix.docs_type }} docs
run: |
set -ex
time docker pull "${DOCKER_IMAGE}" > /dev/null
echo "${GITHUB_REF}"
ref=${GITHUB_REF##*/}
target=${ref//v}
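# ${GITHUB_REF##*/} keeps only the last path segment (branch or tag name);
# ${ref//v} then deletes every "v", so "v1.10" becomes "1.10" (note this
# would also strip a "v" occurring elsewhere in the name).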
# detached container should get cleaned up by teardown_ec2_linux
container_name=$(docker run \
-e BUILD_ENVIRONMENT \
-e CUSTOM_TEST_ARTIFACT_BUILD_DIR \
-e IN_CI \
-e MAX_JOBS="$(nproc --ignore=2)" \
-e CIRCLE_SHA1="$GITHUB_SHA" \
-e DOCS_VERSION="${target}" \
-e DOCS_TYPE \
-e PR_LABELS \
--env-file="/tmp/github_env_${GITHUB_RUN_ID}" \
--security-opt seccomp=unconfined \
--cap-add=SYS_PTRACE \
--tty \
--detach \
--user jenkins \
-v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \
-w /var/lib/jenkins/workspace \
"${DOCKER_IMAGE}"
)
docker exec -t "${container_name}" bash -c "sudo chown -R jenkins . && pip install dist/*.whl && ./.circleci/scripts/${DOCS_TYPE}_doc_push_script.sh"
- name: Chown workspace
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- uses: seemethere/upload-artifact-s3@v3
name: Upload Python Docs Preview
if: ${{ github.event_name == 'pull_request' && matrix.docs_type == 'python' }}
with:
retention-days: 14
s3-bucket: doc-previews
if-no-files-found: error
path: pytorch.github.io/docs/merge/
s3-prefix: pytorch/${{ github.event.pull_request.number }}
- uses: seemethere/upload-artifact-s3@v3
name: Upload C++ Docs Preview
if: ${{ github.event_name == 'pull_request' && matrix.docs_type == 'cpp' }}
with:
retention-days: 14
if-no-files-found: error
s3-bucket: doc-previews
path: cppdocs/
s3-prefix: pytorch/${{ github.event.pull_request.number }}/cppdocs
- name: Archive artifacts into zip
run: |
zip -r "docs_${DOCS_TYPE}.zip" "${GITHUB_WORKSPACE}/pytorch.github.io" "${GITHUB_WORKSPACE}/cppdocs"
- uses: actions/upload-artifact@v2
name: Store PyTorch Build Artifacts
with:
name: docs_${{ matrix.docs_type }}
path: docs_${{ matrix.docs_type }}.zip
if-no-files-found: error
- name: Hold runner for 2 hours or until ssh sessions have drained
# Always hold for active ssh sessions
if: always()
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Chown workspace
if: always()
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Kill containers, clean up images
if: always()
run: |
# ignore expansion of "docker ps -q" since it could be empty
# shellcheck disable=SC2046
docker stop $(docker ps -q) || true
# Prune all of the docker images
docker system prune -af


@@ -0,0 +1,367 @@
# @generated DO NOT EDIT MANUALLY
# Template is at: .github/templates/bazel_ci_workflow.yml.j2
# Generation script: .github/scripts/generate_ci_workflows.py
name: linux-xenial-py3.6-gcc7-bazel-test
on:
pull_request:
types: [opened, synchronize, reopened, unassigned]
push:
branches:
- master
- release/*
workflow_dispatch:
env:
BUILD_ENVIRONMENT: linux-xenial-py3.6-gcc7-bazel-test
DOCKER_IMAGE_BASE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-bionic-cuda10.2-cudnn7-py3.9-gcc7
SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2
XLA_CLANG_CACHE_S3_BUCKET_NAME: ossci-compiler-clang-cache-circleci-xla
TORCH_CUDA_ARCH_LIST: 5.2
IN_CI: 1
# Used only while wheel tests are being phased in; will be removed once that work completes
IN_WHEEL_TEST: 1
# Used for custom_operator, jit_hooks, custom_backend, see .jenkins/pytorch/build.sh
CUSTOM_TEST_ARTIFACT_BUILD_DIR: build/custom_test_artifacts
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
concurrency:
group: linux-xenial-py3.6-gcc7-bazel-test-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }}
cancel-in-progress: true
jobs:
ciflow_should_run:
runs-on: ubuntu-18.04
if: ${{ (github.repository == 'pytorch/pytorch') && ((github.event_name != 'pull_request') || (github.event.assignee.login != 'pytorchbot' ) || (github.event.action !='unassigned') || (contains(github.event.pull_request.labels.*.name, 'ciflow/all') || contains(github.event.pull_request.labels.*.name, 'ciflow/bazel') || contains(github.event.pull_request.labels.*.name, 'ciflow/cpu') || contains(github.event.pull_request.labels.*.name, 'ciflow/default') || contains(github.event.pull_request.labels.*.name, 'ciflow/linux'))) }}
env:
LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
steps:
- name: noop
run: echo running ciflow_should_run
- name: print labels
run: echo "${LABELS}"
calculate-docker-image:
runs-on: linux.2xlarge
needs: [ciflow_should_run]
env:
DOCKER_BUILDKIT: 1
timeout-minutes: 90
outputs:
docker_image: ${{ steps.calculate-tag.outputs.docker_image }}
steps:
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Log in to ECR
env:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
# Ensure the working directory gets chowned back to the current user
retry docker pull "${ALPINE_IMAGE}"
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
rm -rf "${GITHUB_WORKSPACE:?}/*"
rm -f ~/.ssh/authorized_keys
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: false
- name: Calculate docker image tag
id: calculate-tag
run: |
DOCKER_TAG=$(git rev-parse HEAD:.circleci/docker)
echo "::set-output name=docker_tag::${DOCKER_TAG}"
echo "::set-output name=docker_image::${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"
- name: Check if image should be built
id: check
env:
DOCKER_TAG: ${{ steps.calculate-tag.outputs.docker_tag }}
BASE_REVISION: ${{ github.event.pull_request.base.sha || github.sha }}
run: |
set -x
# Check if image already exists, if it does then skip building it
if docker manifest inspect "${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"; then
exit 0
fi
if [[ "$BASE_REVISION" = "$(git rev-parse HEAD)" ]]; then
# if we're on the base branch then use the parent commit
MERGE_BASE=$(git rev-parse HEAD~)
else
# otherwise we're on a PR, so use the most recent base commit
MERGE_BASE=$(git merge-base HEAD "$BASE_REVISION")
fi
# Covers the case where a previous tag doesn't exist for the tree
# this is only really applicable to trees that don't have `.circleci/docker` at their merge base, e.g. nightly
if ! git rev-parse "$MERGE_BASE:.circleci/docker"; then
echo "Directory '.circleci/docker' not found in commit $MERGE_BASE, you should probably rebase onto a more recent commit"
exit 1
fi
PREVIOUS_DOCKER_TAG=$(git rev-parse "$MERGE_BASE:.circleci/docker")
# If no image exists but the hash is the same as the previous hash then we should error out here
if [[ "${PREVIOUS_DOCKER_TAG}" = "${DOCKER_TAG}" ]]; then
echo "ERROR: Something has gone wrong and the previous image isn't available for the merge-base of your branch"
echo " contact the PyTorch team to restore the original images"
exit 1
fi
echo ::set-output name=rebuild::yes
- name: Build and push docker image
if: ${{ steps.check.outputs.rebuild }}
env:
DOCKER_TAG: ${{ steps.calculate-tag.outputs.docker_tag }}
DOCKER_SKIP_S3_UPLOAD: 1
run: |
export IMAGE_NAME=${DOCKER_IMAGE_BASE#308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/}
cd .circleci/docker && ./build_docker.sh
# building and testing in a single job since bazel runs only a small subset of tests
build-and-test:
runs-on: linux.2xlarge
needs: [calculate-docker-image, ciflow_should_run]
env:
DOCKER_IMAGE: ${{ needs.calculate-docker-image.outputs.docker_image }}
JOB_BASE_NAME: linux-xenial-py3.6-gcc7-bazel-test-build-and-test
NUM_TEST_SHARDS: 1
CONTINUE_THROUGH_ERROR: ${{ github.repository == 'pytorch/pytorch' && (github.event_name == 'push' || github.event_name == 'schedule') }}
steps:
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Log in to ECR
env:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
# Ensure the working directory gets chowned back to the current user
retry docker pull "${ALPINE_IMAGE}"
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
rm -rf "${GITHUB_WORKSPACE:?}/*"
rm -f ~/.ssh/authorized_keys
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: recursive
- name: Pull docker image
run: |
docker pull "${DOCKER_IMAGE}"
- name: Determine shm-size
run: |
shm_size="1g"
case "${BUILD_ENVIRONMENT}" in
*cuda*)
shm_size="2g"
;;
*rocm*)
shm_size="8g"
;;
esac
echo "SHM_SIZE=${shm_size}" >> "${GITHUB_ENV}"
- name: Output disk space left
run: |
sudo df -H
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Build
run: |
# detached container should get cleaned up by teardown_ec2_linux
container_name=$(docker run \
-e BUILD_ENVIRONMENT \
-e JOB_BASE_NAME \
-e MAX_JOBS="$(nproc --ignore=2)" \
-e SCCACHE_BUCKET \
-e CUSTOM_TEST_ARTIFACT_BUILD_DIR \
-e PR_LABELS \
-e SKIP_SCCACHE_INITIALIZATION=1 \
-e TORCH_CUDA_ARCH_LIST \
-e http_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e https_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e no_proxy="localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" \
--env-file="/tmp/github_env_${GITHUB_RUN_ID}" \
--security-opt seccomp=unconfined \
--cap-add=SYS_PTRACE \
--tty \
--detach \
--user jenkins \
-v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \
-w /var/lib/jenkins/workspace \
"${DOCKER_IMAGE}"
)
docker exec -t "${container_name}" sh -c 'sudo chown -R jenkins . && sudo chown -R jenkins /dev && .jenkins/pytorch/build.sh'
- name: Parse ref
id: parse-ref
run: .github/scripts/parse_ref.py
- name: Display and upload binary build size statistics (Click Me)
# temporary hack: set CIRCLE_* vars, until we update
# tools/stats/print_test_stats.py to natively support GitHub Actions
env:
AWS_DEFAULT_REGION: us-east-1
SCRIBE_GRAPHQL_ACCESS_TOKEN: ${{ secrets.SCRIBE_GRAPHQL_ACCESS_TOKEN }}
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
CIRCLE_TAG: ${{ steps.parse-ref.outputs.tag }}
CIRCLE_WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
run: |
COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0)
export COMMIT_TIME
pip3 install requests==2.26
python3 -m tools.stats.upload_binary_size_to_scuba || exit 0
- name: Test
run: |
# detached container should get cleaned up by teardown_ec2_linux
export SHARD_NUMBER=0
# TODO: Stop building test binaries as part of the build phase
# Make sure we copy test results from bazel-testlogs symlink to
# a regular directory ./test/test-reports
container_name=$(docker run \
-e BUILD_ENVIRONMENT \
-e CUSTOM_TEST_ARTIFACT_BUILD_DIR \
-e GITHUB_ACTIONS \
-e IN_CI \
-e SHARD_NUMBER \
-e JOB_BASE_NAME \
-e MAX_JOBS="$(nproc --ignore=2)" \
-e SCCACHE_BUCKET \
-e CONTINUE_THROUGH_ERROR \
-e PR_LABELS \
-e http_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e https_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e no_proxy="localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" \
--env-file="/tmp/github_env_${GITHUB_RUN_ID}" \
--security-opt seccomp=unconfined \
--cap-add=SYS_PTRACE \
--shm-size="${SHM_SIZE}" \
--tty \
--detach \
--user jenkins \
-v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \
-w /var/lib/jenkins/workspace \
"${DOCKER_IMAGE}"
)
docker exec -t "${container_name}" sh -c 'sudo chown -R jenkins . && sudo chown -R jenkins /dev && .jenkins/pytorch/test.sh && cp -Lr ./bazel-testlogs ./test/test-reports'
- name: Chown workspace
if: always()
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Zip test reports for upload
if: always()
env:
FILE_SUFFIX: 'bazel-${{ github.job }}'
run: |
# Remove any previous test reports if they exist
rm -f test-reports-*.zip
zip -r "test-reports-${FILE_SUFFIX}.zip" test -i '*.xml'
- uses: actions/upload-artifact@v2
name: Store Test Reports
if: always()
with:
name: test-reports-bazel
retention-days: 14
if-no-files-found: error
path:
test-reports-*.zip
- uses: seemethere/upload-artifact-s3@v3
name: Store Test Reports on S3
if: always()
with:
retention-days: 14
if-no-files-found: error
path:
test-reports-*.zip
- name: Display and upload test statistics (Click Me)
if: always()
# temporary hack: set CIRCLE_* vars, until we update
# tools/stats/print_test_stats.py to natively support GitHub Actions
env:
AWS_DEFAULT_REGION: us-east-1
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
JOB_BASE_NAME: linux-xenial-py3.6-gcc7-bazel-test-test
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
CIRCLE_TAG: ${{ steps.parse-ref.outputs.tag }}
CIRCLE_WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
shell: bash
run: |
python3 -m pip install -r requirements.txt
python3 -m pip install boto3==1.16.34
python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test
- name: Hold runner for 2 hours or until ssh sessions have drained
# Always hold for active ssh sessions
if: always()
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Chown workspace
if: always()
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Kill containers, clean up images
if: always()
run: |
# ignore expansion of "docker ps -q" since it could be empty
# shellcheck disable=SC2046
docker stop $(docker ps -q) || true
# Prune all of the docker images
docker system prune -af
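
A pattern that recurs throughout these generated workflows is the inline `retry` helper wrapped around `docker pull`: it simply re-runs its argument list up to three times, sleeping briefly between attempts. A slightly generalized sketch, with a configurable attempt count and exponential backoff (both are illustrative additions, not part of the workflow):

```
# Retry a command up to $1 times, doubling the delay after each failure.
retry () {
  local attempts=$1; shift
  local delay=1 i
  for ((i = 1; i <= attempts; i++)); do
    "$@" && return 0
    if (( i < attempts )); then
      sleep "${delay}"
      delay=$((delay * 2))
    fi
  done
  return 1
}

retry 3 docker pull "${ALPINE_IMAGE}"
```
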


@@ -0,0 +1,562 @@
# @generated DO NOT EDIT MANUALLY
# Template is at: .github/templates/linux_ci_workflow.yml.j2
# Generation script: .github/scripts/generate_ci_workflows.py
name: parallelnative-linux-xenial-py3.6-gcc5.4
on:
pull_request:
types: [unassigned]
push:
branches:
- master
- release/*
workflow_dispatch:
env:
BUILD_ENVIRONMENT: parallelnative-linux-xenial-py3.6-gcc5.4
DOCKER_IMAGE_BASE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3.6-gcc5.4
SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2
XLA_CLANG_CACHE_S3_BUCKET_NAME: ossci-compiler-clang-cache-circleci-xla
TORCH_CUDA_ARCH_LIST: 5.2
IN_CI: 1
# This is used only during the phase of adding wheel tests; it will be removed once completed
IN_WHEEL_TEST: 1
# Used for custom_operator, jit_hooks, custom_backend, see .jenkins/pytorch/build.sh
CUSTOM_TEST_ARTIFACT_BUILD_DIR: build/custom_test_artifacts
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
concurrency:
group: parallelnative-linux-xenial-py3.6-gcc5.4-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }}
cancel-in-progress: true
jobs:
ciflow_should_run:
runs-on: ubuntu-18.04
if: ${{ (github.repository == 'pytorch/pytorch') && ((github.event_name != 'pull_request') || (github.event.assignee.login != 'pytorchbot' ) || (github.event.action !='unassigned') || (contains(github.event.pull_request.labels.*.name, 'ciflow/all') || contains(github.event.pull_request.labels.*.name, 'ciflow/cpu') || contains(github.event.pull_request.labels.*.name, 'ciflow/linux'))) }}
env:
LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
steps:
- name: noop
run: echo running ciflow_should_run
- name: print labels
run: echo "${LABELS}"
calculate-docker-image:
runs-on: linux.2xlarge
needs: [ciflow_should_run]
env:
DOCKER_BUILDKIT: 1
timeout-minutes: 90
outputs:
docker_image: ${{ steps.calculate-tag.outputs.docker_image }}
steps:
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Log in to ECR
env:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
# Ensure the working directory gets chowned back to the current user
retry docker pull "${ALPINE_IMAGE}"
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
rm -rf "${GITHUB_WORKSPACE:?}/*"
rm -f ~/.ssh/authorized_keys
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: false
- name: Calculate docker image tag
id: calculate-tag
run: |
DOCKER_TAG=$(git rev-parse HEAD:.circleci/docker)
echo "::set-output name=docker_tag::${DOCKER_TAG}"
echo "::set-output name=docker_image::${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"
- name: Check if image should be built
id: check
env:
DOCKER_TAG: ${{ steps.calculate-tag.outputs.docker_tag }}
BASE_REVISION: ${{ github.event.pull_request.base.sha || github.sha }}
run: |
set -x
# Check if image already exists, if it does then skip building it
if docker manifest inspect "${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"; then
exit 0
fi
if [[ "$BASE_REVISION" = "$(git rev-parse HEAD)" ]]; then
# if we're on the base branch then use the parent commit
MERGE_BASE=$(git rev-parse HEAD~)
else
# otherwise we're on a PR, so use the most recent base commit
MERGE_BASE=$(git merge-base HEAD "$BASE_REVISION")
fi
# Covers the case where a previous tag doesn't exist for the tree
# this is only really applicable to trees that don't have `.circleci/docker` at their merge base, e.g. nightly
if ! git rev-parse "$MERGE_BASE:.circleci/docker"; then
echo "Directory '.circleci/docker' not found in commit $MERGE_BASE, you should probably rebase onto a more recent commit"
exit 1
fi
PREVIOUS_DOCKER_TAG=$(git rev-parse "$MERGE_BASE:.circleci/docker")
# If no image exists but the hash is the same as the previous hash then we should error out here
if [[ "${PREVIOUS_DOCKER_TAG}" = "${DOCKER_TAG}" ]]; then
echo "ERROR: Something has gone wrong and the previous image isn't available for the merge-base of your branch"
echo " contact the PyTorch team to restore the original images"
exit 1
fi
echo "::set-output name=rebuild::yes"
- name: Build and push docker image
if: ${{ steps.check.outputs.rebuild }}
env:
DOCKER_TAG: ${{ steps.calculate-tag.outputs.docker_tag }}
DOCKER_SKIP_S3_UPLOAD: 1
run: |
export IMAGE_NAME=${DOCKER_IMAGE_BASE#308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/}
cd .circleci/docker && ./build_docker.sh
build:
runs-on: linux.2xlarge
needs: [calculate-docker-image, ciflow_should_run]
env:
DOCKER_IMAGE: ${{ needs.calculate-docker-image.outputs.docker_image }}
JOB_BASE_NAME: parallelnative-linux-xenial-py3.6-gcc5.4-build
steps:
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Log in to ECR
env:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
# Ensure the working directory gets chowned back to the current user
retry docker pull "${ALPINE_IMAGE}"
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
rm -rf "${GITHUB_WORKSPACE:?}/*"
rm -f ~/.ssh/authorized_keys
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: recursive
- name: Pull docker image
run: |
docker pull "${DOCKER_IMAGE}"
- name: Build
run: |
# detached container should get cleaned up by teardown_ec2_linux
container_name=$(docker run \
-e BUILD_ENVIRONMENT \
-e JOB_BASE_NAME \
-e MAX_JOBS="$(nproc --ignore=2)" \
-e SCCACHE_BUCKET \
-e XLA_CLANG_CACHE_S3_BUCKET_NAME \
-e CUSTOM_TEST_ARTIFACT_BUILD_DIR \
-e SKIP_SCCACHE_INITIALIZATION=1 \
-e TORCH_CUDA_ARCH_LIST \
-e PR_LABELS \
-e http_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e https_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e no_proxy="localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" \
--env-file="/tmp/github_env_${GITHUB_RUN_ID}" \
--security-opt seccomp=unconfined \
--cap-add=SYS_PTRACE \
--tty \
--detach \
--user jenkins \
-v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \
-w /var/lib/jenkins/workspace \
"${DOCKER_IMAGE}"
)
docker exec -t "${container_name}" sh -c 'sudo chown -R jenkins . && .jenkins/pytorch/build.sh'
- name: Parse ref
id: parse-ref
run: .github/scripts/parse_ref.py
- name: Display and upload binary build size statistics (Click Me)
# temporary hack: set CIRCLE_* vars, until we update
# tools/stats/print_test_stats.py to natively support GitHub Actions
env:
AWS_DEFAULT_REGION: us-east-1
IS_GHA: 1
SCRIBE_GRAPHQL_ACCESS_TOKEN: ${{ secrets.SCRIBE_GRAPHQL_ACCESS_TOKEN }}
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
CIRCLE_TAG: ${{ steps.parse-ref.outputs.tag }}
CIRCLE_WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
run: |
COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0)
export COMMIT_TIME
pip3 install requests==2.26 boto3==1.16.34
python3 -m tools.stats.upload_binary_size_to_scuba || exit 0
- name: Chown workspace
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Archive artifacts into zip
run: |
zip -1 -r artifacts.zip dist/ build/custom_test_artifacts build/lib build/bin .pytorch-test-times.json
- uses: seemethere/upload-artifact-s3@v3
name: Store PyTorch Build Artifacts on S3
with:
name: ${{ env.BUILD_ENVIRONMENT }}
retention-days: 14
if-no-files-found: error
path:
artifacts.zip
- name: Hold runner for 2 hours or until ssh sessions have drained
# Always hold for active ssh sessions
if: always()
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Chown workspace
if: always()
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Kill containers, clean up images
if: always()
run: |
# ignore expansion of "docker ps -q" since it could be empty
# shellcheck disable=SC2046
docker stop $(docker ps -q) || true
# Prune all of the docker images
docker system prune -af
- name: Hold runner for 2 hours or until ssh sessions have drained
# Always hold for active ssh sessions
if: always()
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Clean up docker images
if: always()
run: |
# Prune all of the docker images
docker system prune -af
generate-test-matrix:
runs-on: ubuntu-18.04
needs: [ciflow_should_run]
env:
TEST_RUNNER_TYPE: linux.2xlarge
ENABLE_DISTRIBUTED_TEST: 1
ENABLE_JIT_LEGACY_TEST: ''
ENABLE_MULTIGPU_TEST: ''
ENABLE_NOGPU_NO_AVX_TEST: ''
ENABLE_NOGPU_NO_AVX2_TEST: ''
ENABLE_SLOW_TEST: ''
ENABLE_DOCS_TEST: ''
ENABLE_BACKWARDS_COMPAT_TEST: ''
ENABLE_XLA_TEST: ''
ENABLE_NOARCH_TEST: ''
NUM_TEST_SHARDS: 1
MULTIGPU_RUNNER_TYPE: linux.16xlarge.nvidia.gpu
NOGPU_RUNNER_TYPE: linux.2xlarge
PR_BODY: ${{ github.event.pull_request.body }}
outputs:
matrix: ${{ steps.set-matrix.outputs.matrix }}
render-matrix: ${{ steps.set-matrix.outputs.render-matrix }}
ignore-disabled-issues: ${{ steps.set-matrix.outputs.ignore-disabled-issues }}
container:
image: python:3.9
steps:
- name: Install dependencies
run: pip install typing-extensions==3.10
- name: Clone pytorch/pytorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
- name: Generating test matrix
id: set-matrix
run: .github/scripts/generate_pytorch_test_matrix.py
test:
needs: [calculate-docker-image, build, generate-test-matrix, ciflow_should_run]
strategy:
matrix: ${{ fromJson(needs.generate-test-matrix.outputs.matrix) }}
fail-fast: false
runs-on: ${{ matrix.runner }}
env:
DOCKER_IMAGE: ${{ needs.calculate-docker-image.outputs.docker_image }}
JOB_BASE_NAME: parallelnative-linux-xenial-py3.6-gcc5.4-test
TEST_CONFIG: ${{ matrix.config }}
SHARD_NUMBER: ${{ matrix.shard }}
NUM_TEST_SHARDS: ${{ matrix.num_shards }}
PYTORCH_IGNORE_DISABLED_ISSUES: ${{ needs.generate-test-matrix.outputs.ignore-disabled-issues }}
CONTINUE_THROUGH_ERROR: ${{ github.repository == 'pytorch/pytorch' && (github.event_name == 'push' || github.event_name == 'schedule') }}
steps:
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Log in to ECR
env:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
# Ensure the working directory gets chowned back to the current user
retry docker pull "${ALPINE_IMAGE}"
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
rm -rf "${GITHUB_WORKSPACE:?}/*"
rm -f ~/.ssh/authorized_keys
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: recursive
- name: Pull docker image
run: |
docker pull "${DOCKER_IMAGE}"
- name: Install nvidia driver, nvidia-docker runtime, set GPU_FLAG
if: ${{ contains(env.BUILD_ENVIRONMENT, 'cuda') && !contains(matrix.config, 'nogpu') }}
run: |
bash .github/scripts/install_nvidia_utils_linux.sh
echo "GPU_FLAG=--gpus all" >> "${GITHUB_ENV}"
- name: Determine shm-size
run: |
shm_size="1g"
case "${BUILD_ENVIRONMENT}" in
*cuda*)
shm_size="2g"
;;
*rocm*)
shm_size="8g"
;;
esac
echo "SHM_SIZE=${shm_size}" >> "${GITHUB_ENV}"
- uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b
name: Download PyTorch Build Artifacts
with:
name: ${{ env.BUILD_ENVIRONMENT }}
- name: Unzip artifacts
run: |
unzip -o artifacts.zip
- name: Output disk space left
run: |
sudo df -H
- name: Parse ref
id: parse-ref
run: .github/scripts/parse_ref.py
- name: Test
env:
PR_NUMBER: ${{ github.event.pull_request.number }}
IS_GHA: 1
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
AWS_DEFAULT_REGION: us-east-1
run: |
if [[ $TEST_CONFIG == 'multigpu' ]]; then
TEST_COMMAND=.jenkins/pytorch/multigpu-test.sh
else
TEST_COMMAND=.jenkins/pytorch/test.sh
fi
if [[ $NUM_TEST_SHARDS -ne 2 ]]; then
export SHARD_NUMBER=0
fi
# detached container should get cleaned up by teardown_ec2_linux
# TODO: Stop building test binaries as part of the build phase
# Used for GPU_FLAG since that doesn't play nice
# shellcheck disable=SC2086
container_name=$(docker run \
${GPU_FLAG:-} \
-e BUILD_ENVIRONMENT \
-e PR_NUMBER \
-e CUSTOM_TEST_ARTIFACT_BUILD_DIR \
-e GITHUB_ACTIONS \
-e IN_CI \
-e IS_GHA \
-e CIRCLE_BRANCH \
-e CIRCLE_SHA1 \
-e CIRCLE_PR_NUMBER \
-e AWS_DEFAULT_REGION \
-e IN_WHEEL_TEST \
-e SHARD_NUMBER \
-e JOB_BASE_NAME \
-e TEST_CONFIG \
-e NUM_TEST_SHARDS \
-e PYTORCH_IGNORE_DISABLED_ISSUES \
-e PR_LABELS \
-e CONTINUE_THROUGH_ERROR \
-e MAX_JOBS="$(nproc --ignore=2)" \
-e SCCACHE_BUCKET \
-e http_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e https_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e no_proxy="localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" \
-e XLA_CLANG_CACHE_S3_BUCKET_NAME \
--env-file="/tmp/github_env_${GITHUB_RUN_ID}" \
--security-opt seccomp=unconfined \
--cap-add=SYS_PTRACE \
--shm-size="${SHM_SIZE}" \
--tty \
--detach \
--name="${container_name}" \
--user jenkins \
-v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \
-w /var/lib/jenkins/workspace \
"${DOCKER_IMAGE}"
)
docker exec -t "${container_name}" sh -c "sudo chown -R jenkins . && pip install dist/*.whl && ${TEST_COMMAND}"
- name: Chown workspace
if: always()
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Install render_test_results dependencies
if: always()
shell: bash
run: |
python3 -m pip install junitparser==2.1.1 rich==10.9.0
- name: "[[ Click me for rendered test results (useful for finding failing tests) ]]"
if: always()
shell: bash
# Encoding is weird on Windows; just try to default to utf-8 if possible
env:
PYTHONIOENCODING: "utf-8"
run: |
python3 tools/render_junit.py test/
- name: Zip test reports for upload
if: always()
env:
FILE_SUFFIX: '${{ github.job }}-${{ matrix.config }}-${{ matrix.shard }}-${{ matrix.num_shards }}-${{ matrix.runner }}'
run: |
# Remove any previous test reports if they exist
rm -f test-reports-*.zip
zip -r "test-reports-${FILE_SUFFIX}.zip" test -i '*.xml'
- uses: actions/upload-artifact@v2
name: Store Test Reports
if: always()
with:
name: test-reports-${{ matrix.config }}
retention-days: 14
if-no-files-found: error
path:
test-reports-*.zip
- uses: seemethere/upload-artifact-s3@v3
name: Store Test Reports on S3
if: always()
with:
retention-days: 14
if-no-files-found: error
path:
test-reports-*.zip
- name: Display and upload test statistics (Click Me)
if: always()
# temporary hack: set CIRCLE_* vars, until we update
# tools/stats/print_test_stats.py to natively support GitHub Actions
env:
AWS_DEFAULT_REGION: us-east-1
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
JOB_BASE_NAME: parallelnative-linux-xenial-py3.6-gcc5.4-test
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
CIRCLE_TAG: ${{ steps.parse-ref.outputs.tag }}
CIRCLE_WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
shell: bash
run: |
python3 -m pip install -r requirements.txt
python3 -m pip install boto3==1.16.34
python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test
- name: Hold runner for 2 hours or until ssh sessions have drained
# Always hold for active ssh sessions
if: always()
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Chown workspace
if: always()
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Kill containers, clean up images
if: always()
run: |
# ignore expansion of "docker ps -q" since it could be empty
# shellcheck disable=SC2046
docker stop $(docker ps -q) || true
# Prune all of the docker images
docker system prune -af
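
These workflows predate the `$GITHUB_OUTPUT` file mechanism, so steps publish values with the `::set-output` workflow command, which GitHub Actions parses out of stdout; later steps read them via `steps.<id>.outputs.<name>`, and dependent jobs via `needs.<job>.outputs.<name>`. A condensed sketch of the flow used by `calculate-docker-image` (identifiers taken from the workflow above):

```
# Inside the step with id: calculate-tag
DOCKER_TAG=$(git rev-parse HEAD:.circleci/docker)
echo "::set-output name=docker_tag::${DOCKER_TAG}"
echo "::set-output name=docker_image::${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"
# Consumed as ${{ steps.calculate-tag.outputs.docker_tag }} in the same job,
# and as ${{ needs.calculate-docker-image.outputs.docker_image }} downstream.
```

`::set-output` has since been deprecated in favor of appending to `$GITHUB_OUTPUT`, but it was the supported mechanism when these files were generated.
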


@@ -0,0 +1,281 @@
# @generated DO NOT EDIT MANUALLY
# Template is at: .github/templates/linux_ci_workflow.yml.j2
# Generation script: .github/scripts/generate_ci_workflows.py
name: periodic-libtorch-linux-xenial-cuda11.1-py3.6-gcc7
on:
pull_request:
types: [unassigned]
schedule:
- cron: 45 0,4,8,12,16,20 * * *
workflow_dispatch:
env:
BUILD_ENVIRONMENT: periodic-libtorch-linux-xenial-cuda11.1-py3.6-gcc7
DOCKER_IMAGE_BASE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda11.1-cudnn8-py3-gcc7
SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2
XLA_CLANG_CACHE_S3_BUCKET_NAME: ossci-compiler-clang-cache-circleci-xla
TORCH_CUDA_ARCH_LIST: 5.2
IN_CI: 1
# This is used only during the phase of adding wheel tests; it will be removed once completed
IN_WHEEL_TEST: 1
# Used for custom_operator, jit_hooks, custom_backend, see .jenkins/pytorch/build.sh
CUSTOM_TEST_ARTIFACT_BUILD_DIR: build/custom_test_artifacts
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
concurrency:
group: periodic-libtorch-linux-xenial-cuda11.1-py3.6-gcc7-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }}
cancel-in-progress: true
jobs:
ciflow_should_run:
runs-on: ubuntu-18.04
if: ${{ (github.repository == 'pytorch/pytorch') && ((github.event_name != 'pull_request') || (github.event.assignee.login != 'pytorchbot' ) || (github.event.action !='unassigned') || (contains(github.event.pull_request.labels.*.name, 'ciflow/all') || contains(github.event.pull_request.labels.*.name, 'ciflow/cuda') || contains(github.event.pull_request.labels.*.name, 'ciflow/libtorch') || contains(github.event.pull_request.labels.*.name, 'ciflow/linux') || contains(github.event.pull_request.labels.*.name, 'ciflow/scheduled'))) }}
env:
LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
steps:
- name: noop
run: echo running ciflow_should_run
- name: print labels
run: echo "${LABELS}"
calculate-docker-image:
runs-on: linux.2xlarge
needs: [ciflow_should_run]
env:
DOCKER_BUILDKIT: 1
timeout-minutes: 90
outputs:
docker_image: ${{ steps.calculate-tag.outputs.docker_image }}
steps:
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Log in to ECR
env:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
# Ensure the working directory gets chowned back to the current user
retry docker pull "${ALPINE_IMAGE}"
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
rm -rf "${GITHUB_WORKSPACE:?}/*"
rm -f ~/.ssh/authorized_keys
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: false
- name: Calculate docker image tag
id: calculate-tag
run: |
DOCKER_TAG=$(git rev-parse HEAD:.circleci/docker)
echo "::set-output name=docker_tag::${DOCKER_TAG}"
echo "::set-output name=docker_image::${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"
- name: Check if image should be built
id: check
env:
DOCKER_TAG: ${{ steps.calculate-tag.outputs.docker_tag }}
BASE_REVISION: ${{ github.event.pull_request.base.sha || github.sha }}
run: |
set -x
# Check if image already exists, if it does then skip building it
if docker manifest inspect "${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"; then
exit 0
fi
if [[ "$BASE_REVISION" = "$(git rev-parse HEAD)" ]]; then
# if we're on the base branch then use the parent commit
MERGE_BASE=$(git rev-parse HEAD~)
else
# otherwise we're on a PR, so use the most recent base commit
MERGE_BASE=$(git merge-base HEAD "$BASE_REVISION")
fi
# Covers the case where a previous tag doesn't exist for the tree
# this is only really applicable to trees that don't have `.circleci/docker` at their merge base, e.g. nightly
if ! git rev-parse "$MERGE_BASE:.circleci/docker"; then
echo "Directory '.circleci/docker' not found in commit $MERGE_BASE, you should probably rebase onto a more recent commit"
exit 1
fi
PREVIOUS_DOCKER_TAG=$(git rev-parse "$MERGE_BASE:.circleci/docker")
# If no image exists but the hash is the same as the previous hash then we should error out here
if [[ "${PREVIOUS_DOCKER_TAG}" = "${DOCKER_TAG}" ]]; then
echo "ERROR: Something has gone wrong and the previous image isn't available for the merge-base of your branch"
echo " contact the PyTorch team to restore the original images"
exit 1
fi
echo "::set-output name=rebuild::yes"
- name: Build and push docker image
if: ${{ steps.check.outputs.rebuild }}
env:
DOCKER_TAG: ${{ steps.calculate-tag.outputs.docker_tag }}
DOCKER_SKIP_S3_UPLOAD: 1
run: |
export IMAGE_NAME=${DOCKER_IMAGE_BASE#308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/}
cd .circleci/docker && ./build_docker.sh
build:
runs-on: linux.2xlarge
needs: [calculate-docker-image, ciflow_should_run]
env:
DOCKER_IMAGE: ${{ needs.calculate-docker-image.outputs.docker_image }}
JOB_BASE_NAME: periodic-libtorch-linux-xenial-cuda11.1-py3.6-gcc7-build
steps:
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Log in to ECR
env:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
# Ensure the working directory gets chowned back to the current user
retry docker pull "${ALPINE_IMAGE}"
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
rm -rf "${GITHUB_WORKSPACE:?}/*"
rm -f ~/.ssh/authorized_keys
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: recursive
- name: Pull docker image
run: |
docker pull "${DOCKER_IMAGE}"
- name: Build
run: |
# detached container should get cleaned up by teardown_ec2_linux
container_name=$(docker run \
-e BUILD_ENVIRONMENT \
-e JOB_BASE_NAME \
-e MAX_JOBS="$(nproc --ignore=2)" \
-e SCCACHE_BUCKET \
-e XLA_CLANG_CACHE_S3_BUCKET_NAME \
-e CUSTOM_TEST_ARTIFACT_BUILD_DIR \
-e SKIP_SCCACHE_INITIALIZATION=1 \
-e TORCH_CUDA_ARCH_LIST \
-e PR_LABELS \
-e http_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e https_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e no_proxy="localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" \
--env-file="/tmp/github_env_${GITHUB_RUN_ID}" \
--security-opt seccomp=unconfined \
--cap-add=SYS_PTRACE \
--tty \
--detach \
--user jenkins \
-v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \
-w /var/lib/jenkins/workspace \
"${DOCKER_IMAGE}"
)
docker exec -t "${container_name}" sh -c 'sudo chown -R jenkins . && .jenkins/pytorch/build.sh'
- name: Parse ref
id: parse-ref
run: .github/scripts/parse_ref.py
- name: Display and upload binary build size statistics (Click Me)
# temporary hack: set CIRCLE_* vars, until we update
# tools/stats/print_test_stats.py to natively support GitHub Actions
env:
AWS_DEFAULT_REGION: us-east-1
IS_GHA: 1
SCRIBE_GRAPHQL_ACCESS_TOKEN: ${{ secrets.SCRIBE_GRAPHQL_ACCESS_TOKEN }}
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
CIRCLE_TAG: ${{ steps.parse-ref.outputs.tag }}
CIRCLE_WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
run: |
COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0)
export COMMIT_TIME
pip3 install requests==2.26 boto3==1.16.34
python3 -m tools.stats.upload_binary_size_to_scuba || exit 0
- name: Chown workspace
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Hold runner for 2 hours or until ssh sessions have drained
# Always hold for active ssh sessions
if: always()
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Chown workspace
if: always()
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Kill containers, clean up images
if: always()
run: |
# ignore expansion of "docker ps -q" since it could be empty
# shellcheck disable=SC2046
docker stop $(docker ps -q) || true
# Prune all of the docker images
docker system prune -af
- name: Hold runner for 2 hours or until ssh sessions have drained
# Always hold for active ssh sessions
if: always()
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Clean up docker images
if: always()
run: |
# Prune all of the docker images
docker system prune -af
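
The `Display EC2 information` step in each job queries the instance metadata service at 169.254.169.254 directly, i.e. IMDSv1 style. On instances configured to require IMDSv2, the same lookup needs a short-lived session token first; a sketch of that variant (not used by these workflows):

```
# IMDSv2: obtain a session token, then present it on each metadata request.
TOKEN=$(curl -fsSL -X PUT "http://169.254.169.254/latest/api/token" \
  -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")
curl -fsSL -H "X-aws-ec2-metadata-token: ${TOKEN}" \
  "http://169.254.169.254/latest/meta-data/instance-type"
```
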


@@ -0,0 +1,560 @@
# @generated DO NOT EDIT MANUALLY
# Template is at: .github/templates/linux_ci_workflow.yml.j2
# Generation script: .github/scripts/generate_ci_workflows.py
name: periodic-linux-xenial-cuda11.1-py3.6-gcc7
on:
pull_request:
types: [unassigned]
schedule:
- cron: 45 0,4,8,12,16,20 * * *
workflow_dispatch:
env:
BUILD_ENVIRONMENT: periodic-linux-xenial-cuda11.1-py3.6-gcc7
DOCKER_IMAGE_BASE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda11.1-cudnn8-py3-gcc7
SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2
XLA_CLANG_CACHE_S3_BUCKET_NAME: ossci-compiler-clang-cache-circleci-xla
TORCH_CUDA_ARCH_LIST: 5.2
IN_CI: 1
# This is used only during the phase of adding wheel tests; it will be removed once completed
IN_WHEEL_TEST: 1
# Used for custom_operator, jit_hooks, custom_backend, see .jenkins/pytorch/build.sh
CUSTOM_TEST_ARTIFACT_BUILD_DIR: build/custom_test_artifacts
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
concurrency:
group: periodic-linux-xenial-cuda11.1-py3.6-gcc7-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }}
cancel-in-progress: true
jobs:
ciflow_should_run:
runs-on: ubuntu-18.04
if: ${{ (github.repository == 'pytorch/pytorch') && ((github.event_name != 'pull_request') || (github.event.assignee.login != 'pytorchbot' ) || (github.event.action !='unassigned') || (contains(github.event.pull_request.labels.*.name, 'ciflow/all') || contains(github.event.pull_request.labels.*.name, 'ciflow/cuda') || contains(github.event.pull_request.labels.*.name, 'ciflow/linux') || contains(github.event.pull_request.labels.*.name, 'ciflow/scheduled'))) }}
env:
LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
steps:
- name: noop
run: echo running ciflow_should_run
- name: print labels
run: echo "${LABELS}"
calculate-docker-image:
runs-on: linux.2xlarge
needs: [ciflow_should_run]
env:
DOCKER_BUILDKIT: 1
timeout-minutes: 90
outputs:
docker_image: ${{ steps.calculate-tag.outputs.docker_image }}
steps:
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Log in to ECR
env:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
# Ensure the working directory gets chowned back to the current user
retry docker pull "${ALPINE_IMAGE}"
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
rm -rf "${GITHUB_WORKSPACE:?}/*"
rm -f ~/.ssh/authorized_keys
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: false
- name: Calculate docker image tag
id: calculate-tag
run: |
DOCKER_TAG=$(git rev-parse HEAD:.circleci/docker)
echo "::set-output name=docker_tag::${DOCKER_TAG}"
echo "::set-output name=docker_image::${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"
- name: Check if image should be built
id: check
env:
DOCKER_TAG: ${{ steps.calculate-tag.outputs.docker_tag }}
BASE_REVISION: ${{ github.event.pull_request.base.sha || github.sha }}
run: |
set -x
# Check if image already exists, if it does then skip building it
if docker manifest inspect "${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"; then
exit 0
fi
if [[ "$BASE_REVISION" = "$(git rev-parse HEAD)" ]]; then
# if we're on the base branch then use the parent commit
MERGE_BASE=$(git rev-parse HEAD~)
else
# otherwise we're on a PR, so use the most recent base commit
MERGE_BASE=$(git merge-base HEAD "$BASE_REVISION")
fi
# Covers the case where a previous tag doesn't exist for the tree
# this is only really applicable to trees that don't have `.circleci/docker` at their merge base, e.g. nightly
if ! git rev-parse "$MERGE_BASE:.circleci/docker"; then
echo "Directory '.circleci/docker' not found in commit $MERGE_BASE, you should probably rebase onto a more recent commit"
exit 1
fi
PREVIOUS_DOCKER_TAG=$(git rev-parse "$MERGE_BASE:.circleci/docker")
# If no image exists but the hash is the same as the previous hash then we should error out here
if [[ "${PREVIOUS_DOCKER_TAG}" = "${DOCKER_TAG}" ]]; then
echo "ERROR: Something has gone wrong and the previous image isn't available for the merge-base of your branch"
echo " contact the PyTorch team to restore the original images"
exit 1
fi
echo "::set-output name=rebuild::yes"
- name: Build and push docker image
if: ${{ steps.check.outputs.rebuild }}
env:
DOCKER_TAG: ${{ steps.calculate-tag.outputs.docker_tag }}
DOCKER_SKIP_S3_UPLOAD: 1
run: |
export IMAGE_NAME=${DOCKER_IMAGE_BASE#308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/}
cd .circleci/docker && ./build_docker.sh
build:
runs-on: linux.2xlarge
needs: [calculate-docker-image, ciflow_should_run]
env:
DOCKER_IMAGE: ${{ needs.calculate-docker-image.outputs.docker_image }}
JOB_BASE_NAME: periodic-linux-xenial-cuda11.1-py3.6-gcc7-build
steps:
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Log in to ECR
env:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
# Ensure the working directory gets chowned back to the current user
retry docker pull "${ALPINE_IMAGE}"
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
rm -rf "${GITHUB_WORKSPACE:?}/*"
rm -f ~/.ssh/authorized_keys
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: recursive
- name: Pull docker image
run: |
docker pull "${DOCKER_IMAGE}"
- name: Build
run: |
# detached container should get cleaned up by teardown_ec2_linux
container_name=$(docker run \
-e BUILD_ENVIRONMENT \
-e JOB_BASE_NAME \
-e MAX_JOBS="$(nproc --ignore=2)" \
-e SCCACHE_BUCKET \
-e XLA_CLANG_CACHE_S3_BUCKET_NAME \
-e CUSTOM_TEST_ARTIFACT_BUILD_DIR \
-e SKIP_SCCACHE_INITIALIZATION=1 \
-e TORCH_CUDA_ARCH_LIST \
-e PR_LABELS \
-e http_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e https_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e no_proxy="localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" \
--env-file="/tmp/github_env_${GITHUB_RUN_ID}" \
--security-opt seccomp=unconfined \
--cap-add=SYS_PTRACE \
--tty \
--detach \
--user jenkins \
-v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \
-w /var/lib/jenkins/workspace \
"${DOCKER_IMAGE}"
)
docker exec -t "${container_name}" sh -c 'sudo chown -R jenkins . && .jenkins/pytorch/build.sh'
- name: Parse ref
id: parse-ref
run: .github/scripts/parse_ref.py
- name: Display and upload binary build size statistics (Click Me)
# temporary hack: set CIRCLE_* vars, until we update
# tools/stats/print_test_stats.py to natively support GitHub Actions
env:
AWS_DEFAULT_REGION: us-east-1
IS_GHA: 1
SCRIBE_GRAPHQL_ACCESS_TOKEN: ${{ secrets.SCRIBE_GRAPHQL_ACCESS_TOKEN }}
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
CIRCLE_TAG: ${{ steps.parse-ref.outputs.tag }}
CIRCLE_WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
run: |
COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0)
export COMMIT_TIME
pip3 install requests==2.26 boto3==1.16.34
python3 -m tools.stats.upload_binary_size_to_scuba || exit 0
- name: Chown workspace
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Archive artifacts into zip
run: |
zip -1 -r artifacts.zip dist/ build/custom_test_artifacts build/lib build/bin .pytorch-test-times.json
- uses: seemethere/upload-artifact-s3@v3
name: Store PyTorch Build Artifacts on S3
with:
name: ${{ env.BUILD_ENVIRONMENT }}
retention-days: 14
if-no-files-found: error
path:
artifacts.zip
- name: Hold runner for 2 hours or until ssh sessions have drained
# Always hold for active ssh sessions
if: always()
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Chown workspace
if: always()
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Kill containers, clean up images
if: always()
run: |
# ignore expansion of "docker ps -q" since it could be empty
# shellcheck disable=SC2046
docker stop $(docker ps -q) || true
# Prune all of the docker images
docker system prune -af
- name: Hold runner for 2 hours or until ssh sessions have drained
# Always hold for active ssh sessions
if: always()
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Clean up docker images
if: always()
run: |
# Prune all of the docker images
docker system prune -af
generate-test-matrix:
runs-on: ubuntu-18.04
needs: [ciflow_should_run]
env:
TEST_RUNNER_TYPE: linux.8xlarge.nvidia.gpu
ENABLE_DISTRIBUTED_TEST: 1
ENABLE_JIT_LEGACY_TEST: ''
ENABLE_MULTIGPU_TEST: ''
ENABLE_NOGPU_NO_AVX_TEST: ''
ENABLE_NOGPU_NO_AVX2_TEST: ''
ENABLE_SLOW_TEST: ''
ENABLE_DOCS_TEST: ''
ENABLE_BACKWARDS_COMPAT_TEST: ''
ENABLE_XLA_TEST: ''
ENABLE_NOARCH_TEST: ''
NUM_TEST_SHARDS: 2
MULTIGPU_RUNNER_TYPE: linux.16xlarge.nvidia.gpu
NOGPU_RUNNER_TYPE: linux.2xlarge
PR_BODY: ${{ github.event.pull_request.body }}
outputs:
matrix: ${{ steps.set-matrix.outputs.matrix }}
render-matrix: ${{ steps.set-matrix.outputs.render-matrix }}
ignore-disabled-issues: ${{ steps.set-matrix.outputs.ignore-disabled-issues }}
container:
image: python:3.9
steps:
- name: Install dependencies
run: pip install typing-extensions==3.10
- name: Clone pytorch/pytorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
- name: Generating test matrix
id: set-matrix
run: .github/scripts/generate_pytorch_test_matrix.py
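
The `generate-test-matrix` job publishes a JSON matrix through `::set-output`, and the `test` job below fans it out with `fromJson(...)` so each entry becomes one runner with its own `config`, `shard`, and `num_shards`. The real output comes from `.github/scripts/generate_pytorch_test_matrix.py`; the shape sketched below is hypothetical and only meant to show the kind of object `fromJson` consumes:

```
# Hypothetical two-shard matrix for this workflow's default config:
cat <<'EOF'
{
  "include": [
    {"config": "default", "shard": 1, "num_shards": 2, "runner": "linux.8xlarge.nvidia.gpu"},
    {"config": "default", "shard": 2, "num_shards": 2, "runner": "linux.8xlarge.nvidia.gpu"}
  ]
}
EOF
```
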
test:
needs: [calculate-docker-image, build, generate-test-matrix, ciflow_should_run]
strategy:
matrix: ${{ fromJson(needs.generate-test-matrix.outputs.matrix) }}
fail-fast: false
runs-on: ${{ matrix.runner }}
env:
DOCKER_IMAGE: ${{ needs.calculate-docker-image.outputs.docker_image }}
JOB_BASE_NAME: periodic-linux-xenial-cuda11.1-py3.6-gcc7-test
TEST_CONFIG: ${{ matrix.config }}
SHARD_NUMBER: ${{ matrix.shard }}
NUM_TEST_SHARDS: ${{ matrix.num_shards }}
PYTORCH_IGNORE_DISABLED_ISSUES: ${{ needs.generate-test-matrix.outputs.ignore-disabled-issues }}
CONTINUE_THROUGH_ERROR: ${{ github.repository == 'pytorch/pytorch' && (github.event_name == 'push' || github.event_name == 'schedule') }}
steps:
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Log in to ECR
env:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
# Ensure the working directory gets chowned back to the current user
retry docker pull "${ALPINE_IMAGE}"
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
rm -rf "${GITHUB_WORKSPACE:?}/*"
rm -f ~/.ssh/authorized_keys
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: recursive
- name: Pull docker image
run: |
docker pull "${DOCKER_IMAGE}"
- name: Install nvidia driver, nvidia-docker runtime, set GPU_FLAG
if: ${{ contains(env.BUILD_ENVIRONMENT, 'cuda') && !contains(matrix.config, 'nogpu') }}
run: |
bash .github/scripts/install_nvidia_utils_linux.sh
echo "GPU_FLAG=--gpus all" >> "${GITHUB_ENV}"
- name: Determine shm-size
run: |
shm_size="1g"
case "${BUILD_ENVIRONMENT}" in
*cuda*)
shm_size="2g"
;;
*rocm*)
shm_size="8g"
;;
esac
echo "SHM_SIZE=${shm_size}" >> "${GITHUB_ENV}"
- uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b
name: Download PyTorch Build Artifacts
with:
name: ${{ env.BUILD_ENVIRONMENT }}
- name: Unzip artifacts
run: |
unzip -o artifacts.zip
- name: Output disk space left
run: |
sudo df -H
- name: Parse ref
id: parse-ref
run: .github/scripts/parse_ref.py
- name: Test
env:
PR_NUMBER: ${{ github.event.pull_request.number }}
IS_GHA: 1
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
AWS_DEFAULT_REGION: us-east-1
run: |
if [[ $TEST_CONFIG == 'multigpu' ]]; then
TEST_COMMAND=.jenkins/pytorch/multigpu-test.sh
else
TEST_COMMAND=.jenkins/pytorch/test.sh
fi
if [[ $NUM_TEST_SHARDS -ne 2 ]]; then
export SHARD_NUMBER=0
fi
# detached container should get cleaned up by teardown_ec2_linux
# TODO: Stop building test binaries as part of the build phase
# Used for GPU_FLAG since that doesn't play nice
# shellcheck disable=SC2086
container_name=$(docker run \
${GPU_FLAG:-} \
-e BUILD_ENVIRONMENT \
-e PR_NUMBER \
-e CUSTOM_TEST_ARTIFACT_BUILD_DIR \
-e GITHUB_ACTIONS \
-e IN_CI \
-e IS_GHA \
-e CIRCLE_BRANCH \
-e CIRCLE_SHA1 \
-e CIRCLE_PR_NUMBER \
-e AWS_DEFAULT_REGION \
-e IN_WHEEL_TEST \
-e SHARD_NUMBER \
-e JOB_BASE_NAME \
-e TEST_CONFIG \
-e NUM_TEST_SHARDS \
-e PYTORCH_IGNORE_DISABLED_ISSUES \
-e PR_LABELS \
-e CONTINUE_THROUGH_ERROR \
-e MAX_JOBS="$(nproc --ignore=2)" \
-e SCCACHE_BUCKET \
-e http_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e https_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e no_proxy="localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" \
-e XLA_CLANG_CACHE_S3_BUCKET_NAME \
--env-file="/tmp/github_env_${GITHUB_RUN_ID}" \
--security-opt seccomp=unconfined \
--cap-add=SYS_PTRACE \
--shm-size="${SHM_SIZE}" \
--tty \
--detach \
--name="${container_name}" \
--user jenkins \
-v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \
-w /var/lib/jenkins/workspace \
"${DOCKER_IMAGE}"
)
docker exec -t "${container_name}" sh -c "sudo chown -R jenkins . && pip install dist/*.whl && ${TEST_COMMAND}"
- name: Chown workspace
if: always()
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Install render_test_results dependencies
if: always()
shell: bash
run: |
python3 -m pip install junitparser==2.1.1 rich==10.9.0
- name: "[[ Click me for rendered test results (useful for finding failing tests) ]]"
if: always()
shell: bash
# Encoding is weird on Windows; just try to default to utf-8 if possible
env:
PYTHONIOENCODING: "utf-8"
run: |
python3 tools/render_junit.py test/
- name: Zip test reports for upload
if: always()
env:
FILE_SUFFIX: '${{ github.job }}-${{ matrix.config }}-${{ matrix.shard }}-${{ matrix.num_shards }}-${{ matrix.runner }}'
run: |
# Remove any previous test reports if they exist
rm -f test-reports-*.zip
zip -r "test-reports-${FILE_SUFFIX}.zip" test -i '*.xml'
- uses: actions/upload-artifact@v2
name: Store Test Reports
if: always()
with:
name: test-reports-${{ matrix.config }}
retention-days: 14
if-no-files-found: error
path:
test-reports-*.zip
- uses: seemethere/upload-artifact-s3@v3
name: Store Test Reports on S3
if: always()
with:
retention-days: 14
if-no-files-found: error
path:
test-reports-*.zip
- name: Display and upload test statistics (Click Me)
if: always()
# temporary hack: set CIRCLE_* vars, until we update
# tools/stats/print_test_stats.py to natively support GitHub Actions
env:
AWS_DEFAULT_REGION: us-east-1
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
JOB_BASE_NAME: periodic-linux-xenial-cuda11.1-py3.6-gcc7-test
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
CIRCLE_TAG: ${{ steps.parse-ref.outputs.tag }}
CIRCLE_WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
shell: bash
run: |
python3 -m pip install -r requirements.txt
python3 -m pip install boto3==1.16.34
python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test
- name: Hold runner for 2 hours or until ssh sessions have drained
# Always hold for active ssh sessions
if: always()
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Chown workspace
if: always()
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Kill containers, clean up images
if: always()
run: |
# ignore expansion of "docker ps -q" since it could be empty
# shellcheck disable=SC2046
docker stop $(docker ps -q) || true
# Prune all of the docker images
docker system prune -af

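The test step above uses a detach-then-exec pattern: the container is started with `--detach` so the always-running teardown step ("Kill containers, clean up images") can stop it even if the job is cancelled mid-test, and the actual test command runs through `docker exec`. A minimal sketch of the pattern, with a placeholder image and command standing in for the real `DOCKER_IMAGE` and `TEST_COMMAND`:

```
#!/usr/bin/env bash
# Detach-then-exec sketch; the image and command are placeholders.
set -euo pipefail
IMAGE="alpine:3.14"
TEST_COMMAND="echo running tests"
# --detach keeps the container alive independently of this script,
# so a separate cleanup step can still reach it by name or ID.
container_name=$(docker run --detach --tty "${IMAGE}" sh)
# The real work happens via exec inside the already-running container.
docker exec -t "${container_name}" sh -c "${TEST_COMMAND}"
docker stop "${container_name}"
```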

@@ -0,0 +1,314 @@
# @generated DO NOT EDIT MANUALLY
# Template is at: .github/templates/windows_ci_workflow.yml.j2
# Generation script: .github/scripts/generate_ci_workflows.py
name: periodic-win-vs2019-cuda11.1-py3
on:
pull_request:
types: [unassigned]
schedule:
- cron: 45 0,4,8,12,16,20 * * *
workflow_dispatch:
env:
BUILD_ENVIRONMENT: periodic-win-vs2019-cuda11.1-py3
BUILD_WHEEL: 1
CUDA_VERSION: "11.1"
IN_CI: 1
INSTALL_WINDOWS_SDK: 1
PYTHON_VERSION: "3.8"
PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
SCCACHE_BUCKET: "ossci-compiler-cache"
VC_PRODUCT: "BuildTools"
VC_VERSION: ""
VS_VERSION: "16.8.6"
VC_YEAR: "2019"
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
no_proxy: localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock
TORCH_CUDA_ARCH_LIST: "7.0"
USE_CUDA: 1
concurrency:
group: periodic-win-vs2019-cuda11.1-py3-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }}
cancel-in-progress: true
jobs:
ciflow_should_run:
runs-on: ubuntu-18.04
if: ${{ (github.repository == 'pytorch/pytorch') && ((github.event_name != 'pull_request') || (github.event.assignee.login != 'pytorchbot' ) || (github.event.action !='unassigned') || (contains(github.event.pull_request.labels.*.name, 'ciflow/all') || contains(github.event.pull_request.labels.*.name, 'ciflow/cuda') || contains(github.event.pull_request.labels.*.name, 'ciflow/scheduled') || contains(github.event.pull_request.labels.*.name, 'ciflow/win'))) }}
steps:
- name: noop
run: echo running ciflow_should_run
build:
runs-on: "windows.4xlarge"
defaults:
run:
working-directory: pytorch-${{ github.run_id }}
needs: [ciflow_should_run]
env:
JOB_BASE_NAME: periodic-win-vs2019-cuda11.1-py3-build
http_proxy: "http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128"
https_proxy: "http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128"
steps:
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
submodules: recursive
path: pytorch-${{ github.run_id }}
# deep clone, to allow use of git merge-base
fetch-depth: 0
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Install Visual Studio 2019 toolchain
shell: powershell
run: |
.\.circleci\scripts\vs_install.ps1
- name: Install Cuda
shell: bash
run: |
.circleci/scripts/windows_cuda_install.sh
- name: Install Cudnn
shell: bash
run: |
.circleci/scripts/windows_cudnn_install.sh
- name: Build
shell: bash
env:
PYTORCH_FINAL_PACKAGE_DIR: /c/${{ github.run_id }}/build-results/
run: |
.jenkins/pytorch/win-build.sh
# Upload to github so that people can click and download artifacts
- name: Upload artifacts to Github
if: always()
uses: actions/upload-artifact@v2
# Don't fail on upload to GH since it's only for user convenience
continue-on-error: true
with:
retention-days: 14
if-no-files-found: error
name: ${{ env.BUILD_ENVIRONMENT }}
path: C:\${{ github.run_id }}\build-results
- name: Upload artifacts to s3
if: always()
uses: seemethere/upload-artifact-s3@v3
with:
retention-days: 14
if-no-files-found: error
name: ${{ env.BUILD_ENVIRONMENT }}
path: C:\${{ github.run_id }}\build-results
- name: Wait until all sessions have drained
shell: powershell
if: always()
timeout-minutes: 120
run: |
.github\scripts\wait_for_ssh_to_drain.ps1
- name: Kill active ssh sessions if still around (Useful if workflow was cancelled)
shell: powershell
if: always()
run: |
.github\scripts\kill_active_ssh_sessions.ps1
- name: Cleanup build-results and workspaces
if: always()
shell: bash
env:
PYTORCH_FINAL_PACKAGE_DIR: /c/${{ github.run_id }}/build-results/
# Should remove the entirety of pytorch-${{ github.run_id }}
run: |
rm -rf "${PYTORCH_FINAL_PACKAGE_DIR}"
rm -rf ./*
generate-test-matrix:
needs: [ciflow_should_run]
runs-on: ubuntu-18.04
env:
TEST_RUNNER_TYPE: windows.8xlarge.nvidia.gpu
NUM_TEST_SHARDS: 2
NUM_TEST_SHARDS_ON_PULL_REQUEST: 2
PR_BODY: ${{ github.event.pull_request.body }}
outputs:
matrix: ${{ steps.set-matrix.outputs.matrix }}
render-matrix: ${{ steps.set-matrix.outputs.render-matrix }}
ignore-disabled-issues: ${{ steps.set-matrix.outputs.ignore-disabled-issues }}
container:
image: python:3.9
steps:
- name: Install dependencies
run: pip install typing-extensions==3.10
- name: Clone pytorch/pytorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
- name: Generating test matrix
id: set-matrix
run: .github/scripts/generate_pytorch_test_matrix.py
test:
env:
JOB_BASE_NAME: periodic-win-vs2019-cuda11.1-py3-test
SHARD_NUMBER: ${{ matrix.shard }}
NUM_TEST_SHARDS: ${{ matrix.num_shards }}
TEST_CONFIG: ${{ matrix.config }}
http_proxy: "http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128"
https_proxy: "http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128"
RUN_SMOKE_TESTS_ONLY_ON_PR: False
PYTORCH_IGNORE_DISABLED_ISSUES: ${{ needs.generate-test-matrix.outputs.ignore-disabled-issues }}
CONTINUE_THROUGH_ERROR: ${{ github.repository == 'pytorch/pytorch' && (github.event_name == 'push' || github.event_name == 'schedule') }}
needs: [build, generate-test-matrix, ciflow_should_run]
strategy:
matrix: ${{ fromJson(needs.generate-test-matrix.outputs.matrix) }}
fail-fast: false
runs-on: ${{ matrix.runner }}
defaults:
run:
working-directory: pytorch-${{ github.run_id }}
steps:
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
submodules: recursive
path: pytorch-${{ github.run_id }}
# deep clone, to allow use of git merge-base
fetch-depth: 0
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Install Visual Studio 2019 toolchain
shell: powershell
run: |
.\.circleci\scripts\vs_install.ps1
- name: Install Cuda
shell: bash
run: |
.circleci/scripts/windows_cuda_install.sh
- name: Install Cudnn
shell: bash
run: |
.circleci/scripts/windows_cudnn_install.sh
- uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b
name: Download PyTorch Build Artifacts
with:
name: ${{ env.BUILD_ENVIRONMENT }}
path: C:\${{ github.run_id }}\build-results
- name: Check build-results folder
shell: powershell
run: |
tree /F C:\$Env:GITHUB_RUN_ID\build-results
# Needed for coverage in win-test.sh
- uses: actions/setup-python@v2
name: Setup Python3
with:
python-version: '3.x'
- name: Test
shell: bash
env:
PYTORCH_FINAL_PACKAGE_DIR: /c/${{ github.run_id }}/build-results/
run: |
if [[ $NUM_TEST_SHARDS -ne 2 ]]; then
export SHARD_NUMBER=0
fi
if [[ -n $GITHUB_HEAD_REF && "$RUN_SMOKE_TESTS_ONLY_ON_PR" == "true" ]]; then
export RUN_SMOKE_TESTS_ONLY=1
fi
.jenkins/pytorch/win-test.sh
- name: Zip test reports for upload
if: always()
env:
FILE_SUFFIX: '${{ github.job }}-${{ matrix.config }}-${{ matrix.shard }}-${{ matrix.num_shards }}-${{ matrix.runner }}'
shell: powershell
run: |
# -ir => recursive include all files in pattern
7z a "test-reports-$Env:FILE_SUFFIX.zip" -ir'!test\*.xml'
- uses: actions/upload-artifact@v2
name: Store Test Reports
if: always()
with:
name: test-reports-${{ matrix.config }}
retention-days: 14
if-no-files-found: error
path:
pytorch-${{ github.run_id }}/test-reports-*.zip
- uses: seemethere/upload-artifact-s3@v3
name: Store Test Reports on S3
if: always()
with:
retention-days: 14
if-no-files-found: error
path:
pytorch-${{ github.run_id }}/test-reports-*.zip
- name: Install render_test_results dependencies
if: always()
shell: bash
run: |
python3 -m pip install junitparser==2.1.1 rich==10.9.0
- name: "[[ Click me for rendered test results (useful for finding failing tests) ]]"
if: always()
shell: bash
# Encoding is weird on windows, just try to default to utf-8 if possible
env:
PYTHONIOENCODING: "utf-8"
run: |
python3 tools/render_junit.py test/
- name: Wait until all sessions have drained
shell: powershell
if: always()
timeout-minutes: 120
run: |
.github\scripts\wait_for_ssh_to_drain.ps1
- name: Kill active ssh sessions if still around (Useful if workflow was cancelled)
shell: powershell
if: always()
run: |
.github\scripts\kill_active_ssh_sessions.ps1
- name: Parse ref
id: parse-ref
run: .github/scripts/parse_ref.py
- name: Display and upload test statistics (Click Me)
if: always()
# temporary hack: set CIRCLE_* vars, until we update
# tools/stats/print_test_stats.py to natively support GitHub Actions
env:
AWS_DEFAULT_REGION: us-east-1
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
JOB_BASE_NAME: periodic-win-vs2019-cuda11.1-py3-test
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
CIRCLE_TAG: ${{ steps.parse-ref.outputs.tag }}
CIRCLE_WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
shell: bash
run: |
python3 -m pip install -r requirements.txt
python3 -m pip install boto3==1.16.34
python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test
- name: Cleanup workspace
if: always()
shell: bash
# Should remove the entirety of pytorch-${{ github.run_id }}
run: |
rm -rf ./*

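The test steps in these workflows all share one sharding convention: the generated matrix supplies `SHARD_NUMBER` and `NUM_TEST_SHARDS`, and anything other than two-way sharding collapses to `SHARD_NUMBER=0`, which the test scripts appear to treat as a single unsharded run. A standalone sketch of that gate (the defaults below are illustrative, not the matrix's real values):

```
#!/usr/bin/env bash
# Sharding gate sketch; default values are assumptions for illustration.
set -euo pipefail
NUM_TEST_SHARDS="${NUM_TEST_SHARDS:-1}"
SHARD_NUMBER="${SHARD_NUMBER:-1}"
if [[ $NUM_TEST_SHARDS -ne 2 ]]; then
  # Only two-way sharding is supported; fall back to one full run.
  export SHARD_NUMBER=0
fi
echo "running shard ${SHARD_NUMBER} of ${NUM_TEST_SHARDS}"
```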

@@ -1,10 +1,11 @@
# @generated DO NOT EDIT MANUALLY
# Template is at: .github/templates/linux_ci_workflow.yml.j2
# Generation script: .github/scripts/generate_ci_workflows.py
name: Linux CI (pytorch-linux-xenial-cuda10.2-cudnn7-py3.6-gcc7)
name: puretorch-linux-xenial-py3.6-gcc5.4
on:
# TODO: Enable pull_request builds when we can verify capacity can be met by auto-scalers
pull_request:
types: [unassigned]
push:
branches:
- master
@@ -12,42 +13,92 @@ on:
workflow_dispatch:
env:
BUILD_ENVIRONMENT: pytorch-linux-xenial-cuda10.2-cudnn7-py3.6-gcc7
DOCKER_IMAGE_BASE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7
BUILD_ENVIRONMENT: puretorch-linux-xenial-py3.6-gcc5.4
DOCKER_IMAGE_BASE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3.6-gcc5.4
SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2
XLA_CLANG_CACHE_S3_BUCKET_NAME: ossci-compiler-clang-cache-circleci-xla
TORCH_CUDA_ARCH_LIST: 5.2
IN_CI: 1
# This is used for the phase of adding wheel tests only, will be removed once completed
IN_WHEEL_TEST: 1
# Used for custom_operator, jit_hooks, custom_backend, see .jenkins/pytorch/build.sh
CUSTOM_TEST_ARTIFACT_BUILD_DIR: build/custom_test_artifacts
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
concurrency:
group: pytorch-linux-xenial-cuda10.2-cudnn7-py3.6-gcc7-${{ github.event.pull_request.number || github.sha }}
group: puretorch-linux-xenial-py3.6-gcc5.4-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }}
cancel-in-progress: true
jobs:
ciflow_should_run:
runs-on: ubuntu-18.04
if: ${{ (github.repository == 'pytorch/pytorch') && ((github.event_name != 'pull_request') || (github.event.assignee.login != 'pytorchbot' ) || (github.event.action !='unassigned') || (contains(github.event.pull_request.labels.*.name, 'ciflow/all') || contains(github.event.pull_request.labels.*.name, 'ciflow/cpu') || contains(github.event.pull_request.labels.*.name, 'ciflow/linux'))) }}
env:
LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
steps:
- name: noop
run: echo running ciflow_should_run
- name: print labels
run: echo "${LABELS}"
calculate-docker-image:
runs-on: linux.2xlarge
needs: [ciflow_should_run]
env:
DOCKER_BUILDKIT: 1
timeout-minutes: 90
outputs:
docker_image: ${{ steps.calculate-tag.outputs.docker_image }}
steps:
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Log in to ECR
env:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
retry docker pull "${ALPINE_IMAGE}"
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
rm -rf "${GITHUB_WORKSPACE:?}/*"
rm -f ~/.ssh/authorized_keys
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Checkout PyTorch
uses: actions/checkout@v2
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: false
- name: Calculate docker image tag
id: calculate-tag
run: |
@@ -87,7 +138,7 @@ jobs:
fi
echo ::set-output name=rebuild::yes
- name: Build and push docker image
if: steps.check.outputs.rebuild
if: ${{ steps.check.outputs.rebuild }}
env:
DOCKER_TAG: ${{ steps.calculate-tag.outputs.docker_tag }}
DOCKER_SKIP_S3_UPLOAD: 1
@@ -97,83 +148,115 @@ jobs:
build:
runs-on: linux.2xlarge
needs: calculate-docker-image
needs: [calculate-docker-image, ciflow_should_run]
env:
DOCKER_IMAGE: ${{ needs.calculate-docker-image.outputs.docker_image }}
JOB_BASE_NAME: puretorch-linux-xenial-py3.6-gcc5.4-build
steps:
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Log in to ECR
env:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Checkout PyTorch
uses: actions/checkout@v2
retry docker pull "${ALPINE_IMAGE}"
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
rm -rf "${GITHUB_WORKSPACE:?}/*"
rm -f ~/.ssh/authorized_keys
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
fetch-depth: 0 # deep clone, to allow sharding to use git rev-list
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: recursive
- name: Pull docker image
run: |
docker pull "${DOCKER_IMAGE}"
- name: Preserve github env variables for use in docker
- name: Build
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Build PyTorch
run: |
docker run \
# detached container should get cleaned up by teardown_ec2_linux
container_name=$(docker run \
-e BUILD_ENVIRONMENT \
-e JOB_BASE_NAME \
-e MAX_JOBS="$(nproc --ignore=2)" \
-e SCCACHE_BUCKET \
-e XLA_CLANG_CACHE_S3_BUCKET_NAME \
-e CUSTOM_TEST_ARTIFACT_BUILD_DIR \
-e SKIP_SCCACHE_INITIALIZATION=1 \
-e TORCH_CUDA_ARCH_LIST \
-e PR_LABELS \
-e http_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e https_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e no_proxy="localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" \
--env-file="/tmp/github_env_${GITHUB_RUN_ID}" \
--security-opt seccomp=unconfined \
--cap-add=SYS_PTRACE \
--tty \
--detach \
--user jenkins \
-v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \
-w /var/lib/jenkins/workspace \
"${DOCKER_IMAGE}" \
sh -c 'sudo chown -R jenkins . && .jenkins/pytorch/build.sh'
"${DOCKER_IMAGE}"
)
docker exec -t "${container_name}" sh -c 'sudo chown -R jenkins . && .jenkins/pytorch/build.sh'
- name: Parse ref
id: parse-ref
run: .github/scripts/parse_ref.py
- name: Display and upload binary build size statistics (Click Me)
# temporary hack: set CIRCLE_* vars, until we update
# tools/print_test_stats.py to natively support GitHub Actions
# tools/stats/print_test_stats.py to natively support GitHub Actions
env:
AWS_DEFAULT_REGION: us-east-1
IS_GHA: 1
SCRIBE_GRAPHQL_ACCESS_TOKEN: ${{ secrets.SCRIBE_GRAPHQL_ACCESS_TOKEN }}
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
CIRCLE_TAG: ${{ steps.parse-ref.outputs.tag }}
CIRCLE_WORKFLOW_ID: ${{ github.run_id }} # dunno if this corresponds
CIRCLE_WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
run: |
export PYTHONPATH=$PWD
COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0)
export COMMIT_TIME
pip3 install requests
python3 .circleci/scripts/upload_binary_size_to_scuba.py || exit 0
pip3 install requests==2.26 boto3==1.16.34
python3 -m tools.stats.upload_binary_size_to_scuba || exit 0
- name: Chown workspace
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Archive artifacts into zip
run: |
zip -r artifacts.zip dist/ build/
# Upload to github so that people can click and download artifacts
- uses: actions/upload-artifact@v2
# Don't fail on upload to GH since it's only for user convenience
continue-on-error: true
name: Store PyTorch Build Artifacts on Github
with:
name: ${{ env.BUILD_ENVIRONMENT }}
retention-days: 14
if-no-files-found: error
path:
artifacts.zip
- uses: seemethere/upload-artifact-s3@9d7ceb0ab39c2c88d93ef7792b27425b27d59162
zip -1 -r artifacts.zip dist/ build/custom_test_artifacts build/lib build/bin .pytorch-test-times.json
- uses: seemethere/upload-artifact-s3@v3
name: Store PyTorch Build Artifacts on S3
with:
name: ${{ env.BUILD_ENVIRONMENT }}
@@ -181,158 +264,31 @@ jobs:
if-no-files-found: error
path:
artifacts.zip
- name: Clean up docker images
- name: Hold runner for 2 hours or until ssh sessions have drained
# Always hold for active ssh sessions
if: always()
run: |
# Prune all of the docker images
docker system prune -af
test:
runs-on: linux.8xlarge.nvidia.gpu
needs:
- calculate-docker-image
- build
env:
DOCKER_IMAGE: ${{ needs.calculate-docker-image.outputs.docker_image }}
steps:
- name: Log in to ECR
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
- name: Chown workspace
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)/../":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Checkout PyTorch
uses: actions/checkout@v2
with:
submodules: recursive
- name: Pull docker image
run: |
docker pull "${DOCKER_IMAGE}"
- name: Install nvidia driver, nvidia-docker runtime, set GPU_FLAG
if: ${{ contains(env.BUILD_ENVIRONMENT, 'cuda') }}
run: |
bash .github/scripts/install_nvidia_utils_linux.sh
echo "GPU_FLAG=--gpus all" >> "${GITHUB_ENV}"
- name: Determine shm-size
run: |
shm_size="1g"
case "${BUILD_ENVIRONMENT}" in
*cuda*)
shm_size="2g"
;;
*rocm*)
shm_size="8g"
;;
esac
echo "SHM_SIZE=${shm_size}" >> "${GITHUB_ENV}"
- uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b
name: Download PyTorch Build Artifacts
with:
name: ${{ env.BUILD_ENVIRONMENT }}
- name: Unzip artifacts
run: |
unzip -o artifacts.zip
- name: Output disk space left
run: |
sudo df -H
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Test PyTorch
run: |
# TODO: Stop building test binaries as part of the build phase
# Used for GPU_FLAG since that doesn't play nice
# shellcheck disable=SC2086
docker run \
${GPU_FLAG:-} \
-e BUILD_ENVIRONMENT \
-e CUSTOM_TEST_ARTIFACT_BUILD_DIR \
-e GITHUB_ACTIONS \
-e IN_CI \
-e MAX_JOBS="$(nproc --ignore=2)" \
--env-file="/tmp/github_env_${GITHUB_RUN_ID}" \
--security-opt seccomp=unconfined \
--cap-add=SYS_PTRACE \
--shm-size="${SHM_SIZE}" \
--tty \
--user jenkins \
-v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \
-w /var/lib/jenkins/workspace \
"${DOCKER_IMAGE}" \
sh -c 'sudo chown -R jenkins . && pip install dist/*.whl && .jenkins/pytorch/test.sh'
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Chown workspace
if: always()
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- uses: actions/upload-artifact@v2
name: Store PyTorch Test Reports
if: always()
with:
name: test-reports
retention-days: 14
if-no-files-found: error
path:
test/**/*.xml
- name: Clean up docker images
if: always()
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
# Prune all of the docker images
docker system prune -af
# this is a separate step from test because the log files from test are too
# long: basically, GitHub tries to render all of the log files when you click
# through an action causing extreme slowdown on actions that contain too many
# logs (like test); we can always move it back to the other one, but it
# doesn't create the best experience
render_test_results:
if: always()
needs:
- test
runs-on: ubuntu-18.04
steps:
- name: Checkout PyTorch
uses: actions/checkout@v2
with:
# deep clone, to allow tools/print_test_stats.py to use Git commands
fetch-depth: 0
- uses: actions/download-artifact@v2
name: Download PyTorch Test Reports
with:
name: test-reports
path: test/test-reports
- uses: actions/setup-python@v2
with:
python-version: 3.9
- name: Install dependencies
# boto3 version copied from .circleci/docker/common/install_conda.sh
run: |
pip install -r requirements.txt
pip install boto3==1.16.34 junitparser rich
- name: Output Test Results (Click Me)
run: |
python tools/render_junit.py test
- name: Parse ref
id: parse-ref
run: .github/scripts/parse_ref.py
- name: Display and upload test statistics (Click Me)
# temporary hack: set CIRCLE_* vars, until we update
# tools/print_test_stats.py to natively support GitHub Actions
env:
SCRIBE_GRAPHQL_ACCESS_TOKEN: ${{ secrets.SCRIBE_GRAPHQL_ACCESS_TOKEN }}
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_OSSCI_METRICS_ACCESS_KEY_ID }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_OSSCI_METRICS_SECRET_ACCESS_KEY }}
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
CIRCLE_JOB: pytorch-linux-xenial-cuda10.2-cudnn7-py3.6-gcc7
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
CIRCLE_TAG: ${{ steps.parse-ref.outputs.tag }}
CIRCLE_WORKFLOW_ID: ${{ github.run_id }} # dunno if this corresponds
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
export PYTHONPATH=$PWD
python tools/print_test_stats.py --upload-to-s3 --compare-with-s3 test
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Kill containers, clean up images
if: always()
run: |
# ignore expansion of "docker ps -q" since it could be empty
# shellcheck disable=SC2046
docker stop $(docker ps -q) || true
# Prune all of the docker images
docker system prune -af
- name: Hold runner for 2 hours or until ssh sessions have drained
# Always hold for active ssh sessions
if: always()
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Clean up docker images
if: always()
run: |
# Prune all of the docker images
docker system prune -af

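One notable change in the diff above: the chown step now pre-pulls the Alpine image through a tiny inline `retry` helper (three attempts, backing off one then two seconds) and then runs it with `--pull=never`, so a flaky registry pull no longer fails the step outright. The helper extracted as a standalone sketch (the pulled image is a placeholder):

```
#!/usr/bin/env bash
# Retry sketch: up to three attempts with 1s, then 2s of backoff.
set -euo pipefail
retry () {
  "$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
retry docker pull "alpine:3.14"   # placeholder image
```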
.github/workflows/generated-win-vs2019-cpu-py3.yml (generated, new file)

@@ -0,0 +1,298 @@
# @generated DO NOT EDIT MANUALLY
# Template is at: .github/templates/windows_ci_workflow.yml.j2
# Generation script: .github/scripts/generate_ci_workflows.py
name: win-vs2019-cpu-py3
on:
pull_request:
types: [opened, synchronize, reopened, unassigned]
push:
branches:
- master
- release/*
workflow_dispatch:
env:
BUILD_ENVIRONMENT: win-vs2019-cpu-py3
BUILD_WHEEL: 1
CUDA_VERSION: "cpu"
IN_CI: 1
INSTALL_WINDOWS_SDK: 1
PYTHON_VERSION: "3.8"
PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
SCCACHE_BUCKET: "ossci-compiler-cache"
VC_PRODUCT: "BuildTools"
VC_VERSION: ""
VS_VERSION: "16.8.6"
VC_YEAR: "2019"
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
no_proxy: localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock
concurrency:
group: win-vs2019-cpu-py3-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }}
cancel-in-progress: true
jobs:
ciflow_should_run:
runs-on: ubuntu-18.04
if: ${{ (github.repository_owner == 'pytorch') && ((github.event_name != 'pull_request') || (github.event.assignee.login != 'pytorchbot' ) || (github.event.action !='unassigned') || (contains(github.event.pull_request.labels.*.name, 'ciflow/all') || contains(github.event.pull_request.labels.*.name, 'ciflow/cpu') || contains(github.event.pull_request.labels.*.name, 'ciflow/default') || contains(github.event.pull_request.labels.*.name, 'ciflow/win'))) }}
steps:
- name: noop
run: echo running ciflow_should_run
build:
runs-on: "windows.4xlarge"
defaults:
run:
working-directory: pytorch-${{ github.run_id }}
needs: [ciflow_should_run]
env:
JOB_BASE_NAME: win-vs2019-cpu-py3-build
http_proxy: "http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128"
https_proxy: "http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128"
steps:
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
submodules: recursive
path: pytorch-${{ github.run_id }}
# deep clone, to allow use of git merge-base
fetch-depth: 0
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Install Visual Studio 2019 toolchain
shell: powershell
run: |
.\.circleci\scripts\vs_install.ps1
- name: Build
shell: bash
env:
PYTORCH_FINAL_PACKAGE_DIR: /c/${{ github.run_id }}/build-results/
run: |
.jenkins/pytorch/win-build.sh
# Upload to github so that people can click and download artifacts
- name: Upload artifacts to Github
if: always()
uses: actions/upload-artifact@v2
# Don't fail on upload to GH since it's only for user convenience
continue-on-error: true
with:
retention-days: 14
if-no-files-found: error
name: ${{ env.BUILD_ENVIRONMENT }}
path: C:\${{ github.run_id }}\build-results
- name: Upload artifacts to s3
if: always()
uses: seemethere/upload-artifact-s3@v3
with:
retention-days: 14
if-no-files-found: error
name: ${{ env.BUILD_ENVIRONMENT }}
path: C:\${{ github.run_id }}\build-results
- name: Wait until all sessions have drained
shell: powershell
if: always()
timeout-minutes: 120
run: |
.github\scripts\wait_for_ssh_to_drain.ps1
- name: Kill active ssh sessions if still around (Useful if workflow was cancelled)
shell: powershell
if: always()
run: |
.github\scripts\kill_active_ssh_sessions.ps1
- name: Cleanup build-results and workspaces
if: always()
shell: bash
env:
PYTORCH_FINAL_PACKAGE_DIR: /c/${{ github.run_id }}/build-results/
# Should remove the entirety of pytorch-${{ github.run_id }}
run: |
rm -rf "${PYTORCH_FINAL_PACKAGE_DIR}"
rm -rf ./*
generate-test-matrix:
needs: [ciflow_should_run]
runs-on: ubuntu-18.04
env:
TEST_RUNNER_TYPE: windows.4xlarge
NUM_TEST_SHARDS: 2
NUM_TEST_SHARDS_ON_PULL_REQUEST: 2
PR_BODY: ${{ github.event.pull_request.body }}
outputs:
matrix: ${{ steps.set-matrix.outputs.matrix }}
render-matrix: ${{ steps.set-matrix.outputs.render-matrix }}
ignore-disabled-issues: ${{ steps.set-matrix.outputs.ignore-disabled-issues }}
container:
image: python:3.9
steps:
- name: Install dependencies
run: pip install typing-extensions==3.10
- name: Clone pytorch/pytorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
- name: Generating test matrix
id: set-matrix
run: .github/scripts/generate_pytorch_test_matrix.py
test:
env:
JOB_BASE_NAME: win-vs2019-cpu-py3-test
SHARD_NUMBER: ${{ matrix.shard }}
NUM_TEST_SHARDS: ${{ matrix.num_shards }}
TEST_CONFIG: ${{ matrix.config }}
http_proxy: "http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128"
https_proxy: "http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128"
RUN_SMOKE_TESTS_ONLY_ON_PR: False
PYTORCH_IGNORE_DISABLED_ISSUES: ${{ needs.generate-test-matrix.outputs.ignore-disabled-issues }}
CONTINUE_THROUGH_ERROR: ${{ github.repository == 'pytorch/pytorch' && (github.event_name == 'push' || github.event_name == 'schedule') }}
needs: [build, generate-test-matrix, ciflow_should_run]
strategy:
matrix: ${{ fromJson(needs.generate-test-matrix.outputs.matrix) }}
fail-fast: false
runs-on: ${{ matrix.runner }}
defaults:
run:
working-directory: pytorch-${{ github.run_id }}
steps:
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
submodules: recursive
path: pytorch-${{ github.run_id }}
# deep clone, to allow use of git merge-base
fetch-depth: 0
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Install Visual Studio 2019 toolchain
shell: powershell
run: |
.\.circleci\scripts\vs_install.ps1
- uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b
name: Download PyTorch Build Artifacts
with:
name: ${{ env.BUILD_ENVIRONMENT }}
path: C:\${{ github.run_id }}\build-results
- name: Check build-results folder
shell: powershell
run: |
tree /F C:\$Env:GITHUB_RUN_ID\build-results
# Needed for coverage in win-test.sh
- uses: actions/setup-python@v2
name: Setup Python3
with:
python-version: '3.x'
- name: Test
shell: bash
env:
PYTORCH_FINAL_PACKAGE_DIR: /c/${{ github.run_id }}/build-results/
run: |
if [[ $NUM_TEST_SHARDS -ne 2 ]]; then
export SHARD_NUMBER=0
fi
if [[ -n $GITHUB_HEAD_REF && "$RUN_SMOKE_TESTS_ONLY_ON_PR" == "true" ]]; then
export RUN_SMOKE_TESTS_ONLY=1
fi
.jenkins/pytorch/win-test.sh
- name: Zip test reports for upload
if: always()
env:
FILE_SUFFIX: '${{ github.job }}-${{ matrix.config }}-${{ matrix.shard }}-${{ matrix.num_shards }}-${{ matrix.runner }}'
shell: powershell
run: |
# -ir => recursive include all files in pattern
7z a "test-reports-$Env:FILE_SUFFIX.zip" -ir'!test\*.xml'
- uses: actions/upload-artifact@v2
name: Store Test Reports
if: always()
with:
name: test-reports-${{ matrix.config }}
retention-days: 14
if-no-files-found: error
path:
pytorch-${{ github.run_id }}/test-reports-*.zip
- uses: seemethere/upload-artifact-s3@v3
name: Store Test Reports on S3
if: always()
with:
retention-days: 14
if-no-files-found: error
path:
pytorch-${{ github.run_id }}/test-reports-*.zip
- name: Install render_test_results dependencies
if: always()
shell: bash
run: |
python3 -m pip install junitparser==2.1.1 rich==10.9.0
- name: "[[ Click me for rendered test results (useful for finding failing tests) ]]"
if: always()
shell: bash
# Encoding is weird on windows, just try to default to utf-8 if possible
env:
PYTHONIOENCODING: "utf-8"
run: |
python3 tools/render_junit.py test/
- name: Wait until all sessions have drained
shell: powershell
if: always()
timeout-minutes: 120
run: |
.github\scripts\wait_for_ssh_to_drain.ps1
- name: Kill active ssh sessions if still around (Useful if workflow was cancelled)
shell: powershell
if: always()
run: |
.github\scripts\kill_active_ssh_sessions.ps1
- name: Parse ref
id: parse-ref
run: .github/scripts/parse_ref.py
- name: Display and upload test statistics (Click Me)
if: always()
# temporary hack: set CIRCLE_* vars, until we update
# tools/stats/print_test_stats.py to natively support GitHub Actions
env:
AWS_DEFAULT_REGION: us-east-1
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
JOB_BASE_NAME: win-vs2019-cpu-py3-test
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
CIRCLE_TAG: ${{ steps.parse-ref.outputs.tag }}
CIRCLE_WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
shell: bash
run: |
python3 -m pip install -r requirements.txt
python3 -m pip install boto3==1.16.34
python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test
- name: Cleanup workspace
if: always()
shell: bash
# Should remove the entirety of pytorch-${{ github.run_id }}
run: |
rm -rf ./*

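The `generate-test-matrix` job above is how `strategy.matrix` gets populated: `generate_pytorch_test_matrix.py` presumably prints `::set-output` workflow commands on stdout, and the test job reads them back through `fromJson(needs.generate-test-matrix.outputs.matrix)`. A bash sketch of the producing side, with an invented two-shard matrix rather than the generator's real output:

```
#!/usr/bin/env bash
# set-output sketch; the JSON matrix below is made up for illustration.
set -euo pipefail
matrix='{"include":[{"config":"default","shard":1,"num_shards":2,"runner":"windows.4xlarge"},{"config":"default","shard":2,"num_shards":2,"runner":"windows.4xlarge"}]}'
# The value must stay on one line for the workflow-command parser.
echo "::set-output name=matrix::${matrix}"
```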
.github/workflows/generated-win-vs2019-cuda10.2-py3.yml (generated, new file)

@@ -0,0 +1,316 @@
# @generated DO NOT EDIT MANUALLY
# Template is at: .github/templates/windows_ci_workflow.yml.j2
# Generation script: .github/scripts/generate_ci_workflows.py
name: win-vs2019-cuda10.2-py3
on:
pull_request:
types: [unassigned]
push:
branches:
- master
- release/*
workflow_dispatch:
env:
BUILD_ENVIRONMENT: win-vs2019-cuda10.2-py3
BUILD_WHEEL: 1
CUDA_VERSION: "10.2"
IN_CI: 1
INSTALL_WINDOWS_SDK: 1
PYTHON_VERSION: "3.8"
PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
SCCACHE_BUCKET: "ossci-compiler-cache"
VC_PRODUCT: "BuildTools"
VC_VERSION: ""
VS_VERSION: "16.8.6"
VC_YEAR: "2019"
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
no_proxy: localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock
TORCH_CUDA_ARCH_LIST: "7.0"
USE_CUDA: 1
concurrency:
group: win-vs2019-cuda10.2-py3-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }}
cancel-in-progress: true
jobs:
ciflow_should_run:
runs-on: ubuntu-18.04
if: ${{ (github.repository == 'pytorch/pytorch') && ((github.event_name != 'pull_request') || (github.event.assignee.login != 'pytorchbot' ) || (github.event.action !='unassigned') || (contains(github.event.pull_request.labels.*.name, 'ciflow/all') || contains(github.event.pull_request.labels.*.name, 'ciflow/cuda') || contains(github.event.pull_request.labels.*.name, 'ciflow/win'))) }}
steps:
- name: noop
run: echo running ciflow_should_run
build:
runs-on: "windows.4xlarge"
defaults:
run:
working-directory: pytorch-${{ github.run_id }}
needs: [ciflow_should_run]
env:
JOB_BASE_NAME: win-vs2019-cuda10.2-py3-build
http_proxy: "http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128"
https_proxy: "http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128"
steps:
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
submodules: recursive
path: pytorch-${{ github.run_id }}
# deep clone, to allow use of git merge-base
fetch-depth: 0
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Install Visual Studio 2019 toolchain
shell: powershell
run: |
.\.circleci\scripts\vs_install.ps1
- name: Install Cuda
shell: bash
run: |
.circleci/scripts/windows_cuda_install.sh
- name: Install Cudnn
shell: bash
run: |
.circleci/scripts/windows_cudnn_install.sh
- name: Build
shell: bash
env:
PYTORCH_FINAL_PACKAGE_DIR: /c/${{ github.run_id }}/build-results/
run: |
.jenkins/pytorch/win-build.sh
# Upload to github so that people can click and download artifacts
- name: Upload artifacts to Github
if: always()
uses: actions/upload-artifact@v2
# Don't fail on upload to GH since it's only for user convenience
continue-on-error: true
with:
retention-days: 14
if-no-files-found: error
name: ${{ env.BUILD_ENVIRONMENT }}
path: C:\${{ github.run_id }}\build-results
- name: Upload artifacts to s3
if: always()
uses: seemethere/upload-artifact-s3@v3
with:
retention-days: 14
if-no-files-found: error
name: ${{ env.BUILD_ENVIRONMENT }}
path: C:\${{ github.run_id }}\build-results
- name: Wait until all sessions have drained
shell: powershell
if: always()
timeout-minutes: 120
run: |
.github\scripts\wait_for_ssh_to_drain.ps1
- name: Kill active ssh sessions if still around (Useful if workflow was cancelled)
shell: powershell
if: always()
run: |
.github\scripts\kill_active_ssh_sessions.ps1
- name: Cleanup build-results and workspaces
if: always()
shell: bash
env:
PYTORCH_FINAL_PACKAGE_DIR: /c/${{ github.run_id }}/build-results/
# Should remove the entirety of pytorch-${{ github.run_id }}
run: |
rm -rf "${PYTORCH_FINAL_PACKAGE_DIR}"
rm -rf ./*
generate-test-matrix:
needs: [ciflow_should_run]
runs-on: ubuntu-18.04
env:
TEST_RUNNER_TYPE: windows.8xlarge.nvidia.gpu
NUM_TEST_SHARDS: 2
NUM_TEST_SHARDS_ON_PULL_REQUEST: 2
PR_BODY: ${{ github.event.pull_request.body }}
outputs:
matrix: ${{ steps.set-matrix.outputs.matrix }}
render-matrix: ${{ steps.set-matrix.outputs.render-matrix }}
ignore-disabled-issues: ${{ steps.set-matrix.outputs.ignore-disabled-issues }}
container:
image: python:3.9
steps:
- name: Install dependencies
run: pip install typing-extensions==3.10
- name: Clone pytorch/pytorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
- name: Generating test matrix
id: set-matrix
run: .github/scripts/generate_pytorch_test_matrix.py
test:
env:
JOB_BASE_NAME: win-vs2019-cuda10.2-py3-test
SHARD_NUMBER: ${{ matrix.shard }}
NUM_TEST_SHARDS: ${{ matrix.num_shards }}
TEST_CONFIG: ${{ matrix.config }}
http_proxy: "http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128"
https_proxy: "http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128"
RUN_SMOKE_TESTS_ONLY_ON_PR: False
PYTORCH_IGNORE_DISABLED_ISSUES: ${{ needs.generate-test-matrix.outputs.ignore-disabled-issues }}
CONTINUE_THROUGH_ERROR: ${{ github.repository == 'pytorch/pytorch' && (github.event_name == 'push' || github.event_name == 'schedule') }}
needs: [build, generate-test-matrix, ciflow_should_run]
strategy:
matrix: ${{ fromJson(needs.generate-test-matrix.outputs.matrix) }}
fail-fast: false
runs-on: ${{ matrix.runner }}
defaults:
run:
working-directory: pytorch-${{ github.run_id }}
steps:
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
submodules: recursive
path: pytorch-${{ github.run_id }}
# deep clone, to allow use of git merge-base
fetch-depth: 0
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Install Visual Studio 2019 toolchain
shell: powershell
run: |
.\.circleci\scripts\vs_install.ps1
- name: Install Cuda
shell: bash
run: |
.circleci/scripts/windows_cuda_install.sh
- name: Install Cudnn
shell: bash
run: |
.circleci/scripts/windows_cudnn_install.sh
- uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b
name: Download PyTorch Build Artifacts
with:
name: ${{ env.BUILD_ENVIRONMENT }}
path: C:\${{ github.run_id }}\build-results
- name: Check build-results folder
shell: powershell
run: |
tree /F C:\$Env:GITHUB_RUN_ID\build-results
# Needed for coverage in win-test.sh
- uses: actions/setup-python@v2
name: Setup Python3
with:
python-version: '3.x'
- name: Test
shell: bash
env:
PYTORCH_FINAL_PACKAGE_DIR: /c/${{ github.run_id }}/build-results/
run: |
if [[ $NUM_TEST_SHARDS -ne 2 ]]; then
export SHARD_NUMBER=0
fi
if [[ -n $GITHUB_HEAD_REF && "$RUN_SMOKE_TESTS_ONLY_ON_PR" == "true" ]]; then
export RUN_SMOKE_TESTS_ONLY=1
fi
.jenkins/pytorch/win-test.sh
- name: Zip test reports for upload
if: always()
env:
FILE_SUFFIX: '${{ github.job }}-${{ matrix.config }}-${{ matrix.shard }}-${{ matrix.num_shards }}-${{ matrix.runner }}'
shell: powershell
run: |
# -ir => recursive include all files in pattern
7z a "test-reports-$Env:FILE_SUFFIX.zip" -ir'!test\*.xml'
- uses: actions/upload-artifact@v2
name: Store Test Reports
if: always()
with:
name: test-reports-${{ matrix.config }}
retention-days: 14
if-no-files-found: error
path:
pytorch-${{ github.run_id }}/test-reports-*.zip
- uses: seemethere/upload-artifact-s3@v3
name: Store Test Reports on S3
if: always()
with:
retention-days: 14
if-no-files-found: error
path:
pytorch-${{ github.run_id }}/test-reports-*.zip
- name: Install render_test_results dependencies
if: always()
shell: bash
run: |
python3 -m pip install junitparser==2.1.1 rich==10.9.0
- name: "[[ Click me for rendered test results (useful for finding failing tests) ]]"
if: always()
shell: bash
# Encoding is weird on windows, just try to default to utf-8 if possible
env:
PYTHONIOENCODING: "utf-8"
run: |
python3 tools/render_junit.py test/
- name: Wait until all sessions have drained
shell: powershell
if: always()
timeout-minutes: 120
run: |
.github\scripts\wait_for_ssh_to_drain.ps1
- name: Kill active ssh sessions if still around (Useful if workflow was cancelled)
shell: powershell
if: always()
run: |
.github\scripts\kill_active_ssh_sessions.ps1
- name: Parse ref
id: parse-ref
run: .github/scripts/parse_ref.py
- name: Display and upload test statistics (Click Me)
if: always()
# temporary hack: set CIRCLE_* vars, until we update
# tools/stats/print_test_stats.py to natively support GitHub Actions
env:
AWS_DEFAULT_REGION: us-east-1
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
JOB_BASE_NAME: win-vs2019-cuda10.2-py3-test
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
CIRCLE_TAG: ${{ steps.parse-ref.outputs.tag }}
CIRCLE_WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
shell: bash
run: |
python3 -m pip install -r requirements.txt
python3 -m pip install boto3==1.16.34
python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test
- name: Cleanup workspace
if: always()
shell: bash
# Should remove the entirety of pytorch-${{ github.run_id }}
run: |
rm -rf ./*

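The Test step above gates its smoke-test shortcut on `GITHUB_HEAD_REF`, which GitHub sets only for `pull_request`-triggered runs, combined with the per-workflow `RUN_SMOKE_TESTS_ONLY_ON_PR` toggle (`False` here, `True` in the cuda11.3 workflow below). The gate in isolation, with both variables defaulting to the "not a PR" case:

```
#!/usr/bin/env bash
# PR smoke-test gate sketch; defaults model a non-PR run.
set -euo pipefail
RUN_SMOKE_TESTS_ONLY_ON_PR="${RUN_SMOKE_TESTS_ONLY_ON_PR:-false}"
# GITHUB_HEAD_REF is non-empty only on pull_request events.
if [[ -n ${GITHUB_HEAD_REF:-} && "$RUN_SMOKE_TESTS_ONLY_ON_PR" == "true" ]]; then
  export RUN_SMOKE_TESTS_ONLY=1
fi
echo "smoke-only: ${RUN_SMOKE_TESTS_ONLY:-0}"
```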
.github/workflows/generated-win-vs2019-cuda11.3-py3.yml (generated, new file)

@@ -0,0 +1,316 @@
# @generated DO NOT EDIT MANUALLY
# Template is at: .github/templates/windows_ci_workflow.yml.j2
# Generation script: .github/scripts/generate_ci_workflows.py
name: win-vs2019-cuda11.3-py3
on:
pull_request:
types: [opened, synchronize, reopened, unassigned]
push:
branches:
- master
- release/*
workflow_dispatch:
env:
BUILD_ENVIRONMENT: win-vs2019-cuda11.3-py3
BUILD_WHEEL: 1
CUDA_VERSION: "11.3"
IN_CI: 1
INSTALL_WINDOWS_SDK: 1
PYTHON_VERSION: "3.8"
PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
SCCACHE_BUCKET: "ossci-compiler-cache"
VC_PRODUCT: "BuildTools"
VC_VERSION: ""
VS_VERSION: "16.8.6"
VC_YEAR: "2019"
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
no_proxy: localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock
TORCH_CUDA_ARCH_LIST: "7.0"
USE_CUDA: 1
concurrency:
group: win-vs2019-cuda11.3-py3-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }}
cancel-in-progress: true
jobs:
ciflow_should_run:
runs-on: ubuntu-18.04
if: ${{ (github.repository_owner == 'pytorch') && ((github.event_name != 'pull_request') || (github.event.assignee.login != 'pytorchbot' ) || (github.event.action !='unassigned') || (contains(github.event.pull_request.labels.*.name, 'ciflow/all') || contains(github.event.pull_request.labels.*.name, 'ciflow/cuda') || contains(github.event.pull_request.labels.*.name, 'ciflow/default') || contains(github.event.pull_request.labels.*.name, 'ciflow/win'))) }}
steps:
- name: noop
run: echo running ciflow_should_run
build:
runs-on: "windows.4xlarge"
defaults:
run:
working-directory: pytorch-${{ github.run_id }}
needs: [ciflow_should_run]
env:
JOB_BASE_NAME: win-vs2019-cuda11.3-py3-build
http_proxy: "http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128"
https_proxy: "http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128"
steps:
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
submodules: recursive
path: pytorch-${{ github.run_id }}
# deep clone, to allow use of git merge-base
fetch-depth: 0
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Install Visual Studio 2019 toolchain
shell: powershell
run: |
.\.circleci\scripts\vs_install.ps1
- name: Install Cuda
shell: bash
run: |
.circleci/scripts/windows_cuda_install.sh
- name: Install Cudnn
shell: bash
run: |
.circleci/scripts/windows_cudnn_install.sh
- name: Build
shell: bash
env:
PYTORCH_FINAL_PACKAGE_DIR: /c/${{ github.run_id }}/build-results/
run: |
.jenkins/pytorch/win-build.sh
# Upload to github so that people can click and download artifacts
- name: Upload artifacts to Github
if: always()
uses: actions/upload-artifact@v2
# Don't fail on upload to GH since it's only for user convenience
continue-on-error: true
with:
retention-days: 14
if-no-files-found: error
name: ${{ env.BUILD_ENVIRONMENT }}
path: C:\${{ github.run_id }}\build-results
- name: Upload artifacts to s3
if: always()
uses: seemethere/upload-artifact-s3@v3
with:
retention-days: 14
if-no-files-found: error
name: ${{ env.BUILD_ENVIRONMENT }}
path: C:\${{ github.run_id }}\build-results
- name: Wait until all sessions have drained
shell: powershell
if: always()
timeout-minutes: 120
run: |
.github\scripts\wait_for_ssh_to_drain.ps1
- name: Kill active ssh sessions if still around (Useful if workflow was cancelled)
shell: powershell
if: always()
run: |
.github\scripts\kill_active_ssh_sessions.ps1
- name: Cleanup build-results and workspaces
if: always()
shell: bash
env:
PYTORCH_FINAL_PACKAGE_DIR: /c/${{ github.run_id }}/build-results/
# Should remove the entirety of pytorch-${{ github.run_id }}
run: |
rm -rf "${PYTORCH_FINAL_PACKAGE_DIR}"
rm -rf ./*
generate-test-matrix:
needs: [ciflow_should_run]
runs-on: ubuntu-18.04
env:
TEST_RUNNER_TYPE: windows.8xlarge.nvidia.gpu
NUM_TEST_SHARDS: 2
NUM_TEST_SHARDS_ON_PULL_REQUEST: 1
PR_BODY: ${{ github.event.pull_request.body }}
outputs:
matrix: ${{ steps.set-matrix.outputs.matrix }}
render-matrix: ${{ steps.set-matrix.outputs.render-matrix }}
ignore-disabled-issues: ${{ steps.set-matrix.outputs.ignore-disabled-issues }}
container:
image: python:3.9
steps:
- name: Install dependencies
run: pip install typing-extensions==3.10
- name: Clone pytorch/pytorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
- name: Generating test matrix
id: set-matrix
run: .github/scripts/generate_pytorch_test_matrix.py
test:
env:
JOB_BASE_NAME: win-vs2019-cuda11.3-py3-test
SHARD_NUMBER: ${{ matrix.shard }}
NUM_TEST_SHARDS: ${{ matrix.num_shards }}
TEST_CONFIG: ${{ matrix.config }}
http_proxy: "http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128"
https_proxy: "http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128"
RUN_SMOKE_TESTS_ONLY_ON_PR: True
PYTORCH_IGNORE_DISABLED_ISSUES: ${{ needs.generate-test-matrix.outputs.ignore-disabled-issues }}
CONTINUE_THROUGH_ERROR: ${{ github.repository == 'pytorch/pytorch' && (github.event_name == 'push' || github.event_name == 'schedule') }}
needs: [build, generate-test-matrix, ciflow_should_run]
strategy:
matrix: ${{ fromJson(needs.generate-test-matrix.outputs.matrix) }}
fail-fast: false
runs-on: ${{ matrix.runner }}
defaults:
run:
working-directory: pytorch-${{ github.run_id }}
steps:
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
submodules: recursive
path: pytorch-${{ github.run_id }}
# deep clone, to allow use of git merge-base
fetch-depth: 0
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Install Visual Studio 2019 toolchain
shell: powershell
run: |
.\.circleci\scripts\vs_install.ps1
- name: Install Cuda
shell: bash
run: |
.circleci/scripts/windows_cuda_install.sh
- name: Install Cudnn
shell: bash
run: |
.circleci/scripts/windows_cudnn_install.sh
- uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b
name: Download PyTorch Build Artifacts
with:
name: ${{ env.BUILD_ENVIRONMENT }}
path: C:\${{ github.run_id }}\build-results
- name: Check build-results folder
shell: powershell
run: |
tree /F C:\$Env:GITHUB_RUN_ID\build-results
# Needed for coverage in win-test.sh
- uses: actions/setup-python@v2
name: Setup Python3
with:
python-version: '3.x'
- name: Test
shell: bash
env:
PYTORCH_FINAL_PACKAGE_DIR: /c/${{ github.run_id }}/build-results/
run: |
if [[ $NUM_TEST_SHARDS -ne 2 ]]; then
export SHARD_NUMBER=0
fi
if [[ -n $GITHUB_HEAD_REF && "$RUN_SMOKE_TESTS_ONLY_ON_PR" == "true" ]]; then
export RUN_SMOKE_TESTS_ONLY=1
fi
.jenkins/pytorch/win-test.sh
- name: Zip test reports for upload
if: always()
env:
FILE_SUFFIX: '${{ github.job }}-${{ matrix.config }}-${{ matrix.shard }}-${{ matrix.num_shards }}-${{ matrix.runner }}'
shell: powershell
run: |
# -ir => recursive include all files in pattern
7z a "test-reports-$Env:FILE_SUFFIX.zip" -ir'!test\*.xml'
- uses: actions/upload-artifact@v2
name: Store Test Reports
if: always()
with:
name: test-reports-${{ matrix.config }}
retention-days: 14
if-no-files-found: error
path:
pytorch-${{ github.run_id }}/test-reports-*.zip
- uses: seemethere/upload-artifact-s3@v3
name: Store Test Reports on S3
if: always()
with:
retention-days: 14
if-no-files-found: error
path:
pytorch-${{ github.run_id }}/test-reports-*.zip
- name: Install render_test_results dependencies
if: always()
shell: bash
run: |
python3 -m pip install junitparser==2.1.1 rich==10.9.0
- name: "[[ Click me for rendered test results (useful for finding failing tests) ]]"
if: always()
shell: bash
# Encoding is weird on windows, just try to default to utf-8 if possible
env:
PYTHONIOENCODING: "utf-8"
run: |
python3 tools/render_junit.py test/
- name: Wait until all sessions have drained
shell: powershell
if: always()
timeout-minutes: 120
run: |
.github\scripts\wait_for_ssh_to_drain.ps1
- name: Kill active ssh sessions if still around (Useful if workflow was cancelled)
shell: powershell
if: always()
run: |
.github\scripts\kill_active_ssh_sessions.ps1
- name: Parse ref
id: parse-ref
run: .github/scripts/parse_ref.py
- name: Display and upload test statistics (Click Me)
if: always()
# temporary hack: set CIRCLE_* vars, until we update
# tools/stats/print_test_stats.py to natively support GitHub Actions
env:
AWS_DEFAULT_REGION: us-east-1
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
JOB_BASE_NAME: win-vs2019-cuda11.3-py3-test
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
CIRCLE_TAG: ${{ steps.parse-ref.outputs.tag }}
CIRCLE_WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
shell: bash
run: |
python3 -m pip install -r requirements.txt
python3 -m pip install boto3==1.16.34
python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test
- name: Cleanup workspace
if: always()
shell: bash
# Should remove the entirety of pytorch-${{ github.run_id }}
run: |
rm -rf ./*

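All of these test jobs archive their JUnit XML the same way: one zip per shard, named by a `FILE_SUFFIX` built from job, config, shard, shard count, and runner, so uploads from parallel shards never collide. Windows does it with `7z a ... -ir'!test\*.xml'`; the Linux jobs use `zip`. The Linux form as a standalone sketch (the suffix value is a placeholder):

```
#!/usr/bin/env bash
# Report-zipping sketch; FILE_SUFFIX is an illustrative placeholder.
set -euo pipefail
FILE_SUFFIX="test-default-1-2-linux.2xlarge"
rm -f test-reports-*.zip                  # drop stale archives first
zip -r "test-reports-${FILE_SUFFIX}.zip" test -i '*.xml'
```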
Some files were not shown because too many files have changed in this diff.