Compare commits

2395 Commits

Author SHA1 Message Date
103fc5f9a5 Remove unused variable (#70261)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70261

ghstack-source-id: 146310591

Test Plan:
```
buck test  fbsource//xplat/caffe2:for_each_prod_ptl_model_test
```

Reviewed By: iseeyuan

Differential Revision: D33265656

fbshipit-source-id: 6e303ee304064a61383ba2ae34f2e21077ec9db3
2021-12-28 22:21:29 -08:00
066c9ff08f Deprecating Python 3.6 (#70325)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/70457

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70325

Reviewed By: seemethere

Differential Revision: D33339496

Pulled By: atalman

fbshipit-source-id: 7509cab4f7469dae234bcf3f79e0aabb54577b8a
2021-12-28 18:44:59 -08:00
a0c99a8d3b [Operator Versioning][Edge] Update upgrader codegen with latest change (#70293)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70293

```
python /Users/chenlai/pytorch/tools/codegen/operator_versions/gen_mobile_upgraders.py

```
https://github.com/pytorch/pytorch/pull/70161 was landed to resolve a thread-safety issue. Accordingly, the upgrader codegen needs to be updated.
ghstack-source-id: 146296324

Test Plan:
```
buck test mode/opt //caffe2/test:upgrader_codegen
buck run mode/opt //caffe2/torch/fb/mobile/upgrader_codegen:upgrader_codegen
python /Users/chenlai/pytorch/tools/codegen/operator_versions/gen_mobile_upgraders.py

```

Reviewed By: iseeyuan

Differential Revision: D33274831

fbshipit-source-id: 0e1d2a81edc9b6111f3c6127dbd5b97e16c93dca
2021-12-28 18:34:31 -08:00
a6eadf9b50 Remove backward op for slow 3d convolution (#69978)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69978

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D33131003

Pulled By: jbschlosser

fbshipit-source-id: 097440b2eb501c1eeeb8a666d4bc3508fc5d0cfa
2021-12-28 16:19:23 -08:00
5e113eb24d .github: Add linux.4xlarge executor (#70474)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70474

Needed to compile Linux wheels for CUDA 11.x, since we were OOM'ing with
16GB of RAM

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: atalman

Differential Revision: D33343322

Pulled By: seemethere

fbshipit-source-id: 9f62e07ce2ca229fa25285429c01dc074d63b388
2021-12-28 15:40:28 -08:00
0fb73035f7 [Bootcamp Task] Replace string concatenation by fmt::format (#70366)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/69979

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70366

Reviewed By: H-Huang

Differential Revision: D33339291

Pulled By: LynneD

fbshipit-source-id: e4e0535cd2db8e9fa8b0875d17a900be58384367
2021-12-28 14:15:21 -08:00
e96dda15e5 Remove backward op for slow 2d transposed convolution (#70333)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70333

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D33301402

Pulled By: jbschlosser

fbshipit-source-id: 3cfb3165589fe1620f22479b05139676d20dc493
2021-12-28 12:38:59 -08:00
c732a26e59 Add macro to register CPU kernel for all arch types (#70332)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70332

Idea to avoid recompilations: introduce a new macro REGISTER_ALL_CPU_DISPATCH that registers the same kernel across all CPU arch types. We call this from native/Convolution*.cpp and don't need to move any logic underneath the native/cpu dir. That simplifies these PRs quite a bit and also avoids the recompilation.

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D33301403

Pulled By: jbschlosser

fbshipit-source-id: d7cc163d4fe23c35c93e512d1f0a8af8c9897933
2021-12-28 12:37:36 -08:00
244730eeea .github: Add needs build for generate-test-matrix (#70456)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70456

This job was still running on workflows even when ciflow was not enabled.

This change makes it so that test-matrix generation only occurs before tests
are actually run.

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: atalman

Differential Revision: D33338946

Pulled By: seemethere

fbshipit-source-id: 4b83d5fe6572771807708764609a72c4f1c5745d
2021-12-28 10:11:34 -08:00
4ed02748be fix typo in the docs of multiprocessing (#70448)
Summary:
Fix typo in the docs of multiprocessing.

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70448

Reviewed By: gchanan

Differential Revision: D33336962

Pulled By: H-Huang

fbshipit-source-id: 1235703b8ddc26c33dcbc34bd25ac36b11a18923
2021-12-28 09:58:47 -08:00
73b5b6792f Adds reduction args to signature of F.multilabel_soft_margin_loss docs (#70420)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/70301

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70420

Reviewed By: gchanan

Differential Revision: D33336924

Pulled By: jbschlosser

fbshipit-source-id: 18189611b3fc1738900312efe521884bced42666
2021-12-28 09:48:05 -08:00
6f83841582 .github: Temporarily disable xla test config (#70453)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70453

Removes the current xla config; downstream `pytorch/xla` is broken for clang
compilation, so this config is temporarily removed until the xla team can fix
the upstream CI.

Context: https://github.com/pytorch/xla/pull/3255/files#r775980035

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: zengk95

Differential Revision: D33338463

Pulled By: seemethere

fbshipit-source-id: 1ef332c685d5e2cc7e2eb038e93bd656847fd099
2021-12-28 08:49:01 -08:00
15f14ce0dc fix typo in adam docs (#70387)
Summary:
Fix the typo in [adam docs in master branch](https://pytorch.org/docs/master/generated/torch.optim.Adam.html#torch.optim.Adam)

![image](https://user-images.githubusercontent.com/41060790/147345284-37e180d1-fd06-4a62-9c79-2d17b8aa5cd3.png)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70387

Reviewed By: H-Huang

Differential Revision: D33309283

Pulled By: albanD

fbshipit-source-id: d20c5d8f2498ac64013f71e202a6b50dcc069f2b
2021-12-28 07:35:39 -08:00
574dbb584d quant tests: fix log spew for HistogramObserver (#70107)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70107

HistogramObserver used floor division on tensors, which is deprecated
behavior, so the following warning was printed:

```
/Users/vasiliy/pytorch/torch/ao/quantization/observer.py:905: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
```

This PR fixes the warning.
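
For reference, a minimal sketch of the two rounding modes (the reason `__floordiv__` on tensors was deprecated):

```python
import torch

a = torch.tensor([-3.0, 5.0])
b = torch.tensor(2.0)

# The deprecated tensor __floordiv__ rounded toward zero (trunc),
# which is incorrect rounding for negative values:
print(torch.div(a, b, rounding_mode='trunc'))  # tensor([-1., 2.]) -- old behavior
print(torch.div(a, b, rounding_mode='floor'))  # tensor([-2., 2.]) -- true floor division
```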

Test Plan:
```
python test/test_quantization.py TestHistogramObserver
```

Reviewed By: ejguan

Differential Revision: D33187926

Pulled By: vkuzo

fbshipit-source-id: 9c37de4c6d6193bee9047b6a28ff37ee1b019753
2021-12-28 06:27:51 -08:00
00df885d4e quant tests: clean up logs about incorrect tensor copy (#70106)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70106

Some of the quantization tests had log spew like:

```
UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
```

This PR cleans up the root cause in the test utils. Some other
tests may still hit this warning from other places.
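
For reference, a minimal sketch of the pattern the warning complains about and the recommended replacement:

```python
import torch

src = torch.randn(3, requires_grad=True)

# copy = torch.tensor(src)  # emits the UserWarning above

copy = src.clone().detach()                             # recommended
copy_with_grad = src.clone().detach().requires_grad_(True)
```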

Test Plan:
```
python test/test_quantization.py TestFakeQuantizeOps
```

this particular warning no longer appears

Reviewed By: soulitzer

Differential Revision: D33187925

Pulled By: vkuzo

fbshipit-source-id: bd1acd77fd72a10dad0c254f9f9f32e513c8a89a
2021-12-28 06:26:40 -08:00
b7b32b56f1 Revert D33281300: Prevent sum overflow in broadcast_object_list
Test Plan: revert-hammer

Differential Revision:
D33281300 (807f9a828c)

Original commit changeset: 1bc83e8624ed

Original Phabricator Diff: D33281300 (807f9a828c)

fbshipit-source-id: beb81a9cbfba405a61b11dfaa8e39c9601f45643
2021-12-27 19:01:53 -08:00
807f9a828c Prevent sum overflow in broadcast_object_list (#70336)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70336

broadcast_object_list cast the sum of all object lengths from long to int, causing overflows.

Test Plan:
Increased the size of the Tensor used in object transfers to have a >2GB storage requirement (in distributed_test.py).

Without the fix, the length overflows and the program requests a negative-sized Tensor:
```
RuntimeError: Trying to create tensor with negative dimension -2147482417: [-2147482417]
```
With the fix, the test passes.
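
A minimal sketch of the wraparound (the byte count below is hypothetical; any sum past INT32_MAX wraps the same way):

```python
import ctypes

total_bytes = 2_147_485_231  # hypothetical sum of object lengths, just past INT32_MAX

# Narrowing the 64-bit sum to a 32-bit int wraps to a negative value,
# which was then used as a tensor dimension:
wrapped = ctypes.c_int32(total_bytes).value
print(wrapped)  # -2147482065
```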

Test used on a server with GPUs:

buck test mode/dev-nosan //caffe2/test/distributed:distributed_nccl_spawn --local -- broadcast_object

Differential Revision: D33281300

fbshipit-source-id: 1bc83e8624edc14e747eeced7bc8a7a10e443ee4
2021-12-27 16:17:53 -08:00
5a9ea9e386 Automated submodule update: tensorpipe (#70438)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/tensorpipe](https://github.com/pytorch/tensorpipe).

New submodule commit: 52791a2fd2

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70438

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: zertosh

Differential Revision: D33331758

fbshipit-source-id: 1e811ddc30e9afa440523c6cb5c4e893eb560978
2021-12-27 15:19:21 -08:00
bf610f08b0 Back out "Make TorchScript Preserve Fully Qualified Class Name for Python Exceptions"
Summary: as title

Test Plan:
```
buck run mode/opt-split-dwarf -c=python.package_style=inplace //ai_infra/distributed_ai/pyper_test_framework/templates:pyper_release_v2 -- --model inline_cvr_post_imp_deterministic_shrunk_pyper_release_v2 --cluster TSCTestCluster --hpc_identity oncall_pyper_oncall --stage prod_offline_training --test_module training_platform
...
############## Start inline_cvr_post_imp_model Test Results Analysis ##############
I1226 22:03:56.789000 3346280 test_driver.py:139  UNKNOWN     ] Test finished in 808.2743511786684 seconds.
+-------------------------+---------+------------------------+-----------------+
| Test Case               | Status  | Message                | Model Entity ID |
+-------------------------+---------+------------------------+-----------------+
| SmallWorld_release_test | Success | finished successfully. | 987987491       |
+-------------------------+---------+------------------------+-----------------+
I1226 22:03:56.790000 3346280 test_driver.py:143  UNKNOWN     ] test_run_id: 3d085f61-28d1-411d-bd27-940ea2554b23 use this id to find your run in scuba pyper_test_framework
I1226 22:03:56.792000 3346280 test_driver.py:160  UNKNOWN     ] Calling cleanup
I1226 22:03:56.792000 3346280 training_platform_test_launcher.py:385  UNKNOWN     ] Stopping launched jobs 1
I1226 22:03:59.563122 3346280 ClientSingletonManager.cpp:100] Shutting down Manifold ClientSingletonManager
```

Reviewed By: seemethere

Differential Revision: D33325936

fbshipit-source-id: 64414bf7061ad77e8ac12eb8abafee4043e0fa1e
2021-12-27 09:11:46 -08:00
4ae71c8d34 Add graph op replacement pass (#69915)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69915

Test Plan: Imported from OSS

Reviewed By: samdow

Differential Revision: D33198158

Pulled By: tugsbayasgalan

fbshipit-source-id: f2b924edf9959aaf51f97db994fae031fa062cf8
2021-12-25 13:03:19 -08:00
63e58d262a Extend Graph, CompilationUnit, and schema matching to accept optional operator version number (#69914)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69914

Test Plan: Imported from OSS

Reviewed By: qihqi

Differential Revision: D33198157

fbshipit-source-id: b98d9401e515f695d6cf99116f695edc7976bf01
2021-12-25 00:35:33 -08:00
df3cbcff28 Add utility methods to find an upgrader (#68355)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68355

Test Plan: Imported from OSS

Reviewed By: samdow

Differential Revision: D33198156

Pulled By: tugsbayasgalan

fbshipit-source-id: 68380148f0d9bee96d8090bf01c8dfca8e1f8b12
2021-12-24 12:23:04 -08:00
911d527b87 Make TorchScript Preserve Fully Qualified Class Name for Python Exceptions (#70339)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70339

When a Python program is translated to TorchScript, the Python exception type is dropped. This makes users' lives hard when they need to categorize errors based on more than just the exception message.

Here we make the change so that when we raise a Python exception, we record the fully qualified class name of the exception. Later, when the TorchScript is interpreted, a special exception CustomJITException is thrown; users can get the Python class name from CustomJITException::getPythonClassName.

Note that this diff does not customize the mapping from C++ exceptions to Python exceptions; that is left to users to do however they want.

The code under scripts/shunting is just my own experimental code. I can split it out if requested.
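
A minimal sketch of the behavior from the Python side (how the exception surfaces in Python depends on the binding layer; the fully qualified name recorded here would be builtins.ValueError):

```python
import torch

@torch.jit.script
def checked(x: int) -> int:
    if x < 0:
        raise ValueError("x must be non-negative")
    return x

try:
    checked(-1)
except Exception as e:
    # With this change, the thrown exception records the fully qualified
    # Python class name of the original exception.
    print(type(e).__name__, e)
```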
ghstack-source-id: 146221879

Test Plan: buck test mode/opt //caffe2/test:jit

Reviewed By: gmagogsfm

Differential Revision: D33282878

fbshipit-source-id: 910f67a764519f1053a48589d1a34df69001525d
2021-12-24 00:25:40 -08:00
ab4f9862a3 [Compiled Mobilenetv3 Demo] Integrate Compiled Mobilenetv3 into FB4A Playground app (#70370)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70370

Demo of Mobilenetv3 compiled with NNC in FB4A Playground app:
- Add compiled ModelConfig in FB4A app
- Enable Camera inputs for Mobilenet processor in the app and add ability to show live outputs
- Use downscaled inputs, which works for both original mobilenetv3 model and the compiled model
- Update nnc_aten_adaptive_avg_pool2d to use adaptive_avg_pool2d instead of adaptive_avg_pool2d_out, as the latter is not included in the traced operators of the mobilenetv3 model and hence not included in the app.
- Update app dependencies to include nnc_backend_lib and asm binary

Test Plan:
Run `arc playground pytorchscenario` from fbandroid to build and install the app on a connected device.
Live demo with compiled Mobilenetv3 model:
https://pxl.cl/1W1kb

Reviewed By: larryliu0820

Differential Revision: D33301477

fbshipit-source-id: 5d50a0e70a7f7d2157d311d6b1feef46e78e85b6
2021-12-23 23:46:20 -08:00
0ee663d2fa Revert D33234529: [NNC Testing] Randomized loop nest infrastructure
Test Plan: revert-hammer

Differential Revision:
D33234529 (1d094587ea)

Original commit changeset: 9019f1f1d4ca

Original Phabricator Diff: D33234529 (1d094587ea)

fbshipit-source-id: a79deca9f186299bf884587eb7d50af2464979fb
2021-12-23 23:11:23 -08:00
e429a68478 Allow single node fusion for nvfuser (#70000)
Summary:
Setting `PYTORCH_NVFUSER_ONE_OP_FUSION=1` will take all nodes that nvFuser supports, instead of waiting for a fusion opportunity.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70000

Reviewed By: samdow

Differential Revision: D33292195

Pulled By: davidberard98

fbshipit-source-id: 8ed5ce5e82fbb6737e8ab5ce4223b038eaf47756
2021-12-23 17:07:57 -08:00
5ccf28d066 Do not use ZeroTensor for inplace ops (#69998)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69998

Fixes: https://github.com/pytorch/pytorch/issues/69855

The check for undefined grads for forward AD was not being run, because `check_undefined_grads` was only passed as True by OpInfo for backward AD. This PR updates gradcheck to interpret `check_undefined_grads` as applying to both forward and backward AD.

This PR also updates codegen to 1) not use ZeroTensor for `self` when the op is in-place, and 2) only create zeros (either through ZeroTensor or at::zeros) if the tensor itself is not undefined. Previously we would error in this case when calling `.options` on the undefined tensor.

~TODO: undo the skips that are due to the original issue~

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D33235973

Pulled By: soulitzer

fbshipit-source-id: 5769b6d6ca123b2bed31dc2bc6bc8e4701581891
2021-12-23 15:52:34 -08:00
3116d87024 Add forward AD formulas for {adaptive_,fractional_,}max_pool{2,3}d_{backward,} (#69884)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69884

Also fixes: https://github.com/pytorch/pytorch/issues/69322, https://github.com/pytorch/pytorch/issues/69325

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D33093039

Pulled By: soulitzer

fbshipit-source-id: b9a522a00f4e9e85974888de5058de07280f8f66
2021-12-23 15:51:09 -08:00
6925576e88 [acc_ops] No longer mark acc_ops.cat as unary (#70365)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70365

We should only mark ops as unary if they have a single fx.Node input. However, `cat` takes a sequence of tensors as input.

Reviewed By: alexbeloi

Differential Revision: D33299988

fbshipit-source-id: db3581eaee4ad9d2358eed01ec9027825f58f220
2021-12-23 15:09:03 -08:00
133c7f2cf9 Revert D33301254: [pytorch][PR] GHA Windows: Propagate exit code from .bat to calling bash script
Test Plan: revert-hammer

Differential Revision:
D33301254 (6431ac6c7a)

Original commit changeset: 6861dbf0f0a3

Original Phabricator Diff: D33301254 (6431ac6c7a)

fbshipit-source-id: c9d8f72bb198de678456e0a1bcf3264c2ea52874
2021-12-23 15:03:48 -08:00
6431ac6c7a GHA Windows: Propagate exit code from .bat to calling bash script (#70011)
Summary:
The Windows 1st shard was silently failing to run (more details here: https://github.com/pytorch/pytorch/issues/70010) because the code to run it was never reached. CI still returned green for those workflow jobs because the exit code from the batch script DID NOT PROPAGATE to the calling bash script.

The key here is that even though we have
```
if ERRORLEVEL 1 exit /b 1
```

the exit code 1 was NOT propagating back to the bash script: the `exit /b 1` was inside an `if` statement and the batch script was actually run in a cmd shell, so the bash script win-test.sh continued without erroring. Moving the `exit /b 1` to be standalone fixes it.

More details can be found in this Stack Overflow answer: https://stackoverflow.com/a/55290133

Evidence that a failure in the .bat now fails the whole job:
https://github.com/pytorch/pytorch/runs/4621483334?check_suite_focus=true

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70011

Reviewed By: malfet

Differential Revision: D33301254

Pulled By: janeyx99

fbshipit-source-id: 6861dbf0f0a34d5baed59f928e34eab15af6f461
2021-12-23 14:09:41 -08:00
ab57f6d12c [LTC] Upstream utils to extract BackendDevice from at::Tensor (#70069)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70069

This commit upstreams utils to extract BackendDevice from at::Tensor.

Test Plan: ./build/bin/test_lazy --gtest_filter=BackendDeviceTest.GetBackendDevice*

Reviewed By: samdow

Differential Revision: D33293160

Pulled By: alanwaketan

fbshipit-source-id: 78647239f90b4d04adce84ae6022b8983ad30c09
2021-12-23 12:42:03 -08:00
16e6e1a59e [Easy] Lint wrap.py file (#70341)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70341

Per title
ghstack-source-id: 146181936

Test Plan: CI

Reviewed By: zhaojuanmao

Differential Revision: D33290099

fbshipit-source-id: e4415a42086d9b1b78b0b5f42d4b02f275131dfa
2021-12-23 11:30:36 -08:00
3c231e9bd7 [FSDP] Remove module.wrapper_config support (#70340)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70340

Some wrap APIs support module.wrapper_config to specify the FSDP arguments,
but this feature is currently unused and there is no plan to support it.
enable_wrap() and wrap(), along with FSDP constructor wrapping, should be
enough for all use cases, so get rid of the unnecessary code.
ghstack-source-id: 146181819

Test Plan: CI

Reviewed By: zhaojuanmao

Differential Revision: D33290066

fbshipit-source-id: e7f3d8b2f2ff6bdf4a3e5021dbb53adf052ee8dc
2021-12-23 11:29:13 -08:00
d100d98db8 torch.linalg routines return torch.linalg.LinAlgError when a numerical error in the computation is found. (#68571)
Summary:
This PR fixes https://github.com/pytorch/pytorch/issues/64785 by introducing `torch.linalg.LinAlgError` for reporting errors caused by bad values in linear algebra routines, which allows users to easily catch numerical errors.
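
A minimal usage sketch:

```python
import torch

singular = torch.zeros(3, 3)
try:
    torch.linalg.inv(singular)
except torch.linalg.LinAlgError as e:
    print("caught numerical error:", e)
```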

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68571

Reviewed By: malfet

Differential Revision: D33254087

Pulled By: albanD

fbshipit-source-id: 94b59000fdb6a9765e397158e526d1f815f18f0f
2021-12-23 10:53:26 -08:00
6a84449290 [SR] Fast path for VarStack on scalars (#70210)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70210

Add a fast-path for `VarStack` nodes for when the inputs are scalars.

Test Plan: `buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest -- VarStack`

Reviewed By: hlu1

Differential Revision: D33177498

fbshipit-source-id: 922ab76a6808fbfdb8eb6091163a380344e38de6
2021-12-23 10:31:17 -08:00
cc8b916395 Transformer{DecoderLayer} : no batch dim (#70322)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/60585

TransformerDecoder Test Timings (takes about 30s)
<details>

```
pytest test/test_modules.py -k _TransformerDeco --durations=10
============================================================================================== test session starts ===============================================================================================
platform linux -- Python 3.10.0, pytest-6.2.5, py-1.10.0, pluggy-1.0.0
rootdir: /home/kshiteej/Pytorch/pytorch_no_batch_mha, configfile: pytest.ini
plugins: hypothesis-6.23.2, repeat-0.9.1
collected 639 items / 591 deselected / 48 selected

test/test_modules.py ss......ss......ss..ssssssssss..................                                                                                                                                      [100%]

================================================================================================================================================================================ slowest 10 durations ==============================================================================================
17.13s call     test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_TransformerDecoderLayer_cuda_float64
4.13s call     test/test_modules.py::TestModuleCPU::test_gradgrad_nn_TransformerDecoderLayer_cpu_float64
1.22s call     test/test_modules.py::TestModuleCUDA::test_grad_nn_TransformerDecoderLayer_cuda_float64
0.86s call     test/test_modules.py::TestModuleCPU::test_cpu_gpu_parity_nn_TransformerDecoderLayer_cpu_float32
0.73s call     test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_TransformerDecoderLayer_cuda_float32
0.57s call     test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_TransformerDecoderLayer_cuda_float32
0.56s call     test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_TransformerDecoderLayer_cuda_float64
0.48s call     test/test_modules.py::TestModuleCPU::test_grad_nn_TransformerDecoderLayer_cpu_float64
0.41s call     test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_TransformerDecoderLayer_cuda_float32
0.40s call     test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_TransformerDecoderLayer_cuda_float64
============================================================================================ short test summary info =============================================================================================
========================================================================== 32 passed, 16 skipped, 591 deselected, 3 warnings in 29.62s ===========================================================================
```

</details>

Transformer Test Timings (takes about 1m10s)

<details>
```
pytest test/test_modules.py -k _Transformer_ --durations=10
============================================================================================== test session starts ===============================================================================================
platform linux -- Python 3.10.0, pytest-6.2.5, py-1.10.0, pluggy-1.0.0
rootdir: /home/kshiteej/Pytorch/pytorch_no_batch_mha, configfile: pytest.ini
plugins: hypothesis-6.23.2, repeat-0.9.1
collected 639 items / 591 deselected / 48 selected

test/test_modules.py ss......ss......ss..ssssssssss..................                                                                                                                                      [100%]

============================================================================================== slowest 10 durations ==============================================================================================
46.40s call     test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_Transformer_cuda_float64
11.09s call     test/test_modules.py::TestModuleCPU::test_gradgrad_nn_Transformer_cpu_float64
2.48s call     test/test_modules.py::TestModuleCUDA::test_grad_nn_Transformer_cuda_float64
1.03s call     test/test_modules.py::TestModuleCPU::test_grad_nn_Transformer_cpu_float64
0.96s call     test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_Transformer_cuda_float32
0.87s call     test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_Transformer_cuda_float32
0.85s call     test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_Transformer_cuda_float64
0.85s call     test/test_modules.py::TestModuleCPU::test_cpu_gpu_parity_nn_Transformer_cpu_float32
0.65s call     test/test_modules.py::TestModuleCUDA::test_cpu_gpu_parity_nn_Transformer_cuda_float64
0.47s call     test/test_modules.py::TestModuleCUDA::test_multiple_device_transfer_nn_Transformer_cuda_float32
============================================================================================ short test summary info =============================================================================================
===================================================================== 32 passed, 16 skipped, 591 deselected, 3 warnings in 70.19s (0:01:10) ======================================================================
```
</details>
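
A minimal sketch of the unbatched usage this enables (shapes are illustrative):

```python
import torch
import torch.nn as nn

layer = nn.TransformerDecoderLayer(d_model=8, nhead=2)
tgt = torch.randn(4, 8)     # (seq_len, d_model) -- no batch dimension
memory = torch.randn(5, 8)  # (src_len, d_model)
out = layer(tgt, memory)
print(out.shape)            # torch.Size([4, 8])
```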

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70322

Reviewed By: cpuhrsch

Differential Revision: D33286285

Pulled By: jbschlosser

fbshipit-source-id: 46e08cf47f37787733a535f683c3fd21f652486d
2021-12-23 10:13:31 -08:00
4d49af863f GaussianNLLLoss no_batch_dim docs and testing (#69783)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69783

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D33200486

Pulled By: george-qi

fbshipit-source-id: a2bc2b366772682825f879dae4ac29c1f4d6a5f1
2021-12-23 09:27:53 -08:00
a9c7d626e1 Add the maximize flag to AdamW (#70146)
Summary:
Related issue: https://github.com/pytorch/pytorch/issues/68052

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70146

Reviewed By: malfet

Differential Revision: D33254561

Pulled By: albanD

fbshipit-source-id: f190c836a4162f936c5953e076747c345df21421
2021-12-23 09:20:29 -08:00
b15212c62b enable backward pass computation and communication overlap by prefetching all gather (#70235)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70235

Addresses comments in https://github.com/pytorch/pytorch/pull/69282:
Fixed a few corner cases for prefetching full parameters in the post-backward hook.

After benchmarking, prefetching full parameters in the pre-backward hook has the best performance and is stable, but at the cost of increased memory; prefetching full parameters in the post-backward hook did not show the expected performance and also failed in a few corner cases (now fixed), although it has no memory increase. The main issue is that the post-backward hook firing order is not consistent with the reverse of the forward computation order, so an incorrectly prefetched all-gather can delay the genuinely needed all-gather in the single NCCL stream and delay some layer's computation.

So these two approaches are exposed as two configurable experimental algorithms for now.

Prefetching full parameters in the pre-backward hook:

Past traces show that all-gather ops are not triggered until the current layer's backward pass starts to compute. Also, for some models, previous layers' reduce-scatter is scheduled before the next layer's all-gather ops; since all-gather and reduce-scatter are in the same NCCL stream, the backward pass can end up with no overlap between communication and computation.

To explicitly schedule the next layers' all-gathers while the previous layers' backward computation is running, we can prefetch the next layer's all-gather of full parameters. This helps 1) overlap both all-gather and reduce-scatter with computation deterministically, and 2) prefetch only one layer's all-gather of full parameters, to avoid increasing memory too much.

The implementation borrows the idea from facebookresearch/fairscale#865, where the forward graph order is recorded during the forward pass.

In the backward pass, this PR prefetches the all-gather of full parameters in the current layer's pre-backward hook, instead of in the current layer's post-backward hook as in facebookresearch/fairscale#865. It also makes sure the all-gather streams are synced properly.

Experiments showed a 10% memory increase and a 20% latency speedup for a 1GB RoBERTa model in a slow network environment.
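
A minimal sketch of the pre-backward prefetch idea (the class and the all_gather_full_params hook are hypothetical, not the actual FSDP API):

```python
class PrefetchSchedule:
    """Records forward execution order, then prefetches one layer ahead
    in the backward pass (which runs in reverse forward order)."""

    def __init__(self):
        self.forward_order = []  # FSDP units, in forward execution order

    def on_forward(self, unit):
        self.forward_order.append(unit)

    def on_pre_backward(self, unit):
        idx = self.forward_order.index(unit)
        if idx > 0:
            # Prefetch exactly one unit ahead to bound the memory increase.
            self.forward_order[idx - 1].all_gather_full_params()  # hypothetical hook
```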

Test Plan: unit tests

Reviewed By: rohan-varma

Differential Revision: D33252795

fbshipit-source-id: 4e2f47225ba223e7429b0dcaa89df3634bb70050
2021-12-22 23:02:46 -08:00
1d094587ea [NNC Testing] Randomized loop nest infrastructure (#70174)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70174

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D33234529

fbshipit-source-id: 9019f1f1d4ca945c92bee401f7ec674b7d987de4
2021-12-22 22:07:39 -08:00
656d2a7bf6 [quant][fx][graphmode] Add backend_config_dict for standalone module (#70150)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70150

This PR allows users to specify backend_config_dict for standalone modules, in both the prepare and convert steps.
Adding this now to allow prototyping for some of our customer use cases; a test for the code path will be added in
a separate PR.

Test Plan:
regression tests
```
python test/test_quantization.py TestQuantizeFx
```
A test that specifies backend_config for a module will be added in a separate PR for the use case we have in mind,
since it requires other features.

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D33205162

fbshipit-source-id: a657cef8e49d99b6a43653141521dc87c33bfd89
2021-12-22 21:18:39 -08:00
795af1578c Revert D33172665: [LTC] Upstream utils to extract BackendDevice from at::Tensor
Test Plan: revert-hammer

Differential Revision:
D33172665 (121d067999)

Original commit changeset: b334ee358ea7

Original Phabricator Diff: D33172665 (121d067999)

fbshipit-source-id: 8bff43cddfc5d30483ec5cea8eff037aab9d1cfa
2021-12-22 21:12:49 -08:00
12afe2bb84 update poisson_nll_loss opinfo samples (#70300)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/67461

cc albanD mruberry jbschlosser walterddr kshitij12345

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70300

Reviewed By: cpuhrsch

Differential Revision: D33285896

Pulled By: jbschlosser

fbshipit-source-id: ec917ec7d3113dbc4ae03978fa5abb24aa082c01
2021-12-22 19:10:57 -08:00
681e78bace [Profiler] Address issues from profiler bifurcation. (#70327)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70327

After D32678163 (7ea86dfdb1), test_rpc_profiler began failing. This was surprising, because it should have been a no-op refactor. However, one change is that a Kineto profiler is no longer also an autograd profiler; the RPC framework was assuming a legacy profiler, but when a Kineto profiler was active things still mostly worked due to that implementation detail (and crashed after the class split).

This diff tidies up a couple of things:
1) Move `getProfilerConfig` into `api.cpp`, since it is no longer correct to static_cast a `KinetoThreadLocalState` to a `ProfilerLegacyThreadLocalState`. (And really the class we want is `ProfilerThreadLocalStateBase` anyway.)

2) Add a mechanism for callers to check if the active profiler is a legacy or kineto profiler. (So callers like RPC can adjust or provide a nice error message.)

3) Fix the RPC test to create a legacy profiler.

Test Plan: `caffe2/torch/fb/training_toolkit/backend/tests:test_rpc_profiler` now passes, and before the fix to `test_rpc_profiler.py`, I verified that the test failed with the error message added to `utils.cpp` rather than just crashing.

Reviewed By: suphoff

Differential Revision: D33283314

fbshipit-source-id: e4fc5b5cfc9ca3b91b8f5e09adea36f38611f90d
2021-12-22 18:50:42 -08:00
121d067999 [LTC] Upstream utils to extract BackendDevice from at::Tensor (#70069)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70069

This commit upstreams utils to extract BackendDevice from at::Tensor.

Test Plan: ./build/bin/test_lazy --gtest_filter=BackendDeviceTest.GetBackendDevice*

Reviewed By: wconstab

Differential Revision: D33172665

Pulled By: alanwaketan

fbshipit-source-id: b334ee358ea7b031bbffb0a16fa634715dba83f5
2021-12-22 18:15:45 -08:00
bd8e8e3aaf [GHA] Clean after checkout (#70337)
Summary:
Github's checkout action sometimes leaves untracked files in the repo
Remedy it by running `git clean -fxd`, which should nuke them all

Tentative fix for https://github.com/pytorch/pytorch/issues/70097

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70337

Reviewed By: suo

Differential Revision: D33289189

Pulled By: malfet

fbshipit-source-id: 16e3ebe7a61fda1648189c78bdf1b1185247037a
2021-12-22 18:10:23 -08:00
a421ee0e52 [nn] InstanceNorm : no batch dim for modules (#65323)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/60585

cc albanD mruberry jbschlosser walterddr kshitij12345

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65323

Reviewed By: davidberard98

Differential Revision: D33285268

Pulled By: jbschlosser

fbshipit-source-id: c5210bb431eaf27190e1cd75c42af3e5bcf83f72
2021-12-22 18:00:36 -08:00
c06b3208d4 Revert D33141012: test //c10/... in CI
Test Plan: revert-hammer

Differential Revision:
D33141012 (0ccccf4ed5)

Original commit changeset: 702000587171

Original Phabricator Diff: D33141012 (0ccccf4ed5)

fbshipit-source-id: 1e30c2dad940f54185dc93912fd7b3e81eec5b63
2021-12-22 17:48:48 -08:00
23ab6ce723 Revert D33141011: extract //c10/macros into its own package
Test Plan: revert-hammer

Differential Revision:
D33141011 (8f4c724bb6)

Original commit changeset: caa97448f922

Original Phabricator Diff: D33141011 (8f4c724bb6)

fbshipit-source-id: 79423ed51f9a43ecf1f716a739c74949b66fadb4
2021-12-22 17:48:45 -08:00
f126501d37 Revert D33141010: allow Bazel to build without glog and gflags
Test Plan: revert-hammer

Differential Revision:
D33141010 (8c41f258f4)

Original commit changeset: d951e5616459

Original Phabricator Diff: D33141010 (8c41f258f4)

fbshipit-source-id: d52ca20ddf4c5a91cb09a32fecb30a00227fc4ae
2021-12-22 17:47:23 -08:00
682fab19d4 [SR] verify_and_correct_memory_overlap handles tensor lists (#69774)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69774

We recently ran into a nasty bug caused by incorrect schema annotations on an `aten::split` overload. `verify_and_correct_memory_overlap` is supposed to prevent crashes in this scenario, but it didn't, because it did not handle `Tensor[]` outputs.

This change extends the memory correction mechanism to handle tensor lists.
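
For context, a minimal sketch of why `Tensor[]` outputs matter here: ops like `aten::split` return views that alias the input's storage, which is exactly the kind of memory overlap the correction mechanism must detect:

```python
import torch

x = torch.arange(6)
parts = torch.split(x, 2)  # tuple of views into x's storage
parts[0][0] = 99
print(x[0])  # tensor(99): the list outputs alias the input
```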
ghstack-source-id: 146152478

Test Plan: `buck test caffe2/benchmarks/static_runtime/...`

Reviewed By: hlu1

Differential Revision: D33022494

fbshipit-source-id: 8d1d41ca1d4fd5dfb7c8a66028c391ba63551eb0
2021-12-22 17:18:18 -08:00
385c12852e [LTC] Upstream LazyTensor <=> at::Tensor utils (#70066)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70066

This commit upstreams utils to convert at::Tensors into LazyTensors and
vice versa.

Test Plan:
Covered by test_ptltc on the lazy_tensor_staging branch since TorchScript
Backend hasn't merged yet.

Reviewed By: desertfire

Differential Revision: D33171590

Pulled By: alanwaketan

fbshipit-source-id: b297ff5fc8ca1a02d30e16ad2249985310e836a9
2021-12-22 16:53:07 -08:00
2e94a0d282 Remove backward ops for NNPACK spatial convolution (#70305)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70305

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D33279223

Pulled By: jbschlosser

fbshipit-source-id: f263012b3edaa87ce5430ffd6204a5453360d5dd
2021-12-22 14:58:12 -08:00
7cdfd86a72 TestMathBits: test with neg and conj bit set (#68948)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68948

The case where both the negative and conjugate bits are set
isn't tested currently despite being handled explicitly by `copy`.
In theory this shouldn't matter because neg_bit is only used for real
values, but it does mean the code in copy is untested. So, this just
runs it with a single sample as a sanity check.

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D33064371

Pulled By: anjali411

fbshipit-source-id: e90c65e311507c4fc618ff74fecc4929599c4fa3
2021-12-22 14:30:35 -08:00
7c690ef1c2 FractionalMaxPool3d with no_batch_dim support (#69732)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69732

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D33280090

Pulled By: george-qi

fbshipit-source-id: aaf90a372b6d80da0554bad28d56436676f9cb89
2021-12-22 14:30:32 -08:00
8c41f258f4 allow Bazel to build without glog and gflags (#69995)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69995
ghstack-source-id: 146027060

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D33141010

fbshipit-source-id: d951e5616459e8aa163ae0741e245f53185580e8
2021-12-22 14:30:30 -08:00
8f4c724bb6 extract //c10/macros into its own package (#69994)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69994
ghstack-source-id: 145799968

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D33141011

fbshipit-source-id: caa97448f922d7c12980bf01669c1b3ef5c1213b
2021-12-22 14:30:27 -08:00
0ccccf4ed5 test //c10/... in CI (#69993)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69993
ghstack-source-id: 145799967

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D33141012

fbshipit-source-id: 70200058717189a57858f3f8d94ecc364fb229d6
2021-12-22 14:30:24 -08:00
1bd147b61a Fix masked_softmax's perf when element_size is not 8 (#70271)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70271

Test Plan:
Rebase on top of D32407544 and run
buck run mode/opt -c fbcode.enable_gpu_sections=true pytext/fb/tools:benchmark_masked_softmax -- masked-softmax --batch-size=10
to see correct perf data (PT time = ~2.5x PT native time).

Reviewed By: ngimel

Differential Revision: D33268055

fbshipit-source-id: f48b17852c19c2bc646f9ed8d9d5aac85caa8a05
2021-12-22 14:29:09 -08:00
c34aa715fa AT_MKL_SEQUENTIAL and build changes (#70259)
Summary:
Re-land of  https://github.com/pytorch/pytorch/pull/69419

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70259

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D33246757

Pulled By: ngimel

fbshipit-source-id: 738f8558d4cad6752be14108f9931ec3514f6682
2021-12-22 13:52:23 -08:00
b37de0a4bb Update flags in nnc lowering (#70306)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70306

USE_XNNPACK is the right flag to enable lowering to prepacked XNNPACK-based ops.

Test Plan: CI

Reviewed By: ZolotukhinM, priyaramani

Differential Revision: D33279375

fbshipit-source-id: d19ded5643f487f7b58c54a860ad39c8d484ed05
2021-12-22 12:25:35 -08:00
f36b44bb9e Remove ciflow_should_run job (#70204)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/66725

This removes the ciflow_should_run job and moves the check into the build stage for the different job templates.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70204

Reviewed By: malfet

Differential Revision: D33282338

Pulled By: zengk95

fbshipit-source-id: 327ff2bca9720d2a69083594ada5c7788b65adbd
2021-12-22 11:52:42 -08:00
276253b164 Fixed wrong return type in ModuleList getitem (#69083)
Summary:
Fixes typing error:
`Expected type ‘Iterable’ (matched generic type ‘Iterable[_T1]’), got ‘Module’ instead.`

see: https://discuss.pytorch.org/t/modulelist-typing-error-not-an-iterable/138137/5 :

To reproduce (e.g. with mypy/PyCharm):

```python
import torch
import torch.nn as nn

class Model(nn.Module):

    def __init__(self):
        super().__init__()
        self.module_list = nn.ModuleList(
            [nn.Linear(8, 8), nn.Linear(8, 8), nn.Linear(8, 8), nn.Linear(8, 8), nn.Linear(8, 1)]
        )

    def forward(self, batch):
        # Slicing a ModuleList returns a ModuleList at runtime, but the
        # annotated return type of __getitem__ made type checkers infer Module.
        for i in self.module_list[1:4]:
            pass
        return batch

model = Model()
out = model(torch.randn(1, 1))
```
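
A sketch of the shape of the fix (assumed from the issue, not the exact patch): overload `__getitem__` so a slice is typed as returning a `ModuleList` rather than a `Module`:

```python
from typing import Union, overload

import torch.nn as nn

class TypedModuleList(nn.ModuleList):
    @overload
    def __getitem__(self, idx: slice) -> "nn.ModuleList": ...
    @overload
    def __getitem__(self, idx: int) -> nn.Module: ...

    def __getitem__(self, idx: Union[int, slice]):
        return super().__getitem__(idx)
```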

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69083

Reviewed By: davidberard98

Differential Revision: D33279114

Pulled By: jbschlosser

fbshipit-source-id: 90d74e76602163586b6ff4c49613a2694a9af37c
2021-12-22 11:38:17 -08:00
ce9a2f8ba9 [C++ API] Added missing nearest-exact mode and anti-alias flag (#69318)
Summary:
Description:

Following https://github.com/pytorch/pytorch/pull/65142#issuecomment-981995692 adding missing nearest-exact mode and anti-alias flag to C++ frontend.

- https://github.com/pytorch/pytorch/pull/65142
- https://github.com/pytorch/pytorch/pull/64501

- added tests in pytorch/test/cpp/api/functional.cpp

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69318

Reviewed By: davidberard98

Differential Revision: D33278995

Pulled By: jbschlosser

fbshipit-source-id: fa87c0c78df6b398e4f9688cc02111eed187afa7
2021-12-22 11:10:51 -08:00
da63f3f92b Corrected typo in Cross entropy formula (#70220)
Summary:
Changes made to line 1073: the denominator of the formula was exp(sum(x)); it is now sum(exp(x)).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70220

Reviewed By: davidberard98

Differential Revision: D33279050

Pulled By: jbschlosser

fbshipit-source-id: 3e13aff5879240770e0cf2e047e7ef077784eb9c
2021-12-22 11:06:21 -08:00
b7259b8660 [quant][be] Add a check in prepare_qat to make sure the model is in training mode (#69879)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69879

As titled.

Test Plan:
```
python test/test_quantization.py TestQuantizationAwareTraining
```

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D33080989

fbshipit-source-id: 55a631284365ec9dfd6bd7469688490ab1891d41
2021-12-22 11:00:00 -08:00
2806d821b0 Add conversion of torch.permute to acc_ops.permute (#70294)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70294

In order to infer the shape for permute, the node target needs to be converted from torch.permute to acc_ops.permute.

Reviewed By: jfix71

Differential Revision: D33267469

fbshipit-source-id: b77eff1892211eac4a798a2f3e624140e287f4a2
2021-12-22 10:38:39 -08:00
56969bf88a make inverse call linalg_inv (#70276)
Summary:
`linalg.inv` and `inverse` are aliases according to the documentation, yet their implementations have somewhat diverged. This makes `inverse` call into `linalg_inv`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70276

Reviewed By: malfet

Differential Revision: D33271847

Pulled By: ngimel

fbshipit-source-id: cf018ddd2c1cee29026dd5f546f03f3a1d3cf362
2021-12-22 10:15:40 -08:00
4db3a8fc0a [nn] TransformerEncoderLayer: no-batch-dim (#69291)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/60585
TODO:
* [ ] Update docs?
* [x] Generic reference function?

cc albanD mruberry jbschlosser walterddr kshitij12345

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69291

Reviewed By: davidberard98

Differential Revision: D33278970

Pulled By: jbschlosser

fbshipit-source-id: 8dd5b6d7c0099fa38aa037c186778b10834bdee4
2021-12-22 10:00:09 -08:00
69b37a16f3 Remove unused CUDASolver.h from SparseCUDABlas (#70281)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70281

Reviewed By: ngimel

Differential Revision: D33272704

Pulled By: malfet

fbshipit-source-id: a33a7f9cd1513115a0b9ab75530e85e9913e8dd3
2021-12-22 09:04:34 -08:00
31c7e5d629 Install TensorRT lib on oss docker and enable fx2trt unit test (#70203)
Summary:
CI

Lib installed and unit test run on https://github.com/pytorch/pytorch/actions/runs/1604076060

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70203

Reviewed By: malfet

Differential Revision: D33264641

Pulled By: wushirong

fbshipit-source-id: ba30010bbd06e70d31415d8c52086d1779371bcf
2021-12-22 08:50:48 -08:00
b5f71375f5 [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D33275345

fbshipit-source-id: b07a27897680190f9fff86e22d8c68c1c9aff19a
2021-12-22 08:05:39 -08:00
29f1ccc8f0 Fix some Composite Compliance problems with binary_cross_entropy backward (#70198)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70198

This PR fixes composite compliance problems with:
- binary_cross_entropy's backward formula
- binary_cross_entropy_with_logits's backward formula
- binary_cross_entropy's double backward formula

It does so by adding checks for areAnyTensorSubclassLike.

Test Plan:
- I tested everything with functorch.
- We are going to do https://github.com/pytorch/pytorch/issues/69530 in
the future so we have a way of testing this in core. I need the
binary_cross_entropy ones for something right now and didn't want to
wait until we come up with a solution for #69530.

Reviewed By: Chillee

Differential Revision: D33246995

Pulled By: zou3519

fbshipit-source-id: 310ed3196b937d01b189870b86a6c5f77f9258b4
2021-12-22 07:24:04 -08:00
75dbe88b05 [DataPipe] removing unbatch_level from .groupby (#70249)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70249

IMO, the `unbatch_level` argument is not needed here, since users can simply call `.unbatch` before calling `.groupby` if needed. One small step closer to a unified API with other libraries.

Note that we may rename the functional name from `.groupby` to `.group` in the future. TBD.
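
A minimal sketch of the replacement pattern (data and key function are illustrative):

```python
from torch.utils.data.datapipes.iter import IterableWrapper

dp = IterableWrapper([[1, 2], [3, 4], [5, 6]])
# Instead of groupby(..., unbatch_level=1), unbatch explicitly first:
grouped = dp.unbatch().groupby(lambda x: x % 2)
print(list(grouped))  # e.g. [[1, 3, 5], [2, 4, 6]]
```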

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D33259104

Pulled By: NivekT

fbshipit-source-id: 490e3b6f5927f9ebe8772d5a5e4fbabe9665dfdf
2021-12-22 07:13:12 -08:00
e02d836cb2 [LTC] Upstream LTCTensorImpl (#70062)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70062

This commit upstreams LTCTensorImpl from the lazy_tensor_staging branch.
It inherits from c10::TensorImpl and thus manages the lifetime/storage
of LazyTensor.

Test Plan: ./build/bin/test_lazy --gtest_filter=LazyTensorImplTest.*

Reviewed By: desertfire

Differential Revision: D33171186

Pulled By: alanwaketan

fbshipit-source-id: 6af9f91cc7c7e997f120cb89a7bcd6785c03ace0
2021-12-22 03:21:52 -08:00
633f770c3c [StaticRuntime] Add out-variant support for TensorExprDynamicGroup op (#69479)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69479

This diff adds support for out-variant optimization for `TensorExprDynamicGroup` op, which will be used for TensorExpr based fusion in Static Runtime.
ghstack-source-id: 146107008

Test Plan:
```
buck run mode/opt //caffe2/caffe2/fb/predictor:pytorch_predictor_test
```

Completed accuracy test on inline_cvr model 294738512 v0. Results:
```
get 1012 prediction values
get 1012 prediction values
pyper_inference_e2e_local_replayer_test.out.132ea03c2 pyper_inference_e2e_local_replayer_test.out.1858bbeb0
max_error:  0 % total:  0
```

Reviewed By: d1jang, mikeiovine

Differential Revision: D32768463

fbshipit-source-id: a3e6c1ea9ff5f3b57eb89095aa79a6d426fbb52a
2021-12-22 00:30:22 -08:00
7d4db93a7d [jit] Handle output tensor being passed in as inputs to TensorExprDynamicGroup (#69478)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69478

This diff handles the case when output tensors are being passed in as
inputs to TensorExprDynamicGroup op.

This is in preparation to support out-variant optimizations in Static Runtime.
ghstack-source-id: 146107007

Test Plan: buck test mode/dev-nosan //caffe2/test/cpp/jit:jit

Reviewed By: eellison

Differential Revision: D32823889

fbshipit-source-id: ff18e17fcd09953e55c8da6b892e60756521c2fc
2021-12-22 00:30:19 -08:00
4dec15e6d8 [nnc] Add a run method to TensorExprKernel that takes in output tensors (#69477)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69477

This diff adds a new run method to `TensorExprKernel` which takes in
output tensors as inputs and stores the output in those given tensors.
ghstack-source-id: 146107009

Test Plan: buck test mode/dev-nosan //caffe2/test/cpp/tensorexpr:tensorexpr -- --exact 'caffe2/test/cpp/tensorexpr:tensorexpr - Kernel.RunWithAllocatedOutputs'

Reviewed By: ZolotukhinM

Differential Revision: D32823890

fbshipit-source-id: edc1f4839785124048b034060feb71cb8c1be34f
2021-12-22 00:30:15 -08:00
0bdf4702f6 [jit] Add a new op that composes all of the dynamic shape logic (#69476)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69476

This diff adds a new op, `TensorExprDynamicGroup`, that composes all the logic behind running a dynamic-shaped fused node. This includes a guard instruction that checks for conditions, and a conditional that calls either the fused node or the fallback graph depending on the guard.
ghstack-source-id: 146107006

Test Plan:
```
buck test mode/dev-nosan //caffe2/test/cpp/jit:jit
```

Reviewed By: eellison

Differential Revision: D32320082

fbshipit-source-id: 2bd1a43391ca559837d78ddb892d931abe9ebb73
2021-12-22 00:28:57 -08:00
b613fbdbf2 Back out "[Quant] Added 4 bit support for embedding quantized module" (#70273)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70273

Original commit changeset: 73e63383cf60

Original Phabricator Diff: D33152674 (9f512e129b)

Test Plan: CI

Reviewed By: larryliu0820

Differential Revision: D33268459

fbshipit-source-id: 051bfcbbad3fa083301a3cea508d00946d6db881
2021-12-21 21:28:04 -08:00
47ba28f3b5 Back out "[Quant][Eager] Added 4 bit support for eager mode quantization flow" (#70272)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70272

Original commit changeset: 5cdaac5aee9b

Original Phabricator Diff: D33152675 (75718e5059)

Test Plan: CI

Reviewed By: larryliu0820

Differential Revision: D33268415

fbshipit-source-id: 99eb3209d513149ed23a1d9071d1b1c12174d09a
2021-12-21 21:28:01 -08:00
a86f9806bc Back out "[Quant][fx] Added test for quint4x2 for fx graph mode quantization" (#70274)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70274

Original commit changeset: 89951fcd23e7

Original Phabricator Diff: D33152672 (de4e7dece9)

Test Plan: CI

Reviewed By: larryliu0820

Differential Revision: D33268165

fbshipit-source-id: d667a761d72b9423407ce4d6617e9b6a04b5c9f8
2021-12-21 21:26:46 -08:00
6217fee96b Revert D33246843: [pytorch][PR] Implementation of Wishart distribution
Test Plan: revert-hammer

Differential Revision:
D33246843 (a217a62e73)

Original commit changeset: 825fcddf4785

Original Phabricator Diff: D33246843 (a217a62e73)

fbshipit-source-id: 2c8063e8d10e9d3ac20fa44673e6011ed1160753
2021-12-21 18:55:49 -08:00
2d509ff31b [GHA] Fix doc push jobs (#70269)
Summary:
Home folder in docker images is `/var/lib/jenkins`, rather than `/home/jenkins`
Also repo secrets can not start with `GITHUB_` prefix according to [Naming your secrets](https://docs.github.com/en/actions/security-guides/encrypted-secrets#naming-your-secrets) guide

Fixes https://github.com/pytorch/pytorch/issues/70211

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70269

Reviewed By: suo

Differential Revision: D33271404

Pulled By: malfet

fbshipit-source-id: 044bb34c75a0e8a9f0b2f5790be7aa2397524a24
2021-12-21 18:20:10 -08:00
591ca4d6bc [Operator Versioning][Edge] Reorganize upgrader initialization logic for thread safety (#70225)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70225

Thanks to zhxchen17 for the suggestion. This PR moves the operator initialization logic to `upgrader_mobile.cpp`, so that we can leverage a static variable to ensure the operator initialization happens only once.
ghstack-source-id: 146103229

Test Plan:
```

buck test mode/opt //papaya/integration/service/test/analytics/histogram:generic_histogram_system_test -- --exact 'papaya/integration/service/test/analytics/histogram:generic_histogram_system_test - SumHistogramSystemTest.test' --run-disabled
buck test mode/opt //caffe2/test/cpp/jit:jit
buck test mode/dev //papaya/integration/service/test/mnist:mnist_system_test -- --exact 'papaya/integration/service/test/mnist:mnist_system_test - MnistFederatedSystemTest.test'
```

Reviewed By: zhxchen17

Differential Revision: D33247543

fbshipit-source-id: 6c3a87fe909a1be01452fa79649065845b26d805
2021-12-21 17:26:17 -08:00
21c6de9fdc Extend autograd functional benchmarking to run vectorized tasks (#67045)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67045

To run: `python benchmarks/functional_autograd_benchmark/functional_autograd_benchmark.py --gpu -1 --model-filter=ppl_robust_reg --num-iter 100`

```
Results for model ppl_robust_reg on task vjp: 0.0012262486852705479s (var: 2.2107682351446556e-10)
Results for model ppl_robust_reg on task vhp: 0.002099371049553156s (var: 6.906406557760647e-10)
Results for model ppl_robust_reg on task jvp: 0.001860950025729835s (var: 1.1251884146634694e-10)
Results for model ppl_robust_reg on task hvp: 0.003481731517240405s (var: 2.2713633751614282e-10)
Results for model ppl_robust_reg on task jacobian: 0.0012128615053370595s (var: 1.3687526667638394e-09)
Results for model ppl_robust_reg on task hessian: 0.009885427542030811s (var: 9.366265096844018e-09)
Results for model ppl_robust_reg on task hessian_fwdrev: 0.005268776323646307s (var: 2.4293791422991262e-09)
Results for model ppl_robust_reg on task hessian_revrev: 0.002561321249231696s (var: 7.557877101938004e-10)
Results for model ppl_robust_reg on task jacfwd: 0.002619938924908638s (var: 5.109343503839625e-10)
Results for model ppl_robust_reg on task jacrev: 0.0013469004770740867s (var: 3.1857563254078514e-09)
```
Notes:
 - We go through the batched fallback for both.
 - ppl_robust_reg takes 3 tensor inputs and returns a single scalar output:
   - this means that the jacobian is equivalent to doing a single vjp, so vmap would not help us (see the sketch below)
   - we expect jacfwd to be slower than jacrev
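
A minimal sketch of the jacobian/vjp equivalence noted above (toy scalar function, not the actual ppl_robust_reg model):

```python
import torch
from torch.autograd.functional import jacobian, vjp

def f(x):
    return (x ** 2).sum()  # single scalar output, like ppl_robust_reg

x = torch.randn(3)
_, grad = vjp(f, x, torch.ones(()))  # one vjp with v = 1
jac = jacobian(f, x)                 # full Jacobian of a scalar output
print(torch.allclose(grad, jac))     # True: jacobian == one vjp, so vmap cannot help
```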

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D33265947

Pulled By: soulitzer

fbshipit-source-id: 14f537a1376dea7e5afbe0c8e97f94731479b018
2021-12-21 17:20:29 -08:00
82c5f298ed [shard] fix named_params_with_sharded_tensor (#70228)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70228

Fix the named_params_with_sharded_tensor impl: `named_parameters` already loops over the submodules recursively, so we shouldn't call it inside the submodule loop.
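
A minimal sketch of the pitfall (plain nn.Module shown; the ShardedTensor wrapping is elided):

```python
import torch.nn as nn

model = nn.Sequential(nn.Linear(2, 2), nn.Linear(2, 2))

# Buggy: named_parameters() already recurses, so looping over submodules
# yields every nested parameter multiple times.
dup = [n for m in model.modules() for n, _ in m.named_parameters()]
# Fixed: a single call is sufficient.
uniq = [n for n, _ in model.named_parameters()]
print(len(dup), len(uniq))  # 8 vs 4
```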
ghstack-source-id: 146076471

Test Plan: Added more complicated test cases (that involves multiple submodules) to capture this issue.

Reviewed By: pritamdamania87

Differential Revision: D33251428

fbshipit-source-id: cf24ca7fbe4a5e485fedd2614d00cdea2898239e
2021-12-21 15:29:38 -08:00
74c834e0dc [DataPipe] adding a finally statement to ensure hook is reset (#70214)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70214

cc VitalyFedyunin ejguan NivekT

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D33255306

Pulled By: NivekT

fbshipit-source-id: de2fe6bf08328e481c714aaad390db771073469e
2021-12-21 15:21:04 -08:00
23902fb895 Fixed typo in torch check for cdist (#70178)
Summary:
Description:
- Fixed typo in torch check for cdist

cc zou3519

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70178

Reviewed By: bdhirsh

Differential Revision: D33236027

Pulled By: zou3519

fbshipit-source-id: e87a982c0dc5fe576db8f2afc4b2010924f047c0
2021-12-21 15:16:39 -08:00
a217a62e73 Implementation of Wishart distribution (#68588)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/68050

TODO:
- [x] Unit Test
- [x] Documentation
- [x] Change the constraint of matrix variables to 'torch.distributions.constraints.symmetric' once it is reviewed and merged. https://github.com/pytorch/pytorch/issues/68720
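
A usage sketch, assuming the distribution lands as `torch.distributions.Wishart` with a `df`/`covariance_matrix` parameterization (treat the argument names as assumptions):
```python
import torch
from torch.distributions import Wishart

w = Wishart(df=torch.tensor(4.0), covariance_matrix=torch.eye(3))
x = w.rsample()       # a 3x3 symmetric positive-definite sample
logp = w.log_prob(x)
```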

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68588

Reviewed By: bdhirsh

Differential Revision: D33246843

Pulled By: neerajprad

fbshipit-source-id: 825fcddf478555235e7a66de0c18368c41e935cd
2021-12-21 14:07:30 -08:00
0544f975e1 [reland] Support torch.equal for ShardedTensor. (#70145)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70145

Added support for torch.equal to ShardedTensor. This is really
helpful for comparing two ShardedTensors.
ghstack-source-id: 146066939

Test Plan: waitforbuildbot

Reviewed By: wanchaol

Differential Revision: D33201714

fbshipit-source-id: 56adfc36e345d512c9901c56c07759bf658c745b
2021-12-21 13:22:52 -08:00
c321d4c1ca [Operator Versioning] Split the upgrader test to a separate file and cover mobile part (#70090)
Summary:
1. Split the test `test_save_load.py` into two files, moving the operator-versioning-related changes to `test_save_load_for_op_versions.py`.
2. Add mobile-module-related tests to `test_save_load_for_op_versions.py`

How to run:
```
buck test mode/opt //caffe2/test:jit
or
python test/test_jit.py TestSaveLoadForOpVersion
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70090

ghstack-source-id: 146103547

Test Plan:
```
buck test mode/opt //caffe2/test:jit
python test/test_jit.py TestSaveLoadForOpVersion
```

Reviewed By: tugsbayasgalan

Differential Revision: D33180767

fbshipit-source-id: dd31e313c81e90b598ea9dd5ad04a853c017f994
2021-12-21 13:08:01 -08:00
a6f953156e [StaticRuntime] Add TensorExpr fusion with dynamic shapes in SR (#69475)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69475

This diff adds TensorExpr fusion with dynamic shapes in SR. This includes tracing the input graph with sample inputs, and then performing fusion with generalization to get fused graphs with dynamic shapes.
ghstack-source-id: 146059043

Test Plan:
```
buck run mode/opt //caffe2/caffe2/fb/predictor:pytorch_predictor_test
```

Reviewed By: d1jang

Differential Revision: D32320088

fbshipit-source-id: 397f498878ddfcee9dad7a839652f79f034fefe3
2021-12-21 12:41:02 -08:00
c6d1162325 [jit] Add support for dynamic shape fusion in JIT. (#69474)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69474

This diff adds support for dynamic shape fusion in JIT. This is done
by performing fusion with the static shapes observed on the first run,
generalizing the fused subgraphs and generating code for the generalized fused
subgraphs with dynamic shapes.
ghstack-source-id: 146059044

Test Plan:
```
buck test mode/dev-nosan //caffe2/test/cpp/jit:jit
```

Reviewed By: eellison

Differential Revision: D32781307

fbshipit-source-id: f821d9f8c271bcb78babcb4783d66f2f0020b0ea
2021-12-21 12:39:44 -08:00
c5333cdfba [nnc] tensorexpr for quantized::add (#70188)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70188

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D33238093

Pulled By: IvanKobzarev

fbshipit-source-id: bd4e451bfd7531f31f216def2c3c1ba2f2e566e7
2021-12-21 12:30:56 -08:00
bb51519937 bug fix FractionalMaxPool2d (random_samples dimensions) (#70031)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70031

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D33200618

Pulled By: george-qi

fbshipit-source-id: 142f224c2cab1008d2d4e9ed333697a92d2d42db
2021-12-21 12:21:54 -08:00
91da2d5fa1 [StaticRuntime] Refactor StaticModule to pass in sample inputs (#69473)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69473

This diff refactors StaticModule and its uses to pass in sample inputs. These inputs need to be passed into the constructor because they are needed to perform TensorExpr fusion before other optimizations are performed on the input graph.
ghstack-source-id: 146059041

Test Plan: buck run mode/opt //caffe2/caffe2/fb/predictor:pytorch_predictor_test

Reviewed By: donaldong

Differential Revision: D32320084

fbshipit-source-id: b8bd46d442be4cc90ca60f521e0416fdb88eea60
2021-12-21 11:20:25 -08:00
c4a6c7a436 fix cpu binary size increase for clamp (#70168)
Summary:
Per title

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70168

Reviewed By: bdhirsh

Differential Revision: D33229811

Pulled By: ngimel

fbshipit-source-id: 3509da766fa327f4103fdcf880d368f64c111496
2021-12-21 10:59:27 -08:00
5504e4ae5c [nnc] Move DispatchParallel to external_functions (#70221)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70221

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D33249149

Pulled By: IvanKobzarev

fbshipit-source-id: fa6b2535dc09229d72b1c45eaa75434477cdff5e
2021-12-21 10:51:38 -08:00
304efd8e9a Change TH_BLAS_MKL into AT_MKL_ENABLED() (#70219)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70219

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69419

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D33246758

Pulled By: ngimel

fbshipit-source-id: aedef4c9ef97b6aa9f574313c94f774b77df2748
2021-12-21 10:36:55 -08:00
a197f3fe52 [FSDP/Checkpoint] Activation offload support in checkpoint_wrapper (#70165)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70165

Implements activation offload support in the checkpoint_wrapper API via
save_on_cpu hooks. We avoid modifying the torch.utils.checkpoint implementation
and instead compose offload + checkpoint, using the save_on_cpu hook for the
offload part.
ghstack-source-id: 146078900

Test Plan: CI

Reviewed By: zhaojuanmao

Differential Revision: D33228820

fbshipit-source-id: 98b4da0828462c41c381689ee07360ad014e808a
2021-12-21 10:08:18 -08:00
e428a90553 Android build migrated to GHA. (#68843)
Summary:
All four builds of the Android binaries (arm32/64 and x86_32/64) are now migrated to GHA, away from CircleCI. Since this part of the workflow creates the final binary with all architectures in it, it was not possible to do the migration step by step.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68843

Reviewed By: malfet

Differential Revision: D33257480

Pulled By: b0noI

fbshipit-source-id: dd280c8268bdd31763754c36f38e4ea12b23cd2e
2021-12-21 10:02:51 -08:00
5e222d08a1 Revert "Revert D32498572: allow external backend codegen to be used without autograd kernels" (#69949)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69949

This reverts commit 33363cea64fd4be16975c32cf57e9eb123af371d.

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D33113544

Pulled By: bdhirsh

fbshipit-source-id: e219f10d52776498c9ad273e97bca3e3406cf702
2021-12-21 08:19:37 -08:00
8e763cd735 Add explicit OperatorHandle destructor (#70033)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/70032

The Windows build of PyTorch doesn't produce the `c10::OperatorHandle::~OperatorHandle(void)` symbol in any of its `*.lib` files. This fix explicitly defines it in Dispatcher.cpp, so that downstream consumers wanting to dllimport it can find it.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70033

Reviewed By: jbschlosser

Differential Revision: D33240599

Pulled By: bdhirsh

fbshipit-source-id: 56cc5963043bd5caac30e42c3501a4f48d086b36
2021-12-21 07:39:26 -08:00
adaf383837 dbr quant: better fix for bug with recursion on dequantize (#70128)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70128

Previous code disabled torch_function when dequantizing arguments
to an unquantizable function.  This PR blocklists the dequantize
method from the dequantize hook instead, so we can remove
the previous hack.

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

Reviewed By: ejguan

Differential Revision: D33194396

Pulled By: vkuzo

fbshipit-source-id: 6175c2da637c1d0c93b3fea0ef8218eaee6a2872
2021-12-21 06:25:37 -08:00
cce9c9aa45 dbr quant: stop overriding tensor getters (#70115)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70115

This PR turns off DBR quant __torch_function__ overrides on
tensor attribute getters such as `x.dtype`. This should help
with making the debug logs more readable, and reduce framework
overhead.

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

Reviewed By: ejguan

Differential Revision: D33189544

Pulled By: vkuzo

fbshipit-source-id: e0d664bb6b76ca9e71c8a439ae985a0849312862
2021-12-21 06:25:34 -08:00
f291708058 dbr quant: clean up logging format (#70114)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70114

This PR makes the debug logging for DBR quant be more useful
and easier to read.

New format looks like

```
DEBUG:auto_trace: fqn: _tf_ <function tanhshrink at 0x7fa4d02d4790> out torch.float32 end
```

This will be useful to speed up further work.

Test Plan:
```
// run this with logging enabled, logs easier to read
python test/test_quantization.py TestQuantizeDBR
```

Reviewed By: jerryzh168

Differential Revision: D33189545

Pulled By: vkuzo

fbshipit-source-id: 20af7e066e710beac5a3871a9d6259ee5518f97d
2021-12-21 06:25:31 -08:00
fb2a6747b8 dbr quant: add test for qconfig_dict and methods (#70109)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70109

Adds a test case for DBR quant + qconfig_dict specifying methods
by object_type.  Fixes a bug in the FX rewriter for scripting
to make the test pass.

Full coverage of methods will come in future PRs; this PR is
just to verify that qconfig_dict is hooked up correctly (a rough sketch of the dict shape is below).
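
Roughly the qconfig_dict shape being verified; the keys below follow the FX graph mode convention, and the exact entries are illustrative assumptions:
```python
import torch
from torch.ao.quantization import default_qconfig

qconfig_dict = {
    "": default_qconfig,                    # global qconfig
    "object_type": [
        (torch.nn.Conv2d, default_qconfig), # by module type
        (torch.add, default_qconfig),       # by function
    ],
}
```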

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR.test_qconfig_dict_object_type_method
```

Reviewed By: jerryzh168

Differential Revision: D33188160

Pulled By: vkuzo

fbshipit-source-id: 47ab9dbca8cdb1cf22d6d673d9c15b3bc0d1ec81
2021-12-21 06:24:18 -08:00
78bea1bb66 update example in classification losses (#69816)
Summary:
Just updated a few examples that were either failing or raising deprecation warnings.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69816

Reviewed By: bdhirsh

Differential Revision: D33217585

Pulled By: albanD

fbshipit-source-id: c6804909be74585c8471b8166b69e6693ad62ca7
2021-12-21 02:46:48 -08:00
19f898402d Revert D33241684: [pytorch][PR] Install TensorRT lib on oss docker and enable fx2trt unit test
Test Plan: revert-hammer

Differential Revision:
D33241684 (dab3d3132b)

Original commit changeset: cd498908b00f

Original Phabricator Diff: D33241684 (dab3d3132b)

fbshipit-source-id: d5b2e663b5b0c9e570bd799b9f6111cd2a0de4f7
2021-12-20 23:14:35 -08:00
b376d82caf Remove backward op for slow dilated 2d convolution (#70067)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70067

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D33172551

Pulled By: jbschlosser

fbshipit-source-id: 2f1802c77253e543ebb7ee8ee0a12fa4defde311
2021-12-20 19:18:34 -08:00
dab3d3132b Install TensorRT lib on oss docker and enable fx2trt unit test (#70203)
Summary:
CI

Lib installed and unit test run on https://github.com/pytorch/pytorch/actions/runs/1604076060

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70203

Reviewed By: janeyx99

Differential Revision: D33241684

Pulled By: wushirong

fbshipit-source-id: cd498908b00f3417bdeb5ede78f5576b3b71087c
2021-12-20 18:51:48 -08:00
123be0e5b7 [fusion] Add ConvTranspose+BN fusion support (#70022)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70022

Add support for fusing ConvTranspose{1,2,3}d with BatchNorm{1,2,3}d. This re-uses the existing fusion logic but adds a "transpose" flag to the fusing function which, when enabled, uses the appropriate reshape for ConvTranspose's transposed weights (see the sketch below).
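
A sketch of why the flag is needed: `fold_bn_scale` is a hypothetical helper, not the actual fuser code; the point is that the output-channel dimension differs between the two weight layouts.
```python
import torch

def fold_bn_scale(weight, bn_scale, transpose=False):
    # Conv2d weights are (out_ch, in_ch, kH, kW); ConvTranspose2d weights
    # are (in_ch, out_ch, kH, kW), so the per-output-channel BN scale must
    # broadcast along dim 1 instead of dim 0 for the transposed case
    shape = [1] * weight.dim()
    shape[1 if transpose else 0] = -1
    return weight * bn_scale.reshape(shape)
```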

Test Plan: `buck test mode/dev //caffe2/test:quantization -- -r quantization.eager.test_fusion.TestFusion`

Reviewed By: jerryzh168

Differential Revision: D33074405

fbshipit-source-id: 5e9eff1a06d8f98d117e7d18e80da8e842e973b7
2021-12-20 18:42:48 -08:00
24f16de987 [Static Runtime] Support native op split_with_sizes (#69999)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69999

This adds support for the split_with_sizes operator in Static Runtime by adding native operators. These operators have less overhead compared to their JIT fallbacks (no dispatching, no stack construction at runtime).

split_with_sizes can be called directly from the C++ API, or via `torch.split` when `split_sizes` is a list. This diff adds support for both use cases, sketched below.
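
Both call paths, seen from Python:
```python
import torch

x = torch.arange(10)
a, b, c = torch.split_with_sizes(x, [2, 3, 5])  # direct call
d, e, f = torch.split(x, [2, 3, 5])             # list sizes take the same path
assert all(torch.equal(p, q) for p, q in zip((a, b, c), (d, e, f)))
```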

Test Plan:
- Added unit tests. Made sure the operators are used
- Benchmark
```
./buck-out/opt/gen/caffe2/caffe2/fb/predictor/ptvsc2_predictor_bench \
--scripted_model=/data/users/dxd/305797439_0.predictor.precompute.remote_request_only \
--method_name=user.forward --pt_cleanup_activations=1 \
--pt_enable_out_variant=1 --pt_optimize_memory=1 --iters=1000 --warmup_iters=500 \
--num_threads=1 --pt_enable_static_runtime=1 --set_compatibility=1 \
--input_type="recordio" --pt_inputs=/data/users/dxd/305797439_0_user.inputs.recordio \
--recordio_use_ivalue_format=1 --do_profile=1 --do_benchmark=1
```

#### Before
```
Static runtime ms per iter: 3.62073. Iters per second: 276.187
0.0471904 ms.    1.31501%. aten::split_with_sizes (5 nodes)
```
#### After
```
Static runtime ms per iter: 3.44374. Iters per second: 290.382
0.0432057 ms.    1.34276%. aten::split_with_sizes (5 nodes, native)
```

Reviewed By: swolchok

Differential Revision: D33141006

fbshipit-source-id: feae34c4c873fc22d48a8ff3bf4d71c0e00bb365
2021-12-20 18:32:54 -08:00
6623c4838e Handle the corner case when min == max in L2 search (#70207)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70207

In the corner case when min == max, the adjust_hist_to_include_zero() function used in the L2 search computes additional_nbins = -2147483648 and initializes bins_f with a negative size.

Test Plan:
Before fix:
f315187213

After fix:
f315471862

Reviewed By: jspark1105

Differential Revision: D33227717

fbshipit-source-id: 7e8a455e51a0703a3a9c5eb7595d9b4d43966001
2021-12-20 17:46:55 -08:00
f17e76b0f2 Expand description of bias_sizes arg for convolution_backward (#70195)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70195

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D33240155

Pulled By: jbschlosser

fbshipit-source-id: c4f907d6e33e4d1eeb1b5228f1152307c8b27729
2021-12-20 17:33:17 -08:00
3e8ef9a272 Add return type annotation for ShardedTensor (#69945)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69945

Test Plan: CI

Reviewed By: wanchaol

Differential Revision: D32502393

fbshipit-source-id: 7bea08762446b211d8ea028d024d2acdabe45479
2021-12-20 17:15:44 -08:00
c555b7bacb GHA: Remove caffe2 check in Windows shard 1 smoke tests (#70010)
Summary:
Windows shard 1 hasn't actually been running any tests because the script that does so exited before running the python tests but did not report an error. This has been happening to all windows tests across the board, for example https://github.com/pytorch/pytorch/runs/4526170542?check_suite_focus=true

Removing the caffe2.python check passes the smoke tests now. You can observe that the run_test.py file is called in the windows cpu job now https://github.com/pytorch/pytorch/runs/4541331717?check_suite_focus=true

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70010

Reviewed By: malfet, seemethere

Differential Revision: D33161291

Pulled By: janeyx99

fbshipit-source-id: 85024b0ebb3ac42297684467ee4d0898ecf394de
2021-12-20 16:05:38 -08:00
e6d9bb8d57 reduce the number of instantiations for bernoulli tensor tensor kernel (#70169)
Summary:
Reduces the binary size of DistributionBernoulli.cu from 12282600 to 3946792 bytes.
Tensor-tensor bernoulli kernels are rarely used, so we limit dispatches to a double probability type for a double `self` tensor, and a `float` probability type for everything else. This would be a minor perf hit if the probability tensor is of a different dtype, but given how rarely these kernels are used (and how rarely the probability tensor is not float) this is not a problem.
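
The tensor-tensor path in question, sketched from Python (the dtype combinations mirror the dispatch choices described above):
```python
import torch

probs = torch.rand(4)                    # float probabilities
same = torch.empty(4).bernoulli_(probs)  # float self / float probs: direct dispatch
mixed = torch.empty(4, dtype=torch.double).bernoulli_(probs)  # double self: probs get cast now
```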

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70169

Reviewed By: jbschlosser

Differential Revision: D33237890

Pulled By: ngimel

fbshipit-source-id: 185c4b97aba0fb6ae159d572dd5bbb13cf676bb4
2021-12-20 13:46:34 -08:00
79a40b22aa [Checkpoint] Make checkpoint_wrapper an nn.Module (#70164)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70164

Implement Alban's suggestion to make checkpoint_wrapper an nn.Module
instead of patching the forward pass, which is too hacky.
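
A minimal sketch of the wrapper-as-module idea (`CheckpointWrapper` is a hypothetical name for illustration, not the FSDP implementation):
```python
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class CheckpointWrapper(nn.Module):
    def __init__(self, mod: nn.Module):
        super().__init__()
        self.mod = mod

    def forward(self, *args):
        # delegate through checkpoint instead of monkey-patching mod.forward
        return checkpoint(self.mod, *args)
```
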
ghstack-source-id: 146011215

Test Plan: IC

Reviewed By: mrshenli

Differential Revision: D33214696

fbshipit-source-id: dc4b3e928d66fbde828ab60d90b314a8048ff7a2
2021-12-20 13:22:28 -08:00
fcaecd718a Write flaky tests to rockset (#70136)
Summary:
Try using Rockset as backend for data instead of RDS

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70136

Reviewed By: suo

Differential Revision: D33242148

Pulled By: janeyx99

fbshipit-source-id: 8935ceb43717fff4922b634165030cca7e934968
2021-12-20 13:17:21 -08:00
5651e1e3ad Add auto_linear formulas and some others (#69727)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69727

Still need to test the backward ones. We would need to update gradgradcheck to check forward over backward.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D33031728

Pulled By: soulitzer

fbshipit-source-id: 86c59df5d2196b5c8dbbb1efed9321e02ab46d30
2021-12-20 12:15:25 -08:00
65f54bc000 [SR] Optimize VarStack (#68750)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68750

There was some room for optimization in static runtime's `prim::VarStack`:

* Avoid refcount bumps - constructing the `std::vector<at::Tensor>` can be avoided by writing a custom version of `stack_out` that takes a `std::vector<at::Tensor*>`

* Skip the memory overlap check

* Avoid device dispatcher overhead in a few places (e.g. `tensor.unsqueeze -> at::native::unsqueeze`)

Test Plan: `buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest -- Stack`

Reviewed By: swolchok

Differential Revision: D32596934

fbshipit-source-id: e8f0ccea37c48924cb4fccbfdac4e1e11da95ee0
2021-12-20 11:46:11 -08:00
a799ffebd2 Create lower code example (#70142)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70142

Create lower code example in OSS, and run benchmark against resnet101

Test Plan: CI

Reviewed By: 842974287

Differential Revision: D33117440

fbshipit-source-id: 359d0c9e65899ab94c8f3eb112db70db5d938504
2021-12-20 11:37:08 -08:00
423ce416d8 Prune osx-arm64 binaries from nightly channel (#70132)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/70043

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70132

Reviewed By: janeyx99

Differential Revision: D33195431

Pulled By: malfet

fbshipit-source-id: 4579a6788255a6df306862c3e959ae7a9ddd4e45
2021-12-20 11:28:43 -08:00
41959ce77f [JIT] scripting, freezing, serialization for sparse csr (#69555)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69555

1. Implement pickling/unpickling
2. Add `test_freeze_sparse_csr, tests_serialize_sparse_csr` tests

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D33181367

Pulled By: davidberard98

fbshipit-source-id: a15d5193a7b1b1625a27e4af003cec33cdbc8071
2021-12-20 11:13:34 -08:00
bcb6076099 Sparse CSR tensors: storage access should throw (#70072)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70072

Like sparse COO tensors, sparse CSR tensors don't really have an actual storage() that can be accessed, so `sparse_tensor->storage()` should throw.
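
What the new behavior looks like from Python (a sketch; the exact error type/message is up to the implementation):
```python
import torch

csr = torch.sparse_csr_tensor(
    torch.tensor([0, 1, 2]),   # crow_indices
    torch.tensor([0, 1]),      # col_indices
    torch.tensor([1.0, 2.0]),  # values
    size=(2, 2),
)
try:
    csr.storage()
except RuntimeError as e:
    print("storage() raised, as expected:", e)
```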

cc nikitaved pearu cpuhrsch

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D33181309

Pulled By: davidberard98

fbshipit-source-id: 8f1dc4da03073d807e5acee2ac47caeffb94b16c
2021-12-20 11:12:01 -08:00
bcc7dbdf37 Change open source unit test deps (#70167)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70167

1. Change the unit test dependency to the open source base class, so that this unit test can run on the GitHub OSS CI
2. Remove usage of typing.Protocol, so that lower can run with Python 3.6

Test Plan:
oss CI
passed with change included in commit:
https://github.com/pytorch/pytorch/actions/runs/1597530689
see test(fx2trt)

Reviewed By: yinghai

Differential Revision: D33228894

fbshipit-source-id: ffe3d40a02a642b3b857a0605101797037a580bb
2021-12-20 10:41:38 -08:00
dd02af6283 Bilinear no_batch_dim (#69539)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69539

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D33200105

Pulled By: george-qi

fbshipit-source-id: c674e3937fea95c4ec41a01c5aa6d6890042b288
2021-12-20 09:44:07 -08:00
978089c381 Prevent divide-by-zero errors in Timer (#70050)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/66503

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70050

Reviewed By: mruberry

Differential Revision: D33168868

Pulled By: robieta

fbshipit-source-id: 7d0ece9e888f6c69a9e0ced581c92d3259fb3540
2021-12-20 09:16:03 -08:00
ad0cd8a76e [DataPipe] Improve inline doc and testing for CollatorIterDataPipe (#70139)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70139

cc VitalyFedyunin ejguan NivekT

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D33199107

Pulled By: NivekT

fbshipit-source-id: f96d77490998ac9bc3da8d4ff1a9caa08e9e7f27
2021-12-20 08:05:21 -08:00
8a912014b1 [Operator Versioning][Edge] Initialize upgrader thread safe (#70161)
Summary:
The upgrader should only be initialized once, when the runtime loads the first module. It no longer needs to be initialized afterwards.

Previously, instead of using an atomic variable, the upgrader was initialized depending on whether byteCodeFunctionWithOperator.function.get_code().operators_ was empty. If it was empty, the operators from the upgrader were considered not initialized yet. However, that check is not thread safe: when multiple threads load modules together, it's possible that they all consider theirs to be the first module. Use an atomic variable here to make sure initialization is thread safe.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70161

ghstack-source-id: 146012642

Test Plan:
```
buck test mode/opt //papaya/integration/service/test/analytics/histogram:generic_histogram_system_test -- --exact 'papaya/integration/service/test/analytics/histogram:generic_histogram_system_test - SumHistogramSystemTest.test' --run-disabled
buck test mode/opt //caffe2/test/cpp/jit:jit
```

Reviewed By: iseeyuan

Differential Revision: D33220320

fbshipit-source-id: 10f2397c3b358d5a1d39a2ce25457e3fdb640d2c
2021-12-19 20:16:00 -08:00
7ea86dfdb1 [Profiler] Factor common logic into torch/csrc/profiler/api.h (#69459)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69459

This change breaks the dependency between the kineto and legacy profiler; instead of `profiler_kineto.h` including `profiler_legacy.h`, they both include `profiler/api.h`. As part of this refactor, I injected some intermediate classes to keep legacy behavior from leaking into the kineto profiler:

1) ProfilerThreadLocalState has become ProfilerThreadLocalStateBase which just handles config and callback handle. Legacy and Kineto profilers inherit this and implement their own very disjoint set of logic.

2) CUDAStubs is a pure virtual class to make the interface more readable, and the "always fail" behavior has been moved to a `DefaultCUDAStubs` class in `api.cpp`.

Test Plan: Ran the overhead ubenchmark.

Reviewed By: aaronenyeshi

Differential Revision: D32678163

fbshipit-source-id: 9b733283e4eae2614db68147de81b72f6094ce6c
2021-12-19 18:40:28 -08:00
181120f7d7 [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D33229251

fbshipit-source-id: 3a69bb459fa0a65888d6f9c8e70b5de032ddad97
2021-12-19 16:38:25 -08:00
60191196d4 [AutoAccept][Codemod][FBSourceBuckFormatLinter] Daily arc lint --take BUCKFORMAT
Reviewed By: zertosh

Differential Revision: D33229262

fbshipit-source-id: 7c22aa59a2a9eea94d2f403c339eb20abc7d9c41
2021-12-19 16:34:00 -08:00
ef70174f2e Separate c10::Symbol header from list of interned strings (#69406)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69406

Most files that include `interned_strings.h` don't actually depend on
anything generated from `FORALL_NS_SYMBOLS`, yet because everything is in a
single file, they need to be recompiled whenever a new symbol is added. Here
I move the class definition into a separate file so this doesn't
happen.

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D32923637

Pulled By: albanD

fbshipit-source-id: 6e488cbfcfe2c041a99d9ff22e167dbddf3f46d7
2021-12-19 14:52:26 -08:00
06d0536dad Low precision support for jiterator (#70157)
Summary:
This adds support for bfloat16 and fp16 types to the jiterator by adding at::Half and at::BFloat16 classes to the jiterator code template. The only methods defined in those classes are construction from float and implicit conversion to float. Mathematical operations on them never need to be defined, because the jiterator implicitly upcasts the inputs to the functor, so all math is performed in float only; e.g., the compute part of the kernel is always written as
```
        out[j] = i0<float>(arg0[j]);
```
It also adds support for casting to complex outputs by adding a similar templated class c10::complex<T>. Originally I planned to only support float -> complex conversion for it, but to compile the fetch_and_cast function we also need complex -> float conversion. We could avoid that by compiling fetch_and_cast for a different subset of types, but I'm not doing it in this PR. Thus, technically, we can compile a kernel that would accept complex inputs and produce wrong results, but we guard against this by statically asserting that none of the functor datatypes are complex, and by runtime-checking that none of the inputs are complex.
Adding bfloat16, half and complex support allows us to remove the special handling in type promotion tests for gcd.
i0 (which supports half and bfloat16 inputs) is moved to use the jiterator.
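
The user-visible effect for i0, sketched (assuming a CUDA build, since the jiterator is CUDA-only):
```python
import torch

if torch.cuda.is_available():
    x = torch.rand(8, device="cuda", dtype=torch.bfloat16)
    y = torch.special.i0(x)   # compute happens in float, result cast back
    assert y.dtype == torch.bfloat16
```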

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70157

Reviewed By: mruberry

Differential Revision: D33221645

Pulled By: ngimel

fbshipit-source-id: 9cfe8aba3498a0604c4ea62c217292ea06c826b1
2021-12-19 11:56:57 -08:00
78f06e0690 fixing conv2d decomposition and tests (#70127)
Summary:
The current implementation has a bug where the decomposed `add_optional` from `conv2d` is placed before its producer node; this causes a linter error on the graph.

Cherry-picked from https://github.com/csarofeen/pytorch/pull/1333
Fixing issue posted in https://github.com/csarofeen/pytorch/issues/1325

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70127

Reviewed By: ejguan

Differential Revision: D33199018

Pulled By: jansel

fbshipit-source-id: bce1f14a443811b4d55116a04fd4daa86084cc47
2021-12-19 10:38:23 -08:00
de4e7dece9 [Quant][fx] Added test for quint4x2 for fx graph mode quantization (#69846)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69846

Test Plan:
In pytorch main dir, execute

    to run the added test

Reviewed By: jbschlosser

Differential Revision: D33152672

Pulled By: dzdang

fbshipit-source-id: 89951fcd23e7061d6c51e9422540b5f584f893aa
2021-12-19 06:15:26 -08:00
75718e5059 [Quant][Eager] Added 4 bit support for eager mode quantization flow (#69806)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69806

Minor modifications were made to support the 4-bit embedding quantized module in the eager mode quantization flow and to allow for testing of the changes

Test Plan:
In pytorch main dir, execute
```
python test_quantization.py TestPostTrainingStatic.test_quantized_embedding
```
to run the series of tests, including the newly added test_embedding_4bit
function

Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D33152675

fbshipit-source-id: 5cdaac5aee9b8850e61c99e74033889bcfec5d9f
2021-12-19 06:14:12 -08:00
9f512e129b [Quant] Added 4 bit support for embedding quantized module (#69769)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69769

Added 4-bit support and the corresponding test in the module API. Restructured test_quantized_module for both 4- and 8-bit support.

Test Plan:
In pytorch main dir, execute
```
python test/test_quantization.py TestStaticQuantizedModule.test_embedding_api
```

Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D33152674

fbshipit-source-id: 73e63383cf60994ab34cc7b4eedd8f32a806cf7f
2021-12-18 22:26:24 -08:00
b331752314 [Quant] Implemented 4 bit embedding op support; added corresponding test case (#69768)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69768

Support for the 4-bit embedding operator has been added. The support is analogous to the preexisting support for byte/8-bit embedding. A corresponding test case was added to test_quantized_embedding_op.py

Test Plan:
In pytorch main dir, execute
```
python test/test_quantization.py TestStaticQuantizedModule.test_embedding_api
```
to run the series of tests, including the newly added test_embedding_4bit
function

Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D33152673

fbshipit-source-id: bdcc2eb2e37de38fda3461ff3ebf1d2fb5e58071
2021-12-18 22:03:33 -08:00
94abf120c8 [quant][fx][graphmode][be] Use is_qat instead of model.training as a flag for qat (#69878)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69878

But we'll still verify that model.training is True when the user calls the prepare_qat API.
Relaxing this condition might also mean that we change the API for methods in fuser_method_mapping,
with an additional flag for qat (currently we just have different fusions for training/eval). I don't think
this is P0; we could revisit if there is a need in the future

Test Plan:
```
python test/test_quantization.py TestQuantizeFx
```

Imported from OSS

Reviewed By: supriyar

Differential Revision: D33080988

fbshipit-source-id: b13715b91f10454948199323c5d81ef88bb3517f
2021-12-18 00:00:46 -08:00
fb34af1b21 [nnc][quantization] Optimize constructTensors in ext functions (#69856)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69856

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D33064756

Pulled By: IvanKobzarev

fbshipit-source-id: 430d850f8591b8e0a0bdba5c41896627a72db88e
2021-12-17 23:45:03 -08:00
84b7832010 Updates CUDA memory leak check to verify against driver API and print more diagnostic information (#69556)
Summary:
Per title

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69556

Reviewed By: mrshenli

Differential Revision: D32954770

Pulled By: mruberry

fbshipit-source-id: a6c2ae6f704422c178569980ca4b9c72c4272f55
2021-12-17 23:37:49 -08:00
6c68045f60 [quant][graphmode][fx][be] Fix a typo in quantization/fx/graph_module (#69877)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69877

att

Test Plan:
```
python tes/test_quantization.py TestQuantizeFx
```

Imported from OSS

Reviewed By: supriyar

Differential Revision: D33079525

fbshipit-source-id: dfd3afb916067a628071a59ce95c6b1d228a3c72
2021-12-17 23:33:33 -08:00
9d3a6fa623 [quant][bc-breaking] Remove QConfigDynamic from quantization api (#69875)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69875

att

Test Plan:
ci + regression tests:
```
python test/test_quantization.py TestPostTrainingStatic
python test/test_quantization.py TestPostTrainingDynamic
python test/test_quantization.py TestQuantizeFx
```

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D33079096

fbshipit-source-id: 1e73bb27c518eba62b60f3a8c4b532dddc8367cf
2021-12-17 23:10:06 -08:00
5db711f9d3 [quant][be] Replace QConfigDynamic with QConfig in code (#69864)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69864

att, will have a follow up PR that removes QConfigDynamic in the api

Test Plan:
regression tests
```
python test/test_quantization.py TestPostTrainingStatic
python test/test_quantization.py TestPostTrainingDynamic
python test/test_quantization.py TestQuantizeFx
```

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D33073235

fbshipit-source-id: 6c1a1647032453803c55cdad7c04154502f085db
2021-12-17 22:30:57 -08:00
c463d50098 [fx2trt] Convert to tuple if output_size of adaptive avg pool is an integer (#70144)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70144

output_size can be an integer, and in this case we need to extend it to a tuple (see the sketch below).
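
A minimal sketch of the normalization (a hypothetical helper, not the converter's actual code):
```python
def normalize_output_size(output_size, ndim=2):
    # extend a bare int to a per-dimension tuple
    if isinstance(output_size, int):
        return (output_size,) * ndim
    return tuple(output_size)

assert normalize_output_size(7) == (7, 7)
assert normalize_output_size((5, 7)) == (5, 7)
```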

Test Plan:
Added a unit test.
```
RemoteExecution session id: reSessionID-d97b46e3-20d1-4f5c-a166-4efcf1579352-tpx
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/8162774391775638
    ✓ ListingSuccess: caffe2/test/fx2trt/converters:test_adaptive_avgpool - main (9.454)
    ✓ Pass: caffe2/test/fx2trt/converters:test_adaptive_avgpool - test_adaptive_avgpool_with_dynamic_shape (caffe2.test.fx2trt.converters.acc_op.test_adaptive_avgpool.TestAdaptiveAvgPoolConverter) (16.083)
    ✓ Pass: caffe2/test/fx2trt/converters:test_adaptive_avgpool - test_adaptive_avgpool_1 (caffe2.test.fx2trt.converters.acc_op.test_adaptive_avgpool.TestAdaptiveAvgPoolConverter) (16.349)
    ✓ Pass: caffe2/test/fx2trt/converters:test_adaptive_avgpool - test_adaptive_avgpool_2 (caffe2.test.fx2trt.converters.acc_op.test_adaptive_avgpool.TestAdaptiveAvgPoolConverter) (16.543)
    ✓ Pass: caffe2/test/fx2trt/converters:test_adaptive_avgpool - test_adaptive_avgpool_0 (caffe2.test.fx2trt.converters.acc_op.test_adaptive_avgpool.TestAdaptiveAvgPoolConverter) (16.651)
Summary
  Pass: 4
  ListingSuccess: 1
```

Reviewed By: wushirong

Differential Revision: D33200773

fbshipit-source-id: 8c10d644982a4723a78f8615d8bcdbc3968790db
2021-12-17 18:31:25 -08:00
9ee3006d58 [fx-acc][graph-opts] bug fixes for transpose_to_reshape, optimize_quantization, finalize_kwargs_to_concrete
Summary:
Fixes a couple of bugs that surfaced during integration of graph opts into `AcceleratedGraphModule` (D31484770).

1. Fix bug in `graph_opt.transpose_to_reshape` implementation that causes it to incorrectly apply opt for `permute` op acting on shape `(B, N, N)` with `N > 1` and permutation `(0, 2, 1)`. Fixed the bug and added test case to cover this case.
2. Revert part of D31671833 (0e371e413d), where I made `acc_out_ty` into a required argument
3. Align `graph_opt.transpose_to_reshape` and `graph_opt.optimize_quantization` to not set `acc_out_ty` when adding a new node to graph and instead rely on tensor metadata
4. Run `acc_utils.copy_acc_out_ty_from_meta_to_acc_ops_kwargs()` in `GraphOptsTest.verify_numerics` before running graph on sample inputs.

Test Plan:
```
buck test mode/opt glow/fb/fx/graph_opts:
```

```
...
Summary
  Pass: 85
  ListingSuccess: 4
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/562950163929022
```

Reviewed By: jfix71

Differential Revision: D31851549

fbshipit-source-id: 602affe2a2a0831d2f17b87025107ca87ecb0e59
2021-12-17 17:35:48 -08:00
bd9983366b [fx2trt] Add support for torch.mean (#70052)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70052

As the title. Also refactored a bit to separate out the common part of adding a reduce operator.

This would make mnasnet lowerable without splitter.

Test Plan: Added unit tests.

Reviewed By: wushirong

Differential Revision: D33163950

fbshipit-source-id: 7eb8f8a852cd8e8d9937029c4b4602b036502b3a
2021-12-17 15:48:31 -08:00
9fb199bc12 Add convolution_backward to aten_interned_strings.h (#70112)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70112

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D33188664

Pulled By: jbschlosser

fbshipit-source-id: 20e565c2fef4c1c3c087ba9b36320b7e539e467e
2021-12-17 15:38:47 -08:00
9b14d93d78 Fix bazel workflows (#70137)
Summary:
Fixes regression after manual rebase of e35bf56461

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70137

Reviewed By: pbelevich

Differential Revision: D33197055

Pulled By: malfet

fbshipit-source-id: 21adf7297f75715a59d2a1b3751b4ec8f71c7c03
2021-12-17 14:48:11 -08:00
70ed4f3ffc Try dropping Torch from typeshed_internal (#69926)
Summary:
Removes the internal typeshed for PyTorch and replaces it with PyTorch's own type annotations.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69926

Generated files are in P471601595, P471601643, P471601662

Based on an example in D26410012

Test Plan: Sandcastle

Reviewed By: malfet, pradeep90

Differential Revision: D32292834

fbshipit-source-id: 5223f514cbdccd02c08ef0a027a48d92cdebed2c
2021-12-17 14:08:19 -08:00
e35bf56461 [Bazel] Add CUDA build to CI (#66241)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/35316
On master, bazel cuda build is disabled due to lack of a proper `cu_library` rule. This PR:
- Add `rules_cuda` to the WORKSPACE and forward `cu_library` to `rules_cuda`.
- Use a simple local cuda and cudnn repositories (adopted from TRTorch) for cuda 11.3.
- Fix current broken cuda build.
- Enable cuda build in CI, not just for `:torch` target but all the test binaries to catch undefined symbols.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66241

Reviewed By: ejguan

Differential Revision: D31544091

Pulled By: malfet

fbshipit-source-id: fd3c34d0e8f80fee06f015694a4c13a8e9e12206
2021-12-17 13:44:29 -08:00
e0f4e28c69 Skip forward-over-reverse gradgrad check for pinv singular on CUDA for cdouble (#70123)
Summary:

Fixes https://github.com/pytorch/pytorch/issues/70046

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70123

Reviewed By: zou3519

Differential Revision: D33193017

Pulled By: soulitzer

fbshipit-source-id: 846f97ad1bf38c7239e9fc40fd5f476e29264f7c
2021-12-17 13:38:57 -08:00
38e026c14d Add tanh_backward to AT symbols (#70071)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70071

This commit adds tanh_backward to aten_interned_strings.h as an AT symbol.

Test Plan: CI.

Reviewed By: mruberry

Differential Revision: D33173370

Pulled By: alanwaketan

fbshipit-source-id: e20ed2a807156ce772b7c1e3f434fa895116f4c3
2021-12-17 13:35:05 -08:00
a6b7521428 always use max cmake when cmake3 and cmake are all existed (#69355)
Summary:
PyTorch source builds using the Ninja generator require **CMake >= 3.13**. PyTorch always checks for **cmake3 >= 3.10** first, so when **3.13 > cmake3 >= 3.10**, PyTorch will use cmake3 and report the error ```Using the Ninja generator requires CMake version 3.13 or greater``` even though a **CMake >= 3.13** is installed.

For example: on my CentOS machine, the system cmake3 is ```3.12``` and my conda env's cmake is ```3.19.6```; the build fails because PyTorch chooses cmake3. I can update cmake3 or create an alias or a symlink to solve this problem, but the more reasonable way is for ```_get_cmake_command``` to always return the newest CMake executable (unless explicitly overridden with the CMAKE_PATH environment variable).
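
The proposed selection logic, sketched (the helper names are illustrative, not the actual tools/setup_helpers code):
```python
import re
import shutil
import subprocess
from distutils.version import LooseVersion

def cmake_version(cmd):
    out = subprocess.check_output([cmd, "--version"]).decode()
    return LooseVersion(re.search(r"version\s+([\d.]+)", out).group(1))

# pick the newest of the available cmake executables
candidates = [c for c in ("cmake3", "cmake") if shutil.which(c)]
newest = max(candidates, key=cmake_version) if candidates else None
```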

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69355

Reviewed By: jbschlosser

Differential Revision: D33062274

Pulled By: malfet

fbshipit-source-id: c6c77ce1374e6090a498be227032af1e1a82d418
2021-12-17 12:53:49 -08:00
254360e182 [ROCm] Skip test_fn_fwgrad_bwgrad_* unexpected success tests (#70124)
Summary:
Skip tests that cause unexpected success for ROCm

Signed-off-by: Kyle Chen <kylechen@amd.com>

additional to this PR:
https://github.com/pytorch/pytorch/pull/70061

skipping 4 more tests that cause unexpected success and fail the CI job for ROCm

log:
https://ci.pytorch.org/jenkins/job/pytorch-builds/job/pytorch-linux-bionic-rocm4.3.1-py3.6-test2/15350/console

cc jeffdaily sunway513 jithunnair-amd ROCmSupport KyleCZH

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70124

Reviewed By: ejguan

Differential Revision: D33193508

Pulled By: malfet

fbshipit-source-id: 9949910e2e7dc66cbadd23cea874df26e2d4136d
2021-12-17 12:08:47 -08:00
26e32988bd Revert D32596264: Codegen: TraceType only includes operators being registered
Test Plan: revert-hammer

Differential Revision:
D32596264 (e66a8ab4f5)

Original commit changeset: 2f28b62d7b99

Original Phabricator Diff: D32596264 (e66a8ab4f5)

fbshipit-source-id: 7d18c4e77ce30dd7817a95f9c39b565cb246cd12
2021-12-17 11:20:12 -08:00
2f622e87bd Revert D32596274: Codegen: ADInplaceOrViewType only include operators registered
Test Plan: revert-hammer

Differential Revision:
D32596274 (9ad940d982)

Original commit changeset: 400cad023782

Original Phabricator Diff: D32596274 (9ad940d982)

fbshipit-source-id: 5c53195edaae47b9daba373cf166d2382178d01b
2021-12-17 11:02:08 -08:00
60eb1e53b2 Sparse CSR CPU: Add block sparse support for MKL path (#68710)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68710

This PR adds support for block sparse (BSR) matrices in functions that
use the Inspector-Executor MKL Sparse API. As of this PR, these are:
* torch.addmm
* torch.addmv
* torch.triangular_solve (once https://github.com/pytorch/pytorch/pull/62180 is merged)

cc nikitaved pearu cpuhrsch IvanYashchuk

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D33179486

Pulled By: cpuhrsch

fbshipit-source-id: e1dec0dccdbfed8b280be16b8c11fc9e770d50ae
2021-12-17 10:56:05 -08:00
0cfff65395 Apply contiguous on inputs of cdist backward (#70016)
Summary:
Description:
- Apply contiguous on inputs of cdist backward
- Added a test

Fixes https://github.com/pytorch/pytorch/issues/69997
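
A repro-style sketch with a non-contiguous input (the exact originally-failing configuration is in the linked issue; this just shows the shape of the fix's target):
```python
import torch

base = torch.randn(3, 10, requires_grad=True)
x = base[:, ::2]              # non-contiguous input to cdist
y = torch.randn(4, 5)
torch.cdist(x, y).sum().backward()
print(base.grad.shape)        # torch.Size([3, 10])
```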

cc ezyang albanD zou3519 gqchen pearu nikitaved soulitzer Lezcano Varal7

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70016

Reviewed By: ejguan

Differential Revision: D33187946

Pulled By: albanD

fbshipit-source-id: 645306aa043b2f84c4c2df0306fabfc224d746b6
2021-12-17 10:54:45 -08:00
bc95e5a196 [ROCm] Skip test_fn_fwgrad_bwgrad_gradient_cuda_complex128 (#70061)
Summary:
This PR will skip test_fn_fwgrad_bwgrad_gradient_cuda_complex128 test for ROCm

Signed-off-by: Kyle Chen <kylechen@amd.com>

Related github isssue:
[https://github.com/pytorch/pytorch/issues/70027](https://github.com/pytorch/pytorch/issues/70027)

jithunnair-amd jeffdaily

cc jeffdaily sunway513 jithunnair-amd ROCmSupport KyleCZH

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70061

Reviewed By: ejguan

Differential Revision: D33189411

Pulled By: malfet

fbshipit-source-id: a60d5b35099d3c8d3ceebb996e91470a8a676f85
2021-12-17 10:47:31 -08:00
de992c6b21 Specify ij indexing when cartesian_prod calls meshgrid (#68753)
Summary:
Currently, `cartesian_prod` calls `meshgrid` without passing an indexing parameter. This causes a warning to be shown when running the `cartesian_prod` example from the docs. This PR simply passes the default value for this indexing parameter instead.
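
The behavior in question, sketched:
```python
import torch

a, b = torch.tensor([1, 2]), torch.tensor([3, 4])
grid = torch.meshgrid(a, b, indexing="ij")  # explicit indexing: no warning
pairs = torch.cartesian_prod(a, b)          # matches the "ij" ordering
print(pairs)  # tensor([[1, 3], [1, 4], [2, 3], [2, 4]])
```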

Fixes https://github.com/pytorch/pytorch/issues/68741

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68753

Reviewed By: kimishpatel

Differential Revision: D33173011

Pulled By: mruberry

fbshipit-source-id: 667185ec85bd62bda177bc5768d36f56cfc8b9ab
2021-12-17 10:39:44 -08:00
9ad940d982 Codegen: ADInplaceOrViewType only include operators registered (#68692)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68692

ADInplaceOrViewType is a sharded file, so by only including specific
operator headers, we ensure that changing one (non-method) operator
only needs one shard to be re-compiled.

This also ports the generated code over to the `at::_ops` interface,
and the code generator itself to using `write_sharded` instead of
re-implementing its own version of sharding.

Test Plan: Imported from OSS

Reviewed By: jbschlosser, malfet

Differential Revision: D32596274

Pulled By: albanD

fbshipit-source-id: 400cad0237829720f94d60f9db7acd0e918e202e
2021-12-17 10:36:20 -08:00
e66a8ab4f5 Codegen: TraceType only includes operators being registered (#68691)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68691

TraceType is a sharded file, so by only including specific operator
headers, we ensure that changing one (non-method) operator only needs
one shard to be re-compiled.

This also changes all the included autograd and jit headers from
including `ATen/ATen.h` to just including `ATen/core/Tensor.h`.

Test Plan: Imported from OSS

Reviewed By: jbschlosser, malfet

Differential Revision: D32596264

Pulled By: albanD

fbshipit-source-id: 2f28b62d7b9932f30fad7daacd8ac5bb7f63c621
2021-12-17 10:35:05 -08:00
0d06616c47 Add dict methods to ParameterDict (#69403)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/68476

We implemented all of the following `dict` methods for `ParameterDict`
- `get `
- `setdefault`
- `popitem`
- `fromkeys`
- `copy`
- `__or__`
- `__ior__`
- `__reversed__`
- `__ror__`

The behavior of these new methods matches the expected behavior of python `dict` as defined by the language itself: https://docs.python.org/3/library/stdtypes.html#typesmapping
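
A quick usage sketch of the new surface (assuming, as stated above, that the semantics mirror built-in dict):
```python
import torch
import torch.nn as nn

pd = nn.ParameterDict({"w": nn.Parameter(torch.randn(2))})
pd.setdefault("b", nn.Parameter(torch.zeros(2)))  # insert only if missing
w = pd.get("w")                                   # dict-style lookup
merged = pd | nn.ParameterDict({"v": nn.Parameter(torch.ones(1))})
key, value = pd.popitem()
```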

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69403

Reviewed By: albanD

Differential Revision: D33187111

Pulled By: jbschlosser

fbshipit-source-id: ecaa493837dbc9d8566ddbb113b898997e2debcb
2021-12-17 10:15:47 -08:00
35519428a2 Remove backward ops for miopen depthwise convolution (#70064)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70064

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D33171169

Pulled By: jbschlosser

fbshipit-source-id: 668ca9baa992d3bb1cfa7b53fd2127ffeb051147
2021-12-17 10:08:49 -08:00
ab2a739851 Remove backward ops for miopen transposed convolution (#70063)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70063

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D33171170

Pulled By: jbschlosser

fbshipit-source-id: 4fd6c1cd027f714354644c4ac7694d0f9092c762
2021-12-17 10:07:27 -08:00
ec577300d7 OpInfo: Convert more sample_input_funcs to generators (#69976)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69976

These are sample functions that already use generators internally; this just moves the `yield` into the sample function itself.

Re-submit of #69257

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D33172953

Pulled By: mruberry

fbshipit-source-id: 7b8bae72df6a225df88a158b7ffa82a71d3c061b
2021-12-17 10:03:59 -08:00
950957f857 Fix jit tests assuming sample_inputs is a list (#69975)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69975

cc mruberry

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D33172952

Pulled By: mruberry

fbshipit-source-id: 1f8bb49179f7fbd0fec5e7344e8c213484518e27
2021-12-17 10:02:50 -08:00
ad79d0dd4b Add ciflow/trunk label (#69575)
Summary:
Which includes all workflows but periodic ones

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69575

Reviewed By: seemethere

Differential Revision: D32932850

Pulled By: malfet

fbshipit-source-id: 80b58fb3a0d5f8dbc527124be5bf25bd716448b8
2021-12-17 09:57:46 -08:00
de296d526f move torch.testing from prototype to beta (#69668)
Summary:
cc brianjo mruberry

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69668

Reviewed By: albanD

Differential Revision: D33028213

Pulled By: mruberry

fbshipit-source-id: 3316b887d4c322cc1262feee651464da4124a6de
2021-12-17 09:52:47 -08:00
de2d9e2966 [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D33183467

fbshipit-source-id: d7c37f3522a38e85891524c544eab4fdb01270de
2021-12-17 09:45:20 -08:00
1065739781 Fix build on latest main branch of thrust - SoftMax.cu (#70039)
Summary:
Similar to https://github.com/pytorch/pytorch/issues/69985

I don't think there's any other source file which should `#include <thrust/iterator/constant_iterator.h>` as of 73a6c36f1b

```
mkozuki@mkozuki-srv ~/ghq/github.com/crcrpar/torch-0 master
torch-0 ❯ git rev-parse HEAD; rg -inw make_constant_iterator
73a6c36f1bfbf9aff04ba41cfe6ab06aa99883d9
aten/src/ATen/native/cuda/LegacyThrustHelpers.cu
54:    thrust::make_constant_iterator(1),

aten/src/ATen/native/sparse/cuda/SoftMax.cu
301:      thrust::make_constant_iterator(int64_t(1)),
```

## build error

```console
[22/2048] /usr/local/cuda/bin/nvcc <compiler flags elided> -x cu -c ../aten/src/ATen/native/sparse/cuda/SoftMax.cu -o caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/sparse/cuda/SoftMax.cu.o
[22/2048] ../aten/src/ATen/native/sparse/cuda/SoftMax.cu(301): error: namespace "thrust" has no member "make_constant_iterator"
...
[22/2048] 13 errors detected in the compilation of "../aten/src/ATen/native/sparse/cuda/SoftMax.cu".
```

cc xwang233 zasdfgbnm ptrblck

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70039

Reviewed By: mruberry

Differential Revision: D33166702

Pulled By: ngimel

fbshipit-source-id: 33f3b80095c8562786a9a9b7a0e7eb58201af458
2021-12-17 09:28:44 -08:00
92463573d8 Sanitize string before passing it as shell argument (#70070)
Summary:
Use `c10::printQuotedString` to escape any characters that might cause the
string to be interpreted as more than one argument by the shell.

Please note that this codepath is deprecated and is not reachable
through typical PyTorch usage workflows.

This issue was discovered by Daniel Lawrence of the Amazon Alexa team.
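
The fix itself is C++ (`c10::printQuotedString`); the general idea, illustrated with Python's standard library purely as an analogy:
```python
import shlex

arg = 'model"; rm -rf /tmp/scratch; echo "'
print("run_model " + shlex.quote(arg))  # stays a single shell word
```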

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70070

Reviewed By: suo

Differential Revision: D33172721

Pulled By: malfet

fbshipit-source-id: 9dbd17f6eb775aaa1a545da42cbc95864c1189ee
2021-12-17 08:08:28 -08:00
54406314cc Update PULL_REQUEST_TEMPLATE.md (#70105)
Summary:
Many users actually send things like `Fixes #{69696}` which then fails to properly close the corresponding issue.

Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70105

Reviewed By: ejguan

Differential Revision: D33187501

Pulled By: albanD

fbshipit-source-id: 2080ee42c30b9db45177f049627118a6c3b544b7
2021-12-17 07:53:36 -08:00
b1d5948b34 Remove backward ops for miopen convolution (#69987)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69987

Stack from [ghstack](https://github.com/ezyang/ghstack):
* __->__ #69987

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D33170379

Pulled By: jbschlosser

fbshipit-source-id: 6bc274f1d457ec5bddc8b52c2f1c44eaae2ff0ed
2021-12-17 07:43:38 -08:00
f045618dab dbr quant: extend qconfig_dict support to functionals, part 2 (#69766)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69766

Follow-up to the previous PR; removes the requirement to have a parent
qconfig in order for the object-type qconfig to be applied to a function.

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

Reviewed By: jerryzh168

Differential Revision: D33020218

Pulled By: vkuzo

fbshipit-source-id: fa0e10f05ca5f88b48ef74b9d2043ea763506742
2021-12-17 05:59:55 -08:00
a4173fc887 dbr quant: extend qconfig_dict support to functions, part 1 (#69758)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69758

Extends DBR quant `qconfig_dict['object_type']` support to function types,
with the restriction that a parent module must have a qconfig.

A future PR will remove the restriction above (it exists due to some technical
debt); it is split out to keep PR sizes small.
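
A minimal sketch (not from the PR itself) of what an `object_type` entry targeting a function type looks like; `default_qconfig` is just the stock eager-mode qconfig:

```
import torch
from torch.ao.quantization import default_qconfig

# In this PR, a parent module must still have a qconfig for the
# function-level object_type entry to take effect.
qconfig_dict = {
    "": default_qconfig,
    "object_type": [
        (torch.nn.functional.linear, default_qconfig),
    ],
}
```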

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

Reviewed By: jerryzh168

Differential Revision: D33020217

Pulled By: vkuzo

fbshipit-source-id: ce8a8185f9c87d437e1319ff6f19e8f6adf41e02
2021-12-17 05:59:52 -08:00
c186773d92 dbr quant: make fqn during prepare op hook required (#69726)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69726

This is a cleanup: this variable was previously optional,
but it always exists, because the only way an op hook
can run is if there is a parent module with an `AutoQuantizationState`
object.

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

Reviewed By: albanD

Differential Revision: D33003472

Pulled By: vkuzo

fbshipit-source-id: de5769194808d42b025b848667815b4e3d73b6c6
2021-12-17 05:59:49 -08:00
b999f87503 fx quant: move _parent_name to common utils (#69720)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69720

This function is also useful for DBR quant, so move it from the FX utils
to the common utils.

Test Plan:
```
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeDBR
```

Reviewed By: jerryzh168

Differential Revision: D33003473

Pulled By: vkuzo

fbshipit-source-id: 20360682c69d614a645c14fc29d3ee023d6b2623
2021-12-17 05:59:46 -08:00
4f450f44bf dbr quant: initial support of qconfig_dict for modules (#69719)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69719

This PR changes the API signature of DBR quant to use `qconfig_dict`,
similar to FX graph mode quantization.  In this first PR, only basic
functionality is implemented:
* qconfig=None or static quantization with quint8 only is tested
* non-default qconfig for modules only is tested
* targeting ops by order is not implemented

Expanding this support will be done in future PRs.
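
As a rough illustration of the API shape described above (the DBR prepare entry point is internal and intentionally not shown), a `qconfig_dict` covering the tested cases might look like:

```
import torch
from torch.ao.quantization import default_qconfig

qconfig_dict = {
    "": default_qconfig,                     # global qconfig (static, quint8)
    "object_type": [
        (torch.nn.Conv2d, None),             # qconfig=None: skip quantizing Conv2d
        (torch.nn.Linear, default_qconfig),  # per-module-type qconfig
    ],
}
```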

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

Reviewed By: jerryzh168

Differential Revision: D33003475

Pulled By: vkuzo

fbshipit-source-id: f5af81e29c34ea57c2e23333650e44e1758102e4
2021-12-17 05:59:44 -08:00
0f1ceb34ec fx quant: refactor qconfig_dict utils to separate file (#69636)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69636

Moves some of the qconfig_dict utilities away from the FX subdirectory
into the quantization subdirectory. These utilities can be reused with
other workflows.

A future PR will start using these utilities in DBR quant.

Test Plan:
```
python test/test_quantization.py TestQuantizeFx
```

Reviewed By: albanD

Differential Revision: D33003474

Pulled By: vkuzo

fbshipit-source-id: 34417b198681279469e6d7c43ea311180086d883
2021-12-17 05:58:25 -08:00
7abb7667a6 [tensorexpr] Add memory planning to reuse intermediate buffers (#66452)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66452

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D31557188

Pulled By: huiguoo

fbshipit-source-id: f18dfeba1df20d5d4f118640fc10782534eb9219
2021-12-17 01:38:02 -08:00
ac92f7cc75 [tensorexpr] Remove the optional argument in LoopNest::prepareForCodeGen (#67144)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67144

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D31881150

Pulled By: huiguoo

fbshipit-source-id: af99087722ec71d6deb9049b63b573ae7720c9ec
2021-12-17 01:37:59 -08:00
bbfd7b75ca [tensorexpr] Move the allocation of intermediate buffers from TEK to CodeGen (#67143)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67143

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D31881151

Pulled By: huiguoo

fbshipit-source-id: 457e5d4ff8a15f70af9c797c9ab4803d8e779abe
2021-12-17 01:37:56 -08:00
6075ec15b1 [tensorexpr] Add BufMap instruction to reuse the memory of dest buf for src buf (#66451)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66451

Test Plan: Imported from OSS

Reviewed By: navahgar, ZolotukhinM

Differential Revision: D31557190

Pulled By: huiguoo

fbshipit-source-id: 96e08a05cb1c558706c4189e27d5d72efbd9c510
2021-12-17 01:37:53 -08:00
c7e0951524 [tensorexpr] Add a stmt recorder to obtain stmt PCs (#66450)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66450

Test Plan: Imported from OSS

Reviewed By: navahgar, ZolotukhinM

Differential Revision: D31557189

Pulled By: huiguoo

fbshipit-source-id: 416d79ddfc46a0109187cdeb919ad9b5abde8030
2021-12-17 01:36:37 -08:00
043098ef7f [quant][graphmode] Rename backend_config_dict folder to backend (#69882)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69882

att

Test Plan:
```
python test/fx2trt/test_quant_trt.py
```

Imported from OSS

Reviewed By: supriyar

Differential Revision: D33081761

fbshipit-source-id: c3178eec5798ac8587be09a963944b570c73e8ea
2021-12-16 21:13:04 -08:00
3d51c88032 [DataPipe] Unifying API - removing options to have fn_args and fn_kwargs from MapDataPipes (#69561)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69561

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D32952099

Pulled By: NivekT

fbshipit-source-id: 95b725774a9d04d655e2542760726908f33043f4
2021-12-16 18:11:00 -08:00
b89c283c80 [DataPipe] Unifying API - removing options to have fn_args and fn_kwargs from IterDataPipes (#69560)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69560

cc VitalyFedyunin ejguan NivekT
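
With `fn_args`/`fn_kwargs` removed, extra arguments are bound on the callable itself; a minimal sketch of the resulting usage, assuming the `IterableWrapper`/`map` DataPipe API of this era:

```
import functools
from torch.utils.data.datapipes.iter import IterableWrapper

def scale(x, factor):
    return x * factor

# Previously: dp.map(scale, fn_kwargs={"factor": 10})
dp = IterableWrapper(range(4)).map(functools.partial(scale, factor=10))
print(list(dp))  # [0, 10, 20, 30]
```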

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D32952100

Pulled By: NivekT

fbshipit-source-id: e0cc31408c7cf3220fe274feed1c7202a1aaae70
2021-12-16 18:09:52 -08:00
4a6a5d1630 OpInfos for torch.{flatten, column_stack} (#69237)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69237

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D32988956

Pulled By: anjali411

fbshipit-source-id: b7f5c537ff9731f56232aa5647910f03edf4582a
2021-12-16 17:50:58 -08:00
ef6f776e82 [quant][be] Cleanup test cases for eager mode workflow (#69880)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69880

Making the test cases more standardized; in general we would like to have
```
TestQuantizeEager,
TestQuantizeEagerOps,
TestQuantizeEagerModels,
```

but currently, since we have separate PTQ static, PTQ dynamic, and QAT static APIs, we have only partially cleaned
up the test cases. We can merge all of them later when we merge all the APIs.

Test Plan:
python test/test_quantization.py

Imported from OSS

Reviewed By: supriyar

Differential Revision: D33081418

fbshipit-source-id: fcb96559b76bbc51eb1b0625e0d4b193dbb37532
2021-12-16 17:47:30 -08:00
92320dfe6e [shard] remove set device for nccl (#69946)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69946

This PR removes the implicit set_device for the NCCL process group, according to the proposal in https://github.com/pytorch/pytorch/issues/69731
ghstack-source-id: 145847504

Test Plan: wait for ci

Reviewed By: pritamdamania87

Differential Revision: D33099095

fbshipit-source-id: 3fe9f6a0facf5ea513c267e9f32c6a7fd56cc8a2
2021-12-16 17:16:42 -08:00
9813629500 [reland][quant][fx][graphmode] Add support for conv add pattern in backend_config_dict (#70007)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70007

This PR extends fusion pattern support from a simple sequence of ops to a simple
subgraph like conv - add
```
x - conv ---\
y ---------add ---- output
```
where the inputs x, y and the output are observed/quantized

Test Plan:
```
python test/fx2trt/test_quant_trt.py TestQuantizeFxTRTOps.test_conv_add
```

Imported from OSS

Imported from OSS

Reviewed By: supriyar

Differential Revision: D33144605

fbshipit-source-id: 331fda77bdc431a8cd9abe1caea8347a71776ec2
2021-12-16 17:10:44 -08:00
62809dc062 .github: Volume mount netrc to home directory (#70057)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70057

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D33169220

Pulled By: seemethere

fbshipit-source-id: 720e5fb946249a26f0505afc34b95530258e53ea
2021-12-16 15:23:45 -08:00
a73c6a45b6 [reland][quant][graphmode][fx] Enable fuse handler for sequence of 3 ops (#70006)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70006

reland: fixing some mypy errors that were missed before

This PR enables the fuse handler for a sequence of three ops, and merges all fuse handlers into one.

TODO: we can also move this to the backend_config_dict folder

Test Plan:
regression fusion test
```
python test/test_quantization.py TestFuseFx
```

Imported from OSS

Imported from OSS

Reviewed By: supriyar

Differential Revision: D33144606

fbshipit-source-id: ca34f282018a0fb4d04c7e35119eaf2d64258e78
2021-12-16 15:04:16 -08:00
fa582045fc Fix lint/mypy violations (#70059)
Summary:
Introduced by https://github.com/pytorch/pytorch/pull/69194

Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70059

Reviewed By: suo, cccclai

Differential Revision: D33170748

Pulled By: malfet

fbshipit-source-id: a2e42f37d04c21a735f6474e42eb6670d2a0c3b9
2021-12-16 14:06:27 -08:00
02c63c3006 extract out c10 targets to the c10 package (#69992)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69992

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D33141013

fbshipit-source-id: e5edd6bd5b5834ac27390ba940ebed9148512c8d
2021-12-16 13:11:49 -08:00
d459e79500 [jit][edge] Remove usage of shared_ptr<mobile::Code>. (#68037)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68037

Right now mobile::Code doesn't outlive its enclosing Function, and all accesses to Code happen inside the interpreter loop, which doesn't outlive the module, so we don't need to use std::shared_ptr here. This should also save us 1-2 KB of binary size, because shared_ptr seems to bloat on arm64 Android.
ghstack-source-id: 145818696

Test Plan: eyes.

Reviewed By: qihqi, tugsbayasgalan

Differential Revision: D32264616

fbshipit-source-id: d83f538d6604cf75fd7728a25127b4849ce7ab2a
2021-12-16 13:11:46 -08:00
39f65fee47 [jit] Split ClassType into a separate header. (#68036)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68036

For Edge use cases we want to include class_type.h separately, because in the future we want to stop depending on the rest of the JIT types declared inside jit_type.h
ghstack-source-id: 145818699

Test Plan: no behavior change.

Reviewed By: qihqi, gmagogsfm

Differential Revision: D32264618

fbshipit-source-id: 53dc187772e3dde88ff978b87252c31f3641860b
2021-12-16 13:10:05 -08:00
243e135eb4 Sparse CSR CUDA: Add block sparse support for torch.triangular_solve (#68709)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68709

This PR adds support for the triangular solver with a block CSR matrix.

cc nikitaved pearu cpuhrsch IvanYashchuk ngimel

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D33066067

Pulled By: cpuhrsch

fbshipit-source-id: 9eaf1839071e9526be8d8c6d47732b24200f3557
2021-12-16 13:03:42 -08:00
5f3f327a9d update SequentialLR signature (#69817)
Summary:
- ~optimizer isn't required for `SequentialLR` since it's already present in the schedulers. Trying to match the signature of it with `ChainedScheduler`.~
- ~`verbose` isn't really used anywhere so removed it.~

Updated missing docs and added a small check.
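
For reference, a minimal sketch of `SequentialLR` usage matching the signature discussed above:

```
import torch
from torch.optim.lr_scheduler import ConstantLR, ExponentialLR, SequentialLR

opt = torch.optim.SGD([torch.zeros(1, requires_grad=True)], lr=0.1)
warmup = ConstantLR(opt, factor=0.1, total_iters=2)
decay = ExponentialLR(opt, gamma=0.9)
# optimizer stays the first argument; milestones says when to switch schedulers
sched = SequentialLR(opt, schedulers=[warmup, decay], milestones=[2])
for _ in range(5):
    opt.step()
    sched.step()
```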

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69817

Reviewed By: ngimel

Differential Revision: D33069589

Pulled By: albanD

fbshipit-source-id: f015105a35a2ca39fe94c70acdfd55cdf5601419
2021-12-16 12:58:00 -08:00
15b9e5f8a4 Revert D33136054: Remove backward ops for miopen convolution
Test Plan: revert-hammer

Differential Revision:
D33136054 (8b9b819d22)

Original commit changeset: e049168732bd

Original Phabricator Diff: D33136054 (8b9b819d22)

fbshipit-source-id: 2a3cc3df3519d04595795f0bc87a807705d13a13
2021-12-16 12:46:02 -08:00
b199e3c842 Provide functionality to write custom ShardedTensor ops. (#69874)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69874

We have a handful of ops supported for ShardedTensor via
``__torch_function__`` dispatch. However, we currently can't cover all torch
operators, and giving users a way to extend this functionality will make
it much more general.

In this PR, I've introduced a custom_sharded_op decorator which can be used to
register a custom sharded op implementation.
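
A hypothetical sketch of the decorator's shape; the import path and the handler signature (mirroring the existing ``__torch_function__`` dispatch of `types, args, kwargs, process_group`) are assumptions rather than the verbatim API:

```
import torch
from torch.distributed._sharded_tensor import custom_sharded_op  # assumed path

@custom_sharded_op(torch.nn.functional.relu)
def sharded_relu(types, args, kwargs, process_group):
    st = args[0]
    # Apply the op shard-by-shard on the local shards.
    for shard in st.local_shards():
        shard.tensor.relu_()
    return st
```
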
ghstack-source-id: 145841141

Test Plan: waitforbuildbot

Reviewed By: wanchaol

Differential Revision: D33078587

fbshipit-source-id: 5936b7ac25582e613653c19afa559219719ee54b
2021-12-16 12:40:13 -08:00
1f86e0ee2a don't compile pow kernels for non-existent case (#70017)
Summary:
Per title

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70017

Reviewed By: malfet

Differential Revision: D33163747

Pulled By: ngimel

fbshipit-source-id: 784c7934428ee896c637662fdd59833c3a395f64
2021-12-16 12:31:30 -08:00
8b9b819d22 Remove backward ops for miopen convolution (#69987)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69987

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D33136054

Pulled By: jbschlosser

fbshipit-source-id: e049168732bdfcf590ec8102412f2ef0418f9dcc
2021-12-16 11:49:49 -08:00
b4c4a015d6 Revert D33163841: Revert D33102715: Back out "Revert D32606547: torch/monitor: add C++ events and handlers"
Test Plan: revert-hammer

Differential Revision:
D33163841

Original commit changeset: e262b6d8c80a

Original Phabricator Diff: D33102715 (eb374de3f5)

fbshipit-source-id: 644216036a238a458f0a2198460b36d24fb035f8
2021-12-16 11:12:18 -08:00
96fe82ac3c HANDLE_TH_ERRORS: Move exception translation out of line (#69974)
Summary:
I've noticed that the `HANDLE_TH_ERRORS` macros are actually very expensive in terms of compile time. Moving the bulk of the catch statements out of line using a Lippincott function (a single shared function that rethrows and translates the in-flight exception) significantly improves compile times and object file binary sizes. For just the generated autograd bindings, this halves serial build time from 8 minutes to 4, and binary size is more than halved for most files, with the biggest difference being `python_variable_methods.cpp`, which went from 126 MB to 43 MB.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69974

Reviewed By: mruberry

Differential Revision: D33160899

Pulled By: albanD

fbshipit-source-id: fc35fa86f69ffe5a0752557be30b438c8564e998
2021-12-16 11:04:48 -08:00
9ff8c49ed9 Enable cpu scalar arguments for jiterator (#69861)
Summary:
Creates an analog of `gpu_kernel_with_scalars` for jiterator kernels

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69861

Reviewed By: mruberry

Differential Revision: D33134013

Pulled By: ngimel

fbshipit-source-id: fd2412e8d6432e15d5721e95a194d29fa70ad92c
2021-12-16 10:58:59 -08:00
ff53ed24d2 fix NameError of docstring in broadcast_object_list (#69810)
Summary:
This PR fixes a NameError in the docstring of broadcast_object_list.

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69810

Reviewed By: kimishpatel

Differential Revision: D33143167

Pulled By: jbschlosser

fbshipit-source-id: 99c076466ae4b4a332763b7546028c5097b417d7
2021-12-16 10:50:45 -08:00
c9e898fef8 delete TH (#69929)
Summary:
Move the TH<C>GenerateByteType includes into torch/csrc (the only place they are used), and we can remove the TH folder altogether!
The only things left in THC are includes kept for BC compatibility.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69929

Reviewed By: mruberry

Differential Revision: D33133013

Pulled By: ngimel

fbshipit-source-id: 78c87cf93d2d641631b0f71051ace318bf4ec3c1
2021-12-16 10:45:30 -08:00
7f7966a888 [Docs] Fix the syntax of documentation (#69958)
Summary:
Fixes the syntax of the documentation in the file torch/nn/utils/clip_grad.py

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69958

Reviewed By: mruberry

Differential Revision: D33160612

Pulled By: albanD

fbshipit-source-id: 2dc199fee345bb4c75632900bc6f73a1ab8192a6
2021-12-16 10:38:39 -08:00
ebc66bfeea [Profiler] Pull helper methods into dedicated file. (And start torch/csrc/profiler folder. (#69255)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69255

One thing that I've found as I optimize the profiler is that there's a lot of intermingled code, where the kineto profiler relies on the legacy (autograd) profiler for generic operations. This made optimization hard because I had to manage too many complex dependencies. (Exacerbated by the USE_KINETO #ifdef's sprinkled around.) This PR is the first of several to restructure the profiler(s) so the later optimizations go in more easily.

Test Plan: Unit tests

Reviewed By: aaronenyeshi

Differential Revision: D32671972

fbshipit-source-id: efa83b40dde4216f368f2a5fa707360031a85707
2021-12-16 10:33:47 -08:00
b23890177f [Operator Versioning][Edge] Codegen upgrader_mobile.cpp (#69194)
Summary:
From the operator version map and the upgrader TorchScript, generate the upgrader_mobile.cpp file. This PR also includes a unit test.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69194

ghstack-source-id: 145819351

Test Plan:
```
buck test mode/opt //caffe2/test:upgrader_codegen
```
```
buck run mode/opt //caffe2/torch/fb/mobile/upgrader_codegen:upgrader_codegen
```
```
python /Users/chenlai/pytorch/tools/codegen/operator_versions/gen_mobile_upgraders.py
```

Reviewed By: iseeyuan

Differential Revision: D32748985

fbshipit-source-id: f8437766edaba459bfc5e7fc7a3ca0520c4edb9a
2021-12-16 10:29:35 -08:00
c4281cc92d Prototype checkpoint_wrapper (#69955)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69955

Implements a checkpoint_wrapper function, which wraps an nn.Module with checkpointing so users won't have to call checkpoint() every time they want to checkpoint the module.

Currently, only support for reentrant-based checkpointing is added, and it is only tested with FSDP to unblock a use case.

Future work is to add support for the new checkpointing API, add more tests, and upstream this to torch.utils.checkpoint.
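
A minimal usage sketch, assuming the prototype lives under `torch.distributed.algorithms._checkpoint` (the exact module path is an assumption and may differ):

```
import torch
from torch.distributed.algorithms._checkpoint._checkpoint_wrapper import (
    checkpoint_wrapper,  # assumed location of the prototype
)

block = checkpoint_wrapper(torch.nn.Linear(4, 4))  # forward now runs under checkpoint()
loss = block(torch.randn(2, 4, requires_grad=True)).sum()
loss.backward()
```
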
ghstack-source-id: 145811242

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D33107276

fbshipit-source-id: c4a1c68d71d65713a929994940a8750f73fbdbdb
2021-12-16 09:59:19 -08:00
c80b5b8c8f Revert D33102715: Back out "Revert D32606547: torch/monitor: add C++ events and handlers"
Test Plan: revert-hammer

Differential Revision:
D33102715 (eb374de3f5)

Original commit changeset: 3816ff01c578

Original Phabricator Diff: D33102715 (eb374de3f5)

fbshipit-source-id: e262b6d8c80a05f3a67e024fedfbadefdbfe6e29
2021-12-16 09:39:57 -08:00
8c7f4a0d0b [tensorexpr] check for index out of bounds in ir_eval (#68858)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68858

when executing with ir_eval, check for index out of bounds.

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D32657881

Pulled By: davidberard98

fbshipit-source-id: 62dd0f85bb182b34e9c9f795ff761081290f6922
2021-12-16 09:27:45 -08:00
76d282d447 Nvfuser code bump 12 5 (#69964)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69964

Things added in this PR that require review:
1. cuLaunchCooperativeKernel driver API added in
aten/src/ATen/cuda/detail/LazyNVRTC.cpp
aten/src/ATen/cuda/nvrtc_stub/ATenNVRTC.h

nvfuser code update:
1. perf tuning of the codegen scheduler that improves performance.
2. permutation support has been extended beyond contiguous/channels-last. (The improvements can be observed on the PW benchmark.)

Things reverted from local changes:
1. aten::gelu with approximation
2. local changes that are upstreamed in PR https://github.com/pytorch/pytorch/issues/68804

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69428

Reviewed By: ngimel

Differential Revision: D33073817

Pulled By: wconstab

fbshipit-source-id: e77d32e81d037d7370822b040456fd4c3bd68edb
2021-12-16 08:28:54 -08:00
a6a1c709ff Fixed libtorch at::Tensor::print() linking error (#69615)
Summary:
There was a declaration of the function at::Tensor::print() in TensorBody.h, left there during the refactoring of Tensor and TensorBase (d701357d921ef167d42c125e65b6f7da6be3ad0f). Removing it from TensorBody.h resolves the issue.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69615

Test Plan:
the code below now compiles and works fine (prints `[CPUFloatType [3, 4, 5, 5, 5]]`)
```
#include <torch/torch.h>

int main()
{
    torch::Tensor tensor = torch::randn({3, 4, 5, 5, 5});
    tensor.print();
}
```

Fixes https://github.com/pytorch/pytorch/issues/69515

Reviewed By: ngimel

Differential Revision: D33020361

Pulled By: albanD

fbshipit-source-id: 190f253fb4101a4205aede3574b6e8acd19e54a1
2021-12-16 07:57:10 -08:00
531da0c43b change asan test shard to 3 (#69843)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/68261

This PR changes the number of test shards from 2 to 3 for all ASAN tests, aiming to improve their run time.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69843

Reviewed By: janeyx99

Differential Revision: D33160771

Pulled By: xidachen

fbshipit-source-id: dba1d318cc49b923e18704839471d8753cc00eca
2021-12-16 07:22:03 -08:00
fe7b6446d5 [LTC] Upstream LazyTensor and LazyGraphExecutor (#69815)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69815

Test Plan: Imported from OSS

Reviewed By: dagitses, jbschlosser

Differential Revision: D33059774

Pulled By: desertfire

fbshipit-source-id: dd1e3e5f4fd3181517eebd2742f6a5b7b6fb9a7d
2021-12-16 05:44:40 -08:00
28243769f9 [LTC] Upstream several internal ops (#69716)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69716

To prepare for the landing of LazyTensor and LazyGraphExecutor, upstream the following internal ops:
- arithmetic_ir_ops.h
- cast.h
- device_data.h
- expand.h
- generic.h
- scalar.h

Test Plan: Imported from OSS

Reviewed By: wconstab

Differential Revision: D32999410

Pulled By: desertfire

fbshipit-source-id: 31559dd7a1e525591ae9e2d7f915ee864437c11f
2021-12-16 05:44:37 -08:00
e6a4988b2d [LTC] Upstream utils in computation_client (#69621)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69621

Upstream the following utils
- metrics.h
- multi_wait.h
- thread_pool.h
- unique.h

Test Plan: Imported from OSS

Reviewed By: wconstab, VitalyFedyunin

Differential Revision: D32957629

Pulled By: desertfire

fbshipit-source-id: 5f2fb57493856556099b7cda7560a568d1f9ed97
2021-12-16 05:43:09 -08:00
73a6c36f1b Add more details to the known limitations section of torchhub docs (#69970)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69970

This is a follow-up to https://github.com/pytorch/hub/issues/243

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D33124060

Pulled By: NicolasHug

fbshipit-source-id: 298fe14b39a1aff3e0b029044c9a0db8bc82336a
2021-12-16 02:43:48 -08:00
eb374de3f5 Back out "Revert D32606547: torch/monitor: add C++ events and handlers" (#69923)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69923

Original commit changeset: fbaf2cc06ad4

Original Phabricator Diff: D32606547 (e61fc1c03b)

This is the same thing as the original diff but just using a normal std::mutex instead of std::shared_timed_mutex which is not available on OSX 10.11. The performance difference should be negligible and easy to change down the line if it does become a bottleneck.

Old failing build: https://github.com/pytorch/pytorch/runs/4495465412?check_suite_focus=true

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68783

Test Plan:
buck test //caffe2/test/cpp/monitor:monitor

will add ciflow tags to ensure mac builds are fine

Reviewed By: aivanou

Differential Revision: D33102715

fbshipit-source-id: 3816ff01c578d8e844d303d881a63cf5c3817bdb
2021-12-15 22:51:43 -08:00
5cc4037369 [PyTorch][Distributed] Integrate with ShardedOptimizer in the unit test of ShardedLinear (#69569)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69569

Since ShardedOptimizer was added in https://github.com/pytorch/pytorch/pull/68607, we now integrate it into our unit test for sharded linear.
ghstack-source-id: 145773749

Test Plan: CI + Unit test

Reviewed By: wanchaol

Differential Revision: D32777020

fbshipit-source-id: eb6b1bb0f6234976f024273833154cab274fed25
2021-12-15 17:55:01 -08:00
dc18048dd8 [PT-D][Fix] Broken sharded embedding and embedding bag test fix (#69725)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69725

We have added a `no_grad` context manager in the tensor sharding to ensure that the local_shard is the root node. But it turns out that for embedding and embedding_bag, when `max_norm` is specified, row-wise sharding complains, since we use the original `max_norm` of the operators.

Error traces:
```
  File "/data/sandcastle/boxes/fbsource/fbcode/buck-out/dev/gen/caffe2/test/distributed/_sharded_tensor/sharded_embedding#binary,link-tree/torch/overrides.py", line 1389, in handle_torch_function
    result = torch_func_method(public_api, types, args, kwargs)
  File "/data/sandcastle/boxes/fbsource/fbcode/buck-out/dev/gen/caffe2/test/distributed/_sharded_tensor/sharded_embedding#binary,link-tree/torch/distributed/_sharded_tensor/api.py", line 554, in __torch_function__
    return sharded_embedding(types, args, kwargs, self._process_group)
  File "/data/sandcastle/boxes/fbsource/fbcode/buck-out/dev/gen/caffe2/test/distributed/_sharded_tensor/sharded_embedding#binary,link-tree/torch/distributed/_sharded_tensor/ops/embedding.py", line 115, in sharded_embedding
    return _handle_row_wise_sharding(
  File "/data/sandcastle/boxes/fbsource/fbcode/buck-out/dev/gen/caffe2/test/distributed/_sharded_tensor/sharded_embedding#binary,link-tree/torch/distributed/_sharded_tensor/ops/embedding.py", line 309, in _handle_row_wise_sharding
    gathered_input_embeddings = torch.nn.functional.embedding(
  File "/data/sandcastle/boxes/fbsource/fbcode/buck-out/dev/gen/caffe2/test/distributed/_sharded_tensor/sharded_embedding#binary,link-tree/torch/nn/functional.py", line 2153, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: A view was created in no_grad mode and its base or another view of its base has been modified inplace with grad mode enabled. Given that this use case is ambiguous and error-prone, it is forbidden. You can clarify your code by moving both the view and the inplace either both inside the no_grad block (if you don't want the inplace to be tracked) or both outside (if you want the inplace to be tracked).
 exiting process 2 with exit code: 10
```

As a fix, we clone and detach the local shard from the narrow result without using the context manager.
ghstack-source-id: 145773748

Test Plan: CI + Unit test.

Reviewed By: pritamdamania87, wanchaol

Differential Revision: D33000927

fbshipit-source-id: 4d5a93120675e90d4d6d6225a51c4a481d18d159
2021-12-15 17:53:49 -08:00
4d5dd00e61 Remove backward ops for cuDNN transposed convolution (#69902)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69902

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D33093795

Pulled By: jbschlosser

fbshipit-source-id: 8b90150bd1996e48c0c888bdab4e95a849d10ef5
2021-12-15 17:48:25 -08:00
3dc3651e0e Remove backward ops for cuDNN convolution (#69901)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69901

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D33093796

Pulled By: jbschlosser

fbshipit-source-id: f5beab6f3078144b6c8e5c4c51d69823815a9f99
2021-12-15 17:46:49 -08:00
bf15dc22bc Fix build on latest main branch of thrust (#69985)
Summary:
Our internal CI that builds PyTorch with the latest main branch of thrust fails with
```
#22 466.9 /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DAT_PER_OPERATOR_HEADERS -DHAVE_MALLOC_USABLE_SIZE=1 -DHAVE_MMAP=1 -DHAVE_SHM_OPEN=1 -DHAVE_SHM_UNLINK=1 -DIDEEP_USE_MKL -DMAGMA_V2 -DMINIZ_DISABLE_ZIP_READER_CRC32_CHECKS -DONNXIFI_ENABLE_EXT=1 -DONNX_ML=1 -DONNX_NAMESPACE=onnx_torch -DTH_BLAS_MKL -DTORCH_CUDA_BUILD_MAIN_LIB -DUSE_C10D_GLOO -DUSE_C10D_MPI -DUSE_C10D_NCCL -DUSE_CUDA -DUSE_DISTRIBUTED -DUSE_EXTERNAL_MZCRC -DUSE_NCCL -DUSE_RPC -DUSE_TENSORPIPE -D_FILE_OFFSET_BITS=64 -Dtorch_cuda_EXPORTS -Iaten/src -I../aten/src -I. -I../ -I../cmake/../third_party/benchmark/include -I../cmake/../third_party/cudnn_frontend/include -I../third_party/onnx -Ithird_party/onnx -I../third_party/foxi -Ithird_party/foxi -Iinclude -I../torch/csrc/distributed -I../aten/src/TH -I../aten/src/THC -I../aten/src/ATen/cuda -Icaffe2/aten/src -I../aten/../third_party/catch/single_include -I../aten/src/ATen/.. -Icaffe2/aten/src/ATen -Inccl/include -I../c10/cuda/../.. -I../c10/.. -I../third_party/tensorpipe -Ithird_party/tensorpipe -I../third_party/tensorpipe/third_party/libnop/include -I../torch/csrc/api -I../torch/csrc/api/include -isystem=third_party/gloo -isystem=../cmake/../third_party/gloo -isystem=../cmake/../third_party/googletest/googlemock/include -isystem=../cmake/../third_party/googletest/googletest/include -isystem=../third_party/protobuf/src -isystem=/opt/conda/include -isystem=../third_party/gemmlowp -isystem=../third_party/neon2sse -isystem=../third_party/XNNPACK/include -isystem=../third_party -isystem=../cmake/../third_party/eigen -isystem=/opt/conda/include/python3.8 -isystem=/opt/conda/lib/python3.8/site-packages/numpy/core/include -isystem=../cmake/../third_party/pybind11/include -isystem=/opt/hpcx/ompi/include/openmpi -isystem=/opt/hpcx/ompi/include/openmpi/opal/mca/hwloc/hwloc201/hwloc/include -isystem=/opt/hpcx/ompi/include/openmpi/opal/mca/event/libevent2022/libevent -isystem=/opt/hpcx/ompi/include/openmpi/opal/mca/event/libevent2022/libevent/include -isystem=/opt/hpcx/ompi/include -isystem=/usr/local/cuda/include -isystem=../third_party/ideep/mkl-dnn/third_party/oneDNN/include -isystem=../third_party/ideep/include -Xfatbin -compress-all -DONNX_NAMESPACE=onnx_torch -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -gencode arch=compute_86,code=compute_86 -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=integer_sign_change,--diag_suppress=useless_using_declaration,--diag_suppress=set_but_not_used,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=implicit_return_from_non_void_function,--diag_suppress=unsigned_compare_with_zero,--diag_suppress=declared_but_not_referenced,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -Xcudafe --diag_suppress=20236 -Wno-deprecated-gpu-targets --expt-extended-lambda -DCUB_WRAPPED_NAMESPACE=at_cuda_detail -DCUDA_HAS_FP16=1 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -O3 -DNDEBUG -Xcompiler=-fPIC -DCAFFE2_USE_GLOO -DCUDA_HAS_FP16=1 -DHAVE_GCC_GET_CPUID -DUSE_AVX -DUSE_AVX2 -DTH_HAVE_THREAD
-Xcompiler=-Wall,-Wextra,-Wno-unused-parameter,-Wno-unused-variable,-Wno-unused-function,-Wno-unused-result,-Wno-unused-local-typedefs,-Wno-missing-field-initializers,-Wno-write-strings,-Wno-unknown-pragmas,-Wno-type-limits,-Wno-array-bounds,-Wno-unknown-pragmas,-Wno-sign-compare,-Wno-strict-overflow,-Wno-strict-aliasing,-Wno-error=deprecated-declarations,-Wno-missing-braces,-Wno-maybe-uninitialized -DTORCH_CUDA_BUILD_MAIN_LIB -Xcompiler -pthread -std=c++14 -MD -MT caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/LegacyThrustHelpers.cu.o -MF caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/LegacyThrustHelpers.cu.o.d -x cu -c ../aten/src/ATen/native/cuda/LegacyThrustHelpers.cu -o caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/LegacyThrustHelpers.cu.o
#22 466.9 ../aten/src/ATen/native/cuda/LegacyThrustHelpers.cu(53): error: namespace "thrust" has no member "make_constant_iterator"
#22 466.9
#22 466.9 1 error detected in the compilation of "../aten/src/ATen/native/cuda/LegacyThrustHelpers.cu".
```
The failure is because this file uses `thrust::make_constant_iterator` but didn't include the header where this function is defined.

cc: xwang233

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69985

Reviewed By: jbschlosser

Differential Revision: D33135575

Pulled By: ngimel

fbshipit-source-id: 7a8da56bba609d6c30de4a064669faba12cb7168
2021-12-15 17:08:43 -08:00
98c0fb8b42 [sparsity] More descriptive error message for missing parameters (#69895)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69895

sparse.Linear has an error message that doesn't tell the user how to resolve the issue. This adds more info.
ghstack-source-id: 145603212

Test Plan: Not needed -- string change only

Reviewed By: jerryzh168

Differential Revision: D33039278

fbshipit-source-id: b5f7f5d257142eb3e7ad73f7c005755253a329d7
2021-12-15 16:58:31 -08:00
46ace4ac33 Add support for masked_softmax when softmax_elements > 1024 & corresponding unit tests (#69924)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69924
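
For context, masked softmax computes a softmax in which masked-out positions are excluded from normalization; a public-API sketch of the semantics (not the fused kernel added here, which now also handles rows longer than 1024 elements):

```
import torch

x = torch.randn(2, 1536)                      # softmax dim longer than 1024
mask = torch.zeros_like(x, dtype=torch.bool)  # True marks positions to mask out
mask[:, ::2] = True
out = torch.softmax(x.masked_fill(mask, float("-inf")), dim=-1)
```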

Test Plan: buck build mode/opt -c fbcode.enable_gpu_sections=true caffe2/test:nn && buck-out/gen/caffe2/test/nn\#binary.par -r test_masked_softmax

Reviewed By: ngimel

Differential Revision: D32819181

fbshipit-source-id: 6838a11d3554ec8e1bd48f1c2c7b1ee3a4680995
2021-12-15 16:44:15 -08:00
32ffad17a9 [PyTorch][Easy] make GlobalRecordFunctionCallbacks smallvector (#70002)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70002

Callbacks are limited to 4, so there is no reason for this to be a `std::vector`.

Test Plan: CI

Reviewed By: aaronenyeshi

Differential Revision: D32611294

fbshipit-source-id: 21823248abe40d461579b9b68d53c8c0de2a133d
2021-12-15 16:28:09 -08:00
65ab63310b [PyTorch] use div instead of mul when calculating sampling probability (#70001)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70001

Multiply by the inverse of `kLowProb` instead of dividing, which uses the less expensive `mul` instead of `div`.

Test Plan:
Before
{F682076291}

After
{F682076323}

Reviewed By: robieta

Differential Revision: D32608440

fbshipit-source-id: 7851317a0f7e33813f2bd7a152e5e7f4b5c361b4
2021-12-15 15:28:18 -08:00
66406ee0f7 [PyTorch][Static Runtime] Fix to() w/dtype bool (#69935)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69935

Didn't realize that `AT_DISPATCH_ALL_TYPES` should really be called `AT_DISPATCH_MOST_TYPES`.
ghstack-source-id: 145661358

Test Plan:
Added test for dtype bool.

Ran CMF local_ro net:

before:

```
I1215 12:33:49.300174 1606538 PyTorchPredictorBenchLib.cpp:279] PyTorch run finished. Milliseconds per iter: 0.966491. Iters per second: 1034.67
I1215 12:33:49.825570 1606538 PyTorchPredictorBenchLib.cpp:279] PyTorch run finished. Milliseconds per iter: 0.94867. Iters per second: 1054.11
I1215 12:33:50.349246 1606538 PyTorchPredictorBenchLib.cpp:279] PyTorch run finished. Milliseconds per iter: 0.947926. Iters per second: 1054.93
I1215 12:33:50.870433 1606538 PyTorchPredictorBenchLib.cpp:279] PyTorch run finished. Milliseconds per iter: 0.943779. Iters per second: 1059.57
I1215 12:33:51.393702 1606538 PyTorchPredictorBenchLib.cpp:279] PyTorch run finished. Milliseconds per iter: 0.947185. Iters per second: 1055.76
I1215 12:33:51.915666 1606538 PyTorchPredictorBenchLib.cpp:279] PyTorch run finished. Milliseconds per iter: 0.945672. Iters per second: 1057.45
I1215 12:33:52.438475 1606538 PyTorchPredictorBenchLib.cpp:279] PyTorch run finished. Milliseconds per iter: 0.948407. Iters per second: 1054.4
I1215 12:33:52.965337 1606538 PyTorchPredictorBenchLib.cpp:279] PyTorch run finished. Milliseconds per iter: 0.95472. Iters per second: 1047.43
I1215 12:33:53.494563 1606538 PyTorchPredictorBenchLib.cpp:279] PyTorch run finished. Milliseconds per iter: 0.967083. Iters per second: 1034.04
I1215 12:33:54.017879 1606538 PyTorchPredictorBenchLib.cpp:279] PyTorch run finished. Milliseconds per iter: 0.948945. Iters per second: 1053.8
I1215 12:33:54.017930 1606538 PyTorchPredictorBenchLib.cpp:290] Mean milliseconds per iter: 0.951888, standard deviation: 0.0083367
```

after:
```
I1215 12:32:35.820874 1594955 PyTorchPredictorBenchLib.cpp:279] PyTorch run finished. Milliseconds per iter: 0.999845. Iters per second: 1000.15
I1215 12:32:36.343147 1594955 PyTorchPredictorBenchLib.cpp:279] PyTorch run finished. Milliseconds per iter: 0.944363. Iters per second: 1058.91
I1215 12:32:36.863806 1594955 PyTorchPredictorBenchLib.cpp:279] PyTorch run finished. Milliseconds per iter: 0.942542. Iters per second: 1060.96
I1215 12:32:37.385459 1594955 PyTorchPredictorBenchLib.cpp:279] PyTorch run finished. Milliseconds per iter: 0.944677. Iters per second: 1058.56
I1215 12:32:37.905436 1594955 PyTorchPredictorBenchLib.cpp:279] PyTorch run finished. Milliseconds per iter: 0.941135. Iters per second: 1062.55
I1215 12:32:38.424907 1594955 PyTorchPredictorBenchLib.cpp:279] PyTorch run finished. Milliseconds per iter: 0.939748. Iters per second: 1064.11
I1215 12:32:38.944643 1594955 PyTorchPredictorBenchLib.cpp:279] PyTorch run finished. Milliseconds per iter: 0.941764. Iters per second: 1061.84
I1215 12:32:39.463791 1594955 PyTorchPredictorBenchLib.cpp:279] PyTorch run finished. Milliseconds per iter: 0.938946. Iters per second: 1065.02
I1215 12:32:39.987567 1594955 PyTorchPredictorBenchLib.cpp:279] PyTorch run finished. Milliseconds per iter: 0.95437. Iters per second: 1047.81
I1215 12:32:40.511204 1594955 PyTorchPredictorBenchLib.cpp:279] PyTorch run finished. Milliseconds per iter: 0.959139. Iters per second: 1042.6
I1215 12:32:40.511242 1594955 PyTorchPredictorBenchLib.cpp:290] Mean milliseconds per iter: 0.950653, standard deviation: 0.0184761
```

Reviewed By: hlu1

Differential Revision: D33106675

fbshipit-source-id: 5bb581f8d0ed22ef08df1936dc8d67045e44e862
2021-12-15 15:26:56 -08:00
b28a4100ff scripts: Fix manylinux2014 promotion to pypi (#70003)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70003

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: jbschlosser, janeyx99

Differential Revision: D33143730

Pulled By: seemethere

fbshipit-source-id: 83a46047fbfe4709e841fbfcaa75e434ff325be5
2021-12-15 14:55:00 -08:00
38cfacd817 Tensor: Define operators override functions in TensorBody.h (#68697)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68697

Currently, if you include `Tensor.h` but not `TensorOperators.h` then
using overloaded operators will compile but fail at link time.
Instead, this defines the member functions in `TensorBody.h` and
leaves `TensorOperators.h` as only the free functions.

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D32596269

Pulled By: albanD

fbshipit-source-id: 5ce39334dc3d505865268f5049b1e25bb90af44a
2021-12-15 14:29:38 -08:00
9c7c1b769a Functionalization: Only include headers for required ops (#68690)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68690

RegisterFunctionalization.cpp is a shared file, so only including the
required operators means a single operator change only requires 1
shard to be rebuilt instead of all of them.

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D32596275

Pulled By: albanD

fbshipit-source-id: 8b56f48872156b96fbc0a16b542b8bab76b73fd4
2021-12-15 14:29:35 -08:00
7bb4b683b5 Codegen: Registration now only includes the functions used (#68689)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68689

Currently Register{DispatchKey}.cpp includes all of
`NativeFunctions.h`, so any operator signature change requires all
backend registrations to be recompiled. However, most backends only
have registrations for a small fraction of operators, so it makes sense
to include only the specific functions required.

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D32596273

Pulled By: albanD

fbshipit-source-id: 11d511f47937fbd5ff9f677c9914277b5d015c25
2021-12-15 14:29:32 -08:00
6ba18ba87e Codegen: Generate static dispatch headers per operator (#68714)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68714

This splits the static dispatch headers (e.g. `CPUFunctions.h`)
into per operators headers (e.g. `ops/empty_cpu_dispatch.h`) which is
needed for when `Tensor.h` is compiled with static dispatch enabled.

There are also several places in ATen where the static dispatch
headers are used as an optimization even in dynamic dispatch builds.

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D32596265

Pulled By: albanD

fbshipit-source-id: 287783ef4e35c7601e9d2714ddbc8d4a5b1fb9e5
2021-12-15 14:29:29 -08:00
303d60b8da Add TORCH_ASSERT_ONLY_METHOD_OPERATORS macro (#68688)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68688

This adds a new macro `TORCH_ASSERT_ONLY_METHOD_OPERATORS` which
allows `Tensor.h` to be included, but not headers which pull in all
other operators. So, a file that defines this macro needs to use the
fine-grained headers to include only the operators being used.

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D32596267

Pulled By: albanD

fbshipit-source-id: 6fc2ce3d2b0f52ac6d81b3f063193ce26e0d75a3
2021-12-15 14:29:26 -08:00
bab61be43b Codegen: Add root_name property to NativeFunction{,sGroup} (#68687)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68687

This adds `NativeFunction.root_name` which is the canonical name
for the operator group. i.e. the BaseOperatorName without inplace or
double-underscores. In the previous PR I referred to this as
`base_name` but confusingly `BaseOperatorName` does potentially
include inplace or double-underscores.

I also add the property to `NativeFunctionsGroup` so that grouped
functions with type `Union[NativeFunction, NativeFunctionsGroup]`
can have the property queried without needing `isinstance` checks.

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D32596271

Pulled By: albanD

fbshipit-source-id: 8b6dad806ec8d796dcd70fc664604670d668cae7
2021-12-15 14:28:10 -08:00
a406a427ae Revert D33004315: Support torch.equal for ShardedTensor.
Test Plan: revert-hammer

Differential Revision:
D33004315 (1c4c81622c)

Original commit changeset: 786fe26baf82

Original Phabricator Diff: D33004315 (1c4c81622c)

fbshipit-source-id: e1dda70fea656834fdf0f2a9f874415f7b460c6e
2021-12-15 14:14:06 -08:00
1c4c81622c Support torch.equal for ShardedTensor. (#69734)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69734

Added support for `torch.equal` to ShardedTensor. This is really
helpful in terms of comparing two ShardedTensors.

Will implement `allclose` in a follow-up PR.
ghstack-source-id: 145301451

Test Plan: waitforbuildbot

Reviewed By: fduwjj, wanchaol

Differential Revision: D33004315

fbshipit-source-id: 786fe26baf82e1bb4fecfdbfc9ad4b64e704877f
2021-12-15 13:07:36 -08:00
8a08e70bf4 Revert D32596676: Avoid adding torch::deploy interpreter library to the data section
Test Plan: revert-hammer

Differential Revision:
D32596676 (986d19c0a7)

Original commit changeset: 1ab15b2d3642

Original Phabricator Diff: D32596676 (986d19c0a7)

fbshipit-source-id: da4f02114fd7e41634f116ab659a55cd985cfd7d
2021-12-15 13:02:22 -08:00
24bc3be146 [Profiler] Clean up profiler includes. (#69421)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69421

I've hit a lot of build issues in D32671972, and I've come to realize that a lot of it boils down to header hygiene. `function.h` includes `profiler.h` *solely* to transitively include `record_function.h`, which winds up leaking the profiler symbols. Moreover, several files are relying on transitive includes to get access to `getTime`. As long as I have to touch all the places that use `getTime`, I may as well also move them to the new namespace.

Test Plan: Unit tests and CI.

Reviewed By: aaronenyeshi, albanD

Differential Revision: D32865907

fbshipit-source-id: f87d6fd5afb784dca2146436e72c69e34623020e
2021-12-15 12:50:24 -08:00
587f8d9924 OperatorEntry: Avoid unnecessarily templated code (#67986)
Summary:
`assertSignatureIsCorrect` is instantiated at minimum once per unique operator signature yet its core logic is independent of the type. So, it makes sense to have a light-weight template that does nothing but call into the non-templated function with the correct `CppSignature` object.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67986

Reviewed By: jbschlosser

Differential Revision: D33108600

Pulled By: swolchok

fbshipit-source-id: 7594524d3156ff2422e6edcdffcb263dc67ea346
2021-12-15 12:43:53 -08:00
986d19c0a7 Avoid adding torch::deploy interpreter library to the data section (#69245)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69245

Create a custom section ".embedded_interpreter" to store the interpreter instead of placing it in .data, which increases the amount of memory that can be used by the other sections of the executable (such as .text/.data/.bss) by 33% (1.5GB -> 2.0GB). This also removes memory limitations of the interpreter and some tech debt.

Test Plan:
buck test mode/opt //caffe2/torch/csrc/deploy:test_deploy
readelf -S ~/fbcode/buck-out/gen/caffe2/torch/csrc/deploy/test_deploy
check the size of the .data section
Apply the fix and check the size of the .data section again. It should be reduced by the size of the interpreter.so

The output of `readelf -S ~/fbcode/buck-out/gen/caffe2/torch/csrc/deploy/test_deploy` is as follows. The .data section is now 0.0015415GB and the .torch_deploy_payXXX section is 0.605125GB

```
(pytorch) [sahanp@devvm4333.vll0 ~/local/fbsource/fbcode] readelf -S buck-out/gen/caffe2/torch/csrc/deploy/test_deploy
There are 55 section headers, starting at offset 0x24bac82b0:

Section Headers:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [ 0]                   NULL             0000000000000000  00000000
       0000000000000000  0000000000000000           0     0     0
  [ 1] .interp           PROGBITS         0000000000200350  00000350
       0000000000000028  0000000000000000   A       0     0     1
  [ 2] .note.ABI-tag     NOTE             0000000000200378  00000378
       0000000000000020  0000000000000000   A       0     0     4
  [ 3] .note.gnu.build-i NOTE             0000000000200398  00000398
       0000000000000024  0000000000000000   A       0     0     4
  [ 4] .dynsym           DYNSYM           00000000002003c0  000003c0
       0000000000d07a48  0000000000000018   A       9     1     8
  [ 5] .gnu.version      VERSYM           0000000000f07e08  00d07e08
       0000000000115f86  0000000000000002   A       4     0     2
  [ 6] .gnu.version_r    VERNEED          000000000101dd90  00e1dd90
       0000000000000510  0000000000000000   A       9    15     4
  [ 7] .gnu.hash         GNU_HASH         000000000101e2a0  00e1e2a0
       00000000003b4fb0  0000000000000000   A       4     0     8
  [ 8] .hash             HASH             00000000013d3250  011d3250
       0000000000457e20  0000000000000004   A       4     0     4
  [ 9] .dynstr           STRTAB           000000000182b070  0162b070
       0000000004ef205a  0000000000000000   A       0     0     1
  [10] .rela.dyn         RELA             000000000671d0d0  0651d0d0
       0000000000110b80  0000000000000018   A       4     0     8
  [11] .rela.plt         RELA             000000000682dc50  0662dc50
       00000000000093f0  0000000000000018   A       4    35     8
  [12] .rodata           PROGBITS         0000000006837040  06637040
       00000000034067a8  0000000000000000 AMS       0     0     64
  [13] fb_build_info     PROGBITS         0000000009c3d7f0  09a3d7f0
       00000000000002ee  0000000000000000   A       0     0     16
  [14] .gcc_except_table PROGBITS         0000000009c3dae0  09a3dae0
       00000000014a9340  0000000000000000   A       0     0     4
  [15] .eh_frame_hdr     PROGBITS         000000000b0e6e20  0aee6e20
       00000000004abf54  0000000000000000   A       0     0     4
  [16] .eh_frame         PROGBITS         000000000b592d78  0b392d78
       000000000200e344  0000000000000000   A       0     0     8
  [17] .text             PROGBITS         000000000d5a2000  0d3a2000
       000000001e55944e  0000000000000000  AX       0     0     256
  [18] .init             PROGBITS         000000002bafb450  2b8fb450
       0000000000000017  0000000000000000  AX       0     0     4
  [19] .fini             PROGBITS         000000002bafb468  2b8fb468
       0000000000000009  0000000000000000  AX       0     0     4
  [20] .never_hugify     PROGBITS         000000002bafb480  2b8fb480
       0000000000000db3  0000000000000000  AX       0     0     16
  [21] text_env          PROGBITS         000000002bafc240  2b8fc240
       0000000000002e28  0000000000000000  AX       0     0     16
  [22] .plt              PROGBITS         000000002baff070  2b8ff070
       00000000000062b0  0000000000000000  AX       0     0     16
  [23] .tdata            PROGBITS         000000002bb06000  2b906000
       0000000000000b20  0000000000000000 WAT       0     0     8
  [24] .tbss             NOBITS           000000002bb06b40  2b906b20
       0000000000007cb8  0000000000000000 WAT       0     0     64
  [25] .fini_array       FINI_ARRAY       000000002bb06b20  2b906b20
       0000000000000028  0000000000000000  WA       0     0     8
  [26] .init_array       INIT_ARRAY       000000002bb06b48  2b906b48
       0000000000008878  0000000000000000  WA       0     0     8
  [27] .data.rel.ro      PROGBITS         000000002bb0f3c0  2b90f3c0
       0000000000029ce0  0000000000000000  WA       0     0     64
  [28] .ctors            PROGBITS         000000002bb390a0  2b9390a0
       0000000000000010  0000000000000000  WA       0     0     8
  [29] .dynamic          DYNAMIC          000000002bb390b0  2b9390b0
       0000000000000340  0000000000000010  WA       9     0     8
  [30] .got              PROGBITS         000000002bb393f0  2b9393f0
       000000000001f040  0000000000000000  WA       0     0     8
  [31] .bss.rel.ro       NOBITS           000000002bb58440  2b958430
       0000000000000c40  0000000000000000  WA       0     0     32
  [32] .data             PROGBITS         000000002bb5a000  2b959000
       0000000000194188  0000000000000000  WA       0     0     4096
  [33] .tm_clone_table   PROGBITS         000000002bcee188  2baed188
       0000000000000000  0000000000000000  WA       0     0     8
  [34] .probes           PROGBITS         000000002bcee188  2baed188
       0000000000000002  0000000000000000  WA       0     0     2
  [35] .got.plt          PROGBITS         000000002bcee190  2baed190
       0000000000003168  0000000000000000  WA       0     0     8
  [36] .bss              NOBITS           000000002bcf1300  2baf02f8
       00000000005214f0  0000000000000000  WA       0     0     128
  [37] .nvFatBinSegment  PROGBITS         000000002c213000  2baf1000
       0000000000002850  0000000000000000   A       0     0     8
  [38] .nv_fatbin        PROGBITS         000000002c216000  2baf4000
       0000000052baed38  0000000000000000  WA       0     0     8
  [39] .comment          PROGBITS         0000000000000000  7e6a2d38
       00000000000001dc  0000000000000000  MS       0     0     1
  [40] .debug_aranges    PROGBITS         0000000000000000  7e6a2f20
       0000000001266c00  0000000000000000           0     0     16
  [41] .debug_info       PROGBITS         0000000000000000  7f909b20
       000000007b21de49  0000000000000000           0     0     1
  [42] .debug_abbrev     PROGBITS         0000000000000000  fab27969
       000000000179f365  0000000000000000           0     0     1
  [43] .debug_line       PROGBITS         0000000000000000  fc2c6cce
       00000000176954ac  0000000000000000           0     0     1
  [44] .debug_str        PROGBITS         0000000000000000  11395c17a
       0000000039dc32b0  0000000000000001  MS       0     0     1
  [45] .debug_ranges     PROGBITS         0000000000000000  14d71f430
       0000000026a2d930  0000000000000000           0     0     16
  [46] .debug_types      PROGBITS         0000000000000000  17414cd60
       000000000b211ff5  0000000000000000           0     0     1
  [47] .debug_loc        PROGBITS         0000000000000000  17f35ed55
       000000009ca80c7e  0000000000000000           0     0     1
  [48] .debug_macinfo    PROGBITS         0000000000000000  21bddf9d3
       000000000000151c  0000000000000000           0     0     1
  [49] .note.stapsdt     NOTE             0000000000000000  21bde0ef0
       0000000000001b3c  0000000000000000           0     0     4
  [50] .debug_macro      PROGBITS         0000000000000000  21bde2a2c
       0000000000040e6a  0000000000000000           0     0     1
  [51] .torch_deploy_pay PROGBITS         0000000000000000  21be23896
       0000000026ba5d28  0000000000000000           0     0     1
  [52] .symtab           SYMTAB           0000000000000000  2429c95c0
       00000000020ce0c8  0000000000000018          54   863985     8
  [53] .shstrtab         STRTAB           0000000000000000  244a97688
       000000000000025c  0000000000000000           0     0     1
  [54] .strtab           STRTAB           0000000000000000  244a978e4
       00000000070309c6  0000000000000000           0     0     1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
  L (link order), O (extra OS processing required), G (group), T (TLS),
  C (compressed), x (unknown), o (OS specific), E (exclude),
  l (large), p (processor specific)
```

Reviewed By: shunting314

Differential Revision: D32596676

fbshipit-source-id: 1ab15b2d36422506d8f781d3bbc0c70c44bc3d91
2021-12-15 11:27:57 -08:00
c6bcfb152d [PyTorch][easy] Move GlobalRecordFunctionCallbacks{,Entry} to cpp file (#68483)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68483

Doesn't need to be in the header.
ghstack-source-id: 145668417

Test Plan: CI

Reviewed By: chaekit

Differential Revision: D32477113

fbshipit-source-id: 30e7796413e3220e4051544559f9110ab745022d
2021-12-15 09:38:51 -08:00
873585da2b [SR] Improve set_inputs (#69087)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69087
This diff includes a variety of improvements to `set_inputs` to unify behavior with `torch::jit::Module`:

1. Eliminate code duplication between rvalue/lvalue overloads
2. Add type checks
3. Make input length check a `TORCH_CHECK` instead of a debug check - we have to fail when the wrong number of inputs are passed.
4. `schema` now always includes `self`, even if we release `module_`. This is consistent with `torch::jit::Module`.
ghstack-source-id: 145599837

Test Plan: `buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest`

Reviewed By: hlu1

Differential Revision: D32711705

fbshipit-source-id: fe97c10b4f03801ba59868b452e7d02b26b3106b
2021-12-15 09:31:19 -08:00
aeedd89d4e [PyTorch] RecordFunction: use SmallVector for ObserverContextList (#68412)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68412

These lists have the same size as CallbackHandles, so they should be the same container type.
ghstack-source-id: 145668416

Test Plan:
Run same command as previous diff.

Before: see previous diff, average about 0.46us
After: P467928077, average about 0.43us

Reviewed By: chaekit

Differential Revision: D32454856

fbshipit-source-id: 3a3ff4d381d99f51ef868d4dec4db7c411b5ea56
2021-12-15 09:31:16 -08:00
29914f55bf Skip print_test_stats checks for tests that use repeat_test_for_types (#69872)
Summary:
Once https://github.com/pytorch/pytorch/issues/69865 is fixed, this change should be undone.

This will avoid print_test_stats errors in CI, such as https://github.com/pytorch/pytorch/runs/4501145212?check_suite_focus=true (HUD view fc37e5b3ed)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69872

Reviewed By: dagitses, suo

Differential Revision: D33094446

Pulled By: janeyx99

fbshipit-source-id: 7378556d75ea94dd407a2bf9dda37b15c57014f7
2021-12-15 09:29:58 -08:00
d71b8e1a8d More distutils.version.LooseVersion changes (#69947)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69947

Reviewed By: seemethere

Differential Revision: D33111996

Pulled By: malfet

fbshipit-source-id: e7d2cc4ed3e39452e809965e360b05f0b409ec0d
2021-12-15 08:07:36 -08:00
6f9844693f Revert D32974907: [quant][graphmode][fx] Enable fuse handler for sequence of 3 ops
Test Plan: revert-hammer

Differential Revision:
D32974907 (bf089840ac)

Original commit changeset: ba205e74b566

Original Phabricator Diff: D32974907 (bf089840ac)

fbshipit-source-id: e47838f3008ba014d884aef53460df654f0cf731
2021-12-15 05:46:49 -08:00
87bc1f4ed8 Revert D33024528: [quant][fx][graphmode] Add support for conv add pattern in backend_config_dict
Test Plan: revert-hammer

Differential Revision:
D33024528 (59000cff91)

Original commit changeset: 5c770c82c8f6

Original Phabricator Diff: D33024528 (59000cff91)

fbshipit-source-id: 7da6f421ef63f47fbffad8b3ad91f6a31d19d867
2021-12-15 05:45:29 -08:00
43b8e833e9 Fix bug in aten::full signature in version_map.h to accurately reflect the current schema (#69860)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69860

Previously I made a mistake and checked in aten::full.names for the upgrader of aten::full, so I changed it back to just aten::full.

Test Plan: None

Reviewed By: gmagogsfm

Differential Revision: D33066985

fbshipit-source-id: a5598d60d1bff9b4455f807361388fac0689ba14
2021-12-15 01:09:31 -08:00
5c7817fd43 Add test operator in upgrader entry (#69427)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69427

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D32867984

Pulled By: tugsbayasgalan

fbshipit-source-id: 25810fc2fd4b943911f950618968af067c04da5c
2021-12-15 00:40:05 -08:00
47f11730ec Add testing for forward over reverse gradgrad (#69740)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69740

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D33031727

Pulled By: soulitzer

fbshipit-source-id: 2bcba422b4bcea3bbc936d07ba45171a6531e578
2021-12-14 23:35:10 -08:00
d0fe7db1f6 Add formulas for distributions (#69690)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69690

* #69558

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D33031726

Pulled By: soulitzer

fbshipit-source-id: 9ae461dc6043d48d5bb8c2bbaa266d06ad99f317
2021-12-14 23:35:07 -08:00
b399a4d7b9 Add some reduction forward AD formulas (#69661)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69661

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D33020601

Pulled By: soulitzer

fbshipit-source-id: 110da6dcd490e5c3849cace62a777aa1a2b6982e
2021-12-14 23:33:43 -08:00
3b7fc0243c [PyTorch] Make TypePrinter take const Type& (#69412)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69412

TypePrinter does not need to take ownership of the Type.

This helps unblock the following diff to stop refcounting Type singletons.
ghstack-source-id: 145671619

Test Plan: CI

Reviewed By: suo

Differential Revision: D32858525

fbshipit-source-id: df58676938fd20c7bae4a366d70b2067a852282d
2021-12-14 23:13:03 -08:00
7a12b5063e [AutoAccept][Codemod][FBSourceBuckFormatLinter] Daily arc lint --take BUCKFORMAT
Reviewed By: zertosh

Differential Revision: D33119794

fbshipit-source-id: ca327caf34560c0bba32511e57d5dc18b71bdfe1
2021-12-14 21:54:41 -08:00
59000cff91 [quant][fx][graphmode] Add support for conv add pattern in backend_config_dict (#69778)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69778

This PR extends fusion pattern support from a simple sequence of ops to a simple
subgraph like conv - add
```
x - conv ---\
y ---------add ---- output
```
where the inputs x, y and the output are observed/quantized

Test Plan:
```
python test/fx2trt/test_quant_trt.py TestQuantizeFxTRTOps.test_conv_add
```

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D33024528

fbshipit-source-id: 5c770c82c8f693fabdac5c69343942a9dfda84ef
2021-12-14 20:46:01 -08:00
408283319a [Operator Versioning][Edge] Change OP to CALL when there is a valid upgrader (#67731)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67731

1. Register upgrader function at loading stage
2. Change OP to CALL when the operator_version from the model is smaller than the current runtime version and a valid upgrader exists

The interpreter log is:
```
RUNNING 0 STOREN 1 3
RUNNING 1 DROPR 1
RUNNING 2 LOAD 2
RUNNING 3 LOAD 3
RUNNING 4 CALL 0
RUNNING 0 STOREN 1 2
RUNNING 1 LOAD 1
RUNNING 2 OP 0, aten::is_floating_point
RUNNING 3 JF 3
RUNNING 4 LOADC 1
RUNNING 5 JMP 3
RUNNING 8 STORE 3
RUNNING 9 MOVE 3
RUNNING 10 JF 5
RUNNING 11 LOAD 1
RUNNING 12 LOAD 2
RUNNING 13 OP 1, aten::div.Tensor
RUNNING 14 JMP 5
RUNNING 19 STORE 4
RUNNING 20 DROPR 2
RUNNING 21 DROPR 1
RUNNING 22 MOVE 4
RUNNING 23 RET
RUNNING 5 LOAD 2
RUNNING 6 LOAD 3
RUNNING 7 CALL 0
RUNNING 0 STOREN 1 2
RUNNING 1 LOAD 1
RUNNING 2 OP 0, aten::is_floating_point
RUNNING 3 JF 3
RUNNING 4 LOADC 1
RUNNING 5 JMP 3
RUNNING 8 STORE 3
RUNNING 9 MOVE 3
RUNNING 10 JF 5
RUNNING 11 LOAD 1
RUNNING 12 LOAD 2
RUNNING 13 OP 1, aten::div.Tensor
RUNNING 14 JMP 5
RUNNING 19 STORE 4
RUNNING 20 DROPR 2
RUNNING 21 DROPR 1
RUNNING 22 MOVE 4
RUNNING 23 RET
RUNNING 8 MOVE 2
RUNNING 9 MOVE 3
RUNNING 10 CALL 0
RUNNING 0 STOREN 1 2
RUNNING 1 LOAD 1
RUNNING 2 OP 0, aten::is_floating_point
RUNNING 3 JF 3
RUNNING 4 LOADC 1
RUNNING 5 JMP 3
RUNNING 8 STORE 3
RUNNING 9 MOVE 3
RUNNING 10 JF 5
RUNNING 11 LOAD 1
RUNNING 12 LOAD 2
RUNNING 13 OP 1, aten::div.Tensor
RUNNING 14 JMP 5
RUNNING 19 STORE 4
RUNNING 20 DROPR 2
RUNNING 21 DROPR 1
RUNNING 22 MOVE 4
RUNNING 23 RET
RUNNING 11 TUPLE_CONSTRUCT 3
RUNNING 12 RET
```

The upgrader bytecode is:
```
(STOREN, 1, 2)
(LOAD, 1, 0)
(OP, 0, 0)
(JF, 3, 0)
(LOADC, 1, 0)
(JMP, 3, 0)
(LOAD, 2, 0)
(OP, 0, 0)
(STORE, 3, 0)
(MOVE, 3, 0)
(JF, 5, 0)
(LOAD, 1, 0)
(LOAD, 2, 0)
(OP, 1, 0)
(JMP, 5, 0)
(LOAD, 1, 0)
(LOAD, 2, 0)
(LOADC, 0, 0)
(OP, 2, 0)
(STORE, 4, 0)
(DROPR, 2, 0)
(DROPR, 1, 0)
(MOVE, 4, 0)
(RET, 0, 0)
```
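
For readability, here is a Python rendering of the `aten::div.Tensor` upgrader that this bytecode corresponds to — a sketch reconstructed from the instruction stream above, not necessarily the checked-in TorchScript source:

```python
import torch
from torch import Tensor

def div_Tensor_0_3(self: Tensor, other: Tensor) -> Tensor:
    # OP 0: aten::is_floating_point, called on each operand
    if self.is_floating_point() or other.is_floating_point():
        return self.div(other)                      # OP 1: aten::div.Tensor
    return self.div(other, rounding_mode="trunc")   # OP 2: div with the LOADC'd rounding mode
```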
ghstack-source-id: 145635622

Test Plan: describe in summary and CI

Reviewed By: iseeyuan

Differential Revision: D32092517

fbshipit-source-id: 0314b4bda5d2578cdd4e7cfbfd1e3c07fbccf8a3
2021-12-14 19:13:12 -08:00
9e4d60a552 [Operator Versioning][Edge] Use check in cpp source file for upgrader (#67728)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67728

1. Check in upgrader_mobile.h and upgrader_mobile.cpp
2. Add test to parse all bytecode from upgrader_mobile.h
ghstack-source-id: 145635621

Test Plan: buck test mode/dev //caffe2/test/cpp/jit:jit -- --exact 'caffe2/test/cpp/jit:jit - LiteInterpreterUpgraderTest.Upgrader'

Reviewed By: iseeyuan

Differential Revision: D32087295

fbshipit-source-id: 21e95aabb5e9db76be27e01adfea8fbc41caeaf6
2021-12-14 19:10:51 -08:00
bf089840ac [quant][graphmode][fx] Enable fuse handler for sequence of 3 ops (#69658)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69658

This PR enables the fuse handler for a sequence of three ops, and merges all fuse handlers into one

TODO: we can also move this to backend_config_dict folder

Test Plan:
regression fusion test
```
python test/test_quantization.py TestFuseFx
```

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D32974907

fbshipit-source-id: ba205e74b566814145f776257c5f5bb3b24547c1
2021-12-14 19:04:21 -08:00
102684b252 [SR] Fix stack/concat bug (#68777)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68777

Fixed some cases where negative dimensions were not handled correctly

* `_stack_cpu` calls `maybe_wrap_dim`, but `_stack_cpu_out` does not. This is only problematic when `_stack_cpu_out` forwards to the serial kernel: [ref](https://www.internalfb.com/code/fbsource/[1b5af978b48f2e5d308d42b588bde3275869a57b]/fbcode/caffe2/aten/src/ATen/native/TensorShape.cpp?lines=1541-1547).
* concat also needs to wrap its dim
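
A minimal sketch of the behavior these fixes guarantee (illustrative shapes; assumes a build with this change, where `dim=-1` on 2-D inputs must wrap to `dim=2` in the stacked result):

```python
import torch

a, b = torch.randn(2, 3), torch.randn(2, 3)
out = torch.empty(0)
# The out variant must wrap the negative dim just like _stack_cpu does.
torch.stack([a, b], dim=-1, out=out)
assert torch.equal(out, torch.stack([a, b], dim=2))
```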

Test Plan:
`buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest`

Added new tests to cover this case

Reviewed By: hlu1

Differential Revision: D32604623

fbshipit-source-id: 00aaa42817cd2d3e7606ce75ab5a9744645118cf
2021-12-14 16:26:27 -08:00
ebc35a7ead [JIT] Enable freezing for sparse COO tensors (#69614)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69614

Previously sparse COO tensors were ignored during freezing, because
`tryInsertConstant` would fail during `freeze_module.cpp`, and because
hashes weren't implemented for COO tensor IValues.
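
A sketch of the kind of module this enables freezing for (hypothetical module, assuming a build with this change):

```python
import torch

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # A sparse COO tensor attribute that freezing can now fold into a constant.
        self.w = torch.sparse_coo_tensor([[0, 1], [1, 0]], [1.0, 2.0], (2, 2))

    def forward(self, x):
        return torch.sparse.mm(self.w, x)

frozen = torch.jit.freeze(torch.jit.script(M()).eval())
```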

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D32954620

Pulled By: davidberard98

fbshipit-source-id: a91f97fdfc2152b417f43a6948100c94970c0831
2021-12-14 15:43:50 -08:00
33363cea64 Revert D32498572: allow external backend codegen to be used without autograd kernels
Test Plan: revert-hammer

Differential Revision:
D32498572 (b83b6f7424)

Original commit changeset: 3e7159c633f6

Original Phabricator Diff: D32498572 (b83b6f7424)

fbshipit-source-id: f93fa444c95a2423eef5975a2ecdb96f14e0c535
2021-12-14 15:28:49 -08:00
f6cad53443 Revert D32498569: allow external backend codegen to toggle whether to generate out= and inplace kernels
Test Plan: revert-hammer

Differential Revision:
D32498569 (aa0cf68c17)

Original commit changeset: ebd932d042b9

Original Phabricator Diff: D32498569 (aa0cf68c17)

fbshipit-source-id: 21a393fa339510d926512a7983d33ece327b743d
2021-12-14 15:27:24 -08:00
0ef523633f Revert D32498570: make codegen'd device guards not cuda-specific. Allow them to be used in external codegen
Test Plan: revert-hammer

Differential Revision:
D32498570 (2e7a91c45f)

Original commit changeset: 0ce6a5614417

Original Phabricator Diff: D32498570 (2e7a91c45f)

fbshipit-source-id: 7c64ce1b5e51a680b4aeae8721e0c9e15c793289
2021-12-14 15:04:10 -08:00
24ee1d13f6 Another attempt to fix version comparison check (#69939)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69939

Reviewed By: atalman

Differential Revision: D33108135

Pulled By: malfet

fbshipit-source-id: cadadfe5b04c4378f149136f8e1f8e8d6266775c
2021-12-14 14:54:15 -08:00
d4f8313497 Add low level torch.profiler.kineto_profile base class (#63302)
Summary:
Refactor torch.profiler.profile by separating it into one low-level class and one high-level wrapper.

The PR includes the following changes:
1. Separate the class torch.profiler.profile into two classes: kineto_profiler and torch.profiler.profile.
2. The former class has the low-level functionality exposed at the C++ level, e.g. prepare_profiler, start_profiler, stop_profiler.
3. The original logic in torch.profiler.profile, including export_chrome_trace, export_stacks, key_averages, events, and add_metadata, is moved into kineto_profiler since it is all exposed by torch.autograd.profiler.
4. The new torch.profiler.profile is fully backward-compatible with the original class since it inherits from torch.profiler.kineto_profiler. Its only responsibility in the new implementation is maintaining the finite state machine of ProfilerAction.

With this refactoring, the responsibility boundary is clear and the new logic is simple to understand.
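
For context, a minimal usage sketch of the public interface, which this refactor should leave unchanged (a behavioral assumption based on the back-compatibility claim above):

```python
import torch
from torch.profiler import profile, ProfilerActivity

with profile(activities=[ProfilerActivity.CPU]) as prof:
    torch.randn(128, 128) @ torch.randn(128, 128)
# In the new layering this is inherited from the low-level class.
print(prof.key_averages().table(sort_by="cpu_time_total"))
```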

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63302

Reviewed By: albanD

Differential Revision: D33006442

Pulled By: robieta

fbshipit-source-id: 30d7c9f5c101638703f1243fb2fcc6ced47fb690
2021-12-14 14:47:43 -08:00
e8d5c7cf7f [nn] mha : no-batch-dim support (python) (#67176)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/60585

* [x] Update docs
* [x] Tests for shape checking

Tests take roughly 20s on the system that I use. Below are the timings for the slowest 20 tests.

```
pytest test/test_modules.py -k _multih --durations=20
============================================================================================== test session starts ===============================================================================================
platform linux -- Python 3.10.0, pytest-6.2.5, py-1.10.0, pluggy-1.0.0
rootdir: /home/kshiteej/Pytorch/pytorch_no_batch_mha, configfile: pytest.ini
plugins: hypothesis-6.23.2, repeat-0.9.1
collected 372 items / 336 deselected / 36 selected

test/test_modules.py ..............ssssssss..............                                                                                                                                                  [100%]

================================================================================================ warnings summary ================================================================================================
../../.conda/envs/pytorch-cuda-dev/lib/python3.10/site-packages/torch/backends/cudnn/__init__.py:73
test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_MultiheadAttention_cuda_float32
  /home/kshiteej/.conda/envs/pytorch-cuda-dev/lib/python3.10/site-packages/torch/backends/cudnn/__init__.py:73: UserWarning: PyTorch was compiled without cuDNN/MIOpen support. To use cuDNN/MIOpen, rebuild PyTorch making sure the library is visible to the build system.
    warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/warnings.html
============================================================================================== slowest 20 durations ==============================================================================================
8.66s call     test/test_modules.py::TestModuleCUDA::test_gradgrad_nn_MultiheadAttention_cuda_float64
2.02s call     test/test_modules.py::TestModuleCPU::test_gradgrad_nn_MultiheadAttention_cpu_float64
1.89s call     test/test_modules.py::TestModuleCUDA::test_grad_nn_MultiheadAttention_cuda_float64
1.01s call     test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_MultiheadAttention_cuda_float32
0.51s call     test/test_modules.py::TestModuleCPU::test_grad_nn_MultiheadAttention_cpu_float64
0.46s call     test/test_modules.py::TestModuleCUDA::test_forward_nn_MultiheadAttention_cuda_float32
0.45s call     test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_MultiheadAttention_cuda_float64
0.44s call     test/test_modules.py::TestModuleCUDA::test_non_contiguous_tensors_nn_MultiheadAttention_cuda_float32
0.21s call     test/test_modules.py::TestModuleCUDA::test_pickle_nn_MultiheadAttention_cuda_float64
0.21s call     test/test_modules.py::TestModuleCUDA::test_pickle_nn_MultiheadAttention_cuda_float32
0.18s call     test/test_modules.py::TestModuleCUDA::test_forward_nn_MultiheadAttention_cuda_float64
0.17s call     test/test_modules.py::TestModuleCPU::test_non_contiguous_tensors_nn_MultiheadAttention_cpu_float32
0.16s call     test/test_modules.py::TestModuleCPU::test_non_contiguous_tensors_nn_MultiheadAttention_cpu_float64
0.11s call     test/test_modules.py::TestModuleCUDA::test_factory_kwargs_nn_MultiheadAttention_cuda_float64
0.08s call     test/test_modules.py::TestModuleCPU::test_pickle_nn_MultiheadAttention_cpu_float32
0.08s call     test/test_modules.py::TestModuleCPU::test_pickle_nn_MultiheadAttention_cpu_float64
0.06s call     test/test_modules.py::TestModuleCUDA::test_repr_nn_MultiheadAttention_cuda_float64
0.06s call     test/test_modules.py::TestModuleCUDA::test_repr_nn_MultiheadAttention_cuda_float32
0.06s call     test/test_modules.py::TestModuleCPU::test_forward_nn_MultiheadAttention_cpu_float32
0.06s call     test/test_modules.py::TestModuleCPU::test_forward_nn_MultiheadAttention_cpu_float64
============================================================================================ short test summary info =============================================================================================
=========================================================================== 28 passed, 8 skipped, 336 deselected, 2 warnings in 19.71s ===========================================================================
```

cc albanD mruberry jbschlosser walterddr

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67176

Reviewed By: dagitses

Differential Revision: D33094285

Pulled By: jbschlosser

fbshipit-source-id: 0dd08261b8a457bf8bad5c7f3f6ded14b0beaf0d
2021-12-14 13:21:21 -08:00
37ec99c0e4 Open source trt lowering workflow (#69381)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69381

Open source lowering workflow, related tools and tests.

Test Plan: CI

Reviewed By: 842974287

Differential Revision: D32815136

fbshipit-source-id: 3ace30833a2bc52e9b02513c5e223cb339fb74a3
2021-12-14 13:00:21 -08:00
930067d129 Build clang builds with -Werror (#69712)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69712

Test Plan: Imported from OSS

Reviewed By: seemethere

Differential Revision: D32997002

Pulled By: malfet

fbshipit-source-id: 8ebb5a955f8ae2d3fb67bc70636a2b1d66010c84
2021-12-14 12:41:57 -08:00
c76c6e9bd3 [ONNX] Add BFloat16 type support when export to ONNX (#66788)
Summary:
- PyTorch and ONNX both support BFloat16; add this to unblock some mixed-precision training models.
- Support the PyTorch TNLG model using BFloat16 tensors for the inputs/outputs of the layers that run on the NPU.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66788

Reviewed By: jansel

Differential Revision: D32283510

Pulled By: malfet

fbshipit-source-id: 150d69b1465b2b917dd6554505eca58042c1262a
2021-12-14 12:23:32 -08:00
800a457b6f [shard] add ShardedOptimizer (#68607)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68607

This PR adds ShardedOptimizer and an API to get module parameters along with ShardedTensor params; it allows users to use this optimizer wrapper to construct an optimizer that involves ShardedTensor.

The state_dict support will be a follow-up diff
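
A hypothetical usage sketch (the module path and helper name below are assumptions and may differ from what this diff actually lands):

```python
import torch.optim as optim
# Assumed locations — check the diff for the actual module paths.
from torch.distributed._shard.sharded_optim import (
    ShardedOptimizer,
    named_params_with_sharded_tensor,
)

# `sharded_module` is a placeholder for a module containing ShardedTensor params.
named_params = dict(named_params_with_sharded_tensor(sharded_module))
opt = ShardedOptimizer(named_params, optim.SGD, lr=0.1)
opt.step()
```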
ghstack-source-id: 145532834

Test Plan: python test_sharded_optim.py

Reviewed By: pritamdamania87

Differential Revision: D32539994

fbshipit-source-id: a3313c6870d1f1817fc3e08dc2fc27dc43bef743
2021-12-14 12:15:20 -08:00
457ba1dd3e Porting index_add to structured kernels, add an out variant (#65993)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65993

This PR attempts to port `index_add` to structured kernels, but does more than that:

* Adds an `out=` variant to `index_add` (see the sketch after this list)
* Revises `native_functions.yaml` registrations to not have multiple entries, and instead passes a default value for `alpha`.
* Changes the `derivatives.yaml` file for autograd support
* Revises error messages, please see: https://github.com/pytorch/pytorch/pull/65993#issuecomment-945441615
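
A quick sketch of the new `out=` variant (small illustrative values chosen here):

```python
import torch

x = torch.zeros(5)
index = torch.tensor([0, 2])
source = torch.tensor([1.0, 2.0])
out = torch.empty(5)
# alpha scales `source` before accumulation into the indexed rows.
torch.index_add(x, 0, index, source, alpha=2, out=out)
# out is now tensor([2., 0., 4., 0., 0.])
```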

Follow-up PRs in the near future will attempt to refactor the OpInfo test, and will give another look at the tests in `test/test_torch.py` for this function (hence the use of ghstack for this).

~This is WIP because there are tests failing for `Dimname` variant on mobile/android builds, and I'm working on fixing them.~

Issue tracker: https://github.com/pytorch/pytorch/issues/55070

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D32646426

fbshipit-source-id: b035ecf843a9a27d4d1e18b202b035adc2a49ab5
2021-12-14 11:57:13 -08:00
9594a94d80 fix CompositeImplicitAutograd ops improperly labeled (#69863)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69863

This reverts commit 41c344d460a941c57f4793690c396f830a992824.

Test Plan: Imported from OSS

Reviewed By: albanD, soulitzer

Differential Revision: D33072958

Pulled By: bdhirsh

fbshipit-source-id: 3d3488f37986256986ab009d6f16476f29cff625
2021-12-14 11:47:07 -08:00
269e92669a [c2] Remove unused private fields (#69709)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69709

Fix a logical bug in `caffe2/ideep/operators/conv_op.cc`, which
contained an always-false condition (fusion_type_ == X && fusion_type_ == Y)

Test Plan: Imported from OSS

Reviewed By: r-barnes

Differential Revision: D32997006

Pulled By: malfet

fbshipit-source-id: 23e4db1b17cf8a77eae6a8691847ffa484d4736c
2021-12-14 11:31:08 -08:00
fef9981998 Update run_test.py (#69920)
Summary:
Do not compare LooseVersion against string

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69920

Reviewed By: atalman

Differential Revision: D33101166

Pulled By: malfet

fbshipit-source-id: a2df9e01d17663262718f11e580c8b009764f7b5
2021-12-14 11:26:56 -08:00
3e43c478a8 [Quant][fx] Lower reference conv[1-3]d module (#69228)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69228

Implement lowering logic for reference conv modules,
similar to https://github.com/pytorch/pytorch/pull/65723.
ghstack-source-id: 145058198

Test Plan:
python test/test_quantization.py TestQuantizeFx.test_conv_lowering

Imported from OSS

Reviewed By: anjali411

Differential Revision: D32890743

fbshipit-source-id: 04f2500628c60b0fbc84d22705164215e190aeba
2021-12-14 11:23:39 -08:00
b67eaec853 [DataLoader] more clearly expose 'default_collate' and 'default_convert' to users (#69862)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69862

Fixes #69445

cc SsnL VitalyFedyunin ejguan NivekT

Test Plan: Imported from OSS

Reviewed By: ejguan, ngimel

Differential Revision: D33068792

Pulled By: NivekT

fbshipit-source-id: ef9791acdc23d014b8761fa7420062d454ce8969
2021-12-14 11:18:26 -08:00
1188d89a1d TestMathBits: Call functions with original sample input values (#68947)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68947

`_test_math_view` currently calls the operator with different values
than those specified in the `SampleInput`. This is undesirable as it
could break mathematical properties required by the operator. Instead,
this calls `math_op_view(math_op_physical(sample.input))` to get a
view that represents the same value as the original input.

`test_neg_view` already did this by returning `torch._neg_view(-x)`
from `math_op_view` but this moves the handling into `_test_math_view`
to make it apply to all view op tests.
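
To illustrate the neg-view case described above (a minimal sketch):

```python
import torch

x = torch.randn(3)
# A view with the neg bit set whose *values* equal the original x,
# so the op under test sees the same sample values.
v = torch._neg_view(-x)
assert v.is_neg() and torch.equal(v, x)
```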

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D33064327

Pulled By: anjali411

fbshipit-source-id: 4d87e0c04fc39b95f8dc30dcabda0d554d16a1d8
2021-12-14 11:10:13 -08:00
1a299d8f1b Add support for transformer layout of masked_softmax (#69272)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69272

In the transformer encoder and MHA, masked_softmax's mask is a 2D tensor (B, D), while the input is a 4D tensor (B, H, D, D).
This mask could simply be broadcast to (B, H, D, D) like the input and then fed to a regular masked_softmax; however, that brings the problem of a non-contiguous mask and consumes more memory.
In this diff, we keep the mask's shape unchanged and compute the corresponding mask element for the input in each CUDA thread.

This new layout is not currently supported on CPU yet.
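
For reference, the broadcast semantics the kernel avoids materializing — a sketch assuming the usual key-padding interpretation, i.e. the (B, D) mask broadcasts over heads and query positions:

```python
import torch

B, H, D = 2, 4, 8
inp = torch.randn(B, H, D, D)
mask = torch.zeros(B, D, dtype=torch.bool)
mask[:, -2:] = True  # mask out the last two key positions (illustrative)

# Materialized-broadcast reference: (B, D) -> (B, H, D, D).
expanded = mask[:, None, None, :].expand(B, H, D, D)
ref = inp.masked_fill(expanded, float("-inf")).softmax(dim=-1)
```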

Test Plan: buck build mode/opt -c fbcode.enable_gpu_sections=true caffe2/test:nn && buck-out/gen/caffe2/test/nn\#binary.par -r test_masked_softmax

Reviewed By: ngimel

Differential Revision: D32605557

fbshipit-source-id: ef37f86981fdb2fb264d776f0e581841de5d68d2
2021-12-14 10:51:58 -08:00
2e7a91c45f make codegen'd device guards not cuda-specific. Allow them to be used in external codegen (#68531)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68531

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D32498570

Pulled By: bdhirsh

fbshipit-source-id: 0ce6a5614417671313b4d274ea84742c5b81d1b0
2021-12-14 10:25:04 -08:00
aa0cf68c17 allow external backend codegen to toggle whether to generate out= and inplace kernels (#68530)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68530

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D32498569

Pulled By: bdhirsh

fbshipit-source-id: ebd932d042b988e19c71aa04a21677db9bdc9f04
2021-12-14 10:25:02 -08:00
b83b6f7424 allow external backend codegen to be used without autograd kernels (#68529)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68529

Test Plan: Imported from OSS

Reviewed By: wconstab

Differential Revision: D32498572

Pulled By: bdhirsh

fbshipit-source-id: 3e7159c633f6a80b60faa068436a4c49ebe731ca
2021-12-14 10:23:12 -08:00
8acd0a8b2f Allow row sizes to support int64/size_t. (#69303)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69303

Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/792

Follow up to D32715453 (e60fd10659), allowing row size to be 64-bit.

Test Plan:
buck test mode/opt -c fbcode.caffe2_gpu_type=v100,a100 //deeplearning/fbgemm/fbgemm_gpu:quantize_ops_test
   buck test mode/opt -c fbcode.caffe2_gpu_type=none //deeplearning/fbgemm/fbgemm_gpu:quantize_ops_test
   buck test mode/opt //caffe2/test:

Reviewed By: jspark1105, jianyuh

Differential Revision: D32768838

fbshipit-source-id: 9e2b01d8d23e71f8333820e725379c3fc1c0711a
2021-12-14 10:09:08 -08:00
2c9dd886af Modify torch.movedim to handle scalar as no-op (#69537)
Summary:
`torch.movedim` now directly handles the case of a scalar (0-dim) input tensor as a no-op by returning a view of the input tensor (after all the usual checks on the other parameters)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69537

Test Plan:
This code now works fine and res1 is a view of tensor
```
import torch

tensor = torch.rand(torch.Size([]))
res1 = torch.movedim(tensor, 0, 0)
```

Fixes https://github.com/pytorch/pytorch/issues/69432

Reviewed By: jbschlosser

Differential Revision: D33020014

Pulled By: albanD

fbshipit-source-id: b3b2d380d70158bd3b3d6b40c073377104e09007
2021-12-14 09:55:59 -08:00
7503ec58b2 [nnc][fix] xnnpack ifdef (#69870)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69870

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D33075061

Pulled By: IvanKobzarev

fbshipit-source-id: dd53ad8b7d0ff36a68f0864540d6f7dd2284f0e0
2021-12-14 09:50:24 -08:00
f7294cd865 [Static Runtime] Skip ReplaceWithCopy when inputs have writers (#69819)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69819

We should skip ReplaceWithCopy if the inputs to the operator can be updated during inference. For a set of tensors that share data, ReplaceWithCopy should not happen to any of them if there exist updates to any of them.

Currently, the check in place misses some cases (suppose there exist updates, and uses <= 1). This diff addresses the missing cases by querying AliasDB.

Test Plan:
- Added test cases, including a one that is problematic before this diff
- CI

Reviewed By: mikeiovine

Differential Revision: D33052562

fbshipit-source-id: 61f87e471805f41d071a28212f2f457e8c6785e7
2021-12-14 09:39:49 -08:00
07767569c9 Properly import LooseVersion (#69904)
Summary:
This fixes regression introduced by https://github.com/pytorch/pytorch/pull/57040

Somehow importing `distutils` from `setuptools` caused an import of
`distutils.version`, which is not a documented dependency and got
changed with the release of
[setuptools-59.6.0](https://github.com/pypa/setuptools/tree/v59.6.0)
We should not rely on that, as
`import distutils` never re-imports `distutils.version`, which one can
see by observing
https://github.com/python/cpython/blob/3.9/Lib/distutils/__init__.py
or by running:
```
% python3 -c "import distutils;print(distutils.__version__, dir(distutils))"
3.7.5 ['__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__', '__version__', 'sys']
% python3 -c "from setuptools import distutils;print(distutils.__version__, dir(distutils))"
3.7.5 ['__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__', '__version__', 'archive_util', 'ccompiler', 'cmd', 'config', 'core', 'debug', 'dep_util', 'dir_util', 'dist', 'errors', 'extension', 'fancy_getopt', 'file_util', 'filelist', 'log', 'spawn', 'sys', 'sysconfig', 'util', 'version']
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69904

Reviewed By: albanD, atalman, janeyx99

Differential Revision: D33094453

Pulled By: malfet

fbshipit-source-id: aaf1adb7c6f293c4e376ccff21c64cd6ba625e97
2021-12-14 09:28:19 -08:00
fdcb78df38 print fix in lr_scheduler (#68338)
Summary:
`{:5d}` fails for `CosineAnnealingWarmRestarts`, which has a float `epoch`; a minimal repro follows.
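
For illustration (plain Python, independent of the scheduler):

```python
>>> "Epoch {:5d}".format(0.5)   # float epoch, as in CosineAnnealingWarmRestarts
Traceback (most recent call last):
  ...
ValueError: Unknown format code 'd' for object of type 'float'
```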

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68338

Reviewed By: jbschlosser

Differential Revision: D33063970

Pulled By: albanD

fbshipit-source-id: 992e987f8d5f6f8f5067924df4671e9725b6d884
2021-12-14 09:05:19 -08:00
f7210f8d90 [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D33090919

fbshipit-source-id: 78efa486776014a27f280a01a21f9e0af6742e3e
2021-12-14 08:06:58 -08:00
4f81b2adbb Remove if conditioning from some MacOS workflow steps (#69788)
Summary:
Indirectly fixes https://github.com/pytorch/pytorch/issues/69389

These steps shouldn't error out when the credentials aren't set anyway

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69788

Reviewed By: seemethere

Differential Revision: D33061307

Pulled By: janeyx99

fbshipit-source-id: 7db6d15b3e80c3c13ea428248a8b4f8d2d32d4a1
2021-12-14 07:54:15 -08:00
fa615b332d added set_printoptions examples (#68324)
Summary:
Added examples for `torch.set_printoptions`

```
>>> torch.set_printoptions(precision=2)
>>> torch.tensor([1.12345])
tensor([1.12])
>>> torch.set_printoptions(threshold=5)
>>> torch.arange(10)
tensor([0, 1, 2, ..., 7, 8, 9])
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68324

Reviewed By: ngimel

Differential Revision: D33063869

Pulled By: anjali411

fbshipit-source-id: 24db99df1419f96ba8ae2b5217cb039b288b630a
2021-12-14 07:40:52 -08:00
d90012689f [DataPipe] Control shuffle settings from DataLoader2 (#65756)
Summary:
Makes `shuffle` DataPipe sensitive to DataLoader(2) `shuffle` kwarg.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65756

Reviewed By: albanD

Differential Revision: D31344867

Pulled By: VitalyFedyunin

fbshipit-source-id: e0084e0ac193ac784d6298328ca1222745681347
2021-12-14 07:35:26 -08:00
620a1fcb55 OpInfos for: normal, bernoulli, multinomial (#66358)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66358

Test Plan: - run tests

Reviewed By: mruberry

Differential Revision: D31551695

Pulled By: zou3519

fbshipit-source-id: cf1b43118a0414a1af9ece9ae8c0598b2701aa0a
2021-12-14 06:59:38 -08:00
4829dcea09 Codegen: Generate seperate headers per operator (#68247)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68247

This splits `Functions.h`, `Operators.h`, `NativeFunctions.h` and
`NativeMetaFunctions.h` into separate headers per operator base name.
With `at::sum` as an example, we can include:
```cpp
<ATen/core/sum.h>         // Like Functions.h
<ATen/core/sum_ops.h>     // Like Operators.h
<ATen/core/sum_native.h>  // Like NativeFunctions.h
<ATen/core/sum_meta.h>    // Like NativeMetaFunctions.h
```

The umbrella headers are still being generated, but all they do is
include from the `ATen/ops' folder.

Further, `TensorBody.h` now only includes the operators that have
method variants. Which means files that only include `Tensor.h` don't
need to be rebuilt when you modify function-only operators. Currently
there are about 680 operators that don't have method variants, so this
is potentially a significant win for incremental builds.

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D32596272

Pulled By: albanD

fbshipit-source-id: 447671b2b6adc1364f66ed9717c896dae25fa272
2021-12-14 06:40:08 -08:00
badf7b0210 fix typo changing the generated code (#69899)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69899

Reviewed By: soulitzer

Differential Revision: D33093461

Pulled By: albanD

fbshipit-source-id: 2c672a2b767f0caed1ef3a1d2afa1cacdfcdc320
2021-12-14 06:36:14 -08:00
51033ec840 Add forward AD layout check for storage numel (#68631)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68631

This PR:
- Adds the check that the storage numel of the base and tangent tensors are the same. This is to support the case when as_strided reveals elements that aren't indexable by the input tensor.
- Skips the check when batched tensors are involved, because using as_strided to reveal elements that are not indexable by the input tensor is already not allowed under vmap.
- Adds tests for the above two cases, as well as an edge case regarding conj bit (what about neg bit?)
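
A small illustration of "as_strided reveals elements that aren't indexable by the input tensor" (a minimal sketch):

```python
import torch

base = torch.randn(6)[:3]     # the tensor indexes 3 elements...
print(base.storage().size())  # ...but its storage still holds 6
# as_strided can reach all 6 storage elements, beyond the view's own 3:
revealed = base.as_strided((6,), (1,))
```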

For functorch:
- we need to copy the batching rule implemented here

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D32899678

Pulled By: soulitzer

fbshipit-source-id: 54db9550dd2c93bc66b8fb2d36ce40799ebba794
2021-12-14 04:34:25 -08:00
6078e12ad6 Add forward AD support for as_strided (#68629)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68629

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D32899680

Pulled By: soulitzer

fbshipit-source-id: b80ba4483c06108938923f17dc67278b854515ef
2021-12-14 04:33:05 -08:00
fed9b90ed4 fixing removeProfilingNodes duplicated functions (#1282) (#68804)
Summary:
Unfortunately there are two versions of the removeProfilingNodes function, and one of them is not cleaning up profile_ivalue nodes properly. This leads to a dangling profile_ivalue node, which ends up being profiled multiple times and can give us false assert failures.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68804

Reviewed By: mrshenli

Differential Revision: D32980157

Pulled By: Krovatkin

fbshipit-source-id: cd57c58a941d10ccd01a6cd37aac5c16256aaea6
2021-12-13 22:54:30 -08:00
82075c0a19 Create trt plugin base (#69487)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69487

Writing a customized plugin for TRT requires extending IPluginV2IOExt. This diff extracts functions that should share a common implementation between plugins from IPluginV2IOExt into plugin_base, making it easier for OSS users to write customized plugins.

This diff also fixes a double-creator issue; the root cause is that get_trt_plugin in converters.py looks up plugins by name matching. Switching to the util function from converters_utils.py resolves the issue.

Test Plan: CI

Reviewed By: 842974287

Differential Revision: D32747052

fbshipit-source-id: 7f2e8811c158230f66a0c389af4b84deaf7e2d1f
2021-12-13 21:31:24 -08:00
77a4b89411 Adding windows cuda 11.5 workflows (#69377)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/69081

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69377

Reviewed By: ngimel

Differential Revision: D33076022

Pulled By: atalman

fbshipit-source-id: aeb2791fc15d7b491976f57a74c1989c6ca61b81
2021-12-13 20:49:02 -08:00
b1ef56d646 [quant][docs] quantized model save/load instructions (#69789)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69789

Add details on how to save and load quantized models without hitting errors

Test Plan:
CI autogenerated docs

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D33030991

fbshipit-source-id: 8ec4610ae6d5bcbdd3c5e3bb725f2b06af960d52
2021-12-13 20:23:59 -08:00
2b81ea4f9a [DataPipe] Export ShardingFilter (#69844)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69844

Test Plan: Imported from OSS

Reviewed By: NivekT

Differential Revision: D33062183

Pulled By: ejguan

fbshipit-source-id: 6b3f4ad376959c4d2e8c8b2751ae6657527dcd36
2021-12-13 19:30:56 -08:00
603a1de871 Fix inefficient recursive update in ShardedTensor.state_dict hook (#68806)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/68805

The bug is described in the linked issue. This PR is an attempt to make the functions `_recurse_update_dict` and `_recurse_update_module` more efficient in how they iterate over the submodules. The previous implementation was suboptimal, as it recursively called the update method on the submodules returned by `module.named_modules()`, while `module.named_modules()` already returned all submodules including nested ones.
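
The key fact behind the fix, shown concretely (`named_modules()` already yields nested submodules, so recursing over its results revisits them):

```python
import torch.nn as nn

m = nn.Sequential(nn.Linear(2, 2), nn.Sequential(nn.Linear(2, 2)))
print([name for name, _ in m.named_modules()])
# ['', '0', '1', '1.0'] — the nested '1.0' is already included,
# so calling the update recursively on each entry duplicates work.
```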

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68806

Reviewed By: pritamdamania87

Differential Revision: D33053940

Pulled By: wanchaol

fbshipit-source-id: 3e72822f65a641939fec40daef29c806af725df6
2021-12-13 19:22:55 -08:00
b08d64202a Remove THGeneral (#69041)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69041

`TH_CONCAT_{N}` is still being used by THP so I've moved that into
its own header, but all the compiled code is gone.

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D32872477

Pulled By: ngimel

fbshipit-source-id: 06c82d8f96dbcee0715be407c61dfc7d7e8be47a
2021-12-13 16:14:28 -08:00
8dfdc3df82 [ROCm] Refactor how to specify AMD gpu targets using PYTORCH_ROCM_ARCH (#61706)
Summary:
Remove all hardcoded AMD gfx targets

PyTorch build and Magma build will use rocm_agent_enumerator as
backup if PYTORCH_ROCM_ARCH env var is not defined

PyTorch extensions will use same gfx targets as the PyTorch build,
unless PYTORCH_ROCM_ARCH env var is defined

torch.cuda.get_arch_list() now works for ROCm builds

PyTorch CI dockers will continue to be built for gfx900 and gfx906 for now.

PYTORCH_ROCM_ARCH env var can be a space or semicolon separated list of gfx archs eg. "gfx900 gfx906" or "gfx900;gfx906"
cc jeffdaily sunway513 jithunnair-amd ROCmSupport KyleCZH

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61706

Reviewed By: seemethere

Differential Revision: D32735862

Pulled By: malfet

fbshipit-source-id: 3170e445e738e3ce373203e1e4ae99c84e645d7d
2021-12-13 15:41:40 -08:00
c6c3b43498 [SR][easy] Accessors for value array offsets (#69755)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69755

Per swolchok's suggestion on D32609915 (1c43b1602c). Hide the value offset indices behind accessors to provide more flexibility if we ever decide to change the layout of the values array.

Test Plan: `buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest`

Reviewed By: hlu1

Differential Revision: D32838145

fbshipit-source-id: cf805c077672de4c2fded9b41da01eca6d84b388
2021-12-13 15:31:39 -08:00
3d358a7678 Adds a maximize flag to Adam (#68164)
Summary:
Solves the next most important use case in https://github.com/pytorch/pytorch/issues/68052.

I have kept the style as close to that in SGD as seemed reasonable, given the slight differences in their internal implementations.
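
A minimal usage sketch of the new flag (mirroring the existing SGD `maximize` semantics):

```python
import torch

w = torch.tensor([0.0], requires_grad=True)
opt = torch.optim.Adam([w], lr=0.1, maximize=True)

loss = -(w - 3.0).pow(2).sum()  # maximized at w == 3
opt.zero_grad()
loss.backward()
opt.step()  # steps w toward the maximizer instead of the minimizer
```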

All feedback welcome!

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68164

Reviewed By: VitalyFedyunin

Differential Revision: D32994129

Pulled By: albanD

fbshipit-source-id: 65c57c3f3dbbd3e3e5338d51def54482503e8850
2021-12-13 05:53:53 -08:00
fc37e5b3ed Hook up general convolution to convolution_backward (#69584)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69584

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D32936380

Pulled By: jbschlosser

fbshipit-source-id: c6fdd88db33bd1a9d0eabea47ae09a4d5b170e92
2021-12-12 17:30:01 -08:00
0420de3539 [SR] Log SR options (#69809)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69809

SR options are printed out only once per model per net. Logging them is actually pretty helpful for debugging.

Test Plan: CI

Reviewed By: donaldong

Differential Revision: D33046814

fbshipit-source-id: 536b34e00fbc8a273c5eb4d8ae5caca0dc1f4c24
2021-12-12 16:32:00 -08:00
f0e98dcbd3 General convolution_backward function (#69044)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69044

Test Plan: Imported from OSS

Reviewed By: zou3519, albanD, H-Huang

Differential Revision: D32708818

Pulled By: jbschlosser

fbshipit-source-id: e563baa3197811d8d51553fc83718ace2f8d1b7a
2021-12-12 15:53:38 -08:00
a5b5152d7a Fix typo in aten::full in version_map (#69807)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69807

Test Plan: {gif:ursvp75m}

Reviewed By: gmagogsfm

Differential Revision: D33044503

fbshipit-source-id: 14aac66b123d84ca3f35f02c276b15e55015df9e
2021-12-12 14:47:16 -08:00
af7ee9fc01 Forward AD for inplace comparison operators (#69597)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69597

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D33020600

Pulled By: soulitzer

fbshipit-source-id: 0c9ab210f7dc952a41fbcaa1f5f7921c2fdeb18b
2021-12-12 00:11:14 -08:00
0dcbd73eee Add some forward AD formulas (#69384)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69384

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D33020602

Pulled By: soulitzer

fbshipit-source-id: a92dd243f2b5b21fe277b0bb17bcd61dfe5a0d67
2021-12-12 00:11:11 -08:00
baf92f9d5a Fix copy_ forward AD to handle broadcasting (#69592)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69592

Currently, the forward AD function for `copy_` (in `VariableTypeManual`) does not handle the broadcasting case. ~EDIT: but that is a design decision, not a bug. In this PR, we make that clear as a comment.~

Note: `broadcast_to` does not have a batching rule in core, so the ops that rely on `copy_` to broadcast will still fail batched forward grad computation.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D33020603

Pulled By: soulitzer

fbshipit-source-id: 09cb702bffc74061964a9c05cfef5121f8164814
2021-12-12 00:11:08 -08:00
db32daf4b2 Do not test batched forward grad for inplace ops (#69558)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69558

Currently we skip batched forward grad checks completely for certain views that also have inplace variants. This PR allows us to decouple the check.

Alternative: just skip the batched forward checks for inplace ops entirely. I'm okay with this because it was surprising to me that these checks were being run in the first place.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D33020599

Pulled By: soulitzer

fbshipit-source-id: f8012aadc0e775f80da0ab62b2c11f6645bb1f51
2021-12-12 00:09:45 -08:00
f565167fbd Revert D32606547: torch/monitor: add C++ events and handlers
Test Plan: revert-hammer

Differential Revision:
D32606547 (e61fc1c03b)

Original commit changeset: a00d0364092d

Original Phabricator Diff: D32606547 (e61fc1c03b)

fbshipit-source-id: fbaf2cc06ad4bec606e8a9c6f591d65c04e6fa56
2021-12-11 22:51:03 -08:00
f575179953 [quant][fx][graphmode] Move more patterns to use ModuleReLU fuse handler (#69644)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69644

This PR cleans up the init of ModuleReLUFuseHandler and moves all `module - relu`
fusion patterns to use this handler.

It also disables the additional_fuser_method argument temporarily; it will be
enabled again after we bring back the simple pattern format

Test Plan:
```
python test/test_quantize_fx.py TestFuseFx
```

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D32974906

fbshipit-source-id: 23483ea4293d569cb3cec6dadfefd4d9f30921a7
2021-12-11 22:00:06 -08:00
e61fc1c03b torch/monitor: add C++ events and handlers (#68783)
Summary:
This adds a C++ event handler corresponding to the Python one mentioned in the RFC.

This changes the counters a bit to all be push-driven instead of polled. The two window types are "fixed count" and "interval". One is based on the number of logged events and the other is based on time windows. There's currently no active ticker for interval, so it needs a regular stream of events to ensure events are produced. A follow-up diff can add support for things like an HHWheel / simple ticker.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68783

Test Plan: buck test //caffe2/test/cpp/monitor:monitor

Reviewed By: kiukchung

Differential Revision: D32606547

fbshipit-source-id: a00d0364092d7d8a98e0b18e503c0ca8ede2bead
2021-12-11 16:44:46 -08:00
20f7c893c1 Populate runtime with upgrader graph (#68773)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68773

Test Plan: Imported from OSS

Reviewed By: qihqi, gmagogsfm

Differential Revision: D32603258

Pulled By: tugsbayasgalan

fbshipit-source-id: 6fa0b7ee4ebe46c9aa148923c6ef3e1de106ad13
2021-12-11 13:44:24 -08:00
17f3179d60 Back out "[pytorch][PR] Add ability for a mobile::Module to save as flatbuffer" (#69796)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69796

(Note: this ignores all push blocking failures!)

Test Plan: External CI + Sandcastle

Reviewed By: zhxchen17

Differential Revision: D33032671

fbshipit-source-id: dbf6690e960e25d6a5f19043cbe792add2acd7ef
2021-12-10 21:29:53 -08:00
3906f8247a clear predict_net field from PredictorExporterMeta stored in the exporter to save memory (#68485)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68485

In OSS, the only change is that we make the predict_net field of PredictorExporterMeta nullable.

Test Plan: sandcastle, let CI run

Reviewed By: boryiingsu

Differential Revision: D32467138

fbshipit-source-id: 81bd5fca695462f6a186bcfa927073874cc9c26a
2021-12-10 21:25:36 -08:00
19fecc63e4 [PyTorch][kineto] Remove heap-allocated vectors in saveExtraArgs (#69737)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69737

We can use stack allocation instead.
ghstack-source-id: 145312454

Test Plan: Ran internal framework overhead benchmark with --stressTestKinto --kinetoAddFlops, but difference was minimal. Still good to fix.

Reviewed By: chowarfb

Differential Revision: D33007329

fbshipit-source-id: e096312fef5b729cf12580be152c9418683745b8
2021-12-10 20:24:17 -08:00
731c8255b7 Fix the TorchBench CI when running with a benchmark branch. (#69795)
Summary:
Fixes TorchBench CI when user is running with their own branch

Supersedes https://github.com/pytorch/pytorch/pull/69770

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69795

Reviewed By: malfet

Differential Revision: D33032886

Pulled By: xuzhao9

fbshipit-source-id: 82baee94df6925bf91bb575143efa058ce98b914
2021-12-10 18:04:43 -08:00
59deee8308 Make c10 tests compilable with -Werror (#69711)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69711

Test Plan: Imported from OSS

Reviewed By: r-barnes

Differential Revision: D32997005

Pulled By: malfet

fbshipit-source-id: 369194051ece9d213b48584ca84e5d76b3794dae
2021-12-10 16:47:46 -08:00
e305e4d4d8 Suppress common warnings when building by clang (#69710)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69710

Namely, no range-loop-analysis (which detects when a loop variable cannot be a const reference)

Test Plan: Imported from OSS

Reviewed By: r-barnes

Differential Revision: D32997003

Pulled By: malfet

fbshipit-source-id: dba0e7875e5b667e2cc394c70dd75e2403265918
2021-12-10 16:45:38 -08:00
41c344d460 Revert D32739976: fix CompositeImplicitAutograd ops improperly labeled
Test Plan: revert-hammer

Differential Revision:
D32739976 (195b0d0645)

Original commit changeset: a756dd9e0b87

Original Phabricator Diff: D32739976 (195b0d0645)

fbshipit-source-id: 6e898dd5435f31e604588e6e50be1217fa207a54
2021-12-10 13:04:29 -08:00
77213fa4d3 Fix docker builds for Python-3.6 (#69785)
Summary:
As [conda-4.11](https://anaconda.org/anaconda/conda/files?version=4.11.0) is no longer available for Python-3.6, stick to 4.10 for 3.6 builds

Fixes https://github.com/pytorch/pytorch/issues/69781

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69785

Reviewed By: seemethere, atalman

Differential Revision: D33026217

Pulled By: malfet

fbshipit-source-id: d742a1e79634ed62b3a941ba23a7a74f41c2f4cb
2021-12-10 12:29:15 -08:00
a5a7e30943 [DataPipe] Adding interface for MapDataPipes (#69648)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69648

cc VitalyFedyunin ejguan NivekT

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D32989066

Pulled By: NivekT

fbshipit-source-id: ef96bcd4ac4d7a576fdd2a3fb4ef52ae6a902e10
2021-12-10 12:06:08 -08:00
81a60b9813 [DataPipe] Adding output types to DataPipe interface file (#69647)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69647

cc VitalyFedyunin ejguan NivekT

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D32989067

Pulled By: NivekT

fbshipit-source-id: 2c2e71e9e514e0d584affaa0b71b7b0d07a2ddbf
2021-12-10 12:04:45 -08:00
d026057bb3 [PyTorch] Update SmallVector from LLVM (#69110)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69110

I pasted the current LLVM code, reapplied the modifications listed in the code comments, caught a few more in the diff/build process. The trivially copyable detection is different now; if gcc builds fail, will try reverting to C10_IS_TRIVIALLY_COPYABLE or copying what LLVM is doing.

The motivation for this change is that, as noted in an existing comment, C10_IS_TRIVIALLY_COPYABLE did the wrong thing for std::unique_ptr, which caused problems with D32454856 / #68412.

ghstack-source-id: 145327773

Test Plan: CI

Reviewed By: bhosmer, mruberry

Differential Revision: D32733017

fbshipit-source-id: 9452ab90328e3fdf457aad23a26f2f6835b0bd3d
2021-12-10 11:57:19 -08:00
1d269e8c15 [PyTorch] Simple refcount bump fixes in standardizeVectorForUnion & callees (#66695)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66695

More extra reference counting in this path.
ghstack-source-id: 145125484

Test Plan: CI

Reviewed By: suo

Differential Revision: D31692197

fbshipit-source-id: 126b6c72efbef9410d4c2e61179b6b67459afc23
2021-12-10 11:43:01 -08:00
5374d5d8c9 [shard] fix with_comms wrapper (#69493)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69493

When we added support for calling the `with_comms` decorator with arguments, we introduced a `with_comms_decorator` inner function; as a result, `with_comms` (without parentheses) referred to a function object, and the added parentheses were necessary in test cases.

This PR fixes the `with_comms` wrapper behavior to allow us to specify it both with and without arguments in test cases:
```
@with_comms
def test_case(self):
    ...
```
or
```
@with_comms(backend="gloo")
def test_case(self):
    ...
```
ghstack-source-id: 145327066

Test Plan: test_sharded_tensor

Reviewed By: pritamdamania87

Differential Revision: D32897555

fbshipit-source-id: 2f3504630df4f6ad1ea73b8084fb781f21604110
2021-12-10 10:25:54 -08:00
e1c583a691 [JIT] simplify logic for merging types during profiling (#69096)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69096

Instead of storing profiling data in a map and then merging at
the end, perform merging directly during profiling.

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D32772626

Pulled By: davidberard98

fbshipit-source-id: 22622c916a61908b478dd09433815685ce43682a
2021-12-10 09:29:19 -08:00
3219f6a487 Make vec512 bfloat16 map function clang-Wall clean (#69707)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69707

`const` modifier for `__m512` return value doesn't make much sense

Test Plan: Imported from OSS

Reviewed By: r-barnes

Differential Revision: D32997008

Pulled By: malfet

fbshipit-source-id: fb98659713fe2a23cc702252c0655106687f0dbf
2021-12-10 09:11:42 -08:00
a5ad2cdab5 Cleanup ProcessGroup.cpp (#69706)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69706

Mostly code modernization; also, do not capture the unused `this` in the
end_handler functor

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D32997009

Pulled By: malfet

fbshipit-source-id: ac907f0c6889ad06d4fb0171964cb05133e5e610
2021-12-10 09:11:39 -08:00
7ea5926130 Make blend operations clang-Wall clean (#69705)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69705

Test Plan: Imported from OSS

Reviewed By: r-barnes

Differential Revision: D32997007

Pulled By: malfet

fbshipit-source-id: cbadc44e1e7373800e94b7b2fd2711530854978c
2021-12-10 09:10:07 -08:00
195b0d0645 fix CompositeImplicitAutograd ops improperly labeled (#69169)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69169

I checked `derivatives.yaml`, and it doesn't look like `logical_not/and/xor` are meant to work with autograd. Those 3 ops are currently set as `CompositeImplicitAutograd` though, implying that they do work with autograd. Updating them to be CompositeExplicitAutograd instead.

This came up because I'm trying to improve the error checking in external backend codegen, and these ops being improperly labeled incorrectly triggers my new error checks for XLA (see https://github.com/pytorch/pytorch/pull/67090)

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D32739976

Pulled By: bdhirsh

fbshipit-source-id: a756dd9e0b87276368063c8f4934be59dca371d3
2021-12-10 09:03:51 -08:00
29d759948e use irange for loops 2 (#66746)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66746

Modified loops in files under fbsource/fbcode/caffe2/ from the format

`for(TYPE var=x0;var<x_max;x++)`

to the format

`for(const auto var: irange(xmax))`

This was achieved by running r-barnes's loop upgrader script (D28874212) with some modification to exclude all files under /torch/jit and a number of reversions or unused variable suppression warnings added by hand.

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D31705361

fbshipit-source-id: 33fd22eb03086d114e2c98e56703e8ec84460268
2021-12-10 04:26:23 -08:00
91d16cb633 [Jit] Fix schema of aten::split int[] version (#69745)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69745

Missed in D31935573 (6b44e75f6b).

Reviewed By: d1jang

Differential Revision: D31889867

fbshipit-source-id: 417bd0b15db4891dbd641b35a803553f11d0d756
2021-12-10 02:33:36 -08:00
9962bfb3c9 Remove THTensor (#69040)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69040

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D32872478

Pulled By: ngimel

fbshipit-source-id: f93e16509d64308d91e374744410a6a811e7f4e3
2021-12-10 02:29:11 -08:00
531b045446 [tensorexpr] Fix the buf size of discontiguous tensors (#69657)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69657

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D32974473

Pulled By: huiguoo

fbshipit-source-id: 52dcd13d0ad7f7e4f1beb69dcaabc8ceb386ffca
2021-12-10 01:26:37 -08:00
aab67c6dff Add native masked_softmax (#69268)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69268

This diff enables native masked softmax on CUDA, and also expands our current warp_softmax to accept masking.
The mask in this masked softmax has to be the same shape as the input, and has to be contiguous.

In a follow-up diff I will submit later, I will include the encoder mask layout, where the input is (B, H, D, D) and the mask is (B, D).

Test Plan: buck build mode/opt -c fbcode.enable_gpu_sections=true caffe2/test:nn && buck-out/gen/caffe2/test/nn\#binary.par -r test_masked_softmax

Reviewed By: ngimel

Differential Revision: D32338419

fbshipit-source-id: 48c3fde793ad4535725d9dae712db42e2bdb8a49
2021-12-09 23:29:45 -08:00
a5996a6857 [SR] Wrap check_for_memory_leak with DCHECK (#69588)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69588

Code cleanup

Reviewed By: mikeiovine

Differential Revision: D32938333

fbshipit-source-id: d15dc405b281411c4c3c27a1dabf82f430c3ed08
2021-12-09 22:11:21 -08:00
3bb20ae49f Make c10d tests -Werror clean (#69703)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69703

Test Plan: Imported from OSS

Reviewed By: seemethere

Differential Revision: D32997001

Pulled By: malfet

fbshipit-source-id: 38b5f195c04f2b3b920e6883a96fe9a36345b9d2
2021-12-09 22:10:04 -08:00
be757addfa Do not use std::labs (#69704)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69704

Instead, compute the size diff inside the if statement

Test Plan: Imported from OSS

Reviewed By: zou3519, seemethere

Differential Revision: D32997004

Pulled By: malfet

fbshipit-source-id: a23819240bfe8278a11ebc6bae1e856de162f082
2021-12-09 22:05:14 -08:00
3f02ad09ec [ONNX] shapeValueMap: Represent symbolic shape as value (#68203) (#69545)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69545

Test Plan: Imported from OSS

Reviewed By: msaroufim

Differential Revision: D32994272

Pulled By: malfet

fbshipit-source-id: 77cbdd78d01712faf4f9703549a2833340954509

Co-authored-by: jiafatom <jiafa@microsoft.com>
2021-12-09 22:00:46 -08:00
3d32a0c139 Back out "[wip][quant][graphmode] produce reference pattern for binary ops and then rewrite to quantized op" (#69713)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69713

Original commit changeset: 456086b308c4

Original Phabricator Diff: D32537714 (bd8a4a9372)

Reviewed By: jerryzh168

Differential Revision: D32976643

fbshipit-source-id: bea6bf6a2718e42c9efa48a0b0c1dc7fe3893065
2021-12-09 21:55:09 -08:00
7dba88dfdb [nnc][quant] Fix quantized concat (#69596)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69596

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D32941108

Pulled By: IvanKobzarev

fbshipit-source-id: 727f608b98625648e2e444396d910838c95f58f2
2021-12-09 18:55:32 -08:00
b2e79ed5ec Remove WindowsTorchApiMacro.h in favor of Export.h (#69585)
Summary:
Follow up to https://github.com/pytorch/pytorch/issues/68095

This also changes the files from the ATen folder to include c10's `Export.h` instead since they can't ever be exporting `TORCH_PYTHON_API`.

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69585

Reviewed By: mrshenli

Differential Revision: D32958594

Pulled By: albanD

fbshipit-source-id: 1ec7ef63764573fa2b486928955e3a1172150061
2021-12-09 17:30:09 -08:00
f87f1d08e8 [SR] assignStorageToManagedTensors returns a vector (#69568)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69568

Non-empty vectors should never be passed to `assignStorageToManagedTensors` and `assignStorageToManagedOutputTensors`. Presumably, this out-variant convention was adopted to avoid move-assigning the corresponding attributes in `MemoryPlanner`. But the cost of a vector move-assign is not high, and this function signature is safer.

Test Plan: `buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest`

Reviewed By: donaldong

Differential Revision: D32729289

fbshipit-source-id: 88f19de8eb89d8a4f1dd8bbd4d9e7f686e41888b
2021-12-09 17:01:48 -08:00
9aa1b3e396 [Static Runtime] [Code Cleanup] Encapsulate function objects within ProcessedFunction (#69595)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69595

This change encapsulates the `function` object in `ProcessedFunction` objects instead of exposing it unnecessarily just to execute it.

Test Plan: Existing tests

Reviewed By: mikeiovine

Differential Revision: D32908341

fbshipit-source-id: 5ff4951cbe276c5c6292227124d9eec1dd16e364
2021-12-09 15:11:03 -08:00
41e1ab0785 Introduce isTensorSubclassLike; add special cases to backwards formulas (#69534)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69534

Something is TensorSubclassLike if it is a Tensor subclass or if it has
the same problems as Tensor subclasses. Today that just includes Tensor
Subclasses and meta tensors but may include other things in the future.

Some of our backwards formulas are incompatible with TensorSubclassLike
objects. For example, calling .data_ptr() is a problem because many
TensorSubclassLike objects don't have storage. Another problem is
in-place operations: performing `regular_tensor.inplace_(tensor_subclass)`
is a problem.

This PR adds special cases to the backward formulas for torch.max and
torch.clamp to handle this. The backward formulas for torch.max and
torch.clamp are not dispatcher operations so they cannot be overridden
and we hesitate to make them dispatcher operations for FC/BC concerns
and performance overhead concerns.

Furthermore, the old concept of "is this inplace operation vmap
compatible?" can be subsumed by the general "is this inplace operation
tensor-subclass compatible?" question, so I replaced all instances of
isInplaceVmapCompatible with isTensorSubclassLike checks.
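
One concrete instance of the problem, as a small sketch (assumed behavior of meta tensors, which have no storage):

```python
import torch

# A backward formula that calls .data_ptr() cannot work on a meta tensor,
# since there is no storage to point into.
t = torch.empty(3, device="meta")
try:
    t.data_ptr()
except RuntimeError as e:
    print("no storage:", e)
```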

Test Plan
- I tested the changes using functorch.
- It's possible to write a test for these in core (one has to make
a custom tensor subclass and then send it through the operation and then
invoke autograd), but I wanted to push the work to doing some
generic testing for backward formulas
(https://github.com/pytorch/pytorch/issues/69530) instead of doing some
one-off things now.

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D32967727

Pulled By: zou3519

fbshipit-source-id: 30fda1a7581da4c55179b7a3ca05069150bbe2dc
2021-12-09 15:03:22 -08:00
d3649309e6 [pytorch][PR] Add ability for a mobile::Module to save as flatbuffer (#69306)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69306

Included functions:

save_mobile_module -> saves a mobile::Module to flatbuffer
load_mobile_module_from_file -> loads a flatbuffer into mobile::Module
parse_mobile_module -> parses from bytes or deserialized flatbuffer
Module object

Test Plan: unittests

Reviewed By: gmagogsfm

Differential Revision: D32806835

fbshipit-source-id: 71913c6650e225634f878946bd16960d377a7f57
2021-12-09 14:53:31 -08:00
193e3c484e .github: Add fbsync to push triggers (#69718)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69718

canary is now pushing to fbsync so we should change our workflows to
reflect that.

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: malfet, janeyx99

Differential Revision: D32999967

Pulled By: seemethere

fbshipit-source-id: bc4bc9afd2d73c53f91d3af3b81aca1b31f665a4
2021-12-09 14:30:29 -08:00
3e20a74b55 [SR] Update memory planner docs (#69559)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69559

We have a lot of special cases. Document them so they're easy to learn about.
ghstack-source-id: 145226542

Test Plan: Spell check? :)

Reviewed By: d1jang

Differential Revision: D32929416

fbshipit-source-id: 2362410f25a27cdb74a4939903446192cef61978
2021-12-09 14:22:33 -08:00
e963b43691 Extend explanation of torch.cholesky_inverse to consider batched inputs. (#69069)
Summary:
While implementing https://github.com/pytorch/pytorch/issues/68720,
we found out empirically that `torch.cholesky_inverse` supports batched inputs, but this is not explained in the docs: [link](https://github.com/pytorch/pytorch/pull/68720#pullrequestreview-817243697)
`torch.cholesky_inverse` was implemented in https://github.com/pytorch/pytorch/issues/50269 and a doc update was proposed in https://github.com/pytorch/pytorch/issues/31275 but not merged.
neerajprad
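
A minimal sketch of the batched behavior the doc update describes:

```python
import torch

a = torch.randn(4, 3, 3)
a = a @ a.transpose(-2, -1) + 3 * torch.eye(3)  # batch of SPD matrices
u = torch.linalg.cholesky(a)                    # batched lower Cholesky factors
inv = torch.cholesky_inverse(u)                 # inverts each matrix: shape (4, 3, 3)
print(torch.allclose(inv, torch.inverse(a), atol=1e-4))
```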

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69069

Reviewed By: mrshenli

Differential Revision: D32979362

Pulled By: neerajprad

fbshipit-source-id: 0967c969434ce6e0ab15889c240149c23c0bce44
2021-12-09 14:01:31 -08:00
9ad05f2c3a Upgrade oneDNN to v2.3.3 and package oneDNN Graph API together (#63748)
Summary:
This PR upgrades oneDNN to [v2.3.3](https://github.com/oneapi-src/oneDNN/releases/tag/v2.3.3) and includes [Graph API preview release](https://github.com/oneapi-src/oneDNN/releases/tag/graph-v0.2) in one package.

- oneDNN will be located at `pytorch/third_party/ideep/mkl-dnn/third_party/oneDNN`
- The version of oneDNN will be [v2.3.3](https://github.com/oneapi-src/oneDNN/releases/tag/v2.3.3)
  The main changes on CPU:

  - v2.3
    - Extended primitive cache to improve primitive descriptor creation performance.
    - Improved primitive cache performance in multithreaded configurations.
    - Introduced initial optimizations for bfloat16 compute functionality for future Intel Xeon Scalable processor (code name Sapphire Rapids).
    - Improved performance of binary primitive and binary post-op for cases with broadcast and mixed source and destination formats.
    - Improved performance of reduction primitive
    - Improved performance of depthwise convolution primitive with NHWC activations for training cases
  - v2.3.1
    -  Improved int8 GEMM performance for processors with Intel AVX2 and Intel DL Boost support
    - Fixed integer overflow for inner product implementation on CPUs
    - Fixed out of bounds access in GEMM implementation for Intel SSE 4.1
  - v2.3.2
    - Fixed performance regression in fp32 inner product primitive for processors with Intel AVX512 support
  - v2.3.3
    - Reverted check for memory descriptor stride validity for unit dimensions
    - Fixed memory leak in CPU GEMM implementation

  More changes can be found in https://github.com/oneapi-src/oneDNN/releases.
- The Graph API provides a flexible API for aggressive fusion, and preview2 supports fusion for FP32 inference. See the [Graph API release branch](https://github.com/oneapi-src/oneDNN/tree/dev-graph-preview2) and [spec](https://spec.oneapi.io/onednn-graph/latest/introduction.html) for more details. A separate PR will be submitted to integrate the oneDNN Graph API into the TorchScript graph.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63748

Reviewed By: albanD

Differential Revision: D32153889

Pulled By: malfet

fbshipit-source-id: 536071168ffe312d452f75d54f34c336ca3778c1
2021-12-09 13:42:40 -08:00
17641fed2a Revert D32942007: OpInfo: Convert more sample_input_funcs to generators
Test Plan: revert-hammer

Differential Revision:
D32942007 (d21646c432)

Original commit changeset: bb5b253d6d87

Original Phabricator Diff: D32942007 (d21646c432)

fbshipit-source-id: d37c78174f0acea48e4cd4af3ac67ca4ee7ac54d
2021-12-09 10:54:41 -08:00
0ccb1dcdbb Fix inference_mode decorator (#68617)
Summary:
This fixes the case when `torch.inference_mode` is called with `mode=False` (disabled). When used as a decorator, it ignored the argument and enabled inference mode anyway.

`_DecoratorContextManager` is changed so that a new instance is a copy instead of a new instance with default parameters.

I also added more tests to cover this case.

Current behaviour:

```python
>>> import torch
>>> x = torch.ones(1, 2, 3, requires_grad=True)
>>> torch.inference_mode(mode=False)
... def func(x):
...     return x * x
...
>>> out = func(x)
>>> out.requires_grad
False
```

New behaviour (fixed):

```python
>>> import torch
>>> x = torch.ones(1, 2, 3, requires_grad=True)
>>> torch.inference_mode(mode=False)
... def func(x):
...     return x * x
...
>>> out = func(x)
>>> out.requires_grad
True
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68617

Reviewed By: mrshenli

Differential Revision: D32958434

Pulled By: albanD

fbshipit-source-id: 133c69970ef8bffb9fc9ab5142dedcffc4c32945
2021-12-09 10:45:09 -08:00
afb742382a use irange for loops 10 (#69394)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69394

Modified loops in files under fbsource/fbcode/caffe2/ from the format
```
for(TYPE var=x0;var<x_max;x++)
```
to the format
```
for(const auto var: irange(xmax))
```

This was achieved by running r-barnes's loop upgrader script (D28874212), with some modifications to exclude all files under /torch/jit; a number of reversions and unused-variable warning suppressions were added by hand.

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D32837991

fbshipit-source-id: fc7c4f76d2f32a17a0faf329294b3fe7cb81df32
2021-12-09 09:49:34 -08:00
2d5b3101c1 Added ScriptFunction pkl exception for issue #61210 #61381 (#67076)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/61381, https://github.com/pytorch/pytorch/issues/61210

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67076

Reviewed By: jbschlosser

Differential Revision: D32908175

Pulled By: suo

fbshipit-source-id: f6e175793243dc96cde5e44022d92f2623b934eb

Co-authored-by: LucaStubbe <stubbeluca@gmail.com>
Co-authored-by: Kanon Tromp <ktromp1@student.cccd.edu>
2021-12-09 09:44:49 -08:00
d21646c432 OpInfo: Convert more sample_input_funcs to generators (#69257)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69257

These are sample functions that already use generators internally; this just moves the `yield` into the sample function itself.
Diff is best viewed ignoring whitespace changes https://github.com/pytorch/pytorch/pull/69257/files?diff=unified&w=1
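
A sketch of the resulting style (the op and shapes are hypothetical; `SampleInput` and `make_tensor` are assumed to be importable as below):

```python
import torch
from torch.testing import make_tensor
from torch.testing._internal.common_methods_invocations import SampleInput

# Samples are yielded one by one instead of being collected into a list.
def sample_inputs_foo(op_info, device, dtype, requires_grad, **kwargs):
    for shape in [(), (3,), (2, 3)]:
        yield SampleInput(make_tensor(shape, dtype=dtype, device=device,
                                      requires_grad=requires_grad))
```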

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D32942007

Pulled By: mruberry

fbshipit-source-id: bb5b253d6d87b3495b7059924bed35b09d2768a2
2021-12-09 08:38:51 -08:00
6de9f0fc94 OpInfo: Allow sample_inputs_func to be any iterable (#69256)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69256

Closes #52486

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D32942008

Pulled By: mruberry

fbshipit-source-id: f5b01b0298c0160b0bec6e86e2b6db8cfe746206
2021-12-09 08:37:26 -08:00
d2917f705a Fix errors in common_utils.py (#69578)
Summary:
This fixes the following error:
```python
Traceback (most recent call last):
  File "/home/gaoxiang/pytorch-ucc2/test/distributed/test_distributed_spawn.py", line 40, in <module>
    run_tests()
  File "/home/gaoxiang/.local/lib/python3.9/site-packages/torch/testing/_internal/common_utils.py", line 618, in run_tests
    ['--import-slow-tests'] if IMPORT_SLOW_TESTS else List[str]([]))
  File "/usr/lib/python3.9/typing.py", line 680, in __call__
    raise TypeError(f"Type {self._name} cannot be instantiated; "
TypeError: Type List cannot be instantiated; use list() instead
Traceback (most recent call last):
  File "/home/gaoxiang/pytorch-ucc2/test/run_test.py", line 1058, in <module>
    main()
  File "/home/gaoxiang/pytorch-ucc2/test/run_test.py", line 1036, in main
    raise RuntimeError(err_message)
RuntimeError: distributed/test_distributed_spawn failed!
```
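
The underlying pitfall, in a minimal sketch (variable names illustrative):

```python
from typing import List

IMPORT_SLOW_TESTS = False

# typing.List is only an annotation; calling List[str]([]) raises
# "TypeError: Type List cannot be instantiated; use list() instead".
# The fix is to construct a plain list:
extra_args: List[str] = ['--import-slow-tests'] if IMPORT_SLOW_TESTS else []
print(extra_args)
```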

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69578

Reviewed By: mrshenli

Differential Revision: D32963113

Pulled By: malfet

fbshipit-source-id: b064e230c5e572e890b4ac66ebdda2707b8c12d7
2021-12-09 07:33:43 -08:00
07932e2735 [sparsity] Convert function for sparse kernels without a context manager (#66778)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66778

This removes the hack of the context manager that would communicate the zeros block shape to the quantization convert.
The conversion will assume that the converted modules have `sparse_params` (which is added by the sparsifier).

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D31835721

Pulled By: z-a-f

fbshipit-source-id: c5fd2da3b09a728a2296765c00ca69275dbca3b1
2021-12-09 02:58:57 -08:00
b957b82db7 Replace issue templates with new issue forms - v2 (#69361)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69361

This PR introduces the new issue forms that replace issue templates.
(This is exactly the same as https://github.com/pytorch/pytorch/pull/65917 which was reverted due to an issue during the import)

This is similar to what was done in torchvision https://github.com/pytorch/vision/pull/4299 and torchaudio, you can see the end result here: https://github.com/pytorch/vision/issues/new/choose (click e.g. on the [bug report](https://github.com/pytorch/vision/issues/new?assignees=&labels=&template=bug-report.yml))

The main new thing is that we can enforce some of the fields to be filled, especially for bug reports. It's also a much cleaner GUI for users IMHO, and we can provide better examples and instructions.

There is still a "blank" template available.

I removed the "Questions" form: we say we close these issues anyway. I replaced it with a direct link to https://discuss.pytorch.org. Since we still have a "blank" template, I think this  covers all previous use-cases properly.

Test Plan: Imported from OSS

Reviewed By: albanD, mrshenli

Differential Revision: D32947189

Pulled By: NicolasHug

fbshipit-source-id: f19abe3e7c9c479b0b227969a207916db5bdb6e3
2021-12-09 02:42:29 -08:00
e948856ce7 [sparsity] Add ability to keep sparsity parameters in modules (#66777)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66777

Sometimes one might need to keep the sparsity parameters after the sparsifier is detached.
This saves the parameters in the `sparse_params`.
There are two ways of keeping the sparsifier params:

1. Tuple[str, ...]: A tuple of all the parameters that need to be stored.
2. Dict[str, Tuple[str, ...]]: A dict of layer keys and parameters. In this case, only the specified layers will have the parameters attached.

For example:

```
>>> # This will keep params in every module
>>> sparsifier.squash_mask(keep_sparse_params=('sparse_block_shape',))
>>> print(model.submodule.linear1.sparse_params)
{'sparse_block_shape': (1, 4)}
>>> print(model.submodule.linear2.sparse_params)
{'sparse_block_shape': (1, 4)}
```

```
>>> # This will keep params only in specific modules
>>> sparsifier.squash_mask(keep_sparse_params={'submodule.linear1': ('sparse_block_shape',)})
>>> print(model.submodule.linear1.sparse_params)
{'sparse_block_shape': (1, 4)}
>>> print(model.submodule.linear2.sparse_params)
AttributeError: 'Linear' object has no attribute 'sparse_params'
```

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D31835722

Pulled By: z-a-f

fbshipit-source-id: 20c2d80207eb7ce7291e7f5f655d3fb2a627190f
2021-12-09 02:36:27 -08:00
13faaff54c [Operator Versioning][Edge] Implement register function for upgrader (#67730)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67730

This pr implement the register function for upgrader so it can be used at loading stage
ghstack-source-id: 145170986

Test Plan:
```
buck test //caffe2/test/cpp/jit:jit
```

Reviewed By: iseeyuan

Differential Revision: D32092518

fbshipit-source-id: 779b51eb12b8cb162a93a55c1e66fe0becc4cb36
2021-12-09 02:18:09 -08:00
4f5806dee7 [AO] Clear the contents of the torch/ao/__init__.py (#69415)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69415

Adding the imports inside the torch/ao/__init__.py has a high chance of causing circular dependencies, especially if sparsity and quantization use each other's resources.
To avoid the dependency issues, we can just keep the __init__ empty.

Notes:
- This means that the user will have to explicitly import the `torch.ao.quantization` or `torch.ao.sparsity` instead of `from torch import ao; ao.quantization.???`.
- The issue of circular dependencies that are caused by the imports with binding submodules is [fixed in Python 3.7](https://docs.python.org/3/whatsnew/3.7.html#other-language-changes), which means this solution will become obsolete at the [3.6's EoL](https://www.python.org/dev/peps/pep-0494/#and-beyond-schedule), which comes [12/23/2022](https://devguide.python.org/#status-of-python-branches).

Future options to resolve the circular dependencies (subject to discussion):
1. Use interfaces for binding submodules. For example, have a torch/ao/_nn with all the source code, and an interface torch/ao/nn with only the __init__.py file. The __init__ files inside the torch/ao/_nn will be empty
2. Completely isolate the common code into a separate submodule, s.a. torch/ao/common. The other submodules will not be referencing each other.

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D32860168

Pulled By: z-a-f

fbshipit-source-id: e3fe77e285992d34c87d8742e1a5e449ce417c36
2021-12-09 01:21:30 -08:00
015e481a41 [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D32975574

fbshipit-source-id: 66856595c7bc29921f24a2c5c00c72892f262aa1
2021-12-09 00:10:33 -08:00
dc87cf5fe1 Fixes mem_get_info when querying on a device other than the current device (#69640)
Summary:
Also fixes the documentation failing to appear and adds a test to validate that the op works properly with multiple devices.
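
A minimal sketch of the fixed behavior (assumes at least two visible CUDA devices):

```python
import torch

if torch.cuda.is_available() and torch.cuda.device_count() > 1:
    torch.cuda.set_device(0)
    # Query device 1 while device 0 is current, the case this PR fixes.
    free, total = torch.cuda.mem_get_info(1)
    print(f"device 1: {free} bytes free of {total}")
```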

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69640

Reviewed By: ngimel

Differential Revision: D32965391

Pulled By: mruberry

fbshipit-source-id: 4fe502809b353464da8edf62d92ca9863804f08e
2021-12-08 23:04:30 -08:00
24d885f5f8 [Vulkan] Thread-safe Vulkan backend for OSS (#69576)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69576

Vulkan backend for OSS is also thread-safe by default:
* Removed `MAKE_VULKAN_THREADSAFE` preprocessor and if-conditions

Test Plan:
Test build on Android:
```
cd ~/fbsource
buck build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 //xplat/caffe2:pt_vulkan_perf_test_binAndroid\#android-arm64 --show-output
adb push buck-out/gen/xplat/caffe2/pt_vulkan_perf_test_binAndroid\#android-arm64 /data/local/tmp/vulkan_perf_test
adb shell "/data/local/tmp/vulkan_perf_test"
```
Test build on MacOS:
```
cd ~/fbsource
buck build //xplat/caffe2:pt_vulkan_perf_test_binAppleMac
./buck-out/gen/xplat/caffe2/pt_vulkan_perf_test_binAppleMac\#macosx-x86_64
```

Test result on Google Pixel 5:
```
//xplat/caffe2:pt_vulkan_perf_test_binAndroid#android-arm64 buck-out/gen/fe3a39b8/xplat/caffe2/pt_vulkan_perf_test_binAndroid#android-arm64
buck-out/gen/xplat/caffe2/pt_vulkan_perf_test_binAndroid#android-arm64: 1 file pushed, 0 skipped. 145.4 MB/s (826929592 bytes in 5.426s)
Running /data/local/tmp/vulkan_perf_test
Run on (8 X 1804.8 MHz CPU s)
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
-------------------------------------------------------------------------------------------------------------
Benchmark                                                                   Time             CPU   Iterations
-------------------------------------------------------------------------------------------------------------
cat_op_channel_perf/N:3/C:40/H:221/W:193/iterations:1000/threads:1       39.3 ms         10.1 ms         1000
cat_op_channel_perf/N:3/C:20/H:221/W:193/iterations:1000/threads:1       27.1 ms         5.86 ms         1000
cat_op_channel_perf/N:3/C:39/H:221/W:193/iterations:1000/threads:1       58.5 ms         11.8 ms         1000
cat_op_channel_perf/N:3/C:4/H:221/W:193/iterations:5000/threads:1        5.98 ms        0.803 ms         5000
cat_op_channel_perf/N:3/C:3/H:221/W:193/iterations:5000/threads:1        9.14 ms        0.857 ms         5000
cat_op_channel_perf/N:3/C:40/H:221/W:193/iterations:1000/threads:3       32.1 ms         31.3 ms         3000
```

Test result on MacOS:
```
Running ./buck-out/gen/xplat/caffe2/pt_vulkan_perf_test_binAppleMac#macosx-x86_64
Run on (16 X 2400 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x8)
  L1 Instruction 32 KiB (x8)
  L2 Unified 256 KiB (x8)
  L3 Unified 16384 KiB (x1)
Load Average: 18.89, 29.61, 24.95
***WARNING*** Library was built as DEBUG. Timings may be affected.
-------------------------------------------------------------------------------------------------------------
Benchmark                                                                   Time             CPU   Iterations
-------------------------------------------------------------------------------------------------------------
cat_op_channel_perf/N:3/C:40/H:221/W:193/iterations:1000/threads:1       53.3 ms         39.6 ms         1000
cat_op_channel_perf/N:3/C:20/H:221/W:193/iterations:1000/threads:1       28.0 ms         20.7 ms         1000
cat_op_channel_perf/N:3/C:39/H:221/W:193/iterations:1000/threads:1       51.8 ms         38.7 ms         1000
cat_op_channel_perf/N:3/C:4/H:221/W:193/iterations:5000/threads:1        2.76 ms         1.31 ms         5000
cat_op_channel_perf/N:3/C:3/H:221/W:193/iterations:5000/threads:1        2.29 ms         1.11 ms         5000
cat_op_channel_perf/N:3/C:40/H:221/W:193/iterations:1000/threads:3       49.2 ms         41.8 ms         3000
```

Reviewed By: SS-JIA

Differential Revision: D32933891

fbshipit-source-id: d8ebd5394771e1d79230c1f3aa8fbec4472b3197
2021-12-08 21:04:52 -08:00
ecf9c82f24 Reduce binary size of TensorCompare.cu (#68835)
Summary:
This PR does several things
1) eliminates `where` instantiations for deprecated `byte` condition dtype, and casts `condition` to `bool` in this case. This is a perf penalty for people using deprecated calls
2) Makes `clamp_{min/max}.Tensor` overload reuse `clamp_{min/max}.Scalar` kernels if limit argument is cpu scalar, instead of instantiating `gpu_kernel_with_scalars`
3) Unifies all clamp_scalar kernels to use a single kernel with lambda picking the correct operation. I've verified that it doesn't degrade kernel performance.
4) Eliminates redundant TensorIterator construction that `clamp` structured kernel was doing when only `min` or `max` was specified

This reduces the cubin size for TensorCompare.cu on V100 from 15751920 bytes to 7691120 bytes, with corresponding reduction in compile time.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68835

Reviewed By: mruberry

Differential Revision: D32839241

Pulled By: ngimel

fbshipit-source-id: 0acde5af10a767264afbdb24684b137c5544b8d9
2021-12-08 20:08:53 -08:00
3e560239e2 [Vulkan] Implement clone operator (#69551)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69551

Implemented `clone` operator in the Vulkan backend:
* Supports only <= 4D tensors.
* Internal name is `aten::clone`.
* Vulkan `clone` operator accepts only `c10::MemoryFormat::Preserve` and `c10::MemoryFormat::Contiguous` for the argument `c10::optional<c10::MemoryFormat> optional_memory_format` (see the Python sketch below).
* Throws an exception if the `optional_memory_format` argument is neither `MemoryFormat::Preserve` nor `MemoryFormat::Contiguous`
* CPU implementation: [/aten/src/ATen/native/TensorFactories.cpp::clone()](3e45739543/aten/src/ATen/native/TensorFactories.cpp (L1415))
* MKL-DNN implementation: [/aten/src/ATen/native/mkldnn/TensorShape.cpp::mkldnn_clone()](3e45739543/aten/src/ATen/native/mkldnn/TensorShape.cpp (L58))
* `self.copy_(src)` calls `copy_()` for Vulkan to Vulkan copy operation
```
vTensor::copy_()
vTensor::copy_() X -> Vulkan
vTensor::copy_() CPU -> Vulkan
vTensor::clone()
vTensor::clone() -> MemoryFormat::Preserve
vTensor::clone() -> MemoryFormat::Preserve -> self = at::empty_like(src)
vTensor::clone() self.copy_(src); -> BEFORE
vTensor::copy_()
vTensor::copy_() X -> Vulkan
vTensor::copy_() Vulkan -> Vulkan
vTensor::clone() self.copy_(src); -> AFTER
vTensor::copy_()
vTensor::copy_() Vulkan -> X
vTensor::copy_() Vulkan -> CPU
```
* References:
  * Function `torch.clone` in PyTorch documentation: https://pytorch.org/docs/stable/generated/torch.clone.html
  * Pytorch preferred way to copy a tensor: https://stackoverflow.com/questions/55266154/pytorch-preferred-way-to-copy-a-tensor
  * `torch.memory_format`: https://pytorch.org/docs/stable/tensor_attributes.html?highlight=memory_format#torch.torch.memory_format
  * `c10::MemoryFormat` definition in [/c10/core/MemoryFormat.h](3e45739543/c10/core/MemoryFormat.h (L28))
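
A minimal sketch of the semantics described above, shown on CPU for illustration:

```python
import torch

x = torch.randn(2, 3, 4, 5)
y = x.clone(memory_format=torch.preserve_format)    # accepted by the Vulkan op
z = x.clone(memory_format=torch.contiguous_format)  # accepted by the Vulkan op
print(torch.equal(x, y), torch.equal(x, z))
# Other formats (e.g. torch.channels_last) would raise on the Vulkan backend.
```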

Test Plan:
Build & test on Android:
```
cd ~/fbsource
buck build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 //xplat/caffe2:pt_vulkan_api_test_binAndroid\#android-arm64 --show-output
adb push buck-out/gen/xplat/caffe2/pt_vulkan_api_test_binAndroid\#android-arm64 /data/local/tmp/vulkan_api_test
adb shell "/data/local/tmp/vulkan_api_test"
```
Build & test on MacOS:
```
cd ~/fbsource
buck build //xplat/caffe2:pt_vulkan_api_test_binAppleMac
./buck-out/gen/xplat/caffe2/pt_vulkan_api_test_binAppleMac\#macosx-x86_64
```
Test result on Android (Google Pixel 5):
```
[ RUN      ] VulkanAPITest.clone_success
[       OK ] VulkanAPITest.clone_success (5 ms)
[ RUN      ] VulkanAPITest.clone_invalidinputs_exceptions
[       OK ] VulkanAPITest.clone_invalidinputs_exceptions (1 ms)
```
Test result on MacOS:
```
[ RUN      ] VulkanAPITest.clone_success
[       OK ] VulkanAPITest.clone_success (19 ms)
[ RUN      ] VulkanAPITest.clone_invalidinputs_exceptions
[       OK ] VulkanAPITest.clone_invalidinputs_exceptions (2 ms)
```

Reviewed By: SS-JIA

Differential Revision: D32923535

fbshipit-source-id: ea29792e1b0080cbbc1c8c7e8bf2beffad9b5c0d
2021-12-08 18:46:56 -08:00
eb2a803406 Run test_embedding_bag_with_no_grad_tensors only for TensorPipe (#69626)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69626

Sparse tensors are only supported by the TensorPipe RPC backend. As a
result, test_embedding_bag_with_no_grad_tensors is moved to be a
TensorPipe-specific test.
ghstack-source-id: 145134888

Test Plan: waitforbuildbot

Reviewed By: rohan-varma

Differential Revision: D32959952

fbshipit-source-id: d65f2edbb6dad7705475690a8c6293a322299dde
2021-12-08 18:29:38 -08:00
b61c532f96 Make make_dual redispatch (#68630)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68630

Constraints:
1) (functorch) if all the inputs to an op have requires_grad=False and don't have tangents, then their VariableType
    kernel should be a no-op, i.e., behave like a redispatch. This is due to functorch's DynamicLayerStack
   having the autograd key by default (which is so that transformations like vmap still work with autograd)
2) (inference mode) inference tensors in inference mode will call straight into the kernel, we should still do something sensible
    inside even if we normally wouldn't redispatch into it.
3) ~Should support potential application of interposition below autograd: `nn.Parameter` is a example of subclassing where the subclass
    is not preserved when an operation is performed. There is an exception though: we want calling `make_dual` on a
    `nn.Parameter` to preserve its parameterness.~
4) Should avoid calls to shallow_copy_and_detach to avoid spurious calls into `__python_dispatch__`.

This PR:
- does not redispatch to `make_dual` from its `ADInplaceOrView` kernel to satisfy (1)
- calls into `alias` from the kernel in the native namespace so that behavior is consistent with other views in inference mode to satisfy (2)
- discussion of (3). We still wouldn't be able to directly override `make_dual` below autograd. In this PR, instead of not redispatching at all, we choose to redispatch into `at::alias` so that one can override `make_dual`. The side effect is that one would not be able to distinguish calls between the two, which can be problematic (though a straightforward but hacky solution would be to create a new `at::alias_for_make_dual` that would allow users to distinguish the two). This isn't ideal but seems to be the simplest way to satisfy (3). We don't pursue that hacky solution here.
- (4) is satisfied because we remove calls to `shallow_copy_and_detach`
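
For reference, a minimal usage sketch of `make_dual` through the public forward-AD API (user-facing behavior only, not the dispatcher internals discussed above):

```python
import torch
import torch.autograd.forward_ad as fwAD

primal = torch.randn(3)
tangent = torch.randn(3)
with fwAD.dual_level():
    dual = fwAD.make_dual(primal, tangent)
    out = torch.sin(dual)
    _, jvp = fwAD.unpack_dual(out)
    print(torch.allclose(jvp, tangent * torch.cos(primal)))
```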

<details>
<summary> A potentially less hacky but more involved solution? (WIP) </summary>

Realizing that make_dual is more like requires_grad, perhaps it shouldn't be autograd explicit? Make make_dual a composite or python-only construct. i.e., it would be a view on the primal followed by something to the effect of primal.set_fw_grad(tangent).

Additional constraints:
5) make_dual needs to be backward-differentiable (I can't think of any applications yet because
   technically, as a high-order function, jvp's input is the tangent only; "detach" is not applied on
   the tangent, so one would still be able to propagate gradients through it).
6) set_fw_grad needs to raise an error if there is a layout mismatch and the base is a forward-differentiable view

Possible plan
- (6) implies that a plain view would not suffice. We need a `detach`-like operation to ensure that set_fw_grad
  knows the view is not forward differentiable.
- (5) implies that is this (new) `detach` would need to be backward differentiable (API TBD).
- (3) is no longer relevant because make_dual is no longer autograd explicit, but perhaps this new detach should behave like the current one? There is a lot of logic to replicate for detach, so this may be hard.
- (1) is satisfied if we use the current detach logic, and (4) is trivial.

I'm not convinced that this is the right solution either, because in the end does (3) still work?

 </details>

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D32899679

Pulled By: soulitzer

fbshipit-source-id: 98e13ae954e14e1e68dbd03eb5ab3300d5ed2c5e
2021-12-08 17:56:03 -08:00
7956a405ef Make make_dual also return namedtuple when level less than zero (#68628)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68628

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D32899681

Pulled By: soulitzer

fbshipit-source-id: 61ed09f4038e19817978a521e9571fdc482b424b
2021-12-08 17:54:40 -08:00
1c43b1602c [SR] Scope exit guard for memory planner deallocation (#68795)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68795

This change improves static runtime exception safety. Added a scope exit guard that invokes `MemoryPlanner::deallocate` in its destructor.

Caveat: we have to be really careful with the exception behavior of `MemoryPlanner::deallocate` and `MemoryPlanner`'s constructor, because they're now both potentially called in the destructor of the scope exit guard. Letting exceptions potentially escape destructors is playing with fire since 1) the destructor of `Deallocator` is (implicitly) `noexcept`, 2) even if it wasn't, `std::terminate` will be called if an exception escapes and the stack is already unwinding. To get around this, we wrap the deallocation stuff in a try/catch. If deallocation throws, then we simply reset all of the memory planner stuff and carry on.
There's a catch: the code path that we take when handling the deallocation exception can't throw. However, this code path is much simpler than memory planner construction/deallocation, so it's much easier to manually audit the correctness here.
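
The scope-exit idiom, sketched in Python for illustration (the actual change is a C++ RAII guard; `planner` and its methods are hypothetical stand-ins):

```python
import contextlib

@contextlib.contextmanager
def deallocation_guard(planner):
    # Deallocate on every exit path, and never let a deallocation error
    # escape; instead fall back to the simpler, must-not-throw reset path.
    try:
        yield
    finally:
        try:
            planner.deallocate()
        except Exception:
            planner.reset()
```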

Test Plan:
**New unit tests**

`buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest`

Reviewed By: hlu1

Differential Revision: D32609915

fbshipit-source-id: 71fbe6994fd573ca6b7dd859b2e6fbd7eeabcd9e
2021-12-08 16:41:52 -08:00
3b27304d20 Fix typos in ATen README (#69170)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69170

Reviewed By: mrshenli

Differential Revision: D32957504

Pulled By: H-Huang

fbshipit-source-id: d8e613b67a864f95e45b2d45398ee71efde0c567
2021-12-08 14:02:26 -08:00
b10381f42d Port smooth_l1_loss to structured kernels (#67404)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67404

Port smooth_l1_loss to structured kernels.

Brian Hirsh authored the part of adding build_borrowing_binary_op_coerce_to_scalar to TensorIterator.

Test Plan: This commit shouldn't change the behavior. So, CI.

Reviewed By: bdhirsh, ngimel

Differential Revision: D31981147

Pulled By: alanwaketan

fbshipit-source-id: a779bb76c848eed8b725dc0e1d56b97a3bd9c158
2021-12-08 12:56:24 -08:00
497ec9d9b8 Getting NS to work with Ferraris (#68908)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68908

see description in github

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D32928449

fbshipit-source-id: ba7085b823a0ebcd0d9e40f4ac19ca0a2cac1169
2021-12-08 12:26:00 -08:00
51b6981c36 [PyTorch Tests] Split out skip logic, make changes for plugins (#67256)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67256

To change what tests can be run in various cases, the check logic should be moved to functions and variables that can be changed.

One challenge here is that decorators don't have dynamic functionality. If something is read in when imported and then changed afterwards, it will not actually change. This means we need to separate out the variables that need to be changed for our use case.

Those are put into common_distributed.py and can be changed before importing the distributed_test.py code.

The use case is to add new backends to the tests and split it into tests that can be ran on demand as a separate instance. To do so, you would change DistTestSkipCases after importing it into a launcher or a setup script and then load distributed_test.

Test Plan: Check the signals

Reviewed By: mrshenli

Differential Revision: D31906947

fbshipit-source-id: 45e3258c55f4dc34e12a468bed65280f4c25748f
2021-12-08 12:23:15 -08:00
e279963eef Remove remaining THC code (#69039)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69039

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D32872476

Pulled By: ngimel

fbshipit-source-id: 7972aacc24aef9450fb59b707ed6396c501bcb31
2021-12-08 12:18:08 -08:00
7407e3d6fd [fix] cross_entropy : fix weight with ignore_index and label_smoothing (#69511)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/69339

cc albanD mruberry jbschlosser walterddr
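
A minimal sketch of the combination being fixed, per-class weights together with `ignore_index` and `label_smoothing` in one call:

```python
import torch
import torch.nn.functional as F

logits = torch.randn(4, 5)
target = torch.tensor([0, 2, -100, 4])  # -100 entries are ignored
weight = torch.rand(5)
loss = F.cross_entropy(logits, target, weight=weight,
                       ignore_index=-100, label_smoothing=0.1)
print(loss)
```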

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69511

Reviewed By: mrshenli

Differential Revision: D32951935

Pulled By: jbschlosser

fbshipit-source-id: 482eae851861a32f96bd6231dd3448fb6d44a015
2021-12-08 12:08:33 -08:00
d44d59aa70 [BE] Enable C++ stacktraces for MultiProcessTestCase (#69175)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69175

Shows C++ stacktraces for python distributed tests that inherit from
MultiProcessTestCase. Closes https://github.com/pytorch/pytorch/issues/69168
ghstack-source-id: 145085858

Test Plan: CI

Reviewed By: H-Huang

Differential Revision: D32736872

fbshipit-source-id: 743e870eefa7a9e77c5791d0936e2ebd5c9b1016
2021-12-08 11:57:51 -08:00
adb619a193 Adding hardswish, opinfo tests to custom rules (#69399)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69399

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D32937576

Pulled By: Gamrix

fbshipit-source-id: 0e53d9e6669e70abcc744399f022a902214ef213
2021-12-08 11:56:34 -08:00
a0efa48c7b [Operator Versioning][Edge] Have operator version number available at the loading stage (#67729)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67729

1. The operator version is needed to decide whether or not to apply an upgrader. This PR makes it available at the loading stage.
2. Swap the order of parsing instructions and operators, because an instruction needs to know its operator first to decide whether to apply an upgrader (i.e., whether to change `OP` to `CALL`).
ghstack-source-id: 145082390

Test Plan:
```
buck test //caffe2/test/cpp/jit:jit
```

Reviewed By: iseeyuan

Differential Revision: D32092516

fbshipit-source-id: 853a68effaf95dca86ae46b7f7f4ee0d8e8767da
2021-12-08 11:50:46 -08:00
2808563e69 Forward fix for failing master (#69625)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69625

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D32959635

Pulled By: anjali411

fbshipit-source-id: 4d811c6a05deb991cb2886dd65b3f6059555b395
2021-12-08 11:30:38 -08:00
3e6164449f Add efficient zero tensors (#64837)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64837

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D32834987

Pulled By: anjali411

fbshipit-source-id: 20ea08ade0db0044ca633d9c1a117a6a2e65d1fd
2021-12-08 10:37:39 -08:00
30bb4e0071 Add nvidia-smi memory and utilization as native Python API (#69104)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69104

Add nvidia-smi memory and utilization as native Python API
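
A minimal usage sketch, assuming the functions land as `torch.cuda.utilization` and `torch.cuda.memory_usage` (the summary does not name them) and that pynvml is installed:

```python
import torch

if torch.cuda.is_available():
    print(torch.cuda.utilization(0))   # GPU compute utilization, in percent
    print(torch.cuda.memory_usage(0))  # memory read/write utilization, in percent
```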

Test Plan:
testing the function returns the appropriate value.
Unit tests to come.

Reviewed By: malfet

Differential Revision: D32711562

fbshipit-source-id: 01e676203299f8fde4f3ed4065f68b497e62a789
2021-12-08 10:33:23 -08:00
ee60b5ddf3 Improve efficiency of shape hash by not using tostring (#69496)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69496

tostring is expensive, and this is equivalent and faster

Test Plan: covered by lazy tensor unit tests

Reviewed By: desertfire, alanwaketan

Differential Revision: D32901050

fbshipit-source-id: 34080f415db5fd5d3817f7f2533f062a6ec07d21
2021-12-08 09:16:00 -08:00
2cb385dd6e OpInfo for nn.functional.dropout2d, revise sample inputs for dropout (#67891)
Summary:
Earlier, we were only testing inputs with the shape `(5,)` for `nn.functional.dropout`, but since it's used a lot, I feel it's a good idea to test a few more shapes, including scalars. This PR:

1. Revises sample inputs for `nn.functional.dropout`
2. Adds an OpInfo for `nn.functional.dropout2d`.

A note regarding the documentation:

Looks like `nn.functional.dropout2d` also supports inputs of shape `(H, W)` apart from `(N, C, H, W) / (C, H, W)` but the [documentation](https://pytorch.org/docs/stable/generated/torch.nn.Dropout2d.html#torch.nn.Dropout2d) doesn't mention that (`H, W` case). Should that be revised or am I missing anything here? (Filed an issue here: https://github.com/pytorch/pytorch/issues/67892)

```python
# A 2D tensor is a valid input for Dropout2d
In [11]: tensor = torch.randn((3, 4), device='cpu', dtype=torch.float32)
In [12]: dropout2d = torch.nn.Dropout2d(p=0.5)

In [13]: dropout2d(tensor)
Out[13]:
tensor([[-0.1026, -0.0000, -0.0000, -0.0000],
        [-1.5647,  0.0000, -0.0000, -0.5820],
        [-0.0000, -3.2080,  0.1164, -3.6780]])
```

Issue Tracker: https://github.com/pytorch/pytorch/issues/54261

cc: mruberry zou3519

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67891

Reviewed By: mrshenli

Differential Revision: D32628527

Pulled By: mruberry

fbshipit-source-id: 4c9b89550f1d49526e294378ce107eba9f29cabb
2021-12-08 08:54:16 -08:00
f54745a6ff add OpInfo for torch.diagflat (#65680)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65680

cc mruberry

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D31730001

Pulled By: mruberry

fbshipit-source-id: 487e41da4b043944cc5b26d6081209fb0875f4de
2021-12-08 08:49:45 -08:00
7e49f4638c add OpInfo for torch.nn.functional.kl_div (#65469)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65469

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D31111698

Pulled By: mruberry

fbshipit-source-id: 0af41a2ef2b199db3d8c63050277e72213f04565
2021-12-08 08:48:18 -08:00
8b20dde932 add python dispatch test back to CI and fix typo in test (#69565)
Summary:
The error message was changed following a PR comment. And since the test doesn't run on CI, I forgot to update the test to catch the new error message.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69565

Reviewed By: mrshenli

Differential Revision: D32932982

Pulled By: albanD

fbshipit-source-id: a1da72b0ca735e72b481bc944039233094f1c422
2021-12-08 08:44:49 -08:00
afaa184b44 [Static Runtime] Avoid evaluating expressions of Node* for interpreter fallback op (#69489)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69489

This change avoids pulling `Node*` out of `ProcessedNode*` to evaluate expressions related to `Node*` at op execution time.

A perf gain is expected but not measurable; the purpose of this change is to make SR's code more self-contained (calling more code from SR, not JIT) at execution time.

Test Plan: Existing tests

Reviewed By: mikeiovine

Differential Revision: D32893265

fbshipit-source-id: f0f397666b3556f985d45112af8fe0b08de22139
2021-12-08 08:40:30 -08:00
fc2614537b Updating quantization documentation (#68907)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68907

Added information about symmetric
qschemes and corrected an error in reference to https://github.com/pytorch/pytorch/issues/68540

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D32662033

fbshipit-source-id: 9052c597f61991934b86850fea8b6eab78397450
2021-12-08 08:32:33 -08:00
39fb855d91 [DataLoader] Implementing communication processes for Map-style DataPipes (#68549)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68549

cc SsnL VitalyFedyunin ejguan NivekT

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D32922676

Pulled By: NivekT

fbshipit-source-id: fd918a342214d617a489ac5acffff15b55e9b255
2021-12-08 07:27:01 -08:00
f3983f9c47 [quant][embdding qat] Re-land Add FX support for QAT EmbeddingBag (#69334)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69334

The original PR (#68121) broke with an incompatible qengine on macOS; this PR re-introduces the changes with a fix.

Adds FX support for the QAT EmbeddingBag operator, which previously had only eager-mode support.

Test Plan:
pytest test/quantization/fx/test_quantize_fx.py  -v -k "test_qat_embeddingbag_linear"

Imported from OSS

Reviewed By: jingsh

Differential Revision: D32815153

fbshipit-source-id: 33654ce29de6e81920bf3277a75027fe403a1eb2
2021-12-08 05:57:20 -08:00
93aa3603ee [quant][embedding qat] Re-Land Support Embedding QAT via FX API (#69333)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69333

The original PR was reverted due to a break with an incompatible qengine on macOS; this diff fixes that.

Supports the QAT workflow via the torch.fx QAT API, e.g. `prepare_qat_fx` and `convert_fx`.

Test Plan:
`pytest test/quantization/fx/test_quantize_fx.py -v -k "test_qat_embedding_linear"`

Imported from OSS

Reviewed By: jingsh

Differential Revision: D32814827

fbshipit-source-id: f7a69d2b596f1276dc5860b397c5d5d07e5b9e16
2021-12-08 05:28:07 -08:00
fc8404b5bc histc: Avoid dispatch in parallel region (#68520)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68520

Ref #56794

This changes the code from allocating 1 tensor per thread inside the
parallel region, to allocating one larger tensor outside the parallel
region and manually viewing each thread's slice of the histogram.
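
A Python sketch of the allocation pattern (the loop stands in for the parallel region; the real change is in the C++ kernel):

```python
import torch

def parallel_histc_sketch(x, bins, num_threads=4):
    # One (threads, bins) buffer allocated up front; each worker fills
    # its own row, and a final reduction combines the partial histograms.
    hist = torch.zeros(num_threads, bins)
    for tid, chunk in enumerate(x.chunk(num_threads)):
        hist[tid] = torch.histc(chunk, bins=bins, min=0.0, max=1.0)
    return hist.sum(dim=0)

print(parallel_histc_sketch(torch.rand(1000), bins=10))
```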

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D32929365

Pulled By: ngimel

fbshipit-source-id: e28da2736e849a0282b70f34d11526d3355d5bd5
2021-12-08 02:42:43 -08:00
2a38e1a76a Fix TSAN issue in TCPStore (#69590)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69590

The variable `callbackRegisteredData_` was written to without
synchronization.
ghstack-source-id: 145066862

Test Plan: waitforbuildbot

Reviewed By: rohan-varma

Differential Revision: D32938979

fbshipit-source-id: bc9a11a70680db45ece95880ae19ce2026e8a88e
2021-12-07 23:29:08 -08:00
0ce49000db Release GIL during RPC shutdown. (#69586)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69586

In certain scenarios during shutdown the following assert failed:
https://github.com/pytorch/pytorch/blob/master/torch/csrc/distributed/rpc/rpc_agent.cpp#L39.
This was due to `_reset_current_rpc_agent` not releasing the GIL.

Fixed this issue by releasing the GIL.
ghstack-source-id: 145062265

Test Plan: waitforbuildbot

Reviewed By: mrshenli

Differential Revision: D32937687

fbshipit-source-id: 980adbcc1e3799b40206f7bca6e7695ca67f0fc2
2021-12-07 23:24:57 -08:00
c236247826 OpInfo tests for (svd|pca)_lowrank (#69107)
Summary:
As per title.

While working on this I discovered several issues with these methods related to grad instabilities; I will file them and link them here later. Getting all the tests to pass despite these issues was quite painful, so sorry for the delay, mruberry!

cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69107

Reviewed By: zou3519

Differential Revision: D32920341

Pulled By: mruberry

fbshipit-source-id: 15b33e2b46acdcbff8a37d8e43e381eb55d1a296
2021-12-07 19:50:12 -08:00
e06af79136 Fix sign op converter (#69580)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69580

Fix bug in sign converter

Reviewed By: 842974287

Differential Revision: D32934661

fbshipit-source-id: f21d7c65b07ab2f0a0027939d660e56dacd9cdef
2021-12-07 19:04:51 -08:00
6b950eea27 Remove finput and fgrad_input from slow3d transpose signatures (#68899)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68899

Test Plan: Imported from OSS

Reviewed By: zou3519, albanD

Differential Revision: D32655872

Pulled By: jbschlosser

fbshipit-source-id: 963b391a489c639f98d9f634d4f4c668353c799a
2021-12-07 18:24:40 -08:00
05946051f8 [quant][graphmode] initial support for fusion pattern in backend_config_dict (#69335)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69335

This PR adds support for configuring fusion with:
"pattern", "fuser_method"

This currently only works for a simple sequence of two-op patterns; it will be extended in future PRs.
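
An illustrative sketch of a config entry (the exact schema, key nesting, pattern order, and fuser signature are assumptions here, not the authoritative format):

```python
import torch.nn as nn
import torch.nn.intrinsic as nni

backend_config_dict = {
    "configs": [
        {
            # FX quant patterns are written in reverse order: (ReLU, Linear)
            # describes a linear followed by a relu.
            "pattern": (nn.ReLU, nn.Linear),
            "fuser_method": lambda relu, linear: nni.LinearReLU(linear, relu),
        },
    ],
}
```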

Test Plan:
regresion test on linear-relu fusion:
```
python test/fx2trt/test_quant_trt.py TestQuantizeFxTRTOps
```

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D32816164

fbshipit-source-id: f300b7b96b36908cb94a50a8a17e0e15032509eb
2021-12-07 16:54:42 -08:00
2d38d37f5f use irange for loops (#69533)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69533

Modified loops in files under fbsource/fbcode/caffe2/ from the format
```
for(TYPE var=x0;var<x_max;x++)
```
to the format
```
for(const auto var: irange(xmax))
```

This was achieved by running r-barnes's loop upgrader script (D28874212), with some modifications to exclude all files under /torch/jit; a number of reversions and unused-variable warning suppressions were added by hand.

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D32837942

fbshipit-source-id: 8663037a38ade8f81bd5e983a614d197ea11f0d1
2021-12-07 16:53:27 -08:00
8a975c0106 [LT] Sync with the lazy_tensor_staging branch (#69527)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69527

- Add missing TORCH_API in class/struct declarations;
- Fix internal op declarations in ltc_ops;
- Update lazy_ts_lowering.py

Test Plan: Imported from OSS

Reviewed By: alanwaketan

Differential Revision: D32918929

Pulled By: desertfire

fbshipit-source-id: e956d51aff5ef593fdf4cd5ad2a38e38788913d8
2021-12-07 16:47:35 -08:00
049debd97d [Reland][Autograd/Checkpoint] Checkpoint implementation without reentrant autograd (#69508)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69508

Original Phabricator Diff: D32704467 (e032dae329)

Reland; the fix is to not test traditional checkpoint when the input does not require grad, as that is unsupported (as documented).

Original PR body:

Resubmission of https://github.com/pytorch/pytorch/pull/62964 with the
suggestions and tests discussed in
https://github.com/pytorch/pytorch/issues/65537.

Adds a `use_reentrant=False` flag to the `checkpoint` function. When
`use_reentrant=False` is specified, a checkpointing implementation that uses
SavedVariableHooks instead of re-entrant autograd is used. This makes it more
composable with things such as `autograd.grad` as well as DDP (still need to
add thorough distributed testing).
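
A minimal sketch of the new flag in use; the non-reentrant implementation composes with `torch.autograd.grad`, which the re-entrant one does not support:

```python
import torch
from torch.utils.checkpoint import checkpoint

x = torch.randn(4, 4, requires_grad=True)
out = checkpoint(torch.sin, x, use_reentrant=False).sum()
(grad,) = torch.autograd.grad(out, x)
print(torch.allclose(grad, x.detach().cos()))
```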

As discussed in https://github.com/pytorch/pytorch/issues/65537, the tests that we need to add are:

- [x] Gradient hooks are called once
- [x] works when input does require grads but Tensor that require grads are captures (like first layer in a nn)
- [x] works for functions with arbitrary input/output objects
- [x] distributed tests (next PR)

Note that this is only for `torch.utils.checkpoint`, if this approach overall looks good, we will do something similar for `checkpoint_sequential`.
ghstack-source-id: 144948501

Test Plan: CI

Reviewed By: zhaojuanmao

Differential Revision: D32902634

fbshipit-source-id: 2ee87006e5045e5471ff80c36a07fbecc2bea3fe
2021-12-07 16:31:23 -08:00
3456c2cbc8 Allow build_android.sh to forward Vulkan args (#69332)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69332

 ---

## Context

The `build_android.sh` script currently does not forward Vulkan configuration options, which makes it impossible to control them when running `build_pytorch_android.sh`.

## Changes

Slightly change the script to allow Vulkan configuration options to propagate from `build_pytorch_android.sh` to `build_android.sh`

Test Plan: Imported from OSS

Reviewed By: beback4u

Differential Revision: D32840908

Pulled By: SS-JIA

fbshipit-source-id: e55d89c93c996b92b743cf047f5a285bb516bbc4
2021-12-07 16:24:35 -08:00
fa39754e11 [vulkan] Disable shader optimization to avoid Validation Errors (#69331)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69331

 ---

## Context

When the optimization flag is turned on, some SPIR-V modules produced from the Vulkan compute shaders were invalid. The Vulkan Validation layer raises the following error for these modules:

```
[ UNASSIGNED-CoreValidation-Shader-InconsistentSpirv ] Object: VK_NULL_HANDLE (Type = 0) | SPIR-V module not valid: Header block 52[%52] is contained in the loop construct headed by 44[%44], but it's merge block 47[%47] is not
%52 = OpLabel
```

With the optimization flag turned off, the SPIR-V modules produced no longer report these errors in the Validation layer.

## Changes

Turns off optimization when generating SPIR-V modules to ensure correctness of the modules.

**Note that disabling SPIR-V optimization did not regress inference latency for the several models I tested**.

Test Plan: Imported from OSS

Reviewed By: beback4u

Differential Revision: D32840910

Pulled By: SS-JIA

fbshipit-source-id: 7ccb5691fd0e2d11b9c8c28ad7b83906e8163699
2021-12-07 16:24:32 -08:00
bede33e3f5 [vulkan] Add image format qualifier to glsl files (#69330)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69330

 ---

## Context

Previously, our shader files did not declare any [image format qualifiers](https://www.khronos.org/opengl/wiki/Layout_Qualifier_(GLSL)#Image_formats) for image layouts. This causes the SPIR-V modules produced to declare the [StorageImageWriteWithoutFormat](https://www.khronos.org/registry/SPIR-V/specs/unified1/SPIRV.html#_a_id_capability_a_capability) capability, which requires `shaderStorageImageWriteWithoutFormat` to be enabled in [VkPhysicalDeviceFeatures](https://www.khronos.org/registry/vulkan/specs/1.2-extensions/man/html/VkPhysicalDeviceFeatures.html). `shaderStorageImageWriteWithoutFormat` is not available on some devices, causing errors to be reported by the Vulkan validation layer.

## Changes

Vulkan shaders now declare the image format explicitly so that the SPIR-V modules produced are compatible with devices that do not have `shaderStorageImageWriteWithoutFormat` enabled.

Test Plan: Imported from OSS

Reviewed By: beback4u

Differential Revision: D32840909

Pulled By: SS-JIA

fbshipit-source-id: 76e0a0da68b423ebc74ae7e839b9cfaf57d2cd39
2021-12-07 16:23:09 -08:00
e5a1ee0e5a [quant][graphmode] Refactor fusion to use the new Pattern format (#68770)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68770

The previous fusion only works for a sequence of ops, which is not general enough for fusion patterns
defined by a subgraph; this PR refactors it to be more general.

Test Plan:
```
python test/test_quantization.py TestFuseFx
```

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D32602637

fbshipit-source-id: a7897c62081b9d71c67fb56e78484cf68deaacf6
2021-12-07 16:12:40 -08:00
1433160a36 use irange for loops 6 (#66742)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66742

Modified loops in files under fbsource/fbcode/caffe2/ from the format

`for(TYPE var=x0;var<x_max;x++)`

to the format

`for(const auto var: irange(xmax))`

This was achieved by running r-barnes's loop upgrader script (D28874212), with some modifications to exclude all files under /torch/jit; a number of reversions and unused-variable warning suppressions were added by hand.

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D31705366

fbshipit-source-id: be58222426c192406a7f93c21582c3f6f2082401
2021-12-07 16:07:50 -08:00
9a7732e852 CMake: Support dynamic codegen outputs (#68246)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68246

Currently the codegen produces a list of output files at CMake
configuration time and the build system has no way of knowing if the
outputs change. So if that happens, you basically need to delete the
build folder and re-run from scratch.

Instead, this generates the output list every time the code generation
is run and changes the output to be a `.cmake` file that gets included
in the main cmake configuration step. That means the build system
knows to re-run cmake automatically if a new output is added. So, for
example, you could change the number of shards that `Operators.cpp` is
split into, and it all just works transparently to the user.

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D32596268

Pulled By: albanD

fbshipit-source-id: 15e0896aeaead90aed64b9c8fda70cf28fef13a2
2021-12-07 15:58:06 -08:00
cd9da3267c Rationalize API exports in torch_python (#68095)
Summary:
This renames `WindowsTorchApiMacro.h` to `Export.h` to mirror the c10 header `c10/macros/Export.h` and also updates it to use `C10_EXPORT`/`C10_IMPORT`. This also removes the `THP_API` macro from `THP_export.h` which appears to serve the same purpose.

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68095

Reviewed By: jbschlosser

Differential Revision: D32810881

Pulled By: albanD

fbshipit-source-id: d6949ccd0d80d6c3e5ec1264207611fcfe2503e3
2021-12-07 15:24:37 -08:00
829b49b867 Output UnionType str rep with () instead of [] (#69502)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69502

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D32902781

Pulled By: tugsbayasgalan

fbshipit-source-id: 67a73b209575437477cdbd3eb8f685019709e99c
2021-12-07 14:17:06 -08:00
a8232ee1bc Sparse CSR CUDA: Add block torch.addmv when mat is sparse (#68708)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68708

This PR adds block CSR matrix times dense vector multiplication.
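
For context, a minimal usage sketch of the user-facing entry point, assuming a CUDA build with cuSPARSE; the blocked layout is handled internally by the kernel:

```
import torch

# y = beta * y + alpha * (A @ x), with A in sparse CSR layout
A = torch.sparse_csr_tensor(
    torch.tensor([0, 2, 4]),         # crow_indices
    torch.tensor([0, 1, 0, 1]),      # col_indices
    torch.tensor([1., 2., 3., 4.]),  # values
    size=(2, 2), device="cuda",
)
x = torch.ones(2, device="cuda")
y = torch.zeros(2, device="cuda")
out = torch.addmv(y, A, x)  # dispatches to the sparse CSR CUDA path
```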

cc nikitaved pearu cpuhrsch IvanYashchuk ngimel

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D32647694

Pulled By: cpuhrsch

fbshipit-source-id: a1c120691c4350284b156fe4259eda684b734b66
2021-12-07 14:02:59 -08:00
6df7b75186 skip ORT tensor in TensorIterator because it doesn't have storage (#68705)
Summary:
ORT tensors are similar to XLA tensors in that they don't have storage, so we extend the condition to cover ORT tensors as well.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68705

Reviewed By: zou3519

Differential Revision: D32921378

Pulled By: albanD

fbshipit-source-id: 3bda9bba2ddd95cb561a4d1cff463de652256708
2021-12-07 13:33:54 -08:00
008469c5e2 [SR] Simplify memory re-use algorithm (#68302)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68302

Implement the new memory re-use algorithm. It’s roughly based on the c2 one, but after going through many iterations it may not be a 1:1 port anymore. Also deleted the old liveness analysis.

Test Plan:
## **Re-use metrics**

`inline_cvr` (294738512_58)
**Before**
* `local`
```
Total number of managed tensors: 2660
Total number of managed output tensors: 0
Total number of unmanaged values: 3041
Total memory managed: 4601984 bytes
Total number of reused tensors: 1183
```
* `local_ro`
```
Total number of managed tensors: 1412
Total number of managed output tensors: 0
Total number of unmanaged values: 2677
Total memory managed: 29696 bytes
Total number of reused tensors: 959
```

**After**
* `local`
```
Total number of managed tensors: 2660
Total number of managed output tensors: 0
Total number of unmanaged values: 3041
Total memory managed: 4520000 bytes
Total number of reused tensors: 1198
```
* `local_ro`
```
Total number of managed tensors: 1412
Total number of managed output tensors: 0
Total number of unmanaged values: 2677
Total memory managed: 29120 bytes
Total number of reused tensors: 963
```

Reviewed By: hlu1

Differential Revision: D32370424

fbshipit-source-id: 06a8e0a295ed7a2b4d14071349c1f1e975f746bf
2021-12-07 13:25:42 -08:00
c309637923 Making cuda 11.5 workflows periodic (#69323)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/68259

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69323

Reviewed By: gchanan, malfet

Differential Revision: D32812346

Pulled By: atalman

fbshipit-source-id: 081f40802997cfb986742f1621eee4b4565660f0
2021-12-07 13:14:07 -08:00
baac51ff4a Add conda-forge dependency for cuda-11.5 (#69541)
Summary:
[NVIDIA's cudatoolkit=11.5](https://anaconda.org/nvidia/cudatoolkit/files?version=11.5.0) at the time of writing depends on libstdcxx-ng >=9.4.0, but the latest available from the official anaconda channel is [9.3.0](https://anaconda.org/anaconda/libstdcxx-ng/files?version=9.3.0), so add `-c conda-forge` as an extra dependency to resolve the problem

Should resolve problems such as https://app.circleci.com/pipelines/github/pytorch/pytorch/420750/workflows/19d6e3ce-a305-49c6-bac8-11ed43ed2b1e/jobs/16829102

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69541

Reviewed By: atalman

Differential Revision: D32921300

Pulled By: malfet

fbshipit-source-id: 09dd3575f968679f545aec739a2791dde85d37c1
2021-12-07 12:58:41 -08:00
358e908162 Add Union type to TorchScript Language Ref (#69514)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69514
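
A minimal sketch of the `Union` usage this reference documents; TorchScript refines a `Union` to a concrete type through `isinstance` checks:

```
import torch
from typing import Union

@torch.jit.script
def to_tensor(x: Union[int, torch.Tensor]) -> torch.Tensor:
    if isinstance(x, int):  # refines x to int in this branch
        return torch.tensor(x)
    return x  # here x is refined to torch.Tensor
```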

Reviewed By: tugsbayasgalan

Differential Revision: D32909371

Pulled By: gmagogsfm

fbshipit-source-id: af1c3040cd59ee913dc576cf8a8c759313f1e07f
2021-12-07 12:53:54 -08:00
c21169ea41 [JIT] optimize_for_inference on methods other than forward (#69367)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69367

Test Plan: Imported from OSS

Reviewed By: cpuhrsch

Differential Revision: D32835529

Pulled By: davidberard98

fbshipit-source-id: d3066c23d071bc2a3bee59b8ab03b6ab0e43efcf
2021-12-07 12:36:47 -08:00
60ca6776e2 [JIT] run frozen optimizations on methods other than forward (#68668)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68668

This updates run_frozen_optimizations so that it runs on methods other than forward
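
A minimal sketch of the user-visible effect via `torch.jit.freeze`, which invokes these frozen optimizations; the module below is illustrative:

```
import torch

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = torch.nn.Conv2d(3, 3, 1)
        self.bn = torch.nn.BatchNorm2d(3)

    @torch.jit.export
    def infer(self, x):
        return self.bn(self.conv(x))

    def forward(self, x):
        return self.infer(x)

m = torch.jit.freeze(torch.jit.script(M().eval()), preserved_attrs=["infer"])
# with this change, conv-bn folding also applies to `infer`, not only `forward`
```
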
ghstack-source-id: 143871758

Test Plan:
Added test in test_freezing.py
```
python3 test/test_jit.py -- test_conv_bn_folding_not_forward
```

Reviewed By: eellison

Differential Revision: D32567857

fbshipit-source-id: 75e56efad576404dc8d6897861d249573f5ccd7a
2021-12-07 12:35:30 -08:00
63470f9449 Sparse CSR: Implement unary ufuncs (with 0->0 correspondence) (#69292)
Summary:
This PR attempts to add support for unary ufuncs (with 0->0 correspondence) for Sparse CSR Layout.

Ops supported: `['abs', 'asin', 'asinh', 'atan', 'atanh', 'ceil', 'conj_physical', 'floor', 'log1p', 'neg', 'round', 'sin', 'sinh', 'sign', 'sgn', 'signbit', 'tan', 'tanh', 'trunc', 'expm1', 'sqrt', 'angle', 'isinf', 'isposinf', 'isneginf', 'isnan', 'erf', 'erfinv']`
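
A minimal sketch of the intended behavior: 0->0 ufuncs apply to the stored values only, preserving the CSR structure:

```
import torch

a = torch.tensor([[0., -1.], [2., 0.]]).to_sparse_csr()
b = torch.sin(a)   # applied elementwise to the stored values
print(b.layout)    # torch.sparse_csr
print(b.values())  # sin of the stored values; zeros stay implicit
```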

cc nikitaved pearu cpuhrsch IvanYashchuk peterbell10

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69292

Reviewed By: pbelevich

Differential Revision: D32805514

Pulled By: cpuhrsch

fbshipit-source-id: 9ae20817e77a36d3aa6c5afa532b9dc3b8cf1dd3
2021-12-07 12:07:41 -08:00
1a202b0c39 Docs: Fix broken code syntax in autograd.rst (#69362)
Summary:
The backticks around `nn.Parameters` were not rendered correctly because the word was enclosed in an italics block.
Spotted the issue on https://pytorch.org/docs/stable/notes/autograd.html#locally-disable-grad-doc.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69362

Reviewed By: zou3519

Differential Revision: D32924093

Pulled By: albanD

fbshipit-source-id: 5a310ac3f3d13a5116f7aa911817b9452eee711d
2021-12-07 12:03:15 -08:00
10229e156b trt engine inspector demo (#66683)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66683

Starting from TensorRT 8.2, we have this nice engine inspector which gives you detailed information about each TRT layer.

Test Plan:
```
buck run  mode/opt -c python.package_style=inplace scripts/yinghai/test:trt_engine_inspector
```
And you will see something like
```
{"Layers": [{
  "Name": "PWN(PWN(relu_1), add_1)",
  "LayerType": "PointWiseV2",
  "Inputs": [
  {
    "Name": "x",
    "Dimensions": [10,2],
    "Format/Datatype": "Row major linear FP16 format"
  }],
  "Outputs": [
  {
    "Name": "(Unnamed Layer* 1) [ElementWise]_output",
    "Dimensions": [10,2],
    "Format/Datatype": "Row major linear FP16 format"
  }],
  "ParameterType": "PointWise",
  "ParameterSubType": "PointWiseExpression",
  "NbInputArgs": 1,
  "InputArgs": ["arg0"],
  "NbOutputVars": 1,
  "OutputVars": ["var1"],
  "NbParams": 0,
  "Params": [],
  "NbLiterals": 4,
  "Literals": ["0.000000e+00f", "1.000000e+00f", "0.000000e+00f", "0.000000e+00f"],
  "NbOperations": 2,
  "Operations": ["const auto var0 = pwgen::iMax(arg0, literal0);", "const auto var1 = pwgen::iPlus(arg0, var0);"],
  "TacticValue": "0x0"
},{
  "Name": "matmul_1",
  "LayerType": "MatrixMultiply",
  "Inputs": [
  {
    "Name": "(Unnamed Layer* 1) [ElementWise]_output",
    "Dimensions": [10,2],
    "Format/Datatype": "Row major linear FP16 format"
  },
  {
    "Name": "y",
    "Dimensions": [10,2],
    "Format/Datatype": "Row major linear FP16 format"
  }],
  "Outputs": [
  {
    "Name": "output0",
    "Dimensions": [10],
    "Format/Datatype": "Row major linear FP16 format"
  }],
  "ParameterType": "MatrixMultiply",
  "MatrixOpA": "VECTOR",
  "MatrixOpB": "VECTOR",
  "Alpha": 1,
  "Beta": 0,
  "TacticValue": "0x1"
}],
"Bindings": ["x"
,"y"
,"output0"
]}
```

Reviewed By: RoshanPAN, wushirong

Differential Revision: D31681405

fbshipit-source-id: 31f912c37812ac17c6421073e0c35e512463ba6e
2021-12-07 11:50:09 -08:00
aa9fbb9ae9 [JIT] check stack size after calling operator (#68788)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68788

In debug mode, this should throw errors for ops where the wrong number of outputs is returned (i.e., the number of values left on the stack is different from the number shown in the schema)

Test Plan:
Run this in debug mode and verify that it doesn't throw an assert
```
import torch

class Thing(torch.nn.Module):
    @torch.jit.export
    def en(self, x: torch.Tensor):
        return torch.add(x, 2.0)

    def forward(self, x: torch.Tensor, y: torch.Tensor):
        a = torch.mm(x, y)
        b = torch.nn.functional.gelu(a)
        c = self.en(b)
        return c.std_mean()

if __name__ == '__main__':
    unsc = Thing()
    thing = torch.jit.script(unsc)
    x = torch.randn(4, 4)
    y = torch.randn(4, 4)
    std, mean = thing.forward(x, y)
    print(std, mean)
    print(str(thing.forward.graph))
```

Reviewed By: gchanan

Differential Revision: D32625256

Pulled By: davidberard98

fbshipit-source-id: 61d5ec0c5a9f8b43706257119f4f524bb9dbe6f5
2021-12-07 11:43:50 -08:00
bd8d4195a6 [DataPipe] Small change to generation script and update to DataPipe .pyi file (#69392)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69392

cc VitalyFedyunin ejguan NivekT

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D32849463

Pulled By: NivekT

fbshipit-source-id: b6d419fbe0e4cc9d718f21fb3fe886f721f618d3
2021-12-07 11:40:53 -08:00
fdfdafd1e6 [DataPipe] Removing usage of unbatch_level from .batch interface and DataFrame (#69393)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69393

cc VitalyFedyunin ejguan NivekT

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D32849461

Pulled By: NivekT

fbshipit-source-id: 16abbe289ad2092faaa029fd78f3d6924e7b2ff4
2021-12-07 11:40:50 -08:00
357160e68e [DataPipe] Unifying API - removing nesting_level argument from FilterIterDataPipe (#69391)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69391

As part of the efforts to unify the APIs across different data backends (e.g. TorchData, TorchArrow), we are making changes to different DataPipes' APIs. In this PR, we are removing the input argument `nesting_level` from `FilterIterDataPipe`.
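
A minimal sketch of the simplified API after this change (the `IterableWrapper` import path is an assumption for this sketch):

```
from torch.utils.data.datapipes.iter import IterableWrapper

# filter_fn now always applies at the top level; no nesting_level argument
dp = IterableWrapper(range(10)).filter(lambda x: x % 2 == 0)
print(list(dp))  # [0, 2, 4, 6, 8]
```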

cc VitalyFedyunin ejguan NivekT

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D32849462

Pulled By: NivekT

fbshipit-source-id: 91cf1dc03dd3d3cbd7a9c6ccbd791ade91355f30
2021-12-07 11:40:46 -08:00
4478b14e4c [DataPipe] Unifying API - removing nesting_level argument from MapperIterDataPipe (#69390)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69390

As part of the efforts to unify the APIs across different data backends (e.g. TorchData, TorchArrow), we are making changes to different DataPipes' APIs. In this PR, we are removing the input argument `nesting_level` from `MapperIterDataPipe`.

cc VitalyFedyunin ejguan NivekT

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D32849465

Pulled By: NivekT

fbshipit-source-id: 963ce70b84a7658331d126e5ed9fdb12273c8e1f
2021-12-07 11:39:08 -08:00
9cb52327a8 [quant][refactor] Move pattern type definition to ao/quantization/utils.py (#68769)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68769

As titled; we want to use this type in fuser_method_mapping in later PRs

Test Plan:
no change to logic, just regression test on ci
```
python test/test_quantization.py
```

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D32602636

fbshipit-source-id: 15b95241431dfca9b1088d0920bf75705b37aa9a
2021-12-07 11:00:22 -08:00
976b076715 [iOS] Add LibTorch nightly build (#69341)
Summary:
Add LibTorch nightly build for using in LibTorchvision.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69341

Test Plan:
CI jobs: https://fburl.com/lbyjzpxz
1. Validate lib is uploaded to link https://ossci-ios-build.s3.amazonaws.com/libtorch_ios_nightly_build.zip
2. Download lib from the link and validate `version.txt` is correct
3. Test the lib in HelloWorld demo
Imported from OSS

Reviewed By: xta0

Differential Revision: D32901836

Pulled By: hanton

fbshipit-source-id: 8622c3e6052cec2039bc15dea0d495ec1a8186cb
2021-12-07 10:07:28 -08:00
3edf1b6cee [PyTorch] Avoid no-op shared_ptr dtor when constructing tuple (#69337)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69337

See note in code.
ghstack-source-id: 144657751

Test Plan:
Ran PyTorchFeatureConversionBenchmark 5x before/after:

```
swolchok@devbig032 ~/f/fbcode> for x in (seq 5); sudo scripts/bertrand/noise/denoise.sh /tmp/pytorch_feature_conversion_benchmark.Dec2CacheTupleTypes ; end                                                                                                                                                                                              (pytorch-ort-bert)
============================================================================
sigrid/lib/features/tests/PyTorchFeatureConversionBenchmark.cpprelative  time/iter  iters/s
============================================================================
PyTorchFeatureConversionDenseBenchmark                       2.39us  418.75K
PyTorchFeatureConversionIdListBenchmark                      3.59us  278.91K
PyTorchFeatureConversionIdScoreListBenchmark                 5.01us  199.51K
============================================================================
============================================================================
sigrid/lib/features/tests/PyTorchFeatureConversionBenchmark.cpprelative  time/iter  iters/s
============================================================================
PyTorchFeatureConversionDenseBenchmark                       2.42us  413.80K
PyTorchFeatureConversionIdListBenchmark                      3.56us  280.60K
PyTorchFeatureConversionIdScoreListBenchmark                 5.05us  198.15K
============================================================================
============================================================================
sigrid/lib/features/tests/PyTorchFeatureConversionBenchmark.cpprelative  time/iter  iters/s
============================================================================
PyTorchFeatureConversionDenseBenchmark                       2.41us  414.25K
PyTorchFeatureConversionIdListBenchmark                      3.55us  281.59K
PyTorchFeatureConversionIdScoreListBenchmark                 5.02us  199.09K
============================================================================
============================================================================
sigrid/lib/features/tests/PyTorchFeatureConversionBenchmark.cpprelative  time/iter  iters/s
============================================================================
PyTorchFeatureConversionDenseBenchmark                       2.39us  417.68K
PyTorchFeatureConversionIdListBenchmark                      3.55us  281.65K
PyTorchFeatureConversionIdScoreListBenchmark                 5.05us  198.06K
============================================================================
============================================================================
sigrid/lib/features/tests/PyTorchFeatureConversionBenchmark.cpprelative  time/iter  iters/s
============================================================================
PyTorchFeatureConversionDenseBenchmark                       2.39us  417.54K
PyTorchFeatureConversionIdListBenchmark                      3.56us  281.03K
PyTorchFeatureConversionIdScoreListBenchmark                 5.05us  198.13K
============================================================================
swolchok@devbig032 ~/f/fbcode> for x in (seq 5); sudo scripts/bertrand/noise/denoise.sh /tmp/pytorch_feature_conversion_benchmark.Dec2TupleConstruction ; end                                                                                                                                                                                            (pytorch-ort-bert)
============================================================================
sigrid/lib/features/tests/PyTorchFeatureConversionBenchmark.cpprelative  time/iter  iters/s
============================================================================
PyTorchFeatureConversionDenseBenchmark                       2.38us  420.38K
PyTorchFeatureConversionIdListBenchmark                      3.53us  282.90K
PyTorchFeatureConversionIdScoreListBenchmark                 4.99us  200.41K
============================================================================
============================================================================
sigrid/lib/features/tests/PyTorchFeatureConversionBenchmark.cpprelative  time/iter  iters/s
============================================================================
PyTorchFeatureConversionDenseBenchmark                       2.37us  421.54K
PyTorchFeatureConversionIdListBenchmark                      3.54us  282.27K
PyTorchFeatureConversionIdScoreListBenchmark                 4.99us  200.28K
============================================================================
============================================================================
sigrid/lib/features/tests/PyTorchFeatureConversionBenchmark.cpprelative  time/iter  iters/s
============================================================================
PyTorchFeatureConversionDenseBenchmark                       2.38us  420.99K
PyTorchFeatureConversionIdListBenchmark                      3.56us  280.56K
PyTorchFeatureConversionIdScoreListBenchmark                 5.08us  196.91K
============================================================================
============================================================================
sigrid/lib/features/tests/PyTorchFeatureConversionBenchmark.cpprelative  time/iter  iters/s
============================================================================
PyTorchFeatureConversionDenseBenchmark                       2.37us  421.48K
PyTorchFeatureConversionIdListBenchmark                      3.54us  282.87K
PyTorchFeatureConversionIdScoreListBenchmark                 5.00us  199.88K
============================================================================
============================================================================
sigrid/lib/features/tests/PyTorchFeatureConversionBenchmark.cpprelative  time/iter  iters/s
============================================================================
PyTorchFeatureConversionDenseBenchmark                       2.38us  419.69K
PyTorchFeatureConversionIdListBenchmark                      3.56us  280.68K
PyTorchFeatureConversionIdScoreListBenchmark                 4.97us  201.23K
============================================================================
```

Looks like maybe around 1% faster?

Reviewed By: hlu1

Differential Revision: D32817592

fbshipit-source-id: 4b015dc993b26a92e45a3673e14fde32105a34fa
2021-12-07 09:39:15 -08:00
617a3bd944 GHA: Re enable mac json uploads (#69387)
Summary:
Removed JSON uploading to S3 for Mac GHA workflows as the AWS credentials were not working.

This PR tries uploading them to GitHub instead, which works: https://github.com/pytorch/pytorch/runs/4413940318?check_suite_focus=true

They should show up on the HUD page: hud.pytorch.org/pr/69387 with the name test-jsons after CI completes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69387

Reviewed By: seemethere

Differential Revision: D32885204

Pulled By: janeyx99

fbshipit-source-id: 3d25ead6d464144a228fdf8ead5172de3ed8430e
2021-12-07 08:25:51 -08:00
945d2e380c [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D32910817

fbshipit-source-id: 60d0cb10412e1a37a0249bb223b75855c5596dbd
2021-12-07 08:11:09 -08:00
4670f0f2c5 Set non-default backend names to lower case (#69400)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69400

Hopefully this makes naming more consistent. Without this change, some tests will fail for plugins since values can be set to upper case in some cases. This should prevent that and make lookup and comparison consistent.

Test Plan: Check the signals. There is no specific test for this, but all tests should pass.

Reviewed By: mrshenli

Differential Revision: D32836529

fbshipit-source-id: 1b7d2b64e04fe0391b710aa6ed6d1e47df9027a3
2021-12-07 07:58:46 -08:00
2dd46d3aa9 FX: ensure node stack trace survives copying (#69368)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69368

Before this PR, copying a node would lose the stack trace. This PR
ensures that the stack trace is preserved across copies.

This is useful because quantization passes would like to start
allowing the user to preserve stack traces, and we use the copy
behavior.
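
A minimal sketch of the behavior being fixed; stack trace recording must be enabled on the tracer:

```
import torch
import torch.fx

def f(x):
    return torch.relu(x + 1)

tracer = torch.fx.Tracer()
tracer.record_stack_traces = True  # populate node.stack_trace while tracing
graph = tracer.trace(f)

copied = torch.fx.Graph()
copied.graph_copy(graph, {})  # after this PR, stack_trace survives the copy
for node in copied.nodes:
    print(node.op, node.stack_trace is not None)
```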

Test Plan:
```
python test/test_fx.py TestFX.test_stack_traces
```

Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D32835248

fbshipit-source-id: 91610fd8d05f5683cfa5e11fb6f9f3feacb8e241
2021-12-07 06:18:38 -08:00
ca945d989a [quant][graphmode][fx] Add default_replay_qconfig for ops like reshape (#69249)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69249

This PR adds default_replay_qconfig and default_replay_observer, which are used
when we want to configure an operator to reuse the observer from its input. If the input
Tensor for the operator is not observed, we will not observe the output of this operator either;
if the input Tensor is observed, we will observe the output of the operator with the same observer.

e.g.

```
x1 = x0.reshape()
```
if reshape is configured with default_replay_qconfig:
1. if x0 is observed with observer_0, we'll observe x1 with the same observer instance
2. if x0 is not observed, we won't observe x1 either

Test Plan:
```
python test/test_quantization.py TestQuantizeFx.test_replay_qconfig
```

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D32774723

fbshipit-source-id: 26862b2bc181d0433e2243daeb3b8f7ec3dd33b2
2021-12-06 22:56:14 -08:00
8b1e49635a [JIT] Separate GPU implementation of frozen_conv_add_relu_fusion.cpp (#68149)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68149

JIT optimization passes are part of the CPU-only build (i.e. necessary GPU flags are not passed in). This separates the implementation of frozen_conv_add_relu_fusion so that the GPU-enabled implementation is registered at runtime (if it is available)

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D32773666

Pulled By: davidberard98

fbshipit-source-id: c83dbb88804bdef23dc60a6299acbfa76d5c1495
2021-12-06 21:06:25 -08:00
e55b939732 Enable build-split for all CUDA-11.x version (#69494)
Summary:
Should fix cu115 wheel binary builds, see https://hud.pytorch.org/ci/pytorch/pytorch/nightly?name_filter=cu115

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69494

Reviewed By: atalman

Differential Revision: D32899994

Pulled By: malfet

fbshipit-source-id: bb0e05a30c9360c75d2cfd9d4e0d40ed9a3b2830
2021-12-06 20:39:06 -08:00
bd8a4a9372 [wip][quant][graphmode] produce reference pattern for binary ops and then rewrite to quantized op (#68229)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68229

This PR makes BinaryOpQuantizeHandler always produce reference patterns, and we rely on the
subgraph_rewriter to rewrite the reference quantized patterns to quantized ops

Test Plan:
```
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
```

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D32537714

fbshipit-source-id: 456086b308c4446840d8d37997daa6f8f8068479
2021-12-06 20:20:15 -08:00
bcd0303834 [fx2trt][easy] add sparse flag to TRTInterpreter (#69495)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69495

As the title. Separated from D30589161.

Test Plan: Tested in D30589161.

Reviewed By: maratsubkhankulov, wushirong

Differential Revision: D32898927

fbshipit-source-id: 89e18d2eb19b43fbab92b4988d0a21d21cff2d1f
2021-12-06 18:57:08 -08:00
3211588308 [fx2trt] Separate sign from trunc_div and use it for acc_ops.sign (#69486)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69486

As the title. Migrate from sign plugin to native trt layers. All the layers are fused into one single PWN kernel in TRT.

```
[TensorRT] VERBOSE: Engine Layer Information:
Layer(PointWiseV2): PWN(sign_1_sign_rhs + sign_1_sign_rhs_broadcast, PWN(PWN(sign_1_floor_div*2_rhs + sign_1_floor_div*2_rhs_broadcast, PWN(PWN(PWN([UNARY]-[acc_ops.sign]-[sign_1_prod_abs], [UNARY]-[acc_ops.sign]-[sign_1_prod_abs_exp]), PWN([UNARY]-[acc_ops.sign]-[sign_1_prod_exp], [ELEMENTWISE]-[acc_ops.sign]-[sign_1_exp_floor_div])), [ELEMENTWISE]-[acc_ops.sign]-[sign_1_floor_div*2])), [ELEMENTWISE]-[acc_ops.sign]-[sign_1_sign])), Tactic: 0, x[Float(2,2,3)] -> output0[Float(2,2,3)]
```

Test Plan: CI

Reviewed By: wushirong

Differential Revision: D32887537

fbshipit-source-id: ac250b5197e340319de29653a27f879a0e1ea9cd
2021-12-06 16:54:44 -08:00
e23827e6d6 [fx2trt] [Prep for release] Add type hints to converters and separate main files (#69458)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69458

1. Added type hints to acc ops converters.
2. Moved some of the classes/logic in fx2trt.py into separate files (input_tensor_spec.py, trt_module.py, converter_registry.py).
3. Added imports in `__init__.py` so that users can just call `from torch.fx.experimental.fx2trt import xxx` instead of `experimental.fx2trt.fx2trt`.

Test Plan: CI

Reviewed By: wushirong

Differential Revision: D32884637

fbshipit-source-id: e3e1e597edb9a08b47b4595bd371f570f2f3c9b6
2021-12-06 16:54:41 -08:00
a2d1cadfdb [fx2trt] Add a helper function to generate specs for dynamic batch size (#69405)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69405

Add a helper function that will generate input tensor specs with dynamic batch size.

Note that this function currently requires the batch dimension of all these tensors to be the first dimension.

Also add more doc strings.
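
A usage sketch; the helper name comes from the test plan below, while the argument name and its (min, opt, max) interpretation are assumptions:

```
import torch
from torch.fx.experimental.fx2trt import InputTensorSpec

tensors = [torch.randn(2, 3), torch.randn(2, 5)]  # batch dim must come first
specs = InputTensorSpec.from_tensors_with_dynamic_batch_size(
    tensors,
    batch_size_range=(1, 2, 8),  # assumed (min, opt, max) batch sizes
)
```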

Test Plan:
Added unit tests.
```
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/7881299413036896
    ✓ ListingSuccess: caffe2/test/fx2trt/core:test_input_tensor_spec - main (7.455)
    ✓ Pass: caffe2/test/fx2trt/core:test_input_tensor_spec - test_from_tensor (caffe2.test.fx2trt.core.test_input_tensor_spec.TestTRTModule) (7.047)
    ✓ Pass: caffe2/test/fx2trt/core:test_input_tensor_spec - test_from_tensors_with_dynamic_batch_size (caffe2.test.fx2trt.core.test_input_tensor_spec.TestTRTModule) (7.066)
    ✓ Pass: caffe2/test/fx2trt/core:test_input_tensor_spec - test_from_tensors (caffe2.test.fx2trt.core.test_input_tensor_spec.TestTRTModule) (7.181)
Summary
  Pass: 3
  ListingSuccess: 1
```

Wait for CI to verify if this unit test can run without RE.

Reviewed By: yinghai, kflu

Differential Revision: D32853947

fbshipit-source-id: 19713e8ad5478c945385c7013f7a1b9894151fea
2021-12-06 16:54:39 -08:00
cfe3cbb392 [fx2trt] Use weights shape as normalize shape in layer norm (#69401)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69401

As the title. In PyTorch, these two shapes are the same. The normalized shape might be retrieved from tensor.size(), and with explicit batch dim that won't work right now.

Test Plan:
```
    ✓ ListingSuccess: caffe2/test/fx2trt/converters:test_layernorm - main (7.018)
    ✓ Pass: caffe2/test/fx2trt/converters:test_layernorm - test_layer_norm_with_dynamic_shape_0_1d_normalized_shape (caffe2.test.fx2trt.converters.acc_op.test_layernorm.TestLayerNormConverter) (22.945)
    ✓ Pass: caffe2/test/fx2trt/converters:test_layernorm - test_layer_norm_0_1d_normalized_shape (caffe2.test.fx2trt.converters.acc_op.test_layernorm.TestLayerNormConverter) (23.203)
    ✓ Pass: caffe2/test/fx2trt/converters:test_layernorm - test_layer_norm_with_dynamic_shape_1_2d_normalized_shape (caffe2.test.fx2trt.converters.acc_op.test_layernorm.TestLayerNormConverter) (42.549)
    ✓ Pass: caffe2/test/fx2trt/converters:test_layernorm - test_layer_norm_1_2d_normalized_shape (caffe2.test.fx2trt.converters.acc_op.test_layernorm.TestLayerNormConverter) (43.544)
    ✓ Pass: caffe2/test/fx2trt/converters:test_layernorm - test_layer_norm_with_dynamic_shape_2_4d_input_shape (caffe2.test.fx2trt.converters.acc_op.test_layernorm.TestLayerNormConverter) (45.958)
    ✓ Pass: caffe2/test/fx2trt/converters:test_layernorm - test_layer_norm_2_4d_input_shape (caffe2.test.fx2trt.converters.acc_op.test_layernorm.TestLayerNormConverter) (47.027)
Summary
  Pass: 6
  ListingSuccess: 1
```

Reviewed By: yinghai

Differential Revision: D32853359

fbshipit-source-id: 8a122fe3348a1d9ad07b48647ec6166d171d113a
2021-12-06 16:53:29 -08:00
59e98b66ac Revert D32704467: [Autograd/Checkpoint] Checkpoint implementation without reentrant autograd
Test Plan: revert-hammer

Differential Revision:
D32704467 (e032dae329)

Original commit changeset: 6eea1cce6b93

fbshipit-source-id: 1a788c1fd57cee46bba82e216e6162d078359cc2
2021-12-06 16:33:32 -08:00
bc89528931 Initialize upgrader and operator version files (#68772)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68772

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D32603257

Pulled By: tugsbayasgalan

fbshipit-source-id: 5a3d9ba4d0a01ddff4ff6ebdf7bb88ec125765b0
2021-12-06 16:27:52 -08:00
9e678446a2 [Pytorch Edge] Add new_empty_strided to tracer (#69492)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69492

We already added empty, and this is another weird variation that sometimes pops up. How to trigger it is unclear, so we're just adding it for now.

Test Plan: ran tracer

Differential Revision: D32896522

fbshipit-source-id: 38627d8efc48ef240100ccdbd94c0e7208b0b466
2021-12-06 15:28:13 -08:00
65b0f389d2 [PyTorch][Distributed] Use auto-grad enabled collections for the shared linear op to enable backward grad calculation (#68096)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68096

We replace all c10d APIs in the sharded linear op with the auto-grad enabled collectives, so that we can enable backward propagation (grad calculation for the sharded linear).
ghstack-source-id: 144882914

Test Plan: Unit test + CI

Reviewed By: pritamdamania87

Differential Revision: D32177341

fbshipit-source-id: 1919e8ca877bdc79f4cdb0dc2a82ddaf6881b9f1
2021-12-06 15:17:08 -08:00
7c2489bdae [PyTorch][Distributed] Enable Reduce Scatter and modify all_to_all for sharded linear with more test cases. (#68786)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68786

To enable auto grad for the sharded linear, we found we needed to make some changes to the current nn function API (the c10d API with auto grad enabled). We made the following changes (see the sketch after this list):

1. Add a new API `reduce_scatter`, since we need it in row-wise sharding.
2. Modify the `all_to_all` API to make sure it is consistent with the ones in distributed_c10d.py.
3. Found that the cpp input params of `reduce_scatter` were missing an input param; added more unit tests to cover these cases.
4. Sync the NN test from gloo to nccl.
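
A minimal sketch of the autograd-enabled collective added in item 1; it assumes an already-initialized process group:

```
import torch
import torch.distributed as dist
import torch.distributed.nn.functional as dist_nn_f

# assumes dist.init_process_group(...) has already been called
world_size = dist.get_world_size()
inputs = [torch.ones(2, requires_grad=True) for _ in range(world_size)]
output = torch.empty(2)
out = dist_nn_f.reduce_scatter(output, inputs)  # differentiable, unlike dist.reduce_scatter
out.sum().backward()  # gradients flow back to `inputs`
```
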
ghstack-source-id: 144860208

Test Plan: CI + Unit Test

Reviewed By: pritamdamania87

Differential Revision: D32569674

fbshipit-source-id: 9bd613f91bbf7a39eede0af32a5a5db0f2ade43b
2021-12-06 13:38:58 -08:00
e032dae329 [Autograd/Checkpoint] Checkpoint implementation without reentrant autograd (#69027)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69027

Resubmission of https://github.com/pytorch/pytorch/pull/62964 with the
suggestions and tests discussed in
https://github.com/pytorch/pytorch/issues/65537.

Adds a `use_reentrant=False` flag to the `checkpoint` function. When
`use_reentrant=False` is specified, a checkpointing implementation that uses
SavedVariableHooks instead of re-entrant autograd is used. This makes it more
composable with things such as `autograd.grad` as well as DDP (we still need to
add thorough distributed testing).
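
A minimal usage sketch of the new flag:

```
import torch
from torch.utils.checkpoint import checkpoint

def block(x):
    return torch.relu(x @ x.t())

x = torch.randn(4, 4, requires_grad=True)
y = checkpoint(block, x, use_reentrant=False)  # non-reentrant variant
# composable with autograd.grad, unlike the re-entrant implementation
(grad,) = torch.autograd.grad(y.sum(), x)
```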

As discussed in https://github.com/pytorch/pytorch/issues/65537, we have added
the following tests:

-[ ] Gradient hooks are called once
ghstack-source-id: 144644859

Test Plan: CI

Reviewed By: pbelevich

Differential Revision: D32704467

fbshipit-source-id: 6eea1cce6b935ef5a0f90b769e395120900e4412
2021-12-06 13:29:37 -08:00
4d81175a07 add VSX dispatch for fft_fill_with_conjugate_symmetry_stub (#68914)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/68057.

As discussed in https://github.com/pytorch/pytorch/issues/68057 adding change to provide the missing dispatch for VSX.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68914

Reviewed By: seemethere

Differential Revision: D32696773

Pulled By: malfet

fbshipit-source-id: f1b70ab85bf9fb1c0119cc70d6125b8801d95669
2021-12-06 13:04:59 -08:00
f87faf3c29 .github: Volume mount local netrc for docs push (#69472)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69472

Neglected the fact that the actual push for these variables is happening
inside of a docker container, this should help resolve that issue

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D32889583

Pulled By: seemethere

fbshipit-source-id: d0ef213787694ab1a7e9fb508c58d2f53ff218c3
2021-12-06 12:11:23 -08:00
1859e5f000 [FSDP] Enforce wrapper_cls as a mandatory kwarg in enable_wrap. (#69358)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69358

Enforces and raises an error earlier if wrapper_cls is not provided as a
kwarg to the enable_wrap() function. Also improves the documentation.
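
A minimal sketch of the now-mandatory kwarg; `MyModule` is a hypothetical nn.Module:

```
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp.wrap import enable_wrap, wrap

# wrapper_cls must now be passed explicitly; omitting it raises an error early
with enable_wrap(wrapper_cls=FSDP):
    sharded = wrap(MyModule())  # MyModule is a placeholder module
```
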
ghstack-source-id: 144807950

Test Plan: CI

Reviewed By: zhaojuanmao

Differential Revision: D32826963

fbshipit-source-id: d1b98df021e86d3d87a626e82facf6230b571a55
2021-12-06 12:11:20 -08:00
00245fed96 [FSDP] Kill config_auto_wrap_policy, remove policy from enable_wrap, (#69357)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69357

Since we only want to support enable_wrap() and wrap() manual wrapping
APIs without them accepting auto_wrap_policy, remove all this unneeded code.
ghstack-source-id: 144807951

Test Plan: CI

Reviewed By: zhaojuanmao

Differential Revision: D32826318

fbshipit-source-id: 6526e700ebdf132cbb10439698f5c97ce083cd3d
2021-12-06 12:11:17 -08:00
c95277e92a [FSDP] Remove auto_wrap() (#69356)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69356

Per title
ghstack-source-id: 144807949

Test Plan: CI

Reviewed By: zhaojuanmao

Differential Revision: D32816150

fbshipit-source-id: 6b4eacc63edd267bc1eb8a1c1d6c753bc581d63a
2021-12-06 12:11:14 -08:00
f333cde14e [FSDP] Make recursive_wrap, wrap APIs independent of ConfigAutoWrap. (#68776)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68776

Makes these APIs independent of ConfigAutoWrap so that they can be
used by FSDP ctor without it knowing about ConfigAutoWrap.

Also gets us one step closer to killing ConfigAutoWrap.recursive_wrap and
auto_wrap(), as we will only support enable_wrap() and wrap() moving forward.

Will test via unittests and FSDP benchmarks to ensure the wrapping still works.
ghstack-source-id: 144807948

Test Plan: CI

Reviewed By: zhaojuanmao

Differential Revision: D32604021

fbshipit-source-id: 54defc0cd90b16b5185a8c1294b39f75c06ffd21
2021-12-06 12:09:49 -08:00
456139d0ae FX pass: fuse_sparse_matmul_add (#69340)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69340

- An FX pass to fuse ops resulting from addmm(a, b.t())
- Used to enable structured sparsity using TRT

Reviewed By: 842974287

Differential Revision: D32456684

fbshipit-source-id: 601826af216cea314ee85ed522d5c54a5151d720
2021-12-06 12:07:02 -08:00
68b5c86e65 [Vulkan] Implement slice operator (#69382)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69382

Implemented `slice` operator on the Vulkan backend:
* Supports only <= 4D tensors.
* `aten::slice.Tensor` is executed internally when indexing a Tensor.
* Slicing means selecting elements from the tensor with the `:` slice operator, addressing each element by its index.
* Indexing starts at 0, and `end` is exclusive. In the example below, we get the elements from the very start of the tensor up to index 4 (exclusive).
```
tensor = torch.tensor([2, 4, 1, 7, 0, 9])
print(tensor[ : 4])
# Outputs- tensor([2, 4, 1, 7])
```
* Generalized input tensors to 4D ones to simplify input/output texture handling. For example, {2, 3} is treated as {1,1,2,3} internally.
* Negative `start` and `end` inputs are allowed.
* CPU implementation: [/aten/src/ATen/native/TensorShape.cpp::slice()](3e45739543/aten/src/ATen/native/TensorShape.cpp (L1262))
* For **width** dimension, use `vkCmdCopyImage` API,
  * input texture size = `{x,y,z}`
  * if `step` is 1, copy a region from the input texture to the output texture once where
    * source offset = `{start,0,0}`
    * destination offset = `{0,0,0}`
    * copy extents = `{end-start,y,z}`
    * call `vkCmdCopyImage` API
  * if `step` is not 1, do for-loop from x=`start` to `end-1` by `step` (also from x_new=`0` to `end-start-1`) where
    * x_max = x
    * copy extents = `{1,y,z}`
    * if (x >= x_max) continue; // out of range
    * source offset = `{x,0,0}`
    * destination offset = `{x_new,0,0}`
    * call `vkCmdCopyImage` API
* For **height** dimension, use `vkCmdCopyImage` API,
  * input texture size = `{x,y,z}`
  * if `step` is 1, copy a region from the input texture to the output texture once where
    * source offset = `{0,start,0}`
    * destination offset = `{0,0,0}`
    * copy extents = `{x,end-start,z}`
    * call `vkCmdCopyImage` API
  * if `step` is not 1, do for-loop from y=`start` to `end-1` by `step` (also from y_new=`0` to `end-start-1`) where
    * y_max = y
    * copy extents = `{x,1,z}`
    * if (y >= y_max) continue; // out of range
    * source offset = `{0,y,0}`
    * destination offset = `{0,y_new,0}`
    * call `vkCmdCopyImage` API
* For **batch** and **feature**(channel) dimensions, we build up shader operations from the output texture point of view to avoid the nondeterministic order of GPU shader operations between texels. See [incoherent memory access](https://www.khronos.org/opengl/wiki/Memory_Model#Incoherent_memory_access)
  * `b,c,h,w` = input tensor dims (NCHW)
  * `b1,c1,h1,w1` = output tensor dims (NCHW)
  * `posIn` = position (x,y,z) for input texture
  * `posOut` = position (x,y,z) for output texture
  * `inval` = input texel value
  * `outval` = output texel value
  * `max_dst_index` = batch size of output tensor * channel size of output tensor
  * `n` = end - start
  * `i` = index of input texel (0...3) and `j` = index of output texel (0..3)
  * Pseudo code:
```
for (uint j = 0; j < 4; ++j) {
  dst_index = posOut.z * 4 + j;
  if (dst_index >= max_dst_index) {
    save outval to output texture at posOut
    break; // out of range
  }

  b1 = int(dst_index / channel size of output tensor);
  c1 = dst_index % channel size of output tensor;
  h1 = posOut.y;
  w1 = posOut.x;

  b=b1
  c=c1
  h=h1
  w=w1

  if (dim==0) { // batch
    b=start+step*b1;
  } else { // feature(channel)
    c=start+step*c1
  }

  src_index = b * channel size of input tensor + c;
  posIn.x = int(w);
  posIn.y = int(h);
  posIn.z = int(src_index / 4);
  i = (src_index % 4);
  read inval from input texture at posIn
  outval[j] = inval[i]
  if (j == 3) {
    save outval to output texture at posOut
  }
}
```
* Error/edge cases:
  * Vulkan backend doesn't support zero-sized slice. It throws an exception when allocating a Vulkan buffer if any dim size is zero.
  * The slice step should be positive.
* Generalized test cases with different dim size tensors for batch, feature, height and width. For example, a 4D tensor slicing by dim=width:
```
tensor {2, 3, 40, 50} slicing with dim=3, start=10, end=30, step=1 <-> tensor indexing by [:,:,:,10:30:1]
tensor {2, 3, 40, 50} slicing with dim=3, start=10, end=30, step=7 <-> tensor indexing by [:,:,:,10:30:7]
tensor {2, 3, 40, 50} slicing with dim=3, start=10, end=50, step=2 <-> tensor indexing by [:,:,:,10:50:2] with end=out of range
tensor {2, 3, 40, 50} slicing with dim=3, start=-60, end=60, step=2 <-> tensor indexing by [:,:,:,-60:60:2] with start/end=out of range
tensor {2, 3, 40, 50} slicing with dim=3, start=-30, end=-10, step=2 <-> tensor indexing by [:,:,:,-30:-10:1] with negative start/end
tensor {2, 3, 40, 50} slicing with dim=3, start=0, end=INT64_MAX, step=2 <-> tensor indexing by [:,:,:,0:9223372036854775807:1] with end=INT64_MAX
tensor {2, 3, 40, 50} slicing with dim=3, start=-10, end=INT64_MAX, step=2 <-> tensor indexing by [:,:,:,-10:9223372036854775807:1] with negative start and end=INT64_MAX
tensor {2, 3, 40, 50} slicing with dim=3, start=INT64_MIN, end=INT64_MAX, step=2 <-> tensor indexing by [:,:,:,-9223372036854775808:9223372036854775807:1] with start=INT64_MIN and end=INT64_MAX
tensor {2, 3, 40, 50} slicing with dim=3, start=empty, end=empty, step=2 <-> tensor indexing by [:,:,:,::1] with empty start/end
```
* References:
  * [Slicing PyTorch Datasets](https://lewtun.github.io/blog/til/nlp/pytorch/2021/01/24/til-slicing-torch-datasets.html)
  * [How to Slice a 3D Tensor in Pytorch?](https://www.geeksforgeeks.org/how-to-slice-a-3d-tensor-in-pytorch/)
  * [PyTorch Tensor Indexing API](https://pytorch.org/cppdocs/notes/tensor_indexing.html#translating-between-python-c-index-types)
  * [PyTorch Tensor Indexing](https://deeplearninguniversity.com/pytorch/pytorch-tensor-indexing/)
  * [Slicing and Striding](https://mlverse.github.io/torch/articles/indexing.html#slicing-and-striding)
* Vulkan `slice` operator tensor conversion:
{F684363708}

Test Plan:
Build & test on Android:
```
cd ~/fbsource
buck build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 //xplat/caffe2:pt_vulkan_api_test_binAndroid\#android-arm64 --show-output
adb push buck-out/gen/xplat/caffe2/pt_vulkan_api_test_binAndroid\#android-arm64 /data/local/tmp/vulkan_api_test
adb shell "/data/local/tmp/vulkan_api_test"
```
Build & test on MacOS:
```
cd ~/fbsource
buck build //xplat/caffe2:pt_vulkan_api_test_binAppleMac
./buck-out/gen/xplat/caffe2/pt_vulkan_api_test_binAppleMac\#macosx-x86_64
```
Test result on Android (Google Pixel 5):
```
[ RUN      ] VulkanAPITest.slice_width_success
[       OK ] VulkanAPITest.slice_width_success (17 ms)
[ RUN      ] VulkanAPITest.slice_height_success
[       OK ] VulkanAPITest.slice_height_success (13 ms)
[ RUN      ] VulkanAPITest.slice_feature_success
[       OK ] VulkanAPITest.slice_feature_success (20 ms)
[ RUN      ] VulkanAPITest.slice_batch_success
[       OK ] VulkanAPITest.slice_batch_success (9 ms)
[ RUN      ] VulkanAPITest.slice_invalidinputs_exceptions
[       OK ] VulkanAPITest.slice_invalidinputs_exceptions (0 ms)
```
Test result on MacOS:
```
[ RUN      ] VulkanAPITest.slice_width_success
[       OK ] VulkanAPITest.slice_width_success (81 ms)
[ RUN      ] VulkanAPITest.slice_height_success
[       OK ] VulkanAPITest.slice_height_success (56 ms)
[ RUN      ] VulkanAPITest.slice_feature_success
[       OK ] VulkanAPITest.slice_feature_success (132 ms)
[ RUN      ] VulkanAPITest.slice_batch_success
[       OK ] VulkanAPITest.slice_batch_success (33 ms)
[ RUN      ] VulkanAPITest.slice_invalidinputs_exceptions
[       OK ] VulkanAPITest.slice_invalidinputs_exceptions (1 ms)
```

Reviewed By: SS-JIA

Differential Revision: D32482638

fbshipit-source-id: 65841fb2d3489ee407f2b4f38619b700787d41b0
2021-12-06 12:05:37 -08:00
a84ed8be6d unify compare kernels (#69111)
Summary:
This unifies 6 compare ops (NE, EQ, LT, LE, GE, GT) into 2 kernels, reducing context size. Performance is ~5% worse for low-width broadcasted cases, and on par for non-broadcasted ones.
With this PR, benchmarks for contiguous, 1M-MM, 1M-M1, op with scalar (size in MB and bandwidth in GB/s):
```
    5.0,   795.9
   10.0,   650.5
   15.0,   706.2
   20.0,   731.6
   25.0,   744.9
   30.0,   758.1
   35.0,   762.6
   40.0,   768.8
   45.0,   775.7
   50.0,   780.7
   55.0,   781.7
   60.0,   783.0
   65.0,   784.8
   70.0,   790.7
   75.0,   789.2
   80.0,   794.4
   85.0,   794.2
   90.0,   797.4
   95.0,   796.3
  100.0,   798.0
    3.0,   363.7     1.0,   122.2     3.0,   385.5
    6.0,   420.4     2.0,   142.9     6.0,   755.5
    9.0,   438.3     3.0,   151.6     9.0,   684.5
   12.0,   449.5     4.0,   156.4    12.0,   702.9
   15.0,   463.7     5.0,   159.6    15.0,   716.8
   18.0,   472.7     6.0,   161.4    18.0,   737.0
   21.0,   477.6     7.0,   162.4    21.0,   745.6
   24.0,   480.9     8.0,   164.1    24.0,   755.4
   27.0,   483.7     9.0,   163.7    27.0,   760.7
   30.0,   487.3    10.0,   165.9    30.0,   770.4
   33.0,   491.4    11.0,   166.3    33.0,   774.3
   36.0,   492.9    12.0,   166.2    36.0,   779.0
   39.0,   494.7    13.0,   166.7    39.0,   782.5
   42.0,   491.3    14.0,   166.7    42.0,   789.0
   45.0,   495.1    15.0,   167.5    45.0,   790.0
   48.0,   499.7    16.0,   167.7    48.0,   791.8
   51.0,   496.2    17.0,   166.9    51.0,   794.0
   54.0,   497.6    18.0,   167.7    54.0,   797.4
   57.0,   497.1    19.0,   167.5    57.0,   798.6
   60.0,   498.8    20.0,   168.8    60.0,   802.1

```
Master
```
    5.0,   743.4
   10.0,   665.7
   15.0,   702.3
   20.0,   727.5
   25.0,   740.7
   30.0,   757.5
   35.0,   760.3
   40.0,   768.5
   45.0,   775.7
   50.0,   776.8
   55.0,   781.1
   60.0,   786.5
   65.0,   786.8
   70.0,   790.1
   75.0,   789.7
   80.0,   789.1
   85.0,   793.2
   90.0,   793.8
   95.0,   795.9
  100.0,   796.0
    3.0,   383.1     1.0,   129.0     3.0,   337.0
    6.0,   445.0     2.0,   149.6     6.0,   670.6
    9.0,   445.3     3.0,   159.6     9.0,   678.6
   12.0,   474.9     4.0,   164.1    12.0,   705.5
   15.0,   480.8     5.0,   167.2    15.0,   718.3
   18.0,   490.3     6.0,   169.1    18.0,   733.3
   21.0,   493.9     7.0,   168.5    21.0,   742.5
   24.0,   503.8     8.0,   171.9    24.0,   756.4
   27.0,   506.7     9.0,   171.3    27.0,   759.8
   30.0,   508.7    10.0,   172.4    30.0,   767.1
   33.0,   515.7    11.0,   174.2    33.0,   773.7
   36.0,   516.7    12.0,   170.4    36.0,   781.7
   39.0,   519.1    13.0,   174.4    39.0,   782.1
   42.0,   515.7    14.0,   174.1    42.0,   787.0
   45.0,   519.2    15.0,   172.7    45.0,   788.1
   48.0,   522.2    16.0,   175.4    48.0,   791.7
   51.0,   519.6    17.0,   175.1    51.0,   795.7
   54.0,   518.5    18.0,   174.8    54.0,   795.8
   57.0,   519.1    19.0,   174.4    57.0,   796.6
   60.0,   521.5    20.0,   175.6    60.0,   800.1
```
<details>
<summary>Benchmarking script </summary>

```
import torch
from matplotlib import pyplot as plt
from torch.utils.benchmark import Timer, Compare
import math
import click
print(torch.cuda.get_device_capability()) # check that we are on Volta (compute capability 7,0)
#torch.cuda.set_device(1)
# don't benchmark on anything too small, you'll see only overhead
@click.command()
@click.option('--op_str', default="torch.gt")
@click.option('--dtype_str', default="float", type=click.Choice(['float', 'half']))
def bench(op_str, dtype_str):
    if dtype_str == "float":
        dtype = torch.float
    elif dtype_str == "half":
        dtype = torch.half

    MB = 1024 * 1024
    size = MB
    results = []
    sizes = []
    for _ in range(20):
        torch.cuda.memory.empty_cache()
        a=torch.randn(int(size), device="cuda", dtype=dtype)
        b=torch.randn(int(size), device="cuda", dtype=dtype)
        t = Timer(stmt=f"{op_str}(a,b)", label = op_str, sub_label=f"{size/MB} MB", description="contiguous", globals = {"a":a, "b":b})
        res = t.blocked_autorange()
        results.append(res)
        sizes.append(size)
        size +=  MB
        del a #to save memory for next iterations
        del b
    c=Compare(results)
    #print(c)
    bw=[]
    bytes=[]
    element_size = torch.tensor([], dtype=dtype).element_size()
    output_element_size = 1
    for res, size in zip(results,sizes):
        bytes_io = 2*size*element_size + output_element_size * size
        bytes.append(bytes_io/MB)
        # we'll report bandwidth in GB/s
        bw.append(bytes_io/res.median * 1e-9)
        print(f"{bytes_io/MB:7.1f}, {bw[-1]:7.1f}")

    sizes = []
    results = [[],[],[]]

    size = MB
    for _ in range(20):
        torch.cuda.memory.empty_cache()
        M = math.floor(math.sqrt(size))
        a=torch.randn(1, M, device="cuda", dtype=dtype)
        b=torch.randn(M, M, device="cuda", dtype=dtype)
        b1 = torch.randn(M, 1, device="cuda", dtype=dtype)
        tb = Timer(stmt=f"{op_str}(a,b)", label = op_str, sub_label=f"{M*M/MB} MB", description="MMM1", globals = {"a":a, "b":b})
        t1 = Timer(stmt=f"{op_str}(a,b1)", label = op_str, sub_label=f"{M*M/MB} MB", description="M11M", globals = {"a":a, "b1":b1})
        ts = Timer(stmt=f"{op_str}(b,1.)", label = op_str, sub_label=f"{M*M/MB} MB", description="scalar", globals = {"a":a, "b":b})

        res = [t.blocked_autorange() for t in (tb, t1, ts)]
        for (rl, r) in zip(results, res):
            rl.append(r)
        sizes.append(M)
        size += MB
        del a #to save memory for next iterations
        del b
    comps = [Compare(r) for r in results]
    #[print(c) for c in comps]
    bw=[[],[],[]]

    for res, res1, ress, size in zip(results[0],results[1],results[2], sizes):
        bytes_io = (size+size*size)*element_size + output_element_size * size*size #(size+size+size*size)*4
        bytes_io1 = (size+size)*element_size + output_element_size * size*size #(size+size+size*size)*4
        bytes_ios = (size*size)*element_size + output_element_size * size * size
        bytes_iol = (bytes_io, bytes_io1, bytes_ios)
        for (bw_elem, bytes_elem, res_elem) in zip(bw, bytes_iol, (res, res1, ress)):
            bw_elem.append(bytes_elem/res_elem.median * 1e-9)
        print(f"{bytes_iol[0]/MB:7.1f}, {bw[0][-1]:7.1f}", f"{bytes_iol[1]/MB:7.1f}, {bw[1][-1]:7.1f}",
        f"{bytes_iol[2]/MB:7.1f}, {bw[2][-1]:7.1f}")

if __name__ == '__main__':
    bench()
```
</details>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69111

Reviewed By: mruberry

Differential Revision: D32851098

Pulled By: ngimel

fbshipit-source-id: cfb83922b2e8eb6a0ad0621ff07c2dada9c8e626
2021-12-06 11:00:53 -08:00
38c576cfef Clean up CODEOWNERS for .github/ (#69395)
Summary:
Cleans up the CODEOWNERS file to reflect current team

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69395

Test Plan: yeah_sandcastle

Reviewed By: anjali411

Differential Revision: D32885237

Pulled By: seemethere

fbshipit-source-id: a465f2cd0e27d5e53f5af5769d1cad47ec5348e7
2021-12-06 10:50:29 -08:00
bf01cd5228 Move THC_sleep to ATen (#69038)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69038

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D32872479

Pulled By: ngimel

fbshipit-source-id: 97c7592b16eee2ecc66c42507c358aa92cc8ee50
2021-12-06 10:20:43 -08:00
a974699633 Skips failing ROCm test (#69456)
Summary:
ROCm and CUDA type promotion are slightly divergent and need to be updated.

cc jeffdaily sunway513 jithunnair-amd ROCmSupport KyleCZH

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69456

Reviewed By: anjali411, janeyx99

Differential Revision: D32883895

Pulled By: mruberry

fbshipit-source-id: 3b0ba8a9d092c2d7ff20d78da42d4a147b1db12d
2021-12-06 09:12:31 -08:00
b737e09f60 expose return_types in Python (#66614)
Summary:
https://github.com/facebookresearch/functorch/issues/87

TODO:
* [x] Add comments
* [x] Add test
* [x] Fix XLA

<details>

<summary>Generated python_return_types.cpp</summary>

```cpp
#include <Python.h>

#include <vector>
#include <map>
#include <string>

#include "torch/csrc/autograd/python_return_types.h"
#include "torch/csrc/utils/structseq.h"
#include "torch/csrc/Exceptions.h"

namespace {
PyTypeObject* get__det_lu_based_helper_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"det", ""}, {"lu", ""}, {"pivs", ""},  {nullptr} };
    static PyTypeObject _det_lu_based_helperNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types._det_lu_based_helper", nullptr, NamedTuple_fields, 3 };
    if (!is_initialized) {
        PyStructSequence_InitType(&_det_lu_based_helperNamedTuple, &desc);
        _det_lu_based_helperNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &_det_lu_based_helperNamedTuple;
}
PyTypeObject* get__fake_quantize_per_tensor_affine_cachemask_tensor_qparams_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"output", ""}, {"mask", ""},  {nullptr} };
    static PyTypeObject _fake_quantize_per_tensor_affine_cachemask_tensor_qparamsNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types._fake_quantize_per_tensor_affine_cachemask_tensor_qparams", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&_fake_quantize_per_tensor_affine_cachemask_tensor_qparamsNamedTuple, &desc);
        _fake_quantize_per_tensor_affine_cachemask_tensor_qparamsNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &_fake_quantize_per_tensor_affine_cachemask_tensor_qparamsNamedTuple;
}
PyTypeObject* get__fused_moving_avg_obs_fq_helper_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"output", ""}, {"mask", ""},  {nullptr} };
    static PyTypeObject _fused_moving_avg_obs_fq_helperNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types._fused_moving_avg_obs_fq_helper", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&_fused_moving_avg_obs_fq_helperNamedTuple, &desc);
        _fused_moving_avg_obs_fq_helperNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &_fused_moving_avg_obs_fq_helperNamedTuple;
}
PyTypeObject* get__lu_with_info_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"LU", ""}, {"pivots", ""}, {"info", ""},  {nullptr} };
    static PyTypeObject _lu_with_infoNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types._lu_with_info", nullptr, NamedTuple_fields, 3 };
    if (!is_initialized) {
        PyStructSequence_InitType(&_lu_with_infoNamedTuple, &desc);
        _lu_with_infoNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &_lu_with_infoNamedTuple;
}
PyTypeObject* get__unpack_dual_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"primal", ""}, {"tangent", ""},  {nullptr} };
    static PyTypeObject _unpack_dualNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types._unpack_dual", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&_unpack_dualNamedTuple, &desc);
        _unpack_dualNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &_unpack_dualNamedTuple;
}
PyTypeObject* get_aminmax_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"min", ""}, {"max", ""},  {nullptr} };
    static PyTypeObject aminmaxNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.aminmax", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&aminmaxNamedTuple, &desc);
        aminmaxNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &aminmaxNamedTuple;
}

PyTypeObject* get_aminmax_out_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"min", ""}, {"max", ""},  {nullptr} };
    static PyTypeObject aminmax_outNamedTuple1;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.aminmax_out", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&aminmax_outNamedTuple1, &desc);
        aminmax_outNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &aminmax_outNamedTuple1;
}
PyTypeObject* get_cummax_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"values", ""}, {"indices", ""},  {nullptr} };
    static PyTypeObject cummaxNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.cummax", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&cummaxNamedTuple, &desc);
        cummaxNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &cummaxNamedTuple;
}

PyTypeObject* get_cummax_out_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"values", ""}, {"indices", ""},  {nullptr} };
    static PyTypeObject cummax_outNamedTuple1;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.cummax_out", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&cummax_outNamedTuple1, &desc);
        cummax_outNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &cummax_outNamedTuple1;
}
PyTypeObject* get_cummin_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"values", ""}, {"indices", ""},  {nullptr} };
    static PyTypeObject cumminNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.cummin", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&cumminNamedTuple, &desc);
        cumminNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &cumminNamedTuple;
}

PyTypeObject* get_cummin_out_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"values", ""}, {"indices", ""},  {nullptr} };
    static PyTypeObject cummin_outNamedTuple1;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.cummin_out", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&cummin_outNamedTuple1, &desc);
        cummin_outNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &cummin_outNamedTuple1;
}
PyTypeObject* get_eig_out_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"eigenvalues", ""}, {"eigenvectors", ""},  {nullptr} };
    static PyTypeObject eig_outNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.eig_out", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&eig_outNamedTuple, &desc);
        eig_outNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &eig_outNamedTuple;
}

PyTypeObject* get_eig_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"eigenvalues", ""}, {"eigenvectors", ""},  {nullptr} };
    static PyTypeObject eigNamedTuple1;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.eig", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&eigNamedTuple1, &desc);
        eigNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &eigNamedTuple1;
}
PyTypeObject* get_frexp_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"mantissa", ""}, {"exponent", ""},  {nullptr} };
    static PyTypeObject frexpNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.frexp", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&frexpNamedTuple, &desc);
        frexpNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &frexpNamedTuple;
}

PyTypeObject* get_frexp_out_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"mantissa", ""}, {"exponent", ""},  {nullptr} };
    static PyTypeObject frexp_outNamedTuple1;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.frexp_out", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&frexp_outNamedTuple1, &desc);
        frexp_outNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &frexp_outNamedTuple1;
}
PyTypeObject* get_geqrf_out_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"a", ""}, {"tau", ""},  {nullptr} };
    static PyTypeObject geqrf_outNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.geqrf_out", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&geqrf_outNamedTuple, &desc);
        geqrf_outNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &geqrf_outNamedTuple;
}

PyTypeObject* get_geqrf_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"a", ""}, {"tau", ""},  {nullptr} };
    static PyTypeObject geqrfNamedTuple1;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.geqrf", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&geqrfNamedTuple1, &desc);
        geqrfNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &geqrfNamedTuple1;
}
PyTypeObject* get_histogram_out_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"hist", ""}, {"bin_edges", ""},  {nullptr} };
    static PyTypeObject histogram_outNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.histogram_out", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&histogram_outNamedTuple, &desc);
        histogram_outNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &histogram_outNamedTuple;
}

PyTypeObject* get_histogram_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"hist", ""}, {"bin_edges", ""},  {nullptr} };
    static PyTypeObject histogramNamedTuple1;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.histogram", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&histogramNamedTuple1, &desc);
        histogramNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &histogramNamedTuple1;
}
PyTypeObject* get_kthvalue_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"values", ""}, {"indices", ""},  {nullptr} };
    static PyTypeObject kthvalueNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.kthvalue", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&kthvalueNamedTuple, &desc);
        kthvalueNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &kthvalueNamedTuple;
}

PyTypeObject* get_kthvalue_out_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"values", ""}, {"indices", ""},  {nullptr} };
    static PyTypeObject kthvalue_outNamedTuple1;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.kthvalue_out", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&kthvalue_outNamedTuple1, &desc);
        kthvalue_outNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &kthvalue_outNamedTuple1;
}
PyTypeObject* get_linalg_cholesky_ex_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"L", ""}, {"info", ""},  {nullptr} };
    static PyTypeObject linalg_cholesky_exNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.linalg_cholesky_ex", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&linalg_cholesky_exNamedTuple, &desc);
        linalg_cholesky_exNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &linalg_cholesky_exNamedTuple;
}

PyTypeObject* get_linalg_cholesky_ex_out_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"L", ""}, {"info", ""},  {nullptr} };
    static PyTypeObject linalg_cholesky_ex_outNamedTuple1;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.linalg_cholesky_ex_out", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&linalg_cholesky_ex_outNamedTuple1, &desc);
        linalg_cholesky_ex_outNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &linalg_cholesky_ex_outNamedTuple1;
}
PyTypeObject* get_linalg_eig_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"eigenvalues", ""}, {"eigenvectors", ""},  {nullptr} };
    static PyTypeObject linalg_eigNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.linalg_eig", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&linalg_eigNamedTuple, &desc);
        linalg_eigNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &linalg_eigNamedTuple;
}

PyTypeObject* get_linalg_eig_out_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"eigenvalues", ""}, {"eigenvectors", ""},  {nullptr} };
    static PyTypeObject linalg_eig_outNamedTuple1;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.linalg_eig_out", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&linalg_eig_outNamedTuple1, &desc);
        linalg_eig_outNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &linalg_eig_outNamedTuple1;
}
PyTypeObject* get_linalg_eigh_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"eigenvalues", ""}, {"eigenvectors", ""},  {nullptr} };
    static PyTypeObject linalg_eighNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.linalg_eigh", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&linalg_eighNamedTuple, &desc);
        linalg_eighNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &linalg_eighNamedTuple;
}

PyTypeObject* get_linalg_eigh_out_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"eigenvalues", ""}, {"eigenvectors", ""},  {nullptr} };
    static PyTypeObject linalg_eigh_outNamedTuple1;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.linalg_eigh_out", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&linalg_eigh_outNamedTuple1, &desc);
        linalg_eigh_outNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &linalg_eigh_outNamedTuple1;
}
PyTypeObject* get_linalg_inv_ex_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"inverse", ""}, {"info", ""},  {nullptr} };
    static PyTypeObject linalg_inv_exNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.linalg_inv_ex", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&linalg_inv_exNamedTuple, &desc);
        linalg_inv_exNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &linalg_inv_exNamedTuple;
}

PyTypeObject* get_linalg_inv_ex_out_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"inverse", ""}, {"info", ""},  {nullptr} };
    static PyTypeObject linalg_inv_ex_outNamedTuple1;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.linalg_inv_ex_out", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&linalg_inv_ex_outNamedTuple1, &desc);
        linalg_inv_ex_outNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &linalg_inv_ex_outNamedTuple1;
}
PyTypeObject* get_linalg_lstsq_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"solution", ""}, {"residuals", ""}, {"rank", ""}, {"singular_values", ""},  {nullptr} };
    static PyTypeObject linalg_lstsqNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.linalg_lstsq", nullptr, NamedTuple_fields, 4 };
    if (!is_initialized) {
        PyStructSequence_InitType(&linalg_lstsqNamedTuple, &desc);
        linalg_lstsqNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &linalg_lstsqNamedTuple;
}

PyTypeObject* get_linalg_lstsq_out_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"solution", ""}, {"residuals", ""}, {"rank", ""}, {"singular_values", ""},  {nullptr} };
    static PyTypeObject linalg_lstsq_outNamedTuple1;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.linalg_lstsq_out", nullptr, NamedTuple_fields, 4 };
    if (!is_initialized) {
        PyStructSequence_InitType(&linalg_lstsq_outNamedTuple1, &desc);
        linalg_lstsq_outNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &linalg_lstsq_outNamedTuple1;
}
PyTypeObject* get_linalg_qr_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"Q", ""}, {"R", ""},  {nullptr} };
    static PyTypeObject linalg_qrNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.linalg_qr", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&linalg_qrNamedTuple, &desc);
        linalg_qrNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &linalg_qrNamedTuple;
}

PyTypeObject* get_linalg_qr_out_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"Q", ""}, {"R", ""},  {nullptr} };
    static PyTypeObject linalg_qr_outNamedTuple1;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.linalg_qr_out", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&linalg_qr_outNamedTuple1, &desc);
        linalg_qr_outNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &linalg_qr_outNamedTuple1;
}
PyTypeObject* get_linalg_slogdet_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"sign", ""}, {"logabsdet", ""},  {nullptr} };
    static PyTypeObject linalg_slogdetNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.linalg_slogdet", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&linalg_slogdetNamedTuple, &desc);
        linalg_slogdetNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &linalg_slogdetNamedTuple;
}

PyTypeObject* get_linalg_slogdet_out_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"sign", ""}, {"logabsdet", ""},  {nullptr} };
    static PyTypeObject linalg_slogdet_outNamedTuple1;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.linalg_slogdet_out", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&linalg_slogdet_outNamedTuple1, &desc);
        linalg_slogdet_outNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &linalg_slogdet_outNamedTuple1;
}
PyTypeObject* get_linalg_svd_out_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"U", ""}, {"S", ""}, {"Vh", ""},  {nullptr} };
    static PyTypeObject linalg_svd_outNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.linalg_svd_out", nullptr, NamedTuple_fields, 3 };
    if (!is_initialized) {
        PyStructSequence_InitType(&linalg_svd_outNamedTuple, &desc);
        linalg_svd_outNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &linalg_svd_outNamedTuple;
}

PyTypeObject* get_linalg_svd_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"U", ""}, {"S", ""}, {"Vh", ""},  {nullptr} };
    static PyTypeObject linalg_svdNamedTuple1;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.linalg_svd", nullptr, NamedTuple_fields, 3 };
    if (!is_initialized) {
        PyStructSequence_InitType(&linalg_svdNamedTuple1, &desc);
        linalg_svdNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &linalg_svdNamedTuple1;
}
PyTypeObject* get_lstsq_out_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"solution", ""}, {"QR", ""},  {nullptr} };
    static PyTypeObject lstsq_outNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.lstsq_out", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&lstsq_outNamedTuple, &desc);
        lstsq_outNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &lstsq_outNamedTuple;
}

PyTypeObject* get_lstsq_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"solution", ""}, {"QR", ""},  {nullptr} };
    static PyTypeObject lstsqNamedTuple1;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.lstsq", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&lstsqNamedTuple1, &desc);
        lstsqNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &lstsqNamedTuple1;
}
PyTypeObject* get_lu_unpack_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"P", ""}, {"L", ""}, {"U", ""},  {nullptr} };
    static PyTypeObject lu_unpackNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.lu_unpack", nullptr, NamedTuple_fields, 3 };
    if (!is_initialized) {
        PyStructSequence_InitType(&lu_unpackNamedTuple, &desc);
        lu_unpackNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &lu_unpackNamedTuple;
}

PyTypeObject* get_lu_unpack_out_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"P", ""}, {"L", ""}, {"U", ""},  {nullptr} };
    static PyTypeObject lu_unpack_outNamedTuple1;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.lu_unpack_out", nullptr, NamedTuple_fields, 3 };
    if (!is_initialized) {
        PyStructSequence_InitType(&lu_unpack_outNamedTuple1, &desc);
        lu_unpack_outNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &lu_unpack_outNamedTuple1;
}
PyTypeObject* get_max_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"values", ""}, {"indices", ""},  {nullptr} };
    static PyTypeObject maxNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.max", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&maxNamedTuple, &desc);
        maxNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &maxNamedTuple;
}

PyTypeObject* get_max_out_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"values", ""}, {"indices", ""},  {nullptr} };
    static PyTypeObject max_outNamedTuple1;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.max_out", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&max_outNamedTuple1, &desc);
        max_outNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &max_outNamedTuple1;
}
PyTypeObject* get_median_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"values", ""}, {"indices", ""},  {nullptr} };
    static PyTypeObject medianNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.median", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&medianNamedTuple, &desc);
        medianNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &medianNamedTuple;
}

PyTypeObject* get_median_out_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"values", ""}, {"indices", ""},  {nullptr} };
    static PyTypeObject median_outNamedTuple1;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.median_out", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&median_outNamedTuple1, &desc);
        median_outNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &median_outNamedTuple1;
}
PyTypeObject* get_min_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"values", ""}, {"indices", ""},  {nullptr} };
    static PyTypeObject minNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.min", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&minNamedTuple, &desc);
        minNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &minNamedTuple;
}

PyTypeObject* get_min_out_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"values", ""}, {"indices", ""},  {nullptr} };
    static PyTypeObject min_outNamedTuple1;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.min_out", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&min_outNamedTuple1, &desc);
        min_outNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &min_outNamedTuple1;
}
PyTypeObject* get_mode_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"values", ""}, {"indices", ""},  {nullptr} };
    static PyTypeObject modeNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.mode", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&modeNamedTuple, &desc);
        modeNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &modeNamedTuple;
}

PyTypeObject* get_mode_out_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"values", ""}, {"indices", ""},  {nullptr} };
    static PyTypeObject mode_outNamedTuple1;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.mode_out", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&mode_outNamedTuple1, &desc);
        mode_outNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &mode_outNamedTuple1;
}
PyTypeObject* get_nanmedian_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"values", ""}, {"indices", ""},  {nullptr} };
    static PyTypeObject nanmedianNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.nanmedian", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&nanmedianNamedTuple, &desc);
        nanmedianNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &nanmedianNamedTuple;
}

PyTypeObject* get_nanmedian_out_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"values", ""}, {"indices", ""},  {nullptr} };
    static PyTypeObject nanmedian_outNamedTuple1;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.nanmedian_out", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&nanmedian_outNamedTuple1, &desc);
        nanmedian_outNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &nanmedian_outNamedTuple1;
}
PyTypeObject* get_qr_out_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"Q", ""}, {"R", ""},  {nullptr} };
    static PyTypeObject qr_outNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.qr_out", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&qr_outNamedTuple, &desc);
        qr_outNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &qr_outNamedTuple;
}

PyTypeObject* get_qr_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"Q", ""}, {"R", ""},  {nullptr} };
    static PyTypeObject qrNamedTuple1;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.qr", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&qrNamedTuple1, &desc);
        qrNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &qrNamedTuple1;
}
PyTypeObject* get_slogdet_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"sign", ""}, {"logabsdet", ""},  {nullptr} };
    static PyTypeObject slogdetNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.slogdet", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&slogdetNamedTuple, &desc);
        slogdetNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &slogdetNamedTuple;
}
PyTypeObject* get_solve_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"solution", ""}, {"LU", ""},  {nullptr} };
    static PyTypeObject solveNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.solve", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&solveNamedTuple, &desc);
        solveNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &solveNamedTuple;
}

PyTypeObject* get_solve_out_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"solution", ""}, {"LU", ""},  {nullptr} };
    static PyTypeObject solve_outNamedTuple1;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.solve_out", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&solve_outNamedTuple1, &desc);
        solve_outNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &solve_outNamedTuple1;
}
PyTypeObject* get_sort_out_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"values", ""}, {"indices", ""},  {nullptr} };
    static PyTypeObject sort_outNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.sort_out", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&sort_outNamedTuple, &desc);
        sort_outNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &sort_outNamedTuple;
}

PyTypeObject* get_sort_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"values", ""}, {"indices", ""},  {nullptr} };
    static PyTypeObject sortNamedTuple1;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.sort", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&sortNamedTuple1, &desc);
        sortNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &sortNamedTuple1;
}
PyTypeObject* get_svd_out_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"U", ""}, {"S", ""}, {"V", ""},  {nullptr} };
    static PyTypeObject svd_outNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.svd_out", nullptr, NamedTuple_fields, 3 };
    if (!is_initialized) {
        PyStructSequence_InitType(&svd_outNamedTuple, &desc);
        svd_outNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &svd_outNamedTuple;
}

PyTypeObject* get_svd_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"U", ""}, {"S", ""}, {"V", ""},  {nullptr} };
    static PyTypeObject svdNamedTuple1;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.svd", nullptr, NamedTuple_fields, 3 };
    if (!is_initialized) {
        PyStructSequence_InitType(&svdNamedTuple1, &desc);
        svdNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &svdNamedTuple1;
}
PyTypeObject* get_symeig_out_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"eigenvalues", ""}, {"eigenvectors", ""},  {nullptr} };
    static PyTypeObject symeig_outNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.symeig_out", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&symeig_outNamedTuple, &desc);
        symeig_outNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &symeig_outNamedTuple;
}

PyTypeObject* get_symeig_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"eigenvalues", ""}, {"eigenvectors", ""},  {nullptr} };
    static PyTypeObject symeigNamedTuple1;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.symeig", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&symeigNamedTuple1, &desc);
        symeigNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &symeigNamedTuple1;
}
PyTypeObject* get_topk_out_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"values", ""}, {"indices", ""},  {nullptr} };
    static PyTypeObject topk_outNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.topk_out", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&topk_outNamedTuple, &desc);
        topk_outNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &topk_outNamedTuple;
}

PyTypeObject* get_topk_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"values", ""}, {"indices", ""},  {nullptr} };
    static PyTypeObject topkNamedTuple1;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.topk", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&topkNamedTuple1, &desc);
        topkNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &topkNamedTuple1;
}
PyTypeObject* get_triangular_solve_out_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"solution", ""}, {"cloned_coefficient", ""},  {nullptr} };
    static PyTypeObject triangular_solve_outNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.triangular_solve_out", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&triangular_solve_outNamedTuple, &desc);
        triangular_solve_outNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &triangular_solve_outNamedTuple;
}

PyTypeObject* get_triangular_solve_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"solution", ""}, {"cloned_coefficient", ""},  {nullptr} };
    static PyTypeObject triangular_solveNamedTuple1;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.triangular_solve", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&triangular_solveNamedTuple1, &desc);
        triangular_solveNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &triangular_solveNamedTuple1;
}
}

namespace torch {
namespace autograd {

std::map<std::string, PyTypeObject*>& get_namedtuple_types_map() {
  // [NOTE] Non-global map
  // This map calls Python functions during its initialization.
  // If it is a global static variable and in case it is loaded
  // before Python interpreter is ready, then the calls it makes during
  // initialization will SEGFAULT.
  // To avoid this we make it function static variable so that it is
  // initialized only after the Python interpreter is ready.
  static std::map<std::string, PyTypeObject*> namedtuple_types_map = {
    {"_det_lu_based_helper", get__det_lu_based_helper_namedtuple()},
    {"_fake_quantize_per_tensor_affine_cachemask_tensor_qparams", get__fake_quantize_per_tensor_affine_cachemask_tensor_qparams_namedtuple()},
    {"_fused_moving_avg_obs_fq_helper", get__fused_moving_avg_obs_fq_helper_namedtuple()},
    {"_lu_with_info", get__lu_with_info_namedtuple()},
    {"_unpack_dual", get__unpack_dual_namedtuple()},
    {"aminmax", get_aminmax_namedtuple()},
    {"aminmax_out", get_aminmax_out_namedtuple()},
    {"cummax", get_cummax_namedtuple()},
    {"cummax_out", get_cummax_out_namedtuple()},
    {"cummin", get_cummin_namedtuple()},
    {"cummin_out", get_cummin_out_namedtuple()},
    {"eig_out", get_eig_out_namedtuple()},
    {"eig", get_eig_namedtuple()},
    {"frexp", get_frexp_namedtuple()},
    {"frexp_out", get_frexp_out_namedtuple()},
    {"geqrf_out", get_geqrf_out_namedtuple()},
    {"geqrf", get_geqrf_namedtuple()},
    {"histogram_out", get_histogram_out_namedtuple()},
    {"histogram", get_histogram_namedtuple()},
    {"kthvalue", get_kthvalue_namedtuple()},
    {"kthvalue_out", get_kthvalue_out_namedtuple()},
    {"linalg_cholesky_ex", get_linalg_cholesky_ex_namedtuple()},
    {"linalg_cholesky_ex_out", get_linalg_cholesky_ex_out_namedtuple()},
    {"linalg_eig", get_linalg_eig_namedtuple()},
    {"linalg_eig_out", get_linalg_eig_out_namedtuple()},
    {"linalg_eigh", get_linalg_eigh_namedtuple()},
    {"linalg_eigh_out", get_linalg_eigh_out_namedtuple()},
    {"linalg_inv_ex", get_linalg_inv_ex_namedtuple()},
    {"linalg_inv_ex_out", get_linalg_inv_ex_out_namedtuple()},
    {"linalg_lstsq", get_linalg_lstsq_namedtuple()},
    {"linalg_lstsq_out", get_linalg_lstsq_out_namedtuple()},
    {"linalg_qr", get_linalg_qr_namedtuple()},
    {"linalg_qr_out", get_linalg_qr_out_namedtuple()},
    {"linalg_slogdet", get_linalg_slogdet_namedtuple()},
    {"linalg_slogdet_out", get_linalg_slogdet_out_namedtuple()},
    {"linalg_svd_out", get_linalg_svd_out_namedtuple()},
    {"linalg_svd", get_linalg_svd_namedtuple()},
    {"lstsq_out", get_lstsq_out_namedtuple()},
    {"lstsq", get_lstsq_namedtuple()},
    {"lu_unpack", get_lu_unpack_namedtuple()},
    {"lu_unpack_out", get_lu_unpack_out_namedtuple()},
    {"max", get_max_namedtuple()},
    {"max_out", get_max_out_namedtuple()},
    {"median", get_median_namedtuple()},
    {"median_out", get_median_out_namedtuple()},
    {"min", get_min_namedtuple()},
    {"min_out", get_min_out_namedtuple()},
    {"mode", get_mode_namedtuple()},
    {"mode_out", get_mode_out_namedtuple()},
    {"nanmedian", get_nanmedian_namedtuple()},
    {"nanmedian_out", get_nanmedian_out_namedtuple()},
    {"qr_out", get_qr_out_namedtuple()},
    {"qr", get_qr_namedtuple()},
    {"slogdet", get_slogdet_namedtuple()},
    {"solve", get_solve_namedtuple()},
    {"solve_out", get_solve_out_namedtuple()},
    {"sort_out", get_sort_out_namedtuple()},
    {"sort", get_sort_namedtuple()},
    {"svd_out", get_svd_out_namedtuple()},
    {"svd", get_svd_namedtuple()},
    {"symeig_out", get_symeig_out_namedtuple()},
    {"symeig", get_symeig_namedtuple()},
    {"topk_out", get_topk_out_namedtuple()},
    {"topk", get_topk_namedtuple()},
    {"triangular_solve_out", get_triangular_solve_out_namedtuple()},
    {"triangular_solve", get_triangular_solve_namedtuple()},
  };
  return namedtuple_types_map;
}

PyTypeObject* get_namedtuple(std::string name) {
  static auto& namedtuple_types_map = get_namedtuple_types_map();
  return namedtuple_types_map[name];
}

void initReturnTypes(PyObject* module) {
  static struct PyModuleDef def = {
      PyModuleDef_HEAD_INIT, "torch._C._return_types", nullptr, -1, {}};
  PyObject* return_types_module = PyModule_Create(&def);
  if (!return_types_module) {
    throw python_error();
  }

  for (const auto& return_type_pair : get_namedtuple_types_map()) {
    // hold onto the TypeObject for the unlikely case of user
    // deleting or overriding it.
    Py_INCREF(return_type_pair.second);
    if (PyModule_AddObject(
            return_types_module,
            return_type_pair.first.c_str(),
            (PyObject*)return_type_pair.second) != 0) {
      Py_DECREF((PyObject*)return_type_pair.second);
      throw python_error();
    }
  }

  // steals a reference to return_types on success
  if (PyModule_AddObject(module, "_return_types", return_types_module) != 0) {
    Py_DECREF(return_types_module);
    throw python_error();
  }
}

} // namespace autograd
} // namespace torch

```

</details>

<details>

<summary>Eg. updated call in other python_*_functions</summary>

```cpp
// linalg_cholesky_ex
static PyObject * THPVariable_linalg_cholesky_ex(PyObject* self_, PyObject* args, PyObject* kwargs)
{
  HANDLE_TH_ERRORS
  static PyTypeObject* NamedTuple = get_namedtuple("linalg_cholesky_ex");
  static PyTypeObject* NamedTuple1 = get_namedtuple("linalg_cholesky_ex_out");
  static PythonArgParser parser({
    "linalg_cholesky_ex(Tensor input, *, bool upper=False, bool check_errors=False, TensorList[2] out=None)",
  }, /*traceable=*/true);

  ParsedArgs<4> parsed_args;
  auto _r = parser.parse(nullptr, args, kwargs, parsed_args);
  if(_r.has_torch_function()) {
    return handle_torch_function(_r, nullptr, args, kwargs, THPLinalgVariableFunctionsModule, "torch.linalg");
  }
  if (_r.isNone(3)) {
    // aten::linalg_cholesky_ex(Tensor self, *, bool upper=False, bool check_errors=False) -> (Tensor L, Tensor info)

    auto dispatch_linalg_cholesky_ex = [](const at::Tensor & self, bool upper, bool check_errors) -> ::std::tuple<at::Tensor,at::Tensor> {
      pybind11::gil_scoped_release no_gil;
      return at::linalg_cholesky_ex(self, upper, check_errors);
    };
    return wrap(NamedTuple, dispatch_linalg_cholesky_ex(_r.tensor(0), _r.toBool(1), _r.toBool(2)));
  } else {
    // aten::linalg_cholesky_ex.L(Tensor self, *, bool upper=False, bool check_errors=False, Tensor(a!) L, Tensor(b!) info) -> (Tensor(a!) L, Tensor(b!) info)
    auto out = _r.tensorlist_n<2>(3);
    auto dispatch_linalg_cholesky_ex_out = [](at::Tensor & L, at::Tensor & info, const at::Tensor & self, bool upper, bool check_errors) -> ::std::tuple<at::Tensor,at::Tensor> {
      pybind11::gil_scoped_release no_gil;
      return at::linalg_cholesky_ex_out(L, info, self, upper, check_errors);
    };
    return wrap(NamedTuple1, dispatch_linalg_cholesky_ex_out(out[0], out[1], _r.tensor(0), _r.toBool(1), _r.toBool(2)));
  }
  Py_RETURN_NONE;
  END_HANDLE_TH_ERRORS
}

```

</details>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66614

Reviewed By: H-Huang

Differential Revision: D32741134

Pulled By: zou3519

fbshipit-source-id: 27bada30d20e66333ca1be1775608d9f0cbf9f59
2021-12-06 09:05:29 -08:00
78b7a419b2 Enable native_dropout/backward for lazy (#69374)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69374

Enables the existing native_dropout operator for use with lazy tensors. Also adds ATen interned strings so that lazy tensor codegen can refer to the symbols in generated IR classes.

Test Plan: CI for regressions of existing use cases, and manual tests of new Lazy Tensor functionality

Reviewed By: ngimel

Differential Revision: D32837301

fbshipit-source-id: a372a24ec65367fb84ad2e97c7e38cae4ec703a6
2021-12-06 08:14:10 -08:00
b6f41bb848 The Jiterator (#69439)
Summary:
This PR:

- creates the "jiterator" pattern, allowing elementwise unary and binary kernels that don't accept scalars to be jit compiled when called
- ports the gcd and i1 CUDA kernels to use the jiterator
- extends elementwise binary systemic testing to be comparable to elementwise unary systemic testing
- separates one test case from test_out in test_ops.py
- updates more OpInfos to use expected failures instead of skips

The jiterator currently does not support half, bfloat16 or complex dtypes. It also (as mentioned above) doesn't support scalar inputs. In the future we expect to add support for those datatypes and scalars.
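
To make "jit compiled when called" concrete, here is a minimal usage sketch of the two ported ops (assuming a CUDA build; the first call for a given dtype pays the compilation cost, and later calls reuse the cached kernel):

```python
import torch

# gcd and i1 are the CUDA kernels ported to the jiterator in this PR; the
# first invocation for a given dtype triggers JIT compilation of the kernel.
a = torch.tensor([12, 18, 7], device='cuda')
b = torch.tensor([8, 27, 14], device='cuda')
print(torch.gcd(a, b))              # tensor([4, 9, 7], device='cuda:0')
print(torch.special.i1(a.float()))  # modified Bessel function of the first kind, order 1
```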

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69439

Reviewed By: ngimel

Differential Revision: D32874968

Pulled By: mruberry

fbshipit-source-id: d44bb9cde4f602703e75400ec5a0b209f085e9b3
2021-12-06 07:32:48 -08:00
3202028ed1 [Core ML] Avoid recompiling models when the OS version is not changed (#69438)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69438

We don't need to recompile the model if the OS version has not changed. This can save hundreds of milliseconds when loading the model.
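
A rough sketch of the caching idea (hypothetical names, Python for illustration only; the actual change lives in the Core ML Objective-C delegate): key the compiled artifact by OS version, so that an OS upgrade, which may ship a different Core ML compiler, invalidates the cache, while same-OS loads reuse the previous compilation.

```python
import os
import platform

def cached_compiled_path(model_id: str, cache_dir: str) -> str:
    # A cache hit requires the same OS version that produced the artifact;
    # otherwise the model is recompiled and the new artifact is stored.
    os_version = platform.mac_ver()[0] or platform.release()
    return os.path.join(cache_dir, f"{model_id}-{os_version}.mlmodelc")
```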

{F683788183}
ghstack-source-id: 144784720
ghstack-source-id: 144821734

Test Plan:
1. Test in the playground app
2. Test in the ig

Reviewed By: hanton

Differential Revision: D32866326

fbshipit-source-id: ae2174f68dda4d2ab89ee328cb710c08d45c4d9a
2021-12-06 00:49:51 -08:00
c97dc9286d Revert D32780415: [Static Runtime] Move implementation details from impl.h into internal.h
Test Plan: revert-hammer

Differential Revision:
D32780415 (999e93e6a8)

Original commit changeset: 119b7aedbf56

fbshipit-source-id: 1aa777e8c1854ab27e86bc625188f7170097fac8
2021-12-04 19:44:07 -08:00
29a45f0009 Revert D32743881: [Core ML] Avoid recompiling models when the OS version is not changed
Test Plan: revert-hammer

Differential Revision:
D32743881 (b97903abb8)

Original commit changeset: 2e94c6035520

fbshipit-source-id: 6cb05c414a23e15604b095c333a92ed8980092bd
2021-12-04 15:57:58 -08:00
999e93e6a8 [Static Runtime] Move implementation details from impl.h into internal.h (#69274)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69274

`impl.h` is the main header file that defines the interface of Static Runtime to its clients.

However, it is currently filled with implementation details that should not be leaked to our clients: 1) leaking our internals unnecessarily makes them hard to change later, and 2) it causes needless merge conflicts when multiple people touch this enormous impl.cpp file.

To alleviate the situation, this change moves the implementation details from impl.h into a new file, internal.h, that's internally kept without leaking the details to our clients.

This change will be followed by another change that renames `impl.h` to `runtime.h` (or something better), since `impl.h` is currently not about implementation but about Static Runtime's interface.

Note that this change is NOT complete, since the declarations remaining in impl.h still contain many implementation details. We should therefore keep minimizing the interface to prevent the API from bloating unnecessarily, and in the near future work on modularizing the implementation into separate, well-organized files.

Test Plan: Existing unittests

Reviewed By: donaldong

Differential Revision: D32780415

fbshipit-source-id: 119b7aedbf563b195641c5674572a9348732145f
2021-12-04 14:48:28 -08:00
b97903abb8 [Core ML] Avoid recompiling models when the OS version is not changed (#69234)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69234

We don't need to recompile the model if the OS version has not changed. This can save hundreds of milliseconds when loading the model.

{F683788183}
ghstack-source-id: 144784720

Test Plan:
1. Test in the playground app
2. Test in the ig

Reviewed By: hanton

Differential Revision: D32743881

fbshipit-source-id: 2e94c6035520de3eeaf0b61f7cf9082228c8a955
2021-12-04 13:38:27 -08:00
e8f4c9cc40 [LT] Upstream LazyView and view ops IR Nodes (#69277)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69277

LazyView is the main class for tracking aliasing caused by view ops. The
corresponding IR classes for view ops are hand-written for now, and we can
switch to code-generating them in the future. Certain view ops also have a
reverse IR class that performs the in-place update in the backward direction
along a chain of aliased ops.

As part of the future work, we will simplify the logic for LazyView once
the functionalization pass in core is ready to use.
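
For context, this is the eager-mode aliasing behavior that LazyView has to reproduce on a lazy device (a minimal eager sketch, not lazy-tensor code):

```python
import torch

a = torch.zeros(4)
v = a.view(2, 2)   # v aliases a's storage rather than copying it
v[0, 0] = 1.0      # an in-place write through the view...
print(a)           # ...must be visible through a: tensor([1., 0., 0., 0.])
```

Tracking which IR nodes alias which, including the reverse direction for in-place updates, is what lets the lazy backend preserve these semantics without eagerly materializing tensors.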

Test Plan: Imported from OSS

Reviewed By: wconstab

Differential Revision: D32820014

Pulled By: desertfire

fbshipit-source-id: d9eb526cb23885f667e4815dc9dd291a7b7e4256
2021-12-04 08:44:54 -08:00
0bbe21b172 [LT] Upstream more util functions (#69098)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69098

Add the following utils: helpers, ir_dump_util, and
tensor_util. Some of the util functions may be better organized by
grouping into different files, but we can leave that for later.

Test Plan: Imported from OSS

Reviewed By: alanwaketan

Differential Revision: D32758480

Pulled By: desertfire

fbshipit-source-id: 2a0707879f0c49573380b4c8227a3c916c99bf9a
2021-12-04 08:42:35 -08:00
bfe5ad28e6 [Linalg] Add a runtime switch to let pytorch prefer a backend impl in linalg functions on GPU (#67980)
Summary:
Per title.

This PR introduces a global flag that lets PyTorch prefer one of several backend implementations when calling linear algebra functions on GPU.

Usage:
```python
torch.backends.cuda.preferred_linalg_library('cusolver')
```

Available options (str): `'default'`, `'cusolver'`, `'magma'`.

Issue https://github.com/pytorch/pytorch/issues/63992 inspired me to write this PR. No heuristic is perfect on all devices, library versions, matrix shapes, workloads, etc. We can obtain better performance if we can conveniently switch linear algebra backends at runtime.

Performance of linear algebra operators after this PR should be no worse than before. The flag is set to **`'default'`** by default, which makes everything the same as before this PR.

The implementation of this PR is basically following that of https://github.com/pytorch/pytorch/pull/67790.
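
A minimal sketch of how one might compare backends at runtime (assumes a CUDA build with both cuSOLVER and MAGMA available; the op and matrix size are arbitrary choices):

```python
import time
import torch

a = torch.randn(1024, 1024, device='cuda')
spd = a @ a.T + 1024 * torch.eye(1024, device='cuda')  # symmetric positive definite

for backend in ('default', 'cusolver', 'magma'):
    torch.backends.cuda.preferred_linalg_library(backend)
    torch.linalg.cholesky(spd)   # warmup (includes any backend setup)
    torch.cuda.synchronize()
    start = time.perf_counter()
    torch.linalg.cholesky(spd)
    torch.cuda.synchronize()
    print(backend, time.perf_counter() - start)
```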

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67980

Reviewed By: mruberry

Differential Revision: D32849457

Pulled By: ngimel

fbshipit-source-id: 679fee7744a03af057995aef06316306073010a6
2021-12-03 19:06:30 -08:00
9663e08674 [Static Runtime] Fix a bug where aten::embedding_bag cannot handle resized input tensors (#69219)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69219

This change fixes a bug where the `aten::embedding_bag` implementation does not adjust the size of a managed output tensor to match a given input once memory planning has started.
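
To see why the output cannot keep a fixed size, note that `embedding_bag`'s output shape follows the offsets of each call (a plain eager sketch with illustrative values):

```python
import torch
import torch.nn.functional as F

weight = torch.randn(10, 3)
# Two calls with differently shaped inputs produce differently shaped outputs,
# which a managed (preallocated) output tensor must be resized to accommodate.
out1 = F.embedding_bag(torch.tensor([1, 2, 4, 5]), weight, torch.tensor([0, 2]))
out2 = F.embedding_bag(torch.tensor([1, 2]), weight, torch.tensor([0]))
print(out1.shape, out2.shape)  # torch.Size([2, 3]) torch.Size([1, 3])
```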

Test Plan: Enhanced `StaticRuntime.EmbeddingBag` to trigger the existing bug that's fixed by this change.

Reviewed By: mikeiovine

Differential Revision: D32544399

fbshipit-source-id: 0a9f1d453e96f0cfa8443c8d0b28bbc520e38b29
2021-12-03 19:01:45 -08:00
6a4fa86026 Add OpInfos for misc nn.functional operators (#68922)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68922

Reviewed By: Chillee

Differential Revision: D32842301

Pulled By: saketh-are

fbshipit-source-id: b7166faefb64668fc76cca6c528501b0d360c43b
2021-12-03 17:03:02 -08:00
da023611d7 [CUDA graphs] Fixes make_graphed_callables example typos (#69379)
Summary:
cc mcarilli

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69379

Reviewed By: mruberry

Differential Revision: D32841260

Pulled By: ngimel

fbshipit-source-id: a7d0b9db0578526907547b201eddd55827812b63
2021-12-03 16:51:14 -08:00
e92b14bf1f Update CUDA version to 11.3 and setup proper environment variables. (#69383)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69383

Test Plan:
TorchBench CI

RUN_TORCHBENCH: hf_Bert

Reviewed By: janeyx99

Differential Revision: D32845001

Pulled By: xuzhao9

fbshipit-source-id: 50dff742ad4786e4b4995bd9aa82629b2fc050c5
2021-12-03 16:12:29 -08:00
a3ca4c83a6 [PyTorch] Add torch::jit::toString(const Type&) (#66689)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66689

Let's not take an extra refcount bump to stringify types.
ghstack-source-id: 144374720

Test Plan: CI

Reviewed By: suo

Differential Revision: D31691526

fbshipit-source-id: 673d632a83e6179c063530fdbc346c22d5f47d7c
2021-12-03 15:16:08 -08:00
855365e9c4 Clean up dead code (#69296)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69296

Remove a commented-out block of code that was accidentally checked in.

Test Plan: no testable changes

Reviewed By: alanwaketan

Differential Revision: D32799197

fbshipit-source-id: d3eb05cbafb0f5a4a3f41c17f66ca6d0c2fc60b7
2021-12-03 15:11:38 -08:00
a813ddf5ec CUDACachingAllocator: make an error message more accurate. (#69174)
Summary:
The `TORCH_CHECK` asserts strictly-greater-than `kLargeBuffer`, but the
exception message claims `>=`. Fix the error message to match the
code.

Happy to open an issue if it's helpful; I was hopeful the trivial fix doesn't need a separate issue.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69174

Reviewed By: zou3519

Differential Revision: D32760055

Pulled By: H-Huang

fbshipit-source-id: 1a8ab68f36b326ed62d78afdcb198f4d6572d017
2021-12-03 15:04:59 -08:00
088a4feb41 Update the documentation for AMP with DataParallel (#69218)
Summary:
Follow-up to https://github.com/pytorch/pytorch/issues/60540 and pull request https://github.com/pytorch/pytorch/issues/43102
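
The underlying issue is that `nn.DataParallel` runs each replica in a side thread, where an `autocast` context entered on the main thread is not active. A sketch of the pattern the updated documentation describes, as I understand it, is to re-enable autocast inside `forward` so every replica runs under it:

```python
import torch
import torch.nn as nn

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(8, 8)

    @torch.cuda.amp.autocast()  # re-enables autocast in each replica's thread
    def forward(self, x):
        return self.fc(x)

model = nn.DataParallel(Model().cuda())
out = model(torch.randn(4, 8, device='cuda'))
print(out.dtype)  # torch.float16 under autocast
```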

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69218

Reviewed By: gchanan

Differential Revision: D32803814

Pulled By: ngimel

fbshipit-source-id: 06fdbbee2c7734153271be70ec4bc24263c8c367
2021-12-03 14:58:47 -08:00
80a67cd33c Limit uploading JSONs to trunk (#69385)
Summary:
Mac workflows on forked PRs don't have the right permissions to upload artifacts :/

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69385

Reviewed By: malfet, atalman

Differential Revision: D32843252

Pulled By: janeyx99

fbshipit-source-id: e137a6707fe46559771b9d77fbfe44b0a21c914a
2021-12-03 13:20:37 -08:00
a20b9f8d5c add HPU case for backend_to_string function (#69225)
Summary:
Change-Id: If8ed7f1161343a2e494d8b964576e1ee193007f7

Fixes https://github.com/pytorch/pytorch/issues/65609

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69225

Reviewed By: gchanan

Differential Revision: D32804545

Pulled By: wconstab

fbshipit-source-id: bdf359bd779113153ebdecc515edba94e47e0ae4
2021-12-03 12:54:15 -08:00
6f7a5ddffc [SR] Use std::vector::reserve in GetLivenessMap (#68884)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68884

This diff uses std::vector::reserve in GetLivenessMap to set the capacity of all local containers up front and avoid runtime resizing.

The change should improve performance slightly by avoiding repeated reallocations.

Test Plan:
- [x] `buck run //caffe2/benchmarks/static_runtime:static_runtime_cpptest -- -v 1`
- [x]
```
seq 1 10 | xargs -I{} ./buck-out/opt/gen/caffe2/caffe2/fb/predictor/ptvsc2_predictor_bench \
--scripted_model=/data/users/dxd/302008423_0.predictor.disagg.local \
--method_name=local_request_only.forward --pt_cleanup_activations=1 \
--pt_enable_out_variant=1 --pt_optimize_memory=1 --iters=0 --warmup_iters=0 \
--num_threads=1 --pt_enable_static_runtime=1 --set_compatibility=1 \
--input_type="recordio" --pt_inputs=/data/users/dxd/302008423_0.local_ro.inputs.recordio \
--recordio_use_ivalue_format=1
```

### Before
```
I1201 12:04:46.753311 2874563 PyTorchPredictorBenchLib.cpp:336] Took 10.9826 sec to initialize a predictor.
I1201 12:05:00.617139 2875780 PyTorchPredictorBenchLib.cpp:336] Took 11.1078 sec to initialize a predictor.
I1201 12:05:15.279667 2876813 PyTorchPredictorBenchLib.cpp:336] Took 11.7979 sec to initialize a predictor.
I1201 12:05:30.201207 2877554 PyTorchPredictorBenchLib.cpp:336] Took 11.8901 sec to initialize a predictor.
I1201 12:05:44.386926 2879713 PyTorchPredictorBenchLib.cpp:336] Took 11.2722 sec to initialize a predictor.
I1201 12:05:58.003582 2881426 PyTorchPredictorBenchLib.cpp:336] Took 10.8046 sec to initialize a predictor.
I1201 12:06:12.004778 2882604 PyTorchPredictorBenchLib.cpp:336] Took 11.2754 sec to initialize a predictor.
I1201 12:06:26.101241 2884888 PyTorchPredictorBenchLib.cpp:336] Took 11.3355 sec to initialize a predictor.
I1201 12:06:40.364817 2886572 PyTorchPredictorBenchLib.cpp:336] Took 11.401 sec to initialize a predictor.
I1201 12:06:54.483794 2888614 PyTorchPredictorBenchLib.cpp:336] Took 11.3498 sec to initialize a predictor.
```

### After
```
I1201 11:51:53.775239 2818391 PyTorchPredictorBenchLib.cpp:336] Took 10.9113 sec to initialize a predictor.
I1201 11:52:07.412720 2819530 PyTorchPredictorBenchLib.cpp:336] Took 10.8413 sec to initialize a predictor.
I1201 11:52:21.202816 2820359 PyTorchPredictorBenchLib.cpp:336] Took 11.0216 sec to initialize a predictor.
I1201 11:52:35.513288 2821029 PyTorchPredictorBenchLib.cpp:336] Took 11.4216 sec to initialize a predictor.
I1201 11:52:49.145979 2821930 PyTorchPredictorBenchLib.cpp:336] Took 10.8272 sec to initialize a predictor.
I1201 11:53:02.908790 2822859 PyTorchPredictorBenchLib.cpp:336] Took 11.0262 sec to initialize a predictor.
I1201 11:53:16.276015 2823657 PyTorchPredictorBenchLib.cpp:336] Took 10.6893 sec to initialize a predictor.
I1201 11:53:30.103283 2824382 PyTorchPredictorBenchLib.cpp:336] Took 11.1854 sec to initialize a predictor.
I1201 11:53:44.298514 2825365 PyTorchPredictorBenchLib.cpp:336] Took 11.4796 sec to initialize a predictor.
I1201 11:53:58.258708 2826128 PyTorchPredictorBenchLib.cpp:336] Took 11.2652 sec to initialize a predictor.
```

Reviewed By: swolchok

Differential Revision: D32649252

fbshipit-source-id: 5cd296d12b12e5b15e85e4f1a8a236e293f37f9c
2021-12-03 12:18:06 -08:00
ae11264583 Fixed type checking errors in node.py (#68124)
Summary:
Fixes [issue#67](https://github.com/MLH-Fellowship/pyre-check/issues/67)
This PR fixes the type checking errors in Pytorch torch/fx/node.py .
The variables at 363:20 and 364:20 were declared with type `List[str]` but were assigned a value of `None`, which caused an incompatible variable type error. Changing the annotation from `List[str]` to `Optional[List[str]]` fixes that error.
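
A minimal sketch of the annotation change; the variable name here is illustrative (the real attributes live in torch/fx/node.py):

```
from typing import List, Optional

# Before (rejected by Pyre): a List[str] annotation cannot hold None
#   stack_trace: List[str] = None
# After: widening the annotation makes the None default explicit
stack_trace: Optional[List[str]] = None
```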

Signed-off-by: Onyemowo Agbo
onionymous
0xedward

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68124

Reviewed By: gmagogsfm

Differential Revision: D32322414

Pulled By: onionymous

fbshipit-source-id: be11bbbd463715ddf28a5ba78fb4adbf62878c80
2021-12-03 12:03:49 -08:00
6baaec30cd [DataPipe] Adding ShufflerMapDataPipe (#68606)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68606

cc VitalyFedyunin ejguan NivekT

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D32813290

Pulled By: NivekT

fbshipit-source-id: 8d1ebd5bc776563c23250f76a2efc1d395f1af9c
2021-12-03 11:36:33 -08:00
3e45739543 [PyTorch][JIT] Use stack.pop_back() instead of pop(stack) for DROP (#69326)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69326

Looks like this really is slightly cheaper (see assembly diff screenshot in internal test plan). The problem is that `pop()` returns the value, so we have to spend instructions moving it out of the stack and then destroying it via a local.
ghstack-source-id: 144641680

Test Plan:
{F684148304}

CI

Reviewed By: zhxchen17

Differential Revision: D32812841

fbshipit-source-id: e9e43458d3364842f67edd43e43575a1f72e3cb0
2021-12-03 11:09:05 -08:00
2c84b010e6 [PyTorch] Use toObjectRef in JIT interpreter (#69324)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69324

This slightly shrinks runImpl.

Before:
- Move pointer out of IValue
- Clear the IValue to none
- Do our thing with the Object
- destroy the intrusive_ptr on the C stack
- destroy the IValue on the C stack (even though it was cleared to None, the destructor has to run anyway)

After:
- Grab the pointer out of IValue
- Do our thing with the Object
- Decref the pointer in the IValue on the JIT stack as we assign over it

We should be saving at least the memory traffic from clearing the IValue and possibly the dtor code as well.
ghstack-source-id: 144638920

Test Plan:
Inspected assembly to verify shorter runImpl

Tried to microbenchmark (D32809454) but can't show a difference.

Reviewed By: gchanan

Differential Revision: D32812252

fbshipit-source-id: a3689f061ee51ef01e4696bd4c6ffcbc41c30af5
2021-12-03 11:07:16 -08:00
5a480831e6 .github: Propagate WITH_PUSH to docs jobs (#69372)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69372

Docs weren't getting push since this variable wasn't getting propagated
to the docker container

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D32837012

Pulled By: seemethere

fbshipit-source-id: 5074d5266a567df2230981186cabffb53c01c634
2021-12-03 11:00:38 -08:00
8f8524a447 Expose is_metal_available in header (#68942)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68942

Currently, `at::native::is_metal_available()` is implemented, but it's not exposed in the header, so nobody can use it. It's a useful function and I want to use it, so I'm exposing it in the header.

Test Plan: CI

Reviewed By: sodastsai, xta0

Differential Revision: D32675236

fbshipit-source-id: b4e692db7d171dfb872d5c2233cc808d7131f2e9
2021-12-03 10:31:03 -08:00
77ca153d3e Remove columns and ones from slow2d transpose signatures (#68898)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68898

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D32655873

Pulled By: jbschlosser

fbshipit-source-id: 810035a745e3851bd5326459b563e4796a074a65
2021-12-03 09:56:18 -08:00
7ca2da14e9 Remove finput and fgrad_input from slow3d signatures (#68897)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68897

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D32655875

Pulled By: jbschlosser

fbshipit-source-id: 8d04968b2df47e11da1eceb1612d55d00768eeb4
2021-12-03 09:55:02 -08:00
73d2ca20e0 .github: Add credentials for macos test jobs (#69371)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69371

macOS jobs need credentials to upload their test stats

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D32836893

Pulled By: seemethere

fbshipit-source-id: 0f5a8f1b35f4240d57b08a2120a97a13ba3b3de5
2021-12-03 09:43:41 -08:00
6ed7354435 [SR][Code cleanup] Typedef/default for kwargs (#69164)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69164

We have lots of methods that take `std::unordered_map<std::string, c10::IValue>` now. That's kind of ugly and cumbersome to type, so add a `KWargs` typedef.

Also made the `operator()` default `kwargs` to empty. Note that we could have another overload that doesn't take `kwargs` at all, but the perf gain is so minuscule it's probably not worth it.
ghstack-source-id: 144691899

Test Plan: CI

Reviewed By: d1jang

Differential Revision: D32734677

fbshipit-source-id: 8d6496a6d1ec2dc71253151d2f6408f1387966cf
2021-12-03 09:27:37 -08:00
b761172406 Revert D32786909: [C10D] [Easy] Use pinned memory for HtoD copies in Reducer:: sync_bucket_indices
Test Plan: revert-hammer

Differential Revision:
D32786909 (dbc8d9c947)

Original commit changeset: a53f96f57e67

fbshipit-source-id: 19599c3a489804bfdbb3062f4256dceb680c143b
2021-12-03 08:31:45 -08:00
e0fb228e03 Revert of adding windows CUDA 11.5 workflow (#69365)
Summary:
This is partial revert of bb522c9d7a to revert addition of workflows for CUDA 11.5 windows that fails

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69365

Reviewed By: suo

Differential Revision: D32831418

Pulled By: atalman

fbshipit-source-id: 184346d22623f88594312a4ce2e4d29cc67e8338
2021-12-03 08:00:16 -08:00
21919be96b CMake: Update precompiled header and fix support (#67851)
Summary:
This fixes the `USE_PRECOMPILED_HEADERS` cmake version check which was accidentally inverted, so it was always disabled.

I've also made the precompiled header include only headers used in 95% or more of code, weighted by compile time. This limits it to the standard library, `c10`, and a limited subset of `ATen/core`. Crucially, the new pch doesn't depend on `native_functions.yaml`, so it won't cause as much unnecessary rebuilding.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67851

Reviewed By: zou3519

Differential Revision: D32290902

Pulled By: dagitses

fbshipit-source-id: dfc33330028c99b02ff40963926c1f1260d00d00
2021-12-03 06:51:56 -08:00
cc46dc45e1 [SR] Factor logic that determines managed tensors out of MemoryPlanner (#68295)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68295

There's no reason we can't figure out what tensors we need to manage at model load time. It's also useful to have the set of ranges available at load time for integrating the ranges algorithm introduced in the previous diff.

Test Plan: `buck test caffe2/benchmarks/static_runtime/...`

Reviewed By: hlu1

Differential Revision: D32400593

fbshipit-source-id: 0466b2641166ddc9c14f72774f4ba151407be400
2021-12-03 04:45:27 -08:00
276cb8f501 [Pytorch Edge] Make Tracer support xirp metal segmentation (#69328)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69328

Aten_metal_prepack is C++-based and can safely be included here.

Test Plan: "Traced" the xirp model with the script.

Reviewed By: xta0

Differential Revision: D32813686

fbshipit-source-id: 7a428151348dc9d3f576531701926d6b3413de3d
2021-12-02 22:16:19 -08:00
a07ffe8d0e Add OpInfos for combinations, cartesian_prod, sum_to_size, ldexp, as_strided (#68853)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68853

Reviewed By: davidberard98

Differential Revision: D32811147

Pulled By: saketh-are

fbshipit-source-id: 941dcf949072f8d10faf4d5a0fa0ef409ac6a7db
2021-12-02 21:22:56 -08:00
834bd3134e Back out "Add efficient zero tensors" (#69327)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69327

Original commit changeset: d44096d88265

Original Phabricator Diff: D32144240 (668574af4a)

Test Plan:
CI

original diff failed 175 builds in CI

Reviewed By: airboyang, anjali411

Differential Revision: D32809407

fbshipit-source-id: c7c8e69bcee0274992e2d5da901f035332e60071
2021-12-02 19:11:41 -08:00
c572a603a6 fix for python 3.10 for gradient opinfo (#68113)
Summary:
This PR fixes https://github.com/pytorch/pytorch/issues/67612 by creating the tensor first and then converting the dtype explicitly with a `.to(dtype)` call.
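
A minimal sketch of the pattern described above, with illustrative values:

```
import torch

# Build the tensor first, then convert the dtype explicitly with
# .to(dtype) instead of passing dtype at construction time
dtype = torch.float64
x = torch.tensor([1, 2, 3]).to(dtype)
```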

Looking forward to your feedback and suggestions on this.

cc: kshitij12345 mruberry

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68113

Reviewed By: zou3519

Differential Revision: D32797329

Pulled By: saketh-are

fbshipit-source-id: 5c34709ab277c82cda316a3ea1cf01e853e4c38b
2021-12-02 19:01:01 -08:00
572c3e3118 Fix some usages of CUDA_VERSION (#69092)
Summary:
See https://pytorch.slack.com/archives/G4Z791LL8/p1638229956006300

I grepped c10, aten, and torch for CUDA_VERSION and checked the usages I saw.
I can't guarantee I made a clean sweep, but this improves the status quo.

cc ngimel

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69092

Reviewed By: zou3519

Differential Revision: D32786919

Pulled By: ngimel

fbshipit-source-id: 1d29827dca246f33118d81e136252ddb5bf3830f
2021-12-02 18:32:47 -08:00
dbc8d9c947 [C10D] [Easy] Use pinned memory for HtoD copies in Reducer::sync_bucket_indices (#69298)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69298

I was exploring adding an invariant that we actually use properly-tracked pinned memory when doing non-blocking copies (to plug various correctness holes), and found this case where we allocate a tensor without pinned memory and then copy it with non_blocking=True.
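
A minimal sketch of the pattern being enforced, assuming a CUDA device is available:

```
import torch

if torch.cuda.is_available():
    # Host tensors copied with non_blocking=True should live in pinned
    # (page-locked) memory; with pageable memory the "non-blocking" copy
    # may silently synchronize or race with reuse of the host buffer
    host = torch.randn(1024, pin_memory=True)
    device = torch.empty(1024, device="cuda")
    device.copy_(host, non_blocking=True)
    torch.cuda.synchronize()  # make the asynchronous copy observable
```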

Test Plan: Unit tests cover this code.

Reviewed By: rohan-varma

Differential Revision: D32786909

fbshipit-source-id: a53f96f57e6727238e4cd2164c1a0f04cf270413
2021-12-02 17:34:34 -08:00
e2c7bd08b9 Add operator div (#68528)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68528

Add the operator converter for div. torch.floor_div is announced to be deprecated by PyTorch; consider removing this once PyTorch completes the deprecation.

Reviewed By: 842974287

Differential Revision: D32497573

fbshipit-source-id: d06c864077f745c295c33fb25639b7116f85ca20
2021-12-02 17:25:40 -08:00
bede18b061 Add support for C++ frontend wrapper on Linux (#69094)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69094

Partially addresses https://github.com/pytorch/pytorch/issues/68768

Test Plan: Imported from OSS

Reviewed By: seemethere

Differential Revision: D32730079

Pulled By: malfet

fbshipit-source-id: 854e4215ff66e087bdf354fed7a17e87f2649c87
2021-12-02 16:47:00 -08:00
33c3c539b6 THPStorage: Prefer intrusive_ptr over owning raw pointers (#69248)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69248

Reviewed By: zou3519

Differential Revision: D32771035

Pulled By: ngimel

fbshipit-source-id: cf9bbcc5563ae9715ecf13631ba56c32240e59e3
2021-12-02 16:33:03 -08:00
9f39a2de0a [fix] add range check for k kthvalue_cpu (#68863)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/68813

Long-term it might make more sense to port it to structured

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68863

Reviewed By: H-Huang

Differential Revision: D32749372

Pulled By: mruberry

fbshipit-source-id: 85a1b2a31e922ff1df0d0f3f576ad219e652aa49
2021-12-02 15:33:06 -08:00
cc85b68984 .github: Fix ci workflows generation (#69329)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69329

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D32814709

Pulled By: seemethere

fbshipit-source-id: ea83aa0319bebb65623856ca9e34689581dd517b
2021-12-02 15:28:59 -08:00
f786b03f98 ci: Migrate docs push to GHA (#69172)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69172

Migrates the docs push jobs to Github Actions by implementing a simple
WITH_PUSH switch to do the actual push.

Adds 2 new workflows for GHA:
* linux-docs (on trunk)
* linux-docs-push (on schedule)

linux-docs-push is the only workflow that actually gets access to
credentials so it should be relatively safe.

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D32767239

Pulled By: seemethere

fbshipit-source-id: 5b100f986cf4023c323f4f96f0fe7942fec49ad2
2021-12-02 15:06:57 -08:00
db5425bcd2 re-enable layer_norm in autodiff (#69007)
Summary:
Turn on layer_norm in autodiff

https://github.com/pytorch/pytorch/issues/67732 should have fixed the previous issue exposed by enabling layer_norm in autodiff.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69007

Reviewed By: soulitzer

Differential Revision: D32699108

Pulled By: eellison

fbshipit-source-id: 6951668c0e74e056d3776294f4e1fd3123c763e5
2021-12-02 14:55:00 -08:00
5b2586fe09 [testing] Ignore expected_regex in assertRaisesRegex for non-native device (#68723)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/29719

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68723

Reviewed By: zou3519

Differential Revision: D32797061

Pulled By: mruberry

fbshipit-source-id: 3bcae6d3d62d180059dbe39be520b0e7f9aea19f
2021-12-02 14:52:27 -08:00
36ba1b6b3a Remove unused _convolution_nogroup op (#68829)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68829

Test Plan: Imported from OSS

Reviewed By: zou3519, albanD

Differential Revision: D32627578

Pulled By: jbschlosser

fbshipit-source-id: 8a4c0ac58aae184a465b1fd40cce880a60d67339
2021-12-02 14:42:08 -08:00
791d5087ed [TensorExpr] Add lowerings for quantized ops: cat, mul, conv1d, relu. (#69055)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69055

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D32710325

Pulled By: ZolotukhinM

fbshipit-source-id: 4a7f0ac059ea238463317b6a45a822b8d05610dd
2021-12-02 14:34:21 -08:00
83c4451f60 [TensorExpr] Add a pass to symbolize an input dimension. (#68857)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68857

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D32632908

Pulled By: ZolotukhinM

fbshipit-source-id: bcee95d83731fcea07ec2f55ed78418ee52f51b6
2021-12-02 14:34:18 -08:00
1e9dcdd2a0 [TensorExpr] TensorExprKernel: support custom-class constants. (#68856)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68856

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D32632907

Pulled By: ZolotukhinM

fbshipit-source-id: e4180f8d791ba0cdf82bcb3bd11b61405c2faadd
2021-12-02 14:34:15 -08:00
48d7d585c8 [TensorExpr] IR Eval: add more logging. (#68855)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68855

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D32632905

Pulled By: ZolotukhinM

fbshipit-source-id: fef9b019d8d5b8a3ffd4075bfac069d1c81f647d
2021-12-02 14:34:12 -08:00
b6bcf5a0f1 [TensorExpr] Un-const TEK::kernel_func_name to allow recompilation. (#68854)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68854

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D32632904

Pulled By: ZolotukhinM

fbshipit-source-id: 154e3802ba844e738f09dbc239cf3656b9f8d5fd
2021-12-02 14:33:02 -08:00
a0367f8980 Revert D32404517: [quant][embedding qat] Support Embedding QAT via FX API
Test Plan: revert-hammer

Differential Revision:
D32404517 (abda069ce2)

Original commit changeset: 0484df8c826b

fbshipit-source-id: 4e7d62b9ccdb84eb4d184cd0b3c9506013fd8336
2021-12-02 14:28:35 -08:00
ec4c749024 Revert D32318435: [quant][embdding qat] Add FX support for QAT EmbeddingBag
Test Plan: revert-hammer

Differential Revision:
D32318435 (4484c04513)

Original commit changeset: 8b5d1a5d5422

fbshipit-source-id: e46d431f92a5c3f86c757695164d1eb5b0041298
2021-12-02 14:27:17 -08:00
8dafe6e147 Forward fix merge conflict (#69319)
Summary:
Forward fixes a merge conflict between two commits

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69319

Reviewed By: seemethere

Differential Revision: D32810884

Pulled By: janeyx99

fbshipit-source-id: 6e2f9fc89d00da979de1430a172673e82c51ba14
2021-12-02 14:05:54 -08:00
52219b1017 Fix ChainedScheduler.get_last_lr() (#69112)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/68820

cc vincentqb jbschlosser albanD

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69112

Reviewed By: zou3519

Differential Revision: D32796626

Pulled By: albanD

fbshipit-source-id: bde9d4e473527be4c0a7f21cb57f795a67a99eaa
2021-12-02 13:44:12 -08:00
db30696be8 [pytorch][PR] bug fix for D32374003 (#69278)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69278

Test Plan:
```
fbpkg build -E smart.inference_platform_sp.sigrid_predictor.persistent.bolt --yes
```

Reviewed By: kimishpatel, HDCharles

Differential Revision: D32773910

fbshipit-source-id: a2181fea354f310cf9f6f57b802dc4a148627acc
2021-12-02 13:31:19 -08:00
915c26f588 GHA: preserve downloaded JSONs as artifacts (#69258)
Summary:
Preserves the .json files in the test folder for every test job as an artifact.

Going to hud.pytorch.org/pr/69258 and downloading/unzipping any of the `test-jsons-*.zip` shows that .pytorch-slow-tests.json and .pytorch-disabled-tests.json exist. (Though you won't see them in your file manager as they are hidden files.)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69258

Reviewed By: seemethere

Differential Revision: D32807102

Pulled By: janeyx99

fbshipit-source-id: ed1b227cdd32160ed045dd79a7edc55216dcfe53
2021-12-02 13:26:14 -08:00
cafcf599d0 Deprecate torch.triangular_solve (#63570)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63570

There is a use of `at::triangular_solve_out` in the file
`torch/csrc/jit/tensorexpr/external_functions.cpp` that I have not dared
to move to `at::linalg_solve_triangular_out`.

**Deprecation note:**

This PR deprecates the `torch.triangular_solve` function in favor of
`torch.linalg.solve_triangular`. An upgrade guide is added to the
documentation for `torch.triangular_solve`.

Note that it DOES NOT remove `torch.triangular_solve`, but
`torch.triangular_solve` will be removed in a future PyTorch release.
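
A minimal migration sketch for an upper-triangular system; note the swapped argument order and that the new function returns just the solution:

```
import torch

A = torch.randn(3, 3).triu()   # upper-triangular coefficient matrix
b = torch.randn(3, 2)

# Old (deprecated): returns a (solution, cloned_coefficient) tuple
x_old, _ = torch.triangular_solve(b, A, upper=True)

# New: arguments are (A, b) and `upper` is keyword-only
x_new = torch.linalg.solve_triangular(A, b, upper=True)

assert torch.allclose(x_old, x_new)
```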

cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D32618035

Pulled By: anjali411

fbshipit-source-id: 0bfb48eeb6d96eff3e96e8a14818268cceb93c83
2021-12-02 13:24:55 -08:00
dde801686b Expose MobileCode to python (#66592)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66592

Test Plan: Imported from OSS

Reviewed By: samdow

Differential Revision: D31632600

Pulled By: tugsbayasgalan

fbshipit-source-id: 46a7ac20ddb6b433bd037280ed020481901a15c9
2021-12-02 13:18:46 -08:00
bb522c9d7a Enabling CUDA 11.5 for binary builds, Adding windows workflows for CUDA 11.5 (#69262)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/68259

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69262

Reviewed By: malfet

Differential Revision: D32804850

Pulled By: atalman

fbshipit-source-id: abac45ad1d49ec7e0e7df6cb9a22a46fbcd905a2
2021-12-02 13:04:43 -08:00
f587267dc7 Revert D31705359: use irange for loops 8
Test Plan: revert-hammer

Differential Revision:
D31705359 (17e5200441)

Original commit changeset: c9ea2fbc0f9c

fbshipit-source-id: 08fff2d12beca953ad30dd0baabf86e39ac84f14
2021-12-02 12:55:08 -08:00
97750e03a4 [torch][edge] Add int to the copy kernel. (#69297)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69297

.

Test Plan: CI

Reviewed By: JacobSzwejbka

Differential Revision: D32799822

fbshipit-source-id: c40fdb55a706b3a8eccaa69dbfbc6d7af0b111e4
2021-12-02 12:13:58 -08:00
7142b0b033 .github: Add linux.large to actionlint.yaml (#69304)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69304

Don't know why this isn't automatically figured out

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: anjali411, atalman, janeyx99

Differential Revision: D32805380

Pulled By: seemethere

fbshipit-source-id: 2c4805f87ae91388a6b605a6394024887b4bc71e
2021-12-02 11:21:49 -08:00
4056251a18 Add missing spaces to an error message (#69289)
Summary:
Before:
`ValueError: InstanceNorm1d returns 0-filled tensor to 2D tensor.This is because InstanceNorm1d reshapes inputs to(1, N * C, ...) from (N, C,...) and this makesvariances 0.`

After:
`ValueError: InstanceNorm1d returns 0-filled tensor to 2D tensor. This is because InstanceNorm1d reshapes inputs to (1, N * C, ...) from (N, C,...) and this makes variances 0.`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69289

Reviewed By: jbschlosser

Differential Revision: D32796035

Pulled By: albanD

fbshipit-source-id: c8e7c5cf6e961ec5f7242b31c7808454104cde02
2021-12-02 11:03:33 -08:00
2ea70a6462 Allow Union of scalars to be NumberType (#66591)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66591

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D31632599

Pulled By: tugsbayasgalan

fbshipit-source-id: 374065da1d91334a19c15c604faf13ebec1681f6
2021-12-02 10:52:02 -08:00
d673b1ec59 .github: Switch ciflow-should-run to self hosted (#69166)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69166

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D32735493

Pulled By: seemethere

fbshipit-source-id: 9a03cf5245d1dbfe1be86cfbb3f5d1d42dd391c8
2021-12-02 10:42:07 -08:00
14ed4df305 [PyTorch][Static Runtime][easy] give to_copy_functor a name (#67701)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67701

I split this out to ease rebasing and review.
ghstack-source-id: 144507288

Test Plan: CI

Reviewed By: hlu1

Differential Revision: D32112523

fbshipit-source-id: dba14e6ada33df02dbcd7025b090a8a18cf438ae
2021-12-02 10:36:26 -08:00
21686923e8 [PyTorch][SR] More debug logging (#67220)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67220

Specifically we log AliasDb and same_storage_values, and are chattier about the aliasing logs in the liveness analysis.
ghstack-source-id: 144507289

Test Plan: Used to help develop D31776259

Reviewed By: hlu1

Differential Revision: D31847561

fbshipit-source-id: 8371455d060c17dace91cd90e4034b7618f820a6
2021-12-02 10:36:23 -08:00
b22e4d4aea [PyTorch][SR] Add more to() tests & extend debug logging in testStaticRuntime (#67219)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67219

I found that these specific test cases were causing different failures when developing D31776259. I also found that it was difficult to debug testStaticRuntime failures, so I added more verbose logs gated behind -v 2.
ghstack-source-id: 144507287

Test Plan: Used during development of D31776259

Reviewed By: hlu1

Differential Revision: D31847566

fbshipit-source-id: ea9147fb246c345d18bbc8d7f3bfba48d3a0fab3
2021-12-02 10:34:54 -08:00
84aa1ddedd [quant] Remove warning for quantized Tensor in __dir__ (#69265)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69265

This is used in tab completion; we should not emit a warning here.

Test Plan:
ci

Imported from OSS

Reviewed By: albanD

Differential Revision: D32778736

fbshipit-source-id: f1bec5e09a8238ab41329ac2b64e6f3267799f6a
2021-12-02 10:30:36 -08:00
17e5200441 use irange for loops 8 (#66743)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66743

Modified loops in files under fbsource/fbcode/caffe2/ from the format

`for(TYPE var=x0;var<x_max;x++)`

to the format

`for(const auto var: irange(xmax))`

This was achieved by running r-barnes's loop upgrader script (D28874212) with some modifications to exclude all files under /torch/jit; a number of reversions and unused-variable suppression warnings were added by hand.

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D31705359

fbshipit-source-id: c9ea2fbc0f9cd29e97a52dcb203addc5f2abb09b
2021-12-02 10:21:29 -08:00
ff3fc37267 [BE] rewrite ProcessGroupNCCLTest to be MultiProcess (#67705)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67705

This PR rewrites ProcessGroupNCCLTest to be a MultiProcessTestCase. It was originally written in a single-process multi-GPU fashion; we change it to multi-process to align with the other c10d tests.
ghstack-source-id: 144555092

Test Plan: wait for CI

Reviewed By: pritamdamania87, fduwjj

Differential Revision: D32113626

fbshipit-source-id: 613d36aeae36bf441de1c2c83aa4755f4d33df4d
2021-12-02 10:12:05 -08:00
5c816520b3 ns for fx: fix bug in graph matcher (#69238)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69238

The NS for FX graph matcher was not properly taking seen_nodes into
account; this allowed a node to be matched twice.

Test Plan:
FB-only testing on real model passes.

Ideally we would have a test case to capture this, but hopefully we can land this soon to unblock production work.

Imported from OSS

Reviewed By: HDCharles

Differential Revision: D32765761

fbshipit-source-id: ed3dff8fd981e399a649fcd406883b4d56cc712a
2021-12-02 09:59:57 -08:00
698c35e743 Add functorch TLS to ATen/ThreadLocalState (#69181)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69181

functorch lives out-of-tree. However, it has some TLS that needs to be
propagated. The solution for that is we store a pointer to the TLS
inside pytorch/pytorch and extend FuncTorchTLSBase inside functorch to
include whatever functorch needs.

A previous solution used ThreadLocalDebugInfo. However, all
PyTorch-managed threads (e.g. those spawned by Autograd) receive a
shared_ptr that points to the same ThreadLocalDebugInfo. This leads to
race conditions if multiple threads start modifying the TLS
stored within ThreadLocalDebugInfo without using mutexes.

Test Plan:
- tested with functorch
- The performance impact of this change when functorch is not used is
negligible because we end up manipulating nullptrs.

Reviewed By: albanD

Differential Revision: D32742312

Pulled By: zou3519

fbshipit-source-id: 1a8439a4af06b3d3e50b9a2dbca98a0ba612062a
2021-12-02 09:29:55 -08:00
0de7a618a3 functionalization: update is_aliased() logic (#68881)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68881

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D32647614

Pulled By: bdhirsh

fbshipit-source-id: 6bec50d3e54419d1707d0b6c0c6729bcc1ced1f0
2021-12-02 09:19:12 -08:00
4484c04513 [quant][embdding qat] Add FX support for QAT EmbeddingBag (#68121)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68121

Add FX support for the QAT EmbeddingBag operator; previously only eager mode was supported.

Test Plan:
pytest test/quantization/fx/test_quantize_fx.py  -v -k "test_qat_embeddingbag_linear"

Imported from OSS

Reviewed By: supriyar

Differential Revision: D32318435

fbshipit-source-id: 8b5d1a5d5422972c49676f9e470d5fbe29dd503b
2021-12-02 09:05:07 -08:00
78ab3cde4a Do not modify type map from getCustomClassTypeImpl() (#69261)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69261

This function is supposed to be called only once per type, from the
caching getCustomClassType template.

Test Plan: Imported from OSS

Reviewed By: suo, lw

Differential Revision: D32776564

Pulled By: malfet

fbshipit-source-id: 218436657e6ad5ad0c87964857114d1e60c57140
2021-12-02 08:53:09 -08:00
113684cf81 Fix crash in checkCustomClassType if arg is null (#69259)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69259

Otherwise `checkCustomClassType(nullptr, new Type())` will crash

Test Plan: Imported from OSS

Reviewed By: lw

Differential Revision: D32775297

Pulled By: malfet

fbshipit-source-id: 54b10fd395d734c615dcaf85a5e599a445cee663
2021-12-02 08:51:59 -08:00
668574af4a Add efficient zero tensors (#64837)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64837

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D32144240

Pulled By: anjali411

fbshipit-source-id: d44096d882657c7f9270a16636900e0b73cefa40
2021-12-02 08:47:45 -08:00
abda069ce2 [quant][embedding qat] Support Embedding QAT via FX API (#68296)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68296

Support the QAT workflow via the torch.fx QAT API, e.g. `prepare_qat_fx` and `convert_fx`.

Test Plan:
`pytest test/quantization/fx/test_quantize_fx.py -v -k "test_qat_embedding_linear"`

Imported from OSS

Reviewed By: jingsh, supriyar

Differential Revision: D32404517

fbshipit-source-id: 0484df8c826b823b60dfecd9def77bf8cffe0527
2021-12-02 08:42:45 -08:00
3157371bb4 [quant][embedding qat] Fix bug enforcing quant_min <= zero_point <= quant_max for float zeropoint (#68852)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68852

When using a float zero_point in FakeQuant, such as for embeddings, it does not need to be between
quant_min and quant_max, as is enforced for integer zero_points.

This is because float zero_points are formulated as:

```
Xq = Round(Xf * inv_scale + zero_point)
   = Round((Xf - min) * inv_scale)
```

i.e., zero_point = -min * inv_scale, which need not land in [quant_min, quant_max].

Test Plan:
pytest test/test_quantization.py -v -k "test_fake_quant_per_channel_qparam_range"

Imported from OSS

Reviewed By: supriyar

Differential Revision: D32645014

fbshipit-source-id: 96dc3ca6eef9cee60be6919fceef95c9f2759891
2021-12-02 07:58:03 -08:00
397183f44c Add Lazy Tensor codegen infra (#69020)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69020

Merges the lazy tensor codegen infra which has already been used on lazy_tensor_staging.

Test Plan: Test via lazy_tensor_staging branch

Reviewed By: alanwaketan, bdhirsh

Differential Revision: D32570613

fbshipit-source-id: 2cd5698644398bda69669683f8de79fd3b6639b5
2021-12-02 07:51:52 -08:00
28c519961f Follow the undefined Tensor <-> None rule better in torch dispatch (#67793)
Summary:
As per title. In particular, this makes it easier to override backward functions for which the underlying backend returns `None`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67793

Reviewed By: zou3519

Differential Revision: D32242962

Pulled By: albanD

fbshipit-source-id: 6e114def90ee9499161e1303d301ba7fd003ff89
2021-12-02 07:46:56 -08:00
0465f64bb8 [DataPipe] Adding BatcherMapDataPipe (#68197)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68197

cc VitalyFedyunin ejguan NivekT

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D32440963

Pulled By: NivekT

fbshipit-source-id: 277cbe8d735afe341a7c189be20e1d334ecf9d4a
2021-12-02 07:27:17 -08:00
00ebbd5ef6 Revert D32010095: [pytorch][PR] Add ability for a mobile::Module to save as flatbuffer
Test Plan: revert-hammer

Differential Revision:
D32010095 (41d35dc201)

Original commit changeset: d763b0557780

fbshipit-source-id: bf746a0389135c9f5f67f00f449435ce08fb5f6d
2021-12-02 06:41:40 -08:00
ed3b73fd4d [Static Runtime] Skip ProcessedNode::verify_no_memory_overlap() for out variants (#68639)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68639

Fix all problems related to `ProcessedNode::verify_no_memory_overlap()`:
- Only enable this check for native and fallback ops that are not inplace or view ops
- Enable `ProcessedNode::verify_no_memory_overlap()` in debug mode and enforce it
- Add gflag --static_runtime_disable_debug_memory_overlap_check to test the runtime memory-overlap fix for bad schemas

fb::expand_dims's schema was not correct once this check was re-enabled; it's fixed in D32556204 (39ab417107).

Reviewed By: mikeiovine

Differential Revision: D32553708

fbshipit-source-id: 88de63cdf1ee4f87b7726c8b65a11a5fb8a99d13
2021-12-02 05:03:12 -08:00
c60232d89a [shard] add back init_from_local_shards_and_global_metadata API (#69226)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69226

This adds back the previous init_from_local_shards API, renamed to init_from_local_shards_and_global_metadata. It's a partial revert of D32147888 (35712a8eb4). We now provide two APIs:
1. `init_from_local_shards`: users don't need to provide global metadata; we do an all_gather under the hood.
2. `init_from_local_shards_and_global_metadata`: users need to explicitly construct ShardedTensorMetadata to use this API and must ensure correctness on all ranks, as there is no cross-rank communication/validation.

Both APIs stay private until they stabilize and the UX is proven. The second one can only be called on the `ShardedTensor` class directly and is not included as a package API for now.

Test Plan:
test_init_from_local_shards_and_global_metadata
test_init_from_local_shards_and_global_metadata_invalid_shards

Reviewed By: dstaay-fb, pritamdamania87

Differential Revision: D32746882

fbshipit-source-id: bafd26ce16c02e2095907f9e59984a5d775c7df5
2021-12-02 01:02:56 -08:00
12621c3a39 support pure fp16 training in FSDP (#68417)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68417

1. Since parameter attributes are lazily initialized at the beginning of forward, it makes more sense to init full_param_padded using the parameters' data type during lazy_init rather than during construction, as the data type may change after construction and before the training loop.
2. Add a check for whether parameter storage is changed outside FSDP, and handle it properly.
ghstack-source-id: 144479019

Test Plan: unit tests

Reviewed By: rohan-varma

Differential Revision: D32458643

fbshipit-source-id: 0e07e5e08270f2e265e8f49124a6648641e42e7a
2021-12-02 00:27:45 -08:00
41d35dc201 Add ability for a mobile::Module to save as flatbuffer (#67351)
Summary:
Included functions:

* save_mobile_module -> saves a mobile::Module to flatbuffer
* load_mobile_module_from_file -> loads a flatbuffer into mobile::Module
* parse_mobile_module -> parses from bytes or deserialized flatbuffer
      Module object

Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67351

Reviewed By: iseeyuan

Differential Revision: D32010095

Pulled By: qihqi

fbshipit-source-id: d763b0557780f7c2661b6485105b045e41a5e8f1
2021-12-01 23:58:15 -08:00
40fb28ea87 [JIT] Compute input sym shapes in partial eval graph (#68281)
Summary:
Needed for NNC dynamic shape fusion. Previously, when creating a partially evaluated graph for symbolic shape compute, if the input wasn't used, we wouldn't compute it, which led to failures when NNC expected this value to be passed in.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68281

Reviewed By: navahgar

Differential Revision: D32401365

Pulled By: eellison

fbshipit-source-id: 97a684e5f1faed5df77c8fd69f9623cdba0781f9
2021-12-01 22:33:35 -08:00
d8a44270d6 [DataPipe] Simplify BatcherIterDataPipe by removing 'unbatch_level' argument and functionality (#68594)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68594

Based on my conversation with ejguan [here](https://github.com/pytorch/pytorch/pull/68197#pullrequestreview-809148827), we both believe that having the `unbatch_level` argument and functionality is making this DataPipe unnecessarily complicated, because users can call `.unbatch` before `.batch` if they would like to do so. That will likely be cleaner as well.

I also checked other libraries (for example, [TensorFlow](https://www.tensorflow.org/api_docs/python/tf/data/Dataset#unbatch)), and I do not see them provide the ability to `unbatch` within the `batch` function either.

This PR simplifies the DataPipe by removing the argument.
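
A hedged sketch of the recommended pattern, assuming the functional forms `.unbatch()` and `.batch()` are registered on IterDataPipes:

```
from torch.utils.data.datapipes.iter import IterableWrapper

# Instead of batch(..., unbatch_level=...), flatten first, then re-batch
dp = IterableWrapper([[0, 1], [2, 3], [4, 5]])
rebatched = dp.unbatch().batch(batch_size=3)
# -> two batches: [0, 1, 2] and [3, 4, 5]
```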

cc VitalyFedyunin ejguan NivekT

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D32532594

Pulled By: NivekT

fbshipit-source-id: 7276ce76ba2a3f207c9dfa58803a48e320adefed
2021-12-01 22:00:31 -08:00
ad182479b0 [deploy] docs (#69251)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69251

This adds some actual documentation for deploy, which is probably useful
since we told everyone it was experimentally available, so people will
be looking at what the heck it is.

It also wires up various components of the OSS build to actually work
when used from an external project.

Differential Revision: D32783312

Test Plan: Imported from OSS

Reviewed By: wconstab

Pulled By: suo

fbshipit-source-id: c5c0a1e3f80fa273b5a70c13ba81733cb8d2c8f8
2021-12-01 21:55:18 -08:00
cbe0a38d8c Back out "[CUDA Pinned Memory] Event recording with non-blocking copies should track the storage context, not the tensor data pointer" (#69193)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69193

Reviewed By: xing-liu, yuchenhao

Differential Revision: D32748570

fbshipit-source-id: bd73d7567f94c70daeace49d4081381b8adf2d77
2021-12-01 19:30:08 -08:00
929f2a750a Back out "[CUDA Pinned Memory] Alternative implementation of pinned memory allocator focusing on multi-threaded scalability" (#69191)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69191

Reviewed By: xing-liu, yuchenhao

Differential Revision: D32748466

fbshipit-source-id: 6abd3265e8a20270305da3f8be25114ad4d67fc2
2021-12-01 19:28:57 -08:00
370d0afc1b Strided masked var. (#68738)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68738

Test Plan: Imported from OSS

Reviewed By: davidberard98

Differential Revision: D32767155

Pulled By: cpuhrsch

fbshipit-source-id: a5c095103405fbfc28b9f4fd624bdbbc45e7f715
2021-12-01 19:19:37 -08:00
291e56eda4 [Pytorch Edge] Update Black Box Api with operator versioning (#68678)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68678

Test Plan: I'll update the unit test before landing

Reviewed By: cccclai

Differential Revision: D32573603

fbshipit-source-id: 19271bcbb68b61d24d6943e61a943f4f75fddb5d
2021-12-01 19:13:32 -08:00
b9738e923e [Operator Versioning][Edge] Add old models and unittest (#67726)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67726

1. Check in one model with the old aten::div_tensor op, with unit tests in both C++ and Python. The following two lines are commented out and expected to work once the upgrader is used.
```
_helper(mobile_module_v2, div_tensor_0_3)
_helper(current_mobile_module, torch.div)
```

2. Update the commented code accordingly.

Currently there are 6 upgraders. The following old models with operators are added to cover these 6 upgraders:
```
// Tensor x Tensor

test_versioned_div_tensor_v3

// Tensor x Scalar

test_versioned_div_scalar_float_v3
test_versioned_div_scalar_reciprocal_int_v3
test_versioned_div_scalar_inplace_float_v3

// Scalar x Scalar

test_versioned_div_scalar_scalar_v3

// Tensor x Tensor with out kwarg

test_versioned_div_tensor_out_v3

// Tensor x Tensor inplace

test_versioned_div_tensor_inplace_v3

// Tensor x Scalar inplace

test_versioned_div_scalar_inplace_int_v3

```
Note:
Per model, this PR includes the following tests:
1. Model (with old op) load/run test, in both C++ and Python
2. Model (with old op) + upgrader test, in Python
Other tests considered:
1. per-upgrader bytecode test
2. app-level integration test
ghstack-source-id: 144422418

Test Plan: CI and the added unittest

Reviewed By: iseeyuan

Differential Revision: D32069653

fbshipit-source-id: 96d9567088a1f709bc7795f78beed7a308e71ca9
2021-12-01 18:46:30 -08:00
124bb6a19d RegisterDispatchKey.cpp: remove redundant code (#68983)
Summary:
Remove the line, since line 10 already includes this header file.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68983

Reviewed By: samdow

Differential Revision: D32706952

Pulled By: soulitzer

fbshipit-source-id: 98746e12d8d04d64ee2e0449e4aec5153ac723d5
2021-12-01 18:38:19 -08:00
fced51eaf7 [torch][distributed] Check for file existence before invoking cleanup logic in FileStore destructor (#68603)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68603

FileStore is frequently used from Python, which is garbage-collected, so Python users of FileStore have no control over when the FileStore destructor is invoked. If the directory for the file store is created by external logic that has its own cleanup procedure, that procedure may race with the logic in the FileStore destructor.

The diff adds a check for file access in the destructor before actually invoking the cleanup. In the long term, it makes sense to move the cleanup logic out of the destructor into a separate method.

Test Plan:
CI

Stress tests: `buck test mode/dev-nosan //torchrec/examples/dlrm/tests:test_dlrm_main -- --exact 'torchrec/examples/dlrm/tests:test_dlrm_main - torchrec.examples.dlrm.tests.test_dlrm_main.MainTest: test_main_function' --run-disabled --jobs 18 --stress-runs 20 --record-results`

Reviewed By: colin2328

Differential Revision: D32535470

fbshipit-source-id: 6f421f2e7b0d9ac9c884a1db2f7e5a94fc59fc0e
2021-12-01 16:43:42 -08:00
3c1e2ff9eb fixing layer_norm cuda bug (#69210)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/69208

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69210

Reviewed By: H-Huang

Differential Revision: D32764811

Pulled By: ngimel

fbshipit-source-id: fb4201fe5f2284fcb22e36bc1029eef4a21b09bf
2021-12-01 15:46:47 -08:00
d72d476875 [pyper] add flag to disable clip_ranges_gather fusions (#69198)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69198

Add flag --enable_clip_ranges_gather_fusions to disable clip_ranges+gather_ranges fusions.

This fusion happens in static runtime, and it also happens in JIT when optimize_sparse_nn_model is used.

Note that clip_ranges+gather_ranges+sigrid_hash fusions use different code that was untouched by D30515441 (01b30922dd), so we are not disabling them for now.
This also effectively disables ClipRangesGatherSigridHash(graph) (even though it's not explicitly included), because that fusion looks for the clip_ranges_gather_lengths_to_offsets fusion, which won't exist if this flag is on.

Test Plan:
Run ptvsc2_predictor_bench with --enable_clip_ranges_gather_fusions=0 and SR=1
```
Input size: 211
Static runtime ms per iter: 11.9668. Iters per second: 83.5643
Time per node type:
        6.42796 ms.    54.5663%. static_runtime::fused_variadic_sigrid_transforms_torch_bind (1 nodes, out variant)
        1.64969 ms.    14.0041%. fb::quantized_linear (9 nodes, out variant)
       0.475394 ms.    4.03557%. fb::clip_ranges_gather_sigrid_hash_precompute_v3 (158 nodes, out variant)
       0.367554 ms.    3.12013%. aten::argmin (1 nodes, out variant)
       0.358351 ms.    3.04201%. aten::matmul (1 nodes, out variant)
       0.215082 ms.    1.82581%. static_runtime::to_copy (805 nodes, out variant)
       0.214397 ms.    1.81999%. fb::gather_ranges (313 nodes, out variant)
       0.179759 ms.    1.52595%. fb::offsets_to_ranges (655 nodes, out variant)
       0.173236 ms.    1.47058%. fb::lengths_to_offsets (464 nodes, out variant)
       0.151249 ms.    1.28394%. aten::sub (1 nodes, out variant)
        0.14017 ms.    1.18989%. aten::sigmoid (3 nodes, out variant)
       0.136118 ms.    1.15549%. aten::mul (5 nodes, out variant)
       0.130813 ms.    1.11046%. aten::sum (3 nodes, out variant)
       0.124876 ms.    1.06006%. aten::repeat (1 nodes, out variant)
        0.12191 ms.    1.03488%. static_runtime::signed_log1p (1 nodes, out variant)
      0.0922349 ms.   0.782972%. aten::norm (1 nodes, out variant)
      0.0877845 ms.   0.745193%. aten::pow (1 nodes, out variant)
      0.0783335 ms.   0.664966%. fb::batch_box_cox (1 nodes, out variant)
      0.0755047 ms.   0.640951%. fb::clip_ranges (311 nodes, out variant)
      0.0702456 ms.   0.596308%. static_runtime::layer_norm (2 nodes, out variant)
      0.0696762 ms.   0.591475%. fb::quantize_per_tensor (4 nodes)
      0.0556873 ms.   0.472724%. quantized::embedding_bag_byte_prepack (3 nodes, out variant)
      0.0555237 ms.   0.471335%. prim::VarConcat (2 nodes, out variant)
      0.0437336 ms.    0.37125%. static_runtime::dict_unpack (2 nodes, native)
      0.0390592 ms.    0.33157%. static_runtime::dequantize_copy (9 nodes, out variant)
      0.0385823 ms.   0.327521%. fb::concat_add_mul_replacenan_clip (1 nodes, out variant)
      0.0321869 ms.   0.273231%. prim::TupleConstruct (1 nodes, out variant)
      0.0308289 ms.   0.261703%. fb::casted_batch_one_hot_lengths (1 nodes, out variant)
      0.0280272 ms.    0.23792%. static_runtime::reshape_copy (2 nodes, out variant)
      0.0244705 ms.   0.207727%. fb::sigrid_hash_precompute (1 nodes, out variant)
       0.020917 ms.   0.177562%. static_runtime::VarTupleUnpack (1 nodes, native)
      0.0175842 ms.   0.149271%. aten::div (1 nodes, out variant)
      0.0169989 ms.   0.144302%. aten::narrow_copy (4 nodes, out variant)
     0.00818147 ms.  0.0694517%. aten::logit (1 nodes, out variant)
     0.00719822 ms.   0.061105%. prim::VarStack (1 nodes, out variant)
     0.00687292 ms.  0.0583435%. aten::add (1 nodes, out variant)
     0.00328646 ms.  0.0278985%. aten::clamp_min (1 nodes, out variant)
     0.00325073 ms.  0.0275951%. static_runtime::expand_dims_copy (1 nodes, out variant)
     0.00295617 ms.  0.0250946%. static_runtime::flatten_copy (1 nodes, out variant)
     0.00230511 ms.  0.0195679%. aten::expand_as (1 nodes, native)
     0.00182061 ms.   0.015455%. aten::full_like (1 nodes, out variant)
    0.000268152 ms. 0.00227631%. prim::ListConstruct (1 nodes, out variant)
        11.7801 ms. in Total
```

Servicelabs:
AF: https://www.internalfb.com/intern/servicelab/1001770528/
AI: https://www.internalfb.com/intern/servicelab/402342245/
Prospector: https://www.internalfb.com/intern/servicelab/502342630/

Reviewed By: movefast1990

Differential Revision: D32750847

fbshipit-source-id: b809a72a9fbeea86080346962eb17761e71397d8
2021-12-01 15:26:36 -08:00
263125a962 Fix RAdam docstring on LR default value (#69186)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69186

Reviewed By: albanD

Differential Revision: D32759614

Pulled By: H-Huang

fbshipit-source-id: b11819c50156a538cd6003e9cddde0390c853f67
2021-12-01 14:32:07 -08:00
3bf4080fd9 Change misleading MaxUnpool2d example to better demonstrate output_size usage (#68936)
Summary:
At https://github.com/pytorch/pytorch/issues/68873, jbschlosser states that maxunpool2d with the `output_size` argument only works for indices of the same size. This makes sense, but unfortunately it's not what's shown in the example! I've removed the wrong example and replaced it with one where specifying `output_size` is actually necessary -- the unpool call fails without it.
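
A minimal sketch of the corrected usage, mirroring the docs example (odd spatial sizes are where `output_size` is actually needed):

```
import torch
import torch.nn as nn

pool = nn.MaxPool2d(2, stride=2, return_indices=True)
unpool = nn.MaxUnpool2d(2, stride=2)
x = torch.randn(1, 1, 5, 5)            # odd spatial size floors to 2x2
out, indices = pool(x)                  # out: (1, 1, 2, 2)

y = unpool(out, indices)                # default inferred shape: (1, 1, 4, 4)
y_full = unpool(out, indices, output_size=x.size())  # recovers (1, 1, 5, 5)
```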

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68936

Reviewed By: H-Huang

Differential Revision: D32759207

Pulled By: jbschlosser

fbshipit-source-id: 658e1724150a95454a05a771ae7c6e2e736740a7
2021-12-01 14:11:26 -08:00
2eef5e76db add extra_repr for nn.ZeroPad2d (#69206)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/69205

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69206

Reviewed By: H-Huang

Differential Revision: D32759597

Pulled By: jbschlosser

fbshipit-source-id: abc9ee69fb5e22d45a640993a4e598b016020688
2021-12-01 13:53:19 -08:00
cd043c335f Revert D32329330: [JIT] Separate GPU implementation of frozen_conv_add_relu_fusion.cpp
Test Plan: revert-hammer

Differential Revision:
D32329330 (cfc75c2137)

Original commit changeset: c0f10da4b954

fbshipit-source-id: e81f93a5c1e2bb9b20fde6ccaeef143472a5b900
2021-12-01 12:55:10 -08:00
e6c435bf96 [LTC] Upstream helpers for c10::Device <=> BackendDevice (#69064)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69064

This commit upstreams helpers for converting a c10::Device to
BackendDevice and vice versa.

Test Plan: ./build/bin/test_lazy --gtest_filter=BackendDeviceTest.FromAten:BackendDeviceTest.ToAten

Reviewed By: wconstab

Differential Revision: D32732607

Pulled By: alanwaketan

fbshipit-source-id: 0dd233d37a4a30fc4b22dba322ddd85d4cb3635b
2021-12-01 12:15:32 -08:00
92f168941e remove accidentally committed redundant debug print (#68510)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68510

remove accidentally committed redundant debug print
ghstack-source-id: 144362817

Test Plan: unit tests

Reviewed By: rohan-varma

Differential Revision: D32487736

fbshipit-source-id: 279030f782e6b716a6bbfd591c5ce761de3ddd63
2021-12-01 11:35:34 -08:00
1842364b30 Strided masked normalize. (#68694)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68694

Test Plan: Imported from OSS

Reviewed By: samdow

Differential Revision: D32724552

Pulled By: cpuhrsch

fbshipit-source-id: 82f579a86b0b265e0b9b3715a8a327b775dd55e1
2021-12-01 10:45:16 -08:00
23633bdb5c record the datapipe for each piece of Dataset (#67613)
Summary:
Add record_function for each DataPipe.
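
A hedged sketch of the pattern this applies inside each DataPipe (the pipe name here is illustrative): wrapping the work in record_function makes each pipe show up as a named range in profiler traces.

```
import torch
from torch.profiler import profile

with profile() as prof:
    with torch.autograd.profiler.record_function("MapperIterDataPipe"):
        processed = [x * 2 for x in range(10)]

print(prof.key_averages().table(sort_by="cpu_time_total"))
```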

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67613

Reviewed By: H-Huang

Differential Revision: D32246672

Pulled By: ejguan

fbshipit-source-id: 02ef7e75748c5b84fdcbb103398532e1f2962fbf
2021-12-01 10:29:06 -08:00
deaf745aee Add kl divergence between normal and laplace distribution. (#68807)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/68746
![KL_normal_laplace](https://user-images.githubusercontent.com/35850237/143008244-f304cee1-9583-4de1-b0d0-5751ebdb8188.png)
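
A minimal usage sketch, assuming the KL registration added by this PR (Normal as p, Laplace as q):

```
import torch
from torch.distributions import Laplace, Normal, kl_divergence

p = Normal(torch.tensor(0.0), torch.tensor(1.0))
q = Laplace(torch.tensor(0.0), torch.tensor(1.0))
print(kl_divergence(p, q))  # closed-form KL(p || q)
```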

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68807

Reviewed By: H-Huang

Differential Revision: D32750391

Pulled By: neerajprad

fbshipit-source-id: 129e6ef60d6e244d0d6b02b3944bfd5d8b06edcb
2021-12-01 10:22:08 -08:00
486ae5c733 Dataset & IterableDataset attribute errors prints attribute (#69021)
Summary:
The message matches that of a standard AttributeError; including the attribute name makes the error more informative when it is thrown.
Alternatively, in Python 3.10 one can set the keyword arguments 'name' and 'obj';
reference: https://github.com/python/cpython/blob/3.10/Doc/library/exceptions.rst#concrete-exceptions
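
A minimal sketch of the improved error (the class name is illustrative):

```
class MyDataset:
    def __getattr__(self, attribute_name):
        # Mirror the wording of a standard AttributeError, but make
        # sure the missing attribute's name appears in the message
        raise AttributeError(
            f"'{type(self).__name__}' object has no attribute "
            f"'{attribute_name}'"
        )

try:
    MyDataset().foo
except AttributeError as e:
    print(e)  # 'MyDataset' object has no attribute 'foo'
```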

Fixes #{?}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69021

Reviewed By: samdow

Differential Revision: D32730362

Pulled By: ejguan

fbshipit-source-id: 7132ba612fa6075aeffb9315ce651828e9a8e0bc
2021-12-01 10:16:31 -08:00
d507fd63f3 Check that block height and width are positive in nn.Fold (#69048)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/68875

cc albanD mruberry jbschlosser walterddr

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69048

Reviewed By: samdow

Differential Revision: D32729307

Pulled By: jbschlosser

fbshipit-source-id: 162cafb005873012d900d86997d07640967038c0
2021-12-01 10:08:47 -08:00
c08e95dd9c Introduce IS_LINUX and IS_MACOS global vars (#69093)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69093

Test Plan: Imported from OSS

Reviewed By: samdow

Differential Revision: D32730080

Pulled By: malfet

fbshipit-source-id: aa3f218d09814b4edd96b01c7b57b85fd58c47fc
2021-12-01 09:47:38 -08:00
840fe8e4e6 Fix MacOS artifact upload (#69188)
Summary:
Add the test shard number and runner name to the test name suffix.
Otherwise, test report names for shard 1 and shard 2 will be identical
and overwrite each other.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69188

Reviewed By: janeyx99

Differential Revision: D32747747

Pulled By: malfet

fbshipit-source-id: 149f921d8e420d3ed69ce812bdcd3c034799353a
2021-12-01 08:06:48 -08:00
f9e69af22e Modify LU_backward and lu_solve_backward to use linalg_solve_triangular (#63569)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63569

This PR also rewrites `lu_solve_backward` from scratch going from
solving 5 systems of equations to just 2.

cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D32618014

Pulled By: anjali411

fbshipit-source-id: 0e915bcf7045a4db43ffd076d807beac816c8538
2021-12-01 07:34:38 -08:00
478069d6f2 Remove duplicate .DS_Store in gitignore (#68981)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68981

Reviewed By: samdow

Differential Revision: D32707039

Pulled By: soulitzer

fbshipit-source-id: 346f0f3de583d995be34c252db4f9f26cd574ba8
2021-12-01 07:28:33 -08:00
e5e0c19882 OpInfo : embedding_bag (#67252)
Summary:
Adds OpInfo for `embedding_bag`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67252

Reviewed By: VitalyFedyunin

Differential Revision: D32462157

Pulled By: zou3519

fbshipit-source-id: 70303349a718720c4fa47519fa94ae900e052939
2021-12-01 07:00:50 -08:00
1da1707568 Sparse: Implement simple unary ufuncs operators (#68887)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68887

Closes #46988, closes #46987, closes #46761

By "simple" I mean operators that map 0->0 so we can implement it by
just re-dispatching on the values tensor. That does mean we have `sin`
but not `cos` for example, but without fill value support this is the
best that can be done.
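
A minimal sketch of the behavior this enables (zero-preserving unary ops applied directly to a sparse tensor):

```
import torch

dense = torch.tensor([0.0, 0.5, 0.0, -0.5])
sparse = dense.to_sparse()
result = torch.sin(sparse)   # re-dispatches on the values tensor
print(result.is_sparse)      # True
print(result.to_dense())     # zeros stay zero; sparsity pattern kept
```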

Most of these don't support autograd because the derivative formulas
use unsupported operators.

cc nikitaved pearu cpuhrsch IvanYashchuk

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D32734911

Pulled By: cpuhrsch

fbshipit-source-id: 203ab105799f3d2d682b01ca3d6b18e7c994776a
2021-12-01 05:43:19 -08:00
afff381824 Automated submodule update: tensorpipe (#69089)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/tensorpipe](https://github.com/pytorch/tensorpipe).

New submodule commit: ed4bbe52b7

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69089

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: lw

Differential Revision: D32725534

fbshipit-source-id: 73b1e0f67c957ca0220cd47179dd4b350a98fd33
2021-12-01 02:29:18 -08:00
a23d1036ab Add ops for BI (mean) (#68826)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68826

Test Plan: Imported from OSS

Reviewed By: samdow

Differential Revision: D32732465

Pulled By: eellison

fbshipit-source-id: e8b185d89e5ecbe5c8e09d576c84a1f0a402a5e0
2021-12-01 00:45:00 -08:00
19b87292fc Add TE fuser ops (#68825)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68825

Factoring out the elementwise ops in the tensorexpr fuser and adding their corresponding shape functions, since we need shape functions to fuse them with dynamic shapes.

Test Plan: Imported from OSS

Reviewed By: samdow

Differential Revision: D32732466

Pulled By: eellison

fbshipit-source-id: 69cacf6fbed8eb97e475f5d55b2eec0384fe8ec1
2021-12-01 00:43:42 -08:00
7fad758e02 [FSDP] AutoWrap Main API (#68155)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68155

Per title
ghstack-source-id: 144398229

Test Plan: CI

Reviewed By: pbelevich, mrshenli

Differential Revision: D32327954

fbshipit-source-id: 36bdf06c1c50932a93acbfa97017c549fa490a6c
2021-12-01 00:16:38 -08:00
999e52a795 [FileStore] log timeout in err msg (#69167)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69167

Per title
ghstack-source-id: 144378083

Test Plan: CI

Reviewed By: H-Huang

Differential Revision: D32736119

fbshipit-source-id: f37fd3e4ac393c07eb8bd1f9202841d33d0a8aad
2021-11-30 23:29:09 -08:00
845a82b635 Debug positive definite constraints (#68720)
Summary:
While implementing https://github.com/pytorch/pytorch/issues/68644,
during the testing of 'torch.distributions.constraint.positive_definite', I found an error in the code: [location](c7ecf1498d/torch/distributions/constraints.py (L465-L468))
```
class _PositiveDefinite(Constraint):
    """
    Constrain to positive-definite matrices.
    """
    event_dim = 2

    def check(self, value):
        # Assumes that the matrix or batch of matrices in value are symmetric
        # info == 0 means no error, that is, it's SPD
        return torch.linalg.cholesky_ex(value).info.eq(0).unsqueeze(0)
```

The error occurs when I check the positive definiteness of
`torch.cuda.DoubleTensor([[2., 0], [2., 2]])`,
but it does not occur for
`torch.DoubleTensor([[2., 0], [2., 2]])`.

You can easily reproduce the error with the following code:

```
Python 3.9.7 (default, Sep 16 2021, 13:09:58)
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> const = torch.distributions.constraints.positive_definite
>>> const.check(torch.cuda.DoubleTensor([[2., 0], [2., 2]]))
tensor([False], device='cuda:0')
>>> const.check(torch.DoubleTensor([[2., 0], [2., 2]]))
tensor([True])
```
The cause of the error can be analyzed further by passing 'check_errors=True' as an additional argument to 'torch.linalg.cholesky_ex'.
It seems to be caused by recent changes in 'torch.linalg'.
I suggest modifying the '_PositiveDefinite' class to use the 'torch.linalg.eig' function, as below:

```
class _PositiveDefinite(Constraint):
    """
    Constrain to positive-definite matrices.
    """
    event_dim = 2

    def check(self, value):
        return (torch.linalg.eig(value)[0].real > 0).all(dim=-1)
```

By using the above implementation, I get the following result:
```
Python 3.9.7 (default, Sep 16 2021, 13:09:58)
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> const = torch.distributions.constraints.positive_definite
>>> const.check(torch.cuda.DoubleTensor([[2., 0.], [2., 2.]]))
tensor(True, device='cuda:0')
>>> const.check(torch.DoubleTensor([[2., 0.], [2., 2.]]))
tensor(True)
```

FYI, I do not know what algorithms are used in 'torch.linalg.eig' and 'torch.linalg.cholesky_ex'. As far as I know, they generally have the same time complexity, O(n^3). With special algorithms or finer parallelization, the time complexity of Cholesky decomposition may be reduced to approximately O(n^2.5). If there is a reason 'torch.distributions.constraints.positive_definite' previously used 'torch.linalg.cholesky_ex' rather than 'torch.linalg.eig', I would like to know.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68720

Reviewed By: samdow

Differential Revision: D32724391

Pulled By: neerajprad

fbshipit-source-id: 32e2a04b2d5b5ddf57a3de50f995131d279ede49
2021-11-30 22:27:27 -08:00
8586f374bc [Pytorch Edge] Get Operator Version from model file (#68677)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68677

Used in compatibility APIs. Luckily the stream reader essentially does this already, so we mostly just create a wrapper in our compatibility files

Test Plan: ci

Reviewed By: cccclai

Differential Revision: D32573132

fbshipit-source-id: 86331c03a1eebcd86ed29b9c6cd8a8fd4fe79949
2021-11-30 21:10:21 -08:00
219db3b4e1 Add OpInfo for torch.linalg.tensorsolve (#68810)
Summary:
This PR adds an OpInfo entry for the tensorsolve function.
The keyword argument differs from NumPy's, so a lambda function needs to be passed to `ref=`.
I had to change the dtypes for `test_reference_testing` because NumPy internally computes in double precision for all linear algebra functions (and possibly some other functions). Using `torch.float64` and `torch.complex128` is more reliable for NumPy comparisons.
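
For example, a sketch of the kind of wrapper used (assumed form, not the exact OpInfo entry):

```
import numpy as np

# torch.linalg.tensorsolve calls the keyword argument `dims`, while NumPy's
# tensorsolve calls it `axes`, so the reference renames it.
ref = lambda a, b, dims=None: np.linalg.tensorsolve(a, b, axes=dims)
```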

cc mruberry

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68810

Reviewed By: soulitzer

Differential Revision: D32696065

Pulled By: mruberry

fbshipit-source-id: a4305065d3e7d0097503dc05938b3c4784e14996
2021-11-30 20:31:12 -08:00
b05237f5e4 [Pytorch Edge] Add bool to copy kernel (#69106)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69106

this kernel sucks.

Test Plan: ci

Reviewed By: shoumikhin, cccclai

Differential Revision: D32729888

fbshipit-source-id: c747d4bf3d5233c8ed15dba5e2c2d244ba7d4b3f
2021-11-30 19:45:42 -08:00
e534c5efd7 CMake: Include instead of copying cpu kernel files (#67656)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67656

Currently, each cpu kernel file is copied into the build folder 3 times to give them different compilation flags. This changes it to instead generate 3 files that `#include` the original file. The biggest difference is that updating a copied file requires `cmake` to re-run, whereas include dependencies are natively handled by `ninja`.

A side benefit is that included files show up directly in the build dependency graph, whereas `cmake` file copies don't.

Test Plan: Imported from OSS

Reviewed By: dagitses

Differential Revision: D32566108

Pulled By: malfet

fbshipit-source-id: ae75368fede37e7ca03be6ade3d4e4a63479440d
2021-11-30 19:13:53 -08:00
f6f1b580f8 Fix mypy in cpp_extension.py (#69101)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69101

Test Plan: Imported from OSS

Reviewed By: atalman, janeyx99

Differential Revision: D32730081

Pulled By: malfet

fbshipit-source-id: 76ace65b51850b74b175a3c4688c05e107873e8d
2021-11-30 16:01:55 -08:00
6953b7e269 [BE] Fix mypy local run on MacOS (#69097)
Summary:
Unversioned python invocations should not be used, as they can be aliased to Python 2.
Also invoke mypy as `python3 -mmypy`, as binary aliases are not always available for user installations.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69097

Reviewed By: janeyx99

Differential Revision: D32729367

Pulled By: malfet

fbshipit-source-id: 7539bd0af15f97eecddfb142dba7de7f3587083d
2021-11-30 15:52:23 -08:00
aa2163eba5 .github: Add linux.large instance type (#69165)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69165

We're hitting hard concurrency limits for built-in GitHub runners, so
let's use our own runners and make them non-ephemeral so they'll have
basically constant uptime

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: atalman

Differential Revision: D32735494

Pulled By: seemethere

fbshipit-source-id: c042c6f0fb23fd50acef312d96b0c89d02c93270
2021-11-30 14:45:51 -08:00
e60fd10659 [fbgemm] remove assumption number of rows is in 32 bit (#69066)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69066

Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/781

Also remove the unnecessary looping inside parallel_for, which was redundant given that fbgemm routines support batching multiple rows

Test Plan: CI

Reviewed By: dskhudia, jianyuh

Differential Revision: D32715453

fbshipit-source-id: 33c3e72f51c8ff5d02dafab4a8947d1230c2d551
2021-11-30 13:38:53 -08:00
ef7ed082ec [PyTorch] Remove StringView from RecordFunction implementation [2/2] (#68411)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68411

Avoids heap-allocating a std::string instance in before() each time even if it's not going to be used.
ghstack-source-id: 144287655

Test Plan:
Run //caffe2/caffe2/fb/high_perf_models/pytorch/benchmark_framework_overheads:cpp_benchmark before/after this diff with arguments --stressTestRecordFunction --op empty

Before: P467922606
After: P467922626

Reviewed By: chaekit

Differential Revision: D32453846

fbshipit-source-id: 18e1b482dbf5217add14cbaacd447de47cb5877b
2021-11-30 13:22:27 -08:00
1d84d8c5d8 [PyTorch] Remove StringView from RecordFunction interface (1/2) (#68410)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68410

First step toward not heap-allocating a string in RecordFunction::before() every time
ghstack-source-id: 144287654

Test Plan: CI

Reviewed By: chaekit

Differential Revision: D32453847

fbshipit-source-id: 080d95095fb568287b65fcc41a4ca6929b5f9a87
2021-11-30 13:20:08 -08:00
22690c2cb6 Use cub::FutureValue to simplify 64bit indexing split of cub scan (#66711)
Summary:
https://github.com/NVIDIA/cub/pull/305 has landed in cub 1.15, so this is ready to review and land. This PR contains https://github.com/pytorch/pytorch/pull/66219; please land that PR first before reviewing this one.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66711

Reviewed By: soulitzer

Differential Revision: D32698306

Pulled By: ngimel

fbshipit-source-id: 4cc6b9b24cefd8932f4d421c6d64ea20ea911f52
2021-11-30 13:15:36 -08:00
c48e6f014a [vulkan] Update VMA settings to reduce memory usage (#69088)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69088

It was found that the Vulkan backend was consuming a huge amount (~287 MB) of graphics memory when executing a lightweight segmentation model. In fact, the Vulkan backend tends to consume a huge amount of memory in general.

It was found that the reason for this is due to how the backend uses [VMA](https://gpuopen-librariesandsdks.github.io/VulkanMemoryAllocator/html/). When allocating memory, VMA will first allocate a large block of memory, then subdivide that block to use for individual textures and buffers. The pattern is used because Vulkan has a limit on the number of `vkDeviceMemory` allocations that can be active at one time.

It turns out that the Vulkan backend was using custom memory pools with a block size of 64 MiB, meaning that at least 64 MiB of memory would be used at all times. Furthermore, usage of the [linear allocation algorithm](https://gpuopen-librariesandsdks.github.io/VulkanMemoryAllocator/html/custom_memory_pools.html#linear_algorithm) resulted in minimal reuse of memory, leading to the creation of many more blocks than were actually required and a huge amount of unused memory.

By avoiding the use of custom memory pools and instead simply using the default memory pool provided by VMA, the library seems to have a much easier time minimizing the amount of unused memory. This change reduces memory usage down to 20 MB when running the aforementioned segmentation model.

This diff also reduces the preferred block size to 32 MiB and removes the use of the linear allocation algorithm in case custom memory pools are needed in the future.

Test Plan:
Build and run vulkan_api_test:

```
cd ~/pytorch
BUILD_CUSTOM_PROTOBUF=OFF \
  BUILD_TEST=ON \
  USE_EIGEN_FOR_BLAS=OFF \
  USE_FBGEMM=OFF \
  USE_MKLDNN=OFF \
  USE_NNPACK=OFF \
  USE_NUMPY=OFF \
  USE_OBSERVERS=OFF \
  USE_PYTORCH_QNNPACK=OFF \
  USE_QNNPACK=OFF \
  USE_VULKAN=ON \
  USE_VULKAN_API=ON \
  USE_VULKAN_SHADERC_RUNTIME=ON \
  USE_VULKAN_WRAPPER=OFF \
  MACOSX_DEPLOYMENT_TARGET=10.9 CC=clang CXX=clang++ python3 setup.py develop --cmake && ./build/bin/vulkan_api_test
```

Reviewed By: beback4u

Differential Revision: D32653767

fbshipit-source-id: b063a8ea76d34b57d0e2e6972ca5f6f73f2fd7e5
2021-11-30 12:45:41 -08:00
fcd1375b2b [DDP][BE][Docs] Clarify checkpoint support (#68827)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68827

Add a note about current checkpoint support with DDP. Note that this
does not yet include the features enabled with _set_static_graph, as it is an
undocumented private API. Once we support static graph as a beta feature in OSS,
we can add it to the note here.
ghstack-source-id: 144285041

Test Plan: CI

Reviewed By: pritamdamania87

Differential Revision: D32624957

fbshipit-source-id: e21d156a1c4744b6e2a807b5b5289ed26701886f
2021-11-30 12:37:37 -08:00
994f110a6f Refactor DDP checkpoint tests (#68792)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68792

Refactor tests to make it clearer which features are supported and
unsupported under certain DDP configs.
ghstack-source-id: 144285040

Test Plan: CI

Reviewed By: pbelevich

Differential Revision: D32609498

fbshipit-source-id: 5231242054d4ff6cd8e7acc4a50b096771ef23d1
2021-11-30 12:36:14 -08:00
49abda208b [JIT] internal build bug fix (#69061)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69061

`warning` breaks this build [D32622152](https://www.internalfb.com/diff/D32622152)

Test Plan: Imported from OSS

Differential Revision: D32712448

Pulled By: makslevental

fbshipit-source-id: c7a70487bd0b95ac8b242522c36597d36072201f
2021-11-30 12:32:11 -08:00
5e0302e1d0 [quant][embedding qat] Set FakeQuant zeropoint dtype matches observer (#68390)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68390

Observer zero_point's dtype can be float, in the specific case of `torch.per_channel_affine_float_qparams`.
This change sets FakeQuant's zero_point dtype accordingly.

Test Plan:
`pytest test/quantization/core/test_workflow_module.py  -v -k "embedding"`
`pytest test/quantization/eager/test_quantize_eager_qat.py  -v -k "embedding"`

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D32446405

fbshipit-source-id: cca7aade68ff171887eeeae42801f77d934dad4c
2021-11-30 12:21:14 -08:00
8f9f559453 amend tensors.rst and torch.rst for doc generation (#69030)
Summary:
(This is my first contribution to PyTorch.) Added missing operations to the docs added in https://github.com/pytorch/pytorch/issues/64430. Please let me know if I've done anything wrong.

Fixes https://github.com/pytorch/pytorch/issues/68928

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69030

Reviewed By: samdow

Differential Revision: D32706826

Pulled By: soulitzer

fbshipit-source-id: edcc175a8f9bc69450a39059580c05edce699312
2021-11-30 12:04:13 -08:00
0aa9d177fe [fx] remove CPatcher (#69032)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69032

I am removing it because, for packaging-related reasons, it's easier if
torch.fx is a pure Python module.

I don't think there is much reason to keep it: this functionality was
experimental, has no known users currently, and we didn't have a clear
path to turning it on by default due to regressions in tracing
performance. Also, it was only ever enabled for `rand` and friends.

Technically the removal of the `enable_cpatching` arguments on
`symbolic_trace` and `Tracer.__init__` are BC-breaking, but the
docstrings clearly state that the argument is experimental and BC is not
guaranteed, so I think it's fine.

Test Plan: Imported from OSS

Reviewed By: soulitzer

Differential Revision: D32706344

Pulled By: suo

fbshipit-source-id: 501648b5c3610ae71829b5e7db74e3b8c9e1a480
2021-11-30 11:59:57 -08:00
81246ed01c Markdown was linking to repo rather than pytorch.org website (#68937)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68937

Reviewed By: samdow

Differential Revision: D32707264

Pulled By: soulitzer

fbshipit-source-id: c534f008087def33784dde701130769e2058aa9f
2021-11-30 11:51:24 -08:00
251686fc4c Revert D32706197: Sparse: Implement simple unary ufuncs operators
Test Plan: revert-hammer

Differential Revision:
D32706197 (fbaa19a6fa)

Original commit changeset: 65e1acb36457

fbshipit-source-id: 45c4b486f9eee200d5a1f6d46d267617124f8a5e
2021-11-30 10:50:12 -08:00
8fef7c09f5 Remove finput from slow2d signatures (#68896)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68896

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D32655874

Pulled By: jbschlosser

fbshipit-source-id: 3c9acb106961c40af1432652179edb2bc5a4bfa5
2021-11-30 09:47:24 -08:00
cd3e37cbe4 [Static Runtime] [Code Cleanup] Reduce indentation depth in ops.cpp (#69028)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69028

This change converts

```
if (...) {
 ...
} else {
 ...
}
// end of function
```

into

```
if (...) {
  ...
  return;
}
...
```
in ops.cpp, removing the else branch to reduce the indentation depth by 1 for better readability.

Test Plan: N/A

Reviewed By: hlu1

Differential Revision: D32506235

fbshipit-source-id: a4fd5188bd680dba5dcad2b6e873735a54497664
2021-11-30 09:41:46 -08:00
cfc75c2137 [JIT] Separate GPU implementation of frozen_conv_add_relu_fusion.cpp (#68149)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68149

JIT optimization passes are part of the CPU-only build (i.e. necessary GPU flags are not passed in). This separates the implementation of frozen_conv_add_relu_fusion so that the GPU-enabled implementation is registered at runtime (if it is available)
ghstack-source-id: 143676384

Test Plan:
In the following script, conv_add_relu fusion is not observed without this change, but is observed when this change is added.
```
from typing import List, Optional

import torch

class Model(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.weight = torch.nn.Parameter(torch.rand((3, 3, 7, 7), device="cuda"))
        self.add_tensor = torch.nn.Parameter(torch.rand((3, 3, 7, 7), device="cuda"))

    def forward(
        self,
        inp: torch.Tensor,
        bias: Optional[torch.Tensor],
        stride: List[int],
        padding: List[int],
        dilation: List[int],
        groups: int,
    ):
        # weight = torch.zeros((3, 3, 7, 7), device="cuda")
        inp = inp.to("cuda")
        conv_result = torch.conv2d(
            inp, self.weight, bias, stride, padding, dilation, groups
        )
        add_result = conv_result.add_(self.add_tensor)
        return add_result.relu_()

    @torch.jit.export
    def make_prediction(self, inp: torch.Tensor):
        bias = None
        groups = 1
        stride = (1, 1)
        padding = (0, 0)
        dilation = (1, 1)

        return self.forward(inp, bias, stride, padding, dilation, groups)

if __name__ == "__main__":
    # generate some sample input
    groups = 1
    channels_in = 3
    channels_out = 3
    kernel_size = (7, 7)
    stride = (1, 1)
    padding = (0, 0)
    dilation = (1, 1)
    inp = torch.rand((64, 3, 432, 432))
    weight = torch.rand(
        (channels_out, channels_in, kernel_size[0], kernel_size[1]), device="cuda"
    )
    bias = None

    model = Model()
    model.eval()
    script = torch.jit.script(model)
    script = torch.jit.freeze(script)
    script = torch.jit.optimize_for_inference(script)

    print("~~~~ FORWARD ~~~~")
    print(script.graph)

    print("with preserved_attrs")
    print(torch.sum(script.forward(inp, bias, stride, padding, dilation, groups)))
```

Reviewed By: cpuhrsch

Differential Revision: D32329330

fbshipit-source-id: c0f10da4b9540c588819efe3ec540baa0fae4b35
2021-11-30 09:31:57 -08:00
7342b654a1 [static runtime] dequantize out variant (#68664)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68664

Reland D32187063 (f120335643), fixing lint
Add out variant for aten::dequantize

Test Plan:
Test on inline_cvr model
```
MKL_NUM_THREADS=1 OMP_NUM_THREADS=1 numactl -m 0 -C 3 ./buck-out/opt/gen/caffe2/caffe2/fb/predictor/ptvsc2_predictor_bench --scripted_model=/data/users/ansha/tmp/adfinder/294738512/294738512_0.predictor.disagg.local --recordio_inputs=/data/users/ansha/tmp/adfinder/294738512/294738512_0_local.inputs.recordio --pt_enable_static_runtime=1 --compare_results=1 --iters=5 --warmup_iters=5 --num_threads=1 --do_profile=1 --method_name=local.forward --set_compatibility --do_benchmark=1 --recordio_use_ivalue_format=1
```

Before:
0.047472 ms.   0.409729%. aten::dequantize (9 nodes)

After
0.0307179 ms.   0.267204%. static_runtime::dequantize_copy (9 nodes, out variant)

Test on ctr_mbl_feed model 307210374 on 696 inputs

Before:
0.0569016 ms.   0.296647%. aten::dequantize (10 nodes)

After:
0.0423128 ms.   0.220481%. static_runtime::dequantize_copy (10 nodes, out variant)

Reviewed By: mikeiovine

Differential Revision: D32566429

fbshipit-source-id: b95dfc4c5e4115e083794093bc1571c7b1d72f5b
2021-11-30 09:03:26 -08:00
d3de3546d9 Revert D32099294: Split cuda: list cpp files that go in _cu library explicitly
Test Plan: revert-hammer

Differential Revision:
D32099294 (b47ae9810c)

Original commit changeset: 8a3582944b6b

fbshipit-source-id: eab63e6ba3db3e17f404292a3659823607627576
2021-11-30 08:42:19 -08:00
6fea7499c2 CompositeImplicitAutograd compliance testing (#65819)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65819

Related to #61669.

Functions registered as CompositeImplicitAutograd MUST work for most, if
not all, backends. This includes Tensor subclasses.

To achieve this, we (PyTorch) impose a set of constraints on how a
CompositeImplicitAutograd function can be written.

Concretely, this PR adds tests for all OpInfos that checks for
compliance. The things that get tested in this PR apply to composite
ops and are that:
- the op does not change the metadata of a Tensor without performing
dispatches
- the op does not call set_ or resize_
- the op does not directly access the data ptr

The mechanism for the test is to create a new __torch_dispatch__
object, CompositeCompliantTensor. For each operator, we wrap all inputs
in CompositeCompliantTensor, turn on python mode for it,
and send it through the operator.

Non-CompositeImplicitAutograd operators will pass the test because they
perform a dispatch to backend code. Here's how CompositeCompliantTensor
catches problems:

- If it sees set_ or resize_ getting called, it will directly error
out
- After each operation, CompositeCompliantTensor checks to make sure
that its metadata is consistent with that of the thing it is wrapping.
If the CompositeImplicitAutograd op modifies the metadata directly
(through e.g. the TensorImpl API) then the metadata will go out of sync.
- If data_ptr gets called, that returns a nice error (because the
storage is meta).

CompositeCompliantTensor is written in an interesting way. First off,
if a view operation occurs (e.g. `B = A.view_op(...)`), then B.storage()
must alias A.storage() where B.storage() is CompositeCompliantTensor's
storage, NOT the storage of the tensor it is wrapping. This is an
invariant in autograd, see #62182 for details. To handle
this we replay the view on A's storage and set it as B's storage.

Secondly, there are cases where the metadata is allowed to go out of
sync. I believe this is only possible with in-place view functions, like
transpose_, t_, squeeze_, unsqueeze_. Those are special cased.

Finally, I added a new section to aten/src/ATen/native/README.md about
what it means to be CompositeImplicitAutograd Compliant

Test Plan: - run tests

Reviewed By: ezyang, bdhirsh

Differential Revision: D31268369

Pulled By: zou3519

fbshipit-source-id: 31634b1cbe1778ab30196013cfc376ef9bd2e8b1
2021-11-30 07:35:22 -08:00
b83e8d560b [LT] Sync LTC branch changes on torch/csrc/lazy/core (#69012)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69012

Some changes to torch/csrc/lazy/core were done on the
lazy_tensor_staging branch (https://github.com/pytorch/pytorch/pull/68427).
Merge those back into the trunk.

Test Plan: Imported from OSS

Reviewed By: wconstab

Differential Revision: D32708696

Pulled By: desertfire

fbshipit-source-id: e54b978f2bdb9c7db27880f60246fdf1e8b41019
2021-11-30 07:09:15 -08:00
39ab417107 [Static Runtime] Fix fb::expand_dims schema (#68636)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68636

Same old alias problem

Reviewed By: mikeiovine

Differential Revision: D32556204

fbshipit-source-id: 4d380f0110ad1be83f705e6d6910a6aaf818ec08
2021-11-30 06:28:29 -08:00
5b37ac54cb dbr quant overhead [14/x]: cache whether an op is a module (#68877)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68877

Saves whether an op type is a module during tracing, so we
can avoid recalculating this when validating the op during inference.
This leads to a small speedup.

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

```
// MobileNetV2, 1x3x224x224, function level profiling

// before
validate_cur_op - 1.77%

// after
validate_cur_op - 1.41%

```

Reviewed By: jerryzh168

Differential Revision: D32646149

Pulled By: vkuzo

fbshipit-source-id: 03ebc4fedceb84bb885939dff8dec81d30ba6892
2021-11-30 06:13:06 -08:00
b47ae9810c Split cuda: list cpp files that go in _cu library explicitly (#67216)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67216

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D32099294

Pulled By: dagitses

fbshipit-source-id: 8a3582944b6b48af1ac31c5df09a7e6e838892c4
2021-11-30 04:24:55 -08:00
174eea8a05 Remove native_functions.yaml dependency from IndexKernel.{cpp,cu} (#66914)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66914

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D31856105

Pulled By: dagitses

fbshipit-source-id: 8729783b68879b509ae6b66ce145de0af68aad8c
2021-11-30 04:24:52 -08:00
f7d598948a Remove native_functions.yaml dependency from TensorModeKernel.cu (#66913)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66913

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D31856102

Pulled By: dagitses

fbshipit-source-id: 8888a1984adef09104a40ae683d091143cd1f4fa
2021-11-30 04:22:09 -08:00
ec1339a48b [CUDA Pinned Memory] Alternative implementation of pinned memory allocator focusing on multi-threaded scalability (#68906)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68906

The existing PyTorch pinned memory allocator has been a challenge for scalability in multi-GPU inference workloads. The existing allocator is mostly designed in the context of training, where in the process-per-GPU setup we have natural sharding of the global locks and lower allocation rates (perhaps O(100 allocs/sec) per process). In this setup we might have globally on the order of O(200k allocs/sec) - e.g. 20k QPS and 10 allocs/query. This is a different domain.

In the existing allocator, we observe tail latencies of cudaEventCreate and cudaEventDestroy (while holding the lock) can also completely stall all allocations, which is undesirable.

The idea here is to retain a similar design to the existing PyTorch allocator - eager collection of used memory, no lock-free or deferred tricks, identical semantics around events, but to:

a) split up the locks around the various critical datastructures, and
b) do as little work as possible while holding any process-global mutexes (importantly, no CUDA runtime API calls)
c) pool CUDA events manually (as cuda event creation is a bottleneck at high rates from multiple threads).

This does require a bit of care, but I believe it's correct. In general the threading and state transitions are fairly simple.

With these improvements, microbenchmarks show significant improvements (1.5x-3x). Importantly, real workloads also show significant improvements, especially WRT tail latency and stalls.

Test Plan:
Unit tests all pass.

With a synthetic benchmark such as:

```
static void BM_copies_baseline(benchmark::State& state) {
  auto N = state.range(0);
  auto scale = state.range(1);
  auto object_size_min = N;
  auto object_size_max = scale * N;

  auto device = at::Device(at::kCUDA, at::cuda::current_device());

  uint64_t bytes_copied = 0;
  uint64_t allocs = 0;
  auto stream = at::cuda::getCurrentCUDAStream();
  for (auto _ : state) {
    auto object_size = static_cast<int64_t>(expf(folly::Random::randDouble(
        logf(object_size_min), logf(object_size_max))));
    auto tensor = at::empty(
        {object_size},
        at::TensorOptions().dtype(at::kByte).pinned_memory(true));
    at::cuda::CachingHostAllocator_recordEvent(
        tensor.storage().data_ptr().get_context(), stream);
    bytes_copied += object_size;
    allocs += 1;
  }
  state.counters["BW"] =
      benchmark::Counter(bytes_copied, benchmark::Counter::kIsRate);
  state.counters["Allocs"] =
      benchmark::Counter(allocs, benchmark::Counter::kIsRate);
}

BENCHMARK(BM_copies_baseline)->Args({1000000, 20})->Threads(1)->UseRealTime();
BENCHMARK(BM_copies_baseline)->Args({1000000, 20})->Threads(4)->UseRealTime();
BENCHMARK(BM_copies_baseline)->Args({1000000, 20})->Threads(16)->UseRealTime();
BENCHMARK(BM_copies_baseline)->Args({1000000, 20})->Threads(64)->UseRealTime();
BENCHMARK(BM_copies_baseline)->Args({1000000, 20})->Threads(128)->UseRealTime();
BENCHMARK(BM_copies_baseline)->Args({1000000, 20})->Threads(256)->UseRealTime();
```

I observe roughly 1.5-3x improvements.

End to end application testing also sees significant improvements in the contended scenario.

Reviewed By: jianyuh, ngimel

Differential Revision: D32588784

fbshipit-source-id: ee86c3b7ed4da6412dd3c89362f989f4b5d91736
2021-11-30 02:49:43 -08:00
0cdeb586ae [LTC] Upstream some utilities (#69046)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69046

This commit upstreams utilities including ExceptionCleanup, MaybeRef,
Iota, ToVector, ToOptionalVector and GetEnumValue.

Test Plan: ./build/bin/test_lazy --gtest_filter=UtilTest.*

Reviewed By: wconstab, Chillee

Differential Revision: D32709090

Pulled By: alanwaketan

fbshipit-source-id: 5147433becd4dbb07be7d36d66b0b8685054d714
2021-11-30 02:44:02 -08:00
fbaa19a6fa Sparse: Implement simple unary ufuncs operators (#68887)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68887

Closes #46988, closes #46987, closes #46761

By "simple" I mean operators that map 0->0 so we can implement it by
just re-dispatching on the values tensor. That does mean we have `sin`
but not `cos` for example, but without fill value support this is the
best that can be done.

Most of these don't support autograd because the derivative formulas
use unsupported operators.

cc nikitaved pearu cpuhrsch IvanYashchuk

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D32706197

Pulled By: cpuhrsch

fbshipit-source-id: 65e1acb3645737ca7bdb7f2db739d8e118906f4b
2021-11-30 00:30:30 -08:00
3186d36972 [TensorExpr] Suppress TracerWarnings in test_unsupported in test_jit_fuser_te.py. (#68757)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68757

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D32600951

Pulled By: ZolotukhinM

fbshipit-source-id: 7b9859d7dee1e9803b8fde5d071890a72d30cec9
2021-11-30 00:06:36 -08:00
75ce040620 [TensorExpr] Allow for 'keepdim' argument in aten::mean in NNC's external call. (#68756)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68756

That fixes some warnings in our tests.

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D32600952

Pulled By: ZolotukhinM

fbshipit-source-id: 548eaf3659e20795cce44d8f57e77f4a47d44d98
2021-11-30 00:06:34 -08:00
a93f505ee5 [TensorExpr] IRPrinter: print sizes and name when visiting a Buf. (#68755)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68755

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D32600950

Pulled By: ZolotukhinM

fbshipit-source-id: 925da05d958497791cb9176a5d15d8315334aa24
2021-11-30 00:05:10 -08:00
8cc9ec2f6b Add option to get input dtype from user (#68751)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68751

Add option to get input dtype from user for AOT compilation

Test Plan:
BI model compiles and runs fine
```
(pytorch)  ~/fbsource/fbcode/caffe2/fb/nnc
└─ $ buck run //caffe2/binaries:aot_model_compiler -- --model=bi.pt --model_name=pytorch_dev_bytedoc --model_version=v1 '--input_dims=1,115;1' --input_types='int64;int64'
Building... 8.3 sec (99%) 7673/7674 jobs, 0/7674 updated
WARNING: Logging before InitGoogleLogging() is written to STDERR
W1116 14:32:44.632536 1332111 TensorImpl.h:1418] Warning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (function operator())
E1116 14:32:44.673710 1332111 huge_pages_allocator.cc:287] Not using huge pages because not linked with jemalloc
The compiled llvm assembly code was saved to bi.compiled.ll
The compiled model was saved to bi.compiled.pt
```

> Error thrown when input dims and input types sizes don't match

```
(pytorch)  ~/fbsource/fbcode/caffe2/fb/nnc
└─ $ buck run //caffe2/binaries:aot_model_compiler -- --model=bi.pt --model_name=pytorch_dev_bytedoc --model_version=v1 '--input_dims=1,115;1' --input_types='int64;int64;int64'
.
.
terminate called after throwing an instance of 'c10::Error'
  what():  [enforce fail at aot_model_compiler.cc:208] split(';', FLAGS_input_dims).size() == split(';', FLAGS_input_types).size(). Number of input_dims and input_types should be the same
.
.
.
```

Reviewed By: ljk53

Differential Revision: D32477001

fbshipit-source-id: 8977b0b59cf78b3a2fec0c8428f83a16ad8685c5
2021-11-29 21:39:49 -08:00
ac1fe91dc9 Clean up some THC includes (#69024)
Summary:
These seem to not be needed and cause ninja to rebuild the files at every build.

(There also is THCStorage.cu, but hopefully this will go away with https://github.com/pytorch/pytorch/issues/68556 )

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69024

Reviewed By: soulitzer

Differential Revision: D32705309

Pulled By: ngimel

fbshipit-source-id: 5255297f213fdcf36e7203de7460a71291f8c9a0
2021-11-29 20:55:27 -08:00
ce53baf573 Merging the implementations of ClearProfiling (#67575)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67575

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D32497548

Pulled By: Gamrix

fbshipit-source-id: fb656b017d405487e25bd2407b069a702769659f
2021-11-29 19:48:56 -08:00
e6a8d15a4c cpu_kernel_vec: Hoist stride checks out of loop (#68962)
Summary:
`cpu_kernel_vec` does stride checks to determine whether to use the vectorized or scalar inner loop. Since it uses a 1d `for_each` loop, it re-does these stride checks after every loop over the inner dimension. For iterators with small inner dimensions, this means a significant proportion of the time may be spent just on stride checks.

This changes it to use a 2d loop so the stride checks are further amortized. With the `copy_` benchmark below, this halves the callgrind instruction count (28.4 million to 13.5 million) and gives a 30% speedup (22.8 us to 16.4 us) on my machine.

```
from torch.utils.benchmark import Timer
import timeit
timer = Timer(
    stmt="b.copy_(a);",
    setup="""
    auto a = at::rand({10000, 8}, at::kComplexDouble).slice(0, 0, -1, 2);
    auto b = at::empty_like(a);
    """,
    num_threads=1,
    language='c++',
    timer=timeit.default_timer
)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68962

Reviewed By: mrshenli

Differential Revision: D32684191

Pulled By: ngimel

fbshipit-source-id: 582af038314a0f999f43669e66edace38ff8d2dc
2021-11-29 19:37:58 -08:00
61ea2fc35e Fix device type / dtype handling for parametrized test names (#65217)
Summary:
This PR absolves `_TestParametrizer`s (e.g. `ops`, `modules`, `parametrize`) of the responsibility of adding device type (e.g. `'cpu'`, `'cuda'`, etc.) / dtype (e.g. 'float32') to generated test names. This fixes repeated instances of the device string being added to generated test names (e.g. `test_batch_norm_training_True_cuda_track_running_stats_True_cuda_affine_True_cuda`).

The responsibility for placing device / dtype suffixes is now handled by `instantiate_device_type_tests()` instead so it is added a single time. It will place `<device>_<dtype>` at the end of the test name unconditionally, maintaining the current naming convention.
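
For example, a small sketch using the internal test utilities (generated names assumed):

```
import torch
from torch.testing._internal.common_utils import TestCase, parametrize, run_tests
from torch.testing._internal.common_device_type import instantiate_device_type_tests

class TestExample(TestCase):
    @parametrize("bias", [False, True])
    def test_linear(self, device, bias):
        m = torch.nn.Linear(2, 2, bias=bias).to(device)
        self.assertEqual(m(torch.randn(1, 2, device=device)).shape, (1, 2))

# Generates names like TestExampleCPU.test_linear_bias_False_cpu: the device
# suffix is appended once, at the end, by instantiate_device_type_tests.
instantiate_device_type_tests(TestExample, globals())

if __name__ == "__main__":
    run_tests()
```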

As part of this work, I also tightened the semantics through some additional error case handling:
* Composing multiple decorators that each try to handle the same parameter will error out with a nice message. This includes the case to trying to compose `modules` + `ops`, as they each try to handle `dtype`. Similarly, `ops` + `dtypes` is forbidden when both try to handle `dtype`. This required changes in the following test files:
  * `test/test_unary_ufuncs.py`
  * `test/test_foreach.py`
* The `modules` / `ops` decorators will now error out with a nice message if used with `instantiate_parametrized_tests()` instead of `instantiate_device_type_tests()`, since they're not (currently) written to work outside of a device-specific context.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65217

Reviewed By: mruberry

Differential Revision: D32627303

Pulled By: jbschlosser

fbshipit-source-id: c2957228353ed46a0b7da8fa1a34c67598779312
2021-11-29 19:02:23 -08:00
933d5b561f Fixed links to RNN docs in comments (#68828)
Summary:
Fixed links to RNN docs in comments

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68828

Reviewed By: soulitzer

Differential Revision: D32702384

Pulled By: jbschlosser

fbshipit-source-id: 577c88842cde555534d9a39fa7dfd24164d71552
2021-11-29 18:55:53 -08:00
863f321c6d Fix typo in AdaptiveLogSoftmaxWithLoss docs (#68926)
Summary:
Fixes a typo

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68926

Reviewed By: soulitzer

Differential Revision: D32702366

Pulled By: jbschlosser

fbshipit-source-id: 8975aad3e817dab33359cf29182b4bd1e3aa1299
2021-11-29 18:51:58 -08:00
b8c3693281 Remove autograd-enabled collective APIs from distributed docs (#69011)
Summary:
These APIs are not yet officially released and are still under discussion. Hence, this commit removes those APIs from docs and will add them back when ready.

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69011

Reviewed By: fduwjj

Differential Revision: D32703124

Pulled By: mrshenli

fbshipit-source-id: ea049fc7ab6b0015d38cc40c5b5daf47803b7ea0
2021-11-29 18:14:50 -08:00
178010455d Vectorized: Use inline namespace instead of anonymous (#67655)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67655

Some of the CPU operators already use the `namespace CPU_CAPABILITY` trick to avoid anonymous namespacing, like [`PowKernel.cpp`](cd51d2a3ec/aten/src/ATen/native/cpu/PowKernel.cpp (L14)). This extends that pattern to the `Vectorized` class, which avoids `-Wsubobject-linkage` warnings like the ones I was getting in #67621.

For many functions, it was necessary to add `inline` because the functions are defined in a header. There were no link errors previously because the anonymous namespace ensured they were not exposed to linkage. Similarly, free functions defined in an anonymous namespace might need the `C10_UNUSED` attribute to silence warnings about the function not being called in the only translation unit that it's defined in. By removing the anonymous namespace, these decorators are no longer necessary.

Test Plan: Imported from OSS

Reviewed By: dagitses

Differential Revision: D32566109

Pulled By: malfet

fbshipit-source-id: 01d64003513b4946dec6b709bd73bbab05772134

Co-authored-by: Nikita Shulga <nshulga@fb.com>
2021-11-29 16:54:17 -08:00
1d0416397a [PyTorch] Change from unique_ptr to optional for RecordFunction state (#68397)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68397

Now that hot paths can avoid instantiating RecordFunction by using shouldRunRecordFunction, we can improve efficiency for profiling cases by avoiding a large heap allocation.
ghstack-source-id: 144235785

Test Plan:
1) Run //caffe2/caffe2/fb/high_perf_models/pytorch/benchmark_framework_overheads:cpp_benchmark before/after this diff with arguments --stressTestRecordFunction --op empty.

Before: P467891381

After: P467902339

2) Run without --stressTestRecordFunction to verify no regression in the regular dispatcher path.

Before: P467902381

After: P467902403

Reviewed By: chaekit

Differential Revision: D32448365

fbshipit-source-id: 2d32a3bd82c60d2bb11fc57bb88bf3f02aa3fa25
2021-11-29 16:35:36 -08:00
7194faed7f [PyTorch] Optimize mergeRunCallbacks for RecordFunction (#68387)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68387

Function call overhead on tryRunCallback was notable.
ghstack-source-id: 144235788

Test Plan:
Run //caffe2/caffe2/fb/high_perf_models/pytorch/benchmark_framework_overheads:cpp_benchmark before/after this diff with arguments `--stressTestRecordFunction --op empty`.

Before: P467891339
After: P467891381

Reviewed By: chaekit

Differential Revision: D32443863

fbshipit-source-id: c0b3dd40bbd5bca976c2ebb0f21aa62e097b302e
2021-11-29 16:33:36 -08:00
f1a3512b78 Adding Linux cuda 11.5 workflows (#68745)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/68960

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68745

Reviewed By: janeyx99

Differential Revision: D32707491

Pulled By: atalman

fbshipit-source-id: 100facfdcc0fc2f68e203a696856852faa25ee08
2021-11-29 16:21:00 -08:00
27228656e6 [FX][docs] Document gotcha about training flag (#68915)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/68913

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68915

Reviewed By: jamesr66a

Differential Revision: D32705410

Pulled By: jubinchheda

fbshipit-source-id: a44c17ab0e62465823ceb0ef983ae330b50fb073
2021-11-29 16:13:32 -08:00
f253370bb9 dbr quant overhead [13/x]: cache results of get_module_hook_type (#68841)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68841

Caches the current module's hook type as an attribute on the module.
This requires the assumption that the current module's hook type
does not change during inference, which is an assumption we can
commit to.
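
A minimal sketch of the caching pattern (the attribute and helper names here are hypothetical, not DBR quant's actual internals):

```
import torch

def _compute_hook_type(module):  # stand-in for the real (more expensive) logic
    return "module_io" if isinstance(module, torch.nn.Conv2d) else "none"

def get_module_hook_type(module):
    hook_type = getattr(module, "_cached_hook_type", None)
    if hook_type is None:
        hook_type = _compute_hook_type(module)
        # caching is safe because the hook type is assumed stable during inference
        module._cached_hook_type = hook_type
    return hook_type

print(get_module_hook_type(torch.nn.Conv2d(1, 1, 3)))  # computed once, then cached
```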

Test Plan:
correctness
```
python test/test_quantization.py TestQuantizeDBR
```

performance
```
// MobileNetV2, 1x3x224x224, function profiling

// before
get_module_hook_type -> 2.58%

// after
get_module_hook_type -> 0.73%
```

Reviewed By: jerryzh168

Differential Revision: D32630881

Pulled By: vkuzo

fbshipit-source-id: 667f2667ef9c5514e5d82e4e7e4c02b8238edc65
2021-11-29 16:10:24 -08:00
2ad4727ad9 dbr quant: fix debugging fqn info for converted model (#68840)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68840

Fixes the debugging FQN info for a converted model. Some of this
information was missing because eager mode convert performed
module swaps. This information is only used in debugging and is
not used for inference.

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

turn `enable_logging` on in `auto_trace.py`, the FQN is now displayed
for a converted model

Reviewed By: jerryzh168

Differential Revision: D32630884

Pulled By: vkuzo

fbshipit-source-id: be8c43343abfdab9fe0af39499d908ed61a01b78
2021-11-29 16:10:21 -08:00
a03fe9ba61 dbr quant overhead[12/x]: turn off overrides for module convert output hook (#68839)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68839

We can assume that there are no overrides needed for the hook which
dequantizes the module outputs, so we can turn them off explicitly.
While this does not lead to a measurable perf win, it makes things
easier to debug by eliminating the no-op overrides.

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

Reviewed By: jerryzh168

Differential Revision: D32630886

Pulled By: vkuzo

fbshipit-source-id: 1719c168f5f21f3e59c80a3b6d0f32ebb1c77ef8
2021-11-29 16:10:18 -08:00
515db56755 dbr quant: remove unnecessary outputs hook (#68838)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68838

Removes an unnecessary outputs hook on the top level
module.  The same hook is already called inside the regular
hook flow.

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

Reviewed By: soulitzer

Differential Revision: D32630882

Pulled By: vkuzo

fbshipit-source-id: aa5f1b1cb866051013195d7311949333b08df4de
2021-11-29 16:10:15 -08:00
e3af582f92 dbr quant overhead[11/x]: speed up module convert hook (#68837)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68837

The module convert hook dequantizes the module outputs if the user
requested the module to adhere to a certain dtype for outputs. This
is most commonly used for the assumption that a model's overall return
type is fp32.

This PR precalculates for each module whether this hook will do anything,
and returns early if it does not. This prevents the overhead of this
hook from influencing any module which does not need it.

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

perf

```
MobileNetV2, 1x3x224x224, function level profiling

// before
outputs_convert_hook - 0.73%

// after
outputs_convert_hook - 0.45%
```

Reviewed By: jerryzh168

Differential Revision: D32630885

Pulled By: vkuzo

fbshipit-source-id: 7ee84de742fc0c752b66d20d097405a754c8b480
2021-11-29 16:10:12 -08:00
be70477a7b dbr quant overhead[10/x]: disable torch_function overrides for leaf nodes (#68836)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68836

If we have a leaf module like a `torch.nn.Conv2d`, DBR quant handles
the input and output of the module and should treat the inside of
this module as invisible.  Specifically, there is no need to override
the `F.conv2d` call if the parent module is already being overridden.

Before this PR, `__torch_function__` was still overridden for the insides
of leaf modules, and the override was a no-op.  There was some overhead
in these overrides because they were checking the hook type.

This PR adds a fast global override so we can skip overriding the insides
of leaf modules. This has some performance benefits in the prepared model,
because we now skip overriding all of the inner functions in observers.

Test Plan:
testing
```
python test/test_quantization.py TestQuantizeDBR
```

perf
```
// MobileNetV2, 1x3x224x224, comparing fp32 with dbr quant, Mac OS laptop

// before

fp32: 0.017837 seconds avg
fx_prepared: 0.021963 seconds avg, 0.812143 speedup vs fp32
fx_quantized: 0.012632 seconds avg, 1.412056 speedup vs fp32
dt_prepared: 0.034052 seconds avg, 0.523820 speedup vs fp32
dt_quantized: 0.018316 seconds avg, 0.973829 speedup vs fp32

// after

fp32: 0.020395 seconds avg
fx_prepared: 0.026969 seconds avg, 0.756230 speedup vs fp32
fx_quantized: 0.013195 seconds avg, 1.545611 speedup vs fp32
dt_prepared: 0.033432 seconds avg, 0.610023 speedup vs fp32
dt_quantized: 0.018244 seconds avg, 1.117866 speedup vs fp32

```

Reviewed By: jerryzh168

Differential Revision: D32630883

Pulled By: vkuzo

fbshipit-source-id: 6365e1c514726d8b2a4b3a51f114f5fed3ebe887
2021-11-29 16:08:52 -08:00
1342f19a8c Add ModuleInfo-based device transfer tests (#68092)
Summary:
Continuation of https://github.com/pytorch/pytorch/issues/65488; addresses the problem that got it reverted.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68092

Reviewed By: mruberry

Differential Revision: D32299103

Pulled By: jbschlosser

fbshipit-source-id: bc298aca15368f2acb5082e6fb6eedea60b5d75f
2021-11-29 15:48:40 -08:00
89a145fd91 Sparse CSR CUDA: Add torch.sparse.sampled_addmm (#68007)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68007

This PR adds a new function to the sparse module.
`sampled_addmm` computes α*(A @ B) * spy(C) + β*C, where C is a sparse CSR matrix and A, B are dense (strided) matrices.
This function is currently restricted to single 2D matrices; it doesn't support batched input.
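
As a dense reference for the formula above (a sketch; the alpha/beta argument names are assumed, and the real function takes a sparse CSR C):

```
import torch

alpha, beta = 2.0, 1.0
A = torch.randn(3, 4)
B = torch.randn(4, 3)
C = torch.tensor([[1.0, 0.0, 2.0],
                  [0.0, 3.0, 0.0],
                  [4.0, 0.0, 5.0]])
spy = (C != 0).to(A.dtype)     # spy(C): 1 where C has a stored value
print(alpha * (A @ B) * spy + beta * C)
```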

cc nikitaved pearu cpuhrsch IvanYashchuk

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D32435799

Pulled By: cpuhrsch

fbshipit-source-id: b1ffac795080aef3fa05eaeeded03402bc097392
2021-11-29 15:43:29 -08:00
af49805a73 Port lerp to structured kernels (#68924)
Summary:
Ref https://github.com/pytorch/pytorch/issues/55070

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68924

Reviewed By: jbschlosser

Differential Revision: D32697409

Pulled By: bdhirsh

fbshipit-source-id: b098533e46f8bdbb995c76db0e6a124ab2b076b8
2021-11-29 15:11:30 -08:00
62847a2b9c Fix bug on empty GLOO_SOCKET_IFNAME_ENV (#68933)
Summary:
This PR fixes the no-device bug triggered when the user resets the `GLOO_SOCKET_IFNAME_ENV` with

```bash
export GLOO_SOCKET_IFNAME_ENV=
```

Thank you for your time on reviewing this PR :).

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68933

Reviewed By: soulitzer

Differential Revision: D32690633

Pulled By: mrshenli

fbshipit-source-id: f6df2b8b067d23cf1ec177c77cc592dc870bda72
2021-11-29 15:05:38 -08:00
b468566208 Add ModuleInfo-based CPU / GPU parity tests (#68097)
Summary:
Continuation of https://github.com/pytorch/pytorch/issues/64694; fixes issues with the diff there

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68097

Reviewed By: mruberry

Differential Revision: D32300650

Pulled By: jbschlosser

fbshipit-source-id: f3a5e72b019d4eddd7202854999eab61fffc9006
2021-11-29 14:58:07 -08:00
fb63bb60ec Strided masked norm. (#68584)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68584

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D32581285

Pulled By: cpuhrsch

fbshipit-source-id: 896ee1e58957b46c2f6a16a170adff4cb3b8da62
2021-11-29 14:23:27 -08:00
f776f30780 Keep the sequence or mapping type in default_collate (#68779)
Summary:
`default_collate`, `default_convert`, and `pin_memory` convert sequences into lists. I believe they should keep the original type when possible (e.g., I have a class that inherits from `list`, which comes from a 3rd party library that I can't change, and provides extra functionality).

Note it's easy to do when the type supports an iterable in its creation but it's not always the case (e.g., `range`).

Even though this can be accomplished if using a custom `default_collate`/`default_convert`, 1) this is behavior they should support out-of-the-box IMHO, and 2) `pin_memory` still does it.
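
A sketch of the intended post-change behavior (hypothetical subclass name):

```
import torch
from torch.utils.data.dataloader import default_collate

class TaggedList(list):  # e.g. a 3rd-party list subclass we cannot change
    pass

batch = [TaggedList([torch.tensor(1), torch.tensor(2)]),
         TaggedList([torch.tensor(3), torch.tensor(4)])]
# With this change, collation rebuilds the container as TaggedList
# instead of a plain list.
print(type(default_collate(batch)))
```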

cc VitalyFedyunin ejguan NivekT

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68779

Reviewed By: wenleix

Differential Revision: D32651129

Pulled By: ejguan

fbshipit-source-id: 17c390934bacc0e4ead060469cf15dde815550b4
2021-11-29 13:14:20 -08:00
d9e7d85390 Remove TH/THC Storage (#68556)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/67852

cc ezyang bhosmer smessmer ljk53 bdhirsh

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68556

Reviewed By: ejguan

Differential Revision: D32652758

Pulled By: ngimel

fbshipit-source-id: 170956fca112606f9008abe09b92c6ddc411be09
2021-11-29 12:55:20 -08:00
f5fa91ba2e Sparse: Add additional opinfo tests (#68886)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68886

cc nikitaved pearu cpuhrsch IvanYashchuk

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D32697933

Pulled By: cpuhrsch

fbshipit-source-id: fffdd1bc663cc1bc49abe8cf3680982d1cb497bc
2021-11-29 12:49:20 -08:00
3bd7dbf119 [Dist CI][BE] Remainder of c10d/store tests run in subprocess (#68822)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68822

Per title, we switched over c10d_gloo and nccl and the results look good
so far, so switch the rest of them as well. After this, the only dist tests that
won't run in a subprocess are the pipe and fsdp tests, which historically haven't had
much flakiness.
ghstack-source-id: 144213522

Test Plan: CI

Reviewed By: H-Huang

Differential Revision: D32624330

fbshipit-source-id: 469f613e5b0e4529e6b23ef259d948837d4af26b
2021-11-29 10:59:39 -08:00
250d0bd20b [RPC][Dist CI][BE] RPC tests run in subprocess (#68821)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68821

Continuing effort to move most distributed tests to run in subprocess
for better reproducibility + reduce flakiness.
ghstack-source-id: 144213520

Test Plan: CI

Reviewed By: H-Huang

Differential Revision: D32624199

fbshipit-source-id: 04448636320554d7a3ab29ae92bc1ca9fbe37da2
2021-11-29 10:58:08 -08:00
51f4ac40fd ci: Use default blank if no TEST_CONFIG (#69008)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69008

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D32699051

Pulled By: seemethere

fbshipit-source-id: 9ed12fe8a7f541c6eda77182cfd1b0a733a545f0
2021-11-29 10:05:20 -08:00
ee59a09772 Implement sharding for MacOS jobs (#68784)
Summary:
Do not run distributed tests as a separate shard, but keep them inside one of the two shards (to limit concurrency problems)
Fixes https://github.com/pytorch/pytorch/issues/68260

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68784

Reviewed By: seemethere, janeyx99

Differential Revision: D32653440

Pulled By: malfet

fbshipit-source-id: ebe5bbc30bdf67e930f2c766c920932700f3a4e4
2021-11-29 09:31:42 -08:00
61a4204d80 Sparse CSR CUDA: Add block torch.addmm when mat1 is sparse (#68707)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68707

This PR adds a path for block CSR matrices for `torch.addmm`. cuSPARSE interface is restricted to 32-bit indices and square blocks.
My plan is to make everything work and tests pass using an unsafe constructor first, keeping it all private. Then discuss & implement constructors with block information separately, unlocking the functions for wider use. Documentation will come with the update to constructors.

cc nikitaved pearu cpuhrsch IvanYashchuk ngimel

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D32650366

Pulled By: cpuhrsch

fbshipit-source-id: 430a9627901781ee3d2e2496097b71ec17727d98
2021-11-29 08:58:49 -08:00
9ee5db490b neg_sparse: Fix output dtype (#68885)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68885

`torch.neg` should preserve the input dtype but for sparse tensors it
was promoting integers to floating point. This would have been picked
up by the OpInfo-based test, but `neg` wasn't marked with
`supports_sparse=True` so it was never run.
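
Illustrating the expected behavior after the fix:

```
import torch

x = torch.tensor([1, -2, 3]).to_sparse()
# neg on a sparse integer tensor should keep the integer dtype
# instead of promoting to floating point.
print(torch.neg(x).dtype)  # torch.int64, same as torch.neg(x.to_dense()).dtype
```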

cc nikitaved pearu cpuhrsch IvanYashchuk

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D32680008

Pulled By: cpuhrsch

fbshipit-source-id: 502f8743c1c33ab802e3d9d097792887352cd220
2021-11-29 08:48:22 -08:00
7b701ce2d4 Add set_to_none option to C++ API (#68801)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/68167.

Signed-off-by: Vinnam Kim <vinnam.kim@makinarocks.ai>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68801

Reviewed By: mruberry

Differential Revision: D32625239

Pulled By: jbschlosser

fbshipit-source-id: 5f09b959e23d5448106a47029d06ec20ad094d82
2021-11-29 08:42:39 -08:00
787ded5103 Add lazy::Shape::numel() (#68314)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68314

Add a convenience to lazy::Shape for counting the number of elements (by multiplying out the dimensions). numel() is already a method on Tensor, and in switching other lazy tensor shape utils to use aten shape inference, we need element counts.
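
The semantics match Tensor.numel(); a Python sketch of the intended behavior (the actual change is a C++ method on lazy::Shape):

```
from functools import reduce
import operator

def numel(sizes):
    # Product of all dimensions; an empty size list (a scalar) has one element.
    return reduce(operator.mul, sizes, 1)

assert numel([2, 3, 4]) == 24
assert numel([]) == 1
```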

Test Plan: add unit tests

Reviewed By: alanwaketan

Differential Revision: D32409138

fbshipit-source-id: 3ae725300f8826d38e45412f46501d5e5f776fb2
2021-11-29 08:38:09 -08:00
3d504ae1b4 [RELAND] Fix Dispatching not considering List[Optional[Tensor]] for dispatch (#68073)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68073

Relanding the original PR. Its body was as follows:

Followup to https://github.com/pytorch/pytorch/pull/60787

It turns out that the original PR was wrong for unboxed kernels. We
recently ran into this in
https://github.com/facebookresearch/functorch/issues/124

For unboxed kernels, the correct type for a Tensor?[] argument is
actually `List<optional<Tensor>>`, not `ArrayRef<optional<Tensor>>`
ghstack-source-id: 144204580

Test Plan:
- assert that https://github.com/facebookresearch/functorch/issues/124
actually works

Reviewed By: gchanan

Differential Revision: D32313601

Pulled By: zou3519

fbshipit-source-id: 8028d5f34eecabc53d603bd54d6b6748b5db461a
2021-11-29 08:31:55 -08:00
17ba936da0 .github: Migrate XLA tests to GHA (#64320)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64320

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D30684490

Pulled By: seemethere

fbshipit-source-id: 5d2657f9aa4c7082591239a5bb095cc85d2cde66
2021-11-29 08:30:57 -08:00
f398320e0d packaging: Include lazy headers in package_data (#68817)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68817

Looks like these files are getting used by downstream xla so we need to
include them in our package_data

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D32622241

Pulled By: seemethere

fbshipit-source-id: 7b64e5d4261999ee58bc61185bada6c60c2bb5cc
2021-11-29 08:29:48 -08:00
871cd7c5b9 Forward-mode AD support for torch.split, torch.split_with_sizes (#68566)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68566

These are just auto-linear as pointed out by Jeffrey.
ghstack-source-id: 143814393
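
A hedged sketch of what the OpInfo tests exercise, using the public forward-mode AD API:

```
import torch
import torch.autograd.forward_ad as fwAD

x = torch.randn(6)
t = torch.randn(6)  # tangent for x

with fwAD.dual_level():
    dual = fwAD.make_dual(x, t)
    parts = torch.split(dual, 2)
    # split is linear, so each output tangent is just the split of the input tangent
    for part, expected in zip(parts, torch.split(t, 2)):
        torch.testing.assert_close(fwAD.unpack_dual(part).tangent, expected)
```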

Test Plan: - Run OpInfo tests.

Reviewed By: albanD, soulitzer

Differential Revision: D32520239

Pulled By: zou3519

fbshipit-source-id: 807115157b131e6370f364f61db1b14700279789
2021-11-29 07:50:53 -08:00
3315c4b31e add instructions for unhandled exceptions in assert_close (#68722)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68722

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D32684446

Pulled By: mruberry

fbshipit-source-id: 04fe5730721d24e44692cdc9bb327484356ead3f
2021-11-28 21:35:53 -08:00
d095f498a0 Tensor docs (#63308)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/62146.

Modernizes and clarifies the documentation of torch.tensor and torch.as_tensor, highlighting the distinction in their copying behavior and preservation of autograd history.
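
A small sketch of the distinction being highlighted:

```
import numpy as np
import torch

a = np.array([1, 2, 3])
t_copy = torch.tensor(a)      # always copies the data
t_share = torch.as_tensor(a)  # shares memory with the ndarray when possible

a[0] = 99
assert t_copy[0].item() == 1    # the copy is unaffected by the mutation
assert t_share[0].item() == 99  # the shared view reflects the mutation
```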

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63308

Reviewed By: albanD, ngimel

Differential Revision: D30338025

Pulled By: mruberry

fbshipit-source-id: 83a0c113e4f8fce2dfe086054562713fe3f866c2
2021-11-28 21:26:12 -08:00
6ae34ea6f8 Revert D32521980: Add linalg.lu_factor
Test Plan: revert-hammer

Differential Revision:
D32521980 (b10929a14a)

Original commit changeset: 26a49ebd87f8

fbshipit-source-id: e1a6bb9c2ece9bd78190fe17e16a46e3358c5c82
2021-11-28 17:22:15 -08:00
b10929a14a Add linalg.lu_factor (#66933)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66933

This PR exposes `torch.lu` as `torch.linalg.lu_factor` and
`torch.linalg.lu_factor_ex`.

This PR also adds support for matrices with zero elements both in
the size of the matrix and the batch. Note that this function simply
returns empty tensors of the correct size in this case.

We add a test and an OpInfo for the new function.

This PR also adds documentation for this new function in line of
the documentation in the rest of `torch.linalg`.

Fixes https://github.com/pytorch/pytorch/issues/56590
Fixes https://github.com/pytorch/pytorch/issues/64014

cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D32521980

Pulled By: mruberry

fbshipit-source-id: 26a49ebd87f8a41472f8cd4e9de4ddfb7f5581fb
2021-11-27 17:52:48 -08:00
01ddd5dde6 [opinfo] use dtypes instead of dtypesIfCPU (#68732)
Summary:
Reland https://github.com/pytorch/pytorch/issues/67619

Replace usage of dtypesIfCPU with dtypes in OpInfo class and also make it a mandatory argument.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68732

Reviewed By: jbschlosser

Differential Revision: D32594344

Pulled By: mruberry

fbshipit-source-id: 660b38aef97752ba064228e8989041ed1d5777fe
2021-11-27 16:07:51 -08:00
cffad597ea Tune test_reference_numerics_normal (#68019)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68019

Reviewed By: albanD

Differential Revision: D32482535

Pulled By: mruberry

fbshipit-source-id: 48300a5c6a4484fb81789f9049d3f08272d9f31c
2021-11-26 18:59:31 -08:00
5fdcc20d8d [JIT][Symbolic Shape Analysis] expose op shape functions (#68748)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68748

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D32598605

Pulled By: makslevental

fbshipit-source-id: c97a06cd0fe143a6ea14db65fc5d3f76abdff312
2021-11-24 17:17:01 -08:00
f14c16e509 Revert D32599540: [pytorch][PR] implemented 'torch.distributions.constraints.symmetric' checking if the tensor is symmetric at last 2 dimension.
Test Plan: revert-hammer

Differential Revision:
D32599540 (bc3bdbc8f4)

Original commit changeset: 9227f7e99318

fbshipit-source-id: edfe7072073d910a49be52e1b8c2d374ef71e9ec
2021-11-24 17:15:31 -08:00
c2e3b92db4 partial revert of D32522826 (#68889)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68889

Reviewed By: cpuhrsch, ejguan

Differential Revision: D32650385

Pulled By: Krovatkin

fbshipit-source-id: 2c4a30cfc729a023b592b6b6e1959bbd2ad6f7cf
2021-11-24 17:05:20 -08:00
4afa5ea0ab native_functions.yaml: remove SparseXPU which is added by accident (#68791)
Summary:
gen_backend_stubs.py will trip an assert when generating code with the
SparseXPU dispatch key for external backends, if SparseXPU is in
native_functions.yaml.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68791

Reviewed By: cpuhrsch, ejguan

Differential Revision: D32646303

Pulled By: bdhirsh

fbshipit-source-id: 64e42cc40468bc8c696a31b4b7c0cc3728866a64
2021-11-24 15:34:17 -08:00
c5f63f859e Add slow path to getCustomClassTypeImpl (#68717)
Summary:
This fixes a custom class registration issue that occurs when `typeid` is not guaranteed to be unique across multiple libraries, which is the case for the libc++ runtime on macOS 11, in particular on M1
From [libcxx/include/typeinfo](78d6a7767e/include/typeinfo (L139)):
```
// -------------------------------------------------------------------------- //
//                          NonUniqueARMRTTIBit
// -------------------------------------------------------------------------- //
// This implementation of type_info does not assume always a unique copy of
// the RTTI for a given type inside a program. It packs the pointer to the
// type name into a uintptr_t and reserves the high bit of that pointer (which
// is assumed to be free for use under the ABI in use) to represent whether
// that specific copy of the RTTI can be assumed unique inside the program.
// To implement equality-comparison of type_infos, we check whether BOTH
// type_infos are guaranteed unique, and if so, we simply compare the addresses
// of their type names instead of doing a deep string comparison, which is
// faster. If at least one of the type_infos can't guarantee uniqueness, we
// have no choice but to fall back to a deep string comparison.
```

But the `std::type_index` hash is always computed assuming that the implementation is unique.
Adding a slow path fixes this problem in those scenarios.

Fixes https://github.com/pytorch/pytorch/issues/68039

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68717

Reviewed By: seemethere

Differential Revision: D32605187

Pulled By: malfet

fbshipit-source-id: 8d50e56885b8c97dad3bc34a69c47ef879456dd1
2021-11-24 15:00:47 -08:00
14dc9759f2 Revert D32650384: OpInfos for torch.{flatten, column_stack}
Test Plan: revert-hammer

Differential Revision:
D32650384 (aceb46e4ce)

Original commit changeset: 9ead83b378d0

fbshipit-source-id: 3ef281e536b1f21a6f13c6c51309021cf92b53b2
2021-11-24 14:55:26 -08:00
96929ea995 Update empty and empty_like examples in docs (#68874)
Summary:
For some reason, the example for `torch.empty` showed the usage of `torch.empty_like` and the other way around. These are now swapped.

Fixes https://github.com/pytorch/pytorch/issues/68799

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68874

Reviewed By: wenleix

Differential Revision: D32646645

Pulled By: ejguan

fbshipit-source-id: c8298bcaca450aaa4abeef2239af2b14cadc05b3
2021-11-24 14:01:06 -08:00
d44e610efa [CUDA Pinned Memory] Event recording with non-blocking copies should track the storage context, not the tensor data pointer (#68749)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68749

The logic for asynchronous copies (either HtoD or DtoH) using cudaMemcpyAsync relies on recording an event with the caching host allocator to notify it that a given allocation has been used on a stream - and thus it should wait for that stream to proceed before reusing the host memory.

This tracking is based on the allocator maintaining a map from storage allocation pointers to some state.

If we try to record an event for a pointer we don't understand, we will silently drop the event and ignore it (9554ebe44e/aten/src/ATen/cuda/CachingHostAllocator.cpp (L171-L175)).

Thus, if we use the data_ptr of a Tensor instead of the storage allocation, then reasonable code can lead to incorrectness due to missed events.

One way this can occur is simply by slicing a tensor into sub-tensors - which have different values of `data_ptr()` but share the same storage, for example:

```
image_batch = torch.randn(M, B, C, H, W).pin_memory()
for m in range(M):
  sub_batch = image_batch[m].cuda(non_blocking=True)
  # sub_batch.data_ptr() != image_batch.data_ptr() except for m == 0.
  # however, sub_batch.storage().data_ptr() == image_batch.storage().data_ptr() always.
```

Therefore, we instead use the storage context pointer when recording events, as this is the same state that is tracked by the caching allocator itself. This is a correctness fix, although it's hard to determine how widespread this issue is.

Using the storage context also allows us to use a more efficient structure internally to the caching allocator, which will be sent in future diffs.

Test Plan: Test added which demonstrates the issue, although it's hard to demonstrate the race explicitly.

Reviewed By: ngimel

Differential Revision: D32588785

fbshipit-source-id: d87cc5e49ff8cbf59052c3c97da5b48dd1fe75cc
2021-11-24 13:20:22 -08:00
bc3bdbc8f4 implemented 'torch.distributions.constraints.symmetric' checking if the tensor is symmetric at last 2 dimension. (#68644)
Summary:
Implemented submodule for https://github.com/pytorch/pytorch/issues/68050
Opened cleaned, final version of PR for https://github.com/pytorch/pytorch/issues/68240

Explanation:
I am trying to contribute to PyTorch by implementing distributions for symmetric matrices like the Wishart distribution and the Inverse Wishart distribution. Although there is an LKJ distribution for the Cholesky decomposition of correlation matrices, it is only equivalent to a restricted form of the Wishart distribution. [https://arxiv.org/abs/1809.04746](https://arxiv.org/abs/1809.04746) Thus, I started implementing the Wishart and Inverse Wishart distributions separately.

I added a short piece of code for 'torch.distributions.constraints.symmetric', which was not previously included in 'torch.distributions.constraints'. That module contains constraints like 'positive_definite', but those simply assume the input matrix is symmetric. [Link](1adeeabdc0/torch/distributions/constraints.py (L466)) So, I think it would be better to have a constraint that checks the symmetry of tensors in PyTorch.

We may further utilize it like
`constraints.stack([constraints.symmetric, constraints.positive_definite])`
for the constraint of the covariance matrix in Multivariate Normal distribution, for example, to check if the random matrix is a symmetric positive definite matrix.
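
A hedged sketch of the check the new constraint performs, written by hand here since `constraints.symmetric` is the feature being added:

```
import torch

def check_symmetric(x, atol=1e-6):
    # Symmetric over the last two dimensions: x == x^T (batched).
    return torch.allclose(x, x.transpose(-2, -1), atol=atol)

a = torch.randn(4, 3, 3)
sym = a + a.transpose(-2, -1)  # symmetrize each matrix in the batch
assert check_symmetric(sym)
assert not check_symmetric(a)
```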

cc fritzo neerajprad alicanb nikitaved

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68644

Reviewed By: jbschlosser

Differential Revision: D32599540

Pulled By: neerajprad

fbshipit-source-id: 9227f7e9931834a548a88da69e4f2e9af7732cfe
2021-11-24 13:13:28 -08:00
1940cc028e [quant][graphmode][fx] Fork subgraph_rewriter from torch.fx to quantization (#68228)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68228

Forking this for now so that we can make changes as we need, the changes can be merged back to torch.fx
later

Test Plan:
```
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
```

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D32537713

fbshipit-source-id: 326598d13645fcc28ef2c66baaac6a077b80fd0c
2021-11-24 10:49:05 -08:00
aceb46e4ce OpInfos for torch.{flatten, column_stack} (#67555)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67555

Test Plan: Imported from OSS

Reviewed By: cpuhrsch

Differential Revision: D32650384

Pulled By: anjali411

fbshipit-source-id: 9ead83b378d0ece60569e1a0fc7d8849f89566b3
2021-11-24 10:25:37 -08:00
cf54416925 Add docs entry for adjoint. (#68869)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68869

As per title.

cc brianjo mruberry anjali411

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D32647456

Pulled By: anjali411

fbshipit-source-id: 2cb053a6884e2b22d3decc058e86d10f355fcb84
2021-11-24 10:03:41 -08:00
c7d5e0f53f OpInfos for torch.atleast_{1d, 2d, 3d} (#67355)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67355

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D32649416

Pulled By: anjali411

fbshipit-source-id: 1b42e86c7124427880fff52fbe490481059da967
2021-11-24 09:55:39 -08:00
b69155f754 Avoid dtype mismatch error in torch.save if storages are unallocated (#68787)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/58970

cc mruberry

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68787

Reviewed By: mruberry

Differential Revision: D32617425

Pulled By: anjali411

fbshipit-source-id: fe7f2374e4ef4428346a0a202cae8e0d382e03ab
2021-11-24 09:51:29 -08:00
208e109dbf Revert D32633806: Sparse CSR CUDA: Add block torch.addmm when mat1 is sparse
Test Plan: revert-hammer

Differential Revision:
D32633806 (b28ddd72d3)

Original commit changeset: b98db0bd655c

fbshipit-source-id: 1c757628526bb1b88747257fc77d8b9cb996e502
2021-11-24 09:15:17 -08:00
7802953dd5 [nnc][quantization] quantized ops for BI bytedoc via aten (#68790)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68790

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D32609427

Pulled By: IvanKobzarev

fbshipit-source-id: de8f4209befe2509f5033888c739554470768290
2021-11-24 08:59:44 -08:00
31d36fd35d fix sccache issue on Windows CPU (#68870)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/68796

```
2021-11-24T10:12:40.7634007Z Compile requests                   4312
2021-11-24T10:12:40.7634484Z Compile requests executed          4300
2021-11-24T10:12:40.7634823Z Cache hits                         4227
2021-11-24T10:12:40.7635122Z Cache hits (C/C++)                 4227
2021-11-24T10:12:40.7636139Z Cache misses                         62
2021-11-24T10:12:40.7636930Z Cache misses (C/C++)                 62
2021-11-24T10:12:40.7637333Z Cache timeouts                        0
2021-11-24T10:12:40.7637839Z Cache read errors                     0
2021-11-24T10:12:40.7638161Z Forced recaches                       0
2021-11-24T10:12:40.7638489Z Cache write errors                    0
2021-11-24T10:12:40.7638828Z Compilation failures                  1
2021-11-24T10:12:40.7639180Z Cache errors                         10
2021-11-24T10:12:40.7639490Z Cache errors (C/C++)                 10
2021-11-24T10:12:40.7639856Z Non-cacheable compilations            0
2021-11-24T10:12:40.7640244Z Non-cacheable calls                   0
2021-11-24T10:12:40.7640601Z Non-compilation calls                12
2021-11-24T10:12:40.7640987Z Unsupported compiler calls            0
2021-11-24T10:12:40.7641426Z Average cache write               0.104 s
2021-11-24T10:12:40.7641763Z Average cache read miss           6.000 s
2021-11-24T10:12:40.7642110Z Average cache read hit            0.046 s
2021-11-24T10:12:40.7642485Z Failed distributed compilations       0
```
https://github.com/pytorch/pytorch/runs/4310176911?check_suite_focus=true

cc seemethere malfet pytorch/pytorch-dev-infra

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68870

Reviewed By: ejguan

Differential Revision: D32646289

Pulled By: janeyx99

fbshipit-source-id: bf04446439e55a4ccaf9ce7c77812752ca717a7c
2021-11-24 08:04:59 -08:00
be7e159e71 Remove extraneous logging (#68830)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68830

No logical changes, removing a logging statement that was accidentally committed.

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang jjlilley mrzzd

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D32628711

Pulled By: H-Huang

fbshipit-source-id: 070190b92f97c8e38d8bb03124c13cb061fc9ec1
2021-11-24 07:15:50 -08:00
7d8a79b6f3 [nnc] llvm_codegen quantization types for vectype (#68736)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68736

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D32596261

Pulled By: IvanKobzarev

fbshipit-source-id: 0388c3b5ae58eb16921d25d9a784f82f1bb924fc
2021-11-24 01:17:39 -08:00
b28ddd72d3 Sparse CSR CUDA: Add block torch.addmm when mat1 is sparse (#68707)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68707

This PR adds a path for block CSR matrices for `torch.addmm`. The cuSPARSE interface is restricted to 32-bit indices and square blocks.
My plan is to make everything work and all tests pass using an unsafe constructor first, keeping it all private, then discuss & implement constructors with block information separately, unlocking the functions for wider use. Documentation will come with the update to constructors.

cc nikitaved pearu cpuhrsch IvanYashchuk ngimel

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D32633806

Pulled By: cpuhrsch

fbshipit-source-id: b98db0bd655cce651a5da457e78fca08619a5066
2021-11-23 22:55:46 -08:00
b5b62b3408 Cleanup old TD logic (#68842)
Summary:
Remove `--determine-from` option from run_test.py and remove all
references from corresponding test scripts

Followup after https://github.com/pytorch/pytorch/pull/64921

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68842

Reviewed By: seemethere, janeyx99

Differential Revision: D32631418

Pulled By: malfet

fbshipit-source-id: bdb5dd888c1d97dfaf95c1f299bf8073f3de9588
2021-11-23 18:45:42 -08:00
d9f3feb5a2 [SR] Use std::vector::reserve for StaticModule constants (#68834)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68834

This diff uses std::vector::reserve for constructing constants in StaticModule, and also avoids two extra iterations over all the graph nodes.

This should improve performance by a tiny amount.

Test Plan: - [x] buck run //caffe2/benchmarks/static_runtime:static_runtime_cpptest -- -v 1

Reviewed By: mikeiovine

Differential Revision: D32628806

fbshipit-source-id: 99dd2a7a36e86899ca1fe5300f3aa90d30a43726
2021-11-23 18:00:04 -08:00
8fb9ce4927 Update Documentation to Make CUDA Call Explicit (#67973)
Summary:
This clarifies the docs by making the call to cudaStreamWaitEvent explicit.

Fixes https://github.com/pytorch/pytorch/issues/67866

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67973

Reviewed By: mruberry

Differential Revision: D32620261

Pulled By: ngimel

fbshipit-source-id: 1fc8beb2062baaddb013ea4d7b10da2baa10f15e
2021-11-23 16:25:37 -08:00
79b67d9a4a [Quant] Refactor handling of FixedQParams operators (#68143)
Summary:
**Summary**: FixedQParams operators do not need fake quantization
in the prepare step. This commit introduces FixedQParamsObserver
and makes FixedQParamsFakeQuantize a simple wrapper around this
observer. It also removes the fake quantize logic in forward.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68143

Test Plan:
Added two tests:
python3 test/test_quantization.py TestQuantizeFx.test_fixed_qparams_patterns
python3 test/test_quantization.py TestQuantizeFx.test_register_patterns

**Reviewers**: Jerry Zhang

**Subscribers**: Jerry Zhang, Supriya Rao

**Tasks**: T104942885

**Tags**: pytorch

Reviewed By: albanD

Differential Revision: D32484427

Pulled By: andrewor14

fbshipit-source-id: 5a048b90eb4da79074c5ceffa3c8153f8d8cd662
2021-11-23 15:26:10 -08:00
998daf44d6 Allow get_attr node to be int64 type (#68818)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68818

Operator support was blocking all nodes with dtype int64 from lowering. This diff eases the condition, allowing inputs from get_attr nodes (which are known not to be used for TRT compute) to have dtype int64.

Reviewed By: brad-mengchi, 842974287

Differential Revision: D32609457

fbshipit-source-id: ea255f3281349a4254cb6abdeed671ab2c0216ba
2021-11-23 15:21:47 -08:00
78dce417a1 [BE] Simplify magma installation logic (#68778)
Summary:
The difference between `CUDA_VERSION` and the magma package name is just a dot between major and minor versions.

While refactoring, I discovered that some docker images set `CUDA_VERSION` to contain the minor revision, so I modified the pattern to strip it, i.e. `cuda-magma102` would be installed for `CUDA_VERSION=10.2.89` and `cuda-magma113` would be installed for `CUDA_VERSION=11.3.0`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68778

Reviewed By: seemethere

Differential Revision: D32605365

Pulled By: malfet

fbshipit-source-id: 43f8edeee5b55fdea6b4d9943874df8e97494ba1
2021-11-23 14:57:44 -08:00
2cd48d14ef Fix test_svd_errors_and_warnings warning message when cuda >= 11.5 (#68683)
Summary:
In SVD cusolverDnXgesvd computations:

When cuda < 11.5, cusolver raises CUSOLVER_STATUS_EXECUTION_FAILED when the input contains nan.
When cuda >= 11.5, cusolver finishes execution normally and sets the info array to indicate a convergence issue.

Related: https://github.com/pytorch/pytorch/issues/68259 #64533

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68683

Reviewed By: dagitses

Differential Revision: D32583576

Pulled By: mruberry

fbshipit-source-id: f732872522e0bda2703450ffcc64ae3a0d3f5bc0
2021-11-23 14:16:23 -08:00
8e343ba5db Revert D32611368: [pytorch][PR] Initial version of general convolution_backward
Test Plan: revert-hammer

Differential Revision:
D32611368 (445b31abff)

Original commit changeset: 26d759b7c908

fbshipit-source-id: e91f45f0f31150e60d657a3964b7e42027beff58
2021-11-23 13:39:36 -08:00
84047ff342 Add API usage logging to ShardedTensor and fix a few tests. (#68771)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68771

ghstack-source-id: 143974518

Test Plan: waitforbuildbot

Reviewed By: fduwjj, wanchaol

Differential Revision: D32601562

fbshipit-source-id: ed624137efab94fbe556609bb40cca14e69d9bac
2021-11-23 13:30:59 -08:00
959cb03132 Populate operator_input_sizes_ (#68542)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68542

title

Test Plan: unittest

Reviewed By: iseeyuan

Differential Revision: D32508159

fbshipit-source-id: 0773a725973a493f19a2e9a340365e559dfdf7f8
2021-11-23 12:18:06 -08:00
c0e6dc9ac7 [pytorch] Fix loading from checkpoint after "maximize" flag was introduced in SGD (#68733)
Summary:
After the 'maximize' flag was introduced in https://github.com/pytorch/pytorch/issues/46480, some jobs fail because they resume training from checkpoints.

After loading an old checkpoint, we get an error during the optimizer.step() call in the backward pass (torch/optim/sgd.py, line 129), because there is no 'maximize' key in the parameter groups of the SGD.

To circumvent this, I add a default value `group.setdefault('maximize', False)` when the optimizer state is restored.
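
A minimal sketch of the shim; the real fix lives in SGD's `__setstate__` in torch/optim/sgd.py, so the subclass here is purely illustrative:

```
import torch

class PatchedSGD(torch.optim.SGD):
    def __setstate__(self, state):
        super().__setstate__(state)
        for group in self.param_groups:
            # Checkpoints saved before the flag existed lack 'maximize'.
            group.setdefault('maximize', False)
```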

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68733

Reviewed By: albanD

Differential Revision: D32480963

Pulled By: asanakoy

fbshipit-source-id: 4e367fe955000a6cb95090541c143a7a1de640c2
2021-11-23 11:42:16 -08:00
73f494d690 .circleci: Remove migrated CUDA 10.2 build (#68782)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68782

These builds are no longer required for slow_gradcheck and should be
removed

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: malfet, janeyx99

Differential Revision: D32606679

Pulled By: seemethere

fbshipit-source-id: e4827a6f217b91c34cfab6c2340e3272f3db1522
2021-11-23 09:50:53 -08:00
23288fdacc Making norms inputs independent (#68526)
Summary:
An update to https://github.com/pytorch/pytorch/issues/67442 to make sure all of the inputs produced are independent.

Updates group_norm and instance_norm (local_response_norm was already producing independent inputs).

Also fixes a bug in one set of instance_norm inputs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68526

Reviewed By: ngimel

Differential Revision: D32532076

Pulled By: samdow

fbshipit-source-id: 45b9320fd9aecead052b21f838f95887cfb71821
2021-11-23 09:41:36 -08:00
e7e1b76106 Require CMake 3.13 when building with Ninja (#68731)
Summary:
There is a bug in CMake's Ninja generator where files considered inputs to the cmake command couldn't be generated by another build step. The fix was included in CMake 3.13, but 3.10.3 is still sufficient for other cmake generators e.g. makefiles.
For reference, the bug is here https://gitlab.kitware.com/cmake/cmake/-/issues/18584

This is necessary for https://github.com/pytorch/pytorch/issues/68246 but I'm isolating the change here to make testing easier.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68731

Reviewed By: jbschlosser

Differential Revision: D32604545

Pulled By: malfet

fbshipit-source-id: 9bc0bd8641ba415dd63ce21a05c177e2f1dd9866
2021-11-23 09:34:20 -08:00
3282386aa4 Added additional string to search cpu flags for vnni detection (#67686)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/67685

cc jerryzh168 jianyuh raghuramank100 jamesr66a vkuzo

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67686

Reviewed By: ejguan

Differential Revision: D32109038

Pulled By: malfet

fbshipit-source-id: 3ea6e4cc1aa82831fd6277129a67c8241a5591a5
2021-11-23 09:32:53 -08:00
98e51895ef [dist_quant] change op registration to each file instead (#68797)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68797

This changes dist quantization op registration to live in each file instead, allowing the torch deploy test to pass
ghstack-source-id: 143994945

Test Plan: wait for sc

Reviewed By: jbschlosser

Differential Revision: D32610679

fbshipit-source-id: 3ade925286f1ed0f65017939f1ad3f5c539e1767
2021-11-23 09:20:26 -08:00
445b31abff Initial version of general convolution_backward (#65219)
Summary:
Towards [convolution consolidation](https://fb.quip.com/tpDsAYtO15PO).

Introduces the general `convolution_backward` function that uses the factored-out backend routing logic from the forward function.

Some notes:
* `finput` is now recomputed in the backward pass for the slow 2d / 3d kernels instead of being saved from the forward pass. The logic for this is based on the forward computation and is present in the `compute_finput2d` / `compute_finput3d` functions in `ConvUtils.h`.
* Using structured kernels for `convolution_backward` requires extra copying since the backend-specific backward functions return tensors. Porting to structured is left as future work.
* The tests that check the routing logic have been renamed from `test_conv_backend_selection` -> `test_conv_backend` and now also include gradcheck validation using an `autograd.Function` hooking up `convolution` to `convolution_backward`. This was done to ensure that gradcheck passes for the same set of inputs / backends.

The forward pass routing is done as shown in this flowchart (probably need to download it for it to be readable since it's ridiculous):
![conv_routing_graph md](https://user-images.githubusercontent.com/75754324/137186002-5bca75ca-f911-4e61-8245-ec07af841506.png)

![conv_nogroup_routing_graph md](https://user-images.githubusercontent.com/75754324/139731619-9d0d436e-cce3-4bc3-8eaf-d469f667f0d7.png)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65219

Reviewed By: mruberry

Differential Revision: D32611368

Pulled By: jbschlosser

fbshipit-source-id: 26d759b7c908ab8f19ecce627acea7bd3d5f59ba
2021-11-23 08:19:45 -08:00
a31aea8eaa [quant][graphmode][fx] Add support for specifying reference quantized module mapping in backend_config_dict (#68227)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68227

This PR adds two keys to backend_config_dict:
"root_module": the root module for the pattern (since we may have patterns for fused ops)
"reference_quantized_module_for_root": the corresponding reference quantized module for the root

Test Plan:
```
python test/test_quant_trt.py TestQuantizeFxTRTOps
python test/test_quant_trt.py TestConvertFxDoNotUse
```

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D32537711

fbshipit-source-id: 6b8f36a219db7bb6633dac53072b748ede8dfa78
2021-11-22 21:35:04 -08:00
b845b9876b [sparsity] Fix for the failing pruner test (#68794)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68794

The pruner `test_constructor` fails because of a typo in the regular expression matching for the error that the pruner throws.
This fixes it.

Test Plan:
Separate test is not needed -- single letter change.
Previous test: `python test/test_ao_sparsity.py -- TestBasePruner`

Reviewed By: ngimel

Differential Revision: D32609589

Pulled By: z-a-f

fbshipit-source-id: 800ef50c8cdbf206087bc6f945d1830e4af83c03
2021-11-22 21:07:24 -08:00
d6a68e0b8d [PyTorch][3/N] Enable the rest forward spec options for ShardedEmbedding and ShardedEmbeddingBag. (#67799)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67799

We have enabled sharded embedding and embedding bag in https://github.com/pytorch/pytorch/pull/67188 and https://github.com/pytorch/pytorch/pull/66604. We now want to support as many of the parameters defined in the docs as possible: https://pytorch.org/docs/stable/generated/torch.nn.functional.embedding_bag.html, https://pytorch.org/docs/stable/generated/torch.nn.functional.embedding.html.

For the ones that we don't support, we just throw an exception.

Last but not least, we use `get` to fetch params instead of indexing directly by key.
ghstack-source-id: 143987066

Test Plan: Unit test & CI

Reviewed By: pritamdamania87

Differential Revision: D31985333

fbshipit-source-id: 3794241b81eecc815bc4390679d0bb0323f4ae72
2021-11-22 20:33:03 -08:00
5d300e761d Add OpInfos for parcel Activation Functions I (#68521)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68521

Reviewed By: jbschlosser

Differential Revision: D32606625

Pulled By: saketh-are

fbshipit-source-id: acf98a07c45bce95b1470bf9856577426265f3d1
2021-11-22 20:01:35 -08:00
74e6d2ce67 fix typos in jit_language_reference.rst (#68706)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/68700

- indent problem

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68706

Reviewed By: mruberry

Differential Revision: D32598916

Pulled By: jbschlosser

fbshipit-source-id: 42af216e83fb48bbd311fc3d41fc3e8f5a2fef08
2021-11-22 19:09:06 -08:00
e7d8f096c9 [sparsity] Fix GPU training for sparsity (#66412)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66412

GPU training was not supported in the sparsifier.
The reason was that when the sparsifier was created, the masks would default to the CPU.
Attaching a GPU model to the sparsifier would throw an error.
The solution is to create the masks on the same device as the weight.
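
A minimal sketch of the fix, with illustrative names rather than the exact sparsifier internals:

```
import torch

def make_mask(weight: torch.Tensor) -> torch.Tensor:
    # Create the mask on the same device as the weight, so a model already
    # moved to GPU can be attached without device-mismatch errors.
    return torch.ones_like(weight)  # ones_like inherits weight's device
```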

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D31590675

Pulled By: z-a-f

fbshipit-source-id: 98c2c1cedc7c60aecea4076e5254ef6b3443139e
2021-11-22 16:49:39 -08:00
0b0674121a Fix strict aliasing rule violation in bitwise_binary_op (#66194)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/66119

Failure on ARM Neoverse N1 before this PR:
```
======================================================================
FAIL: test_bitwise_ops_cpu_int16 (__main__.TestBinaryUfuncsCPU)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/pytorch/pytorch/torch/testing/_internal/common_device_type.py", line 373, in instantiated_test
    result = test(self, **param_kwargs)
  File "test_binary_ufuncs.py", line 315, in test_bitwise_ops
    self.assertEqual(op(a, b), op(a_np, b_np))
  File "/opt/pytorch/pytorch/torch/testing/_internal/common_utils.py", line 1633, in assertEqual
    self.assertEqual(
  File "/opt/pytorch/pytorch/torch/testing/_internal/common_utils.py", line 1611, in assertEqual
    super().assertTrue(result, msg=self._get_assert_msg(msg, debug_msg=debug_msg))
AssertionError: False is not true : Tensors failed to compare as equal!Found 176 different element(s) (out of 225), with the greatest difference of 21850 (-21846 vs. 4) occuring at index (0, 2).

======================================================================
FAIL: test_bitwise_ops_cpu_int32 (__main__.TestBinaryUfuncsCPU)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/pytorch/pytorch/torch/testing/_internal/common_device_type.py", line 373, in instantiated_test
    result = test(self, **param_kwargs)
  File "test_binary_ufuncs.py", line 315, in test_bitwise_ops
    self.assertEqual(op(a, b), op(a_np, b_np))
  File "/opt/pytorch/pytorch/torch/testing/_internal/common_utils.py", line 1633, in assertEqual
    self.assertEqual(
  File "/opt/pytorch/pytorch/torch/testing/_internal/common_utils.py", line 1611, in assertEqual
    super().assertTrue(result, msg=self._get_assert_msg(msg, debug_msg=debug_msg))
AssertionError: False is not true : Tensors failed to compare as equal!Found 188 different element(s) (out of 225), with the greatest difference of 1335341061 (-1335341056 vs. 5) occuring at index (14, 8).

----------------------------------------------------------------------
```
which passes now.

CC malfet ezyang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66194

Reviewed By: dagitses, bdhirsh, ngimel

Differential Revision: D31430274

Pulled By: malfet

fbshipit-source-id: bcf1c9d584c02eff328dd5b1f7af064fac5942c9
2021-11-22 16:43:09 -08:00
d176c82bd5 [sparsity] Fix and enable the pruning tests (#66411)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66411

The original tests were disabled, and had some bugs. This fixes those unittests.

Test Plan: Imported from OSS

Reviewed By: HDCharles

Differential Revision: D31590678

Pulled By: z-a-f

fbshipit-source-id: ddbed34cc01d5f15580cb8f0033416f2f9780068
2021-11-22 15:28:12 -08:00
b46c89d950 Add linalg.solve_triangular (#63568)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63568

This PR adds the first solver with structure to `linalg`. This solver
has an API compatible with that of `linalg.solve`, preparing both for a
possible future merge of the APIs. The new API:
- Just returns the solution, rather than the solution and a copy of `A`
- Removes the confusing `transpose` argument and replaces it with correct
handling of conj and strides within the call
- Adds a `left=True` kwarg. This could be achieved via transposes of the
inputs and the result, but it's exposed for convenience.

This PR also implements a dataflow that minimises the number of copies
needed before calling LAPACK / MAGMA / cuBLAS and takes advantage of the
conjugate and neg bits.

This algorithm is implemented for `solve_triangular` (which, for this, is
the most complex of all the solvers due to the `upper` parameters).
Once more solvers are added, we will factor out this calling algorithm,
so that all of them can take advantage of it.

Given the complexity of this algorithm, we implement some thorough
testing. We also added tests for all the backends, which was not done
before.

We also add forward AD support for `linalg.solve_triangular` and improve the
docs of `linalg.solve_triangular`. We also fix a few issues with those of
`torch.triangular_solve`.
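
A hedged usage sketch of the new API (the matrix is made diagonally dominant only to keep the example well-conditioned):

```
import torch

A = torch.randn(3, 3).triu() + 4 * torch.eye(3)  # upper triangular
B = torch.randn(3, 2)

X = torch.linalg.solve_triangular(A, B, upper=True)  # solves A X = B
torch.testing.assert_close(A @ X, B)

# left=False solves X A = B instead:
B2 = torch.randn(2, 3)
X2 = torch.linalg.solve_triangular(A, B2, upper=True, left=False)
torch.testing.assert_close(X2 @ A, B2)
```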

Resolves https://github.com/pytorch/pytorch/issues/54258
Resolves https://github.com/pytorch/pytorch/issues/56327
Resolves https://github.com/pytorch/pytorch/issues/45734

cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D32588230

Pulled By: mruberry

fbshipit-source-id: 69e484849deb9ad7bb992cc97905df29c8915910
2021-11-22 12:41:06 -08:00
a2e35e167b refactor: update f-string for swa.utils.py (#68718)
Summary:
Update some old-style format strings to f-strings, for consistency throughout.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68718

Reviewed By: jbschlosser

Differential Revision: D32593746

Pulled By: albanD

fbshipit-source-id: fcc17958f8af6a3260beca883bc1065f019dcf0e
2021-11-22 11:23:18 -08:00
9554ebe44e [Dist CI][BE] c10d gloo tests run in subprocess (#68504)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68504

Per title
ghstack-source-id: 143928767

Test Plan: CI

Reviewed By: H-Huang

Differential Revision: D32485100

fbshipit-source-id: a55687aea4af69e3830aee6f0278550c72f142c2
2021-11-22 09:54:07 -08:00
ddc22ea3b2 [Dist CI][BE] test_c10d_nccl run in subprocess (#68503)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68503

Per title
ghstack-source-id: 143928768

Test Plan: CI

Reviewed By: H-Huang

Differential Revision: D32484990

fbshipit-source-id: 6682f46256af0da5153e5087a91a7044156dd17f
2021-11-22 09:52:58 -08:00
39ec0f321b GHA: add print_tests_stats step to MacOS workflow (#68669)
Summary:
This will allow trunk CI to print test stats and upload stats (test reports, flaky tests, failed tests) to
- Scribe
- S3
- RDS

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68669

Reviewed By: dagitses

Differential Revision: D32578169

Pulled By: janeyx99

fbshipit-source-id: c348e2070402754789f462b52cd71411984102e2
2021-11-22 08:26:52 -08:00
a66ff81837 [DataPipe] Optimize Grouper from N^2 to N (#68647)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68647

Fixes #68539

When all data from the source DataPipe is depleted, there is no need to search for and yield the biggest group in the buffer.
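
A minimal sketch of the idea, with illustrative names rather than the actual Grouper internals:

```
def drain_buffer(buffer: dict):
    # Once the source is depleted every group must be yielded anyway, so
    # iterate the buffer directly (O(N)) instead of repeatedly searching
    # for and removing the biggest group (O(N^2)).
    yield from buffer.values()
    buffer.clear()
```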

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D32562646

Pulled By: ejguan

fbshipit-source-id: ce91763656bc457e9c7d0af5861a5606c89965d5
2021-11-22 07:49:13 -08:00
148f323856 Revert D32541986: [pytorch][PR] [opinfo] use dtypes instead of dtypesIfCPU
Test Plan: revert-hammer

Differential Revision:
D32541986 (d2a90f91bc)

Original commit changeset: 793d7d22c3ec

fbshipit-source-id: c60c4be3416f6feb658b5da1bdf75f0cbe6bee24
2021-11-22 04:58:01 -08:00
7c6a8a47db [BE] minor improvement to dist quantization (#67401)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67401

some minor changes to dist quantization, mainly changing the namespace and adding some notes for future code dedup
ghstack-source-id: 143910067
ghstack-source-id: 143910067

Test Plan: wait for ci

Reviewed By: mrshenli

Differential Revision: D31979269

fbshipit-source-id: 85a2f395e6a3487dd0b9d1fde886eccab106e289
2021-11-21 23:31:59 -08:00
fb556c91ce [BE] delete frontend.cpp (#67400)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67400

c10d/frontend.cpp was originally proposed to introduce a pure C++ API and use TorchBind to share the Python-level API with TorchScript. This is no longer needed, so delete it to reduce code redundancy.
ghstack-source-id: 143910066
ghstack-source-id: 143910066

Test Plan: wait for ci

Reviewed By: navahgar

Differential Revision: D31979270

fbshipit-source-id: 6ceb8b53d67ab8f9aef44b34da79346dfbb51225
2021-11-21 23:30:52 -08:00
d2a90f91bc [opinfo] use dtypes instead of dtypesIfCPU (#67619)
Summary:
Replace usage of `dtypesIfCPU` with `dtypes` in OpInfo class and also make it a mandatory argument.

Also added DeprecationWarning on using `dtypesIfCPU`

This raises a question:
For an OpInfo entry, currently `dtypes` works for any external backend, `dtypesIfCPU` for CPU, and `dtypesIfCUDA` and `dtypesIfROCM` for CUDA and ROCm respectively.

If we merge `dtypes` and `dtypesIfCPU`, then cases where an external backend's `dtypes` don't match the CPU `dtypes` will lead to failures.

Currently there are a few issues (5 failures) due to this on XLA (we may add relevant skips for them). If we agree that skips should be added, should they be added via OpInfo using the decorator mechanism, or at the XLA end? The XLA end makes more sense to me, to have one source of skips.

<details>

<summary>XLA Fail Log</summary>

```
Nov 01 11:48:26 ======================================================================
Nov 01 11:48:26 ERROR [0.016s]: test_reference_eager_histogram_xla_float32 (__main__.TestOpInfoXLA)
Nov 01 11:48:26 ----------------------------------------------------------------------
Nov 01 11:48:26 Traceback (most recent call last):
Nov 01 11:48:26   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_device_type.py", line 371, in instantiated_test
Nov 01 11:48:26     result = test(self, **param_kwargs)
Nov 01 11:48:26   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_device_type.py", line 737, in test_wrapper
Nov 01 11:48:26     return test(*args, **kwargs)
Nov 01 11:48:26   File "/var/lib/jenkins/workspace/xla/test/test_ops.py", line 411, in test_reference_eager
Nov 01 11:48:26     self.compare_with_eager_reference(op, sample_input)
Nov 01 11:48:26   File "/var/lib/jenkins/workspace/xla/test/test_ops.py", line 397, in compare_with_eager_reference
Nov 01 11:48:26     cpu_inp, cpu_args, cpu_kwargs = cpu(sample_input)
Nov 01 11:48:26   File "/var/lib/jenkins/workspace/xla/test/test_ops.py", line 393, in cpu
Nov 01 11:48:26     sample.args), to_cpu(sample.kwargs)
Nov 01 11:48:26   File "/var/lib/jenkins/workspace/xla/test/test_ops.py", line 386, in to_cpu
Nov 01 11:48:26     return {k: to_cpu(v) for k, v in x.items()}
Nov 01 11:48:26   File "/var/lib/jenkins/workspace/xla/test/test_ops.py", line 386, in <dictcomp>
Nov 01 11:48:26     return {k: to_cpu(v) for k, v in x.items()}
Nov 01 11:48:26   File "/var/lib/jenkins/workspace/xla/test/test_ops.py", line 390, in to_cpu
Nov 01 11:48:26     raise ValueError("Unknown type {0}!".format(type(x)))
Nov 01 11:48:26 ValueError: Unknown type <class 'NoneType'>!
Nov 01 11:48:26
Nov 01 11:48:26 ======================================================================
Nov 01 11:48:26 FAIL [0.575s]: test_reference_eager___rmatmul___xla_int64 (__main__.TestOpInfoXLA)
Nov 01 11:48:26 ----------------------------------------------------------------------
Nov 01 11:48:26 Traceback (most recent call last):
Nov 01 11:48:26   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_device_type.py", line 371, in instantiated_test
Nov 01 11:48:26     result = test(self, **param_kwargs)
Nov 01 11:48:26   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_device_type.py", line 737, in test_wrapper
Nov 01 11:48:26     return test(*args, **kwargs)
Nov 01 11:48:26   File "/var/lib/jenkins/workspace/xla/test/test_ops.py", line 411, in test_reference_eager
Nov 01 11:48:26     self.compare_with_eager_reference(op, sample_input)
Nov 01 11:48:26   File "/var/lib/jenkins/workspace/xla/test/test_ops.py", line 402, in compare_with_eager_reference
Nov 01 11:48:26     self.assertEqual(actual, expected, exact_dtype=True, exact_device=False)
Nov 01 11:48:26   File "/var/lib/jenkins/workspace/xla/test/pytorch_test_base.py", line 607, in assertEqual
Nov 01 11:48:26     return DeviceTypeTestBase.assertEqual(self, x, y, *args, **kwargs)
Nov 01 11:48:26   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_utils.py", line 1903, in assertEqual
Nov 01 11:48:26     super().assertTrue(result, msg=self._get_assert_msg(msg, debug_msg=debug_msg))
Nov 01 11:48:26 AssertionError: False is not true : Tensors failed to compare as equal!With rtol=0.001 and atol=0.001, found 44 element(s) (out of 50) whose difference(s) exceeded the margin of error (including 0 nan comparisons). The greatest difference was 9.187201950435738e+18 (-9.187201950435738e+18 vs. 34.0), which occurred at index (0, 4).
Nov 01 11:48:26
Nov 01 11:48:26 ======================================================================
Nov 01 11:48:26 FAIL [0.137s]: test_reference_eager_linalg_multi_dot_xla_int64 (__main__.TestOpInfoXLA)
Nov 01 11:48:26 ----------------------------------------------------------------------
Nov 01 11:48:26 Traceback (most recent call last):
Nov 01 11:48:26   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_device_type.py", line 371, in instantiated_test
Nov 01 11:48:26     result = test(self, **param_kwargs)
Nov 01 11:48:26   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_device_type.py", line 737, in test_wrapper
Nov 01 11:48:26     return test(*args, **kwargs)
Nov 01 11:48:26   File "/var/lib/jenkins/workspace/xla/test/test_ops.py", line 411, in test_reference_eager
Nov 01 11:48:26     self.compare_with_eager_reference(op, sample_input)
Nov 01 11:48:26   File "/var/lib/jenkins/workspace/xla/test/test_ops.py", line 402, in compare_with_eager_reference
Nov 01 11:48:26     self.assertEqual(actual, expected, exact_dtype=True, exact_device=False)
Nov 01 11:48:26   File "/var/lib/jenkins/workspace/xla/test/pytorch_test_base.py", line 607, in assertEqual
Nov 01 11:48:26     return DeviceTypeTestBase.assertEqual(self, x, y, *args, **kwargs)
Nov 01 11:48:26   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_utils.py", line 1903, in assertEqual
Nov 01 11:48:26     super().assertTrue(result, msg=self._get_assert_msg(msg, debug_msg=debug_msg))
Nov 01 11:48:26 AssertionError: False is not true : Tensors failed to compare as equal!With rtol=0.001 and atol=0.001, found 4 element(s) (out of 4) whose difference(s) exceeded the margin of error (including 0 nan comparisons). The greatest difference was 140230883884432.0 (0.0 vs. 140230883884432.0), which occurred at index (0, 0).
Nov 01 11:48:26
Nov 01 11:48:26 ======================================================================
Nov 01 11:48:26 FAIL [0.461s]: test_reference_eager_matmul_xla_int64 (__main__.TestOpInfoXLA)
Nov 01 11:48:26 ----------------------------------------------------------------------
Nov 01 11:48:26 Traceback (most recent call last):
Nov 01 11:48:26   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_device_type.py", line 371, in instantiated_test
Nov 01 11:48:26     result = test(self, **param_kwargs)
Nov 01 11:48:26   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_device_type.py", line 737, in test_wrapper
Nov 01 11:48:26     return test(*args, **kwargs)
Nov 01 11:48:26   File "/var/lib/jenkins/workspace/xla/test/test_ops.py", line 411, in test_reference_eager
Nov 01 11:48:26     self.compare_with_eager_reference(op, sample_input)
Nov 01 11:48:26   File "/var/lib/jenkins/workspace/xla/test/test_ops.py", line 402, in compare_with_eager_reference
Nov 01 11:48:26     self.assertEqual(actual, expected, exact_dtype=True, exact_device=False)
Nov 01 11:48:26   File "/var/lib/jenkins/workspace/xla/test/pytorch_test_base.py", line 607, in assertEqual
Nov 01 11:48:26     return DeviceTypeTestBase.assertEqual(self, x, y, *args, **kwargs)
Nov 01 11:48:26   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_utils.py", line 1903, in assertEqual
Nov 01 11:48:26     super().assertTrue(result, msg=self._get_assert_msg(msg, debug_msg=debug_msg))
Nov 01 11:48:26 AssertionError: False is not true : Tensors failed to compare as equal!With rtol=0.001 and atol=0.001, found 37 element(s) (out of 50) whose difference(s) exceeded the margin of error (including 0 nan comparisons). The greatest difference was 7.661375630332297e+18 (-7.66128151259864e+18 vs. 94117733658072.0), which occurred at index (4, 5).
Nov 01 11:48:26
Nov 01 11:48:26 ======================================================================
Nov 01 11:48:26 FAIL [0.050s]: test_reference_eager_remainder_autodiffed_xla_int64 (__main__.TestOpInfoXLA)
Nov 01 11:48:26 ----------------------------------------------------------------------
Nov 01 11:48:26 Traceback (most recent call last):
Nov 01 11:48:26   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_device_type.py", line 371, in instantiated_test
Nov 01 11:48:26     result = test(self, **param_kwargs)
Nov 01 11:48:26   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_device_type.py", line 737, in test_wrapper
Nov 01 11:48:26     return test(*args, **kwargs)
Nov 01 11:48:26   File "/var/lib/jenkins/workspace/xla/test/test_ops.py", line 411, in test_reference_eager
Nov 01 11:48:26     self.compare_with_eager_reference(op, sample_input)
Nov 01 11:48:26   File "/var/lib/jenkins/workspace/xla/test/test_ops.py", line 402, in compare_with_eager_reference
Nov 01 11:48:26     self.assertEqual(actual, expected, exact_dtype=True, exact_device=False)
Nov 01 11:48:26   File "/var/lib/jenkins/workspace/xla/test/pytorch_test_base.py", line 607, in assertEqual
Nov 01 11:48:26     return DeviceTypeTestBase.assertEqual(self, x, y, *args, **kwargs)
Nov 01 11:48:26   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_utils.py", line 1903, in assertEqual
Nov 01 11:48:26     super().assertTrue(result, msg=self._get_assert_msg(msg, debug_msg=debug_msg))
Nov 01 11:48:26 AssertionError: False is not true : Tensors failed to compare as equal!Attempted to compare equality of tensors with different dtypes. Got dtypes torch.int64 and torch.float32.
Nov 01 11:48:26
Nov 01 11:48:26 ----------------------------------------------------------------------
```

</details>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67619

Reviewed By: ngimel

Differential Revision: D32541986

Pulled By: mruberry

fbshipit-source-id: 793d7d22c3ec9b4778784254ef6f9c980b4b0ce2
2021-11-21 21:52:38 -08:00
2d06c081ca Fix test issue with householder_product for non-contiguous inputs. (#68231)
Summary:
Fixes failing tests for `householder_product` due to non-contiguous inputs as shown here: https://github.com/pytorch/pytorch/issues/67513.

The floating point error was set too high for the complex64 type, so this PR reduces the error threshold for that particular type.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68231

Reviewed By: dagitses

Differential Revision: D32562774

Pulled By: mruberry

fbshipit-source-id: edae4447ee257076f53abf79f55c5ffa1a9b3cb2
2021-11-21 21:47:23 -08:00
3b3dc1ade8 Sparse CSR CPU: add triangular_solve_out (#62180)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62180

This PR adds CPU dispatch for `triangular_solve` with sparse CSR matrix.
The implementation uses MKL Sparse library. If it's not available then a runtime error is thrown.

cc nikitaved pearu cpuhrsch IvanYashchuk

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D32581395

Pulled By: cpuhrsch

fbshipit-source-id: 41c7133a0d2754ef60b5a7f1d14aa0bf7680a844
2021-11-21 21:29:20 -08:00
e1c449ff34 dbr quant overhead[9/x]: precalculate when to skip op_convert_after_hook (#68432)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68432

Speeds up `op_convert_after_hook` by precalculating, based on information
gathered while tracing, when this hook is a no-op, and skipping execution
when that flag is true.
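
A hedged sketch of the pattern, with illustrative names rather than the exact DBR internals:

```
class SeenOpInfo:
    def __init__(self, convert_hook_is_noop: bool):
        # Decided once during tracing, so the per-call check is a cheap
        # attribute read instead of recomputing the condition every forward.
        self.convert_hook_is_noop = convert_hook_is_noop

def op_convert_after_hook(op_info: SeenOpInfo, output):
    if op_info.convert_hook_is_noop:
        return output  # fast path: skip all conversion work
    # ... otherwise run the full (more expensive) conversion logic
    return output
```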

```
MobileNetV2, function level profiling, 1x3x224x224

// before
op_convert_before_hook = 3.25%

// after
op_convert_before_hook = 1.35%
```

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

Reviewed By: jerryzh168

Differential Revision: D32463752

Pulled By: vkuzo

fbshipit-source-id: b0c3d37909ddc8c254fe53f90954f625ae874e3b
2021-11-21 07:08:29 -08:00
ba230de118 dbr quant: remove more asserts from hot paths (#68431)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68431

Asserts have some overhead; this removes the asserts used only to make
mypy happy from the path that is hit on every forward.

Test Plan: python test/test_quantization.py TestQuantizeDBR

Reviewed By: jerryzh168

Differential Revision: D32463767

Pulled By: vkuzo

fbshipit-source-id: 5f85f80144f35a725afe481bf027ea61ca6315bf
2021-11-21 07:08:26 -08:00
95c00cf029 speed up quantized relu6 inplace kernel (#68404)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68404

The qclamp kernel is equal (non-inplace) or faster (inplace) compared to the
qrelu6 kernel. This removes the qrelu6 kernel and routes qrelu6 to the
qclamp kernel instead.

Test Plan:
```
// correctness
python test/test_quantization.py TestQuantizedOps.test_qrelu6

// benchmarking
import torch
import torch.nn.functional as F
toq = torch.ops.quantized
import time

N_WARMUP = 5
N_ITER = 1000

data = torch.randn(32, 32, 64, 64)
data = torch.quantize_per_tensor(data, 0.05, 0, torch.quint8)

for _ in range(N_WARMUP):
    F.hardtanh(data, 0., 6., inplace=True)
t1 = time.time()
for _ in range(N_ITER):
    F.hardtanh(data, 0., 6., inplace=True)
t2 = time.time()

for _ in range(N_WARMUP):
    toq.relu6(data, inplace=True)
t3 = time.time()
for _ in range(N_ITER):
    toq.relu6(data, inplace=True)
t4 = time.time()

t_hardtanh = t2 - t1
t_qrelu6 = t4 - t3
print(t_hardtanh, t_qrelu6)

// before
0.7156341075897217 1.4007949829101562

// after
0.6825599670410156 0.6571671962738037
```

Reviewed By: jerryzh168

Differential Revision: D32463754

Pulled By: vkuzo

fbshipit-source-id: a87fe5907d7b71d87eb1d5f6588cd509a88f2969
2021-11-21 07:08:23 -08:00
592053f115 dbr quant: simplify relatedness logic (#68374)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68374

Cleans up the relatedness logic in DBR quant. For now, this is still
duplicated with NS.  A future PR should unify these mappings.

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

Reviewed By: jerryzh168

Differential Revision: D32463750

Pulled By: vkuzo

fbshipit-source-id: 90c2f5e79b86b1b595bd52650305bad88212ed49
2021-11-21 07:08:20 -08:00
f1021bcf38 dbr quant overhead[8/x]: small speedup in op_needs_quantization (#68373)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68373

Removes redundant logic in `op_needs_quantization`, for a small speedup.

Test Plan:
```
// MobileNetV2, 1x3x224x224 input, % of time spent by function during DBR convert

// before
cur_op_needs_hooks - 0.76%
op_needs_quantization - 0.41%

// after
cur_op_needs_hooks - 0.70%
op_needs_quantization - 0.36%
```

Reviewed By: jerryzh168

Differential Revision: D32463762

Pulled By: vkuzo

fbshipit-source-id: 334591c514dfa5af6fabc1390005088e8c5ca952
2021-11-21 07:08:17 -08:00
74ba1067a6 dbr quant overhead[7/x]: speed up AutoQuantizationState.reset_to_new_call (#68372)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68372

Speeds up `AutoQuantizationState.reset_to_new_call` by going around
the getattr and setattr overhead in `torch.nn.Module`.

Test Plan:
```
// MobileNetV2, 1x3x224x224 input, % of time spent by function during DBR convert

// before
reset_to_new_call - 1.09%

// after
reset_to_new_call - 0.18%
```

Reviewed By: jerryzh168

Differential Revision: D32463759

Pulled By: vkuzo

fbshipit-source-id: f3faa464372b0703f7d246680d62acd2782453e3
2021-11-21 07:08:15 -08:00
b7d58745c8 dbr quant overhead[6/x]: remove unneeded isinstance checks in op_convert_before_hook (#68371)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68371

`isinstance` has some overhead, so this changes the code in `op_convert_before_hook`
to use the information calculated during tracing instead, which is cheaper.

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

function level benchmarking
```
// MobileNetV2, 1x3x224x224 input, % of time spent by function during DBR convert

// before
op_convert_before_hook = 3.55%
isinstance = 1.62%

// after
op_convert_before_hook = 2.89%
```

Reviewed By: jerryzh168

Differential Revision: D32463757

Pulled By: vkuzo

fbshipit-source-id: 129efe9c279a41f55b8bfd09132e21c0066298a6
2021-11-21 07:08:12 -08:00
b3a7d696b3 dbr quant overhead[5/x]: remove unnecessary asserts (#68370)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68370

Removes asserts which are duplicate (the same condition is checked
when calculating the hook type, so there is no need to check it again).
For the assert in `validate_is_at_last_seen_idx`, rewrites it to
raise an Error instead to ensure it does not get stripped in
production environments.

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

Reviewed By: jerryzh168

Differential Revision: D32463766

Pulled By: vkuzo

fbshipit-source-id: 8a7b7e0bf270bc327f49bd3e5bd156339e846381
2021-11-21 07:08:09 -08:00
16a6e0612d dbr quant: clean up key types in AutoQuantizationState mappings (#68369)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68369

`AutoQuantizationState` has various mappings keyed on IDs. Only
`tensor_id_to_observer` actually needs string keys because it is an
`torch.nn.ModuleDict`.  This PR changes the other mappings to have
integer keys, for simplicity and performance.
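
A short sketch of the distinction, using representative mapping names:

```python
import torch

# torch.nn.ModuleDict requires string keys, so observer IDs must be stringified
tensor_id_to_observer = torch.nn.ModuleDict()
tensor_id_to_observer[str(0)] = torch.nn.Identity()  # stand-in for an observer

# plain dicts can use integer keys directly, which is simpler and faster
idx_to_seen_op_infos = {0: "seen_op_info_for_op_0"}
```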

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

Reviewed By: jerryzh168

Differential Revision: D32463765

Pulled By: vkuzo

fbshipit-source-id: 5a9bf2a1102859097eedf1e536761084cd408856
2021-11-21 07:08:06 -08:00
3fc9bc43c6 dbr quant overhead[4/x]: speed up hook type calculations (#68351)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68351

Speeds up `get_module_hook_type` and `get_torch_function_hook_type` by
bypassing the expensive `torch.nn.Module` getters and setters and
fetching `_auto_quant_state` directly.

Test Plan:
Model level benchmarking is noisy.  Individual `cProfile` results:

```
// MobileNetV2, 1x3x224x224 input, % of time spent by function during DBR convert

// before
get_module_hook_type - 5.96%
get_torch_function_hook_type - 2.24%

// after
get_module_hook_type - 2.10%
get_torch_function_hook_type - 0.57%
```

Reviewed By: jerryzh168

Differential Revision: D32463756

Pulled By: vkuzo

fbshipit-source-id: 6eb199052ddf8d78f1c123a427e7437fc7c4fe58
2021-11-21 07:08:03 -08:00
c72ffee497 dbr quant overhead[3/x]: speed up AutoQuantizationState.mark_cur_op_complete (#68350)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68350

`torch.nn.Module` has overhead for getting and setting attributes because
it does various type checks on the attribute.

This PR explicitly gets and sets the right thing for this particular
function, avoiding the type checks. Model-level benchmarks are too noisy,
but according to function level profiling this reduces the time spent in
this function in a quantized model from 2.60% to 0.53%, on MobileNetV2 with
input size 1x3x224x224.
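
A minimal sketch of the bypass, assuming a plain integer counter attribute (`idx` here is illustrative, not the actual field name):

```python
import torch

class AutoQuantizationState(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.idx = 0

    def mark_cur_op_complete_slow(self):
        # goes through nn.Module.__setattr__, which runs isinstance checks
        self.idx += 1

    def mark_cur_op_complete_fast(self):
        # direct instance-dict access skips nn.Module's attribute machinery
        self.__dict__["idx"] = self.__dict__["idx"] + 1
```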

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

Reviewed By: albanD

Differential Revision: D32463751

Pulled By: vkuzo

fbshipit-source-id: a29beed2a2b87ca4df675a30dd591f797c8a1dbe
2021-11-21 07:06:42 -08:00
c7ecf1498d dbr quant overhead[2/x]: precalculate op_convert_info (#68347)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68347

Moves `op_convert_info` to be precalculated in the convert step
instead of calculated dynamically.  This should help with framework
overhead.

Test Plan:
Noisy benchmark:

```
// before

fp32: 0.016103 seconds avg
fx_prepared: 0.019841 seconds avg, 0.811601 speedup vs fp32
fx_quantized: 0.011907 seconds avg, 1.352346 speedup vs fp32
dt_prepared: 0.035055 seconds avg, 0.459357 speedup vs fp32
dt_quantized: 0.018891 seconds avg, 0.852417 speedup vs fp32

// after

fp32: 0.020535 seconds avg
fx_prepared: 0.023071 seconds avg, 0.890070 speedup vs fp32
fx_quantized: 0.011693 seconds avg, 1.756206 speedup vs fp32
dt_prepared: 0.038691 seconds avg, 0.530734 speedup vs fp32
dt_quantized: 0.021109 seconds avg, 0.972793 speedup vs fp32
```

The benchmark is too noisy to rely on, but according to `cProfile`
this removes about 5% of overhead.

Reviewed By: jerryzh168

Differential Revision: D32463761

Pulled By: vkuzo

fbshipit-source-id: e2ad0d7eeff7dbadf3aa379604bfe9bec0c228fe
2021-11-20 15:17:12 -08:00
9fba8971a7 dbr quant: move model level utils into own file (#68346)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68346

Some utility functions for DBR quant need to be aware
of `AutoQuantizationState`.  This PR moves them into their own file, so they
can use the type directly without circular imports, and removes the mypy
ignores which are no longer necessary after this change.

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

Reviewed By: jerryzh168

Differential Revision: D32463763

Pulled By: vkuzo

fbshipit-source-id: e2c367de0d5887c61e6d2c3a73d82f7d76af3de1
2021-11-20 15:17:10 -08:00
629f9a5532 dbr quant: clean up AutoQuantizationState.get_op_convert_info flag (#68345)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68345

Removes a flag to unwrap scale and zp which was only needed by
the FX rewriter. Moves the logic to happen in the FX tracer instead.
This resolves a technical debt TODO.

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

Reviewed By: jerryzh168

Differential Revision: D32463764

Pulled By: vkuzo

fbshipit-source-id: ba7c976664c95111174fb65488bdac62b4f4984d
2021-11-20 15:17:07 -08:00
52cc9cb0ee dbr quant: refactor AutoQuantizationState._get_packed_param_name (#68344)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68344

Makes `AutoQuantizationState._get_packed_param_name` use `seen_op_info`
instead of the current op. This will make future performance improvements
easier.

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

Reviewed By: albanD

Differential Revision: D32463758

Pulled By: vkuzo

fbshipit-source-id: 0c16fe4bc989cb66180ad674ec55060cd970e32e
2021-11-20 15:17:04 -08:00
2755cf457c dbr quant: refactor AutoQuantizationState._get_input_args_quant_dequant_info (#68343)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68343

Refactors `AutoQuantizationState._get_input_args_quant_dequant_info` to
use less internal state, makes the function have no side effects by passing
the state in the arguments, and moves the function to utils file.

This will help with a future refactor to cache this info at runtime.

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

Reviewed By: jerryzh168

Differential Revision: D32463760

Pulled By: vkuzo

fbshipit-source-id: bdd50b0772f128755f9b734b5eeb0a9f4bc4970b
2021-11-20 15:17:02 -08:00
57472ec414 dbr quant: refactor get_quantized_op to only use seen_op_info (#68342)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68342

Before this PR, `get_quantized_op` required the current callable.

After this PR, `get_quantized_op` only requires `seen_op_info`.
The signature was changed slightly to return `None` if the original
callable does not need replacement for quantization.

This will make it easier to make performance improvements in a
future PR.

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

Reviewed By: jerryzh168

Differential Revision: D32463768

Pulled By: vkuzo

fbshipit-source-id: 5db2c4199f6c0529817f4c058f81fd1d32b9fa9f
2021-11-20 15:16:59 -08:00
9cf4779ec9 dbr quant: refactor get_func_output_obs_type to only use seen_op_info (#68341)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68341

Before this PR, `get_func_output_obs_type` used information from the
incoming op and its arguments, which makes it hard to cache.

This PR refactors `get_func_output_obs_type` to only use information
collected during tracing. This will make it easier to make performance
improvements in a future PR.

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

Reviewed By: jerryzh168

Differential Revision: D32463755

Pulled By: vkuzo

fbshipit-source-id: 25a220de652f0285685d43aedf7392082104b26c
2021-11-20 15:16:56 -08:00
f8b084c563 dbr quant overhead[1/x]: remove expensive calls to named_modules (#68309)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68309

This is the first of a series of PRs to reduce overhead of DBR quantization
prototype. For now, the measurement of this work is not super scientific as
there are a lot of low hanging fruit.  As we speed up the prototype, we
might need to invest in better benchmarking.

Current benchmarking setup:
* mac OS laptop with OMP_NUM_THREADS=1
* torchvision's mobilenet_v2
* input size 1x3x224x224
* we measure fp32 forward, prepared and quantized forward with FX quant vs DBR quant

Note that due to small input size, this benchmark is pretty noisy.
The goal here is to measure overhead of DBR quant logic (not the kernels),
so small input is good as we want the kernels to take as little % of overall
time as possible.

High level goal is for DBR quant convert forward to approach the FX time.

This first PR removes the expensive named_modules calls and resets the op
counter in the op instead. According to cProfile, this should be a 2 to 3% win.

Test Plan:
```
benchmark: https://gist.github.com/vkuzo/1a4f98ca541161704ee3c305d7740d4a

// before

fp32: 0.020101 seconds avg
fx_prepared: 0.020915 seconds avg, 0.961083 speedup vs fp32
fx_quantized: 0.012037 seconds avg, 1.670005 speedup vs fp32
dt_prepared: 0.037506 seconds avg, 0.535953 speedup vs fp32
dt_quantized: 0.022688 seconds avg, 0.885988 speedup vs fp32

// after

fp32: 0.020722 seconds avg
fx_prepared: 0.023417 seconds avg, 0.884893 speedup vs fp32
fx_quantized: 0.014834 seconds avg, 1.396942 speedup vs fp32
dt_prepared: 0.039120 seconds avg, 0.529700 speedup vs fp32
dt_quantized: 0.020063 seconds avg, 1.032831 speedup vs fp32
```

Reviewed By: albanD

Differential Revision: D32463753

Pulled By: vkuzo

fbshipit-source-id: 1d7de7d9c4837e2b0ec815f0f67014c7600bb16c
2021-11-20 15:16:53 -08:00
ed6ef0eec4 dbr quantization: inline scale and zp (#68251)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68251

Before this PR, DBR quantization used to recalculate scale and zero_point
in the converted model every time it was needed, which is slow.
This PR creates a pass during the convert function to go through every
observer in the model and cache its scale and zero_point.

Note: restricting this to only the observers that correspond to int8 operations
is left for a future PR.
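
A minimal sketch of such a caching pass, using a standalone `MinMaxObserver` for illustration (the actual pass operates on DBR quant's observers):

```python
import torch
from torch.ao.quantization import MinMaxObserver

model = torch.nn.Sequential(torch.nn.Linear(4, 4), MinMaxObserver())
model(torch.randn(2, 4))  # populate observer statistics

cached = {}
for name, mod in model.named_modules():
    if hasattr(mod, "calculate_qparams"):
        scale, zero_point = mod.calculate_qparams()
        # computed once at convert time, reused on every forward
        cached[name] = (float(scale), int(zero_point))
```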

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```

Reviewed By: VitalyFedyunin

Differential Revision: D32463769

Pulled By: vkuzo

fbshipit-source-id: d1d2e598e2bccc1958e5023096b451d69dc34e29
2021-11-20 15:16:51 -08:00
ca499567d2 barebones numeric suite for quantization with dynamic tracing (#67776)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67776

This adds a barebones `add_loggers` and `extract_logger_info` API
to analyze intermediate activations of models using quantization
with dynamic tracing.  The API generally matches the NS for FX tool,
with some omissions.  For now, this is moving fast to help us
debug real models, and the API will be 100% aligned before this is marketed to users,
in future PRs.

Note: the current approach couples Numeric Suite with the quantization
logic. This is not the best for composability, and may be changed
at a future time.

Test Plan:
```
python test/test_quantization.py TestAutoTracing.test_numeric_suite
```

Differential Revision: D32231332

Reviewed By: jerryzh168

Pulled By: vkuzo

fbshipit-source-id: 8adfb50cd8b7836c391669afe2e2ff6acae6d40a
2021-11-20 15:15:48 -08:00
d0eff8d846 Strided masked softmin. (#68463)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68463

Test Plan: Imported from OSS

Reviewed By: dagitses

Differential Revision: D32576497

Pulled By: cpuhrsch

fbshipit-source-id: 286edb2e7a5415df76858c69d0312743437b0fd8
2021-11-19 20:51:42 -08:00
75955e4ef8 [clone][sparse] Add torch._C._sparse namespace (#68672)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68672

This PR adds `python_module: sparse` to `native_function.yaml`.
These functions would appear in `torch._C._sparse` namespace instead of
just `torch`.

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D32517813

fbshipit-source-id: 7c3d6df57a24d7c7354d0fefe1b628dc89be9431
2021-11-19 19:47:38 -08:00
95f4cd0ba9 Implement topk with sort for some cases (#68632)
Summary:
Benchmark that compares original implementation and the sort implementation (this code should run on a branch without this patch):
```python
import torch
import timeit

def tune_dtype(f):
    def ret(*args, **kwargs):
        for dtype in [torch.int8, torch.half, torch.float, torch.double]:
            f(*args, **kwargs, dtype=dtype)
    return ret

def tune_slice(f):
    def ret(*args, **kwargs):
        slice = 1
        while slice <= 256:
            f(*args, **kwargs, slice=slice)
            slice *= 2
    return ret

def tune_slice_size(f):
    def ret(*args, **kwargs):
        slice_size = 1
        while slice_size <= 1_000_000:
            f(*args, **kwargs, slice_size=slice_size)
            slice_size *= 10
    return ret

def tune_k(f):
    def ret(*args, slice_size, **kwargs):
        k = 1
        while k <= slice_size:
            f(*args, **kwargs, k=k, slice_size=slice_size)
            k *= 10
    return ret

def topk_with_sort(tensor, k, dim=-1, largest=True):
    values, indices = tensor.sort(dim=dim, descending=largest)
    return values.narrow(dim, 0, k), indices.narrow(dim, 0, k)

def run50sync(f):
    for _ in range(50):
        f()
    torch.cuda.synchronize()

def warmup():
    N = 1000000
    for i in range(1, N // 10000):
        torch.randn(i, device='cuda')

def benchmark_one(slice, slice_size, k, dtype):
    input_ = torch.empty((slice, slice_size), dtype=dtype, device="cuda").random_()
    torch.cuda.synchronize()
    time = timeit.timeit(lambda: run50sync(lambda: torch.topk(input_, k, dim=1)), number=1)
    torch.cuda.synchronize()
    time_sort = timeit.timeit(lambda: run50sync(lambda: topk_with_sort(input_, k, dim=1)), number=1)
    method = "orig" if time < time_sort else "sort"
    speedup = time / time_sort
    print(f"(dtype={dtype}, slice={slice}, slice_size={slice_size}, k={k}) -> (method={method}, speedup={speedup})")

if __name__ == "__main__":
    warmup()
    tune_dtype(tune_slice(tune_slice_size(tune_k(benchmark_one))))()

```
Benchmark result see next comment.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68632

Reviewed By: dagitses

Differential Revision: D32566233

Pulled By: ngimel

fbshipit-source-id: f7a508176ef3685b491048c4a6562121c60b8b2a
2021-11-19 17:18:20 -08:00
e554d8b89c Fix retry on connect failure decorator (#68600)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/68541 by checking whether the error string contains the expected message instead of requiring an exact error match
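
A hedged sketch of the fix (the real decorator lives in torch.testing's internal utilities; the names and messages here are illustrative):

```python
import functools

CONNECT_ERRORS = ("Address already in use", "Connection refused")

def retry_on_connect_failures(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except RuntimeError as e:
            # substring match instead of exact string equality
            if any(msg in str(e) for msg in CONNECT_ERRORS):
                return func(*args, **kwargs)  # retry once
            raise
    return wrapper
```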

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68600

Reviewed By: dagitses, H-Huang

Differential Revision: D32535592

Pulled By: rohan-varma

fbshipit-source-id: 864c3e3c6831f2351c2949b2348af4f48a308522
2021-11-19 17:13:30 -08:00
8e51381bac Make AOT compiler generic (#68637)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68637

Make the AOT compiler compile the BI bytedoc model, while also making the compiler generic enough for other models. The shape propagation pass is replaced with the new JIT tracer, since shape propagation doesn't yet support dynamic shapes.
Changes to get and set the input dtype will follow.

Test Plan:
The BI model was changed to return a tuple of tensors instead of a tuple(list[tensor], list[string]). The modified BI model runs well with these changes:
```
jf download GN91Hg9shoWzU1oPAGQ7X9SV8-5nbmQwAAAA --file bi.pt

└─ $ ./compile_model.sh -m pytorch_dev_bytedoc -p bi.pt -v v1 -i "1,115;1"
+ VERSION=v1
+ getopts m:p:v:i:h opt
+ case $opt in
+ MODEL=pytorch_dev_bytedoc
+ getopts m:p:v:i:h opt
+ case $opt in
+ MODEL_PATH=bi.pt
+ getopts m:p:v:i:h opt
+ case $opt in
+ VERSION=v1
+ getopts m:p:v:i:h opt
+ case $opt in
+ INPUT_DIMS='1,115;1'
+ getopts m:p:v:i:h opt
+ require_arg m pytorch_dev_bytedoc
+ '[' -n pytorch_dev_bytedoc ']'
+ require_arg p bi.pt
+ '[' -n bi.pt ']'
+ require_arg i '1,115;1'
+ '[' -n '1,115;1' ']'
+ '[' '!' -f bi.pt ']'
+++ dirname ./compile_model.sh
++ cd .
++ pwd -P
+ SRC_DIR=/data/users/priyaramani/fbsource/fbcode/caffe2/fb/nnc
+ FBCODE_DIR=/data/users/priyaramani/fbsource/fbcode/caffe2/fb/nnc/../../..
+ FBSOURCE_DIR=/data/users/priyaramani/fbsource/fbcode/caffe2/fb/nnc/../../../..
+ KERNEL_DIR=/data/users/priyaramani/fbsource/fbcode/caffe2/fb/nnc/../../../../xplat/pytorch_models/build/pytorch_dev_bytedoc/v1/nnc
++ readlink -f bi.pt
++ sed 's/.pt.*//'
+ MODEL_PATH_PREFIX=/data/users/priyaramani/fbsource/fbcode/caffe2/fb/nnc/bi
+ LLVM_CODE_PATH=/data/users/priyaramani/fbsource/fbcode/caffe2/fb/nnc/bi.compiled.ll
+ ASSEMBLY_CODE_PATH=/data/users/priyaramani/fbsource/fbcode/caffe2/fb/nnc/bi.compiled.s
+ COMPILED_MODEL_FILE_PATH=/data/users/priyaramani/fbsource/fbcode/caffe2/fb/nnc/bi.compiled.pt
+ KERNEL_FUNC_NAME=nnc_pytorch_dev_bytedoc_v1_forward
+ buck run //caffe2/binaries:aot_model_compiler -- --model=bi.pt --model_name=pytorch_dev_bytedoc --model_version=v1 '--input_dims=1,115;1'
Restarting Buck daemon because Buck version has changed...
Buck daemon started.
Parsing buck files... 0.6 sec (0/unknown)
.
.
Parsing buck files: finished in 5.0 sec
Creating action graph: finished in 0.7 sec
Downloaded 3750/4917 artifacts, 16.09 Mbytes, 13.3% cache miss (for updated rules)
Building: finished in 01:22.3 min (100%) 4995/4995 jobs, 4995/4995 updated
  Total time: 01:28.0 min
BUILD SUCCEEDED
Run with 56 threads
Run with 56 threads
Loading model...
Model loaded: /data/users/priyaramani/fbsource/fbcode/caffe2/fb/nnc/bi.compiled.pt
Running forward ...
WARNING: Logging before InitGoogleLogging() is written to STDERR
W1115 11:42:18.170666 1597103 TensorImpl.h:1418] Warning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (function operator())
(Columns 1 to 10 0.5428  0.1651  0.0158  0.0055  0.0503  0.0749  0.0161  0.0204  0.0237  0.0095

Columns 11 to 12 0.0609  0.0148
[ CPUFloatType{1,12} ], Columns 1 to 10-1.3946 -0.0835 -1.1268  0.3325 -2.1884  4.6175 -0.1206 -1.5058 -1.5277 -2.1214

Columns 11 to 20 1.3726 -0.4573 -1.7583 -2.2275  1.9607 -5.3430 -4.4927 -3.2548 -5.3214  2.9002

Columns 21 to 30-1.3973 -0.8084 -1.8491 -1.6518  4.2531 -0.0321 -0.0282 -1.1180 -0.9800  2.9228

Columns 31 to 32 0.8228  2.2611
[ CPUFloatType{1,32} ])
Starting benchmark.
Running warmup runs.
Main runs.
Main run finished. Milliseconds per iter: 40.64. Iters per second: 24.6063
Memory usage before main runs: 71581696 bytes
Memory usage after main runs: 94347264 bytes
Peak memory usage after main runs: 94347264 bytes
Average memory increase per iter: 2.22495e+06 bytes
0 value means "not available" in above
```

Reviewed By: ljk53

Differential Revision: D32438852

fbshipit-source-id: 5defdc2593abda5da328f96248459d23b2c5e5c6
2021-11-19 17:08:07 -08:00
c41d8290b3 Rename shard_lengths to shard_sizes to be more inline with Tensor sizes. (#66464)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66464

Dimension sizes are referred to as `size` in general in PyTorch and
hence rename shard_lengths to shard_sizes.

#Closes: https://github.com/pytorch/pytorch/issues/65794
ghstack-source-id: 143866449

Test Plan: waitforbuildbot

Reviewed By: fduwjj, wanchaol

Differential Revision: D31564153

fbshipit-source-id: 6273426c4b0e079358806070d0d9644740adb257
2021-11-19 16:30:00 -08:00
af564e73b8 Strided masked log_softmax. (#68461)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68461

Test Plan: Imported from OSS

Reviewed By: dagitses, zou3519

Differential Revision: D32569961

Pulled By: cpuhrsch

fbshipit-source-id: 5d262adacf239dace4a28de85af4b602e36f17f0
2021-11-19 16:28:35 -08:00
578507cb7b Fix nanmedian result using more CUDA memory than necessary (#68591)
Summary:
CUDA's `at::nanmedian` creates a sorted copy of the array, then indexes into it to create a single-element view. This view necessarily keeps the entire `sorted` tensor's storage alive, which can be avoided by returning a copy, which is what `at::median` does indirectly via `at::where`.

This also changes the index variable `k` to be a simple `int64_t` instead of the CUDA tensor that was used before. This saves the additional host and device operations from calling `Tensor`'s `operator -`, which helps balance out the cost of the `clone` added here.
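
A small illustration of the storage-pinning issue, using CPU tensors for simplicity:

```python
import torch

t = torch.randn(1_000_000)
sorted_t, _ = t.sort()
k = (t.numel() - 1) // 2

median_view = sorted_t[k]          # view: pins all of sorted_t's storage
median_copy = sorted_t[k].clone()  # copy: the large sorted buffer can be freed

assert median_view.storage().size() == t.numel()
assert median_copy.storage().size() == 1
```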

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68591

Reviewed By: dagitses

Differential Revision: D32538538

Pulled By: ngimel

fbshipit-source-id: abe9888f80cf9d24d50a83da756e649af1f6ea3b
2021-11-19 16:16:19 -08:00
6cca14d02f [fx2trt][easy] Replace all network.add_activation() call with helper function (#68676)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68676

As the title says, the helper functions handle setting the layer name. We want to use those helper functions whenever possible.

Test Plan: CI

Reviewed By: wushirong

Differential Revision: D32571061

fbshipit-source-id: 4a191f0085c0b3965dc02d99bb33de21973d565d
2021-11-19 15:29:39 -08:00
37edb7483a [torchelastic][1/n] Fix caffe2.test.distributed.launcher.api_test flaky tests (#68624)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68624

Fix `caffe2.test.distributed.launcher.api_test` flaky tests for opt-tsan mode.
The diff changes the default `mp.Process` invocation to use a spawn context. By default, `mp.Process` uses the `fork` start method, which is not compatible with `*san`.
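
A minimal sketch of the change, assuming a simple worker function:

```python
import multiprocessing as mp

def worker():
    print("hello from child")

if __name__ == "__main__":
    ctx = mp.get_context("spawn")  # fork-free start method, safe under *san
    p = ctx.Process(target=worker)
    p.start()
    p.join()
```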

Test Plan: CI

Reviewed By: d4l3k

Differential Revision: D32550578

fbshipit-source-id: f4767987e8e10a6a2ece3f86e48278f2dbaebe7c
2021-11-19 15:23:30 -08:00
a545a409f8 [quant][graphmode][fx] Support input_quantized_idxs and output_quantized_idxs in the new convert (#68042)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68042

as titled

Also added test cases from TestQuantizeFx which tests all combinations of {fp32, int8} input and output override

Test Plan:
```
python test/fx2trt/test_quant_trt.py TestConvertFxDoNotUse
```

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D32271511

fbshipit-source-id: 87ffc00069aaff7d1c455cdd97fac82b11aa4527
2021-11-19 15:12:54 -08:00
993b7a2052 Remove doubly nested anonymous namespace (#68555)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68555

The outer namespace is already anonymous, so this is not necessary.

Test Plan: Imported from OSS

Reviewed By: dagitses

Differential Revision: D32565941

Pulled By: malfet

fbshipit-source-id: 4daf1c46b25ff68e748e6c834c63d759ec6fde4f
2021-11-19 14:40:47 -08:00
5456d8c8f3 Add vectorized Jacobian and Hessian computation with forward AD (#67041)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67041

Original PR here: https://github.com/pytorch/pytorch/pull/62246 (The old PR does more things, but now that's split across this stack)

This PR:
- Adds "jacfwd" and "hessian_fwdrev"
- Modifies existing tests to also test the `forward_ad=True` case

Test Plan: Imported from OSS

Reviewed By: gchanan, zou3519

Differential Revision: D32314424

Pulled By: soulitzer

fbshipit-source-id: 785b0e39162b93dc3b3cb9413233447152eddd53
2021-11-19 14:31:09 -08:00
7bb401a4c9 Add forward AD support for miscellanous operators (#67820)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67820

Original PR here: https://github.com/pytorch/pytorch/pull/67040

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D32314423

Pulled By: soulitzer

fbshipit-source-id: ecd898dc903692cab084f6922a1d86986f957b1b
2021-11-19 14:31:06 -08:00
e358c49a5b Add OpInfo test and fix a couple cases (#66294)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66294

In this PR:
- OpInfo for forward AD now checks batched forward grad when `op.check_batched_grad=True`
- Adds a setting, `check_batched_forward_grad`, to disable the test for individual ops, and disables it for the ops here: https://github.com/pytorch/pytorch/issues/66357

Fixes some more failures:
- Make Forward AD metadata less strict by allowing stride to differ when size is 1
- Fix sum batching rule when logical tensor is a scalar and dim is unspecified
- Batching rule for `_reshape_alias`
- ~Batching rules now preserve storage offset for view operator that return non-zero storage offset~ (moved to previous PR)

Test Plan: Imported from OSS

Reviewed By: zou3519, albanD

Differential Revision: D31842020

Pulled By: soulitzer

fbshipit-source-id: 3517a8fb9d6291fccb53c0b1631eab5bbb24ebd1
2021-11-19 14:31:03 -08:00
21d203b5ca Add internal assert for tangent layout mismatch for view ops (#66293)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66293

This PR:
 - Asserts that if the output is a view, then the `is_same_metadata` must return `true`. Otherwise, we are performing a copy.
 - unless we are being called from `make_dual` which can allow the tangent and primal to have different layouts, because it is not forward differentiable.
 - To make this possible, we add `is_make_dual` as a parameter. ~The alternative is to make `make_dual` non-composite, and then we can rely on its `view_info` for differentiability information. This also assumes that the only composite function that calls `set_fw_grad` is `make_dual`.~
 - Batching rules now preserve storage offset for view operator that return non-zero storage offset

Test Plan: Imported from OSS

Reviewed By: zou3519, albanD

Differential Revision: D31842021

Pulled By: soulitzer

fbshipit-source-id: ed606f5a7b4770df1e9ebc6eb1d584b27dad5bae
2021-11-19 14:30:59 -08:00
2455cc2adf Address case when layout of tangent is not same as base (#66292)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66292

In this PR:
1. Fix the case when tangent has a different layout from the base when `set_fw_grad` by adding a native function and its batching rule.

For (1) we replace the following:
```
Tensor new_with_same_meta(const Variable& base) {
  int64_t nelement_in_storage = base.storage().nbytes() / base.itemsize();
  auto new_tensor = at::zeros({nelement_in_storage}, base.options());
  auto res = new_tensor.as_strided(base.sizes(), base.strides(), base.storage_offset());
  return res;
}
```
with a native function as to enable a batching rule to alter its behavior.

This new function will be similar to `new_zeros_strided` except we also require the `storage_offset` and `storage_numel` arguments.

Possible concerns:
 - Why have redundant logic? Why not add new args `new_zeros_strided`? This is probably a niche use case, so it's better not to complicate the current API.
 - Previously the created tensor inherits the TensorOptions of the primal. Now we inherit from the TensorOptions of the tangent.
   - Probably fine. Likely, no one relies on this because the behavior is only triggered when tangent/base have different layouts.
 - Why pass in exploded size, stride, and offset? It is possible in the non-batched case to pass in a tensor directly, but not possible when we'd like to have a batching rule. The size, stride, and offset we'd be passing won't belong to any live tensor.

Test Plan: Imported from OSS

Reviewed By: zou3519, albanD

Differential Revision: D31842019

Pulled By: soulitzer

fbshipit-source-id: a58433d814fd173bc43a2c550b395377dba40de2
2021-11-19 14:29:46 -08:00
bbe2aae84c Support cuda 11.5: install magma for cuda in conda (#68665)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/68667

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68665

Reviewed By: malfet

Differential Revision: D32570283

Pulled By: atalman

fbshipit-source-id: 4471fe8c4f8cc74c542ed67038322f07e861af73
2021-11-19 13:43:26 -08:00
183dcdf551 [reland] Fix flaky test_nccl_timeout (#68544)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/66882

In addition to changes in https://github.com/pytorch/pytorch/pull/68403, add one more error check that can be raised when a collective times out

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68544

Reviewed By: albanD

Differential Revision: D32508706

Pulled By: rohan-varma

fbshipit-source-id: 7d41b91f547d4ad763c44cd11e7b9914b452b617
2021-11-19 13:25:24 -08:00
875ba3dddb [quant][trt] Add support for torch.addmm in TensorRT (#67537)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67537

This PR adds support for quantizing torch.addmm to produce a reference quantized pattern,
and also adds support in the backend_config_dict api that allows people to specify which input index corresponds to the input, weight, and bias arguments:

```
    addmm_config = {
        "pattern": torch.addmm,
        "observation_type": ObservationType.OUTPUT_USE_DIFFERENT_OBSERVER_AS_INPUT,
        "dtype_configs": [
            weighted_op_qint8_dtype_config,
        ],
        # a map from input type to input index
        "input_type_to_index": {
            "bias": 0,
            "input": 1,
            "weight": 2,
        }
    }
```

This requires some changes in getting weight_dtype and bias_dtype in the type inference stage of prepare, which are added in the previous PR in the stack

Test Plan:
```
python test/fx2trt/test_quant_trt.py TestQuantizeFxTRT.test_addmm
```

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D32014998

fbshipit-source-id: 8d96c1e8b7ebb2ab385c08a5b1e43f2d5a2cbcbe
2021-11-19 13:19:28 -08:00
ee4cfaa286 [SR] Add utility class to determine tensor ranges (#68284)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68284

Add a new class `ManagedTensorRanges` that determines when managed tensors can be made available for re-use. This class provides a method `availableTensors(Node* node)` that returns a vector of `Value*` (corresponding to managed tensors) that are not used (either directly or through any alias) after `node`.

Test Plan: New unit tests: `buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest`

Reviewed By: swolchok

Differential Revision: D32397207

fbshipit-source-id: fb0d9a23f13abf6f2207e3d7266384966f477fc6
2021-11-19 13:10:55 -08:00
a6d862c50a [quant][graphmode][fx] Add support for weight and bias dtype in backend_config_dict (#68602)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68602

This PR adds support for configuring weight/bias dtype in backend_config_dict
and refactor the current code that checks when to insert observers

Test Plan:
```
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
```

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D32537712

fbshipit-source-id: 28eb7c61a8dcad8c1f3f6622d490a34cff0c59e2
2021-11-19 13:01:50 -08:00
da4a95c79a [ROCm] Use hipCUB/rocPRIM scan algorithms for large index support (#68487)
Summary:
For inclusive_scan and exclusive_scan, use hipCUB/rocPRIM scan algorithms for large index support.
Implemented for ROCm 5.0 and above.
Code reference: ROCmSoftwarePlatform/rocPRIM@5673df4#diff-47f4ef75e5af60dd5fe3906df9cf971f0635602a6b64a706dee6633d6677ef1a

Signed-off-by: Jagadish Krishnamoorthy <jagdish.krishna@gmail.com>

cc jeffdaily sunway513 jithunnair-amd ROCmSupport KyleCZH

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68487

Reviewed By: ngimel

Differential Revision: D32547541

Pulled By: malfet

fbshipit-source-id: 4dd984e6906aec7634d05e2ceaa55e31cd4d7376
2021-11-19 12:51:30 -08:00
5880a2f1ef Allow fuse unsqueeze cat sum with multiple input (#68650)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68650

Allow fusing unsqueeze/cat/sum with more than two inputs. The implementation in this diff is naive: it simply concatenates each item with add. It is not clear whether fusing multiple adds into one operation would yield further performance gains.

Test Plan: unit test

Reviewed By: jfix71

Differential Revision: D32520135

fbshipit-source-id: 535b1c8c91e415d5f1af714378b9205c1ca02ffd
2021-11-19 12:45:37 -08:00
2cab77f810 Masked normalization infrastructure and strided masked softmax (#68333)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68333

Test Plan: Imported from OSS

Reviewed By: dagitses, ZolotukhinM

Differential Revision: D32564435

Pulled By: cpuhrsch

fbshipit-source-id: 4d4662323ceffd12c210b7e931682d0442578157
2021-11-19 12:41:22 -08:00
f99f5ee088 add support for None in assert_close (#67795)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67795

Closes #61035.
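
A usage sketch of the new behavior (assuming a None/non-None mismatch raises):

```python
import torch

torch.testing.assert_close(None, None)  # passes: both values are None

try:
    torch.testing.assert_close(None, torch.tensor(1.0))
except Exception as e:
    print(type(e).__name__)  # a None/non-None mismatch raises
```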

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D32532207

Pulled By: mruberry

fbshipit-source-id: 6a2b4245e0effce4ddea7d89eca63e3b163951a7
2021-11-19 12:38:25 -08:00
0809553cf0 refactor assert_close to be more modular (#67794)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67794

This change is needed to conveniently use the same comparison mechanism for our internal test suite (see #67796). The reworked version is on par with the previous version except for the ability to pass a custom message as a callable. Before, we converted everything to a tensor, so it was fairly easy to provide consistent mismatch diagnostics to the callable. Now, with arbitrary `Pair`s used for comparison, that is no longer viable.

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D32532206

Pulled By: mruberry

fbshipit-source-id: dc847fba6a795c1766e01bc3e88b680a68287b1e
2021-11-19 12:37:16 -08:00
f74779e403 [android] Lite interpreter naming for android nightly publishing (#68651)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68651

Test Plan: Imported from OSS

Reviewed By: linbinyu

Differential Revision: D32564796

Pulled By: IvanKobzarev

fbshipit-source-id: 57847bfb2778433cfb02ad7a5a79ae30a6b438c1
2021-11-19 10:56:13 -08:00
4bcff4733d Add OpInfos for parcel Elementwise Binary II (#68085)
Summary:
Adds OpInfos for `torch.lcm`, `torch.gcd`, `torch.heaviside`, `torch.bitwise_or`, `torch.bitwise_xor`, `torch.isclose`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68085

Reviewed By: ngimel

Differential Revision: D32533310

Pulled By: saketh-are

fbshipit-source-id: 1616ebec61164cd1b44672f36220787a878b96a4
2021-11-19 10:37:07 -08:00
c2c859bdf2 [quant][embedding qat] Add benchmarks for QAT Embedding+EmbeddingBag (#66560)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66560

Test Plan: Imported from OSS

Reviewed By: HDCharles

Differential Revision: D31618282

Pulled By: b-koopman

fbshipit-source-id: ebfe723cfc4004f413f157e65532d64e8d0274b3
2021-11-19 06:29:19 -08:00
f82f14de17 [libkineto] Refactor 4/n: Simplify activity logger step 2/3 (#68329)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68329

Pull Request resolved: https://github.com/pytorch/kineto/pull/466

1. Generalize ChromeTraceLogger::handleGenericActivity to enable it to handle Cuda runtime activities as well as the Roctracer generic activities.
This primarily involves enabling generic support for CPU -> GPU flows.

2. In the event of out-of-order GPU activities (an issue with CUDA 11.0, likely fixed in later versions), no longer remove them but print warnings instead. Another diff will add these warnings to the metadata section.

Reviewed By: briancoutinho

Differential Revision: D31624496

fbshipit-source-id: dab04b3e3c0dd6799496ac87f837363de79eea25
2021-11-18 23:09:20 -08:00
18312313c4 [Profiler] Add missing guards (#65812)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65812

Multiple threads are recording events to a shared activity buffer and the buffer is at some point transferred to libkineto.
The access to and the transfer of the buffer need to be done under a lock.

Reviewed By: leitian, xw285cornell

Differential Revision: D31220061

fbshipit-source-id: f11c879df1b55aa9068187e600730bb0e5e5455f
2021-11-18 22:39:21 -08:00
343723e2ad [PyTorch][JIT][easy] Delete unnecessary overload of MemoryDAG::mayAlias (#66966)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66966

T* is convertible to const T*, so we don't need this overload.
ghstack-source-id: 143749559

Test Plan: builds

Reviewed By: hlu1

Differential Revision: D31809824

fbshipit-source-id: 70cca86c4a87dc09cd958953a08a801db3e4d047
2021-11-18 22:36:06 -08:00
ced57eb490 [PyTorch][Static Runtime] Delete incorrect alias analysis code (#67075)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67075

Sharing storage if `mayAlias` is incorrect, as the old comment notes; sharing if `mustAlias` would be nice but, as the new comment notes, would not matter.
ghstack-source-id: 143749553

Test Plan: CI

Reviewed By: hlu1

Differential Revision: D31851893

fbshipit-source-id: 5bdc8de984d5919332c9010e8b0160211d96bc2f
2021-11-18 22:34:50 -08:00
833dcaf2d6 Sparse CSR: Add torch.sin (#68123)
Summary:
This PR attempts to add support for `torch.sin` for sparse CSR tensors.

This aims to be a revised implementation (in some form) of https://github.com/pytorch/pytorch/pull/68083, and the implementation aims to be similar to that in [`SparseTensorMath.cpp` file](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/sparse/SparseTensorMath.cpp)

The tests and `empty_like` support for sparse CSR tensors (with a minor correction) are borrowed from https://github.com/pytorch/pytorch/pull/68083 temporarily to assist CI with testing this PR. :)
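
A usage sketch, assuming `torch.sin` accepts sparse CSR inputs as this PR intends:

```python
import torch

crow_indices = torch.tensor([0, 2, 2])
col_indices = torch.tensor([0, 1])
values = torch.tensor([0.5, 1.0])
csr = torch.sparse_csr_tensor(crow_indices, col_indices, values, size=(2, 2))

out = torch.sin(csr)  # sin applied to the stored values; layout stays CSR
print(out.values())
```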

cc nikitaved pearu cpuhrsch IvanYashchuk krshrimali

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68123

Reviewed By: jbschlosser

Differential Revision: D32533379

Pulled By: cpuhrsch

fbshipit-source-id: eb834d64d16ee12734c77e74fffa4a47614e3dfb
2021-11-18 21:58:09 -08:00
758d7dea9c torch.monitor - Initial C++ Stats (#68074)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68074

This is the first step of many PRs towards implementing the `torch.monitor` RFC https://github.com/pytorch/rfcs/pull/30

This defines the aggregation types, the `Stat` class and provides some simple collection of the stats.

This doesn't match the RFC exactly as it incorporates some of the comments on the RFC as well as a few changes for performance.

Changes:
* added window_size to the stats. If specified, the stat is always computed using the most recent `window_size` values; if there aren't enough values within that window, it reports the previous stats (see the sketch after this list).
* This doesn't include the push metrics yet (they will be coming).
  After more discussion it looks like the best way to handle this is to support a hybrid where each metric can set how frequently it will be logged. For fixed window_size metrics it will be logged each time it hits the window size. This allows performant counters as well as lower-frequency push counters (window_size=1).
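
A Python sketch of the fixed-window behavior described in the first item above (the actual implementation is in C++; the mean aggregation is just an example):

```python
class WindowedStat:
    """Sketch: aggregate values in fixed windows of `window_size` samples."""

    def __init__(self, window_size: int):
        self.window_size = window_size
        self.values = []
        self.prev_result = None

    def add(self, value: float) -> None:
        self.values.append(value)
        if len(self.values) == self.window_size:
            # window is full: compute the stat (mean here) and start over
            self.prev_result = sum(self.values) / len(self.values)
            self.values.clear()

    def get(self):
        # until the current window fills, report the previous window's stat
        return self.prev_result
```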

Performance considerations:
* Updating the stats acquires a lock on that Stat object. This should be performant unless there's many-many threads writing to the same stat. Single thread will typically use futex so should be quite fast.
* Adding/removing/fetching all stats sets a global lock on the stat list -- this shouldn't be an issue since these events happen infrequently.
* Fetching stats accesses one stat at a time instead of a global lock. This means the exported values are linearizable but not serializable across multiple stats but I don't expect this to be an issue.

Next steps:
1. Add StatCollector interface for push style metrics
1. Add pybind interfaces to expose to Python
1. Add default metric providers
1. Integrate into Kineto trace view

Test Plan:
buck test //caffe2/test/cpp/monitor:monitor

CI

Reviewed By: kiukchung

Differential Revision: D32266032

fbshipit-source-id: dab8747b4712f5dba5644387817a3a0fda18b66a
2021-11-18 21:46:23 -08:00
68d8ab0cc6 [const_fold] Fix call_module const folding (#68614)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68614

We need to copy modules over to the `split` graph during const folding. We were previously only doing so from the non-constant submod, but we need to do this for the constant one as well in case some `call_module` is const folded.

Test Plan: Added unit test

Reviewed By: wushirong, 842974287

Differential Revision: D32543289

fbshipit-source-id: 80d1d0ce2c18a665b00e1343d6c55d939390ab10
2021-11-18 20:56:06 -08:00
39747dc456 [nnc] Loweings for flatten, xnnpack prepack op (#68470)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68470

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D32545261

Pulled By: IvanKobzarev

fbshipit-source-id: b2bf5b3260002bcc40a351a9c56d786b16b69287
2021-11-18 20:14:42 -08:00
ca92111758 Add native_dropout (#63937)
Summary:
Adds native_dropout to have a reasonable target for TorchScript in autodiff. native_dropout has scale and train as arguments in its signature; this makes native_dropout more consistent with other operators and removes conditionals in the autodiff definition.

cc gmagogsfm

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63937

Reviewed By: mruberry

Differential Revision: D32477657

Pulled By: ngimel

fbshipit-source-id: d37b137a37acafa50990f60c77f5cea2818454e4
2021-11-18 19:41:10 -08:00
a39060c001 textray demo for unity
Summary:
Previously I needed to back out D32220626 and then apply D31841609 to run the textray unity demo, which made it hard for other people to take a look at how the demo works.

I copied the textray demo (a single file) from the pytext folder to the unity folder and applied the changes needed. This way, other people can also run the textray demo. This also makes my dev environment cleaner.

Test Plan: buck run mode/opt :textray_demo

Reviewed By: mleshen

Differential Revision: D32537190

fbshipit-source-id: 5df6347c4bec583c225aea9f98fbc9f37b5d3153
2021-11-18 19:04:18 -08:00
ff125a3624 Minor changes in documentation (#68557)
Summary:
Fixed some small typos

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68557

Reviewed By: mruberry

Differential Revision: D32538749

Pulled By: ngimel

fbshipit-source-id: 09a9cd4031463b6a40d7307bd8fcb7d364444ac3
2021-11-18 17:57:16 -08:00
9ce3c630ba [Docs] Mention torch.bfloat16 in torch.finfo (#68496)
Summary:
https://pytorch.org/docs/master/type_info.html#torch.torch.finfo seems to miss `torch.bfloat16`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68496

Reviewed By: mruberry

Differential Revision: D32538806

Pulled By: ngimel

fbshipit-source-id: 1296b3eb34d024cfc7d85cf53efe771ee9f98ea2
2021-11-18 17:52:41 -08:00
913ac27112 Fixes forward AD codegen for multiple formulas (#68535)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/67367

- Adds check to make sure forward grad itself does not have forward grad at the same level
- Verify with `python test/test_ops.py -k test_forward_mode_AD_linalg_eigh_cpu_float64` that it fails the check before, but passes after the codegen update

Before:
```
  if (_any_has_forward_grad_eigenvalues) {
      auto self_t_raw = toNonOptFwGrad(self);
      auto self_t = self_t_raw.defined() ? self_t_raw : at::zeros_like(toNonOptTensor(self));
      auto eigenvalues_new_fw_grad = eigh_jvp_eigenvalues(self_t, eigenvalues, eigenvectors);
      if (eigenvalues_new_fw_grad.defined()) {
        // The hardcoded 0 here will need to be updated once we support multiple levels.
        eigenvalues._set_fw_grad(eigenvalues_new_fw_grad, /* level */ 0, /* is_inplace_op */ false);
      }
  }
  if (_any_has_forward_grad_eigenvectors) {
      auto self_t_raw = toNonOptFwGrad(self);
      auto self_t = self_t_raw.defined() ? self_t_raw : at::zeros_like(toNonOptTensor(self));
      auto eigenvectors_new_fw_grad = eigh_jvp_eigenvectors(self_t, eigenvalues, eigenvectors);
      if (eigenvectors_new_fw_grad.defined()) {
        // The hardcoded 0 here will need to be updated once we support multiple levels.
        eigenvectors._set_fw_grad(eigenvectors_new_fw_grad, /* level */ 0, /* is_inplace_op */ false);
      }
  }
```

After:
```
  c10::optional<at::Tensor> eigenvalues_new_fw_grad_opt = c10::nullopt;
  if (_any_has_forward_grad_eigenvalues) {
      auto self_t_raw = toNonOptFwGrad(self);
      auto self_t = self_t_raw.defined() ? self_t_raw : at::zeros_like(toNonOptTensor(self));
      eigenvalues_new_fw_grad_opt = eigh_jvp_eigenvalues(self_t, eigenvalues, eigenvectors);
  }
  c10::optional<at::Tensor> eigenvectors_new_fw_grad_opt = c10::nullopt;
  if (_any_has_forward_grad_eigenvectors) {
      auto self_t_raw = toNonOptFwGrad(self);
      auto self_t = self_t_raw.defined() ? self_t_raw : at::zeros_like(toNonOptTensor(self));
      eigenvectors_new_fw_grad_opt = eigh_jvp_eigenvectors(self_t, eigenvalues, eigenvectors);
  }
  if (eigenvalues_new_fw_grad_opt.has_value() && eigenvalues_new_fw_grad_opt.value().defined()) {
    // The hardcoded 0 here will need to be updated once we support multiple levels.
    eigenvalues._set_fw_grad(eigenvalues_new_fw_grad_opt.value(), /* level */ 0, /* is_inplace_op */ false);
  }

  if (eigenvectors_new_fw_grad_opt.has_value() && eigenvectors_new_fw_grad_opt.value().defined()) {
    // The hardcoded 0 here will need to be updated once we support multiple levels.
    eigenvectors._set_fw_grad(eigenvectors_new_fw_grad_opt.value(), /* level */ 0, /* is_inplace_op */ false);
  }
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68535

Reviewed By: ngimel

Differential Revision: D32536089

Pulled By: soulitzer

fbshipit-source-id: a3f288540e2d78a4a9ec4bd66d2c0f0e65dd72cd
2021-11-18 17:44:17 -08:00
e7002c62ae [nnc] External functions quantized via Dispatch (#68572)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68572

Test Plan: Imported from OSS

Reviewed By: beback4u

Differential Revision: D32522410

Pulled By: IvanKobzarev

fbshipit-source-id: 7bb373819275582bb02e0d2ffd87a78d19f92318
2021-11-18 17:27:03 -08:00
a990a7ac31 [torchelastic] Remove stale test_get_default_executable test (#68609)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68609

The test is stale and tests a non-existent method.

Test Plan: ci

Reviewed By: kiukchung

Differential Revision: D32540127

fbshipit-source-id: c47b7aed3df6947819efb2f4ad1b7a059c252138
2021-11-18 17:20:36 -08:00
003f6ccec6 [BE] rename some tests in test_c10d_common (#67828)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67828

as titled
ghstack-source-id: 143781976

Test Plan: wait for ci

Reviewed By: mrshenli

Differential Revision: D32165576

fbshipit-source-id: 40c04b74f9e3241d3b3d64dee53af01fcfd1018b
2021-11-18 17:14:58 -08:00
3757a16c7a Adding custom testing based on opinfos input for ops with custom rules. (#67500)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67500

* #66898

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D32497547

Pulled By: Gamrix

fbshipit-source-id: 07761f0e27f4ac289377ff3279ce6470d4b727dd
2021-11-18 16:29:00 -08:00
71a031e954 Adding Custom Rules to Device Propagation (#66973)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66973

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D32497549

Pulled By: Gamrix

fbshipit-source-id: 5732682c0b39709f76cf218490e5f5136c0d83f8
2021-11-18 16:28:56 -08:00
77db720c65 Moving parts of the Shape Registry into a common file (#66948)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66948

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D32497550

Pulled By: Gamrix

fbshipit-source-id: 650feded6bae379af3d73a52edac2721bd7af2f2
2021-11-18 16:27:45 -08:00
244691db98 surface ncclUniqueId store broadcast error (#68597)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68597

Users got confused by just 'Socket timeout', so this surfaces a detailed error message. https://fb.workplace.com/groups/319878845696681/posts/650320792652483/. As we use the store more often for desync timeout/slowness detection, we will need a good wrapper to surface error messages for all store APIs.

Test Plan:
```
RuntimeError: [3] is setting up NCCL communicator and retreiving ncclUniqueId from [0] via c10d key-value store by key '0', but store->get('0') got exception: Socket Timeout
Exception raised from recvBytes at caffe2/torch/csrc/distributed/c10d/Utils.hpp:595 (most recent call first):
# 0  c10::get_backtrace[abi:cxx11](unsigned long, unsigned long, bool)
# 1  std::_Function_handler<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > (), c10::(anonymous namespace)::GetFetchStackTrace()::$_0>::_M_invoke(std::_Any_data const&)
# 2  c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)
# 3  c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*)
# 4  c10d::TCPStore::doWait(c10::ArrayRef<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::chrono::duration<long, std::ratio<1l, 1000l> >)
# 5  c10d::TCPStore::get(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
# 6  c10d::PrefixStore::get(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
# 7  c10d::PrefixStore::get(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
# 8  c10d::ProcessGroupNCCL::broadcastUniqueNCCLID(ncclUniqueId*, c10d::OpType, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int)
# 9  c10d::ProcessGroupNCCL::getNCCLComm(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<c10::Device, std::allocator<c10::Device> > const&, c10d::OpType, int, bool)
# 10 c10d::ProcessGroupNCCL::allreduce(std::vector<at::Tensor, std::allocator<at::Tensor> >&, c10d::AllreduceOptions const&)
# 11 pybind11::cpp_function::initialize<pybind11::cpp_function::initialize<c10::intrusive_ptr<c10d::ProcessGroup::Work, c10::detail::intrusive_target_default_null_type<c10d::ProcessGroup::Work> >, c10d::ProcessGroup, std::vector<at::Tensor, std::allocator<at::Tensor> >&, c10d::AllreduceOptions const&, pybind11::name, pybind11::is_method, pybind11::sibling, pybind11::arg, pybind11::arg_v, pybind11::call_guard<pybind11::gil_scoped_release> >(c10::intrusive_ptr<c10d::ProcessGroup::WorkTraceback (most recent call last):
```

Reviewed By: rohan-varma, mingzhe09088

Differential Revision: D32533304

fbshipit-source-id: e471636ee0c5291215cb6cde659b10bee13b7d12
2021-11-18 16:04:39 -08:00
ab1d879b33 [WIP] forbid aliasing between the outputs of a differentiable graph (#67732)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67732

Reviewed By: cpuhrsch

Differential Revision: D32522826

Pulled By: Krovatkin

fbshipit-source-id: 9fdf3509dcd1b885f7c7f06d22b340c0f93bbe12
2021-11-18 15:03:35 -08:00
9f4e004abd Revert D32283178: Add linalg.solve_triangular
Test Plan: revert-hammer

Differential Revision:
D32283178 (0706607abc)

Original commit changeset: deb672e6e52f

fbshipit-source-id: d2a3421292147426cc61c2f063b721acf9004755
2021-11-18 14:46:10 -08:00
48771d1c7f [BC-breaking] Change dtype of softmax to support TorchScript and MyPy (#68336)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68336

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D32470965

Pulled By: cpuhrsch

fbshipit-source-id: 254b62db155321e6a139bda9600722c948f946d3
2021-11-18 11:26:14 -08:00
748d9d2494 Revert D32187063: [static runtime] dequantize out variant
Test Plan: revert-hammer

Differential Revision:
D32187063 (f120335643)

Original commit changeset: 1fec6b74c7d3

fbshipit-source-id: 9770f8379e9ddba9e537fef0e66cc93c2caaf860
2021-11-18 10:12:31 -08:00
0706607abc Add linalg.solve_triangular (#63568)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63568

This PR adds the first solver with structure to `linalg`. This solver
has an API compatible with that of `linalg.solve`, preparing these for a
possible future merge of the APIs. The new API (see the usage sketch after this list):
- Just returns the solution, rather than the solution and a copy of `A`
- Removes the confusing `transpose` argument and replaces it by a
correct handling of conj and strides within the call
- Adds a `left=True` kwarg. This can be achieved via transposes of the
inputs and the result, but it's exposed for convenience.
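
A usage sketch of the API described above (`A` and `B` are arbitrary examples):

```python
import torch

A = torch.randn(3, 3).triu() + 3 * torch.eye(3)  # well-conditioned upper-triangular
B = torch.randn(3, 2)

X = torch.linalg.solve_triangular(A, B, upper=True)  # solves A @ X == B
assert torch.allclose(A @ X, B, atol=1e-5)

B2 = torch.randn(2, 3)
X2 = torch.linalg.solve_triangular(A, B2, upper=True, left=False)  # X2 @ A == B2
assert torch.allclose(X2 @ A, B2, atol=1e-5)
```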

This PR also implements a dataflow that minimises the number of copies
needed before calling LAPACK / MAGMA / cuBLAS and takes advantage of the
conjugate and neg bits.

This algorithm is implemented for `solve_triangular` (which, for this, is
the most complex of all the solvers due to the `upper` parameters).
Once more solvers are added, we will factor out this calling algorithm,
so that all of them can take advantage of it.

Given the complexity of this algorithm, we implement some thorough
testing. We also added tests for all the backends, which was not done
before.

We also add forward AD support for `linalg.solve_triangular` and improve the
docs of `linalg.solve_triangular`. We also fix a few issues with those of
`torch.triangular_solve`.

Resolves https://github.com/pytorch/pytorch/issues/54258
Resolves https://github.com/pytorch/pytorch/issues/56327
Resolves https://github.com/pytorch/pytorch/issues/45734

cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano

Test Plan: Imported from OSS

Reviewed By: zou3519, JacobSzwejbka

Differential Revision: D32283178

Pulled By: mruberry

fbshipit-source-id: deb672e6e52f58b76536ab4158073927a35e43a8
2021-11-18 09:45:51 -08:00
f120335643 [static runtime] dequantize out variant (#67873)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67873

Add out variant for aten::dequantize

Test Plan:
Test on inline_cvr model
```
MKL_NUM_THREADS=1 OMP_NUM_THREADS=1 numactl -m 0 -C 3 ./buck-out/opt/gen/caffe2/caffe2/fb/predictor/ptvsc2_predictor_bench --scripted_model=/data/users/ansha/tmp/adfinder/294738512/294738512_0.predictor.disagg.local --recordio_inputs=/data/users/ansha/tmp/adfinder/294738512/294738512_0_local.inputs.recordio --pt_enable_static_runtime=1 --compare_results=1 --iters=5 --warmup_iters=5 --num_threads=1 --do_profile=1 --method_name=local.forward --set_compatibility --do_benchmark=1 --recordio_use_ivalue_format=1
```

Before:
0.047472 ms.   0.409729%. aten::dequantize (9 nodes)

After:
0.0307179 ms.   0.267204%. static_runtime::dequantize_copy (9 nodes, out variant)

Reviewed By: hlu1

Differential Revision: D32187063

fbshipit-source-id: 1fec6b74c7d3f25d0f445775c4558d30c55dcece
2021-11-18 09:31:27 -08:00
7d38768d84 Rename splitter result (#68303)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68303

The result of the splitter runs either on the accelerator or directly on the GPU; rename the GPU part of the graph to run_on_gpu.

Test Plan: buck test mode/opt caffe2/test:trt_tools_test

Reviewed By: 842974287

Differential Revision: D32392492

fbshipit-source-id: b085376c00c1097752e856e22c631d74a0fbc38f
2021-11-18 09:04:30 -08:00
533e72e0a4 Fix DLPack CUDA stream convention (#67618)
Summary:
Per the array API specification, the CUDA default stream and the per-thread stream should be 1 and 2 instead of 0 and 1:

https://data-apis.org/array-api/latest/API_specification/array_object.html?dlpack-self-stream-none#dlpack-self-stream-none.

This caused a problem in the interop with CuPy https://github.com/cupy/cupy/pull/5970#discussion_r739912926.
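
For reference, a hedged sketch of the corrected convention (stream codes taken from the array API spec; the `__dlpack__(stream=...)` entry point and `torch.utils.dlpack.from_dlpack` are assumed to be available):

```
import torch
from torch.utils.dlpack import from_dlpack

x = torch.arange(4, device="cuda")

# Array API stream codes on CUDA devices:
#   stream=1 -> legacy default stream, stream=2 -> per-thread default stream
# (PyTorch previously treated these as 0 and 1).
capsule = x.__dlpack__(stream=1)   # producer syncs against the default stream
y = from_dlpack(capsule)
```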

cc rgommers leofang mruberry

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67618

Reviewed By: albanD

Differential Revision: D32521805

Pulled By: mruberry

fbshipit-source-id: 95777e4014e5edf1f88ba10adc03c6e34c13248d
2021-11-18 08:36:05 -08:00
d5d2096dab [testing] make @dtypes mandatory when using @dtypesIf (#68186)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/53647

With this, if a test forgets to add `dtypes` while using `dtypesIf`, the following error is raised:
```
AssertionError: dtypes is mandatory when using dtypesIf however 'test_exponential_no_zero' didn't specify it
```
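
A hedged sketch of the pattern this enforces, using the internal decorators (the import path `torch.testing._internal.common_device_type` is an assumption based on where the test framework lives):

```
import torch
from torch.testing._internal.common_device_type import dtypes, dtypesIfCUDA

class TestExample:
    @dtypes(torch.float)           # the base @dtypes is now mandatory...
    @dtypesIfCUDA(torch.half)      # ...whenever a device-specific override is used
    def test_exponential_no_zero(self, device, dtype):
        ...
```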

**Tested Locally**

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68186

Reviewed By: VitalyFedyunin

Differential Revision: D32468581

Pulled By: mruberry

fbshipit-source-id: 805e0855f988b77a5d8d4cd52b31426c04c2200b
2021-11-18 08:29:31 -08:00
857fed1f42 torch.linalg.qr: forward AD support (#67268)
Summary:
As per title.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67268

Reviewed By: ngimel

Differential Revision: D31960517

Pulled By: albanD

fbshipit-source-id: bfd1028a8d352f550efb420f9ca609c09f4a7484
2021-11-18 08:11:54 -08:00
a2d187a672 [BE] MapAllocator: report map error on Linux (#68545)
Summary:
Add `, strerror(errno), " (", errno, ")"`  suffix to TORCH_CHECK messages that report failures from POSIX calls

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68545

Reviewed By: ngimel

Differential Revision: D32509300

Pulled By: malfet

fbshipit-source-id: 1d7792d07e3a1184d2d54d137e6a9105dbab7d4c
2021-11-18 08:04:09 -08:00
b1aa45a8a7 Fix _make_wrapper_subclass's storage_offset handling (#68268)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68268

Previously, `_make_wrapper_subclass` ignored the storage offset it was
passed. This PR fixes that by updating TensorMaker::computeStorageSize()
and TensorMaker::make_tensor() to take into account storage_offset.
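
A hedged sketch of the fixed behavior (`_make_wrapper_subclass` is a private API, so the exact keyword names here are assumptions taken from the PR description):

```
import torch

class Wrapper(torch.Tensor):
    @staticmethod
    def __new__(cls, elem):
        return torch.Tensor._make_wrapper_subclass(
            cls, elem.size(),
            strides=elem.stride(),
            storage_offset=elem.storage_offset(),  # previously silently ignored
            dtype=elem.dtype, device=elem.device)

t = torch.arange(10.)[2:]      # a view with storage_offset() == 2
w = Wrapper(t)
assert w.storage_offset() == t.storage_offset()
```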

Test Plan: - added test

Reviewed By: albanD, bdhirsh

Differential Revision: D32396330

Pulled By: zou3519

fbshipit-source-id: 2c85bc4066044fe6cb5ab0fc192de6c9069855fd
2021-11-18 07:07:42 -08:00
f0e2ad5037 Stop warning spamming about vmap in gradcheck (#68586)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68586

We updated the vmap warnings to be more descriptive in
https://github.com/pytorch/pytorch/pull/67347 . However, gradcheck does
some warning squashing that matches on the warning message and we didn't
update that. This PR updates the warning squashing in gradcheck.
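
The squashing in question is ordinary message-matched warning filtering; a minimal sketch of the pattern (the regex and `run_checks` are illustrative stand-ins, not gradcheck's actual strings):

```
import warnings

def run_checks():
    # hypothetical stand-in for the numerical checks gradcheck performs
    warnings.warn("There is a performance drop because we have not yet ...")

with warnings.catch_warnings():
    # Filters match on message text, so they must track the emitted wording.
    warnings.filterwarnings("ignore", message=".*performance drop.*")
    run_checks()   # the matching warning is silently squashed
```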

Test Plan: - check logs

Reviewed By: albanD

Differential Revision: D32530259

Pulled By: zou3519

fbshipit-source-id: 9db380b57c38b3b72cbdb29574f71dbfe71e90d1
2021-11-18 07:00:36 -08:00
f9ef807f4d Replace empty with new_empty in nn.functional.pad (#68565)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68565

This makes it so that we can now vmap over nn.functional.pad (circular
variant). Previously we could not, because we were effectively doing
`out.copy_(input)` where `out` was created with `empty`.

This also has the added side effect of cleaning up the code.
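
A quick illustration of the now-vmappable call path (circular padding over the last dimension):

```
import torch
import torch.nn.functional as F

x = torch.arange(6.).reshape(1, 2, 3)
# Circular padding now builds its output with input.new_empty(...) instead of
# torch.empty(...), so the subsequent out.copy_(input) composes with vmap.
y = F.pad(x, (1, 1), mode="circular")
print(y.shape)   # torch.Size([1, 2, 5])
```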

Test Plan:
- I tested this using functorch.vmap and can confirm that vmap now
works.
- Unfortunately this doesn't work with the vmap in core so I cannot add
a test for this here.

Reviewed By: albanD

Differential Revision: D32520188

Pulled By: zou3519

fbshipit-source-id: 780a7e8207d7c45fcba645730a5803733ebfd7be
2021-11-18 06:06:50 -08:00
6c9cf5e6ea [quant][embedding qat] eager mode QAT for Embeddings (#66429)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66429

Test Plan: Imported from OSS

Reviewed By: HDCharles, supriyar

Differential Revision: D31618284

Pulled By: b-koopman

fbshipit-source-id: 0c0e2e86b98da9f29e9b2fc2a35c59424f94cbba
2021-11-18 05:57:11 -08:00
dbbb02474b [GPU host alloc] Fast path for size 0 malloc (#68532)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68532

Diff to better handle size 0 pinned memory allocation requests.
----
### Behavior before fix
The very first size 0 malloc comes in. It will create a block with `{key: 0, value: Block(0, 0, true)}`.

Another size 0 malloc comes in.
It will either 1) get a block with size > 0 (which is a waste of pinned memory) or 2) call `cudaHostAlloc()` with size 0 to eventually get *ptr=0.
Note that this block is *not registered* in the block pool because we have a duplicate entry (and that's why we keep wasting size > 0 pinned memory blocks, if `available.empty() == false`).

----
### Behavior after fix

Let `malloc()` simply return a nullptr (0).
This avoids wasting valid size > 0 blocks as well as save the calls to `cudaHostAlloc()` which is expensive.
This is also safe since `free()` simply returns success for nullptrs.
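
In pseudocode, the fixed control flow looks like the sketch below (`cuda_host_alloc`/`cuda_host_free` are hypothetical stand-ins for the real CUDA calls, defined here only so the sketch runs):

```
import ctypes

def cuda_host_alloc(size):     # hypothetical stand-in for cudaHostAlloc
    return ctypes.create_string_buffer(size)

def cuda_host_free(ptr):       # hypothetical stand-in for cudaFreeHost
    del ptr

def host_malloc(size):
    if size == 0:
        return None            # fast path: no block lookup, no cudaHostAlloc call
    return cuda_host_alloc(size)

def host_free(ptr):
    if ptr is None:
        return                 # freeing a null pointer is trivially a success
    cuda_host_free(ptr)

assert host_malloc(0) is None  # size-0 requests no longer waste pinned blocks
```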

-----

Test Plan: Unit tests.

Reviewed By: yinghai

Differential Revision: D32487522

fbshipit-source-id: 6140cab54ff5a34ace7d046f218fb32805c692c0
2021-11-18 02:39:36 -08:00
4635f5711f [static runtime][dper] multi_env tests for static runtime: selective enable (#67467)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67467

Unit tests for static runtime in the dper multi-env tests for cpu and scripted (including fx-traced + scripted) models. Only turn it on for single_operators_tests that are in the inline_cvr local/local_ro/remote_ro model for now.

A follow-up diff will turn this on by default and explicitly disable it for certain tests.

Test Plan: buck test dper3/dper3/modules/low_level_modules/tests:single_operators_test

Reviewed By: hlu1, houseroad

Differential Revision: D30870488

fbshipit-source-id: 382daec8dbcb95135cdd43e7b84a1d23b445d27c
2021-11-18 01:04:12 -08:00
35712a8eb4 [reland] simplify init_from_local_shards API (#68021)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68021

reland PR of https://github.com/pytorch/pytorch/pull/64481 as the previous one have some internal failures that didn't get captured when first landed.

This simplifies the `init_from_local_shards` API in sharded tensor to only require the user to pass in a list of `Shard`s and an `overall_size`, instead of a ShardedTensorMetadata. We do the all_gather inside to form a valid ShardedTensorMetadata instead.
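
A heavily hedged sketch of the simplified call; the module paths and `ShardMetadata` fields below are assumptions about this era's private API and may not match exactly:

```
import torch
from torch.distributed._sharded_tensor import Shard, init_from_local_shards
from torch.distributed._sharding_spec import ShardMetadata

# Each rank passes only its local shards plus the overall size; the
# ShardedTensorMetadata is now assembled via an all_gather inside the call.
meta = ShardMetadata(shard_offsets=[0, 0], shard_sizes=[2, 4],
                     placement="rank:0/cuda:0")
local_shards = [Shard(tensor=torch.randn(2, 4, device="cuda"), metadata=meta)]
st = init_from_local_shards(local_shards, 4, 4)   # overall size 4 x 4
```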

TODO: add more test cases to improve coverage.
ghstack-source-id: 143661119

Test Plan: TestShardedTensorFromLocalShards

Reviewed By: pritamdamania87

Differential Revision: D32147888

fbshipit-source-id: 897128b75224f4b9644471a04a64079f51e0d5fe
2021-11-17 23:20:37 -08:00
952ca25daa Sparse CSR: add convert_indices_from_csr_to_coo (#66774)
Summary:
This PR adds conversion from CSR to COO.
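
A small sketch of the new conversion (the underscore-prefixed helper is private, so its exact surface is an assumption based on the title):

```
import torch

# CSR indices for the matrix [[0., 1.], [2., 0.]]
crow_indices = torch.tensor([0, 1, 2])
col_indices = torch.tensor([1, 0])

# Expands the compressed row pointers back into per-element row indices,
# producing the 2 x nnz COO index matrix.
coo_indices = torch._convert_indices_from_csr_to_coo(crow_indices, col_indices)
print(coo_indices)   # tensor([[0, 1], [1, 0]])
```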

Fixes https://github.com/pytorch/pytorch/issues/56959

cc nikitaved pearu cpuhrsch IvanYashchuk gchanan mruberry

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66774

Reviewed By: zou3519

Differential Revision: D32288415

Pulled By: cpuhrsch

fbshipit-source-id: 683ba658dc46835fdf3c0e24645c0c2bb243b968
2021-11-17 22:28:30 -08:00
96ba2099d1 Fix c10d TCP store with mutex (#68499)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68499

TCPStore is actually accessed from multiple threads (the NCCL watchdog thread), but it has no mutex protection while FileStore and HashStore do. Since enabling desync root cause analysis makes store calls more frequent, the race condition in TCPStore was reliably triggered when creating another process group such as gloo. Add a mutex to TCPStore to match FileStore and HashStore.
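
The fix itself is a `std::mutex` inside the C++ `TCPStore`; conceptually it is the serialization pattern below (a Python stand-in, not the real implementation):

```
import threading

class LockedStore:
    """Serialize every store access behind one lock, as FileStore/HashStore do."""
    def __init__(self, store):
        self._store = store
        self._lock = threading.Lock()

    def set(self, key, value):
        with self._lock:
            self._store.set(key, value)

    def get(self, key):
        with self._lock:
            return self._store.get(key)
```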

Test Plan:
DDP benchmark with desync debug enabled, no perf regression

https://www.internalfb.com/intern/fblearner/details/309398285?tab=Outputs

W/o this diff

https://www.internalfb.com/intern/fblearner/details/308379789?tab=Outputs

Reviewed By: mingzhe09088

Differential Revision: D32482254

fbshipit-source-id: e8f466e1c6fdcab6cfa170f44b9be70395935fb8
2021-11-17 20:30:10 -08:00
146a7f68e2 Enable desync root cause analysis for NCCL (#68310)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68310

Enable desync root cause analysis by recording the last footprint of collective calls. Upon timeout, we parse the store trace and figure out the root cause of the desync issue. This feature is built on top of async error handling.

Test Plan:
Standalone test
* Typical desync - P467288969
* Mismatched collectives - P467288916
* Mismatched broadcast size - P467288873

DDP benchmark
* DDP benchmark desync - P467433483, P467520195

No perf regression:
* w/o this diff https://www.internalfb.com/intern/fblearner/details/308379789?tab=Outputs
* w/ this diff https://www.internalfb.com/intern/fblearner/details/308534088?tab=Outputs

Reviewed By: mingzhe09088

Differential Revision: D32348647

fbshipit-source-id: 43e7e96e3fa2be0ac66c1325bceb639b461a8b3a
2021-11-17 20:29:03 -08:00
9807787135 scatter_reduce (#68115)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/63780

Basic functionality of a `scatter_reduce` algorithm with `reduce="sum"`:

* `scatter_reduce` is named `scatter_reduce2` due to compilation issues (see the error below)
* It currently re-uses functionality from `scatter_add` (see the sketch after this list)
* Tests are missing: WIP
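
For `reduce="sum"` the semantics match `scatter_add`, which the implementation re-uses; a minimal sketch:

```
import torch

src = torch.ones(5)
index = torch.tensor([0, 1, 0, 1, 2])

# reduce="sum" accumulates src values into out at the positions given by index,
# which is exactly what scatter_add_ computes.
out = torch.zeros(3).scatter_add_(0, index, src)
print(out)   # tensor([2., 2., 1.])
```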

The error when the `scatter_reduce` naming is used:
```
In file included from aten/src/ATen/core/TensorBody.h:3,
                 from ../aten/src/ATen/core/Tensor.h:3,
                 from ../aten/src/ATen/DeviceGuard.h:4,
                 from ../aten/src/ATen/ATen.h:11,
                 from aten/src/ATen/native/cpu/CopyKernel.cpp.DEFAULT.cpp:1:
aten/src/ATen/Operators.h:13949:18: error: redefinition of ‘struct at::_ops::scatter_reduce’
13949 | struct TORCH_API scatter_reduce {
      |                  ^~~~~~~~~~~~~~
aten/src/ATen/Operators.h:13817:18: note: previous definition of ‘struct at::_ops::scatter_reduce’
13817 | struct TORCH_API scatter_reduce {
      |                  ^~~~~~~~~~~~~~
aten/src/ATen/Operators.h:13960:18: error: redefinition of ‘struct at::_ops::scatter_reduce_out’
13960 | struct TORCH_API scatter_reduce_out {
      |                  ^~~~~~~~~~~~~~~~~~
aten/src/ATen/Operators.h:13839:18: note: previous definition of ‘struct at::_ops::scatter_reduce_out’
13839 | struct TORCH_API scatter_reduce_out {
      |                  ^~~~~~~~~~~~~~~~~~
In file included from ../aten/src/ATen/core/Tensor.h:3,
                 from ../aten/src/ATen/DeviceGuard.h:4,
                 from ../aten/src/ATen/ATen.h:11,
                 from aten/src/ATen/native/cpu/CopyKernel.cpp.DEFAULT.cpp:1:
aten/src/ATen/core/TensorBody.h: In member function ‘at::Tensor at::Tensor::scatter_reduce(int64_t, const at::Tensor&, c10::string_view, c10::optional<long int>) const’:
aten/src/ATen/core/TensorBody.h:3976:83: error: cannot convert ‘c10::string_view’ {aka ‘c10::basic_string_view<char>’} to ‘const at::Tensor&’
 3976 |     return at::_ops::scatter_reduce::call(const_cast<Tensor&>(*this), dim, index, reduce, output_size);
      |                                                                                   ^~~~~~
      |                                                                                   |
      |                                                                                   c10::string_view {aka c10::basic_string_view<char>}
In file included from aten/src/ATen/core/TensorBody.h:3,
                 from ../aten/src/ATen/core/Tensor.h:3,
                 from ../aten/src/ATen/DeviceGuard.h:4,
                 from ../aten/src/ATen/ATen.h:11,
                 from aten/src/ATen/native/cpu/CopyKernel.cpp.DEFAULT.cpp:1:
aten/src/ATen/Operators.h:13824:109: note:   initializing argument 4 of ‘static at::Tensor at::_ops::scatter_reduce::call(const at::Tensor&, int64_t, const at::Tensor&, const at::Tensor&, c10::string_view)’
13824 |   static at::Tensor call(const at::Tensor & self, int64_t dim, const at::Tensor & index, const at::Tensor & src, c10::string_view reduce);
      |                                                                                          ~~~~~~~~~~~~~~~~~~~^~~
In file included from ../aten/src/ATen/ATen.h:15,
                 from aten/src/ATen/native/cpu/CopyKernel.cpp.DEFAULT.cpp:1:
aten/src/ATen/Functions.h: In function ‘at::Tensor at::scatter_reduce(const at::Tensor&, int64_t, const at::Tensor&, c10::string_view, c10::optional<long int>)’:
aten/src/ATen/Functions.h:7119:61: error: cannot convert ‘c10::string_view’ {aka ‘c10::basic_string_view<char>’} to ‘const at::Tensor&’
 7119 |     return at::_ops::scatter_reduce::call(self, dim, index, reduce, output_size);
      |                                                             ^~~~~~
      |                                                             |
      |                                                             c10::string_view {aka c10::basic_string_view<char>}
In file included from aten/src/ATen/core/TensorBody.h:3,
                 from ../aten/src/ATen/core/Tensor.h:3,
                 from ../aten/src/ATen/DeviceGuard.h:4,
                 from ../aten/src/ATen/ATen.h:11,
                 from aten/src/ATen/native/cpu/CopyKernel.cpp.DEFAULT.cpp:1:
aten/src/ATen/Operators.h:13824:109: note:   initializing argument 4 of ‘static at::Tensor at::_ops::scatter_reduce::call(const at::Tensor&, int64_t, const at::Tensor&, const at::Tensor&, c10::string_view)’
13824 |   static at::Tensor call(const at::Tensor & self, int64_t dim, const at::Tensor & index, const at::Tensor & src, c10::string_view reduce);
      |                                                                                          ~~~~~~~~~~~~~~~~~~~^~~
In file included from ../aten/src/ATen/ATen.h:15,
                 from aten/src/ATen/native/cpu/CopyKernel.cpp.DEFAULT.cpp:1:
aten/src/ATen/Functions.h: In function ‘at::Tensor& at::scatter_reduce_out(at::Tensor&, const at::Tensor&, int64_t, const at::Tensor&, c10::string_view, c10::optional<long int>)’:
aten/src/ATen/Functions.h:7124:65: error: cannot convert ‘c10::string_view’ {aka ‘c10::basic_string_view<char>’} to ‘const at::Tensor&’
 7124 |     return at::_ops::scatter_reduce_out::call(self, dim, index, reduce, output_size, out);
      |                                                                 ^~~~~~
      |                                                                 |
      |                                                                 c10::string_view {aka c10::basic_string_view<char>}
In file included from aten/src/ATen/core/TensorBody.h:3,
                 from ../aten/src/ATen/core/Tensor.h:3,
                 from ../aten/src/ATen/DeviceGuard.h:4,
                 from ../aten/src/ATen/ATen.h:11,
                 from aten/src/ATen/native/cpu/CopyKernel.cpp.DEFAULT.cpp:1:
aten/src/ATen/Operators.h:13846:111: note:   initializing argument 4 of ‘static at::Tensor& at::_ops::scatter_reduce_out::call(const at::Tensor&, int64_t, const at::Tensor&, const at::Tensor&, c10::string_view, at::Tensor&)’
13846 |   static at::Tensor & call(const at::Tensor & self, int64_t dim, const at::Tensor & index, const at::Tensor & src, c10::string_view reduce, at::Tensor & out);
      |                                                                                            ~~~~~~~~~~~~~~~~~~~^~~
In file included from ../aten/src/ATen/ATen.h:15,
                 from aten/src/ATen/native/cpu/CopyKernel.cpp.DEFAULT.cpp:1:
aten/src/ATen/Functions.h: In function ‘at::Tensor& at::scatter_reduce_outf(const at::Tensor&, int64_t, const at::Tensor&, c10::string_view, c10::optional<long int>, at::Tensor&)’:
aten/src/ATen/Functions.h:7129:65: error: cannot convert ‘c10::string_view’ {aka ‘c10::basic_string_view<char>’} to ‘const at::Tensor&’
 7129 |     return at::_ops::scatter_reduce_out::call(self, dim, index, reduce, output_size, out);
      |                                                                 ^~~~~~
      |                                                                 |
      |                                                                 c10::string_view {aka c10::basic_string_view<char>}
In file included from aten/src/ATen/core/TensorBody.h:3,
                 from ../aten/src/ATen/core/Tensor.h:3,
                 from ../aten/src/ATen/DeviceGuard.h:4,
                 from ../aten/src/ATen/ATen.h:11,
                 from aten/src/ATen/native/cpu/CopyKernel.cpp.DEFAULT.cpp:1:
aten/src/ATen/Operators.h:13846:111: note:   initializing argument 4 of ‘static at::Tensor& at::_ops::scatter_reduce_out::call(const at::Tensor&, int64_t, const at::Tensor&, const at::Tensor&, c10::string_view, at::Tensor&)’
13846 |   static at::Tensor & call(const at::Tensor & self, int64_t dim, const at::Tensor & index, const at::Tensor & src, c10::string_view reduce, at::Tensor & out);
      |                                                                                            ~~~~~~~~~~~~~~~~~~~^~~
In file included from aten/src/ATen/NativeFunctions.h:6,
                 from ../aten/src/ATen/TensorIndexing.h:12,
                 from ../aten/src/ATen/ATen.h:20,
                 from aten/src/ATen/native/cpu/CopyKernel.cpp.DEFAULT.cpp:1:
aten/src/ATen/NativeMetaFunctions.h: At global scope:
aten/src/ATen/NativeMetaFunctions.h:496:18: error: redefinition of ‘struct at::meta::structured_scatter_reduce’
  496 | struct TORCH_API structured_scatter_reduce : public at::impl::MetaBase {
      |                  ^~~~~~~~~~~~~~~~~~~~~~~~~
aten/src/ATen/NativeMetaFunctions.h:481:18: note: previous definition of ‘struct at::meta::structured_scatter_reduce’
  481 | struct TORCH_API structured_scatter_reduce : public at::impl::MetaBase {
      |                  ^~~~~~~~~~~~~~~~~~~~~~~~~
ninja: build stopped: subcommand failed.
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68115

Reviewed By: albanD

Differential Revision: D32488450

Pulled By: cpuhrsch

fbshipit-source-id: 65e79c6d0555c0d5715535bb52aade8d5fcd9722
2021-11-17 19:53:12 -08:00
e72b9db48e [fx2trt] add converter for acc_ops.hardtanh (#68550)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68550

Missing ops in https://fburl.com/gsheet/q06f1vrc

Test Plan: unit tests

Reviewed By: wushirong

Differential Revision: D32500303

fbshipit-source-id: 9266210ae229263f6bb2a60486c279ceb766ffdf
2021-11-17 17:59:37 -08:00
9d9ca88f5c [predictor][trt] Expose more CUDA/CuDNN info to at::Context and BC stage 1 (#68146)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68146

Expose more CUDA/CuDNN info to at::Context

Test Plan: CI; lint;

Reviewed By: houseroad

Differential Revision: D32264935

fbshipit-source-id: ad43d5d245dba4a054e09346240414159832585e
2021-11-17 17:16:19 -08:00
d71092f668 [android][fbjni] Update fbjni to 0.2.2 (#68400)
Summary:
ghstack-source-id: caeb8df3a18a6fa48d591af126ac59d8e41494b5
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68400

Fixes #{issue number}

CI-all check:
https://github.com/pytorch/pytorch/pull/68497

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68495

Reviewed By: linbinyu

Differential Revision: D32481451

Pulled By: IvanKobzarev

fbshipit-source-id: b19ce05ff9d63b3f701d718eefbf1e9d66e11639
2021-11-17 16:54:22 -08:00
53bfb00ee1 [bugfix] TensorList args in functionalization pass (#68395)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68395

At the time that I wrote the pass, I thought that `c10::TensorList` and `c10::List<Tensor>` were the same thing. But it looks like a `TensorList` is actually an `ArrayRef<Tensor>`. This led to a nasty bug when I tried to add conditional functionalization to `block_diag`, where in the boxed kernel, I would:

(1) unwrap the first `IValue` by calling `.toTensorList()` (this actually returns a `List<Tensor>`, not a `TensorList`).
(2) call `TensorList to_functional_tensor(List<Tensor>)` to get out a `TensorList` with the functionalized tensors
(3) wrap that back into an `IValue` and put in on the stack.

Somewhere in that sequence of operations, something bad happens and we segfault. Fixing up the signature of `to_functional_tensor` to be `List<Tensor> to_functional_tensor(List<Tensor>)` fixes the bug. I have a feeling that there's a latent TensorList-related bug in the boxing/unboxing logic that made this worse, but I'm okay to stick with my narrow fix for now.

Additionally tested by running `pytest test/test_ops.py test/test_vmap.py -v -k block_diag` on top of this PR: https://github.com/pytorch/functorch/pull/235

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D32448258

Pulled By: bdhirsh

fbshipit-source-id: 3b2b6c7cd5e4c29533d0502f24272d826bfe03c1
2021-11-17 15:50:30 -08:00
b0bdf588ea [ONNX] Release values cached in global object (#68210)
Summary:
To release constants computed and stored by `ConstantValueMap::SetValue(...)` during ONNX exporting, `ConstantValueMap::Clear()` needs to be called explicitly. Otherwise, it's a memory leak.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68210

Reviewed By: jansel

Differential Revision: D32465670

Pulled By: msaroufim

fbshipit-source-id: 521e474071b94c5d2cd4f353ee062cee78be1bd4
2021-11-17 12:47:59 -08:00
4eb772fde6 Refactor saving jit::Module to mobile .pt in 2 steps: (#66494)
Summary:
1. Convert Function -> mobile::Function
2. Serialize mobile::Function

This also opens the opportunity to create a mobile::Module without saving/reloading.

Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66494

Reviewed By: zhxchen17

Differential Revision: D32293022

Pulled By: qihqi

fbshipit-source-id: 29b43d47ff86071d5e2f9d6ca4dba4445711ce3d
2021-11-17 12:02:20 -08:00
e2aeb4a7af Improve native layer norm backward perf (#68238)
Summary:
Benchmarks
At this PR
```
[------------------------------------------------------ ln ------------------------------------------------------]
                  |  fwd, torch.float32  |  fwdbwd, torch.float32  |  fwd, torch.float16  |  fwdbwd, torch.float16
1 threads: -------------------------------------------------------------------------------------------------------
      200, 256    |         17.5         |          106.6          |         18.1         |           94.7
      1000, 256   |         18.7         |          116.6          |         18.7         |          110.7
      6000, 256   |         28.1         |          111.8          |         19.4         |           92.3
      6272, 256   |         29.3         |          108.5          |         20.1         |           92.7
      200, 512    |         19.3         |           83.8          |         19.1         |          116.3
      1000, 512   |         17.9         |           88.0          |         17.9         |           93.0
      6000, 512   |         36.9         |          141.2          |         27.4         |          103.3
      6272, 512   |         38.2         |          146.5          |         28.1         |          107.9
      200, 1024   |         18.1         |           89.5          |         21.1         |          102.7
      1000, 1024  |         17.9         |           88.7          |         18.5         |           92.5
      6000, 1024  |         77.6         |          277.5          |         40.3         |          148.5
      6272, 1024  |         80.7         |          288.1          |         42.0         |          154.0
      200, 1536   |         17.9         |          117.3          |         18.1         |           88.1
      1000, 1536  |         22.9         |           92.0          |         19.4         |           89.0
      6000, 1536  |        123.4         |          436.3          |         61.7         |          228.5
      6272, 1536  |        129.1         |          457.3          |         64.3         |          238.5
      200, 2048   |         18.0         |           90.5          |         19.1         |          101.6
      1000, 2048  |         31.1         |          109.8          |         25.3         |          107.9
      6000, 2048  |        174.5         |          589.8          |         87.1         |          310.5
      6272, 2048  |        182.2         |          617.0          |         91.2         |          316.7
      200, 3072   |         19.8         |           96.4          |         19.4         |           89.3
      1000, 3072  |         48.1         |          168.7          |         23.5         |          100.9
      6000, 3072  |        267.1         |          930.0          |        134.8         |          519.2
      6272, 3072  |        278.2         |          971.2          |        140.7         |          540.2
```
Pre-https://github.com/pytorch/pytorch/issues/67977
```
[------------------------------------------------------- ln -------------------------------------------------------]
                    |  fwd, torch.float32  |  fwdbwd, torch.float32  |  fwd, torch.float16  |  fwdbwd, torch.float16
1 threads: ---------------------------------------------------------------------------------------------------------
        200,   256  |         20.9         |            92.6         |         21.3         |          110.1
       1000,   256  |         20.3         |            91.8         |         28.1         |          115.6
       6000,   256  |         93.0         |           310.7         |         86.3         |          299.8
       6272,   256  |         97.3         |           323.5         |         90.0         |          314.1
        200,   512  |         20.9         |           110.2         |         21.1         |           95.0
       1000,   512  |         24.0         |           102.8         |         22.2         |           95.9
       6000,   512  |        121.7         |           367.2         |        105.6         |          337.4
       6272,   512  |        127.0         |           382.3         |        111.3         |          352.0
        200,  1024  |         21.0         |           131.8         |         20.4         |           93.3
       1000,  1024  |         35.5         |           108.7         |         27.7         |           99.4
       6000,  1024  |        170.4         |           495.5         |        137.7         |          411.4
       6272,  1024  |        177.5         |           517.6         |        143.6         |          428.6
        200,  1536  |         21.9         |            97.6         |         20.8         |           92.7
       1000,  1536  |         44.3         |           129.7         |         33.9         |          100.1
       6000,  1536  |        215.8         |           619.2         |        167.2         |          480.9
       6272,  1536  |        225.0         |           646.9         |        174.8         |          505.9
        200,  2048  |         21.8         |           100.8         |         20.7         |           96.7
       1000,  2048  |         53.7         |           152.4         |         41.4         |          118.3
       6000,  2048  |        267.0         |           753.6         |        220.4         |          571.5
       6272,  2048  |        278.6         |           785.8         |        211.4         |          589.2
        200,  3072  |         20.9         |           103.7         |         21.9         |          104.6
       1000,  3072  |         71.4         |           201.1         |         53.1         |          148.3
       6000,  3072  |        365.7         |          1040.3         |        262.0         |          731.5
       6272,  3072  |        382.0         |          1084.4         |        273.3         |          766.3
```
Benchmarking script
```
import torch
from torch.utils.benchmark import Timer, Compare

results = []
for dtype in (torch.float, torch.half):
    for fs in (256, 512, 1024, 1536, 2048, 3072):
        for bs in (200, 1000, 6000, 196*32):
            ln = torch.nn.LayerNorm((fs,), device="cuda", dtype=dtype)
            X = torch.randn(bs, fs, device="cuda", dtype=dtype, requires_grad=True)
            gO = torch.rand_like(X)
            stmtfwd = "ln(X)"
            stmtfwdbwd = "X.grad=None; ln.zero_grad(set_to_none=True); out = ln(X); out.backward(gO)"
            tfwd = Timer(stmt=stmtfwd, label="ln", sub_label=f"{bs:5}, {fs:5}", description=f"fwd, {dtype}", globals=globals())
            tfwdbwd = Timer(stmt=stmtfwdbwd, label="ln", sub_label=f"{bs:5}, {fs:5}", description=f"fwdbwd, {dtype}", globals=globals())
            for t in (tfwd, tfwdbwd):
                results.append(t.blocked_autorange())
        print(fs, end='\r')
c = Compare(results)
c.print()
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68238

Reviewed By: mruberry

Differential Revision: D32469450

Pulled By: ngimel

fbshipit-source-id: 08fe755c156d3d5c366c966cb808bf0f3e74c050
2021-11-17 12:00:07 -08:00
f3e2fefe09 Actually enable PYTORCH_RETRY_TEST_CASES for linux tests (#68486)
Summary:
After noticing that CUDA mem leak checks were not rerun, I realized I forgot to pass the env var as a Docker variable.

What a noob mistake.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68486

Reviewed By: seemethere

Differential Revision: D32501718

Pulled By: janeyx99

fbshipit-source-id: 9918d626e90bea1562a3094c6eb12cb7d86dbf6a
2021-11-17 11:50:48 -08:00
2f37a39a5c [quant][graphmode][fx] Refactor node_name_to_target_dtype to make it more clear (#68317)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68317

We use node_name_to_target_dtype to store the target dtype of each node's output activations, computed from the node's qconfig.
There are two problems with node_name_to_target_dtype that make it hard to work with:
1. We mutate node_name_to_target_dtype when we insert observers. This makes the data structure confusing, because it's typically unexpected
to change a data structure that stores a "target" dtype.
2. Currently it only stores the target dtype for output activations, while we also need target dtypes for input activations, weights and biases.

This PR fixes both problems by removing mutation from node_name_to_target_dtype and expanding each node's target dtype info to include
the missing target dtypes for input activations, weights and biases. We will have another refactor to simplify the observation of weight and bias dtypes
in the future.

Please see comments for the updated structure of node_name_to_target_dtype

TODO: we may want to rename node_name_to_target_dtype to node_name_to_target_dtype_info in a separate PR.

Test Plan:
```
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
```

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D32411858

fbshipit-source-id: 3d76dd65056920ff8642899517bc1b95d43fc1de
2021-11-17 11:21:25 -08:00
3b4f072383 Remove TH/THC Storage data and copy functions (#68127)
Summary:
Part of https://github.com/pytorch/pytorch/issues/67852

cc ezyang bhosmer smessmer ljk53 bdhirsh

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68127

Reviewed By: mrshenli

Differential Revision: D32441885

Pulled By: ngimel

fbshipit-source-id: 1bbe7c8bed30bfe1737511a4f347fd9a8024dd99
2021-11-17 11:19:54 -08:00
4e21d77dbb Use TORCH_CHECK in MapAllocator (#68424)
Summary:
When porting `THAllocator` to ATen I changed `AT_ERROR` to `TORCH_INTERNAL_ASSERT` but the direct translation should have been `TORCH_CHECK`.

33e9a0b5f6/c10/util/Exception.h (L619-L623)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68424

Reviewed By: VitalyFedyunin

Differential Revision: D32465548

Pulled By: ngimel

fbshipit-source-id: 7fa9c1fe27e4849b76248badb681d7b6877ce9e8
2021-11-17 10:33:22 -08:00
693fe2fd9b docs: Added Union to supported types in documentation (#68435)
Summary:
This PR simply updates the documentation following up on https://github.com/pytorch/pytorch/pull/64234, by adding `Union` as a supported type.

Any feedback is welcome!

cc ansley albanD gmagogsfm

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68435

Reviewed By: davidberard98

Differential Revision: D32494271

Pulled By: ansley

fbshipit-source-id: c3e4806d8632e1513257f0295568a20f92dea297
2021-11-17 10:26:31 -08:00
61206ba4db [SR] Add StorageGroup abstraction (#68279)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68279

While reworking the liveness analysis, I noticed that using `std::pair<size_t, std::vector<Tensor*>>` to represent storage groups made things quite unreadable.

Add a simple class to wrap a `std::vector<at::Tensor*>` and store a `size` attribute

Test Plan:
`buck test caffe2/benchmarks/static_runtime/...`

Also ran inline_cvr benchmarks, did not see any errors

Reviewed By: swolchok

Differential Revision: D32369447

fbshipit-source-id: e0b562aa7eefd738b1a34f1f37eb7bc95d71a257
2021-11-17 09:29:08 -08:00
cac3cd1433 add torch.diff support for n greater than 1 (#67260)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67260

Addressing 54853
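
A quick illustration of the extended behavior (`n > 1` applies the first-order difference recursively):

```
import torch

x = torch.tensor([1, 3, 6, 10])
print(torch.diff(x))        # tensor([2, 3, 4])  first-order differences
print(torch.diff(x, n=2))   # tensor([1, 1])     diff applied twice
```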

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D31930294

Pulled By: mikaylagawarecki

fbshipit-source-id: 97c7a27e9200c6688242680ff96b73dfff828479
2021-11-17 09:16:33 -08:00
3da2e09c9b Added antialias flag to interpolate (CPU only, bilinear) (#65142)
Summary:
Description:
- Added antialias flag to interpolate (CPU only); see the usage sketch after this list
  - forward and backward for bilinear mode
  - added tests
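
A minimal usage sketch of the new flag as described above (bilinear mode on CPU):

```
import torch
import torch.nn.functional as F

x = torch.rand(1, 3, 906, 438)
# antialias=True low-pass filters when downsampling, matching PIL-style resizing.
y = F.interpolate(x, size=(320, 196), mode="bilinear",
                  align_corners=False, antialias=True)
print(y.shape)   # torch.Size([1, 3, 320, 196])
```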

### Benchmarks

<details>
<summary>
Forward pass, CPU. PTH interpolation vs PIL
</summary>

Cases:
- PTH RGB 3 Channels, float32 vs PIL RGB uint8 (apply vs pears)
- PTH 1 Channel, float32 vs PIL 1 Channel Float

Code: https://gist.github.com/vfdev-5/b173761a567f2283b3c649c3c0574112

```
# OMP_NUM_THREADS=1 python bench_interp_aa_vs_pillow.py

Torch config: PyTorch built with:
  - GCC 9.3
  - C++ Version: 201402
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - CPU capability usage: AVX2
  - CUDA Runtime 11.1
  - NVCC architecture flags: -gencode;arch=compute_75,code=sm_75
  - CuDNN 8.0.5
  - Build settings: BUILD_TYPE=Release, CUDA_VERSION=11.1, CUDNN_VERSION=8.0.5, CXX_COMPILER=/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_PYTORCH_QNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.10.0, USE_CUDA=1, USE_CUDNN=1, USE_EIGEN_FOR_BLAS=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=OFF, USE_MKLDNN=OFF, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=0, USE_OPENMP=ON,

Num threads: 1
[------------------------ Downsampling: torch.Size([1, 3, 906, 438]) -> (320, 196) ------------------------]
                                                  |  Reference, PIL 8.3.2, mode: RGB  |  1.10.0a0+git1e87d91
1 threads: -------------------------------------------------------------------------------------------------
      channels_first contiguous torch.float32     |                2.9                |          3.1
      channels_last non-contiguous torch.float32  |                2.6                |          3.6

Times are in milliseconds (ms).

[------------------------ Downsampling: torch.Size([1, 3, 906, 438]) -> (460, 220) ------------------------]
                                                  |  Reference, PIL 8.3.2, mode: RGB  |  1.10.0a0+git1e87d91
1 threads: -------------------------------------------------------------------------------------------------
      channels_first contiguous torch.float32     |                3.4                |          4.0
      channels_last non-contiguous torch.float32  |                3.4                |          4.8

Times are in milliseconds (ms).

[------------------------ Downsampling: torch.Size([1, 3, 906, 438]) -> (120, 96) -------------------------]
                                                  |  Reference, PIL 8.3.2, mode: RGB  |  1.10.0a0+git1e87d91
1 threads: -------------------------------------------------------------------------------------------------
      channels_first contiguous torch.float32     |                1.6                |          1.8
      channels_last non-contiguous torch.float32  |                1.6                |          1.9

Times are in milliseconds (ms).

[----------------------- Downsampling: torch.Size([1, 3, 906, 438]) -> (1200, 196) ------------------------]
                                                  |  Reference, PIL 8.3.2, mode: RGB  |  1.10.0a0+git1e87d91
1 threads: -------------------------------------------------------------------------------------------------
      channels_first contiguous torch.float32     |                9.0                |          11.3
      channels_last non-contiguous torch.float32  |                8.9                |          12.5

Times are in milliseconds (ms).

[----------------------- Downsampling: torch.Size([1, 3, 906, 438]) -> (120, 1200) ------------------------]
                                                  |  Reference, PIL 8.3.2, mode: RGB  |  1.10.0a0+git1e87d91
1 threads: -------------------------------------------------------------------------------------------------
      channels_first contiguous torch.float32     |                2.1                |          1.8
      channels_last non-contiguous torch.float32  |                2.1                |          3.4

Times are in milliseconds (ms).

[--------------- Downsampling: torch.Size([1, 1, 906, 438]) -> (320, 196) --------------]
                                 |  Reference, PIL 8.3.2, mode: F  |  1.10.0a0+git1e87d91
1 threads: ------------------------------------------------------------------------------
       contiguous torch.float32  |               1.2               |          1.0

Times are in milliseconds (ms).

[--------------- Downsampling: torch.Size([1, 1, 906, 438]) -> (460, 220) --------------]
                                 |  Reference, PIL 8.3.2, mode: F  |  1.10.0a0+git1e87d91
1 threads: ------------------------------------------------------------------------------
       contiguous torch.float32  |               1.4               |          1.3

Times are in milliseconds (ms).

[--------------- Downsampling: torch.Size([1, 1, 906, 438]) -> (120, 96) ---------------]
                                 |  Reference, PIL 8.3.2, mode: F  |  1.10.0a0+git1e87d91
1 threads: ------------------------------------------------------------------------------
       contiguous torch.float32  |              719.9              |         599.9

Times are in microseconds (us).

[-------------- Downsampling: torch.Size([1, 1, 906, 438]) -> (1200, 196) --------------]
                                 |  Reference, PIL 8.3.2, mode: F  |  1.10.0a0+git1e87d91
1 threads: ------------------------------------------------------------------------------
       contiguous torch.float32  |               3.7               |          3.5

Times are in milliseconds (ms).

[-------------- Downsampling: torch.Size([1, 1, 906, 438]) -> (120, 1200) --------------]
                                 |  Reference, PIL 8.3.2, mode: F  |  1.10.0a0+git1e87d91
1 threads: ------------------------------------------------------------------------------
       contiguous torch.float32  |              834.4              |         605.7

Times are in microseconds (us).

```

</details>

Code is moved from torchvision: https://github.com/pytorch/vision/pull/4208

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65142

Reviewed By: mrshenli

Differential Revision: D32432405

Pulled By: jbschlosser

fbshipit-source-id: b66c548347f257c522c36105868532e8bc1d4c6d
2021-11-17 09:10:15 -08:00
143491e0ad [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D32484422

fbshipit-source-id: 5c836dc7d06f12e64cc4bb1e85d8fa4b62a29b85
2021-11-17 07:27:04 -08:00
3e3bf40b0a Revert D32452012: [pytorch][PR] Fix flaky test_nccl_timeout
Test Plan: revert-hammer

Differential Revision:
D32452012 (faa1e8b7cf)

Original commit changeset: c959b25957f2

fbshipit-source-id: a2786744b12ceed350eec0ca2834f5176a4e21ee
2021-11-17 06:08:53 -08:00
54ac64f035 Revert D32477989: [pytorch][PR] Actually enable PYTORCH_RETRY_TEST_CASES for linux tests
Test Plan: revert-hammer

Differential Revision:
D32477989 (173c0f8a98)

Original commit changeset: e28d095773f5

fbshipit-source-id: 2de5fac08f7f322a3aeb92a67b5fdfa0a6518bf1
2021-11-17 06:04:14 -08:00
0dc3f829d9 Nvfuser code bump 11 5 (#67943)
Summary:
nvfuser code update:
1. Tuning heuristics on schedulers for reduction/normalization kernels;
2. bfloat16 on IO tensor support;
3. Refactored memory format support; we can now support dimension collapsing with input tensors of differing memory formats, e.g. a channels-last tensor input to batch normalization. Note that we are currently limiting memory formats to only Contiguous and Channels Last;
4. Refactored nvfuser graph partitioning in `graph_fuser.cpp`, separated node merge and profile node API. Updated `profiling_record.cpp`.

Things that are reverted from our local branch:
1. changes on some entries in autodiff
2. aten::gelu with approximation
3. native_dropout(_backward)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67943

Reviewed By: ngimel

Differential Revision: D32288709

Pulled By: dzhulgakov

fbshipit-source-id: fc9491182ea7e0158bc112c66f096823c588eaf1
2021-11-17 01:22:17 -08:00
01b30922dd [static runtime] fuse gather+to+lengths_to_offsets (#64075)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64075

Test Plan:
Before:
`I0826 17:17:54.165174 1064079 PyTorchPredictorBenchLib.cpp:313] PyTorch run finished. Milliseconds per iter: 6.66724. Iters per second: 149.987`

After:
`I0826 17:13:07.464485 1040300 PyTorchPredictorBenchLib.cpp:313] PyTorch run finished. Milliseconds per iter: 6.46362. Iters per second: 154.712`

Profile after: P453143683

Accuracy tested comparing with jit interpreter for no differences under 1e-3 (nnc ops turned on) https://www.internalfb.com/intern/diff/view-version/136824794/

======

With 100-request recordio inputs (211 inputs)

Before:
`I1101 12:43:13.558375 742187 PyTorchPredictorBenchLib.cpp:251] PyTorch run finished. Milliseconds per iter: 11.7882. Iters per second: 84.8309`
After:
`I1101 13:50:41.087644 1126186 PyTorchPredictorBenchLib.cpp:251] PyTorch run finished. Milliseconds per iter: 11.6763. Iters per second: 85.6438`

Profile after: P465977010
Constituent ops before (total is 0.5646):
```
       0.187392 ms.    1.61737%. fb::clip_ranges_gather (309 nodes, out variant)
       0.174101 ms.    1.50266%. fb::lengths_to_offsets (464 nodes, out variant)
       0.203126 ms.    1.75317%. static_runtime::to_copy (805 nodes, out variant)
```
Constituent ops after (total is 0.4985):
```
       0.376559 ms.    3.25614%. fb::clip_ranges_to_gather_to_offsets (305 nodes, out variant)
      0.0614349 ms.   0.531235%. fb::lengths_to_offsets (159 nodes, out variant)
      0.0573315 ms.   0.495751%. static_runtime::to_copy (195 nodes, out variant)
     0.00325543 ms.  0.0281501%. fb::gather_ranges (4 nodes, out variant)
```

Compare with jit interpreter inside benchmark:
`I1101 13:55:53.013602 1149446 PtVsBlackBoxPredictorBenchLib.cpp:175] Finished comparing PT static runtime and jit interpreter results`

======

Casting on the fly:

a. Static runtime off
```
Static runtime ms per iter: 11.4658. Iters per second: 87.2159
0.220367 ms.    1.94726%. static_runtime::to_copy (805 nodes, out variant)
0.172585 ms.    1.52504%. fb::clip_ranges_gather (309 nodes, out variant)
0.157836 ms.    1.39471%. fb::lengths_to_offsets (464 nodes, out variant)
```

b. Casting on the fly, using explicit allocation+to_copy (which has the fast pass for certain cases, but we'll always call empty):
```
I1115 09:08:35.711972 1925508 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 11.6732. Iters per second: 85.6662

0.599439 ms.    5.25098%. fb::clip_ranges_to_gather_to_offsets (305 nodes, out variant)
0.0552475 ms.   0.483958%. fb::lengths_to_offsets (159 nodes, out variant)
0.0576032 ms.   0.504593%. static_runtime::to_copy (195 nodes, out variant)
0.00299026 ms.  0.0261941%. fb::gather_ranges (4 nodes, out variant)
```

c. Casting on the fly with native::to (no explicit allocation, but no fast pass):
```
Static runtime ms per iter: 11.5627. Iters per second: 86.4849
0.454356 ms.     3.9652%. fb::clip_ranges_to_gather_to_offsets (305 nodes, out variant)
0.06315 ms.   0.551115%. static_runtime::to_copy (195 nodes, out variant)
0.0590741 ms.   0.515544%. fb::lengths_to_offsets (159 nodes, out variant)
0.00359182 ms.   0.031346%. fb::clip_ranges_gather (4 nodes, out variant)
```

d. Removal of the to() call in question from the fusion pattern:
```
Static runtime ms per iter: 11.3658. Iters per second: 87.9836
 0.29591 ms.     2.6479%. fb::clip_ranges_to_gather_to_offsets (305 nodes, out variant)
 0.154612 ms.    1.38352%. static_runtime::to_copy (500 nodes, out variant)
0.0567151 ms.   0.507505%. fb::lengths_to_offsets (159 nodes, out variant)
0.0051115 ms.  0.0457394%. fb::clip_ranges_gather (4 nodes, out variant)
```

Reviewed By: hlu1

Differential Revision: D30515441

fbshipit-source-id: 53acee10619ac2be7dc8982e929e3210c4bb6d21
2021-11-17 00:49:31 -08:00
faa1e8b7cf Fix flaky test_nccl_timeout (#68403)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/66882

- Remove time.sleep call
- Use gloo barrier to enforce rank synchronization
- Reduce timeouts for allreduce
- Pass in timeout and call wait() in _check_for_nccl_abort()

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68403

Reviewed By: H-Huang

Differential Revision: D32452012

Pulled By: rohan-varma

fbshipit-source-id: c959b25957f2eb8d59c506075da6023d25bbcfd9
2021-11-16 23:43:23 -08:00
6186b90c53 [Contrib][Fakelowp] Change Lut Size for Tanh (#68334)
Summary:
The reference code's LUT size increased, and the minimum now starts from 0
instead of 7000.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68334

Reviewed By: jiecaoyu

Differential Revision: D32467332

Pulled By: hl475

fbshipit-source-id: 3e4510e09374519aebe657a31f0b1ccde117e761
2021-11-16 23:39:02 -08:00
f6696c5a85 export CPUOffload in _fsdp package (#68308)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68308

Export CPUOffload in the _fsdp package, as the cpu_offload config in the FSDP API needs to import this class.
ghstack-source-id: 143560608

Test Plan: unit tests

Reviewed By: rohan-varma

Differential Revision: D32408719

fbshipit-source-id: ee5c40ec91a423fbd58872fbdeb5f2dda8a3d89e
2021-11-16 22:56:12 -08:00
9c15523793 Attach unused parameter info to static graph error message (#68413)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68413

Attach unused parameter info to the static graph error message.
ghstack-source-id: 143560766

Test Plan: unit tests

Reviewed By: rohan-varma

Differential Revision: D32457112

fbshipit-source-id: 31de859bf5289aa6044279014f0e76be9bcb9e54
2021-11-16 22:55:08 -08:00
9de730ebba q_avgpool: Loop over batch dimension inside operators (#66819)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66819

This has a number of different advantages:
- For channels last tensors, DispatchStub overhead is only incurred once.
- For contiguous tensors, parallelization now happens over batch and
  channels, enabling better load balancing between threads.
- `q_scale()` and `q_zero_point()` are no longer called inside of a
  parallel region, which is not allowed (see gh-56794)

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D32445352

Pulled By: ngimel

fbshipit-source-id: cd938e886cd5696855eb56a649eaf3bccce35e54
2021-11-16 22:29:42 -08:00
1cade067e3 [Vulkan] Vulkan backend is now thread-safe (#67733)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67733

Vulkan backend is now thread-safe:
* `ThreadContext` class holds onto all per-thread Vulkan states such as Command, Descriptor and Resource objects.
* `ThreadContext::SingletonThreadLocalObject<T>` is a very light version of `folly::SingletonThreadLocal` (https://github.com/facebook/folly/blob/main/folly/SingletonThreadLocal.h). It holds a static object with the `thread_local` modifier. It is tied to a `GPU` object, which allows us to expand to a multi-threaded GPU backend and multi-GPU capability in the future. The lifetime of a `SingletonThreadLocalObject<T>` object is from the first call (instantiation) to the termination of the thread.
* `MAKE_VULKAN_THREADSAFE` preprocessor is used for BUCK and the implementation of thread-safe Vulkan backend. We can quickly exclude it from the BUCK if any unexpected issue gets uncovered in the future. Once we are confident it's stable, we can remove the preprocessor from the code.
* A new perf test is added with `{3,40,221,193}` with 3 threads.
* `vkQueueSubmit` is not thread-safe, only one thread can push the commands at a time (See https://vkguide.dev/docs/chapter-1/vulkan_command_flow/#vulkan-command-execution). The number of available queues depends on GPU. It could be 1 and we cannot assume we can create multiple queues. Thus, we need to avoid calling `vkQueueSubmit` from multiple threads at the same time. When running Vulkan backend in different threads without any locking mechanism, `vkQueueSubmit` will get the `VK_ERROR_INITIALIZATION_FAILED(-3)` error.
* In the `Context::~Context()`, we should not call `flush()` since all per-thread objects will be destroyed as each thread exits. From the following logs, you can verify all per-thread objects are getting destroyed as their threads are terminated. The logs captured all ctor/dtor calls when running Vulkan backend with 3 different threads:
```
ThreadContext::ThreadContext() -> thread[0x1207d5e00] this[0x0x7f9489981e28]
Context::Context() -> thread[0x1207d5e00] this[0x7f9489981800] device_[1]
Resource::Pool::Pool() -> thread[0x7000095ab000] this[0x7f9489965258] device_[0x7f94998cf218] allocator_[0x7f947980ee00]
Command::Pool::Pool() -> thread[0x7000095ab000] this[0x7f9489965068] device_[0x7f94998cf218] command_pool_[0xfa21a40000000003]
Resource::Pool::Pool() -> thread[0x70000962e000] this[0x7f947980d458] device_[0x7f94998cf218] allocator_[0x7f949b119c00]
Command::Pool::Pool() -> thread[0x70000962e000] this[0x7f947980d268] device_[0x7f94998cf218] command_pool_[0xead9370000000008]
Resource::Pool::Pool() -> thread[0x1207d5e00] this[0x7f949a0ee858] device_[0x7f94998cf218] allocator_[0x7f9499901c00]
Command::Pool::Pool() -> thread[0x1207d5e00] this[0x7f949a0ee668] device_[0x7f94998cf218] command_pool_[0xcad092000000000d]
Descriptor::Pool::Pool() -> thread[0x1207d5e00] this[0x7f949a0ee910] device_[0x7f94998cf218] descriptor_pool_[0xa43473000000002d]
Descriptor::Pool::Pool() -> thread[0x70000962e000] this[0x7f947980d510] device_[0x7f94998cf218] descriptor_pool_[0x980b0000000002e]
Descriptor::Pool::Pool() -> thread[0x7000095ab000] this[0x7f9489965310] device_[0x7f94998cf218] descriptor_pool_[0x4b7df1000000002f]
Descriptor::Pool::~Pool() -> thread[0x7000095ab000] this[0x7f9489965310] device_[0x7f94998cf218] descriptor_pool_[0x4b7df1000000002f] -> enter
Descriptor::Pool::~Pool() -> thread[0x7000095ab000] this[0x7f9489965310] device_[0x7f94998cf218] descriptor_pool_[0x4b7df1000000002f] -> leave
Command::Pool::~Pool() -> thread[0x7000095ab000] this[0x7f9489965068] device_[0x7f94998cf218] command_pool_[0xfa21a40000000003] -> enter
Command::Pool::~Pool() -> thread[0x7000095ab000] this[0x7f9489965068] device_[0x7f94998cf218] command_pool_[0xfa21a40000000003] -> leave
Resource::Pool::~Pool() -> thread[0x7000095ab000] this[0x7f9489965258] device_[0x7f94998cf218] allocator_[0x7f947980ee00] -> enter
Descriptor::Pool::~Pool() -> thread[0x70000962e000] this[0x7f947980d510] device_[0x7f94998cf218] descriptor_pool_[0x980b0000000002e] -> enter
Descriptor::Pool::~Pool() -> thread[0x70000962e000] this[0x7f947980d510] device_[0x7f94998cf218] descriptor_pool_[0x980b0000000002e] -> leave
Command::Pool::~Pool() -> thread[0x70000962e000] this[0x7f947980d268] device_[0x7f94998cf218] command_pool_[0xead9370000000008] -> enter
Command::Pool::~Pool() -> thread[0x70000962e000] this[0x7f947980d268] device_[0x7f94998cf218] command_pool_[0xead9370000000008] -> leave
Resource::Pool::~Pool() -> thread[0x70000962e000] this[0x7f947980d458] device_[0x7f94998cf218] allocator_[0x7f949b119c00] -> enter
Resource::Pool::~Pool() -> thread[0x7000095ab000] this[0x7f9489965258] device_[0x7f94998cf218] allocator_[0x7f947980ee00] -> leave
Resource::Pool::~Pool() -> thread[0x70000962e000] this[0x7f947980d458] device_[0x7f94998cf218] allocator_[0x7f949b119c00] -> leave
Descriptor::Pool::~Pool() -> thread[0x1207d5e00] this[0x7f949a0ee910] device_[0x7f94998cf218] descriptor_pool_[0xa43473000000002d] -> enter
Descriptor::Pool::~Pool() -> thread[0x1207d5e00] this[0x7f949a0ee910] device_[0x7f94998cf218] descriptor_pool_[0xa43473000000002d] -> leave
Command::Pool::~Pool() -> thread[0x1207d5e00] this[0x7f949a0ee668] device_[0x7f94998cf218] command_pool_[0xcad092000000000d] -> enter
Command::Pool::~Pool() -> thread[0x1207d5e00] this[0x7f949a0ee668] device_[0x7f94998cf218] command_pool_[0xcad092000000000d] -> leave
Resource::Pool::~Pool() -> thread[0x1207d5e00] this[0x7f949a0ee858] device_[0x7f94998cf218] allocator_[0x7f9499901c00] -> enter
Resource::Pool::~Pool() -> thread[0x1207d5e00] this[0x7f949a0ee858] device_[0x7f94998cf218] allocator_[0x7f9499901c00] -> leave
Context::~Context() -> thread[0x1207d5e00] this[0x7f9489981800] device_[1] -> enter
Context::~Context() -> thread[0x1207d5e00] this[0x7f9489981800] device_[1] -> leave
ThreadContext::~ThreadContext() -> thread[0x1207d5e00] this[0x0x7f9489981e28] -> enter
ThreadContext::~ThreadContext() -> thread[0x1207d5e00] this[0x0x7f9489981e28] -> leave
```
Some notes on unexpected behaviors by `VkQueue`:
* We need to make sure only one thread accesses `VkQueue` at a time; in other words, we need a locking mechanism to protect `VkQueue` from multiple threads. That is the approach used for this change.
* To avoid lock overhead, we tried a per-thread `VkQueue` (a separate object per thread), but it didn't fix the `VK_ERROR_INITIALIZATION_FAILED` error from the `vkQueueSubmit` call, which was unexpected. Interestingly, macOS doesn't crash with the per-thread approach, though that is little reassurance given how unreliable its behavior has been. It's unclear whether or not this is an Android Vulkan driver issue.
* Making the entire `Context` `thread_local` without any lock does fix the same error.

Test Plan:
**Test build on Android**
```
cd ~/fbsource
buck build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 //xplat/caffe2:pt_vulkan_perf_test_binAndroid\#android-arm64 --show-output
adb push buck-out/gen/xplat/caffe2/pt_vulkan_perf_test_binAndroid\#android-arm64 /data/local/tmp/vulkan_perf_test
adb shell "/data/local/tmp/vulkan_perf_test"
```
**Test build on MacOS**
```
cd ~/fbsource
buck build //xplat/caffe2:pt_vulkan_perf_test_binAppleMac
./buck-out/gen/xplat/caffe2/pt_vulkan_perf_test_binAppleMac\#macosx-x86_64
```

**Test result on Google Pixel 5**
```
//xplat/caffe2:pt_vulkan_perf_test_binAndroid#android-arm64 buck-out/gen/fe3a39b8/xplat/caffe2/pt_vulkan_perf_test_binAndroid#android-arm64
buck-out/gen/xplat/caffe2/pt_vulkan_perf_test_binAndroid#android-arm64: 1 file pushed, 0 skipped. 145.4 MB/s (826929592 bytes in 5.426s)
Running /data/local/tmp/vulkan_perf_test
Run on (8 X 1804.8 MHz CPU s)
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
=============================================================================================================
Thread-safe Vulkan backend on Google Pixel 5
-------------------------------------------------------------------------------------------------------------
Benchmark                                                                   Time             CPU   Iterations
-------------------------------------------------------------------------------------------------------------
cat_op_channel_perf/N:3/C:40/H:221/W:193/iterations:1000/threads:1       55.8 ms         15.1 ms         1000
cat_op_channel_perf/N:3/C:20/H:221/W:193/iterations:1000/threads:1       25.6 ms         4.08 ms         1000
cat_op_channel_perf/N:3/C:39/H:221/W:193/iterations:1000/threads:1       60.6 ms         14.3 ms         1000
cat_op_channel_perf/N:3/C:4/H:221/W:193/iterations:5000/threads:1        4.52 ms        0.757 ms         5000
cat_op_channel_perf/N:3/C:3/H:221/W:193/iterations:5000/threads:1        7.16 ms        0.770 ms         5000
cat_op_channel_perf/N:3/C:40/H:221/W:193/iterations:1000/threads:3       35.9 ms         38.8 ms         3000
=============================================================================================================
Non thread-safe Vulkan backend on Google Pixel 5
-------------------------------------------------------------------------------------------------------------
Benchmark                                                                   Time             CPU   Iterations
-------------------------------------------------------------------------------------------------------------
cat_op_channel_perf/N:3/C:40/H:221/W:193/iterations:1000/threads:1       55.0 ms         14.5 ms         1000
cat_op_channel_perf/N:3/C:20/H:221/W:193/iterations:1000/threads:1       25.8 ms         4.30 ms         1000
cat_op_channel_perf/N:3/C:39/H:221/W:193/iterations:1000/threads:1       60.6 ms         14.5 ms         1000
cat_op_channel_perf/N:3/C:4/H:221/W:193/iterations:5000/threads:1        4.52 ms        0.761 ms         5000
cat_op_channel_perf/N:3/C:3/H:221/W:193/iterations:5000/threads:1        7.15 ms        0.765 ms         5000
```
For the single-thread scenario, the difference between the thread-safe and non thread-safe versions is less than 2%, which is acceptable. In other words, there is no considerable performance degradation with the thread-safe Vulkan backend, which uses:
* singleton thread-local objects for the `Command`, `Descriptor`, and `Resource` pools
* a mutex lock around the `vkQueueSubmit` call

**Test result on MacOS**
```
Running ./buck-out/gen/xplat/caffe2/pt_vulkan_perf_test_binAppleMac#macosx-x86_64
Run on (16 X 2400 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x8)
  L1 Instruction 32 KiB (x8)
  L2 Unified 256 KiB (x8)
  L3 Unified 16384 KiB (x1)
Load Average: 11.96, 7.17, 5.45
***WARNING*** Library was built as DEBUG. Timings may be affected.
=============================================================================================================
Thread-safe Vulkan backend on MacOS
-------------------------------------------------------------------------------------------------------------
Benchmark                                                                   Time             CPU   Iterations
-------------------------------------------------------------------------------------------------------------
cat_op_channel_perf/N:3/C:40/H:221/W:193/iterations:1000/threads:1       58.4 ms         42.8 ms         1000
cat_op_channel_perf/N:3/C:20/H:221/W:193/iterations:1000/threads:1       12.3 ms         5.43 ms         1000
cat_op_channel_perf/N:3/C:39/H:221/W:193/iterations:1000/threads:1       56.0 ms         41.2 ms         1000
cat_op_channel_perf/N:3/C:4/H:221/W:193/iterations:5000/threads:1        3.00 ms         1.52 ms         5000
cat_op_channel_perf/N:3/C:3/H:221/W:193/iterations:5000/threads:1        2.56 ms         1.34 ms         5000
cat_op_channel_perf/N:3/C:40/H:221/W:193/iterations:1000/threads:3       42.8 ms         42.8 ms         3000
=============================================================================================================
Non thread-safe Vulkan backend on MacOS
-------------------------------------------------------------------------------------------------------------
Benchmark                                                                   Time             CPU   Iterations
-------------------------------------------------------------------------------------------------------------
cat_op_channel_perf/N:3/C:40/H:221/W:193/iterations:1000/threads:1       58.6 ms         42.6 ms         1000
cat_op_channel_perf/N:3/C:20/H:221/W:193/iterations:1000/threads:1       11.3 ms         4.67 ms         1000
cat_op_channel_perf/N:3/C:39/H:221/W:193/iterations:1000/threads:1       57.6 ms         42.4 ms         1000
cat_op_channel_perf/N:3/C:4/H:221/W:193/iterations:5000/threads:1        2.89 ms         1.45 ms         5000
cat_op_channel_perf/N:3/C:3/H:221/W:193/iterations:5000/threads:1        2.47 ms         1.27 ms         5000
```
The non thread-safe version is slightly faster than the thread-safe one. This result is for reference only, since macOS runs Vulkan through an extra translation layer ([MoltenVK](https://github.com/KhronosGroup/MoltenVK)) on top of `Metal`, which makes its numbers less trustworthy.

Reviewed By: SS-JIA

Differential Revision: D32093974

fbshipit-source-id: 9eab7f0db976eff717540a5b32f94ed17a00b662
2021-11-16 22:09:32 -08:00
2317e28e9e Enable complex autograd for col2im / im2col (#68199)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68199

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D32467043

Pulled By: mruberry

fbshipit-source-id: 9094aff036f75b280422e210f7089140ea61fc71
2021-11-16 21:11:44 -08:00
fea2bb64c8 OpInfos for stft, istft, fftshift, ifftshift (#68198)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68198

This unearths some bugs in istft backward, so the backward tests are
disabled here; they are fixed in the next PR in the stack.

cc mruberry peterbell10

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D32467044

Pulled By: mruberry

fbshipit-source-id: 5cf49560cbeb0263a66aafb48ed1bcc8884b75f1
2021-11-16 21:09:54 -08:00
6e640a0acf Revise the socket implementation of c10d (#68226)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68226

**Note that this PR is unusually big due to the urgency of the changes. Please reach out to me in case you wish to have a "pair" review.**

This PR introduces a major refactoring of the socket implementation of the C10d library. A big portion of the logic is now contained in the `Socket` class, and a follow-up PR will further consolidate the remaining parts. As of today, the changes in this PR offer:

 - significantly better error handling and much more verbose logging (see the example output below)
 - explicit support for IPv6 and dual-stack sockets
 - correct handling of signal interrupts
 - better Windows support

A follow-up PR will consolidate `send`/`recv` logic into `Socket` and fully migrate to non-blocking sockets.
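
As a rough analogue of the dual-stack listening behavior, here is a minimal Python sketch (the actual implementation is C++ inside the `Socket` class; this is illustrative only):
```
import socket

# Listen on the IPv6 wildcard address and also accept IPv4 clients
# (dual-stack), mirroring the "[::]:29501" server in the log below.
srv = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
# Clearing IPV6_V6ONLY enables IPv4-mapped addresses where the OS allows it.
srv.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_V6ONLY, 0)
srv.bind(("::", 29501))
srv.listen()
```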

## Example Output

```
[I logging.h:21] The client socket will attempt to connect to an IPv6 address on (127.0.0.1, 29501).
[I logging.h:21] The client socket is attempting to connect to [localhost]:29501.
[W logging.h:28] The server socket on [localhost]:29501 is not yet listening (Error: 111 - Connection refused), retrying...
[I logging.h:21] The server socket will attempt to listen on an IPv6 address.
[I logging.h:21] The server socket is attempting to listen on [::]:29501.
[I logging.h:21] The server socket has started to listen on [::]:29501.
[I logging.h:21] The client socket will attempt to connect to an IPv6 address on (127.0.0.1, 29501).
[I logging.h:21] The client socket is attempting to connect to [localhost]:29501.
[I logging.h:21] The client socket has connected to [localhost]:29501 on [localhost]:42650.
[I logging.h:21] The server socket on [::]:29501 has accepted a connection from [localhost]:42650.
[I logging.h:21] The client socket has connected to [localhost]:29501 on [localhost]:42722.
[I logging.h:21] The server socket on [::]:29501 has accepted a connection from [localhost]:42722.
[I logging.h:21] The client socket will attempt to connect to an IPv6 address on (127.0.0.1, 29501).
[I logging.h:21] The client socket is attempting to connect to [localhost]:29501.
[I logging.h:21] The client socket has connected to [localhost]:29501 on [localhost]:42724.
[I logging.h:21] The server socket on [::]:29501 has accepted a connection from [localhost]:42724.
[I logging.h:21] The client socket will attempt to connect to an IPv6 address on (127.0.0.1, 29501).
[I logging.h:21] The client socket is attempting to connect to [localhost]:29501.
[I logging.h:21] The client socket has connected to [localhost]:29501 on [localhost]:42726.
[I logging.h:21] The server socket on [::]:29501 has accepted a connection from [localhost]:42726.
```
ghstack-source-id: 143501987

Test Plan: Run existing unit and integration tests on devserver, Fedora, Ubuntu, macOS Big Sur, Windows 10.

Reviewed By: Babar, wilson100hong, mrshenli

Differential Revision: D32372333

fbshipit-source-id: 2204ffa28ed0d3683a9cb3ebe1ea8d92a831325a
2021-11-16 20:49:25 -08:00
4c346bd073 Added forward derivatives for neg, diag, inverse, linalg_eig (#67837)
Summary:
Recreated due to CI failures as per comment https://github.com/pytorch/pytorch/pull/67339#issuecomment-959893293

===

See also discussion in https://github.com/pytorch/pytorch/issues/10223, starting from [this](https://github.com/pytorch/pytorch/issues/10223#issuecomment-949499666) comment

The formulas for the derivatives are taken from https://people.maths.ox.ac.uk/gilesm/files/NA-08-01.pdf.
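
For reference, here is a sketch of the first-order perturbation identities from that paper for `A = V \Lambda V^{-1}` with distinct eigenvalues (sign and conjugation conventions may differ in the final implementation):
```
P = V^{-1}\,(dA)\,V, \qquad
d\Lambda = I \circ P, \qquad
dV = V\,(F \circ P), \qquad
F_{ij} =
\begin{cases}
  (\lambda_j - \lambda_i)^{-1} & i \neq j \\
  0 & i = j
\end{cases}
```
where `\circ` denotes the Hadamard (elementwise) product.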

As indicated, the method linalg_eig_jvp should be used instead of linalg_eig_jvp_eigenvalues and linalg_eig_jvp_eigenvectors in the future. Due to a codegen limitation, this is not yet possible.

CC albanD Lezcano

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67837

Reviewed By: mrshenli

Differential Revision: D32403662

Pulled By: soulitzer

fbshipit-source-id: 529cb93f865ce4cc2e24fa6f672d4234e7abe2b1
2021-11-16 20:32:47 -08:00
aa9ee8d02a [Static Runtime] Avoid copying function objects per StaticRuntime instance (#68368)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68368

Currently, each instance of `StaticRuntime` holds its own copies of the `std::function` objects (wrapped in `ProcessedNode::Function`) that are used to invoke the actual operator implementations.

However, all `StaticRuntime` instances derived from the same `StaticModule` invoke exactly the same op implementations, so this duplication is avoidable.

This change adds a `StaticModule::functions_` member variable that keeps a list of unique `ProcessedFunction` objects. A newly constructed `StaticRuntime` takes pointers to these `ProcessedFunction` objects instead of copying the whole function objects. This can save a substantial amount of memory per `StaticRuntime` instance.

This comes at a small cost in execution time: since a `ProcessedNode` instance now keeps a pointer to the function object, executing a node involves an extra pointer dereference. However, local performance tests showed this cost to be negligible.
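
A Python-flavored sketch of the sharing scheme (the real code is C++; the names here are illustrative only):
```
class StaticModule:
    def __init__(self, processed_functions):
        # one shared table of unique function objects, owned by the module
        self.functions = processed_functions

class StaticRuntime:
    def __init__(self, module, fn_index_per_node):
        self.module = module
        # per node: a cheap reference (here an index) into the shared table,
        # instead of a per-runtime copy of the callable itself
        self.fn_index_per_node = fn_index_per_node

    def run_node(self, node_id, *args):
        # one extra indirection compared to holding the callable inline
        fn = self.module.functions[self.fn_index_per_node[node_id]]
        return fn(*args)
```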

Thanks to hlu1 for proposing this non-intrusive improvement idea :D

Test Plan:
This change reduces the size of a StaticRuntime instance by 14.41% (459KB -> 393KB) for CMF/local, and by 8% for CMF/local_ro (measured by patching D32181666 to print the memory turnover from instantiating a StaticRuntime instance). No noticeable latency regression was observed.

==AFTER

* CMF/local
memory turnover: 393608
latency: PyTorch run finished. Milliseconds per iter: 15.6965. Iters per second: 63.7087

* CMF/local_ro
memory turnover:387288
latency: PyTorch run finished. Milliseconds per iter: 7.51308. Iters per second: 133.101

==BEFORE

* CMF/local
memory turnover: 459888
latency: PyTorch run finished. Milliseconds per iter: 15.8278. Iters per second: 63.18

* CMF/local_ro
memory turnover: 420832
latency: PyTorch run finished. Milliseconds per iter: 7.43756. Iters per second: 134.453

==Confirmation that ptvsc2_predictor_bench reports the same memory management stats for inline_cvr:

==AFTER

Total number of managed tensors: 2660
Total number of managed output tensors: 0
Total number of unmanaged values: 3041
Total memory managed: 1496896 bytes
Total number of reused tensors: 1183
Total number of 'out' variant nodes/total number of nodes: 2452/2469 (99.3115%)

Total number of managed tensors: 1412
Total number of managed output tensors: 0
Total number of unmanaged values: 2677
Total memory managed: 39040 bytes
Total number of reused tensors: 959
Total number of 'out' variant nodes/total number of nodes: 1928/1937 (99.5354%)

Total number of managed tensors: 1293
Total number of managed output tensors: 0
Total number of unmanaged values: 14
Total memory managed: 5293824 bytes
Total number of reused tensors: 771
Total number of 'out' variant nodes/total number of nodes: 1298/1298 (100%)

==BEFORE

Total number of managed tensors: 2660
Total number of managed output tensors: 0
Total number of unmanaged values: 3041
Total memory managed: 1496896 bytes
Total number of reused tensors: 1183
Total number of 'out' variant nodes/total number of nodes: 2452/2469 (99.3115%)

Total number of managed tensors: 1412
Total number of managed output tensors: 0
Total number of unmanaged values: 2677
Total memory managed: 39040 bytes
Total number of reused tensors: 959
Total number of 'out' variant nodes/total number of nodes: 1928/1937 (99.5354%)

Total number of managed tensors: 1293
Total number of managed output tensors: 0
Total number of unmanaged values: 14
Total memory managed: 5293824 bytes
Total number of reused tensors: 771
Total number of 'out' variant nodes/total number of nodes: 1298/1298 (100%)

Reviewed By: swolchok

Differential Revision: D32337548

fbshipit-source-id: e714e735399c93fde337b0f70e203a2de632057a
2021-11-16 20:28:48 -08:00
fd85d925b0 Fix some sign issues (#68361)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68361

Fixes
```
caffe2/aten/src/ATen/FunctionalizeFallbackKernel.cpp:36:31: error: comparison of integers of different signs: 'int64_t' (aka 'long') and 'const unsigned long' [-Werror,-Wsign-compare]
    for (int64_t idx = 0; idx < num_returns; ++idx) {
                          ~~~ ^ ~~~~~~~~~~~
caffe2/aten/src/ATen/native/cuda/Sorting.cpp:87:16: error: comparison of integers of different signs: 'int64_t' (aka 'long') and 'std::vector::size_type' (aka 'unsigned long') [-Werror,-Wsign-compare]
    assert(dim < out_shape.size());
           ~~~ ^ ~~~~~~~~~~~~~~~~
```

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D32433063

fbshipit-source-id: b896dbab81861f3f074e00db73d20d9523037dd1
2021-11-16 20:18:58 -08:00
173c0f8a98 Actually enable PYTORCH_RETRY_TEST_CASES for linux tests (#68486)
Summary:
After noticing that CUDA mem leaks were not rerun, I realized I forgot to pass the env var as a Docker variable.

What a noob mistake.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68486

Reviewed By: malfet, seemethere

Differential Revision: D32477989

Pulled By: janeyx99

fbshipit-source-id: e28d095773f50864ab49229e434187a9ecb004cc
2021-11-16 19:02:03 -08:00
affa3f846c Sparse CSR CPU: add torch.addmm (#65606)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65606

This PR adds a `torch.addmm(c, a, b, beta=1.0, alpha=1.0, out=out)` variant, computing `out = beta * c + alpha * (a @ b)`, with `a, b, c, out` all being sparse CSR tensors on CPU.
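
A usage sketch of the new variant (values chosen arbitrarily; not taken from the PR's tests):
```
import torch

crow = torch.tensor([0, 1, 2])
col = torch.tensor([0, 1])

def csr(vals):
    # 2x2 CSR tensor with one nonzero per row
    return torch.sparse_csr_tensor(crow, col, torch.tensor(vals), size=(2, 2))

a, b, c, out = csr([1., 2.]), csr([3., 4.]), csr([5., 6.]), csr([0., 0.])

# out = beta * c + alpha * (a @ b)
torch.addmm(c, a, b, beta=1.0, alpha=1.0, out=out)
```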

cc nikitaved pearu cpuhrsch IvanYashchuk

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D32366236

Pulled By: cpuhrsch

fbshipit-source-id: e910bcc96eee99d624b80ee881df3887ab3ba5ac
2021-11-16 17:22:46 -08:00
5cfca5524c [JIT] clear GraphFunction.optimized_graphs_ after freezing a module (#68316)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68316

Consider the following:
```
import torch
import torch.nn as nn

class Mod(nn.Module):
    def __init__(self, val):
        super().__init__()
        self.param = nn.Parameter(val)

    def forward(self, x):
        # this method will change during freezing
        return x + self.param

    @torch.jit.export
    def make_prediction(self, x):
        y = x + x
        return self.forward(y)

param = torch.rand([2, 2])

unscripted_mod = Mod(param)
mod = torch.jit.script(unscripted_mod)
mod.eval()
mod = torch.jit.freeze(mod, preserved_attrs=["make_prediction"])
```

During freezing the following will occur:
1. do some pre-freezing, including inlining; in particular, forward will be inlined into make_prediction. During inlining, forward.optimized_graph() is called, and the result is cached
2. freeze some methods. While freezing forward, the graph associated with the function will get updated. The cached optimized_graphs_ are not updated.

Previously, a call to `mod.forward(x)` would return an executor that would run on the old cached optimized_graph(). This would mean that the freezing optimizations would not apply, and potentially that execution would fail because of parameters removed from the module.

This change clears the optimized_graphs_ cache after running freezing to prevent executing an old version of the graph.

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D32410862

Pulled By: davidberard98

fbshipit-source-id: dd8bfe86ec2898b7c72813ab32c08f25c38e4cea
2021-11-16 17:15:29 -08:00
75ccb07b26 [SR] LOG->VLOG (#68477)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68477

We're printing a lot of unnecessary logs in prod. Change these from LOG(INFO) to VLOG(1) so you can easily flip them back for testing.

Test Plan: CI

Reviewed By: ajyu, d1jang

Differential Revision: D32439776

fbshipit-source-id: 40fa57f4eeb6ca0b610008062cc94aed62fb6981
2021-11-16 17:09:52 -08:00
515d9fb2a9 Add OpInfo for torch.histc (#67452)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67452

Reviewed By: davidberard98

Differential Revision: D32453690

Pulled By: saketh-are

fbshipit-source-id: 6311519dc1b2e92a200d0455d32a9c7301a45d51
2021-11-16 13:55:30 -08:00
a8bcfc90f5 fix fsdp overlap flaky test (#68415)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68415

Remove `e4["cpu_iter"]` from the short list, as the CPU may take some time to queue both the compute and the all-gather.
close #68391
ghstack-source-id: 143478769

Test Plan: unit tests

Reviewed By: rohan-varma

Differential Revision: D32457334

fbshipit-source-id: baeedfb628ce4554a1ef365c3a2de27b8884f6d4
2021-11-16 13:52:13 -08:00
27eca2c6fd Revert D32467139: [pytorch][PR] [android][fbjni] Update fbjni to 0.2.2
Test Plan: revert-hammer

Differential Revision:
D32467139 (04056df475)

Original commit changeset: 49e155989d2d

fbshipit-source-id: ce03be3c6f209a6e9969660bd823d5343a7f0615
2021-11-16 13:50:50 -08:00
284758b585 correct NLLLoss parameters default value (#68426)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/17577

Previous:
* `size_average` default: `True`
* `reduce` default: `True`

Present:
* `size_average` default: `None`
* `reduce` default: `None`
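
A minimal sketch of the corrected defaults in practice (not part of the PR diff):
```
import torch
import torch.nn as nn

# both legacy args now documented as defaulting to None; the effective
# behavior is governed by reduction='mean'
loss_fn = nn.NLLLoss()  # size_average=None, reduce=None, reduction='mean'

log_probs = torch.log_softmax(torch.randn(3, 5), dim=1)
target = torch.tensor([1, 0, 4])
print(loss_fn(log_probs, target))
```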

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68426

Reviewed By: VitalyFedyunin

Differential Revision: D32463324

Pulled By: jbschlosser

fbshipit-source-id: 7ba9cd03c9fb6b2f19301e7e39c3c490de17202b
2021-11-16 13:45:52 -08:00
76e9dbb0f4 [torch.fx] add code-gen customizability and support for setting breakpoint in code-gen'd forward() call (#67139)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67139

This diff enables setting a breakpoint in the graph module's generated Python code. See the test plan for usage.

To support this and other similar customizations of the generated code, a code transformer hook is added to `fx.Graph`. This allows flexible customization of `fx.Graph`'s code-gen behavior in composable and functional ways. See the test plan for its usage.

Test Plan:
### Use of `fx.experimental.debug.set_trace`

```
In [2]: from torch.fx.experimental.debug import set_trace

In [3]: set_trace(ttop)
Out[3]:
top(
  (a): Sub()
)

In [4]: ttop(1)
> /data/users/kefeilu/fbsource33/fbcode/buck-out/dev/gen/caffe2/torch/fb/fx2trt/<eval_with_key>.10(6)forward()
(Pdb) l
  1
  2
  3
  4     def forward(self, x):
  5         import pdb; pdb.set_trace()
  6  ->     a = self.a(x);  x = None
  7         getitem = a[0]
  8         getitem_1 = a[0];  a = None
  9         add = getitem + getitem_1;  getitem = getitem_1 = None
 10         return add
 11
(Pdb)
```

### Use of `on_generate_code`

```
In [1]: def insert_pdb(body):
   ...:     return ['import pdb; pdb.set_trace()\n', *body]
   ...:

In [8]: type(ttop)
Out[8]: torch.fx.graph_module.GraphModule.__new__.<locals>.GraphModuleImpl

In [10]: with ttop.graph.on_generate_code(lambda _: insert_pdb):
    ...:     ttop.recompile()
    ...:     print(f"== _on_generate_code should not be None: { ttop.graph._on_generate_code }")
    ...:     print(ttop.code)
    ...:

== _on_generate_code should not be None: <function insert_pdb at 0x7fc9895ddd30>

def forward(self, x):
    import pdb; pdb.set_trace()
    a = self.a(x);  x = None
    getitem = a[0]
    getitem_1 = a[0];  a = None
    add = getitem + getitem_1;  getitem = getitem_1 = None
    return add

In [11]: ttop.graph._on_generate_code  # restored to None

In [12]: ttop(1) # this should drop into pdb
> /data/users/kefeilu/fbsource33/fbcode/buck-out/dev/gen/caffe2/torch/fb/fx2trt/<eval_with_key>.6(6)forward()
(Pdb) l
  1
  2
  3
  4     def forward(self, x):
  5         import pdb; pdb.set_trace()
  6  ->     a = self.a(x);  x = None
  7         getitem = a[0]
  8         getitem_1 = a[0];  a = None
  9         add = getitem + getitem_1;  getitem = getitem_1 = None
 10         return add
 11
```

Reviewed By: jamesr66a

Differential Revision: D30736160

fbshipit-source-id: 9646867aae0461b5131dfd4ba9ee77a8c2ea9c93
2021-11-16 13:28:11 -08:00
8954c92529 [PyTorch][Static Runtime] Borrow outputs in static_runtime::VarTupleUnpack (#68161)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68161

Continuing rollout of borrowing outputs for native ops.
ghstack-source-id: 143424920

Test Plan:
Compare CMF local_ro perf again.

Previous diff:
```
I1110 22:05:23.245435 113949 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.03272. Iters per second: 968.313
I1110 22:05:23.822196 113949 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.06478. Iters per second: 939.163
I1110 22:05:24.395256 113949 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.035. Iters per second: 966.186
I1110 22:05:24.964169 113949 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.02786. Iters per second: 972.898
I1110 22:05:25.536558 113949 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.03205. Iters per second: 968.946
I1110 22:05:26.109027 113949 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.04256. Iters per second: 959.174
I1110 22:05:26.679611 113949 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.03245. Iters per second: 968.567
I1110 22:05:27.253048 113949 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.04493. Iters per second: 957.005
I1110 22:05:27.822629 113949 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.0299. Iters per second: 970.971
I1110 22:05:28.393326 113949 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.03039. Iters per second: 970.509
I1110 22:05:28.393368 113949 PyTorchPredictorBenchLib.cpp:285] Mean milliseconds per iter: 1.03726, standard deviation: 0.0111053
```

This diff:
```
I1110 22:18:48.453075 191647 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 0.931188. Iters per second: 1073.9
I1110 22:18:48.967614 191647 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 0.933196. Iters per second: 1071.59
I1110 22:18:49.483338 191647 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 0.932087. Iters per second: 1072.86
I1110 22:18:49.997144 191647 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 0.930877. Iters per second: 1074.26
I1110 22:18:50.529383 191647 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 0.936981. Iters per second: 1067.26
I1110 22:18:51.085038 191647 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 0.953214. Iters per second: 1049.08
I1110 22:18:51.607192 191647 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 0.940719. Iters per second: 1063.02
I1110 22:18:52.126169 191647 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 0.942638. Iters per second: 1060.85
I1110 22:18:52.644445 191647 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 0.937574. Iters per second: 1066.58
I1110 22:18:53.163486 191647 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 0.941636. Iters per second: 1061.98
I1110 22:18:53.163537 191647 PyTorchPredictorBenchLib.cpp:285] Mean milliseconds per iter: 0.938011, standard deviation: 0.00691196
```

0.099 (9.5%!) usec/iter improvement over previous diff

Reviewed By: hlu1

Differential Revision: D32347900

fbshipit-source-id: 8169ebcadf1248e555a18bbffa99eef6cac1ba85
2021-11-16 12:32:15 -08:00
755be54c77 [PyTorch][Static Runtime] Borrow outputs in static_runtime::dict_unpack (#68160)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68160

This generalizes the mechanism D32318674 added for letting native ops borrow their outputs and uses it in dict_unpack.
ghstack-source-id: 143424919

Test Plan:
A 4.5% win in CMF local_ro compared to D32318674 (the previous two diffs were necessary steps but didn't realize the full win on their own):

```
FastAliasingInSelectTensor, local_ro
========================================
I1110 22:06:37.549811 119627 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.08488. Iters per second: 921.76
I1110 22:06:38.147949 119627 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.08675. Iters per second: 920.171
I1110 22:06:38.766340 119627 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.08626. Iters per second: 920.592
I1110 22:06:39.366608 119627 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.08376. Iters per second: 922.717
I1110 22:06:39.964979 119627 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.08362. Iters per second: 922.833
I1110 22:06:40.565248 119627 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.08423. Iters per second: 922.312
I1110 22:06:41.167326 119627 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.0945. Iters per second: 913.659
I1110 22:06:41.766187 119627 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.08373. Iters per second: 922.742
I1110 22:06:42.367816 119627 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.08995. Iters per second: 917.475
I1110 22:06:42.968391 119627 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.08854. Iters per second: 918.665
I1110 22:06:42.968446 119627 PyTorchPredictorBenchLib.cpp:285] Mean milliseconds per iter: 1.08662, standard deviation: 0.00351662

BorrowDictUnpackOutputs, local_ro
========================================

I1110 22:05:23.245435 113949 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.03272. Iters per second: 968.313
I1110 22:05:23.822196 113949 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.06478. Iters per second: 939.163
I1110 22:05:24.395256 113949 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.035. Iters per second: 966.186
I1110 22:05:24.964169 113949 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.02786. Iters per second: 972.898
I1110 22:05:25.536558 113949 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.03205. Iters per second: 968.946
I1110 22:05:26.109027 113949 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.04256. Iters per second: 959.174
I1110 22:05:26.679611 113949 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.03245. Iters per second: 968.567
I1110 22:05:27.253048 113949 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.04493. Iters per second: 957.005
I1110 22:05:27.822629 113949 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.0299. Iters per second: 970.971
I1110 22:05:28.393326 113949 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.03039. Iters per second: 970.509
I1110 22:05:28.393368 113949 PyTorchPredictorBenchLib.cpp:285] Mean milliseconds per iter: 1.03726, standard deviation: 0.0111053
```

0.04936 (4.5%) usec/iter improvement

Reviewed By: hlu1

Differential Revision: D32347390

fbshipit-source-id: e636ddafacf30ed2a2d84a6e15fff97481342fdb
2021-11-16 12:31:03 -08:00
bbc24222d2 [PyTorch][Static Runtime] Refcount bump pass in native_ops (#68159)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68159

These all look like they'll cause unnecessary refcount bumps to me.
ghstack-source-id: 143424917

Test Plan:
CI

TODO profile local_ro?

Reviewed By: hlu1

Differential Revision: D32347392

fbshipit-source-id: d8ed91b5855b86765db00c61ad3650273302c7b6
2021-11-16 12:27:12 -08:00
86399d8e0c Add histogramdd to torch.rst (#68273)
Summary:
The `torch.histogramdd` operator is documented in `torch/functional.py` but does not appear in the generated docs because it is missing from `docs/source/torch.rst`.
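
For context, a minimal usage sketch of the operator being documented (not part of this PR):
```
import torch

points = torch.rand(100, 2)                  # 100 samples in 2-D
hist, bin_edges = torch.histogramdd(points, bins=[5, 5])
print(hist.shape)                            # torch.Size([5, 5])
print(len(bin_edges), bin_edges[0].shape)    # 2 bin-edge tensors of size 6
```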

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68273

Reviewed By: cpuhrsch

Differential Revision: D32470522

Pulled By: saketh-are

fbshipit-source-id: a23e73ba336415457a30bae568bda80afa4ae3ed
2021-11-16 11:55:40 -08:00
ed00a763a2 [PyTorch] Don't force refcount bump when accessing DictEntryRef key/value (#68158)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68158

to() sometimes returns a reference; let's forward that through.
ghstack-source-id: 143424916

Test Plan: Combined with the following diff, we see a huge drop in dict_unpack self time in the ctr_mobile_feed local_ro net. The following diff by itself didn't work.

Reviewed By: suo

Differential Revision: D32347391

fbshipit-source-id: da96295bf83ea30867a2e3fceedc9b4e0a33ffa3
2021-11-16 11:44:08 -08:00
04056df475 [android][fbjni] Update fbjni to 0.2.2 (#68400)
Summary:
ghstack-source-id: caeb8df3a18a6fa48d591af126ac59d8e41494b5
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68400

Fixes #{issue number}

Updates fbjni version to 0.2.2

ci-all PR: https://github.com/pytorch/pytorch/pull/68401

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68402

Reviewed By: linbinyu

Differential Revision: D32467139

Pulled By: IvanKobzarev

fbshipit-source-id: 49e155989d2dbafedd5b2df77e089e25e8b4f8f8
2021-11-16 11:34:46 -08:00
df129fa8d6 [PyTorch] Support MaybeOwned<IValue> (#68157)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68157

Does what it says on the tin. I don't have a use for `MaybeOwned<IValue>` itself right now, but following diffs will use `MaybeOwnedTraits<IValue>::{create,destroy}Borrow` and I thought it best to just provide the full thing.
ghstack-source-id: 143424915

Test Plan: Extended MaybeOwned tests to cover this.

Reviewed By: hlu1

Differential Revision: D32347393

fbshipit-source-id: 219658cb69b951d36dee80c2ae51387328224866
2021-11-16 11:24:32 -08:00
030ee34216 Add OpInfo for torch.nonzero (#67459)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67459

Reviewed By: davidberard98

Differential Revision: D32453687

Pulled By: saketh-are

fbshipit-source-id: e7ed5601686d88407bf67bca0f75304b30fa7ac5
2021-11-16 11:10:43 -08:00
10e9d80ad1 [PyTorch][Static Runtime] Don't track scalar ivalues (#67702)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67702

This isn't a particularly large optimization and it does
nothing before select_tensor is introduced (I'm surprised that no
operators have optimizable outputs!), but it seems like we should probably get the savings.
ghstack-source-id: 143424918

Test Plan: CI; checked `--do_profile=1` output with the following diff, and we save tracking hundreds of values, as expected.

Reviewed By: hlu1

Differential Revision: D32112522

fbshipit-source-id: 1804b77992a73670bfc1e36af608b852b8261bd2
2021-11-16 11:05:42 -08:00
391be39575 Use reduced precision switch in test_addmm_baddbmm_overflow (#68399)
Summary:
https://github.com/pytorch/pytorch/issues/68125
Checking to see if actually using the switch fixes the test...

CC mruberry ngimel ptrblck

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68399

Reviewed By: VitalyFedyunin

Differential Revision: D32466974

Pulled By: ngimel

fbshipit-source-id: aa8643ed913b344977fd103974625c527d20dbb8
2021-11-16 10:50:17 -08:00
5c3529a86d [lint] small pass to make lint clean (#68367)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68367

- bmm_test.py was using syntax not allowed in 3.6
- Some suppressions were not placed on the correct line.

With this file,
```
lintrunner --paths-cmd='git grep -Il .'
```
passes successfully.

Test Plan: Imported from OSS

Reviewed By: janeyx99, mrshenli

Differential Revision: D32436644

Pulled By: suo

fbshipit-source-id: ae9300c6593d8564fb326822de157d00f4aaa3c2
2021-11-16 10:27:00 -08:00
639258499f [PyTorch][Static Runtime] Add & use "small array" for ProcessedNodeInputs (#67935)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67935

Rationale should be documented in code comments. In short, we
can avoid heap-allocating arrays of input indexes for operators with 5
or fewer inputs, at the cost of a tag bit check on access.
ghstack-source-id: 143429112

Test Plan:
Patched d1jang's D32181666, which prints static runtime memory usage.

Previous diff, local:

```
I1105 12:17:36.459688 866763 PyTorchStaticRuntimePredictor.cpp:82] memory turnover after creating an instance of StaticRuntime: 354208
```

This diff, local:

```
I1105 12:48:35.820663 1066520 PyTorchStaticRuntimePredictor.cpp:82] memory turnover after creating an instance of StaticRuntime: 338064
```
4.5% savings (16144 bytes)

Ran 10 repetitions of CMF local_ro with core pinning: P467095603. This diff is perf neutral compared to the previous diff.

Reviewed By: hlu1

Differential Revision: D32216573

fbshipit-source-id: d18483db255f75f1d90e610ecded7727c6ffe65c
2021-11-16 10:21:12 -08:00
6acde23bec [PyTorch][Static Runtime] Switch input/output repr to 2-byte offsets (#67934)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67934

This reduces the memory requirements of ProcessedNode: by allocating outputs sequentially into a shared array and supporting at most 2**16 - 1 values (current models seem to have 10-20x fewer than that), we only need to store a 2-byte offset into that array and a 2-byte number of outputs in ProcessedNode.
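
A Python-flavored sketch of the layout described above (the real code is C++; the names here are illustrative only):
```
class ValueBlock:
    def __init__(self):
        self.values = []  # one shared array of outputs for the whole graph

class Node:
    def __init__(self, block, num_outputs):
        assert len(block.values) + num_outputs <= 2**16 - 1
        self.offset = len(block.values)   # fits in 2 bytes
        self.num_outputs = num_outputs    # fits in 2 bytes
        block.values.extend([None] * num_outputs)

    def outputs(self, block):
        return block.values[self.offset:self.offset + self.num_outputs]
```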
ghstack-source-id: 143429113

Test Plan:
Patched d1jang's diff to measure memory turnover around SR startup.

Previous diff, CMF local:

```
I1104 12:19:39.900211 597593 PyTorchStaticRuntimePredictor.cpp:82] memory turnover after creating an instance of StaticRuntime: 427120
```

This diff, CMF local:

```
I1105 12:17:36.459688 866763 PyTorchStaticRuntimePredictor.cpp:82] memory turnover after creating an instance of StaticRuntime: 354208
```

72912 bytes (17%) savings

Perf looks neutral; see next diff (D32216573) test plan for details.

Reviewed By: hlu1

Differential Revision: D32190751

fbshipit-source-id: 30c1e2caa9460f0d83b2d9bb24c68ccfcef757cc
2021-11-16 10:19:50 -08:00
8678472ec8 [PyTorch][Static Runtime] Save 2 pointers in ProcessedNode (#67860)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67860

We don't need 8-byte sizes for inputs and outputs, and we only need op names if profiling isn't disabled.
ghstack-source-id: 143429111

Test Plan:
Ran CMF local & local_ro with recordio inputs. I'm calling
the result inconclusive/neutral because I saw some noise (as you'll
see below), but that's fine with me since this is a clear memory win.

```
Nov4Stable, local_ro
========================================
I1104 09:53:08.875444 505783 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.19925. Iters per second: 833.851
I1104 09:53:10.200443 505783 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.1996. Iters per second: 833.608
I1104 09:53:11.524045 505783 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.19746. Iters per second: 835.103
I1104 09:53:12.851861 505783 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.20479. Iters per second: 830.019
I1104 09:53:14.183387 505783 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.20487. Iters per second: 829.964
I1104 09:53:14.183427 505783 PyTorchPredictorBenchLib.cpp:285] Mean milliseconds per iter: 1.2012, standard deviation: 0.00341762

re-ran stable in light of baffling regression (see next entry), and sure enough we still have some significant run-to-run-variation:

I1104 09:56:15.244969 524012 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.24956. Iters per second: 800.28
I1104 09:56:16.621292 524012 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.24776. Iters per second: 801.437
I1104 09:56:18.018808 524012 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.25247. Iters per second: 798.42
I1104 09:56:19.399660 524012 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.25054. Iters per second: 799.656
I1104 09:56:20.781828 524012 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.25052. Iters per second: 799.664
I1104 09:56:20.781878 524012 PyTorchPredictorBenchLib.cpp:285] Mean milliseconds per iter: 1.25017, standard deviation: 0.00171396

Nov4SaveTwoWordsInProcessedNode, local_ro
========================================
I1104 09:53:42.070139 508309 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.2411. Iters per second: 805.736
I1104 09:53:43.438390 508309 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.24102. Iters per second: 805.788
I1104 09:53:44.773303 508309 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.20682. Iters per second: 828.621
I1104 09:53:46.110538 508309 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.21216. Iters per second: 824.973
I1104 09:53:47.448279 508309 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.21265. Iters per second: 824.639
I1104 09:53:47.448334 508309 PyTorchPredictorBenchLib.cpp:285] Mean milliseconds per iter: 1.22275, standard deviation: 0.0168698

early runs look like a glitch, rerunning

I1104 09:54:20.999117 511022 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.24558. Iters per second: 802.841
I1104 09:54:22.376780 511022 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.24436. Iters per second: 803.623
I1104 09:54:23.738584 511022 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.23176. Iters per second: 811.845
I1104 09:54:25.113063 511022 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.24938. Iters per second: 800.395
I1104 09:54:26.476349 511022 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.23552. Iters per second: 809.377
I1104 09:54:26.476395 511022 PyTorchPredictorBenchLib.cpp:285] Mean milliseconds per iter: 1.24132, standard deviation: 0.00737197

Nov4Stable, local
========================================

I1104 09:57:56.854537 533814 PyTorchPredictorBenchLib.cpp:346] memory turnover after getPredictor: 177885632
I1104 09:58:02.829813 533814 PrepareModelInputs.cpp:190] Loaded 696 records.
I1104 09:58:03.010681 533814 PyTorchPredictorBenchLib.cpp:353] memory turnover before benchmarking: 4590507056
I1104 09:58:03.010710 533814 PyTorchPredictorBenchLib.cpp:154] PyTorch predictor: number of prediction threads 1
I1104 09:58:58.839010 533814 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 20.0567. Iters per second: 49.8586
I1104 09:59:54.797755 533814 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 20.1007. Iters per second: 49.7494
I1104 10:00:50.696525 533814 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 20.0657. Iters per second: 49.8363
I1104 10:01:46.514736 533814 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 20.0696. Iters per second: 49.8265
I1104 10:02:42.378270 533814 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 20.0641. Iters per second: 49.8402
I1104 10:02:42.378316 533814 PyTorchPredictorBenchLib.cpp:285] Mean milliseconds per iter: 20.0714, standard deviation: 0.0170605
I1104 10:02:42.378325 533814 PyTorchPredictorBenchLib.cpp:366] memory turnover after benchmarking: 4591882400

Nov4SaveTwoWordsInProcessedNode, local
========================================
I1104 10:38:15.543320 733514 PyTorchPredictorBenchLib.cpp:346] memory turnover after getPredictor: 177721792
I1104 10:38:21.224673 733514 PrepareModelInputs.cpp:190] Loaded 696 records.
I1104 10:38:21.382973 733514 PyTorchPredictorBenchLib.cpp:353] memory turnover before benchmarking: 4590343216
I1104 10:38:21.382992 733514 PyTorchPredictorBenchLib.cpp:154] PyTorch predictor: number of prediction threads 1
I1104 10:39:17.005359 733514 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 19.9498. Iters per second: 50.1257
I1104 10:40:12.545269 733514 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 19.9279. Iters per second: 50.1808
I1104 10:41:08.138119 733514 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 19.999. Iters per second: 50.0026
I1104 10:42:03.686841 733514 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 19.9115. Iters per second: 50.2222
I1104 10:42:55.137498 733539 Proxy2Connection.cpp:343] Received NotRegisteredException from Configerator Proxy2.
I1104 10:42:55.138715 733539 ReadOnlyConnectionIf.h:91] Mark connection as healthy.
I1104 10:42:55.384534 733514 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 19.6297. Iters per second: 50.9433
I1104 10:42:55.384579 733514 PyTorchPredictorBenchLib.cpp:285] Mean milliseconds per iter: 19.8836, standard deviation: 0.14571
I1104 10:42:55.384588 733514 PyTorchPredictorBenchLib.cpp:366] memory turnover after benchmarking: 4591711760
```

Reviewed By: d1jang

Differential Revision: D32177531

fbshipit-source-id: 267e38a151d2dbab34fd648135d173cfbee1c22e
2021-11-16 10:12:53 -08:00
45b2f41c3e [package] fix torchscript classes in package (#68028)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68028

Today, we demangle a typename before passing it to the TorchScript
compiler. This breaks compilation of torch classes in cases where we
attempt to script the same class name from both inside a package and
outside it, since we would return the same qualified name for both.

Differential Revision: D32261907

Test Plan: Imported from OSS

Reviewed By: saketh-are

Pulled By: suo

fbshipit-source-id: 921bc03ad385d94b9279fbc6f3b7dcd0ddbe5bc7
2021-11-16 10:01:40 -08:00
ba16b1eca7 [numpy] Alias arctan2 to atan2 (#67010)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/65906

Adds an alias `arctan2` to improve numpy compatibility
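
A minimal sketch of the alias in use (not part of the PR diff):
```
import torch

y = torch.tensor([1.0, -1.0])
x = torch.tensor([1.0, 1.0])
assert torch.equal(torch.arctan2(y, x), torch.atan2(y, x))
```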

cc mruberry rgommers

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67010

Reviewed By: anjali411

Differential Revision: D32378998

Pulled By: mruberry

fbshipit-source-id: 424c5c10c12b49c20ee83ccd109325c480b5b6cf
2021-11-16 09:41:09 -08:00
6226a3cf74 [Vulkan] Implement permute operator (#68274)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68274

Implemented `permute` operator on the Vulkan backend:
* Supports only <= 4D tensors.
* Builds up shader operations from the output texture point of view to avoid the nondeterministic order of GPU shader operations between texels. See [incoherent memory access](https://www.khronos.org/opengl/wiki/Memory_Model#Incoherent_memory_access)
* Generalized input tensors to 4D ones to simplify input/output texture handling. For example, {2, 3} is treated as {1,1,2,3} internally.
* 1D to 4D inputs with all possible permutations are used for test cases.
* Reference on CPU implementation of `permute` operator: [TensorShape.cpp](cbf596bf8e/aten/src/ATen/native/TensorShape.cpp (L936))
* When shuffling dims, a new depth size for the output texture needs to be determined as `ceil(batch * channel / 4)`. This logic needs to be handled in a separate change.
    * The depth of a texture cannot exceed a certain limit, depending on the device's capability. It is typically 2048 on most Android devices, and at most 16,384 (see [Value distribution for maxImageDimension3D on Android](https://vulkan.gpuinfo.org/displaydevicelimit.php?name=maxImageDimension3D&platform=android)); e.g., it is 2048 on MacOS and the Google Pixel 5.
    * Due to this limitation, the `permute` op needs to throw an exception if the depth of the output texture is greater than or equal to `VkImageFormatProperties.maxExtent.depth`.
    * Otherwise, the following error will occur: `-[MTLTextureDescriptorInternal validateWithDevice:]:1325: failed assertion "Texture Descriptor Validation MTLTextureDescriptor has depth (10664) greater than the maximum allowed size of 2048."`
* Vulkan `permute` operator tensor conversion:
{F679505029}
{F679505223}
* Vulkan `permute` operator shader equation:
{F679504799}
* Error/edge cases:
```
X = torch.randint(0, 23, (2, 3, 2, 2))
O = torch.permute(X, (2, 2, 1, 0))
# RuntimeError: repeated dim in permute

O = torch.permute(X, (2, 1, 0))
# RuntimeError: number of dims don't match in permute

O = torch.permute(X, (4, 3, 2, 1, 0))
# RuntimeError: number of dims don't match in permute

O = torch.permute(X, (3, 2, -1, 0))
# RuntimeError: repeated dim in permute

data2 = [0,1,2]
X2 = torch.tensor(data2)
O2 = torch.permute(X2, (0))  # note: (0) is just the int 0, not a tuple; use (0,)
# TypeError: permute(): argument 'dims' (position 2) must be tuple of ints, not int

O = torch.permute(X, (0, 1, 2, 3))
# no-op: the dims don't change
```
* Shader debug traces with a 4D tensor size {2,3,2,2} with permute by {3,2,1,0}:
```
output tensor:
(1,1,.,.) =
  0.4395  0.5652
  0.1309  0.9768
  0.0490  0.1127
(2,1,.,.) =
  0.7058  0.2238
  0.6542  0.4064
  0.4813  0.0500
(1,2,.,.) =
  0.1716  0.4951
  0.2225  0.3255
  0.0758  0.7150
(2,2,.,.) =
  0.3762  0.0228
  0.6367  0.4411
  0.7682  0.7599
[ CPUFloatType{2,2,3,2} ]

shader debug traces:
src_index:0, b c h w: 0 0 0 0, posIn: (0 0 0) i:0 -> b c h w: 0 0 0 0, dst_index: 0, posOut: (0 0 0) j:0 -> inval[0.439453] outval[0.439453] -> inval[0.439453 0.130859 0.049011 0.564941] outval[0.439453 0.000000 0.000000 0.000000]
src_index:3, b c h w: 1 0 0 0, posIn: (0 0 0) i:3 -> b c h w: 0 0 0 1, dst_index: 0, posOut: (1 0 0) j:0 -> inval[0.564941] outval[0.564941] -> inval[0.439453 0.130859 0.049011 0.564941] outval[0.564941 0.000000 0.000000 0.000000]
src_index:1, b c h w: 0 1 0 0, posIn: (0 0 0) i:1 -> b c h w: 0 0 1 0, dst_index: 0, posOut: (0 1 0) j:0 -> inval[0.130859] outval[0.130859] -> inval[0.439453 0.130859 0.049011 0.564941] outval[0.130859 0.000000 0.000000 0.000000]
src_index:4, b c h w: 1 1 0 0, posIn: (0 0 1) i:0 -> b c h w: 0 0 1 1, dst_index: 0, posOut: (1 1 0) j:0 -> inval[0.976562] outval[0.976562] -> inval[0.976562 0.112671 -65504.000000 -65504.000000] outval[0.976562 0.000000 0.000000 0.000000]
src_index:2, b c h w: 0 2 0 0, posIn: (0 0 0) i:2 -> b c h w: 0 0 2 0, dst_index: 0, posOut: (0 2 0) j:0 -> inval[0.049011] outval[0.049011] -> inval[0.439453 0.130859 0.049011 0.564941] outval[0.049011 0.000000 0.000000 0.000000]
src_index:5, b c h w: 1 2 0 0, posIn: (0 0 1) i:1 -> b c h w: 0 0 2 1, dst_index: 0, posOut: (1 2 0) j:0 -> inval[0.112671] outval[0.112671] -> inval[0.976562 0.112671 -65504.000000 -65504.000000] outval[0.112671 0.000000 0.000000 0.000000]
src_index:0, b c h w: 0 0 1 0, posIn: (0 1 0) i:0 -> b c h w: 0 1 0 0, dst_index: 1, posOut: (0 0 0) j:1 -> inval[0.171509] outval[0.171509] -> inval[0.171509 0.222412 0.075745 0.494873] outval[0.439453 0.171509 0.000000 0.000000]
src_index:3, b c h w: 1 0 1 0, posIn: (0 1 0) i:3 -> b c h w: 0 1 0 1, dst_index: 1, posOut: (1 0 0) j:1 -> inval[0.494873] outval[0.494873] -> inval[0.171509 0.222412 0.075745 0.494873] outval[0.564941 0.494873 0.000000 0.000000]
src_index:1, b c h w: 0 1 1 0, posIn: (0 1 0) i:1 -> b c h w: 0 1 1 0, dst_index: 1, posOut: (0 1 0) j:1 -> inval[0.222412] outval[0.222412] -> inval[0.171509 0.222412 0.075745 0.494873] outval[0.130859 0.222412 0.000000 0.000000]
src_index:4, b c h w: 1 1 1 0, posIn: (0 1 1) i:0 -> b c h w: 0 1 1 1, dst_index: 1, posOut: (1 1 0) j:1 -> inval[0.325439] outval[0.325439] -> inval[0.325439 0.714844 -65504.000000 -65504.000000] outval[0.976562 0.325439 0.000000 0.000000]
src_index:2, b c h w: 0 2 1 0, posIn: (0 1 0) i:2 -> b c h w: 0 1 2 0, dst_index: 1, posOut: (0 2 0) j:1 -> inval[0.075745] outval[0.075745] -> inval[0.171509 0.222412 0.075745 0.494873] outval[0.049011 0.075745 0.000000 0.000000]
src_index:5, b c h w: 1 2 1 0, posIn: (0 1 1) i:1 -> b c h w: 0 1 2 1, dst_index: 1, posOut: (1 2 0) j:1 -> inval[0.714844] outval[0.714844] -> inval[0.325439 0.714844 -65504.000000 -65504.000000] outval[0.112671 0.714844 0.000000 0.000000]
src_index:0, b c h w: 0 0 0 1, posIn: (1 0 0) i:0 -> b c h w: 1 0 0 0, dst_index: 2, posOut: (0 0 0) j:2 -> inval[0.705566] outval[0.705566] -> inval[0.705566 0.653809 0.481201 0.223755] outval[0.439453 0.171509 0.705566 0.000000]
src_index:3, b c h w: 1 0 0 1, posIn: (1 0 0) i:3 -> b c h w: 1 0 0 1, dst_index: 2, posOut: (1 0 0) j:2 -> inval[0.223755] outval[0.223755] -> inval[0.705566 0.653809 0.481201 0.223755] outval[0.564941 0.494873 0.223755 0.000000]
src_index:1, b c h w: 0 1 0 1, posIn: (1 0 0) i:1 -> b c h w: 1 0 1 0, dst_index: 2, posOut: (0 1 0) j:2 -> inval[0.653809] outval[0.653809] -> inval[0.705566 0.653809 0.481201 0.223755] outval[0.130859 0.222412 0.653809 0.000000]
src_index:4, b c h w: 1 1 0 1, posIn: (1 0 1) i:0 -> b c h w: 1 0 1 1, dst_index: 2, posOut: (1 1 0) j:2 -> inval[0.406250] outval[0.406250] -> inval[0.406250 0.049957 -65504.000000 -65504.000000] outval[0.976562 0.325439 0.406250 0.000000]
src_index:2, b c h w: 0 2 0 1, posIn: (1 0 0) i:2 -> b c h w: 1 0 2 0, dst_index: 2, posOut: (0 2 0) j:2 -> inval[0.481201] outval[0.481201] -> inval[0.705566 0.653809 0.481201 0.223755] outval[0.049011 0.075745 0.481201 0.000000]
src_index:5, b c h w: 1 2 0 1, posIn: (1 0 1) i:1 -> b c h w: 1 0 2 1, dst_index: 2, posOut: (1 2 0) j:2 -> inval[0.049957] outval[0.049957] -> inval[0.406250 0.049957 -65504.000000 -65504.000000] outval[0.112671 0.714844 0.049957 0.000000]
src_index:0, b c h w: 0 0 1 1, posIn: (1 1 0) i:0 -> b c h w: 1 1 0 0, dst_index: 3, posOut: (0 0 0) j:3 -> inval[0.376221] outval[0.376221] -> inval[0.376221 0.636719 0.768066 0.022751] outval[0.439453 0.171509 0.705566 0.376221] outval_after[0.439453 0.171509 0.705566 0.376221]
src_index:3, b c h w: 1 0 1 1, posIn: (1 1 0) i:3 -> b c h w: 1 1 0 1, dst_index: 3, posOut: (1 0 0) j:3 -> inval[0.022751] outval[0.022751] -> inval[0.376221 0.636719 0.768066 0.022751] outval[0.564941 0.494873 0.223755 0.022751] outval_after[0.564941 0.494873 0.223755 0.022751]
src_index:1, b c h w: 0 1 1 1, posIn: (1 1 0) i:1 -> b c h w: 1 1 1 0, dst_index: 3, posOut: (0 1 0) j:3 -> inval[0.636719] outval[0.636719] -> inval[0.376221 0.636719 0.768066 0.022751] outval[0.130859 0.222412 0.653809 0.636719] outval_after[0.130859 0.222412 0.653809 0.636719]
src_index:4, b c h w: 1 1 1 1, posIn: (1 1 1) i:0 -> b c h w: 1 1 1 1, dst_index: 3, posOut: (1 1 0) j:3 -> inval[0.440918] outval[0.440918] -> inval[0.440918 0.759766 -65504.000000 -65504.000000] outval[0.976562 0.325439 0.406250 0.440918] outval_after[0.976562 0.325439 0.406250 0.440918]
src_index:2, b c h w: 0 2 1 1, posIn: (1 1 0) i:2 -> b c h w: 1 1 2 0, dst_index: 3, posOut: (0 2 0) j:3 -> inval[0.768066] outval[0.768066] -> inval[0.376221 0.636719 0.768066 0.022751] outval[0.049011 0.075745 0.481201 0.768066] outval_after[0.049011 0.075745 0.481201 0.768066]
src_index:5, b c h w: 1 2 1 1, posIn: (1 1 1) i:1 -> b c h w: 1 1 2 1, dst_index: 3, posOut: (1 2 0) j:3 -> inval[0.759766] outval[0.759766] -> inval[0.440918 0.759766 -65504.000000 -65504.000000] outval[0.112671 0.714844 0.049957 0.759766] outval_after[0.112671 0.714844 0.049957 0.759766]
```

Test Plan:
Build & test on Android:
```
cd ~/fbsource
buck build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 //xplat/caffe2:pt_vulkan_api_test_binAndroid\#android-arm64 --show-output
adb push buck-out/gen/xplat/caffe2/pt_vulkan_api_test_binAndroid\#android-arm64 /data/local/tmp/vulkan_api_test
adb shell "/data/local/tmp/vulkan_api_test"
```
Build & test on MacOS:
```
cd ~/fbsource
buck build //xplat/caffe2:pt_vulkan_api_test_binAppleMac
./buck-out/gen/xplat/caffe2/pt_vulkan_api_test_binAppleMac\#macosx-x86_64
```
Test result on Android (Google Pixel 5):
```
[ RUN      ] VulkanAPITest.permute_2d_success
[       OK ] VulkanAPITest.permute_2d_success (26 ms)
[ RUN      ] VulkanAPITest.permute_3d_success
[       OK ] VulkanAPITest.permute_3d_success (6 ms)
[ RUN      ] VulkanAPITest.permute_4d_success
[       OK ] VulkanAPITest.permute_4d_success (10 ms)
[ RUN      ] VulkanAPITest.permute_4dmclaren_success
[       OK ] VulkanAPITest.permute_4dmclaren_success (1 ms)
[ RUN      ] VulkanAPITest.permute_4dbig_success
[       OK ] VulkanAPITest.permute_4dbig_success (234 ms)
[ RUN      ] VulkanAPITest.permute_negativedims_success
[       OK ] VulkanAPITest.permute_negativedims_success (0 ms)
[ RUN      ] VulkanAPITest.permute_1d_nochange
[       OK ] VulkanAPITest.permute_1d_nochange (0 ms)
[ RUN      ] VulkanAPITest.permute_sameDims_nochange
[       OK ] VulkanAPITest.permute_sameDims_nochange (1 ms)
[ RUN      ] VulkanAPITest.permute_invalidinputs_exceptions
[       OK ] VulkanAPITest.permute_invalidinputs_exceptions (1 ms)
```
Test result on MacOS:
```
[ RUN      ] VulkanAPITest.permute_2d_success
[       OK ] VulkanAPITest.permute_2d_success (154 ms)
[ RUN      ] VulkanAPITest.permute_3d_success
[       OK ] VulkanAPITest.permute_3d_success (13 ms)
[ RUN      ] VulkanAPITest.permute_4d_success
[       OK ] VulkanAPITest.permute_4d_success (33 ms)
[ RUN      ] VulkanAPITest.permute_4dmclaren_success
[       OK ] VulkanAPITest.permute_4dmclaren_success (2 ms)
[ RUN      ] VulkanAPITest.permute_4dbig_success
[       OK ] VulkanAPITest.permute_4dbig_success (251 ms)
[ RUN      ] VulkanAPITest.permute_negativedims_success
[       OK ] VulkanAPITest.permute_negativedims_success (2 ms)
[ RUN      ] VulkanAPITest.permute_1d_nochange
[       OK ] VulkanAPITest.permute_1d_nochange (1 ms)
[ RUN      ] VulkanAPITest.permute_sameDims_nochange
[       OK ] VulkanAPITest.permute_sameDims_nochange (0 ms)
[ RUN      ] VulkanAPITest.permute_invalidinputs_exceptions
[       OK ] VulkanAPITest.permute_invalidinputs_exceptions (2 ms)
```

Reviewed By: SS-JIA

Differential Revision: D32292554

fbshipit-source-id: dbeaee6ff98633022cf34d6da90662d81eac6b0e
2021-11-16 09:27:51 -08:00
bc3d380ed1 Throw error when saving storages that view same data with different type (#66949)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/58970

cc mruberry

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66949

Reviewed By: albanD

Differential Revision: D31926323

Pulled By: anjali411

fbshipit-source-id: f6e7acc0c1968b70a94f9b0b69a32780e8e21a62
2021-11-16 08:44:44 -08:00
bf60c6e71b [JIT] remove prim::SetAttr from list of ops with side effects (#68311)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68311

prim::SetAttr is listed as an op with side effects, but in AliasDb, `analyzeSetAttr` already accounts for its behavior. By removing it from the list of ops with side effects, dead code elimination will work in a few other scenarios.

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D32409510

fbshipit-source-id: 52ed9e19f92afb95c669ad3c2440f72f9515ba4c
2021-11-16 08:39:24 -08:00
add79722dd Correct householder_product docs. (#68335)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68335

When discussing https://github.com/pytorch/pytorch/pull/63880, we
realised that the docs of `householder_product` were not correct. This
PR fixes this.

The new docs are slightly more difficult, but hopefully correct. Note
that this is a LAPACK function in disguise, so the specification is
expected to be more difficult than normal.

cc brianjo mruberry jianyuh nikitaved pearu walterddr IvanYashchuk xwang233 Lezcano

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D32429755

Pulled By: mruberry

fbshipit-source-id: 3ac866d30984adcd9f3b83d7fa9ae7b7ae5d4b53
2021-11-16 07:54:24 -08:00
01a8862582 OpInfo tests for nn.functional.max_pool{n}d. (#68075)
Summary:
As per title.

It is planned to use these tests for fixing issues with the max_unpools' backward methods reported in https://github.com/pytorch/pytorch/issues/67658 and https://github.com/pytorch/pytorch/issues/67657.
max_unpool.backward methods are currently untested and implemented with custom kernels. We can replace these kernels with advanced indexing operations (i.e. `gather`), which are efficient and well tested.
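
As a rough illustration of that idea (a minimal sketch, not the actual kernel replacement), the backward of `max_unpool2d` can be phrased as a `gather` of the output gradient at the stored pooling indices:
```
import torch
import torch.nn.functional as F

# Sketch: the gradient w.r.t. the max_unpool2d input is the output
# gradient read back at the indices that max_pool2d recorded.
x = torch.randn(1, 1, 4, 4)
out, idx = F.max_pool2d(x, kernel_size=2, return_indices=True)
unpooled = F.max_unpool2d(out, idx, kernel_size=2)

grad_out = torch.randn_like(unpooled)
grad_in = grad_out.flatten(2).gather(2, idx.flatten(2)).view_as(out)
print(grad_in.shape)  # torch.Size([1, 1, 2, 2]), same as `out`
```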

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68075

Reviewed By: malfet

Differential Revision: D32308317

Pulled By: mruberry

fbshipit-source-id: 9f91c6e6a9d78c19230e93fc0a3164f4eb7b8ec5
2021-11-16 07:28:32 -08:00
33e9a0b5f6 [Reland] Python tracer. (#68325)
Summary:
There were two issues with the original PR:
1) My assumption that bound C functions could be trusted to stay alive was not valid. I'm still not entirely sure what was dying, but I've just added a cache so that the first time I see a function I collect the repr just like I was already doing with Python functions.

2) `std::regex` is known to be badly broken and prone to segfaults. Because I'm just doing a very simple prefix prune, it's fine to do it manually; see `trimPrefix`. Long term we should move all of PyTorch to `re2` as the internal lint suggests, but CMake is hard and I couldn't get it to work.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68325

Reviewed By: chaekit

Differential Revision: D32432596

Pulled By: robieta

fbshipit-source-id: 06fb4bcdc6933a3e76f6021ca69dc77a467e4b2e
2021-11-15 23:32:49 -08:00
438ca7603f Fix sign comparison issue in Histogram.cpp (#68294)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68294

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D32403821

fbshipit-source-id: cdbf1d83ab02b1e996559e4cfbbe699b7165483a
2021-11-15 23:14:04 -08:00
ec742c65d5 Fix a sign comparison issue in BatchLinearAlgebraLib.cpp (#68293)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68293

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D32403788

fbshipit-source-id: 1afc5e62e7157f144ec36b029ee3bcc6c23d65a1
2021-11-15 23:12:56 -08:00
d541aa8cbe [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D32454757

fbshipit-source-id: ffb46701843245ac040905423eb950902b51951d
2021-11-15 21:54:23 -08:00
27cc11226d make broadcast fastpath the default for currently rolled-out ops (#68365)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68365

title. broadcast fastpath has been running fine for the enabled ops for a while now, so make it the default for these ops.

Test Plan: diff is a no-op, so sandcastle

Differential Revision: D32107847

fbshipit-source-id: b239b127b219985bf7df6a0eea2d879b8e9c79a4
2021-11-15 21:41:57 -08:00
7ee84ad321 Refactoring quantized op tests to combine test classes (#68282)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68282

Combined 3 Dynamic quantized op test classes into 1

Test Plan:
python test/test_quantization.py TestDynamicQuantizedOps

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D32402163

fbshipit-source-id: 696b7ef5d823632941dc7afc95161501445d0e18
2021-11-15 20:47:02 -08:00
065018d812 [pytorch][xros] Ensure all pytorch mobile operators build ok in XROS mode (#68266)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68266
* Use `if...endif` to adjust PyTorch internals towards XROS

Test Plan: CI

Reviewed By: kkosik20

Differential Revision: D32190771

fbshipit-source-id: cce073dea53c2b5681d913321101cd83c6472019
2021-11-15 19:52:45 -08:00
86c1368611 [fx][const fold] Add test/example for skipping quant/dequant pattern (#68378)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68378

Add test/example for skipping quant/dequant pattern

Reviewed By: jfix71

Differential Revision: D32410544

fbshipit-source-id: e63419a01a097e4c570c3861d79d573cabc0b294
2021-11-15 18:49:23 -08:00
722af775c3 [ONNX] ConstantMap setters to update existing value instead of emplace (#67630) (#67812)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67812

`UpdateShape` uses `.emplace(tensorName, shapeValue)`. This will not update `shapeValue` for `tensorName` if such a name already exists in the map. Hence our code cannot correct a shape inference error, even if we infer the shape correctly later.
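
A Python analogue of the bug (`dict.setdefault` mirrors the no-overwrite semantics of `std::map::emplace`; the shapes below are made up):
```
shapes = {"x": (1, 3)}
shapes.setdefault("x", (2, 3))  # emplace-like: existing entry is untouched
assert shapes["x"] == (1, 3)    # the stale shape survives
shapes["x"] = (2, 3)            # assignment-like setter actually updates
assert shapes["x"] == (2, 3)
```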

Test Plan: Imported from OSS

Reviewed By: msaroufim

Differential Revision: D32181300

Pulled By: malfet

fbshipit-source-id: 05c58ad3fdac683ad957996acde8f0ed6341781d

Co-authored-by: BowenBao <bowbao@microsoft.com>
2021-11-15 17:20:07 -08:00
d32efe8bc2 [ONNX] Remove the argument use_external_data_format of export() method entirely. (#67080) (#67811)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67811

* remove the argument use_external_data_format of export() method entirely

Test Plan: Imported from OSS

Reviewed By: msaroufim

Differential Revision: D32181302

Pulled By: malfet

fbshipit-source-id: 4bc1448b7487bb9dfdad4e36008ff5b227fd64a3

Co-authored-by: hwangdeyu <dejack953@outlook.com>
2021-11-15 17:20:04 -08:00
9d25554d45 [ONNX] Allow registration of custom symbolics for aten namespace (#66481) (#67810)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67810

Test Plan: Imported from OSS

Reviewed By: msaroufim

Differential Revision: D32181303

Pulled By: malfet

fbshipit-source-id: af2a715dc554b958fa3f5a7a8ae96cb3f7d112bb
2021-11-15 17:18:39 -08:00
09615cd0b0 Adding Dynamic Conv and ConvT ops/modules (#68176)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68176

It should be noted that for the modules, reduce_range is set to
true by default, in a similar fashion to linear_dynamic.

Test Plan:
python test/test_quantization.py TestDynamicQuantizedModule
python test/test_quantization.py TestDynamicQuantizedConv
python test/test_quantization.py TestQuantizedConv

Imported from OSS

Reviewed By: kimishpatel

Differential Revision: D32374003

fbshipit-source-id: 011562bd0f4d817387d53bb113df2600aa60a7a3
2021-11-15 16:42:25 -08:00
529ebae0ac Bugfix for TorchScript RNN RELU and TANH (#61274)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/28418
Related https://github.com/pytorch/pytorch/issues/32976 but has already been fixed before.

TorchScript handling of GRU and LSTM has been working, but not of RNN (Tanh and ReLU). The reason is that ```Union[Tensor, PackedSequence]``` is not supported by TorchScript. Using ```torch._jit_internal._overload_method``` in ```RNNBase::forward``` does not work, as TorchScript does not seem to use the overloads correctly if the method is inherited by ```RNN```. My solution is to move ```RNNBase::forward``` to ```RNN``` and annotate it using ```torch._jit_internal._overload_method```. LSTM and GRU use their own ```forward``` methods anyway, so there seems to be no problem related to this fix.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61274

Reviewed By: anjali411

Differential Revision: D32374452

Pulled By: malfet

fbshipit-source-id: 77bab2469c01c5dfa5eaab229429724a4172445d

Co-authored-by: Nikita Shulga <nshulga@fb.com>
2021-11-15 16:20:58 -08:00
2fd468e5f8 [jit] Set the graph input types before interpreting the graph during tracing (#68242)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68242

Test Plan: Imported from OSS

Reviewed By: saketh-are

Differential Revision: D32382958

Pulled By: navahgar

fbshipit-source-id: 4e82a604a9ea2046af2755de23944147e618a65f
2021-11-15 15:44:32 -08:00
9ed49449b3 [SR] Add net level record functions (#68091)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68091

Add record functions for recording perf stats on the entire network.

Note that this is backed by the same pre-sampling mechanism as the op record functions, so net-level stats get logged relatively infrequently. (If this is not acceptable, we can forgo pre-sampling at the cost of a little bit of perf; every inference would then require an RNG call.)

Reviewed By: hlu1

Differential Revision: D32296756

fbshipit-source-id: 09ff16c942f3bfc8f4435d6cca2be4a6b8dc6091
2021-11-15 15:39:08 -08:00
0823d18fcd make TSComputation ctor explicit (#68286)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68286

Test Plan: check it compiles

Reviewed By: alanwaketan

Differential Revision: D32402016

fbshipit-source-id: b623afa8831cd906336d7fcafbcbad32f79254b0
2021-11-15 14:58:33 -08:00
7b958fbec4 ci: Build periodic jobs with DEBUG=1 (#67192)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67192

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: albanD, janeyx99

Differential Revision: D31902447

Pulled By: seemethere

fbshipit-source-id: 1d1cca8b5ac84b1c23ab73e2d973bfb7bffa8982
2021-11-15 14:51:06 -08:00
ea0a558487 GHA CI: make the default config use only one GPU (#68382)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/66511

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68382

Reviewed By: mrshenli

Differential Revision: D32441585

Pulled By: janeyx99

fbshipit-source-id: d92407c9bb9e4f740435840b4022e75749d7f0ba
2021-11-15 14:35:49 -08:00
6adbe044e3 Added nearest-exact interpolation mode (#64501)
Summary:
Added "nearest-exact" interpolation mode to fix the issues: https://github.com/pytorch/pytorch/issues/34808 and https://github.com/pytorch/pytorch/issues/62237.

Description:

As we cannot fix "nearest" mode without a large impact on already-trained models, [it was suggested](https://github.com/pytorch/pytorch/pull/64501#pullrequestreview-749771815) to introduce a new mode instead of fixing the existing "nearest" mode.

- New mode "nearest-exact" performs index computation for nearest interpolation to match scikit-image, pillow, TF2 and while "nearest" mode still match opencv INTER_NEAREST, which appears to be buggy, see https://ppwwyyxx.com/blog/2021/Where-are-Pixels/#Libraries.

"nearest":
```
input_index_f32 = output_index * scale
input_index = floor(input_index_f32)
```

"nearest-exact"
```
input_index_f32 = (output_index + 0.5) * scale - 0.5
input_index = round(input_index_f32)
```
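
A small numeric sketch of where the two formulas diverge (downsampling a length-3 input to length 2, so scale = 1.5; plain half-up rounding is used here, which may differ from the kernel's exact rounding mode):
```
import math

scale = 1.5  # input_size / output_size
for out_i in range(2):
    nearest = math.floor(out_i * scale)
    # round-half-up of (out_i + 0.5) * scale - 0.5 equals
    # floor((out_i + 0.5) * scale), since the +0.5 cancels the -0.5.
    nearest_exact = math.floor((out_i + 0.5) * scale)
    print(out_i, nearest, nearest_exact)
# out 0 -> input 0 in both modes
# out 1 -> input 1 ("nearest") vs input 2 ("nearest-exact")
```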

Comparisons with other libs: https://gist.github.com/vfdev-5/a5bd5b1477b1c82a87a0f9e25c727664

PyTorch version | 1.9.0 "nearest" | this PR "nearest" | this PR "nearest-exact"
---|---|---|---
Resize option: | | |
OpenCV INTER_NEAREST result mismatches | 0 | 0 | 10
OpenCV INTER_NEAREST_EXACT result mismatches | 9 | 9 | 9
Scikit-Image result mismatches | 10 | 10 | 0
Pillow result mismatches | 10 | 10 | 7
TensorFlow result mismatches | 10 | 10 | 0
Rescale option: | | |
size mismatches (https://github.com/pytorch/pytorch/issues/62396) | 10 | 10 | 10
OpenCV INTER_NEAREST result mismatches | 3 | 3 | 5
OpenCV INTER_NEAREST_EXACT result mismatches | 3 | 3 | 4
Scikit-Image result mismatches | 4 | 4 | 0
Scipy result mismatches | 4 | 4 | 0
TensorFlow: no such option | - | - | -

Versions:
```
skimage: 0.19.0.dev0
opencv: 4.5.4-dev
scipy: 1.7.2
Pillow: 8.4.0
TensorFlow: 2.7.0
```

Implementations in other libs:

- Pillow:
  - ee079ae67e/src/libImaging/Geometry.c (L889-L899)
  - ee079ae67e/src/libImaging/Geometry.c (L11)
  - `a[2] == 0`

- Scikit-Image :
  - dev v0.19.0 uses scipy ndi.zoom:
    - 38fae50c3f/skimage/transform/_warps.py (L180-L188)
    - 47bb6febaa/scipy/ndimage/src/ni_interpolation.c (L775-L779)
    - 47bb6febaa/scipy/ndimage/src/ni_interpolation.c (L479)

Additionally:
- Updated upsampling tests

cc ezyang gchanan albanD mruberry jbschlosser walterddr fmassa heitorschueroff ppwwyyxx

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64501

Reviewed By: anjali411

Differential Revision: D32361901

Pulled By: jbschlosser

fbshipit-source-id: df906f4d25a2b2180e1942ffbab2cc14600aeed2
2021-11-15 14:28:19 -08:00
e3bcf64ff8 [qnnpack] Remove redundant fp16 dependency (#68011)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68011

`qnnpack/operator.h` introduces a dependency on an external library fp16 via `qnnpack/requantization.h`.
Including `qnnpack/operator.h` in `pytorch_qnnpack.h` makes objects that don't actually require fp16 depend on it indirectly, because they include `pytorch_qnnpack.h`.
This was causing some test and bench targets to fail to build for local and android/arm64 (the only two tried) using cmake.

This diff moves `qnnpack/operator.h` from `pytorch_qnnpack.h` to `qnnpack_func.h`, and explicitly add `qnnpack/operator.h` in `src/conv-prepack.cc`.

Test Plan: Ran all the tests for local on my devserver, and arm64 on Pixel3a.

Reviewed By: salilsdesai

Differential Revision: D32250984

fbshipit-source-id: 21468d8ef79c90e9876dc00da95383180a1031b5
2021-11-15 12:38:44 -08:00
0cf46fb0de [fx2trt] fix a bug in conversion from negative dim to positive dim (#68360)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68360

Added a helper function to do this. Only use `mod` to convert negative dim to positive. Do nothing when it's already positive.

Previously in `getitem`, if we were slicing to the very end, we would get the dimension wrong.
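
A minimal sketch of such a helper (the name `positive_dim` is illustrative, not necessarily what the diff adds):
```
def positive_dim(dim, dim_size):
    # Only use mod to wrap negative dims; do nothing when already positive.
    if dim < 0:
        return dim % dim_size
    return dim

assert positive_dim(-1, 4) == 3   # slicing "to the very end"
assert positive_dim(2, 4) == 2    # already positive: unchanged
```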

Test Plan: Add a unit test

Reviewed By: yinghai, wushirong

Differential Revision: D32432893

fbshipit-source-id: 3c5d6a578d92a15207a5e52802750f9ea7f272a9
2021-11-15 12:30:50 -08:00
549e014963 [docs] fix torch.histc's min/max arg types (#64191)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/31475. `torch.histc` accepts Scalar min/max. The docs erroneously specified their types as int.

cc brianjo mruberry

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64191

Reviewed By: mrshenli

Differential Revision: D32437279

Pulled By: saketh-are

fbshipit-source-id: e6017e9236d815abd818dcd44e27819611666823
2021-11-15 12:29:25 -08:00
ccd9675569 [lint] Disable modernize-use-nodiscard (#68354)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68354

Lint rule: https://clang.llvm.org/extra/clang-tidy/checks/modernize-use-nodiscard.html

This check adds a ton of noise to our diffs. `[[nodiscard]]` is typically only useful when ignoring the return value of a function is a critical error, e.g. for `operator new`.

Test Plan: Verified that the lint does not get triggered

Reviewed By: hlu1

Differential Revision: D32429731

fbshipit-source-id: ca3d90686ec8d419d3f96167140dc406df6f4a53
2021-11-15 12:11:08 -08:00
c697eeba72 [JIT] Combine concat nodes where possible (#67000)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67000

See the [related issue](https://github.com/pytorch/pytorch/issues/66654) for context.

This new JIT optimization transforms patterns like this:
```
%inputs.1 : Tensor[] = prim::ListConstruct(%a, %b, %c)
%concat.1 : Tensor = aten::cat(%inputs, %dim)
%inputs.2 : Tensor[] = prim::ListConstruct(%x, %concat.1, %y)
%concat.2 : Tensor = aten::cat(%inputs.2, %dim)
```
into this:
```
%inputs.2 : Tensor[] = prim::ListConstruct(%x, %a, %b, %c, %y)
%concat.2 : Tensor = aten::cat(%inputs.2, %dim)
```
(it can do this for chains of `aten::cat` longer than 2 as well)

A few conditions have to hold:
1.  The `dim`s have to match.
2. `inputs.1` and `inputs.2` cannot be mutated

Test Plan: `buck test caffe2/test/cpp/jit:jit -- ConcatOpt`

Reviewed By: d1jang

Differential Revision: D31819491

fbshipit-source-id: 9f1a501d52099eb1a630b5dd906df4c38c3817ba
2021-11-15 12:02:45 -08:00
30cda0b28c [bugfix] functionalization pass for view ops without a 'self' first argument (#68339)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68339

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D32429570

Pulled By: bdhirsh

fbshipit-source-id: e6df243c508c2ba2ca1df7a53fa68f32db454f32
2021-11-15 11:58:21 -08:00
5b05983497 [bugfix] fix two edge cases in functionalization (#68269)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68269

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D32396357

Pulled By: bdhirsh

fbshipit-source-id: 1d374b693f3f526d027104cbdc08b8bbe9d38307
2021-11-15 11:58:18 -08:00
12026124cc Avoid the view for mkldnn case in 1D convolution (#68166)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/68034

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68166

Reviewed By: mrshenli

Differential Revision: D32432444

Pulled By: jbschlosser

fbshipit-source-id: fc4e626d497d9e4597628a18eb89b94518bb3b33
2021-11-15 11:56:45 -08:00
56024e91c9 GHA: Enable flaky test reporting by setting PYTORCH_RETRY_TEST_CASES=1 (#68300)
Summary:
Enables https://github.com/pytorch/pytorch/issues/68150 in CI

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68300

Reviewed By: seemethere

Differential Revision: D32435332

Pulled By: janeyx99

fbshipit-source-id: 155018afaf73d5a24d13d358879361468ec7b18e
2021-11-15 11:23:55 -08:00
24b60b2cbf [lint] lintrunner fixes/improvements (#68292)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68292

- noqa was typo-d to be the same as type: ignore
- generalize clang-tidy initialization and use it for clang_format as well
- Add a script that lets you update the binaries in s3 relatively easily

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D32403934

Pulled By: suo

fbshipit-source-id: 4e21b22605216f013d87d636a205707ca8e0af36
2021-11-15 11:08:26 -08:00
43874d79e7 Fix failing test due to a bug in NumPy when using OpenBLAS (#67679)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67679


Fixes https://github.com/pytorch/pytorch/issues/67675

cc mruberry

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D32368698

Pulled By: mruberry

fbshipit-source-id: 3ea6ebc43c061af2f376cdf5da06884859bbbf53
2021-11-15 08:25:12 -08:00
d1c529bd0b replace platform specific CI environment variables with generic ones (#68133)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/59478

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68133

Reviewed By: saketh-are

Differential Revision: D32401080

Pulled By: atalman

fbshipit-source-id: 057a34a56f8a2d324f4d1ea07da3a09772177897
2021-11-15 07:02:44 -08:00
1c0d6ff835 [fx][const fold] Allow to set up a function to modify const_nodes for split_const_subgraphs (#67784)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67784

FX models generate quant/dequant layers for INT8 explicit-mode support. However, if the inputs of the quant/dequant layers are constant, the layers will be put into the constant subgraph and optimized out. Hence TensorRT will fail to parse the leftover graph. It is better to set up an optional function (skip_folding_node_fn) to skip folding nodes for split_const_subgraphs.
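
A hedged usage sketch (the module and the skipping predicate below are illustrative; only `skip_folding_node_fn` comes from this diff's description):
```
import operator
import torch
from torch.fx.experimental.const_fold import split_const_subgraphs

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.w = torch.nn.Parameter(torch.randn(3))

    def forward(self, x):
        return x + (self.w + 1)  # (self.w + 1) is a foldable const subgraph

def skip_adds(node):
    # Illustrative predicate, analogous to skipping quant/dequant nodes.
    return node.target in (operator.add, torch.add)

folded = split_const_subgraphs(torch.fx.symbolic_trace(M()),
                               skip_folding_node_fn=skip_adds)
```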

Reviewed By: jfix71

Differential Revision: D32076970

fbshipit-source-id: 7dcbb4f02386f8c831d09a2f0e40bcdba904471c
2021-11-15 06:51:19 -08:00
4c87aa77d1 [DataPipe] Traverse DataPipe graph excluding primitive and callable (#67783)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67783

Add `getstate_hook` to exclude primitive objects and callables during serialization when `exclude_primitive` is enabled for `traverse`.
For graph traversal, we don't have to handle lambdas and the like.
This is used by `OnDiskCacheHolder` to trace the DataPipe graph.

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D32146697

Pulled By: ejguan

fbshipit-source-id: 03b2ce981bb21066e807f57c167b77b2d0e0ce61
2021-11-15 06:46:31 -08:00
1adeeabdc0 Fix trt tuple(Dims) throwing issue (#68318)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68318

Adding an `__iter__` binding so that `tuple(Dims)` can construct the right iterator and know where to stop, instead of relying on trial and error with exception catching. We should upstream this to https://github.com/NVIDIA/TensorRT. cc: wushirong

I did try a very similar `__iter__` fix previously, but I'm not sure why it wasn't effective...
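
A pure-Python sketch of the difference (`DimsLike` stands in for the pybind `Dims` class):
```
class DimsLike:
    def __init__(self, vals):
        self._vals = vals
    def __getitem__(self, i):
        return self._vals[i]  # tuple() probes indices until an exception

class DimsWithIter(DimsLike):
    def __iter__(self):
        return iter(self._vals)  # tuple() knows exactly where to stop

print(tuple(DimsLike([1, 2, 3])))      # works only via exception catching
print(tuple(DimsWithIter([1, 2, 3])))  # clean, bounded iteration
```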

Reviewed By: kflu, wushirong

Differential Revision: D32412430

fbshipit-source-id: 6390a1275dc34ef498acf933bb96f636c15baf41
2021-11-13 19:48:46 -08:00
be281fc597 Check for None in torch.jit.Graph.create (#68253)
Summary:
...because we don't like segfaults from Python (see test).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68253

Reviewed By: suo

Differential Revision: D32396747

Pulled By: gmagogsfm

fbshipit-source-id: a0925e8479702766e88176280985a63bc79e4f6a
2021-11-13 11:30:33 -08:00
6fb8ebcd92 [tensorexp] Add strides to Buf (#68018)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68018

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D32262381

Pulled By: IvanKobzarev

fbshipit-source-id: dba79add0bf703bc2378d64e726d4c47ec30e3be
2021-11-13 08:33:01 -08:00
f7366ca51b implemented quantize_per_tensor_dynamic and added a corresponding test script (#68004)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68004

Test Plan: Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D32301792

Pulled By: dzdang

fbshipit-source-id: f680557ba4736d095efc33e8c92111265f25aee0
2021-11-13 06:34:36 -08:00
cb14a258a2 [c10d] Fix object-based collectives for debug mode (#68223)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68223

DETAIL debug mode didn't work with object-based collectives for the NCCL backend, because we'd only check whether the backend is NCCL and then move tensors to CUDA.

Instead, check if it is a wrapped PG, and then check the wrapped pg to see if it's NCCL.
ghstack-source-id: 143242023
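
A hedged sketch of that check (the wrapper attribute and class names here are assumptions, not the actual torch.distributed internals):
```
def is_nccl_backend(pg):
    # Unwrap a ProcessGroupWrapper (used in DETAIL debug mode) first,
    # then inspect the underlying process group's type.
    wrapped = getattr(pg, "wrapped_pg", None)  # attribute name assumed
    if wrapped is not None:
        pg = wrapped
    return type(pg).__name__ == "ProcessGroupNCCL"
```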

Test Plan: CI

Reviewed By: zhaojuanmao

Differential Revision: D32366840

fbshipit-source-id: be0a2af6849f8f24446593f4a4fbea4a67586ee5
2021-11-13 04:18:31 -08:00
ec94bb787a [TensorExpr] Add a way to define target triple/cpu/attrs for llvm codegen and turn on the AOT workflow. (#66527)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66527

Differential Revision: D31593869

Test Plan: Imported from OSS

Reviewed By: navahgar

Pulled By: ZolotukhinM

fbshipit-source-id: e7534c11fbcf0dab5f49d01d6053caf77b833ef0
2021-11-13 00:52:20 -08:00
52e93fca2c [TensorExpr] Fix some TE python bindings. (#68232)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68232

Differential Revision: D32380676

Test Plan: Imported from OSS

Reviewed By: saketh-are

Pulled By: ZolotukhinM

fbshipit-source-id: 9287a2c765a53b45ac04d625cc010f5384a8bddf
2021-11-13 00:52:18 -08:00
e511a7a5b4 [TensorExpr] Remove non-determinism in iterating over unordered_set of intermediate buffers. (#68277)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68277

Differential Revision: D32400553

Test Plan: Imported from OSS

Reviewed By: saketh-are, priyaramani

Pulled By: ZolotukhinM

fbshipit-source-id: a8fe820bbddaa19f95db432efaa6d3e36095a05e
2021-11-13 00:50:57 -08:00
80339e85c5 Fix disabling bot with subprocessing (#68290)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/68270

Tested locally + tests get disabled properly

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68290

Reviewed By: mrshenli

Differential Revision: D32403956

Pulled By: janeyx99

fbshipit-source-id: 86629daa86f83f6777f2279524ef973af51046b9
2021-11-12 19:56:17 -08:00
282221c5d6 Fuse unsqueeze, cat, sum for inline_cvr (#68289)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68289

Fuse the unsqueeze+cat+sum op pattern into an add op

Reviewed By: jfix71

Differential Revision: D31769197

fbshipit-source-id: 184b3c8217f2ad9fab9ac8d3c91cd33cf7e7de30
2021-11-12 18:20:11 -08:00
48c8de45b0 [ONNX] Remove the argument example_outputs of export() method entirely. (#67082) (#67809)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67809

* remove the argument example_outputs of the export() method entirely

[ONNX] Follow-up: Remove the argument example_outputs of export() method entirely. (#67629)

* Resolve CI failure

* remove test after removing example_outputs

[ONNX] Follow-up: Follow-up: Remove the argument example_outputs of export() method entirely (#67719)

Removing unused import, resolving flake error.

Test Plan: Imported from OSS

Reviewed By: msaroufim

Differential Revision: D32181305

Pulled By: malfet

fbshipit-source-id: ba00547b7cb455ace86606b1bda643c02bdcfa1b

Co-authored-by: hwangdeyu <dejack953@outlook.com>
2021-11-12 17:06:26 -08:00
a8b93cb3ec More aggressively market functorch.vmap when torch.vmap gets called (#67347)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67347

This PR:
- changes the warning when torch.vmap gets called to suggest using
functorch.vmap
- changes the warning when a batching rule isn't implemented to suggest
using functorch.vmap

Test Plan: - test/test_vmap.py

Reviewed By: H-Huang

Differential Revision: D31966603

Pulled By: zou3519

fbshipit-source-id: b01dc1c2e298ce899b4a3a5fb333222a8d5bfb56
2021-11-12 16:10:16 -08:00
da5ffe752a Add reporting for flaky tests in CI (#68150)
Summary:
This PR does NOT change how signal is displayed in CI but rather just reports stats of flaky tests to RDS. **None of the below will be enabled after landing this PR--it will be done in a separate PR with environment variables.**

We report flaky test stats when a test fails at first but at least one of its up-to-MAX_NUM_RETRIES reruns succeeds.
For tests that fail all the reruns, we assume it is a real test failure.
For tests that succeed the first time, we do not rerun the test, even if it was previously noted as flaky.
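
A minimal sketch of that classification rule (MAX_NUM_RETRIES and the labels are illustrative, not the actual implementation):
```
MAX_NUM_RETRIES = 3

def classify(first_run_failed, rerun_passed):
    if not first_run_failed:
        return "passed"  # first-time successes are never rerun
    if any(rerun_passed):
        return "flaky"   # at least one rerun succeeded: report stats
    return "failed"      # every rerun failed: assume a real failure

print(classify(True, [False, True, False]))       # flaky
print(classify(True, [False] * MAX_NUM_RETRIES))  # failed
```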

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68150

Test Plan:
First, I modified:
test_async_python to always fail (will be our "failing test")
test_async_future_type_python to fail 40% of the time
test_async_script_capture to fail 60% of the time

Then, running `python test/test_jit.py -v -k test_async` while setting IN_CI to 1:
```
(pytorch) janeyx@janeyx-mbp pytorch % python test/test_jit.py -v -k test_async
...

Running tests...
----------------------------------------------------------------------
  test_async_future_type_python (jit.test_async.TestAsync) ... ok (0.004s)
  test_async_grad_guard_no_grad (jit.test_async.TestAsync) ... ok (0.020s)
  test_async_grad_guard_with_grad (jit.test_async.TestAsync) ... ok (0.008s)
  test_async_kwargs (jit.test_async.TestAsync) ... ok (0.045s)
  test_async_parsing (jit.test_async.TestAsync) ... ok (0.010s)
  test_async_python (jit.test_async.TestAsync) ... FAIL (0.003s)
    test_async_python failed - num_retries_left: 3
  test_async_python (jit.test_async.TestAsync) ... FAIL (0.003s)
    test_async_python failed - num_retries_left: 2
  test_async_python (jit.test_async.TestAsync) ... FAIL (0.003s)
    test_async_python failed - num_retries_left: 1
  test_async_python (jit.test_async.TestAsync) ... FAIL (0.003s)
    test_async_python failed - num_retries_left: 0
  test_async_script (jit.test_async.TestAsync) ... ok (0.008s)
  test_async_script_capture (jit.test_async.TestAsync) ... FAIL (0.010s)
    test_async_script_capture failed - num_retries_left: 3
  test_async_script_capture (jit.test_async.TestAsync) ... FAIL (0.010s)
    test_async_script_capture failed - num_retries_left: 2
  test_async_script_capture (jit.test_async.TestAsync) ... ok (0.011s)
    test_async_script_capture succeeded - num_retries_left: 1
  test_async_script_capture (jit.test_async.TestAsync) ... FAIL (0.010s)
    test_async_script_capture failed - num_retries_left: 0
  test_async_script_error (jit.test_async.TestAsync) ... ok (0.040s)
  test_async_script_multi_forks (jit.test_async.TestAsync) ... ok (0.025s)
  test_async_script_multi_waits (jit.test_async.TestAsync) ... ok (0.009s)
...

======================================================================
FAIL [0.003s]: test_async_python (jit.test_async.TestAsync)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/janeyx/pytorch/test/jit/test_async.py", line 30, in test_async_python
    self.assertTrue(False)
AssertionError: False is not true

======================================================================
FAIL [0.010s]: test_async_script_capture (jit.test_async.TestAsync)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/janeyx/pytorch/test/jit/test_async.py", line 123, in test_async_script_capture
    self.assertTrue(False)
AssertionError: False is not true

----------------------------------------------------------------------
Ran 28 tests in 0.399s

FAILED (failures=2, expected failures=5, unexpected successes=1)
```
Yielding this as the test report (I changed the extension from xml to txt so it uploads here):
[TEST-jit.test_async.TestAsync-20211110222055.txt](https://github.com/pytorch/pytorch/files/7517532/TEST-jit.test_async.TestAsync-20211110222055.txt)

And then running print_test_stats correctly excludes the always-failing test `test_async_python` and calculates red and green appropriately:
```
(pytorch) janeyx@janeyx-mbp pytorch % python tools/stats/print_test_stats.py test-reports/python-unittest/test.test_jit
[scribe] Not invoking RDS lambda outside GitHub Actions:
[{'create_table': {'table_name': 'flaky_tests', 'fields': {'name': 'string', 'suite': 'string', 'file': 'string', 'num_green': 'int', 'num_red': 'int', 'pr': 'string', 'ref': 'string', 'branch': 'string', 'workflow_id': 'string', 'build_environment': 'string'}}}]
[scribe] Writing for None
[scribe] Wrote stats for flaky_tests
[scribe] Not invoking RDS lambda outside GitHub Actions:
[{'write': {'table_name': 'flaky_tests', 'values': {'name': 'test_async_script_capture', 'suite': 'jit.test_async.TestAsync', 'file': 'test/test_jit', 'num_green': 1, 'num_red': 3, 'pr': None, 'ref': None, 'branch': None, 'workflow_id': None, 'build_environment': 'linux-xenial-gcc5.4-py3'}}}]
(pytorch) janeyx@janeyx-mbp pytorch %
```

-------------------
If you're curious, I also included the code for the case where we would like to override the report_only feature and hide flaky signal in CI. The results for the same test command correctly still fail the test suite, but mark the flaky test_async_future_type_python as passed:
```
(pytorch) janeyx@janeyx-mbp pytorch % python test/test_jit.py -v -k test_async
...

Running tests...
----------------------------------------------------------------------
  test_async_future_type_python (jit.test_async.TestAsync) ... FAIL (0.004s)
    test_async_future_type_python failed - num_retries_left: 3
  test_async_future_type_python (jit.test_async.TestAsync) ... ok (0.001s)
  test_async_grad_guard_no_grad (jit.test_async.TestAsync) ... ok (0.017s)
  test_async_grad_guard_with_grad (jit.test_async.TestAsync) ... ok (0.008s)
  test_async_kwargs (jit.test_async.TestAsync) ... ok (0.091s)
  test_async_parsing (jit.test_async.TestAsync) ... ok (0.010s)
  test_async_python (jit.test_async.TestAsync) ... FAIL (0.003s)
    test_async_python failed - num_retries_left: 3
  test_async_python (jit.test_async.TestAsync) ... FAIL (0.003s)
    test_async_python failed - num_retries_left: 2
  test_async_python (jit.test_async.TestAsync) ... FAIL (0.004s)
    test_async_python failed - num_retries_left: 1
  test_async_python (jit.test_async.TestAsync) ... FAIL (0.003s)
    test_async_python failed - num_retries_left: 0
  test_async_script (jit.test_async.TestAsync) ... ok (0.008s)
  test_async_script_capture (jit.test_async.TestAsync) ... ok (0.011s)
  test_async_script_error (jit.test_async.TestAsync) ... ok (0.039s)
...

======================================================================
FAIL [0.003s]: test_async_python (jit.test_async.TestAsync)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/janeyx/pytorch/test/jit/test_async.py", line 30, in test_async_python
    self.assertTrue(False)
AssertionError: False is not true

----------------------------------------------------------------------
Ran 26 tests in 0.390s

FAILED (failures=1, expected failures=4)
```
With test reports:
[TEST-jit.test_async.TestAsync-20211110224810.txt](https://github.com/pytorch/pytorch/files/7517663/TEST-jit.test_async.TestAsync-20211110224810.txt)
And running print_test_stats:
```
(pytorch) janeyx@janeyx-mbp pytorch % python tools/stats/print_test_stats.py test-reports/python-unittest/test.test_jit
[scribe] Not invoking RDS lambda outside GitHub Actions:
[{'create_table': {'table_name': 'flaky_tests', 'fields': {'name': 'string', 'suite': 'string', 'file': 'string', 'num_green': 'int', 'num_red': 'int', 'pr': 'string', 'ref': 'string', 'branch': 'string', 'workflow_id': 'string', 'build_environment': 'string'}}}]
[scribe] Writing for None
[scribe] Wrote stats for flaky_tests
[scribe] Not invoking RDS lambda outside GitHub Actions:
[{'write': {'table_name': 'flaky_tests', 'values': {'name': 'test_async_future_type_python', 'suite': 'jit.test_async.TestAsync', 'file': 'test/test_jit', 'num_green': 1, 'num_red': 1, 'pr': None, 'ref': None, 'branch': None, 'workflow_id': None, 'build_environment': 'linux-xenial-gcc5.4-py3'}}}]
```

Reviewed By: saketh-are

Differential Revision: D32393907

Pulled By: janeyx99

fbshipit-source-id: 37df890481ab84c62809c022dc6338b50972899c
2021-11-12 15:03:14 -08:00
8bf150f21b Revert D32178667: [pytorch][PR] Python tracer for profiler
Test Plan: revert-hammer

Differential Revision:
D32178667 (33353fb828)

Original commit changeset: 118547104a7d

fbshipit-source-id: 47510607589fc39c730ba913f47c01a7d107b7b0
2021-11-12 14:53:52 -08:00
a82e51a7ae Move some cub templates out of the header file (#67650)
Summary:
Cub routines are both expensive to compile and used in multiple
different operators throughout the cuda folder. So, it makes sense to
compile them in one centralized place where possible (i.e. when
custom operators aren't involved).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67650

Reviewed By: mruberry

Differential Revision: D32259660

Pulled By: ngimel

fbshipit-source-id: 5f7dbdb134297e1ffdc1c7fc5aefee70a2fa5422
2021-11-12 13:51:11 -08:00
6ddaf3bd37 [LT] Upstream TsNode, TsNodeLowering, TsLoweringContext (#68154)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68154

Test Plan: added a basic test; cover more by using lazy_tensor_staging tests

Reviewed By: Krovatkin, alanwaketan

Differential Revision: D32224303

fbshipit-source-id: ac3e1161229b8ae60fdb15ffa72e17072b595914
2021-11-12 12:57:20 -08:00
f6e45102d2 [quant][embedding qat] Support non-partial functions in qconfig comparison (#68067)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68067

Embedding QAT uses a NoopObserver class for activations and a
FakeQuant for weights; make sure that qconfig comparison
functions properly for a mix of partial functions and classes in
the qconfig.
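
A minimal sketch of the comparison logic (not the actual torch.ao implementation):
```
from functools import partial

def qconfig_entry_equal(a, b):
    # Two partials are equal if they wrap the same callable with the
    # same bound arguments; otherwise fall back to plain equality.
    if isinstance(a, partial) and isinstance(b, partial):
        return (a.func == b.func and a.args == b.args
                and a.keywords == b.keywords)
    return a == b

obs = partial(dict, bits=8)                             # stand-in observer
assert qconfig_entry_equal(obs, partial(dict, bits=8))  # equal partials
assert qconfig_entry_equal(int, int)                    # equal classes
assert not qconfig_entry_equal(obs, int)                # mixed kinds
```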

Test Plan:
`pytest test/quantization/eager/test_quantize_eager_qat.py  -v -k "test_embedding_qat_qconfig_equal"`

Imported from OSS

Reviewed By: HDCharles

Differential Revision: D32318434

fbshipit-source-id: c036eef9cbabe7c247745930501328e9c75a8cb0
2021-11-12 12:48:00 -08:00
66b52d5b49 [TensorExpr] Convert linear_clamp_run to using schema in NNC lowerings. (#66523)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66523

Differential Revision: D31590857

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Pulled By: ZolotukhinM

fbshipit-source-id: da8a7d68c8a4cf74c3f622b8a3af54d00ffb14a6
2021-11-12 12:26:06 -08:00
06e8cb9e04 Manually Disabling two TestDistBackendWithSpawn tests on ROCm, test_ddp_profiling_torch_profiler and test_ddp_sync_bn_training_vs_eval (#68255)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68255

Manually disabling these two tests because they can't be disabled via Probot.

See the issues #68222 and #68173 for details.

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang jeffdaily sunway513 jithunnair-amd ROCmSupport KyleCZH

Test Plan: Imported from OSS

Reviewed By: malfet, saketh-are

Differential Revision: D32390899

Pulled By: NivekT

fbshipit-source-id: bd4996d73014337a9175b20ae67a3880ee994699
2021-11-12 12:04:21 -08:00
33353fb828 Python tracer for profiler (#67407)
Summary:
This PR instruments the CPython interpreter and integrates the resulting trace into the PyTorch profiler.

The python tracing logic works by enabling `PyEval_SetProfile`, and then logging the minimal information to track every time python calls or returns from a function. A great deal of care has gone into keeping this process very lightweight; the `RawEvent` struct is only two words and doesn't do anything fancy. When a python function is called, we have to do extra work. If the call is to `nn.Module.__call__`, we simply incref to extend the life of the module. Otherwise we check if we have seen the function before, and if not go through the (somewhat expensive) task of saving the strings which we then cache.
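
A pure-Python analogue of the hook (the real tracer uses the C-level `PyEval_SetProfile`, which is much cheaper; this sketch only shows the shape of the call/return event stream):
```
import sys

events = []

def profiler(frame, event, arg):
    # Record the minimal call/return stream that gets replayed later.
    if event in ("call", "return", "c_call", "c_return"):
        events.append((event, frame.f_code.co_name))

sys.setprofile(profiler)
sum(range(3))
sys.setprofile(None)
print(events)
```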

To actually get a useful timeline, we have to replay the events to determine the state of the python stack at any given point. A second round of stack replay is needed to figure out what the last python function was for each torch op so we can reconstruct the correct python stack. All of this is done during post processing, so while we want to be reasonably performant it is no longer imperative to shave every last bit.

I still need to do a bit of refinement (particularly where the tracer interfaces with the profiler), but this should give a good sense of the general structure.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67407

Test Plan:
```
import torch

class MyModule(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(2, 2)
        self.relu = torch.nn.ReLU()

    def forward(self, x):
        x = self.linear(x)
        return self.relu(x)

def call_module():
    m = MyModule()
    for _ in range(4):
        m(torch.ones((2, 2)))

def top_level_fn():
    with torch.profiler.profile(with_stack=True) as p:
        call_module()

    p.export_chrome_trace("test_trace.json")

top_level_fn()
```
<img width="1043" alt="Screen Shot 2021-10-27 at 6 43 18 PM" src="https://user-images.githubusercontent.com/13089297/139171803-f95e70f3-24aa-45e6-9d4b-6d437a3f108d.png">

PS: I've tried to comment liberally, particularly around some of the more magical parts. However I do plan on doing another linting and commenting pass. Hopefully it's not too bad right now.

Reviewed By: gdankel, chaekit

Differential Revision: D32178667

Pulled By: robieta

fbshipit-source-id: 118547104a7d887e830f17b94d3a29ee4f8c482f
2021-11-12 11:58:12 -08:00
96d116fec2 [JIT] Add additional debug output when op cannot be found in AliasDb (#68099)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68099

When an op in the graph cannot be matched to any known ops, alias_analysis.cpp throws an error.

Before:
```
RuntimeError: 0INTERNAL ASSERT FAILED at "../torch/csrc/jit/ir/alias_analysis.cpp":612, please report a bug to PyTorch. We don't have an op for aten::add but it isn't a special case. Argument types: Tensor, float, Tensor,
```

After:
```
RuntimeError: 0INTERNAL ASSERT FAILED at "../torch/csrc/jit/ir/alias_analysis.cpp":612, please report a bug to PyTorch. We don't have an op for aten::add but it isn't a special case.  Argument types: Tensor, float, Tensor,

Candidates:
        aten::add.Tensor(Tensor self, Tensor other, *, Scalar alpha=1) -> (Tensor)
        aten::add.Scalar(Tensor self, Scalar other, Scalar alpha=1) -> (Tensor)
        aten::add.out(Tensor self, Tensor other, *, Scalar alpha=1, Tensor(a!) out) -> (Tensor(a!))
        aten::add.t(t[] a, t[] b) -> (t[])
        aten::add.str(str a, str b) -> (str)
        aten::add.int(int a, int b) -> (int)
        aten::add.complex(complex a, complex b) -> (complex)
        aten::add.float(float a, float b) -> (float)
        aten::add.int_complex(int a, complex b) -> (complex)
        aten::add.complex_int(complex a, int b) -> (complex)
        aten::add.float_complex(float a, complex b) -> (complex)
        aten::add.complex_float(complex a, float b) -> (complex)
        aten::add.int_float(int a, float b) -> (float)
        aten::add.float_int(float a, int b) -> (float)
        aten::add(Scalar a, Scalar b) -> (Scalar)
```

Test Plan:
Run
```
import torch

if __name__ == '__main__':
    ir = """
graph(%x : Tensor,
      %y : Tensor):
  %2 : float = prim::Constant[value=1.2]()
  %result : Tensor= aten::add(%x, %2, %y)
  return (%result)
"""
    x = torch.tensor([[1., 2.], [3., 4.]])
    y = torch.tensor([[2., 1.], [2., 1.]])
    graph = torch._C.parse_ir(ir)
    print(graph)
    graph.alias_db().analyze()
    # print(script(x, y))
```

to get the results above

Imported from OSS

Reviewed By: anjali411

Differential Revision: D32339639

fbshipit-source-id: a79a3c2f157154b5fb1e3f33a23e43b7884e8e38
2021-11-12 08:39:41 -08:00
98bab78e11 Revert D32039318: [pytorch][PR] Bump dlpack.h to latest version
Test Plan: revert-hammer

Differential Revision:
D32039318 (d049772538)

Original commit changeset: 7dfc653e1e77

fbshipit-source-id: 0d4b1af7381a2638ca9f3c3af26c2ff0b7bd7469
2021-11-12 08:20:21 -08:00
5c3a9f3fdc adding opinfo for torch.nn.bilinear and torch.nn.glu (#67478)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67478

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D32027807

Pulled By: mikaylagawarecki

fbshipit-source-id: 501057cc9aced19fca26c4294fe81dcbb4b83a26
2021-11-12 08:13:15 -08:00
dc24503a89 Fix Hash(c10::Scalar), account for garbage data in union (#68201)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68201

Hash(c10::Scalar) made a bad assumption that it was valid to just hash over all the bytes of data of the c10::Scalar struct.

Because c10::Scalar stores a union of different (float/int/complex) types with different sizes, not all bytes are valid in all cases.  Hash() should only read the bytes corresponding to the currently active type.
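
A small illustration of the failure mode (nothing here reflects the real c10::Scalar layout; it only shows why hashing inactive bytes is wrong):
```
import struct

a = struct.pack("d", 1.5) + b"\x00" * 8  # double + zeroed tail bytes
b = struct.pack("d", 1.5) + b"\xff" * 8  # same double, garbage tail bytes

assert a != b          # a whole-struct hash would see different values
assert a[:8] == b[:8]  # hashing only the active member's bytes agrees
```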

Test Plan: Added new unit tests.  Verified HashTest.Scalar failed with the original Hash() impl and then fixed.

Reviewed By: alanwaketan

Differential Revision: D32367564

fbshipit-source-id: ac30dd4f6dd0513954986d3d23c0c11ba802c37b
2021-11-12 07:20:08 -08:00
0bd0a67c4f [lint][fbcode/caffe2] CLANGFORMAT
Test Plan:
Proof of coverage:

```
$ hg files fbcode/caffe2 |
  arc linttool debugfilterpaths --take CLANGFORMAT --path-match-only > ~/before.txt

$ hg up this_diff

$ hg files fbcode/caffe2 |
  arc linttool debugfilterpaths --take CLANGFORMAT --path-match-only > ~/after.txt

$ comm -3 ~/before.txt ~/after.txt | pastry
P467377980: https://www.internalfb.com/intern/paste/P467377980/
```

These files lost coverage:

- `fbcode/caffe2/torch/abi-check.cpp`
- `fbcode/caffe2/torch/custom_class.h`
- `fbcode/caffe2/torch/custom_class_detail.h`
- `fbcode/caffe2/torch/deploy.h`
- `fbcode/caffe2/torch/extension.h`
- `fbcode/caffe2/torch/library.h`
- `fbcode/caffe2/torch/script.h`

Everything else in P467377980 gained coverage.

Reviewed By: suo

Differential Revision: D32364856

fbshipit-source-id: 9b3ba3350ecdb50038412a24af5e0da0fe4d69b8
2021-11-12 05:12:39 -08:00
e795315c63 Changes and fixes to prepare for dynamic conv (#68175)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68175

This slightly alters the way from_float works so it will work
with placeholder observers. It also fixes a bug with ConvTranspose3d and
ConvTranspose1d where parameters like kernel_size, stride, etc.
weren't set properly. New tests were added to check for this type of
issue as well.

Test Plan:
python test/test_quantization.py TestQuantizedOps
python test/test_quantization.py TestStaticQuantizedModule

Imported from OSS

Reviewed By: z-a-f

Differential Revision: D32374004

fbshipit-source-id: caaa548d12d433d9c1fa0abc8597a7d31bb4e8af
2021-11-11 23:55:04 -08:00
1181628d85 BE: Use TORCH_CHECK instead of explicit c10::Error (#68187)
Summary:
`if (cond) { raise c10::error("", msg)}` is identical to `TORCH_CHECK(!cond, msg);`, but with better attribution

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68187

Reviewed By: xuzhao9

Differential Revision: D32360956

Pulled By: malfet

fbshipit-source-id: e554b99926d7ad0c79a1cd54d35f47339fa2429d
2021-11-11 22:01:41 -08:00
799ebce3aa Add algo recorder/replayer to lower.py (#68194)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68194

Add algorithm recorder/replayer to lower.py

Reviewed By: yinghai

Differential Revision: D31909575

fbshipit-source-id: 552f2ba4fbd6ea646316f6412d55416a76e1f69a
2021-11-11 21:22:22 -08:00
613c1aca6d Adds support for automated error and warning testing (#67354)
Summary:
Adds a new class `ErrorOrWarningInput` that is a `SampleInput` with some additional metadata for validating that `SampleInput` throws the desired warning or error. The architecture to support these new tests is modeled after the existing reference tests and sample input functions.

Existing invalid input tests for neg and kthvalue are ported to the new scheme to validate it.

There may be a simpler/clearer naming scheme we can use here.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67354

Reviewed By: jbschlosser

Differential Revision: D31989888

Pulled By: mruberry

fbshipit-source-id: 4fa816e1e8d0eef21b81c2f80813d42b2c26714e
2021-11-11 19:28:47 -08:00
89d556f648 add VS extension in doc (#63944)
Summary:
add VS  extension in https://pytorch.org/cppdocs/installing.html

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63944

Reviewed By: malfet

Differential Revision: D30546156

Pulled By: seemethere

fbshipit-source-id: a65448d8702f9fd400c9dd2ef2d9f961f30c4983
2021-11-11 18:02:08 -08:00
9cb65df79f [Static Runtime] Fallback to disabling manage_output_tensors instead of crashing when wrong API is used (#67939)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67939

With `manage_output_tensor` enabled, a client of `StaticRuntime` is required to call it via `PyTorchPredictor::predict_managed_result`. If the client uses `PyTorchPredictor::operator()` instead, the client will experience a crash (intended behavior, so as not to leak the memory of managed output tensors). This mistake can cause a catastrophic failure in production if it happens (by gatekeeper, config changes, etc).

Considering the complexity in how `PyTorchPredictor` is used in different settings, the chances that this bug hits production are non-zero.

This change introduces `StaticRuntime::disableManageOutputTensor` to disable the `manage_output_tensor` feature, instead of crashing, when a client mistakenly uses `PyTorchPredictor::operator()`. When `StaticRuntime` is invoked via `PyTorchPredictor::operator()`, it first calls `StaticRuntime::disableManageOutputTensor` to disable the feature, so that it can get non-managed output tensors to pass to the client safely.

A slight perf degradation is expected from forcefully disabling `manage_output_tensors`, but the robustness gain outweighs the risk of a catastrophic failure from crashing at a high rate.

Test Plan: Added a unittest `StaticRuntime, DisableManageOutputTensors` to cover the newly added code.

Reviewed By: swolchok

Differential Revision: D32219731

fbshipit-source-id: caf5c910b34726c570e17435ede7d888443e90cf
2021-11-11 17:31:07 -08:00
3dc0754c53 [pytorch][mobile] deprecate the LLVM-based static analyzer (#68180)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68180

Since we've open sourced the tracing-based selective build, we can deprecate the
op-dependency-graph-based selective build and the static analyzer tool that
produces the dependency graph.
ghstack-source-id: 143108377

Test Plan: CIs

Reviewed By: seemethere

Differential Revision: D32358467

fbshipit-source-id: c61523706b85a49361416da2230ec1b035b8b99c
2021-11-11 16:37:08 -08:00
301369a774 [PyTorch][Fix] Pass the arguments of embedding as named arguments (#67574)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67574

While adding the optional params for the sharded embedding op, we found that we cannot get these params from the `__torch_function__` override. The reason is that we don't pass them via keyword arguments. So maybe we want to change them to kwargs?
ghstack-source-id: 143029375

Test Plan: CI

Reviewed By: albanD

Differential Revision: D32039152

fbshipit-source-id: c7e598e49eddbabff6e11e3f8cb0818f57c839f6
2021-11-11 15:22:10 -08:00
9571eb599c [lint] fix up clangtidy lintrunner integration (#68192)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68192

- Run on exactly the same stuff as the existing linter checks.
- Exclude deploy interpreter headers from being reported.

Test Plan: Imported from OSS

Reviewed By: janeyx99

Differential Revision: D32364023

Pulled By: suo

fbshipit-source-id: c27eca4a802534875d609d004fa9f6fca59ae6a5
2021-11-11 14:53:28 -08:00
6afb414c21 Nan in linalg eig (#67544)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/61251. As per the comment here (https://github.com/pytorch/pytorch/issues/61251#issuecomment-954676082), a consensus has been reached to raise an error if there is a NaN value in the input when calling `eig()`. This PR implements that feature.

cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67544

Reviewed By: malfet

Differential Revision: D32310919

Pulled By: mruberry

fbshipit-source-id: fc74a1ae2d929157c2d4c9051e3e9a4bf03dd5be
2021-11-11 14:33:49 -08:00
d049772538 Bump dlpack.h to latest version (#65047)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/64995

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65047

Reviewed By: ngimel

Differential Revision: D32039318

Pulled By: mruberry

fbshipit-source-id: 7dfc653e1e77799d1f26a95fa9bbae3c7ffc887c
2021-11-11 14:02:16 -08:00
0420545639 Enable all dtype combinations in torch.Tensor.view(dtype) (#66493)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/29013

Note: This PR does not enable autograd. This can be done in a future PR.
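
A quick example of what this enables (assuming a build that includes this change; previously only same-size dtype combinations were allowed):
```
import torch

t = torch.tensor([1.0, 2.0])        # float32, 8 bytes of storage
print(t.view(torch.int16).shape)    # torch.Size([4]): 2x smaller dtype
print(t.view(torch.float64).shape)  # torch.Size([1]): 2x larger dtype
```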

cc mruberry rgommers

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66493

Reviewed By: gchanan

Differential Revision: D32314680

Pulled By: mruberry

fbshipit-source-id: 69d325573b2331f32b83c05c91ffbe80571e7ae2
2021-11-11 13:55:21 -08:00
f9ea41f257 Fixes spelling error writeable to writable, improves warning, and documentation (#67664)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/46741
pytorchbot

contributors: nickleus27, yanivsagy, and khanhthien123

SmrutiSikha this is mostly your work.  We just did very minor clean up.

cc mruberry

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67664

Reviewed By: gchanan

Differential Revision: D32311838

Pulled By: mruberry

fbshipit-source-id: 0e5d4d888caeccb0fd7c80e6ff11b1b1fa8e00d6
2021-11-11 13:05:00 -08:00
1e8f836c44 Remove OpInfo non-contig inputs (#67677)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67677

This follows
https://github.com/pytorch/pytorch/issues/63341#issuecomment-899690614

Fixes https://github.com/pytorch/pytorch/issues/67012

Note. I wrote the OpInfo for `index_fill`, so removing those inputs in
there is right. kshitij12345 mentioned that the same thing is true for
the inputs for tile / repeat.
https://github.com/pytorch/pytorch/issues/67012#issuecomment-948537446

There are more uses of `transpose` within the OpInfos, but most of them
are for testing `mm` and `baddmm`. I did not touch those, as those
operations are so important that it won't hurt to test those more
thoroughly.

cc mruberry

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D32311729

Pulled By: mruberry

fbshipit-source-id: ac0804ca6f893118046b3e1bd97b5a2e6b900b59
2021-11-11 13:03:16 -08:00
4fe3965b3a Fix dtype arg typing for Tensor.type doc string (#67019)
Summary:
Fix typing error in PyCharm when using torch.Tensor.type(dtype=torch.int64)

<img width="386" alt="Screenshot 2021-10-21 at 15 30 50" src="https://user-images.githubusercontent.com/59562934/138288062-cc2ba45e-ece0-4fca-9369-55d020404c28.png">

Thanks for your great work! :)

cc brianjo mruberry

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67019

Reviewed By: malfet

Differential Revision: D32311313

Pulled By: mruberry

fbshipit-source-id: 90fc453bc4129a301d567d4b39137b93c5dac01e
2021-11-11 12:58:46 -08:00
b07a11929d Array API: Add torch.linalg.cross (#63285)
Summary:
### Create `linalg.cross`

Fixes https://github.com/pytorch/pytorch/issues/62810

As discussed in the corresponding issue, this PR adds `cross` to the `linalg` namespace (**Note**: There is no method variant) which is slightly different in behaviour compared to `torch.cross`.

**Note**: this is NOT an alias as suggested in mruberry's [https://github.com/pytorch/pytorch/issues/62810 comment](https://github.com/pytorch/pytorch/issues/62810#issuecomment-897504372) below
> linalg.cross being consistent with the Python Array API (over NumPy) makes sense because NumPy has no linalg.cross. I also think we can implement linalg.cross without immediately deprecating torch.cross, although we should definitely refer users to linalg.cross. Deprecating torch.cross will require additional review. While it's not used often it is used, and it's unclear if users are relying on its unique behavior or not.

The current default implementation of `torch.cross` is extremely weird and confusing. This has also been reported multiple times previously. (See https://github.com/pytorch/pytorch/issues/17229, https://github.com/pytorch/pytorch/issues/39310, https://github.com/pytorch/pytorch/issues/41850, https://github.com/pytorch/pytorch/issues/50273)

- [x] Add `torch.linalg.cross` with default `dim=-1`
- [x] Add OpInfo and other tests for `torch.linalg.cross`
- [x] Add broadcasting support to `torch.cross` and `torch.linalg.cross`
- [x] Remove out skip from `torch.cross` OpInfo
- [x] Add docs for `torch.linalg.cross`. Improve docs for `torch.cross` mentioning `linalg.cross` and the difference between the two. Also adds a warning to `torch.cross`, that it may change in the future (we might want to deprecate it later)

 ---

### Additional Fixes to `torch.cross`
- [x] Fix Doc for Tensor.cross
- [x] Fix torch.cross in `torch/overridres.py`

While working on `linalg.cross` I noticed these small issues with `torch.cross` itself.

[Tensor.cross docs](https://pytorch.org/docs/stable/generated/torch.Tensor.cross.html) still mentions `dim=-1` default which is actually wrong. It should be `dim=None` after the behaviour was updated in PR https://github.com/pytorch/pytorch/issues/17582 but the documentation for the `method` or `function` variant wasn’t updated. Later PR https://github.com/pytorch/pytorch/issues/41850 updated the documentation for the `function` variant i.e `torch.cross` and also added the following warning about the weird behaviour.
> If `dim` is not given, it defaults to the first dimension found with the size 3. Note that this might be unexpected.

But still, the `Tensor.cross` docs were missed and remained outdated. I’m finally fixing that here. Also fixing `torch/overrides.py` for `torch.cross` as well now, with `dim=None`.

To verify according to the docs the default behaviour of `dim=-1` should raise, you can try the following.

```python
a = torch.randn(3, 4)
b = torch.randn(3, 4)
b.cross(a)  # works: the implementation finds size 3 in the first dimension, so the documented dim=-1 default is not the actual behaviour
>>> tensor([[ 0.7171, -1.1059,  0.4162,  1.3026],
        [ 0.4320, -2.1591, -1.1423,  1.2314],
        [-0.6034, -1.6592, -0.8016,  1.6467]])

b.cross(a, dim=-1)  # this raises as expected since the last dimension doesn't have a 3
>>> RuntimeError: dimension -1 does not have size 3
```
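
For contrast, a minimal sketch of the new `torch.linalg.cross` with its `dim=-1` default (shapes here are illustrative):

```python
import torch

a = torch.randn(4, 3)
b = torch.randn(4, 3)

# linalg.cross defaults to dim=-1, so the last dimension must have size 3;
# there is no implicit search for a size-3 dimension as in torch.cross.
c = torch.linalg.cross(a, b)
print(c.shape)  # torch.Size([4, 3])
```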

Please take a closer look (particularly the autograd part, this is the first time I'm dealing with `derivatives.yaml`). If there is something missing, wrong or needs more explanation, please let me know. Looking forward to the feedback.

cc mruberry Lezcano IvanYashchuk rgommers

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63285

Reviewed By: gchanan

Differential Revision: D32313346

Pulled By: mruberry

fbshipit-source-id: e68c2687c57367274e8ddb7ef28ee92dcd4c9f2c
2021-11-11 12:49:41 -08:00
40bedf6206 Fix test_triangular_solve testcase enumeration (#67635)
Summary:
Use itertools.product instead of zip to cover all combinations of cases.
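
A minimal illustration (with made-up parameter lists) of why `product` covers more cases than `zip`:

```python
from itertools import product

sizes = [1, 3]          # made-up test parameters
uppers = [True, False]

# zip pairs elements positionally, enumerating only 2 cases:
print(list(zip(sizes, uppers)))      # [(1, True), (3, False)]

# product enumerates the full Cartesian grid of 4 cases:
print(list(product(sizes, uppers)))  # [(1, True), (1, False), (3, True), (3, False)]
```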

cc mruberry

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67635

Reviewed By: malfet

Differential Revision: D32310956

Pulled By: mruberry

fbshipit-source-id: 806c3313e2db26d77199d3145b2d5283b6ca3617
2021-11-11 12:49:38 -08:00
db014b8529 Add set_deterministic_debug_mode and get_deterministic_debug_mode (#67778)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/67386

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67778

Reviewed By: ngimel

Differential Revision: D32310661

Pulled By: mruberry

fbshipit-source-id: 300129e96ca51c22fa711182ce6a9f4d4d2ce57f
2021-11-11 12:48:29 -08:00
cd4e31ff21 [LTC] Add some comments to BackendDevice() (#68156)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68156

[skip ci]

Test Plan: Imported from OSS

Reviewed By: wconstab

Differential Revision: D32346302

Pulled By: alanwaketan

fbshipit-source-id: 06de6afbe2f937511abce485b24cec0a85bfbe97
2021-11-11 12:43:56 -08:00
7b376bf844 Remove ProcessGroup from TensorPipeAgent initialization (#68128)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68128

Reland of D31762735 (0cbfd466d2).

This diff was originally reverted due to failure in test_send_export_type_through_rpc_with_custom_pickler.

I updated rpc_pickler_test.py to prevent a race condition where processes were not registering their pickler before handling their rpc_sync calls.

Test Plan:
rpc_pickler_test file:

buck test mode/dev-nosan -c 'cxx.coverage_only=caffe2' //caffe2/torch/fb/training_toolkit/backend/metrics/tests:rpc_pickler_test //caffe2/torch/fb/training_toolkit/backend/metrics/collectors/fbdata_aggregator/tests:batch_collector_test -- --run-disabled --collect-coverage '--code-coverage-session=test_session' --force-tpx

rpc_pickler stress test:

buck test mode/dev-nosan -c 'cxx.coverage_only=caffe2' //caffe2/torch/fb/training_toolkit/backend/metrics/tests:rpc_pickler_test -- --exact 'caffe2/torch/fb/training_toolkit/backend/metrics/tests:rpc_pickler_test - test_send_export_type_through_rpc_with_custom_pickler (caffe2.torch.fb.training_toolkit.backend.metrics.tests.rpc_pickler_test.CythonTypeRpcSpawnTest)' --run-disabled --collect-coverage '--code-coverage-session=test_session' --force-tpx --jobs 18 --stress-runs 10 --record-results

Reviewed By: mrshenli

Differential Revision: D32316077

fbshipit-source-id: e58de2335fbaa3ab46d46fe222c659197633a5e4
2021-11-11 12:28:55 -08:00
b473ca999b [lint] add cmakelint to lintrunner (#68191)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68191

+ fix filename of exec_linter

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D32364022

Pulled By: suo

fbshipit-source-id: 740892d9580edc348c3e818664fd37f145669fda
2021-11-11 12:19:01 -08:00
6cade3362b [fx-acc] add optimize_noop graph opt (#68131)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68131

Ports EliminateNoop to FX

Adds optimization for a few more ops and cases than the glow version
* `acc_ops.dequantize`
* `acc_ops.flatten`
* `acc_ops.(max|min)_full_reduce`
* `acc_ops.permute`
* `acc_ops.reshape`
* `acc_ops.squeeze`
* `acc_ops.to_dtype`

Already covered by either constant fold or custom mapper
* acc_ops.slice_tensor
* acc_ops.getitem

Bug fix
* If `-1` is used in reshape's `shape` argument, we convert this inferred value to the actual positive value; this requires integer division, otherwise we get a float in the shape tuple. Existing unit tests didn't cover this because `unittest.TestCase.assertEqual(1, 1.0)` doesn't check types and returns `True`.
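
A hypothetical sketch of the inference step (names are illustrative, not the actual fx-acc code):

```python
import math

def infer_shape(shape, numel):
    # Replace a single -1 entry with the inferred dimension.
    known = math.prod(d for d in shape if d != -1)
    # Using "/" here would put a float (e.g. 4.0) into the shape tuple;
    # "//" keeps the inferred dimension an int.
    return tuple(d if d != -1 else numel // known for d in shape)

print(infer_shape((2, -1), 8))  # (2, 4)
assert infer_shape((2, -1), 8) == (2, 4)
```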

Test Plan:
# Graph Opt
`buck test mode/opt glow/fb/fx/graph_opts:test_fx_graph_opts -- TestEliminateNoOp`
```
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: 95c17eb9-cd4d-463a-96c8-358ca3679d56
Trace available for this run at /tmp/tpx-20211105-144929.801413/trace.log
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/5629499609900775
    ✓ ListingSuccess: glow/fb/fx/graph_opts:test_fx_graph_opts - main (4.873)
    ✓ Pass: glow/fb/fx/graph_opts:test_fx_graph_opts - test_eliminate_noop_01_noop_dequantize (glow.fb.fx.graph_opts.tests.test_fx_graph_opts.TestEliminateNoOp) (0.032)
    ✓ Pass: glow/fb/fx/graph_opts:test_fx_graph_opts - test_eliminate_noop_02_flatten (glow.fb.fx.graph_opts.tests.test_fx_graph_opts.TestEliminateNoOp) (0.048)
    ✓ Pass: glow/fb/fx/graph_opts:test_fx_graph_opts - test_eliminate_noop_12_tile (glow.fb.fx.graph_opts.tests.test_fx_graph_opts.TestEliminateNoOp) (0.081)
    ✓ Pass: glow/fb/fx/graph_opts:test_fx_graph_opts - test_eliminate_noop_15_to_dtype (glow.fb.fx.graph_opts.tests.test_fx_graph_opts.TestEliminateNoOp) (0.022)
    ✓ Pass: glow/fb/fx/graph_opts:test_fx_graph_opts - test_eliminate_noop_20_cat (glow.fb.fx.graph_opts.tests.test_fx_graph_opts.TestEliminateNoOp) (0.126)
    ✓ Pass: glow/fb/fx/graph_opts:test_fx_graph_opts - test_eliminate_noop_18_max_pool2d (glow.fb.fx.graph_opts.tests.test_fx_graph_opts.TestEliminateNoOp) (0.183)
    ✓ Pass: glow/fb/fx/graph_opts:test_fx_graph_opts - test_eliminate_noop_08_reshape (glow.fb.fx.graph_opts.tests.test_fx_graph_opts.TestEliminateNoOp) (0.034)
    ✓ Pass: glow/fb/fx/graph_opts:test_fx_graph_opts - test_eliminate_noop_16_avg_pool2d (glow.fb.fx.graph_opts.tests.test_fx_graph_opts.TestEliminateNoOp) (0.183)
    ✓ Pass: glow/fb/fx/graph_opts:test_fx_graph_opts - test_eliminate_noop_10_squeeze (glow.fb.fx.graph_opts.tests.test_fx_graph_opts.TestEliminateNoOp) (0.048)
    ✓ Pass: glow/fb/fx/graph_opts:test_fx_graph_opts - test_eliminate_noop_06_min_full_reduce (glow.fb.fx.graph_opts.tests.test_fx_graph_opts.TestEliminateNoOp) (0.038)
    ✓ Pass: glow/fb/fx/graph_opts:test_fx_graph_opts - test_eliminate_noop_09_noop_reshape (glow.fb.fx.graph_opts.tests.test_fx_graph_opts.TestEliminateNoOp) (0.055)
    ✓ Pass: glow/fb/fx/graph_opts:test_fx_graph_opts - test_eliminate_noop_00_identity (glow.fb.fx.graph_opts.tests.test_fx_graph_opts.TestEliminateNoOp) (0.025)
    ✓ Pass: glow/fb/fx/graph_opts:test_fx_graph_opts - test_eliminate_noop_04_max_full_reduce (glow.fb.fx.graph_opts.tests.test_fx_graph_opts.TestEliminateNoOp) (0.037)
    ✓ Pass: glow/fb/fx/graph_opts:test_fx_graph_opts - test_eliminate_noop_21_noop_cat (glow.fb.fx.graph_opts.tests.test_fx_graph_opts.TestEliminateNoOp) (0.037)
    ✓ Pass: glow/fb/fx/graph_opts:test_fx_graph_opts - test_eliminate_noop_03_noop_flatten (glow.fb.fx.graph_opts.tests.test_fx_graph_opts.TestEliminateNoOp) (0.040)
    ✓ Pass: glow/fb/fx/graph_opts:test_fx_graph_opts - test_eliminate_noop_19_noop_max_pool2d (glow.fb.fx.graph_opts.tests.test_fx_graph_opts.TestEliminateNoOp) (0.135)
    ✓ Pass: glow/fb/fx/graph_opts:test_fx_graph_opts - test_eliminate_noop_11_noop_squeeze (glow.fb.fx.graph_opts.tests.test_fx_graph_opts.TestEliminateNoOp) (0.036)
    ✓ Pass: glow/fb/fx/graph_opts:test_fx_graph_opts - test_eliminate_noop_14_to_dtype (glow.fb.fx.graph_opts.tests.test_fx_graph_opts.TestEliminateNoOp) (0.024)
    ✓ Pass: glow/fb/fx/graph_opts:test_fx_graph_opts - test_eliminate_noop_17_noop_avg_pool2d (glow.fb.fx.graph_opts.tests.test_fx_graph_opts.TestEliminateNoOp) (0.114)
    ✓ Pass: glow/fb/fx/graph_opts:test_fx_graph_opts - test_eliminate_noop_13_noop_tile (glow.fb.fx.graph_opts.tests.test_fx_graph_opts.TestEliminateNoOp) (0.031)
    ✓ Pass: glow/fb/fx/graph_opts:test_fx_graph_opts - test_eliminate_noop_05_noop_max_full_reduce (glow.fb.fx.graph_opts.tests.test_fx_graph_opts.TestEliminateNoOp) (0.026)
    ✓ Pass: glow/fb/fx/graph_opts:test_fx_graph_opts - test_eliminate_noop_07_noop_min_full_reduce (glow.fb.fx.graph_opts.tests.test_fx_graph_opts.TestEliminateNoOp) (0.030)
Summary
  Pass: 22
  ListingSuccess: 1
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/5629499609900775
```

# Shape Inference
`buck test mode/opt //glow/fb/fx/acc_tracer:test_acc_shape_inference`
```
Summary
  Pass: 99
  ListingSuccess: 1
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/4503599703156114
```

Reviewed By: jfix71

Differential Revision: D32081046

fbshipit-source-id: 22403f2bb72a2605f1adcbb733e8150795c7984b
2021-11-11 12:08:24 -08:00
fe90313d02 Avoid index_put_ overhead in histogram kernel's inner loop (#67815)
Summary:
**TLDR**: Makes torch.histc run 400x faster on large inputs. Should fix [a broken test on internal CI](https://www.internalfb.com/intern/test/281475013640093/).

HistogramKernel presently calls torch.Tensor.index_put_ once for each element of its input tensor. Obtaining a data pointer and manipulating it directly avoids the considerable dispatch overhead from calling index_put_. Behavior is unchanged because the tensor being operated on is known to be contiguous and in CPU memory.

Fixes performance regression introduced in https://github.com/pytorch/pytorch/pull/65318.

Benchmark: time taken to compute histc on a tensor with 10,000,000 elements

1. Before https://github.com/pytorch/pytorch/pull/65318: **0.003s**
2. After https://github.com/pytorch/pytorch/pull/65318: **2.154s**
3. After this change: **0.005s**

Benchmark code:
```
import torch as t
from timeit import default_timer as timer

x = t.randperm(10000000, dtype=t.float32)

start = timer()
t.histc(x)
end = timer()
print(end - start)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67815

Reviewed By: anjali411

Differential Revision: D32357663

Pulled By: saketh-are

fbshipit-source-id: f8fa59173ea4772c8ad1332548ef4d9ea8f01178
2021-11-11 11:16:45 -08:00
61a94495d9 [DataPipe] adding ZipperMapDataPipe (#68032)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68032

Part of #57031

cc VitalyFedyunin ejguan NivekT

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D32263058

Pulled By: NivekT

fbshipit-source-id: 13a30ee9d9779284a9fd9bb7222fc41253c6fe3b
2021-11-11 10:36:05 -08:00
bd5f33f91e demo backend decoupled from operators (#66100)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66100

A backend should not directly depend on ATen operators. The demo backend is changed that way for testing purposes.

Test Plan: Imported from OSS

Reviewed By: pavithranrao

Differential Revision: D31384614

Pulled By: iseeyuan

fbshipit-source-id: c97f0c4aa12feb1d124f1d7a852e9955a7a2ce42
2021-11-11 10:26:17 -08:00
97a386805e [Pytorch Edge] Add selective macros to metal ops (#68134)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68134

Add the macros in preparation of making these selective. Should be a no-op in this diff.

ghstack-source-id: 143023844

Test Plan: CI

Reviewed By: dhruvbird

Differential Revision: D32326833

fbshipit-source-id: 7abc93102bff0aa0bc5e3383bdf3e95fb84ce5ba
2021-11-11 10:15:31 -08:00
c2642b6465 Sparse CSR CPU: add torch.add with all inputs sparse (#64391)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64391

This PR adds `torch.add(a, b, alpha=None, out=out)` variant with `a, b, out` all being sparse CSR tensors on CPU.

Fixes #59060
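
A minimal usage sketch (construction via `torch.sparse_csr_tensor` is an assumption for illustration):

```python
import torch

crow = torch.tensor([0, 1, 2])
col = torch.tensor([0, 1])
a = torch.sparse_csr_tensor(crow, col, torch.tensor([1.0, 2.0]))
b = torch.sparse_csr_tensor(crow, col, torch.tensor([3.0, 4.0]))

# With this PR, torch.add with all-sparse-CSR operands runs on CPU.
c = torch.add(a, b, alpha=2.0)  # values: 1 + 2*3 = 7, 2 + 2*4 = 10
print(c.values())               # tensor([ 7., 10.])
```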

cc nikitaved pearu cpuhrsch IvanYashchuk

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D32316562

Pulled By: cpuhrsch

fbshipit-source-id: 384462369007854b5e2e6cb9ae7b320302627c71
2021-11-11 10:02:12 -08:00
84d3df8027 Fast cuda layer norm (#67977)
Summary:
This adds an apex-inspired fast layer norm forward kernel to PyTorch (it is a significant rewrite, though).
It's much faster than the current implementation: for a typical transformer size (32*196, 1024), time goes down from ~180 us to ~49 us on Volta. Compared to apex, it also produces bitwise-accurate results between float inputs representable in fp16 and fp16 inputs. It produces slightly different results than the current implementation, though, because Welford summation is implemented differently.
It is slower than LightSeq (~37 us), but LightSeq uses an inaccurate variance approximation and doesn't guarantee float-fp16 bitwise accuracy.
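
For reference, a minimal Python sketch of Welford's online mean/variance update (the general algorithm only, not the kernel's actual implementation):

```python
def welford_update(count, mean, m2, x):
    # Numerically stable running mean and sum of squared deviations (M2).
    count += 1
    delta = x - mean
    mean += delta / count
    m2 += delta * (x - mean)
    return count, mean, m2

count, mean, m2 = 0, 0.0, 0.0
for x in [1.0, 2.0, 4.0]:
    count, mean, m2 = welford_update(count, mean, m2, x)
variance = m2 / count  # biased (population) variance, as used by layer norm
print(mean, variance)  # 2.333... 1.555...
```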

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67977

Reviewed By: mruberry

Differential Revision: D32285331

Pulled By: ngimel

fbshipit-source-id: a8b876a9cf3133daacfe0ce3a37e3ad566f4b6a8
2021-11-11 09:32:40 -08:00
eqy
a1ace029e2 Add host-side memory requirement for test_softmax_64bit_indexing (#67922)
Summary:
https://github.com/pytorch/pytorch/issues/67910
The original `largeTensorTest` decorator didn't account for the additional host-side memory requirements.
Thanks crcrpar for raising the issue, CC ptrblck

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67922

Reviewed By: malfet

Differential Revision: D32308602

Pulled By: mruberry

fbshipit-source-id: 97b7d2c39fe63c1a8269402f72186026a89f6b4c
2021-11-11 09:24:15 -08:00
9e7b314318 OpInfo for nn.functional.conv1d (#67747)
Summary:
This PR adds OpInfo for `nn.functional.conv1d`. There is a minor typo fix in the documentation as well.

Issue tracker: https://github.com/pytorch/pytorch/issues/54261

cc: mruberry

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67747

Reviewed By: malfet

Differential Revision: D32309258

Pulled By: mruberry

fbshipit-source-id: add21911b8ae44413e033e19398f398210737c6c
2021-11-11 09:23:04 -08:00
35f1617001 Implement Entropy methods for Binomial and Multinomial distributions (#67609)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/60866.

Because https://github.com/pytorch/pytorch/pull/61719 seems to have had no response for a long time, I made this PR.
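
A minimal usage sketch (assuming the new `entropy()` method per the title; parameters are illustrative):

```python
import torch
from torch.distributions import Binomial

# Before this change, Binomial.entropy() raised NotImplementedError.
dist = Binomial(total_count=10, probs=torch.tensor(0.3))
print(dist.entropy())
```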

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67609

Reviewed By: malfet

Differential Revision: D32310866

Pulled By: mruberry

fbshipit-source-id: b3a8dde452f448e5981f5405f5f925f860b0d84f
2021-11-11 09:16:28 -08:00
864c6b3794 [nnc] aotCompiler outputSpec support quantized outputs (#67711)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67711

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D32115833

Pulled By: IvanKobzarev

fbshipit-source-id: e96eb72a290ffb88011b86b3c65c0eff864b63dc
2021-11-11 09:01:46 -08:00
362c6069b9 [nnc] Lazy lowerings registration; custom classes network params (#67623)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67623

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D32065076

Pulled By: IvanKobzarev

fbshipit-source-id: 4945ac6483938d428c539ed1ce4fcd6988b34250
2021-11-11 09:00:23 -08:00
f89572f417 Add feature: zeros_like() from a dense tensor to a sparse tensor (#68108)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/67904.
 - Create a sparse tensor when a sparse layout is given, even if the input tensor is not sparse.
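
A minimal sketch of the new behavior:

```python
import torch

dense = torch.tensor([[1.0, 0.0], [0.0, 2.0]])

# With this change, requesting a sparse layout from a dense input
# produces a sparse tensor instead of erroring.
z = torch.zeros_like(dense, layout=torch.sparse_coo)
print(z.is_sparse)  # True
```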

cc nikitaved pearu cpuhrsch IvanYashchuk

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68108

Reviewed By: anjali411

Differential Revision: D32316269

Pulled By: cpuhrsch

fbshipit-source-id: 923dbd4dc7c74f51f7cdbafb2375a30271a6a886
2021-11-11 08:54:15 -08:00
5efe5e243a Ease constrain for fuse path in trt lower (#68148)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68148

Question raised regarding whether we should fuse the path a->b->c if node a has consumers other than node b. This diff eases the constraint in the fuse path so that in the case:
```
  a
 / \
b   d
|
c
```
we still allow the fuse path (a->b->c); after fusion, node b will be eliminated by dead_node_eliminator while node a remains in the graph.

Reviewed By: yinghai, 842974287

Differential Revision: D32296266

fbshipit-source-id: 44ded07a97b5b708bdf37193a022fae21410b4bd
2021-11-11 08:48:34 -08:00
d4ae789655 OpInfos for new_blah functions and some _like functions (#67357)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67357

This PR adds OpInfos for:
- new_ones, new_zeros, new_full, new_empty
- rand_like, randint_like

I forgot to add the _like functions in a previous PR, so here they are.

Test Plan: - wait for tests

Reviewed By: mruberry

Differential Revision: D31969533

Pulled By: zou3519

fbshipit-source-id: 236d70d66e82f1d6f8e5254b55ca2a37b54c9494
2021-11-11 07:21:23 -08:00
4466ba8f30 Working POC of define-by-run quantization (#64676)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64676

We implement a working eager mode quantization flow which uses
tracing and `__torch_function__` and `torch.nn.Module.__call__` overrides to automate the model modifications needed for quantization. Partial program capture (instead of full program capture) is used, allowing this scheme to target a wide variety of user programs. Control flow over quantizable ops is not supported, but general control flow is supported.

In particular:
* `auto_trace.py` contains the machinery to override `__torch_function__` and `torch.nn.Module.__call__` and call hooks before and after each quantizable module or function
* `quantization_state.py` contains the state needed to use the hooks to implement quantization logic such as adding quants/dequants, observers, etc.
* please see `README.md` for more details

Test Plan:
```
python test/test_quantization.py TestAutoTracing
python test/test_quantization.py TestAutoTracingModels
```

Differential Revision: D31992281

Reviewed By: HDCharles

Pulled By: vkuzo

fbshipit-source-id: 6d40e855f3c96b9a4b637a0e677388a7b92f7967
2021-11-11 06:25:24 -08:00
f02efc749a [Dist CI][BE] Run each test in its own process for test_distributed_spawn (#67901)
Summary:
Context: https://github.com/pytorch/pytorch/issues/67061

Use `run_test.py`'s provided flag `"--subprocess"`, passed in like `extra_unittest_args=["--subprocess"]` when running test_distributed_spawn. This will ensure that each test is run separately in its own process. The goal is to more closely simulate how a developer would run a single test when reproducing a CI failure and make reproducibility easier in general.

Also, when a test fails, print out the exact command that was issued so developer knows how to reproduce it.

For example, when a test fails, it will print out something like the following to the logs:

```
Test exited with non-zero exitcode 1. Command to reproduce: BACKEND=gloo WORLD_SIZE=3 /fsx/users/rvarm1/conda/envs/pytorch/bin/python distributed/test_distributed_spawn.py -v TestDistBackendWithSpawn.test_Backend_enum_class
```

Running test_distributed_spawn is still the same command as before:

```
python test/run_test.py --verbose -i distributed/test_distributed_spawn
```

as seen in [distributed contributing](https://github.com/pytorch/pytorch/blob/master/torch/distributed/CONTRIBUTING.md) guide.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67901

Reviewed By: cbalioglu, mruberry

Differential Revision: D32225172

Pulled By: rohan-varma

fbshipit-source-id: 7e8d4c7a41858044bd2a4e0d1f0bf8f1ac671d67
2021-11-11 06:11:00 -08:00
aea4e61ec3 skip test_jit_legacy (#68129)
Summary:
disables failing tests in [https://github.com/pytorch/pytorch/issues/66429](https://github.com/pytorch/pytorch/issues/67646)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68129

Reviewed By: suo, janeyx99

Differential Revision: D32326118

Pulled By: Krovatkin

fbshipit-source-id: ca00d2214503f418be45dc756057b990fb6e6370
2021-11-10 23:08:32 -08:00
a6a2616558 Automated submodule update: kineto (#67445)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/kineto](https://github.com/pytorch/kineto).

New submodule commit: f60ad2cb0f

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67445

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: robieta

Differential Revision: D31993939

fbshipit-source-id: 3d4aa2f900434d4bbe5134db8453deb227ef5685
2021-11-10 22:33:03 -08:00
a229c3e51a Add complete type name in error message when failing to export a model (#67750)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67750

Add more information about why exporting model fails.

Before: error message:
```
E1102 22:57:42.984015 3220949 ExceptionTracer.cpp:221] exception stack complete
terminate called after throwing an instance of 'c10::Error'
  what():  __torch__ types other than torchbind (__torch__.torch.classes)are not supported in lite interpreter. Workaround: instead of using arbitrary class type (class Foo()), define a pytorch class (class Foo(torch.nn.Module)).
Exception raised from getFunctionTuple at caffe2/torch/csrc/jit/serialization/export_module.cpp:246 (most recent call first):
```

After:
```
E1102 22:57:42.984015 3220949 ExceptionTracer.cpp:221] exception stack complete
terminate called after throwing an instance of 'c10::Error'
  what():  __torch__ types other than torchbind (__torch__.torch.classes)are not supported in lite interpreter. Workaround: instead of using arbitrary class type (class Foo()), define a pytorch class (class Foo(torch.nn.Module)). The problematic type is: __torch__.dper3.core.schema_utils.IdListFeature
Exception raised from getFunctionTuple at caffe2/torch/csrc/jit/serialization/export_module.cpp:246 (most recent call first):
```
ghstack-source-id: 143009294

Test Plan: CI

Reviewed By: larryliu0820

Differential Revision: D32129397

fbshipit-source-id: 0594a98a59f727dc284acd1c9bebcd7589ee7cbb
2021-11-10 21:04:05 -08:00
1f07efd0f2 [SR] Fix aten::split schema (#68135)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68135

Update the schema to reflect the changes in  D31935573 (6b44e75f6b).

Test Plan:
`buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest`

Confirmed native implementation is used.

Reviewed By: hlu1

Differential Revision: D32326865

fbshipit-source-id: 7f607f57ceb6690a2782d94d9ee736ba64e7d242
2021-11-10 20:03:30 -08:00
47bc47f2b9 [SR] Add runtime check to correct bad schema alias info (#67825)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67825

The comment explains how it works.

Test Plan:
A small regression to local and local_ro if we only enable it for fallback ops.
```
## local_ro
# before
I1103 21:25:05.250440 2636751 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.22213. Iters per second: 818.247
I1103 21:25:08.629221 2636751 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.22351. Iters per second: 817.319
I1103 21:25:12.005179 2636751 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.22285. Iters per second: 817.759
I1103 21:25:12.005236 2636751 PyTorchPredictorBenchLib.cpp:285] Mean milliseconds per iter: 1.22283, standard deviation: 0.000693619

# after
# # only enable for fall back ops: 0.7%
I1103 21:26:40.190436 2644597 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.22928. Iters per second: 813.481
I1103 21:26:43.590443 2644597 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.23265. Iters per second: 811.262
I1103 21:26:46.992928 2644597 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.23379. Iters per second: 810.51
I1103 21:26:46.992980 2644597 PyTorchPredictorBenchLib.cpp:285] Mean milliseconds per iter: 1.23191, standard deviation: 0.0023424

# enable for all (no clone): 4.7%
I1103 21:27:55.291216 2649780 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.28204. Iters per second: 780.005
I1103 21:27:58.822347 2649780 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.27854. Iters per second: 782.14
I1103 21:28:02.354184 2649780 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.27958. Iters per second: 781.506
I1103 21:28:02.354240 2649780 PyTorchPredictorBenchLib.cpp:285] Mean milliseconds per iter: 1.28006, standard deviation: 0.00179765

# local
# before
I1103 21:52:00.784718 2765168 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 19.676. Iters per second: 50.8233
I1103 21:52:28.985873 2765168 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 19.699. Iters per second: 50.7641
I1103 21:52:57.200223 2765168 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 19.6953. Iters per second: 50.7735
I1103 21:52:57.200273 2765168 PyTorchPredictorBenchLib.cpp:285] Mean milliseconds per iter: 19.6901, standard deviation: 0.0123206
# after
# # only enable for fall back ops: 0.1%
I1103 21:45:25.514535 2734440 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 19.7103. Iters per second: 50.7349
I1103 21:45:53.773594 2734440 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 19.7005. Iters per second: 50.7601
I1103 21:46:21.955680 2734440 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 19.7398. Iters per second: 50.659
I1103 21:46:21.955729 2734440 PyTorchPredictorBenchLib.cpp:285] Mean milliseconds per iter: 19.7169, standard deviation: 0.0204658

# enable for all (no clone): 0.9%
I1103 21:43:22.162272 2723868 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 19.8893. Iters per second: 50.2783
I1103 21:43:50.651847 2723868 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 19.8566. Iters per second: 50.3611
I1103 21:44:19.068519 2723868 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 19.8793. Iters per second: 50.3037
I1103 21:44:19.068570 2723868 PyTorchPredictorBenchLib.cpp:285] Mean milliseconds per iter: 19.875, standard deviation: 0.0167498
```

Reviewed By: d1jang

Differential Revision: D32124812

fbshipit-source-id: 0f60c26f8fb338d347e4ca7a70b23e5a386fc9aa
2021-11-10 19:35:11 -08:00
ca7d0062ad [PyTorch Edge] Better error message when training attribute is not found (#68103)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68103

The error message `'training' attribute not found.` in itself isn't particularly actionable. Anyone running into this tends to be clueless regarding why they are getting this message.

For example, see [this post](https://fb.workplace.com/groups/pytorch.edge.users/posts/965868874283406/) asking for help when seeing this specific error message.

The most common reason for this error is that users call `.eval()` on the model instance before saving it. This change tries to draw attention to that oversight and allows them to proactively investigate and correct it if necessary.

This saves valuable time for our users and support effort from the team. Overall, I believe this is a Developer Experience win.

ghstack-source-id: 143021300

Test Plan: Build/CI

Reviewed By: JacobSzwejbka

Differential Revision: D32304477

fbshipit-source-id: 474abe717a862347f16ad981834ddab6819cb4d3
2021-11-10 19:31:10 -08:00
0e366b8e5f Make torch.fx.experimental.fx2trt.passes a package (#68139)
Summary:
Only packages and tools (which are explicitly specified) are included in the wheel/conda files

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68139

Test Plan:
Run `python3 -c "from setuptools import find_packages; print([x for x in find_packages(exclude=('tools','tools.*')) if 'torch.fx' in x])"` before and after the change
Fixes https://github.com/pytorch/pytorch/issues/68059

Reviewed By: nrsatish, seemethere

Differential Revision: D32330483

Pulled By: malfet

fbshipit-source-id: a55443730999a83c615b3f943c327353c011bf7b
2021-11-10 15:57:29 -08:00
f171c78c04 add unpack_sequence and unpad_sequence functions (#66550)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/66549
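
A minimal usage sketch of the new helpers (exact signatures assumed from the `torch.nn.utils.rnn` naming in the title):

```python
import torch
from torch.nn.utils.rnn import pack_sequence, unpack_sequence

seqs = [torch.tensor([1, 2, 3]), torch.tensor([4, 5])]
packed = pack_sequence(seqs, enforce_sorted=False)

# unpack_sequence inverts pack_sequence, recovering the list of tensors.
unpacked = unpack_sequence(packed)
print([t.tolist() for t in unpacked])  # [[1, 2, 3], [4, 5]]
```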

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66550

Reviewed By: malfet

Differential Revision: D32299193

Pulled By: jbschlosser

fbshipit-source-id: 96c92d73d3d40b7424778b2365e0c8bb1ae56cfb
2021-11-10 15:15:08 -08:00
a510f4139b Fix lambda function breaking torch.save
Summary: torch.save uses pickle, which cannot handle lambda functions or local functions directly without modifying serialization.py. This diff fixes the issue by extracting the lambda into a normal function.
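
A minimal illustration of the underlying pickle limitation:

```python
import pickle

square = lambda x: x * x
try:
    pickle.dumps(square)
except (pickle.PicklingError, AttributeError) as e:
    print("lambdas cannot be pickled:", e)

# A module-level named function pickles fine, by qualified name.
def square_fn(x):
    return x * x

pickle.dumps(square_fn)
```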

Test Plan: buck test mode/dev-nosan //caffe2/test/fx2trt/core:test_trt_module

Reviewed By: 842974287

Differential Revision: D32320536

fbshipit-source-id: 497d2e64f94526f92e6d1a9909b6ad629dbca850
2021-11-10 14:21:06 -08:00
22e73f616c Update unpack_dual to return named tuple (#68062)
Summary:
Also updates the doc
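
A minimal usage sketch (forward-AD helper names from `torch.autograd.forward_ad`; the named-tuple fields `primal`/`tangent` are per this change):

```python
import torch
import torch.autograd.forward_ad as fwAD

with fwAD.dual_level():
    dual = fwAD.make_dual(torch.tensor(1.0), torch.tensor(2.0))
    # With this change the result is a named tuple, so the fields
    # can be accessed by name instead of by index.
    out = fwAD.unpack_dual(dual)
    print(out.primal, out.tangent)
```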

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68062

Reviewed By: gchanan

Differential Revision: D32315089

Pulled By: soulitzer

fbshipit-source-id: 567c812da093daeb6549b0dc7ecbffd58eb8ccc2
2021-11-10 14:14:55 -08:00
d6e6064efc [LT] Upstream backend interfaces (#67927)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67927

BackendData - represents 'tensor data' in opaque backend storage
LoweringContext - interface for performing backend-specific IR lowering
BackendImplInterface - interface for lazy tensors backends to implement

Reorgs backend-related files into lazy/backend subdir

includes a few small fixes, which were made on lazy_tensor_staging but need to be back-ported to master.

Test Plan: used by lazy_tensor_staging branch

Reviewed By: desertfire

Differential Revision: D32142032

fbshipit-source-id: 828c717bcd0d511876e64ad209b50f7bfb10cec5
2021-11-10 12:55:31 -08:00
c075f0f633 Update rpc testing to include USE_TENSORPIPE directive (#68080)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68080

Fixes #68002

After FaultyProcessGroupAgent was replaced with FaultyTensorpipeAgent, there is now a dependency on TensorPipe for rpc testing. However, if a user does not have USE_TENSORPIPE enabled, they will hit an issue such as `undeclared identifier 'FaultyTensorPipeRpcBackendOptions'`. This is for testing the faulty agent method, so it should not block compilation. Update to wrap the TensorPipe-specific code in a directive.

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D32292861

Pulled By: H-Huang

fbshipit-source-id: 4ffb879860ced897674728200a1831f18fea0a4a
2021-11-10 12:12:18 -08:00
a3bb95c1b5 don't include label in ci: sev issue (#68093)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68093

We don't want regular users without write access to be able to file an
actual issue with the `ci: sev` label since that issue will
automatically show up on hud.pytorch.org

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D32299553

Pulled By: seemethere

fbshipit-source-id: d46a96f16ae29120fff94288d3e0c06b103edf7f
2021-11-10 12:03:18 -08:00
ecd5b1a8d4 [SR] Native implementation for aten::split (#67476)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67476

Native ops are faster than falling back to the JIT interpreter, sometimes significantly (we've previously seen this with ops like TupleUnpack). We should improve op coverage where possible.

Test Plan: `buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest`

Reviewed By: d1jang

Differential Revision: D31994040

fbshipit-source-id: 9de57d8d7925ee46544478eae8229952ca5f248a
2021-11-10 10:23:03 -08:00
746a31b290 Logger integration format (#67962)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67962

Logger integration format for chunks at [dims] -> input_val.shape[dim]

NOTE: Unused typing imports removed

Test Plan:
buck run -c python.package_style=inplace mode/dev-nosan caffe2/torch/fb/fx2trt:test_chunk

out:
RuntimeWarning: Asked for 2000 chunks along dimention 2 on tensor with size (3, 10, 20), chunks will default to 20

Reviewed By: 842974287

Differential Revision: D32233039

fbshipit-source-id: 1fde12c9f743bb80cdb309e0b7be287173d45147
2021-11-10 10:12:06 -08:00
8dfbc620d4 don't hardcode mask type in mha (#68077)
Summary:

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68077

Reviewed By: zou3519

Differential Revision: D32292410

Pulled By: ngimel

fbshipit-source-id: 67213cf5474dc3f83e90e28cf5a823abb683a6f9
2021-11-10 09:41:21 -08:00
ae5864498d torch.allclose opinfo (#68023)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68023

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D32295811

Pulled By: george-qi

fbshipit-source-id: 3253104a5a9655d8ba7bbba6620038ed6d6669f1
2021-11-10 09:16:39 -08:00
9a2db6f091 Factor backend routing logic out of convolution forward (#67790)
Summary:
This PR introduces a new function `_select_conv_backend` that returns a `ConvBackend` enum representing the selected backend for a given set of convolution inputs and params.

The function and enum are exposed to python for testing purposes through `torch/csrc/Module.cpp` (please let me know if there's a better place to do this).

A new set of tests validates that the correct backend is selected for several sets of inputs + params. Some backends aren't tested yet:
* nnpack (for mobile)
* xnnpack (for mobile)
* winograd 3x3 (for mobile)

Some flowcharts for reference:
![conv_routing_graph md](https://user-images.githubusercontent.com/75754324/140828957-1135b400-38c0-4c9f-87ef-4f33ceebeeae.png)
![conv_nogroup_routing_graph md](https://user-images.githubusercontent.com/75754324/140828977-ed223a4e-aa86-49f1-9925-c0f6b9ab36af.png)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67790

Reviewed By: zou3519

Differential Revision: D32280878

Pulled By: jbschlosser

fbshipit-source-id: 0ce55174f470f65c9b5345b9980cf12251f3abbb
2021-11-10 07:53:55 -08:00
147de8243b Fixed deprecation warnings with .data<T>() in SpectralOps.cpp (#67993)
Summary:
Description:
- Fixed deprecation warnings `.data<T>()` -> `.data_ptr<T>()` in SpectralOps.cpp shown while building pytorch from source

```c++
../aten/src/ATen/native/mkl/SpectralOps.cpp:213:10: warning: ‘T* at::Tensor::data() const [with T = c10::complex<double>]’ is deprecated: Tensor.data<T>() is deprecated. Please use Tensor.
data_ptr<T>() instead. [-Wdeprecated-declarations]
  213 |   return reinterpret_cast<std::complex<T>*>(t.data<c10::complex<T>>());
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67993

Reviewed By: H-Huang

Differential Revision: D32246945

Pulled By: mruberry

fbshipit-source-id: 5cd6b0ac6ddff0afc56e99641971e1e3b6434af6
2021-11-10 07:33:15 -08:00
6011c35a79 [LTC] Upstream class BackendDevice (#68027)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68027

This commit upstreams class BackendDevice to master. It is a backend-specific
representation of the actual hardware, for instance CPU, GPU, or TPU.

This concept is important for backends like XLA, which need to tell the
actual hardware type apart from the c10::DeviceType::Lazy virtual device during
both IR construction and lowering.

Test Plan: ./build/bin/test_lazy --gtest_filter=BackendDeviceTest.*

Reviewed By: wconstab

Differential Revision: D32261838

Pulled By: alanwaketan

fbshipit-source-id: 579c3fc5f9da7847c887a383c6047e8ecb9cc5bc
2021-11-10 07:05:43 -08:00
a6c0edff1a fix gradcheck to generate valid input for forward AD complex (#68001)
Summary:
This fixed a few of the linalg checks that we disabled before!

This also seems to break sgn, abs and angle (sending to CI here to see if there are more). These functions used to only ever get pure imaginary or real values.
It is very likely that something is wrong with their formulas.
But they are implemented as element-wise ops, so it's not clear where the error can come from. I tried to look at it but nothing obvious seems wrong there (especially because it is correct in backward mode).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68001

Reviewed By: soulitzer

Differential Revision: D32280475

Pulled By: albanD

fbshipit-source-id: e68b1ce0e2e97f8917c3d393141d649a7669aa9d
2021-11-10 03:07:48 -08:00
94b6fa6f8b Adds an optimizer instance variable to ChainedScheduler (#68010)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/67601.

As simple a fix as I could make it. I even managed to delete some testing code!

I checked calling `super()` and, as I had feared, it doesn't work out of the box, so perhaps that ought to be revisited later.

As it stands, https://github.com/pytorch/pytorch/issues/20124 still applies to the chained scheduler, but I think this change is still an improvement.
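
A minimal sketch of the fix's effect (the particular schedulers are illustrative):

```python
import torch

model = torch.nn.Linear(2, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
s1 = torch.optim.lr_scheduler.ConstantLR(opt, factor=0.5, total_iters=2)
s2 = torch.optim.lr_scheduler.ExponentialLR(opt, gamma=0.9)

chained = torch.optim.lr_scheduler.ChainedScheduler([s1, s2])
# With this fix, ChainedScheduler exposes the wrapped optimizer
# like other schedulers do.
assert chained.optimizer is opt
```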

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68010

Reviewed By: zou3519

Differential Revision: D32278139

Pulled By: albanD

fbshipit-source-id: 4c6f9f1b2822affdf63a6d22ddfdbcb1c6afd579
2021-11-10 01:31:47 -08:00
cb2a41e508 [PyTorch Edge] Don't use LeftRight in mobile (#66064)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66064

The only place this is used seems to be in the dispatcher for `operatorLookupTable_`. Disarming `LeftRight` disarms it for this one use case.

This should make .so loading faster, and also reduce memory consumption since `LeftRight<T>` does 2 writes for every write. I'd like to get a thorough review from reviewers for this diff since I want to make sure that initialization of stuff that writes into the dispatcher isn't going to happen on multiple threads for on-device use.

Created a new class named `LeftRightNoOpWrapper<T>` for use in mobile builds.

### Why is LeftRight<T> slow?

It maintains 2 copies of each data structure `T` to keep reads quick. Every write goes to both data structures, which means writes cost 2x and memory overhead is also 2x.

### Why is this safe for mobile builds?

1. .so loading never happens concurrently with model execution
2. Custom ops are loaded during .so load - initializers are all run serially
3. I don't see any threads being spawned from the global schema and kernel initializers

After discussing with dreiss, it seems like there could be rare cases in OSS apps or internal Android/iOS apps where a `.so` or `dylib` is loaded after the PT runtime is loaded, and this load happens concurrently with an in-progress inference run, which is looking up the operator table in the dispatcher.

To avoid crashes there, it seems reasonable to use the RW lock, since I don't expect any contention 99.9% of the time.

When registering operators, everything is serial so only one thread will ever hold the lock. The next time it needs the lock, it will have already released it.
During inference runs, only one thread will ask for the shared lock unless multiple concurrent inferences are in progress. Even in that case, they will all be able to simultaneously get the Read lock.

Test Plan: Build and generate a local build of the iOS app to test.

Reviewed By: swolchok

Differential Revision: D31352346

fbshipit-source-id: c3f12454de3dbd7b421a6057d561e9373ef5bf98
2021-11-09 21:49:45 -08:00
b0817e19e0 [PyTorch] Avoid reading file from stream for 0 byte Tensor storage (#67787)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67787

First noticed in https://fb.workplace.com/groups/pytorch.edge.team/posts/952737705280969/ - basically one of the speech models has ~400 0-byte tensor files, so we're paying the cost of looking each one up in the archive and reading nothing from it.

Turns out that there's a fairly simple fix to avoid reading a 0-byte tensor. Once we notice that it's 0 bytes, just use the default `DataPtr` instead of initializing it with 0 bytes read in from the input file stream.

ghstack-source-id: 142025211

Test Plan: CI and manually ran a couple production mobile models with bundled inputs. CI Will run all prod. mobile mobiles with bundled inputs.

Reviewed By: swolchok

Differential Revision: D32054983

fbshipit-source-id: 919b0cdbc44bccb8f6cfe0da10ff5474af37fd99
2021-11-09 21:45:05 -08:00
bf31d4b2b5 [PyTorch] Replace copy_ with data_ptr<float>() since input Tensor's dtype is guaranteed to be float (#67788)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67788

Based on comments from supriyar in D31657430 (20aa417e38).
ghstack-source-id: 142924000

Test Plan: CI

Reviewed By: supriyar

Differential Revision: D32055028

fbshipit-source-id: 756d526585f8ded755ea42b52dbbf5c1687acde2
2021-11-09 21:40:23 -08:00
6b44e75f6b aliasing fixes (#66977)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66977

Fix for https://github.com/pytorch/pytorch/issues/47218

More context is in original PR here: https://github.com/pytorch/pytorch/pull/20556

Test Plan: Imported from OSS

Reviewed By: malfet, albanD

Differential Revision: D31935573

Pulled By: eellison

fbshipit-source-id: 3658d5711116396c35f1d5016773b0096ed347a5
2021-11-09 18:33:37 -08:00
3f1a3f7b18 Fix ads dense arch regression (#68071)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68071

Reviewed By: yinghai

Differential Revision: D32261611

fbshipit-source-id: 3224464bbf30fecbdb69e6ae88e42485ef67f800
2021-11-09 18:22:01 -08:00
91af74c934 remove Generate* macro files (#67940)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67940

Reviewed By: mruberry

Differential Revision: D32250987

Pulled By: ngimel

fbshipit-source-id: 3feb0bc876bc26d0a42784e5c6001670ed71e971
2021-11-09 17:31:56 -08:00
eqy
790763b0fe Add an option to disable reduced precision reductions for FP16 GEMM (#67946)
Summary:
https://github.com/pytorch/pytorch/issues/67578 disabled reduced precision reductions for FP16 GEMMs. After benchmarking, we've found that this has substantial performance impacts for common GEMM shapes (e.g., those found in popular instantiations of multiheaded attention) on architectures such as Volta. As these performance regressions may come as a surprise to current users, this PR adds a toggle to disable reduced precision reductions,
`torch.backends.cuda.matmul.allow_fp16_reduced_precision_reduction = False`,
rather than making that the default behavior.

CC ngimel ptrblck
stas00 Note that the behavior after the previous PR can be replicated with
`torch.backends.cuda.matmul.allow_fp16_reduced_precision_reduction = False`
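
A minimal usage sketch of the toggle (requires a CUDA build; shapes are illustrative):

```python
import torch

# Replicate the behavior of #67578: force full-precision reductions
# inside FP16 GEMMs (more accurate, slower on e.g. Volta).
torch.backends.cuda.matmul.allow_fp16_reduced_precision_reduction = False

a = torch.randn(64, 64, device="cuda", dtype=torch.half)
b = torch.randn(64, 64, device="cuda", dtype=torch.half)
c = a @ b
```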

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67946

Reviewed By: zou3519

Differential Revision: D32289896

Pulled By: ngimel

fbshipit-source-id: a1ea2918b77e27a7d9b391e030417802a0174abe
2021-11-09 17:27:20 -08:00
078c655985 [nnc][mobile] temporarily disable quantization external functions (#68029)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68029

Temporarily disable quantization external functions with a new macro DISABLE_NNC_QUANTIZATION.

The ATen CPU library consists of two parts:
A. Common operator functions, e.g. "at::empty()", the list of sources can be found at "aten_cpu_source_list" in "tools/build_variables.bzl";
B. Implementations of these operators, e.g. "at::native::empty()", the list of sources is defined at "aten_native_source_list" in "tools/build_variables.bzl";

Note that A does not directly depend on B. A calls B via dispatch table. The dependency is injected into the dispatch table by B during its static initialization.

For internal mobile builds, B is built on a per-app basis. A is the public library for other libraries to depend on. Because these external functions call quantization functions that are not part of A, the NNC kernel library cannot resolve the missing symbols.

Use this PR to unblock the internal experiment until we figure out a better solution (e.g. move quantization API to A).
ghstack-source-id: 142868370

Test Plan: Make sure it can build with the stacked diff.

Reviewed By: IvanKobzarev

Differential Revision: D32239783

fbshipit-source-id: 3797b14104b0f54fb527bc3fc5be7f09cc93d9e4
2021-11-09 17:10:16 -08:00
b1a42298a4 Simplify example for nn.Flatten (#67472)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/67415
Uses the docstring example provided by jbschlosser for the issue submitted by qzylalala.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67472

Reviewed By: soulitzer

Differential Revision: D32210995

Pulled By: jbschlosser

fbshipit-source-id: f22bcd729699993942b6e676b479618ac613022c
2021-11-09 17:03:06 -08:00
d8f0087e08 .github: Fix sccache for macOS workflows on push (#68094)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68094

Turns out sccache was not getting activated properly on master pushes so
this should help resolve that

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D32299636

Pulled By: seemethere

fbshipit-source-id: 5f1be98dffdb202d3c11b6ceb2b49af235e1f91b
2021-11-09 16:40:56 -08:00
1b2a366932 [SR] Enforce checks for resizing of the internal buffer in MemoryPlanner in unit tests (#67941)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67941

I just found out that, due to the rounding up of Tensor storage sizes to multiples of 64 bytes, resizing is not actually triggered for a lot of our unit tests (23 OSS, 16 internal). Now they should all be fixed. Also moved a bunch of tests to `test_static_module.cc` so that `test_static_runtime.cc` now only contains operator tests.

From now on, by default, if `args2` is passed to `test_static_runtime`, at the end of the second iteration it checks that the managed buffer's size is bigger than the previous size and enforces that. You can bypass the check for ops with constant output sizes, such as `aten::sum` without `dim` passed in.

Test Plan:
Facebook
```
buck test //caffe2/benchmarks/static_runtime:static_runtime_cpptest
buck test //caffe2/benchmarks/static_runtime/fb:test_fb_operators
```

Reviewed By: swolchok

Differential Revision: D32196204

fbshipit-source-id: 8425d9efe6b9a1c1e3807e576b1143efd7561c71
2021-11-09 16:07:40 -08:00
8d025bbc2d .github: Migrate macOS workflows to GHA (#67717)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67717

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D32287733

Pulled By: seemethere

fbshipit-source-id: 8df6b20aada818ad39895ef87dc280098e09707b
2021-11-09 15:46:05 -08:00
55e3b23abe [Pytorch Edge] Generic Build Features for Selective Build (#67817)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67817

Implementation of build features as a usable feature. Includes tracing support and selectivity support. Follow-up of Dhruv's prototype in D30076214.

The general idea is to allow selectivity over arbitrary sections of the codebase through the two APIs
BUILD_FEATURE_REQUIRED(NAME) and
BUILD_FEATURE_AVAILABLE(NAME).

References
PyTorch Edge Team Workplace group post link: https://fb.workplace.com/groups/pytorch.edge.team/posts/905584476662959/
Quip talking about some early ideas related to build features: https://fb.quip.com/iur3ApU9q29v
Google Doc about most recent discussion and details: https://docs.google.com/document/d/1533zuN_9pwpQBa4RhtstUjT5B7guowblqJz35QYWPE0/edit

Will remove the copy kernel example afterwards. It's just here as an example.
ghstack-source-id: 142850218

Test Plan: CI; dummy-traced a model and played around with its unit test when I removed the traced value from the YAML

Reviewed By: dhruvbird

Differential Revision: D32151856

fbshipit-source-id: 33764c1f6902a025e53807b784792a83c8385984
2021-11-09 15:37:21 -08:00
43ef6816f2 OpInfo for nn.functional.cross_entropy (#63547)
Summary:
Reference: https://github.com/facebookresearch/functorch/issues/78 and https://github.com/pytorch/pytorch/issues/54261

TODOs:

* [ ] Investigate autograd failures.
* [ ] Clean up `test_nn.py` for `cross_entropy`.

cc: mruberry zou3519

cc albanD mruberry jbschlosser walterddr

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63547

Reviewed By: mruberry

Differential Revision: D32062955

Pulled By: zou3519

fbshipit-source-id: 2a62a4c28af51fb71159df2e262d05039d549b7e
2021-11-09 15:07:12 -08:00
eaf0457eef [distributed][docs] Delete distributed optimimzer section from RPC and add reference to namespace docs page (#68068)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68068

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Test Plan: Imported from OSS

Reviewed By: pritamdamania87

Differential Revision: D32286554

Pulled By: jamesr66a

fbshipit-source-id: a43fe1f0cfa74721f467b128f2e878bd02f32546
2021-11-09 15:01:54 -08:00
7c90bd77ec Test functionalization pass in python (#66101)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66101

Updated description:

This PR tests the functionalization pass in python in two ways. For each of the test programs that I have in `test_functionalization.py`, it:
- runs the program with and without functionalization, and asserts the outputs and (potentially mutated) inputs are equal in both cases
- runs the program with `LoggingTensor`, and uses expecttests on the resulting graph. I manually confirm that the graphs look reasonable and only contain functional ops.

Mechanically, the changes include:
- factoring out `LoggingTensor` into a testing util so it can be re-used in multiple tests
- adding some private python api's in the `torch` namespace as hooks that I can use during testing

In the original version of this PR, I also added some fixes to the `_make_subclass()` function in python: allowing you to pass in strides and storage_offset. I kept them in mainly because the changes were already there.

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D31942095

Pulled By: bdhirsh

fbshipit-source-id: 90ff4c88d461089704922e779571eee09c21d707
2021-11-09 14:34:05 -08:00
fe46d6c68f functionalization: map copy_() -> to().expand_as() (#67878)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67878

The functionalization pass doesn't work with `copy_()`, which is a problem with functorch. Originally we were going to make a functional `copy()` operator to fix this problem, but zou3519 pointed out that we can get (most of) the existing functionality by mapping `self.copy_(src)` to `src.to(self).expand_as(self)`. This makes the codegen a bit uglier, but has the benefit of avoiding a totally unnecessary tensor allocation in functorch.
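
A minimal eager-mode illustration of the mapping:

```python
import torch

self_t = torch.zeros(2, 3)
src = torch.tensor([1.0, 2.0, 3.0])

# Functional equivalent of self_t.copy_(src): match self's
# dtype/device, then broadcast to self's shape; no mutation.
out = src.to(self_t).expand_as(self_t)
print(out)
```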

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D32280588

Pulled By: bdhirsh

fbshipit-source-id: 2c6ee65f0929e0846566987183ba2498c88496c2
2021-11-09 14:34:02 -08:00
be4150139a bugfix for conditional functionalization (#67715)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67715

I had originally made the `vector<ViewMeta>` and `Tensor`s stored on the `Update` struct references, but will pointed out a bug in the conditional-functionalization PR due to a use-after-free error. This happens because the queued-up updates might not be synced until later, and can outlive the original tensor that was used to create them.

It was kind of strange that this doesn't show up in the existing `test/test_functionalization.py` tests that I have in this stack, which technically also should have this bug (they call sync_() after the mutated tensors have gone out of scope). I looked at it with gdb, and I'm wondering if it's just because the stored values in the free'd `ViewMeta`/`Tensor` just happen to not get clobbered by the time the sync is called in the test.

Either way, copying the Tensor + vector<ViewMeta> is probably not ideal for performance, but I couldn't think of an easy work-around for now.

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D32136007

Pulled By: bdhirsh

fbshipit-source-id: 707c6392a31b967e8965b9b77f297fd10a0a095a
2021-11-09 14:32:17 -08:00
4100a5cc48 Revert D32286934: [pytorch][PR] replace platform specific CI environment variables with generic ones
Test Plan: revert-hammer

Differential Revision:
D32286934 (7d931fb082)

Original commit changeset: 1008938088da

fbshipit-source-id: dd2dd07742670a34deec10995b95b98c9fd62724
2021-11-09 14:06:18 -08:00
273f7ae9b3 fx: Update fx.rst (#68043)
Summary:
When I ran this part of the code from the document with PyTorch version 1.10.0, I found some differences between the output and the document, as follows:

```python
import torch
import torch.fx as fx

class M(torch.nn.Module):
    def forward(self, x, y):
        return x + y

# Create an instance of `M`
m = M()

traced = fx.symbolic_trace(m)
print(traced)
print(traced.graph)
traced.graph.print_tabular()
```

I get the result:

```shell
def forward(self, x, y):
    add = x + y;  x = y = None
    return add

graph():
    %x : [#users=1] = placeholder[target=x]
    %y : [#users=1] = placeholder[target=y]
    %add : [#users=1] = call_function[target=operator.add](args = (%x, %y), kwargs = {})
    return add
opcode         name    target                   args    kwargs
-------------  ------  -----------------------  ------  --------
placeholder    x       x                        ()      {}
placeholder    y       y                        ()      {}
call_function  add     <built-in function add>  (x, y)  {}
output         output  output                   (add,)  {}
```

This PR updates the document accordingly.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68043

Reviewed By: driazati

Differential Revision: D32287178

Pulled By: jamesr66a

fbshipit-source-id: 48ebd0e6c09940be9950cd57ba0c03274a849be5
2021-11-09 14:00:45 -08:00
c7eaec86f0 [NCCL] Patch bfloat16 support (#67843)
Summary:
Patch bfloat16 support in NCCL. PR https://github.com/pytorch/pytorch/issues/63260 adds bfloat16 support but is
still not complete enough to enable bfloat16 for allreduce in end-to-end training.

This patch does the followings:
* fix the minimum NCCL version from 2.9.7 to 2.10; NCCL added bf16 support in
  v2.10.3-1 (commit 7e51592)
* update the bfloat16 datatype flag in `csrc/cuda/nccl.cpp` so that NCCL
  operations like allreduce can use it
* enable unit tests for the bfloat16 datatype where possible

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67843

Reviewed By: H-Huang

Differential Revision: D32248132

Pulled By: mrshenli

fbshipit-source-id: 081e96e725af3b933dd65ec157c5ad11c6873525
2021-11-09 13:46:13 -08:00
45ac6f2b65 [quant] Fix comparison against reference for test_qat_functional_linear (#68061)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68061

The test had a typo that prevented comparing the test value against the reference value; this fixes the typo.

Test Plan:
`pytest test/quantization/fx/test_quantize_fx.py  -v -k "test_qat_functional_linear"`

Imported from OSS

Reviewed By: HDCharles

Differential Revision: D32280803

fbshipit-source-id: d57a25a0dcdd88df887a39b5117abafaf15125b2
2021-11-09 13:33:13 -08:00
a9c2f11d2a Update Freezing Logic and add new passes (#68024)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68024

Pull Request resolved: #67949

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D32260614

Pulled By: eellison

fbshipit-source-id: 41d7a9b45e33297a17560a22eba8973e2fc48b43
2021-11-09 13:21:52 -08:00
d2438a8901 [qnnpack] Lock before weightpacking in qlinear (#68012)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68012

The previous attempt to make qlinear thread-safe placed the lock after the weight pointer had already been read via packB. The race occurs when thread 1 acquires the lock and packs the weights, but thread 2 still uses the stale nullptr it read before acquiring the lock. This causes a null-pointer dereference later.
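
A Python-flavored sketch of the fixed ordering (hypothetical names; the actual fix is in the C++ qlinear operator):

```python
import threading

_pack_lock = threading.Lock()
_packed_weights = None  # corresponds to packB

def _pack_weights():
    return object()  # stand-in for the real packed-weight blob

def get_packed_weights():
    global _packed_weights
    # Correct order: take the lock first, THEN read the shared pointer.
    # The buggy version read it before locking, so a second thread could
    # keep using the stale None/nullptr it saw before thread 1 packed.
    with _pack_lock:
        if _packed_weights is None:
            _packed_weights = _pack_weights()
        return _packed_weights
```
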
ghstack-source-id: 142714894

Test Plan: Tested on repro diff

Reviewed By: kimishpatel

Differential Revision: D32252563

fbshipit-source-id: 429fcd3f76193f1c4c8081608b6f725b19562230
2021-11-09 13:03:02 -08:00
e86058559a Op info for activation functions 2 (softsign, tanh, tanhshrink, threshold, celu, sigmoid, mish, hardsigmoid) (#67492)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67492

Reviewed By: zou3519

Differential Revision: D32282580

Pulled By: samdow

fbshipit-source-id: 115afe790328577357a90117bede3b6502590441
2021-11-09 12:57:38 -08:00
726e2ed715 [lint] add more lints to lintrunner (#68069)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68069

- executable bit
- cub include
- raw CUDA API usage

Test Plan: Imported from OSS

Reviewed By: janeyx99

Differential Revision: D32286559

Pulled By: suo

fbshipit-source-id: 21d58e259c951424f9c6cbf1dac6d79fe7236aa4
2021-11-09 12:48:56 -08:00
cbf596bf8e Sparse CSR CPU: add addmv_out (#61536)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61536

This PR adds CPU dispatch for `addmv_out` with a Sparse CSR matrix.
The implementation uses the MKL Sparse library; if it's not available,
a runtime error is thrown.
Since structured_delegate is used we only need to implement the out variant; the in-place and normal variants are autogenerated.

The MKL descriptor of sparse matrices is implemented in `at::mkl::sparse::MklSparseCsrDescriptor`.
MKL Sparse doesn't allow switching the indices type at runtime; it's
predetermined at build time. Only the 32-bit version of MKL was tested
locally, but I expect the 64-bit version to work correctly as well.

When the indices type of the PyTorch CSR tensor doesn't match MKL's, the
indices tensor is converted to an MKL-compatible type (`int` vs `int64_t`).
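
A minimal usage sketch (assumes a CPU build with MKL available):

```python
import torch

crow = torch.tensor([0, 2, 3])
col = torch.tensor([0, 1, 1])
val = torch.tensor([1.0, 2.0, 3.0])
A = torch.sparse_csr_tensor(crow, col, val, size=(2, 2))  # [[1, 2], [0, 3]]

x = torch.tensor([1.0, 1.0])
bias = torch.zeros(2)
out = torch.addmv(bias, A, x)  # bias + A @ x, dispatched to MKL Sparse
print(out)                     # tensor([3., 3.])
```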

cc nikitaved pearu cpuhrsch IvanYashchuk

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D32141787

Pulled By: malfet

fbshipit-source-id: b818a0b186aa227982221c3862a594266a58a2a6
2021-11-09 12:34:21 -08:00
7d931fb082 replace platform specific CI environment variables with generic ones (#68022)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/59478

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68022

Reviewed By: seemethere

Differential Revision: D32286934

Pulled By: atalman

fbshipit-source-id: 1008938088da56807e85fb5d776abf79f28ef77b
2021-11-09 12:06:44 -08:00
a027551358 [LT] Merge cache.h (#67929)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67929

1. Write a node-hash based unit test for Cache
2. Replace CHECK with TORCH_CHECK in IrUtil

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D32246134

Pulled By: desertfire

fbshipit-source-id: c464bc300126d47e9ad4af3b3e8484a389757dc0
2021-11-09 12:02:02 -08:00
a473417076 [LT] Merge permutation_util into master (#67766)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67766

Test Plan: `build/bin/test_lazy`

Reviewed By: wconstab

Differential Revision: D32147676

Pulled By: desertfire

fbshipit-source-id: 528b48c9cf789abc171235091c7146b2ab7a9c76
2021-11-09 12:00:39 -08:00
442d7d72de fixed type checking errors in options.py (#68056)
Summary:
Fixes [issue#64](https://github.com/MLH-Fellowship/pyre-check/issues/64)
This PR fixes the type checking errors in torch/distributed/rpc/options.py.
The variables at 84:8 and 85:8 were declared with type `List` but were sometimes assigned `None`, causing an incompatible-variable-type error. Changing the type from `List` to `Optional[List]` fixes the error.
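
The fix boils down to the following pattern (field names chosen for illustration, not necessarily the actual options.py attributes):

```python
from typing import Dict, List, Optional

class _RpcOptionsSketch:
    # Declaring the fields Optional makes the None default type-correct;
    # a bare List annotation rejects None under pyre/mypy.
    devices: Optional[List[int]] = None
    device_maps: Optional[Dict[str, Dict[int, int]]] = None
```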

Signed-off-by: Onyemowo Agbo
onionymous
0xedward

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68056

Reviewed By: zou3519

Differential Revision: D32282289

Pulled By: mrshenli

fbshipit-source-id: ee410165e623834b4f5f3da8d44bd5a29306daae
2021-11-09 11:42:34 -08:00
acb035f513 Revert D31609714: Fix Dispatching not considering List[Optional[Tensor]] for dispatch
Test Plan: revert-hammer

Differential Revision:
D31609714 (c581f56c74)

Original commit changeset: bb91cafd32fb

fbshipit-source-id: a04055e7af4bf8491b44bbc3e3bddc7831ab205e
2021-11-09 10:41:53 -08:00
6e53d6df83 [SR] Introduce StaticMethod (#67981)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67981

To save on memory, various internal classes need to release all references to their `torch::jit::Module` after constructing their `StaticModule`. Unfortunately, many of these classes currently instantiate a `torch::jit::Method` attribute, which holds a reference to the `ivalue` backing its owning module.

To avoid this, I've introduced a new subclass of `IMethod` to represent scripted functions backed by static runtime.

Test Plan: CI

Reviewed By: swolchok

Differential Revision: D32232039

fbshipit-source-id: 434b3a1a4b893b2c4e6cacbee60fa48bd33b5722
2021-11-09 10:37:29 -08:00
5e19fb61fd [SR] Release reference to JIT module if possible (#67911)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67911

If we can remove `self` from the graph inputs, there is no need for `StaticModule` to hold onto its `Module` reference anymore.

Test Plan: `buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest`

Reviewed By: hlu1

Differential Revision: D32190755

fbshipit-source-id: 9c4649a63b6e68c7d2e47395a23572985d2babb1
2021-11-09 10:35:31 -08:00
9ae3f3945b Add remote_module logging to the __new__ method. (#68035)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68035

RemoteModule is sometimes created using `object.__new__` (e.g. in
init_from_module_rref); in that case the logging in the `__init__` method
would not pick it up.

As a result, this adds a `__new__` method to RemoteModule to log all usages
appropriately.
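
A minimal sketch of the pattern (simplified, with a hypothetical logger; not the actual RemoteModule code):

```python
import logging

log = logging.getLogger("remote_module_usage")

class RemoteModule:
    def __new__(cls, *args, **kwargs):
        # Runs for normal construction AND for factory methods that call
        # cls.__new__(cls) while skipping __init__, so every creation
        # path is logged exactly once.
        log.info("RemoteModule instantiated")
        return super().__new__(cls)

    def __init__(self, remote_device):
        self.remote_device = remote_device

    @classmethod
    def init_from_module_rref(cls, module_rref):
        obj = cls.__new__(cls)   # __init__ intentionally skipped
        obj.module_rref = module_rref
        return obj
```
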
ghstack-source-id: 142762019

Test Plan: waitforbuildbot

Reviewed By: vipannalla

Differential Revision: D32263978

fbshipit-source-id: a95ab0bb5d0836da8fe6333c41593af164b008d9
2021-11-09 09:32:34 -08:00
96b4f2296e CppSignature: Compare types by their mangled names (#67987)
Summary:
`.name()` has to call `__cxa_demangle` and allocate a new string, both of which can be avoided by just comparing the mangled names directly.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67987

Reviewed By: mruberry

Differential Revision: D32264560

Pulled By: H-Huang

fbshipit-source-id: 9dd4388ba4e2648c92e4062dafe6d8dc3ea6484e
2021-11-09 08:52:42 -08:00
114ef8c5ea Add SiLU backward Aten symbol (#67665)
Summary:
This is related to https://github.com/pytorch/xla/issues/3192. bdhirsh

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67665

Reviewed By: desertfire

Differential Revision: D32245736

Pulled By: bdhirsh

fbshipit-source-id: c5a2b24214fa37a181246cbbfcbee131473cf807
2021-11-09 08:14:02 -08:00
c581f56c74 Fix Dispatching not considering List[Optional[Tensor]] for dispatch (#66506)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66506

Followup to https://github.com/pytorch/pytorch/pull/60787

It turns out that the original PR was wrong for unboxed kernels. We
recently ran into this in
https://github.com/facebookresearch/functorch/issues/124

For unboxed kernels, the correct type for a Tensor?[] argument is
actually `List<optional<Tensor>>`, not `ArrayRef<optional<Tensor>>`

Test Plan:
- assert that https://github.com/facebookresearch/functorch/issues/124
actually works

Reviewed By: bdhirsh

Differential Revision: D31609714

Pulled By: zou3519

fbshipit-source-id: bb91cafd32fb3c1b7d1e4f966b46b5d973b50df2
2021-11-09 08:00:09 -08:00
803e88d418 [DataPipe] Fixing pickling issues with fork and demux (#67930)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67930

Fixes #67848

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D32222184

Pulled By: NivekT

fbshipit-source-id: 48871c45a855d92cd599e21f3b53827dd32c91ef
2021-11-09 07:54:02 -08:00
577a4d34a7 making import_module private and deprecating public method (#67990)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67990

Duplicate of the following PR which was merged by mistake without ghimport
https://github.com/pytorch/pytorch/pull/67914

cc albanD NicolasHug

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D32247560

Pulled By: jdsgomes

fbshipit-source-id: 8ba5ba7d17fc3d0d2c377da467ea805822e21ec1
2021-11-09 07:27:57 -08:00
0a9cd6d461 Removes unnecessary no_pretrained_model from test_quantize_fx.py (#67836)
Summary:
TorchVision accidentally included model builders for quantized models without weights; this was an old bug. These builders were largely unusable and caused issues for users, so they were commonly filtered out.

We've recently fixed that (https://github.com/pytorch/vision/pull/4854) by either removing the unnecessary builders or providing quantized weights. This PR removes the no-longer-necessary filtering of the methods.

**It should be merged after TorchVision is synced on FBCode.**

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67836

Reviewed By: jerryzh168

Differential Revision: D32230658

Pulled By: datumbox

fbshipit-source-id: 01cd425b1bda3b4591a25840593b3b5dde3a0f12
2021-11-09 05:49:27 -08:00
f9422e1c6b Fix deadlock for multi-output forward AD (#67995)
Summary:
Will hide some of the issues from https://github.com/pytorch/pytorch/issues/67367
This will at least allow us to run gradcheck for now until the above issue is fixed.

For more context, the deadlock happens when we (wrongfully) set a forward grad that also has a forward grad of the same level.
In particular, when exiting the level from 191b48b12f/torch/csrc/autograd/forward_grad.cpp (L23)
We are taking the `all_forward_levels_mutex_` lock and proceed to delete the level at 191b48b12f/torch/csrc/autograd/forward_grad.cpp (L29) (nothing else usually references this object, so it gets deleted as soon as it gets removed from the vector). Note that, at this point, we still have the lock!

In the level destructor in 191b48b12f/torch/csrc/autograd/forward_grad.cpp (L55) we are deleting the forward grad. Which triggers the deletion the grad Tensor and everything it holds (assuming nothing else references it).
But in the (bad) case where this Tensor also has a forward grad for this level, the autograd meta clears the fw grads: 191b48b12f/torch/csrc/autograd/forward_grad.h (L124)
While clearing, we access the level (to de-register this forward grad) via 191b48b12f/torch/csrc/autograd/forward_grad.h (L139)
But this tries to access the level again in 191b48b12f/torch/csrc/autograd/forward_grad.cpp (L39) and deadlocks.
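
A minimal Python sketch of the deadlock shape (illustrative only; the real code is C++ with a non-recursive std::mutex, and actually calling the function below would hang):

```python
import threading

levels_lock = threading.Lock()  # non-reentrant, like std::mutex
all_levels = []

class ForwardLevel:
    def __del__(self):
        # De-registering the (wrongfully set) forward grad needs the
        # lock again, but this thread already holds it: deadlock.
        with levels_lock:
            pass

def exit_level():
    with levels_lock:     # lock taken to remove the level...
        all_levels.pop()  # ...last reference dies, __del__ runs under it

all_levels.append(ForwardLevel())
# exit_level()  # would hang: same thread re-acquires a non-reentrant lock
```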

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67995

Reviewed By: soulitzer

Differential Revision: D32250996

Pulled By: albanD

fbshipit-source-id: f6118117effd3114fa90dc8fe22865339445f70c
2021-11-09 01:32:43 -08:00
f8297d40fc Adds a maximize flag to SGD. (#67847)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/46480 -- for SGD.

## Notes:
- I have modified the existing tests to take a new `constructor_accepts_maximize` flag. When this is set to true, the `_test_basic_cases_template` function will test both maximizing and minimizing the sample function.
- This was the clearest way I could think of to test the changes -- I would appreciate feedback on this strategy.

## Work to be done:
- [ ] I need to update the docs.
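
For reference, a minimal sketch of the flag's effect (values assume the gradient shown in the comment):

```python
import torch

w = torch.zeros(3, requires_grad=True)
opt = torch.optim.SGD([w], lr=0.1, maximize=True)

loss = (w * torch.tensor([1.0, 2.0, 3.0])).sum()  # d(loss)/dw = [1, 2, 3]
loss.backward()
opt.step()
print(w)  # ascends the gradient: tensor([0.1, 0.2, 0.3], ...)
```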

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67847

Reviewed By: H-Huang

Differential Revision: D32252631

Pulled By: albanD

fbshipit-source-id: 27915a3cc2d18b7e4d17bfc2d666fe7d2cfdf9a4
2021-11-09 00:43:07 -08:00
c5e5264be2 Disable TF32 in pinv_jvp and pinv_backward (#67948)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/67947

cc ptrblck xwang233 zasdfgbnm

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67948

Reviewed By: H-Huang

Differential Revision: D32251934

Pulled By: ngimel

fbshipit-source-id: a2b1a118337b38db61350c9e49f1ba19030d70ec
2021-11-08 22:33:29 -08:00
417dc7f86c Revert D32007691: [pytorch][PR] Op info for activation functions 2 (softsign, tanh, tanhshrink, threshold, celu, sigmoid, mish, hardsigmoid)
Test Plan: revert-hammer

Differential Revision:
D32007691 (ea60e7d559)

Original commit changeset: 6cb14dc56e29

fbshipit-source-id: 9ef599ef07302fb521b1f413b989786adfa3576c
2021-11-08 21:16:53 -08:00
36d9a74bc6 Enforce that test cases extend from correct TestCase (#67819)
Summary:
Addresses https://github.com/pytorch/pytorch/issues/66903

Main code is in  torch/testing/_internal/common_utils.py and everything else is fixing the lint

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67819

Reviewed By: H-Huang

Differential Revision: D32259978

Pulled By: janeyx99

fbshipit-source-id: 39c5ffbaa510e1e533d6bdacf5c6158a3dd9885d
2021-11-08 18:28:36 -08:00
25cd81876d Fix typo grid_sampler_3d_cuda (#67752)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67752

Reviewed By: NivekT, mruberry

Differential Revision: D32256561

Pulled By: H-Huang

fbshipit-source-id: b4d56cadf15bc00181e899ea4be4b1bcfe63f692
2021-11-08 18:16:01 -08:00
4b1d044498 [WIP][resubmit] Don't #define NUM_THREADS (#68008)
Summary:
This reverts commit 9e8016d8c48e9c99addad93112f99d3375a0fbc7.

cc jeffdaily sunway513 jithunnair-amd ROCmSupport KyleCZH

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68008

Reviewed By: H-Huang

Differential Revision: D32254779

Pulled By: ngimel

fbshipit-source-id: 38ec415199f62a1e58000abe3e34ac91898a94ae
2021-11-08 18:03:45 -08:00
a2ab06514b Fixes CUDA vs CPU consistency for index_put_ when accumulating (part 2) (#67189)
Summary:
Description:
- Follow up PR to https://github.com/pytorch/pytorch/issues/66790 to fix the tests on functorch, https://github.com/pytorch/functorch/issues/195

In functorch, a null tensor is added to the list of indices for the batch dimension in C++, but I cannot find an equivalent of that in Python without using `torch.jit.script`. If a better solution can be suggested, I'd be happy to replace the current way of testing.

cc ngimel zou3519

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67189

Reviewed By: suo

Differential Revision: D31966686

Pulled By: ngimel

fbshipit-source-id: a14b9e5d77d9f43cd728d474e2976d84a87a6ff4
2021-11-08 17:56:43 -08:00
3f048c637f [distributed] Render torch.distributed.optim members (#67885)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67885

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D32191952

Pulled By: jamesr66a

fbshipit-source-id: a9ed52da8e89b3491eab2e691f5571338f83e8e3
2021-11-08 16:20:55 -08:00
fd198a2fea [fx2trt] fix import in oss tests (#68016)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68016

We would want to use oss test utils.

Also refactor both test utils so that the internal one is an enhancement over the oss test utils.

Test Plan: CI

Reviewed By: wushirong

Differential Revision: D32250266

fbshipit-source-id: 968b8f215ca2d294f7d0bd13cf9563be567954dd
2021-11-08 16:11:00 -08:00
0d8a8a2e41 [fx2trt]organize converter utils (#68015)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68015

Put all converter utils into a single file `converter_utils.py`.

Test Plan: CI

Reviewed By: wushirong

Differential Revision: D32250243

fbshipit-source-id: 93fb34bc9ca23f4c3cef3125e04871083dbd413d
2021-11-08 16:09:42 -08:00
5b036d5f2b [Doc] [ONNX]Fix a broken url for ONNXRuntime custom op (#67944)
Summary:
**Description**
Replace the broken URL with a valid link: https://onnxruntime.ai/docs/reference/operators/add-custom-op.html.

**Motivation**
Closes https://github.com/pytorch/pytorch/issues/67849. The URL is broken.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67944

Reviewed By: NivekT

Differential Revision: D32252880

Pulled By: H-Huang

fbshipit-source-id: 400b0efa3d6f63e60b016c482fbbed1293c29806
2021-11-08 15:51:02 -08:00
82398e38ab Upgrade and fix boto3 version to 1.19.12 (#68025)
Summary:
The new boto3 version could be causing the macOS test reporting to fail; pinning to version 1.19.12.

example fail: https://app.circleci.com/pipelines/github/pytorch/pytorch/406385/workflows/f15ca6ba-e8af-45a3-b1b0-c0298ea3fe9d/jobs/16687920

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68025

Reviewed By: malfet, seemethere

Differential Revision: D32261971

Pulled By: janeyx99

fbshipit-source-id: 1a2cd636a2f0b206921749c3f0c9e4707c9a1222
2021-11-08 15:43:35 -08:00
9094947b0a use better secrets for upload labels workflow (#68013)
Summary:
Should prevent https://github.com/pytorch/pytorch/runs/4134946329?check_suite_focus=true

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68013

Reviewed By: seemethere

Differential Revision: D32254046

Pulled By: janeyx99

fbshipit-source-id: 55a7a1b8f8434f6608fe9d423982406c1e187c59
2021-11-08 15:14:28 -08:00
db9b4f1a37 [ROCm] Bump magma source to pickup memory leak fix (#67225)
Summary:
Magma's magma_queue was double allocating storage when creating
ptrArray for gemm operations.  A fix has been upstreamed and the build
needs to pick this up going forward.

Fixes #{issue number}

cc jeffdaily sunway513 jithunnair-amd ROCmSupport KyleCZH

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67225

Reviewed By: janeyx99

Differential Revision: D32252609

Pulled By: seemethere

fbshipit-source-id: e27ba1a54dc060fd1bfb4afad9079bf9b4705c8a
2021-11-08 15:08:09 -08:00
0b09d62cf3 [hackathon][DataPipe] adding .pyi file generation for torch.utils.data.datapipes (#67374)
Summary:
Stack from [ghstack](https://github.com/ezyang/ghstack):
* __->__ https://github.com/pytorch/pytorch/issues/67374

This is a work in progress.

Related TorchData issue: https://github.com/pytorch/data/issues/80

cc VitalyFedyunin ejguan NivekT

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67374

Reviewed By: H-Huang

Differential Revision: D32153211

Pulled By: NivekT

fbshipit-source-id: b4c61f191f20fd98ca44bb9e4f972c6d812994a0
2021-11-08 14:43:24 -08:00
2e523ed229 [JIT] additional support for CallMethod with autocasting (#67925)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67925

Previously, the following would always fail, because autocasting would not be enabled in the called method:

```
@torch.jit.script
def fn(x, y):
    with autocast():
        ...  # CallMethod() to some method

fn(x, y)
```

This allows the above, if autocasting is globally enabled, e.g.

```
@torch.jit.script
def fn(x, y):
    with autocast():
        ...  # CallMethod() to some method

with autocast():
    fn(x, y)  # now succeeds
```
ghstack-source-id: 142667351

Test Plan: added test in test_jit_autocast.py

Reviewed By: navahgar

Differential Revision: D32214439

fbshipit-source-id: bb7db054e25e18f5e3d2fdb449c35b5942ab303e
2021-11-08 14:37:09 -08:00
f57c63032e [ONNX] Fix reciprocal when input is not floating point (#67471) (#67808)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67808

torch.reciprocal implicitly casts the inputs to float, and ONNX
Reciprocal requires floating point inputs.

Also separate the reciprocal test from other tests, and test different
input types.

Test Plan: Imported from OSS

Reviewed By: msaroufim

Differential Revision: D32181307

Pulled By: malfet

fbshipit-source-id: 3e1109b3c85a49c51dc713656a900b4ee78c8340
2021-11-08 14:37:07 -08:00
eb22d06e5e [ONNX] Use human readable enum for dtype scalars (#66822) (#67807)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67807

Also make quoting of string literals consistent.

Test Plan: Imported from OSS

Reviewed By: msaroufim

Differential Revision: D32181309

Pulled By: malfet

fbshipit-source-id: e1053701e3589f0310d8b5ef920359c03c6713f0
2021-11-08 14:37:05 -08:00
958d517643 [ONNX] Fix new_full and full_like for Python 3.9 (#67124) (#67806)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67806

Previously new_full would fail with errors like:
`TypeError: only integer tensors of a single element can be converted to an index`

And full_like would trigger warnings like:
`DeprecationWarning: an integer is required (got type float).  Implicit conversion to integers using __int__ is deprecated, and may be removed in a future version of Python.`

Test Plan: Imported from OSS

Reviewed By: msaroufim

Differential Revision: D32181301

Pulled By: malfet

fbshipit-source-id: 2cf262cfef36c18e7b2423efe1e1d4fa3438f0ba

Co-authored-by: Bowen Bao <bowbao@microsoft.com>
2021-11-08 14:37:03 -08:00
37688148ae [ONNX] Support opset 15 (#67121) (#67805)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67805

Also fix Reduce ops on binary_cross_entropy_with_logits

The graph says the output is a scalar but with `keepdims=1`
(the default), the output would be a tensor of rank 1. We set
`keepdims=0` to make it clear that we want a scalar output.

This previously went unnoticed because ONNX Runtime does not strictly
enforce shape inference mismatches if the model is not using the latest
opset version.

Test Plan: Imported from OSS

Reviewed By: msaroufim

Differential Revision: D32181304

Pulled By: malfet

fbshipit-source-id: 1462d8a313daae782013097ebf6341a4d1632e2c

Co-authored-by: Bowen Bao <bowbao@microsoft.com>
2021-11-08 14:37:00 -08:00
ead59b5ff3 [ONNX] Suppress ort warnings in onnx related test (#67054) (#67804)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67804

Improve readability of test logs by suppressing ort warnings logging for onnx related test.

Reducing ONNX CI test log binary size:
linux-xenial-py3.6-clang7-onnx-test1: 12443 KB -> 6958 KB
linux-xenial-py3.6-clang7-onnx-test2: 16884 KB -> 5778 KB

Test Plan: Imported from OSS

Reviewed By: msaroufim

Differential Revision: D32181308

Pulled By: malfet

fbshipit-source-id: 11cf165dc212d061606590e96c08c6e021135f74

Co-authored-by: BowenBao<bowbao@microsoft.com>
2021-11-08 14:35:20 -08:00
ea60e7d559 Op info for activation functions 2 (softsign, tanh, tanhshrink, threshold, celu, sigmoid, mish, hardsigmoid) (#67492)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67492

Reviewed By: mruberry

Differential Revision: D32007691

Pulled By: samdow

fbshipit-source-id: 6cb14dc56e296154e2f48249049c4d2fe4f4d10d
2021-11-08 14:30:50 -08:00
a1d733ae8c Avoid convert trt.Dims to tuple in hot path (#67960)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67960

For some reason, we throw py::index_error when converting a trt.Dims to a tuple. Having this in the hot path of TRT inference is not good, especially since we register a bunch of pybind11 exception translators that repeatedly rethrow the exception. Since the shape is static information, we save it once to avoid the repeated conversion.
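
A sketch of the caching idea in plain PyTorch (hypothetical module, not the actual fx2trt code):

```python
import torch

class TRTModuleSketch(torch.nn.Module):
    def __init__(self, engine_output_dims):
        super().__init__()
        # Convert the Dims-like object to a plain tuple exactly once.
        # Doing it on every forward() kept the conversion, and its
        # pybind11 exception translation, in the inference hot path.
        self._output_shape = tuple(engine_output_dims)

    def forward(self, x):
        # The cached tuple is reused with no per-call conversion.
        return x.new_empty(self._output_shape)

m = TRTModuleSketch([2, 3])     # list stands in for trt.Dims
print(m(torch.zeros(1)).shape)  # torch.Size([2, 3])
```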

Reviewed By: jianyuh, wushirong, 842974287

Differential Revision: D32232065

fbshipit-source-id: 11e49da9758ead0ff3aa647bbd3fce7735bf4a07
2021-11-08 13:36:15 -08:00
4a8f27445d [Quant] Add dynamic QAT Linear module (#67325)
Summary:
**Summary:** This commit adds the `torch.nn.qat.dynamic.modules.Linear`
module, the dynamic counterpart to `torch.nn.qat.modules.Linear`.
Functionally these are very similar, except the dynamic version
expects a memoryless observer and is converted into a dynamically
quantized module before inference.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67325

Test Plan:
`python3 test/test_quantization.py TestQuantizationAwareTraining.test_dynamic_qat_linear`

**Reviewers:** Charles David Hernandez, Jerry Zhang

**Subscribers:** Charles David Hernandez, Supriya Rao, Yining Lu

**Tasks:** 99696812

**Tags:** pytorch

Reviewed By: malfet, jerryzh168

Differential Revision: D32178739

Pulled By: andrewor14

fbshipit-source-id: 5051bdd7e06071a011e4e7d9cc7769db8d38fd73
2021-11-08 10:24:25 -08:00
db456d16ee torch.lobpcg.backward: do not save non-Variable types with ctx.save_for_backward. (#67994)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/67827

cc ezyang albanD zou3519 gqchen pearu nikitaved soulitzer Lezcano Varal7

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67994

Reviewed By: H-Huang

Differential Revision: D32244818

Pulled By: albanD

fbshipit-source-id: 702a3a1d1f4c160bef7ec1f764a2ab5d01ca7901
2021-11-08 10:02:09 -08:00
8e2528132b [lint] small updates to .lintrunner.toml (#67942)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67942

- Change "name" to "code" for consistency with linttool and LintMessage
format.
- Change "args" and "init_args" to "command" and "init_command" for
consistency with internal representation.

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D32250606

Pulled By: suo

fbshipit-source-id: 557fef731bab9adca7ab1e7cc41b996956076b05
2021-11-08 09:45:26 -08:00
d201102d36 [lint] Add the rest of the grep linters (#67932)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67932

Also various improvements to grep_linter.py, including the ability to
specify a replacement pattern.

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D32250603

Pulled By: suo

fbshipit-source-id: e07eb182e9473a268e2b805a68a859b91228bfbb
2021-11-08 09:45:20 -08:00
53f118c800 [lint] improve mypy lintrunner config (#67936)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67936

- Add the strict config
- Make the patterns exactly match the current CI
- Add init_args

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D32250605

Pulled By: suo

fbshipit-source-id: a71d434bf6024db4462260a460a1bc2d9ac66a32
2021-11-08 09:45:14 -08:00
419c58ea9c [lint] add newlines linter to lintrunner (#67894)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67894

As title. Confirmed that the code base passes by running:

```
lintrunner --paths-cmd='git grep -Il ""' --take NEWLINE
```

and seeing that it passes.

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D32250604

Pulled By: suo

fbshipit-source-id: de9bcba635d21f8832bb25147b19b7b2e8802247
2021-11-08 09:45:07 -08:00
4b021280ad [lint] add nativefunctions to lintrunner (#67890)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67890

Adding another linter. I also added a generic initializer that installs
the right pip packages (you can invoke it by running `lintrunner init`).

Differential Revision:
D32197366
D32197366

Test Plan: Imported from OSS

Reviewed By: driazati

Pulled By: suo

fbshipit-source-id: 82844e78f1ee3047220d8444874eab41d7cc0e9e
2021-11-08 09:44:59 -08:00
5bb5bfccf7 [lint] add lintrunner support for circleci_linter (#67872)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67872

As title. This demonstrates some of the nice features of lintrunner:
- Uniform error reporting means you get a nice diff of the changes for
free
- Can run with -a to just accept the changes (no need to tell people
to run a special regenerate command, since the linter adapter already knows how.)

Differential Revision:
D32187386
D32187386

Test Plan: Imported from OSS

Reviewed By: driazati

Pulled By: suo

fbshipit-source-id: 71de6b042730be80ff6794652039e9bc655a72b1
2021-11-08 09:43:25 -08:00
b3770766c4 Fixes deprecation warnings in test_optim.py (#67954)
Summary:
Catches deprecation warnings when we call `scheduler.step(epoch)`
in tests.

Removes duplicate parameters to optimizers unless we are specifically
testing for that

Fixes https://github.com/pytorch/pytorch/issues/67696

There is one warning remaining when I run this locally -- however that is due to the implementation of the `SequentialLR` Scheduler. I will open a new issue relating to that.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67954

Reviewed By: H-Huang

Differential Revision: D32244056

Pulled By: albanD

fbshipit-source-id: 2ab3086a58e10c8d29809ccbaab80606a1ec61d8
2021-11-08 09:36:08 -08:00
b546cdf401 [SR] Out variant for prim::NumToTensor (#67856)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67856

Returns a tensor constructed from scalar input

Test Plan:
```
buck test //caffe2/benchmarks/static_runtime:static_runtime_cpptest
```

Ran
```
buck run //caffe2/benchmarks/static_runtime:static_runtime_cpptest -- --gtest_filter=*NumToTensorScalar* --v=1
```
and the output contains `Switch to out variant for node: %2 : Tensor = prim::NumToTensor(%0)`.

Reviewed By: mikeiovine

Differential Revision: D32014194

fbshipit-source-id: e7df65ea1bf05d59c1fc99b721aee420e484f542
2021-11-08 09:02:58 -08:00
0dc99dcf59 Update __init__.py (#67900)
Summary:
Fixes a syntax error in pytorch/torch/cuda/__init__.py.
Fixes https://github.com/pytorch/pytorch/issues/67896

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67900

Reviewed By: mruberry

Differential Revision: D32211978

Pulled By: soulitzer

fbshipit-source-id: a313a5e23b4d79e5b7bb909eaf82c9ee6cab10c9
2021-11-08 08:56:38 -08:00
5bc89275dd [SR] Eliminate no-ops (#67437)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67437

Certain ops do nothing on the forward pass and can be discarded after training: `aten::detach` and `fb::scale_gradient` are examples of this.

Test Plan: `buck test caffe2/test:jit -- test_freezing`

Reviewed By: hlu1

Differential Revision: D31980843

fbshipit-source-id: 0045b6babcfae786a2ce801b2f5997a078205bc0
2021-11-08 08:42:33 -08:00
191b48b12f [torch.fx] Fix replace pattern mechanism (#66442)
Summary:
Fixes #{issue number}

The following code would not return the pattern correctly:

```python
        def f(x):
            x = torch.sigmoid(x)
            x = torch.sigmoid(x)
            return torch.sigmoid(x)

        def pattern(x):
            return torch.sigmoid(x)

        def replacement(x):
            return torch.exp(x)

        def comparison(x):
            x = torch.exp(x)
            x = torch.exp(x)
            return torch.exp(x)

        traced = symbolic_trace(f)
        comparison_fn = symbolic_trace(comparison)

        subgraph_rewriter.replace_pattern(traced, pattern, replacement) # Only one sigmoid gets converted.
```

This PR fixes this by adding a new test.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66442

Reviewed By: ZolotukhinM

Differential Revision: D32238424

Pulled By: ansley

fbshipit-source-id: 386e777174c639baafc166d5ffbc0658a96b1ee9
2021-11-07 13:23:02 -08:00
9fb3ba9d7b Revert D31762735 (#67924)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67924

This diff reverts the changes made in D31762735 (0cbfd466d2)

Test Plan: Wait for CI

Reviewed By: derekmod-fb

Differential Revision: D32214744

fbshipit-source-id: e0a65b6a31a88216ae1243549fcbc901ef812374
2021-11-06 17:34:13 -07:00
9cacf2b718 Add custom zipper script to zip python modules for torch.deploy (#67006)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67006

Test Plan: nervouslaugh_

Reviewed By: shunting314

Differential Revision: D31822429

fbshipit-source-id: c2efeab1446fbeb70b98d4ee766fbc670cf091b0
2021-11-06 11:49:02 -07:00
ae501a9727 [PyTorch Edge] Update bytecode version compatibility check (#67417)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67417

bytecode version is valid when it's smaller than kMaxSupported and larger than kMinSupported
ghstack-source-id: 142609392

Test Plan:
```
buck test mode/dev //caffe2/test/cpp/jit:jit -- --exact 'caffe2/test/cpp/jit:jit - LiteInterpreterTest.isCompatibleFail'
```

Reviewed By: JacobSzwejbka, iseeyuan

Differential Revision: D31984839

fbshipit-source-id: 2011e77455c931c0a8a58267494d44bcf167b877
2021-11-05 19:34:01 -07:00
80178d6152 [DDP] Fix some issues with code example in DDP docstring (#67883)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67883

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Test Plan: Imported from OSS

Reviewed By: zhaojuanmao

Differential Revision: D32190946

Pulled By: jamesr66a

fbshipit-source-id: a376324b95cbe833ffa606ecdfc6156432880f70
2021-11-05 17:32:45 -07:00
22afe82ce3 [rpc] Switch RPC agent check to TORCH_CHECK and add more descriptive error (#67882)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67882

I ran into a hard-to-interpret error message when trying to run the following script, which was missing an `init_rpc` call:

```
# $ torchrun --standalone --nnodes=1 --nproc_per_node=1 script.py
import os
rank = int(os.environ['LOCAL_RANK'])
world_size = int(os.environ['WORLD_SIZE'])

import torch.distributed
# !!!!!! Uncomment the following and the script succeeds
# torch.distributed.rpc.init_rpc('worker', rank=rank, world_size=world_size)

import torch.distributed as dist
dist.init_process_group(backend='gloo')

import torchvision.models as models
import torch

rn50 = models.resnet50()
rn50.train()
rn50 = torch.nn.parallel.DistributedDataParallel(rn50)

from torch.distributed.rpc import RRef
from torch.distributed.optim import DistributedOptimizer

params = []
for param in rn50.parameters():
    params.append(RRef(param))

dist_optim = DistributedOptimizer(
        torch.optim.SGD,
        params,
        lr=0.05)

loss_func = torch.nn.CrossEntropyLoss()

with torch.distributed.autograd.context() as context_id:
    pred = rn50(torch.randn(50, 3, 224, 224))
    target = torch.randn(50, 1000).softmax(dim=1)
    loss = loss_func(pred, target)
    dist.autograd.backward(context_id, [loss])
    dist_optim.step(context_id)
```

Error:

```
Traceback (most recent call last):
  File "/xxx/torchrun_exp/script.py", line 23, in <module>
    params.append(RRef(param))
RuntimeError: agentINTERNAL ASSERT FAILED at "../torch/csrc/distributed/rpc/rpc_agent.cpp":237, please report a bug to PyTorch. Current RPC agent is not set!
```

Since this is a user-facing error, I've changed `TORCH_INTERNAL_ASSERT` to `TORCH_CHECK` and added a hint about how to resolve the issue. On the other hand, the fact that this was originally `TORCH_INTERNAL_ASSERT` may suggest that the author thought that this should be an internal-only error condition. If there is some other place that should be throwing an exception in this case that is failing, let me know and I can adapt the fix to change that location.

Question for reviewers:
* Is there a good test file where I can add a test for this error condition?

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Test Plan: Imported from OSS

Reviewed By: rohan-varma

Differential Revision: D32190947

Pulled By: jamesr66a

fbshipit-source-id: 3621d755329fd524db68675c55b1daf20e716d43
2021-11-05 17:31:11 -07:00
efdb17b984 Add meta support to tensor range factories (#67032)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67032

This PR adds meta backend support to the `range`, `arange`, `linspace`, and `logspace` operators.

Note that the original PR (#66630) was reverted due to two failing unit tests in the Bionic CI. This revision includes a fix for those tests; otherwise its content is identical to the previous PR.

Original commit changeset: 2f9d8d1acbb0
ghstack-source-id: 142487306

Test Plan: Extended the existing tensor creation tests to assert meta backend support.

Reviewed By: zhaojuanmao

Differential Revision: D31834403

fbshipit-source-id: a489858a2a8a38a03234b14408e14d2b208a8d34
2021-11-05 15:36:29 -07:00
9e8016d8c4 Revert D31932215: [pytorch][PR] Don't #define NUM_THREADS
Test Plan: revert-hammer

Differential Revision:
D31932215 (f70e8064f4)

Original commit changeset: ccdf11e249fb

fbshipit-source-id: 4c330aebe9cfb483f02ceb1fdaf5c3b0f8fa6fa1
2021-11-05 15:14:32 -07:00
10411e3561 [quan][fusion] Fix a additional_fuser_method method for fuse_fx (#67876)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67876

Previously we missed it when calling obj.convert, so this argument did not affect the fusion.
This PR fixes that and adds a test for it.

Test Plan:
python test/test_quantization.py TestFuseFx

Imported from OSS

Reviewed By: malfet

Differential Revision: D32191364

fbshipit-source-id: 566bd39461010d70a21de71f611bb929976fe01d
2021-11-05 14:51:15 -07:00
f70e8064f4 Don't #define NUM_THREADS (#67258)
Summary:
PyTorch doesn't compile with the latest `main` branch of cub again. The root cause is that PyTorch defines a macro `NUM_THREADS`, while cub added some code like
```C++
template<...., int NUM_THREADS, ...>
```
and the two definitions clash with each other.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67258

Reviewed By: albanD

Differential Revision: D31932215

Pulled By: ngimel

fbshipit-source-id: ccdf11e249fbc0b6f654535067a0294037ee7b96
2021-11-05 13:56:11 -07:00
b1ecfc6d45 Add timeouts for GHA jobs for pytorch/pytorch (#67912)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/67713

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67912

Reviewed By: seemethere

Differential Revision: D32215323

Pulled By: atalman

fbshipit-source-id: 45da7c4bb13c877c9b38bea8615adf75c4a9702d
2021-11-05 12:50:19 -07:00
f6402c469e (torch/elastic) fix scale down bug caused by calling rdzv_handler.shutdown() on premature agent failures (#67749)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67749

Fixes: https://github.com/pytorch/pytorch/issues/67742

Test Plan:
Added unittests.

Validated manually:

```
# start agent 0
$ torchrun --rdzv_backend c10d --rdzv_id 123 --rdzv_endpoint localhost:29500 --nnodes 1:2 --nproc_per_node 1 --monitor_interval 1 test.py

# start agent 1
torchrun --rdzv_backend c10d --rdzv_id 123 --rdzv_endpoint localhost:29500 --nnodes 1:2 --nproc_per_node 1 --monitor_interval 1 test.py

# kill agent 0
CTRL+C (SIGINT) or kill -15 (SIGTERM)

# restart it
torchrun --rdzv_backend c10d --rdzv_id 123 --rdzv_endpoint localhost:29500 --nnodes 1:2 --nproc_per_node 1 --monitor_interval 1 test.py
```

Reviewed By: cbalioglu

Differential Revision: D32129005

fbshipit-source-id: db292268250ef6f1e06f5b4c5bd67124d8dfd325
2021-11-05 12:18:46 -07:00
240e8d5cc5 Updated searchsorted functionality (#66818)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/60492

Updates searchsorted API to be more consistent with numpy and adds an OpInfo for searchsorted

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66818

Reviewed By: mruberry

Differential Revision: D31745142

Pulled By: samdow

fbshipit-source-id: 0b9600afc3cb0720afb5811212404ee96d2a7d93
2021-11-05 12:13:47 -07:00
f6a4c80a5a Refactor cuDNN Convolution memory format and Conv-Bias-Relu code (#65594)
Summary:
This PR makes several changes:

- Changed function `bool cudnn_conv_use_channels_last(...)` to `at::MemoryFormat cudnn_conv_suggest_memory_format(...)`
- Removed `resize_` in cudnn convolution code. Added a new overloading method `TensorDescriptor::set` that also passes the desired memory format of the tensor.
- Disabled the usage of double + channels_last on cuDNN Conv-Relu and Conv-Bias-Relu. Call `.contiguous(memory_format)` before passing data to cuDNN functions.
- Disabled the usage of cuDNN fused Conv-Bias-Relu in cuDNN < 8.0 version due to a CUDNN_STATUS_NOT_SUPPORTED error. Instead, use the native fallback path.
- Let Conv-Bias-Relu code respect the global `allow_tf32` flag.

According to the cuDNN documentation, double + NHWC is generally not supported.

Close https://github.com/pytorch/pytorch/pull/66968

Fix https://github.com/pytorch/pytorch/issues/55301

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65594

Reviewed By: jbschlosser, malfet

Differential Revision: D32175766

Pulled By: ngimel

fbshipit-source-id: 7ba079c9f7c46fc56f8bfef05bad0854acf380d7
2021-11-05 11:50:55 -07:00
cdd5d16489 [Foreach] Implement L1&L2 norm (#62646)
Summary:
Implement L1 & L2 norms in the fast path, referencing [nvidia/apex](https://github.com/NVIDIA/apex/blob/master/csrc/multi_tensor_l2norm_kernel.cu).
When `ord` is neither 1 nor 2, the slow path is chosen.
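
A minimal usage sketch (`_foreach_norm` is a private API and may change):

```python
import torch

tensors = [torch.ones(3), torch.full((2, 2), 2.0)]
# ord=1 and ord=2 hit the fused multi-tensor kernel on CUDA; other
# orders take the slow per-tensor path.
norms = torch._foreach_norm(tensors, ord=2)
print(norms)  # [tensor(1.7321), tensor(4.)]
```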

Related: https://github.com/pytorch/pytorch/issues/58833

cc ptrblck mcarilli ngimel

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62646

Reviewed By: malfet

Differential Revision: D32173421

Pulled By: ngimel

fbshipit-source-id: 14b7544601658a979b83509df351e1848ded7675
2021-11-05 11:23:00 -07:00
e7a3bbce89 [nnc] Add support for dynamic shapes in TensorExprKernel (#67861)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67861

Previously submitted as https://github.com/pytorch/pytorch/pull/67197.
This got reverted because its failures were hidden by the failures of
another PR.

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D32178196

Pulled By: navahgar

fbshipit-source-id: cc8a5c68aed360d06289e69645461cfa773e1300
2021-11-05 11:18:19 -07:00
a4a6d056e6 Add ownership to more edge tests (#67859)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/66232

This should be the last immediate task. I anticipate test ownership will change over time, but this is the last big item needed to close it out.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67859

Reviewed By: soulitzer

Differential Revision: D32210534

Pulled By: janeyx99

fbshipit-source-id: 7fd835d87d9d35d49ec49de1fcfa29b085133e99
2021-11-05 11:01:16 -07:00
9dafb6434b remove use of THGenerateAllTypes, clean up (#67867)
Summary:
Per title

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67867

Reviewed By: mruberry

Differential Revision: D32191053

Pulled By: ngimel

fbshipit-source-id: 84eb6c2989495fca5f7b055c4984efe5de94e812
2021-11-05 10:57:04 -07:00
ee7412dd29 autodiff fix for autocast_to_xxx (#67648)
Summary:
Fixes an autocast + autodiff issue where `RuntimeError: grad_inputs.size() == node->inputs().size()INTERNAL ASSERT FAILED at "../torch/csrc/jit/runtime/autodiff.cpp":426, please report a bug to PyTorch.` was raised.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67648

Reviewed By: cpuhrsch

Differential Revision: D32083227

Pulled By: davidberard98

fbshipit-source-id: edf526cff4ec21874ae35ec730d13c250073e10c
2021-11-05 10:48:39 -07:00
9269080b47 [PyTorchEge] backport test (#67824)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67824

Testing backport of all prod models using model test framework

Ref:
[Create tests at run-time (google test)](https://stackoverflow.com/questions/19160244/create-tests-at-run-time-google-test)

Breaking the list of models into 20 chunks based on a simple hash (the sum of all character values).
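
A sketch of the chunking scheme described above (hypothetical helper name):

```python
def chunk_index(model_name: str, num_chunks: int = 20) -> int:
    # Deterministic bucket: sum of character values modulo chunk count,
    # so each model always lands in the same test chunk.
    return sum(ord(c) for c in model_name) % num_chunks

assert chunk_index("prod_model_v1") == chunk_index("prod_model_v1")
```
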
ghstack-source-id: 142398833

Test Plan:
```
 buck test //xplat/pytorch/mobile/test:test_read_all_mobile_model_configs
Starting new Buck daemon...

Parsing buck files: finished in 7.6 sec
Creating action graph: finished in 0.9 sec
[RE] Metadata: Session ID=[reSessionID-66f5adfe-50d1-4599-9828-3e8115181601]
[RE] Waiting on 0 remote actions. Completed 1008 actions remotely, action cache hit rate: 43.59%.
Downloaded 26/1523 artifacts, 252.60 Kbytes, 96.6% cache miss (for updated rules)
Building: finished in 01:18.6 min (100%) 5532/5532 jobs, 770/5532 updated
  Total time: 01:27.3 min
Testing: finished in 11:21.6 min (41 PASS/0 FAIL)
BUILD SUCCEEDED
RESULTS FOR //xplat/pytorch/mobile/test:test_read_all_mobile_model_configs
PASS    673.8s 41 Passed   0 Skipped   0 Failed   //xplat/pytorch/mobile/test:test_read_all_mobile_model_configs
TESTS PASSED
```

Reviewed By: dhruvbird

Differential Revision: D32068955

fbshipit-source-id: d06c2434a4a69572ab52df31a684e5973b9d551c
2021-11-05 10:41:36 -07:00
02e35ce17b [ONNX] Update onnx function export with comments and clean up (#66817) (#67803)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67803

* Addresses comments from #63589

[ONNX] remove torch::onnx::PRODUCER_VERSION (#67107)

Use constants from version.h instead.
This simplifies things since we no longer have to update
PRODUCER_VERSION for each release.

Also add TORCH_VERSION to version.h so that a string is available for
this purpose.

[ONNX] Set `ir_version` based on opset_version. (#67128)

This increases the odds that the exported ONNX model will be usable.
Before this change, we were setting the IR version to a value which may
be higher than what the model consumer supports.

Also some minor clean-up in the test code:
* Fix string replacement.
* Use a temporary file so as to not leave files around in the test
  current working directory.

Test Plan: Imported from OSS

Reviewed By: msaroufim

Differential Revision: D32181306

Pulled By: malfet

fbshipit-source-id: 02f136d34ef8f664ade0bc1985a584f0e8c2b663

Co-authored-by: BowenBao <bowbao@microsoft.com>
Co-authored-by: Gary Miguel <garymiguel@microsoft.com>
Co-authored-by: Nikita Shulga <nshulga@fb.com>
2021-11-05 10:35:35 -07:00
ace2183195 [FSDP] Address follow up comments for CPU offload (#67813)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67813

Address Shen's comments in
https://github.com/pytorch/pytorch/pull/67249/files
ghstack-source-id: 142379312

Test Plan: CI

Reviewed By: zhaojuanmao

Differential Revision: D32157545

fbshipit-source-id: 3cc2df6d5fa0d3b9383ed3711e7f79729dbb1dda
2021-11-05 10:34:08 -07:00
823ae3a4ff [forward ad] Also check layout of grad matches that of self for inplace over view (#67816)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/67800

Currently when the grad is the same layout as base, we try to assign the same tensor to the forward grad of both the base and the view. However, when the layout of the grad is different from the layout of the view, this triggers a copy to be created, and the tangent of the view (after the inplace) will not have a view relationship with the view of the base.

This PR just changes it so that we only do the above optimization when the layout of the grad also matches the layout of `self`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67816

Reviewed By: malfet

Differential Revision: D32190021

Pulled By: soulitzer

fbshipit-source-id: b1b2c9b332e83f4df5695ee9686ea76447f9305b
2021-11-05 10:26:24 -07:00
13a69d23b1 Add retry logic for test_multitenancy and documentation for find_free_port (#67775)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67775

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Test Plan: Imported from OSS

Reviewed By: rohan-varma

Differential Revision: D32142749

Pulled By: H-Huang

fbshipit-source-id: 67ab4ede4f4bff96a1ffd41d55b3be0edc82b1ce
2021-11-05 09:05:12 -07:00
33b7790907 Fix conv_transpose3d backward with non-contiguous grad_out (#67829)
Summary:
Many thanks to Forest Yang (meowmix) from the forum for reporting it with a minimal reproduction.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67829

Reviewed By: malfet

Differential Revision: D32184786

Pulled By: albanD

fbshipit-source-id: b63dbd3148b5def2109deb2f4612c08f55f59dfb
2021-11-05 08:34:21 -07:00
07a08fb95f Fix typo in LinearLR docs (#67840)
Summary:
The final learning rate should be 0.05, matching the lr passed as the argument to the optimizer, not 0.005.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67840

Reviewed By: jbschlosser

Differential Revision: D32187091

Pulled By: albanD

fbshipit-source-id: 8aff691bba3896a847d7b9d9d669a65f67a6f066
2021-11-05 07:16:15 -07:00
53ebccbe78 Fix warnings produced when running test_optim.py (#67756)
Summary:
Fixes part of https://github.com/pytorch/pytorch/issues/67696 by adding calls to `optimizer.step()` in various places.

## Notes for reviewers:
- It is not entirely clear which is the right optimizer to step in each case. I have favoured the more explicit approach of creating a set of optimizers and calling step on each of them.
- At the time of writing, the only Scheduler without an `optimizer` instance variable is `ChainedScheduler` which I need to deal with once. I use `hasattr` to do this check. Let me know if this ought to be changed.
- I am opening this PR for review while it only solves part of the issue, as I'd rather get feedback sooner. I think it is fine to fix the issue over several PRs too.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67756

Reviewed By: jbschlosser

Differential Revision: D32187864

Pulled By: albanD

fbshipit-source-id: fd0d133bcaa3a24588e5a997ad198fdf5879ff5a
2021-11-05 07:12:13 -07:00
b098264f22 Revert D32063662: [pytorch][PR] TST Adds device transfer into module info tests
Test Plan: revert-hammer

Differential Revision:
D32063662 (da59bd1d13)

Original commit changeset: 0868235a0ae7

fbshipit-source-id: a4f775874faa88be0eb5272dedf3bbc8194ebde6
2021-11-05 07:07:39 -07:00
bb8978f605 Revert D32175963: Converting hardswish to structured kernels with metatensor support
Test Plan: revert-hammer

Differential Revision:
D32175963 (57335a9ee3)

Original commit changeset: f4d749c6aeaf

fbshipit-source-id: 6d68a96cf872c2d7b518c061875b9336bca0043a
2021-11-05 07:04:40 -07:00
4d5338228f Revert D32175960: Moving parts of the Shape Registry into a common file
Test Plan: revert-hammer

Differential Revision:
D32175960 (d04389e6f0)

Original commit changeset: 2e30115ca554

fbshipit-source-id: 27f9889c535e4f7c21c50b2468e1e6650e952d4f
2021-11-05 07:04:37 -07:00
38af37f409 Revert D32175958: Adding Custom Rules to Device Propagation
Test Plan: revert-hammer

Differential Revision:
D32175958 (853298481b)

Original commit changeset: 26a9ef41e10a

fbshipit-source-id: adcc70687b5b454f358b5446bed2c06d04e61435
2021-11-05 07:04:35 -07:00
b1ac7f51a1 Revert D32175957: Adding custom testing based on opinfos input for ops with custom rules.
Test Plan: revert-hammer

Differential Revision:
D32175957 (b8e165e841)

Original commit changeset: 1cb51a7b6cbb

fbshipit-source-id: 29fd0750d9981758436c55eea2de40cdaddfb9be
2021-11-05 07:04:33 -07:00
0c8569bec9 Revert D32175959: Merging the implementations of ClearProfiling
Test Plan: revert-hammer

Differential Revision:
D32175959 (f1754319e3)

Original commit changeset: b335dacce709

fbshipit-source-id: 23d1f75d47f15effc9806bd6e5228007d521b0b3
2021-11-05 07:03:18 -07:00
2f68878a05 [Static Runtime] Add a comment on clients taking ownership of managed output tensors (#67554)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67554

This change adds a comment on clients taking ownership of managed output tensors to remind SR developers of how and why that matters.

Test Plan: N/A

Reviewed By: swolchok

Differential Revision: D32013468

fbshipit-source-id: bcc13055c329c61677bdcc76411fe8db44bb2cee
2021-11-04 22:20:49 -07:00
ba9d9d488e Implement padding with slice layer (#67888)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67888

Implement padding with a slice layer. The steps are:
1. Reverse the tensor, then slice and pad with zeros: [1, 2] => [2, 1, 0 ... 0]
2. Reverse the tensor back to its original order, finishing the pre-pad: [2, 1, 0 ... 0] => [0 ... 0, 1, 2]
3. Continue with the post-pad: [0 ... 0, 1, 2] => [0 ... 0, 1, 2, 0 ... 0]
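
The same trick expressed in plain PyTorch (illustrative only; the converter does this with TensorRT slice layers):

```python
import torch

x = torch.tensor([1, 2])
pre, post = 3, 2  # hypothetical pad amounts

# Step 1: reverse, then extend with zeros.
rev = torch.flip(x, dims=[0])               # [2, 1]
rev = torch.cat([rev, rev.new_zeros(pre)])  # [2, 1, 0, 0, 0]

# Step 2: reverse back to the original order; the pre-pad is done.
out = torch.flip(rev, dims=[0])             # [0, 0, 0, 1, 2]

# Step 3: append zeros for the post-pad.
out = torch.cat([out, out.new_zeros(post)])  # [0, 0, 0, 1, 2, 0, 0]
```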

Test Plan: buck test mode/dev-nosan caffe2/test/fx2trt/converters:test_pad

Reviewed By: 842974287

Differential Revision: D32160739

fbshipit-source-id: dbbc04d916e23551e3ce9be480283377e9a38b34
2021-11-04 21:25:01 -07:00
daaad47d9c Allow torch::deploy unity embed xar file of any size (#67814)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67814

There was previously a limitation on the size of the xar file we could embed into the binary. The payload (the xar file here) is added to the .data section by default via the 'ld -b binary -r' command (which section the payload goes into is hardcoded in ld, by the way; see the code pointer [here](https://github.com/bminor/binutils-gdb/blob/binutils-2_32/bfd/binary.c#L80)). When we link the object file containing the payload into the rest of the executable, we get relocation-out-of-range errors if the overall size of the .text, .data, .bss, etc. sections exceeds 2 GB. Some relocation entries use 32-bit signed integers, hence the 2 GB limit.

To solve the issue and mitigate the risk, we designed a mechanism that puts the payload in a customized payload section (.torch_deploy_payload.unity here). The payload section takes no part in relocation or symbol resolution, so in theory it can be as large as the disk allows. Since we don't relocate the payload section, the start/end/size symbols are no longer available/valid, and we have to parse the ELF file ourselves to figure them out.

The mechanism can be used to embed interpreter.so as well. interpreter.so is currently 0.5 GB, which would limit the other .text/.data/.bss sections of the executable to at most 1.5 GB. Using the mechanism in this diff avoids interpreter.so taking any of that budget. We could also use it to ship Python scripts with our binary rather than freezing them beforehand. These use cases are not handled in this diff.

This diff also improves the experience for simple use cases that do not depend on extra shared libraries in the XAR file (except the shared libraries for the Python extensions themselves). This is mainly to fix the stress test right now, but it also makes other simple cases easier.
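
A minimal sketch of the "parse the ELF file ourselves" step (assumes a 64-bit little-endian ELF; illustrative, not the actual implementation):

```python
import struct

def find_payload_section(path, wanted=b".torch_deploy_payload.unity"):
    """Return (file_offset, size) of the payload section, or None."""
    with open(path, "rb") as f:
        data = f.read()
    (e_shoff,) = struct.unpack_from("<Q", data, 0x28)  # section table offset
    e_shentsize, e_shnum, e_shstrndx = struct.unpack_from("<HHH", data, 0x3A)

    def sh(i):  # decode one Elf64_Shdr entry
        return struct.unpack_from("<IIQQQQIIQQ", data, e_shoff + i * e_shentsize)

    strtab_off = sh(e_shstrndx)[4]  # sh_offset of the section-name string table
    for i in range(e_shnum):
        name_off, _, _, _, offset, size = sh(i)[:6]
        end = data.index(b"\0", strtab_off + name_off)
        if data[strtab_off + name_off:end] == wanted:
            return offset, size
    return None
```
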
ghstack-source-id: 142483327

Test Plan:
# Verify the relocation out of range issue is fixed
Add //caffe2:torch as a dependency to the macro build_unity(name="example", …) in torch/csrc/deploy/unity/TARGETS and run 'buck run mode/opt :unity_demo', it's expected to get the relocation errors like:
```
ld.lld: error:
caffe2/c10/util/intrusive_ptr.h:325:(.text._ZN11ska_ordered8detailv317sherwood_v3_tableISt4pairIN3c106IValueES4_ES4_NS3_6detail11DictKeyHashENS0_16KeyOrValueHasherIS4_S5_S7_EENS6_14DictKeyEqualToENS0_18KeyOrValueEqualityIS4_S5_SA_EESaIS5_ESaINS0_17sherwood_v3_entryIS5_EEEE15emplace_new_keyIS5_JEEES2_INSH_18templated_iteratorIS5_EEbEaPSF_OT_DpOT0_+0x4E9): relocation R_X86_64_32S out of range: 2345984168 is not in [-2147483648, 2147483647]; references c10::UndefinedTensorImpl::_singleton
>>> defined in /data/sandcastle/boxes/fbsource/fbcode/buck-out/opt/gen/caffe2/c10/c10#platform009-clang,static/libc10.a(../c10#compile-UndefinedTensorImpl.cpp.o44c44c4c,platform009-clang/core/UndefinedTensorImpl.cpp.o)
```

With the diff, the error above is resolved.

# Pass Stress Test

Also pass existing unit tests for unity.

buck test mode/opt //caffe2/torch/csrc/deploy/unity/tests:test_unity_sum -- --exact 'caffe2/torch/csrc/deploy/unity/tests:test_unity_sum - UnityTest.TestUnitySum' --run-disabled --jobs 18 --stress-runs 10 --record-results

buck test mode/opt //caffe2/torch/csrc/deploy/unity/tests:test_unity_simple_model -- --exact 'caffe2/torch/csrc/deploy/unity/tests:test_unity_simple_model - UnityTest.TestUnitySimpleModel' --run-disabled --jobs 18 --stress-runs 10 --record-results

# Verify debug sections are not messed up

Verified that debug sections are not messed up and GDB still works:
`gdb ~/fbcode/buck-out/gen/caffe2/torch/csrc/deploy/unity/unity_demo`

```
b main
run
l
c
```

Reviewed By: suo

Differential Revision: D32159644

fbshipit-source-id: a133513261b73551a71acc257f4019f7b5af34a8
2021-11-04 20:52:57 -07:00
5a48868d39 [qnnpack] fix benchmarks after an API update (#67768)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67768

We don't need to pass so many padding args after removing support for asymm padding from qnnpack

Test Plan: it builds

Reviewed By: jshen

Differential Revision: D32082204

fbshipit-source-id: 2bfe4c135ad613f0cc267e7e3ab6357731f29bc2
2021-11-04 20:17:05 -07:00
f1754319e3 Merging the implementations of ClearProfiling (#67575)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67575

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D32175959

Pulled By: Gamrix

fbshipit-source-id: b335dacce709a64e3d5779f9c6e9569f86e10748
2021-11-04 19:02:08 -07:00
b8e165e841 Adding custom testing based on opinfos input for ops with custom rules. (#67500)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67500

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D32175957

Pulled By: Gamrix

fbshipit-source-id: 1cb51a7b6cbb75bf3841e3c4caedf88aa94168fe
2021-11-04 19:02:06 -07:00
853298481b Adding Custom Rules to Device Propagation (#66973)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66973

Test Plan: Imported from OSS

Reviewed By: driazati

Differential Revision: D32175958

Pulled By: Gamrix

fbshipit-source-id: 26a9ef41e10a171be6a8779a4e6014e2e7e3f2c1
2021-11-04 19:02:04 -07:00
d04389e6f0 Moving parts of the Shape Registry into a common file (#66948)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66948

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D32175960

Pulled By: Gamrix

fbshipit-source-id: 2e30115ca554816166fedddbcdeffbe189eb19a6
2021-11-04 19:02:02 -07:00
57335a9ee3 Converting hardswish to structured kernels with metatensor support (#66899)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66899

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D32175963

Pulled By: Gamrix

fbshipit-source-id: f4d749c6aeaf064084be72361607ea4f3f6bc91d
2021-11-04 19:02:00 -07:00
ec8a71f9ac Dtype Analysis for Unary and Binary ops with Metatensors (#66898)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66898

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D32175961

Pulled By: Gamrix

fbshipit-source-id: 72721259b900e5a311b6bcb5c350366ba420b734
2021-11-04 19:00:50 -07:00
4b084bc832 Benchmarks for various fusers (#67622)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67622

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D32171063

Pulled By: bertmaher

fbshipit-source-id: 40d3a7adcc52aba3b051e382ec5ec4ee7e43d81b
2021-11-04 18:57:17 -07:00
31fc9d6539 Introduce version control for tensorrt converter decorator (#67886)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67886

Similar to what we have in torch2trt's tensorrt_converter, introduce version-based enablement for fx2trt converters. Upgrading to TRT 8.2 will introduce new op converters as well as deprecate old ops.

Test Plan: pass existing unit test

Reviewed By: 842974287

Differential Revision: D32183581

fbshipit-source-id: 6419acada296d24e882efa9fca25eca6349153e4
2021-11-04 17:39:15 -07:00
f5daa9f76b [iOS] Enable ARC for CMake build (#67884)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67884

Test Plan: Imported from OSS

Reviewed By: husthyc

Differential Revision: D32191532

Pulled By: xta0

fbshipit-source-id: a295004f8e7f1b0f5a4ab12ffd9b37c36b80226b
2021-11-04 16:50:46 -07:00
c2ceba8ada [PyTorchEdge] Move all serialize/deserialize files to a separate target (#66805)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66805

DGW:
```
buck query 'allpaths(//xplat/caffe2:torch_mobile_core, //xplat/caffe2:torch_mobile_interpreter)' --output-format dot_compact | pastry
bunnylol dgw paste_id

```

Test Plan:
buck builds pass

```
buck build fbsource//fbandroid/mode/opt @//fbandroid/mode/messenger //fbandroid/apps/messenger:messenger_staticdi_dextr_splitarsc_dlstr_xzs_for_perftest_redex_optimizedtestableresources_postprocessed_resign //fbandroid/apps/messenger:messenger_staticdi_dextr_splitarsc_dlstr_xzs_for_perftest#unstripped_native_libraries

buck build //xplat/caffe2:torch_mobile_coreAndroid#android-armv7,shared

buck build //xplat/caffe2:torch_commonAndroid#android-armv7,shared

```

DGW:
```
buck query 'allpaths(//xplat/caffe2/fb/runtime:only_flatbuffer_test, //xplat/caffe2:miniz)' --output-format dot_compact | pastry
P464671429: https://www.internalfb.com/intern/paste/P464671429/

bunnylol dgw P464671429
```

loader is decoupled from miniz

```
buck query 'allpaths(//xplat/caffe2/fb/runtime:flatbuffer_loader, //xplat/caffe2:miniz)' --output-format dot_compactdigraph result_graph {
}
```

Reviewed By: iseeyuan

Differential Revision: D31532862

fbshipit-source-id: 51e6880e78e1cafe20c8d90e98037edc3c1b6b11
2021-11-04 15:55:52 -07:00
b0c05297f9 [Static Runtime] Arena allocate StorageImpls for managed tensors (#66130)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66130

We're reusing backing storage for these tensors, which is only safe because they have non-overlapping lifetimes. Accordingly, it seems that they can also share their StorageImpl.
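
The idea, sketched in eager-PyTorch terms (the actual change lives in Static Runtime's C++ memory planner; this is only an illustration):

```
import torch

arena = torch.empty(1024)  # one backing buffer for the whole plan

# Managed tensors with non-overlapping lifetimes can alias the same
# storage: `a` is dead by the time `b` is written, so it is safe for
# both to share the arena's backing StorageImpl.
a = arena[:256].view(16, 16)   # used only during phase 1
# ... phase 1 finishes; `a` is no longer read ...
b = arena[:100]                # phase 2 reuses the same bytes
```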

ghstack-source-id: 142427752

Test Plan:
benchmarked ctr_mobile_feed local and local_ro:

Using recordio inputs for model 302008423_0

```
swolchok@devbig032 ~/f/fbcode> env MKL_NUM_THREADS=1 OMP_NUM_THREADS=1  > environment^C
swolchok@devbig032 ~/f/fbcode> sudo ~/fbsource2/fbcode/scripts/bertrand/noise/denoise-env.sh \
                                 /tmp/ptvsc2_predictor_benchNov1ArenaAllocateStorageImpls \
                               --scripted_model=/data/users/swolchok/ctr_mobile_feed_q3_2021/302008423_0.predictor.disagg.local \
                               --method_name=local.forward --pt_cleanup_activations=1 \
                               --pt_enable_out_variant=1 --pt_optimize_memory=1 --iters=2 --warmup_iters=2 \
                                      --num_threads=1 --pt_enable_static_runtime=1 --set_compatibility=1 --repetitions=5 --recordio_use_ivalue_format=1 --recordio_inputs=/data/users/swolchok/ctr_mobile_feed_q3_2021/302008423_0.local.inputs.recordio

Stable
========================================
I1101 14:19:16.473964 2748837 PyTorchPredictorBenchLib.cpp:251] PyTorch run finished. Milliseconds per iter: 20.0131. Iters per second: 49.9673
I1101 14:20:12.193130 2748837 PyTorchPredictorBenchLib.cpp:251] PyTorch run finished. Milliseconds per iter: 20.0155. Iters per second: 49.9612
I1101 14:21:07.761898 2748837 PyTorchPredictorBenchLib.cpp:251] PyTorch run finished. Milliseconds per iter: 19.9751. Iters per second: 50.0624
I1101 14:22:03.218066 2748837 PyTorchPredictorBenchLib.cpp:251] PyTorch run finished. Milliseconds per iter: 19.9104. Iters per second: 50.2249
I1101 14:22:58.723256 2748837 PyTorchPredictorBenchLib.cpp:251] PyTorch run finished. Milliseconds per iter: 19.956. Iters per second: 50.1102
I1101 14:22:58.723306 2748837 PyTorchPredictorBenchLib.cpp:262] Mean milliseconds per iter: 19.974, standard deviation: 0.043643

ArenaAllocateStorageImpls
========================================
I1101 14:08:57.070914 2695478 PyTorchPredictorBenchLib.cpp:251] PyTorch run finished. Milliseconds per iter: 19.9771. Iters per second: 50.0572
I1101 14:09:52.605121 2695478 PyTorchPredictorBenchLib.cpp:251] PyTorch run finished. Milliseconds per iter: 19.924. Iters per second: 50.1907
I1101 14:10:48.098287 2695478 PyTorchPredictorBenchLib.cpp:251] PyTorch run finished. Milliseconds per iter: 19.9353. Iters per second: 50.1624
I1101 14:11:43.645395 2695478 PyTorchPredictorBenchLib.cpp:251] PyTorch run finished. Milliseconds per iter: 19.9723. Iters per second: 50.0694
I1101 14:12:39.171636 2695478 PyTorchPredictorBenchLib.cpp:251] PyTorch run finished. Milliseconds per iter: 19.9673. Iters per second: 50.0819
I1101 14:12:39.171685 2695478 PyTorchPredictorBenchLib.cpp:262] Mean milliseconds per iter: 19.9552, standard deviation: 0.0239318

difference: 0.0188 (0.09%), which is less than 1 standard deviation

Stable, local_ro
========================================
I1101 14:26:10.796161 2787930 PyTorchPredictorBenchLib.cpp:251] PyTorch run finished. Milliseconds per iter: 1.25991. Iters per second: 793.708
I1101 14:26:12.194727 2787930 PyTorchPredictorBenchLib.cpp:251] PyTorch run finished. Milliseconds per iter: 1.26862. Iters per second: 788.26
I1101 14:26:13.591312 2787930 PyTorchPredictorBenchLib.cpp:251] PyTorch run finished. Milliseconds per iter: 1.26549. Iters per second: 790.207
I1101 14:26:14.982439 2787930 PyTorchPredictorBenchLib.cpp:251] PyTorch run finished. Milliseconds per iter: 1.25943. Iters per second: 794.01
I1101 14:26:16.377033 2787930 PyTorchPredictorBenchLib.cpp:251] PyTorch run finished. Milliseconds per iter: 1.25995. Iters per second: 793.68
I1101 14:26:16.377094 2787930 PyTorchPredictorBenchLib.cpp:262] Mean milliseconds per iter: 1.26268, standard deviation: 0.00414788

ArenaAllocateStorageImpls, local_ro
========================================
I1101 14:26:45.875073 2790009 PyTorchPredictorBenchLib.cpp:251] PyTorch run finished. Milliseconds per iter: 1.20987. Iters per second: 826.536
I1101 14:26:47.207271 2790009 PyTorchPredictorBenchLib.cpp:251] PyTorch run finished. Milliseconds per iter: 1.20827. Iters per second: 827.633
I1101 14:26:48.533766 2790009 PyTorchPredictorBenchLib.cpp:251] PyTorch run finished. Milliseconds per iter: 1.20023. Iters per second: 833.174
I1101 14:26:49.850610 2790009 PyTorchPredictorBenchLib.cpp:251] PyTorch run finished. Milliseconds per iter: 1.19206. Iters per second: 838.884
I1101 14:26:51.172356 2790009 PyTorchPredictorBenchLib.cpp:251] PyTorch run finished. Milliseconds per iter: 1.19958. Iters per second: 833.622
I1101 14:26:51.172411 2790009 PyTorchPredictorBenchLib.cpp:262] Mean milliseconds per iter: 1.202, standard deviation: 0.00722754

Difference: 0.06 usec/iter (4.8%), which is much more than 1 standard deviation

```

we can see that this is a large relative improvement on local_ro, but no effect on local.

Reviewed By: hlu1

Differential Revision: D31357486

fbshipit-source-id: 229c003677da76e89c659d0e0639002accced76e
2021-11-04 15:43:39 -07:00
01809731bc [Static Runtime] Cache managed tensor Storages (#66638)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66638

See comments in code explaining what we're doing here.
ghstack-source-id: 142427750

Test Plan:
Ran ptvsc2_predictor_bench on ctr_mobile_feed local and local_ro net before/after this change on a devserver with turbo off.

Results:

```
stable, local_ro:
========================================
I1014 16:13:52.713300 151733 PyTorchPredictorBenchLib.cpp:252] PyTorch run finished. Milliseconds per iter: 2.68012. Iters per second: 373.118
I1014 16:14:00.961875 151733 PyTorchPredictorBenchLib.cpp:252] PyTorch run finished. Milliseconds per iter: 2.66156. Iters per second: 375.719
I1014 16:14:09.163097 151733 PyTorchPredictorBenchLib.cpp:252] PyTorch run finished. Milliseconds per iter: 2.6449. Iters per second: 378.086
I1014 16:14:17.425621 151733 PyTorchPredictorBenchLib.cpp:252] PyTorch run finished. Milliseconds per iter: 2.66661. Iters per second: 375.008
I1014 16:14:25.711349 151733 PyTorchPredictorBenchLib.cpp:252] PyTorch run finished. Milliseconds per iter: 2.67375. Iters per second: 374.006
I1014 16:14:25.711390 151733 PyTorchPredictorBenchLib.cpp:269] Mean milliseconds per iter: 2.66539, standard deviation: 0.0134423

stable, local:
========================================
I1014 15:08:28.547081 3979345 PyTorchPredictorBenchLib.cpp:252] PyTorch run finished. Milliseconds per iter: 6.42772. Iters per second: 155.576
I1014 15:08:48.276582 3979345 PyTorchPredictorBenchLib.cpp:252] PyTorch run finished. Milliseconds per iter: 6.3643. Iters per second: 157.127
I1014 15:09:07.978683 3979345 PyTorchPredictorBenchLib.cpp:252] PyTorch run finished. Milliseconds per iter: 6.3566. Iters per second: 157.317
I1014 15:09:27.875543 3979345 PyTorchPredictorBenchLib.cpp:252] PyTorch run finished. Milliseconds per iter: 6.42044. Iters per second: 155.752
I1014 15:09:47.558079 3979345 PyTorchPredictorBenchLib.cpp:252] PyTorch run finished. Milliseconds per iter: 6.34902. Iters per second: 157.505
I1014 15:09:47.558120 3979345 PyTorchPredictorBenchLib.cpp:269] Mean milliseconds per iter: 6.38361, standard deviation: 0.037421

cache storages, local_ro:
========================================
I1014 16:15:42.292997 160496 PyTorchPredictorBenchLib.cpp:252] PyTorch run finished. Milliseconds per iter: 2.66604. Iters per second: 375.088
I1014 16:15:50.622402 160496 PyTorchPredictorBenchLib.cpp:252] PyTorch run finished. Milliseconds per iter: 2.68683. Iters per second: 372.186
I1014 16:15:58.901475 160496 PyTorchPredictorBenchLib.cpp:252] PyTorch run finished. Milliseconds per iter: 2.67028. Iters per second: 374.493
I1014 16:16:07.156373 160496 PyTorchPredictorBenchLib.cpp:252] PyTorch run finished. Milliseconds per iter: 2.66317. Iters per second: 375.492
I1014 16:16:15.474292 160496 PyTorchPredictorBenchLib.cpp:252] PyTorch run finished. Milliseconds per iter: 2.68394. Iters per second: 372.587
I1014 16:16:15.474334 160496 PyTorchPredictorBenchLib.cpp:269] Mean milliseconds per iter: 2.67405, standard deviation: 0.0106982

cache storages, local:
========================================
I1014 20:53:43.113400 1657168 PyTorchPredictorBenchLib.cpp:252] PyTorch run finished. Milliseconds per iter: 6.3811. Iters per second: 156.713
I1014 20:54:02.829102 1657168 PyTorchPredictorBenchLib.cpp:252] PyTorch run finished. Milliseconds per iter: 6.36039. Iters per second: 157.223
I1014 20:54:22.885171 1657168 PyTorchPredictorBenchLib.cpp:252] PyTorch run finished. Milliseconds per iter: 6.47333. Iters per second: 154.48
I1014 20:54:42.768963 1657168 PyTorchPredictorBenchLib.cpp:252] PyTorch run finished. Milliseconds per iter: 6.41404. Iters per second: 155.908
I1014 20:55:02.624423 1657168 PyTorchPredictorBenchLib.cpp:252] PyTorch run finished. Milliseconds per iter: 6.4042. Iters per second: 156.147
I1014 20:55:02.624460 1657168 PyTorchPredictorBenchLib.cpp:269] Mean milliseconds per iter: 6.40661, standard deviation: 0.0427168
```

Looks like this diff is neutral or a slight regression, but it is a stepping stone on the way to the following diff.

Reviewed By: hlu1

Differential Revision: D31326711

fbshipit-source-id: a6e0185f24a6264b1af2a51b69243c310d0d48d5
2021-11-04 15:42:22 -07:00
56dda833ff Small updates to RELEASE.md (#65489)
Summary:
Combine `xla` and `builder` branch pinning steps and link them to a PR that does it correctly
Update the example PR for the version bump, as a few files have changed
Delete the FaceHub step as it is no longer necessary after a recent update

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65489

Reviewed By: zhouzhuojie, seemethere

Differential Revision: D31120498

Pulled By: malfet

fbshipit-source-id: e1a9db2b03243c8d28eeed9888c3653e4460ad07
2021-11-04 15:39:40 -07:00
d5d342b237 Sparse CSR CUDA: Support mixed memory format input for triangular_solve (#66401)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66401

This PR fixes the case when the result and input tensors have different
strides. cuSPARSE from CUDA 11.3.1 has a bug: it doesn't use the correct
strides to write the result. This is "fixed" in PyTorch code by copying the
input tensor to a tensor with the same strides as the result tensor (see
the sketch below).
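
A minimal sketch of that workaround pattern (hypothetical helper; the real fix lives in the CUDA triangular_solve implementation):

```
import torch

def copy_with_strides_of(src, like):
    # Materialize `src` with exactly the strides of `like`, so a library
    # that assumes the result's layout reads and writes safely.
    out = torch.empty_strided(like.size(), like.stride(),
                              dtype=src.dtype, device=src.device)
    out.copy_(src)
    return out
```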

cc nikitaved pearu cpuhrsch IvanYashchuk ngimel

Test Plan: Imported from OSS

Reviewed By: davidberard98

Differential Revision: D32177966

Pulled By: cpuhrsch

fbshipit-source-id: 118437409df147f04dce02763aff9bfd33f87c63
2021-11-04 15:34:42 -07:00
a20a64af4e Increased tolerance for test_zero_model_parallel tests (#67765)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/67764

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67765

Reviewed By: malfet

Differential Revision: D32171621

Pulled By: mrshenli

fbshipit-source-id: 8c34f4714289cb41824f3a18822a28ed670fa0a6
2021-11-04 15:17:45 -07:00
c541c69e89 Fix minor typo in contributing.md (#67855)
Summary:
Fixes #{issue number}
No issue number, minor change

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67855

Reviewed By: malfet

Differential Revision: D32186689

Pulled By: driazati

fbshipit-source-id: 7cda19f66ff1312296d8310922bb0d221df81e46
2021-11-04 14:38:48 -07:00
8bed46ef38 [WIP][LTC] Upstream class Shape (#67672)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67672

This commit upstreams the class Shape from the lazy_tensor_staging branch.

Test Plan: WIP.

Reviewed By: malfet

Differential Revision: D32095478

Pulled By: alanwaketan

fbshipit-source-id: 61611b12fc079b195833b5b22a6cf73c0935b8b9
2021-11-04 14:12:03 -07:00
e8ac8c005d [NOOP][clangformat][codemod] Enable CLANGFORMAT (#67854)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67854

Test Plan: Visual inspection. Sandcastle.

Reviewed By: zertosh

Differential Revision: D32173077

fbshipit-source-id: 10ab4b0afa18c7be4fab3e3564d9b479a7a48cb5
2021-11-04 14:07:57 -07:00
938bab0bfd [PyTorch] Add int version of vectorized PrefixSum to Benchmark (#67865)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67865

- Add int version of vectorized PrefixSum
- Use unaligned load/store instructions
- Add an exclusive scan version; "exclusive" means that the i-th input element is not included in the i-th sum (illustrated below). For details see https://en.cppreference.com/w/cpp/algorithm/exclusive_scan
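
A quick illustration of the inclusive/exclusive distinction in plain Python:

```
import itertools

xs = [3, 1, 4, 1, 5]

inclusive = list(itertools.accumulate(xs))  # [3, 4, 8, 9, 14]

# Exclusive: the i-th output excludes the i-th input, i.e. the
# inclusive scan shifted right by one with an initial value of 0.
exclusive = [0] + inclusive[:-1]            # [0, 3, 4, 8, 9]
```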

Test Plan:
```
buck build mode/opt-clang //caffe2/benchmarks/cpp/tensorexpr:tensorexpr_bench
OMP_NUM_THREADS=1 numactl -m 0 -C 5 \
./buck-out/opt/gen/caffe2/benchmarks/cpp/tensorexpr/tensorexpr_bench --benchmark_filter=PrefixSumBench
```

For full benchmark results, see P465274613

```
PrefixSumBench/LocalInt/64                            57 ns         56 ns   12414048 GB/s=9.06239G/s
PrefixSumBench/LocalInt/256                          221 ns        221 ns    3160853 GB/s=9.28635G/s
PrefixSumBench/LocalInt/1024                         818 ns        817 ns     857922 GB/s=10.0235G/s
PrefixSumBench/LocalInt/4096                        3211 ns       3210 ns     217614 GB/s=10.2093G/s
PrefixSumBench/LocalInt/16384                      12806 ns      12804 ns      54805 GB/s=10.2364G/s
PrefixSumBench/LocalInt/65536                      51115 ns      51079 ns      13741 GB/s=10.2643G/s
PrefixSumBench/LocalInt/262144                    205974 ns     205912 ns       3401 GB/s=10.1847G/s
PrefixSumBench/LocalInt/1048576                   829523 ns     828859 ns        845 GB/s=10.1207G/s
PrefixSumBench/LocalIntAVX2/64                        45 ns         45 ns   15568113 GB/s=11.3549G/s
PrefixSumBench/LocalIntAVX2/256                      208 ns        208 ns    3371174 GB/s=9.86913G/s
PrefixSumBench/LocalIntAVX2/1024                     893 ns        892 ns     783154 GB/s=9.18629G/s
PrefixSumBench/LocalIntAVX2/4096                    3618 ns       3613 ns     193834 GB/s=9.06838G/s
PrefixSumBench/LocalIntAVX2/16384                  14416 ns      14411 ns      48564 GB/s=9.09543G/s
PrefixSumBench/LocalIntAVX2/65536                  57650 ns      57617 ns      12156 GB/s=9.09952G/s
PrefixSumBench/LocalIntAVX2/262144                230855 ns     230612 ns       3035 GB/s=9.09386G/s
PrefixSumBench/LocalIntAVX2/1048576               924265 ns     923777 ns        758 GB/s=9.08077G/s
PrefixSumBench/LocalIntAVX512/64                      23 ns         23 ns   24876551 GB/s=22.0697G/s
PrefixSumBench/LocalIntAVX512/256                     95 ns         95 ns    7387386 GB/s=21.556G/s
PrefixSumBench/LocalIntAVX512/1024                   435 ns        435 ns    1609682 GB/s=18.8425G/s
PrefixSumBench/LocalIntAVX512/4096                  1815 ns       1815 ns     385462 GB/s=18.0561G/s
PrefixSumBench/LocalIntAVX512/16384                 7479 ns       7476 ns      93660 GB/s=17.5335G/s
PrefixSumBench/LocalIntAVX512/65536                30171 ns      29879 ns      23430 GB/s=17.5468G/s
PrefixSumBench/LocalIntAVX512/262144              125805 ns     125631 ns       5570 GB/s=16.6929G/s
PrefixSumBench/LocalIntAVX512/1048576             504216 ns     503983 ns       1384 GB/s=16.6446G/s
PrefixSumBench/ExclusiveScanIntAVX512/64              23 ns         23 ns   30058295
PrefixSumBench/ExclusiveScanIntAVX512/256            101 ns        101 ns    7398498
PrefixSumBench/ExclusiveScanIntAVX512/1024           435 ns        434 ns    1403877
PrefixSumBench/ExclusiveScanIntAVX512/4096          1979 ns       1978 ns     354016
PrefixSumBench/ExclusiveScanIntAVX512/16384         7828 ns       7819 ns      89551
PrefixSumBench/ExclusiveScanIntAVX512/65536        31206 ns      31192 ns      22408
PrefixSumBench/ExclusiveScanIntAVX512/262144      130106 ns     130023 ns       5388
PrefixSumBench/ExclusiveScanIntAVX512/1048576     525515 ns     524976 ns       1244
```

Reviewed By: navahgar, swolchok

Differential Revision: D32011740

fbshipit-source-id: 7962de710bd588291dd6bf0c719f579c55f7c063
2021-11-04 14:00:19 -07:00
641ba36a4e fix annotation for Demultiplexer (#65998)
Summary:
cc SsnL VitalyFedyunin ejguan NivekT

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65998

Reviewed By: bdhirsh

Differential Revision: D32145926

Pulled By: ejguan

fbshipit-source-id: 60be3126fb9e73b8631b5040676264504e926707
2021-11-04 13:44:02 -07:00
da59bd1d13 TST Adds device transfer into module info tests (#65488)
Summary:
Follow up to  https://github.com/pytorch/pytorch/issues/61935

This PR adds a device-to-device transfer test to `ModuleInfo`.

cc albanD mruberry jbschlosser walterddr

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65488

Reviewed By: mruberry

Differential Revision: D32063662

Pulled By: jbschlosser

fbshipit-source-id: 0868235a0ae7e5b6a3e4057c23fe70753c0946d2
2021-11-04 12:50:33 -07:00
3d4a6ff15d Revert D32154788: Move Concat Linear out of Optimize Numerics
Test Plan: revert-hammer

Differential Revision:
D32154788 (ea94dde573)

Original commit changeset: faa6465c89b3

fbshipit-source-id: 0dcaa65268b68ed01e6a5bc7b73ade1f51163b33
2021-11-04 12:20:02 -07:00
86aea79217 Revert D32154786: Fix Freezing Docs Parameters
Test Plan: revert-hammer

Differential Revision:
D32154786 (db15a7c0b3)

Original commit changeset: d8a2b4f39ff4

fbshipit-source-id: 657e3974a8e0ca71790adc1b031a87b7c497ea25
2021-11-04 12:20:00 -07:00
279af1a668 Revert D32154787: Formatted with Black
Test Plan: revert-hammer

Differential Revision:
D32154787 (08d630b9a6)

Original commit changeset: 6a95691c4ad9

fbshipit-source-id: 2dbcf2395071433731683f685a0351fa8604d620
2021-11-04 12:18:37 -07:00
08d630b9a6 Formatted with Black (#67792)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67792

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D32154787

Pulled By: Gamrix

fbshipit-source-id: 6a95691c4ad9d997071bb4ffc00b5eab30f90b81
2021-11-04 11:32:26 -07:00
db15a7c0b3 Fix Freezing Docs Parameters (#67201)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67201

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D32154786

Pulled By: Gamrix

fbshipit-source-id: d8a2b4f39ff477f5131c02fe8c0b1a25339ce158
2021-11-04 11:32:24 -07:00
ea94dde573 Move Concat Linear out of Optimize Numerics (#67196)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67196

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D32154788

Pulled By: Gamrix

fbshipit-source-id: faa6465c89b3676d6b1ff7c20a677738a7fbdf88
2021-11-04 11:30:39 -07:00
6f0a1f2b8d Only set sccache_epilogue to run on build job exits (#67798)
Summary:
Fixes:
* https://github.com/pytorch/pytorch/issues/65431

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67798

Reviewed By: malfet

Differential Revision: D32174810

Pulled By: boyuantan

fbshipit-source-id: 072fdc042b56e541a877074120d41645c98e41f5
2021-11-04 11:11:02 -07:00
618bab593c .github: Output expected vs. actual (#67703)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67703

Had this script fail on me in CI without actually telling me what was
wrong, so I'm adding some more output here to show what the actual
vs. the expected result is.

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: janeyx99

Differential Revision: D32112898

Pulled By: seemethere

fbshipit-source-id: dfc9a82c709d52e0dde02d1e99a19eecc63c5836
2021-11-04 11:02:43 -07:00
90d311b268 [RPC] Add exception logging to constValue() (#67802)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67802

In RPC C++ code, we might sometimes call constValue() when the future actually holds an exception, and in unit tests we want to assert on that exception. What happens is that we get a message basically saying "!eptr_", which indicates there is some exception but doesn't tell us what it is.

This diff simply adds logging for the exception and mentions that `value` should be used over `constValue` when the future can hold an exception. The contract that `constValue` throws when `eptr_` is set is still upheld; it is just enhanced with additional logging.
ghstack-source-id: 142375391

Test Plan: Added UT

Reviewed By: mrshenli

Differential Revision: D32156552

fbshipit-source-id: 4dd5e73b92173209074c104a4b75c2021e20de4b
2021-11-04 10:04:09 -07:00
7c739e1ab9 Resubmit #67161 (#67735)
Summary:
Skip building extensions on Windows, following https://github.com/pytorch/pytorch/pull/67161#issuecomment-958062611

Related issue: https://github.com/pytorch/pytorch/issues/67073

cc ngimel xwang233 ptrblck

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67735

Reviewed By: bdhirsh

Differential Revision: D32141250

Pulled By: ngimel

fbshipit-source-id: 9bfdb7cf694c99f6fc8cbe9033a12429b6e4b6fe
2021-11-04 09:59:30 -07:00
8b0c2c18eb Fix pretrained=True for test_pt_onnx_trt (#67818)
Summary:
Addresses https://github.com/pytorch/pytorch/pull/66312#issuecomment-960357403

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67818

Reviewed By: malfet

Differential Revision: D32161208

Pulled By: janeyx99

fbshipit-source-id: 076e52ddc8718c74eb2941e867d92bfa4fe70f80
2021-11-04 09:49:42 -07:00
af1bd88fc4 Allow scalars for aliased binary ops {multiply, subtract, divide} (#65937)
Summary:
https://github.com/pytorch/pytorch/issues/65868 pointed out that the "long-form" versions of some binary ops like `mul`, `sub`, and `div` don't match their aliases' behavior when it comes to handling scalar inputs. This PR adds the missing registration in `python_arg_parser.cpp` to resolve this (see the example below).
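
Roughly the behavior expected after the fix (a sketch, not the PR's actual test):

```
import torch

t = torch.ones(3)

# The long-form aliases should accept Python scalars just like the
# short names do:
assert torch.equal(t.mul(2), t.multiply(2))
assert torch.equal(t.sub(1), t.subtract(1))
assert torch.equal(t.div(4), t.divide(4))
```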

CC ptrblck ngimel

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65937

Reviewed By: malfet

Differential Revision: D32156580

Pulled By: ngimel

fbshipit-source-id: b143cf7119a8bb51609e1b8734204edb750f0210
2021-11-04 09:36:45 -07:00
bd8feb33d4 Update distributed contributing guide to show how to run one test in test_distributed_spawn (#67801)
Summary:
Running one test in test_distributed_spawn is a bit confusing but possible. Add documentation to the CONTRIBUTING.md for this.

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67801

Reviewed By: mrshenli

Differential Revision: D32157700

Pulled By: rohan-varma

fbshipit-source-id: a1d10f2fb5f169b46c6d15149bf949082d9bd200
2021-11-04 08:54:31 -07:00
4262c8913c Remove native_functions.yaml dependency from TensorTopK.cu (#66794)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66794

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D31856104

Pulled By: dagitses

fbshipit-source-id: 2b9c0e1072455c5019c6f681faa3de848b3dae46
2021-11-04 08:32:06 -07:00
927da4d32f Remove native_functions.yaml dependency from Sort.cu (#66793)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66793

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D31856100

Pulled By: dagitses

fbshipit-source-id: 1469ce1deb4124f2a9e160a8e3298d56ac3f6561
2021-11-04 08:30:40 -07:00
61ed9285dd Automated submodule update: tensorpipe (#67845)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/tensorpipe](https://github.com/pytorch/tensorpipe).

New submodule commit: d2aa3485e8

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67845

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: lw

Differential Revision: D32170821

fbshipit-source-id: 1958e824a9f02c5178fa5d4a73a171dedefc540c
2021-11-04 08:24:05 -07:00
cfd998c197 Remove ProcessGroup RPC backend placeholder as part of 1.11 (#67363)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67363

The ProcessGroup RPC backend is deprecated. In 1.10 it would throw an error, to be more user-friendly. This PR now removes it completely.

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D32138321

Pulled By: H-Huang

fbshipit-source-id: b4f700d8f1b1d46ada7b5062d3f754646571ea90
2021-11-04 07:57:58 -07:00
8e1ead8e4d Fix the kl_div docs (#67443)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67443

Fixes https://github.com/pytorch/pytorch/issues/57459

After discussing the linked issue, we resolved that `F.kl_div` computes
the right thing so as to be consistent with the rest of the losses in
PyTorch.

To avoid any confusion, these docs add a note discussing how the PyTorch
implementation differs from the mathematical definition and the reasons
for doing so.

These docs also add an example that may further help in understanding the
intended use of this loss (a sketch along those lines is below).
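
A sketch along the lines of what such an example might show (note that `F.kl_div` takes the model distribution in log space):

```
import torch
import torch.nn.functional as F

p = torch.softmax(torch.randn(5), dim=0)  # target distribution
q = torch.softmax(torch.randn(5), dim=0)  # model distribution

# Argument order is (input=log q, target=p); with reduction='sum' this
# computes sum(p * (log p - log q)), i.e. KL(p || q).
loss = F.kl_div(q.log(), p, reduction='sum')
manual = (p * (p.log() - q.log())).sum()
assert torch.allclose(loss, manual)
```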

cc brianjo mruberry

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D32136888

Pulled By: jbschlosser

fbshipit-source-id: 1ad0a606948656b44ff7d2a701d995c75875e671
2021-11-04 07:09:08 -07:00
04fe4382ec Automated submodule update: tensorpipe (#67769)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/tensorpipe](https://github.com/pytorch/tensorpipe).

New submodule commit: caa2ccb394

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67769

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: lw

Differential Revision: D32138256

fbshipit-source-id: dfe4c73ae25c8f362f2917dd7594bdcd418c2a0d
2021-11-04 01:13:19 -07:00
b8d365ca3a ci fix (#67826)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67826

Reviewed By: Chillee

Differential Revision: D32164770

Pulled By: mruberry

fbshipit-source-id: c1de7e6db6d0cb1761388f1ea0178dbff3fe6dc8
2021-11-04 00:16:47 -07:00
1baed45c6b [fbcode][static runtime] out-variant for quantized::linear_dynamic_fp16 (#67663)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67663

Mostly follows the example of quantized::linear (D28428734 (4d7abdbdad)) to enable an out variant for quantized::linear_dynamic_fp16.

The reason: in the MP tab CTR PyTorch model migration, we observe that the quantized::linear_dynamic_fp16 operator has the highest cost but does not yet have an out variant enabled: https://fburl.com/phabricator/b5juus2d

Test Plan:
buck build mode/opt caffe2/caffe2/fb/predictor:ptvsc2_predictor_bench

  sudo watch -n 20 /usr/local/fbprojects/dynamoserver/bin/turboDriver disable

  MKL_NUM_THREADS=1 OMP_NUM_THREADS=1 numactl -m 0 -C 3 ./buck-out/opt/gen/caffe2/caffe2/fb/predictor:ptvsc2_predictor_bench -- --scripted_model=/home/bwen/models/991103061_4/991103061_4.predictor --pt_inputs=/home/bwen/models/991103061_4/pt_inputs --method_name=forward --pt_cleanup_activations=1 --pt_enable_out_variant=1 --pt_optimize_memory=1 --iters=1000 --warmup_iters=1000 --num_threads=1 --repetitions=3 --do_profile=1 --do_benchmark=1 --set_compatibility=1 --compare_results=1 --pt_enable_static_runtime 2>&1 | pastry

before: P465201159

  0.929067 ms.     31.808%. quantized::linear_dynamic_fp16 (16 nodes)
  0.921679 ms.    31.7324%. quantized::linear_dynamic_fp16 (16 nodes)
  0.919127 ms.    31.7404%. quantized::linear_dynamic_fp16 (16 nodes)

after: P465203015

  0.90898 ms.    31.0205%. quantized::linear_dynamic_fp16 (16 nodes, out variant)
  0.9127 ms.      30.62%. quantized::linear_dynamic_fp16 (16 nodes, out variant)
  0.879148 ms.    31.0161%. quantized::linear_dynamic_fp16 (16 nodes, out variant)

unit test logic refers https://fburl.com/code/vv0rry13

  buck run mode/opt caffe2/benchmarks/static_runtime:static_runtime_cpptest

Reviewed By: hlu1

Differential Revision: D32001168

fbshipit-source-id: 873d9f77434b9c4bafb298c871173f9a560dd2a3
2021-11-03 22:39:04 -07:00
99c7a9f09d fix bfloat16 autocast skip (#67822)
Summary:
Per title

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67822

Reviewed By: mruberry

Differential Revision: D32162605

Pulled By: ngimel

fbshipit-source-id: eb5ccf6c441231e572ec93ac8c2638d028abecad
2021-11-03 21:02:37 -07:00
2486061c72 [JIT] make x (+ or -) 0 and x (* or /) 1 peepholes type promotion aware (#67688)
Summary:
Some of the "no-ops" are not actually no-ops because they can change the dtype

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67688

Reviewed By: davidberard98

Differential Revision: D32104601

Pulled By: eellison

fbshipit-source-id: ccb99179a4b30fd20b5a9228374584f2cdc8ec21
2021-11-03 20:11:46 -07:00
88d86de7d8 Add lint to ensure all test files have headers with ownership info (#66826)
Summary:
UPDATE: CI should be green now with the added files.

This should fail for now, but will pass when all action for https://github.com/pytorch/pytorch/issues/66232 is done.

Example failure run: https://github.com/pytorch/pytorch/runs/4052881947?check_suite_focus=true

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66826

Reviewed By: seemethere

Differential Revision: D32087209

Pulled By: janeyx99

fbshipit-source-id: ad4b51e46de54f23aebacd592ee67577869f8bb6
2021-11-03 18:21:49 -07:00
2766662ca9 [PyTorch][2/N] Basic implementation of ShardedEmbeddingBag using ShardedTensor. (#67188)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67188

This diff/PR is trying to implement the ShardedEmbeddingBag using the ShardedTensor.

We support both row-wise and column-wise sharding of the embedding bag. The detailed logic can be found in the comment.

Several caveats:
1. Only the sharding of one weight is supported now.
2. We support a limited set of input params for the op; support for more params is on the way.
3. We only support chunk sharding for now.
4. We only support a single local shard per rank for now.

Some other changes include:
1. Refactor the ShardedEmbedding code so that the common logic can be reused.
2. Fix tiny typos and a corner case in the API `get_chunked_dim_size`, which returned -1 for dim_size = 5, split_size = 2, idx = 3. (This is a valid case: when chunks = 4 and dim_size = 5, the split_size is 2. See the sketch below.)
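
A sketch of the corrected behavior (the formula is an assumption for illustration, not the exact implementation):

```
def get_chunked_dim_size(dim_size, split_size, idx):
    # Size of chunk `idx` when a dimension of `dim_size` is split into
    # chunks of `split_size`; clamp at 0 so trailing empty chunks don't
    # come out negative (the old code returned -1 here).
    return max(min(dim_size, split_size * (idx + 1)) - split_size * idx, 0)

# dim_size=5, split_size=2 -> chunks of sizes [2, 2, 1, 0]
print([get_chunked_dim_size(5, 2, i) for i in range(4)])  # [2, 2, 1, 0]
```
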
ghstack-source-id: 142325915

Test Plan: Unit test and CI

Reviewed By: pritamdamania87

Differential Revision: D31749458

fbshipit-source-id: ed77e05e4ec94ef1a01b1feda8bbf32dc5d5da1b
2021-11-03 17:39:18 -07:00
fd77fff0b1 [FSDP] customizable backend in test (#67135)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67135

Add ability to use env var backend for quicker testing (and gloo2 in
the future)
ghstack-source-id: 142274304

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D31878285

fbshipit-source-id: 80ae7107cd631a1a15ebc23262b27d8192cfe4b6
2021-11-03 15:45:52 -07:00
83e8612d11 Clean up test autograd (#67413)
Summary:
Partially fixes https://github.com/pytorch/pytorch/issues/66066

This PR:
 - cleans up op-specific testing from test_autograd. test_autograd should be reserved for testing generic autograd functionality
 - tests related to an operator are better colocated
 - see the tracker for details

What to think about when moving tests to their correct test suite:
 - naming: make sure it's not too generic
 - how the test is parametrized, sometimes we need to add/remove a device/dtype parameter
 - can this be merged with existing tests

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67413

Reviewed By: jbschlosser, albanD

Differential Revision: D32031480

Pulled By: soulitzer

fbshipit-source-id: 8e13da1e58a38d5cecbfdfd4fe2b4fe6f816897f
2021-11-03 15:26:09 -07:00
ca445645f9 Revert D31902471: [nnc] Add support for dynamic shapes in TensorExprKernel
Test Plan: revert-hammer

Differential Revision:
D31902471 (15a3c374e2)

Original commit changeset: d2729a38ba1a

fbshipit-source-id: 4c05de82e626bbf744df84fd2b914b66fd165a19
2021-11-03 14:48:12 -07:00
603116a6ae [Core ML][easy] Assign missing properties to the executor (#67737)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67737

As title says
ghstack-source-id: 142277212

Test Plan:
- buck test pp-ios
- circleci

Reviewed By: hanton

Differential Revision: D32123661

fbshipit-source-id: eff3068669f8fdc573dc81b04bcc20ef153d8c4a
2021-11-03 14:15:53 -07:00
fddfb81dd0 Add BF16 type to _autocast_to_full_precision (#67707)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67707

https://github.com/pytorch/pytorch/pull/63939/files has added FP16 support to torchscript.

This is to add BF16 device type when doing full conversion.

Test Plan: Unit test. Also tested BF16 locally on A100 using MLP model.

Reviewed By: idning

Differential Revision: D32027152

fbshipit-source-id: b2a5ff2b22ea1e02306b0399f2b39b8493be4f45
2021-11-03 14:06:50 -07:00
05e17e7ff6 Add API usage logging for several other RPC APIs. (#67722)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67722

ghstack-source-id: 142259452

Test Plan: waitforbuildbot

Reviewed By: jaceyca, fduwjj

Differential Revision: D32118872

fbshipit-source-id: 041ab5601221b1846c56ce4bb63364bec9ad28b0
2021-11-03 14:02:00 -07:00
5fd93fb5f8 broaden retries on TestHub (#67779)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67779

Not all flaky failures from this test are URLErrors; I think we should
err on the side of being expansive with retries here.

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D32145434

Pulled By: suo

fbshipit-source-id: 3c3274b2080681fcafb3ea6132e420605f65c429
2021-11-03 13:48:58 -07:00
89b02fc70b [StaticRuntime][Easy] Correct typos in test_static_runtime (#67739)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67739

Test Plan:
```
buck test //caffe2/benchmarks/static_runtime:static_runtime_cpptest
```

Reviewed By: mikeiovine

Differential Revision: D32125879

fbshipit-source-id: bd989e5088edff87624b858bd9045dfe9da3fbe7
2021-11-03 13:24:46 -07:00
4d601a1c36 codegen: Split up source, header and Declarations.yaml generation (#67497)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67497

This allows more of the code-generation to happen in parallel, whereas
previously all codegen was serialized.

Test Plan: Imported from OSS

Reviewed By: dagitses, mruberry

Differential Revision: D32027250

Pulled By: albanD

fbshipit-source-id: 6407c4c3e25ad15d542aa73da6ded6a309c8eb6a
2021-11-03 13:20:54 -07:00
fe91906ad7 Remove Declarations.yaml dependency from gen_autograd (#67496)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67496

gen_autograd.py doesn't use `Declarations.yaml` any more, and removing
the dependency allows it to run in parallel with
`tools/codegen/gen.py`.

Test Plan: Imported from OSS

Reviewed By: dagitses, ejguan

Differential Revision: D32027251

Pulled By: albanD

fbshipit-source-id: 2cc0bbe36478e6ec497f77a56ab8d01c76145703
2021-11-03 13:19:24 -07:00
9b1caca185 [SR] Macro to clean up c10::Symbol maps in passes (#67484)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67484

Maps from `c10::Symbol -> c10::Symbol` can be hard to parse when `fromQualString` is scattered everywhere. I've been annoyed by this issue many times when rebasing, and have even messed up `FuseListUnpack` a few times.

Introduce a macro to make it easier to see what maps to what.

Test Plan: CI

Reviewed By: hlu1

Differential Revision: D32004451

fbshipit-source-id: 1086254c8403a0880d014512c439edbefa6fa015
2021-11-03 12:57:07 -07:00
0eaa01ead1 [SR] Add EliminateTrivialEquallySplit graph pass (#67166)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67166

This optimization is not really the same thing as `FuseListUnpack`, and mixing the logic in that pass is confusing and error-prone. It should really be its own pass.

It's slower since we have to do another pass over the graph, but this is not perf critical code; readability is more important.

Test Plan: Unit tests: `buck test caffe2/benchmarks/static_runtime/...`

Reviewed By: hlu1

Differential Revision: D31887458

fbshipit-source-id: 289e281d512435861fccfe19f017751ad015688c
2021-11-03 12:57:05 -07:00
6cc6a5fd9d Fix a bug in TorchBench ondemand CI. (#67743)
Summary:
Use the main branch when TorchBench branch is not specified.

RUN_TORCHBENCH: soft_actor_critic

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67743

Reviewed By: seemethere

Differential Revision: D32142663

Pulled By: xuzhao9

fbshipit-source-id: 160227835543b8e55c970025073839bf0f03aa81
2021-11-03 12:55:52 -07:00
f455030931 Adding a docstring for memoryless in observer args (#67690)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67690

see title [skip ci]

Test Plan:
python setup.py develop

Imported from OSS

Reviewed By: ejguan

Differential Revision: D32107512

fbshipit-source-id: da5668339716d44720672f7b71a991b23530461e
2021-11-03 12:46:44 -07:00
98be5216e2 Revert D32104006: [pytorch][PR] Added forward derivatives for neg, diag, inverse, linalg_eig
Test Plan: revert-hammer

Differential Revision:
D32104006 (88c61b8d06)

Original commit changeset: 1f6ace09ee3e

fbshipit-source-id: f9f950b4177e1fe29b9059f4b5dfb9c8c67f479a
2021-11-03 12:40:00 -07:00
6df0d7d502 [lint] add basic lintrunner compatibility (#67110)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67110

Adds support for using lintrunner with:
- clang-format
- clang-tidy
- flake8
- mypy

Test Plan: Imported from OSS

Reviewed By: driazati

Differential Revision: D32145555

Pulled By: suo

fbshipit-source-id: 2150348e26fba4ae738cd0b9684b2889ce0f1133
2021-11-03 12:35:28 -07:00
89c4e8c22b [NOOP][clangformat][codemod] Enable CLANGFORMAT for some folders in caffe2/* (#67746)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67746

Test Plan: Visual inspection. Sandcastle.

Reviewed By: zertosh

Differential Revision: D31986646

fbshipit-source-id: 91885c20c3cead3853c49abb9fe0a94a67f33cc8
2021-11-03 12:23:14 -07:00
a5b57c9433 Avoid prematurely casting GEMM parameters alpha, beta to scalar_t (#67633)
Summary:
stas00 uncovered an issue where certain half-precision GEMMs would produce outputs that looked like the result of strange rounding behavior (e.g., `10008.` in place of `10000.`). ptrblck suspected that this was due to the parameters being downcast to the input types (which would reproduce the problematic output). Indeed, the GEMM and BGEMM cublas wrappers currently convert the `alpha` and `beta` parameters to `scalar_t` (which is potentially reduced precision) before converting them back to `float`. This PR changes the "ARGTYPE" wrappers to use `acc_t` instead and adds a corresponding test (the snippet below shows the fp16 rounding at play).
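
The flavor of the symptom is easy to reproduce: fp16 has a spacing of 8 between representable values near 1e4, so results computed in reduced precision snap to multiples of 8 in that range. A small sketch (not the PR's test):

```
import torch

x = torch.tensor(10005.0)
print(float(x.half()))  # 10008.0 -- the nearest representable fp16 value
```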

CC ngimel

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67633

Reviewed By: mruberry

Differential Revision: D32076474

Pulled By: ngimel

fbshipit-source-id: 2540d9b9d0195c17d07d1161374fb6a5850779d5
2021-11-03 12:01:04 -07:00
3f33ada8d5 .github: Forward fix generating GHA workflows (#67777)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67777

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: janeyx99

Differential Revision: D32143785

Pulled By: seemethere

fbshipit-source-id: fb129244bdd46ffda05ed51b16183395152d7296
2021-11-03 11:36:27 -07:00
15a3c374e2 [nnc] Add support for dynamic shapes in TensorExprKernel (#67197)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67197

Test Plan: Imported from OSS

Reviewed By: eellison, ZolotukhinM

Differential Revision: D31902471

Pulled By: navahgar

fbshipit-source-id: d2729a38ba1ac607ff07f516ed56fbd9085715dc
2021-11-03 11:24:17 -07:00
88c61b8d06 Added forward derivatives for neg, diag, inverse, linalg_eig (#67339)
Summary:
See also discussion in https://github.com/pytorch/pytorch/issues/10223, starting from [this](https://github.com/pytorch/pytorch/issues/10223#issuecomment-949499666) comment

The formulas for the derivatives are taken from https://people.maths.ox.ac.uk/gilesm/files/NA-08-01.pdf.

As indicated, the method linalg_eig_jvp should be used instead of linalg_eig_jvp_eigenvalues and linalg_eig_jvp_eigenvectors in the future. Due to a codegen limitation, this is not yet possible.

CC albanD Lezcano

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67339

Reviewed By: ejguan

Differential Revision: D32104006

Pulled By: albanD

fbshipit-source-id: 1f6ace09ee3e737b99520543b30550601809ceb5
2021-11-03 11:21:54 -07:00
a23814577b Overload TestCase not vanilla TestCase for some elastic tests (#67700)
Summary:
Addresses a bit of https://github.com/pytorch/pytorch/issues/66903

Fixes it so that https://github.com/pytorch/pytorch/issues/66207 can be properly disabled

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67700

Reviewed By: H-Huang

Differential Revision: D32116908

Pulled By: janeyx99

fbshipit-source-id: 205ff68a7408609cfced2357fd99f41949ef6390
2021-11-03 11:14:52 -07:00
201f7d330a Remove duplicate check in distributions arg validation (#67741)
Summary:
Partial fix for https://github.com/pytorch/pytorch/issues/66800. (Duplicate of https://github.com/pytorch/pytorch/issues/67725 against pytorch/pytorch so as to trigger TorchBench)

https://github.com/pytorch/pytorch/issues/61056 added a more verbose error message for distributions failing argument validation. However, it did not replace the earlier error check as was originally intended and was flagged by xuzhao9 as being the potential cause of a perf regression in `test_eval[soft_actor_critic-cuda-eager]`.

xuzhao9: Is there a way for me to check if this resolves the perf issue you mentioned?

cc VitalyFedyunin ngimel

Note that existing tests already check for the error message and should verify that the removed lines are redundant.

RUN_TORCHBENCH: soft_actor_critic

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67741

Reviewed By: neerajprad

Differential Revision: D32135675

Pulled By: xuzhao9

fbshipit-source-id: 37dfd3ff53b95017c763371979ab3a2c302a72b9
2021-11-03 10:41:41 -07:00
1ffd43cf0c generated-pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit migrated to GHA (#67695)
Summary:
in scope of: https://github.com/pytorch/pytorch/issues/67301. Main changes:
* generated-pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit deleted from circle
* pytorch_android_gradle_custom_build_single removed since it is no longer used
* generated-pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit added to GHA

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67695

Reviewed By: malfet, seemethere, ejguan

Differential Revision: D32115620

Pulled By: b0noI

fbshipit-source-id: 113d48303c090303ae13512819bac2f069a2913f
2021-11-03 10:29:37 -07:00
4a106e41e9 [fx2trt] Add torch.nn.function.pad support for fx2trt (#67498)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67498

Add acc_ops.pad and a converter for it. We want to try padding the convolution channel dimension to get better int8 performance.

This one only supports padding the last two dimensions, though. Starting from TRT 8.2 it's suggested to use the Slice layer to do padding, but this one might be nice to have for old-version support.

Test Plan: buck test mode/dev-nosan caffe2/test/fx2trt/converters:test_pad

Reviewed By: wushirong

Differential Revision: D32006072

fbshipit-source-id: 96c3aa2aec2d28345d044a88bee2f46aba5cca0e
2021-11-03 10:21:08 -07:00
383c1f51b1 [nnc] Fixed handling of 0-sized tensors in cat (#67734)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67734

The implementation of the `aten::cat` op in NNC has to ignore tensors that have a size of 0 in any dimension (see the example below).
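
The eager semantics the lowering must match, as a small example:

```
import torch

a = torch.randn(2, 3)
empty = torch.randn(0, 3)

# Inputs that are empty along the cat dimension contribute nothing:
out = torch.cat([a, empty, a], dim=0)
print(out.shape)  # torch.Size([4, 3])
```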

Test Plan: `buck test mode/dev-nosan //caffe2/test/cpp/tensorexpr:tensorexpr -- --exact 'caffe2/test/cpp/tensorexpr:tensorexpr - Kernel.CatWithEmptyInputs'`

Reviewed By: ZolotukhinM

Differential Revision: D32122171

fbshipit-source-id: 90c697813bc504664673cdc262df6e7ce419c655
2021-11-03 10:16:16 -07:00
31cf3d6aad Fix adaptive_max_pool2d for channels-last on CUDA (#67697)
Summary:
Fix https://github.com/pytorch/pytorch/issues/67239

The CUDA kernels for `adaptive_max_pool2d` (forward and backward) were written for contiguous output. If the output is non-contiguous, first create a contiguous copy and let the kernel write its output to the contiguous memory space; then copy the output from the contiguous memory space back to the original non-contiguous memory space (sketched below).
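
The general workaround pattern, sketched in Python (the real fix is in the CUDA kernel launch code; `kernel` here is a hypothetical stand-in):

```
import torch

def run_with_contiguous_out(kernel, out):
    # Kernels written for contiguous output get a contiguous scratch
    # buffer; results are copied back to the caller's layout afterwards.
    if out.is_contiguous():
        kernel(out)
        return out
    tmp = out.contiguous()  # contiguous copy with the same values
    kernel(tmp)
    out.copy_(tmp)
    return out
```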

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67697

Reviewed By: ejguan

Differential Revision: D32112443

Pulled By: ngimel

fbshipit-source-id: 0e3bf06d042200c651a79d13b75484526fde11fe
2021-11-03 09:47:29 -07:00
ff5c61a74e [TensorExpr] Add lowering for aten::max (reduction). (#66519)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66519

Differential Revision:
D31590853
D31590853

Test Plan: Imported from OSS

Reviewed By: navahgar

Pulled By: ZolotukhinM

fbshipit-source-id: a702621621f681d7f5392912e8a77ca124e14170
2021-11-03 09:44:09 -07:00
00afe9ba7b [TensorExpr] Add lowering for aten::embedding. (#66518)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66518

Differential Revision:
D31590855
D31590855

Test Plan: Imported from OSS

Reviewed By: pbelevich

Pulled By: ZolotukhinM

fbshipit-source-id: aace0a87b1649330dae44182f7873aca27160d64
2021-11-03 09:44:07 -07:00
008a58d226 [TensorExpr] Add lowering for aten::conv1d. (#66517)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66517

Differential Revision:
D31590856
D31590856

Test Plan: Imported from OSS

Reviewed By: pbelevich

Pulled By: ZolotukhinM

fbshipit-source-id: c05a37d8741acd0606c2adb8d6cfeb1f57bc8aa0
2021-11-03 09:44:05 -07:00
d58ef2bbff [TensorExpr] Fix lowering for aten::softmax for the case when dtype parameter is None. (#66516)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66516

Differential Revision:
D31590858
D31590858

Test Plan: Imported from OSS

Reviewed By: navahgar

Pulled By: ZolotukhinM

fbshipit-source-id: 0aeee7a5be64b3b9c8fa00aacb1a94031a7e25d1
2021-11-03 09:42:48 -07:00
ea4d983885 Modify "gemm" code to enable access to "sbgemm_" routine in OpenBLAS (#58831)
Summary:
OpenBLAS recently added support for bfloat16 GEMM, so this change has PyTorch call out to OpenBLAS for that, like it does for single and double precision

Our goal is to enable PyTorch to make calls to "sbgemm" in OpenBLAS.

We are prepared (if it is your preference) to add fences to the code to limit this change to the Power architecture,
but our first instinct is that anyone on any architecture whose OpenBLAS library enables access to sbgemm
should be able to use this code. (But again, as we are just starting to modify PyTorch, we respect your guidance!)

(there is no issue number related to this)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58831

Reviewed By: albanD

Differential Revision: D29951900

Pulled By: malfet

fbshipit-source-id: 3d0a4a638ac95b2ff2e9f6d08827772e28d397c3
2021-11-03 08:53:27 -07:00
05d1dcc14c Split channels_last test cases for tensor conversion OpInfos (#67368)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67368

This PR adds an additional test variant for the tensor conversion
functions (bfloat16, char, long, ...) that tests channels_last. This is
because some backends (mostly just functorch right now) don't have
channels last handling and may want to test that separately from the
more general case of these operations.

Test Plan: - wait for tests

Reviewed By: mruberry

Differential Revision: D31972959

Pulled By: zou3519

fbshipit-source-id: 68fea46908b2cdfeb0607908898bb8f9ef25b264
2021-11-03 07:39:41 -07:00
92a85ecbab add a quantized hardsigmoid inplace variant (#65740)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65740

fp32 hardsigmoid supports inplace. This PR adds the inplace support to the quantized
hardsigmoid function, to make the signatures match.

Test Plan:
```
python test/test_quantization.py TestQuantizedOps.test_qhardsigmoid
```

Reviewed By: supriyar

Differential Revision: D31992282

Pulled By: vkuzo

fbshipit-source-id: f6be65d72954ab8926b36bb74a5e79d422fbac90
2021-11-03 07:35:31 -07:00
e32d7f7525 ATen | Fix potential crash if MTLCreateSystemDefaultDevice return nil (#66859)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66859

`MTLCreateSystemDefaultDevice` can return `nil`. If that happens then inside `createDeviceInfo`, we'll crash trying to convert the `nullptr` from `device.name.UTF8String` into a `std::string`.

Let's fix it by returning early in setup if there's no Metal device. But also make `createDeviceInfo` safe if we do pass in `nil`.

Test Plan: * CircleCI

Reviewed By: xta0

Differential Revision: D31759690

fbshipit-source-id: 74e878ab5b8611250c4843260f1d2e4eab22cdaf
2021-11-03 03:03:45 -07:00
510336499b [PyTorch][Static Runtime] Separate overlap checks for easier debugging (#66637)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66637

We can give more information when verify_no_memory_overlap would fail by separating the DCHECK.
ghstack-source-id: 142226105

Test Plan: fitsships

Reviewed By: d1jang

Differential Revision: D31517151

fbshipit-source-id: 8cbc324c27f6b4db4489d1bd469d37b1d8ae6ce1
2021-11-02 23:59:04 -07:00
3db536e55e add jit_trace_module python binding (#67425)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67425

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D31998564

Pulled By: Krovatkin

fbshipit-source-id: f7e38c8c3f560f2c4e5ed62e1acae2c100efebd4
2021-11-02 23:55:23 -07:00
a8757cdd70 type inputs (#67424)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67424

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D31998565

Pulled By: Krovatkin

fbshipit-source-id: 8a2b8b3f13a361fe8fce7c7c930bbfd357ef8ac1
2021-11-02 23:55:21 -07:00
d352587210 add a few convenience helpers to removeAllXXX to Block and Node (#67423)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67423

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D31998566

Pulled By: Krovatkin

fbshipit-source-id: ed435d5c35e44ab2676c47b43d6e2aa8e79d9ab2
2021-11-02 23:54:02 -07:00
7f3326a6d2 [FSDP] CPU offload resubmit (#67249)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67249

Implements CPU offload for model parameters in FSDP.

- A CPU offload config class with a single offload_params attribute is created (see the usage sketch after this list)
- If this is specified in the FSDP ctor, model parameters are moved back to CPU after sharding in __init__
- In the forward pass, during lazy init, p._local_shard gets set to p.data so it is on CPU. We pin_memory here.
- In the forward pass, in _rebuild_full_params, we move p.data back to self.compute_device if necessary. Note that we don't use the device of p._full_param_padded because we don't always have this attr, but when we do it's always the same as compute_device.
- The same logic as above applies to the beginning of the backward pass.
- At the end of fwd and the end of bwd, `_use_param_local_shard` ensures the parameters are offloaded to CPU again by pointing p.data to p._local_shard, which is always on CPU.
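
A minimal usage sketch (import paths and the `CPUOffload` name are best guesses for this stack and may differ):

```python
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp.fully_sharded_data_parallel import CPUOffload

# Assumes a process group is already initialized, e.g.:
# dist.init_process_group("nccl", rank=rank, world_size=world_size)
model = torch.nn.Linear(8, 8)  # "never CUDA" init, case 3 in the test notes below
fsdp_model = FSDP(model, cpu_offload=CPUOffload(offload_params=True))
# Between forward/backward passes the (pinned) local shards stay on CPU and
# are moved to the compute device on demand.
```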

Regarding tests:
- We test 3 different types of init: 1) CUDA the model before wrapping with FSDP, 2) CUDA the model after wrapping with FSDP, 3) never CUDA the model.
- Case 1 is always supported. Case 2 is not supported with CPU offload and throws an error during the fwd pass. Case 3 is only supported with CPU offload at the moment.
- Verifies all params are offloaded to CPU after init.
- Verifies all params are offloaded to CPU after forward and backward.
- Note that there is an issue with verifying exact parity when CPU offloading, but it appears to be related to transferring the model back and forth between CPU/CUDA. More details in https://github.com/pytorch/pytorch/pull/66961
ghstack-source-id: 141851903

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D31911085

fbshipit-source-id: 3ddf73c070b55ce383e62251868d609004fc30e7
2021-11-02 23:27:34 -07:00
06d1be2447 [NOOP][clangformat][codemod] Enable CLANGFORMAT for caffe2/caffe2/* (#67624)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67624

Test Plan: Visual inspection. Sandcastle.

Reviewed By: malfet

Differential Revision: D31986628

fbshipit-source-id: c872bded7325997a2945dbf5d4d052628dcb3659
2021-11-02 22:14:04 -07:00
e86a5a3a1a [Static Runtime] Add PyTorchPredictor::predict_managed_result to return managed output tensors (#65598)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65598

This change adds `PyTorchPredictor::predict_managed_result` to enable Static Runtime to return managed output tensors, allocated and owned by Static Runtime to accelerate inference workloads.

- `PyTorchPredictor::predict_managed_result` only does meaningful work for the overridden `PyTorchStaticRuntimePredictor::predict_managed_result`. For other subclasses, it returns a simple object that just wraps the returned `IValue`.

- When `manage_output_tensors` is enabled, a `StaticRuntime` cannot be reentered until its return value gets deallocated by calling `StaticRuntime::deallocateOutputTensors`. Currently an instance of `StaticRuntime` gets immediately pushed back to `static_runtime_pool` to be reentered again, and this cannot be done when `manage_output_tensors` is enabled. `PyTorchStaticRuntimePredictorManagedResult` makes sure to delay pushing a `StaticRuntime` instance back to the pool only after `StaticRuntime::deallocateOutputTensors` is called on the runtime instance.

- When `manage_output_tensors` is enabled, `PyTorchStaticRuntimePredictor::predict_managed_result` returns the prediction result, whose backing memory is managed by an instance of `StaticRuntime`. The lifetime of any value reachable from `PyTorchStaticRuntimePredictorManagedResult.get()` is expected to end before `PyTorchStaticRuntimePredictorManagedResult` gets destructed. As explained above, `PyTorchPredictorManagedResult`'s destruction pushes the runtime instance that returned the result back to `static_runtime_pool` to be reused again.

- The current API design of adding `predict_managed_result` instead of forcing `operator()` to return `PyTorchPredictorManagedResult` was motivated by the fact that `manage_output_tensors` will be selectively enabled just for a few models. In case `manage_output_tensors` becomes a commonly used feature we should revisit this API design to merge them together.

Reviewed By: hlu1

Differential Revision: D31149323

fbshipit-source-id: 5ca026188077232d6a49a46759124a978439d7b2
2021-11-02 22:10:26 -07:00
18955d3564 Raise warning when calling collectives on non-member group objects (#67639)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67639

Due to BC considerations, we cannot directly error out, as that
might break existing applications. Raise warnings first to improve
debuggability.
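
A sketch of the pattern this targets, assuming a multi-rank job where some rank is excluded from the subgroup:

```python
import torch
import torch.distributed as dist

# Assumes dist.init_process_group(...) ran on every rank of a >2-rank job.
# On ranks outside [0, 1], new_group returns a non-member placeholder;
# calling a collective through it now emits a warning instead of silently
# misbehaving.
group = dist.new_group(ranks=[0, 1])
t = torch.ones(1)
dist.all_reduce(t, group=group)  # warns on non-member ranks
```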

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Test Plan: Imported from OSS

Reviewed By: rohan-varma

Differential Revision: D32075151

Pulled By: mrshenli

fbshipit-source-id: 5680d420f5f6cd3f74a36616c03350e8a976b363
2021-11-02 20:04:07 -07:00
54241a9cfa [quant][fx] Add support for fused modules in _convert_do_not_use (#67245)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67245

Add support for fused modules in the new convert path, including linear-relu, conv{1-3}d-relu, and their QAT versions;
also tested with TRT (conv2d-relu and linear-relu).

Test Plan:
```
python test/fx2trt/test_quantize_fx.py TestQuantizeFxTRTOps.test_linear_relu_module
python test/fx2trt/test_quantize_fx.py TestQuantizeFxTRTOps.test_conv_relu_module
```

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D31919724

fbshipit-source-id: 7e5c96eba30706f7989da680aa3443159847bdfd
2021-11-02 19:21:54 -07:00
91971dfc2a [BE] [GHA] Use aws ecr get-login-password (#67709)
Summary:
Replacing `aws ecr get-login` with `aws ecr get-login-password`, per https://docs.aws.amazon.com/cli/latest/userguide/cliv2-migration.html#cliv2-migration-ecr-get-login

Follow up after the similar change in CircleCI: https://github.com/pytorch/pytorch/pull/58308

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67709

Reviewed By: seemethere, janeyx99

Differential Revision: D32119319

Pulled By: malfet

fbshipit-source-id: 0cd0d8f4d81e9981a5f8fbf9b812a9167fd48135
2021-11-02 19:06:50 -07:00
16ee6409ee Changed value constraint of exponential dist (#67184)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/67183.

cc fritzo neerajprad alicanb nikitaved

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67184

Reviewed By: ejguan

Differential Revision: D32114661

Pulled By: neerajprad

fbshipit-source-id: ea23e59f38a23a7b0bab4fbbd98ae3feba468b9c
2021-11-02 17:44:56 -07:00
885da61d7d [PG NCCL] Disable NCCL health check (#67668)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67668

This adds an env var to enable the NCCL health check; when left unspecified, the check is not run. Unit tests that need to test this functionality have the env variable set. Please see the internal diff for more details.

Test Plan: CI

Reviewed By: yuguo68, mrshenli

Differential Revision: D32089763

fbshipit-source-id: dff5664a5e607f711515cd1042089ca769914fbb
2021-11-02 16:21:59 -07:00
0b2f68eadf Remove special FX OpInfo list (#67520)
Summary:
Most of the tests fail because the test doesn't work with Python functions (it only works with builtins like `torch.add`).

I added a check for that and ported the remaining skips into the `skips` field.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67520

Reviewed By: ZolotukhinM

Differential Revision: D32046856

Pulled By: Chillee

fbshipit-source-id: 05fa3e3c40fa6cc4f776e0c24f667629b14afd25
2021-11-02 16:01:46 -07:00
96e3d1a76c Remove native_functions.yaml dependency from Sorting.cu (#66621)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66621

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D31856099

Pulled By: dagitses

fbshipit-source-id: d9c2b6b45099e49c7beaae5888140de350d23696
2021-11-02 14:46:29 -07:00
7deb1726ea Remove native_functions.yaml dependency from ScanKernels.cu (#66620)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66620

This splits the Tensor-dependent code out into a cpp file.

A slight complicating factor is `scan_dim` using `copy_` to handle
non-contiguous out arguments, so I've moved that code into the
caller, which does introduce some duplication. Though it's only ~10
lines extra in total.

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D31856106

Pulled By: dagitses

fbshipit-source-id: 91bb4ce5e7c6487e3ea0d5ec4d9f7a625d8ef978
2021-11-02 14:45:17 -07:00
9e97ccbd7a .github: Migrate iOS workflows to GHA (#67645)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67645

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D32104367

Pulled By: seemethere

fbshipit-source-id: 08ff043ed5d0b434322f1f3f20dce2a4f5fa88c1
2021-11-02 14:38:43 -07:00
a831713786 [PyTorch Edge] Use Integer Subtraction (Instead of Float) in Non-FBGEMM Dequantization (#67115)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67115

This matches what FBGEMM does (https://fburl.com/code/vjrdn6tj, https://fburl.com/code/btkdn24l)

Benchmark Mobile Vision Transformer Model Results (as described in D31066997 and config from rebasing onto v4 of D31869106):

This diff (v18):
- NET latency: 109.866
- https://our.intern.facebook.com/intern/aibench/details/536304563225483

This diff before using vsubl (v14 but rebased onto v22 of D31205883, the previous diff in this stack)
- NET latency: 115.887
- https://our.intern.facebook.com/intern/aibench/details/906978557243297

Before this diff (v22 of D31205883):
- NET latency: 116.449
- https://our.intern.facebook.com/intern/aibench/details/870678436773989

ghstack-source-id: 142166375

Test Plan: Phabricator tests + Running quantized_test on a pixel3a passes and Running mobile vision transformer model (as described in D31066997) both work

Reviewed By: kimishpatel

Differential Revision: D31483135

fbshipit-source-id: fbef00cad6087b49900d21c3dd3b6fd432f64e94
2021-11-02 14:28:03 -07:00
23bd3cf5b2 [PyTorch Edge] Parallelize Quantize and Dequantize Tensor (#65845)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65845

Benchmarking of Non-Parallelized and Parallelized quantization/dequantization for various devices and input sizes done in this notebook:
https://www.internalfb.com/intern/anp/view/?id=1204834&scroll_cell=17&checkpoint_id=432447238302644

For example:
- {F671713127}
- {F671713209}
- {F671713238}
- {F671713253}

When run on the Partially Quantized Mobile Vision Transformer Model (as described in D31066997):

Before this diff (on D31444248 v7):
- [120.907ms](https://our.intern.facebook.com/intern/aibench/details/945891590820680)

With this diff (v19):
- Threshold = 2^16: [118.086ms](https://our.intern.facebook.com/intern/aibench/details/436376817372377)
- Threshold = 2^20: [118.361ms](https://our.intern.facebook.com/intern/aibench/details/617543354077290)

ghstack-source-id: 142166374

Test Plan:
Same as previous diff (D31066997)

All tests pass

Also, set numel to 2^21 in quantized_test TestArmVectorizedAndParallelQuantizeDequantize (https://www.internalfb.com/diff/D31066997?dst_version_fbid=596325738080019&transaction_fbid=219437170135898) and the tests passed

Reviewed By: kimishpatel

Differential Revision: D31205883

fbshipit-source-id: 9ed0b11a376734feaf228074a24b8eb79d5270a3
2021-11-02 14:28:01 -07:00
92cfda1785 [PyTorch Edge] Clean up Quantize Tensor code (#66220)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66220

- Pass pointers rather than tensors to ```quantize_tensor_arm``` to allow for using ```__restrict__``` and to make parallelization easier (as in the next diff on this stack D31205883)
- Replace ```auto``` with actual types
- Replace raw cast with reinterpret_cast<...>
- All of these changes make the code structure similar to that of Dequantize
ghstack-source-id: 142166376

Test Plan: same as D31066997 (all tests pass)

Reviewed By: kimishpatel

Differential Revision: D31444248

fbshipit-source-id: 6a31d090082047263403f415911c199519987595
2021-11-02 14:27:59 -07:00
16c62a6dc9 [PyTorch Edge] Optimize Dequantize Tensor with Intrinsics (#65844)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65844

When run on [Partially Quantized Mobile Vision Transformer Model](https://www.internalfb.com/diff/D30648171), with config from rebasing onto v4 of D31869106

Before:
[AIBench Run (128ms)](https://www.internalfb.com/intern/aibench/details/309792316534505)
[Perf Report](https://interncache-all.fbcdn.net/manifold/aibench/tree/mobile/pt/profiling_reports/model_perf_1635881079420.html)

After:
[AIBench Run (117ms)](https://www.internalfb.com/intern/aibench/details/20433505461364)
[Perf Report](https://interncache-all.fbcdn.net/manifold/aibench/tree/mobile/pt/profiling_reports/model_perf_1635881527831.html)

Total events spent on at::native::dequantize_quantized reduced from 1.97 Billion to 0.97 Billion (~50% Reduction)
ghstack-source-id: 142166373

Test Plan:
To run quantized_test
- Clone open source repo
- Set ANDROID_NDK and ANDROID_SDK
- Build with ```BUILD_MOBILE_BENCHMARK=1 BUILD_MOBILE_TEST=1 ANDROID_DEBUG_SYMBOLS=1 BUILD_LITE_INTERPRETER=0  ANDROID_ABI=arm64-v8a ./scripts/build_android.sh  -DANDROID_CCACHE=$(which ccache) -DBUILD_BINARY=ON```
- Move ```build_android/bin/quantized_test``` to devserver
- Use one world to connect to android device (ex. ```one_world android device pixel-3a```)
- In another terminal: Make quantized_test executable (```chmod +x quantized_test```), copy it to android device (```adb push quantized_test /data/local/tmp```), and run it (```adb shell /data/local/tmp/quantized_test```)

Results:
{F676102702}

Also ```buck test mode/dev //caffe2/aten:quantized_test``` passes

To test performance on [Partially Quantized Mobile Vision Transformer Model](https://www.internalfb.com/diff/D30648171) with AI Bench:
- Save this config file: P466124028 (for example: D31869106)
- Before or after the changes in this diff, run ```buck run aibench:run_bench -- -b benchmark_mobile_vision_transformer_model_config.json --platform android/arm64 --framework pytorch --remote --devices Pixel-3a-11-30 --force_profile```

Reviewed By: kimishpatel

Differential Revision: D31066997

fbshipit-source-id: 9067e683e0181aa13a2b636b68ac4fe5a4b2e618
2021-11-02 14:26:42 -07:00
9cef2033f3 Modify decorator for acc op converters (#67636)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67636

Modify the decorator to denote whether an acc op converter supports the explicit/implicit batch dim. This info will be used by trt_splitter when determining whether a node can be split into the acc graph.
This prevents us from splitting a node into the acc module only to later find that no proper converter exists for the node, failing the lowering process.

Test Plan: unit test

Reviewed By: 842974287

Differential Revision: D31998477

fbshipit-source-id: 6789ebef4a76f9a0c1ab3edf8e846a5b6143326b
2021-11-02 13:35:40 -07:00
5ad169b7cc Adding in Wrap functions for FSDP from Fairscale (#67292)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67292

as title

Test Plan: buck test mode/dev-nosan //caffe2/test/distributed/fsdp:wrap --keep-going

Reviewed By: rohan-varma

Differential Revision: D31936404

fbshipit-source-id: b7ebead9a649766aec83e5630c2ce1386ad33e11
2021-11-02 13:30:41 -07:00
8f63cfda14 [LiteInterpreter] Specify Loader to yaml.load (#67694)
Summary:
The `Loader` argument became mandatory in PyYAML 6, but has been accepted since PyYAML 3.

Unblocks migration to a newer runtime.
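
A minimal sketch of the required call shape (file name is illustrative):

```python
import yaml

with open("some_ops.yaml") as f:  # illustrative file name
    # PyYAML 6 made the Loader argument mandatory; SafeLoader avoids
    # constructing arbitrary Python objects from YAML tags.
    data = yaml.load(f, Loader=yaml.SafeLoader)
```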

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67694

Reviewed By: seemethere

Differential Revision: D32106043

Pulled By: malfet

fbshipit-source-id: 35246b97a974b168c066396ea31987b267534c7f
2021-11-02 12:52:57 -07:00
b00206d473 [vulkan] Use 3D textures for everything (#67647)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67647

Test Plan: Imported from OSS

Reviewed By: beback4u

Differential Revision: D32102196

Pulled By: SS-JIA

fbshipit-source-id: ded1835386a0640181f69c190a2294d298311e26
2021-11-02 12:29:26 -07:00
0ee8473af7 [SR][easy] Fix FuseListUnpack 0-use corner case (#67165)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67165

We previously skipped the optimization if `value_out->uses().size() > 1`. But it's possible that the number of uses is 0. In that case, it's not safe to access `value_out->uses()[0]`.

This is not causing any problems in production right now since we don't have any dead code before running this pass. But we should handle this case correctly to make the pass more robust.

Test Plan: CI

Reviewed By: hlu1

Differential Revision: D31887416

fbshipit-source-id: d30a5824e8bd1cda1debdc16524db3fb0da312f9
2021-11-02 12:17:16 -07:00
6b1d8e5bb2 Revert D31861962: [qnnpack] Remove redundant fp16 dependency
Test Plan: revert-hammer

Differential Revision:
D31861962 (4061239fdd)

Original commit changeset: e1425c7dc3e6

fbshipit-source-id: 418f8173c19b9541316443e1ab4ec39062561b5e
2021-11-02 11:55:07 -07:00
3e218dbd27 [PyTorch] Capture function args from schema by reference (#65951)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65951

Profiling shows that we do a bunch of heap allocations to copy Argument structs in append_operator. Capturing by reference here should be safe as long as the schema object's args outlive the operator function.

IMPORTANT: Reviewers (or automated tests if we're lucky) need to
confirm that the above is true or we're going to have fun
use-after-free bugs.
ghstack-source-id: 142065422

Test Plan:
AIBench run for speech model on MilanBoard

control: https://www.internalfb.com/intern/aibench/details/485570882988661 (mean 906 ms)
test: https://our.intern.facebook.com/intern/aibench/details/620835625995669 (mean 818 ms)

So almost a 10% improvement in the wall time metric?

Reviewed By: iseeyuan

Differential Revision: D31319988

fbshipit-source-id: 7da56357420df500df344f49007e070ebb1bc581
2021-11-02 11:12:04 -07:00
33d62266f2 [PyTorch][easy] Avoid allocating OperatorName strings in append_operator (#66134)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66134

No reason to do the comparison the old way when we could do it this way and avoid copying into std::string.
ghstack-source-id: 142065423

Test Plan: AIBench Milan run shows neutral to slight regression, but I think we should probably just make this change anyway.

Reviewed By: dhruvbird

Differential Revision: D31319669

fbshipit-source-id: dde329a4f2c4054f275eb98fb6556f5341e7533a
2021-11-02 11:10:52 -07:00
2644725937 [SR] Migrate gather_ranges_to_dense to new FuseListUnpack (#67164)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67164

Migrated both the variadic and non-variadic versions.

This diff is part of the effort to migrate all ops used in `FuseListUnpack` to `FuseListUnpackV2`. The original version of `FuseListUnpack` is problematic for a few reasons:

* You have to complicate the op implementation with an `is_fused` check, resulting in messier code. It is easier to reason about two ops, fused (out variant) and unfused (native).
* The original version of `FuseListUnpack` is buggy. It assumes that the `ListUnpack` node occurs immediately after the fusion candidate, which is not necessarily true.

This diff finishes the migration, so the original version of `FuseListUnpack` is removed

Test Plan:
Unit tests: `buck test caffe2/benchmarks/static_runtime/...`

**Accuracy Test**
Done at the top of this diff stack.

Reviewed By: hlu1

Differential Revision: D31887386

fbshipit-source-id: 9d44c813667a75bce13dce62bf98e6109edea6ba
2021-11-02 11:04:59 -07:00
82f7f8d471 [PyTorch] Adopt IValue::toTupleRef() where obvious (#65505)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65505

Generated with

`fastmod -m 'toTuple\(\)(\s*)->' 'toTupleRef()${1}.'`

, followed by

`fastmod '(std::move\(.*)toTupleRef\(\).' '${1}toTuple()->'`

to unbreak 2 callsites.
ghstack-source-id: 142065835

Test Plan: CI

Reviewed By: gchanan

Differential Revision: D31131025

fbshipit-source-id: 54457ae5bbeb38db9c7f196d469b98521c3d3f34
2021-11-02 10:22:18 -07:00
eb1b8a2160 pytorch_android_gradle_custom_build_single migrated from Circle to GHA. (#67577)
Summary:
in scope of: https://github.com/pytorch/pytorch/issues/67301. Main changes:
* pytorch_android_gradle_custom_build_single removed from CircleCI (however, the template is still there since it is used by another similar workflow: pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit, which will be migrated next)
* new GHA workflow added: pytorch_android_gradle_custom_build_single

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67577

Reviewed By: malfet, mruberry

Differential Revision: D32087709

Pulled By: b0noI

fbshipit-source-id: f9581558ddc1453b63264bf19fe5a4c245b7c007
2021-11-02 10:21:03 -07:00
d9bac7c316 [PyTorch] Add IValue::toTupleRef() (#65504)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65504

We should be able to borrow a Tuple from an IValue without incurring refcount bumps.
ghstack-source-id: 142065833

Test Plan:
Added test coverage.

Profiled static runtime on the local_ro net for ctr_mobile_feed. Inclusive time spent in VarTupleUnpack decreased about 0.3%, which roughly matches the 0.36% of runtime that was previously spent in IValue::toTuple().

Reviewed By: hlu1

Differential Revision: D31130570

fbshipit-source-id: afa14f46445539e449068fd908d547b8da7f402c
2021-11-02 10:16:25 -07:00
7cd62621fb [PyTorch] Adopt faster Tuple::create (#65381)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65381

The previous diff adds a way to create Tuples of size 3 or less
more efficiently. This diff makes it easier to hit that path and
updates a bunch of callsites to hit it.
ghstack-source-id: 142065832

Test Plan: CI

Reviewed By: ezyang

Differential Revision: D31069538

fbshipit-source-id: d04da3709594ed68ab1c0a1471f8cffd8d001628
2021-11-02 10:10:31 -07:00
9e71ea292d Fix test_init_pg_and_rpc_with_same_socket by retrying on addr in use error (#67638)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67638

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D32074698

Pulled By: H-Huang

fbshipit-source-id: 6b980fcdac4b0f1edfe086d0deb99be371a73900
2021-11-02 09:42:47 -07:00
4061239fdd [qnnpack] Remove redundant fp16 dependency (#67281)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67281

`qnnpack/operator.h` introduces a dependency on an external library, fp16, via `qnnpack/requantization.h`.
Including `qnnpack/operator.h` in `pytorch_qnnpack.h` makes objects that really don't require fp16 depend on it indirectly because they include `pytorch_qnnpack.h`.
This was causing some test and bench targets to fail to build for local and android/arm64 (the only two tried) using cmake.

This diff moves `qnnpack/operator.h` from `pytorch_qnnpack.h` to `qnnpack_func.h`, and explicitly add `qnnpack/operator.h` in `src/conv-prepack.cc`.

Test Plan: Ran all the tests for local on my devserver, and arm64 on Pixel3a.

Reviewed By: kimishpatel

Differential Revision: D31861962

fbshipit-source-id: e1425c7dc3e6700cbe3e46b64898187792555bb7
2021-11-02 09:29:55 -07:00
cd51d2a3ec Adding OpInfo for logical_or, logical_and, logical_xor (#67178)
Summary:
This PR addresses https://github.com/pytorch/pytorch/issues/54261.

This adds OpInfos for binary logical element-wise operators. This is my first OpInfo PR to PyTorch; looking forward to suggestions and any feedback.

cc: mruberry krshrimali

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67178

Reviewed By: jbschlosser

Differential Revision: D32057889

Pulled By: mruberry

fbshipit-source-id: 7e670260af6b478dba9d6e8d77de4df1b6d0b5d1
2021-11-01 20:27:45 -07:00
c65f332da4 torch::deploy unity and its demo (#67134)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67134

This diff demos torch::deploy unity, which builds the model, the dependencies, and the runtime as a unity!

The end user only needs to use the build_unity rule in place of the python_binary rule to define the Python application. Under the hood, we build the Python application (an xar file), build the torch deploy runtime, and then embed the Python application (the xar file) into the torch deploy runtime.

When starting the torch::deploy runtime, the xar is written to the filesystem and extracted. We add the extracted path to Python's sys.path so all the model files and all the Python dependencies can be found!

As a demo, the model here is just a simple python program using numpy and scipy. But  theoretically, it can be as complex as we want.

I'll check how bento_kernel works. Maybe we can learn from bento_kernel to simplify things a bit.
ghstack-source-id: 142085742

Test Plan:
```
#build
buck build mode/opt unity:unity

# make sure the path exists before we start torch::deploy runtime
# Otherwise the dynamic loader will just skip this non-existing path
# even though we create it after the runtime starts.
mkdir -p /tmp/torch_deploy_python_app/python_app_root

#run
LD_LIBRARY_PATH=/tmp/torch_deploy_python_app/python_app_root ~/fbcode/buck-out/gen/caffe2/torch/csrc/deploy/unity/unity
```

Reviewed By: suo

Differential Revision: D31816526

fbshipit-source-id: 8eba97952aad10dcf1c86779fb3f7e500773d7ee
2021-11-01 19:32:49 -07:00
ec6b472e0a [vulkan] Add prepacking for conv2d_transpose (#67358)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67358

Test Plan: Imported from OSS

Reviewed By: beback4u

Differential Revision: D31970903

Pulled By: SS-JIA

fbshipit-source-id: 128deb40dc14fb97aa61af9cddab4540b630359e
2021-11-01 17:59:32 -07:00
152f665dee Inserted check for PyObject_IsInstance in THPVariableCheck (#67588)
Summary:
Inserted a check on the return value of PyObject_IsInstance to capture the case in which it raises an exception and returns -1. When this happens, THPVariable_Check now throws a python_error to signal the exception.

Fixes https://github.com/pytorch/pytorch/issues/65084

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67588

Reviewed By: mruberry

Differential Revision: D32064776

Pulled By: albanD

fbshipit-source-id: 895c7682e0991ca257e27f9638a7462d83707320
2021-11-01 16:53:54 -07:00
c4bf196334 Strided masked reduction: mean (2nd try) (#67088)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67088

Stack from [ghstack](https://github.com/ezyang/ghstack):
* __->__ #67088

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D32070264

Pulled By: cpuhrsch

fbshipit-source-id: 08a91550dd24fb0f51abf06591a0e26186c4f9f9
2021-11-01 16:12:07 -07:00
53e6aca8b3 [Pytorch Edge] Make More Classes Selective (#67397)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67397

Expand selectivity coverage to classes created outside of TORCH_LIBRARY.

ghstack-source-id: 142076940

Test Plan: Model unit tests, manually run some models on prod apps.

Reviewed By: dhruvbird, bdhirsh

Differential Revision: D31978965

fbshipit-source-id: 708901b47a9838ac54c78788028d0e18c1e378c0
2021-11-01 15:12:30 -07:00
45d5b3248b Fixed C++ BatchNorm pretty_print() with optional momentum (#67335)
Summary:
Summary: Inserted a check for the momentum option that prints "None" in case it is not defined. See https://github.com/pytorch/pytorch/issues/65143

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67335

Test Plan:
The code below now prints `torch::nn::BatchNorm2d(128, eps=1e-05, momentum=None, affine=true, track_running_stats=true)` without generating errors.
```
torch::nn::BatchNorm2d m(torch::nn::BatchNormOptions(128).momentum(c10::nullopt));
std::cerr << *m << "\n";
```
Fixes https://github.com/pytorch/pytorch/issues/65143

Reviewed By: mruberry

Differential Revision: D32067820

Pulled By: ngimel

fbshipit-source-id: f40f9bbe090aa78e00f6c3a57deae393d946b88d
2021-11-01 14:45:33 -07:00
234bd6dc56 [quantized] Add bilinear quantized grid_sample (#66879)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66879

This adds a quantized implementation for bilinear grid_sample. Bicubic interpolation cannot be supported as easily, since we rely on the linearity of quantization to operate on the raw values, i.e.

f(q(a), q(b)) = q(f(a, b)) where f is the linear interpolation function.
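
Spelled out with affine quantization $q(x) = x/s + z$ and bilinear weights summing to one, this is the identity the implementation relies on:

$$
f(q(a), q(b)) = (1-t)\left(\frac{a}{s} + z\right) + t\left(\frac{b}{s} + z\right)
= \frac{(1-t)\,a + t\,b}{s} + z = q(f(a, b)).
$$
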
ghstack-source-id: 141321116

Test Plan: test_quantization

Reviewed By: kimishpatel

Differential Revision: D31656893

fbshipit-source-id: d0bc31da8ce93daf031a142decebf4a155943f0f
2021-11-01 14:44:26 -07:00
0cbfd466d2 Remove ProcessGroup from TensorPipeAgent initialization (#66708)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66708

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D31762735

Pulled By: H-Huang

fbshipit-source-id: 9f3879fca6b8258f7e6171b14d2c1d6cce21627d
2021-11-01 14:15:27 -07:00
ba369ea053 check to ensure profiler_edge is only added when use_kineto is on (#67494)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67494

Reviewed By: jbschlosser

Differential Revision: D32031142

Pulled By: mcr229

fbshipit-source-id: 8267f0e02c5bed0fbc4956af6935a551bedb27ef
2021-11-01 13:42:14 -07:00
76f57cd442 [CODEOWNERS] Remove @neginraoof (#67631)
Summary:
She no longer works on the ONNX exporter

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67631

Reviewed By: malfet

Differential Revision: D32070435

Pulled By: msaroufim

fbshipit-source-id: d741a15bd7a916745aa7f2f3d9bb1dc699553900
2021-11-01 13:26:38 -07:00
e80cb08cc8 [jit][shape_prop] Fix jit registration of unpack_sizes ops for prepacked (#66737)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66737

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D31703587

Pulled By: IvanKobzarev

fbshipit-source-id: ccebe5ffc4fa959e3fa63afab1058d94e9df9dd9
2021-11-01 12:43:10 -07:00
251278d385 [skip ci] set more tests with owners for distributed and elastic (#67583)
Summary:
It turns out my lint doesn't work on CI all the time because of shell differences. I'm working on a new more comprehensive lint in https://github.com/pytorch/pytorch/pull/66826 and it'd be nice if these could be cleared first.

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67583

Reviewed By: H-Huang, mruberry

Differential Revision: D32045155

Pulled By: janeyx99

fbshipit-source-id: ecfe9f008310c28e3b731e246c2b2ed0106d03b1
2021-11-01 12:26:03 -07:00
4d99bc839b Remove TH/THC Storage functions for unused dtypes (#67480)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/67466

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67480

Reviewed By: mruberry

Differential Revision: D32023494

Pulled By: ngimel

fbshipit-source-id: 8827e1d6e765fee7219b5ee9888a1a3e3c5fbe89
2021-11-01 11:45:20 -07:00
a122ba776a Fix less_than_lowest warnings (#67422)
Summary:
Fixes useless comparison against zero warnings for Half.h

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67422

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D31951939

fbshipit-source-id: 3e9940adda2d57b4d9b122f3862706c673f9ef4b
2021-11-01 11:19:55 -07:00
da29655797 Disable miopen test for convolution on mobile (#66564)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66564

Mobile thinks that we are segfaulting in _convolution, and this
is the most recent substantive change to this function.  I think
it's pretty unlikely to have caused the crash, but if we don't have
any better ideas we should try this.
ghstack-source-id: 141972758

Test Plan: ship it and see if it resolves the error report

Reviewed By: kimishpatel

Differential Revision: D31598633

fbshipit-source-id: c34f4b0b7b8529e21fd019c886ad8d68ffe286b0
2021-11-01 10:22:40 -07:00
885a8e53ba replace onlyOnCPUAndCUDA with onlyNativeDeviceTypes (#65201)
Summary:
Reference https://github.com/pytorch/pytorch/issues/53849

Replace `onlyOnCPUAndCUDA` with `onlyNativeDeviceTypes`, which includes `cpu`, `cuda`, and `meta`.
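
A sketch of how the decorator is used in a device-generic test (these are internal test-framework APIs, so details may differ):

```python
import torch
from torch.testing._internal.common_device_type import (
    instantiate_device_type_tests, onlyNativeDeviceTypes)
from torch.testing._internal.common_utils import TestCase, run_tests

class TestExample(TestCase):
    @onlyNativeDeviceTypes  # instantiated for cpu, cuda, and meta only
    def test_add_shape(self, device):
        x = torch.ones(2, 3, device=device)
        self.assertEqual((x + x).shape, (2, 3))

instantiate_device_type_tests(TestExample, globals())

if __name__ == "__main__":
    run_tests()
```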

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65201

Reviewed By: mrshenli

Differential Revision: D31299718

Pulled By: mruberry

fbshipit-source-id: 2d8356450c035d6a314209ab51b2c237583920fd
2021-11-01 09:22:34 -07:00
39ad7b670e [SR] Native implementation for aten::squeeze (#67441)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67441

Native ops are faster than falling back to the JIT interpreter, sometimes significantly (we've previously seen this with ops like TupleUnpack). We should improve op coverage where possible.

Test Plan: `buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest`

Reviewed By: hlu1

Differential Revision: D31992093

fbshipit-source-id: 88191c13d229ffeac4e5b17b78e25f51d3f7f23e
2021-11-01 08:22:57 -07:00
00da7b9a3b Set test owner for vmap (#67582)
Summary:
More leftover actions from https://github.com/pytorch/pytorch/issues/66232

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67582

Reviewed By: zou3519

Differential Revision: D32045160

Pulled By: janeyx99

fbshipit-source-id: 92ae9a533285b05b44bd04bb6127061c6fddd689
2021-11-01 07:22:48 -07:00
9cdd1d7e48 Docs module check (#67440)
Summary:
Add a check to make sure we do not add new submodules without documenting them in an rst file.
This is especially important because our doc coverage only runs for modules that are properly listed.

I temporarily removed "torch" from the list to make sure the failure in CI looks as expected. EDIT: fixed now

This is what a CI failure looks like for the top level torch module as an example:
![image](https://user-images.githubusercontent.com/6359743/139264690-01af48b3-cb2f-4cfc-a50f-975fca0a8140.png)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67440

Reviewed By: jbschlosser

Differential Revision: D32005310

Pulled By: albanD

fbshipit-source-id: 05cb2abc2472ea4f71f7dc5c55d021db32146928
2021-11-01 06:24:27 -07:00
0d7cf825fc [SR] Drop support for aten::__is__ and aten::__isnot__ (#67550)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67550

`aten::__is__` and `aten::__isnot__` are extremely problematic for a large number of SR graph optimizations.

Some examples:

- Removing ops that are no-ops in the forward pass like `aten::detach`. This would normally be trivial, but `is` introduces corner cases like this:
```
def forward(x):
    y = x.detach()
    return x is y
```
We get `False` before optimizations. But after optimizations, the test becomes `x is x`, and we get `True`.

- `ReplaceWithCopy`: the pass that replaces ops like `aten::to` with an out variant that copies its input. The following graph returns `True` before optimizations, but `False` afterwards
```
def forward(x):
    y = x.to(x.dtype)
    return x is y
```

- And many more, `FuseListUnpack` can break too

Since 99.99% of users never use these ops, rejecting them so we don't have to think about this is not a big deal.

Test Plan: `buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest`

Reviewed By: d1jang

Differential Revision: D32022584

fbshipit-source-id: d135938edb2299c9b8f9511afac2bf568578879e
2021-11-01 04:45:14 -07:00
7fbcf79684 [tensorexpr][nnc] Support quantization (#66676)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66676

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D31676329

Pulled By: IvanKobzarev

fbshipit-source-id: 288b41ff4ed603dfaacb465f296997f14bb23c22
2021-10-31 22:49:30 -07:00
97f29bda59 Relaxes tolerance on ROCm test_noncontiguous_samples_matmul (#67593)
Summary:
This test is narrowly failing intermittently. See https://ci.pytorch.org/jenkins/job/pytorch-builds/job/pytorch-linux-bionic-rocm4.3.1-py3.6-test1/7736//console for an example. Relevant snippet:

```
12:28:43 ======================================================================
12:28:43 FAIL [0.104s]: test_noncontiguous_samples_matmul_cuda_float32 (__main__.TestCommonCUDA)
12:28:43 ----------------------------------------------------------------------
12:28:43 Traceback (most recent call last):
12:28:43   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_utils.py", line 1422, in wrapper
12:28:43     method(*args, **kwargs)
12:28:43   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_utils.py", line 1422, in wrapper
12:28:43     method(*args, **kwargs)
12:28:43   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_device_type.py", line 371, in instantiated_test
12:28:43     result = test(self, **param_kwargs)
12:28:43   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_device_type.py", line 737, in test_wrapper
12:28:43     return test(*args, **kwargs)
12:28:43   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_device_type.py", line 920, in only_fn
12:28:43     return fn(self, *args, **kwargs)
12:28:43   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_utils.py", line 1041, in wrapper
12:28:43     fn(*args, **kwargs)
12:28:43   File "test_ops.py", line 262, in test_noncontiguous_samples
12:28:43     self.assertEqual(actual_grad, expected_grad)
12:28:43   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_utils.py", line 1903, in assertEqual
12:28:43     super().assertTrue(result, msg=self._get_assert_msg(msg, debug_msg=debug_msg))
12:28:43 AssertionError: False is not true : Tensors failed to compare as equal!With rtol=1.3e-06 and atol=1e-05, found 1 element(s) (out of 10) whose difference(s) exceeded the margin of error (including 0 nan comparisons). The greatest difference was 1.2278556823730469e-05 (-1.458460807800293 vs. -1.4584730863571167), which occurred at index 7.
```

Setting an absolute tolerance of 1e-4, which is what this PR does, should make the test pass consistently.

cc jeffdaily sunway513 jithunnair-amd ROCmSupport KyleCZH

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67593

Reviewed By: ngimel

Differential Revision: D32050986

Pulled By: mruberry

fbshipit-source-id: f15bc8c4516be0a859afcfa76d52334c0b2c58a5
2021-10-31 04:26:31 -07:00
d0662f2f76 Add adaptive_max_pool OpInfo (#67405)
Summary:
Per title

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67405

Reviewed By: mruberry

Differential Revision: D32044712

Pulled By: ngimel

fbshipit-source-id: 4619d134d18359601801c029dd5be3f59b91626d
2021-10-30 21:19:58 -07:00
e01279cc2e Disable reduced precision reductions for fp16 GEMMs (#67578)
Summary:
It appears that most NVIDIA architectures (well, at least there haven't been many reports of this issue) don't do reduced precision reductions (e.g., reducing in fp16 given fp16 inputs), but this change attempts to ensure that a reduced precision reduction is never done. The included test case currently fails on Volta but passes on Pascal and Ampere; setting this flag causes the test to pass on all three.
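
A sketch of the user-facing knob as later releases expose it (`torch.backends.cuda.matmul.allow_fp16_reduced_precision_reduction`); this PR itself may simply set the corresponding cuBLAS math mode unconditionally:

```python
import torch

# When False, cuBLAS is asked to accumulate fp16 GEMMs in fp32 instead of
# reducing in fp16.
torch.backends.cuda.matmul.allow_fp16_reduced_precision_reduction = False

a = torch.randn(256, 256, device="cuda", dtype=torch.half)
b = torch.randn(256, 256, device="cuda", dtype=torch.half)
c = a @ b  # reduced in fp32 even on architectures that would reduce in fp16
```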

CC stas00 ngimel ptrblck

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67578

Reviewed By: mruberry

Differential Revision: D32046030

Pulled By: ngimel

fbshipit-source-id: ac9aa8489ad6835f34bd0300c5d6f4ea76f333d1
2021-10-30 21:14:11 -07:00
510e3026a9 [numpy] add torch.argwhere (#64257)
Summary:
Adds `torch.argwhere` as an alias to `torch.nonzero`

Currently, `torch.nonzero` actually provides equivalent functionality to `np.argwhere`.

From NumPy docs,
> np.argwhere(a) is almost the same as np.transpose(np.nonzero(a)), but produces a result of the correct shape for a 0D array.
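
A quick sketch of the equivalence:

```python
import torch

a = torch.tensor([[0, 1], [2, 0]])
print(torch.argwhere(a))  # tensor([[0, 1], [1, 0]]); one row per nonzero element
print(torch.nonzero(a))   # identical output; argwhere aliases nonzero
```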

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64257

Reviewed By: qihqi

Differential Revision: D32049884

Pulled By: saketh-are

fbshipit-source-id: 016e49884698daa53b83e384435c3f8f6b5bf6bb
2021-10-30 15:26:11 -07:00
a95c94f075 [fx2trt] fix acc_tracer when run against module that contains ScriptModule submodules (#67567)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67567

- Fix an issue to allow it to work against modules that contain ScriptModule submodules.
- Fix a bug where `getattr(base_class, method_name)` could raise KeyError

Test Plan: linter; CI;

Reviewed By: 842974287

Differential Revision: D31956070

fbshipit-source-id: 1114937f380af437fd6d36cd811ef609d7faefe7
2021-10-30 15:13:45 -07:00
b24c34426f Add OpInfo for torch.unique and torch.unique_consecutive (#67529)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67529

Reviewed By: pbelevich

Differential Revision: D32045941

Pulled By: saketh-are

fbshipit-source-id: fefea1ddabcd3c4b40e9374b991410626437cdb4
2021-10-30 08:33:41 -07:00
aa16de517d Revert D31984694: [pytorch][PR] make TORCH_(CUDABLAS|CUSOLVER)_CHECK usable in custom extensions
Test Plan: revert-hammer

Differential Revision:
D31984694 (d4493b27ee)

Original commit changeset: 0035ecd13980

fbshipit-source-id: c85689007719c9e4a930b0a8a32d481a501d3c14
2021-10-30 03:51:18 -07:00
4a2bbc619d move functionalize fallback out of aten/core (#67564)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67564

moves the functionalize fallback out of aten/core and into aten, which should fix the issue described at https://fb.workplace.com/groups/163556484490704/permalink/1029416141238063/. I'm still not clear on why this didn't fail anything in CI / sandcastle on the initial diff: D31942093 (0032fa7725)
ghstack-source-id: 141959891

Test Plan: Locally, running `buck build mode/opt //sigrid/feed/prediction_replayer:fully_remote_replayer_main`

Reviewed By: zou3519

Differential Revision: D32027585

fbshipit-source-id: 2d86c4a6b3a73b00ee0ccee2f89a54704ed83e00
2021-10-29 21:40:49 -07:00
c00806beda Add skipXLA and expectedFailureXLA decorator (#66857)
Summary:
Add skipXLA and expectedFailureXLA decorator and relevant test.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66857

Reviewed By: ngimel

Differential Revision: D32039856

Pulled By: mruberry

fbshipit-source-id: 3c99d5e06c1c7684d1f798c11c783bd6ebea9899
2021-10-29 19:53:36 -07:00
69adbc8778 Fix splitter_base and add unit test for trt splitter (#67569)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67569

splitter_base assumes that the first subgraph after splitting must be a CPU subgraph if a CPU node exists. This is wrong; the starting subgraph should be determined by which subgraph contains the zero-dependency node.
Also adds a unit test for the splitter.

Reviewed By: yinghai

Differential Revision: D32012549

fbshipit-source-id: e2639ccd7774b4295ca05c2ddbefff9726702b3f
2021-10-29 18:51:59 -07:00
d4493b27ee make TORCH_(CUDABLAS|CUSOLVER)_CHECK usable in custom extensions (#67161)
Summary:
Make `TORCH_CUDABLAS_CHECK` and `TORCH_CUSOLVER_CHECK` available in custom extensions by exporting the internal functions called by the both macros.

Rel: https://github.com/pytorch/pytorch/issues/67073

cc xwang233 ptrblck

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67161

Reviewed By: jbschlosser

Differential Revision: D31984694

Pulled By: ngimel

fbshipit-source-id: 0035ecd1398078cf7d3abc23aaefda57aaa31106
2021-10-29 17:27:07 -07:00
ad89d994c9 [Static Runtime] Support recordio format input for benchmark (#67530)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67530

Currently `ptvsc2_predictor_bench` only uses the first input of a given recordio file, even when the recordio file contains many inputs.

This change extends `StaticRuntime::benchmark` to accept multiple input entries so that we can benchmark more extensibly and realistically using all the inputs in the recordio file.

Test Plan:
Tested `ptvsc2_predictor_bench` with / without this change executing the following command:
```
MKL_NUM_THREADS=1 OMP_NUM_THREADS=1 numactl -m 0 -C 3 ./buck-out/opt/gen/caffe2/caffe2/fb/predictor/ptvsc2_predictor_bench --scripted_model=/home/djang/ads/adfinder/ctr_mobilefeed/302008423/302008423_0.predictor.disagg.local  --recordio_inputs=/home/djang/ads/adfinder/ctr_mobilefeed/302008423/302008423.local.inputs.recordio --pt_enable_static_runtime=1 --compare_results=0 --iters=1 --warmup_iters=1 --num_threads=1 --do_profile=1 --method_name=local.forward --set_compatibility --do_benchmark=1 --recordio_use_ivalue_format=1
```

Reviewed By: hlu1

Differential Revision: D31947382

fbshipit-source-id: 4188271613aad201f8cad5f566e0dfed26680968
2021-10-29 14:38:14 -07:00
2cac92f470 [SR] Migrate sigrid_transforms_torch_bind to new FuseListUnpack (#67163)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67163

Migrated both the variadic and non-variadic versions.

This diff is part of the effort to migrate all ops used in `FuseListUnpack` to `FuseListUnpackV2`. The original version of `FuseListUnpack` is problematic for a few reasons:

* You have to complicate the op implementation with an `is_fused` check, resulting in messier code. It is easier to reason about two ops, fused (out variant) and unfused (native).
* The original version of `FuseListUnpack` is buggy. It assumes that the `ListUnpack` node occurs immediately after the fusion candidate, which is not necessarily true.

Test Plan:
Unit tests: `buck test caffe2/benchmarks/static_runtime/...`

**Accuracy Test**
Done at the top of this diff stack.

**Performance**
Everything seems to be about the same plus or minus some noise.

* Baseline (D31947382 with some errors corrected locally; the version of the op here is fused and variadic): P464964343
* This diff, fused_variadic: P464960645
* Variadic transformation disabled, fused (caught and fixed a schema error here): P464961561
* List unpack fusion disabled, variadic: P464962661
* Both variadic and fusion passes disabled: P464963342

The predictions match with the JIT interpreter for all ops.

Reviewed By: hlu1

Differential Revision: D31887300

fbshipit-source-id: 25a7b4e35eed21ca8b2c98297513425cf17f461a
2021-10-29 14:25:10 -07:00
289b0f7b04 Resent the reverted PR: Add register_frozenpython.cpp to the torch::deploy interpreter library in the OSS build (#67303)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67303

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D32016061

Pulled By: shunting314

fbshipit-source-id: 9460c90dd4f630f4c81dbfbbd772446ddffbabd0
2021-10-29 14:10:43 -07:00
ba74b03b0d Back out "[sharded_tensor] simplify init_from_local_shards API"
Summary: Original commit changeset: 6e97d95ffafd

Test Plan: unit test

Reviewed By: wanchaol

Differential Revision: D32023341

fbshipit-source-id: 2a9f7b637c0ff18700bcc3e44466fffcff861698
2021-10-29 14:01:07 -07:00
5c77ccefe0 Resolves #67227 documentation issue (#67379)
Summary:
Changed "Chi2" in the docstring to a more intuitive "Chi-squared"

Fixes https://github.com/pytorch/pytorch/issues/67227

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67379

Reviewed By: jbschlosser

Differential Revision: D32023761

Pulled By: ngimel

fbshipit-source-id: b514b49726f616914871a9a831aa10e12e4be90b
2021-10-29 13:47:38 -07:00
66202b7f8d [Pytorch Edge] Expose runtime operators versioning (#67385)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67385

As part of the expanded operator versioning effort, we are going to start looking at this variable and what's stored locally in the model file.
ghstack-source-id: 141782717

Test Plan: unit test

Reviewed By: cccclai

Differential Revision: D31976654

fbshipit-source-id: 255a23cff7c4f4039089de23b4da95772be48324
2021-10-29 13:42:59 -07:00
60a80c5bbd [jit] Move ModuleIndex operator to selective build. (#67483)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67483

Move ModuleIndex operator to selective build candidates.
ghstack-source-id: 141953898

Test Plan: eyes

Reviewed By: qihqi

Differential Revision: D32003895

fbshipit-source-id: 635c2bc37cd30a98f4a1e182fd6534eb9f1c4a69
2021-10-29 13:31:35 -07:00
12ede84dbb [jit][edge] Enable lite interpreter to correctly handle INTERFACE_CALL instruction. (#65972)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65972

ghstack-source-id: 141842336

Test Plan: buck test mode/dev //caffe2/test:mobile -- --exact 'caffe2/test:mobile - test_stacktrace_interface_call (mobile.test_lite_script_module.TestLiteScriptModule)'

Reviewed By: qihqi

Differential Revision: D31326147

fbshipit-source-id: 338ff4ce8ddc9502ffe0add49057b33b52a24955
2021-10-29 13:13:32 -07:00
d6b15bfcbd [jit][edge] Load interface methods to corresponding ClassTypes. (#65971)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65971

ghstack-source-id: 141842335

We should be able to load methods into their ClassTypes. Right now the mobile runtime only loads data members into ClassTypes, not methods. To support interface calls, we inject methods into ClassTypes when the methods are loaded.

Test Plan: existing tests should all pass.

Reviewed By: qihqi

Differential Revision: D31326146

fbshipit-source-id: fb1dbea619910ef1f8fa26146da3ebab348fe902
2021-10-29 12:48:57 -07:00
6259601c8a Set test owners for tests with unknown owners (#67552)
Summary:
Action following https://github.com/pytorch/pytorch/issues/66232

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67552

Reviewed By: jbschlosser

Differential Revision: D32028248

Pulled By: janeyx99

fbshipit-source-id: a006f7026288b7126dba58b31cac28e10ce0fed6
2021-10-29 12:42:01 -07:00
c19cda5782 [skip ci] Add test owners for a special hi-pri class of tests (#67553)
Summary:
Action following https://github.com/pytorch/pytorch/issues/66232

This change does require some context: there were several suggestions regarding what to do about this group of tests: tests that are core and crucial to all of PyTorch and are too broad to be owned by one team.
1. Let's add a "module: core" and put people behind it! This idea sounds appealing unless you are one of the people backing the label. From talking to albanD among others, putting all these core tests on the shoulders of a few people or one team isn't super fair, and I have not yet found anyone willing to take on this job.
2. Taking advantage of the fact that we already have a triaging oncall that takes turns triaging issues, we can leave these tests essentially unlabeled and allow the oncall to triage these tests. Since these tests are crucial to PyTorch, we'll add the "high priority" label to mark them different from other unowned tests (see https://github.com/pytorch/pytorch/issues/67552).
3. I _could_ still create an unbacked label "module: core" and attribute these tests there, but I don't like the idea of creating a facade that the tests are "triaged" to a label when no one is actually taking a look.

Now we could potentially break these tests down into smaller files so that each piece _could_ be owned by a team, but 1. I don't know if this is currently feasible and 2. This approach does not prevent that from happening in the future.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67553

Reviewed By: albanD

Differential Revision: D32025004

Pulled By: janeyx99

fbshipit-source-id: 1fb1aa4c27e305695ab6e80ae3d02f90519939c0
2021-10-29 12:17:21 -07:00
fcba8018c2 Update codeowners for sphinx conf (#67548)
Summary:
Add a codeowner for the conf file to ensure allowlist modification is monitored.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67548

Reviewed By: jbschlosser

Differential Revision: D32023929

Pulled By: albanD

fbshipit-source-id: 63f18cdd725cc60993a6c0a9e3529ed95845e0bb
2021-10-29 10:50:15 -07:00
69f86ecd3a Sparse CSR CUDA: add torch.add with all inputs sparse (#63948)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63948

This PR adds `torch.add(a, b, alpha=None, out=out)` variant with `a, b,
out` all being sparse CSR tensors.
The underlying cuSPARSE function works only with 32-bit indices, and in
the current implementation, the result tensor has 32-bit indices. Input
tensors can have either 64-bit or 32-bit index tensors.
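
A minimal sketch of the new variant (requires a CUDA build; shapes and values are illustrative):

```python
import torch

crow = torch.tensor([0, 2, 3])
col = torch.tensor([0, 1, 1])
vals = torch.tensor([1.0, 2.0, 3.0])
a = torch.sparse_csr_tensor(crow, col, vals, (2, 2), device="cuda")
b = torch.sparse_csr_tensor(crow, col, vals, (2, 2), device="cuda")
# Inputs may carry int64 or int32 index tensors; the result's indices are int32.
c = torch.add(a, b, alpha=2.0)
```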

Fixes https://github.com/pytorch/pytorch/issues/59060

cc nikitaved pearu cpuhrsch IvanYashchuk ngimel

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D31909731

Pulled By: cpuhrsch

fbshipit-source-id: 656f523e3947fec56b2f93c474fb6fd49f0360ca
2021-10-29 10:43:05 -07:00
285d5a55b9 Add API usage to torch.RPC (#67515)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67515

Adding API usage to torch.rpc to better understand usage of this API.
ghstack-source-id: 141877028

Reviewed By: rohan-varma

Differential Revision: D32011465

fbshipit-source-id: 34d006ece307ae4a90fbcc6cb44fc0b7edca611e
2021-10-29 10:38:41 -07:00
ddc9bd335b Adds reference vs. noncontiguous OpInfo test (#67434)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/63341.

This PR adds a new test, `test_noncontiguous_samples`, that runs ops forward and backward and compares their outputs and grads between "normal" contiguous SampleInputs and noncontiguous SampleInputs. This test should preclude the need for noncontiguous SampleInputs going forward.

The test was added by generalizing the `.numpy()` transform on SampleInputs to support a new `.noncontiguous()` transform and copying forward/backward patterns from other tests in test_ops.py. It also discovered that many SampleInputs were incorrectly reusing tensors, so those have been revised. SampleInputs creating noncontiguous tensors for testing have also been altered to no longer do so.
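
An illustrative helper (not the actual OpInfo transform) showing one way to build such a tensor for inputs with at least one dimension:

```python
import torch

def make_noncontiguous(t):
    # Stash the values in a buffer with doubled sizes and view every other
    # element, so the result equals `t` but has non-unit strides.
    buf = t.new_empty([2 * s for s in t.shape])
    view = buf[tuple(slice(None, None, 2) for _ in t.shape)]
    view.copy_(t)
    return view

x = torch.randn(2, 3)
y = make_noncontiguous(x)
assert torch.equal(x, y) and not y.is_contiguous()
```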

In addition, this test discovered the following high priority silent correctness issues:

- https://github.com/pytorch/pytorch/issues/67432
- https://github.com/pytorch/pytorch/issues/67517
- https://github.com/pytorch/pytorch/issues/67513
- https://github.com/pytorch/pytorch/issues/67512
- https://github.com/pytorch/pytorch/issues/67470

It also identified the following issues:
- https://github.com/pytorch/pytorch/issues/67539

The pow OpInfo also incorrectly specified that pow supported the bool datatype, and this has been fixed. Its SampleInputs were written in a way that made requests for boolean SampleInputs return type-promoting inputs that never actually tried to compute pow in bool.

This PR suggests we should add the following guidance for writing SampleInputs:

- ensure that all SampleInputs are independent of each other (don't reuse tensors)
- ensure that all SampleInput tensors have no grad or backward functions (no autograd history) -- they should be leaves
- prefer keeping sample inputs simple where possible, a good set of handwritten samples that test interesting cases may be better than an exhaustive but hard to read and maintain programmatic enumeration
- keep code readable by using functools.partial and writing simple inline helpers; break up large statements into a more readable series of smaller statements; especially don't write complicated generator expressions with a `for` at the end!

fyi kshitij12345 krshrimali pmeier anjali411 saketh-are zou3519 dagitses

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67434

Reviewed By: ngimel

Differential Revision: D32014557

Pulled By: mruberry

fbshipit-source-id: b17e19adc1d41e24441f0765af13d381fef5e3c1
2021-10-29 09:55:56 -07:00
16d937b0df Fix strided _conv_double_backward() with 3D input / weight (#67283)
Summary:
Removes the 3D special case logic in `_convolution_double_backward()` that never worked.

The logic was never called previously since `convolution()` expands input / weight from 3D -> 4D before passing them to backends; backend-specific backward calls thus save the 4D version to pass to `_convolution_double_backward()`.

The new general `convolution_backward()` saves the original 3D input / weight, uncovering the bug.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67283

Reviewed By: anjali411

Differential Revision: D32021100

Pulled By: jbschlosser

fbshipit-source-id: 0916bcaa77ef49545848b344d6385b33bacf473d
2021-10-29 09:48:53 -07:00
bf31995194 Add OpInfo for nn.functional.cosine_embedding_loss (#67465)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67465

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D32001920

Pulled By: ejguan

fbshipit-source-id: 82e547b5f0057b4ecc61e6f3be56bf038db179d1
2021-10-29 09:11:23 -07:00
bcd301a457 Add OpInfo for nn.functional.ctc_loss (#67464)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67464

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D32001919

Pulled By: ejguan

fbshipit-source-id: f277a8e9c9887ed62e871e8a0c8549e853e34356
2021-10-29 09:11:21 -07:00
e2e20e79fb Add OpInfo for nn.functional.poisson_nll_loss (#67371)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67371

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D31973173

Pulled By: ejguan

fbshipit-source-id: 3cbb21d292b95039f7c7d1f4caa300f3d619740a
2021-10-29 09:11:18 -07:00
8b8fb4f4e6 Add OpInfo for nn.functional.gaussian_nll_loss (#67376)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67376

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D31974040

Pulled By: ejguan

fbshipit-source-id: d6abac78a378d2763ca2fd465e64dea9985840f2
2021-10-29 09:11:16 -07:00
1d900ee22f Add OpInfo for nn.functional.hinge_embedding_loss (#67381)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67381

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D31976354

Pulled By: ejguan

fbshipit-source-id: 09068bb3d1bba665517254dd8a2dab9abd78b0e2
2021-10-29 09:11:14 -07:00
c6a6c09383 Add OpInfo for torch.nn.functional.gaussian_nll_loss (#67356)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67356

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D31970077

Pulled By: ejguan

fbshipit-source-id: 91bd9c5202b49f79ef83795196c2773fbe8a9afd
2021-10-29 09:09:48 -07:00
2e156f649e Sort output of *NativeFunctions.h (#67046)
Summary:
This ensures deterministic output, allowing systems like ccache to be
more effective.

cc ezyang bhosmer bdhirsh

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67046

Reviewed By: VitalyFedyunin

Differential Revision: D31896114

Pulled By: bdhirsh

fbshipit-source-id: d29ef0cf6c7e3408b104c5239b620eaa24327088
2021-10-29 09:03:39 -07:00
f95ed474ac Norms Op Info (#67442)
Summary:
Adds op infos for group_norm, instance_norm, and local_response_norm

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67442

Reviewed By: mruberry

Differential Revision: D31992225

Pulled By: samdow

fbshipit-source-id: 5bf3e21cff2a39ca3e47dbe13db7671c617aaad1
2021-10-29 08:36:07 -07:00
d58f209326 add dequantize support for fp16 + cuda (#67234)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67234

Extends the dequantize fp16 function to also work on CUDA,
and adds a test.
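
A minimal sketch of the extended behavior (assuming the CUDA path mirrors the existing CPU path, where dequantize on an fp16 tensor upcasts to fp32):

```
import torch

x = torch.randn(4, dtype=torch.float16, device="cuda")
y = x.dequantize()  # for fp16 inputs, dequantize upcasts to float32
assert y.dtype == torch.float32
```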

Test Plan:
```
python test/test_quantization.py TestQuantizedTensor.test_dequantize_fp16_cuda
python test/test_quantization.py TestQuantizedTensor.test_dequantize_fp16_cpu
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D31915330

fbshipit-source-id: 622d47464fae26bf02f295ff56df63a3bf80b786
2021-10-29 07:58:38 -07:00
99282126dc pytorch quantization: document the custom module APIs (#67449)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67449

Adds a description of what the current custom module API does
and API examples for Eager mode and FX graph mode to the main
PyTorch quantization documentation page.

Test Plan:
```
cd docs
make html
python -m http.server
// check the docs page, it renders correctly
```

Reviewed By: jbschlosser

Differential Revision: D31994641

Pulled By: vkuzo

fbshipit-source-id: d35a62947dd06e71276eb6a0e37950d3cc5abfc1
2021-10-29 05:22:17 -07:00
acdc754918 [quant][graphmode][fx] Add support for ObservationType.OUTPUT_SHARE_OBSERVE_WITH_INPUT in backend_config_dict (#67210)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67210

`OUTPUT_SHARE_OBSERVE_WITH_INPUT` is an observation type for operators whose output shares the same observer/fake_quant instance as their input. When quantized, these ops can take a quantized Tensor as input and output a quantized Tensor with the same quantization parameters (scale/zero_point, etc.) as the input.
This PR uses cat as an example (see the sketch below); other ops can be added gradually later (together with tests).
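
For intuition, a rough eager-mode sketch of the property this observation type encodes (assuming quantized `torch.cat` propagates the input qparams, as described above):

```
import torch

x = torch.quantize_per_tensor(torch.randn(2, 3), scale=0.1, zero_point=0,
                              dtype=torch.quint8)
y = torch.quantize_per_tensor(torch.randn(2, 3), scale=0.1, zero_point=0,
                              dtype=torch.quint8)

out = torch.cat([x, y], dim=0)
# the output carries the same quantization parameters as the inputs
print(out.q_scale(), out.q_zero_point())  # 0.1 0
```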

Test Plan:
python test/fx2trt/test_quantize_fx.py TestQuantizeFxTRTOps.test_cat

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D31907243

fbshipit-source-id: 2c7af4a456deb5e6597b0b9cd4e32c5fcdec580b
2021-10-29 02:37:48 -07:00
2bb20c0e48 [quant] Move test file to fx2trt folder (#67129)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67129

Since the tests depend on an experimental feature (fx2trt), we'll move them to the fx2trt folder.

Test Plan:
python test/fx2trt/test_quantize_fx.py

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D31877123

fbshipit-source-id: 5a98a257c4806c1911cfc2616d5ad98d715234c4
2021-10-28 23:58:44 -07:00
5e46a4f6bd Fixes to make trt timing_cache really work (#67524)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67524

We had some loose ends to tie up to make the timing cache actually work; this diff fixes them.

Reviewed By: wushirong

Differential Revision: D32012021

fbshipit-source-id: 1e93c76d48a3740a02613e1f19222ed92cca9deb
2021-10-28 23:09:24 -07:00
96c868217c [deploy] fix TypedStorage serialization (#67499)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67499

Since https://github.com/pytorch/pytorch/pull/62030 was landed, storages being produced when loading from a pickle are of type TypedStorage. We weren't catching this in our deploy serialization, causing tensors to actually get pickled instead of the storages being shared across interpreters.

Since this behavior is still technically correct, it wasn't caught by any of our tests until someone tried to pass a really big tensor and started OOMing.
ghstack-source-id: 141869521

Test Plan: added unit test

Reviewed By: shunting314

Differential Revision: D32004075

fbshipit-source-id: ef5a80cd3cb1dff0b6b4c1b6c95923e4faab7d50
2021-10-28 22:33:04 -07:00
4052393af8 Revert D31450501: Wextra caffe2/
Test Plan: revert-hammer

Differential Revision:
D31450501 (7c2d3e6d32)

Original commit changeset: 702728fdb3c5

fbshipit-source-id: 486b8e872c38415706288f7f389d7cb1ea5eb0a9
2021-10-28 20:43:28 -07:00
18807273cb Fix Ads build broken due to comparison type mismatch (#67526)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67526

Build error P465285570 due to D31942093 (0032fa7725)

(Note: this ignores all push blocking failures!)

Test Plan: build passes after fix

Reviewed By: jbschlosser

Differential Revision: D32013247

fbshipit-source-id: b60a65afd7a5a2d3249150fbc2ac52374d62a591
2021-10-28 20:42:13 -07:00
26241994b2 Remove the argument strip_doc_string of export() method entirely. (#66615) (#67278)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67278

Remove the strip_doc_string argument of the export() method entirely.

Test Plan: Imported from OSS

Reviewed By: msaroufim

Differential Revision: D31962512

Pulled By: malfet

fbshipit-source-id: 168ad3f157a80d1edd7a9053783b3f3deb2ecf43

Co-authored-by: fatcat-z <jiz@microsoft.com>
2021-10-28 19:25:07 -07:00
43d51254bf Deprecate the argument _retain_param_name of export() method entirely. (#66617) (#67277)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67277

Remove the _retain_param_name argument of the export() method entirely.

Test Plan: Imported from OSS

Reviewed By: msaroufim

Differential Revision: D31962514

Pulled By: malfet

fbshipit-source-id: 8ac5e3a4a7821cc580951a7f167fd20069116350

Co-authored-by: fatcat-z <jiz@microsoft.com>
2021-10-28 19:25:05 -07:00
40920185ac [ONNX] Remove the argument enable_onnx_checker of export() method entirely. (#66611) (#67276)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67276

[ONNX] Remove the enable_onnx_checker argument from the torch.onnx.export() function entirely.

Test Plan: Imported from OSS

Reviewed By: msaroufim

Differential Revision: D31962520

Pulled By: malfet

fbshipit-source-id: 86ee15f525261c0da74175e47dd74eeb169ac47f

Co-authored-by: fatcat-z <jiz@microsoft.com>
2021-10-28 19:25:03 -07:00
609da98154 [ONNX] Update value name copying logic for onnx (#66170) (#67275)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67275

Specifically targets the symbolic functions that directly return the input as output. The old logic would override the value name with the output value name; but since the input is unmodified, it is more logical to keep its original input name, especially for cases where inputs come directly from model parameters.

Test Plan: Imported from OSS

Reviewed By: msaroufim

Differential Revision: D31962517

Pulled By: malfet

fbshipit-source-id: 9cb4a2bb55aa08dd1ce8fdec24e7cfb11d7ea97c

Co-authored-by: BowenBao <bowbao@microsoft.com>
2021-10-28 19:23:55 -07:00
7c2d3e6d32 Wextra caffe2/ (#67319)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67319

Test Plan: Sandcastle

Reviewed By: pbelevich

Differential Revision: D31450501

fbshipit-source-id: 702728fdb3c5b00510ec637ff65bb2c6949fcc4e
2021-10-28 19:02:34 -07:00
d8bde98f36 Workaround the bug of TRT which creates extra outputs (#67327)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67327

Under certain conditions, TRT will create extra outputs, which seems more like a bug. If we don't capture those hidden outputs, we won't allocate memory to host them, and TRT will end up writing to illegal memory. This diff addresses the issue by capturing the hidden outputs and allocating proper memory for them.

Reviewed By: jianyuh, wushirong, 842974287

Differential Revision: D31955379

fbshipit-source-id: c9faaf91ed45bec8e0bc4e0bea812a0a3f2abad0
2021-10-28 18:43:59 -07:00
fc82ad186a Add Initial NNC Dynamic Shapes Flow (#66136)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66136

FOR REVIEWERS: this is ready to review; the test failures come from somewhere else in the stack.

Takes in a TensorExprGraph of static shapes and generalizes the input shapes
to symbolic dimensions. Dimensions of value 1 will be preserved; otherwise,
dimensions with the same value will be bucketed to the same symbolic shape.

E.g. `Tensor(5, 3), Tensor(3, 1) -> Tensor(SS(-1), SS(-2)), Tensor(SS(-2), 1)`

From there, runs symbolic shape inference on the graph and creates a
versioning if in the graph, with prim::TensorExprDynamicGuard checking whether
the inputs at runtime match the Generalized Symbolic Shapes that are inputs
to the TE Kernel. The computation of all symbolic dimensions is
inlined into the if block with the TE Kernel. All Sym Dim Value* are
appended to the end of the TE Kernel Graph/Node inputs, and the Node is
augmented with an integer list attr `symbolic_shape_inputs` that gives the
mapping from Value * -> Symbolic Shape int64_t value. For lengthier IR
examples and a walkthrough, look at ShapeAnalysisTest.DynamicShapesFusion in
`test_shape_analysis`. Returns True on success, False on failure; it can fail
if shape propagation fails to propagate the # of dims or if complete shapes
are not set on the inputs.

Example transformation
```
graph(%x_inp : Float(10, 5, strides=[5, 1], requires_grad=0, device=cpu),
      %y_inp : Float(4, 5, strides=[5, 1], requires_grad=0, device=cpu),
      %z_inp : Float(1, 1, strides=[1, 1], requires_grad=0, device=cpu)):
  %3 : Tensor = prim::TensorExprGroup_0(%x_inp, %y_inp, %z_inp)
  return ()
with prim::TensorExprGroup_0 = graph(%x.1 : Float(10, 5, strides=[5, 1], requires_grad=0, device=cpu),
      %y.1 : Float(4, 5, strides=[5, 1], requires_grad=0, device=cpu),
      %z : Float(1, 1, strides=[1, 1], requires_grad=0, device=cpu)):
  %3 : int = prim::Constant[value=0]()
  %4 : Tensor = aten::tanh(%x.1)
  %5 : Tensor = aten::erf(%4)
  %6 : Tensor = aten::relu(%y.1)
  %7 : Tensor[] = prim::ListConstruct(%5, %6)
  %8 : Tensor = aten::cat(%7, %3)
  %9 : Tensor = aten::hardswish(%8)
  %10 : Tensor = aten::mul(%9, %z)
  return (%9)
```
->

```
  graph(%x_inp : Float(10, 5, strides=[5, 1], requires_grad=0, device=cpu),
      %y_inp : Float(4, 5, strides=[5, 1], requires_grad=0, device=cpu),
      %z_inp : Float(1, 1, strides=[1, 1], requires_grad=0, device=cpu)):
  %4 : bool = prim::TensorExprDynamicGuard[types=[Float(SS(-2), SS(-3), strides=[5, 1], requires_grad=0, device=cpu), Float(SS(-4), SS(-3), strides=[5, 1], requires_grad=0, device=cpu), Float(1, 1, strides=[1, 1], requires_grad=0, device=cpu)]](%x_inp, %y_inp, %z_inp)
  %5 : Tensor = prim::If(%4)
    block0():
      %15 : int[] = aten::size(%x_inp)
      %16 : int[] = aten::size(%y_inp)
      %17 : int = prim::Constant[value=1]()
      %18 : int = prim::Constant[value=0]()
      %elem.3 : int = aten::__getitem__(%15, %18) # <string>:40:10
      %elem.5 : int = aten::__getitem__(%15, %17) # <string>:40:10
      %elem.11 : int = aten::__getitem__(%16, %18) # <string>:40:10
      %cat_dim_size.48 : int = aten::add(%elem.3, %elem.11) # <string>:321:29
      %3 : Tensor = prim::TensorExprGroup_0[symbolic_shape_inputs=[-5, -4, -3, -2]](%x_inp, %y_inp, %z_inp, %cat_dim_size.48, %elem.11, %elem.5, %elem.3)
      -> (%3)
    block1():
      %14 : Tensor = prim::FallbackGraph_1(%x_inp, %y_inp, %z_inp)
      -> (%14)
  return ()
  with prim::TensorExprGroup_0 = graph(%x.1 : Float(SS(-2), SS(-3), strides=[5, 1], requires_grad=0, device=cpu),
        %y.1 : Float(SS(-4), SS(-3), strides=[5, 1], requires_grad=0, device=cpu),
        %z : Float(1, 1, strides=[1, 1], requires_grad=0, device=cpu),
        %SS_5 : int,
        %SS_4 : int,
        %SS_3 : int,
        %SS_2 : int):
    %3 : int = prim::Constant[value=0]()
    %4 : Tensor(SS(-2), SS(-3)) = aten::tanh(%x.1)
    %5 : Tensor(SS(-2), SS(-3)) = aten::erf(%4)
    %6 : Tensor(SS(-4), SS(-3)) = aten::relu(%y.1)
    %7 : Tensor[] = prim::ListConstruct(%5, %6)
    %8 : Tensor(SS(-5), SS(-3)) = aten::cat(%7, %3)
    %9 : Tensor(SS(-5), SS(-3)) = aten::hardswish(%8)
    %10 : Tensor(SS(-5), SS(-3)) = aten::mul(%9, %z)
    return (%9)
```

Test Plan: Imported from OSS

Reviewed By: navahgar, anjali411

Differential Revision: D31797466

Pulled By: eellison

fbshipit-source-id: b508d2f5baef6e8e4020955ab1d4bc4b9c7bdfdd
2021-10-28 17:09:03 -07:00
2661507488 Adding support for Symbolic Shapes in Inplace Ops #65642 (#65729)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65729

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang gcramer23

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D31961857

Pulled By: Gamrix

fbshipit-source-id: bfb1e8a66be254638e8e93ade091ab9df6029e8c
2021-10-28 16:49:10 -07:00
d0bc01fac2 ci: Migrate hardcoded docker builds to GHA (#67455)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67455

Migrates docker builds that don't have dependent jobs within the pytorch
repository to our new GHA docker build job

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: malfet, janeyx99

Differential Revision: D31997671

Pulled By: seemethere

fbshipit-source-id: 9d6f58fa8ea8731cf12457fe64dc65e70f3d9f25
2021-10-28 14:50:05 -07:00
6696c59af4 Adding optimizer attribute to SequentialLR (#67406)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/67318 :)
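
A small sketch of the behavior this adds (scheduler arguments here are illustrative):

```
import torch

model = torch.nn.Linear(2, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
warmup = torch.optim.lr_scheduler.ConstantLR(opt, factor=0.5, total_iters=2)
decay = torch.optim.lr_scheduler.ExponentialLR(opt, gamma=0.9)
sched = torch.optim.lr_scheduler.SequentialLR(opt, schedulers=[warmup, decay],
                                              milestones=[2])
assert sched.optimizer is opt  # the attribute this PR adds
```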

cc albanD, datumbox

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67406

Reviewed By: jbschlosser

Differential Revision: D31997873

Pulled By: albanD

fbshipit-source-id: f579fb886d049a545673fd92ef5892fcf501bcc6
2021-10-28 14:43:40 -07:00
354363b57a [SR] Native implementation for aten::size (#67346)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67346

Native ops are faster than falling back to the JIT interpreter, sometimes significantly (we've previously seen this with ops like TupleUnpack). We should improve op coverage where possible.

Test Plan: `buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest`

Reviewed By: d1jang

Differential Revision: D31965159

fbshipit-source-id: 86a69c395f401c4a4c55daa4c5fe80764383c8e5
2021-10-28 14:18:17 -07:00
9f01937caf [PyTorch][easy] Deduplicate memory planner creation code (#67265)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67265

Avoid repeating this initialization code.
ghstack-source-id: 141585971

Test Plan: CI

Reviewed By: hlu1

Differential Revision: D31933368

fbshipit-source-id: 6342ae9bb82c4d152a427bad142470c3d162de69
2021-10-28 14:13:43 -07:00
82c356505f Revert D31894777: [pytorch][PR] Replace issue templates with new issue forms
Test Plan: revert-hammer

Differential Revision:
D31894777 (62feadd76f)

Original commit changeset: fbd39f7ed4ca

fbshipit-source-id: 4698ff5fe8629f9ad0249425a369c6f0bd89c891
2021-10-28 13:52:43 -07:00
afb8434440 [SR] Native implementation for aten::view (#67341)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67341

Native ops are faster than falling back to the JIT interpreter, sometimes significantly (we've previously seen this with ops like `TupleUnpack`). We should improve op coverage where possible.

Test Plan: `buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest`

Reviewed By: hlu1

Differential Revision: D31962589

fbshipit-source-id: 3107fb169c1b02fb2bafbb355c005669b5fa8435
2021-10-28 13:37:46 -07:00
60472594e1 [jit][edge] Implement torch::jit::Function for mobile function. (#65970)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65970

ghstack-source-id: 141842338

mobile::Function should inherit from jit::Function because, for interface call support, we need an abstract jit::Function type stored in the corresponding ClassTypes, so that we can look up methods there. Previously, mobile::Function was implemented separately, which prevented this. Since we got rid of all the unneeded virtual methods from jit::Function, we can inherit from torch::jit::Function without too much cost.

NOTE that torch::jit::Function is already in dependency because we need it to support custom class call. We should be able to use Function uniformly without looking into whether it's a builtin function or mobile::Function.

Test Plan: no behavior change.

Reviewed By: iseeyuan, mrshenli

Differential Revision: D31326148

fbshipit-source-id: 36caeaf3c8c5f54c23a1a7c8c9e2fd6e78b19622
2021-10-28 13:33:30 -07:00
5ef62c88a9 [jit] Replace get_executor() with call() in abstract Function interface. (#65969)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65969

ghstack-source-id: 141759210

Test Plan: no behavior change.

Reviewed By: anjali411

Differential Revision: D31326151

fbshipit-source-id: 201f6dc4c23fdb2531f6b8c73d26127f9e212de4
2021-10-28 13:11:29 -07:00
8363da3f92 [SR][C2][easy] Benchmarks report # of ops (#67436)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67436

This information is useful for comparing static runtime to c2

Reviewed By: d1jang

Differential Revision: D31991571

fbshipit-source-id: eb83bc4564b05d56fb9a550863eea3f6312f3f6c
2021-10-28 13:03:09 -07:00
b8f07689f2 [ROCm] Enable frexp support for ROCm builds (#67226)
Summary:
The frexp function has been enabled in ROCm code.  Updating PyTorch
to enable this functionality.
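
For reference, frexp decomposes a float into mantissa and exponent; a quick sanity check (illustrative):

```
import torch

x = torch.tensor([8.0, 0.5, -3.0])
mantissa, exponent = torch.frexp(x)
# invariant: x == mantissa * 2 ** exponent, with |mantissa| in [0.5, 1)
torch.testing.assert_close(x, mantissa * 2.0 ** exponent)
```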

cc jeffdaily sunway513 jithunnair-amd ROCmSupport KyleCZH

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67226

Reviewed By: jbschlosser

Differential Revision: D31984606

Pulled By: ngimel

fbshipit-source-id: b58eb7f226f6eb3e17d8b1e2517a4ea7297dc1d5
2021-10-28 12:42:09 -07:00
0795735351 [jit] Clean up unneeded virtual methods from Function interface. (#65968)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65968

tryToGraphFunction() should cover all cases and is more composable than
ad-hoc virtual methods.
ghstack-source-id: 141759214

Test Plan: no behavior change.

Reviewed By: gmagogsfm

Differential Revision: D31326154

fbshipit-source-id: 692a35df424f7d4f777a96489c4cbb24b3ae7807
2021-10-28 12:28:48 -07:00
bd5e6fe5ac Skip complex128 dtype for test_addmm_sizes_all_sparse_csr Windows test (#67453)
Summary:
Windows CUDA 11.1 periodic CI is failing. See https://github.com/pytorch/pytorch/pull/63511#issuecomment-953940183.
I don't understand, though, why periodic-win-vs2019-cuda11.1-py3 was triggered on the PR but no tests from `test_sparse_csr.py` were run: https://github.com/pytorch/pytorch/runs/3975200820?check_suite_focus=true.

cc nikitaved pearu cpuhrsch IvanYashchuk mruberry

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67453

Reviewed By: malfet, seemethere, janeyx99

Differential Revision: D31997574

Pulled By: cpuhrsch

fbshipit-source-id: ae8bfb6da865014f39e6ad5675eb17e5a4d39744
2021-10-28 12:24:46 -07:00
5b8b2382d1 Mark mv as CompositeExplicitAutograd (#67373)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67373

mv is implemented as a decomposition into addmv, so it should be a
CompositeExplicitAutograd op.
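
Roughly, the decomposition in question (a sketch, not the exact internal implementation):

```
import torch

A = torch.randn(3, 4)
x = torch.randn(4)

# mv(A, x) behaves like addmv with a zeroed accumulator
out = torch.addmv(torch.zeros(3), A, x, beta=0, alpha=1)
torch.testing.assert_close(out, torch.mv(A, x))
```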

Test Plan: It shouldn't change any behaviors. So, CI.

Reviewed By: bdhirsh

Differential Revision: D31973265

Pulled By: alanwaketan

fbshipit-source-id: 3b6850f08e6f671b908a177f148cc6194baa8a13
2021-10-28 11:59:00 -07:00
f3aae62942 Port tril and triu to structured kernels (#67055)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67055

This PR ports `tril` and `triu` operations to structured kernels.
ghstack-source-id: 141797608

Test Plan: Extended the existing unit tests.

Reviewed By: wanchaol

Differential Revision: D31844638

fbshipit-source-id: 03ea4aa2410b042cafc3c5397e777a9ca5351b39
2021-10-28 11:45:58 -07:00
4a1f73ccb3 [qnnpack] Remove asymmetrical padding parameters in qnnpack (#67102)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67102

Gets rid of the top/bottom and left/right distinction, replacing it with height and width. These parameters are widely used in qnnpack and are always passed together but are never different. PyTorch doesn't support asymmetrical padding either, so I see no potential use for this.
ghstack-source-id: 141334544

Test Plan: qnnpack unit tests

Reviewed By: kimishpatel

Differential Revision: D31863370

fbshipit-source-id: aa57490399e23d6139b2ad7b745139752acd7848
2021-10-28 11:40:19 -07:00
4e873d6799 Formatting changes (#66257)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66257

Used `clang-format -i` for these two files.

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D31762737

Pulled By: H-Huang

fbshipit-source-id: e94e301d0b013dbb8f2cef19ff140bac5811738f
2021-10-28 11:36:00 -07:00
cee4e8f35d Add FlexiBLAS build support per #64752 (#64815)
Summary:
To enable building torch+dependencies, set WITH_BLAS=flexi BLAS=FlexiBLAS

Fixes https://github.com/pytorch/pytorch/issues/64752

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64815

Reviewed By: jbschlosser

Differential Revision: D31997745

Pulled By: albanD

fbshipit-source-id: db208d59002f5896608a03132616400f09d972aa
2021-10-28 11:28:00 -07:00
55b7387e45 Timing cache for TensorRT (#67214)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67214

This is a draft for creating a timing cache for TensorRT.

Reviewed By: yinghai, 842974287

Differential Revision: D31783757

fbshipit-source-id: 211ab68df0832120fa637304e4a7ece80d26f9b1
2021-10-28 11:21:51 -07:00
0032fa7725 Add a Functionalization pass in core (#64432)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64432

Original PR description + feedback here: https://github.com/pytorch/pytorch/pull/63048

I've addressed all of the feedback in the original PR and made some pretty large changes, listed below.

**Table of Contents**
- Starting points
- List of the main changes from the original PR
- Next Steps
- Example codegen output (for a view, mutation, and view+mutation op)

**Starting Points**

A good place to start when looking through the PR:
* Alban mentioned that this is a useful mental model (thanks Ed for originally making this clear to me). Semantically, the pass currently does THREE things, which are all needed by functorch - all fused together into one big pass.
  * (a) alias removal, which replaces {view} calls with {view}_copy calls, and manually tracks aliasing information, so that when one tensor is mutated, we re-apply the same mutation to all of the aliases. This is the bulk of the work - once this is done, the next 2 things are trivial to implement.
  * (b) mutation removal, which is easy to do once we know that there are no aliases. Every mutation `a.add_(b)` becomes `a.replace_(a.add(b))`
  * (c) reapplying views: all of the `{view}_copy` calls are replaced with `{view}` calls again. This is an optimization that we can make specifically for functorch (and strided backends), that only care about mutation removal and not alias removal
  * XLA and Vulkan only want (a), or (a) + (b). Later, we'll want to split this out so that you can actually opt into different versions of this logic.
  * There is currently no {view}_copy replacement, because the <replace views with copies> and <replace copies with views> steps have just been combined. Later, we'll want to actually implement {view}_copy variants of each view operator, probably with codegen.
* documentation breadcrumb 1, in `FunctionalTensorWrapper.cpp`: https://github.com/pytorch/pytorch/pull/64432/files#diff-a0bac99bf205dba5b94cb64fc2466d3d55d991887572f9cd6a02e27b3a91dd60R59 (you might have to expand the `FunctionalTensorWrapper.cpp` file, which GitHub closes by default because it's large)
* documentation breadcrumb 2, in `FunctionalTensorWrapper.h`: https://github.com/pytorch/pytorch/pull/64432/files#diff-c945c71a4ccac65871f24a912e8904f9a5088b24a32e636727ea9c8fe920708aR12
* Reading through the codegen output at the bottom of this description.

**Main changes from the original PR**

(1)  I use lambdas instead of a giant enum to handle all of the different views.

This results in less boilerplate per view op (and more stuff that can be codegen'd). Every `ViewMeta` object now contains a `forward` and `reverse` lambda, that knows how to replay the view and its inverse. This makes the actual code that executes the replaying logic a lot less boilerplate-y (see `Alias::sync_update_operations` and `FunctionalTensorWrapper::sync_`)

(2) Every tensor during the functionalization pass is always wrapped in a `FunctionalTensorWrapper`.

This is potentially unnecessary for Vulkan/XLA, and will have a mild perf impact, but for now this PR just targets the functorch use case. I previously had a complicated design (a `FunctionalTensorImplBase` class) to avoid needing the wrapper for XLA, but it had some subtleties that are going to require more thought to fix, so I'm pushing that off for now.

(3) `FunctionalTensorWrapper` objects accurately report stride information.

It's a little annoying to do this though, because the logic that calculates stride info for each view isn't easily separated from the actual view kernels in core, `at::native::{view}`. I do this by adding logic in each `at::functionalization::{view}` kernel to call the reference implementation `at::native::{view}`. I don't do anything with the output aside from taking its size/stride/storage_offset to set the actual output tensor's size/stride/storage_offset correctly. There's another annoying part to this: I'm pretty sure that we want to pass in the actual *wrapper* tensors directly into the native kernels, not their inner unwrapped values. But there are some `at::native::{view}` kernels that call other tensor methods, which re-invoke the dispatcher, calling functionalization/functorch kernels that try to do the unwrapping.

To do this, right now I have an `AutoDispatchDirectlyToNative` guard that basically ensures that any tensor methods called inside of the at::native::{view} op always redispatch straight to the CPU kernel (which will be another at::native:: kernel). This feels kind of heavy handed, but I'm not sure of a better way to do it.

(4) `FunctionalTensorWrapper` objects accurately report aliasing information.

There's a new `FunctionalStorageImpl` class (subclass of `StorageImpl`) that allows tensors in the functionalization pass to accurately alias storage. If two tensors `a` and `b` in a functionalized program are views of one another, then `a.storage.is_alias_of(b.storage)` should return true. I added this in a pretty similar way to how meta tensors allocate storage, although I don't pass in an actual allocator (I think this is fine because you should never resize a functional tensor's storage).

One thing I'm not sure about - should `FunctionalTensorWrapper` set `storage_access_should_throw_`: (a) always, (b) never, (c) only if its wrapped tensor has it set.

Right now I have it not set, mostly because calling the reference view functions (`at::native::{view}`) requires looking at the storage. But that means that if you try to access storage from python in a functionalized program, you'll get silent garbage instead of an error. Related question: are we planning on exposing meta tensor storage to python in the future (even though it contains garbage)?

(5) better docs :)

**View operator coverage**

(6) The functionalization pass now gets math-composite view ops for free.

I didn't add the `Functionalize` dispatch key to the composite set, because I don't want composite ops like `torch.ones` to get decomposed before hitting the functionalization pass. Instead, I added codegen to manually register the `at::native::` kernels of composite view ops. This is a little hairy, because the names of the `at::native::` kernels aren't easily accessible. They're stored in a `Dict[DispatchKey, BackendIndex]`. I made a best-effort attempt to get each view kernel's name, basically by assuming that every view op has either a composite or cpu implementation.
There's also a hardcoded list of composite view ops in `gen_inplace_or_view_type.py`, but it looks like it's wrong. This is probably worth rationalizing later, but instead I created a new list of the "complete" set of composite view ops, and preserved the old set by hardcoding the delta between the two sets.

(7) I've added codegen for ops that are both views AND mutations, like `transpose_()` (why do we even have these? 😢).

From some light testing, it looks like they work correctly with one caveat: I had a hard time ensuring that functorch programs that mutate their inputs using ops like `transpose_()` preserve the input mutations after the program finishes running. For now (in my corresponding functorch branch) I emit a warning when this happens and just don't preserve the mutation.

(8) I added `{view}_inverse` implementations for every view op, in `FunctionalInverses.cpp`.

These are needed to take mutations made to views and replay them back onto the base. To reduce boilerplate, the codegen generates function declarations for each `{view}_inverse` function, so you get a nice compiler error when someone eventually adds a new view op.

The only view ops currently not supported are (a) as_strided, and (b) the sparse view ops (values()/indices()).

I can add support for as_strided, but it needs an `as_strided_inverse()` function. That will look really similar to the `as_strided_backward()` function in FunctionsManual.cpp, but it has some noticeable differences: we basically want an `as_strided_embed` for autograd and `as_strided_scatter` for functionalization. We also will probably need them to be primitives w.r.t. autograd, since the current implementation for autograd uses view().copy_() calls that XLA won't be able to handle. I'm wondering if anyone has any objections, but otherwise I can make those changes (which will require writing backward formulas for `as_strided_embed` and `as_strided_scatter`).

I did a bunch of manual testing that all looks pretty good, but it's definitely not fully tested. Ed pointed out that once XLA uses this pass (or at least once there's a POC), we can just run the existing xla view test suite. Hopefully that delay is okay - if it's not, maybe we can think about using OpInfos similar to how functorch uses them for testing.

Note: there's some duplication with autograd's view code. Every `{view}_inverse` implementation is really similar to the implementation for that view listed in `derivatives.yaml`. There are some major differences though:
* the autograd implementations of those backward functions (like `permute_backwards()`, in `FunctionsManual.cpp`) internally call other view ops. For functionalization, we want them to (eventually) call `{view}_copy` operators.
* For view ops that take a subset of the original storage, like `slice/select/diagonal/as_strided()`, the autograd backward functions fill the "spaces" in the inverse call with zeroes. For functionalization, we want to fill them with the value of `base` at those positions. It looks like this currently applies to 6 total ops (since we can ignore composites):
  * select
  * slice
  * diagonal
  * as_strided
  * split
  * split_with_sizes
A nice end state would probably be for the autograd + functionalization codegen to both look at the same yaml (either `derivatives.yaml`, or something else) and automatically generate the right thing. I left that out of scope for this PR, though.

**Current State + Next Steps**

There are a bunch of followups after this PR eventually lands. Roughly in order:
* Use the current pass to register problematic composite ops in functorch. Also, nested `functionalize()` calls aren't supported yet (I mostly just need to remove some debug asserts and test it).
* Work on freeing up dispatch key space by deduplicating the `{backend}`/`Autograd{backend}`/`Sparse{backend}`/`Quantized{backend}` keys
* Once we have more dispatch keys, split up this pass into 3 pieces - it's currently fused, and doesn't do the right thing for vulkan/XLA. Specifically, all of the `{view}` calls in the current pass's view-replay logic should turn into `{view}_copy` calls that vulkan/XLA know how to implement, and there will be separate passes for (a) removing mutations, and (b) turning `{view}_copy` calls back into `{view}` calls. For Vulkan, we eventually want a pass that ONLY removes aliasing and view calls, and doesn't remove mutations. We can also probably make the 2 new passes use user dispatch keys to save dispatch key space, since they'll only be used by functorch anyway.
* Do more of a dive on perf for the vulkan/xla use cases. There are several areas to improve perf with varying levels of effort required. The simplest one that I'll probably do regardless is to codegen the out-of-place kernels instead of using a boxed fallback. Getting a POC working for xla will also be useful to test the view operator coverage.

**Example Codegen Output**

View Op:
```
::std::vector<at::Tensor> split_Tensor(c10::DispatchKeySet ks, const at::Tensor & self, int64_t split_size, int64_t dim) {

      auto self_ = at::functionalization::impl::unwrapFunctionalTensor(self);
      ::std::vector<at::Tensor> out;
      {
        at::AutoDispatchBelowFunctionalize guard;
        auto tmp_output = at::redispatch::split(ks & c10::after_func_keyset, self_, split_size, dim);
        out = at::functionalization::impl::wrapFunctionalTensor(tmp_output);
        // I'm fusing the [alias removal], [mutation removal], [add views back] passes together.
        // Later, we'll want to turn them into separate passes (since e.g. vulkan only cares about alias removal).
      }

      at::functionalization::ViewMeta view_meta = at::functionalization::ViewMeta(
        [split_size, dim](const at::Tensor& base, int64_t mutated_view_idx) -> at::Tensor {
          return base.split(split_size, dim)[mutated_view_idx];
        },
        [split_size, dim](const at::Tensor& base, const at::Tensor& mutated_view, int64_t mutated_view_idx) -> at::Tensor {
          return at::functionalization::impl::split_inverse(base, mutated_view, mutated_view_idx, split_size, dim);
        }
      );
      at::functionalization::impl::set_view_meta(out, self, view_meta);

      at::AutoDispatchDirectlyToNative native_guard;
      ::std::vector<at::Tensor> reference_tensor_output = at::native::split(self, split_size, dim);
      at::functionalization::impl::set_strides(out, reference_tensor_output);
      return out;

}
```

Mutation Op:
```
at::Tensor & add__Tensor(c10::DispatchKeySet ks, at::Tensor & self, const at::Tensor & other, const at::Scalar & alpha) {

      at::functionalization::impl::sync(self);
      at::functionalization::impl::sync(other);
      auto self_ = at::functionalization::impl::unwrapFunctionalTensor(self);
      auto other_ = at::functionalization::impl::unwrapFunctionalTensor(other);
      at::Tensor tmp_output;
      {
          at::AutoDispatchBelowFunctionalize guard;
          // The functionalization pass explicitly doesn't pass out= parameters to the redispatch
          tmp_output = at::redispatch::add(
            ks & c10::after_func_keyset, self_, other_, alpha);
      }

      self.replace_(tmp_output);
      at::functionalization::impl::maybe_add_update(self);
      return self;
}
```

View + Mutation Op:
```
at::Tensor & transpose_(c10::DispatchKeySet ks, at::Tensor & self, int64_t dim0, int64_t dim1) {

      at::functionalization::ViewMeta view_meta = at::functionalization::ViewMeta(
        [dim0, dim1](const at::Tensor& base, int64_t mutated_view_idx) -> at::Tensor {
          return base.transpose(dim0, dim1);
        },
        [dim0, dim1](const at::Tensor& base, const at::Tensor& mutated_view, int64_t mutated_view_idx) -> at::Tensor {
          return at::functionalization::impl::transpose_inverse(base, mutated_view, dim0, dim1);
        }
      );
      at::functionalization::impl::mutate_view_meta(self, view_meta);
      // See  Note [Propagating strides in the functionalization pass]
      // Directly update the sizes/strides/storage_offset fields on self using the inplace call.
      // I need the guard because I don't want the at::native kernel to end up calling more functionalization/functorch kernels.
      // Its only job is to directly compute the output size/stride/storage_offset metadata.
      at::AutoDispatchDirectlyToNative native_guard;
      at::native::transpose_(self, dim0, dim1);
      return self;

}
```

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D31942093

Pulled By: bdhirsh

fbshipit-source-id: b95598dae35dd1842fa8b1d8d1448332f3afaadf
2021-10-28 10:51:17 -07:00
b0a8ca2cb5 add tags for inplace view ops in native_functions.yaml (#65412)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65412

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D31942094

Pulled By: bdhirsh

fbshipit-source-id: 1f7f6ea7df13e9f91b81ed64088e35e471800aa8
2021-10-28 10:51:15 -07:00
03f3a0331b add slice/select/diagonal_scatter variants as primitive ops (#64430)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64430

The functionalization pass needs `{view}_scatter` versions of the slice/select/diagonal ops in order to correctly propagate mutations from a view to its base. On top of that, the implementations need to be primitive w.r.t. autograd, because they look something like `...slice().copy_()`, and the functionalization pass can't use views + mutations inside of its own alias-removal machinery!

I added some basic tests that I tried to base off of existing tests for views (particularly around testing the derivative formulas), but I'm wondering if I should add something more comprehensive.

Also, as_strided fits into this category - the functionalization pass will need an `as_strided_scatter` op that's primitive w.r.t. autograd. I didn't add it for now, because it'll involve duplicating a bunch of logic from the current `as_strided_backward()` function, and also writing a derivative formula that I wasn't sure how to write :)
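
For intuition, a sketch of the semantics of one of these ops (assuming it is exposed as `torch.slice_scatter`; the exact naming/signature may differ):

```
import torch

base = torch.zeros(4, 4)
src = torch.ones(2, 4)

# semantically equivalent to: out = base.clone(); out[1:3] = src
# but expressed as a single op that is primitive w.r.t. autograd
out = torch.slice_scatter(base, src, dim=0, start=1, end=3)
```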

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D31942092

Pulled By: bdhirsh

fbshipit-source-id: c702a57c2748a7c771c14e4bcc3e996b48fcc4c8
2021-10-28 10:51:12 -07:00
665c148e42 move some codegen utilities into utils.py (#63094)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63094

This PR:
- Moves `FileManager` and its dependencies (`assert_never` and other imports) to `utils.py`, and updates all of the call-sites with the fresh imports
- Passes the list of NativeFunction objects into `gen_trace_type` directly, instead of requiring the function to regenerate it (we already have it)

The purpose of the reshuffling is to avoid circular dependencies in the next PR, where I add codegen for the functionalization pass, which gets called from `gen.py` (but depends on some stuff from the autograd codegen - in particular, the list of view ops).

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D31942096

Pulled By: bdhirsh

fbshipit-source-id: 36118facae61f25f8922bb43ad2818c80b53504e
2021-10-28 10:49:17 -07:00
b100a9ea82 Back out "Make fb::sigrid_hash_compute_multipler_shift return a std::tuple<int64_t, int64_t>" (#67456)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67456

There are some compatibility issues; we need to back this out before it gets to prod feed models.

Test Plan: CI

Reviewed By: pgarbacki

Differential Revision: D31997684

fbshipit-source-id: 8b2584cb5d43e487719c6530d4178988fd03c455
2021-10-28 10:44:41 -07:00
a8f85300da [quant][graphmode][fx][test] Refactor test code for quant-fx2trt unit tests (#67070)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67070

Test Plan:
python test/test_quantization.py TestQuantizeFxTRTOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D31850124

fbshipit-source-id: a314b8869c091743dad7e5a1468985cf8aff6091
2021-10-28 10:39:58 -07:00
325b15039c Add FSDP tests to verify forward overlap and memory usage (#67117)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67117

Add FSDP tests to verify forward overlap and memory usage
ghstack-source-id: 141783871

Test Plan: unit tests

Reviewed By: mrshenli

Differential Revision: D31845629

fbshipit-source-id: b8b747e036925a9bb9164f0a5546000eece8442a
2021-10-28 10:29:27 -07:00
938afa37a3 Remove process group barrier and all_reduce function calls from tensorpipe agent (#65946)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65946

Adds a new function in agent_utils to synchronize active call counts using the store. This is intended to replace the barrier and all_reduce used by the process group in RPC shutdown.

The `test_ddp_comparison` and `test_ddp_comparison_uneven_inputs` tests fail with these changes. It seems like the RPC agents are not accessing the same store, so the total count of processes never reaches the world size needed to exit the barrier; we still need to investigate why this happens only for these test cases. Setting clean_shutdown to false skips this code path, which allows the tests to pass.

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D31762736

Pulled By: H-Huang

fbshipit-source-id: cb5d0efe196f72726c63393c4293e97ec4f18548
2021-10-28 10:15:56 -07:00
0c93c8e39a Disable linux-xenial-cuda10.2 config (#67344)
Summary:
linux-xenial-cuda10.2 and linux-bionic-cuda10.2 are very similar; there is no
need to run both configs.

Moved all auxiliary builds from xenial to bionic

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67344

Reviewed By: seemethere, janeyx99

Differential Revision: D31964850

Pulled By: malfet

fbshipit-source-id: d07ce266c843c7fd69b281e678c4247b0bf6da20
2021-10-28 10:10:13 -07:00
6ed68f3f84 Document torch.jit.is_tracing() (#67326)
Summary:
This PR adds `torch.jit.is_tracing()` to the JIT API reference.
This function is widely used but left undocumented: https://github.com/search?q=torch.jit.is_tracing&type=code
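
A minimal usage sketch (illustrative):

```
import torch

def fn(x):
    if torch.jit.is_tracing():
        # take a trace-friendly path: no data-dependent control flow
        return x.relu()
    return x.relu() if x.sum() > 0 else x

traced = torch.jit.trace(fn, torch.randn(3))
```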

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67326

Reviewed By: tugsbayasgalan

Differential Revision: D31985251

Pulled By: Krovatkin

fbshipit-source-id: 852b432b08d63df8bd7a7a02c9555e61f5f37978
2021-10-28 09:56:09 -07:00
b27b1ff809 Fix deadlock when forward and backward AD are used at the same time (#67360)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67360

Test Plan: Imported from OSS

Reviewed By: soulitzer

Differential Revision: D31973040

Pulled By: albanD

fbshipit-source-id: f9c75c6497b622c86e8653027bce45461304eff5
2021-10-28 09:11:36 -07:00
d3f03af496 Fix indentation in forward_grad.h (#67359)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67359

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D31973039

Pulled By: albanD

fbshipit-source-id: 80ca7870ea35977560334aa65aa344da6847c039
2021-10-28 09:10:18 -07:00
6900aacf54 [fbcode] Fix operator_benchmark with jit mode (#67382)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67382

two simple updates:

* fix running the benchmark with --use_jit. Previously it would fail with the error:

  torch.jit.frontend.UnsupportedNodeError: import statements aren't supported:
  File "/proc/self/fd/3/bmm_test.py", line 9
  def __invoke_main():
    import ctypes
    ~~~~~~ <--- HERE
    import ctypes.util
    import errno

* add matmul to the bmm benchmark, as in D31837588

Test Plan:
buck run mode/opt caffe2/benchmarks/operator_benchmark/pt:bmm_test --  --forward_only=True --mkl_num_threads=1 --omp_num_threads=1
 --use_jit=True

Reviewed By: ShijunK

Differential Revision: D31960528

fbshipit-source-id: 84b892934149784d1b8a0f90b0233cc2f1cf1f5f
2021-10-28 08:48:10 -07:00
eb8b80b76f Add test owners for elastic tests (#67293)
Summary:
Following discussion with the distributed and r2p teams, the tests under elastic in distributed should be owned by oncall: r2p, not distributed.

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67293

Reviewed By: jbschlosser

Differential Revision: D31973779

Pulled By: janeyx99

fbshipit-source-id: 05875a7600c6eb1da1310a48e1e32a1a69461c55
2021-10-28 08:32:50 -07:00
2366948085 [LT] Add ir_util for ComputePostOrder (#67282)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67282

Test Plan: `build/bin/test_lazy`

Reviewed By: wconstab, ngimel

Differential Revision: D31961754

Pulled By: desertfire

fbshipit-source-id: 28466588ece8057640a7202b8c79cc1a4357d373
2021-10-28 08:17:52 -07:00
6293e0ad61 update coverage ignore to not skip whole modules (#67395)
Summary:
This reduces the chance of newly added functions being ignored by mistake.

The only test that this impacts is the coverage test that runs as part of the python doc build. So if that one works, it means that the update to the list here is correct.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67395

Reviewed By: jbschlosser

Differential Revision: D31991936

Pulled By: albanD

fbshipit-source-id: 5b4ce7764336720827501641311cc36f52d2e516
2021-10-28 08:07:24 -07:00
961fd76a9a [ONNX] Relax check on Prim::PythonOp nodes for ONNX_FALLTHROUGH (#66172) (#67273)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67273

* Relax check on Prim::PythonOp nodes for ONNX_FALLTHROUGH

* Add tests

Test Plan: Imported from OSS

Reviewed By: msaroufim

Differential Revision: D31962521

Pulled By: malfet

fbshipit-source-id: 878920196d66c4f1dadaf3ebb9a7bf69b88849b4
2021-10-28 08:02:49 -07:00
02a78bdba7 [ONNX] Support conv-bn fusion in blocks (#66152) (#67272)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67272

* Support conv-bn fusion in nested blocks

* avoid running script tests twice

Test Plan: Imported from OSS

Reviewed By: msaroufim

Differential Revision: D31962513

Pulled By: malfet

fbshipit-source-id: 3ee79426542f9049cf62ac7b0c1be9d60ae6d014
2021-10-28 08:02:46 -07:00
9deb602726 [ONNX] Use Reciprocal operator instead of Div(1, x). (#65382) (#67271)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67271

* [ONNX] Use Reciprocal operator instead of Div(1, x).

This is a more readable and perhaps more performant way to export
torch.reciprocal.

* Use Reciprocal in the caffe2 operator when importing ONNX
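
A sketch of the export path in question (illustrative; after this change the exported graph should contain a single Reciprocal node):

```
import torch

class Reciprocal(torch.nn.Module):
    def forward(self, x):
        return torch.reciprocal(x)

# previously exported as Div(Constant(1), x); now as Reciprocal(x)
torch.onnx.export(Reciprocal(), torch.randn(3), "reciprocal.onnx")
```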

Test Plan: Imported from OSS

Reviewed By: msaroufim

Differential Revision: D31962519

Pulled By: malfet

fbshipit-source-id: d926e75b1c8312b9a980c9a1207a1a93ba0c71e0

Co-authored-by: take-cheeze <takechi101010@gmail.com>
2021-10-28 08:01:21 -07:00
eea20bfa15 fixed type checking errors in fuse.py (#66799)
Summary:
Fixes [Issue#70](https://github.com/MLH-Fellowship/pyre-check/issues/70)
This PR fixes the type checking error that was found in fuse.py as follows:

torch/quantization/fx/fuse.py:34:13 Incompatible variable type [9]: fuse_custom_config_dict is declared to have type `Dict[str, typing.Any]` but is used as type `None`.

Signed-off-by: Onyemowo Agbo

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66799

Reviewed By: 0xedward

Differential Revision: D31961462

Pulled By: onionymous

fbshipit-source-id: 7481afc07152ba13f3224e4ad198fd8e2c34c880
2021-10-28 07:45:28 -07:00
7da9c4ed2e [SR] NNC out variant for aten::where (#67255)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67255

Add an out variant for `aten::where`.

Since this op can be implemented quite trivially in NNC with `ifThenElse`, I added an NNC kernel as well.
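
For reference, `aten::where` is an element-wise select, which is why it maps directly onto NNC's `ifThenElse` (illustrative):

```
import torch

cond = torch.tensor([True, False, True])
a = torch.tensor([1.0, 2.0, 3.0])
b = torch.tensor([10.0, 20.0, 30.0])

# out[i] = a[i] if cond[i] else b[i]
print(torch.where(cond, a, b))  # tensor([ 1., 20.,  3.])
```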

Test Plan: Unit tests: `buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest`

Reviewed By: navahgar

Differential Revision: D31923886

fbshipit-source-id: b4379ee3aaf31a000e626b4caeafd3e3f3d60837
2021-10-28 06:48:22 -07:00
3aadff651c [quant][embedding qat][bugfix] Fix and test QAT EmbeddingBag from_float error message (#66989)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66989

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D31961773

Pulled By: b-koopman

fbshipit-source-id: 0d28728c87751ffc696ac221c3e8e75ac923cc57
2021-10-28 06:29:20 -07:00
62feadd76f Replace issue templates with new issue forms (#65917)
Summary:
This PR introduces the new issue forms that replace issue templates.

This is similar to what was done in torchvision https://github.com/pytorch/vision/pull/4299 and torchaudio, you can see the end result here: https://github.com/pytorch/vision/issues/new/choose (click e.g. on the [bug report](https://github.com/pytorch/vision/issues/new?assignees=&labels=&template=bug-report.yml))

The main new thing is that we can enforce some of the fields to be filled, especially for bug reports. It's also a much cleaner GUI for users IMHO, and we can provide better examples and instructions.

There is still a "blank" template available.

I removed the "Questions" form: we say we close these issues anyway. I replaced it with a direct link to https://discuss.pytorch.org. Since we still have a "blank" template, I think this  covers all previous use-cases properly.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65917

Reviewed By: VitalyFedyunin

Differential Revision: D31894777

Pulled By: NicolasHug

fbshipit-source-id: fbd39f7ed4cadab732d106d3166c04c451c31f94
2021-10-28 04:49:47 -07:00
6827d36c1a [Static Runtime][DI] Fuse list unpack and variadic_grouped_accessor_op (#66585)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66585

Add a new op `static_runtime::fused_variadic_grouped_accessor_op` that outputs many tensors rather than a single tensor list. Incorporated this new op into `FuseListUnpack`. This eliminates `ListUnpack` overhead and tensor refcount bumps.

Test Plan:
**Accuracy Test**

Model 294738512_40 (manually confirmed that fusion happens)
```
get 2861 prediction values
get 2861 prediction values
max_error:  0  total:  0
```

Accuracy test with model 296213501_65 (has V2 op): passes with 0 errors.

**Performance**

TW replayer test w/ 800 QPS (stacked with D31482816 (72e25c9f4e)) shows 5% CPU decrease for storage tier.
Results:

{F673610679}

Reviewed By: hlu1

Differential Revision: D31620408

fbshipit-source-id: f05c298bcbce61a491b63d575af4aca746881696
2021-10-28 04:34:57 -07:00
90b722c544 specializeGradSumToSize patch - propagate profile_none through profile_ivalue (#63941)
Summary:
Simply propagates the profile_none_ value through profile_ivalue nodes inserted by nvfuser.

Without the propagation, profile_ivalue nodes inserted by other passes would block the optimization of no-op sum_to_size.

cc gmagogsfm

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63941

Reviewed By: shunting314, cpuhrsch

Differential Revision: D31972765

Pulled By: Krovatkin

fbshipit-source-id: 4fa571a758e269b486c584f47c2a933de82d463b
2021-10-27 22:54:09 -07:00
fc664ac272 [sharded_tensor] easier initialization for Shard (#66351)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66351

This adds the ability for users to just provide shard_offsets and, optionally, rank to construct a local shard, instead of needing to know about ShardedMetadata. Under the hood, we construct the ShardedMetadata by inferring shard_lengths and device from the local tensor.
ghstack-source-id: 141742410

Test Plan: test_local_shards

Reviewed By: pritamdamania87

Differential Revision: D31519919

fbshipit-source-id: 8f3b4682ffc74b79b41076f3f4b832f4cacda49d
2021-10-27 22:20:37 -07:00
71a67d0ce9 [sharded_tensor] simplify init_from_local_shards API (#64481)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64481

This simplifies the `init_from_local_shards` API in sharded tensor to only require the user to pass in a list of `Shard`s and `overall_size`, instead of a ShardedTensorMetadata. We do the all_gather inside to form a valid ShardedTensorMetadata instead.

TODO: add more test cases to improve coverage.
ghstack-source-id: 141742350

Test Plan: TestShardedTensorFromLocalShards

Reviewed By: pritamdamania87

Differential Revision: D30748504

fbshipit-source-id: 6e97d95ffafde6b5f3970e2c2ba33b76cabd8d8a
2021-10-27 22:19:20 -07:00
0117ada47c [quant][graphmode][fx] Add input_idx_to_dtype and ouptut_idx_to_dtype to backend_config_dict (#67067)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67067

We plan to gradually add features to backend_config_dict; this PR adds support
for specifying the dtype of the input and output of a given pattern.

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D31849074

fbshipit-source-id: ca2fbb873176fe72e08ea79ed1bc659bf27cbd8a
2021-10-27 22:10:12 -07:00
e332d80299 [iOS][CoreML] Remove shape information from TensorSpec (#67412)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67412

For inputs, we'll be using the shape from PyTorch tensors. For outputs, we'll be using the shape from MLMultiArray. Thus, we can decouple from the symbolic shapes defined in the compile spec.
ghstack-source-id: 141746346

Test Plan:
- Sandcastle
- buck test pp-ios

Reviewed By: hanton

Differential Revision: D31299408

fbshipit-source-id: 337d5bb9efc2ff51409586c288d607399b937212
2021-10-27 21:55:29 -07:00
04aba42ed7 [Core ML] Assign Core ML computationUnit to executor (#67411)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67411

This was overlooked before.
ghstack-source-id: 141746345

Test Plan: buck test pp-ios

Reviewed By: hanton

Differential Revision: D31977097

fbshipit-source-id: f5ce9f7d58c3f35097caaa75f75310a89c918387
2021-10-27 21:55:27 -07:00
7e1a53cd5c [Core ML] Fix error messages (#67410)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67410

As title
ghstack-source-id: 141537215

Test Plan: buck test pp-ios

Reviewed By: hanton

Differential Revision: D31901372

fbshipit-source-id: 80ae1cf8cb67c0e2ca276e21cc80b1ff799437a4
2021-10-27 21:54:14 -07:00
fae1c0a434 [PyTorch] Reduce refcount bumps in ClassType (#66724)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66724

Forwarding fix from previous diff through the ClassType getters & moving Types in where possible.

ghstack-source-id: 141594741

Test Plan: CI

Reviewed By: suo

Differential Revision: D31697995

fbshipit-source-id: 05d6af7c23e3b7a94db75b20d06338bc9ade3e20
2021-10-27 19:32:33 -07:00
c8dd90c858 [PyTorch] Fix extra refcount bumps in ClassAttribute (#66723)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66723

Missing move in constructor and forced copy in getter.
ghstack-source-id: 141594742

Test Plan: CI

Reviewed By: suo

Differential Revision: D31697702

fbshipit-source-id: c2018531e7ec4a4853cd003ea3753273a5fae7fb
2021-10-27 19:31:22 -07:00
1cfdb6f4c6 [quant][fx] add pass to duplicate dequant nodes with multi use (#67118)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67118

Fixes a bug in the reference pattern support for nn.Linear when the same quantized input is shared across multiple Linear nodes.

This PR adds a pass to duplicate the dequant nodes for each use, so that a case like
```
x -> quant -> dequant -> linear1 -> quant1
                  |
                  +----> linear2 -> quant2
```
becomes
```
x -> quant -> dequant1 -> linear1 -> quant1
         |
         +--> dequant2 -> linear2 -> quant2
```
so that we can match each pattern in the lowering step.

We also add a pass to remove the extra/duplicate dequant nodes that may be left over from the above pass when they are not lowered via a pattern match.

Test Plan:
python test/test_quantization.py test_ref_pattern_multi_use

Imported from OSS

Reviewed By: mrshenli

Differential Revision: D31873511

fbshipit-source-id: aea0819222f084635157426743a50e065e6503c3
2021-10-27 18:25:35 -07:00
9e175400ac Moving python binding to _C and its decl to the right pyi file (#67365)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67365

Reviewed By: malfet, albanD

Differential Revision: D31972163

Pulled By: Krovatkin

fbshipit-source-id: e5313c2c8cb810b57b7fe16af8ba26edbe486488
2021-10-27 17:33:45 -07:00
882446c1d2 add frozen_numpy to :builtin_registry_cuda target (#67396)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67396

frozen_numpy did not work on GPU since we didn't add register_frozennumpy to the :builtin_registry_cuda target.

This was not found earlier because the unit test we added to test_deploy.cpp is only run on CPU. On GPU, we run test_deploy_gpu.cpp, which does not contain the added unit tests for numpy.
In this diff, I just duplicate the unit tests for numpy (and pyyaml) across test_deploy.cpp and test_deploy_gpu.cpp.
I think ideally we should consolidate these two files into a single one, so we can add unit tests in a single place while running them on both hardware platforms.
Tracking task: T104399180
ghstack-source-id: 141750276

Test Plan: buck test mode/opt :test_deploy_gpu

Reviewed By: suo

Differential Revision: D31978156

fbshipit-source-id: 2f5cd55ca33107cc7d230b72f1353df81f0a3bda
2021-10-27 17:29:25 -07:00
9ebc6357b3 [SR] Vectorize int version of fmod (#67313)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67313

Reviewed By: swolchok

Differential Revision: D31889868

fbshipit-source-id: a0af399431a0d672fa56cf2f2ba6d548c47bcedd
2021-10-27 17:02:53 -07:00
dea8b27433 [Pytorch Edge] Make some torchbind classes selective (#67340)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67340

Currently, Torchbind classes aren't selective. This adds a rough-granularity pass that will remove entire classes if they aren't selected. If we need finer granularity in the future, we can make individual methods within classes selective, though instrumenting that will be significantly more involved, I think. On a Linux build, only __torch__.torch.classes._nnapi.Compilation remains unselective; I can't find where it's registered :P (there are a couple of Android-only ones and presumably some Metal-only ones as well).

Many of the class-registration functions returned a reference to the class that was created. I talked with dreiss about it and we decided that this seemingly didn't serve any purpose, and leaving it like that would make the return value difficult (but possible) to create with selectivity. Since it seems useless anyway, I just changed them to return an int so that they can still be called from a global scope without any issues with the return type.
ghstack-source-id: 141690776

Test Plan: CI, model unit tests, test models in prod apps

Reviewed By: dhruvbird

Differential Revision: D31092564

fbshipit-source-id: 657f7eb83490292436c15cf134ceca9b72fb9e1a
2021-10-27 16:58:27 -07:00
f20614af21 [jit] Allow custom class functions to be traced in invokeScriptMethodFromPython(). (#67380)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67380

Test Plan: eyes

Reviewed By: tugsbayasgalan

Differential Revision: D31975656

fbshipit-source-id: 47c8c9854899e9fed5a635f88470711dc4c95970
2021-10-27 16:38:50 -07:00
2267a984eb [ROCm] Add sparse mappings for CUDA->HIP translation (#67323)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67323

Applied patch proposed by Jeff https://github.com/pytorch/pytorch/pull/63948#issuecomment-952166982.
In PyTorch, we map cuBLAS->rocBLAS and cuSPARSE->hipSPARSE. Note the prefix, roc versus hip.
The 'hip' APIs offer a more direct CUDA-friendly mapping, but calling rocBLAS directly has better performance.
Unfortunately, the `roc*` types and `hip*` types differ, i.e., `rocblas_float_complex` versus `hipComplex`.
In the case of SPARSE, we must use the hip types for complex instead of the roc types,
but the pytorch mappings assume roc. Therefore, we create a new SPARSE mapping that has a higher priority.
Its mappings will trigger first, and only when a miss occurs will the lower-priority pytorch mapping take place.
When a file contains "sparse" in the filename, a mapping marked with API_SPARSE is preferred over other choices.

cc jeffdaily sunway513 jithunnair-amd ROCmSupport KyleCZH

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D31969246

Pulled By: cpuhrsch

fbshipit-source-id: 4ce1b35eaf9ef0d146a0955ce70c354ddd8f4669
2021-10-27 16:28:37 -07:00
708f7b1209 Update extending doc to cover forward mode AD (#66962)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66962

Reviewed By: VitalyFedyunin

Differential Revision: D31897782

Pulled By: albanD

fbshipit-source-id: 64164783a14a7ed4cedc17da28f1181d9807a499
2021-10-27 14:18:38 -07:00
d9a5668983 [ONNX] Add dim argument to all symbolic (#66093) (#67270)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67270

* Add a dim argument to the `all` symbolic

* The `all` symbolic now depends on the `any` symbolic

Test Plan: Imported from OSS

Reviewed By: msaroufim

Differential Revision: D31962518

Pulled By: malfet

fbshipit-source-id: f7ee05cf4eff5880fc508154267e060952b5b42d
2021-10-27 13:46:31 -07:00
cb15df76ad [ONNX] Update onnxruntime to 1.9 for CI (#65029) (#67269)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67269

Test Plan: Imported from OSS

Reviewed By: ngimel, msaroufim

Differential Revision: D31962516

Pulled By: malfet

fbshipit-source-id: 39b3c6a4a05d7b769f0ef5ce7ea597209516cde2
2021-10-27 13:45:07 -07:00
9900310133 Fix sign warnings in CUDA kernels (#66753)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66753

Fixes these Wextra compilation errors:
```
stderr: caffe2/aten/src/ATen/native/cuda/UnarySignKernels.cu: In lambda function:
caffe2/aten/src/ATen/native/cuda/UnarySignKernels.cu:49:72: error: comparison is always false due to limited range of data type [-Werror=type-limits]
   49 |   AT_DISPATCH_ALL_TYPES_AND2 (44fd312604)(kBFloat16, ScalarType::Half, iter.input_dtype(), "signbit_cuda", [&]() {
      |                                                                      ~~^~~
stderr: caffe2/aten/src/ATen/native/cuda/BinaryMulDivKernel.cu: In lambda function:
caffe2/aten/src/ATen/native/cuda/BinaryMulDivKernel.cu:99:86: error: comparison is always false due to limited range of data type [-Werror=type-limits]
   99 |     AT_DISPATCH_INTEGRAL_TYPES(dtype, "div_floor_cuda", [&]() {
      |                                                                                      ^
caffe2/aten/src/ATen/native/cuda/BinaryMulDivKernel.cu:99:97: error: comparison is always false due to limited range of data type [-Werror=type-limits]
   99 |     AT_DISPATCH_INTEGRAL_TYPES(dtype, "div_floor_cuda", [&]() {
      |                                                                                                 ^
stderr: caffe2/aten/src/ATen/native/cuda/BinaryMulDivKernel.cu: In lambda function:
caffe2/aten/src/ATen/native/cuda/BinaryMulDivKernel.cu:99:86: error: comparison is always false due to limited range of data type [-Werror=type-limits]
   99 |     AT_DISPATCH_INTEGRAL_TYPES(dtype, "div_floor_cuda", [&]() {
      |                                                                                      ^
```
And also these warnings:
```
caffe2/c10/util/Half.h(461): warning: pointless comparison of unsigned integer with zero
          detected during instantiation of "std::enable_if<<expression>, __nv_bool>::type c10::overflows<To,From>(From) [with To=size_t, From=unsigned long]"
caffe2/aten/src/ATen/native/Resize.h(45): here
caffe2/c10/util/Half.h(459): warning: pointless comparison of unsigned integer with zero
          detected during instantiation of "std::enable_if<<expression>, __nv_bool>::type c10::overflows<To,From>(From) [with To=size_t, From=unsigned long]"
caffe2/aten/src/ATen/native/Resize.h(45): here
```
I thought I'd fixed this previously using `std::is_unsigned` in D25256251 (cff1ff7fb6), but apparently that was insufficient.

Test Plan: Sandcastle

Reviewed By: malfet, ngimel

Differential Revision: D31708173

fbshipit-source-id: 7714f6bbf109d2f2164630d3fc46bad18046c06c
2021-10-27 13:39:27 -07:00
3a1aa31a2f Add dummy bfloat16 VSX implementation (#67331)
Summary:
Just a copy of the DEFAULT bfloat16 implementation; also reverts the restriction
introduced by https://github.com/pytorch/pytorch/pull/61630

Fixes https://github.com/pytorch/pytorch/issues/66867 and https://github.com/pytorch/pytorch/issues/62016

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67331

Reviewed By: ngimel

Differential Revision: D31959916

Pulled By: malfet

fbshipit-source-id: 8ca5e65ca041fef67ee18ddbb215cff01fd1e004
2021-10-27 13:35:38 -07:00
7484941eaa Wrap TRTInterpreter result with wrapper (#67307)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67307

Wrap the TRTInterpreter result so that any future change to the output params is less likely to break existing use cases.

Test Plan: Run test with all touched file

Reviewed By: 842974287

Differential Revision: D31945634

fbshipit-source-id: 7cf73a1ef0098bff2013815f2f1692233ef7ec14
2021-10-27 13:24:50 -07:00
fa70d72e95 Set kernel func name from AOT Compiler (#67229)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67229

Right now, the assembly code generated for a given method from the model is named `wrapper` or `func` by default. The function name is then replaced with a proper kernel_func_name after the target-specific assembly is generated.
This PR propagates the desired kernel_func_name right from the aotCompiler API so that the generated function gets the needed name directly and it doesn't need to be replaced later.

Note: Most of this change was landed in https://github.com/pytorch/pytorch/pull/66337, which had to be reverted because it broke `test_profiler` in `test_jit_fuser_te` by replacing the name generated for the graph with the default kernel_func_name value. This PR fixes that as well.

```
(pytorch)  ~/local/pytorch kname
└─ $ python3 test/test_jit_fuser_te.py
CUDA not available, skipping tests
monkeytype is not installed. Skipping tests for Profile-Directed Typing
........................................<string>:3: UserWarning: torch.cholesky is deprecated in favor of torch.linalg.cholesky and will be removed in a future PyTorch release.
L = torch.cholesky(A)
should be replaced with
L = torch.linalg.cholesky(A)
and
.
.
.
......................<string>:3: UserWarning: torch.symeig is deprecated in favor of torch.linalg.eigh and will be removed in a future PyTorch release.
The default behavior has changed from using the upper triangular portion of the matrix by default to using the lower triangular portion.
L, _ = torch.symeig(A, upper=upper)
should be replaced with
L = torch.linalg.eigvalsh(A, UPLO='U' if upper else 'L')
and
L, V = torch.symeig(A, eigenvectors=True)
should be replaced with
L, V = torch.linalg.eigh(A, UPLO='U' if upper else 'L') (Triggered internally at  ../aten/src/ATen/native/BatchLinearAlgebra.cpp:2492.)
......[W pybind_utils.cpp:35] Warning: Using sparse tensors in TorchScript is experimental. Many optimization pathways have not been thoroughly tested with sparse tensors. Please include the fact that the network is running sparse tensors in any bug reports submitted. (function operator())
/data/users/priyaramani/pytorch/torch/testing/_internal/common_utils.py:403: UserWarning: Using sparse tensors in TorchScript is experimental. Many optimization pathways have not been thoroughly tested with sparse tensors. Please include the fact that the network is running sparse tensors in any bug reports submitted. (Triggered internally at  ../torch/csrc/jit/python/pybind_utils.h:691.)
  return callable(*args, **kwargs)
.....................................................................[W Resize.cpp:23] Warning: An output with one or more elements was resized since it had shape [1], which does not match the required output shape [].This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (function resize_output_check)
[W Resize.cpp:23] Warning: An output with one or more elements was resized since it had shape [1, 5], which does not match the required output shape [5].This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (function resize_output_check)
........................................................................s.......s...s.s....s......s..sss............................
----------------------------------------------------------------------
Ran 503 tests in 37.536s

OK (skipped=10)
```

Test Plan: Imported from OSS

Reviewed By: navahgar, pbelevich

Differential Revision: D31945713

Pulled By: priyaramani

fbshipit-source-id: f2246946f0fd51afba5cb6186d9743051e3b096b
2021-10-27 13:10:49 -07:00
5347dab851 Set test owners for onnx tests (#66860)
Summary:
Action following https://github.com/pytorch/pytorch/issues/66232

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66860

Reviewed By: malfet

Differential Revision: D31964696

Pulled By: janeyx99

fbshipit-source-id: 4e77d1bda92d9107ca0b90a06d24fa4477ceaffa
2021-10-27 12:50:45 -07:00
72e25c9f4e [Static Runtime][DI] Add variadic grouped_accessor_op (#66289)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66289

Add a variadic version of `grouped_accessor_op` to eliminate list construction overhead and associated refcount bumps in static runtime.

Test Plan:
Accuracy test with model 294738512_40: passes with 0 errors.
Accuracy test with model 296213501_65 (has V2 op): passes with 0 errors.

**Perf impact**

TW replayer test w/ 800 QPS (stacked with D31620408) shows a ~5% CPU decrease for the storage tier.
Results:

{F673610665}

Reviewed By: hlu1

Differential Revision: D31482816

fbshipit-source-id: 14393da122cefd094c3e4f423beb897c1d17b32c
2021-10-27 12:29:33 -07:00
1ec732bc46 Add fp16/fp32 autocasting to JIT/TorchScript (#63939)
Summary:
Adds mixed-precision autocasting support between fp32/fp16 to TorchScript/JIT. A more in-depth description can be found at [torch/csrc/jit/JIT-AUTOCAST.md](https://github.com/pytorch/pytorch/pull/63939/files#diff-1f1772aaa508841c5bb58b74ab98f49a1e577612cd9ea5c386c8714a75db830b)

This PR implements an autocast optimization pass that inserts casting ops per the AMP rules (torch/csrc/jit/passes/autocast.cpp), mimicking the behavior of eager autocast. The pass also takes the context of `torch.cuda.amp.autocast` into consideration and only inserts casting ops within the enabled context manager, giving feature parity with eager AMP autocast.

We currently provide JIT AMP autocast as a prototype feature, so it is off by default and can be turned on via `torch._C._jit_set_autocast_mode(True)`.
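A minimal sketch of enabling and using the prototype (requires a CUDA build; the mm output dtype follows the AMP rules described above):

```
import torch

torch._C._jit_set_autocast_mode(True)  # the prototype is off by default

@torch.jit.script
def fn(a, b):
    with torch.cuda.amp.autocast():
        return torch.mm(a, b)  # the pass inserts fp16 casts around mm

a = torch.rand(4, 4, device="cuda")
b = torch.rand(4, 4, device="cuda")
assert fn(a, b).dtype == torch.half  # mm runs in fp16 under AMP rules
```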

The JIT support for autocast is subject to different constraints compared to the eager-mode implementation (mostly related to the fact that TorchScript is statically typed); restrictions on the user-facing Python code are described in the doc torch/csrc/jit/JIT-AUTOCAST.md

This is a prototype; there are also implementation limitations necessary to keep this PR small and get something functioning quickly upstream, so we can iterate on the design.

A few limitations/challenges not properly resolved in this PR:
1. Autocast inserts cast operations, which affect the scalar type of output tensors feeding downstream operations. We are not currently propagating the updated scalar types, which could give wrong results for operations subject to type-promotion rules.

2. Backward for autodiff in JIT misses casting dgrad to the input scalar type, as autograd does in eager mode. This forces us to explicitly mark the casting operation for certain operations (e.g. binary ops); otherwise, we might feed a dgrad with a mismatched scalar type to the input. This could potentially break gradient functions consuming dgrad (e.g. gemm backward, which assumes grad_output has the same scalar type as the input).

3. The `torch.autocast` API has an optional `dtype` argument, which is not currently supported in JIT autocast; we require a static value.

Credit goes mostly to:
tlemo
kevinstephano

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63939

Reviewed By: navahgar

Differential Revision: D31093381

Pulled By: eellison

fbshipit-source-id: da6e26c668c38b01e296f304507048d6c1794314
2021-10-27 12:11:36 -07:00
0101b1ea2b [skip-ci] .github: Set linux gpu instances to be non-ephemeral (#67345)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67345

We were hitting capacity issues; setting these to non-ephemeral means
keeping the current capacity at the expense of "unclean" nodes

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D31965477

Pulled By: seemethere

fbshipit-source-id: 6d45fb34d07d55c5112db065af2aa0a8b1fd8d1f
2021-10-27 11:55:45 -07:00
b55a2500d2 [jit] Remove graph() call from abstract Function interface. (#65967)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65967

Graph is an implementation detail. If a user wants access to the
underlying graph, they should explicitly dynamic-cast instead.
ghstack-source-id: 141659819

Test Plan: no behavior change.

Reviewed By: gmagogsfm

Differential Revision: D31326153

fbshipit-source-id: a0e984f57c6013494b92a7095bf5bb660035eb84
2021-10-27 11:54:26 -07:00
7c48b9ee25 Sparse CSR CUDA: add triangular_solve_out (#61858)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61858

This PR adds `triangular_solve_out_sparse_csr_cuda`. The operation is
used to compute the solution of a linear system whose coefficient
matrix is triangular.
Structured kernels are used, and the meta function needed some changes to
support the sparse CSR layout. With a sparse matrix input, the `cloned_coefficient`
tensor is a 0-sized tensor.

cc nikitaved pearu cpuhrsch IvanYashchuk ngimel

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D31948435

Pulled By: cpuhrsch

fbshipit-source-id: 7775fece83ca705a26d75f82aead10b956b14bfd
2021-10-27 11:12:20 -07:00
4b9464f4b9 [fx]Early return if a node tries prepend self (#67068)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67068

Prepending a node to itself results in the node being removed from the graph.

Usually people won't prepend a node to itself. But one might accidentally try to append a node that's already next to the `self` node, which amounts to prepending `self` to `self`.
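A minimal sketch of that failure mode (variable names are for illustration):

```
import torch
import torch.fx

def f(x):
    return x + 1

gm = torch.fx.symbolic_trace(f)
add_node = list(gm.graph.nodes)[1]  # nodes: placeholder, add, output

# Node.append(n) prepends n to self's successor, so appending the node
# that is already `add_node.next` amounts to prepending that node to
# itself; without the early return it would be unlinked from the graph.
add_node.append(add_node.next)
gm.graph.lint()  # with the fix, the call above is a no-op
```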

Test Plan: Added a unit test

Reviewed By: jamesr66a

Differential Revision: D31849030

fbshipit-source-id: b0fdfbb893f785f268595acd823b426d57c15e61
2021-10-27 10:49:45 -07:00
2669e4ed4e Revert D31945507: .github: Switch 8xlarge to 4xlarge instance_type
Test Plan: revert-hammer

Differential Revision:
D31945507 (1541bb823a)

Original commit changeset: fb8587de7f31

fbshipit-source-id: 3760f5610f0c9cd5298a35490c549e56f7396aaf
2021-10-27 10:02:51 -07:00
7d1c0992e1 GHA: add back runner type for distributed tests (#67336)
Summary:
Addresses https://github.com/pytorch/pytorch/pull/67264#issuecomment-953031927

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67336

Test Plan:
the 8x is used for the distributed config
![image](https://user-images.githubusercontent.com/31798555/139103861-38d7dc37-ca8b-4448-b3ec-facc24aee342.png)

Reviewed By: malfet

Differential Revision: D31961179

Pulled By: janeyx99

fbshipit-source-id: cd21e2bf2a7c6602c9a42a53759b720959e43b8d
2021-10-27 09:34:18 -07:00
f2f7b02b4c Add support for vmap+fwdAD for basic out-of-place op (#66291)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66291

In this PR:
 - Trivial batching rules for `make_dual` and `is_same_size` that enable forward ad + vmap functionality
 - Adds a check in gradcheck that is performed when both `check_batched_grad` and `check_forward_ad` are `True` (an OpInfo using this is added later in the stack); usage is sketched after this list
 - Tests for the gradcheck functionality
 - Tests that basic out-of-place op works
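A minimal sketch of the gradcheck usage that exercises the new path, using the flags named above:

```
import torch
from torch.autograd import gradcheck

x = torch.randn(3, dtype=torch.double, requires_grad=True)
# with both flags set, gradcheck also checks the batched (vmapped)
# forward-mode gradients enabled by this stack
gradcheck(torch.sin, (x,), check_forward_ad=True, check_batched_grad=True)
```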

Test Plan: Imported from OSS

Reviewed By: albanD, saketh-are

Differential Revision: D31842018

Pulled By: soulitzer

fbshipit-source-id: 84b18d9a77eeb19897757e37555581f2a9dc43d8
2021-10-27 08:55:06 -07:00
a3aa9df59f Add barrier to ProcessGroup trampoline (#67236)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67236

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Test Plan: Imported from OSS

Reviewed By: rohan-varma

Differential Revision: D31916706

Pulled By: mrshenli

fbshipit-source-id: f3d2bcd938a384ec297f4094831c69d4059316bb
2021-10-27 08:18:07 -07:00
e52d0e773b [tensorexpr][ir][quant] Adding qscale and qzero to tensorexpr IR Buf (#66675)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66675

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D31676328

Pulled By: IvanKobzarev

fbshipit-source-id: c6479415fa7d809e02dd3789ee0bfd6dfe50dc92
2021-10-27 01:32:16 -07:00
632719c214 Enable c10d trampoline tests on MacOS (#67205)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67205

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Test Plan: Imported from OSS

Reviewed By: rohan-varma

Differential Revision: D31916705

Pulled By: mrshenli

fbshipit-source-id: 440d319959796d01c637c277706eeab127d9bde7
2021-10-26 20:40:12 -07:00
c88da701e2 [hpc][inference] enable cuda graph in engine holder (#66738)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66738

added a field `max_batch_size` to TRTModule, which will later be used to determine how much the engine holder needs to pad the input

Reviewed By: 842974287

Differential Revision: D31286509

fbshipit-source-id: be5c6d4ad9c87ca0842679dc507b187275d4e8dc
2021-10-26 18:48:05 -07:00
28570664d5 [Vulkan] Add vulkan_perf_test with google benchmark (#67230)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67230

Added a new test `vulkan_perf_test` for measuring performance with google benchmark.
**Summary:**
* `vulkan_perf_test` can be used to perform a quick benchmark test for Vulkan features to compare before and after performance when applying a new method and/or optimizing the existing implementation on your local machine.
* The **google benchmark** 3rd party library (https://github.com/google/benchmark) is already in the repo (`fbsource/third-party/benchmark`).
* The number of threads is set to 1 since Vulkan backend is not thread-safe.
* Added a new API `Context::wait()` to allow benchmark tests to wait for all GPU operations to be done before calling `Context::flush()`
* Call `Context::wait()` for each output Vulkan tensor and then `Context::flush()` to avoid out-of-memory issues while running a number of iterations in the benchmark test code
* Use `Time` column (wall clock) as a total execution time for each iteration (instead of `CPU` column = CPU execution time only) from the benchmark result table
* The more iterations, the more reliable the data, but the run takes much longer. 100-1,000 iterations for bigger tensors and 5,000-10,000 iterations for smaller ones would be sufficient.
* The benchmark data on macOS is not reliable since there is an extra layer, [MoltenVK](https://github.com/KhronosGroup/MoltenVK), running on top of `Metal`. Also, running Vulkan models on macOS instead of Metal ones is generally not a good idea.

**Next steps:**
* Add more benchmark tests as we optimize more Vulkan operators
* Consider using Vulkan own performance counter such as [uVkCompute](https://github.com/google/uVkCompute) in the near future. Each iteration time can be manually set by `benchmark::State::SetIterationTime()` and `Benchmark::UseManualTime()` APIs (see [UseManualTime API](365670e432/include/benchmark/benchmark.h (L1013)))
* Consider this `vulkan_perf_test` as a performance BAT (Build Acceptance Test) on the CI pipeline. `gtest` and `google benchmark` can be written in the same place ([see](https://stackoverflow.com/questions/8565666/benchmarking-with-googletest)). And [swiftshader](https://github.com/google/swiftshader) can be used for Sandcastle devservers that don't support Vulkan. We may come up with reasonable performance criteria for each test, and the test will fail on any significant performance degradation.

Test Plan:
**Test build on Android**
```
cd ~/fbsource
buck build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 //xplat/caffe2:pt_vulkan_perf_test_binAndroid\#android-arm64 --show-output
adb push buck-out/gen/xplat/caffe2/pt_vulkan_perf_test_binAndroid\#android-arm64 /data/local/tmp/vulkan_perf_test
adb shell "/data/local/tmp/vulkan_perf_test"
```
**Test build on MacOS**
```
cd ~/fbsource
buck build //xplat/caffe2:pt_vulkan_perf_test_binAppleMac
./buck-out/gen/xplat/caffe2/pt_vulkan_perf_test_binAppleMac\#macosx-x86_64
```

**Test result on Google Pixel 5**
```
Running /data/local/tmp/vulkan_perf_test
Run on (8 X 1804.8 MHz CPU s)
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
-------------------------------------------------------------------------------------------------------------
Benchmark (Without optimization for 4x channels)                            Time             CPU   Iterations
-------------------------------------------------------------------------------------------------------------
cat_op_channel_perf/N:3/C:40/H:221/W:193/iterations:1000/threads:1       60.4 ms         14.1 ms         1000
cat_op_channel_perf/N:3/C:20/H:221/W:193/iterations:1000/threads:1       24.1 ms        0.947 ms         1000
cat_op_channel_perf/N:3/C:39/H:221/W:193/iterations:1000/threads:1       59.6 ms         14.0 ms         1000
cat_op_channel_perf/N:3/C:4/H:221/W:193/iterations:5000/threads:1        5.98 ms        0.844 ms         5000
cat_op_channel_perf/N:3/C:3/H:221/W:193/iterations:5000/threads:1        6.02 ms        0.845 ms         5000
-------------------------------------------------------------------------------------------------------------
Benchmark (With optimization for 4x channels)                               Time             CPU   Iterations
-------------------------------------------------------------------------------------------------------------
cat_op_channel_perf/N:3/C:40/H:221/W:193/iterations:1000/threads:1       39.3 ms         13.3 ms         1000
cat_op_channel_perf/N:3/C:20/H:221/W:193/iterations:1000/threads:1       16.4 ms         3.49 ms         1000
cat_op_channel_perf/N:3/C:39/H:221/W:193/iterations:1000/threads:1       59.7 ms         14.1 ms         1000
cat_op_channel_perf/N:3/C:4/H:221/W:193/iterations:5000/threads:1        3.93 ms        0.855 ms         5000
cat_op_channel_perf/N:3/C:3/H:221/W:193/iterations:5000/threads:1        6.14 ms        0.852 ms         5000
```
Note that the smaller tensors (`3.93 ms` vs `6.14 ms` when comparing `{3,4,221,193}` with `{3,3,221,193}`) receive a significant improvement on the Android builds, because the `vkCmdCopyImage` API is used for the `{3,4,221,193}` tensor instead of shader operations.
* `{3,40,221,193}`: 60.4 ms -> 39.3 ms (34.93% faster)
* `{3,20,221,193}`: 24.1 ms -> 16.4 ms (31.95% faster)
* `{3,4,221,193}`: 5.98 ms -> 3.93 ms (34.28% faster)

{F674052834}

**Test result on MacOS**
```
Running ./buck-out/gen/xplat/caffe2/pt_vulkan_perf_test_binAppleMac#macosx-x86_64
Run on (16 X 2400 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x8)
  L1 Instruction 32 KiB (x8)
  L2 Unified 256 KiB (x8)
  L3 Unified 16384 KiB (x1)
Load Average: 5.95, 5.02, 5.15
***WARNING*** Library was built as DEBUG. Timings may be affected.
-------------------------------------------------------------------------------------------------------------
Benchmark (Without optimization for 4x channels)                            Time             CPU   Iterations
-------------------------------------------------------------------------------------------------------------
cat_op_channel_perf/N:3/C:40/H:221/W:193/iterations:1000/threads:1       51.2 ms         35.5 ms         1000
cat_op_channel_perf/N:3/C:20/H:221/W:193/iterations:1000/threads:1       11.4 ms         4.76 ms         1000
cat_op_channel_perf/N:3/C:39/H:221/W:193/iterations:1000/threads:1       51.9 ms         35.0 ms         1000
cat_op_channel_perf/N:3/C:4/H:221/W:193/iterations:5000/threads:1        2.84 ms         1.36 ms         5000
cat_op_channel_perf/N:3/C:3/H:221/W:193/iterations:5000/threads:1        2.30 ms         1.13 ms         5000
-------------------------------------------------------------------------------------------------------------
Benchmark (With optimization for 4x channels)                               Time             CPU   Iterations
-------------------------------------------------------------------------------------------------------------
cat_op_channel_perf/N:3/C:40/H:221/W:193/iterations:1000/threads:1       70.1 ms         36.9 ms         1000
cat_op_channel_perf/N:3/C:20/H:221/W:193/iterations:1000/threads:1       11.8 ms         5.00 ms         1000
cat_op_channel_perf/N:3/C:39/H:221/W:193/iterations:1000/threads:1       69.3 ms         36.8 ms         1000
cat_op_channel_perf/N:3/C:4/H:221/W:193/iterations:5000/threads:1        4.60 ms         1.48 ms         5000
cat_op_channel_perf/N:3/C:3/H:221/W:193/iterations:5000/threads:1        3.65 ms         1.41 ms         5000
```
Note that `{3,40,221,193}` input tensors don't receive any performance improvement when we use the `vkCmdCopyImage` API to directly copy textures when the number of channels is a multiple of 4 on macOS. This may be due to the extra [MoltenVK](https://github.com/KhronosGroup/MoltenVK) layer running on top of `Metal`.

Reviewed By: SS-JIA

Differential Revision: D31906379

fbshipit-source-id: 0addc766502dba1a915b08840b3a4dc786a9cd9d
2021-10-26 17:55:42 -07:00
cdc9b26281 [Vulkan] Optimize cat operator for channel dimension (#67207)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67207

Improved performance of the `cat` operator along the channel dimension:
* Improved when the input tensor's channel size is a multiple of 4.
* Added new test cases to cover this scenario.
* Limitation: we can't mix shader-based copies and `vkCmdCopyImage` at the same time. The way we create the output texture differs between the two, so we can't cross over unless we re-create the output texture every time. We use `vkCmdCopyImage` only if all input tensors' channel counts are multiples of 4.

{F673815905}

Test Plan:
**Test Conditions**
* 3 input tensors with size `{3, 40, 221, 193}`
* Number of iteration: `1,000`
* Compare `Time` column (`CPU` column is only for CPU execution time)
* Flushes resources every 1 iteration since the input tensor size is big
* running vulkan_perf_test requires a separate diff (D31906379)

**Test build on Android**
```
cd ~/fbsource
buck build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 //xplat/caffe2:pt_vulkan_perf_test_binAndroid\#android-arm64 --show-output
adb push buck-out/gen/xplat/caffe2/pt_vulkan_perf_test_binAndroid\#android-arm64 /data/local/tmp/vulkan_perf_test
adb shell "/data/local/tmp/vulkan_perf_test"
```
**Test build on Mac**
```
cd ~/fbsource
buck build //xplat/caffe2:pt_vulkan_perf_test_binAppleMac
./buck-out/gen/xplat/caffe2/pt_vulkan_perf_test_binAppleMac\#macosx-x86_64
```

**Test result on Google Pixel 5**
a) Without using `vkCmdCopyImage` for multiples of 4 in channel dimension
```
Run on (8 X 1804.8 MHz CPU s)
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
-------------------------------------------------------------------------------------------------------------
Benchmark (Without optimization for 4x channels)                            Time             CPU   Iterations
-------------------------------------------------------------------------------------------------------------
cat_op_channel_perf/N:3/C:40/H:221/W:193/iterations:1000/threads:1       60.4 ms         14.1 ms         1000
cat_op_channel_perf/N:3/C:20/H:221/W:193/iterations:1000/threads:1       24.1 ms        0.947 ms         1000
cat_op_channel_perf/N:3/C:39/H:221/W:193/iterations:1000/threads:1       59.6 ms         14.0 ms         1000
cat_op_channel_perf/N:3/C:4/H:221/W:193/iterations:5000/threads:1        5.98 ms        0.844 ms         5000
cat_op_channel_perf/N:3/C:3/H:221/W:193/iterations:5000/threads:1        6.02 ms        0.845 ms         5000
```
b) With using `vkCmdCopyImage` for multiples of 4 in channel dimension
```
Run on (8 X 1804.8 MHz CPU s)
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
-------------------------------------------------------------------------------------------------------------
Benchmark (With optimization for 4x channels)                               Time             CPU   Iterations
-------------------------------------------------------------------------------------------------------------
cat_op_channel_perf/N:3/C:40/H:221/W:193/iterations:1000/threads:1       39.3 ms         13.3 ms         1000
cat_op_channel_perf/N:3/C:20/H:221/W:193/iterations:1000/threads:1       16.4 ms         3.49 ms         1000
cat_op_channel_perf/N:3/C:39/H:221/W:193/iterations:1000/threads:1       59.7 ms         14.1 ms         1000
cat_op_channel_perf/N:3/C:4/H:221/W:193/iterations:5000/threads:1        3.93 ms        0.855 ms         5000
cat_op_channel_perf/N:3/C:3/H:221/W:193/iterations:5000/threads:1        6.14 ms        0.852 ms         5000
```
* `{3,40,221,193}`: 60.4 ms -> 39.3 ms (34.93% faster)
* `{3,20,221,193}`: 24.1 ms -> 16.4 ms (31.95% faster)
* `{3,4,221,193}`: 5.98 ms -> 3.93 ms (34.28% faster)

{F674052795}

Reviewed By: SS-JIA

Differential Revision: D31781390

fbshipit-source-id: 42179d28ae461a9e247053bc9718f6b8c6c819e5
2021-10-26 17:54:19 -07:00
d691bc1207 Revert D31937065: [pytorch][PR] fix binding to the wrong python module
Test Plan: revert-hammer

Differential Revision:
D31937065 (7ac8ed741d)

Original commit changeset: 5c10b2870bcc

fbshipit-source-id: 9b21ffea8054b8a3a0b96e1b78e933f8654e7f2f
2021-10-26 17:40:59 -07:00
dfa7225a38 [Pytorch][Bootcamp] Add fix and testing for non-vectorized Adadelta optimizer to handle complex numbers (#66587)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66587

Made some changes in the step function of the non-vectorized Adadelta optimizer to handle complex numbers as two real numbers, per issue 65711 on GitHub.
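A minimal sketch of the kind of case the new test exercises:

```
import torch

# complex params are handled by the optimizer step as pairs of reals
p = torch.randn(4, dtype=torch.complex64, requires_grad=True)
opt = torch.optim.Adadelta([p])
p.sum().abs().backward()  # abs() of the complex sum gives a real loss
opt.step()
```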
ghstack-source-id: 141484731

Test Plan:
buck test mode/dev caffe2/test:optim -- 'test_adadelta_complex'

https://pxl.cl/1R7kJ

Reviewed By: albanD

Differential Revision: D31630069

fbshipit-source-id: 2741177b837960538ce39772897af36bbce7b7d8
2021-10-26 17:35:01 -07:00
fcefed9517 Revert D31935958: Add register_frozenpython.cpp to the torch::deploy interpreter library in the OSS build
Test Plan: revert-hammer

Differential Revision:
D31935958 (00b0d4eeed)

Original commit changeset: 3e2cc5c8bc18

fbshipit-source-id: 3f22bf88d902891b83d836e3c53be9a214a58f1f
2021-10-26 17:30:22 -07:00
1541bb823a .github: Switch 8xlarge to 4xlarge instance_type (#67299)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67299

Switches the linux.8xlarge.nvidia.gpu to the 4xlarge instance type to
help with queueing / capacity issues. This change is only meant to be a
bridge until everyone updates their PRs to use the new
linux.4xlarge.nvidia.gpu node type

NOTE: This node type will be removed so do not depend on it for any new
workflows.

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D31945507

Pulled By: seemethere

fbshipit-source-id: fb8587de7f31da72e968d46eeecc573d3f5b440f
2021-10-26 17:23:46 -07:00
7ac8ed741d fix binding to the wrong python module (#67246)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67246

Reviewed By: zhxchen17

Differential Revision: D31937065

Pulled By: Krovatkin

fbshipit-source-id: 5c10b2870bccece50ba52dde26127da79bccbba6
2021-10-26 17:19:02 -07:00
0e8bd0c8d6 [Pytorch Delegated Backend] Add macro to define sentinel value of debug handle. (#66584)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66584

This will help avoid "-1"s in different places in our codebase and backend codebases when
the debug handle is not known.

Test Plan: CI

Reviewed By: sxu

Differential Revision: D31614478

fbshipit-source-id: 97fceb04e3e78f52feda7b1ba1da08fa4480dd77
2021-10-26 17:13:44 -07:00
00b0d4eeed Add register_frozenpython.cpp to the torch::deploy interpreter library in the OSS build (#67280)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67280

Test Plan: Imported from OSS

Reviewed By: zhxchen17

Differential Revision: D31935958

Pulled By: shunting314

fbshipit-source-id: 3e2cc5c8bc18b5e19bd3804ad542a9ed69e04291
2021-10-26 16:39:40 -07:00
f510193e22 [jit][edge] Export maybe-used interface methods from modules. (#65966)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65966

ghstack-source-id: 141594521

Support exportation of "interface methods" from submodules to a mobile module. "Interface methods" are methods which might be dynamically called in a module and therefore need to be exported anyway, like virtual functions in C++.

Before this change, the exportation algorithm was a simple iteration through all top-level methods. Now, since we have indirect calls, we need to recursively walk through the call graph to find all potentially used methods. This means the order in which we export methods might break in old runtimes; to guarantee forward compatibility, we export top-level methods first, then the extra methods, so that top-level methods will always be found first.

NOTE that interface-method exportation is disabled by default in this diff. We need to call torch._C._enable_mobile_interface_call_export to actually enable it.

Test Plan: buck test mode/dev //caffe2/test:jit -- --exact 'caffe2/test:jit - test_export_opnames_interface (jit.test_misc.TestMisc)'

Reviewed By: qihqi, iseeyuan

Differential Revision: D31326155

fbshipit-source-id: 5be7234cca07691f62648a85133b6db65e427b53
2021-10-26 16:35:15 -07:00
a72a6365c9 disallow requires_grad=True in make_tensor for integral inputs (#67149)
Summary:
per title

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67149

Reviewed By: albanD

Differential Revision: D31928613

Pulled By: ngimel

fbshipit-source-id: 4491954c4fcd4a4e3121155d4451cc7370c27a0b
2021-10-26 16:19:28 -07:00
81d188101f .github: Use 4xlarge instances for linux gpu (#67264)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67264

Downgrades linux gpu instances from 8xlarge -> 4xlarge

We were seeing capacity issues in terms of scaling 8xlarge instances;
downgrading to 4xlarge (which only has a single gpu) to see if
that helps resolve some of the capacity issues we were seeing

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: janeyx99

Differential Revision: D31933488

Pulled By: seemethere

fbshipit-source-id: b41922ebb675e663cb035cd3795bc9bae94dcac7
2021-10-26 16:17:33 -07:00
fdc74e2373 Port triangular_solve to structured kernel (#61857)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61857

A few updates to internal code that allow marking triangular_solve as structured.

cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D31928687

Pulled By: cpuhrsch

fbshipit-source-id: 80a2783c469d5a6194c466ccfa8808fa41c0bdb7
2021-10-26 14:50:00 -07:00
6ce14e7b51 [PyTorch][Static Runtime] Cleanup: add valueVecFromFastSet (#66996)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66996

We do this conversion a few times, and further diffs (which I'm trying to keep as small as possible) will do it more.
ghstack-source-id: 141496817

Test Plan: CI

Reviewed By: mikeiovine

Differential Revision: D31821037

fbshipit-source-id: 1d3b54cadaedd53189aec6a35ed1a126c6fe4824
2021-10-26 14:47:15 -07:00
066a980e7b [PyTorch][Static Runtime][easy] Fix ValueGroup comment (#66965)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66965

external aliases aren't defined to be outputs (though output aliases may end up in there as the following sentence clarifies).
ghstack-source-id: 141473794

Test Plan: review

Reviewed By: mikeiovine

Differential Revision: D31809715

fbshipit-source-id: 82d1391b04e22559932f82270669a7ff94a1c90f
2021-10-26 14:45:36 -07:00
1926156752 Prevent TCPServer get deleted too early (#67204)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67204

Fixes #66422
Fixes #66423

In the original test, all collectives are dummy local ones. As a
result, rank 0 could exit earlier than other ranks. However, the
`TCPStore` lives on rank 0, and other ranks might need to talk to
that store after rank 0 exits. This commit explicitly makes rank 0
wait for all other ranks to finish.
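A minimal sketch of one way to implement that wait, assuming a TCPStore handle plus rank/world_size are available; the exact mechanism used by the commit may differ:

```
import torch.distributed as dist

def finish(store: dist.TCPStore, rank: int, world_size: int) -> None:
    if rank == 0:
        # rank 0 hosts the TCPStore, so it must outlive the other ranks
        store.wait([f"finished_{r}" for r in range(1, world_size)])
    else:
        store.set(f"finished_{rank}", "done")
```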

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D31906802

Pulled By: mrshenli

fbshipit-source-id: 82745f5497d784ea3cea9df6bda537ec71380867
2021-10-26 14:38:11 -07:00
273ab55fc4 Revert D31914868: Strided masked reduction: mean (2nd try)
Test Plan: revert-hammer

Differential Revision:
D31914868 (a33d3d84df)

Original commit changeset: beda9d32ea65

fbshipit-source-id: dc3fa66d7e3c8a211fedac6ae191b11a4a9ab232
2021-10-26 14:18:22 -07:00
2ca552160b [DDP] logging improvements (#67059)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67059

While debugging some workflows, training sometimes does not finish,
but I still want to know whether the graph was static. Also, log 0 for unused
parameter size if no unused params were found.
ghstack-source-id: 141428950

Test Plan: Ci

Reviewed By: mrshenli

Differential Revision: D31846669

fbshipit-source-id: 21763fcdc1b244ba829117da1f15b2271d966983
2021-10-26 13:18:00 -07:00
197dec14b3 .github: Change periodic docker jobs to always_rebuild (#67267)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67267

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: xuzhao9

Differential Revision: D31934251

Pulled By: seemethere

fbshipit-source-id: a323d2c754ff6324c69f81bf0e820ae9adbe7853
2021-10-26 13:06:16 -07:00
99b34b320b Make fb::sigrid_hash_compute_multipler_shift return a std::tuple<int64_t, int64_t> (#67123)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67123

Makes `sigrid_hash_compute_multipler_shift` return a tuple instead of a tensor and modifies the functions that depend on it.

Test Plan:
```
buck test //caffe2/benchmarks/static_runtime/fb:test_fb_operators
```

Benchmarks:
`local`:
```
I1022 13:56:34.529495 2866038 PyTorchPredictorBenchLib.cpp:266] Mean milliseconds per iter: 5.67114, standard deviation: 0.336918

I1022 15:29:45.248790 3292725 PyTorchPredictorBenchLib.cpp:266] Mean milliseconds per iter: 5.66678, standard deviation: 0.403032
```

`local_ro`:
```
I1022 13:59:24.262511 2882599 PyTorchPredictorBenchLib.cpp:266] Mean milliseconds per iter: 1.56012, standard deviation: 0.0537101

I1022 15:34:53.941890 3328358 PyTorchPredictorBenchLib.cpp:266] Mean milliseconds per iter: 1.5525, standard deviation: 0.0280267
```

FB: local - P463676888, local_ro - P463676984, master local - P463686094, master local_ro - P463686470

Reviewed By: mikeiovine

Differential Revision: D31867186

fbshipit-source-id: 0f640487b74d1cd0d5f714f2258e056a2f0c2c07
2021-10-26 12:51:10 -07:00
1ce500f56f [easy][PyTorch] Use at::native::is_nonzero (#67195)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67195

Now that `is_nonzero` is part of `at::native` (see https://github.com/pytorch/pytorch/pull/66663), replace `TensorCompare::is_nonzero` with `at::native::is_nonzero`.

ghstack-source-id: 141514416

Test Plan: CI

Reviewed By: larryliu0820

Differential Revision: D31704041

fbshipit-source-id: 36813e5411d0aa2eb2d0442e2a195bbed417b33d
2021-10-26 12:40:32 -07:00
a33d3d84df Strided masked reduction: mean (2nd try) (#67088)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67088

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D31914868

Pulled By: cpuhrsch

fbshipit-source-id: beda9d32ea65bcae31c2c0181f95ad23c6631075
2021-10-26 11:54:39 -07:00
6c22b96082 [Pytorch Edge] Extend Tracer to Custom Classes (#67004)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67004

New version because the other one was impossible to rebase

Trace custom classes

Test Plan: CI.

Reviewed By: dhruvbird

Differential Revision: D31818978

fbshipit-source-id: daa22ccb153e32685bcca43a303ba9e21042d052
2021-10-26 11:38:06 -07:00
34ee5b11ff .github: Add 4xlarge nvidia gpu to scale-config (#67262)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67262

Adds a 4xlarge nvidia gpu variant to our scale-config.yml

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D31931941

Pulled By: seemethere

fbshipit-source-id: 120c73ad2c973a416a8426ad6f67457f87302db5
2021-10-26 11:19:16 -07:00
7052c41899 .github: Add workflow to build all docker images (#67215)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67215

We were regularly seeing gaps in our docker image builds due to specific
workflows not being run when docker builds occurred on PRs. This should
remove that ambiguity and ensure that all docker images are rebuilt if a
rebuild is deemed necessary

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D31910422

Pulled By: seemethere

fbshipit-source-id: f346e64f1857e35a995c49bf30521a3acd8af0b1
2021-10-26 11:14:04 -07:00
d7ac6e977a Fix test_create_store_multi flaky test (#66953)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66953

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Test Plan: Imported from OSS

Reviewed By: kiukchung

Differential Revision: D31802767

Pulled By: H-Huang

fbshipit-source-id: a430e242788aac164496d4e65b85bf326537d019
2021-10-26 11:08:51 -07:00
49bf24fc83 Updated error message for nn.functional.interpolate (#66417)
Summary:
Description:
- Updated error message for nn.functional.interpolate

Fixes https://github.com/pytorch/pytorch/issues/63845

cc vadimkantorov

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66417

Reviewed By: albanD

Differential Revision: D31924761

Pulled By: jbschlosser

fbshipit-source-id: ca74c77ac34b4f644aa10440b77b3fcbe4e770ac
2021-10-26 10:33:24 -07:00
d47a9004c8 [skip ci] Set test owner for mobile tests (#66829)
Summary:
Action following https://github.com/pytorch/pytorch/issues/66232

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66829

Reviewed By: albanD

Differential Revision: D31928812

Pulled By: janeyx99

fbshipit-source-id: 8116b7f3728df8632278b013007c06ecce583862
2021-10-26 10:20:01 -07:00
204ffd33ee [CUDA][Linalg] Add gesvd as SVD fallback; optimize SVD gesvdj performance (#64533)
Summary:
Fix https://github.com/pytorch/pytorch/issues/64237
Fix https://github.com/pytorch/pytorch/issues/28293
Fix https://github.com/pytorch/pytorch/issues/4689

See also https://github.com/pytorch/pytorch/issues/47953

cc ngimel jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64533

Reviewed By: albanD

Differential Revision: D31915794

Pulled By: ngimel

fbshipit-source-id: 29ea48696531ced8a48474e891a9e2d5f11e9d7a
2021-10-26 10:13:52 -07:00
828a9dcc04 [nn] MarginRankingLoss : no batch dim (#64975)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/60585
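A small usage sketch of the no-batch-dim support this adds:

```
import torch

loss = torch.nn.MarginRankingLoss(margin=0.5)
x1, x2 = torch.randn(()), torch.randn(())  # 0-d (no batch dim) inputs
target = torch.tensor(1.0)                 # 0-d target
out = loss(x1, x2, target)
print(out.shape)  # torch.Size([])
```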

cc albanD mruberry jbschlosser walterddr

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64975

Reviewed By: albanD

Differential Revision: D31906528

Pulled By: jbschlosser

fbshipit-source-id: 1127242a859085b1e06a4b71be19ad55049b38ba
2021-10-26 09:03:31 -07:00
129e99fbce __getitem__: Ensure Tensor subclasses are not treated as tuples (#67202)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/67027

`torch.Tensor` is considered a Mapping, but not a Sequence, in Python
because it uses `tp_as_mapping` instead of defining `__getitem__` in
Python. However, if you try to overwrite `__getitem__` from Python,
it is considered a `Sequence`, and so the tensor is treated like a
tuple for indexing purposes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67202

Reviewed By: VitalyFedyunin

Differential Revision: D31908515

Pulled By: albanD

fbshipit-source-id: 0ca55a36be3421f96428a8eacf5d195646252b38
2021-10-26 08:56:59 -07:00
3c61700cf7 torch.linalg.householder_product: forward AD support (#67043)
Summary:
As per title.
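A minimal sketch of the newly supported forward-mode AD path:

```
import torch
import torch.autograd.forward_ad as fwAD

A, tau = torch.randn(4, 3), torch.randn(3)
tA, ttau = torch.randn_like(A), torch.randn_like(tau)
with fwAD.dual_level():
    out = torch.linalg.householder_product(
        fwAD.make_dual(A, tA), fwAD.make_dual(tau, ttau)
    )
    primal, tangent = fwAD.unpack_dual(out)  # tangent is now defined
```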

cc ezyang albanD zou3519 gqchen pearu nikitaved soulitzer Lezcano Varal7 jianyuh mruberry walterddr IvanYashchuk xwang233

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67043

Reviewed By: VitalyFedyunin

Differential Revision: D31897617

Pulled By: albanD

fbshipit-source-id: ef135fe3d9e5b9b2a541c355017f07cdb1309979
2021-10-26 08:34:00 -07:00
5b345e767e QNNPACK: Update to use pytorch/cpuinfo.git repo as a third party dependency (#67106)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67106

Test Plan: Recloned cpuinfo, rebuilt, and ran all the tests locally

Reviewed By: kimishpatel

Differential Revision: D31782317

fbshipit-source-id: 4a71be91f02bb6278db7e0124366d8009e7c7a60
2021-10-26 07:59:19 -07:00
2abffaf050 Consolidate c10d and dist imports in test_c10d_common.py (#67203)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67203

This commit uses `dist` for `torch.distributed` and `c10d` for
`torch.distributed.distributed_c10d`. The former is for public APIs
and the latter is for private ones.

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Test Plan: Imported from OSS

Reviewed By: rohan-varma

Differential Revision: D31906801

Pulled By: mrshenli

fbshipit-source-id: c3a01f33962b01a03dbd565ed119dcdac594bcf2
2021-10-26 07:50:48 -07:00
71b7182ee2 [skip ci] Set test owner for deploy/package tests (#66830)
Summary:
Action following https://github.com/pytorch/pytorch/issues/66232

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66830

Reviewed By: albanD

Differential Revision: D31905820

Pulled By: janeyx99

fbshipit-source-id: 9496acc98339d689fa62e18a8781d7344903a64c
2021-10-26 07:49:33 -07:00
49251d05ec [skip ci] Set test owners for NNC tests (#66833)
Summary:
Action following https://github.com/pytorch/pytorch/issues/66232

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66833

Reviewed By: albanD

Differential Revision: D31907812

Pulled By: janeyx99

fbshipit-source-id: 5e5013b4276fd208ac68d61cf787679799695602
2021-10-26 07:46:18 -07:00
a6d702a3ee add support for ubuntu 20.04 to CI docker images (#66942)
Summary:
Some minor changes are needed to the .circleci docker scripts to support ubuntu 20.04.  One edit updates the packages needed for all images (.circleci/docker/common/install_base.sh), while the other edit is specific to ROCm support.

cc jeffdaily sunway513 jithunnair-amd ROCmSupport KyleCZH seemethere malfet pytorch/pytorch-dev-infra

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66942

Reviewed By: albanD

Differential Revision: D31899271

Pulled By: janeyx99

fbshipit-source-id: f7677ddc063a4504da9f39a756dc181ac55f200a
2021-10-26 07:41:46 -07:00
83355f9537 [SR][easy] Alias for c10::Symbol::fromQualString (#67162)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67162

It's a bit annoying/ugly to type `c10::Symbol::fromQualString` everywhere, and we can't do `using c10::Symbol::fromQualString` since it's a static class function.

Test Plan: CI

Reviewed By: d1jang

Differential Revision: D31887042

fbshipit-source-id: 073a56c72281c20284a9feef741aed96b58a921d
2021-10-26 06:09:17 -07:00
38cbaeb8a4 Update deprecated import paths. (#67250)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67250

Test Plan: Run tests manually

Reviewed By: NicolasHug

Differential Revision: D31921656

fbshipit-source-id: e2cba7bc7d4a8c7f836bc32f1b8b11a37494a4e2
2021-10-26 04:51:07 -07:00
0c1b7545b6 [Static Runtime] Add more debug info to verify_no_memory_overlap() (#67206)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67206

The memory overlap check still runs for alias ops; it only skips the check for inplace ops. This needs to be fixed if we want to use the memory overlap check in prod.

This diff only adds more debug info; it doesn't fix the aforementioned problem.

Reviewed By: d1jang

Differential Revision: D31889866

fbshipit-source-id: 05a80ace3d404f66f21a8bbdc9678485ff76c8d3
2021-10-26 01:48:41 -07:00
31bcfa3760 [sharded_tensor] refactor sharded_tensor file structure (#67199)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67199

This PR refactors the _sharded_tensor package so that it is split out of api.py, adding different components to make it more modularized. This will also help us resolve circular dependencies caused by the growing code size and better organize the package:

* api.py: sharded tensor APIs
* metadata.py: Metadata definition for ShardedTensors
* shard.py: Shard definition for ShardedTensor
* utils.py: utility functions for validation, etc.
ghstack-source-id: 141533618

Test Plan: test_sharded_tensor.py

Reviewed By: pritamdamania87

Differential Revision: D31904249

fbshipit-source-id: c747d96e131a1d4731991ec4ac090f639dcb369b
2021-10-26 00:36:23 -07:00
b96337cf47 add frozen_pyyaml as a builtin library to torch::deploy (#67127)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67127

add frozen_pyyaml as a builtin library to torch::deploy

Test Plan:
unittests pass

> buck test mode/dev-nosan caffe2/torch/csrc/deploy/... -- --regex ".*TestPyYAML.*"

Reviewed By: shunting314

Differential Revision: D31852201

fbshipit-source-id: 889c4493faf09ddd3ec2b9487da9acfea3ab6bcd
2021-10-25 23:16:41 -07:00
0e371e413d [fx-acc] add automated graph opt testing using AccOpProperty (#67228)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67228

We added `AccOpProperty` for easy enablement of graph opts for new acc ops based on general properties. This diff adds
1. `AccOpProperty.unary`
2. Automated testing for acc ops with both `AccOpProperty.unary` and `AccOpProperty.pointwise` with `sink_reshape_ops` graph opt. [Adds coverage for 30 more acc_ops]
3. Refactors `graph_opts/TARGETS` to collect all graph optimizations into a common library
4. replaces `def foo(*, input, acc_out_ty=None): assert acc_out_ty is not None` with just `def foo(*, input, acc_out_ty)`. Let me know if there is some hidden purpose to the other implementation.
5. adds `AccOpProperty.*` flags to appropriate ops.

Test Plan:
`buck test mode/dev glow/fb/fx/graph_opts:test_fx_sink`

```
...
Summary
  Pass: 31
  ListingSuccess: 1
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/4222124724581304
```

Also ran
```
`buck test mode/dev glow/fb/fx/acc_tracer:`
```
```
...
Summary
  Pass: 136
  ListingSuccess: 4
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/5910974582823618
```

Reviewed By: jfix71

Differential Revision: D31671833

fbshipit-source-id: aa16d1008f18f7c8626058361efff33843de3505
2021-10-25 19:53:05 -07:00
3596e13d45 Add torch.nn.init.normal_ and torch.nn.init.kaiming_uniform_ ops to ShardedTensor (#67057)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67057

Extend ShardedTensor with the torch.nn.init.normal_ and torch.nn.init.kaiming_uniform_ ops.
Follow up from https://github.com/pytorch/pytorch/pull/63997
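
For illustration, a minimal usage sketch — assuming a two-rank process group and the private `_sharded_tensor`/`_sharding_spec` APIs this test exercises; the sizes and hyperparameters are made up:
```
import math
import torch
import torch.distributed._sharded_tensor as sharded_tensor
from torch.distributed._sharding_spec import ChunkShardingSpec

# Assumes torch.distributed is already initialized with two ranks.
spec = ChunkShardingSpec(
    dim=0,
    placements=["rank:0/cuda:0", "rank:1/cuda:1"],
)
st = sharded_tensor.empty(spec, 10, 10)

# The extended ops initialize each local shard in place.
torch.nn.init.normal_(st, mean=0.0, std=0.02)
torch.nn.init.kaiming_uniform_(st, a=math.sqrt(5))
```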

Test Plan:
a) Unit Test
(pytorch) ... $ python test/distributed/_sharded_tensor/ops/test_init.py TestShardedTensorNNInit --v

or b) Manual run: Instruction here: https://docs.google.com/document/d/1_m1Hdo5w51-hhPlZ_F8Y6PIWrN7UgJZqiSpARYvhsaE/edit#
s/uniform_/normal_ or kaiming_uniform_

Imported from OSS

Reviewed By: pritamdamania87

Differential Revision: D31845654

fbshipit-source-id: e7aedc0972539da59f7b84bbbf617caf6b206d52
2021-10-25 19:14:30 -07:00
bfcde08612 [trt] Algorithm recorder/replayer (#4)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch-canary/pull/4

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67211

Record the algorithm selection, dump it in JSON format, and replay it. This makes it possible to:
1. consistently reproduce issues (algorithm selection can be sensitive to local benchmark timing)
2. manually edit the dumped JSON file to control algorithm selection.

Reviewed By: wushirong, 842974287

Differential Revision: D31888836

fbshipit-source-id: 4611fda548f7391776f1ad61572b1f59fa4665b6
2021-10-25 18:50:55 -07:00
ecf7e96969 [Light] Remove ambiguity from compile_spec names, use actual output type (#67209)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67209

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67198

Fixing a couple instances where parameters were named method_compile_spec when they were actually compile_specs that could have multiple method_compile_specs inside.
Also use output dtype from buffer.

Test Plan:
Mobilenetv3 compiles and runs fine
```
(pytorch)  ~/fbsource/fbcode/caffe2/fb/nnc
└─ $ PYTORCH_JIT_LOG_LEVEL="aot_compiler" buck run //caffe2/binaries:aot_model_compiler -- --model mobilenetv3.pt --model_name=pytorch_dev_mobilenetv3 --model_version=v1 --input_dims="1,3,224,224
"
Downloaded 4501/6195 artifacts, 433.89 Mbytes, 14.3% cache miss (for updated rules)
Building: finished in 06:34.6 min (100%) 20233/20233 jobs, 5467/20233 updated
  Total time: 06:35.0 min
BUILD SUCCEEDED
The compiled llvm assembly code was saved to mobilenetv3.compiled.ll
The compiled model was saved to mobilenetv3.compiled.pt

└─ $ ./compile_model.sh -m pytorch_dev_mobilenetv3 -p /data/users/priyaramani/fbsource/fbcode/caffe2/fb/nnc/mobilenetv3.pt -v v1 -i "1,3,224,224"
+ VERSION=v1
+ getopts m:p:v:i:h opt
+ case $opt in
+ MODEL=pytorch_dev_mobilenetv3
.
.
Columns 961 to 9701e-11 *
-4.2304 -3.9674  2.4473 -0.8664 -0.7513  1.2140  0.0010  3.8675  1.2714  2.2989

Columns 971 to 9801e-11 *
-2.7203  1.6772 -0.7460 -0.6936  4.4421 -0.9865 -0.5186 -1.4441  1.3047 -1.6112

Columns 981 to 9901e-11 *
 0.1275 -1.8815  2.5105 -0.4871 -2.2342  0.8520  0.8658  1.6180  3.8901 -0.2454

Columns 991 to 10001e-11 *
-1.4896  4.1337 -2.6640  0.8226  0.2441 -1.4830 -1.7430  1.8758  0.5481  0.5093
[ CPUFloatType{1,1000} ]
Starting benchmark.
Running warmup runs.
Main runs.
Main run finished. Milliseconds per iter: 276.255. Iters per second: 3.61984
Memory usage before main runs: 104366080 bytes
Memory usage after main runs: 343441408 bytes
Average memory increase per iter: 2.39075e+07 bytes
0 value means "not available" in above
```

Reviewed By: ljk53

Differential Revision: D31698338

fbshipit-source-id: da6c74c1321ec02e0652f3afe6f97bf789d3361b
2021-10-25 17:44:05 -07:00
ad5731cacc [PyTorch] Add flop count for bmm and baddbmm (#66636)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66636

Add FLOP count for bmm and baddbmm, which is `2*b*m*n*k`.
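
For reference, a small sketch of that formula in Python (`bmm_flops` is a hypothetical helper, not part of this diff):
```
import torch

def bmm_flops(a, b):
    # a: (b, m, k) @ b: (b, k, n) -> (b, m, n); each output element
    # costs k multiplies plus ~k adds, hence 2*b*m*n*k total.
    bs, m, k = a.shape
    _, _, n = b.shape
    return 2 * bs * m * n * k

a = torch.randn(4, 8, 16)
b = torch.randn(4, 16, 32)
print(bmm_flops(a, b))  # 2 * 4 * 8 * 32 * 16 = 32768
```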

Reviewed By: ngimel

Differential Revision: D31622061

fbshipit-source-id: f3e1e1e34c45228693117b81647fb4a623c4085b
2021-10-25 17:31:12 -07:00
7acf0c6d4b [PyTorch Edge][type] Add type support for NamedTuple custom class (export) (#62612)
Summary:
Add type support for the namedtuple custom class. The namedtuple type will deserialize to the following string format:
```
"qualified_named[
    NamedTuple, [
        [filed_name_1, field_type_1],
        [filed_name_2, field_type_2]
    ]
]"
```

If it's nested, it will be
```
"__torch__.A[
    NamedTuple, [
        [field_name_a, __torch__.B [
            NamedTuple, [
                [field_name_b, __torch__.C [
                    NamedTuple, [
                      [field_name_c_1, Tensor],
                      [field_name_c_2, Tuple[Tensor, Tensor]],
                    ]
                ]
                ]
            ]
        ]
        ]
    ]
]
"
```
The namedtuple type from both `collections` and `typing` is supported.
```

from typing import NamedTuple
from collections import namedtuple
```

This is a forward-incompatible change. However, this type was never supported or exported before, and we don't have a proper way to backport it. The optimal way to ship this change is probably:
1. Land the import change without the export change, so the runtime can read the new format but doesn't export it yet.
2. Land the export change, so the runtime can export the new format.

For the following example:
```
class Foo(NamedTuple):
    id: torch.Tensor

class Bar(torch.nn.Module):
    def __init__(self):
        super(Bar, self).__init__()
        self.foo = Foo(torch.tensor(1))

    def forward(self, a: torch.Tensor):
        self.foo = Foo(a)
        return self.foo
```
The new bytecode.pkl will be
```
(6,
 ('__torch__.mobile.test_lite_script_type.MyTestModule.forward',
  (('instructions',
    (('STOREN', 1, 2),
     ('DROPR', 1, 0),
     ('MOVE', 2, 0),
     ('LIST_CONSTRUCT', 0, 1),
     ('NAMED_TUPLE_CONSTRUCT', 1, 1),
     ('RET', 0, 0))),
   ('operators', ()),
   ('constants', ()),
   ('types',
    ('List[Tensor]',
     '__torch__.mobile.test_lite_script_type.myNamedTuple[NamedTuple, [[a, '
     'List[Tensor]]]]')),
   ('register_size', 2)),
  (('arguments',
    ((('name', 'self'),
      ('type', '__torch__.mobile.test_lite_script_type.MyTestModule'),
      ('default_value', None)),
     (('name', 'a'), ('type', 'Tensor'), ('default_value', None)))),
   ('returns',
    ((('name', ''),
      ('type', '__torch__.mobile.test_lite_script_type.myNamedTuple'),
      ('default_value', None)),)))))
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62612

ghstack-source-id: 141485500

Test Plan:
fb:
1. Add a simple unittest to test NamedTuple custom class
2. Use following cpp code (D30271153)
```
TEST(LiteTrainerTest, CustomOp) {

  std::string jit_model =
  "/home/chenlai/local/notebooks/ads_dper_fl_model_282250609.pt";

  Module jit_m = load(jit_model);

  jit_m.eval();
  torch::jit::Module module_freeze = freeze(jit_m);
  IValue tuple =
      c10::ivalue::Tuple::create({1 * torch::ones({10, 1034}), 3 * torch::ones({10, 1034})});
  std::vector<IValue> inputs_1{tuple};
  auto jit_output = jit_m.forward(inputs_1);
  jit_output.dump();

  std::stringstream ss;
  jit_m._save_for_mobile(ss);
  jit_m._save_for_mobile("/home/chenlai/local/notebooks/tmp/tmp.ptl");

  torch::jit::mobile::Module mobile_m = _load_for_mobile(ss);
  auto mobile_output = mobile_m.forward(inputs_1);
  std::cout << "mobile output: " << std::endl;
  mobile_output.dump();
  }
```
And output from both mobile and jit are
```
{prediction: ([ CPUFloatType{0} ], [ CPUFloatType{0} ])}
```

3. N1033894 with model inspection, also compare the result between jit and mobile with the dper model.

Reviewed By: iseeyuan

Differential Revision: D30004716

fbshipit-source-id: cfd30955e66a604af8f9633b1b608feddc13d7d7
2021-10-25 17:15:50 -07:00
0d7d446154 Disallow annotations on instance attributes outside __init__ (#67051)
Summary:
**Summary**: This commit solves the first part of https://github.com/pytorch/pytorch/issues/52306, which disallows type annotations on instance attributes inside any method other than the constructor.
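
A small sketch of the pattern that is now rejected (hypothetical module, for illustration only):
```
import torch

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.x: int = 0  # OK: annotated in the constructor

    def forward(self, y: int) -> int:
        self.x: int = y  # annotated outside __init__: now a script error
        return self.x

torch.jit.script(M())  # raises after this change
```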

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67051

Test Plan:
Added test to test_types.py.

**Reviewers**: Zhengxu Chen

**Subscribers**: Zhengxu Chen, Yanan Cao, Peng Wu, Yining Lu

**Tasks**: T103941984

**Tags**: pytorch

**Fixes** https://github.com/pytorch/pytorch/issues/52306

Reviewed By: zhxchen17

Differential Revision: D31843527

Pulled By: andrewor14

fbshipit-source-id: 624879ae801621e367c59228be8b0581ecd30ef4
2021-10-25 16:20:47 -07:00
1f55dd83ac [WIP] wrap XLATensors into Python XLA wrapper class (#65841)
Summary:
**Improbably** fixes https://github.com/pytorch/pytorch/issues/65130

ezyang I'm super n00b in Python extensions, is this what we want to do?

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65841

Reviewed By: navahgar

Differential Revision: D31889790

Pulled By: Krovatkin

fbshipit-source-id: c7f077b89f6f02df1962ab83d9e13fcc348a227d
2021-10-25 16:11:03 -07:00
fa7fb7b4d9 [skip ci] Set test owner for test_profiler.py (#66831)
Summary:
Followup action to https://github.com/pytorch/pytorch/issues/66232

cc ilia-cher robieta chaekit gdankel bitfort ngimel orionr nbcsm guotuofeng guyang3532 gaoteng-git

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66831

Reviewed By: gdankel

Differential Revision: D31909245

Pulled By: janeyx99

fbshipit-source-id: 4156a5cffa215c29022fc4dab6ee5b442a509db4
2021-10-25 15:59:52 -07:00
0acc21b412 [vulkan] Add 2D transposed convolutions (#67104)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67104

Add 2D transposed convolutions to Vulkan. Currently, only `dilation={1,1}` is supported. We plan to support dilation at a later time.

Test Plan:
Build and run `vulkan_api_test`:

```
cd ~/pytorch
BUILD_CUSTOM_PROTOBUF=OFF \
  BUILD_TEST=ON \
  USE_EIGEN_FOR_BLAS=OFF \
  USE_FBGEMM=OFF \
  USE_MKLDNN=OFF \
  USE_NNPACK=OFF \
  USE_NUMPY=OFF \
  USE_OBSERVERS=OFF \
  USE_PYTORCH_QNNPACK=OFF \
  USE_QNNPACK=OFF \
  USE_VULKAN=ON \
  USE_VULKAN_API=ON \
  USE_VULKAN_SHADERC_RUNTIME=ON \
  USE_VULKAN_WRAPPER=OFF \
  MACOSX_DEPLOYMENT_TARGET=10.9 CC=clang CXX=clang++ python3 setup.py develop --cmake && ./build/bin/vulkan_api_test
```

Reviewed By: beback4u

Differential Revision: D31731742

fbshipit-source-id: b79c946c8d988bb4d83e9fd3381992a4f2f4be80
2021-10-25 15:55:20 -07:00
059ae96007 [jit] Factor findAllNodes into one place. (#65965)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65965

ghstack-source-id: 141504185

Test Plan: no behavior change

Reviewed By: qihqi, ejguan

Differential Revision: D31326152

fbshipit-source-id: 2e0261a96853bfb67a96dd68972c905b6b26d562
2021-10-25 15:42:52 -07:00
239b38268b [fx2trt] Better trt layer name (#67200)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67200

We want to put more information in the TensorRT layer name. Mainly, we want to be able to tell which original op a TensorRT layer was mapped from.

The layer format is `[TensorRT Layer Type]-[Original Op Code]-[FX Node Name]`
```
Reformatting CopyNode for Input Tensor 0 to [FULLY_CONNECTED]-[acc_ops.linear]-[linear_1]: 0.0328ms
[FULLY_CONNECTED]-[acc_ops.linear]-[linear_1]: 0.027712ms
PWN([RELU]-[acc_ops.relu]-[relu_1]): 0.008672ms
```

Test Plan:
CI

```
buck run mode/dev-nosan -c python.package_style=inplace caffe2:fx2trt_example
```

Reviewed By: wushirong

Differential Revision: D31627274

fbshipit-source-id: 3dbb576caa63b922274541d2a306b4bd37e707c5
2021-10-25 15:41:38 -07:00
4ac8d06911 [quant] Remove unused print in quantization_patterns.py (#67191)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67191

Test Plan:
sandcastle and ossci

Imported from OSS

Reviewed By: supriyar

Differential Revision: D31899784

fbshipit-source-id: 31ad63c0b2a5328fff80c38dc4e527e0399e802e
2021-10-25 15:07:18 -07:00
12daa4f663 [jit][edge] Enable CALL instruction in lite interpreter. (#65964)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65964

ghstack-source-id: 141425519

Test Plan: buck run xplat/caffe2:test_lite_interpreter

Reviewed By: cccclai

Differential Revision: D31326149

fbshipit-source-id: 8a599d92f3fa4e6c125100adb36d89592e71e547
2021-10-25 14:44:33 -07:00
b8dfb45ac2 Refactor cub namespace handling (#66219)
Summary:
This PR updates PyTorch for the following cub changes:
- Starting with cub 1.13.1, cub requires users to define `CUB_NS_QUALIFIER` if `CUB_NS_PREFIX` is also defined. Besides that, a new mechanism, `CUB_WRAPPED_NAMESPACE`, is added.

And I make the following changes to PyTorch:
- Starting with CUDA 11.5, define `CUB_WRAPPED_NAMESPACE` globally as an nvcc flag.
- Fix caffe2 failures caused by the above change.
- Add an `aten/src/ATen/cuda/cub_definitions.cuh` header that defines helper macros about feature availability.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66219

Reviewed By: bdhirsh

Differential Revision: D31626931

Pulled By: ngimel

fbshipit-source-id: 97ebf5ef671ade8bf46d0860edc317f22660f26d
2021-10-25 14:37:09 -07:00
700b39a3df Sparse CSR CUDA: add torch.addmm with all inputs sparse (#63511)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63511

This PR adds `torch.addmm(c, a, b)` variant with `c, a, b` all being CSR tensors.
The underlying cuSPARSE function works only with 32-bit indices, and in the current implementation the result tensor has 32-bit indices. Input tensors can have either 64-bit or 32-bit index tensors.
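
A minimal usage sketch of the new variant (assuming `Tensor.to_sparse_csr()` for constructing the CSR inputs):
```
import torch

a = torch.randn(4, 8, device="cuda").to_sparse_csr()
b = torch.randn(8, 6, device="cuda").to_sparse_csr()
c = torch.zeros(4, 6, device="cuda").to_sparse_csr()

# All-CSR addmm: out = c + a @ b. Per the note above, the result
# carries 32-bit indices; inputs may carry 32- or 64-bit indices.
out = torch.addmm(c, a, b)
print(out.to_dense())
```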

cc nikitaved pearu cpuhrsch IvanYashchuk ngimel

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D31809838

Pulled By: cpuhrsch

fbshipit-source-id: 97005dba27d8adcae445eb756bcbd7271061e9b5
2021-10-25 14:32:30 -07:00
333717eaf0 Improve assert failure message in test_get_torch_func_signature_exhaustive (#67039)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67039

cc mruberry

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D31899719

Pulled By: cpuhrsch

fbshipit-source-id: 819d07da5b18b31d462010b9f9382e0b8cd10f9f
2021-10-25 14:20:38 -07:00
a6d0339492 [Pytorch Edge] Extend runtime compatibility to custom classes (#66972)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66972

Add api to view how many custom classes we have and what their names are

Test Plan: unit test

Reviewed By: cccclai

Differential Revision: D31811337

fbshipit-source-id: 9f8ca1fc578a0a5360c9cd8f95475acc33f250e4
2021-10-25 13:42:26 -07:00
f4dd88489a Better and more consistent error messages in torch.linalg (#62734)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62734

Following https://github.com/pytorch/pytorch/pull/62715#discussion_r682610788
- squareCheckInputs takes a string with the name of the function
- We reuse more functions when checking the inputs

The state of the errors in torch.linalg is far from great though. We
leave a more comprehensive clean-up for the future.

cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D31823230

Pulled By: mruberry

fbshipit-source-id: eccd531f10d590eb5f9d04a957b7cdcb31c72ea4
2021-10-25 13:24:28 -07:00
4dce051cb0 [jit][edge] Add control stack frame to lite interpreter (#65963)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65963

ghstack-source-id: 141425517

Test Plan: In next diff.

Reviewed By: qihqi, cccclai

Differential Revision: D31326150

fbshipit-source-id: dbbf65f2bf14846c45d0add71edc7d4dbfc6b92c
2021-10-25 12:15:16 -07:00
ac948f4f35 .github: Migrate linux-xenial-py3.6-gcc7 to GHA (#67072)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/66888

cc seemethere

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67072

Reviewed By: seemethere

Differential Revision: D31900833

Pulled By: zhaoalex

fbshipit-source-id: 93f8995611169d991f90e07e8c13e08182969577
2021-10-25 11:40:12 -07:00
9de0888891 Move the registration of CPython builtin modules to BuiltinRegistry (#67085)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67085

Leverages BuiltinRegistry to register the CPython standard C modules. The moved standard C modules are listed in the FOR_EACH macro.

Test Plan:
buck test mode/opt //caffe2/torch/csrc/deploy/interpreter:test_builtin_registry

buck test mode/opt //caffe2/torch/csrc/deploy:test_deploy

Reviewed By: shunting314

Differential Revision: D31848547

fbshipit-source-id: 7eb49d222eaaccb2b8ca5c984b05bf54cc233f25
2021-10-25 11:12:07 -07:00
d68bb50ef3 Disable SVE when cross-compiling for M1 (#67114)
Summary:
Followup after https://github.com/pytorch/pytorch/issues/58653
It does not matter whether one compiles locally or cross-compiles: attempting to use SVE on M1 results in a compiler crash, as the SVE ABI is not defined on macOS.


Pull Request resolved: https://github.com/pytorch/pytorch/pull/67114

Reviewed By: VitalyFedyunin

Differential Revision: D31869356

Pulled By: malfet

fbshipit-source-id: 184e26ae40edc7ef7b703200b53ea7a15da74818
2021-10-25 11:03:00 -07:00
5d9ff8f30e [Static Runtime] Add static_runtime::fused_sigrid_transforms (#66659)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66659

Original message: We added and registered a new operator, static_runtime::fused_sigrid_transforms, and modified the original sigrid_transforms to handle non-fused case only

Note: this diff was commandeered from a bootcamper. Some final touches were needed.

Test Plan: `buck test caffe2/benchmarks/static_runtime/...`

Reviewed By: swolchok

Differential Revision: D31550307

fbshipit-source-id: 287380be0cca20ee6e145bcc7217547bd58cf6d0
2021-10-25 10:44:46 -07:00
8d164a36fb Use at::native::is_nonzero in promoted ops to improve portability (#67097)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67097

All delegated models have `is_nonzero` ops by default; making the op native and consumable without dispatch eases the portability of such models.
ghstack-source-id: 141375082

Test Plan:
`buck test caffe2/test/cpp/jit:jit -- BackendTest.TestComposite`

```
~/fbsource/fbcode] cd ~/fbsource/fbcode/ && buck test caffe2/test:jit -- test_trace_arange
Parsing buck files: finished in 0.5 sec
Building: finished in 9.4 sec (100%) 16035/16035 jobs, 0/16035 updated
  Total time: 10.0 sec
More details at https://www.internalfb.com/intern/buck/build/1e55eea5-2adb-41d1-96ae-cbf4b446d6c6
BUILD SUCCEEDED
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: 46eedba2-ae17-4e88-b205-93bd1332665d
Trace available for this run at /tmp/tpx-20211015-113905.235421/trace.log
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/1970324912349177
    ✓ ListingSuccess: caffe2/test:jit - main (12.372)
    ✓ Pass: caffe2/test:jit - test_trace_arange (jit.test_tracer.TestTracer) (13.748)
    ✓ Pass: caffe2/test:jit - test_trace_arange_with_grad (jit.test_tracer.TestTracer) (13.892)
Summary
  Pass: 2
  ListingSuccess: 1
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/1970324912349177
```

Reviewed By: iseeyuan

Differential Revision: D31656842

fbshipit-source-id: c0e6c798478a2783c0e17e6e9100ba5ce044da78
2021-10-25 10:18:31 -07:00
acb340de75 [Pytorch][Bootcamp] Add fixes and vanilla testing for Adagrad non-vectorized and vectorized optimizers to handle complex numbers (#66671)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66671

Made changes in the step function of the vectorized and non-vectorized Adagrad optimizers to handle complex numbers as two real numbers, as per #65711 on GitHub.
ghstack-source-id: 141442350
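
A sketch of the intended semantics (illustrative, not the unit test itself): an Adagrad step on a complex parameter should match the step on its `view_as_real` counterpart.
```
import torch

p_c = torch.randn(3, dtype=torch.complex64, requires_grad=True)
p_r = torch.view_as_real(p_c.detach()).clone().requires_grad_(True)

opt_c = torch.optim.Adagrad([p_c], lr=0.1)
opt_r = torch.optim.Adagrad([p_r], lr=0.1)

g = torch.randn(3, dtype=torch.complex64)
p_c.grad = g
p_r.grad = torch.view_as_real(g)

opt_c.step()
opt_r.step()

# The complex step now equals the step on the (real, imag) view.
assert torch.allclose(torch.view_as_real(p_c.detach()), p_r.detach())
```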

Test Plan:
buck test mode/dev caffe2/test:optim -- 'test_adagrad_complex'
https://pxl.cl/1Rd44

Reviewed By: albanD

Differential Revision: D31673503

fbshipit-source-id: 90a0d0c69b556716e2d17c59ce80f09c750fc464
2021-10-25 10:13:21 -07:00
a0495b3cdb [SR] Remove unused operator() overload (#67001)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67001

The overload of `operator()` taking `std::vector<at::Tensor>` was only used for testing. In a diff following this one, I will add a new overload that takes `std::vector<c10::IValue> args` and no `kwargs` so we can avoid default-constructing `kwargs` everywhere.

This new overload will probably take a forwarding reference, so to avoid problems with overloading on forwarding reference and simplify the interface, it's best to remove this unused one.

Test Plan:
`buck test caffe2/benchmarks/static_runtime/...`

`buck test caffe2/test:static_runtime`

Reviewed By: hlu1

Differential Revision: D31821990

fbshipit-source-id: 6d2e4a75ca4abe6e262651532eb96c3b274c6f4a
2021-10-25 08:18:58 -07:00
364645cd9d [SR] Factor operator() implementation into separate function (#67125)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67125

Using explicit template instantiations in D31659973 (f2582a59d0) was a bad idea. The problem is that the lvalue instantiation was for a `const` vector of `IValue`, meaning that if you tried to pass SR a non-const vector of arguments, the linker would fail to find the symbol.

The reason we didn't catch this in D31659973 (f2582a59d0) was because predictor always passes a `const` reference anyways. But we should fix this to prevent unexpected problems in the future.

Test Plan: `buck test caffe2/benchmarks/static_runtime/...`

Reviewed By: hlu1

Differential Revision: D31873406

fbshipit-source-id: 5ab5a03334bed925cec11facadcedf9bec9b90ad
2021-10-25 08:17:40 -07:00
edd4d246c3 Accept 0-dim channel inputs in convolution layer (#66256)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/56998 .

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66256

Reviewed By: mrshenli

Differential Revision: D31859428

Pulled By: jbschlosser

fbshipit-source-id: 034b6c1ce35aac50eabfa09bbcd8b1e3c8b171bd
2021-10-25 08:12:29 -07:00
6c985b57ff OpInfo : nn.functional.embedding (#66997)
Summary:
Adds OpInfo for `nn.functional.embedding`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66997

Reviewed By: mrshenli

Differential Revision: D31859799

Pulled By: zou3519

fbshipit-source-id: bbca860df4fbc243751f5fa81658231866c31d2e
2021-10-25 08:06:32 -07:00
adc21f1966 [quant] Fix docs build (#67169)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67169

Looks like the doc error only appears after it's landed

Test Plan: Imported from OSS

Reviewed By: seemethere

Differential Revision: D31890431

fbshipit-source-id: d40cba082712c4b35704ea15d82fbc4749f85aec
2021-10-25 08:02:26 -07:00
dd81fa9027 [JIT] Freeze allows preservation of submodule attributes (#66102)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66102

This changes allows the `preserved_attributes` parameter of `torch.jit.freeze` to accept attributes of submodules. Previously, only root-level attributes were able to be preserved. Example:

```
class SubModule(nn.Module):
    def __init__(self):
        super(SubModule, self).__init__()
        self.a = 1
        self.b = 2

    def forward(self):
        return self.a + self.b

class Module(nn.Module):
    def __init__(self):
        super(Module, self).__init__()
        self.sub = SubModule()

    def forward(self):
        return self.sub()

mod = torch.jit.script(Module())
mod.eval()
frozen_mod = torch.jit.freeze(mod, preserved_attrs = ['sub.a'])

frozen_mod.sub   # OK
frozen_mod.sub.a # OK
frozen_mod.sub.b # Error, not preserved
frozen_mod()     # = 3
frozen_mod.sub.a = 0
frozen_mod()     # = 2
```

Test Plan: `buck test caffe2/test:jit -- TestFreezing`

Reviewed By: eellison

Differential Revision: D31383868

fbshipit-source-id: 34a05ca9528d4e5f04f71ac2a339fd584a8fa305
2021-10-25 07:56:20 -07:00
09c7771e9c Set test owners for jit tests (#66808)
Summary:
Action following https://github.com/pytorch/pytorch/issues/66232

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66808

Reviewed By: mrshenli

Differential Revision: D31761414

Pulled By: janeyx99

fbshipit-source-id: baf8c49ff9c4bcda7b0ea0f6aafd26380586e72d
2021-10-25 07:51:10 -07:00
364c4959c3 [quant] Fix docs error in convert_fx (#67152)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67152

Test Plan:
```
cd docs
make html
```

Imported from OSS

Reviewed By: supriyar

Differential Revision: D31884570

fbshipit-source-id: 2b521f617c93f6fa08da3387df2d25497293eee6
2021-10-24 19:26:45 -07:00
a7ebf76a15 jit trace (#59949)
Summary:

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59949

Reviewed By: ZolotukhinM

Differential Revision: D31366787

Pulled By: Krovatkin

fbshipit-source-id: 798cbcd97e8ecfba984f98cd70214954be9309af
2021-10-24 18:04:22 -07:00
f1b5f1898b Automated submodule update: kineto (#67133)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/kineto](https://github.com/pytorch/kineto).

New submodule commit: 879a203d9b

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67133

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: mrshenli

Differential Revision: D31877172

fbshipit-source-id: 224a499607d1f3bf7c00d8d8dd1fdac47cd33a3b
2021-10-24 13:06:19 -07:00
b51731527d [ez] [Docs] Missing import in example for post_local_sgd (#67047)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67047

Fix missing import
ghstack-source-id: 141258423

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D31841837

fbshipit-source-id: 139e614517dcac7a53259ff7a0360bb5275bb53b
2021-10-24 01:44:06 -07:00
0000c88e10 [FSDP] No need for list() in _get_shard (#66957)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66957

chunk appears to return a tuple which is enough given that we just
index to the right chunk and discard the rest.
ghstack-source-id: 141391149

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D31780799

fbshipit-source-id: fdb1b77fffa916328e14a4cd692b5241ae46a514
2021-10-24 01:29:19 -07:00
580efb35a5 [FSDP] Add some comments after reading the code. (#66956)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66956

Adds some comments I found helpful while ramping up on FSDP code.
ghstack-source-id: 141391150

Test Plan: n/a

Reviewed By: mrshenli

Differential Revision: D31780798

fbshipit-source-id: e2d38a9801b4548b202a73615774d5f0f7f5e3ed
2021-10-24 01:28:19 -07:00
b6fa998892 Revert D31514095: Use kernel_func_name from aotCompiler
Test Plan: revert-hammer

Differential Revision:
D31514095 (7b55dc8340)

Original commit changeset: b70c8e2c7336

fbshipit-source-id: ad4d828f33506e612b51c276149fa0e12b0565d5
2021-10-23 17:17:53 -07:00
313939c9c6 [quant] Fix lint errors (#67138)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67138

Test Plan:
ossci

Imported from OSS

Reviewed By: supriyar

Differential Revision: D31879558

fbshipit-source-id: 271905d3d254c906aa78bae9f2bd411f9d57e1e8
2021-10-23 11:26:25 -07:00
7b55dc8340 Use kernel_func_name from aotCompiler (#66337)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66337

Right now, the assembly code generated for a given method from the model is named `wrapper` or `func` by default. The function name is then replaced with a proper kernel_func_name after the target-specific assembly is generated.
This PR propagates the desired kernel_func_name from the aotCompiler API so that the generated function gets the needed name directly and doesn't have to be renamed later.

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D31514095

Pulled By: priyaramani

fbshipit-source-id: b70c8e2c733600a435cd4e8b32092d37b7bf7de5
2021-10-23 02:20:45 -07:00
64c68edaf3 [pt] Add Half precision support for bucketize and searchsorted op (#67077)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67077

Test Plan: CI

Reviewed By: yinghai

Differential Revision: D31852556

fbshipit-source-id: 1e4212146ee67edc6b6568d25db79de525782788
2021-10-22 23:37:37 -07:00
2d81d5ab0a [quant][graphmode][fx] Remove fbgemm_backend_config_dict for now (#67066)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67066

We'll add it later when the api is ready

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D31849079

fbshipit-source-id: 0c00d08510166b2d897cf1562c7276527319b05c
2021-10-22 21:57:56 -07:00
8460fa5707 [quant][fx] Add an option in convert_fx to accept qconfig_dict to skip quantization (#66878)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66878

Currently convert_fx quantizes all layers that have been prepared, depending on the prepare qconfig_dict.
This PR adds support for a variation of qconfig_dict in convert_fx that can be used to skip quantizing certain layers.

This makes it possible to prepare/observe all operators once and then quantize only a subset of them (e.g., based on quantization error), avoiding multiple prepare passes.

The qconfig_dict passed to convert_fx can only have its values set to `None`, with the keys being the same as those allowed in the prepare qconfig_dict.
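
A sketch of the intended usage (the model, the calibration step, and the exact keyword name for the convert-time dict are assumptions here, not confirmed by this diff):
```
import torch
from torch.quantization import get_default_qconfig
from torch.quantization.quantize_fx import prepare_fx, convert_fx

class MyModel(torch.nn.Module):  # hypothetical two-layer model
    def __init__(self):
        super().__init__()
        self.linear1 = torch.nn.Linear(8, 8)
        self.linear2 = torch.nn.Linear(8, 8)

    def forward(self, x):
        return self.linear2(self.linear1(x))

model = MyModel().eval()
prepared = prepare_fx(model, {"": get_default_qconfig("fbgemm")})
prepared(torch.randn(2, 8))  # calibrate

# Convert-time dict: same keys as the prepare dict, values must be
# None; entries mark layers to skip quantizing.
quantized = convert_fx(prepared, qconfig_dict={"module_name": [("linear2", None)]})
```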

Test Plan:
python test/test_quantization.py TestQuantizeFx.test_convert_qconfig_dict

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D31808247

fbshipit-source-id: a4f5dca1090f0083fc3fea14aff56924033eb24f
2021-10-22 21:18:15 -07:00
d13829e6be [quant][[fx] update observer_fqn to not depend on node.name (#66767)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66767

Make observer fqn in prepare step independent of input_node/observed_node name.
This change names the observers as `{input/output}_activation_post_process_{idx}` where idx will be incremented for each new observer instance and is guaranteed to be unique.

Test Plan:
python test/test_quantization.py test_observer_fqn

Imported from OSS

Reviewed By: anjali411

Differential Revision: D31752052

fbshipit-source-id: e0995b1ef33a99d5b012133fe92d303d55a73b7d
2021-10-22 21:16:24 -07:00
83f70db95c Fix common device computation for comparison ops. (#66245)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66245

Fixes #66053

This PR splits `declare_static_dtype_and_device` into two new methods for
`TensorIteratorBase`: `declare_static_dtype` and `declare_static_device`.

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D31503849

Pulled By: ngimel

fbshipit-source-id: 4b131b691d29ceb5f3709f5d6503997ea0875c54
2021-10-22 18:43:17 -07:00
3f5adf4f9c [quant][graphmode][fx] Use the new convert function instead of the old one in quant-fx2trt tests (#67065)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67065

Switching to use _convert_fx_do_not_use in the tests

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D31849077

fbshipit-source-id: 3688fc09ac538b6abc16ce87c600b8ee04acfcd1
2021-10-22 18:29:58 -07:00
af1a2df825 enable better depthwise conv perf on cudnn 8.2+ (#58749)
Summary:
There have been multiple improvements to depthwise convolution speed in cuDNN between 7.6 and 8.2, i.e. since https://github.com/pytorch/pytorch/pull/22302.
This PR aims to harvest all of the new improvements by enabling more cuDNN kernels. The workload-checking logic can also be simplified now.
To keep the change simple, I kept behavior before cuDNN 8.2 unchanged.

Similar to https://github.com/pytorch/pytorch/pull/22302, I used a script [here](https://gist.github.com/FDecaYed/e8ba98a95cd33697df2ace86fdb44897) to benchmark. Both runs use cuDNN 8.2.
One enhancement I made to the script is switching to event-based timing. With warmup kernels filling the launch queue ahead of the timed region, this should give us accurate kernel timing even in CPU-launch-bound cases.
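
A sketch of that timing pattern (depthwise conv shown; sizes are arbitrary):
```
import torch

conv = torch.nn.Conv2d(64, 64, 3, groups=64).cuda()  # depthwise
x = torch.randn(32, 64, 56, 56, device="cuda")

# Warmup: queue kernels ahead so CPU launch overhead doesn't leak
# into the measurement.
for _ in range(10):
    conv(x)

start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)
start.record()
for _ in range(100):
    conv(x)
end.record()
torch.cuda.synchronize()
print(f"{start.elapsed_time(end) / 100:.3f} ms/iter")
```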

Here is A100 and V100 result sorted by speedup.
[Book1.xlsx](https://github.com/pytorch/pytorch/files/6530371/Book1.xlsx)

Result highlights:
The newly enabled 5x5 cuDNN kernels show up to a 6x speedup.
Close to half of the tested sizes show a >10% speedup.
Some corner cases that previously caused a 15-20x slowdown are fixed.
Only a handful of cases (~10 out of >1000) slow down.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58749

Reviewed By: bdhirsh

Differential Revision: D31613199

Pulled By: ngimel

fbshipit-source-id: 883b58facad67ccd51dc9ab539368b4738d40398
2021-10-22 17:47:07 -07:00
cf3a5160f8 [BE] move init_multigpu_helper to common_distributed (#67050)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67050

This PR moves init_multi_gpu_helper to common_distributed so that it could be shared by different distributed tests.
ghstack-source-id: 141370119

Test Plan: wait for ci.

Reviewed By: mrshenli

Differential Revision: D31842644

fbshipit-source-id: c7bad25d6cef9bdce7ad1fb6c60c1cad4b765702
2021-10-22 17:16:11 -07:00
df3f82a1ef Add more FSDP unit tests to cover core logic, freezing weights and flatten parameter wrapper (#66904)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66904

Add more FSDP unit tests to cover core logic, freezing weights, and the flatten parameter wrapper. These unit tests are refactored to align with PyTorch's commonly used test classes.
ghstack-source-id: 141335614

Test Plan: unit tests

Reviewed By: mrshenli

Differential Revision: D31779565

fbshipit-source-id: c727110d1d7570c0ec49e42cadfc9e9a5e440073
2021-10-22 16:50:52 -07:00
f6c88fa99d Revert D31627107: [BE] delete frontend.cpp
Test Plan: revert-hammer

Differential Revision:
D31627107

Original commit changeset: 07d30d280c25

fbshipit-source-id: 5e82f2158f5007c67adb8f947f8cc4d995a9a3bc
2021-10-22 16:39:02 -07:00
f50bf16c04 Revert D31663043: [BE] minor improvement to dist quantization
Test Plan: revert-hammer

Differential Revision:
D31663043

Original commit changeset: 2f96b7346e9c

fbshipit-source-id: d38684dfe79ca335fbbe624496ad4c86c29d1570
2021-10-22 16:37:41 -07:00
7b0408684b Fix linter (#67122)
Summary:
Fixes regression introduced by 7e5aa0d35a

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67122

Reviewed By: seemethere

Differential Revision: D31872569

Pulled By: malfet

fbshipit-source-id: ada0137db9a46cbec573489c9c37a94f3a7576ae
2021-10-22 16:02:36 -07:00
018e06edca [torchelastic] Skip tests in tsan mode (#67103)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67103

Skip tests in tsan mode for now. More info: T104010063

Test Plan: sandcastle + running tests in mode/dev-tsan

Reviewed By: d4l3k

Differential Revision: D31861426

fbshipit-source-id: d50e5d06afbc82ccce6d102e52f72b5b01f6f41a
2021-10-22 15:55:18 -07:00
7e5aa0d35a fixed unique arguments documentation (#66132)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66132

Differential Revision: [D31397746](https://our.intern.facebook.com/intern/diff/D31397746/)

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D31734476

Pulled By: samdow

fbshipit-source-id: 8999443c7f9b24394d7543652b8350261c1f8b3a
2021-10-22 14:50:02 -07:00
a7bbf8814c [quant][graphmode][fx] Move quant-fx2trt unittests to test_quantize_fx.py (#67064)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67064

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D31849075

fbshipit-source-id: 9c5e8aad7c88070830d853faf3106491726e77ff
2021-10-22 14:36:36 -07:00
7379d4db20 [BE] minor improvement to dist quantization (#66649)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66649

some minor changes to dist quantization, mainly change the namespace and add some notes for future code dedup
ghstack-source-id: 141336191

Test Plan: wait for ci

Reviewed By: cbalioglu

Differential Revision: D31663043

fbshipit-source-id: 2f96b7346e9c90df5ab2536767f8301eb86a9c79
2021-10-22 13:46:28 -07:00
1da628bdb7 [ONNX] Update slice process shape to support rank only inference (#65782) (#66149)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66149

The updated logic can infer the rank of the slice output when only the rank of the slice input is known. This enables cases where `ConstantValueMap::HasRank(input)` is `True` while `ConstantValueMap::HasShape(input)` is `False`.

Test Plan: Imported from OSS

Reviewed By: jansel

Differential Revision: D31423840

Pulled By: malfet

fbshipit-source-id: 17b2b24aa63435d5212ebe6bdf66ae3c348c4e3b

Co-authored-by: BowenBao <bowbao@microsoft.com>
2021-10-22 13:46:26 -07:00
0bc9928f31 [ONNX] Symbolic: dynamic input for OneHot, bool for Einsum (#65940) (#66147)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66147

Symbolic: dynamic input for OneHot, bool for Einsum

Test Plan: Imported from OSS

Reviewed By: jansel

Differential Revision: D31424094

fbshipit-source-id: 76bea22b29c93d1621c597fe7ab59deb3685087f

Co-authored-by: jiafatom <jiafa@microsoft.com>
2021-10-22 13:46:24 -07:00
2c0fe338da [ONNX] Modify softplus symbolic to support beta!=1 (#65001) (#66146)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66146

* Modify softplus symbolic to support beta!=1

* Remove parse args

Test Plan: Imported from OSS

Reviewed By: jansel

Differential Revision: D31424096

fbshipit-source-id: 971af54a28141737ccb17510ada03b0651be2a63
2021-10-22 13:46:22 -07:00
6f3f302d9f [ONNX] Deprecate fold_if pass (#65697) (#66145)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66145

Deprecate fold_if pass

Test Plan: Imported from OSS

Reviewed By: jansel

Differential Revision: D31424097

fbshipit-source-id: 25b89679c756393a1065ca6aaa24d29db960cbd4

Co-authored-by: jiafatom <jiafa@microsoft.com>
2021-10-22 13:46:20 -07:00
a0fc14c20f [ONNX] Add diagonal symbolic (#64454) (#66144)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66144

* Add logic and tests

* minor edits

* Eliminate expand ops

* Fix flake and editing

* Modified errant message

* Add overrun check

* Add overrun descriptions

* Remove emptyline

Test Plan: Imported from OSS

Reviewed By: jansel

Differential Revision: D31424095

fbshipit-source-id: 5b8ef6ac21c32d43c3dbc8e51e1ef30bffb19c25
2021-10-22 13:46:18 -07:00
b18c298f24 ONNX: Delete or document skipped ORT tests (#64470) (#66143)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66143

Delete test_list_remove. There's no point in testing conversion of
this model since TorchScript doesn't support it.

Add a link to an issue tracking test_embedding_bag_dynamic_input.

[ONNX] fix docs (#65379)

Mainly fix the sphinx build by inserting empty lines before bulleted lists.

Also some minor improvements:
Remove superfluous descriptions of deprecated and ignored args.
The user doesn't need to know anything other than that they are
deprecated and ignored.

Fix custom_opsets description.

Make indentation of Raises section consistent with Args section.

[ONNX] publicize func for discovering unconvertible ops (#65285)

* [ONNX] Provide public function to discover all unconvertible ATen ops

This can be more productive than finding and fixing a single issue at a
time.

* [ONNX] Reorganize test_utility_funs

Move common functionality into a base class that doesn't define any
tests.

Add a new test for opset-independent tests. This lets us avoid running
the tests repeatedly for each opset.

Use simple inheritance rather than the `type()` built-in. It's more
readable.

* [ONNX] Use TestCase assertions rather than `assert`

This provides better error messages.

* [ONNX] Use double quotes consistently.

[ONNX] Fix code block formatting in doc (#65421)

Test Plan: Imported from OSS

Reviewed By: jansel

Differential Revision: D31424093

fbshipit-source-id: 4ced841cc546db8548dede60b54b07df9bb4e36e
2021-10-22 13:46:16 -07:00
7a78f715a6 [ONNX] Add warning for inplace updates on tensor.shape in tracing mode (#63170) (#66142)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66142

* Add warning

* Lint and clang fixes

* Remove duplicate comments

* Added pitfalls section

* Modify sections

* Minor modifications

* Add underline to avoid doc build failures

Test Plan: Imported from OSS

Reviewed By: jansel

Differential Revision: D31424092

fbshipit-source-id: c83195f3c66885ad1aecde13b3029c45dd171dbd
2021-10-22 13:46:14 -07:00
136abf5aff [ONNX] Update sum symbolic to handle dtypes (#64289) (#66141)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66141

* Update aten::sum symbolic for dtype

* Remove nesting and modify opeartor tests

* Fix expect files

[ONNX] Fix expect files added in #64289 (#65356)

Test Plan: Imported from OSS

Reviewed By: jansel

Differential Revision: D31424091

fbshipit-source-id: d4af21e9f0d7e1c68bf6ef2f3e385db84b4c53f3
2021-10-22 13:46:12 -07:00
53a163a015 [ONNX] Export nn.Module call as ONNX local function (#63589) (#66140)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66140

* Add new argument to export api to enable users specifying `nn.Module` classes that they wish to be exported as local function in ONNX model.
* Refactor `torch/csrc/jit/serialization/export.cpp`, and remove redundant `EncoderBase` class.
* ~~Contains changes from #63268~~
* Depends on #63716 to update onnx submodule.

Test Plan: Imported from OSS

Reviewed By: jansel

Differential Revision: D31424098

fbshipit-source-id: c949d0b01c206c30b4182c2dd1a5b90e32b7a0d3

Co-authored-by: BowenBao <bowbao@microsoft.com>
2021-10-22 13:44:56 -07:00
d1986a1cf5 [BE] delete frontend.cpp (#66581)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66581

c10d/frontend.cpp was originally proposed to introduce a pure C++ API and use TorchBind to share the Python-level API with TorchScript. This is no longer needed, so delete it to reduce code redundancy.
ghstack-source-id: 141336190

Test Plan: wait for ci

Reviewed By: rohan-varma

Differential Revision: D31627107

fbshipit-source-id: 07d30d280c25502a222a74c2c65dfa4069ed8713
2021-10-22 13:33:24 -07:00
e8742f15cf [quant][graphmode][fx] Add observation_type.py (#67063)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67063

Adding ObservationType Enum for `backend_config_dict`

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D31849078

fbshipit-source-id: e9e7225d564b51fa9454f7f087dd134152c069a0
2021-10-22 12:17:54 -07:00
f2582a59d0 [SR] Add rvalue overload for operator() (#66648)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66648

Currently, SR shallow-copies its `IValue` inputs when running inferences. We can avoid refcount bumps by `std::move`-ing the inputs into their slots. To achieve this, I've made the following changes:

1. Add an overload for `set_inputs` that takes a `std::vector<IValue>&&`.
2. Change the signatures of `StaticModule::operator()` and `StaticRuntime::operator()`.
Old:
```
operator()(const std::vector<IValue>& args, const std::unordered_map<std::string, IValue>& kwargs)
```
New:
```
template <class IValueList>
operator()(IValueList&& args, const std::unordered_map<std::string, IValue>& kwargs)
```

The implementations use perfect forwarding to invoke the correct overload of `set_inputs`.

Test Plan: Added a short new unit test to exercise the new code path. All other unit tests still pass.

Reviewed By: hlu1

Differential Revision: D31659973

fbshipit-source-id: b8c194405b54a5af1b418f8edaa1dd29a061deed
2021-10-22 10:51:47 -07:00
40a8a50913 Add static_runtime::fused_equally_split (#2)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch-canary/pull/2

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66881

Adds `static_runtime::fused_equally_split` operator and removes `is_fused` logic from original operator. Modifies `FuseUnpackListV2` to map `fb::equally_split` to this new operator.

Test Plan:
```
adityapillai@5960 /data/sandcastle/boxes/fbsource/fbcode 1m 13s
❯ buck test //caffe2/benchmarks/static_runtime/fb:test_fb_operators
```
and sandcastle
strange_what_could_go_wrong

Reviewed By: mikeiovine

Differential Revision: D31742293

fbshipit-source-id: 60b35589c8817719b005d49811f575b6590d1c39
2021-10-22 10:26:49 -07:00
391eb1dbe3 [JIT] UseVariadicOp handles multiple lists (#66288)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66288

This change makes it so `UseVariadicOp` can transform ops with many Tensor list inputs.

Input pattern:
```
%output : Type = op(%list_1, %arg_1, %list_2, %list_3)
```
Output pattern:
```
%output : Type = variadic_op(%list_11, ..., %list_1N, %arg_1, %list_21, ..., %list_2M, %list_31, ..., %list_3K, N, M, K)
```
The length of each list is passed at the end of the variadic op so that the op implementation can process the inputs appropriately. This also frees us from needing to update `hasVarArgs` in static runtime each time we add a variadic op.

This diff also makes `UseVariadicOp` more robust. Before, `list_idx` was passed as an argument. Now, `VariadicUpdater` determines `list_idx` from the node's schema.

Test Plan:
Existing variadic ops do not break:
`buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest`

Reviewed By: d1jang

Differential Revision: D31450811

fbshipit-source-id: 808fcc3ae8940b9e602586f38f8cf9154c9a6462
2021-10-22 10:22:33 -07:00
c7121ae77f fix formatting CIRCLE_TAG when building docs (#67026)
Summary:
Similar to pytorch/text#1416
malfet, brianjo

The previous code failed when tags changed from `v0.9.0` to `v0.10.0`. I tested this offline; it would be nice to somehow actually tag the repo and see that this adds the correct documentation directory to the pytorch/pytorch.github.io repo.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67026

Reviewed By: saketh-are

Differential Revision: D31843381

Pulled By: malfet

fbshipit-source-id: 21526ad9ed4c1751c2d7f6d621da305f166a7f55
2021-10-22 10:10:52 -07:00
d9c4b3feab Do rowwisemoments computation in float for half LayerNorm (#66920)
Summary:
https://github.com/pytorch/pytorch/issues/66707
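
In effect, row-wise mean/variance are computed in float32 for half inputs. A minimal numerical sketch of that pattern (not the actual CUDA kernel):
```
import torch

def layer_norm_fp32_moments(x, eps=1e-5):
    # Accumulate moments in float32 even when x is float16,
    # then cast the normalized result back.
    xf = x.float()
    mean = xf.mean(dim=-1, keepdim=True)
    var = xf.var(dim=-1, unbiased=False, keepdim=True)
    return ((xf - mean) / (var + eps).sqrt()).to(x.dtype)

x = torch.randn(4, 1024, dtype=torch.float16)
y = layer_norm_fp32_moments(x)
```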

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66920

Reviewed By: mrshenli

Differential Revision: D31850612

Pulled By: ngimel

fbshipit-source-id: a95a33567285dcf9ee28d33f503cead3268960f9
2021-10-22 09:50:42 -07:00
6e6ede2e70 [JIT] Re-enable alias sensitive peepholes (#65860)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65860

Re-enable peepholes like `x + 0 == x`. These were at one point enabled, then disabled because they did not properly account for aliasing, and then re-enabled by reconstructing the alias db every time, which is slow: O(n^2). I've added correctness conditions, and I've also made it so that we avoid using stale aliasing properties for either the input or output of nodes we optimize.
Some of the other code that we have written to avoid re-instantiating the alias db involves internally mutating it, however this is tricky to reason about and we probably have to add some extra invariants...

cc navahgar relevant to graph opts and d1jang alias analysis relevant here

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D31352382

Pulled By: eellison

fbshipit-source-id: 441a27f17dc623d6c24538d1d43cba0412c3c482
2021-10-22 09:45:57 -07:00
051ea5ccbf [Static Runtime] Bundle function & function_kind to carry them together (#66974)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66974

`D31591785 (67e003f09b)` started carrying a function object to be executed and `FunctionKind` for the type of the function *separately*, and this caused a bug fixed by D31783028 (79803b199f).

This change bundles them together again, as swolchok originally did, to reduce the chances of such a mistake in the future. They always need to be carried together, since `FunctionKind` identifies the type of the function object.

Note that `struct Function` is a POD type, so accessing its field (first, second) shouldn't cause an extra overhead in `ProcessedNode::run()`.

Test Plan:
Confirmed that the managed memory metrics remain the same before/after this diff on inline_cvr:

```
#AFTER
# inline_cvr/local
Total number of managed tensors: 2660
Total number of managed output tensors: 0
Total number of unmanaged values: 3041
Total memory managed: 1496896 bytes
Total number of reused tensors: 1183
Total number of 'out' variant nodes/total number of nodes: 2452/2469 (99.3115%)
# inline_cvr/local_ro
Total number of managed tensors: 1412
Total number of managed output tensors: 0
Total number of unmanaged values: 2679
Total memory managed: 39040 bytes
Total number of reused tensors: 959
Total number of 'out' variant nodes/total number of nodes: 1928/1939 (99.4327%)
# inline_cvr/remote_ro
First iter time: 12.0344 ms
Total number of managed tensors: 1293
Total number of managed output tensors: 0
Total number of unmanaged values: 14
Total memory managed: 5293824 bytes
Total number of reused tensors: 771
Total number of 'out' variant nodes/total number of nodes: 1298/1298 (100%)
```

```
#BEFORE
#  inline_cvr/local
Total number of managed tensors: 2660
Total number of managed output tensors: 0
Total number of unmanaged values: 3041
Total memory managed: 1496896 bytes
Total number of reused tensors: 1183
Total number of 'out' variant nodes/total number of nodes: 2452/2469 (99.3115%)

#inline_cvr/local_ro
Total number of managed tensors: 1412
Total number of managed output tensors: 0
Total number of unmanaged values: 2679
Total memory managed: 39040 bytes
Total number of reused tensors: 959
Total number of 'out' variant nodes/total number of nodes: 1928/1939 (99.4327%)

#inline_cvr_remote_ro
Total number of managed tensors: 1293
Total number of managed output tensors: 0
Total number of unmanaged values: 14
Total memory managed: 5293824 bytes
Total number of reused tensors: 771
Total number of 'out' variant nodes/total number of nodes: 1298/1298 (100%)
```

Reviewed By: mikeiovine

Differential Revision: D31798419

fbshipit-source-id: fd4301b6731e402be0820729654735c791511aba
2021-10-22 08:57:49 -07:00
3d7a344c5e Fix ArchiveReader to keep archive path (#67035)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67035

Incorporate the same change from https://github.com/pytorch/data/pull/73

Test Plan: Imported from OSS

Reviewed By: NivekT

Differential Revision: D31837963

Pulled By: ejguan

fbshipit-source-id: 3b0171ba30f392c8773c497702bc60aa4fbe28c6
2021-10-22 06:34:39 -07:00
d1a5612a3e remove accscalar from i0 and i0e (#67048)
Summary:
Removes some of the half math ops to make https://github.com/pytorch/pytorch/issues/64023 possible.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67048

Reviewed By: mruberry

Differential Revision: D31847249

Pulled By: ngimel

fbshipit-source-id: 8385aacd846bb990e368ff336eb346d847af70b9
2021-10-22 01:34:36 -07:00
5f58764d1d [PyTorch Edge][type] Add type support for NamedTuple custom class (import) (#63130)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63130

Extend `type_parser` to handle `NamedTuple` type. It can be extended to handle other types when needed. The custom type will follow the following format:
```
"qualified_named[
    NamedTuple, [
        [filed_name_1, field_type_1],
        [filed_name_2, field_type_2]
    ]
]"
```
For example:
```
"__torch__.base_models.sparse_nn.pytorch_preproc_types.PreprocOutputType[
    NamedTuple, [
        [float_features, Tensor],
        [id_list_features, List[Tensor]],
        [label,  Tensor],
        [weight, Tensor],
        ]
    ]"
```

For nested types, the order of type lists from type table should be:
```
std::string type_1 = “__torch__.C [
    NamedTuple, [
        [field_name_c_1, Tensor],
        [field_name_c_2, Tuple[Tensor, Tensor]],
    ]
]”

std::string type_2 = “__torch__.B [
   NamedTuple, [
       [field_name_b, __torch__.C ]
   ]
]”

std::string type_3 = “__torch__.A[
   NamedTuple, [
       [field_name_a, __torch__.B]
   ]
]”
std::vector<std::string> type_strs = {type_str_1, type_str_2, type_3};
std::vector<TypePtr> type_ptrs =  c10::parseType(type_strs);
```

namedtuple from both `collections` and `typing` is supported:
```

from typing import NamedTuple
from collections import namedtuple
```

This change only adds the parser; with it, the new runtime can read the above format.
ghstack-source-id: 141293658

Test Plan:
```
buck test mode/dev //caffe2/test/cpp/jit:jit -- --exact 'caffe2/test/cpp/jit:jit - LiteInterpreterTest.CompatiblePrimitiveType'
buck test mode/dev //caffe2/test/cpp/jit:jit -- --exact 'caffe2/test/cpp/jit:jit - LiteInterpreterTest.CompatibleCustomType'

buck test mode/dev //caffe2/test/cpp/jit:jit -- --exact 'caffe2/test/cpp/jit:jit - LiteInterpreterTest.InCompatiblePrimitiveType'
buck test mode/dev //caffe2/test/cpp/jit:jit -- --exact 'caffe2/test/cpp/jit:jit - LiteInterpreterTest.InCompatibleCustomType'
```

Reviewed By: iseeyuan

Differential Revision: D30261547

fbshipit-source-id: 68a9974338464e320b39a5c613dc048f6c5adeb5
2021-10-22 00:40:57 -07:00
d3fc3c4ded Implement forward AD for linalg.matrix_exp (#62716)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62716

cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano
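
A minimal usage sketch of the new forward-mode rule via the dual-number API:
```
import torch
import torch.autograd.forward_ad as fwAD

A = torch.randn(3, 3)
T = torch.randn(3, 3)  # tangent: direction of the perturbation

with fwAD.dual_level():
    dual_A = fwAD.make_dual(A, T)
    dual_out = torch.linalg.matrix_exp(dual_A)
    primal, tangent = fwAD.unpack_dual(dual_out)
# `tangent` is the directional derivative of matrix_exp at A along T.
```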

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D31823231

Pulled By: mruberry

fbshipit-source-id: 6d19b8988dce773b5716f0522d06febfe167fead
2021-10-21 23:55:36 -07:00
fe102b9888 diff tool (#66854)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66854

diff tool and script to test correctness of flatbuffer format

Test Plan:
`./verify_flatbuffer.sh | pastry`
P463163180

Reviewed By: zhxchen17

Differential Revision: D31752696

fbshipit-source-id: bea00102b21e62c02367853c8bec2742b483fbda
2021-10-21 22:53:51 -07:00
8ea985f240 [quant][fx][graphmode] Rename files and functions for convert and add do_not_use suffix (#66955)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66955

The new convert function is not meant to be used by users; it's a temporary function that
we use to build up the new convert path. We will bring it to feature parity with the old path
and deprecate the old path after that.

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D31810488

fbshipit-source-id: 2f65a110506683123350e619c48df090a15570fc
2021-10-21 22:17:28 -07:00
01ced45217 [iOS] Bump up iOS CocoaPods version to 1.10.0 (#67058)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67058

Test Plan: Imported from OSS

Reviewed By: xta0

Differential Revision: D31846445

Pulled By: hanton

fbshipit-source-id: 7510a6c15fdeecc996fcce5c48db32e148ba7def
2021-10-21 21:30:24 -07:00
77beccaedb Do not build PyTorch with caffe2 by default (#66658)
Summary:
Caffe2 has been deprecated for a while, but is still included in every PyTorch build.
We should stop building it by default, although CI should still validate that caffe2 code is buildable.

Build even fewer dependencies when compiling mobile builds without Caffe2
Introduce `TEST_CAFFE2` in torch.common.utils
Skip `TestQuantizedEmbeddingOps` and `TestJit.test_old_models_bc` if code is compiled without Caffe2
Should be landed after https://github.com/pytorch/builder/pull/864

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66658

Reviewed By: driazati, seemethere, janeyx99

Differential Revision: D31669156

Pulled By: malfet

fbshipit-source-id: 1cc45e2d402daf913a4685eb9f841cc3863e458d
2021-10-21 20:32:47 -07:00
4fe8055b9f made functorch not decompose by default (#66945)
Summary:
Basically reverting this: https://github.com/pytorch/pytorch/pull/63616

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66945

Reviewed By: zou3519

Differential Revision: D31802176

Pulled By: Chillee

fbshipit-source-id: b1cabd7af66aef26411801516c87336eaea4fccb
2021-10-21 19:18:00 -07:00
28fac23409 Fixes CUDA vs CPU consistency for index_put_ when accumulating (#66790)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/39227
Fixes https://github.com/pytorch/pytorch/issues/66495 (duplicate of 39227)

Description:
- Expands values for CUDA implementation
- Improved shapes checking for CUDA
- Improved error message for CUDA
- Added tests

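For reference, a small self-contained example of the accumulate path whose CPU and CUDA results this PR brings into agreement (not the exact repro from the linked issues):
```
import torch

t = torch.zeros(3)
indices = (torch.tensor([0, 0, 1]),)
values = torch.tensor([1., 2., 3.])
# duplicate index 0 accumulates 1 + 2; CPU and CUDA should now agree
t.index_put_(indices, values, accumulate=True)
print(t)  # tensor([3., 3., 0.])
```
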
cc zou3519

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66790

Reviewed By: mruberry

Differential Revision: D31843566

Pulled By: ngimel

fbshipit-source-id: c9e5d12a33e1067619c210174ba6e3cd66d5718b
2021-10-21 19:09:57 -07:00
35965869cf Enroll bowangbj@ to PyTorch distributed package (#67062)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67062

For cc and potential reviews

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D31849050

fbshipit-source-id: d3899c2ca857b8f22bdc88b4e83cdd20bbf0b1d6
2021-10-21 18:45:21 -07:00
20f08d23a0 Revert D31838513: Strided masked reduction: mean.
Test Plan: revert-hammer

Differential Revision:
D31838513

Original commit changeset: 54b99ccf9821

fbshipit-source-id: 5480e8482c8770b41579ee085e158572b659c1f5
2021-10-21 18:32:42 -07:00
2578de4851 [skip ci] Set test owner for test_cuda* tests (#66838)
Summary:
Action following https://github.com/pytorch/pytorch/issues/66232

cc ngimel

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66838

Reviewed By: saketh-are

Differential Revision: D31841411

Pulled By: janeyx99

fbshipit-source-id: 5cdffdef4a92f9adcef1143ae4598b052c5acc6b
2021-10-21 17:36:25 -07:00
b40a940192 Strided masked reduction: mean. (#66784)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66784

cc nikitaved pearu cpuhrsch

Test Plan: Imported from OSS

Reviewed By: saketh-are

Differential Revision: D31838513

Pulled By: cpuhrsch

fbshipit-source-id: 54b99ccf9821832c31976406379939b3c95f41de
2021-10-21 16:32:45 -07:00
b696d64ef4 Binaries without AVX512 kernels shouldn't report CPU Capability as AVX512 on machines with AVX512 support (#66703)
Summary:
### BUG
If a PyTorch binary is built with a compiler that doesn't support all the AVX512 intrinsics in the codebase, then it won't have ATen AVX512 kernels, but at runtime, CPU capability would still be incorrectly returned as AVX512 on a machine that supports AVX512. It seems that PyTorch Linux releases are done on CentOS with `gcc 7.3`, so this bug would manifest in the 1.10 release, unless a fix such as this one is added. gcc versions below 9.0 don't support all the AVX512 intrinsics in the codebase, such as `_mm512_set_epi16`.

### FIX
CPU Capability would be returned as AVX512 at runtime only if the binary was built with a compiler that supports all the AVX512 intrinsics in the codebase, and if the hardware the binary is being run on supports all the required AVX512 instruction sets.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66703

Reviewed By: gchanan

Differential Revision: D31732625

Pulled By: malfet

fbshipit-source-id: e52d06b87fbe2af9b303a2e9c264189c8512d5ec
2021-10-21 16:17:28 -07:00
33790c4e06 Implement histogramdd on CPU (#65318)
Summary:
Implements `torch.histogramdd` analogous to `numpy.histogramdd`.

Builds on https://github.com/pytorch/pytorch/pull/58780, generalizing the existing `torch.histogram` kernel to handle D-dimensional inputs.

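A minimal usage example of the new op (mirroring `numpy.histogramdd`):
```
import torch

x = torch.randn(1000, 3, dtype=torch.float64)  # 1000 samples in 3-D
hist, bin_edges = torch.histogramdd(x, bins=[4, 5, 6])
print(hist.shape)                      # torch.Size([4, 5, 6])
print([e.numel() for e in bin_edges])  # [5, 6, 7]: one more edge than bins per dim
```
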
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65318

Reviewed By: soulitzer

Differential Revision: D31654555

Pulled By: saketh-are

fbshipit-source-id: 14b781fac0fd3698b052dbd6f0fda46e50d4c5f1
2021-10-21 16:09:31 -07:00
6a224b3370 Set test owners for quantization tests (#66832)
Summary:
Action following https://github.com/pytorch/pytorch/issues/66232

cc jerryzh168 jianyuh raghuramank100 jamesr66a vkuzo

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66832

Reviewed By: saketh-are

Differential Revision: D31842880

Pulled By: janeyx99

fbshipit-source-id: 8aee760e4203045c12e7548a21ed5b71c557e3ee
2021-10-21 16:04:41 -07:00
f29e5220a6 Revert D31474901: [pytorch][PR] [numpy] add torch.argwhere
Test Plan: revert-hammer

Differential Revision:
D31474901

Original commit changeset: 335327a4986f

fbshipit-source-id: 534093e459762ff7a888c58d76e49e362015f2ba
2021-10-21 15:50:54 -07:00
fcfa06586d Wextra fix for NamedTensor.cpp (#66897)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66897

Fixes:
```
stderr: caffe2/aten/src/ATen/native/NamedTensor.cpp:226:19: error: comparison of integers of different signs: 'const unsigned long' and 'int64_t' (aka 'long') [-Werror,-Wsign-compare]
    if (order_idx >= ellipsis_idx) {
        ~~~~~~~~~ ^  ~~~~~~~~~~~~
stderr: caffe2/aten/src/ATen/native/NamedTensor.cpp:226:19: error: comparison of integers of different signs: 'const unsigned long' and 'int64_t' (aka 'long') [-Werror,-Wsign-compare]
    if (order_idx >= ellipsis_idx) {
        ~~~~~~~~~ ^  ~~~~~~~~~~~~
```

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D31774623

fbshipit-source-id: b6e5b76695e512084ac5c9cb4215de7e9b763cf8
2021-10-21 14:22:38 -07:00
462f333c01 [numpy] add torch.argwhere (#64257)
Summary:
Adds `torch.argwhere` as an alias to `torch.nonzero`

Currently, `torch.nonzero` actually provides functionality equivalent to `np.argwhere`.

From NumPy docs,
> np.argwhere(a) is almost the same as np.transpose(np.nonzero(a)), but produces a result of the correct shape for a 0D array.

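A quick usage sketch of the alias this PR adds:
```
import torch

a = torch.tensor([[0, 1],
                  [2, 0]])
print(torch.argwhere(a))  # tensor([[0, 1], [1, 0]]): one row per nonzero element
print(torch.nonzero(a))   # identical output, as described above
```
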
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64257

Reviewed By: dagitses

Differential Revision: D31474901

Pulled By: saketh-are

fbshipit-source-id: 335327a4986fa327da74e1fb8624cc1e56959c70
2021-10-21 14:02:11 -07:00
892ac08a02 Do not generate not_implemented error for forward AD when input with tangent passed to non-differentiable function (#66926)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/61926

1. update the `if` to just use requires_derivative, since that should reflect when the function is not differentiable
2. if `requires_derivative=True` but no outputs have forward derivatives, we should error as usual
3. ~In the future we may also want to handle the case~ when `len(fw_derivatives) > 0 and len(fw_derivatives) < num_diff_outputs`, we should add an assert in codegen that this does not happen.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66926

Reviewed By: anjali411

Differential Revision: D31810736

Pulled By: soulitzer

fbshipit-source-id: 11a14477cc7554f576cff2ed1711a448a8c6a66a
2021-10-21 13:53:07 -07:00
062ae8df0e Automated submodule update: tensorpipe (#65353)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/tensorpipe](https://github.com/pytorch/tensorpipe).

New submodule commit: 183172ba8c

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65353

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: lw

Differential Revision: D31059779

fbshipit-source-id: 7bddff5139f8168750e22e1cc8c0d49931db542e
2021-10-21 13:30:45 -07:00
b07371f19c [skip ci] Set test owners for serialization tests (#66862)
Summary:
Action following https://github.com/pytorch/pytorch/issues/66232

cc mruberry

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66862

Reviewed By: saketh-are

Differential Revision: D31828615

Pulled By: janeyx99

fbshipit-source-id: 8d28970eead9d6f26e9ea64b823295d9c9e1469d
2021-10-21 13:22:18 -07:00
6f1ba16d6d [skip ci] Set test owners for cpp test (#66836)
Summary:
Action following https://github.com/pytorch/pytorch/issues/66232

cc yf225 glaringlee

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66836

Reviewed By: saketh-are

Differential Revision: D31828641

Pulled By: janeyx99

fbshipit-source-id: 076d41686746fecebc07452df8212eef15a7824c
2021-10-21 13:17:46 -07:00
00a871c5c9 [skip ci] Set test owner for multiprocessing tests (#66848)
Summary:
Action following https://github.com/pytorch/pytorch/issues/66232

cc VitalyFedyunin

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66848

Reviewed By: VitalyFedyunin

Differential Revision: D31828908

Pulled By: janeyx99

fbshipit-source-id: 45d6901648f5564c1bf07ad8d01d69ef486ae104
2021-10-21 13:13:53 -07:00
78f970568c Add dummy op to use instead of searchsorted (#66964)
Summary:
Would help unblock https://github.com/pytorch/pytorch/issues/66818 if this actually works

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66964

Reviewed By: mruberry

Differential Revision: D31817942

Pulled By: janeyx99

fbshipit-source-id: 9e9a2bcb0c0479ec7000ab8760a2e64bf0e85e95
2021-10-21 12:56:22 -07:00
94f4e9a995 Enable warning tests for nondeterministic backward functions (#66736)
Summary:
Followup from https://github.com/pytorch/pytorch/issues/66233

Since https://github.com/pytorch/pytorch/issues/50209 was fixed, we can enable these warning tests now

cc mruberry kurtamohler

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66736

Reviewed By: zou3519

Differential Revision: D31723385

Pulled By: mruberry

fbshipit-source-id: dc1922a6d0c45cc80020db85710e755a89113861
2021-10-21 12:51:53 -07:00
ce6f4b3a02 Setup c10d extension Backend class attr the same way as builtin ones (#66991)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66991

Currently, c10d extensions use Backend.NAME to store the creator
function. However, builtin ones use that same field to store the
name. This commit makes c10d extensions comply with builtin ones,
and uses a dedicated `_plugins` field to store creator functions.

Thanks bryanmr for pointing this out.

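A toy sketch of the registration path this commit touches (the creator function below is a placeholder; a real one must construct and return a ProcessGroup):
```
import torch.distributed as dist

def dummy_backend(store, rank, world_size, timeout):
    raise NotImplementedError("illustration only")

dist.Backend.register_backend("dummy", dummy_backend)
# after this change, Backend.DUMMY stores the name (like builtin backends)
# while the creator function lives in the dedicated Backend._plugins dict
print(dist.Backend.DUMMY)  # "dummy"
```
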
cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Test Plan: Imported from OSS

Reviewed By: rohan-varma

Differential Revision: D31820307

Pulled By: mrshenli

fbshipit-source-id: 259769ebfc80c0c9fc44d25498c8d19a3a09d1bc
2021-10-21 12:35:07 -07:00
40e5d31a52 Add OpInfo for torch.bincount (#65796)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65796

Reviewed By: bdhirsh

Differential Revision: D31386560

Pulled By: saketh-are

fbshipit-source-id: acb6ed3f743ddcccd0ff7ce1ab21f77c2078da37
2021-10-21 12:11:38 -07:00
9d4549295d ONNX export: propagate node metadata across passes (#45256)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/45255

Mostly straightforward. The only downside in this PR is the lack of a more scalable way to check for all newly created nodes in `callPySymbolicFunction`. The other options were:
* Create a scope within the node's scope and loop through all nodes that correspond to the scope. The code would still need to loop through all nodes.
* Add extra state to the graph (no good reason to do so).
* Add extra state to the ONNX exporter, since python calls go back to `g.op(...)` (no good reason to do so, also not very pythonic).

cc BowenBao neginraoof

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45256

Reviewed By: malfet, houseroad

Differential Revision: D31744281

Pulled By: msaroufim

fbshipit-source-id: 1b63f6e7f02ed61b3a9b7ac3d0be0a3a203c8ff6
2021-10-21 11:49:05 -07:00
a33f341cee [ci] try setting MAX_JOBS on windows builds to reduce OOMs (#66986)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66986

See: https://github.com/pytorch/pytorch/issues/66674

Test Plan: Imported from OSS

Reviewed By: seemethere, anjali411

Differential Revision: D31822578

Pulled By: suo

fbshipit-source-id: e24bbe9a1ff21ad0653708217cef5d8b2f56c5a2
2021-10-21 11:41:05 -07:00
53cf7e844f [SR] Fix bug in FuseListUnpackV2 (#67021)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67021

When applying the equally split optimization, we still need to delete the list unpack node.

I did an accuracy test yesterday but didn't catch this issue because my diffs were not properly synced between devservers (I use hlu1's devbig for testing and it had an old version of "Add FuseListUnpackV2"). But I did another test this morning and realized that there was an issue.

This is not affecting anything in prod right now since D31742293 has not landed.

Reviewed By: hlu1

Differential Revision: D31827278

fbshipit-source-id: c7b05e3d8ec942632adcff4bdfebb8c27c1a7a39
2021-10-21 11:08:04 -07:00
a7ec4b53d2 Splitter: Transformer_encoder (#66952)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66952

Added splitter to lower parts of the transformer model
Program now supports arg input

Test Plan:
Performance on non-lowered model:
0.19662559509277344
Performance on semi-lowered model:
0.19131642150878905

Reviewed By: 842974287

Differential Revision: D31541325

fbshipit-source-id: 194aba97afc794dbeada4bbc4777d0a7b02e3635
2021-10-21 10:59:08 -07:00
d73b88b473 Unsqueeze bug fix (#66889)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66889

Added support for negative dims and modified unit test.

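A plain-Python sketch of the usual negative-dim normalization such a fix needs (for unsqueeze, the valid range is [-(ndim + 1), ndim]):
```
def normalize_unsqueeze_dim(dim: int, input_ndim: int) -> int:
    assert -(input_ndim + 1) <= dim <= input_ndim
    return dim + input_ndim + 1 if dim < 0 else dim

assert normalize_unsqueeze_dim(-1, 3) == 3  # matches torch.unsqueeze(x, -1) on a 3-D x
assert normalize_unsqueeze_dim(0, 3) == 0
```
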
Test Plan: buck test mode/dev-nosan caffe2/test/fx2trt/converters:test_unsqueeze

Reviewed By: 842974287

Differential Revision: D31769393

fbshipit-source-id: 854335ead2ffad5f466ad66b9be36ba20a0fea67
2021-10-21 10:57:58 -07:00
23321ba7a3 Fix bug [#66780]: wrong input to torch.is_floating_point (#66783)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66783

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D31802971

Pulled By: cpuhrsch

fbshipit-source-id: 6a7d8b83dad219fd683504f9084b77358800507c
2021-10-21 09:50:58 -07:00
13b8599831 [skip ci] Set test owner for test_dispatch.py (#66840)
Summary:
Action following https://github.com/pytorch/pytorch/issues/66232

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66840

Reviewed By: saketh-are

Differential Revision: D31829224

Pulled By: janeyx99

fbshipit-source-id: 66aceacd4f976c36ed48ca5be59616d245ba2a82
2021-10-21 08:48:37 -07:00
8cbdf49dce [qnnpack] Remove conv_utils.h (#66605)
Summary:
This completes the removal of conv_utils and redistributes its dependencies

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66605

ghstack-source-id: 140565820

Test Plan: ci tests

Reviewed By: kimishpatel

Differential Revision: D31637731

fbshipit-source-id: 48d3a423e4ff0eb6ab21bb13bda44da16996423b
2021-10-21 08:23:42 -07:00
960e3216a4 [skip ci] Set test owner for named tensor tests (#66849)
Summary:
Action following https://github.com/pytorch/pytorch/issues/66232

cc zou3519

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66849

Reviewed By: zou3519

Differential Revision: D31828903

Pulled By: janeyx99

fbshipit-source-id: 30810bcec750ba8e1d5a342c31a5996bf57acd69
2021-10-21 08:22:26 -07:00
f5c5ab2868 [skip ci] Set test owner for cpp-extensions tests (#66837)
Summary:
Action following https://github.com/pytorch/pytorch/issues/66232

cc yf225 glaringlee zou3519

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66837

Reviewed By: anjali411

Differential Revision: D31828401

Pulled By: janeyx99

fbshipit-source-id: 35ac27f3e1c0eb70ccb38c07c42ba61bd0c848fe
2021-10-21 08:15:38 -07:00
32e790997b [Rocm]Reduce severity of detected possible memory leak from assertion to warning (#65973)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/62533.
In very rare cases, the decorator for detecting memory leaks throws an assertion even when the test is passing and the memory is being freed with a tiny delay. The issue is not reproducible in internal testing, but shows up sometimes in the CI environment.

Reducing the severity of such detection to a warning, so as not to fail the CI tests, since the actual test is not failing; only the check inside the decorator is.

Limiting the change to ROCM only for now.

cc jeffdaily sunway513 jithunnair-amd ROCmSupport

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65973

Reviewed By: anjali411

Differential Revision: D31776154

Pulled By: malfet

fbshipit-source-id: 432199fca17669648463c4177c62adb553cacefd
2021-10-21 07:10:54 -07:00
70a5113e03 [ROCm] update Magma for 4.3 release (#65203)
Summary:
Upstream magma fixes the cholesky issues.
Refer https://bitbucket.org/icl/magma/issues/48/parameter-4-was-incorrect-on-entry-to

Signed-off-by: Jagadish Krishnamoorthy <jagdish.krishna@gmail.com>

Fixes #{issue number}

cc jeffdaily sunway513 jithunnair-amd ROCmSupport

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65203

Reviewed By: anjali411

Differential Revision: D31766608

Pulled By: malfet

fbshipit-source-id: 3829b89314d25d8aa14be57ead879a811ab3f098
2021-10-21 07:06:01 -07:00
b6df043f1f Add torch.nn.init.uniform_ operator to ShardedTensor. (#63997)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63997

Use torch_function to extend torch.nn.init.uniform_
The init is done in SPMD fashion. Note that ideally we want to aggregate sharded tensors into a global tensor, init it, and reshard. It's fine to run it SPMD since uniform init is i.i.d. (independent and identically distributed).
Also enable the unit test in test_linear.py for the OSS test

Test Plan:
a) Unit Test
(pytorch) ... $ python test/distributed/_sharded_tensor/ops/test_init.py TestShardedTensorNNInit --v
(pytorch) ... $ python test/distributed/_sharded_tensor/ops/test_linear.py --v (before runs this command is no-op)

or b) Manual run: Instruction here: https://docs.google.com/document/d/1_m1Hdo5w51-hhPlZ_F8Y6PIWrN7UgJZqiSpARYvhsaE/edit#

Imported from OSS

Reviewed By: pritamdamania87, anjali411

Differential Revision: D30563017

fbshipit-source-id: d1859f7682235bcb44515efc69ca92bc5e34fce1
2021-10-21 00:17:13 -07:00
bdb889aca1 [nnc] Use a descriptive name for fused kernels when profiling (#66990)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66990

NNC fusion groups currently show up as "TensorExpr" in the profiler,
which is true but not super useful since it obscures what's actually happening
in the fusion group.  This change will log them as `fused_XXX` where XXX is a
(length-limited) series of ops describing the subgraph, for instance
`fused_mul_add` to represent a group containing `aten::mul`, `aten::add`.

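A minimal sketch of the effect (assuming the TensorExpr fuser kicks in, which on CPU may require enabling fusion first):
```
import torch

torch._C._jit_override_can_fuse_on_cpu(True)  # assumption: CPU fusion is off by default

@torch.jit.script
def f(a, b, c):
    return a * b + c

a, b, c = (torch.randn(1024) for _ in range(3))
for _ in range(10):
    f(a, b, c)  # warm-up so the profiling executor fuses the graph

with torch.autograd.profiler.profile() as prof:
    f(a, b, c)
# expect an entry like "fused_mul_add" instead of the opaque "TensorExpr"
print(prof.key_averages().table(sort_by="self_cpu_time_total"))
```
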
Test Plan: New unit test to check the output of autograd profiler.

Reviewed By: dzhulgakov

Differential Revision: D31762087

fbshipit-source-id: 3fadbdc67b054faa01aa42e5b6ea2c4a6bc3481f
2021-10-21 00:06:23 -07:00
8beabffac3 [PyTorchEdge] Make aten function common to aten and torch_common (#66663)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66663

fb: TensorCompare.cpp is in per-app, a target higher than torch_mobile

Please read this doc to know about [Per-app ATen/native and Template Selective Build](
https://docs.google.com/document/d/1O5--mOAi_gGh2GkE-REo3qJRRQ_Lks69IfgszcB8ThI/edit)

Create a file called "prim_native_functions.cpp" in ATen, add it to aten_cpu, and cut-paste native::is_nonzero() into prim_native_functions.cpp.
By doing this we move the function to a lower layer, making it visible to all targets that depend on it.

Instruction count comparison new vs old
https://www.internalfb.com/phabricator/paste/view/P463272302?view=diff

Test Plan:
fb:
```
(base) [pavithran@devvm1803.vll0 /data/users/pavithran/fbsource] buck build //xplat/caffe2:aten_cpu
Building: finished in 0.4 sec (100%) 1/202 jobs, 0/202 updated
  Total time: 0.4 sec
More details at https://www.internalfb.com/intern/buck/build/ea35300b-55be-4b9f-bc74-80cdd869c16a
BUILD SUCCEEDED
(base) [pavithran@devvm1803.vll0 /data/users/pavithran/fbsource] buck build //xplat/caffe2:aten_native_cpu
Building: finished in 0.7 sec (100%) 1/1 jobs, 0/1 updated
  Total time: 0.8 sec
More details at https://www.internalfb.com/intern/buck/build/ccd97d43-c59d-4f29-9418-485cd24575e2
BUILD SUCCEEDED
```

Reviewed By: iseeyuan

Differential Revision: D31669536

fbshipit-source-id: d35f069f975db6dce0b678c5b5ddd74bd690f599
2021-10-20 20:41:41 -07:00
f8f04d5424 [quant][graphmode][fx] Add support for single linear and conv2d (#66950)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66950

Just to show that it works for weighted operations as well; qat/fused ops are not supported yet.
We can start developing the backend_config_dict and work towards making the support more complete afterwards.

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D31801782

fbshipit-source-id: 8491bab7939a7a1c23ffa87c351844b82e390027
2021-10-20 19:13:27 -07:00
a89851a0d9 [quant][fx][graphmode] Adding a new convert function that produces reference pattern by default (#66925)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66925

The current convert_fx implementation uses "The Interpreter Pattern" from https://pytorch.org/docs/stable/fx.html
Two things have changed which make the approach in this PR possible and needed:
1) the original convert implementation was developed in the initial prototype, when fx did not allow mutations; fx
now supports mutations
2) the original convert needs to handle a lot of fbgemm/qnnpack-specific logic, which is not needed for reference patterns

Therefore it makes sense for us to write a new convert function just for reference patterns; the implementation
is significantly easier to understand than the original convert implementation

Current support:
* we should be able to support all non-weighted ops like relu, add etc.

Missing:
* linear and conv
* some advanced features like standalone modules, input_quantized_idxs etc.

We will add linear and conv support and start defining the backend_config_dict based on this version of convert.

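A toy illustration of point 1): fx graphs can now be mutated in place, so a convert pass can rewrite nodes directly instead of re-interpreting the whole graph (the pass below is illustrative, not the quantization convert):
```
import torch
import torch.fx as fx

class M(torch.nn.Module):
    def forward(self, x):
        return torch.relu(x)

gm = fx.symbolic_trace(M())
for node in gm.graph.nodes:
    if node.op == "call_function" and node.target is torch.relu:
        node.target = torch.sigmoid  # in-place node mutation
gm.recompile()
print(gm(torch.tensor([-1.0, 1.0])))  # sigmoid applied, not relu
```
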
Test Plan:
python test/test_quantization.py TestQuantizeFxOpsNew

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D31786241

fbshipit-source-id: 2a32156eb6d3c5271cb44906cd863055785fb5d4
2021-10-20 18:54:30 -07:00
db4165892b [SmartCompose][OnDevice]fix function name bug in mobile export & Script to convert mobile model (#66915)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66915

Pull Request resolved: https://github.com/pytorch/pytorch-canary/pull/3

fix function name bug in mobile export

Test Plan: buck run pytext/fb/assistant/smart_compose:mobile_converter -- --model_input=pytext_training/tree/teams/assistant/smart_compose/300555761/model.ts --model_output=pytext_training/tree/teams/assistant/smart_compose/300555761/mobile_model_test.ts

Reviewed By: JacobSzwejbka

Differential Revision: D31782983

fbshipit-source-id: 7288bb65adc7346d218980a535d68a12d8ef2033
2021-10-20 18:14:51 -07:00
ab1e4eac42 [Static Runtime] Add FuseListUnpackV2 (#66509)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66509

Like `FuseListUnpack`, but instead of adding arguments to the fused node's outputs, it inserts a new fused op.

By using a new fused op, we can avoid runtime `is_fused` checks. This will make the op implementations significantly cleaner. Eventually, we will migrate all ops to `V2` and delete the old pass.

`FuseListUnpackV2` also fixes the bug described in T103159043.

Test Plan: I've made some changes to D31550307 locally and verified that everything works.

Reviewed By: hlu1

Differential Revision: D31492017

fbshipit-source-id: 4f90fcbc17e4c70a3d65985bee836fabf868a22c
2021-10-20 16:39:32 -07:00
17889ad26e Add support for cat in output stitching (#66098)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66098

`cat` is somewhat special-cased right now because currently we only have lists of Tensor inputs where the list is constructed in the JIT IR graph. While that is generally true for fusion (e.g. why we have ConstantChunk), it may not be true for shape analysis generally, so I'm waiting a bit to generalize.

Test Plan: Imported from OSS

Reviewed By: navahgar, anjali411

Differential Revision: D31797467

Pulled By: eellison

fbshipit-source-id: ca761e214dfd7f3bba8d189f3b3f42ffec064f63
2021-10-20 16:13:09 -07:00
2dd23ebfdb Add support for multi output nodes in partial eval graph stitching (#66097)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66097

Adding logic to generate runtime shapes for nodes with multiple outputs. It generalizes the existing flow (look at a node, get its shape graph, inline it, and add a mapping from the output to the new value in the stitched shape compute graph) to loop over multiple outputs.

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D31797468

Pulled By: eellison

fbshipit-source-id: 2c182b71a46b36d33f23ad35b89790a4a5d4471c
2021-10-20 16:13:07 -07:00
0196b984f3 Add Handling of Cat in Shape Analysis (#65575)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65575

This is needed for lowering an NNC model to mobile. It is also the last class of unhandled ops which NNC fuses, and we need to integrate this for computing output symbolic shapes.

The graph with two dynamic shape inputs produces:
```
graph(%x.1 : Tensor(SS(-2), 2, 3),
      %y.1 : Tensor(SS(-3), 2, 3)):
  %5 : int = prim::Constant[value=0]()
  %4 : Tensor[] = prim::ListConstruct(%x.1, %y.1)
  %6 : Tensor(SS(-4), 2, 3) = aten::cat(%4, %5) # /private/home/eellison/pytorch/test/jit/test_symbolic_shape_analysis.py:290:19
  return (%6)
```
With a partial eval graph of
```
Done with partial evaluation
graph(%129 : int[],
      %130 : int[],
      %dim.14 : int):
  %738 : int = prim::Constant[value=3]()
  %737 : int = prim::Constant[value=2]()
  %132 : int = prim::Constant[value=0]()
  %392 : int = aten::__getitem__(%129, %132) # <string>:339:44
  %417 : int = aten::__getitem__(%130, %132) # <string>:339:44
  %cat_dim_size.48 : int = aten::add(%392, %417) # <string>:339:29
  %result_size.5 : int[] = prim::ListConstruct(%cat_dim_size.48, %737, %738)
  return (%result_size.5)
```

To handle cat, I essentially make the cat shape op variadic,
replacing
```
torch.cat([x, y])
...
def cat_shape_op(tensors: List[List[int]], dim: int):
    ...
    op(tensors)
```
with
```
def cat_shape_op(x: List[int], y: List[int], dim: int):
    tensors = [x, y]
    op(tensors)
```
This reuses the existing input Tensor properties partial evaluation path and avoids having to add special handling to optimize out `len(tensors)` calls in the IR.

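For concreteness, a runnable sketch of the rewritten shape op (input validation elided):
```
from typing import List

def cat_shape_op(x: List[int], y: List[int], dim: int) -> List[int]:
    tensors = [x, y]  # the list now lives inside the op, not in the IR
    out = list(tensors[0])
    out[dim] = sum(t[dim] for t in tensors)
    return out

# matches the partial eval graph above: output dim 0 is the sum of input dim 0s
assert cat_shape_op([2, 2, 3], [4, 2, 3], 0) == [6, 2, 3]
```
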
Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D31797471

Pulled By: eellison

fbshipit-source-id: 62c794533d5fabfd3fad056d7e5fe3e8781b22c5
2021-10-20 16:13:05 -07:00
eaba976d49 Add x + 0 optimization (#65574)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65574

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D31797470

Pulled By: eellison

fbshipit-source-id: bf9309fb43f164665335fed0d09697b0e2f67261
2021-10-20 16:13:03 -07:00
b059f035be Fix bug preventing optimization from firing (#65573)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65573

When we remove mutation on
```
x = [0, 1, 3, 4]
x[-2] = 4
```
we have a safety check that the new index will be in bounds of the old index. In practice, this should always be the case; otherwise you would have a runtime error. Within that check (not within the actual adjustment) we were using the wrong length of inputs, preventing the optimization from firing.

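A plain-Python sketch of the check in question:
```
lst = [0, 1, 3, 4]
idx = -2
# normalize the negative index against the length of the list being mutated
norm = idx + len(lst) if idx < 0 else idx
assert 0 <= norm < len(lst)  # in bounds, so the mutation can be rewritten away
lst[norm] = 4
```
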
Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D31797469

Pulled By: eellison

fbshipit-source-id: 02a1686b9f6016eb5aeb87ed342c043c203dcd0e
2021-10-20 16:13:01 -07:00
63b41e1f4d [JIT] Add partial evaluation graph stitching logic (#65377)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65377

When we run symbolic shape analysis on
```
conv = torch.nn.Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
max_pool = torch.nn.MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
mod = nn.Sequential(conv1, max_pool)
...
graph(%self : __torch__.torch.nn.modules.container.___torch_mangle_0.Sequential,
      %input.1 : Tensor):
  %18 : bool = prim::Constant[value=0]()
  %30 : int[] = prim::Constant[value=[1, 1]]()
  %29 : int[] = prim::Constant[value=[3, 3]]()
  %28 : int[] = prim::Constant[value=[2, 2]]()
  %6 : int = prim::Constant[value=1]()
  %self.0.bias : NoneType = prim::Constant()
  %self.0.weight : Double(64, 3, 7, 7, strides=[147, 49, 7, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %input.5 : Tensor(SS(-2), 64, SS(-3), SS(-4)) = aten::conv2d(%input.1, %self.0.weight, %self.0.bias, %28, %29, %30, %6)
  %input.9 : Tensor(SS(-2), 64, SS(-5), SS(-6)) = aten::max_pool2d(%input.5, %29, %28, %30, %30, %18)
  return (%input.9)
```
we partially evaluate the shape compute graph of `conv2d`, whose output gets passed in and used to partially evaluate the shape compute graph of `max_pool2d`.

The conv2d remaining partially eval'd graph is [here](https://gist.github.com/eellison/0598bd224a422211efa1a45d2b7560b7), and the maxpool2d eval'd graph is [here](https://gist.github.com/eellison/625540b84f650ddbefd3ae5511ab8814). We can take the partially eval'd graphs of a series of operators and stitch them together, which allows us to
a) recover symbolic equivalences by CSE'ing & other optimizations
b) calculate shapes for a whole block of operators just from the input, such as for fusing the whole model to nnc with dynamic shapes and then passing along the computed symbolic shapes. The computation also handles error checking.
c) (future-looking) generate inputs on demand for straight-line networks that are composed just of aten operators

The combined graph of the two gives us compute for the unknown symbolic dimensions - `SS(-2), SS(-3), SS(-4), SS(-5), and SS(-6)`.
```
graph(%input.1 : int[]):
  %42 : bool = prim::Constant[value=0]() # <string>:152:17
  %15 : int = prim::Constant[value=3]()
  %input_batch_size_dim.1 : int = prim::Constant[value=0]() # <string>:417:41
  %13 : int = prim::Constant[value=1]() # <string>:426:61
  %12 : int = prim::Constant[value=4]() # <string>:437:32
  %11 : str = prim::Constant[value="AssertionError: "]()
  %9 : int = prim::Constant[value=2]()
  %8 : int = prim::Constant[value=6]()
  %7 : int = prim::Constant[value=7]()
  %16 : int = aten::len(%input.1) # <string>:438:17
  %17 : bool = aten::eq(%16, %12) # <string>:438:17
   = prim::If(%17) # <string>:438:10
    block0():
      -> ()
    block1():
       = prim::RaiseException(%11) # <string>:438:10
      -> ()
  %18 : int = aten::__getitem__(%input.1, %13) # <string>:407:17
  %19 : bool = aten::eq(%18, %15) # <string>:407:17
   = prim::If(%19) # <string>:407:10
    block0():
      -> ()
    block1():
       = prim::RaiseException(%11) # <string>:407:10
      -> ()
  %20 : int = aten::__getitem__(%input.1, %9) # <string>:411:20
  %21 : int = aten::add(%20, %8) # <string>:411:20
  %22 : bool = aten::ge(%21, %7) # <string>:411:20
   = prim::If(%22) # <string>:411:12
    block0():
      -> ()
    block1():
       = prim::RaiseException(%11) # <string>:411:12
      -> ()
  %23 : int = aten::__getitem__(%input.1, %15) # <string>:411:20
  %24 : int = aten::add(%23, %8) # <string>:411:20
  %25 : bool = aten::ge(%24, %7) # <string>:411:20
   = prim::If(%25) # <string>:411:12
    block0():
      -> ()
    block1():
       = prim::RaiseException(%11) # <string>:411:12
      -> ()
  %26 : int = aten::__getitem__(%input.1, %input_batch_size_dim.1) # <string>:422:29
  %27 : int = aten::sub(%20, %13) # <string>:428:32
  %28 : int = aten::floordiv(%27, %9) # <string>:428:32
  %29 : int = aten::add(%28, %13) # <string>:428:32
  %30 : int = aten::sub(%23, %13) # <string>:428:32
  %31 : int = aten::floordiv(%30, %9) # <string>:428:32
  %32 : int = aten::add(%31, %13) # <string>:428:32
  %48 : int = aten::floordiv(%28, %9) # <string>:133:17
  %outputSize.2 : int = aten::add(%48, %13) # <string>:136:23
  %51 : int = aten::floordiv(%31, %9) # <string>:133:17
  %outputSize.1 : int = aten::add(%51, %13) # <string>:136:23
  %53 : bool = aten::ne(%29, %input_batch_size_dim.1) # <string>:156:41
  %54 : bool = prim::If(%53) # <string>:157:64
    block0():
      %55 : bool = aten::ne(%32, %input_batch_size_dim.1) # <string>:157:93
      -> (%55)
    block1():
      -> (%42)
   = prim::If(%54) # <string>:157:10
    block0():
      -> ()
    block1():
       = prim::RaiseException(%11) # <string>:157:10
      -> ()
  %56 : bool = aten::ge(%outputSize.1, %13) # <string>:160:17
  %57 : bool = prim::If(%56) # <string>:160:17
    block0():
      %58 : bool = aten::ge(%outputSize.2, %13) # <string>:160:38
      -> (%58)
    block1():
      -> (%42)
   = prim::If(%57) # <string>:160:10
    block0():
      -> ()
    block1():
       = prim::RaiseException(%11) # <string>:160:10
      -> ()
  return (%26, %29, %32, %outputSize.2, %outputSize.1)
  ```

This PR runs shape analysis, retains the partially evaluated graphs, and then stitches them together, keeping track of which inputs in the partial eval graph correspond to which inputs in the encompassing graph IR, and which outputs correspond to which symbolic shape. Adding NNC people as reviewers because it is relevant to dynamic shape fusion.

Question for reviewers: should I make this a separate file?

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D31797472

Pulled By: eellison

fbshipit-source-id: a41ed31fad085d3563e71c815f49af0cd18aaeed
2021-10-20 16:12:58 -07:00
4ad6c144f6 [JIT][Easy] Shape cleanups (#65148)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65148

No functional changes, factoring out optimizations and renaming the `graph` in symbolic shape analysis to `shape_compute_graph` as ZolotukhinM suggested

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D31797447

Pulled By: eellison

fbshipit-source-id: 60d322da040245dd7b47ee7c8996239572fd11c2
2021-10-20 16:11:24 -07:00
e046386be8 Avoid inlining error reporting in checked_convert (#66721)
Summary:
**Summary:** Move the error reporting part to the cpp file to avoid callers inlining it, which inflates the generated code size. See https://github.com/pytorch/pytorch/issues/65830.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66721

Test Plan:
Compiling the simple program below now generates ~150 lines of assembly, compared to 700+ lines before.

```
#include <c10/core/Scalar.h>

void g(float) {}

void f(const c10::Scalar& scalar) {
    auto x = scalar.to<float>();
    g(x);
}
```

**Reviewers:** Brian Hirsh

**Subscribers:** Brian Hirsh, Edward Yang, Yining Lu

**Tasks:** T103384490

**Tags:** pytorch

Fixes https://github.com/pytorch/pytorch/issues/65830

Reviewed By: zou3519, bdhirsh

Differential Revision: D31737607

Pulled By: andrewor14

fbshipit-source-id: 3d493c4d8e51d8f8a19d00f59b8ea28176c8a9e3
2021-10-20 16:04:09 -07:00
18bbc4c2b7 [Static Runtime] Fix a bug in aten::index (#66940)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66940

`aten::index`'s schema is as follows:

```
"aten::index.Tensor(Tensor self, Tensor?[] indices) -> Tensor
```

The current implementation assumes `indices`' elements are all tensors by doing `elem.toTensor`, which is incorrect. This change creates an empty optional value if an element from `indices` is not a tensor.

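A small example exercising the `None` path (calling the op directly; a `None` entry means "keep this dimension", i.e. `t[:, idx]`):
```
import torch

t = torch.arange(9.).reshape(3, 3)
idx = torch.tensor([0, 2])
# the None element is what used to hit elem.toTensor() on a non-tensor
out = torch.ops.aten.index(t, [None, idx])
assert torch.equal(out, t[:, idx])
```
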
Test Plan: Fixed `StaticRuntime, IndividualOps_Index` to correctly test `aten::index` with `indices` that contains `None`.

Reviewed By: hlu1

Differential Revision: D31712145

fbshipit-source-id: be1c29674bcd55b67b0dcc2a988bc37fd43745f3
2021-10-20 15:51:21 -07:00
08cb31a03e [PyTorch][1/N] Basic implementation of ShardedEmbedding using ShardedTensor. (#66604)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66604

This diff/PR implements ShardedEmbedding using ShardedTensor.

Several caveats:
1. We support limited input params for the op. Support for more params is on the way.
2. We only support chunk sharding for now.
3. We only support a single local shard per rank for now.

ghstack-source-id: 141056130

Test Plan: Unit test and CI

Reviewed By: pritamdamania87

Differential Revision: D31544556

fbshipit-source-id: cc867dcba8c11e6f4c7c3722488908f5108cc67f
2021-10-20 15:16:49 -07:00
257239972c Fix attr_to_scope's key in torch/utils/tensorboard/_pytorch_graph.py (#65692)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/65652

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65692

Reviewed By: Reubend

Differential Revision: D31678606

Pulled By: edward-io

fbshipit-source-id: 7c0bf740ee4f8c21bd01ced3ae70df23c9efadfb
2021-10-20 14:35:29 -07:00
450221c534 Sparse CSR: Add tensor.resize_ and tensor.copy_ (#63510)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63510

Sparse CSR matrix resizing behavior:
- If we _increase the number of rows_, the number of specified elements in the matrix remains the same -> the size of col_indices and values doesn't change; the size of crow_indices becomes `rows+1`.
- If we _decrease the number of rows_, the number of specified elements becomes `min(nnz, rows*cols)` -> we need to resize `crow_indices` to `rows+1` and set its last element to `min(nnz, rows*cols)`, and shrink col_indices and values to `min(nnz, rows*cols)`.
- If we _increase the number of columns_, the number of specified elements and the number of rows remain the same -> no need to resize anything, just set the new sizes.
- We _cannot decrease the number of columns_ because it would require recomputing `crow_indices`.

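A plain-Python sketch of the `crow_indices` bookkeeping for the row-shrinking rule above:
```
import torch

crow = torch.tensor([0, 2, 3, 5])        # 3 rows, nnz = 5
col = torch.tensor([0, 1, 1, 0, 1])
val = torch.tensor([1., 2., 3., 4., 5.])

new_rows, cols = 2, 2
new_nnz = min(int(crow[-1]), new_rows * cols)  # min(nnz, rows*cols) = 4

new_crow = crow[: new_rows + 1].clone()  # resize crow_indices to rows+1
new_crow[-1] = new_nnz                   # set the last element to new_nnz
col, val = col[:new_nnz], val[:new_nnz]  # shrink col_indices and values
```
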
cc nikitaved pearu cpuhrsch IvanYashchuk

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D31796680

Pulled By: cpuhrsch

fbshipit-source-id: 7d8a9701ce06d30a1841f94bba0a057cacea9401
2021-10-20 14:19:04 -07:00
f56a1a59a3 Add simple backwards compatibility check for torch.package (#66739)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/65154, tests for backwards compatibility of torch.package by checking if packages that were created before can still be loaded.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66739

Reviewed By: suo

Differential Revision: D31771526

Pulled By: PaliC

fbshipit-source-id: ba8c652c647b94114a058e4c7d7f1c7ce6033d84
2021-10-20 12:46:17 -07:00
6e67150f57 [skip ci] Set test owner for test_mkldnn.py (#66845)
Summary:
Action following https://github.com/pytorch/pytorch/issues/66232

cc gujinghui PenghuiCheng XiaobingSuper jianyuh VitalyFedyunin

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66845

Reviewed By: anjali411

Differential Revision: D31803377

Pulled By: janeyx99

fbshipit-source-id: 4fcf77d3e4bf976449a0b1ab4d750619db3493a1
2021-10-20 12:38:56 -07:00
5569d5824c Fix documentation of arguments for torch.nn.functional.Linear (#66884)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66884

Addressing docs fix mentioned in issue 64978 on Github
ghstack-source-id: 141093449

Test Plan: https://pxl.cl/1Rxkz

Reviewed By: anjali411

Differential Revision: D31767303

fbshipit-source-id: f1ca10fed5bb768749bce3ddc240bbce1dfb3f84
2021-10-20 12:02:58 -07:00
e86d8323cb [JIT] Add special cases for batch_norm, instance_norm in alias_analysis (#66554)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66554

In native_functions.yaml, the schemas for batch_norm and instance_norm
are incorrect: the inputs `running_mean` and `running_var` are mutated,
but are not marked as such in the function schema. Since `(a!)?`
annotations are currently not working (see #65760), this instead adds a
special case to `alias_anaysis.cpp`. If the value of `training` or
`use_input_stats` is known to be `false`, then `alias_analysis` will
mark the input as _not_ being written to.

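A small example of the mutation being modeled (running stats are written only when training):
```
import torch

bn = torch.nn.BatchNorm1d(3)
x = torch.randn(8, 3)

before = bn.running_mean.clone()
bn(x)                                   # training=True: running stats mutated
assert not torch.equal(bn.running_mean, before)

bn.eval()
before = bn.running_mean.clone()
bn(x)                                   # training=False: running stats untouched
assert torch.equal(bn.running_mean, before)
```
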
Test Plan:
Removed the `skip` annotation on the following test, and added a special
exception in `check_alias_annotations`:
```
python test/test_ops.py -k test_variant_consistency_jit_nn_functional_batch_norm
```

Also:
```
./build/bin/test_jit --gtest_filter="*BatchAndInstanceNormFixture*"
```

Imported from OSS

Reviewed By: eellison

Differential Revision: D31612339

fbshipit-source-id: 12ca61b782b9e41e06883ba080a276209dc435bb
2021-10-20 10:22:10 -07:00
cf77bd4cf4 Fix python version in test tools CI job (#66947)
Summary:
On the HUD, the test tools job is failing as the runners now install Python 3.10, which is not compatible with numpy 1.20

See https://github.com/pytorch/pytorch/runs/3952169950?check_suite_focus=true Install dependencies step:
```
 ERROR: Command errored out with exit status 1:
   command: /opt/hostedtoolcache/Python/3.10.0/x64/bin/python /opt/hostedtoolcache/Python/3.10.0/x64/lib/python3.10/site-packages/pip/_vendor/pep517/in_process/_in_process.py build_wheel /tmp/tmptq8aay7m
       cwd: /tmp/pip-install-dk_6t98q/numpy_e9431bf106b746148c0e7c36e46551b4
  Complete output (1169 lines):
  setup.py:66: RuntimeWarning: NumPy 1.20.0 may not yet support Python 3.10.
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66947

Reviewed By: suo, malfet

Differential Revision: D31799205

Pulled By: janeyx99

fbshipit-source-id: 64bf10c37c0aa4f5837c48e92d56e81d920722bd
2021-10-20 10:12:16 -07:00
793f366e34 [skip ci] Set test owners for sparse tests (#66863)
Summary:
Action following https://github.com/pytorch/pytorch/issues/66232

cc nikitaved pearu cpuhrsch IvanYashchuk

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66863

Reviewed By: anjali411

Differential Revision: D31771126

Pulled By: janeyx99

fbshipit-source-id: 6cb5ca0557e8555f6a09b3e607ff8888e505486e
2021-10-20 10:12:13 -07:00
a015964cf8 Strided masked reduction: prod. (#66386)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66386

cc nikitaved pearu cpuhrsch

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D31779598

Pulled By: cpuhrsch

fbshipit-source-id: 304a3d6abc794a49de5b044aade6cfd727758495
2021-10-20 10:10:54 -07:00
822277f302 [skip ci] Set test owners for test_type_promotion.py (#66866)
Summary:
Action following https://github.com/pytorch/pytorch/issues/66232

cc nairbv mruberry

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66866

Reviewed By: anjali411

Differential Revision: D31771149

Pulled By: janeyx99

fbshipit-source-id: 87c04ed4a75ada06a553a11064d44ac65fc4c6ea
2021-10-20 09:42:37 -07:00
409364e597 [skip ci] Set test owners for test_typing.py (#66869)
Summary:
Action following https://github.com/pytorch/pytorch/issues/66232

cc ezyang malfet rgommers xuzhao9 gramster

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66869

Reviewed By: anjali411

Differential Revision: D31766850

Pulled By: janeyx99

fbshipit-source-id: e9772f5378be07162d4f4d06925165e396d7d6c6
2021-10-20 09:41:13 -07:00
452b359c3f [skip ci] Set test owners for tensor creation tests (#66864)
Summary:
Action following https://github.com/pytorch/pytorch/issues/66232

cc gchanan mruberry

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66864

Reviewed By: anjali411

Differential Revision: D31771139

Pulled By: janeyx99

fbshipit-source-id: 74adeae7de355fa6c63de22290fa324911230368
2021-10-20 09:38:21 -07:00
8a65047acc [skip ci] Set test owners for everything considered with module: tests (#66865)
Summary:
Action following https://github.com/pytorch/pytorch/issues/66232

cc mruberry

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66865

Reviewed By: anjali411

Differential Revision: D31771147

Pulled By: janeyx99

fbshipit-source-id: 8bebe5ac2098364ef1ee93b590abb5f4455b0f89
2021-10-20 09:37:03 -07:00
94f4b22df9 Revert D31761594: [pytorch][PR] opinfo : nn.functional.embedding
Test Plan: revert-hammer

Differential Revision:
D31761594 (ed5633d0c5)

Original commit changeset: d24f44728d04

fbshipit-source-id: 72574918300a7982430a0ceb772c9a24de525050
2021-10-20 09:17:16 -07:00
f95fef7897 Add prim::TensorExprDynamicGuard to bc allowlist (#66939)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66939

Reviewed By: ejguan

Differential Revision: D31797160

Pulled By: soulitzer

fbshipit-source-id: 630b7a0ab99671192397f927391361622f7e9c2e
2021-10-20 08:53:19 -07:00
3fe2ff800c Module docs update (#66909)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/37824

{F671745341}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66909

Reviewed By: anjali411

Differential Revision: D31782046

Pulled By: mikaylagawarecki

fbshipit-source-id: 009d2ea3c8a51a89786ef55bb9e88dc53aa8360f
2021-10-20 08:14:36 -07:00
62ca5a81c0 Exposed recompute_scale_factor into nn.Upsample (#66419)
Summary:
Description:
- Exposed recompute_scale_factor in nn.Upsample so that the recompute_scale_factor=True option can be used

Context: https://github.com/pytorch/pytorch/pull/64501#discussion_r710205190

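A minimal usage example with the newly exposed kwarg:
```
import torch

up = torch.nn.Upsample(scale_factor=2.7, mode='bilinear', align_corners=False,
                       recompute_scale_factor=True)
x = torch.randn(1, 3, 10, 10)
print(up(x).shape)  # torch.Size([1, 3, 27, 27])
```
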
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66419

Reviewed By: gchanan

Differential Revision: D31731276

Pulled By: jbschlosser

fbshipit-source-id: 2118489e6f5bc1142f2a64323f4cfd095a9f3c42
2021-10-20 07:59:25 -07:00
867ccc9987 Strided masked reduction: amin. (#66385)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66385

cc nikitaved pearu cpuhrsch IvanYashchuk

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D31779530

Pulled By: cpuhrsch

fbshipit-source-id: de753c2d191f7980a48831b892d3a1e8a7a547cd
2021-10-20 07:45:40 -07:00
c69e33bb11 Fix doc string for torch.acosh (#66814)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66814
Shift equation above note as per issue 65905 on github

Test Plan:
Imported from OSS

In preview docs built from PR

https://docs-preview.pytorch.org/66814/generated/torch.acosh.html#torch.acosh equation is now above note

{F671441651}

Reviewed By: gchanan

Differential Revision: D31742677

Pulled By: mikaylagawarecki

fbshipit-source-id: 9fa5390ad2a01ca001418c0bd624f2145f861bf4
2021-10-20 07:01:42 -07:00
ed5633d0c5 opinfo : nn.functional.embedding (#66622)
Summary:
Adds opinfo for `nn.functional.embedding`

A few cases where the numerical gradient doesn't match (gradcheck fails):

```python
import torch

try:
    t = torch.randn(2, 1, dtype=torch.float64, requires_grad=True)
    idx = torch.tensor([0, 1])
    torch.autograd.gradcheck(lambda idx, t : torch.nn.functional.embedding(idx, t, padding_idx=1), (idx, t, ))
except Exception as e:
    print("PADDING IDX:", e)

try:
    t = torch.ones(2, 1, dtype=torch.float64, requires_grad=True)
    idx = torch.tensor([0, 1])
    torch.autograd.gradcheck(lambda idx, t : torch.nn.functional.embedding(idx, t, max_norm=1.), (idx, t, ))
except Exception as e:
    print("MAX NORM:", e)

try:
    t = torch.randn(2, 1, dtype=torch.float64, requires_grad=True)
    idx = torch.tensor([0, 1, 1])
    torch.autograd.gradcheck(lambda idx, t : torch.nn.functional.embedding(idx, t, scale_grad_by_freq=True), (idx, t, ))
except Exception as e:
    print("SCALE GRAD BY FREQUENCY:", e)

try:
    t = torch.randn(2, 1, dtype=torch.float64, requires_grad=True)
    idx = torch.tensor([0, 1])
    torch.autograd.gradcheck(lambda idx, t : torch.nn.functional.embedding(idx, t, sparse=True), (idx, t, ))
except Exception as e:
    print("SPARSE", e)

```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66622

Reviewed By: gchanan

Differential Revision: D31761594

Pulled By: zou3519

fbshipit-source-id: d24f44728d049e6276d6c3165aa1fba458214959
2021-10-20 06:33:55 -07:00
79803b199f [Static Runtime] Make sure ProcessedNode::function_kind_ is copied over (#66917)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66917

The ratio of 'out' variant nodes to the total number of nodes now reads as 100% for all the models, which obviously isn't true.

Reviewed By: swolchok, mikeiovine

Differential Revision: D31783028

fbshipit-source-id: e0bc2c6614aa3c3a235283c9125de1b339f42585
2021-10-20 00:21:35 -07:00
14ee608791 [PyTorch] Make rearragement in sharded linear work as expected. (#66603)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66603

Found the issue here: https://github.com/pytorch/pytorch/issues/66281 by making the test cases more complicated.

By closely reading the code again, it turns out my original understanding was also wrong. Let's use the example mentioned in the issue to explain:

If the placement is like:
```
"rank:3/cuda:3",
"rank:0/cuda:0",
"rank:1/cuda:1",
"rank:2/cuda:2",
```

First, we split the column or row by the order of [3, 0, 1, 2].

In the case of column-wise sharding:
We need to rearrange the results from ranks 0-3.
Step 1: we split the output based on the original sharding strategy, aka, rank3 gets the 1st shard, rank0 gets the 2nd shard, etc.
Step 2: we rearrange the results from ranks 0-3 by ordering them following the order of [3, 0, 1, 2], aka, the result from rank3 needs to be put in the front, and so forth.

In the case of row-wise sharding:
We need to rearrange the input being sent to ranks 0-3.
Step 1: we reorder the input and follow the map of [3, 0, 1, 2]. For example, the first shard goes to rank 3 so we need to put it in the 3rd part, the second shard goes to rank 0, so we put it in the 2nd part, and so on.
Step 2: the size of the sharding for each rank is decided by the original placement: [3, 0, 1, 2], aka, rank 3 gets the first shard and its size, etc.

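A toy sketch of the column-wise reassembly described above, on plain lists:
```
placement = [3, 0, 1, 2]  # shard i of the weight lives on rank placement[i]

# results gathered in rank order must be reassembled following the placement,
# so rank 3's piece comes first
gathered = {rank: f"piece_from_rank{rank}" for rank in range(4)}
reassembled = [gathered[rank] for rank in placement]
assert reassembled[0] == "piece_from_rank3"
```
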
Update the unit test to reflect this change.

Also, correct some format and comments in the sharded linear.
ghstack-source-id: 141055689

Test Plan: unit test and wait for CI.

Reviewed By: pritamdamania87, bowangbj

Differential Revision: D31634590

fbshipit-source-id: 677a9c2b42da1e2c63220523ed2c004565bbecc7
2021-10-19 23:16:38 -07:00
ef15691a1e Revert D31732421: [JIT][Easy] Shape cleanups
Test Plan: revert-hammer

Differential Revision:
D31732421 (16d0896b69)

Original commit changeset: e934507d1795

fbshipit-source-id: 6b34815c556de64ee5c7ef8d41e4cb434ccd7098
2021-10-19 20:07:06 -07:00
70c9eb130d Revert D31732419: [JIT] Add partial evaluation graph stitching logic
Test Plan: revert-hammer

Differential Revision:
D31732419 (5db7db667f)

Original commit changeset: 883a55cbeef0

fbshipit-source-id: f5faba69dfb6b54aeb29d1beaeec8c5b0373830f
2021-10-19 20:07:04 -07:00
90b42452e2 Revert D31732417: Fix bug preventing optimization from firing
Test Plan: revert-hammer

Differential Revision:
D31732417 (853fc25fb0)

Original commit changeset: dd734254c021

fbshipit-source-id: 3da0663dac5b5d2117b3d7abdbcd45d96f98de33
2021-10-19 20:07:02 -07:00
b8d58129bb Revert D31732420: Add x + 0 optimization
Test Plan: revert-hammer

Differential Revision:
D31732420 (66543f88de)

Original commit changeset: 0271e0dc0dda

fbshipit-source-id: c2beea1661e10c2f1a982b5d4a34b1041dcb1204
2021-10-19 20:07:00 -07:00
e730752610 Revert D31732416: Add Handling of Cat in Shape Analysis
Test Plan: revert-hammer

Differential Revision:
D31732416 (cc7de1df3b)

Original commit changeset: 6d93ddf62c34

fbshipit-source-id: e2c9713177a7f783897e99dd71e631fb275c37da
2021-10-19 20:06:57 -07:00
57fcea9e88 Revert D31732418: Add support for multi output nodes in partial eval graph stitching
Test Plan: revert-hammer

Differential Revision:
D31732418 (0fdc9b77a3)

Original commit changeset: 767698d031b1

fbshipit-source-id: f899eb155dcec67d57f53a658a71169d37b63b42
2021-10-19 20:06:55 -07:00
4187d870df Revert D31732415: Add support for cat in output stitching
Test Plan: revert-hammer

Differential Revision:
D31732415 (b4db5174fe)

Original commit changeset: 7f513cea355f

fbshipit-source-id: a0d8f1512b13d51f6e50b5da58084effbaf0a0dc
2021-10-19 20:06:53 -07:00
1bf0e1acb4 Revert D31732414: Add Initial NNC Dynamic Shapes Flow
Test Plan: revert-hammer

Differential Revision:
D31732414 (de4fe7a38c)

Original commit changeset: 290a94a667c2

fbshipit-source-id: 3021a1d7a8661967e37d4f9cfc86ed47cc4a7f3d
2021-10-19 20:05:29 -07:00
9c4d7d96db Address feedback from #66673 (#66905)
Summary:
Specify both `build_generates_artifacts` and `exclude_tests` properties as suggested in https://github.com/pytorch/pytorch/pull/66673#pullrequestreview-783667960

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66905

Reviewed By: seemethere

Differential Revision: D31779742

Pulled By: malfet

fbshipit-source-id: 21f5543f3b767f38132be8c7e163455f39ff893f
2021-10-19 18:27:45 -07:00
deb6989880 [fx-acc] add optimize_quantization to FX graph opts (#65929)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65929

This adds a set of quantize/dequantize graph optimizations.

Test Plan:
```
buck test mode/opt glow/fb/fx/graph_opts:test_fx_graph_opts
```
```
Parsing buck files: finished in 0.8 sec
Building: finished in 3.0 sec (100%) 8475/80926 jobs, 0/80926 updated
  Total time: 3.9 sec
More details at https://www.internalfb.com/intern/buck/build/9dd6193b-d99c-4d2a-8ef8-4d71380916e7
BUILD SUCCEEDED
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: b5a83d2a-8870-400e-b21e-3286967d1f4a
Trace available for this run at /tmp/tpx-20211018-165956.836274/trace.log
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/4222124724048882
    ✓ ListingSuccess: glow/fb/fx/graph_opts:test_fx_graph_opts - main (3.152)
    ✓ Pass: glow/fb/fx/graph_opts:test_fx_graph_opts - test_transpose_to_reshape_1_optimizable (glow.fb.fx.graph_opts.tests.test_fx_graph_opts.TestTransposeToReshape) (0.100)
    ✓ Pass: glow/fb/fx/graph_opts:test_fx_graph_opts - test_transpose_to_reshape_0_identity (glow.fb.fx.graph_opts.tests.test_fx_graph_opts.TestTransposeToReshape) (0.017)
    ✓ Pass: glow/fb/fx/graph_opts:test_fx_graph_opts - test_optimize_quantize_clamp_ignore_one_0 (glow.fb.fx.graph_opts.tests.test_fx_graph_opts.TestOptimizeQuantizeClamp) (0.154)
    ✓ Pass: glow/fb/fx/graph_opts:test_fx_graph_opts - test_optimize_quantize_clamp_ignore_one_1 (glow.fb.fx.graph_opts.tests.test_fx_graph_opts.TestOptimizeQuantizeClamp) (0.140)
    ✓ Pass: glow/fb/fx/graph_opts:test_fx_graph_opts - test_optimize_quantization_2_QuantizePerChannel_Dequantize_X_RescaleQuantized_X_ (glow.fb.fx.graph_opts.tests.test_fx_graph_opts.TestOptimizeQuantization) (0.422)
    ✓ Pass: glow/fb/fx/graph_opts:test_fx_graph_opts - test_optimize_quantize_clamp_ignore_one_3 (glow.fb.fx.graph_opts.tests.test_fx_graph_opts.TestOptimizeQuantizeClamp) (0.296)
    ✓ Pass: glow/fb/fx/graph_opts:test_fx_graph_opts - test_optimize_dequantize_clamp_remove_one_3 (glow.fb.fx.graph_opts.tests.test_fx_graph_opts.TestOptimizeQuantizeClamp) (0.288)
    ✓ Pass: glow/fb/fx/graph_opts:test_fx_graph_opts - test_optimize_dequantize_clamp_remove_one_1 (glow.fb.fx.graph_opts.tests.test_fx_graph_opts.TestOptimizeQuantizeClamp) (0.433)
    ✓ Pass: glow/fb/fx/graph_opts:test_fx_graph_opts - test_optimize_quantize_clamp_ignore_clamp_tensor (glow.fb.fx.graph_opts.tests.test_fx_graph_opts.TestOptimizeQuantizeClamp) (0.346)
    ✓ Pass: glow/fb/fx/graph_opts:test_fx_graph_opts - test_optimize_quantization_1_Quantize_Dequantize_X_RescaleQuantized_X_ (glow.fb.fx.graph_opts.tests.test_fx_graph_opts.TestOptimizeQuantization) (0.403)
    ✓ Pass: glow/fb/fx/graph_opts:test_fx_graph_opts - test_transpose_to_reshape_2_unoptimizable (glow.fb.fx.graph_opts.tests.test_fx_graph_opts.TestTransposeToReshape) (0.117)
    ✓ Pass: glow/fb/fx/graph_opts:test_fx_graph_opts - test_optimize_quantize_clamp_remove_one_1 (glow.fb.fx.graph_opts.tests.test_fx_graph_opts.TestOptimizeQuantizeClamp) (0.415)
    ✓ Pass: glow/fb/fx/graph_opts:test_fx_graph_opts - test_optimize_quantize_clamp_remove_one_3 (glow.fb.fx.graph_opts.tests.test_fx_graph_opts.TestOptimizeQuantizeClamp) (0.280)
    ✓ Pass: glow/fb/fx/graph_opts:test_fx_graph_opts - test_optimize_quantization_3_Dequantize_Quantize_Dequantize_X_Dequantize_rescale_X_Dequantize_X_ (glow.fb.fx.graph_opts.tests.test_fx_graph_opts.TestOptimizeQuantization) (0.150)
    ✓ Pass: glow/fb/fx/graph_opts:test_fx_graph_opts - test_optimize_quantize_clamp_ignore_one_6 (glow.fb.fx.graph_opts.tests.test_fx_graph_opts.TestOptimizeQuantizeClamp) (0.133)
    ✓ Pass: glow/fb/fx/graph_opts:test_fx_graph_opts - test_optimize_dequantize_clamp_remove_one_2 (glow.fb.fx.graph_opts.tests.test_fx_graph_opts.TestOptimizeQuantizeClamp) (0.523)
    ✓ Pass: glow/fb/fx/graph_opts:test_fx_graph_opts - test_optimize_dequantize_clamp_remove_one_0 (glow.fb.fx.graph_opts.tests.test_fx_graph_opts.TestOptimizeQuantizeClamp) (0.569)
    ✓ Pass: glow/fb/fx/graph_opts:test_fx_graph_opts - test_optimize_quantization_4_Rescale_QuantizeNode_QuantizeNode_ (glow.fb.fx.graph_opts.tests.test_fx_graph_opts.TestOptimizeQuantization) (0.815)
    ✓ Pass: glow/fb/fx/graph_opts:test_fx_graph_opts - test_optimize_quantize_clamp_ignore_one_5 (glow.fb.fx.graph_opts.tests.test_fx_graph_opts.TestOptimizeQuantizeClamp) (0.295)
    ✓ Pass: glow/fb/fx/graph_opts:test_fx_graph_opts - test_optimize_quantize_clamp_ignore_one_4 (glow.fb.fx.graph_opts.tests.test_fx_graph_opts.TestOptimizeQuantizeClamp) (0.308)
    ✓ Pass: glow/fb/fx/graph_opts:test_fx_graph_opts - test_optimize_quantize_clamp_ignore_one_2 (glow.fb.fx.graph_opts.tests.test_fx_graph_opts.TestOptimizeQuantizeClamp) (0.213)
    ✓ Pass: glow/fb/fx/graph_opts:test_fx_graph_opts - test_optimize_quantize_clamp_remove_one_2 (glow.fb.fx.graph_opts.tests.test_fx_graph_opts.TestOptimizeQuantizeClamp) (0.230)
    ✓ Pass: glow/fb/fx/graph_opts:test_fx_graph_opts - test_optimize_quantization_0_Dequantize_Quantize_X_X (glow.fb.fx.graph_opts.tests.test_fx_graph_opts.TestOptimizeQuantization) (0.336)
    ✓ Pass: glow/fb/fx/graph_opts:test_fx_graph_opts - test_optimize_quantize_clamp_remove_one_0 (glow.fb.fx.graph_opts.tests.test_fx_graph_opts.TestOptimizeQuantizeClamp) (0.486)
    ✓ Pass: glow/fb/fx/graph_opts:test_fx_graph_opts - test_optimize_quantize_clamp_ignore_one_7 (glow.fb.fx.graph_opts.tests.test_fx_graph_opts.TestOptimizeQuantizeClamp) (0.306)
Summary
  Pass: 25
  ListingSuccess: 1
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/4222124724048882
```

# Before
```
Model before opt.
graph():
    %x : [#users=1] = placeholder[target=x]
    %quantize_per_tensor_2 : [#users=1] = call_function[target=torch.fx.experimental.fx_acc.acc_ops.quantize_per_tensor](args = (), kwargs = {input: %x, acc_out_ty: ((8, 4, 2), torch.qint32, False, (8, 2, 1), torch.contiguous_format, True, {scale: 1.000001e-05, zero_point: 0, qscheme: torch.per_tensor_affine})})
    %dequantize_1 : [#users=1] = call_function[target=torch.fx.experimental.fx_acc.acc_ops.dequantize](args = (), kwargs = {input: %quantize_per_tensor_2})
    %quantize_per_tensor_3 : [#users=1] = call_function[target=torch.fx.experimental.fx_acc.acc_ops.quantize_per_tensor](args = (), kwargs = {input: %dequantize_1, acc_out_ty: ((8, 4, 2), torch.qint32, False, (8, 2, 1), torch.contiguous_format, True, {scale: 1e-05, zero_point: 0, qscheme: torch.per_tensor_affine})})
    return quantize_per_tensor_3
opcode         name                   target                                            args                      kwargs
-------------  ---------------------  ------------------------------------------------  ------------------------  ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
placeholder    x                      x                                                 ()                        {}
call_function  quantize_per_tensor_2  <function quantize_per_tensor at 0x7f66030a34c0>  ()                        {'input': x, 'acc_out_ty': ((8, 4, 2), torch.qint32, False, (8, 2, 1), torch.contiguous_format, True, {'scale': 1.000001e-05, 'zero_point': 0, 'qscheme': torch.per_tensor_affine})}
call_function  dequantize_1           <function dequantize at 0x7f66030a35e0>           ()                        {'input': quantize_per_tensor_2}
call_function  quantize_per_tensor_3  <function quantize_per_tensor at 0x7f66030a34c0>  ()                        {'input': dequantize_1, 'acc_out_ty': ((8, 4, 2), torch.qint32, False, (8, 2, 1), torch.contiguous_format, True, {'scale': 1e-05, 'zero_point': 0, 'qscheme': torch.per_tensor_affine})}
output         output                 output                                            (quantize_per_tensor_3,)  {}
```

# After
```
Model after opt.
graph():
    %x : [#users=1] = placeholder[target=x]
    %quantize_per_tensor_2 : [#users=1] = call_function[target=torch.fx.experimental.fx_acc.acc_ops.quantize_per_tensor](args = (), kwargs = {input: %x, acc_out_ty: ((8, 4, 2), torch.qint32, False, (8, 2, 1), torch.contiguous_format, True, {scale: 1e-05, zero_point: 0, qscheme: torch.per_tensor_affine})})
    return quantize_per_tensor_2
opcode         name                   target                                            args                      kwargs
-------------  ---------------------  ------------------------------------------------  ------------------------  -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
placeholder    x                      x                                                 ()                        {}
call_function  quantize_per_tensor_2  <function quantize_per_tensor at 0x7f66030a34c0>  ()                        {'input': x, 'acc_out_ty': ((8, 4, 2), torch.qint32, False, (8, 2, 1), torch.contiguous_format, True, {'scale': 1e-05, 'zero_point': 0, 'qscheme': torch.per_tensor_affine})}
output         output                 output                                            (quantize_per_tensor_2,)  {}
```
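
The core rewrite can be sketched as a peephole pass over a toy node list (the `Node` structure and names below are hypothetical, not the real FX/acc_ops API):
```
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Node:
    op: str                        # e.g. "quantize", "dequantize", "placeholder"
    input: Optional["Node"] = None
    scale: float = 1.0
    zero_point: int = 0

def collapse_quant_dequant_quant(nodes: List[Node]) -> List[Node]:
    # quantize(dequantize(quantize(x))) -> quantize(x), keeping the outer q-params
    for n in nodes:
        inner = n.input
        if (n.op == "quantize" and inner is not None and inner.op == "dequantize"
                and inner.input is not None and inner.input.op == "quantize"):
            n.input = inner.input.input   # requantize the original input directly
    # the now-unused dequantize/quantize nodes are assumed to be removed by DCE
    return nodes
```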

Reviewed By: jfix71

Differential Revision: D30945732

fbshipit-source-id: 427cd4215b546e1d6c5362734bb7de93d0c0b1b9
2021-10-19 17:06:32 -07:00
32e3003726 Have test classes extend from common_utils.TestCase, not unittest.TestCase (#66900)
Summary:
Extending unittest.TestCase directly causes some functionality to not work, such as disabling tests via issues, e.g. https://github.com/pytorch/pytorch/issues/66641
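
A minimal sketch of the intended pattern (the test class and body here are illustrative):
```
import torch
from torch.testing._internal.common_utils import TestCase, run_tests

class TestExample(TestCase):          # not unittest.TestCase
    def test_zeros_sum(self):
        self.assertEqual(torch.zeros(2).sum().item(), 0.0)

if __name__ == "__main__":
    run_tests()
```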

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66900

Reviewed By: seemethere

Differential Revision: D31778293

Pulled By: janeyx99

fbshipit-source-id: df3023ddaf7969ffb60117d1e1d7e36d87bc6139
2021-10-19 16:54:05 -07:00
de4fe7a38c Add Initial NNC Dynamic Shapes Flow (#66136)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66136

FOR REVIEWERS: this is ready to review; the test failures come from somewhere else in the stack.

Takes in a TensorExprGraph of static shapes and generalizes the input shapes
to symbolic dimensions. Dimensions of value 1 will be preserved, otherwise
dimensions with the same value will be bucketed to the same symbolic shape.

E.g. `Tensor(5, 3), Tensor(3, 1) -> Tensor(SS(-1), SS(-2)), Tensor(SS(-2), 1)`

From there, it runs symbolic shape inference on the graph and creates a
versioning if in the graph with prim::TensorExprDynamicGuard checking whether
the inputs at runtime match the generalized symbolic shapes that are inputs
to the TE kernel. The computation to calculate all symbolic dimensions is
inlined into the if block with the TE kernel. All Sym Dim Value* are
appended to the end of the TE kernel Graph/Node inputs, and the Node is
augmented with an integer list attr `symbolic_shape_inputs` that gives the
mapping from Value* -> symbolic shape int64_t value. For lengthier IR
examples and a walkthrough, look at ShapeAnalysisTest.DynamicShapesFusion in
`test_shape_analysis`. Returns True on success, False on failure; it can fail
if shape propagation fails to propagate the number of dims or if complete
shapes on the inputs are not set.

Example transformation
```
graph(%x_inp : Float(10, 5, strides=[5, 1], requires_grad=0, device=cpu),
      %y_inp : Float(4, 5, strides=[5, 1], requires_grad=0, device=cpu),
      %z_inp : Float(1, 1, strides=[1, 1], requires_grad=0, device=cpu)):
  %3 : Tensor = prim::TensorExprGroup_0(%x_inp, %y_inp, %z_inp)
  return ()
with prim::TensorExprGroup_0 = graph(%x.1 : Float(10, 5, strides=[5, 1], requires_grad=0, device=cpu),
      %y.1 : Float(4, 5, strides=[5, 1], requires_grad=0, device=cpu),
      %z : Float(1, 1, strides=[1, 1], requires_grad=0, device=cpu)):
  %3 : int = prim::Constant[value=0]()
  %4 : Tensor = aten::tanh(%x.1)
  %5 : Tensor = aten::erf(%4)
  %6 : Tensor = aten::relu(%y.1)
  %7 : Tensor[] = prim::ListConstruct(%5, %6)
  %8 : Tensor = aten::cat(%7, %3)
  %9 : Tensor = aten::hardswish(%8)
  %10 : Tensor = aten::mul(%9, %z)
  return (%9)
```
->

```
  graph(%x_inp : Float(10, 5, strides=[5, 1], requires_grad=0, device=cpu),
      %y_inp : Float(4, 5, strides=[5, 1], requires_grad=0, device=cpu),
      %z_inp : Float(1, 1, strides=[1, 1], requires_grad=0, device=cpu)):
  %4 : bool = prim::TensorExprDynamicGuard[types=[Float(SS(-2), SS(-3), strides=[5, 1], requires_grad=0, device=cpu), Float(SS(-4), SS(-3), strides=[5, 1], requires_grad=0, device=cpu), Float(1, 1, strides=[1, 1], requires_grad=0, device=cpu)]](%x_inp, %y_inp, %z_inp)
  %5 : Tensor = prim::If(%4)
    block0():
      %15 : int[] = aten::size(%x_inp)
      %16 : int[] = aten::size(%y_inp)
      %17 : int = prim::Constant[value=1]()
      %18 : int = prim::Constant[value=0]()
      %elem.3 : int = aten::__getitem__(%15, %18) # <string>:40:10
      %elem.5 : int = aten::__getitem__(%15, %17) # <string>:40:10
      %elem.11 : int = aten::__getitem__(%16, %18) # <string>:40:10
      %cat_dim_size.48 : int = aten::add(%elem.3, %elem.11) # <string>:321:29
      %3 : Tensor = prim::TensorExprGroup_0[symbolic_shape_inputs=[-5, -4, -3, -2]](%x_inp, %y_inp, %z_inp, %cat_dim_size.48, %elem.11, %elem.5, %elem.3)
      -> (%3)
    block1():
      %14 : Tensor = prim::FallbackGraph_1(%x_inp, %y_inp, %z_inp)
      -> (%14)
  return ()
  with prim::TensorExprGroup_0 = graph(%x.1 : Float(SS(-2), SS(-3), strides=[5, 1], requires_grad=0, device=cpu),
        %y.1 : Float(SS(-4), SS(-3), strides=[5, 1], requires_grad=0, device=cpu),
        %z : Float(1, 1, strides=[1, 1], requires_grad=0, device=cpu),
        %SS_5 : int,
        %SS_4 : int,
        %SS_3 : int,
        %SS_2 : int):
    %3 : int = prim::Constant[value=0]()
    %4 : Tensor(SS(-2), SS(-3)) = aten::tanh(%x.1)
    %5 : Tensor(SS(-2), SS(-3)) = aten::erf(%4)
    %6 : Tensor(SS(-4), SS(-3)) = aten::relu(%y.1)
    %7 : Tensor[] = prim::ListConstruct(%5, %6)
    %8 : Tensor(SS(-5), SS(-3)) = aten::cat(%7, %3)
    %9 : Tensor(SS(-5), SS(-3)) = aten::hardswish(%8)
    %10 : Tensor(SS(-5), SS(-3)) = aten::mul(%9, %z)
    return (%9)
```
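
The shape-generalization step itself can be sketched in a few lines of Python (a toy illustration of the bucketing rule described above, not the real C++ pass):
```
from typing import Dict, List

def generalize_shapes(shapes: List[List[int]]) -> List[List[int]]:
    sym: Dict[int, int] = {}   # concrete dim value -> symbolic id (negative ints)
    next_id = -1
    out: List[List[int]] = []
    for shape in shapes:
        new_shape: List[int] = []
        for d in shape:
            if d == 1:
                new_shape.append(1)          # 1s are preserved (broadcasting)
            else:
                if d not in sym:
                    sym[d] = next_id
                    next_id -= 1
                new_shape.append(sym[d])     # same value -> same symbolic dim
        out.append(new_shape)
    return out

# generalize_shapes([[5, 3], [3, 1]]) -> [[-1, -2], [-2, 1]], i.e. SS(-1), SS(-2)
```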

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D31732414

Pulled By: eellison

fbshipit-source-id: 290a94a667c20467717202a43c60e4f9ca4c00e2
2021-10-19 16:41:49 -07:00
b4db5174fe Add support for cat in output stitching (#66098)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66098

`cat` is somewhat special-cased right now because currently we only have lists of Tensor inputs where the list is constructed in the JIT IR graph. While that is generally true for fusion (e.g., it is why we have ConstantChunk), it may not be true for shape analysis generally, so I'm waiting a bit before generalizing.

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D31732415

Pulled By: eellison

fbshipit-source-id: 7f513cea355f1e4c1d2ca7c32c06690a9bdcb050
2021-10-19 16:41:44 -07:00
0fdc9b77a3 Add support for multi output nodes in partial eval graph stitching (#66097)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66097

Adds logic to generate runtime shapes for nodes with multiple outputs. It generalizes the existing flow (look at a node, get its shape graph, inline it, and add a mapping from the output to the new value in the stitched shape compute graph) to loop over multiple outputs.

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D31732418

Pulled By: eellison

fbshipit-source-id: 767698d031b1daf002678a025b270e0ede429061
2021-10-19 16:41:39 -07:00
cc7de1df3b Add Handling of Cat in Shape Analysis (#65575)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65575

This is needed for lowering an NNC model to mobile. It is also the last class of unhandled ops which NNC fuses, and we need to integrate this for computing output symbolic shapes.

The graph with two dynamic shape inputs produces:
```
graph(%x.1 : Tensor(SS(-2), 2, 3),
      %y.1 : Tensor(SS(-3), 2, 3)):
  %5 : int = prim::Constant[value=0]()
  %4 : Tensor[] = prim::ListConstruct(%x.1, %y.1)
  %6 : Tensor(SS(-4), 2, 3) = aten::cat(%4, %5) # /private/home/eellison/pytorch/test/jit/test_symbolic_shape_analysis.py:290:19
  return (%6)
```
With a partial eval graph of
```
Done with partial evaluation
graph(%129 : int[],
      %130 : int[],
      %dim.14 : int):
  %738 : int = prim::Constant[value=3]()
  %737 : int = prim::Constant[value=2]()
  %132 : int = prim::Constant[value=0]()
  %392 : int = aten::__getitem__(%129, %132) # <string>:339:44
  %417 : int = aten::__getitem__(%130, %132) # <string>:339:44
  %cat_dim_size.48 : int = aten::add(%392, %417) # <string>:339:29
  %result_size.5 : int[] = prim::ListConstruct(%cat_dim_size.48, %737, %738)
  return (%result_size.5)
```

To handle cat, I essentially make the cat shape op variadic,
replacing
```
torch.cat([x, y]
...
def cat_shape_op(tensors: List[List[int]], dim: int):
    ...
    op(tensors)
```
with
```
def cat_shape_op(x: List[int], y: List[int], dim: int):
    tensors = [x, y]
    op(tensors)
```
This reuses the existing input Tensor properties partial evaluation path and avoids having to add special handling to optimize out `len(tensors)` calls in the IR.
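
For reference, the shape arithmetic that the partial eval graph above boils down to is just a sum over the concat dimension; a runnable illustration (not the real shape function):
```
from typing import List

def cat_shape(tensors: List[List[int]], dim: int) -> List[int]:
    out = list(tensors[0])
    out[dim] = sum(t[dim] for t in tensors)   # concat dim is the sum of inputs
    return out

# cat_shape([[10, 2, 3], [7, 2, 3]], 0) -> [17, 2, 3], matching cat_dim_size above
```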

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D31732416

Pulled By: eellison

fbshipit-source-id: 6d93ddf62c34846ec238159f75229632515530b7
2021-10-19 16:41:34 -07:00
66543f88de Add x + 0 optimization (#65574)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65574

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D31732420

Pulled By: eellison

fbshipit-source-id: 0271e0dc0ddab06220048ed5bf4236fc85f3318c
2021-10-19 16:41:29 -07:00
853fc25fb0 Fix bug preventing optimization from firing (#65573)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65573

When we remove mutation on
```
x = [0, 1, 3, 4]
x[-2] = 4
```
we have a safety check that the new index will be in bounds of the list. In practice this should always be the case; otherwise you would get a runtime error. Within that check (not within the actual adjustment) we were using the wrong length, preventing the optimization from firing.
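
A sketch of the check being fixed, with toy names: the negative index must be normalized against the length of the list it actually indexes.
```
def normalize_index(idx: int, list_len: int) -> int:
    norm = idx if idx >= 0 else idx + list_len
    assert 0 <= norm < list_len, "index out of bounds at runtime"
    return norm

# normalize_index(-2, 4) -> 2, so x[-2] = 4 rewrites to x[2] = 4 in the example above
```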

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D31732417

Pulled By: eellison

fbshipit-source-id: dd734254c0212ca459c1c135da262974de5299be
2021-10-19 16:41:24 -07:00
5db7db667f [JIT] Add partial evaluation graph stitching logic (#65377)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65377

When we run symbolic shape analysis on
```
conv = torch.nn.Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
max_pool = torch.nn.MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
mod = nn.Sequential(conv1, max_pool)
...
graph(%self : __torch__.torch.nn.modules.container.___torch_mangle_0.Sequential,
      %input.1 : Tensor):
  %18 : bool = prim::Constant[value=0]()
  %30 : int[] = prim::Constant[value=[1, 1]]()
  %29 : int[] = prim::Constant[value=[3, 3]]()
  %28 : int[] = prim::Constant[value=[2, 2]]()
  %6 : int = prim::Constant[value=1]()
  %self.0.bias : NoneType = prim::Constant()
  %self.0.weight : Double(64, 3, 7, 7, strides=[147, 49, 7, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %input.5 : Tensor(SS(-2), 64, SS(-3), SS(-4)) = aten::conv2d(%input.1, %self.0.weight, %self.0.bias, %28, %29, %30, %6)
  %input.9 : Tensor(SS(-2), 64, SS(-5), SS(-6)) = aten::max_pool2d(%input.5, %29, %28, %30, %30, %18)
  return (%input.9)
```
we partially evaluate the shape compute graph of `conv2d`, whose output gets passed in and used to partially evaluate the shape compute graph of `max_pool2d`.

The conv2d remaining partially eval'd graph is [here](https://gist.github.com/eellison/0598bd224a422211efa1a45d2b7560b7), and the maxpool2d eval'd graph is [here](https://gist.github.com/eellison/625540b84f650ddbefd3ae5511ab8814). We can take the partially eval'd graphs of a series of operators and stitch them together, which allows us to
a) recover symbolic equivalences by CSE'ing & other optimizations
b) calculate shapes for a whole block of operators just from the inputs, such as for fusing the whole model to NNC with dynamic shapes and then passing along the computed symbolic shapes; the calculation also handles errors
c) (future-looking) generate inputs on demand for straight-line networks that are composed just of aten operators

The combined graph of the two gives us compute for the unknown symbolic dimensions - `SS(-2), SS(-3), SS(-4), SS(-5), and SS(-6)`.
```
graph(%input.1 : int[]):
  %42 : bool = prim::Constant[value=0]() # <string>:152:17
  %15 : int = prim::Constant[value=3]()
  %input_batch_size_dim.1 : int = prim::Constant[value=0]() # <string>:417:41
  %13 : int = prim::Constant[value=1]() # <string>:426:61
  %12 : int = prim::Constant[value=4]() # <string>:437:32
  %11 : str = prim::Constant[value="AssertionError: "]()
  %9 : int = prim::Constant[value=2]()
  %8 : int = prim::Constant[value=6]()
  %7 : int = prim::Constant[value=7]()
  %16 : int = aten::len(%input.1) # <string>:438:17
  %17 : bool = aten::eq(%16, %12) # <string>:438:17
   = prim::If(%17) # <string>:438:10
    block0():
      -> ()
    block1():
       = prim::RaiseException(%11) # <string>:438:10
      -> ()
  %18 : int = aten::__getitem__(%input.1, %13) # <string>:407:17
  %19 : bool = aten::eq(%18, %15) # <string>:407:17
   = prim::If(%19) # <string>:407:10
    block0():
      -> ()
    block1():
       = prim::RaiseException(%11) # <string>:407:10
      -> ()
  %20 : int = aten::__getitem__(%input.1, %9) # <string>:411:20
  %21 : int = aten::add(%20, %8) # <string>:411:20
  %22 : bool = aten::ge(%21, %7) # <string>:411:20
   = prim::If(%22) # <string>:411:12
    block0():
      -> ()
    block1():
       = prim::RaiseException(%11) # <string>:411:12
      -> ()
  %23 : int = aten::__getitem__(%input.1, %15) # <string>:411:20
  %24 : int = aten::add(%23, %8) # <string>:411:20
  %25 : bool = aten::ge(%24, %7) # <string>:411:20
   = prim::If(%25) # <string>:411:12
    block0():
      -> ()
    block1():
       = prim::RaiseException(%11) # <string>:411:12
      -> ()
  %26 : int = aten::__getitem__(%input.1, %input_batch_size_dim.1) # <string>:422:29
  %27 : int = aten::sub(%20, %13) # <string>:428:32
  %28 : int = aten::floordiv(%27, %9) # <string>:428:32
  %29 : int = aten::add(%28, %13) # <string>:428:32
  %30 : int = aten::sub(%23, %13) # <string>:428:32
  %31 : int = aten::floordiv(%30, %9) # <string>:428:32
  %32 : int = aten::add(%31, %13) # <string>:428:32
  %48 : int = aten::floordiv(%28, %9) # <string>:133:17
  %outputSize.2 : int = aten::add(%48, %13) # <string>:136:23
  %51 : int = aten::floordiv(%31, %9) # <string>:133:17
  %outputSize.1 : int = aten::add(%51, %13) # <string>:136:23
  %53 : bool = aten::ne(%29, %input_batch_size_dim.1) # <string>:156:41
  %54 : bool = prim::If(%53) # <string>:157:64
    block0():
      %55 : bool = aten::ne(%32, %input_batch_size_dim.1) # <string>:157:93
      -> (%55)
    block1():
      -> (%42)
   = prim::If(%54) # <string>:157:10
    block0():
      -> ()
    block1():
       = prim::RaiseException(%11) # <string>:157:10
      -> ()
  %56 : bool = aten::ge(%outputSize.1, %13) # <string>:160:17
  %57 : bool = prim::If(%56) # <string>:160:17
    block0():
      %58 : bool = aten::ge(%outputSize.2, %13) # <string>:160:38
      -> (%58)
    block1():
      -> (%42)
   = prim::If(%57) # <string>:160:10
    block0():
      -> ()
    block1():
       = prim::RaiseException(%11) # <string>:160:10
      -> ()
  return (%26, %29, %32, %outputSize.2, %outputSize.1)
  ```
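
Concretely, the combined graph above computes the following function of the input shape (a Python rendering of the same arithmetic; the conv/pool parameters come from the module definitions above):
```
from typing import List, Tuple

def symbolic_dims(inp: List[int]) -> Tuple[int, int, int, int, int]:
    assert len(inp) == 4 and inp[1] == 3   # the guards in the combined graph
    n, _, h, w = inp
    # conv2d(kernel=7, stride=2, padding=3): out = (x + 2*3 - 7) // 2 + 1
    conv_h = (h - 1) // 2 + 1
    conv_w = (w - 1) // 2 + 1
    # max_pool2d(kernel=3, stride=2, padding=1): out = (x + 2*1 - 3) // 2 + 1
    pool_h = (conv_h - 1) // 2 + 1
    pool_w = (conv_w - 1) // 2 + 1
    return n, conv_h, conv_w, pool_h, pool_w   # SS(-2) .. SS(-6)
```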

This PR runs shape analysis, retains the partially evaluated graphs, and then stitches them together, keeping track of which inputs in the partial eval graph correspond to which inputs in the encompassing graph IR and which outputs correspond to which symbolic shape. Adding NNC people as reviewers because it is relevant to dynamic shape fusion.

Question for reviewers: should I make this a separate file?

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D31732419

Pulled By: eellison

fbshipit-source-id: 883a55cbeef0fd5a6068a779ffa89b6f537245b3
2021-10-19 16:41:19 -07:00
16d0896b69 [JIT][Easy] Shape cleanups (#65148)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65148

No functional changes, factoring out optimizations and renaming the `graph` in symbolic shape analysis to `shape_compute_graph` as ZolotukhinM suggested

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D31732421

Pulled By: eellison

fbshipit-source-id: e934507d1795e0bc4d98a3bfe6cb792e2f08b119
2021-10-19 16:39:32 -07:00
b3bb234e16 Remove THCGeneral.cpp (#66766)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66766

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D31721647

Pulled By: ngimel

fbshipit-source-id: 5033a2800871c8745a1a92e379c9f97c98af212e
2021-10-19 16:09:19 -07:00
bd4d5cb14c Sparse CSR: Add torch.empty (#63509)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63509

The primary use of `torch.empty` is to reserve memory for a tensor and set its type, device, and size information. The same is done here for SparseCSR.
`crow_indices` is initialized as an empty tensor of size `num_rows + 1`. `col_indices` and `values` are initialized as empty tensors of size 0.
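
A usage sketch of what this enables, assuming the `layout=torch.sparse_csr` keyword path added here; the printed shapes follow the initialization rules described above:
```
import torch

t = torch.empty((4, 6), layout=torch.sparse_csr, dtype=torch.float64)
print(t.crow_indices().shape)  # torch.Size([5])  -- num_rows + 1
print(t.col_indices().shape)   # torch.Size([0])
print(t.values().shape)        # torch.Size([0])
```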

cc nikitaved pearu cpuhrsch IvanYashchuk

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D31770359

Pulled By: cpuhrsch

fbshipit-source-id: c83f2a2e0d7514ba24780add1086e1bccf541dd9
2021-10-19 15:59:07 -07:00
b1a6129e09 Add repr to StreamWrapper (#66880)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66880

Helps to print out `fileobj`

Test Plan: Imported from OSS

Reviewed By: NivekT

Differential Revision: D31764431

Pulled By: ejguan

fbshipit-source-id: 668a8fbe0078196d4d584be3dfb413c8ad5e72b1
2021-10-19 15:28:25 -07:00
e70b5d64f4 Change README getting started link to explicit instructions (#66828)
Summary:
This changes the link for installing binaries to the page on pytorch.org that is entirely the download command selector (which isn't visible on a normal aspect ratio screen when the main website page first loads anymore).

This also includes some other random fixes:
* Update HUD link
* Clean ups

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66828

Reviewed By: malfet

Differential Revision: D31750654

Pulled By: driazati

fbshipit-source-id: aef9ceba71418f6f7648eab9a8c8a78d6c60518b
2021-10-19 14:59:48 -07:00
cbd7bac914 Migrate clang5-mobile build to GHA (#66673)
Summary:
`linux-xenial-py3-clang5-mobile-build`, `linux-xenial-py3-clang5-mobile-custom-build-dynamic` and `linux-xenial-py3-clang5-mobile-code-analysis` are just flavors of the regular Linux build job with no tests.
`linux-xenial-py3-clang5-mobile-code-analysis` is the master-only job.

`code-analysis` job is dispatch to `.jenkins/pytorch/build-mobile-code-analysis.sh` in
583217fe37/.jenkins/pytorch/build.sh (L23-L25)
and all `mobile-build` jobs are dispatched to `.jenkins/pytorch/build-mobile.sh` in
583217fe37/.jenkins/pytorch/build.sh (L19-L21)

Rename the `is_libtorch` `CIWorkflow` property to `build_generates_artifacts` and change the default from False to True.
Neither libtorch nor mobile build jobs generate build artifacts.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66673

Reviewed By: janeyx99

Differential Revision: D31674434

Pulled By: malfet

fbshipit-source-id: 24d05d55366202cd4d9c25ecab429cb8f670ded0
2021-10-19 14:13:29 -07:00
15f21eef5e [fx2trt]fix softmax test (#66885)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66885

Test Plan: CI

Reviewed By: hl475

Differential Revision: D31767433

fbshipit-source-id: 1ee79ac027c612b5397be9da9665fff21b2c321f
2021-10-19 13:55:49 -07:00
a1afb692f3 Fix metal issues with irange (#66877)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66877

Fixes (hopefully):
```
program_source:516:27: error: use of undeclared identifier 'c10'
    for (const auto idx : c10::irange(4)) {
                          ^
program_source:590:27: error: use of undeclared identifier 'c10'
    for (const auto idx : c10::irange(4)) {
                          ^
program_source:810:26: error: use of undeclared identifier 'c10'
    for (const auto iy : c10::irange(roi_bin_grid_h)) {
                         ^
program_source:811:30: error: use of undeclared identifier 'c10'
        for (const auto ix : c10::irange(roi_bin_grid_w)) {
                             ^

DeviceName: AMD Radeon Pro 5500M, LanguageVersion: 131075
Exception raised from -[MetalContext available] at xplat/caffe2/aten/src/ATen/native/metal/MetalContext.mm:66 (most recent call first):
(no backtrace available)
```

Test Plan: Sandcastle

Reviewed By: benb, xta0

Differential Revision: D31763270

fbshipit-source-id: cfe4364b14c5fe6dbd39893788919769c9a9eb00
2021-10-19 13:49:24 -07:00
66f241230d [PyTorch] Take const Type& in {tryS,s}calarTypeFromJitType (#66717)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66717

No need to require a refcount bump for this function.
ghstack-source-id: 140921170

Test Plan: CI

Reviewed By: suo

Differential Revision: D31696898

fbshipit-source-id: a3732a04ccbddc32207ce90836030f3020154a77
2021-10-19 13:08:42 -07:00
9a00910bf3 [skip ci] Set test owner for test_linalg.py (#66844)
Summary:
Action following https://github.com/pytorch/pytorch/issues/66232

cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66844

Reviewed By: gchanan

Differential Revision: D31761714

Pulled By: janeyx99

fbshipit-source-id: a4c7b239d855707ee6ec1194f57f8a66812b4e99
2021-10-19 13:01:05 -07:00
57c596eb9e add interactive_embedded_interpreter.cpp to the OSS build (#66352)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66352

Add cmake rules for interactive_embedded_interpreter.cpp.

The builtin_registry.cpp has already been handled in https://github.com/pytorch/pytorch/pull/66347 . I'll remove the change in this PR once that one is merged.

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D31521249

Pulled By: shunting314

fbshipit-source-id: bb9d340e5a6aad7d76078ca03a82b5ae7494a124
2021-10-19 12:32:49 -07:00
3488a85a76 Sparse CSR CUDA: fix input checks for addmm and mm (#66485)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66485

The errors for incorrectly sized inputs should match those of the dense
variants of these functions.
Moved addmm_out_sparse_csr_dense_cuda from SparseCsrTensorMath.cu and
removed an unnecessary device check.

cc nikitaved pearu cpuhrsch IvanYashchuk

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D31764036

Pulled By: cpuhrsch

fbshipit-source-id: 76900fe9e4a49474695a01f34bad41cb3422321c
2021-10-19 12:01:11 -07:00
690c2a7076 masked_scatter: fuse mask count check into one kernel (#66871)
Summary:
This saves 1 kernel launch, 7 dispatcher calls, 3 `TensorImpl` allocations and 1 CUDA memory allocation.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66871

Reviewed By: gchanan

Differential Revision: D31763713

Pulled By: ngimel

fbshipit-source-id: b0d2f9415b7fd013fb4e7d68ade6e38a58f5b153
2021-10-19 11:52:38 -07:00
552af8bdef [PyTorch] Fix missing move in OptionalType::createWithContained (#66697)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66697

We own this vector, so we can move from it.
ghstack-source-id: 140742640

Test Plan: CI

Reviewed By: suo

Differential Revision: D31693230

fbshipit-source-id: 3f33ca6e47e29b0e3d6c8fad59c234c55e1e159f
2021-10-19 11:47:35 -07:00
7e81a89e13 [PyTorch] Fix performance-no-automatic-move clang tidy warnings in matchTypeVariables (#66720)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66720

See the documentation for the warning. https://clang.llvm.org/extra/clang-tidy/checks/performance-no-automatic-move.html
ghstack-source-id: 140922952

Test Plan: CI

Reviewed By: suo

Differential Revision: D31697506

fbshipit-source-id: 26ce6c47d0f3b0c4e48ecc882f6792f1b5a45bac
2021-10-19 11:30:46 -07:00
50f5689d60 Set test owner for distributions tests (#66842)
Summary:
Action following https://github.com/pytorch/pytorch/issues/66232

cc fritzo neerajprad alicanb nikitaved

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66842

Reviewed By: neerajprad

Differential Revision: D31761720

Pulled By: janeyx99

fbshipit-source-id: 9d9e88d93e2efb90c971f165b4040880e9d90c56
2021-10-19 11:00:29 -07:00
c37f413e75 [skip ci] Change pretrained to false for quantization tests (#66795)
Summary:
Helps resolve a bit of https://github.com/pytorch/pytorch/issues/65439

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66795

Reviewed By: suo, jerryzh168

Differential Revision: D31732043

Pulled By: janeyx99

fbshipit-source-id: 10b71865fc937f9d72f2b1c04cbf3ea9a68c8818
2021-10-19 10:56:29 -07:00
c9d9244166 [skip ci] Set test owner for test_spectral_ops.py (#66843)
Summary:
Action following https://github.com/pytorch/pytorch/issues/66232

cc mruberry peterbell10

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66843

Reviewed By: gchanan

Differential Revision: D31761715

Pulled By: janeyx99

fbshipit-source-id: 1173a200478b87568768fafcfee117c09c1cffbd
2021-10-19 10:56:27 -07:00
34051d74da Add test owner to distributed files starting with test_ (#66797)
Summary:
Action based on https://github.com/pytorch/pytorch/issues/66232

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66797

Reviewed By: gchanan

Differential Revision: D31761389

Pulled By: janeyx99

fbshipit-source-id: c27c9ab4acec1eb71d5edd4538cd113b770dfc6c
2021-10-19 10:55:20 -07:00
94afbd158c [skip ci] Set test owner for test_numpy_interop.py (#66851)
Summary:
Action following https://github.com/pytorch/pytorch/issues/66232

cc mruberry rgommers

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66851

Reviewed By: gchanan

Differential Revision: D31761703

Pulled By: janeyx99

fbshipit-source-id: 4dec507dff0ce25d2780b6020f0d9790ab1cb499
2021-10-19 10:50:54 -07:00
17f07c310b Fix type checking errors in torch/ao/quantization/quantize_fx.py (#66804)
Summary:
- [x] Fix the Pyre type checking errors in `torch/ao/quantization/quantize_fx.py`
```
torch/quantization/quantize_fx.py:41:8 Incompatible variable type [9]: fuse_custom_config_dict is declared to have type `Dict[str, typing.Any]` but is used as type `None`.
torch/quantization/quantize_fx.py:143:16 Incompatible variable type [9]: prepare_custom_config_dict is declared to have type `Dict[str, typing.Any]` but is used as type `None`.
torch/quantization/quantize_fx.py:144:16 Incompatible variable type [9]: equalization_qconfig_dict is declared to have type `Dict[str, typing.Any]` but is used as type `None`.
torch/quantization/quantize_fx.py:206:8 Incompatible variable type [9]: prepare_custom_config_dict is declared to have type `Dict[str, typing.Any]` but is used as type `None`.
torch/quantization/quantize_fx.py:230:12 Incompatible variable type [9]: fuse_custom_config_dict is declared to have type `Dict[str, typing.Any]` but is used as type `None`.
torch/quantization/quantize_fx.py:268:8 Incompatible variable type [9]: prepare_custom_config_dict is declared to have type `Dict[str, typing.Any]` but is used as type `None`.
torch/quantization/quantize_fx.py:269:8 Incompatible variable type [9]: equalization_qconfig_dict is declared to have type `Dict[str, typing.Any]` but is used as type `None`.
torch/quantization/quantize_fx.py:427:8 Incompatible variable type [9]: prepare_custom_config_dict is declared to have type `Dict[str, typing.Any]` but is used as type `None`.
torch/quantization/quantize_fx.py:464:8 Incompatible variable type [9]: convert_custom_config_dict is declared to have type `Dict[str, typing.Any]` but is used as type `None`.
torch/quantization/quantize_fx.py:486:8 Incompatible variable type [9]: convert_custom_config_dict is declared to have type `Dict[str, typing.Any]` but is used as type `None`.
torch/quantization/quantize_fx.py:547:8 Incompatible variable type [9]: convert_custom_config_dict is declared to have type `Dict[str, typing.Any]` but is used as type `None`.
```
Fixes the issue: [MLH-Fellowship/pyre-check/issues/76](https://github.com/MLH-Fellowship/pyre-check/issues/76)
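
The standard fix pattern for this class of error is to declare the parameter as `Optional` and normalize `None` inside the function; a minimal sketch with hypothetical names:
```
from typing import Any, Dict, Optional

def prepare(model: Any,
            prepare_custom_config_dict: Optional[Dict[str, Any]] = None) -> Any:
    # declare as Optional[...] = None, then normalize, so the type checker no
    # longer sees a Dict-typed parameter being given the default value None
    if prepare_custom_config_dict is None:
        prepare_custom_config_dict = {}
    return model
```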

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66804

Reviewed By: onionymous

Differential Revision: D31738171

Pulled By: 0xedward

fbshipit-source-id: 00d4c5749c469aff39a1531365461ced747e52fc
2021-10-19 09:45:18 -07:00
a2e94b80fa Create linalg.matrix_exp (#62715)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62715

Fixes https://github.com/pytorch/pytorch/issues/61648

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D31641698

Pulled By: mruberry

fbshipit-source-id: 2e2965d14807b6b4fada4b809d539066dd0ba277
2021-10-19 09:07:15 -07:00
fd608cd313 [skip ci] Set test owners for optim tests (#66861)
Summary:
Action following https://github.com/pytorch/pytorch/issues/66232

cc vincentqb jbschlosser albanD

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66861

Reviewed By: albanD

Differential Revision: D31761369

Pulled By: janeyx99

fbshipit-source-id: 57829e1f1509fc2af321530a4b55c9d33b7fb150
2021-10-19 08:39:35 -07:00
c806bb1022 [skip ci] Set test owner for test_complex.py (#66835)
Summary:
Action following https://github.com/pytorch/pytorch/issues/66232

cc ezyang anjali411 dylanbespalko mruberry Lezcano nikitaved

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66835

Reviewed By: anjali411

Differential Revision: D31761723

Pulled By: janeyx99

fbshipit-source-id: ca672f5a1be9dc27284fade725a8238cbfd877a3
2021-10-19 08:36:27 -07:00
299a6a65b2 [skip ci] Set test owners for autograd tests (#66834)
Summary:
Action following https://github.com/pytorch/pytorch/issues/66232

cc ezyang albanD zou3519 gqchen pearu nikitaved soulitzer Lezcano Varal7

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66834

Reviewed By: albanD

Differential Revision: D31761778

Pulled By: janeyx99

fbshipit-source-id: 355edfb1b940154e84fbba6f7b096605e75ae459
2021-10-19 08:35:02 -07:00
39215ddf84 [skip ci] Set test owners for dataloader tests (#66839)
Summary:
Action following https://github.com/pytorch/pytorch/issues/66232

cc SsnL VitalyFedyunin ejguan NivekT

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66839

Reviewed By: ejguan

Differential Revision: D31761722

Pulled By: janeyx99

fbshipit-source-id: 8315ac03352c11b3215d89856b3cfda6cd78fa0c
2021-10-19 08:31:16 -07:00
9eab6da887 [skip ci] Set test owner for nn tests (#66850)
Summary:
Action following https://github.com/pytorch/pytorch/issues/66232

cc albanD mruberry jbschlosser walterddr

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66850

Reviewed By: albanD

Differential Revision: D31761712

Pulled By: janeyx99

fbshipit-source-id: 7272154cac77e2ce38370775a9e8d41252e13166
2021-10-19 08:26:50 -07:00
05b6dc9d75 Fix BatchMatMul test and shape inference (#66733)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66733

Fix the test for BatchMatMul to compare glow/caffe2 outputs, and fix its shape inference function, since it made simplifying assumptions for broadcasting and failed on some of the shapes in the test. The previous inference failed for any case where the first n - 2 output dimensions of A x B were not simply those of whichever of A or B had the higher rank (e.g., for A: [2, 2, 2, 3, 4] and B: [3, 1, 2, 2, 4, 5] we expect output dimensions [3, 2, 2, 2, 3, 5] rather than [3, 1, 2, 2, 3, 5]).
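
The corrected rule is ordinary NumPy-style broadcasting over the leading batch dims; a small sketch reproducing the example above:
```
from typing import List

def batch_matmul_output_shape(a: List[int], b: List[int]) -> List[int]:
    """Broadcast the leading batch dims right-aligned, then matmul on the last two."""
    assert a[-1] == b[-2], "inner dimensions must match"
    batch_a, batch_b = a[:-2], b[:-2]
    n = max(len(batch_a), len(batch_b))
    # left-pad the shorter batch shape with 1s, then broadcast dim-wise
    pa = [1] * (n - len(batch_a)) + batch_a
    pb = [1] * (n - len(batch_b)) + batch_b
    batch = []
    for da, db in zip(pa, pb):
        assert da == db or da == 1 or db == 1, "shapes are not broadcastable"
        batch.append(max(da, db))
    return batch + [a[-2], b[-1]]

# batch_matmul_output_shape([2, 2, 2, 3, 4], [3, 1, 2, 2, 4, 5]) -> [3, 2, 2, 2, 3, 5]
```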

Test Plan:
```
buck test glow/fb/test/numerics:test_operator_onnxifinnpi -- -r .*test_batch_matmul_manydims.* --env USE_INF_API=1
```

Reviewed By: khabinov

Differential Revision: D31701184

fbshipit-source-id: 31d0fb17409a399b90fb8042385e000ed81c3581
2021-10-19 07:53:13 -07:00
9f782f8b35 add OpInfo for torch.nn.pixel_unshuffle (#65468)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65468

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D31111699

Pulled By: zou3519

fbshipit-source-id: a92c2f1f4986a54abab82360e97ea2ce22fb9397
2021-10-19 07:36:35 -07:00
1164118fc2 add OpInfo for torch.nn.pixel_shuffle (#65467)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65467

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D31111697

Pulled By: zou3519

fbshipit-source-id: 618e6b2cc927814f85500374a2838d98c9c45d6e
2021-10-19 07:36:33 -07:00
8f09292c5e add OpInfo for torch.nn.functional.pairwise_distance (#65460)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65460

cc albanD mruberry jbschlosser walterddr

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D31111701

Pulled By: zou3519

fbshipit-source-id: a4034418cf8d14f584134a16d822181703858f99
2021-10-19 07:35:10 -07:00
0036e41143 [quant][embedding qat] Add eager QAT test for EmbeddingBag+Linear model (#66334)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66334

Test Plan: Imported from OSS

Reviewed By: HDCharles

Differential Revision: D31618283

Pulled By: b-koopman

fbshipit-source-id: bb824a341f1aa9d7e83f8e66d320a9dfd348a1d7
2021-10-19 07:03:36 -07:00
0a07488ed2 use irange for loops 1 (#66741)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66741

Modified loops in files under fbsource/fbcode/caffe2/ from the format

`for(TYPE var=x0;var<x_max;x++)`

to the format

`for(const auto var: irange(xmax))`

This was achieved by running r-barnes's loop upgrader script (D28874212) with some modification to exclude all files under /torch/jit and a number of reversions or unused variable suppression warnings added by hand.

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D31705360

fbshipit-source-id: 7115f76e381ad2d98584eb534961c3cbb957ebaa
2021-10-19 03:28:51 -07:00
72803dbcfd [caffe2] Fix invalid vector accesses and polar() call (#66757)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66757

`InterpreterStateImpl::run()` gets the number of outputs from the current frame, but by the time the continuation completes, the frame is gone, so we're calling `front()` on an empty vector. This works out in practice (data is still there) but it is technically undefined behavior and could break in the future.

Also, `std::polar()` expects its argument to be non-negative, but `c10::polar()` does not, so we implement it explicitly (the implementation is the same as libstdc++'s).

Test Plan: JIT tests pass.

Reviewed By: zhxchen17

Differential Revision: D31715587

fbshipit-source-id: 98abcc10c2742887af866d8e70169a0187c41d33
2021-10-19 00:29:54 -07:00
147f7559b1 Add SourceView which doesn't own source text as base class of Source (#65309)
Summary:
This would save the cost of copying text from the stack to the heap in some cases (like
parsing function schemas during the loading phase of libtorch.so).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65309

Reviewed By: swolchok

Differential Revision: D31060315

Pulled By: gmagogsfm

fbshipit-source-id: 0caf7a688b40df52bb4388c5191d1a42351d6f1a
2021-10-18 23:17:22 -07:00
bff64e84cd [DDP] Track models with sync bn (#66680)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66680

Closes https://github.com/pytorch/pytorch/issues/66215. Tracks models
with sync BN so we can find workflows that use them and target them for perf
optimization.
ghstack-source-id: 140875182

Test Plan: CI

Reviewed By: pritamdamania87

Differential Revision: D31679477

fbshipit-source-id: 0e68cd1a7aabbc5b26227895c53d33b8e98bfb8e
2021-10-18 22:31:52 -07:00
e0643fa3fc use irange for loops 5 (#66744)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66744

Modified loops in files under fbsource/fbcode/caffe2/ from the format

`for(TYPE var=x0;var<x_max;x++)`

to the format

`for(const auto var: irange(xmax))`

This was achieved by running r-barnes's loop upgrader script (D28874212) with some modification to exclude all files under /torch/jit and a number of reversions or unused variable suppression warnings added by hand.

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D31705358

fbshipit-source-id: d6ea350cbaa8f452fc78f238160e5374be637a48
2021-10-18 21:59:50 -07:00
bceb1db885 use irange for loops 3 (#66747)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66747

Modified loops in files under fbsource/fbcode/caffe2/ from the format

`for(TYPE var=x0;var<x_max;x++)`

to the format

`for(const auto var: irange(xmax))`

This was achieved by running r-barnes's loop upgrader script (D28874212) with some modification to exclude all files under /torch/jit and a number of reversions or unused variable suppression warnings added by hand.

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D31705365

fbshipit-source-id: 5c3af2184766b063eed2f4e8feb69f1fedd3503e
2021-10-18 21:50:32 -07:00
061baf02bf Skip failing tests when LAPACK and MAGMA are not available (#64930)
Summary:
Skip failing tests in `test_linalg.py` and `test_ops.py` when LAPACK and MAGMA are not available.
Note that there's no CI without LAPACK or MAGMA. I verified locally that this now works as expected, but in the future we have no guard against these tests failing again in this situation.
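
The guard pattern looks roughly like the sketch below (the decorator and helper names are those used in the PyTorch test suite; the test body itself is illustrative):
```
import torch
from torch.testing._internal.common_utils import TestCase, run_tests
from torch.testing._internal.common_device_type import (
    instantiate_device_type_tests, skipCPUIfNoLapack, skipCUDAIfNoMagma)

class TestLinalgGuards(TestCase):
    @skipCPUIfNoLapack
    @skipCUDAIfNoMagma
    def test_inverse_smoke(self, device):
        a = torch.eye(3, device=device)
        self.assertEqual(torch.linalg.inv(a), a)   # skipped, not failed, w/o LAPACK/MAGMA

instantiate_device_type_tests(TestLinalgGuards, globals())

if __name__ == "__main__":
    run_tests()
```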

<details>
  <summary> test_ops.py failures that are fixed</summary>

 ```
 FAILED test/test_ops.py::TestCommonCPU::test_out_linalg_tensorinv_cpu_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestCommonCPU::test_reference_testing_linalg_tensorinv_cpu_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestCommonCPU::test_reference_testing_linalg_tensorinv_cpu_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestCommonCPU::test_variant_consistency_eager_linalg_tensorinv_cpu_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestCommonCPU::test_variant_consistency_eager_linalg_tensorinv_cpu_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestCommonCPU::test_variant_consistency_eager_triangular_solve_cpu_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestCommonCPU::test_variant_consistency_eager_triangular_solve_cpu_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestGradientsCPU::test_fn_grad_linalg_tensorinv_cpu_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestGradientsCPU::test_fn_grad_linalg_tensorinv_cpu_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestGradientsCPU::test_fn_grad_triangular_solve_cpu_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestGradientsCPU::test_fn_grad_triangular_solve_cpu_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestGradientsCPU::test_fn_gradgrad_linalg_tensorinv_cpu_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestGradientsCPU::test_fn_gradgrad_linalg_tensorinv_cpu_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestGradientsCPU::test_fn_gradgrad_triangular_solve_cpu_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestGradientsCPU::test_fn_gradgrad_triangular_solve_cpu_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestGradientsCPU::test_forward_mode_AD_linalg_tensorinv_cpu_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestGradientsCPU::test_forward_mode_AD_linalg_tensorinv_cpu_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestGradientsCPU::test_forward_mode_AD_triangular_solve_cpu_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestGradientsCPU::test_forward_mode_AD_triangular_solve_cpu_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestJitCPU::test_variant_consistency_jit_linalg_tensorinv_cpu_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestJitCPU::test_variant_consistency_jit_triangular_solve_cpu_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestJitCPU::test_variant_consistency_jit_triangular_solve_cpu_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestMathBitsCPU::test_conj_view_linalg_tensorinv_cpu_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestMathBitsCPU::test_conj_view_triangular_solve_cpu_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestMathBitsCPU::test_neg_view_linalg_tensorinv_cpu_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestMathBitsCPU::test_neg_view_triangular_solve_cpu_float64 - RuntimeError: svd: LAPACK library not found in compilation
 ```

</details>

<details>
  <summary> test_linalg.py failures that are fixed</summary>
```
FAILED test/test_linalg.py::TestLinalgCPU::test_norm_dtype_cpu - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCPU::test_norm_matrix_cpu_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCPU::test_norm_matrix_cpu_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCPU::test_nuclear_norm_axes_small_brute_force_old_cpu - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_eigh_hermitian_grad_meta_complex128 - RuntimeError: Calling torch.linalg.eigh or eigvalsh on a CPU tensor requires compiling PyTorch with LAPACK. Please use PyTorch built with LAPACK support.
FAILED test/test_linalg.py::TestLinalgMETA::test_eigh_hermitian_grad_meta_float64 - RuntimeError: Calling torch.linalg.eigh or eigvalsh on a CPU tensor requires compiling PyTorch with LAPACK. Please use PyTorch built with LAPACK support.
FAILED test/test_linalg.py::TestLinalgMETA::test_inverse_meta_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_inverse_meta_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_inverse_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_inverse_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_lu_meta_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_lu_meta_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_lu_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_lu_solve_batched_broadcasting_meta_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_lu_solve_batched_broadcasting_meta_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_lu_solve_batched_broadcasting_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_lu_solve_batched_broadcasting_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_lu_solve_batched_non_contiguous_meta_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_lu_solve_batched_non_contiguous_meta_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_lu_solve_batched_non_contiguous_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_lu_solve_batched_non_contiguous_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_lu_solve_meta_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_lu_solve_meta_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_lu_solve_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_lu_solve_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_batched_broadcasting_meta_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_batched_broadcasting_meta_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_batched_broadcasting_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_batched_broadcasting_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_batched_meta_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_batched_meta_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_batched_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_batched_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_batched_non_contiguous_meta_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_batched_non_contiguous_meta_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_batched_non_contiguous_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_batched_non_contiguous_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_meta_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_meta_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_solve_batched_non_contiguous_meta_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_solve_batched_non_contiguous_meta_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_solve_batched_non_contiguous_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_solve_batched_non_contiguous_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_solve_meta_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_solve_meta_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_solve_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_solve_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_svd_square_col_maj_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_svd_square_col_maj_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_svd_square_meta_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_svd_square_meta_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_svd_square_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_svd_square_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_svd_tall_all_col_maj_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_svd_tall_all_col_maj_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_svd_tall_all_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_svd_tall_all_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_svd_tall_some_col_maj_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_svd_tall_some_col_maj_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_svd_tall_some_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_svd_tall_some_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_inverse_cuda_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_inverse_cuda_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_inverse_cuda_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_inverse_cuda_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_lowrank_cuda_float64 - RuntimeError: Calling torch.lu on a CUDA tensor requires compiling PyTorch with MAGMA. Please rebuild with MAGMA.
FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_square_col_maj_cuda_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_square_col_maj_cuda_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_square_cuda_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_square_cuda_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_square_cuda_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_square_cuda_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_tall_all_col_maj_cuda_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_tall_all_col_maj_cuda_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_tall_all_cuda_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_tall_all_cuda_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_tall_some_col_maj_cuda_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_tall_some_col_maj_cuda_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_tall_some_cuda_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_tall_some_cuda_float64 - RuntimeError: svd: LAPACK library not found in compilation
```
</details>

Fixes https://github.com/pytorch/pytorch/issues/59662

cc mruberry jianyuh nikitaved pearu walterddr IvanYashchuk xwang233 Lezcano

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64930

Reviewed By: zou3519

Differential Revision: D31739416

Pulled By: mruberry

fbshipit-source-id: 153c40d8eeeb094b06816882a7cbb28c681509a9
2021-10-18 21:30:01 -07:00
08a464a9f3 [PyTorch] Pass c10::optional<bool> to Stride ctor by value (#66698)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66698

This type should fit in a register; there is no need to pass it by reference.
ghstack-source-id: 140742830

Test Plan: CI

Reviewed By: suo

Differential Revision: D31693291

fbshipit-source-id: 299fb3d1830a059b59268487c22e030446c3496e
2021-10-18 21:28:56 -07:00
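A minimal sketch of the point above, using `std::optional` as a stand-in for `c10::optional` (illustrative names; not the actual Stride code):

```
#include <optional>

// A two-byte, trivially copyable value travels in a register when passed
// by value; a const reference instead forces the caller to materialize
// the argument in memory and pass its address.
int byValue(std::optional<bool> v) { return v.value_or(false) ? 1 : 0; }
int byConstRef(const std::optional<bool>& v) { return v.value_or(false) ? 1 : 0; }

int main() { return byValue(true) - byConstRef(true); }
```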
c9c52b760b test addr type promotion in a single test (#66812)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/66802
Test time goes from 150s to 15s.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66812

Reviewed By: mruberry

Differential Revision: D31739299

Pulled By: ngimel

fbshipit-source-id: cb6d92ff335f46ee06b2480bdd9143f85865bccf
2021-10-18 21:21:11 -07:00
d05c1ec007 Add lazy Node base and associated infra (#66601)
Summary:
- Adds Node base class and unit tests
- Also adds metadata utils to enable source code annotation and scope tracking

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66601

Test Plan: Add new unit tests

Reviewed By: desertfire

Differential Revision: D31634044

fbshipit-source-id: a042d54f06fbc480acfc63c18d43cb6fceb6fea5
2021-10-18 19:09:42 -07:00
a17a4e93ce [PyTorch][easy] Fix missing move in UnionType::createWithContained (#66691)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66691

Does what it says on the tin.
ghstack-source-id: 140736047

Test Plan: CI

Reviewed By: suo

Differential Revision: D31691627

fbshipit-source-id: 21a5d0248bf3412f5af36260597a5f663ab34361
2021-10-18 18:04:22 -07:00
c9c447f4be [PyTorch] Fix missing moves in ListType (#66701)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66701

We own the argument vector.
ghstack-source-id: 140760983

Test Plan: CI

Reviewed By: suo

Differential Revision: D31693645

fbshipit-source-id: 02829bc3c728f6d1d07be08b0d977eee1efee38f
2021-10-18 18:00:18 -07:00
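The missing-move pattern being fixed, as a standalone sketch (a hypothetical `ListType`, not the real one):

```
#include <memory>
#include <utility>
#include <vector>

struct Type {};
using TypePtr = std::shared_ptr<Type>;

struct ListType {
  // We own the argument vector, so move it: no buffer copy, and no
  // refcount bump per contained TypePtr.
  explicit ListType(std::vector<TypePtr> contained)
      : contained_(std::move(contained)) {}
  std::vector<TypePtr> contained_;
};

int main() {
  std::vector<TypePtr> elems{std::make_shared<Type>()};
  ListType list(std::move(elems));  // caller moves too; zero extra refcounting
  return list.contained_.size() == 1 ? 0 : 1;
}
```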
d0a63c978b [PyTorch][easy] Don't copy string in TensorType::repr_str unnecessarily (#66699)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66699

std::string::operator+ will copy the string an extra time even if the argument is `""`. See https://godbolt.org/z/3sM5h1qTo
ghstack-source-id: 140743822

Test Plan: CI

Reviewed By: suo

Differential Revision: D31693522

fbshipit-source-id: 6a8033c90366904b9aff44214b600cfb255a0809
2021-10-18 17:55:21 -07:00
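The observation above, in a compilable sketch:

```
#include <iostream>
#include <string>

int main() {
  std::string s = "Tensor";
  std::string a = s + "";  // operator+ allocates a brand-new string: an extra copy
  std::string b = s;       // a single deliberate copy does the same job
  std::cout << a << ' ' << b << '\n';
  return 0;
}
```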
f65b4b7a4c [PyTorch] Avoid refcount bump in UnionType::canHoldType (#66693)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66693

Passing a `TypePtr` by value causes an unnecessary refcount
bump. We don't need to take ownership, so `const Type&` is all we
need.

I considered providing a compatibility shim that takes `const
TypePtr&`, but doing so is dangerous because a
copy is required to convert from a more specific pointer like
`NoneTypePtr`.
ghstack-source-id: 140737081

Test Plan: CI

Reviewed By: suo

Differential Revision: D31691869

fbshipit-source-id: f766ce3234a28771c2a9ca4c284eb3f96993a3d0
2021-10-18 17:39:59 -07:00
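A standalone sketch of the signature change (hypothetical types; not the real `UnionType::canHoldType`):

```
#include <memory>

struct Type { virtual ~Type() = default; };
using TypePtr = std::shared_ptr<Type>;

// By-value shared_ptr: an atomic refcount increment on entry and a
// decrement on exit -- pure overhead when we only inspect the object.
bool canHoldByValue(TypePtr t) { return t != nullptr; }

// const reference to the pointee: no ownership taken, no refcount traffic.
bool canHoldByRef(const Type& /*t*/) { return true; }

int main() {
  auto t = std::make_shared<Type>();
  return canHoldByValue(t) && canHoldByRef(*t) ? 0 : 1;
}
```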
1db50505d5 [nn] MultiLabelSoftMarginLoss : no batch dim support (#65690)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/60585

cc albanD mruberry jbschlosser walterddr

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65690

Reviewed By: zou3519

Differential Revision: D31731162

Pulled By: jbschlosser

fbshipit-source-id: d26f27555f78afdadd49126e0548a8bfda50cc5a
2021-10-18 15:30:01 -07:00
8173d4df69 move get_cycles_per_ms() to common_utils (#66798)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66798

get_cycles_per_ms is copied and used in a few places; move it to common_utils so that it can be used as a shared util function
ghstack-source-id: 140790599

Test Plan: unit tests

Reviewed By: pritamdamania87

Differential Revision: D31706870

fbshipit-source-id: e8dccecb13862646a19aaadd7bad7c8f414fd4ab
2021-10-18 14:04:09 -07:00
d024f1134d ci: Move bazel download from github -> s3 (#66815)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66815

Was seeing 403s when attempting to wget from GitHub; re-hosting the
binary on S3 so we shouldn't see those issues anymore

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D31740656

Pulled By: seemethere

fbshipit-source-id: 4462678d51a52b63020f8da18d7cdc80fb8dbc5d
2021-10-18 13:34:40 -07:00
06e49ea088 [not4land][quant][fx][graphmode] lower reference linear module example (#65723)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65723

Example of lowering a reference linear module to the fbgemm/qnnpack quantized linear module

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D31567461

fbshipit-source-id: 0b8fffaf8e742ec15cb07bf6a4672cf3e856db2d
2021-10-18 13:14:39 -07:00
c994a7fc2d Update documentation of torch.nn.Upsample (#66756)
Summary:
The documentation of torch.nn.Upsample stated that `align_corners` only affects `linear`, `bilinear` and `trilinear`.

This PR updates the documentation for the Python `Upsample` module and the C++ `UpsampleOptions` struct to reflect that `bicubic` is also affected by `align_corners`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66756

Reviewed By: zou3519

Differential Revision: D31731148

Pulled By: jbschlosser

fbshipit-source-id: 3ec277fc3fbdf8414d0de327d8c57ba07342a5b9
2021-10-18 13:07:17 -07:00
0974215c4d Prefer mT and mH over transpose(-2, -1) and transpose(-2, -1).conj() (#64181)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64181

This PR replaces all the calls to:
- `transpose(-2, -1)` or `transpose(-1, -2)` by `mT()` in C++ and `mT` in Python
- `conj().transpose(-2, -1)` or `transpose(-2, -1).conj()` or `conj().transpose(-1, -2)` or `transpose(-1, -2).conj()` by `mH()` in C++ and `mH` in Python.

It also simplifies two pieces of code, and fixes one bug where a pair
of parentheses was missing in the function `make_symmetric_matrices`.

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D31692896

Pulled By: anjali411

fbshipit-source-id: e9112c42343663d442dc5bd53ff2b492094b434a
2021-10-18 13:02:25 -07:00
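Assuming a libtorch build that already includes this change, the equivalence can be checked like so:

```
#include <torch/torch.h>
#include <iostream>

int main() {
  auto a = torch::randn({2, 3, 4});
  // The preferred spelling produces the same view as the old one.
  std::cout << torch::equal(a.transpose(-2, -1), a.mT()) << '\n';  // prints 1
}
```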
44fd312604 [PyTorch] Use intrusive_ptr to save space in KernelFunction (#65618)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65618

This saves 8 bytes per KernelFunction, which should help in resource-constrained environments.
ghstack-source-id: 140731069

Test Plan: CI

Reviewed By: ezyang

Differential Revision: D25405736

fbshipit-source-id: 757c0f1387da9147e46ac69af2aa9fffd2998e35
2021-10-18 12:53:45 -07:00
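Where the 8 bytes come from — sketched with a raw pointer standing in for `c10::intrusive_ptr`, which is a single word because the refcount lives inside the object:

```
#include <cstdio>
#include <memory>

struct Functor {};

int main() {
  // shared_ptr = two words (object pointer + control-block pointer).
  std::printf("shared_ptr<Functor>: %zu bytes\n", sizeof(std::shared_ptr<Functor>));  // typically 16
  std::printf("Functor*           : %zu bytes\n", sizeof(Functor*));                  // typically 8
}
```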
622e19b859 [PyTorch] Take const Type& in TensorType::fromNumberType (#66716)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66716

No need to require a refcount bump for this function.
ghstack-source-id: 140754065

Test Plan: CI

Reviewed By: suo

Differential Revision: D31696639

fbshipit-source-id: bf8aa3f542d52e82e0f6a444b8898330f3d16a31
2021-10-18 12:49:40 -07:00
6a7296be9c [PyTorch] Use castRaw in InterfaceType (#66728)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66728

Two extra refcount bumps.
ghstack-source-id: 140760872

Test Plan: CI

Reviewed By: suo

Differential Revision: D31698577

fbshipit-source-id: 1f50195a99f98f857abc9b03b4254519c316fefe
2021-10-18 12:44:24 -07:00
9ea3424747 Set test owner for fx (#66807)
Summary:
Action following https://github.com/pytorch/pytorch/issues/66232

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66807

Reviewed By: jamesr66a

Differential Revision: D31736722

Pulled By: janeyx99

fbshipit-source-id: 5ffcb02a858137211bff1eabf158001dcb0359a6
2021-10-18 12:25:38 -07:00
8637556d23 Migrate THCState to ATen (#66765)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66765

This guts `THCState` to simply be an empty struct, as well as:
- moving `THCState_getPeerToPeerAccess` and its cache into `ATen`.
- cleaning up dead code in `THCGeneral.cpp`
- moving `THCudaInit` and `THCMagma_init` into `CUDAHooks::initCUDA`

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D31721648

Pulled By: ngimel

fbshipit-source-id: 772b24787656a95f9e3fcb287d912b1c3400f32d
2021-10-18 12:14:43 -07:00
1fcbd8fa15 [PyTorch] Fix extra refcount bumps in tryEvalTypeVariables (#66722)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66722

Missing move, s/cast/castRaw/, and take TypePtr arg by const ref because we only sometimes need to take ownership.
ghstack-source-id: 140757141

Test Plan: CI

Reviewed By: suo

Differential Revision: D31697631

fbshipit-source-id: 04afe13688c6e2aaf79157400c0a44021cb8179d
2021-10-18 12:06:37 -07:00
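The `cast` vs. `castRaw` distinction, sketched with the standard casts (hypothetical types; `castRaw` returns a raw pointer without touching the refcount):

```
#include <memory>

struct Type { virtual ~Type() = default; };
struct ListType : Type {};
using TypePtr = std::shared_ptr<Type>;

// dynamic_pointer_cast returns a fresh shared_ptr -- a refcount bump we
// discard immediately if we only needed to test the dynamic type.
bool isListSlow(const TypePtr& t) {
  return std::dynamic_pointer_cast<ListType>(t) != nullptr;
}

// A raw dynamic_cast answers the same question with no refcount traffic.
bool isListFast(const TypePtr& t) {
  return dynamic_cast<ListType*>(t.get()) != nullptr;
}

int main() {
  TypePtr t = std::make_shared<ListType>();
  return isListSlow(t) && isListFast(t) ? 0 : 1;
}
```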
393299b124 [PyTorch] Fix unnecessary shared_ptr copies in RRefType (#66706)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66706

Missing moves in the construction path.
ghstack-source-id: 140746585

Test Plan: CI

Reviewed By: suo

Differential Revision: D31694356

fbshipit-source-id: 8e2bf2dd41f3f65fc06e30ffd5fddd487d01aaa8
2021-10-18 12:04:43 -07:00
d5a25faf7a [PyTorch] Fix unnecessary shared_ptr copies in EnumType (#66714)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66714

Forced copy in getValueType and unnecessary use of cast over castRaw.
ghstack-source-id: 140752791

Test Plan: CI

Reviewed By: suo

Differential Revision: D31696164

fbshipit-source-id: fc2316617a61ca32f1fb952fb0af18b8784a606b
2021-10-18 12:04:41 -07:00
9b729ebc88 [jit] shape propagation for quantization (#66343)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66343

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D31515839

Pulled By: IvanKobzarev

fbshipit-source-id: 1b2b953b93210a1cade64c30302478907fc639f3
2021-10-18 12:03:20 -07:00
1cf317b85f [ONNX] Support exporting with Apex O2 (#65374) (#66700)
Summary:
Apex O2 hooks `state_dict` to return fp16 weights as fp32, so the exporter cannot identify them as the same tensors.
Since this hook is only used by the optimizer, it is safe to remove it while exporting.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66700

Reviewed By: zou3519

Differential Revision: D31695132

Pulled By: malfet

fbshipit-source-id: 977bdf57240002498f3ad0f1a8046c352e9860e6
2021-10-18 11:54:09 -07:00
624ce95201 Run sparse tests only for TensorPipe agent. (#66661)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66661

Similar to https://github.com/pytorch/pytorch/pull/66600, runs
rpc_test.py sparse tests only for TP agent.
ghstack-source-id: 140666322

Test Plan: waitforbuildbot

Reviewed By: rohan-varma

Differential Revision: D31669850

fbshipit-source-id: 41a66c8d1843130964aede5c77d391484607214f
2021-10-18 11:53:07 -07:00
7fad47e522 torch.linalg.lstsq: forward/backward AD support (#65054)
Summary:
As per title.

cc ezyang albanD zou3519 gqchen pearu nikitaved soulitzer Lezcano Varal7 jianyuh mruberry walterddr IvanYashchuk xwang233

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65054

Reviewed By: zou3519

Differential Revision: D31729468

Pulled By: albanD

fbshipit-source-id: ab7df824bc80128e7f64f6444c7a4baa4786c161
2021-10-18 11:28:44 -07:00
6bde474066 [PyTorch] Fix extra refcount bumps in matchTypeVariables (#66719)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66719

Some casts that could be castRaw. Parameters did not need to force a refcount bump.
ghstack-source-id: 140756356

Test Plan: CI

Reviewed By: suo

Differential Revision: D31697455

fbshipit-source-id: 87a8cba221a7ae53f2a485acafd31622e9328ff0
2021-10-18 11:15:07 -07:00
c373e188d8 [PyTorch] Fix extra refcount bumps in unifyTypes (#66718)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66718

Some missing moves and use of cast instead of castRaw (due to a previous automated fixup only being a partial fix).
ghstack-source-id: 140755229

Test Plan: CI

Reviewed By: suo

Differential Revision: D31697115

fbshipit-source-id: 86743f8982951a58638ba244b3a92d3737dde58b
2021-10-18 11:13:45 -07:00
472a6f2787 Strided masked reductions: sum, amax. Testing of masked reductions. (#65990)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65990

cc nikitaved pearu cpuhrsch IvanYashchuk

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D31729532

Pulled By: albanD

fbshipit-source-id: 855a6bb2a7c6e75c780a64ce23c0f29321f0e511
2021-10-18 11:10:32 -07:00
d777e490a5 [bc-breaking][quant][graphmode][fx] Produce reference patterns for GeneralShapeOps (#66647)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66647

Missed in the last round:
this adds reference patterns for general shape ops like `view` when `is_reference` is True.

bc-breaking:
basically disables `getitem` from supporting quantized ops here; we may support it later in fbgemm

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
python test/test_quantization.py TestQuantizeFxModels

Imported from OSS

Reviewed By: H-Huang

Differential Revision: D31680379

fbshipit-source-id: 6a3a7128514baf6d92b1607308c40339469d0066
2021-10-18 11:09:17 -07:00
eb1eefc399 [PyTorch] Fix unnecessary shared_ptr copies in DictType (#66702)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66702

Missing moves in the construction path and forced copies of the key & value type on access.
ghstack-source-id: 140744707

Test Plan: CI

Reviewed By: suo

Differential Revision: D31693818

fbshipit-source-id: 4c5d2359f58148744621abe81429e56e7889f754
2021-10-18 11:05:25 -07:00
09c4e73c95 [PyTorch] Fix unnecessary shared_ptr copies in FutureType (#66704)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66704

Missing moves in the construction path.
ghstack-source-id: 140746391

Test Plan: CI

Reviewed By: suo

Differential Revision: D31694296

fbshipit-source-id: 3bed477c811069248611efdb57ad27c6ca233442
2021-10-18 11:01:00 -07:00
62e89f692f [doc] typo (#66754)
Summary:
This PR fixes a typo in the `torch/autograd/function.py` doc

-----------------------

Additionally, the example at https://pytorch.org/docs/master/autograd.html#torch.autograd.Function doesn't quite compile:
```
'builtin_function_or_method' object has no attribute 'exp'
```
even though `i.exp()` is a valid function if `i` is a tensor.

I changed it to:
```
result = torch.exp(i)
```
but python doesn't like it either:
```
TypeError: exp(): argument 'input' (position 1) must be Tensor, not builtin_function_or_method
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66754

Reviewed By: albanD

Differential Revision: D31729400

Pulled By: soulitzer

fbshipit-source-id: eef783bcdc8d4693a8b7f1ab581e948abc0f9b94
2021-10-18 10:33:56 -07:00
f4a7273b5c Set test owners for module: ci (#66796)
Summary:
Action based on RFC https://github.com/pytorch/pytorch/issues/66232

cc seemethere malfet pytorch/pytorch-dev-infra

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66796

Reviewed By: seemethere

Differential Revision: D31732391

Pulled By: janeyx99

fbshipit-source-id: b894eab8a4a8737165d1ba7b536e1232f6c07a8f
2021-10-18 10:29:50 -07:00
8532061bce [sharded_tensor] support gloo/mpi backend in tests (#65855)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65855

This adjusts our test base to support non-NCCL backends like gloo/mpi, so that we can test sharding on CPU with the gloo/mpi backend.
ghstack-source-id: 140840866

Test Plan: wait for the CI for existing tests, also adding tests in the stacked diff above.

Reviewed By: pritamdamania87, bowangbj

Differential Revision: D31287162

fbshipit-source-id: d48dfc8ef886a4d34b1de42f3ce6b600b5c9a617
2021-10-18 10:17:59 -07:00
d549c8de78 fx quant: enable linear-bn1d fusion for PTQ (#66484)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66484

https://github.com/pytorch/pytorch/pull/50748 added linear - bn1d fusion
in Eager mode, for PTQ only. This PR also enables this in FX graph mode.

We reuse the existing conv-bn-relu fusion handler, renaming `conv` to
`conv_or_linear` for readability.

The QAT version is saved for a future PR, for both eager and FX graph.

Test Plan:
```
python test/test_quantization.py TestFuseFx.test_fuse_linear_bn_eval
```

Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D31575392

fbshipit-source-id: f69d80ef37c98cbc070099170e335e250bcdf913
2021-10-18 10:14:28 -07:00
9d287d0b63 [fx2trt]Add support for negative dim in softmax (#66760)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66760

Previously we didn't convert negative dim to positive dim.

Test Plan: WIP

Reviewed By: wushirong

Differential Revision: D31703127

fbshipit-source-id: 6d5ccecab45b46f867a05ee70c76a5980e41011d
2021-10-18 09:03:56 -07:00
aa7da7b09c [quant][embedding qat] Enable quint4 in EmbeddingBag QAT workflow (#66348)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66348

Test Plan: Imported from OSS

Reviewed By: HDCharles

Differential Revision: D31691300

Pulled By: b-koopman

fbshipit-source-id: 11bd75b608b972394fe9f7c9b7bf034af42f28b5
2021-10-18 08:51:39 -07:00
909694fd88 Fix nn.functional.max_poolNd dispatch (for arg: return_indices) (#62544)
Summary:
Please see https://github.com/pytorch/pytorch/issues/62545 for context.

The order of `return_indices, ceil_mode` in the `nn.functional.max_poolNd` functions differs from what is seen with `torch.nn.MaxPoolNd` (the modular form). While this should be resolved in the future, it was decided to first raise a warning that the behavior will change. (please see https://github.com/pytorch/pytorch/pull/62544#issuecomment-893770955 for more context)

This PR thus raises appropriate warnings and updates the documentation to show the full signature (along with a note) for `torch.nn.functional.max_poolNd` functions.

**Quick links:**

(_upstream_)

* Documentation of [`nn.functional.max_pool1d`](https://pytorch.org/docs/1.9.0/generated/torch.nn.functional.max_pool1d.html), [`nn.functional.max_pool2d`](https://pytorch.org/docs/stable/generated/torch.nn.functional.max_pool2d.html), and [`nn.functional.max_pool3d`](https://pytorch.org/docs/stable/generated/torch.nn.functional.max_pool3d.html).

(_this branch_)

* Documentation of [`nn.functional.max_pool1d`](https://docs-preview.pytorch.org/62544/generated/torch.nn.functional.max_pool1d.html?highlight=max_pool1d), [`nn.functional.max_pool2d`](https://docs-preview.pytorch.org/62544/generated/torch.nn.functional.max_pool2d.html?highlight=max_pool2d#torch.nn.functional.max_pool2d), and [`nn.functional.max_pool3d`](https://docs-preview.pytorch.org/62544/generated/torch.nn.functional.max_pool3d.html?highlight=max_pool3d#torch.nn.functional.max_pool3d).

cc mruberry jbschlosser

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62544

Reviewed By: gchanan

Differential Revision: D31179038

Pulled By: jbschlosser

fbshipit-source-id: 0a2c7215df9e132ce9ec51448c5b3c90bbc69030
2021-10-18 08:34:38 -07:00
e4a9ee8d42 Deduplicate codegenOutputQuery to query maximum CUDA compute capabilities (#55901)
Summary:
There were 2 versions of the same code which were slightly different although functionally equivalent.
When adding support for another CUDA / device version, both would need to be changed and kept in sync, so it is better to have only 1 version as the unique source of truth.

I chose the implementation which looks cleaner and easier to read, and added some minor enhancements and comments to further improve readability.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55901

Reviewed By: H-Huang

Differential Revision: D31636917

Pulled By: bertmaher

fbshipit-source-id: 622e1fabc39de4f3f1b1aa9a1544cfbd35a5cfd9
2021-10-18 07:42:15 -07:00
811f5a2b94 Adding StreamWrapper to ensure file object will be closed (#66715)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66715

Adding StreamWrapper to streams produced by DataPipes within PyTorch Core and TorchData

Test Plan: OSS CI and Internal Tests

Reviewed By: ejguan

Differential Revision: D31695248

fbshipit-source-id: c26fa1bc1688d5597851ad265f667fafdcd64c59
2021-10-18 07:31:32 -07:00
0d203a16fe Add relative and absolute tolerances for matrix_rank, pinv (#63102)
Summary:
This pull request introduces new keyword arguments for `torch.linalg.matrix_rank` and `torch.linalg.pinv`: `atol` and `rtol`.

Currently, only the tensor overload has default values for `atol` and `rtol`; the float overload requires both arguments to be specified.

FC compatibility: https://github.com/pytorch/pytorch/pull/63102#discussion_r710930509

Fixes https://github.com/pytorch/pytorch/issues/54151. Fixes https://github.com/pytorch/pytorch/issues/66618.

cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63102

Reviewed By: H-Huang

Differential Revision: D31641456

Pulled By: mruberry

fbshipit-source-id: 4c765508ab1657730703e42975fc8c0d0a60eb7c
2021-10-17 22:15:42 -07:00
53aac4b6f3 [PyTorch] Allow override for macro HAS_DEMANGLE (#66540)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66540

Currently the macro `HAS_DEMANGLE` is determined by compiler predefined macros. Here I'm adding an option to allow `HAS_DEMANGLE` to be defined in build files.

Test Plan: Rely on CI

Reviewed By: poweic

Differential Revision: D31600007

fbshipit-source-id: 76cf088b0f5ee940e977d3b213f1446ea64be036
2021-10-17 16:10:45 -07:00
3b4cb9ddca Revert D31577488: Migrate THCState to ATen
Test Plan: revert-hammer

Differential Revision:
D31577488 (65adf1dfa2)

Original commit changeset: 90604f30854f

fbshipit-source-id: 3d7e35b3d6ea94f2c999bcf821b33a9cf1db01ee
2021-10-16 21:51:36 -07:00
719d43a2a2 Revert D31547709: Remove THCGeneral.cpp
Test Plan: revert-hammer

Differential Revision:
D31547709 (aa0c31876b)

Original commit changeset: 059c47621863

fbshipit-source-id: e8c3597f2badbc5ecf356b381edea06a07331f24
2021-10-16 21:50:19 -07:00
8854817f44 Implement Python Array API asarray function. (#60627)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60627

In this PR, the core of `frombuffer` and `fromDLPack` is refactored into _tensor_new.cpp_. `asarray`
uses these refactored functions for interpreting the object as a tensor. We follow the
Python Array API standard found at:

https://data-apis.org/array-api/latest/API_specification/creation_functions.html?highlight=asarray

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D31640510

Pulled By: mruberry

fbshipit-source-id: d0869e0d73cb50023d5866b001dac5d34ca30dfd
2021-10-16 21:11:31 -07:00
9e3a2babfa Make aotCompile support multiple input sizes (#66727)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66727

Make aotCompile support multiple input sizes

Test Plan:
Able to compile and run a model with multiple inputs
```
(pytorch)  ~/fbsource/fbcode/caffe2/fb/nnc
└─ $ PYTORCH_JIT_LOG_LEVEL=aot_compiler buck run //caffe2/binaries:aot_model_compiler -- --model aot_test_model.pt --model_name=aot_test_model --model_version=v1 --input_dims="2,2,2;2,2,2"
Building: finished in 3.2 sec (100%) 7461/7461 jobs, 0/7461 updated
  Total time: 3.4 sec
BUILD SUCCEEDED
[DUMP aot_compiler.cpp:097] graph before shape propagation
[DUMP aot_compiler.cpp:097] graph(%x.1 : Tensor,
[DUMP aot_compiler.cpp:097]       %y.1 : Tensor):
[DUMP aot_compiler.cpp:097]   %3 : int = prim::Constant[value=1]() # :0:0
[DUMP aot_compiler.cpp:097]   %4 : Tensor = aten::add(%x.1, %y.1, %3) # /data/users/priyaramani/fbsource/fbcode/caffe2/test/mobile/nnc/aot_test_model.py:10:15
[DUMP aot_compiler.cpp:097]   return (%4)
(1,.,.) =
  0.3357  0.6137
  0.8472  0.0858

(2,.,.) =
  0.8406  0.2959
  0.6012  0.7184
[ CPUFloatType{2,2,2} ]
(1,.,.) =
  0.7086  0.6398
  0.0579  0.1913

(2,.,.) =
  0.8598  0.3641
  0.5925  0.0200
[ CPUFloatType{2,2,2} ]
here
2
2
graph 0x6130001ee2d0
[DUMP aot_compiler.cpp:118] graph after shape propagation
[DUMP aot_compiler.cpp:118] graph(%x.1 : Float(2, 2, 2, strides=[4, 2, 1], requires_grad=0, device=cpu),
[DUMP aot_compiler.cpp:118]       %y.1 : Float(2, 2, 2, strides=[4, 2, 1], requires_grad=0, device=cpu)):
[DUMP aot_compiler.cpp:118]   %3 : int = prim::Constant[value=1]() # :0:0
[DUMP aot_compiler.cpp:118]   %4 : Tensor(2, 2, 2) = aten::add(%x.1, %y.1, %3) # /data/users/priyaramani/fbsource/fbcode/caffe2/test/mobile/nnc/aot_test_model.py:10:15
[DUMP aot_compiler.cpp:118]   return (%4)
The compiled llvm assembly code was saved to aot_test_model.compiled.ll
The compiled model was saved to aot_test_model.compiled.pt

└─ $ ./compile_model.sh -m aot_test_model -p /data/users/priyaramani/fbsource/fbcode/caffe2/fb/nnc/aot_test_model.pt -v v1 -i "2,2,2;2,2,2"
+ VERSION=v1
+ getopts m:p:v:i:h opt
+ case $opt in
+ MODEL=aot_test_model
+ getopts m:p:v:i:h opt
+ case $opt in
+ MODEL_PATH=/data/users/priyaramani/fbsource/fbcode/caffe2/fb/nnc/aot_test_model.pt
+ getopts m:p:v:i:h opt
+ case $opt in
+ VERSION=v1
+ getopts m:p:v:i:h opt
+ case $opt in
+ INPUT_DIMS='2,2,2;2,2,2'
+ getopts m:p:v:i:h opt
+ require_arg m aot_test_model
+ '[' -n aot_test_model ']'
+ require_arg p /data/users/priyaramani/fbsource/fbcode/caffe2/fb/nnc/aot_test_model.pt
+ '[' -n /data/users/priyaramani/fbsource/fbcode/caffe2/fb/nnc/aot_test_model.pt ']'
+ require_arg i '2,2,2;2,2,2'
+ '[' -n '2,2,2;2,2,2' ']'
+ '[' '!' -f /data/users/priyaramani/fbsource/fbcode/caffe2/fb/nnc/aot_test_model.pt ']'
+++ dirname ./compile_model.sh
++ cd .
++ pwd -P
+ SRC_DIR=/data/users/priyaramani/fbsource/fbcode/caffe2/fb/nnc
+ FBCODE_DIR=/data/users/priyaramani/fbsource/fbcode/caffe2/fb/nnc/../../..
+ FBSOURCE_DIR=/data/users/priyaramani/fbsource/fbcode/caffe2/fb/nnc/../../../..
+ KERNEL_DIR=/data/users/priyaramani/fbsource/fbcode/caffe2/fb/nnc/../../../../xplat/pytorch_models/build/aot_test_model/v1/nnc
++ echo /data/users/priyaramani/fbsource/fbcode/caffe2/fb/nnc/aot_test_model.pt
++ sed 's/.pt.*//'
+ MODEL_PATH_PREFIX=/data/users/priyaramani/fbsource/fbcode/caffe2/fb/nnc/aot_test_model
+ LLVM_CODE_PATH=/data/users/priyaramani/fbsource/fbcode/caffe2/fb/nnc/aot_test_model.compiled.ll
+ ASSEMBLY_CODE_PATH=/data/users/priyaramani/fbsource/fbcode/caffe2/fb/nnc/aot_test_model.compiled.s
+ COMPILED_MODEL_FILE_PATH=/data/users/priyaramani/fbsource/fbcode/caffe2/fb/nnc/aot_test_model.compiled.pt
+ KERNEL_FUNC_NAME=nnc_aot_test_model_v1_forward
+ cd /data/users/priyaramani/fbsource/fbcode/caffe2/fb/nnc/../../../..
+ buck run //xplat/caffe2/fb/lite_predictor:lite_predictor_nnc -- --model /data/users/priyaramani/fbsource/fbcode/caffe2/fb/nnc/aot_test_model.compiled.pt --print_output true --input_dims '2,2,2;2,2,2' --input_type 'float;float' --input_memory_format 'contiguous_format;contiguous_format'
clang-9: warning: argument unused during compilation: '-pthread' [-Wunused-command-line-argument]

Downloaded 1/4 artifacts, 2.11 Kbytes, 50.0% cache miss (for updated rules)
Building: finished in 12.2 sec (100%) 4572/4572 jobs, 3/4572 updated
  Total time: 12.2 sec
BUILD SUCCEEDED
Run with 56 threads
Run with 56 threads
Loading model...
Model loaded: /data/users/priyaramani/fbsource/fbcode/caffe2/fb/nnc/aot_test_model.compiled.pt
Running forward ...
(1,.,.) =
 -0.7451 -0.7451
 -0.7451 -0.7451

(2,.,.) =
 -0.7451 -0.7451
 -0.7451 -0.7451
[ CPUFloatType{2,2,2} ]
Starting benchmark.
Running warmup runs.
Main runs.
Main run finished. Milliseconds per iter: 0.0887. Iters per second: 11274
Memory usage before main runs: 71262208 bytes
Memory usage after main runs: 71573504 bytes
Average memory increase per iter: 31129.6 bytes
0 value means "not available" in above
```

Reviewed By: ljk53

Differential Revision: D31631975

fbshipit-source-id: 7956787b3e121f9c14f4733398a64c2f7ae84373
2021-10-16 20:04:52 -07:00
962c6476da Refactor: move method to func compilation work to compileMethod, add option to specify method name (#66726)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66726

Move method to func compilation work to compileMethod

Test Plan:
Mobilenetv3 compiles and runs successfully
```
(pytorch)  ~/fbsource/fbcode/caffe2/fb/nnc
└─ $ buck run //caffe2/binaries:aot_model_compiler -- --model mobilenetv3.pt --model_name=pytorch_dev_mobilenetv3 --model_version=v1 --input_dims="1,3,224,224"
Downloaded 0/4 artifacts, 0.00 bytes, 100.0% cache miss (for updated rules)
Building: finished in 13.2 sec (100%) 18719/18719 jobs, 2/18719 updated
  Total time: 13.5 sec
BUILD SUCCEEDED
The compiled llvm assembly code was saved to mobilenetv3.compiled.ll
The compiled model was saved to mobilenetv3.compiled.pt
```

Reviewed By: ljk53, IvanKobzarev

Differential Revision: D31624342

fbshipit-source-id: 233a6e94ea05ba8d6fc166d2414034c9e58cb076
2021-10-16 20:03:24 -07:00
aa0c31876b Remove THCGeneral.cpp (#66391)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66391

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D31547709

Pulled By: ngimel

fbshipit-source-id: 059c47621863738fb560f4257e7765afa9b952aa
2021-10-16 14:53:52 -07:00
8c5928bd78 add frozen_numpy as a builtin library to torch::deploy (#66297)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66297

Linking register_numpy.cpp with the embedded interpreter registers numpy as a builtin library.

Test Plan: Add unit test to test basic numpy functionality in torch::deploy, like creating random matrices and matrix multiplication.

Reviewed By: suo

Differential Revision: D31490434

fbshipit-source-id: b052ce01fc64fb0efee846feb0acc1f107ba13e0
2021-10-15 21:48:24 -07:00
42f138469a [TS] Return early if device doesn't match (#66694)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66694

`lhs.equal(rhs)` would throw if the devices don't match, so we return early in that case.

Test Plan: CI

Reviewed By: houseroad

Differential Revision: D31691608

fbshipit-source-id: 513c3e0743a65d9778c7ef9b79ececfeaccc0017
2021-10-15 18:13:46 -07:00
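The guard, as a standalone sketch (a free function for illustration; the actual change lives in the commit's comparison code):

```
#include <torch/torch.h>

// Tensor::equal throws on mismatched devices, so check first and treat a
// device mismatch as "not equal".
bool tensorsEqual(const torch::Tensor& lhs, const torch::Tensor& rhs) {
  if (lhs.device() != rhs.device()) {
    return false;
  }
  return lhs.equal(rhs);
}

int main() {
  auto a = torch::ones({2});
  return tensorsEqual(a, a) ? 0 : 1;
}
```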
32ac001e4d Suppress deprecated copy in vec256_qint.h (#66646)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66646

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D31660387

fbshipit-source-id: a1ea9702a8b33f78a7201a1d9214065c2fb930b1
2021-10-15 17:14:15 -07:00
65adf1dfa2 Migrate THCState to ATen (#66480)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66480

This guts `THCState` to simply be an empty struct, as well as:
- moving `THCState_getPeerToPeerAccess` and its cache into `ATen`.
- cleaning up dead code in `THCGeneral.cpp`
- moving `THCudaInit` and `THCMagma_init` into `CUDAHooks::initCUDA`

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D31577488

Pulled By: ngimel

fbshipit-source-id: 90604f30854fe766675baa3863707ac09995bc9e
2021-10-15 17:05:04 -07:00
2f099c7555 Revert D30652629: use irange for loops
Test Plan: revert-hammer

Differential Revision:
D30652629 (687c2267d4)

Original commit changeset: 0ae6c4bbbb55

fbshipit-source-id: 5c4f067b584a021c8c9656454d1ee60999600fb3
2021-10-15 15:23:10 -07:00
1e2b2ee5ff sort_out_cuda: Use custom kernels to fill index tensors (#66668)
Summary:
These stable sorts currently use a combination of `at::arange`, view ops and `tensor.copy_` to fill in the initial values for the indices before calling into `CUB` to do the actual sort. This is somewhat inefficient because it requires 2 to 4 kernel launches, and the copies all use strided kernels instead of the more efficient contiguous kernels. Instead, a fairly straight-forward custom kernel is more efficient in terms of both CUDA and CPU runtime.

In a simple benchmark I profiled `a.sort(stable=True, dim=1)` for different shapes and singled out the kernel invocations for initializing the index tensors (i.e. the non-`cub` kernels). Note that when the batch dim is `<128` we call `segmented_sort_pairs_by_full_sort` instead of `segmented_sort_pairs`:

| shape        | Master (us) | This PR (us) |
|--------------|:-----------:|:------------:|
| (100, 1000)  |    5.000    |     2.300    |
| (1000, 100)  |    2.070    |     1.090    |
| (100, 10000) |    87.34    |     26.47    |
| (1000, 1000) |    28.63    |     20.27    |

Of course for sufficiently large inputs, the overall runtime is dominated by the actual sort. But I have another motive: removing the operator calls from the middle of this kernel launch code. This change makes it easier to split the kernel code that needs to be compiled with `nvcc` into its own file that doesn't include `Tensor.h`, similar to what I'm doing in https://github.com/pytorch/pytorch/issues/66620.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66668

Reviewed By: H-Huang

Differential Revision: D31693722

Pulled By: ngimel

fbshipit-source-id: 5765926e4dbbc7a20d2940c098ed093b3de2204e
2021-10-15 15:13:02 -07:00
9ba39d2008 Clean up test running scripts (#65508)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65508

This has some misc cleanups for the code that happens before `run_test.py`:

* remove hardcoding of 2 shards
* add `set -eux` in some places

Test Plan: Imported from OSS

Reviewed By: seemethere

Differential Revision: D31296509

Pulled By: driazati

fbshipit-source-id: 2df1463432846d8a4d8a579812a4e9c3b7c2b957
2021-10-15 14:36:32 -07:00
2c761caaaa [Vulkan] cat operator for channel dimension (#66669)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66669

Implemented `cat` operator for channel dimension

**Facts:**
* texture coordinate: x(width), y(height), z(depth)
* input x, y, z -> no change
* out x, y -> no change
* out z and index i, j only matter

**Equations:**
batch_size = bt0 (or bt1 or bt2 or ...) = # of batches for tensor i
ch_size = ch0 (or ch1 or ch2 or ...) = # of channels for tensor i
ch_interval = ch0 + ch1 + ch2 + ... = total # of channels across all tensors
ch_size_allprior = ch0 (or ch0+ch1 or ch0+ch1+ch2 or ...) = # of channels for tensors 0 to i-1, where pos.z = d (input)
i = index into the input texel = vec4[i] of the texel at posIn(x,y,z) on the input texture
j = index into the output texel = vec4[j] of the texel at posOut(x',y',z') on the output texture

posIn[i] = {x,y,z} at the ith index of the vec4
src_index = posIn.z * 4 + i
dst_index = int(src_index / ch_size) * ch_interval + (src_index % ch_size) + ch_size_allprior
d = posOut.z = int(dst_index / 4)
j = dst_index % 4
posOut[j] = {posIn.x, posIn.y, d} at the jth index of the vec4

**Shader pseudo code:**
```
posOut = posIn;
for (i = 0; i < 4; ++i) {
  src_index = posIn.z * 4 + i;
  if (src_index >= ch_size * batch_size) break; // out of range
  dst_index = int(src_index / ch_size) * ch_interval + (src_index % ch_size) + ch_size_allprior;
  posOut.z = int(dst_index / 4);
  j = dst_index % 4;
  uOutput[j] = uInput[i];
}
```
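A small host-side sketch for sanity-checking the index arithmetic above, with example channel counts chosen purely for illustration:

```
#include <cstdio>

int main() {
  // Example: the prior tensors contribute ch_size_allprior = 4 channels,
  // this tensor has ch_size = 3, and all tensors total ch_interval = 7.
  const int ch_size = 3, ch_interval = 7, ch_size_allprior = 4;
  for (int src_index = 0; src_index < 6; ++src_index) {
    const int dst_index = (src_index / ch_size) * ch_interval
                        + (src_index % ch_size) + ch_size_allprior;
    std::printf("src %d -> dst %d (out texel z=%d, lane j=%d)\n",
                src_index, dst_index, dst_index / 4, dst_index % 4);
  }
}
```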

Test Plan:
Test build on Android:
```
cd ~/fbsource
buck build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 //xplat/caffe2:pt_vulkan_api_test_binAndroid\#android-arm64 --show-output
adb push buck-out/gen/xplat/caffe2/pt_vulkan_api_test_binAndroid\#android-arm64 /data/local/tmp/vulkan_api_test
adb shell "/data/local/tmp/vulkan_api_test"
```

Test result:
```
[ RUN      ] VulkanAPITest.cat_dim1_samefeature_success
[       OK ] VulkanAPITest.cat_dim1_samefeature_success (101 ms)
[ RUN      ] VulkanAPITest.cat_dim1_difffeature_success
[       OK ] VulkanAPITest.cat_dim1_difffeature_success (81 ms)
[ RUN      ] VulkanAPITest.cat_dim1_texture2d_success
[       OK ] VulkanAPITest.cat_dim1_texture2d_success (2 ms)
[ RUN      ] VulkanAPITest.cat_dim1_singledepth_success
[       OK ] VulkanAPITest.cat_dim1_singledepth_success (6 ms)
[ RUN      ] VulkanAPITest.cat_dim1_singletensor_success
[       OK ] VulkanAPITest.cat_dim1_singletensor_success (21 ms)
[ RUN      ] VulkanAPITest.cat_dim1_twotensors_success
[       OK ] VulkanAPITest.cat_dim1_twotensors_success (53 ms)
[ RUN      ] VulkanAPITest.cat_dim1_bat1_ch4multiple_success
[       OK ] VulkanAPITest.cat_dim1_bat1_ch4multiple_success (17 ms)
[ RUN      ] VulkanAPITest.cat_dim2_sameheight_success
[       OK ] VulkanAPITest.cat_dim2_sameheight_success (83 ms)
[ RUN      ] VulkanAPITest.cat_dim2_diffheight_success
[       OK ] VulkanAPITest.cat_dim2_diffheight_success (86 ms)
[ RUN      ] VulkanAPITest.cat_dim2_singledepth_success
[       OK ] VulkanAPITest.cat_dim2_singledepth_success (5 ms)
[ RUN      ] VulkanAPITest.cat_dim2_invalidinputs_exceptions
[       OK ] VulkanAPITest.cat_dim2_invalidinputs_exceptions (82 ms)
```

Reviewed By: SS-JIA

Differential Revision: D31593623

fbshipit-source-id: e52dc57985e3f0bb9b20313d4fcc7248a436e863
2021-10-15 14:25:19 -07:00
06cfdfae0e Promote integral inputs to floating for torch.logsumexp (#63393)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/56132: integral inputs of `torch.logsumexp` are now promoted to a floating point type.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63393

Reviewed By: ezyang

Differential Revision: D30512180

Pulled By: mruberry

fbshipit-source-id: fbde3605c15b930411d0d1eb3a132b0088187097
2021-10-15 14:20:50 -07:00
67e003f09b [Static Runtime] Determine function for ProcessedNode::run() statically (#66692)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66692

Currently `ProcessedNode::run()` performs 2 dynamic dispatches to decide which function implementation to execute, depending on whether the function is an out variant, native, or an interpreter fallback. Note that this happens dynamically every time Static Runtime executes an operation.

This change makes *that* same decision once, at module loading time, so that we can remove 1 dynamic-dispatch cost at runtime.

**size reduction**

Saving 4 bytes per `ProcessedNode`.

- Before: sizeof(c10::variant<OutVariant, NativeFunction, Operation>):40

- After: sizeof(std::function<void(ProcessedNode*)>): 32 + sizeof(FunctionKind):4 = 36

**latency optimization**

Expected to remove 2 memory loads & 1 conditional jump per `ProcessedNode::run()` execution (needs to be confirmed from compiled binary code).

Ran `ptvsc2_predictor_bench` with `inline_cvr` with 1000 iterations:
- local : 7.56026 -> 7.24794
- local_ro: 1.5799 -> 1.55504
- remote_ro: 10.6464 -> 10.3017

Test Plan: Ran existing unittests

Reviewed By: swolchok

Differential Revision: D31591785

fbshipit-source-id: 5de83ca386af509381e08ecedf071ee4e9f0f0b0
2021-10-15 14:07:24 -07:00
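The load-time selection can be sketched like this (a hypothetical minimal `ProcessedNode`, not the Static Runtime class):

```
#include <cstdio>
#include <functional>

struct ProcessedNode;
using Fn = std::function<void(ProcessedNode*)>;

struct ProcessedNode {
  Fn fn;                    // chosen once, while loading the module
  void run() { fn(this); }  // one indirect call at execution time
};

int main() {
  const bool has_out_variant = true;  // known at load time
  ProcessedNode n;
  n.fn = has_out_variant
      ? Fn([](ProcessedNode*) { std::puts("out variant"); })
      : Fn([](ProcessedNode*) { std::puts("interpreter fallback"); });
  n.run();  // no per-run branch on the function kind
}
```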
d1b6121935 Revert D31656999: Add meta support to tensor range factories
Test Plan: revert-hammer

Differential Revision:
D31656999 (7400f34b8e)

Original commit changeset: 06e7f3655b94

fbshipit-source-id: 2f9d8d1acbb01c5105ece73472e5c1f5f90886ee
2021-10-15 14:03:04 -07:00
a25648953c Add warn_only kwarg to use_deterministic_algorithms (#66233)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/64883

Adds a `warn_only` kwarg to `use_deterministic_algorithms`. When enabled, calling an operation that does not have a deterministic implementation will raise a warning, rather than an error.

`torch.testing._internal.common_device_type.expectedAlertNondeterministic` is also refactored and documented in this PR to make it easier to use and understand.

cc mruberry kurtamohler

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66233

Reviewed By: bdhirsh

Differential Revision: D31616481

Pulled By: mruberry

fbshipit-source-id: 059634a82d54407492b1d8df08f059c758d0a420
2021-10-15 13:54:59 -07:00
687c2267d4 use irange for loops (#66234)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66234

Modified loops in files under fbsource/fbcode/caffe2/ from the format

`for(TYPE var=x0;var<x_max;x++)`

to the format

`for(const auto var: irange(xmax))`

This was achieved by running r-barnes's loop upgrader script (D28874212), with some modifications to exclude all files under /torch/jit, plus a number of reversions and unused-variable warning suppressions added by hand.

bypass_size_limit
allow-large-files

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D30652629

fbshipit-source-id: 0ae6c4bbbb554bad42e372792a6430e1acf15e3e
2021-10-15 13:50:33 -07:00
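The transformation, concretely:

```
#include <c10/util/irange.h>
#include <cstdio>

int main() {
  // Before: for (int i = 0; i < 5; i++) { ... }
  // After: the index is const and cannot be mutated inside the body.
  for (const auto i : c10::irange(5)) {
    std::printf("%d\n", i);
  }
}
```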
b5b7d6a3a6 EmbeddingBackward exclusive_scan thrust->cub (#66566)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66566

Reviewed By: H-Huang

Differential Revision: D31637660

Pulled By: ngimel

fbshipit-source-id: 8093432bb9a9b902bb6bab7da221f0bcd7e9fb34
2021-10-15 13:46:30 -07:00
bd25f92e81 Fix Wextra issues in Half.h (#66643)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66643

Fixes:
```
caffe2/c10/util/Half.h:456:14: error: comparison of integers of different signs: 'long' and 'unsigned long' [-Werror,-Wsign-compare]
    return f > limit::max() ||
           ~ ^ ~~~~~~~~~~~~
```

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D31656816

fbshipit-source-id: 7623d20e166a9e95a949ebd8b23793f24960cf07
2021-10-15 13:38:10 -07:00
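The warning and the usual fix, sketched (a hypothetical function, not the actual Half.h code):

```
bool exceeds(long f, unsigned long limit) {
  // return f > limit;  // -Wsign-compare: 'long' vs 'unsigned long'
  return f > 0 && static_cast<unsigned long>(f) > limit;  // explicit and warning-free
}

int main() { return exceeds(-1, 10) ? 1 : 0; }
```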
abc022f9c8 Fix torch.cholesky deprecation warning (#66645)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66645

Fixes:
```
test_cholesky_solve_batched_broadcasting_cpu_complex128 (__main__.TestLinalgCPU) ... test_linalg.py:3099: UserWarning: torch.cholesky is deprecated in favor of torch.linalg.cholesky and will be removed in a future PyTorch release.
```

Test Plan: Sandcastle

Reviewed By: mruberry

Differential Revision: D31635851

fbshipit-source-id: c377eb88d753fb573b3947f0c6ff5df055cb13d8
2021-10-15 13:24:58 -07:00
0b8dc0f04a add BFloat16 operators on CPU: logaddexp, logaddexp2, remainder (#63621)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63621

Reviewed By: H-Huang

Differential Revision: D31640811

Pulled By: mruberry

fbshipit-source-id: 1fd061b65c196398738018eefc52bf459e424b1c
2021-10-15 13:11:45 -07:00
a58852fd44 Fix fx2trt broken unit test (#66696)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66696

D31511082 (9918fd8305) moved a unit test but didn't add the proper target in the build file; this diff fixes it.

Test Plan: buck test mode/opt caffe2/test/fx2trt/converters/...

Reviewed By: 842974287

Differential Revision: D31667697

fbshipit-source-id: 49e04afa323b27a1408c9bc2b5061b6529ced985
2021-10-15 12:56:12 -07:00
e48a4cbf64 Make several methods of SharedParserData private (#66670)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66670

Reviewed By: zhxchen17

Differential Revision: D31674377

Pulled By: gmagogsfm

fbshipit-source-id: 5c73b78f842c5c4305047ca98f40bf99bd3d2d60
2021-10-15 12:43:45 -07:00
e88d1c4f10 [PyTorch] Add tuple inline storage (#64066)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64066

I noticed a bunch of time being spent heap-allocating Tuples
in the unpickler. 1-, 2-, and 3-element Tuples are apparently common
enough that they get their own bytecode instructions, so I decided to
try also giving them their own representation. We store up to 3
IValues inline in `Tuple` rather than doing a second heap allocation
for a `std::vector<IValue>`.
ghstack-source-id: 140695395

Test Plan:
Added automated tests for TupleElements.

Pixel 3 before: https://www.internalfb.com/intern/aibench/details/761596366576284
Pixel 3 after: https://www.internalfb.com/intern/aibench/details/591414145082422
We went from 347 ms to 302 ms.

Reviewed By: dhruvbird

Differential Revision: D30592622

fbshipit-source-id: 93625c54c9dca5f765ef6d5c191944179cb281a8
2021-10-15 12:16:51 -07:00
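A sketch of the inline-storage idea; the real `TupleElements` uses a union so the inline array and the vector don't both occupy space, which this simplified version skips:

```
#include <cstddef>
#include <utility>
#include <vector>

struct IValue { void* payload = nullptr; };

class TupleElements {
 public:
  explicit TupleElements(std::vector<IValue> elems) {
    if (elems.size() <= 3) {
      inlineSize_ = elems.size();   // small tuples live inline: no heap alloc
      for (std::size_t i = 0; i < inlineSize_; ++i) inline_[i] = elems[i];
    } else {
      overflow_ = std::move(elems); // large tuples fall back to the vector
    }
  }
  std::size_t size() const {
    return inlineSize_ ? inlineSize_ : overflow_.size();
  }

 private:
  std::size_t inlineSize_ = 0;      // 0 means "stored in overflow_"
  IValue inline_[3];
  std::vector<IValue> overflow_;
};

int main() {
  TupleElements small({IValue{}, IValue{}});
  TupleElements big(std::vector<IValue>(5));
  return (small.size() == 2 && big.size() == 5) ? 0 : 1;
}
```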
f8f9a47b02 PR3: add a workaround for reference path (#66535)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66535

Test Plan: Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D31676400

Pulled By: rahxephon89

fbshipit-source-id: fd4c8e9bbc82930cc1255fb8bf8d8ac7f0934c3f
2021-10-15 11:56:11 -07:00
7400f34b8e Add meta support to tensor range factories (#66630)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66630

This PR adds meta backend support to the `range`, `arange`, `linspace`, and `logspace` operators.
ghstack-source-id: 140618055

Test Plan: Extended the existing tensor creation tests to assert meta backend support.

Reviewed By: ezyang

Differential Revision: D31656999

fbshipit-source-id: 06e7f3655b94c0d85a28bcd0ca61d9f9ce707f1d
2021-10-15 11:17:08 -07:00
6436bd3d5d Clarify topk doc (#65938)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/50331

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65938

Reviewed By: bdhirsh

Differential Revision: D31314875

Pulled By: samdow

fbshipit-source-id: bdd9425fd748710f8a64ed1989e1938dd358780f
2021-10-15 11:15:48 -07:00
2506baf9c2 [ONNX] move CheckerError from torch.onnx.utils to torch.onnx (#66644)
Summary:
This moves it to where the user would expect it to be based on the
documentation and all the other public classes in the torch.onnx module.

Also rename it from ONNXCheckerError, since the qualified name
torch.onnx.ONNXCheckerError is otherwise redundant.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66644

Reviewed By: malfet

Differential Revision: D31662559

Pulled By: msaroufim

fbshipit-source-id: bc8a57b99c2980490ede3974279d1124228a7406
2021-10-15 10:38:56 -07:00
3a9259f6cf [TensorExpr] Add missing schema for aten::where and aten::pow lowerings. (#66688)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66688

Differential Revision:
D31689431

Test Plan: Imported from OSS

Reviewed By: bertmaher

Pulled By: ZolotukhinM

fbshipit-source-id: 6b3abb4471170ff5418f72bb700325711e7bd28f
2021-10-15 10:14:43 -07:00
06c37876b8 torch.linalg.householder_product faster backward (#63880)
Summary:
This PR implements a much more efficient algorithm. This algorithm makes it possible to achieve MASSIVE speed-ups, especially for batched and/or larger double-precision inputs.
Here are some benchmarks:

<details>

<summary>Testing script</summary>

```python
from IPython import get_ipython
import torch
import itertools

torch.manual_seed(13)
#torch.set_num_threads(1)

ipython = get_ipython()

cpu = torch.device('cpu')
cuda = torch.device('cuda')

def generate_input(shape, dtype=torch.double, device=cpu):
    eigvals = torch.rand(*shape[:-1], dtype=dtype, device=device)
    eigvecs = torch.rand(*shape, dtype=dtype, device=device)
    input = (eigvecs * eigvals.unsqueeze(-2)) @ eigvecs.inverse()
    input.requires_grad_(True)
    tau = torch.rand(*shape[:-1], dtype=dtype, device=device)
    tau.requires_grad_(True)
    return input, tau

def run_test(shape, device, dtype):
    print(f"shape: {shape}, device: {device}, dtype: {dtype}")
    a, tau = generate_input(shape, dtype=dtype, device=device)
    prod = torch.linalg.householder_product(a, tau)
    ones_prod = torch.ones_like(prod)

    command = "torch.autograd.backward((prod,), (ones_prod), retain_graph=True)"
    if device == cuda:
        command = command + "; torch.cuda.synchronize()"
    ipython.magic(f"timeit {command}")
    print()

dtypes = [torch.float, torch.double]
devices = [cpu, cuda]
#devices = [cuda]
sizes = [
    (10, 10),
    (1000, 10, 10),
    (100, 100),
    (1000, 100, 100),
    (1000, 1000),
    (10, 1000, 1000),
]

for device, dtype, size in itertools.product(devices, dtypes, sizes):
    run_test(size, device, dtype)

```

</details>

<details>

<summary>This PR, cuda float32</summary>

```
shape: (10, 10), device: cuda, dtype: torch.float32
1.33 ms ± 1.82 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

shape: (1000, 10, 10), device: cuda, dtype: torch.float32
1.52 ms ± 40.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

shape: (100, 100), device: cuda, dtype: torch.float32
10.8 ms ± 9.62 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

shape: (1000, 100, 100), device: cuda, dtype: torch.float32
127 ms ± 8.45 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

shape: (1000, 1000), device: cuda, dtype: torch.float32
151 ms ± 127 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

shape: (10, 1000, 1000), device: cuda, dtype: torch.float32
981 ms ± 91.4 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

```

</details>

<details>

<summary>Master, cuda float32</summary>

```
shape: (10, 10), device: cuda, dtype: torch.float32
1.64 ms ± 6.36 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

shape: (1000, 10, 10), device: cuda, dtype: torch.float32
298 ms ± 463 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

shape: (100, 100), device: cuda, dtype: torch.float32
15.4 ms ± 41.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

shape: (1000, 100, 100), device: cuda, dtype: torch.float32
5.36 s ± 711 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

shape: (1000, 1000), device: cuda, dtype: torch.float32
1.64 s ± 1.07 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

shape: (10, 1000, 1000), device: cuda, dtype: torch.float32
15.7 s ± 43.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

```

</details>

<details>

<summary>This PR, cuda float64</summary>

```
shape: (10, 10), device: cuda, dtype: torch.float64
1.14 ms ± 1.43 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

shape: (1000, 10, 10), device: cuda, dtype: torch.float64
2.22 ms ± 1.32 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

shape: (100, 100), device: cuda, dtype: torch.float64
10.6 ms ± 11.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

shape: (1000, 100, 100), device: cuda, dtype: torch.float64
287 ms ± 84.9 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

shape: (1000, 1000), device: cuda, dtype: torch.float64
236 ms ± 41.9 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

shape: (10, 1000, 1000), device: cuda, dtype: torch.float64
1.88 s ± 88.3 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

```
</details>

<details>

<summary>Master, cuda float64</summary>

```
shape: (10, 10), device: cuda, dtype: torch.float64
1.58 ms ± 8.21 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

shape: (1000, 10, 10), device: cuda, dtype: torch.float64
308 ms ± 213 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

shape: (100, 100), device: cuda, dtype: torch.float64
79 ms ± 14.5 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

shape: (1000, 100, 100), device: cuda, dtype: torch.float64
54.2 s ± 1.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

shape: (1000, 1000), device: cuda, dtype: torch.float64
31.5 s ± 698 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

shape: (10, 1000, 1000), device: cuda, dtype: torch.float64
4min 45s ± 2.48 s per loop (mean ± std. dev. of 7 runs, 1 loop each)

```
</details>

<details>

<summary>This PR, cpu float32</summary>

```
shape: (10, 10), device: cpu, dtype: torch.float32
476 µs ± 21.4 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

shape: (1000, 10, 10), device: cpu, dtype: torch.float32
5.1 ms ± 100 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

shape: (100, 100), device: cpu, dtype: torch.float32
4.38 ms ± 4.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

shape: (1000, 100, 100), device: cpu, dtype: torch.float32
1.55 s ± 6.64 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

shape: (1000, 1000), device: cpu, dtype: torch.float32
745 ms ± 407 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

shape: (10, 1000, 1000), device: cpu, dtype: torch.float32
5.44 s ± 15.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

```
</details>

<details>

<summary>Master, cpu float32</summary>

```
shape: (10, 10), device: cpu, dtype: torch.float32
387 µs ± 645 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)

shape: (1000, 10, 10), device: cpu, dtype: torch.float32
12.3 ms ± 23.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

shape: (100, 100), device: cpu, dtype: torch.float32
39.4 ms ± 80.3 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

shape: (1000, 100, 100), device: cpu, dtype: torch.float32
29.1 s ± 44.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

shape: (1000, 1000), device: cpu, dtype: torch.float32
9.42 s ± 14.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

shape: (10, 1000, 1000), device: cpu, dtype: torch.float32
1min 50s ± 282 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

```
</details>

<details>

<summary>This PR, cpu float64</summary>

```
shape: (10, 10), device: cpu, dtype: torch.float64
381 µs ± 761 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)

shape: (1000, 10, 10), device: cpu, dtype: torch.float64
6.19 ms ± 13.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

shape: (100, 100), device: cpu, dtype: torch.float64
4.6 ms ± 3.26 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

shape: (1000, 100, 100), device: cpu, dtype: torch.float64
2.59 s ± 8.25 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

shape: (1000, 1000), device: cpu, dtype: torch.float64
1.07 s ± 5.09 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

shape: (10, 1000, 1000), device: cpu, dtype: torch.float64
14.4 s ± 13.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

```
</details>

<details>

<summary>Master, cpu float64</summary>

```
shape: (10, 10), device: cpu, dtype: torch.float64
395 µs ± 1.04 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

shape: (1000, 10, 10), device: cpu, dtype: torch.float64
14.6 ms ± 9.76 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

shape: (100, 100), device: cpu, dtype: torch.float64
45.5 ms ± 154 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

shape: (1000, 100, 100), device: cpu, dtype: torch.float64
33.1 s ± 69.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

shape: (1000, 1000), device: cpu, dtype: torch.float64
19.3 s ± 80.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

shape: (10, 1000, 1000), device: cpu, dtype: torch.float64
3min 30s ± 1.29 s per loop (mean ± std. dev. of 7 runs, 1 loop each)

```
</details>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63880

Reviewed By: soulitzer

Differential Revision: D30639435

Pulled By: anjali411

fbshipit-source-id: 127789943ae56e2f1dd03e0fe76ef7b6db86bcf0
2021-10-15 09:54:30 -07:00
65e25256c3 [ROCm] enable test_distributed() in test.sh (#66657)
Summary:
Restores tests for ROCm CI that used to run prior to https://github.com/pytorch/pytorch/issues/63147.

cc jeffdaily sunway513 jithunnair-amd ROCmSupport

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66657

Reviewed By: soulitzer

Differential Revision: D31668379

Pulled By: malfet

fbshipit-source-id: 91a6f6c63d6c957cc5821edbd33d4c16eecc8c0a
2021-10-15 09:45:11 -07:00
8a01bbd64a add flatten parameter module (#66578)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66578

Flatten parameters for performance optimization, and handle the cases where the gradient-ready order differs across ranks or some parameters are unused on a subset of ranks. When there is no param to be sharded in the FSDP instance (usually the root), the flatten wrapper module's flat_param is None.
ghstack-source-id: 140696745

Test Plan: unit test

Reviewed By: mrshenli

Differential Revision: D31625194

fbshipit-source-id: c40e84f9154f5703e5bacb02c37c59d6c4e055c7
2021-10-15 09:37:26 -07:00
a3d12bcdf9 [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D31681115

fbshipit-source-id: e2146e59a57ff27759de18b00fb644e9dc3c5672
2021-10-15 03:07:57 -07:00
76efbccc3b [PyTorch Edge][tracing-based] Unify tracer between internal and external (#64152)
Summary:
As titled, introduce the file `TracerRunner`, shared by the internal and external tracers; the main function is
```
TracerResult trace_run(const std::string& input_module_path);
```
which basically takes the path to a model file and generates the trace result. The main differences between the external and internal tracers are
1. the dependency on `<yaml-cpp/yaml.h>`.
2. the output yaml file from the internal tracer includes `model_version` and `model_asset`. These are only needed internally.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64152

ghstack-source-id: 140692467

Test Plan:
```
./build/bin/model_tracer --model_input_path "/Users/chenlai/Documents/pytorch/tracing/deeplabv3_scripted_with_bundled_input.ptl" --build_yaml_path  "/Users/chenlai/Documents/pytorch/tracing/tmp.yaml"
```
```
./fbcode/caffe2/fb/model_tracer/run_model_with_bundled_inputs.sh ~/local/notebooks/prod_models/deeplabv3_scripted_with_bundled_input.ptl
```
have the same operator output

selected_operators.yaml (P460296279)
selected_mobile_ops.h (P460296258)

Reviewed By: dhruvbird

Differential Revision: D30632224

fbshipit-source-id: eb0321dbc0f1fcf6d2e05384695eebb59ac04f8c
2021-10-15 02:19:45 -07:00
1e47181c47 [DDP Logging] Add iteration in error reporting (#65772)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65772

Looking at some workloads and it would be useful to have this info.
ghstack-source-id: 140555200

Test Plan: CI

Reviewed By: zhaojuanmao, wayi1

Differential Revision: D31224417

fbshipit-source-id: 14eeb053aced87c7ca43b6879f81f54bd0a42b76
2021-10-14 22:29:36 -07:00
3740a06712 [MonitoredBarrier] Fix some logging (#65771)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65771

Fixes some logging around monitored_barrier to make it cleaner.
ghstack-source-id: 140555204

Test Plan: CI

Reviewed By: zhaojuanmao, wayi1

Differential Revision: D31222881

fbshipit-source-id: 77d6f072ce98a9b31192e0d48ea0f8cbd8f216fe
2021-10-14 22:28:16 -07:00
06fa6c15c0 Back out "Revert D31299350: Back out "Revert D31005792: [NCCL] Init dummy NCCL comms in constructor"" (#66393)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66393

Third try!

Fixes:
- test_nccl_timeout can be flaky because of its 1s timeout; bump up the timeout to resolve the flakiness. In general we should not have been relying on time.sleep for this test; filed https://github.com/pytorch/pytorch/issues/66354 to track that.
- ciflow/all did not actually run tests due to a bug causing multigpu tests to not be run. This has since been fixed.
ghstack-source-id: 140560113

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D31534735

fbshipit-source-id: 8b7e0f4fed3972b7a77cbcda28876c9eefb0c7e2
2021-10-14 22:23:22 -07:00
59b28063b4 [NNC] Adding more python bindings for missing operators (#66612)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66612

For op authoring project, we want to expose the python bindings
to create Expr. These are the missing bindings.

Test Plan: Imported from OSS

Reviewed By: soulitzer

Differential Revision: D31667852

fbshipit-source-id: 6d3ff83a7676cfea391ab3ea60dde6874a64047a
2021-10-14 22:09:01 -07:00
8dcf84069e [PyTorch] Implement improved version of gather_ranges_to_dense (#66677)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66677

Reviewed By: wfanzju

Differential Revision: D31676536

fbshipit-source-id: a2eb1b1f9e5a0b78f89c3aad19f97acb7c05e1f8
2021-10-14 21:22:15 -07:00
70fc60b9d1 Revert D31325860: [PyTorch] Implement improved version of gather_ranges_to_dense
Test Plan: revert-hammer

Differential Revision:
D31325860 (23710e2d80)

Original commit changeset: 8e154f929ff7

fbshipit-source-id: 6d36d50d6bd4ec4fe07a6e2d1d0110504b9c8b53
2021-10-14 19:43:38 -07:00
b60050e96a [qat]Make sure the bn statistics are the same in the unit test. (#66244)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66244

Make sure the bn statistics are the same in the unit test.
* The fused model in the existing code will have different bn statistics compared to the model without fusion. They will produce the same result when the model is in training mode, but different results in eval mode.

Test Plan: buck run mode/dev-nosan //caffe2/test:quantization -- -r quantization.eager.test_fusion.TestFusion

Reviewed By: jerryzh168

Differential Revision: D29504500

fbshipit-source-id: 41e3bfd7c652c27619baa7cbbe98d8d06a485781
2021-10-14 19:23:05 -07:00
23710e2d80 [PyTorch] Implement improved version of gather_ranges_to_dense (#66664)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66664

Reviewed By: hlu1

Differential Revision: D31325860

fbshipit-source-id: 8e154f929ff7c597ff6e41f18278b24c552d1719
2021-10-14 18:37:35 -07:00
583217fe37 changes for pytorch issue 55577 (#66571)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66571

changes for pytorch issue 55577

Test Plan:
Ran test:
python test/test_jit.py TestDict

Reviewed By: tugsbayasgalan

Differential Revision: D31622633

fbshipit-source-id: 171c68a65b1d0bf769b3d95f103daba375e95335
2021-10-14 18:19:11 -07:00
a1084401b0 Clean up DictLiteral and DictComprehension emission logic (#64953)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64953

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D30914687

Pulled By: ansley

fbshipit-source-id: ab9b9192a29f05b90c113c678e7c795bc087dc99
2021-10-14 17:35:39 -07:00
a7b79033ea Clean up ListLiteral and ListComprehension emission logic (#64952)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64952

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D30914690

Pulled By: ansley

fbshipit-source-id: 83ac9bc6445f89b3f47c5404435bc6058c6f3bd7
2021-10-14 17:34:17 -07:00
22ec625028 fx2trt example: run all submodules (#66590)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66590

Updated fx2trt example to run all submodules

Added assertion to make sure outputs from lowered and regular models matches

Test Plan: buck run mode/dev-nosan caffe2:fx2trt_example

Reviewed By: 842974287

Differential Revision: D31592985

fbshipit-source-id: 45ce0b33e957f16b3729d3ecde706331c29d7214
2021-10-14 17:09:29 -07:00
20aa417e38 [PyTorch] [Quantization] Speed up PackedEmbeddingBagWeight::prepack() (#66632)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66632

Calling `.item<float>()` for each element in a tensor is expensive. Instead, convert the entire Tensor in one call with `Tensor::copy_(input_tensor)`. See [this post](https://fb.workplace.com/groups/1144215345733672/posts/2080756188746245/) for more details.
ghstack-source-id: 140639868
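
For illustration, the pattern in a minimal Python sketch (the actual change is in the C++ prepack routine; these names are generic stand-ins):

```
import torch

src = torch.randn(10_000)

# Before (pattern): one scalar .item() extraction per element -- expensive.
dst_slow = torch.empty_like(src)
for i in range(src.numel()):
    dst_slow[i] = src[i].item()

# After (pattern): a single bulk copy, the Python analogue of Tensor::copy_.
dst_fast = torch.empty_like(src)
dst_fast.copy_(src)

assert torch.equal(dst_slow, dst_fast)
```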

Test Plan:
Build and run with bundled inputs.

### AI Bench

Before: [AI Bench](https://www.internalfb.com/intern/aibench/details/877359346171823), [Flamegraph](https://interncache-all.fbcdn.net/manifold/aibench/tree/mobile/pt/profiling_reports/speech_transducer_v6_perf_1634185889953.html): 500ms

After: [AI Bench](https://www.internalfb.com/intern/aibench/details/60828780633319), [Flamegraph](https://interncache-all.fbcdn.net/manifold/aibench/tree/mobile/pt/profiling_reports/speech_transducer_v6_perf_1634231176980.html): 444ms

We went from 500ms to 444ms, which is a reduction of ~11%.

Reviewed By: supriyar

Differential Revision: D31657430

fbshipit-source-id: 199ec9de3dab84bb5727d81c7804bb83bebf7b48
2021-10-14 16:30:39 -07:00
871a31b9c4 [TensorExpr] Add missing schemas for lshift/rshift lowerings. (#66653)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66653

Test Plan: Imported from OSS

Reviewed By: navahgar, anijain2305

Differential Revision: D31664748

Pulled By: ZolotukhinM

fbshipit-source-id: 13a3154292f12b7bee43b9a5254fb43be032e7c1
2021-10-14 14:19:29 -07:00
f8348ce9c8 graceful failure for draw_graph() in acc_utils.py (#66631)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66631

Writing to the current directory is causing issues in CI. We might also consider writing the ".dot" files to some temporary location.

Test Plan: CI

Reviewed By: 842974287

Differential Revision: D31657078

fbshipit-source-id: 9876327c7f172cd354f1b8e8076597c6a26e2850
2021-10-14 14:04:48 -07:00
1d90f29f14 [DOC] Improve Transformer documentation (#66574)
Summary:
Includes adding some typing annotations to TransformerEncoderLayer and TransformerDecoderLayer

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66574

Reviewed By: soulitzer

Differential Revision: D31654024

Pulled By: jbschlosser

fbshipit-source-id: 9026bd36541699b7205e893decf5abc4a3f0ab5e
2021-10-14 13:26:12 -07:00
3097755e7a [DOC] Fix typo in KLDivLoss (#66583)
Summary:
Fix simple typo.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66583

Reviewed By: soulitzer

Differential Revision: D31653998

Pulled By: jbschlosser

fbshipit-source-id: e4fc91be297cc9a85099d7883b42436b5e3392d3
2021-10-14 13:21:37 -07:00
914796a69c Fix for prim::BroadcastMKLDNNTensors issue (#66628)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66628

Ensure BroadcastMKLDNNTensors do not break the stack invariant by pushing more than 2 tensors into the stack.

Reviewed By: eellison

Differential Revision: D31638565

fbshipit-source-id: 4526c0cf7ba8d87dc8a9c213c66c711e83adfc66
2021-10-14 11:53:42 -07:00
833ede33ed Fix ubsan in concat_split_op.h (#66283)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66283

Fixes
```
UndefinedBehaviorSanitizer: nullptr-with-nonzero-offset caffe2/caffe2/operators/concat_split_op.h:185:52
```

Test Plan: Sandcastle

Reviewed By: swolchok

Differential Revision: D31486274

fbshipit-source-id: 20128056f19cf814fdc3e6e144cf9208a4080d6a
2021-10-14 11:42:30 -07:00
76f3b07caf quantization docs: remove erroneous rebase artifact (#66577)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66577

There was a rebase artifact erroneously landed to quantization docs,
this PR removes it.

Test Plan:
CI

Imported from OSS

Reviewed By: soulitzer

Differential Revision: D31651350

fbshipit-source-id: bc254cbb20724e49e1a0ec6eb6d89b28491f9f78
2021-10-14 11:30:47 -07:00
016362e2d7 Run sparse tests only for TensorPipe agent. (#66600)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66600

Sparse RPC functionality added in
https://github.com/pytorch/pytorch/pull/62794 works only for TensorPipe and is
broken for other agent types.

Moving these tests to a TensorPipe only class.
ghstack-source-id: 140553147

Test Plan: waitforbuildbot

Reviewed By: rohan-varma

Differential Revision: D31633305

fbshipit-source-id: 37d94cb9ed5565a72a6d512c2a9db75a497d5b95
2021-10-14 11:08:15 -07:00
543b7fb942 [JIT] Fix type annotations of pooling modules (#65847)
Summary:
All of the pooling modules except MaxUnpool and LPPool return either a
Tensor or [Tensor, Tensor]. The current type annotations are inaccurate,
and prevent scripting the module if return_indices is set to True in the
module.

There's not a great way to make this agree with mypy because the
overload is dependent on the value of return_indices, an attribute.

I tried changing the annotations from `Tensor` to
`Union[Tensor, Tuple[Tensor, Tensor]]`, but that breaks a bunch of uses
that have return_indices=False.
For example, this breaks:
4e94e84f65/torch/nn/modules/container.py (L139)
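
A minimal sketch of the usage the corrected annotations must admit (illustrative; `MaxPool2d` stands in for the affected pooling modules):

```
import torch
from torch import nn

pool = nn.MaxPool2d(2, return_indices=True)
out, idx = pool(torch.randn(1, 1, 4, 4))  # (Tensor, Tensor) when return_indices=True

# With the fixed annotations, scripting this configuration should work:
scripted = torch.jit.script(pool)
s_out, s_idx = scripted(torch.randn(1, 1, 4, 4))
```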

Also clean up how test names were being constructed in test_jit, since
otherwise we were getting name collisions when there were two tests on
the same nn.Module.

Fixes https://github.com/pytorch/pytorch/issues/45904

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65847

Reviewed By: ZolotukhinM

Differential Revision: D31462517

Pulled By: eellison

fbshipit-source-id: 6f9e8df1be6c75e5e1e9bae07cf3ad3603ba59bd
2021-10-14 10:59:19 -07:00
51b67f2bca [qat]Removed outdated context manager in unit test. (#66274)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66274

Removed outdated context manager in unit test.
* The linked issue (https://github.com/pytorch/pytorch/issues/23825) seems to have been fixed in 2020.

Test Plan: buck run mode/dev-nosan //caffe2/test:quantization -- -r quantization.eager.test_quantize_eager_qat

Reviewed By: vkuzo

Differential Revision: D29507087

fbshipit-source-id: e8fa04c9527023a5adaf1a012b2c393ce0c5cd97
2021-10-14 10:23:55 -07:00
49a1d7bfcb [opinfo] elemwise parcel : isfinite, isinf, isposinf, isneginf, isnan, isreal (#66400)
Summary:
Adds OpInfo for `isfinite, isinf, isposinf, isneginf, isnan, isreal`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66400

Reviewed By: bdhirsh

Differential Revision: D31602998

Pulled By: mruberry

fbshipit-source-id: 235cc414f373f014f4822a72deb1a04a58ad4a7c
2021-10-14 10:11:57 -07:00
d810e738b9 OpInfo for *_like functions (#65941)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65941

OpInfos for: empty_like, zeros_like, ones_like, full_like, randn_like

Test Plan: - run tests

Reviewed By: dagitses

Differential Revision: D31452625

Pulled By: zou3519

fbshipit-source-id: 5e6c45918694853f9252488d62bb7f4ccfa1f1e4
2021-10-14 09:14:51 -07:00
5d4452937d OpInfos for some Tensor dtype conversion methods (#64282)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64282

OpInfos for:
- Tensor.bfloat16, Tensor.bool, Tensor.byte, Tensor.char
- Tensor.double, Tensor.float, Tensor.half, Tensor.int
- Tensor.short, Tensor.long

None of these are supported by TorchScript. Also, the OpInfo autograd
test runner assumes that the operation is not allowed to change the
dtype of the argument, so only Tensor.double has
`supports_autograd=True` (in theory Tensor.bfloat16, Tensor.float,
Tensor.half should be differentiable).

Test Plan: - run tests

Reviewed By: dagitses

Differential Revision: D31452627

Pulled By: zou3519

fbshipit-source-id: b7f272e558558412c47aefe947af7f060dfb45c5
2021-10-14 09:13:30 -07:00
77f98ea5e0 assert no duplicate yaml keys in codegen (#66238)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66238

The codegen should error if it sees two yaml entries with the same key. The default behavior of python's yaml loader is to overwrite duplicate keys with the new value.

This would have caught a nasty bug that showed up in https://github.com/pytorch/pytorch/pull/66225/files#r723796194.

I tested it on that linked PR, to confirm that it errors correctly (and gives the line number containing the duplicate).
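
A minimal sketch of the duplicate-key check, assuming PyYAML (the class and error message here are illustrative, not the codegen's actual implementation):

```
import yaml

# PyYAML's default behavior silently keeps the last value for a duplicate key;
# this loader raises instead, reporting the offending line.
class NoDuplicateKeyLoader(yaml.SafeLoader):
    def construct_mapping(self, node, deep=False):
        seen = set()
        for key_node, _ in node.value:
            key = self.construct_object(key_node, deep=deep)
            if key in seen:
                raise yaml.YAMLError(
                    f"duplicate key {key!r} at line {key_node.start_mark.line + 1}")
            seen.add(key)
        return super().construct_mapping(node, deep=deep)

yaml.load("func: foo\nfunc: bar", Loader=NoDuplicateKeyLoader)  # raises with the line number
```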

Test Plan: Imported from OSS

Reviewed By: dagitses, albanD, sean-ngo

Differential Revision: D31464585

Pulled By: bdhirsh

fbshipit-source-id: 5b35157ffa9a933bf4b344c4b9fe2878698370a3
2021-10-14 08:28:20 -07:00
fe41df3601 Deprecate x.T on tensors of dimension other than 0 or 2 (#64180)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64180

**BC-breaking note:**

This PR deprecates the use of `Tensor.T` on tensors that are not matrices. An upgrade guide is added to the
documentation for `Tensor.T`.

This PR DOES NOT make this attribute throw an error when called on a tensor of `dim != 2`,
but this will be its behavior in a future PyTorch release.
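
For code relying on the old behavior, the intent can be spelled out explicitly (a sketch; `x` is any tensor of dim != 2):

```
import torch

x = torch.randn(2, 3, 4)
# x.T on a tensor of dim != 2 is deprecated; say what you mean instead:
full_reverse = x.permute(*reversed(range(x.ndim)))  # what x.T used to compute
last_two = x.transpose(-2, -1)                      # swap only the last two dims
```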

cc mruberry rgommers pmeier asmeurer leofang AnirudhDagar asi1024 emcastillo kmaehashi heitorschueroff

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D31610611

Pulled By: anjali411

fbshipit-source-id: af8ff7e862790dda9f06921de005b3f6fd0803c3
2021-10-14 08:17:32 -07:00
d802877dfa speed up quantized interpolate for channels last (#66525)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66525

This should solve https://github.com/pytorch/pytorch/issues/60015

There were two `q_zero_point()` accesses inside a for loop, which was
expensive. Moving them to before the loop sped things up 10x for a
microbenchmark.
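
The hoisting pattern, sketched in Python for illustration (the real change is inside the C++ kernel):

```
import torch

q = torch.quantize_per_tensor(torch.randn(1000), scale=0.1, zero_point=2,
                              dtype=torch.quint8)
vals = q.int_repr().tolist()

# Before: the quantization parameters are re-fetched on every loop iteration.
out_slow = [(v - q.q_zero_point()) * q.q_scale() for v in vals]

# After: hoist the invariant accessors out of the hot loop.
zp, scale = q.q_zero_point(), q.q_scale()
out_fast = [(v - zp) * scale for v in vals]

assert out_slow == out_fast
```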

Test Plan:
```
// comment out benchmarks unrelated to original issue, for simplicity
cd benchmarks/operator_benchmark
python -m pt.qinterpolate_test

// before: 2994 us
// after: 324 us
// full results: https://gist.github.com/vkuzo/cc5ef9526dc0cda170d6d63498c16453
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D31592422

fbshipit-source-id: b6078ac1039573bbe545275f7aedfd580910b459
2021-10-14 08:11:26 -07:00
a40812de53 [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D31646229

fbshipit-source-id: 26a89b8eb88d31259f79c8f9061e016d57a1e462
2021-10-14 04:52:16 -07:00
6310eb30d1 [SR] Clean up GetLivenessMap (#66606)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66606

- Remove dead code (see comment for where)
- Add debug prints
- Small reorganization of the code to improve readability

Reviewed By: d1jang

Differential Revision: D31568219

fbshipit-source-id: 50240c325bf4fd012e1947ac931bb67c6f5dfafb
2021-10-13 23:55:40 -07:00
e1348973ac Add common_fx2trt.py (#66579)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66579

Didn't commit this file in the PR that open sources fx2trt tests

Test Plan: ci

Reviewed By: 842974287

Differential Revision: D31623354

fbshipit-source-id: 6cedbe0f229da40499b83e6df28e16caca392d9c
2021-10-13 21:24:11 -07:00
74849d9188 [acc_shape_inference] add shape inference for quantize_per_channel (#66562)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66562

Adding shape inference for `acc_ops.quantize_per_channel`, and fixing some bugs.

Bugs were related to the fact that `quantize_per_channel` arguments `scales` and `zero_points` take tensors, so when we fetch the values (which needs to be done using `.tolist()` instead of `.item()`) we may get either a list or a scalar value.

Test Plan:
# Test Quantized Resnet
From sandbox with GPU that supports quantized types (tested with V100)
`buck run mode/opt -c python.package_style=inplace caffe2:fx2trt_quantized_resnet_test`
Output
```
...
[TensorRT] INFO: [MemUsageSnapshot] Builder end: CPU 0 MiB, GPU 1548 MiB
[TensorRT] INFO: [MemUsageSnapshot] ExecutionContext creation begin: CPU 0 MiB, GPU 1548 MiB
[TensorRT] VERBOSE: Using cublasLt a tactic source
[TensorRT] WARNING: TensorRT was linked against cuBLAS/cuBLAS LT 11.5.1 but loaded cuBLAS/cuBLAS LT 11.1.0
[TensorRT] INFO: [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 0, GPU 1556 (MiB)
[TensorRT] VERBOSE: Using cuDNN as a tactic source
[TensorRT] INFO: [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 0, GPU 1564 (MiB)
[TensorRT] WARNING: TensorRT was linked against cuDNN 8.2.1 but loaded cuDNN 8.0.5
[TensorRT] VERBOSE: Total per-runner device memory is 23405056
[TensorRT] VERBOSE: Total per-runner host memory is 73760
[TensorRT] VERBOSE: Allocated activation device memory of size 154140672
[TensorRT] INFO: [MemUsageSnapshot] ExecutionContext creation end: CPU 0 MiB, GPU 1736 MiB
trt fp16 time (ms/iter) 1.252899169921875
trt int8 time (ms/iter) 1.3774776458740234
trt implicit int8 time (ms/iter) 1.3835883140563965
PyTorch time (CUDA) (ms/iter) 4.34483528137207
PyTorch time (CPU) (ms/iter) 55.687150955200195
[TensorRT] INFO: [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 0, GPU 1918 (MiB)
[TensorRT] INFO: [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 0, GPU 1866 (MiB)
[TensorRT] INFO: [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 0, GPU 1738 (MiB)
WARNING: Logging before InitGoogleLogging() is written to STDERR
W1012 12:07:23.556475 711816 DynoConfigLoader.cpp:32] Failed to read config: No dyno config client
```

# Test shape inference
`buck test mode/opt glow/fb/fx/acc_tracer:test_acc_shape_inference`
Output
```
...
Summary
  Pass: 95
  ListingSuccess: 1
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/1407375092088240
```

Reviewed By: jfix71, jerryzh168

Differential Revision: D31457323

fbshipit-source-id: 8ccc4a9b0ca655fb30838e88575aff2bf3a387a6
2021-10-13 21:03:08 -07:00
7d9bbd3596 Revert D31580382: [pytorch][PR] dropout update in autodiff
Test Plan: revert-hammer

Differential Revision:
D31580382 (eb8138d886)

Original commit changeset: 41d15da99bf4

fbshipit-source-id: 59f751ee59602a5fd09c17f8c7565dca5e2beb50
2021-10-13 19:52:05 -07:00
c1c985a282 Rename tensorexpr::Value so that it can coexist with torch::jit::Value (#66467)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66467

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D31619973

Pulled By: bertmaher

fbshipit-source-id: eebea821fbbd0ae6f0a7144809c87c7da7f88699
2021-10-13 19:41:07 -07:00
6634570aef [SR] Fix bug in ValueGroup (#66470)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66470

Reviewed By: d1jang

Differential Revision: D31566348

fbshipit-source-id: e0f634af77d893bbc8d66f214b2b8bdd6ab58cc3
2021-10-13 19:26:38 -07:00
d30397d42a [PyTorch][Static Runtime] Don't use vector in ProcessedNode (#65429)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65429

The sizes of these arrays can't change, so there's no need to waste an extra pointer on them.
ghstack-source-id: 140532722

Test Plan:
CI

I profiled this diff and the previous diff together. Comparing time spent in the operator functor handler for to_copy, I see the load instruction fetching the inputs pointer from p_node on https://www.internalfb.com/code/fbsource/[4c98a83b2451fa6750f38796c91ebb0eb0afd800]/fbcode/caffe2/torch/csrc/jit/runtime/static/ops.cpp?lines=947 (`p_node->Input(0).toTensor()`) improved a tiny bit, and the overall time spent in that wrapper decreased from 0.8% to 0.7%.

Reviewed By: hlu1

Differential Revision: D31096042

fbshipit-source-id: 35c30462d6a9f9bd555d6b23361f27962e24b395
2021-10-13 19:13:20 -07:00
c6f0dde3ca Cumsum Converter (#66376)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66376

Added converter for cumsum and unit test

Test Plan: buck test mode/dev-nosan caffe2/torch/fb/fx2trt:test_cumsum

Reviewed By: wushirong, 842974287

Differential Revision: D31423701

fbshipit-source-id: ee3aa625d6875ba8e6bad27044d22638e99b5c03
2021-10-13 19:04:37 -07:00
160946e3f3 Use torch.empty() instead of torch.tensor() in torch.nn.Parameter (#66486)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66486

The newly-introduced Python dispatcher mode (`__torch_dispatch__`) does not have support for `torch.tensor()` (see #64360) and this causes friction in the user experience if some `nn.Modules` use `torch.tensor()` either implicitly or explicitly.

This PR replaces calls to `torch.tensor()` in `Parameter`, `UninitializedParameter`, and `UninitializedBuffer` with an equivalent call to `torch.empty()` which serves the same purpose and is syntactically more readable.
ghstack-source-id: 140520931
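
The swap in a nutshell (a sketch, assuming the previous placeholder was `torch.tensor([])`):

```
import torch

old = torch.tensor([])  # goes through the torch.tensor() constructor path
new = torch.empty(0)    # same shape and dtype, no torch.tensor() involved
assert old.shape == new.shape and old.dtype == new.dtype
```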

Test Plan: Since no behavioral change, run the existing unit and integration tests.

Reviewed By: pbelevich

Differential Revision: D31575587

fbshipit-source-id: bd7bdeea54370f3e53dc13bd182b97d0f67146f5
2021-10-13 18:56:36 -07:00
30d9fd9cf3 Migrate USE_MAGMA config macro to ATen (#66390)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66390

Test Plan: Imported from OSS

Reviewed By: malfet, bdhirsh

Differential Revision: D31547712

Pulled By: ngimel

fbshipit-source-id: 1b2ebc0d5b5d2199029274eabdd014f343cfbdd3
2021-10-13 17:50:10 -07:00
e75de4f307 remove a few unused THCTensor/Storage methods (#66555)
Summary:
Per title

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66555

Reviewed By: mruberry

Differential Revision: D31620969

Pulled By: ngimel

fbshipit-source-id: 1922ef523df473e8673a35c4a155b7b0cf000953
2021-10-13 17:18:11 -07:00
4e1c075542 log_sigmoid: Use log1p for improved precision (#66441)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/20972

log_sigmoid calculates something like `log(1 + x)` where x is always a
positive number less than one. This wastes floating point precision
because the exponent always becomes zero. Instead, using
`log1p(x)` gives the full mantissa precision around `x=0`.

This also fixes infinity propagation: the old code computes `exp(in - in)` when `in` is
negative, which for infinite `in` results in a NaN instead of 0.
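
The precision difference in a nutshell (a minimal illustration, not the kernel code):

```
import torch

x = torch.tensor([1e-8])  # float32
print(torch.log(1 + x))   # tensor([0.]) -- 1 + x rounds to 1, the low bits are lost
print(torch.log1p(x))     # tensor([1.0000e-08]) -- full mantissa precision near 0
```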

cc albanD mruberry jbschlosser walterddr

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66441

Reviewed By: bdhirsh

Differential Revision: D31619630

Pulled By: albanD

fbshipit-source-id: e7867f3459a91e944b92f8ca42b6e0697b13f89b
2021-10-13 16:36:13 -07:00
24202f7fb4 Remove native_functions.yaml dependency from Activation.cu (#64499)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64499

This moves the native functions into a separate Activation.cpp file,
which calls into `launch_..._kernel` functions defined in `Activation.cu`.
The exception is `rrelu_with_noise` which is compilcated by the
random number generation code, so I've moved it into its own file.

Test Plan: Imported from OSS

Reviewed By: jbschlosser, ezyang

Differential Revision: D30867323

Pulled By: dagitses

fbshipit-source-id: a4cd6f1fb1b1fed4cc356bf8b3778991ae2278ba
2021-10-13 16:28:13 -07:00
eb8138d886 dropout update in autodiff (#66273)
Summary:
1. Unifies dropout op in autodiff
2. Removes dropout inference support in autodiff

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66273

Reviewed By: jbschlosser, gmagogsfm

Differential Revision: D31580382

Pulled By: eellison

fbshipit-source-id: 41d15da99bf4ce6c47cc335a4156c4a1c9705a70
2021-10-13 16:23:40 -07:00
5f45927d15 Autograd: Delay warnings until the end of backward execution (#66235)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/50209

This adds a new warning handler that stores all warnings in a shared
queue, which can be "replayed" at a later time and, crucially, on
another thread. Then, I use this inside the autograd engine to ensure
that warnings are processed by the handler registered on the main
thread.
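
The "store now, replay later" idea, sketched in Python (the real handler lives in C++ and hooks the autograd engine's worker threads; this only illustrates the pattern):

```
import threading
import warnings
from queue import SimpleQueue

q = SimpleQueue()

def worker():
    # Capture warnings raised on a side thread instead of emitting them here.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        warnings.warn("raised on a side thread")
    for w in caught:
        q.put(w)

t = threading.Thread(target=worker)
t.start()
t.join()

# Replay on the main thread, where the user's warning filters and handlers apply.
while not q.empty():
    w = q.get()
    warnings.warn_explicit(w.message, w.category, w.filename, w.lineno)
```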

For testing, I also add an operator that always warns in the backward
pass and test that the warning is a normal Python warning.

cc ezyang albanD zou3519 gqchen pearu nikitaved soulitzer Lezcano Varal7

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66235

Reviewed By: ejguan

Differential Revision: D31505413

Pulled By: albanD

fbshipit-source-id: 1a7f60b038f55c20591c0748b9e86735b3fec2f9
2021-10-13 15:38:04 -07:00
42328090cb [GHA] Hardcode doc build target to master (#66567)
Summary:
According to f48f20e154/.circleci/verbatim-sources/job-specs/job-specs-custom.yml (L46-L48)
target should always be master (even on release branches) unless it is a
tagged build

Fixes https://github.com/pytorch/pytorch/issues/66466

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66567

Reviewed By: seemethere

Differential Revision: D31621530

Pulled By: malfet

fbshipit-source-id: d6de2222d0340820555a82ae90b3de22b4dc7b88
2021-10-13 15:08:46 -07:00
0aab34c26c [jit] Refcounting spot fixes in alias_analysis (#66295)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66295

Tidying up the top sources of reference count decrements seen during static runtime startup in alias_analysis.cpp specifically.
ghstack-source-id: 140484160

Test Plan:
CI

perf now shows under 2% time spent in ~__shared_count instead of about 5%.

Reviewed By: suo

Differential Revision: D31490761

fbshipit-source-id: bbdcb7f9065c3aafa7fff7bfea9cea6dbc41f9d9
2021-10-13 14:47:32 -07:00
9767282643 [jit] Add MutableTypePtrHelper::mapTypeToBorrowedAliasTypeSet (#65344)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65344

Callsites that know they are using a cache can borrow AliasTypeSets from the cache instead of copying them.
ghstack-source-id: 140484162

Test Plan: Running perf on static runtime startup seems to show less inclusive time spent in AliasDb::getElements

Reviewed By: ejguan

Differential Revision: D31027363

fbshipit-source-id: b7a1473f4f9e9f14566f56f4b3b4e6317076beeb
2021-10-13 14:47:30 -07:00
75d98fa0ae [jit] Implement one-element MemoryDAG::mayContainAlias more efficiently (#65178)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65178

There is no need to copy the MemoryLocations in this case.
ghstack-source-id: 140484161

Test Plan:
CI

static runtime startup for ctr_mobile_feed decreased from 7.0s to 6.3s

Reviewed By: suo

Differential Revision: D30984442

fbshipit-source-id: 61bb678c4480cd030aaab2bbc8a04cbd9b7c7f4d
2021-10-13 14:46:16 -07:00
9e8281fd2f [fx2trt][code quality] Add type annotation and docstring to utils functions in acc_ops_converters.py (#66496)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66496

As the title. No changes on the code logic.

Test Plan: CI

Reviewed By: wushirong

Differential Revision: D31576303

fbshipit-source-id: f2132309023b3c9e09810e32af91eb42eefd3f32
2021-10-13 14:06:15 -07:00
37db650c9c [Static Runtime] Clone test does not use uninitialized memory (#66557)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66557

The test was previously using `at::empty_strided` to initialize one of its inputs. The contents of the tensor returned by this function are random, uninitialized memory. If we happened to get a NaN, this test would fail since `use_equalnan` was not set.
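
For illustration, why uninitialized contents bite here (NaN compares unequal to itself):

```
import torch

x = torch.empty_strided((2, 3), (3, 1))  # uninitialized: arbitrary values, possibly NaN
nan = torch.tensor(float("nan"))
print(torch.equal(nan, nan))             # False -- a NaN in x breaks exact comparison
```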

Test Plan: `buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest`

Reviewed By: hlu1

Differential Revision: D31611961

fbshipit-source-id: 79a9476d0d6ce7a9f1412eefcef19bc2618c54b8
2021-10-13 14:02:34 -07:00
82986a17a6 fix lint (#66572)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66572

Test Plan: Imported from OSS

Reviewed By: seemethere

Differential Revision: D31624043

Pulled By: suo

fbshipit-source-id: 9db9cee3140d78c2a2f0c937be84755206fee1dd
2021-10-13 13:59:08 -07:00
a82fcd3560 Disable .numpy() and .tolist() for tensor subclasses and fix .tolist() for conjugated and negated tensors (#66082)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66082

Fixes https://github.com/pytorch/pytorch/issues/66024 #65779

cc ezyang anjali411 dylanbespalko mruberry Lezcano nikitaved albanD

Test Plan: Imported from OSS

Reviewed By: Gamrix, albanD

Differential Revision: D31615588

Pulled By: anjali411

fbshipit-source-id: c3e65ef0fe301630eb76732ccd7819683c09aa19
2021-10-13 13:57:51 -07:00
675ba6cd53 [qnnpack] Remove usage of conv_param_t in deconv-run.cc (#66465)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66465

conv_param_t is being removed as it stores redundant information. This removes the last usage of it in qnnpack so we can begin removing the dependency.
ghstack-source-id: 140475374

Test Plan: github tests

Reviewed By: kimishpatel

Differential Revision: D31564679

fbshipit-source-id: 049a28fac0235b2e739fb2e048484d7e8e7189fa
2021-10-13 13:51:15 -07:00
86cf22cb1c Add OpInfo for torch.bucketize (#65821)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65821

Reviewed By: malfet, mruberry

Differential Revision: D31386048

Pulled By: saketh-are

fbshipit-source-id: fae7ec7b6b57436d87d38d421c5f3f52be4cdadd
2021-10-13 13:46:35 -07:00
035310c574 Handle shared memory cases in MathBithFallback (#63602)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63602

This PR fixes the case when a read and a write are performed on memory shared between mutable and/or non-mutable arguments. Example:
```
a=torch.tensor([1+1j])
b=a.conj()
b.add_(a) # should return tensor([2]) but returns tensor ([2-2j])
```

The issue here is that in the conjugate fallback, we resolve the conjugation in-place for mutable arguments which can be a problem as shown above in the case when other input arguments share memory with the mutable argument(s).
This PR fixes this issue by:
1. first scanning through the operator input arguments and creating a vector of mutable arguments that have the conj bit set to `True` (and accordingly setting the flag `check_for_alias_with_mut_arg` to `True` or `False`).
2. Iterating through all the arguments, at this time looking only at the non-mutable ones. If `check_for_alias_with_mut_arg` is set to `True`, then we iterate through `mutable_inputs` to check whether the current arg tensor aliases any of the entries in `mutable_inputs`. If it does, we clone the non-mutable tensor arg; otherwise we resolve the conjugation as before.
3. Now we look through the mutable_inputs vector (which contains only mutable input tensors with conj bit set to `True`). We in-place conjugate each of the entries in the vector.
4. Do the computation.
5. Re-conjugate the mutable argument tensors.

NOTE: `TensorLists` are not fully handled in ConjugateFallback. Please see the in-line comment for more details.

Fixes https://github.com/pytorch/pytorch/issues/59943

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D30466905

Pulled By: anjali411

fbshipit-source-id: 58058e5e6481da04a12d03f743c1491942a6cc9b
2021-10-13 13:39:31 -07:00
c04bcde245 Make empty* and *_like factory functions respect tensor subclasses (#65677)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/65243

cc albanD

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65677

Reviewed By: dagitses

Differential Revision: D31432032

Pulled By: albanD

fbshipit-source-id: 77f464974c7656c1206085aba9300471d7e0ef57
2021-10-13 13:34:53 -07:00
b792a77895 Skip interactive_embedded_interpreter.cpp for clang-tidy (#66569)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66569

Reviewed By: suo

Differential Revision: D31622885

Pulled By: malfet

fbshipit-source-id: 61bad5ff3011f992cdd149724c935c098996d6a2
2021-10-13 13:27:56 -07:00
09b90612c4 .github: Enable onnx tests (#66513)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66513

These were missed in the migration of onnx to github actions.

Adds ort tests with 2 shards for the onnx workflow

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D31599433

Pulled By: seemethere

fbshipit-source-id: 73dce0d3017c4280e64f0c8578e2be7ef6a168d6
2021-10-13 13:14:02 -07:00
f48f20e154 Make ContainerHash compatible with const& types (#66497)
Summary:
- this change should not impact existing use cases, but allows for
  additional use cases where the container holds const types.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66497

Reviewed By: alanwaketan

Differential Revision: D31582242

Pulled By: wconstab

fbshipit-source-id: 3a0e18b4afaf3c7ff93a0e3d09067ed066402b44
2021-10-13 12:45:17 -07:00
fdd9f49cf5 add a note on numerical accuracy (#65947)
Summary:
Per title
Fixes https://github.com/pytorch/pytorch/issues/54437

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65947

Reviewed By: albanD

Differential Revision: D31612445

Pulled By: ngimel

fbshipit-source-id: 5c155891a088aef3b9813f253d0dc1ee4d51ae1c
2021-10-13 12:43:55 -07:00
a453ebc8ac Use interactive_embedded_interpreter to dynamically load various third-party libraries (#66512)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66512

TLDR, we are able to use the interactive_embedded_interpreter (basically just the torch::deploy interpreter with an interactive shell) to dynamically load various third-party libraries. We use the popular libraries numpy, scipy, regex, pandas for illustration purposes.

A couple of changes need to be done for the interactive_embedded_interpreter:
1, we need to link with :embedded_interpreter_all rather than :embedded_interpreter so we can enable DEEPBIND and use our custom loader
2, we provide a pylibRoot path to construct the InterpreterManager. The path will be added to the embedded interpreter's sys.path. Typically we can pass in the python library root path in a conda environment so the torch::deploy interpreter can find all installed packages.
3, we allow interactive_embedded_interpreter to execute a script to ease recording the exploration of various python libraries.
ghstack-source-id: 140453213

Test Plan:
Install numpy, scipy, regex, pandas in the conda environment or on the machine directly. Suppose /home/shunting/.local/lib/python3.8/site-packages/ is the root path for the installed libraries.

- buck run mode/opt :interactive_embedded_interpreter -- --pylib_root=/home/shunting/.local/lib/python3.8/site-packages/ --pyscript=~/p7/iei_examples/try_regex.py
content of try_regex.py:
```
import regex

print(regex)
pat = r'(.+)\1'
print(regex.match(pat, "abcabc"))
print(regex.match(pat, "abcba"))

print("bye")
```

- buck run mode/opt :interactive_embedded_interpreter -- --pylib_root=/home/shunting/.local/lib/python3.8/site-packages/ --pyscript=~/p7/iei_examples/try_numpy.py
content of try_numpy.py:
```
import numpy as np
print(f"numpy at {np}")
a = np.random.rand(2, 3)
b = np.random.rand(3, 2)
print(np.matmul(a, b))
```

- buck run mode/opt :interactive_embedded_interpreter -- --pylib_root=/home/shunting/.local/lib/python3.8/site-packages/ --pyscript=~/p7/iei_examples/try_scipy.py
content of try_scipy.py:
```
import numpy as np
from scipy import linalg

mat_a = np.array([[1, 0, 0, 0], [1, 1, 0, 0], [1, 2, 1, 0], [1, 3, 3, 1]])
mat_b = linalg.inv(mat_a)
print(mat_b)
```

- buck run mode/opt :interactive_embedded_interpreter -- --pylib_root=/home/shunting/.local/lib/python3.8/site-packages/ --pyscript=~/p7/iei_examples/try_pandas.py
content of try_pandas.py:
```
import pandas as pd
print(f"pandas at {pd}")
df = pd.DataFrame({
  "col1": [1, 2, 3, 4],
  "col2": [2, 4, 8, 16],
})
print(df)
```

Reviewed By: suo

Differential Revision: D31587278

fbshipit-source-id: c0b031c1fa71a77cdfeba1d04514f83127f79012
2021-10-13 12:39:13 -07:00
a8815d557a [vulkan] Remove the persistent resource pool (#66478)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66478

A persistent resource pool was needed to store prepacked tensors since the main resource pool tied to the global Vulkan context would be flushed at the end of each inference run. However, prepacked tensors needed to stay alive between inference runs, so an additional persistent resource pool was introduced that would only be flushed when the Vulkan context was destroyed.

However, with [this change](https://github.com/pytorch/pytorch/pull/66477) the resource pool no longer indiscriminately flushes allocated resources at the end of an inference run. Tensors will have to call `release_resources()` before they become eligible to be destroyed. Since prepacked tensors are tied to an `OpContext` object they will stay alive between inference runs.

Therefore, the persistent resource pool is no longer needed.

Test Plan: Build and run `vulkan_api_test`.

Reviewed By: beback4u

Differential Revision: D31490076

fbshipit-source-id: 3741a2333c834796d589774e819eaaf52bb9f0fe
2021-10-13 12:01:08 -07:00
cebaf21c5a [vulkan] Release GPU resources when vTensor::View is destroyed (#66477)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66477

Currently, Vulkan tensor memory is allocated and deallocated through the following mechanism:

1. During inference, ops will request buffer and/or texture memory for tensors from the [Resource Pool](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/vulkan/api/Resource.h#L324-L327)
2. The resource pool allocates the memory and [adds it to a vector](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/vulkan/api/Resource.cpp#L609-L622) containing all the memory allocations it has made this inference, then returns the most recently allocated block of memory
3. At the end of inference, results are transferred back to the CPU and the [context is flushed](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/vulkan/ops/Copy.cpp#L150)
4. As part of the context flush the [resource pool is purged](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/vulkan/api/Context.cpp#L143) which [deallocates all buffer and texture memory](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/vulkan/api/Resource.cpp#L683-L684) allocated by the resource pool

This pattern makes it impossible to have models with multiple outputs. When the first output tensor is transferred back to the CPU, the memory of the other output tensors will be deallocated when the context is flushed.

Instead, an alternative is to tie resource destruction to the destructor of the [vTensor::View](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/vulkan/ops/Tensor.h#L243) class, which holds the actual implementation and storage of Vulkan tensors. This will ensure that memory associated with a tensor will be cleaned up whenever it is no longer used.

The new deallocation mechanism proposed is:

1. During inference, `vTensor` objects will request GPU memory from the resource pool, same as before.
2. The resource pool allocates buffer or texture memory and returns it directly to  the `vTensor`
3. Throughout inference, intermediate tensors' reference counts will go to 0 and the destructor of the `View` class will be called
4. The destructor will add any texture and buffer memory it's holding to the resource pool's list of GPU memory allocations to be cleaned up
5. At the end of inference `purge()` will be called which will destroy all allocations in the list of allocations to be cleaned
6. GPU memory for output tensors will not be destroyed, since their reference counts will be greater than 0, thus they have not yet been added to the list of allocations to be destroyed

Note that it is not correct to have the destructor directly deallocate GPU memory. This is due to the fact that Vulkan ops simply submit work to the GPU but do not guarantee that the work has completed when the op returns. Therefore we must keep all allocated GPU memory until the end of inference, when we wait for the GPU to complete its work.

Test Plan:
build and run `vulkan_api_test` to make sure existing functionality is not impacted.

This is also tested in a later diff, which checks that output tensors stay alive after inference completes.

Reviewed By: dreiss

Differential Revision: D31510899

fbshipit-source-id: 99250c2800a68f07b1b91dbf5d3b293184da5bd2
2021-10-13 11:59:40 -07:00
5e34ac6c43 [FX] Fix cases when we should not fuse due to more than one users of intermediate node (#66472)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66472

A follow up of https://github.com/pytorch/pytorch/pull/66362. Same fix.

Test Plan:
```
buck test mode/dev-nosan caffe2/torch/fb/fx2trt:test_fuse_permute_matmul_trt
buck test mode/dev-nosan caffe2/torch/fb/fx2trt:test_fuse_permute_linear_trt

```

Reviewed By: wushirong, 842974287

Differential Revision: D31567662

fbshipit-source-id: 2c9e6a138fc31996d790fd4d79e0bf931507fc99
2021-10-13 11:53:42 -07:00
9d13ae450a [oss/ci] skip all dataloader tests with asan (#66561)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66561

See https://github.com/pytorch/pytorch/issues/66223 for context.

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D31617142

Pulled By: suo

fbshipit-source-id: 16b280fc47a7c40fa19c5c72192d342dd33680bf
2021-10-13 11:39:41 -07:00
713e025c9f Add no-input-grad-needed cases to test_grid_sample (#66071)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66071

Test Plan: Imported from OSS

Reviewed By: dagitses

Differential Revision: D31431801

Pulled By: albanD

fbshipit-source-id: 57a94ed9e97e402aa8193d69355e57b6309c64f7
2021-10-13 10:56:47 -07:00
8a40bb62f9 Compute input gradient only if required (CUDA) (#66070)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66070

Test Plan: Imported from OSS

Reviewed By: dagitses

Differential Revision: D31431805

Pulled By: albanD

fbshipit-source-id: 8c3de6632aaee168ec6fd7eb79a5af26973af9c5
2021-10-13 10:56:45 -07:00
f8d98b5a6d Compute input gradient only if required (CPU) (#66069)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66069

Test Plan: Imported from OSS

Reviewed By: dagitses

Differential Revision: D31431803

Pulled By: albanD

fbshipit-source-id: d4caba5fa092e4ee7411502021836370082670b2
2021-10-13 10:56:43 -07:00
84385c40e4 Add output_mask (#66068)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66068

Test Plan: Imported from OSS

Reviewed By: dagitses

Differential Revision: D31431802

Pulled By: albanD

fbshipit-source-id: 322aae5614dacb06fd45e513465b7a5cc11f4dbb
2021-10-13 10:55:27 -07:00
6401658b08 fix type error in hipify_python.py (#66164)
Summary:
- [x] Fixed the Pyre type checking errors in `torch/utils/hipify/hipify_python.py`:
```
torch/utils/hipify/hipify_python.py:196:8 Incompatible variable type [9]: clean_ctx is declared to have type `GeneratedFileCleaner` but is used as type `None`.
torch/utils/hipify/hipify_python.py:944:4 Incompatible variable type [9]: clean_ctx is declared to have type `GeneratedFileCleaner` but is used as type `None`.
```

Fixing the issue: https://github.com/MLH-Fellowship/pyre-check/issues/78
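
The usual fix for this class of error, sketched (a hypothetical simplification, not the actual hipify signatures):

```
from typing import Optional

class GeneratedFileCleaner:
    ...

# Before: clean_ctx: GeneratedFileCleaner = None  -- declared non-Optional, defaulted to None.
# After: declare the parameter Optional and materialize the default inside the body.
def hipify(clean_ctx: Optional[GeneratedFileCleaner] = None) -> None:
    if clean_ctx is None:
        clean_ctx = GeneratedFileCleaner()
```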

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66164

Reviewed By: onionymous

Differential Revision: D31411443

Pulled By: 0xedward

fbshipit-source-id: c69f8fb839ad1d5ba5e4a223e1322ae7207e1574
2021-10-13 10:33:49 -07:00
d85948896c Add softplus support to autodiff (#63942)
Summary:
Add softplus definition to autodiff.

cc gmagogsfm

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63942

Reviewed By: ngimel

Differential Revision: D31397158

Pulled By: eellison

fbshipit-source-id: f7db547370f82e5e282505c3c8415fb4fbd86d54
2021-10-13 08:08:09 -07:00
82a216c45b Add tensor.{adjoint(),H,mT,mH} methods and properties (#64179)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64179

This PR follows the discussion in https://github.com/pytorch/pytorch/issues/45063#issuecomment-904431478

Fixes https://github.com/pytorch/pytorch/issues/45063
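
Illustrative usage of the new attributes (a sketch of the semantics discussed in the linked issue):

```
import torch

a = torch.randn(2, 3, 4, dtype=torch.complex64)
assert torch.equal(a.mT, a.transpose(-2, -1))         # batched matrix transpose
assert torch.equal(a.mH, a.transpose(-2, -1).conj())  # conjugate (Hermitian) transpose
assert torch.equal(a.adjoint(), a.mH)

m = torch.randn(2, 3, dtype=torch.complex64)
assert torch.equal(m.H, m.t().conj())                 # .H: conjugate transpose of a matrix
```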

cc ezyang anjali411 dylanbespalko mruberry Lezcano nikitaved rgommers pmeier asmeurer leofang AnirudhDagar asi1024 emcastillo kmaehashi heitorschueroff

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D30730483

Pulled By: anjali411

fbshipit-source-id: 821d25083f5f682450f6812bf852dc96a1cdf9f2
2021-10-13 07:44:43 -07:00
87df043f63 [Bootcamp][Pytorch]Add testing for complex parameters in Adagrad optimizer (#66501)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66501

Add testing for the Adagrad optimizer to ensure that it treats complex numbers as two real numbers in R^2, as per issue 65711 on GitHub.
ghstack-source-id: 140414042
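
Roughly, the property the test checks, sketched (assuming the optimizer internally views complex parameters as pairs of reals; names and tolerances are illustrative):

```
import torch

zc = torch.randn(3, dtype=torch.complex64, requires_grad=True)
zr = torch.view_as_real(zc.detach()).clone().requires_grad_(True)

opt_c = torch.optim.Adagrad([zc], lr=0.1)
opt_r = torch.optim.Adagrad([zr], lr=0.1)

for _ in range(5):
    opt_c.zero_grad()
    opt_r.zero_grad()
    zc.abs().sum().backward()
    torch.view_as_complex(zr).abs().sum().backward()
    opt_c.step()
    opt_r.step()

# The complex parameter should track its real-pair counterpart step for step.
assert torch.allclose(torch.view_as_real(zc), zr)
```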

Test Plan:
buck test mode/dev caffe2/test:optim -- 'test_adagrad_complex'

https://pxl.cl/1R27M

Reviewed By: albanD

Differential Revision: D31584240

fbshipit-source-id: 5c9938084566b8ea49cc8ff002789731f62fe87e
2021-10-13 07:05:20 -07:00
ecb7b38c00 [PyTorch] Support additional arguments in Python record function (#65736)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65736

We ran into some limitations in extracting PyTorch operator parameters through hooks or the execution graph. Some of these limitations are not due to the operators not exposing the parameters; rather, the inputs for these operators are already fused/processed in some cases (like embedding tables). We want to be able to attach some metadata to the user-scope record functions, allowing profilers to later extract this information.

The record function C++ API already supports taking inputs and outputs information. The corresponding Python interface does not support them and only allows a string name as record function parameter.

This diff adds support for the user to optionally add additional arguments to the record function, in two ways.
1. to remain backward compatible with `record_function_op`, we have added an optional string arg to the interface: `with record_function(name, arg_str)`.
2. to support data dependency graphs, we also have the new `torch.autograd._record_function_with_args_enter` and `torch.autograd._record_function_with_args_exit` functions to provide an interface where we can give additional tensor arguments. For now we imagine this can be used for debugging or analysis purposes. In this form, we currently support some basic data types as inputs: scalars, string, list, and tensor.

Example usage:

```
# record_function operator with a name and optionally, a string for arguments.
with record_function("## TEST 1 ##", "[1, 2, 3]"):
    <actual module or operator>

# more general form of record_function
a = _record_function_with_args_enter("## TEST 2 ##", 1, False, 2.5, [u, u], "hello", u)
<actual module or operator>
_record_function_with_args_exit(a)

```
Corresponding outputs in execution graph:
```
    {
      "name": "## TEST 2 ##", "id": 7, "parent": 3, "fw_parent": 0, "scope": 5, "tid": 1, "fw_tid": 0,
      "inputs": [1,false,2.5,[6,6],"hello",6], "input_shapes": [[],[],[],[[3,4,5],[3,4,5]],[],[3,4,5]], "input_types": ["Int","Bool","Double","GenericList[Tensor(float),Tensor(float)]","String","Tensor(float)"],
      "outputs": [], "output_shapes": [], "output_types": []
    },
    {
      "name": "## TEST 1 ##", "id": 3, "parent": 2, "fw_parent": 0, "scope": 5, "tid": 1, "fw_tid": 0,
      "inputs": ["1, 2, 3"], "input_shapes": [[]], "input_types": ["String"],
      "outputs": [], "output_shapes": [], "output_types": []
    },
```

Test Plan:
```
=> buck build caffe2/test:profiler --show-output
=> buck-out/gen/caffe2/test/profiler#binary.par test_profiler.TestRecordFunction
test_record_function (test_profiler.TestRecordFunction) ... Log file: /tmp/libkineto_activities_1651304.json
Net filter:
Target net for iteration count:
Net Iterations: 3
INFO:2021-09-27 01:10:15 1651304:1651304 Config.cpp:424] Trace start time: 2021-09-27 01:10:30
Trace duration: 500ms
Warmup duration: 5s
Net size threshold: 0
GPU op count threshold: 0
Max GPU buffer size: 128MB
Enabled activities: cpu_op,user_annotation,external_correlation,cuda_runtime,cpu_instant_event
Manifold bucket: gpu_traces
Manifold object: tree/traces/clientAPI/0/1632730215/devvm2060.ftw0/libkineto_activities_1651304.json
Trace compression enabled: 1
INFO:2021-09-27 01:10:15 1651304:1651304 ActivityProfiler.cpp:536] Tracing starting in 14s
INFO:2021-09-27 01:10:15 1651304:1651304 ActivityProfiler.cpp:48] Target net for iterations not specified - picking first encountered that passes net filter
INFO:2021-09-27 01:10:15 1651304:1651304 ActivityProfiler.cpp:57] Tracking net PyTorch Profiler for 3 iterations
INFO:2021-09-27 01:10:15 1651304:1651304 ActivityProfiler.cpp:126] Processing 1 CPU buffers
INFO:2021-09-27 01:10:15 1651304:1651304 ActivityProfiler.cpp:686] Recorded nets:
INFO:2021-09-27 01:10:15 1651304:1651304 ActivityProfiler.cpp:689] PyTorch Profiler: 1 iterations
ok

----------------------------------------------------------------------
Ran 1 test in 0.021s

OK
```

Reviewed By: gdankel

Differential Revision: D31165259

fbshipit-source-id: 15920aaef7138c666e5eca2a71c3bf33073eadc4
2021-10-13 01:49:15 -07:00
9918fd8305 [fx2trt] open source tests for converters (#66361)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66361

ossci will be setup later, fbonly ci is ready

Test Plan:
buck run caffe2/test:fx2trt_test_linear

testinprod

Reviewed By: 842974287

Differential Revision: D31511082

fbshipit-source-id: 9e2c50c83fdba822cd2488eb17b5787d8a57f087
2021-10-13 00:09:43 -07:00
80a3619823 Remove THCTensorMathReduce.cuh (#66389)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66389

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D31547711

Pulled By: ngimel

fbshipit-source-id: c181d14f66536b6873b5b14088312c6c70bf0855
2021-10-12 22:59:19 -07:00
bc6935ddf5 [PyTorch][Distributed][Easy] Make ShardedTensor.size() equivalent to torch.Tensor.size() (#65087) (#66012)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66012

Test Plan: Imported from OSS

Reviewed By: pritamdamania87

Differential Revision: D31345161

Pulled By: fduwjj

fbshipit-source-id: 10d6b65780ab0c6934babcc7c36a181cb66f0b7c
2021-10-12 22:26:22 -07:00
8eb85b5027 Remove THCNumerics (#66388)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66388

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D31547710

Pulled By: ngimel

fbshipit-source-id: 20710328f2e5fc2e931a3f8ba9b4243acc310d54
2021-10-12 22:05:03 -07:00
2d3b23190c Revert D31591512: .github: Enable onnx tests
Test Plan: revert-hammer

Differential Revision:
D31591512 (06a156efc7)

Original commit changeset: 4a8bb3f0e62f

fbshipit-source-id: 2d8580c0e507c2a0b30431bcf30eb01cef82f602
2021-10-12 20:17:02 -07:00
08f3823647 Sparse CSR CUDA: add addmv_out (#61407)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61407

This PR adds `addmv_out_sparse_csr_cuda`. The operation is used to
compute matrix-vector multiplication. Since structured_delegate is used
we only need to implement the out variant; the in-place and normal
variants are autogenerated.
Working on this PR revealed that float16 (and probably bfloat16) inputs
do not work correctly in cuSPARSE, so for this case `addmm` is
used with squeezes and unsqueezes.

cc nikitaved pearu cpuhrsch IvanYashchuk ngimel

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D31584499

Pulled By: ngimel

fbshipit-source-id: 4c507791471ada88969116b88eeaaba7a7536431
2021-10-12 20:06:56 -07:00
8492e6bc6a .github: scheduled -> schedule, fix periodic (#66531)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66531

The `github.event_name` should be `schedule`, not `scheduled`

Reference, https://docs.github.com/en/actions/learn-github-actions/events-that-trigger-workflows#schedule

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D31598136

Pulled By: seemethere

fbshipit-source-id: 4d67f7731b21e05dabc8f54b4ebf9a5d2d3a4e1e
2021-10-12 19:46:01 -07:00
06a156efc7 .github: Enable onnx tests (#66513)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66513

These were missed in the migration of onnx to github actions.

Adds ort tests with 2 shards for the onnx workflow

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D31591512

Pulled By: seemethere

fbshipit-source-id: 4a8bb3f0e62ff98ee77d3d8afc905f4e02db6f24
2021-10-12 19:35:09 -07:00
93d326c868 Add InplaceOrView boxed kernel (#63878)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63878

See https://github.com/pytorch/pytorch/issues/64407, https://github.com/pytorch/pytorch/issues/62032 for context:

In this PR:
 - Add boxed kernel by replicating `gen_inplace_or_view`'s logic that is ONLY for use with the Autograd not-implemented kernel
   - Unlike `gen_inplace_or_view` we always pass a view_func to as_view in order to ensure that a "derivative is not implemented" error is raised even if an in-place update is performed on the view. Without the `view_func`, the CopySlice + AsStridedBackward nodes would replace the NotImplemented node.
   - This limitation makes it unsuitable for general use
   - the view relationship must be between the first input (which must be a tensor) and the first output (which may be a tensor or a vector of tensors)
   - non-differentiable views (`_values`, `_indices`, `view.dtype`) are not supported; the view relationship is always forward- and backward-differentiable
 - Adds the macro `#define REGISTER_AUTOGRAD_NOT_IMPLEMENTED_FALLBACK(ns, op)` to be the interface for this feature:
   - static initialization can be slowed down (not measured) if there are many registrations, because each line translates to 2 library calls; the workaround is to manually use the two functions `AutogradNotImplementedFallback` and `ADInplaceOrViewFallback` and call `m.impl`.
 - Adds testing:
    - for views: view relationship created
      -  performing in-place operation on the view, raises properly
      - trying to create two view relationships is not allowed,
      - single view relationship but not first input/first output should error
      - view relation created properly for tensor vector output
    - for in-place:
      - version count bump
      - triggers rebase_history
       - multiple mutations are okay and also update the version counter
 - TODO (follow up): Update tutorials for adding  third-party operators (and document the above limitations)
 - TODO (follow up): Look at torch-audio/torch-vision and identify places where this can simplify existing code

EDIT: Made it more clear what is introduced in this PR and moved some more contextual stuff into the issue itself

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D30901714

Pulled By: soulitzer

fbshipit-source-id: 48de14c28be023ff4bd31b7ea5e7cba88aeee04c
2021-10-12 18:55:50 -07:00
40794dbb25 add backend_config_dict to checkGraphModeFxOp (#66499)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66499

Test Plan: Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D31582518

Pulled By: rahxephon89

fbshipit-source-id: b8107bb7140517f2dc32bf692c6b916536ea35c3
2021-10-12 18:35:54 -07:00
d32736e317 Make permission errors more human readable (#66492)
Summary:
`_mkdir_p` feels like a remnant of the Python 2 era; add an `exist_ok` argument and re-raise `OSError` to make it more human-readable.

After the change, an attempt to build PyTorch in a folder that does not have write permissions will result in:
```
% python3.6 setup.py develop
Building wheel torch-1.10.0a0+git9509e8a
-- Building version 1.10.0a0+git9509e8a
Traceback (most recent call last):
  File "/Users/nshulga/git/pytorch-worktree/tools/setup_helpers/cmake.py", line 21, in _mkdir_p
    os.makedirs(d, exist_ok=True)
  File "/opt/homebrew/Cellar/python36/3.6.2+_254.20170915/Frameworks/Python.framework/Versions/3.6/lib/python3.6/os.py", line 220, in makedirs
    mkdir(name, mode)
PermissionError: [Errno 13] Permission denied: 'build'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "setup.py", line 895, in <module>
    build_deps()
  File "setup.py", line 370, in build_deps
    cmake=cmake)
  File "/Users/nshulga/git/pytorch-worktree/tools/build_pytorch_libs.py", line 63, in build_caffe2
    rerun_cmake)
  File "/Users/nshulga/git/pytorch-worktree/tools/setup_helpers/cmake.py", line 225, in generate
    _mkdir_p(self.build_dir)
  File "/Users/nshulga/git/pytorch-worktree/tools/setup_helpers/cmake.py", line 23, in _mkdir_p
    raise RuntimeError(f"Failed to create folder {os.path.abspath(d)}: {e.strerror}") from e
RuntimeError: Failed to create folder /Users/nshulga/git/pytorch-worktree/build: Permission denied
```
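
The fixed helper, reconstructed from the lines visible in the traceback above, boils down to:

```python
import os

def _mkdir_p(d: str) -> None:
    try:
        os.makedirs(d, exist_ok=True)
    except OSError as e:
        # Re-raise with an absolute path and the OS error string, so the
        # user sees one clear message instead of a raw PermissionError.
        raise RuntimeError(
            f"Failed to create folder {os.path.abspath(d)}: {e.strerror}"
        ) from e
```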

Fixes https://github.com/pytorch/pytorch/issues/65920

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66492

Reviewed By: seemethere

Differential Revision: D31578820

Pulled By: malfet

fbshipit-source-id: afe8240983100ac0a26cc540376b9dd71b1b53af
2021-10-12 18:31:24 -07:00
d921891f57 GHA: Stop skipping periodic jobs (#66264)
Summary:
they have been skipped for too long
![image](https://user-images.githubusercontent.com/31798555/136433267-f35c0507-23ab-4348-be43-78d299c3d654.png)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66264

Reviewed By: dagitses, malfet, seemethere

Differential Revision: D31478705

Pulled By: janeyx99

fbshipit-source-id: 1324b123e3f8646e5cd671af4c1850398a6f6e3b
2021-10-12 14:39:47 -07:00
3ac2c74896 Revert D31082208: Use shared CUPTI by default
Test Plan: revert-hammer

Differential Revision:
D31082208 (8b0eae5aa8)

Original commit changeset: 14f66af92084

fbshipit-source-id: 0faff00832b7f79d476fd1f9f505142a548a76db
2021-10-12 14:37:54 -07:00
9984f4bb8b Remove native_functions.yaml dependency from some reduction operators (#64173)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64173

This one also required restructuring the code a bit to move the kernel
code into separate files. So, I've mainly focused on CUDA, which is
where the real build-time issues are.

Test Plan: Imported from OSS

Reviewed By: jbschlosser, ezyang

Differential Revision: D30728581

Pulled By: dagitses

fbshipit-source-id: a69eea5b4100d16165a02660dde200c8f648683d
2021-10-12 13:11:24 -07:00
ee38a467ea fix normal with empty std (#66463)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/65709

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66463

Reviewed By: navahgar

Differential Revision: D31561904

Pulled By: ngimel

fbshipit-source-id: 3b2f44dc0ec075fe4f9685696578a0ff6e58d501
2021-10-12 11:28:11 -07:00
8b0eae5aa8 Use shared CUPTI by default (#65401)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65401

Per https://github.com/pytorch/pytorch/issues/57744, statically linked CUPTI
causes exception handling to break on certain compiler configurations, likely
because CUPTI comes with incompatible libstdc++ symbols.  Rather than pray that
something reasonable happens, use the safer configuration (dynamic linking) by
default and give a warning if the user inverts the setting.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: gdankel

Differential Revision: D31082208

Pulled By: ezyang

fbshipit-source-id: 14f66af920847e158436b5801c43f3124b109b34
2021-10-12 11:01:40 -07:00
c6216b2a43 Back out "Revert D30710710: [Pytorch Edge] Support profiling kineto events from external source" (#66421)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66421

Original commit changeset: ab6bb8fe4e83

Plus this includes the BUILD.bazel changes, the reason for the revert.

Test Plan: See original diff

Reviewed By: gdankel

Differential Revision: D31542513

fbshipit-source-id: ee30aca2d6705638f97e04b77a9ae31fe5cc4ebb
2021-10-12 10:55:29 -07:00
d7916e3734 [jit] Eliminate malloc & recursive refcount bumps in HashType::operator() (#65348)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65348

Previously, this took several percent of model loading time. Now it is well under 1%.

We get this savings by avoiding allocating a vector and avoiding reference count bumps on contained types within each type.
ghstack-source-id: 140148562

Reviewed By: suo

Differential Revision: D31057278

fbshipit-source-id: 55a02cbfefb8602e41baddc2661d15385fb2da55
2021-10-12 10:51:17 -07:00
47c531b6e8 [jit] Compare object identity first in ClassType::operator== (#65347)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65347

This check is much cheaper than anything involving actually inspecting object fields (i.e., the cost is low), and if it succeeds we can skip the expensive (e.g., it involves locking a weak_ptr and then destroying the resulting shared_ptr)  function body. It almost entirely eliminates time spent in this function during model loading according to perf.
ghstack-source-id: 140148561

Test Plan: Specifically I profiled static runtime startup for the ctr_mobile_feed model and saw self time in this function go from 2-3% to 0.36%.

Reviewed By: ejguan

Differential Revision: D31057279

fbshipit-source-id: efb6bdc0957b680112ac282e85dc1b06b1b6c0bd
2021-10-12 10:49:36 -07:00
17e79bc76c remove is_reference from all is_output_quantized (#66456)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66456

Test Plan: Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D31562633

Pulled By: rahxephon89

fbshipit-source-id: 85c73a23e90ba9c1406f4027d447fbbe4576e39a
2021-10-12 10:43:52 -07:00
702fb1de72 [fx2trt] open source tests for acc tracer (#66302)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66302

Just move files; ossci can be set up later

Test Plan:
buck run //caffe2/test:test_fx_acc_tracer

testinprod

Reviewed By: 842974287

Differential Revision: D31495087

fbshipit-source-id: f182c7438e3e80ba98924990682cb45a99b9967c
2021-10-12 10:27:34 -07:00
a6eec0c60f Upgrade onnx submodule to 85546f8c44e627f8ff1181725d03cc49f675e44f (#66427)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66427

Update the onnx submodule, so https://github.com/pytorch/pytorch/pull/66140 can land.

Test Plan: ci

Reviewed By: ezyang

Differential Revision: D31544610

fbshipit-source-id: 94831ef531bbd654a6aeb744cd53a38155848079
2021-10-12 09:46:08 -07:00
e6261083f9 [FX] fuse permute021 linear pass for trt lowering (#66362)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66362

In general we cannot rely on Permute021Linear being kept as-is before the lowering phase, because our transformation could have traced through this module. An acc-based fx pass is a more reliable way to recover the perf.

Test Plan:
```
buck run mode/opt -c python.package_style=inplace -c fbcode.nvcc_arch=a100 //hpc/new/models/ads/benchmarks:ads_dense_benchmark -- over-arch --model-version=23x_3tb --batch-size=2048

OverArch, PyTorch, FP16, BS: 2048, TFLOP/s: 53.22, Time per iter: 14.46ms, QPS: 141629.45
OverArch, TensorRT, FP16, BS: 2048, TFLOP/s: 92.20, Time per iter: 8.35ms, QPS: 245354.15
```

Unittest:
```
buck test mode/dev-nosan caffe2/torch/fb/fx2trt:test_fuse_permute_linear_trt
```

Reviewed By: jianyuh, wushirong, 842974287

Differential Revision: D31525307

fbshipit-source-id: b472a8c277aa4d156d933d6a5abec091133f22c5
2021-10-12 09:41:32 -07:00
8818dda237 Fix lstsq to work with inputs that require grad (#66426)
Summary:
I updated `sample_inputs_linalg_lstsq`, and `test_nondifferentiable`
now correctly reveals the failure. The internal assert error was thrown
because autograd attempts to mark an integer tensor as differentiable.
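
A minimal sketch of the failing pattern (assuming the repro matches the linked issue; the shapes are illustrative):

```python
import torch

a = torch.randn(3, 3, requires_grad=True)
b = torch.randn(3, 2, requires_grad=True)
# linalg.lstsq returns (solution, residuals, rank, singular_values);
# `rank` is an integer tensor, which autograd must not try to mark as
# requiring grad even when the inputs do.
solution, residuals, rank, singular_values = torch.linalg.lstsq(a, b)
assert not rank.requires_grad
```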

Fixes https://github.com/pytorch/pytorch/issues/66420.

cc ezyang albanD zou3519 gqchen pearu nikitaved soulitzer Lezcano Varal7

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66426

Reviewed By: ejguan

Differential Revision: D31550942

Pulled By: albanD

fbshipit-source-id: 4a0ca60e62c5e9bb96af5020541da2d09ea3e405
2021-10-12 08:52:21 -07:00
213ac4e59c Remove native_functions.yaml dependency from PointwiseOps (#64172)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64172

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D30728584

Pulled By: dagitses

fbshipit-source-id: 2ae9686ac7c312e2d470d26a3cad12afcf7ef47b
2021-10-12 08:12:25 -07:00
8674a3c6e3 Remove native_functions.yaml dependency from PowKernel (#64171)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64171

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D30728583

Pulled By: dagitses

fbshipit-source-id: ea6891a3598eead93daea620b94e50d3a3b248cf
2021-10-12 08:12:23 -07:00
1841f76cc0 Remove native_functions.yaml dependency from unary ops (#64170)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64170

Test Plan: Imported from OSS

Reviewed By: gchanan, ezyang

Differential Revision: D30728578

Pulled By: dagitses

fbshipit-source-id: 70baa90d0834e68324504c74064a1d1790193483
2021-10-12 08:11:03 -07:00
71e17d9827 [DataPipe] Fix HttpReader IterDataPipe Issue with streaming (#66432)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66432

This PR aims to fix the same issue that was addressed in TorchData.

See this [TorchData PR](https://github.com/pytorch/data/pull/51) and the corresponding [issue](https://github.com/pytorch/data/issues/42) for details.

cc VitalyFedyunin ejguan NivekT

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D31547565

Pulled By: NivekT

fbshipit-source-id: 1e0cb13be270e6b81a11af54fa08cf6d7e7c5721
2021-10-12 07:37:57 -07:00
5f1518609b [TensorExpr] Fix lowering for aten::t. (#65859)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65859

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D31289347

Pulled By: ZolotukhinM

fbshipit-source-id: b9648416238657fe23366928e43ed63e992a8973
2021-10-12 01:26:36 -07:00
6864146f2b [TensorExpr] Fix lowerings for aten::view and aten::reshape. (#65852)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65852

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D31286024

Pulled By: ZolotukhinM

fbshipit-source-id: eb5b5f2ed86b6f325f09904e841815b8183b4e1d
2021-10-12 01:26:34 -07:00
60a2a295ce [TensorExpr] Use schema instead of op name in NNC lowerings. (#65843)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65843

Fixes #64963.

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D31282334

Pulled By: ZolotukhinM

fbshipit-source-id: ffd0e1b6433d9360fedd9081c01ef41b21684439
2021-10-12 01:26:32 -07:00
24b9b304d9 [TensorExpr] Nuke TE shape inference. (#65554)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65554

We're relying on JIT based shape inference and not using the TE
implementation.

Question to the audience: we set `hasBroadcasts_` in that function, but
this function was almost never invoked. Do we behave correctly in the
presence of rand-calls and broadcasts?

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D31148925

Pulled By: ZolotukhinM

fbshipit-source-id: 2898a57e389ea0950163122089d0fec3d92701c4
2021-10-12 01:25:14 -07:00
18e4688199 [Pytorch Edge] Improve bundled inputs name error handling (#65856)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65856

Occasionally functions don't have the `__name__` attribute set and have `name` set instead. Not sure why this happens, but this should catch it.

Test Plan: ci

Reviewed By: iseeyuan

Differential Revision: D31286787

fbshipit-source-id: 8a339541215329b6e9ff43ef77363be41f19c5ca
2021-10-12 00:08:39 -07:00
2d1552824a Revert D31386275: Migrate THCState to ATen
Test Plan: revert-hammer

Differential Revision:
D31386275 (a6774d6e1f)

Original commit changeset: 5c1f1bbe8c3d

fbshipit-source-id: bea4e80fb0bdc57e8bb6a8ee781afd224adf4ed0
2021-10-11 22:30:08 -07:00
d8532e3524 [PyTorch] Split c10 Type.cpp into two files to allow targets to include one of them (#66445)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66445

`Type.cpp` implements the `demangle()` function based on the macro `HAS_DEMANGLE`. This diff splits it into two `.cpp` files so that we can add either one into the build target. This change follows the pattern of `flags_use_no_gflags.cpp` and `flags_use_gflags.cpp`.

Test Plan: Rely on CI

Reviewed By: iseeyuan

Differential Revision: D31551432

fbshipit-source-id: f8b11783e513fa812228ec873459ad3043ff9147
2021-10-11 21:52:24 -07:00
07ec250fd7 [deploy] fix oss build (#66347)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66347

It turns out that our hard-coded build flavor that we were running
deploy tests on in CI no longer exists lol. This PR fixes the OSS build
and also updates the build flavor.

Differential Revision:
D31517679
D31517679

Test Plan: Imported from OSS

Reviewed By: malfet, shunting314

Pulled By: suo

fbshipit-source-id: 763f126a3304f82e6dff7cff8c56414d82c54de3
2021-10-11 21:48:26 -07:00
9a85167d22 Fix batch_isend_irecv tests for err case (#63112)
Summary:
- `batch_isend_irecv` returns a list of requests instead of a single request (see the usage sketch below)
- remove some unused variables
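
A minimal usage sketch (assumes an initialized default process group with at least two ranks, and tensors on the appropriate device for the backend):

```python
import torch
import torch.distributed as dist

rank = dist.get_rank()
peer = (rank + 1) % dist.get_world_size()
send_t = torch.ones(1)
recv_t = torch.zeros(1)
ops = [
    dist.P2POp(dist.isend, send_t, peer),
    dist.P2POp(dist.irecv, recv_t, peer),
]
reqs = dist.batch_isend_irecv(ops)  # a list of requests, one per op
for req in reqs:
    req.wait()
```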

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63112

Reviewed By: pbelevich, wayi1, fduwjj

Differential Revision: D30921265

fbshipit-source-id: e2075925172805d33974ef0de6fb631bdf33b5ea
2021-10-11 19:39:49 -07:00
3eb9443619 [FX] Fix issue where GraphModule.delete_all_unused_submodules deletes submodules from called leaf modules (#66430)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66430

On the whole, I'm not totally satisfied with this approach. I think we should be building a prefix tree data structure during initial iteration over the submodules and querying that when deleting submodules. But I think this approach works and I want to see if we can get it in before 1.10

Test Plan: Imported from OSS

Reviewed By: Chillee

Differential Revision: D31546137

Pulled By: jamesr66a

fbshipit-source-id: f08b8409a3cf511277017ccccb916097b7c4c4fe
2021-10-11 19:37:51 -07:00
a6774d6e1f Migrate THCState to ATen (#65948)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65948

This guts `THCState` to simply be an empty struct, as well as:
- moving `THCState_getPeerToPeerAccess` and its cache into `ATen`.
- cleaning up dead code in `THCGeneral.cpp`
- moving `THCudaInit` and `THCMagma_init` into `CUDAHooks::initCUDA`

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D31386275

Pulled By: ngimel

fbshipit-source-id: 5c1f1bbe8c3d2d9f5b99996e0588fb7f07fa6a77
2021-10-11 19:31:43 -07:00
e7b5712c21 Call PyArray_Check only if NumPy is available (#66433)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/66353

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66433

Reviewed By: seemethere, janeyx99

Differential Revision: D31548290

Pulled By: malfet

fbshipit-source-id: 3b094bc8195d0392338e0bdc6df2f39587b85bb3
2021-10-11 19:25:31 -07:00
565cf47abf Quantization docs: add pages for Numeric Suite (Eager and FX) (#66380)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66380

Description:
1. creates doc pages for Eager and FX numeric suites
2. adds a link from main quantization doc to (1)
3. formats docblocks in Eager NS to render well
4. adds example code and docblocks to FX numeric suite

Test Plan:
```
cd docs
make html
python -m http.server
// renders well
```

Reviewed By: jerryzh168

Differential Revision: D31543173

Pulled By: vkuzo

fbshipit-source-id: feb291bcbe92747495f45165f738631fa5cbffbd
2021-10-11 18:47:58 -07:00
8b1258698e Improve quantization API docs (#66379)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66379

Description:

Creates a quantization API reference and fixes all the docblock errors.

This is #66122 to #66210 squashed together

Test Plan:
```
cd docs
make html
python -m http.server
// open webpage, inspect it, looks good
```

Reviewed By: ejguan

Differential Revision: D31543172

Pulled By: vkuzo

fbshipit-source-id: 9131363d6528337e9f100759654d3f34f02142a9
2021-10-11 18:46:11 -07:00
88ed93c2ca Fix type checking errors in torch/quantization/fx/qconfig_utils.py (#66428)
Summary:
- [x] Fix the Pyre type checking errors in `torch/quantization/fx/qconfig_utils.py`
```
torch/quantization/fx/qconfig_utils.py:241:46 Incompatible variable type [9]: prepare_custom_config_dict is declared to have type `Dict[str, typing.Any]` but is used as type `None`.
torch/quantization/fx/qconfig_utils.py:267:46 Incompatible variable type [9]: convert_custom_config_dict is declared to have type `Dict[str, typing.Any]` but is used as type `None`.
torch/quantization/fx/qconfig_utils.py:284:43 Incompatible variable type [9]: fuse_custom_config_dict is declared to have type `Dict[str, typing.Any]` but is used as type `None`.
```
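
The usual fix for this class of error is to declare the parameter `Optional` and supply the default inside the function body. A generic sketch of the pattern (the function and parameter names below are illustrative, not the actual ones in `qconfig_utils.py`):

```python
from typing import Any, Dict, Optional

def prepare(custom_config_dict: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    # The parameter is Optional; after this check the local value never is,
    # so the declared types stay honest for the type checker.
    if custom_config_dict is None:
        custom_config_dict = {}
    return custom_config_dict
```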
Fixes the issue: [MLH-Fellowship/pyre-check/issues/73](https://github.com/MLH-Fellowship/pyre-check/issues/73)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66428

Reviewed By: grievejia

Differential Revision: D31545215

Pulled By: 0xedward

fbshipit-source-id: 767ae7888854c2eec2ecf14855a5b011110b9271
2021-10-11 16:48:11 -07:00
25965619dd Back out "Revert D31495086: open source engine_layer_visualize.py" (#66431)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66431

Original commit changeset: 186f3407a642

Test Plan: testinprod

Reviewed By: 842974287

Differential Revision: D31546998

fbshipit-source-id: 4bc131d895cc4a7a84a4ff277df5f99e69ef4346
2021-10-11 16:06:23 -07:00
ae5a9a451f Do not enforce unused vars rule for torch_deploy (#66447)
Summary:
Followup after  https://github.com/pytorch/pytorch/pull/66041

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66447

Reviewed By: seemethere

Differential Revision: D31554356

Pulled By: malfet

fbshipit-source-id: 6638324dcf658f4b244da285b4360ff2e2e2c013
2021-10-11 15:24:19 -07:00
7baf4f6b12 Chunk: Converter (#66028)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66028

Added converter and unit test for torch.chunk function

Test Plan: buck test mode/dev-nosan caffe2/torch/fb/fx2trt:test_gelu

Reviewed By: 842974287

Differential Revision: D31345180

fbshipit-source-id: 9425685671b474449e825aa2a8e7e867a329eb6e
2021-10-11 14:33:25 -07:00
cc24e4e5d0 [NNC] Normalize loops in SplitWithTail (#66242)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66242

While working on random test generation, I observed that many simple transformations were upsetting vectorization. Digging deeper, I found that vectorization calls SplitWithTail, which incorrectly splits the loop when the loop start is not zero. This patch normalizes the loop before we start splitting it.
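
A plain-Python sketch of what normalization buys here (illustrative only; the actual transform operates on NNC's IR):

```python
# Splitting assumes the loop runs over [0, n); a loop over [start, stop)
# must first be normalized via i = start + j with j in [0, stop - start).
def split_with_tail(start, stop, factor):
    n = stop - start                       # normalized trip count
    main_iters, tail = divmod(n, factor)
    visited = []
    for outer in range(main_iters):        # vectorizable main loop
        for inner in range(factor):
            visited.append(start + outer * factor + inner)
    for t in range(tail):                  # tail loop
        visited.append(start + main_iters * factor + t)
    return visited

assert split_with_tail(3, 10, 4) == list(range(3, 10))
```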

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D31506853

Pulled By: anijain2305

fbshipit-source-id: 5c5f2568ce0a239bfaa515458be52541eafd23b1
2021-10-11 13:44:05 -07:00
49f1605392 [RFC] Reduce logging noise from AdagradOptimizer (#66443)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66443

For some reason, this logging is adding noise to a lot of flow jobs. I am not sure if this is actually needed.
This is called from `__init__`, so it's logged all the time and logs all key:value pairs in the current local symbol table.

Test Plan: N/A

Reviewed By: chowarfb

Differential Revision: D31534372

fbshipit-source-id: bed032b66fed548c97a6f66b1b9e905fd2738851
2021-10-11 13:25:41 -07:00
c03f851750 [torchelastic] Fix failing tests (#66440)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66440

* Set correct name for test worker executable
* Remove `test_get_override_executable` from oss; there is already a test that covers this functionality

Test Plan: buck test mode/dev-nosan //caffe2/test/distributed/launcher/fb:launch_test

Reviewed By: d4l3k

Differential Revision: D31544853

fbshipit-source-id: e1e009b4b38830d3a78981f8f93c2314ed851695
2021-10-11 13:06:36 -07:00
1d14fbdad7 [TensorExpr] Adding missing python binding for operators (#66336)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66336

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D31544865

Pulled By: anijain2305

fbshipit-source-id: 04be6cf079efc952d0f0b1e68f7f4da4a19c64fa
2021-10-11 12:47:41 -07:00
08fab7ae13 Wextra fix for Integration.cpp (#66321)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66321

Fixes
```
stderr: caffe2/aten/src/ATen/native/Integration.cpp:62:27: error: comparison of integers of different signs: 'size_t' (aka 'unsigned long') and 'int64_t' (aka 'long') [-Werror,-Wsign-compare]
    if (curr_shape.size() >= target_n_dim)
        ~~~~~~~~~~~~~~~~~ ^  ~~~~~~~~~~~~
```

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D31505347

fbshipit-source-id: 100b76215f78c3ce75bf4a993715a6767189747d
2021-10-11 12:30:46 -07:00
8c468ce00b [PyTorch][JIT] Return a reference from caching specializations of getTypePtr (#66342)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66342

`decltype(auto)` in D31486117 (fb5a80ffd8) wasn't the right choice in these specializations, because it will *still* deduce a copy.
See https://godbolt.org/z/GjbcPE1c4 for example.
ghstack-source-id: 140144199

Test Plan: CI, added new static_assert to make sure we got it right for std::tuple in particular

Reviewed By: hlu1, JasonHanwen

Differential Revision: D31514960

fbshipit-source-id: cae722aa34345b590c46eae478229cb5f4b0d7dc
2021-10-11 12:17:50 -07:00
998cb98844 [PyTorch][jit] Cache TupleType objects in getTypePtr (#66340)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66340

For functions that take `std::vector`s with `std::tuple`s in them, `getTypePtr` can get hit on every call, in which case creating a new `TupleType` object every time is expensive.
ghstack-source-id: 140143104

Test Plan: CI

Reviewed By: hlu1, JasonHanwen

Differential Revision: D31514792

fbshipit-source-id: 23652ca90ba1259afc05e953b99ce1fe1bebcc2b
2021-10-11 12:16:31 -07:00
acb0157a3d Specialization for c10::util:get_type_index<std::string> (#66290)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66290

Add full specialization for std::string type index

It slightly speeds up compilation as well as solving the ambiguity of how template instantiations implemented in inline namespaces are rendered during `__PRETTY_FUNCTION__` computation.

Not sure what `#pragma` controls this behaviour, but when code is compiled by clang-12+ using libstdc++, `__PRETTY_FUNCTION__` sometimes resolves `std::string` to `std::basic_string<char>` and sometimes to `std::__cxx11::basic_string<char>`, even though in the object file the symbol is always inside the `std::__cxx11::` namespace, which might break caffe2 serialization code that depends on dynamic hash generation.

Template name resolution was debugged using https://gist.github.com/malfet/c83b9ebd35730ebf8bac7af42682ea37

(Note: this ignores all push blocking failures!)

Test Plan: CI

Reviewed By: r-barnes

Differential Revision: D31490050

fbshipit-source-id: 127091574cf6b92c7ec3f972821e4e76f5f626a9
2021-10-11 11:11:59 -07:00
901df0cc22 Skip test_nccl_errors_nonblocking (#66394)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66394

Skips this test as it currently does not seem to pass after several
internal local runs.
ghstack-source-id: 140210583

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D31534806

fbshipit-source-id: 799849a6a715506a85c9697b46f7098d9b71b32e
2021-10-11 10:08:31 -07:00
221c308389 Wextra fix for LossCTC.cpp (#66381)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66381

Fixes
```
stderr: caffe2/aten/src/ATen/native/cudnn/LossCTC.cpp:83:37: error: comparison of integers of different signs: 'size_t' (aka 'unsigned long') and 'const long' [-Werror,-Wsign-compare]
  TORCH_CHECK(input_lengths_.size() == batch_size, "input_lengths needs to have size to match batch_size");
              ~~~~~~~~~~~~~~~~~~~~~ ^  ~~~~~~~~~~
```

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D31510217

fbshipit-source-id: e3585e08650950c08d80d347dfae375aedf2ceaf
2021-10-11 10:02:53 -07:00
736fa09a9a [Static Runtime] Manage output tensors (#65515)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65515

This change enables `StaticRuntime` to manage output tensors (returned from a graph) as follows:

- At the creation of `StaticModule`, it gathers a set of candidates for output tensors (& their aliases) for managing. This is done by `ValueGroup` introduced by the previous diff.
- At the end of the 1st iteration, `MemoryPlanner` creates a set of output `at::Tensor*` to manage. This set consists of tensor objects from the aforementioned candidates, excluding the direct output value of the graph to simplify ivalue ownership passing (`std::move(ivalue)` to return from SR). Note that this exclusion has no perf implication for inline_cvr & ctr_mobilefeed since they only return a container object (e.g., tuple).
- The 2nd+ iterations preallocate a slab of memory for all output tensors identified during the 1st iteration. Note that these preallocated tensors are *NOT* deallocated when returned from SR. The client receives the output tensors, finishes using them, and is responsible for calling `StaticRuntime::deallocateOutputTensors()` to deallocate them. This mandates that SR cannot be reentered until `deallocateOutputTensors` is called by the client.
- In case of a buggy client missing a call to `StaticRuntime::deallocateOutputTensors()`, SR throws an exception when reentered instead of leaking memory.
- Nit: I plan to use camelCase for function names, so all newly introduced functions use camelCase despite inconsistencies with snake_case. We can gradually fix the inconsistencies.

This change will be followed by another one to enable `manage_output_tensors` from `PyTorchScriptPredictor`, starting with `ptvsc2_prediction_bench` as a testbed.

Test Plan:
- Added `StaticRuntime.ManageOutputTensors*` to cover the newly added code paths.

- Enhanced `testStaticRuntime` to exercise each unittest test case with `manage_output_tensors` on. Confirmed that SR actually managed output tensors successfully for a few existing testcases (e.g., `StaticRuntime.EmbeddingBag`).

Reviewed By: hlu1

Differential Revision: D31049221

fbshipit-source-id: 4ad1599179cc7f00d29e0ce41b33f776226d4383
2021-10-11 09:50:54 -07:00
3b4b1b2d23 .github: Remove confusing ciflow_config.enabled variable (#66260)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66260

Every workflow has ciflow enabled so this is not needed anymore

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: dagitses, janeyx99

Differential Revision: D31493340

Pulled By: seemethere

fbshipit-source-id: 8718fe5d22f4be6e0900962576782a9f23162a39
2021-10-11 09:39:31 -07:00
c66847afbe Add workaround for nvcc header dependencies bug (#62550)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62550

I noticed that running the build twice in a row resulted in ~80 CUDA files being
rebuilt. Running `ninja -d explain` shows
```
ninja explain: TH/generic/THStorage.h is dirty
ninja explain: TH/generic/THStorageCopy.h is dirty
ninja explain: THC/generic/THCStorage.h is dirty
ninja explain: THC/generic/THCStorageCopy.h is dirty
ninja explain: TH/generic/THTensor.h is dirty
ninja explain: THC/generic/THCTensor.h is dirty
ninja explain: THC/generic/THCTensorCopy.h is dirty
ninja explain: THC/generic/THCTensorMath.h is dirty
ninja explain: THC/generic/THCTensorMathMagma.h is dirty
ninja explain: THC/generic/THCTensorMathPairwise.h is dirty
ninja explain: THC/generic/THCTensorScatterGather.h is dirty
```

considering `ninja` is working relative to the `build` folder, these files don't
actually exist. I traced this back to the output of `nvcc -MD` containing
paths relative to the include directory, instead of being absolute.

This adds a little script to launch the compiler then resolve any relative paths
in the `.d` file before `ninja` looks at it. To use it, I run the build with
```
export CMAKE_CUDA_COMPILER_LAUNCHER="python;`pwd`/tools/nvcc_fix_deps.py;ccache"
```
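
A condensed sketch of the script's idea (illustrative; the real `tools/nvcc_fix_deps.py` handles more cases, and the `-MF`/`-I` argument parsing below is an assumption):

```python
import subprocess
import sys
from pathlib import Path

def fix_dep_file(dep_file: Path, include_dirs) -> None:
    # Rewrite each relative path in the .d file that resolves under one
    # of the include directories to its absolute form.
    fixed = []
    for line in dep_file.read_text().splitlines():
        toks = []
        for tok in line.split():
            p = Path(tok)
            if not p.is_absolute():
                for inc in include_dirs:
                    if (inc / p).is_file():
                        p = (inc / p).resolve()
                        break
            toks.append(str(p))
        fixed.append(" ".join(toks))
    dep_file.write_text("\n".join(fixed) + "\n")

if __name__ == "__main__":
    args = sys.argv[1:]  # the real compiler command line, e.g. ccache nvcc ...
    ret = subprocess.call(args)
    if ret == 0 and "-MF" in args:
        incs = [Path(args[i + 1]) for i, a in enumerate(args) if a == "-I"]
        fix_dep_file(Path(args[args.index("-MF") + 1]), incs)
    sys.exit(ret)
```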

There are some possible pit-falls here. The same relative path might work for
two include directories, and the compiler could pick a different one. Or,
the compiler might have additional implicit include directories that are needed
to resolve the path. However, this has worked perfectly in my testing and it's
completely opt-in so should be fine.

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D31503351

Pulled By: malfet

fbshipit-source-id: b184c4526679d976b93829b5715cafcb1c7db2ae
2021-10-11 09:07:12 -07:00
c373387709 Update CMake and use native CUDA language support (#62445)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62445

PyTorch currently uses the old style of compiling CUDA in CMake which is just a
bunch of scripts in `FindCUDA.cmake`. Newer versions support CUDA natively as
a language just like C++ or C.

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D31503350

fbshipit-source-id: 2ee817edc9698531ae1b87eda3ad271ee459fd55
2021-10-11 09:05:48 -07:00
d3b29afbb6 Remove old code that is unused in test/ (#66331)
Summary:
.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66331

Reviewed By: gchanan

Differential Revision: D31533549

Pulled By: albanD

fbshipit-source-id: 5addd11edc4199a88f10f0ff236be59ec2289903
2021-10-11 08:45:24 -07:00
4775419850 [BE] Address feedback from #66296 (#66315)
Summary:
Also use range loop instead of regular one

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66315

Reviewed By: albanD

Differential Revision: D31503730

Pulled By: malfet

fbshipit-source-id: f5568f7f28e15a9becd27986dd061a6fcae34651
2021-10-11 08:39:29 -07:00
822c0850cb fix pybind issue for get_autocast_cpu_dtype and get_autocast_gpu_dtype (#66396)
Summary:
There is an issue when calling **torch.get_autocast_cpu_dtype** and **torch.get_autocast_gpu_dtype**:
```
>>> torch.get_autocast_gpu_dtype()==torch.half
False
>>> torch.get_autocast_cpu_dtype()==torch.bfloat16
False
```
but the expected results should be:
```
>>> torch.get_autocast_gpu_dtype()==torch.half
True
>>> torch.get_autocast_cpu_dtype()==torch.bfloat16
True
```

This PR is about fixing this issue.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66396

Reviewed By: ejguan

Differential Revision: D31541727

Pulled By: albanD

fbshipit-source-id: 1a0fe070a82590ef2926a517bf48046c2633d168
2021-10-11 08:34:48 -07:00
1b40daac74 pinv: forward/backward AD which is Frechet-defined in a rank-preserving neighborhood. (#66092)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/65911. Also enables complex support/tests for `linalg_pinv` in OpInfo.
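
For context, the classical differential of the pseudo-inverse, valid wherever the rank is locally constant (the "rank-preserving neighborhood" of the title), is the standard identity (stated here for reference, not taken from the PR):

```latex
dA^{+} = -A^{+}\,dA\,A^{+}
         + A^{+}A^{+\mathsf{H}}\,dA^{\mathsf{H}}\,(I - AA^{+})
         + (I - A^{+}A)\,dA^{\mathsf{H}}\,A^{+\mathsf{H}}A^{+}
```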

cc ezyang albanD zou3519 gqchen pearu nikitaved soulitzer Lezcano Varal7 jianyuh mruberry walterddr IvanYashchuk xwang233

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66092

Reviewed By: ejguan

Differential Revision: D31503072

Pulled By: albanD

fbshipit-source-id: 52018e826826ae62beaad76becb5edf880be253f
2021-10-11 08:33:28 -07:00
7c2f53b363 [BE] set pretrained=False for onnx tests (#66312)
Summary:
Addresses this network risk mitigation mentioned in https://github.com/pytorch/pytorch/issues/65439#issuecomment-924627239.

I didn't include any mobile app/benchmarking changes because I think the pretrained matters there.

I ended up removing the changes in test_utils because those were sensitive to the pretrained variable.

I am saving the quantization test changes for another PR because they are currently disabled.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66312

Reviewed By: ejguan

Differential Revision: D31542992

Pulled By: janeyx99

fbshipit-source-id: 57b4f70247af25cc96c57abd9e689c34641672ff
2021-10-11 08:29:11 -07:00
1d9a6862cd fx quant: add a BC test for loading old torch.package models (#65538)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65538

Adds a test which verifies that `prepare_fx` and `convert_fx` work
on models created by `torch.package` in the past.  In detail:

1. (one time) create a model and save it with torch.package (see the sketch after this list). Also save input,
expected output, and names of quantization related get_attrs added by
our passes.
2. (every time) load the model from (1), and verify that expected output
matches current output, and that get_attr targets did not change.
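
A minimal sketch of the packaging half of step (1), using `torch.package` (names and paths here are illustrative, not the ones used by the actual test):

```python
import torch
from torch.package import PackageExporter, PackageImporter

def save_reference(model, path="linear_relu_reference.pt"):
    # One-time step: freeze the model into a self-contained package.
    with PackageExporter(path) as exporter:
        exporter.intern("**")  # bundle the model's own code
        exporter.save_pickle("model", "model.pkl", model)

def load_reference(path="linear_relu_reference.pt"):
    # Every-time step: reload the frozen model for comparison.
    return PackageImporter(path).load_pickle("model", "model.pkl")
```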

Test Plan:
```
python test/test_quantization.py TestSerialization.test_linear_relu_package_quantization_transforms
```

Imported from OSS

Reviewed By: supriyar

Differential Revision: D31512939

fbshipit-source-id: 718ad5fb66e09b6b31796ebe0dc698186e9a659f
2021-10-11 08:23:38 -07:00
0348148725 Update link to qnnpack in quantization doc. (#66226)
Summary:
The old repo has been archived.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66226

Reviewed By: vkuzo

Differential Revision: D31534712

Pulled By: ezyang

fbshipit-source-id: 4d7f070c8547aeb25464c72b25ed21f209821bc2
2021-10-11 08:19:19 -07:00
58fefa6516 Add pybind trampoline for ProcessGroup and Work (#66338)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66338

This commit exposes c10d extension API to Python land. Users can
now override c10d communication behaviors in pure Python, and no
longer needs to go through the cpp extension steps.

Test Plan: Imported from OSS

Reviewed By: rohan-varma

Differential Revision: D31514351

Pulled By: mrshenli

fbshipit-source-id: a8b94af0af7960c078e1006c29b25f7f3bd86c81
2021-10-11 06:41:06 -07:00
bc06eefebe [reland] Allow external CUDA streams to be set as current (#66324)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66324

Fixes https://github.com/pytorch/pytorch/issues/65822.

Reland of https://github.com/pytorch/pytorch/pull/65914.
ghstack-source-id: 140105651

Test Plan: Added tests

Reviewed By: ngimel

Differential Revision: D31506134

fbshipit-source-id: ff56203a120befdb282e974309478ac11aa56652
2021-10-11 02:41:43 -07:00
355acfdebc [PyTorch Edge][tracing-based] use operator.yaml to build libtorch library (#66237)
Summary:
https://pxl.cl/1QK3N
Enable using the yaml file from tracer to build libtorch library for ios and android.

1. Android:
```
SELECTED_OP_LIST=/Users/chenlai/Documents/pytorch/tracing/deeplabv3_scripted_tracing_update.yaml TRACING_BASED=1  ./scripts/build_pytorch_android.sh x86
```
libtorch_lite.so x86: 3 MB (larger than H1, static is ~3.2 MB)

2. iOS
```
SELECTED_OP_LIST=/Users/chenlai/Documents/pytorch/tracing/deeplabv3_scripted_tracing_update.yaml TRACING_BASED=1 BUILD_PYTORCH_MOBILE=1 IOS_PLATFORM=SIMULATOR  ./scripts/build_ios.sh
```
Binary size: 7.6 MB

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66237

ghstack-source-id: 140197164

Reviewed By: dhruvbird

Differential Revision: D31463119

fbshipit-source-id: c3f4eb71bdef1969eab6cb60999fec8547641cbd
2021-10-10 14:07:01 -07:00
9971113340 Revert D31447612: Create a documentation page for FX graph mode quantization APIs
Test Plan: revert-hammer

Differential Revision:
D31447612 (a89ac3138e)

Original commit changeset: 07d0a6137f15

fbshipit-source-id: f2cba7d835011500580b4ab9cff72171280ee18b
2021-10-10 01:51:13 -07:00
b85fd4c54f Revert D31447613: Create separate documentation pages for quantization observers and fake_quants
Test Plan: revert-hammer

Differential Revision:
D31447613 (f0fa3d1110)

Original commit changeset: 63b4cf518bad

fbshipit-source-id: 67de592d1e12a5149cdb22b0725caad063f94476
2021-10-10 01:51:11 -07:00
10633460ce Revert D31447614: Create a documentation page for torch.ao.quantization.QConfig
Test Plan: revert-hammer

Differential Revision:
D31447614 (7332ed13ed)

Original commit changeset: 5d9dd2a4e864

fbshipit-source-id: 6ac15a956222ca61f7fbb75ed36bcc58b23f0f36
2021-10-10 01:51:09 -07:00
037ac2330e Revert D31447616: Quantization docs: consilidate all API references on a single page
Test Plan: revert-hammer

Differential Revision:
D31447616 (fe86f0e068)

Original commit changeset: 2f9c4dac2b2f

fbshipit-source-id: 673368e87399f0a25441688bb9356de5a2f3e66e
2021-10-10 01:51:07 -07:00
09c3e6002b Revert D31447615: Quantization docs: rewrite API reference to be more automated
Test Plan: revert-hammer

Differential Revision:
D31447615 (7d2526ab20)

Original commit changeset: 09874ad9629f

fbshipit-source-id: 0963c9f5118e243cd299f8cded2bf7b0848a7105
2021-10-10 01:51:05 -07:00
df1858bea5 Revert D31447611: Quantization documentation: move backend section down
Test Plan: revert-hammer

Differential Revision:
D31447611 (309a8cf46c)

Original commit changeset: 537b146559bc

fbshipit-source-id: c400aef9a2ea5d18f8076879fe6354be7a6732f1
2021-10-10 01:51:03 -07:00
ad0accdecd Revert D31447610: Quantization docs: add pages for Numeric Suite (Eager and FX)
Test Plan: revert-hammer

Differential Revision:
D31447610 (9539e6216b)

Original commit changeset: 441170c4a6c3

fbshipit-source-id: b49bff54405cdb8465397077e38506a36b277921
2021-10-10 01:49:19 -07:00
291d463cf9 Revert D31495086: open source engine_layer_visualize.py
Test Plan: revert-hammer

Differential Revision:
D31495086 (150b7c7410)

Original commit changeset: 1f5505d6baac

fbshipit-source-id: 186f3407a6423f0981f0b7a2e7408ce53013fceb
2021-10-10 01:45:21 -07:00
0e0c98077f [quantized] Implement 3d convolution in qnnpack (#66350)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66350

Implements conv3d for QNNPACK by writing another kernel for the indirection buffer in 3 dimensions. Modifies all structs to take depth, with depth = 1 indicating 2d operation. gemm and conv (non-transpose) work; next up are depthwise and transpose.
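
A minimal usage sketch from the Python side (assumes a build in which the qnnpack engine includes this kernel):

```python
import torch

torch.backends.quantized.engine = "qnnpack"
m = torch.nn.quantized.Conv3d(3, 8, kernel_size=3, padding=1)
x = torch.quantize_per_tensor(
    torch.randn(1, 3, 4, 16, 16),  # NCDHW; depth > 1 exercises the 3d path
    scale=0.1, zero_point=0, dtype=torch.quint8)
y = m(x)  # quantized output; padding=1 keeps the spatial shape
```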
ghstack-source-id: 140152440

Test Plan: test/quantization

Reviewed By: kimishpatel

Differential Revision: D30858693

fbshipit-source-id: 883cca8ec53b9e15ab4b9473c6cc042e3d049d9c
2021-10-09 12:28:24 -07:00
150b7c7410 open source engine_layer_visualize.py (#66301)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66301

Test Plan: testinprod

Reviewed By: 842974287

Differential Revision: D31495086

fbshipit-source-id: 1f5505d6baac66eca11a35ce9532d6c7c7513190
2021-10-09 10:25:03 -07:00
27f193af64 Automated submodule update: kineto (#59674)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/kineto](https://github.com/pytorch/kineto).

New submodule commit: 6f9c0eeff5

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59674

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: larryliu0820

Differential Revision: D28977762

fbshipit-source-id: d441d4d46a7044cc05eb8b21e59471deee312e02
2021-10-09 09:34:32 -07:00
84326ef059 Remove native_functions.yaml dependency from binary ops (#64169)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64169

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D30728586

Pulled By: dagitses

fbshipit-source-id: 17d645b6712815d1967b9ff83eecc4d16833ee6b
2021-10-09 09:25:48 -07:00
9539e6216b Quantization docs: add pages for Numeric Suite (Eager and FX) (#66222)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66222

Description:
1. creates doc pages for Eager and FX numeric suites
2. adds a link from main quantization doc to (1)
3. formats docblocks in Eager NS to render well
4. adds example code and docblocks to FX numeric suite

Test Plan:
```
cd docs
make html
python -m http.server
// renders well
```

Reviewed By: jerryzh168

Differential Revision: D31447610

Pulled By: vkuzo

fbshipit-source-id: 441170c4a6c3ddea1e7c7c5cc2f1e1cd5aa65f2f
2021-10-09 06:46:06 -07:00
309a8cf46c Quantization documentation: move backend section down (#66210)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66210

Description:

Moves the backend section of the quantization page further down,
to ensure that the API description and reference sections are closer
to the top.

Test Plan:
```
cd docs
make html
python -m http.server
// renders well
```

Reviewed By: jerryzh168

Differential Revision: D31447611

Pulled By: vkuzo

fbshipit-source-id: 537b146559bce484588b3c78e6b0cdb4c274e8dd
2021-10-09 06:46:04 -07:00
7d2526ab20 Quantization docs: rewrite API reference to be more automated (#66201)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66201

Description:

This PR switches the quantization API reference to use `autosummary`
for each section.  We define the sections and manually write a list
of modules/functions/methods to include, and sphinx does the rest.
The result is a single page where we have every quantization function
and module with a quick autogenerated blurb, and user can click
through to each of them for a full documentation page.

This mimics how the `torch.nn` and `torch.nn.functional` doc
pages are set up.

In detail, for each section before this PR:
* creates a new section using `autosummary`
* adds all modules/functions/methods which were previously in the manual section
* adds any additional modules/functions/methods which are public facing but not previously documented
* deletes the old manual summary and all links to it

Test Plan:
```
cd docs
make html
python -m http.server
// renders well, links work
```

Reviewed By: jerryzh168

Differential Revision: D31447615

Pulled By: vkuzo

fbshipit-source-id: 09874ad9629f9c00eeab79c406579c6abd974901
2021-10-09 06:46:02 -07:00
fe86f0e068 Quantization docs: consolidate all API references on a single page (#66198)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66198

Consolidates all API reference material for quantization on a single
page, to reduce duplication of information.

Future PRs will improve the API reference page itself.

Test Plan:
```
cd docs
make html
python -m http.server
// renders well
```

Reviewed By: jerryzh168

Differential Revision: D31447616

Pulled By: vkuzo

fbshipit-source-id: 2f9c4dac2b2fb377568332aef79531d1f784444a
2021-10-09 06:46:00 -07:00
7332ed13ed Create a documentation page for torch.ao.quantization.QConfig (#66129)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66129

Adds a documentation page for `torch.ao.quantization.QConfig`. It is useful
for this to have a separate page since it is shared between Eager and FX graph
mode quantization.

Also, ensures that all important functions and module attributes in this
module have docstrings, so users can discover these without reading the
source code.

Test Plan:
```
cd docs
make html
python -m http.server
// open webpage, inspect it, renders correctly
```

Reviewed By: jerryzh168

Differential Revision: D31447614

Pulled By: vkuzo

fbshipit-source-id: 5d9dd2a4e8647fa17b96cefbaae5299adede619c
2021-10-09 06:45:58 -07:00
f0fa3d1110 Create separate documentation pages for quantization observers and fake_quants (#66125)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66125

Before this PR, the documentation for observers and fake_quants was inlined in the
Eager mode quantization page.  This was hard to discover, especially
since that page is really long, and we now have FX graph mode quantization reusing
all of this code.

This PR moves observers and fake_quants into their own documentation pages. It also
adds docstrings to all user facing module attributes such as the default observers
and fake_quants, so people can discover them from documentation without having
to inspect the source code.

For now, enables autoformatting (which means all public classes, functions, members
with docstrings will get docs).  If we need to exclude something in these files from
docs in the future, we can go back to manual docs.

Test Plan:
```
cd docs
make html
python -m http.server
// inspect docs on localhost, renders correctly
```

Reviewed By: dagitses

Differential Revision: D31447613

Pulled By: vkuzo

fbshipit-source-id: 63b4cf518badfb29ede583a5c2ca823f572c8599
2021-10-09 06:45:56 -07:00
a89ac3138e Create a documentation page for FX graph mode quantization APIs (#66122)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66122

Description:

Adds a documentation page for FX graph mode quantization APIs which
reads from the docstrings in `quantize_fx`, and links it from the main
quantization documentation page.

Also, updates the docstrings in `quantize_fx` to render well with reStructuredText.

Test Plan:
```
cd docs
make html
python -m http.server
// open webpage, inspect it, looks good
```

Reviewed By: dagitses

Differential Revision: D31447612

Pulled By: vkuzo

fbshipit-source-id: 07d0a6137f1537af82dce0a729f9617efaa714a0
2021-10-09 06:44:38 -07:00
b96c7aea73 [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D31527108

fbshipit-source-id: 40360ebf92e67fd95613cedea9988fbe52519de6
2021-10-09 06:03:49 -07:00
109aa135e6 Remove apparently unnecessary std::remove_cv_t (#66254)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66254

`std::decay_t` already implies dropping the const

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D31465856

fbshipit-source-id: 851cdb9194354fe9a89b3a37a4463a43dbbcd77a
2021-10-09 00:38:44 -07:00
4cb4d11e0b Disable "-Wignored-qualifiers" for vec256_bfloat16.h (#66279)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66279

This error appears when compiling with "-Wextra" and cannot be resolved by fixing the code, since the return type of the intrinsic being passed to `map` is fixed.

Fixes:
```
caffe2/aten/src/ATen/cpu/vec/vec256/vec256_bfloat16.h:204:28: error: 'const' type qualifier on return type has no effect [-Werror,-Wignored-qualifiers]
  Vectorized<BFloat16> map(const __m256 (*const vop)(__m256)) const {
                           ^~~~~~
```

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D31480888

fbshipit-source-id: 919c0d48c8ce13ce1106a9df124a077945e36707
2021-10-08 21:47:41 -07:00
3fe5895a00 Back out "Revert D30599136: [Pytorch Edge][tracing-based] build tracer in OSS" (#66267)
Summary:
Previously https://github.com/pytorch/pytorch/pull/64087 broke the test `binary_macos_wheel_3_7_cpu_build`, because the wheel build is not happy with `model_tracer`. Considering it's a prototype and there is no need to ship model_tracer via wheel at the moment, the option `TRACING_BASED` is now used for building the tracer. When tracing-based is mature enough, we can ship the tracer binary via wheel.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66267

Original commit changeset: 8ac3d75a52d0
ghstack-source-id: 140122106

Test Plan:
binary_macos_wheel_3_7_cpu_build passes

{F668643831}

Reviewed By: dhruvbird

Differential Revision: D31478593

fbshipit-source-id: 726cab1b31c4596f6268b7824eecb20e2e59d161
2021-10-08 20:12:12 -07:00
1763c25414 [PyTorch][jit] Fix excess refcounting in TupleType::compare (#66286)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66286

No need to take refcount bumps on each comparator call.

Test Plan: CI, review

Reviewed By: hlu1, JasonHanwen

Differential Revision: D31487058

fbshipit-source-id: 98d2447ac27a12695cb0ebe1e279a6b50744ff4f
2021-10-08 20:08:07 -07:00
fb5a80ffd8 [jit] Don't force refcount bumps from getTypePtr (#66282)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66282

Now that a bunch of the `FooType::get()` functions return a const reference, we can forward that behavior through `getTypePtr()` using return type deduction.

Test Plan: Inspect assembly for List_test.cpp before/after the rest of the change; reference counting is no longer in the happy path.

Reviewed By: hlu1, JasonHanwen

Differential Revision: D31486117

fbshipit-source-id: 863b677bb6685452a5b325d327bdc2a0a09627bf
2021-10-08 20:06:43 -07:00
85b562dd2b Fix type checking errors in fx/utils.py (#66311)
Summary:
- [x] Fix the Pyre type checking errors in `torch/quantization/fx/utils.py`
```
torch/quantization/fx/utils.py:490:4 Incompatible variable type [9]: target_module_type is declared to have type `Type[nn.modules.module.Module]` but is used as type `None`.
```
Fixes the issue: [MLH-Fellowship/pyre-check/issues/75](https://github.com/MLH-Fellowship/pyre-check/issues/75)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66311

Reviewed By: pradeep90

Differential Revision: D31506399

Pulled By: 0xedward

fbshipit-source-id: 3d866fba6005452378d4a2613b8689fa2d7a8b67
2021-10-08 19:14:22 -07:00
e5f6f356da [hpc infer] fix bench perf number
Reviewed By: yinghai, jianyuh

Differential Revision: D31505288

fbshipit-source-id: e4951a7c5813e0ee38903dec4cef61531f1b4059
2021-10-08 19:11:04 -07:00
904fbadaff Fix merge conflict in bc tests (#66356)
Summary:
BC test is currently broken on trunk

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66356

Reviewed By: malfet

Differential Revision: D31523340

Pulled By: janeyx99

fbshipit-source-id: a8d1ff697f017c710f70a76b5bb6a2f89d7637c7
2021-10-08 18:45:15 -07:00
5a67ffe0ad [PyTorch][Static Runtime] Combine ProcessedNode::{native_,}fn_ (#65414)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65414

Saves 24 bytes (`sizeof(std::function) - 8`) per ProcessedNode.
ghstack-source-id: 139999909

Test Plan: CI

Reviewed By: hlu1

Differential Revision: D31085561

fbshipit-source-id: 70734b8319e805736ba41aedaaf7fa3d463400c9
2021-10-08 18:11:59 -07:00
566922bbcd clean up mypy nit in torch/jit/_recursive.py (#66253)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66253

This was initially broken in #65829 and unbroken in #66003; this PR cleans
it up by removing the mypy ignore line.

Test Plan:
```
mypy torch/jit/_recursive.py --no-incremental
```

Imported from OSS

Reviewed By: supriyar

Differential Revision: D31475100

fbshipit-source-id: 46ab2ede72c08b926f4f9a6b03b1a1375b884c8a
2021-10-08 18:07:33 -07:00
4a302a3074 Wextra fix for CUDAApplyUtils.cuh (#66323)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66323

Fixes
```
/data/sandcastle/boxes/eden-trunk-hg-fbcode-fbsource/fbcode/caffe2/aten/src/ATen/cuda/CUDAApplyUtils.cuh:310:48: error: comparison of integers of different signs: 'unsigned long' and 'int' [-Werror,-Wsign-compare]
  const IndexType bOffset = sizeof...(Offsets) < n ?
                            ~~~~~~~~~~~~~~~~~~ ^ ~
/data/sandcastle/boxes/eden-trunk-hg-fbcode-fbsource/fbcode/caffe2/aten/src/ATen/cuda/CUDAApplyUtils.cuh:306:48: error: comparison of integers of different signs: 'unsigned long' and 'int' [-Werror,-Wsign-compare]
  const IndexType aOffset = sizeof...(Offsets) < n ?
                            ~~~~~~~~~~~~~~~~~~ ^ ~
```

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D31505428

fbshipit-source-id: 326fa8f41f2b200981eddc5cab035b18536cd24e
2021-10-08 18:02:09 -07:00
0a48f56318 Revert D31299350: Back out "Revert D31005792: [NCCL] Init dummy NCCL comms in constructor"
Test Plan: revert-hammer

Differential Revision:
D31299350 (f1f3bd8c36)

Original commit changeset: 9ad5c8fa17f7

fbshipit-source-id: d63d889922f507a4a0e2e042e451b95b9591c317
2021-10-08 17:55:28 -07:00
c62ed96496 Revert D30710710: [Pytorch Edge] Support profiling kineto events from external source
Test Plan: revert-hammer

Differential Revision:
D30710710 (c1343ff706)

Original commit changeset: 51399f9b0b64

fbshipit-source-id: ab6bb8fe4e83ed1052e621e427259192a4f0f540
2021-10-08 17:46:18 -07:00
c957d9fdf6 Replace _baddbmm_mkl_ with cpublas::gemm_batched (#66165)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66165

Test Plan: Imported from OSS

Reviewed By: dagitses

Differential Revision: D31493952

Pulled By: ngimel

fbshipit-source-id: 87cf79036c2d0f4955edbeeeb78f578b0fd223ab
2021-10-08 17:12:14 -07:00
51835bec07 Wextra fix 1 for caffe2 (#66272)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66272

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D31475543

fbshipit-source-id: f6e02d299d0b792ddb37534ad85db82af65bb42a
2021-10-08 16:36:13 -07:00
a28b038af4 [ao_migration] torch/nn/intrinsic: torch.quantization -> torch.ao.quantization (#65903)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65903

This changes the imports in the `caffe2/torch/nn/intrinsic` to include the new import locations.

```
codemod -d torch/nn/intrinsic --extensions py 'torch.quantization' 'torch.ao.quantization'
```

Test Plan: `python test/run_test.py`

Reviewed By: albanD

Differential Revision: D31301195

fbshipit-source-id: a5a9d84cb1ac33df6c90ee03cda3e2f1c5d5ff51
2021-10-08 16:21:23 -07:00
2daae532bd [ao_migration] torch/nn/qat: torch.quantization -> torch.ao.quantization (#65902)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65902

This changes the imports in the `caffe2/torch/nn/qat` to include the new import locations.

```
codemod -d torch/nn/qat --extensions py 'torch.quantization' 'torch.ao.quantization'
```

Test Plan: `python test/run_test.py`

Reviewed By: jerryzh168

Differential Revision: D31301196

fbshipit-source-id: ff237790d74cd3b3b5be642a997810f4f439a1d8
2021-10-08 16:21:21 -07:00
1a6482ee2a [ao_migration] torch/nn/quantizable: torch.quantization -> torch.ao.quantization (#65901)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65901

This changes the imports in the `caffe2/torch/nn/quantizable` to include the new import locations.

```
codemod -d torch/nn/quantizable --extensions py 'torch.quantization' 'torch.ao.quantization'
```

Test Plan: `python test/run_test.py`

Reviewed By: jerryzh168

Differential Revision: D31301194

fbshipit-source-id: 8ce8a3015ea61da62d7658846d1ca64fbdabaf7a
2021-10-08 16:21:19 -07:00
b23709df03 [ao_migration] torch/nn/quantized: torch.quantization -> torch.ao.quantization (#65900)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65900

This changes the imports in the `caffe2/torch/nn/quantized` to include the new import locations.

```
codemod -d torch/nn/quantized --extensions py 'torch.quantization' 'torch.ao.quantization'
```

Test Plan: `python test/run_test.py`

Reviewed By: jerryzh168

Differential Revision: D31301193

fbshipit-source-id: 58efb1ad51a8b441e2a3bd5b91af11eab6b9331f
2021-10-08 16:19:53 -07:00
f1f3bd8c36 Back out "Revert D31005792: [NCCL] Init dummy NCCL comms in constructor" (#65883)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65883

Original commit changeset: d8e962b8aab6
ghstack-source-id: 139836954

Test Plan: ci

Reviewed By: zhaojuanmao

Differential Revision: D31299350

fbshipit-source-id: 9ad5c8fa17f7038ba579cb1eda6d9271ac07a130
2021-10-08 16:04:20 -07:00
c1343ff706 [Pytorch Edge] Support profiling kineto events from external source (#64397)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64397

This diff exposes a way to add events to kineto profiler from external
source.
This can be a backend that executes a subgraph and wants to record this
execution in kineto profiler.
This diff also adds "backend" metadata to identify the backend an event
would have executed on.

Test Plan:
test_lite_interpreter

Imported from OSS

Reviewed By: raziel

Differential Revision: D30710710

fbshipit-source-id: 51399f9b0b647bc2d0076074ad4ea9286d0ef3e2
2021-10-08 15:59:42 -07:00
8a02d3e5d0 Wextra fix for Tensorshape.cpp (#66320)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66320

Fixes
```
stderr: caffe2/aten/src/ATen/native/TensorShape.cpp:619:36: error: comparison of integers of different signs: 'size_t' (aka 'unsigned long') and 'long' [-Werror,-Wsign-compare]
    for (size_t offset = 0; offset < numel; offset++) {
                            ~~~~~~ ^ ~~~~~
stderr: caffe2/aten/src/ATen/native/TensorShape.cpp:619:36: error: comparison of integers of different signs: 'size_t' (aka 'unsigned long') and 'long' [-Werror,-Wsign-compare]
    for (size_t offset = 0; offset < numel; offset++) {
                            ~~~~~~ ^ ~~~~~
```

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D31505374

fbshipit-source-id: 0fc393dacd72a8b29c0d82561f730cc047b38f0c
2021-10-08 15:03:47 -07:00
731cf494f2 Remove cuda/Loops.cuh dependency on native_functions.yaml (#64168)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64168

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D30728582

Pulled By: dagitses

fbshipit-source-id: 99dcbb9bb790dd0440d498593ac43e2c18e54a0c
2021-10-08 12:58:52 -07:00
92ce188510 Revert D31445799: [nnc] Use given kernel function name while emitting code
Test Plan: revert-hammer

Differential Revision:
D31445799 (c30dc52739)

Original commit changeset: 8d1642098313

fbshipit-source-id: 6b9d8c816437e9fcba8eb19cc683bc0a46a04cf5
2021-10-08 12:39:01 -07:00
2e6fa0261f Revert D31445797: [nnc] Added a cache to use singleton instances of PytorchLLVMJIT for every triple,cpu,attrs combination
Test Plan: revert-hammer

Differential Revision:
D31445797 (7e5ef5e517)

Original commit changeset: 4e1450100928

fbshipit-source-id: fc13b34dbb66c7a22816eb46cf6d98ae9f332d39
2021-10-08 12:38:59 -07:00
097fdcdf0c Revert D31445798: [Static Runtime] Cleanup LLVMCodeGen memory after code gen completes
Test Plan: revert-hammer

Differential Revision:
D31445798 (40dd2711b6)

Original commit changeset: c860d36456b2

fbshipit-source-id: 64d900cad87113e6b871aedd6669e771a7ede5cc
2021-10-08 12:37:48 -07:00
0be36d798b Remove Tensor.h include from TensorIterator.h (#64167)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64167

Test Plan: Imported from OSS

Reviewed By: saketh-are

Differential Revision: D30728579

Pulled By: dagitses

fbshipit-source-id: 3888da00c9c8030013c8f4b39d300fe671defb05
2021-10-08 12:28:37 -07:00
bc1dec9b81 Migrate THCStorage_resizeBytes to ATen (#65944)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65944

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D31386276

Pulled By: ngimel

fbshipit-source-id: a2b28bc09d11a856fdd3796d3df6f96613f13437
2021-10-08 11:50:52 -07:00
3bad54069b Concatting multiple linear layers with same input Tensor (different weight/bias) (#63198)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63198

Linear layers using the same input tensor can be concatted together
as long as the weights and biases are compatible.
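
For illustration, a minimal sketch of the idea (not the pass's actual code): two linears sharing an input are fused by concatenating their weights and biases, running one matmul, and splitting the result.

```
import torch
import torch.nn.functional as F

x = torch.randn(4, 10)
l1, l2 = torch.nn.Linear(10, 3), torch.nn.Linear(10, 5)

# Fused form: concatenate weights/biases, do one matmul, split the result.
W = torch.cat([l1.weight, l2.weight], dim=0)  # (8, 10)
b = torch.cat([l1.bias, l2.bias], dim=0)      # (8,)
y1, y2 = F.linear(x, W, b).split([3, 5], dim=1)

assert torch.allclose(y1, l1(x)) and torch.allclose(y2, l2(x))
```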

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D31240642

fbshipit-source-id: 1e78daa6b89822412ba2513d326ee0e072ceff1e
2021-10-08 10:55:46 -07:00
94845fc44e [jit] Implement one-argument AliasDb::mayContainAlias more efficiently (#65177)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65177

There is no need to heap-allocate any vectors in this case.
ghstack-source-id: 140052520

Test Plan:
CI

Startup for static runtime on ctr_mobile_feed local net decreased from 7.8s to about 7.0s

Reviewed By: malfet

Differential Revision: D30984194

fbshipit-source-id: 85091e55445f653ec728b27da4b459a2f1873013
2021-10-08 10:29:25 -07:00
c80693f7e6 [jit] Add cache for MemoryDAG::collectAllContainedMemoryLocations (#65122)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65122

Failure to cache this seems to contribute to quadratic startup time for the static runtime.

Disclaimer: I am entirely un-versed in the performance considerations for the JIT and have no idea what the other impacts of this change may be. Let the reviewer beware.
ghstack-source-id: 140052522

Reviewed By: suo

Differential Revision: D30983268

fbshipit-source-id: 4329aee6b5781f5c2e2d2334c396fab8528d4b7b
2021-10-08 10:29:23 -07:00
3ef69a4598 [static runtime] Pre-allocate hash tables (#65343)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65343

No reason not to save a bit on re-hashing.
ghstack-source-id: 140052518

Test Plan:
CI

Static runtime startup seems to go from 5.9-6.0s to 5.8s-6.0s, perf shows less time spent rehashing

Reviewed By: mikeiovine

Differential Revision: D31027362

fbshipit-source-id: 39dd53ecd462693b518535856ddd92df78a4977b
2021-10-08 10:28:13 -07:00
0020a151c6 slow_conv3d grad_weight: call gemm directly (#65759)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65759

Test Plan: Imported from OSS

Reviewed By: dagitses

Differential Revision: D31257873

Pulled By: ngimel

fbshipit-source-id: 1612c0be10b2aa269c807c7b9f5470172ed68dc1
2021-10-08 09:55:08 -07:00
dfb64b3287 log API usage for fsdp API in PyTorch (#64964)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64964

log API usage for fsdp API in PyTorch

Test Plan: unit test

Reviewed By: rohan-varma

Differential Revision: D30915734

fbshipit-source-id: 5e3b335327f4a3ff59b025e8e17a0fa0b7f6597d
2021-10-08 09:32:59 -07:00
201174cb91 Revert D31389480: [pytorch][PR] Allow external CUDA streams to be set as current
Test Plan: revert-hammer

Differential Revision:
D31389480 (61f0bb70c1)

Original commit changeset: 2b2f40e5452c

fbshipit-source-id: c6631e51abcf3819732f981f646cb77b91569c7d
2021-10-08 09:20:24 -07:00
b72a1782d8 [PG Wrapper][BE] Add collective information when monitored barrier error is (#66167)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66167

Sometimes, due to desync, we see the PG wrapper monitored barrier fail. In
this case it is useful to print info about the collective that was
attempting to run, along with the actual error.
ghstack-source-id: 140037653

Test Plan: CI

Reviewed By: cbalioglu

Differential Revision: D31353021

fbshipit-source-id: e2a515326c9314c98119978d5566eb5431cca96c
2021-10-08 09:14:24 -07:00
b5b1d49a66 [PG Wrapper][BE] Make some methods private (#66166)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66166

These methods should be private.
ghstack-source-id: 139782587

Test Plan: CI

Reviewed By: cbalioglu

Differential Revision: D31353020

fbshipit-source-id: 583fb315cc2cacc37df3d29cd5793b42558930b3
2021-10-08 09:13:02 -07:00
0cad2c0615 Move intraop_launch_future from Parallel.h (#64166)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64166

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D30728585

Pulled By: dagitses

fbshipit-source-id: 75a41418ae9218bec9bac27597051295222b6eee
2021-10-08 09:07:35 -07:00
2d885ab73d [jit] Reduce refcounting of Types (#65345)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65345

FooType::get() can return a const reference. Inconveniently, converting shared_ptr<FooType> to shared_ptr<Type> requires a copy & refcount bump, so to properly take advantage of this in unshapedType() we need to take a const Type& in isSubtypeOf(), which is good practice anyway -- don't require a shared_ptr if you don't need to take ownership.
ghstack-source-id: 140044165

Test Plan:
CI

perf says c10::unshapedType time decreased from 2.8% to 2.2% during static runtime startup, though I expect this to be generally beneficial.

Reviewed By: hlu1

Differential Revision: D31027361

fbshipit-source-id: 676feb81db9f74ad7b8651d8774f4ecb4cfa6ab8
2021-10-08 09:03:04 -07:00
1ae468a484 [jit] Refcounting spot fixes (#65346)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65346

Tidying up the top sources of reference count decrements seen during static runtime startup.
ghstack-source-id: 140027349

Test Plan:
CI

perf now shows under 2% time spent in ~__shared_count instead of about 5%.

Reviewed By: suo

Differential Revision: D31057277

fbshipit-source-id: 9a16daf2e655fda80d4ec21290b30f02ba63d8da
2021-10-08 08:39:20 -07:00
8ebe1a924d [DataPipe] moving mux IterDataPipe test to the right location (#66277)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66277

Previously, it was grouped together with tests related to `MapDataPipe`, but it should be with the `IterDataPipe` tests.

cc VitalyFedyunin ejguan NivekT

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D31485823

Pulled By: NivekT

fbshipit-source-id: d13d8c28cbfc305da0e3033d4109a0f971281a02
2021-10-08 08:32:29 -07:00
ed17851642 [DataPipe] adding test for IterableWrapperIterDataPipe (#66276)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66276

cc VitalyFedyunin ejguan NivekT

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D31485824

Pulled By: NivekT

fbshipit-source-id: c7b21636e4b17e264bfb5dbea69cd3c477472f0b
2021-10-08 08:32:26 -07:00
e808e3d3d6 [DataPipe] adding SequenceWrapperMapDataPipe (#66275)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66275

Once this is added to Core, TorchData's PR will not need a custom class and can use this wrapper instead.

cc VitalyFedyunin ejguan NivekT

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D31485822

Pulled By: NivekT

fbshipit-source-id: 790de27629c89c0ca7163a8ee5a09ee8b8233340
2021-10-08 08:32:24 -07:00
a7cc07f109 quantized embedding: make error message clearer (#66051)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66051

Make the error message clearer when quantized embedding is converted
with an unsupported dtype. This is helpful when debugging quantization
errors on new models.

Test Plan:
```
import torch
import torch.nn as nn

class M(nn.Module):
    def __init__(self):
        super().__init__()
        self.embedding = nn.Embedding(1, 1)

m = M().eval()
m.qconfig = torch.quantization.QConfig(
    activation=torch.quantization.MinMaxObserver.with_args(dtype=torch.qint8),
    weight=torch.quantization.MinMaxObserver.with_args(dtype=torch.qint8))
m.embedding.qconfig = m.qconfig
mp = torch.quantization.prepare(m)
mq = torch.quantization.convert(mp)
# error message now includes the incorrect dtype
```

Imported from OSS

Reviewed By: dagitses

Differential Revision: D31472848

fbshipit-source-id: 86f6d90bc0ad611aa9d1bdae24497bc6f3d2acaa
2021-10-08 08:32:22 -07:00
c9aba3b128 make error message when trying to quantize non floats more specific (#66050)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66050

Adds the dtype to an error message when trying to quantize something
other than a float.  This is useful for debugging quantization tools on
new models.

Test Plan:
```
import torch

x = torch.randn(1, 1, 1, 1, dtype=torch.double)
xq = torch.quantize_per_tensor(x, 0.01, 0, torch.quint8)
# error message now includes Double
```

Imported from OSS

Reviewed By: dagitses

Differential Revision: D31472849

fbshipit-source-id: 2331ffacefcbc6f8eca79694757d740de74a0f1d
2021-10-08 08:32:19 -07:00
81660c08f0 quantized add: enable broadcasting (#66049)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66049

Enables quantized add with broadcasting. As pointed out by jamesr66a,
this was disabled but TensorIterator already supports it. Added a test
case to verify.
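
For illustration, a minimal sketch of what now works (mine, not the PR's actual test):

```
import torch

a = torch.quantize_per_tensor(torch.randn(2, 3), 0.1, 128, torch.quint8)
b = torch.quantize_per_tensor(torch.randn(1, 3), 0.1, 128, torch.quint8)
# The second operand broadcasts along dim 0, as in the float case.
c = torch.ops.quantized.add(a, b, 0.1, 128)  # output scale / zero_point
print(c.shape)  # torch.Size([2, 3])
```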

Test Plan:
```
python test/test_quantization.py TestQuantizedOps.test_qadd_broadcast
```

Imported from OSS

Reviewed By: dagitses

Differential Revision: D31472850

fbshipit-source-id: a3b16d9000487918db743525d22db6864330762b
2021-10-08 08:31:07 -07:00
ece0221854 Rename int to long, add more C++ types. (#66108)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66108

BC-breaking change: intT is now longT (which aligns it more accurately with how
the types are referred to in C++). The benefit of this is that we can idiomatically
express all C++ dtypes (with intT now mapping to int32_t). These types are needed
for ufunc codegen in a later patch.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D31385761

Pulled By: ezyang

fbshipit-source-id: ec6f3a0953794313470dbe14911f23ac116be425
2021-10-08 08:25:06 -07:00
11bc435622 Allow registration of custom symbolics for prim namespace (#64460) (#66139)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66139

[ONNX] Add prim::PythonOp check back in export.cpp (#64944)

Add prim::PythonOp check back in export.cpp

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D31424102

fbshipit-source-id: 6d2eef767fab846ed79ea509e97b714072bac9f4

Co-authored-by: jiafatom <jiafa@microsoft.com>
2021-10-08 07:41:06 -07:00
9b09a5f7ba [ONNX] Enable scripting tests (#64780) (#66138)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66138

* Scripting tests

* Fixed scripting tests for lower opsets

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D31424099

fbshipit-source-id: 67095b7ac67b9da986961788392aa92c95cf11f2
2021-10-08 07:41:03 -07:00
53fefaa916 [ONNX] Fix duplicated output same name case (#64190) (#66137)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66137

* fix duplicated output node same output name issue.

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D31424100

fbshipit-source-id: b1b06a92c51744030788b651f3a597d987a8deda

Co-authored-by: hwangdeyu <dejack953@outlook.com>
2021-10-08 07:41:01 -07:00
4af47eb3a7 [ONNX] Update slice process shape to support rank only inference (#65782) (#66149)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66149

The updated logic can infer the rank of the slice output when only the rank of the slice input is known. This enables cases where `ConstantValueMap::HasRank(input)` is `True` while `ConstantValueMap::HasShape(input)` is `False`.

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D31423232

Pulled By: ezyang

fbshipit-source-id: 516e3916aa71afda2b10e44620636e42ed837236

Co-authored-by: BowenBao <bowbao@microsoft.com>
2021-10-08 07:39:40 -07:00
dc37547c44 Opinfos for avg_pooling (#64214)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64214

Added OpInfos for:
- F.adapative_avg_pool{1, 3}d
- F.avg_pool{1, 3}d

The 2d variants already had OpInfos.

Test Plan: - run tests

Reviewed By: albanD, mruberry

Differential Revision: D30667797

Pulled By: zou3519

fbshipit-source-id: 53f5cd02070de5b7db4abb017d727376b59288df
2021-10-08 07:26:08 -07:00
8d6d448238 Add HPU for Autograd Fallback (#65605)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65605

Reviewed By: albanD

Differential Revision: D31373899

Pulled By: ezyang

fbshipit-source-id: 894f62dc44b0532f152dc97b839eecfbaed25e8c
2021-10-08 07:21:44 -07:00
4af913a7cf fixed minor issues for index_add in docs (#65806)
Summary:
Hi, I'm looking forward to contributing to PyTorch, so starting with a minor fix in the documentation for `index_add`.

Currently, in the documentation for `index_add_` (please see https://pytorch.org/docs/master/generated/torch.Tensor.index_add_.html#torch.Tensor.index_add_):

1. `tensor` attribute was pointing to the `torch.tensor` class, which IMO (though it may not be a big deal) is unintentional.
2. `dim` attribute is pointing to `torch.Tensor.dim`, which again IMO is unintentional.

This PR suggests a correction for the first point above: rename the `tensor` attribute to `input` so that it doesn't point to the `torch.tensor` class. (I've verified that other ops like `scatter` use `input`, so this should not break consistency in the documentation.) I couldn't find an appropriate fix for the second point, since renaming `dim` to something else would break consistency (almost all other ops in PyTorch use `dim` as the attribute name).

I may be wrong here, so please let me know if there is any feedback or an alternate fix for this.

_Note:_ I plan to fix this behavior for `index_copy_` (https://pytorch.org/docs/master/generated/torch.Tensor.index_copy_.html#torch.Tensor.index_copy_) once and if this PR is approved.

To the reviewers, please help me tag the correct person who could help review this PR.

cc: krshrimali mruberry zou3519

cc brianjo mruberry

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65806

Reviewed By: dagitses, mruberry

Differential Revision: D31431182

Pulled By: zou3519

fbshipit-source-id: 66ced9677ac3bc71d672d13366f9f567ecea0a2d
2021-10-08 07:17:15 -07:00
61f0bb70c1 Allow external CUDA streams to be set as current (#65914)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/65822.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65914

Reviewed By: dagitses

Differential Revision: D31389480

Pulled By: lw

fbshipit-source-id: 2b2f40e5452c5b2a0b9f0f705750d2aa9deb2ead
2021-10-08 06:09:32 -07:00
60fe854f9f [fx2trt] save and load TRTModule for OSS (#65958)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65958

zhxchen17 added a `pickle` pybind for the trt engine which allows us to save and load an nn.Module with a trt engine in fbcode. This diff, though, explicitly serializes/deserializes the engine in `__set_state__` and `__get_state__` so that in OSS people can also save and load TRTModule directly.

Test Plan: buck test mode/dev-nosan caffe2/torch/fb/fx2trt:test_fx2trt

Reviewed By: wushirong

Differential Revision: D31309429

fbshipit-source-id: 9068e2ae6375ed0e1bb55b0e9d582b8d9c049dbf
2021-10-07 22:27:40 -07:00
321345d7c9 Revert "Revert D31227448: [pytorch][PR] fixing sorting in stride indices" (#66176)
Summary:
enabling https://github.com/pytorch/pytorch/issues/63940

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66176

Reviewed By: ngimel

Differential Revision: D31423920

Pulled By: dzhulgakov

fbshipit-source-id: 06b1e0f757f4fb5b31ee1fa464bcd689df919b9c
2021-10-07 22:09:07 -07:00
74477ba243 [fx2trt] More controls over output dtypes (#65959)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65959

Gives more control over the output dtype of a trt engine. Previously it would be fp16 if fp16_mode was turned on. This diff allows the engine to generate fp32 output with fp16_mode=True.

Test Plan: CI

Reviewed By: kflu, wushirong

Differential Revision: D31243929

fbshipit-source-id: 09c752e6f382d6ad169da66878d9a9277c134869
2021-10-07 22:03:51 -07:00
227f91e72d [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D31495160

fbshipit-source-id: b0a56003a6695989dff0d325cdc118182662ec61
2021-10-07 21:09:22 -07:00
a58ff186e8 [quant][embedding qat] Add basic EmbeddingBag QAT fakeQuant workflow (#65443)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65443

Test Plan: Imported from OSS

Reviewed By: dagitses, supriyar

Differential Revision: D31456445

Pulled By: b-koopman

fbshipit-source-id: 0edda6e272d9005fce65f2ba6a5e6abc831836de
2021-10-07 20:19:29 -07:00
64caee1356 [PyTorch Edge] Leave out field for debug_handle if not being built with eager symbolication support (#66131)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66131

It turns out that a model with 72k instructions incurs about 0.5MiB of additional memory overhead (given an 8-byte overhead per instruction). This is unnecessary when building w/o eager symbolication support. This change eliminates the 8-byte `debug_handle` if the build is w/o eager symbolication support.
ghstack-source-id: 140045478

(Note: this ignores all push blocking failures!)

Test Plan:
```
buck build -c "pt.enable_eager_symbolication"=1 //xplat/caffe2/fb/lite_predictor:lite_predictor
buck build //xplat/caffe2/fb/lite_predictor:lite_predictor
```

Reviewed By: kimishpatel

Differential Revision: D31387784

fbshipit-source-id: af56787ad833b990a46b79ab021e512edaa22143
2021-10-07 20:01:18 -07:00
ebe530a9cd Periodic jobs should not have CIFLOW_DEFAULT label (#66300)
Summary:
Noticed that the `periodic-pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7-slow-gradcheck` job has a `ciflow/default` label, but does not have a `ciflow/scheduled` label.
Added asserts to enforce that jobs with a non-trivial is_scheduled property do not have the default label and do have the scheduled label.

Rename `periodic-pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7-slow-gradcheck` to `periodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck`

Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66300

Reviewed By: seemethere

Differential Revision: D31493323

Pulled By: malfet

fbshipit-source-id: 194c1d7a4e659847d94a547b87a0d7d08e66406d
2021-10-07 19:57:32 -07:00
bd9eee4e65 TBB: Use static partitioner to match OpenMP scheduling (#65327)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65327

Should fix https://github.com/pytorch/pytorch/issues/64571

Test Plan: Imported from OSS

Reviewed By: dagitses

Differential Revision: D31474116

Pulled By: malfet

fbshipit-source-id: 8c4264d4778c6caf58261e3f70d72decd134128d
2021-10-07 19:12:36 -07:00
d5033410b1 Parallel: Deduplicate parallel functions in different backends (#65326)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65326

parallel_for and parallel_reduce currently share some common code in
all backends, specifically for detecting if it should run in parallel
or not. This moves all the backend-specific code into a single
`internal::invoke_parallel` function and makes the `parallel_`
functions common to all backends.

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D31124495

fbshipit-source-id: 65c3d2af42a8860cc4d6349566085c9fa8d8c6f0
2021-10-07 19:11:19 -07:00
e1817d895f [BE] Cleanup python_function.cpp (#66296)
Summary:
- Delete unused `var_input_idx`
- Fix `uninitialized variable` clang-tidy warning by setting `PyObject* input` to PyNone

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66296

Reviewed By: janeyx99

Differential Revision: D31491016

Pulled By: malfet

fbshipit-source-id: 08267144be0cd049d122580cdf81cf586c3e30a6
2021-10-07 18:41:17 -07:00
ca363d1e22 docker: Ensure libgnutls30 for all docker builds (#66258)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66258

Installing libgnutls30 has shown to be good when confronted with the
CERT issue related to deb.nodesource.com

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: dagitses

Differential Revision: D31477789

Pulled By: seemethere

fbshipit-source-id: f87ae4c098771acc505db14e3982d8858cf7326f
2021-10-07 18:36:40 -07:00
38f5144eae Fix https://github.com/pytorch/pytorch/issues/61982 (#66015)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66015

Fixes https://github.com/pytorch/pytorch/issues/61982 by cloning
tensors in DDPSink. This only applies once for static_graph and generally for
unused params, which already have overhead, so the perf hit should not be an
issue. Will verify with a benchmark.

Test Plan: CI

Reviewed By: zhaojuanmao

Differential Revision: D31346633

fbshipit-source-id: 5b9245ade628565cffe01731f6a0dcbb6126029b
2021-10-07 18:11:18 -07:00
20f2e55d4f Rename cuda/Resize.cu to cuda/Resize.cpp (#65943)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65943

These files don't require nvcc to compile.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D31386277

Pulled By: ngimel

fbshipit-source-id: 1066ee87fa795e2c7969447fbce1fe2633fb9680
2021-10-07 16:37:51 -07:00
86de09e49a Upgrade to ubuntu:trusty-20190515 (#63468)
Summary:
Security Upgrade to ubuntu:trusty-20190515

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63468

Reviewed By: ngimel

Differential Revision: D31393552

Pulled By: malfet

fbshipit-source-id: 4e2399e3cddc1d549c08c82c08015e00569c19bc
2021-10-07 16:28:08 -07:00
416f593080 [Static Runtime] Group graph nodes into input aliases & output aliases (#65517)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65517

This change retrofits `GetAlwaysAliveValues` into `ValueGroup` to group the values used by a graph into three groups as follows:

- input_aliases:  values that are either inputs or contain aliases of inputs or constants.
- output_aliases: values that are either outputs or contain aliases of outputs and are not in input_aliases.
- Values that don't show up in input_aliases or output_aliases are internally created and consumed within the graph.

`output_aliases` is the only new group introduced by this change, and a following diff will use this to preallocate output Tensors to accelerate Static Runtime's performance.

Test Plan: Added `ValueGroup.Init` to cover the updated code path. Note that there was no test for `GetAlwaysAliveValues` before.

Reviewed By: hlu1

Differential Revision: D30940955

fbshipit-source-id: 2cb065ecda0f447a61e64a7cf70cc7c6947f7dfc
2021-10-07 14:35:12 -07:00
0e2d1b221a [Bootcamp][Pytorch Core] Add testing for complex non-vanilla SGD
Summary: Adding test to ensure non-Vanilla SGD behaves as if complex numbers are two real numbers in R^2 as per issue 65711 on github

Test Plan:
```buck test mode/dev caffe2/test:optim -- 'test_sgd_complex'```

https://pxl.cl/1QLxw

Reviewed By: albanD

Differential Revision: D31477212

fbshipit-source-id: 500678e561a05ac96759223b4c87a37cab26c6a6
2021-10-07 14:07:39 -07:00
5e7d8ec846 Support Registering a Variable Length List of Builtin Modules for torch::deploy Builtin Libraries (#66021)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66021

A builtin library consists of a list of frozen modules and a list of builtin modules. For tensorrt, it's quite simple since we only have a single builtin module, tensorrt.tensorrt. But it can be complex for libraries like numpy, which contains multiple builtin modules (np.core._multiarray_umath, np.random.mtrand, etc.), if we want to add it as a torch::deploy builtin. We enhance the macro that registers builtin libraries to accept a variable-length list of builtin modules. We can use this macro to register frozentorch, frozenpython, and tensorrt for now, and can also use it to register libraries like numpy later on.

The enhanced macro now looks as follows. Although we don't need to worry about backward compatibility for now, this enhanced version is fully compatible with the previous one; the previous version is just the special case where the library contains no builtin modules.

 ```
REGISTER_TORCH_DEPLOY_BUILTIN(library_name_without_quote, frozen_modules_list,
    builtin_module_name_1, builtin_module_init_function_1, ...,
    builtin_module_name_N, builtin_module_init_function_N)
```
ghstack-source-id: 140007970

Test Plan:
1. Play around with interactive_embedded_interpreter.cpp to import torch._C, tensorrt.tensorrt etc inside the embedded interpreter.
2. Enhance test_builtin_registry.cpp
3. Run test_deploy.cpp and test_deploy_gpu.cpp

Reviewed By: suo

Differential Revision: D31349390

fbshipit-source-id: 70a1fcf660341180fc4d5195aed15ceb07c2bef7
2021-10-07 13:23:46 -07:00
40dd2711b6 [Static Runtime] Cleanup LLVMCodeGen memory after code gen completes (#66218)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66218

This stack of diffs reduces the memory used by LLVMCodeGen object.

Here are the numbers on model `294738512`: (this is the number reported as `Memory turnover after freeze_module:` in the output)

```
Before: 123343496
After : 121566008
```

So, there is a reduction of about `~1.77MB` with this change of making `PytorchLLVMJIT` a singleton.

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM, hlu1

Differential Revision: D31445798

Pulled By: navahgar

fbshipit-source-id: c860d36456b2c5d3e21010c1217e2948326f666d
2021-10-07 13:17:13 -07:00
7e5ef5e517 [nnc] Added a cache to use singleton instances of PytorchLLVMJIT for every triple,cpu,attrs combination (#66217)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66217

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D31445797

Pulled By: navahgar

fbshipit-source-id: 4e1450100928132ccce4ef3c6c20ad6661cfabed
2021-10-07 13:17:11 -07:00
c30dc52739 [nnc] Use given kernel function name while emitting code (#66216)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66216

Test Plan: Imported from OSS

Reviewed By: dagitses, priyaramani

Differential Revision: D31445799

Pulled By: navahgar

fbshipit-source-id: 8d164209831339d364710b14f6a263a16e108281
2021-10-07 13:15:46 -07:00
3cc40253d9 add gather to ShardedTensor (#65671)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65671

Tentative implementation to use dist.gather_object to collect shards from all ranks and then "merge" them. The merge is done on dst_rank by first padding the sharded tensors to the size of the full tensor based on their metadata (offsets, lengths), and then summing these padded tensors together.
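
For illustration, a toy sketch of the padding-based merge (shapes and offsets are made up; this is not the actual implementation):

```
import torch

full_size = (4, 4)
# (shard, (row_offset, col_offset)) pairs as gathered on dst_rank.
shards = [(torch.ones(2, 4), (0, 0)), (2 * torch.ones(2, 4), (2, 0))]

merged = torch.zeros(full_size)
for shard, (r, c) in shards:
    padded = torch.zeros(full_size)
    padded[r:r + shard.size(0), c:c + shard.size(1)] = shard
    merged += padded  # non-overlapping shards sum into the full tensor
```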

Also considered concatenating sharded tensors without padding to minimize the memory footprint (assuming padding will increase memory), but that may not be flexible enough for arbitrary sharding (e.g. sharding along multiple dimensions).

Another way would be to construct the padded tensor on each rank and reduce to rank0. I feel this is the easiest implementation, but it would incur higher memory usage and communication payload. Please let me know if this alternative is preferred.

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang gcramer23

Test Plan:
Imported from OSS

  python test/distributed/_sharded_tensor/test_sharded_tensor.py -v -k test_gather

did not manage to test on oss, but tested in fbcode by reserving on demand gpu

  arc patch D31197611

modify the test with 2 gpus as on-demand gpu only has 2 cores (D31227986)

   buck test -c fbcode.enable_gpu_sections=true mode/dev-nosan caffe2/test/distributed/_sharded_tensor:sharded_tensor -- test_gather

   buck-out/gen/caffe2/test/distributed/_sharded_tensor/sharded_tensor#binary.par  test_sharded_tensor.TestShardedTensorChunked.test_gather

{F667213605}

Reviewed By: dagitses, pritamdamania87

Differential Revision: D31197611

Pulled By: dracifer

fbshipit-source-id: cf98b4a2d7838b11b9582eb23f826bb0fa38a7f4
2021-10-07 13:01:12 -07:00
f445ed19b2 OpInfo for 2d fft functions (#66128)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66128

cc mruberry peterbell10

Test Plan: Imported from OSS

Reviewed By: dagitses

Differential Revision: D31450217

Pulled By: mruberry

fbshipit-source-id: 1952fc60c5d5f454966c43f5710b8b97a9794d0e
2021-10-07 12:50:06 -07:00
2213c463ba C++ API and docs for hfftn (#66127)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66127

cc mruberry peterbell10

Test Plan: Imported from OSS

Reviewed By: dagitses

Differential Revision: D31450216

Pulled By: mruberry

fbshipit-source-id: 2878aee294aa7d74482b66d536258bac0541408d
2021-10-07 12:48:36 -07:00
e6a4f746c2 slow_conv3d: Use at::sum for grad_bias accumulation (#65758)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65758

The same change has been made in conv2d, the proper algorithm is both
faster and gives more precision.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D31257872

Pulled By: ngimel

fbshipit-source-id: 6ff3a7a00a05b66f83d45cc820bd0c230cb8de6d
2021-10-07 12:20:49 -07:00
2e4e5b0264 Add inplace_variant for resize_ OpInfo (#66135)
Summary:
Enable testing of `torch.Tensor.resize_`.
The negative view test is skipped as the test doesn't work with resize_ see
https://github.com/pytorch/pytorch/issues/65945.

cc mruberry

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66135

Reviewed By: dagitses

Differential Revision: D31444263

Pulled By: mruberry

fbshipit-source-id: 00c7fe05df28fba01508b31adb3ed4fdcf4d0326
2021-10-07 12:00:30 -07:00
361b34eb81 Chunk: acc_ops (#66010)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66010

Added chunk acc op and unit test.

Removed misleading return statements.

Test Plan: buck test glow/fb/fx/oss_acc_tracer:test_acc_tracer

Reviewed By: 842974287

Differential Revision: D31326490

fbshipit-source-id: 81183ad8773eb7471566bec07cdd3dd6c4cee217
2021-10-07 11:41:00 -07:00
9fb6ba24e7 Update torch.fx.passes.split_module docstring (#65542)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65542

Add docstring for torch.fx.passes.split_module that conforms to Google Python Style conventions.

Changed original example to the example from this diff:
https://www.internalfb.com/diff/D24925283 (9734c042b8)
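
For reference, a minimal usage sketch of the API being documented (my example; it assumes the `split_module(m, root_m, split_callback)` signature):

```
import torch
from torch.fx import symbolic_trace
from torch.fx.passes.split_module import split_module

class M(torch.nn.Module):
    def forward(self, x):
        return torch.relu(x) + 1

m = M()
gm = symbolic_trace(m)

# split_callback maps each node to a partition id; a constant puts
# everything into one submodule.
split = split_module(gm, m, lambda node: 0)
print(split.graph)
```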

Test Plan:
Ran buck test //caffe2/test:fx. No errors detected
https://pxl.cl/1QCch

Reviewed By: jamesr66a

Differential Revision: D31145694

fbshipit-source-id: 8e54f3b1be3dca1c4d414fdeeab71b9f2b5d9f3e
2021-10-07 10:37:10 -07:00
d5f64afc38 [Static Runtime] Support aten::to.prim_dtype overload (#64928)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64928

Added support this overload of `aten::to`:
```
aten::to.prim_dtype(Tensor(a) self, int? dtype, bool non_blocking=False, bool copy=False) -> Tensor(a|b)
```

Test Plan: `buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest -- IndividualOps_to`

Reviewed By: hlu1

Differential Revision: D30901398

fbshipit-source-id: 38ce807c30185e92dd472b404b362f22ac7e4efb
2021-10-07 10:22:44 -07:00
a8c0b362ce [pytorch][PR] Add hash and int128 utils for Lazy Tensor Core" (#66181)
Summary:
These utils are prerequisites for Lazy Node base class.
- set up new torch/csrc/lazy, test/cpp/lazy dirs
- add source files to build_variables.bzl in new lazy_core_sources var
- create new test_lazy binary

Fixes https://github.com/pytorch/pytorch/issues/65636

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66181

Original commit changeset: 3d0d5377d71e

Test Plan:
Run PyTorch XLA corresponding PR in XLA CI:
https://github.com/pytorch/xla/pull/3148/files

Reviewed By: suo

Differential Revision: D31416438

fbshipit-source-id: 58a6a49c5bc30134bc6bae2e42778f359b9a8f40
2021-10-07 10:05:26 -07:00
61fca037d6 [Part 1] upstreaming fairscale fsdp to PyTorch -- sharding, core data flow and hooks (#63881)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63881
This PR includes the minimal set of features to make FSDP work: sharding, core data flow, and hooks. More tests will be added in follow-up PRs. Tests are refactored to utilize common PyTorch utils, and the code is also refactored a little. Alternative ways to replace the ".data" usage in this PR are still being discussed offline.

Test Plan: unit tests

Reviewed By: mrshenli

Differential Revision: D30521673

fbshipit-source-id: 9a23390dd7c925749604c6860e08fbe39ddc5500
2021-10-07 09:06:44 -07:00
88f8944ef1 Revert D30599136: [Pytorch Edge][tracing-based] build tracer in OSS
Test Plan: revert-hammer

Differential Revision:
D30599136 (eeaf527feb)

Original commit changeset: 102f23fb652c

fbshipit-source-id: 8ac3d75a52d06a5c4196bae2db1c4df2d5c5c666
2021-10-07 08:34:23 -07:00
2f1ab477f1 Speed up DataTypeToTypeMeta (#66113)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66113

For a benchmark compiled in opt-mode, in which the lookup items were shuffled and then looked up in round-robin fashion 10M times (for a total of 140M lookups), we see:
```
Function           Container            Time (ms) Multiplier
TypeMetaToDataType if-chain                   233         1x
TypeMetaToDataType std::vector                795      3.41x
TypeMetaToDataType std::map                  1566      6.72x
TypeMetaToDataType std::unordered_map        2136      9.17x

DataTypeToTypeMeta switch                     102         1x
DataTypeToTypeMeta std::vector                666      6.53x
DataTypeToTypeMeta std::map                  1212      11.9x
DataTypeToTypeMeta std::unordered_map        1539      15.1x
DataTypeToTypeMeta folly::F14FastMap         1789      17.5x
```
From this, we draw two conclusions:
1. Using a complex container like `std::map` is worse than using a simple vector lookup here (there aren't enough items for the Big-O to assert itself).
2. Using any container at all is a mistake. (Unless we pull in more exotic reasoning like invalidating the code cache or preventing inlining.)

Test Plan: Sandcastle

Reviewed By: dzhulgakov

Differential Revision: D31375117

fbshipit-source-id: 0b310c6c2e94080d125c82fb7c2b43ab869adbcb
2021-10-07 08:06:09 -07:00
1e4bcbdddb [Bootcamp][Pytorch Core] Add test for complex numbers for vanilla SGD (#66230)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66230

Adding test to ensure Vanilla SGD behaves as if complex numbers are two real numbers in R^2 as per issue 65711 on github
https://github.com/pytorch/pytorch/issues/65711
ghstack-source-id: 139918862

Test Plan:
```buck test mode/dev caffe2/test:optim -- 'test_sgd_complex'```

https://pxl.cl/1QHvX

Reviewed By: albanD

Differential Revision: D31449289

fbshipit-source-id: da8b00421085796a23b643e73f96b19b5b560a32
2021-10-07 07:14:05 -07:00
057a01556c [Static Runtime] Do not use variadic_sigrid_transforms_torch_bind if out variant is disabled (#66221)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66221

JIT doesn't have an implementation for this op, so we can only use it when out variants are enabled.

Reviewed By: hlu1

Differential Revision: D31445887

fbshipit-source-id: 4565ac4df751d8ee4052647574c43efa05ea1452
2021-10-07 06:57:17 -07:00
dcf39f9bb9 [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D31464823

fbshipit-source-id: 37bd72c8f1c8240d2ae72385a0707003ddb24ce8
2021-10-07 04:17:48 -07:00
df11e2d6f9 (torch/elastic) add fqdn hostname to error printout (#66182)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66182

closes https://github.com/pytorch/pytorch/issues/63174

Does a few things:

1. adds hostname to the error report
2. moves the "root cause" section to the end (presumably since the logs are being "tailed" we want the root cause to appear at the end)
3. moves redundant error info logging to debug
4. makes the border max 60 char in length and justifies left for the header

NOTE: YOU HAVE TO annotate your main function with torch.distributed.elastic.multiprocessing.errors.record, otherwise no traceback is printed (this is because python exception propagation does NOT work out of the box for IPC - hence the extra record annotation).
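
For example (a minimal sketch; `main` and its body are hypothetical):

```
from torch.distributed.elastic.multiprocessing.errors import record

@record
def main():
    raise RuntimeError("foobar")  # traceback is captured in the error file

if __name__ == "__main__":
    main()
```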

Test Plan:
Sample

```
============================================================
run_script_path FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2021-10-05_17:37:22
  host      : devvm4955.prn0.facebook.com
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 3296201)
  error_file: /home/kiuk/tmp/elastic/none_3_lsytqe/attempt_0/0/error.json
  traceback :
  Traceback (most recent call last):
    File "/tmp/jetter.xr3_x6qq/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 372, in wrapper
      return f(*args, **kwargs)
    File "main.py", line 28, in main
      raise RuntimeError(args.throws)
  RuntimeError: foobar

============================================================
```

Reviewed By: cbalioglu, aivanou

Differential Revision: D31416492

fbshipit-source-id: 0aeaf6e634e23ce0ea7f6a03b12c8a9ac57246e9
2021-10-07 01:40:02 -07:00
8a974a482c [quant] Add support for quantization of Embedding{Bag} in dynamic quant APIs (#65674)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65674

Before this PR, users had to use the eager mode static quantization APIs to quantize Embedding/EmbeddingBag modules.
With this PR they can use either the static or the dynamic quantization APIs for Embedding quantization.

The only qconfig supported for embedding quantization is float_qparams_weight_only_qconfig, which is currently enforced in the from_float
method of the quantized Embedding/EmbeddingBag modules.

To combine embedding quantization with Linear dynamic quantization, users can use the qconfig_dict to specify a different qconfig for each module type.

The prepare/convert APIs can still be used to quantize Embeddings, with the caveat that users need to ensure inputs to Embedding ops are FP32.
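
For illustration, a small sketch of the per-module-type qconfig flow described above (my example, not from the PR):

```
import torch
import torch.nn as nn

class M(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.EmbeddingBag(1000, 16)
        self.fc = nn.Linear(16, 4)

    def forward(self, idx, offsets):
        return self.fc(self.emb(idx, offsets))

# Different qconfig per module type via the qconfig_spec dict.
qconfig_spec = {
    nn.EmbeddingBag: torch.quantization.float_qparams_weight_only_qconfig,
    nn.Linear: torch.quantization.default_dynamic_qconfig,
}
mq = torch.quantization.quantize_dynamic(M().eval(), qconfig_spec)
```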

Addresses Issue #65185
ghstack-source-id: 139935419

Test Plan:
python test/test_quantization.py

Imported from OSS

Reviewed By: gchanan

Differential Revision: D31211199

fbshipit-source-id: 8c747881caee5ccbf8b93c6704b08d132049dea4
2021-10-06 23:19:38 -07:00
115526cc88 GELU Converter (#66008)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66008

Added GELU converter and updated TARGET file of deeplearning/trt/fx2trt to load the plugins onto the converters

Test Plan: buck test mode/dev-nosan caffe2/torch/fb/fx2trt:test_gelu

Reviewed By: 842974287

Differential Revision: D31284144

fbshipit-source-id: 0e938a47a99d289aefc3308aec3937c7334e9b8a
2021-10-06 22:25:43 -07:00
ac0dbd6eec Promote missing ops for delegated models (#66052)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66052

`aten::__getitem__.Dict_str` and `prim::unchecked_cast` are used in delegate API.

ghstack-source-id: 139860350

Test Plan: CI

Reviewed By: pavithranrao

Differential Revision: D31364720

fbshipit-source-id: dfca5e3ded4cdd3329c9b9d80a13f0fb1f5f2a51
2021-10-06 21:48:42 -07:00
3f30526ff2 Remove THCAllocator (#65942)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65942

This one is a bit weird. The class is called `THCIpcDeleter` but it
actually has nothing IPC-specific. It just converts
`std::shared_ptr` + `void*` into a `c10::DataPtr`. Instead, moving
the `DataPtr` conversion into the actual IPC code allows 2 memory
allocations to be elided by merging 3 separate deletion contexts
into one.

Test Plan: Imported from OSS

Reviewed By: dagitses

Differential Revision: D31386278

Pulled By: ngimel

fbshipit-source-id: 5722beed9dcf680f0eb6bbff30405cff47b21962
2021-10-06 19:04:43 -07:00
eeaf527feb [Pytorch Edge][tracing-based] build tracer in OSS (#64087)
Summary:
1. Introduce
```
MobileModelRunner.h
MobileModelRunner.cpp
TensorUtils.h
TensorUtils.cpp
```
in external. They are pretty much the same as the internal ones, except for the namespace and the dependency on folly. In the next PRs, TensorUtils and MobileModelRunner are unified between external and internal.
2. Introduce
```
tracer.cpp
```
for external. The majority is the same as the internal one, with some cleanup of unnecessary dependencies. It's unified between internal and external in the next change.
3. Add an executable to build the tracer. It will be built for desktop only.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64087

ghstack-source-id: 139900300

Test Plan:
Given the model
```
import torch
import torch.nn as nn
import torch.utils.bundled_inputs

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.lin = nn.Linear(10, 1)
    def forward(self, x):
        return self.lin(x)

model = Net()
scripted_module = torch.jit.script(model)
example_dict = {'a' : 1, 'b' : 2}
sample_input = {
    scripted_module.forward : [(torch.zeros(1,10),)],
}

bundled_model = torch.utils.bundled_inputs.bundle_inputs(scripted_module, sample_input)
bundled_model._save_for_lite_interpreter("dummy_model_with_bundled_input.ptl")
```
External tracer
```
./build/bin/model_tracer --model_input_path "/Users/chenlai/Documents/pytorch/tracing/dummy_model_with_bundled_input.ptl" --build_yaml_path  "/Users/chenlai/Documents/pytorch/tracing/tmp.yaml"
```
and compare `tmp.yaml` with the operator list generated from
Internal tracer
```
./fbcode/caffe2/fb/model_tracer/run_model_with_bundled_inputs.sh ~/local/notebooks/prod_models/dummy_model_with_bundled_input.ptl
```
QNNPACK only:
Example yaml from internal tracer:  P460742166 [devserver]
Example yaml from external tracer: P460759099 [mac], P460742166 [devserver]

Comparison ops between internal and external on devserver:

{F666923807}

{F666924048}

Note: The operators generated on mac and devservers are different; the one on the devserver includes two extra ops: `aten::addmm_` and `aten::slow_conv_dilated2d`. Based on the traced list, when calling `aten::_convolution`, one calls `aten::mkldnn_convolution` and the other calls `aten::_convolution_nogroup`, causing the divergence.

Thanks to Martin for pointing out:
> mkldnn is another backend from Intel

Reviewed By: dhruvbird

Differential Revision: D30599136

fbshipit-source-id: 102f23fb652c728a9ee4379f9acc43ae300d8e8a
2021-10-06 19:01:04 -07:00
0cab25468d [Pytorch Edge][tracing-based] reorganize model tracer dependency (#63421)
Summary:
1. Move 4 files to:
```
KernelDTypeTracer.h
KernelDTypeTracer.cpp
OperatorCallTracer.h
OperatorCallTracer.cpp
```
so it's visible in OSS.

2. Update the namespace to `torch::jit::mobile`
3. Add a `fb_xplat_cxx_library` `torch_model_tracer` with the source file list above.
4. update the `fb_xplat_cxx_library`  `model_tracer_lib` dependency on the new `torch_model_tracer` library

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63421

ghstack-source-id: 139900299

Reviewed By: dhruvbird

Differential Revision: D30378069

fbshipit-source-id: d56c6140e951bc13113a76d6b63767a93843c842
2021-10-06 18:59:50 -07:00
300613dc60 make FX symbolic tracing reuse buffers if they're the same (#66211)
Summary:
Currently, if the same tensor constant is reused multiple times, we'll store a tensor constant for each time we use it.

For example
```
import torch

val = torch.randn(5)
x = torch.zeros(5)
for _ in range(10):
    x = x + val
```
ends up storing 10 tensor constants.
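
A quick way to observe the effect (a sketch; the `_tensor_constant*` attribute naming is an FX implementation detail):

```
import torch
import torch.fx

val = torch.randn(5)

def f(x):
    for _ in range(10):
        x = x + val
    return x

gm = torch.fx.symbolic_trace(f)
# With this change, all ten get_attr nodes should share one attribute.
print({n.target for n in gm.graph.nodes if n.op == "get_attr"})
```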

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66211

Reviewed By: jamesr66a

Differential Revision: D31437089

Pulled By: Chillee

fbshipit-source-id: 401169c8d58ce0afb7025ae11060680ef544419f
2021-10-06 18:35:38 -07:00
67970e8c9b Add CI tests for AOT Compile (#65441)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65441

Adding a CI test to verify that a simple linear model compiles fine.
Successful run from CI logs:

```
+ test_aot_model_compiler
+ echo 'Testing AOT model compiler'
Testing AOT model compiler
+ source test/mobile/nnc/test_aot_compile.sh
+++ python -c 'import site; print(site.getsitepackages()[0])'
++ TORCH_INSTALL_DIR=/opt/conda/lib/python3.6/site-packages/torch
++ TORCH_BIN_DIR=/opt/conda/lib/python3.6/site-packages/torch/bin
+++ dirname test/mobile/nnc/test_aot_compile.sh
++ CURRENT_DIR=test/mobile/nnc
++ MODEL=aot_test_model.pt
++ COMPILED_MODEL=aot_test_model.compiled.pt
++ COMPILED_CODE=aot_test_model.compiled.ll
++ test_aot_model_compiler
++ python test/mobile/nnc/aot_test_model.py
++ exit_code=0
++ [[ 0 != 0 ]]
++ /opt/conda/lib/python3.6/site-packages/torch/bin/test_aot_model_compiler --model aot_test_model.pt --model_name=aot_test_model --model_version=v1 --input_dims=2,2,2
The compiled model was saved to aot_test_model.compiled.pt
++ success=1
++ '[' '!' -f aot_test_model.compiled.pt ']'
++ '[' '!' -f aot_test_model.compiled.ll ']'
++ '[' -f aot_test_model.compiled.ll ']'
++ rm aot_test_model.compiled.ll
++ '[' -f aot_test_model.compiled.pt ']'
++ rm aot_test_model.compiled.pt
++ rm aot_test_model.pt
++ '[' 1 = 0 ']'
+ [[ linux-xenial-py3.6-gcc5.4-default == pytorch-linux-xenial-py3* ]]
+ assert_git_not_dirty
+ [[ linux-xenial-py3.6-gcc5.4-default != *rocm* ]]
+ [[ linux-xenial-py3.6-gcc5.4-default != *xla* ]]
++ git status --porcelain
+ git_status=
+ [[ -n '' ]]
+ test_custom_script_ops
```

Test Plan: Imported from OSS

Reviewed By: ljk53

Differential Revision: D31348169

Pulled By: priyaramani

fbshipit-source-id: dd5c55859dfa07d150e5decc2dd7e56f43e7f66b
2021-10-06 18:23:19 -07:00
6c54971cd9 Open Registration for torch::deploy Builtins (#65953)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65953

Previously, if people wanted to add a torch::deploy builtin, they needed to change torch::deploy internal code (interpreter_impl.cpp) to register the python part as frozen modules and the C++ part as builtin modules. This is inconvenient and error prone. We want to add open registration support for torch::deploy builtins so that people only need to add one effective line of code in their *library code* to complete the registration.

Here is an example that registers numpy as a torch::deploy builtin:
  REGISTER_TORCH_DEPLOY_BUILTIN(numpy, numpy_frozen_modules, <list of name, PyInit function pairs>)

This diff supports open registration of frozen modules. It's the first step toward achieving the plan above.
ghstack-source-id: 139888306

Test Plan: Run tests in test_deploy.cpp and test_builtin_registry.cpp

Reviewed By: suo

Differential Revision: D31321562

fbshipit-source-id: 6445bd8869f1bb7126b4c96cf06c31145f0e9445
2021-10-06 18:04:57 -07:00
213c3f45da [oss/ci] skip TestDataLoaderPersistentWorkers on ASAN (#66236)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66236

it's flaky, see https://github.com/pytorch/pytorch/issues/66223

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D31462056

Pulled By: suo

fbshipit-source-id: f4362a8020dc05ac8856706c0508d48be026eeb8
2021-10-06 17:56:19 -07:00
4937218611 [torch][launch] Add ability to override sys.executable for torch.distributed.run (#66179)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66179

The diff adds a check for the `PYTHON_EXEC` environment variable. If the variable is set, it will override `sys.executable` for `torch.distributed.run`.
This means that if `PYTHON_EXEC` is set, user scripts executed via `torch.distributed.run` will be started with the value of `os.environ["PYTHON_EXEC"]`.

Test Plan: unittest

Reviewed By: kiukchung

Differential Revision: D31329003

fbshipit-source-id: b9d0167d99bbf463a6390f508324883ca4a1e439
2021-10-06 17:33:19 -07:00
e8837d741e [Vulkan] cat operator for height dimension (#66103)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66103

Implemented `cat` operator for height dimension

Test Plan:
On Mac
```
cd ~/fbsource
buck build //xplat/caffe2:pt_vulkan_api_test_binAppleMac
./buck-out/gen/xplat/caffe2/pt_vulkan_api_test_binAppleMac\#macosx-x86_64

[ RUN      ] VulkanAPITest.cat_dim2_sameheight_success
[       OK ] VulkanAPITest.cat_dim2_sameheight_success (272 ms)
[ RUN      ] VulkanAPITest.cat_dim2_diffheight_success
[       OK ] VulkanAPITest.cat_dim2_diffheight_success (161 ms)
[ RUN      ] VulkanAPITest.cat_dim2_invalidinputs_exceptions
[       OK ] VulkanAPITest.cat_dim2_invalidinputs_exceptions (235 ms)
```

On Android
```
cd ~/fbsource
buck build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 //xplat/caffe2:pt_vulkan_api_test_binAndroid\#android-arm64 --show-output
adb push buck-out/gen/xplat/caffe2/pt_vulkan_api_test_binAndroid\#android-arm64 /data/local/tmp/vulkan_api_test
adb shell "/data/local/tmp/vulkan_api_test"

[ RUN      ] VulkanAPITest.cat_dim2_sameheight_success
[       OK ] VulkanAPITest.cat_dim2_sameheight_success (98 ms)
[ RUN      ] VulkanAPITest.cat_dim2_diffheight_success
[       OK ] VulkanAPITest.cat_dim2_diffheight_success (105 ms)
[ RUN      ] VulkanAPITest.cat_dim2_invalidinputs_exceptions
[       OK ] VulkanAPITest.cat_dim2_invalidinputs_exceptions (101 ms)
```

Reviewed By: SS-JIA

Differential Revision: D31323141

fbshipit-source-id: 68b187e856758790cc5f7b0c263feb30a2bb467f
2021-10-06 16:12:59 -07:00
1d586e78c6 *_solve methods: implements forward AD (#65546)
Summary:
This PR adds forward AD for the `*_solve` methods.
Additionally, `cholesky_solve` gets an OpInfo plus a bug fix for a case where wrong leading dimensions could be passed to LAPACK,
and `lu_solve` gets forward AD using 2x `lu_solve` instead of 1x `lu_solve` + 2x `triangular_solve`.
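
For context, the standard forward-mode identity these rules rely on (not quoted from the PR): differentiating $AX = B$ gives $\dot{A}X + A\dot{X} = \dot{B}$, hence

$\dot{X} = A^{-1}(\dot{B} - \dot{A}X)$

so the tangent needs only one additional solve against the same factorization of $A$, which is why `lu_solve` forward AD can be expressed as one extra `lu_solve` that reuses the LU factors.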

cc ezyang albanD zou3519 gqchen pearu nikitaved soulitzer Lezcano Varal7 jianyuh mruberry walterddr IvanYashchuk xwang233

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65546

Reviewed By: dagitses

Differential Revision: D31431847

Pulled By: albanD

fbshipit-source-id: 0e343e0d9da3c3d2051fca215fad289d77275251
2021-10-06 16:04:22 -07:00
78209b93b3 Don't build shared library for AOT Compiler (#66227)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66227

Building a shared library for AOT Compiler is not necessary, as it's included in libtorch. Also, having it built as a shared library was affecting Android builds, and we don't need to build AOT Compiler for mobile builds.

Before fix:
```
(pytorch)  ~/local/pytorch master
└─ $ ANDROID_NDK=/opt/android_ndk/r20/ BUILD_PYTORCH_MOBILE=1 ANDROID_ABI=armeabi-v7a ./scripts/build_android.sh -DBUILD_BINARY=ON
Build with ANDROID_ABI[armeabi-v7a], ANDROID_NATIVE_API_LEVEL[21]
Bash: GNU bash, version 5.0.11(1)-release (x86_64-redhat-linux-gnu)
Python: 3.9.7 (default, Sep 16 2021, 13:09:58)
[GCC 7.5.0]
Caffe2 path: /data/users/priyaramani/pytorch
Using Android NDK at /opt/android_ndk/r20/
.
.
FAILED: lib/libaot_compiler.so
: && /opt/android_ndk/r20/toolchains/llvm/prebuilt/linux-x86_64/bin/clang++ --target=armv7-none-linux-androideabi21 --gcc-toolchain=/opt/android_ndk/r20/toolchains/llvm/prebuilt/linux-x86_64 --sysroot=/opt/and
roid_ndk/r20/toolchains/llvm/prebuilt/linux-x86_64/sysroot -fPIC -g -DANDROID -fdata-sections -ffunction-sections -funwind-tables -fstack-protector-strong -no-canonical-prefixes -fno-addrsig -march=armv7-a -mt
humb -Wa,--noexecstack -Wformat -Werror=format-security -frtti -fexceptions  -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DUSE_KINETO -DLIBKINETO_NOCUPTI -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -
DBUILD_LITE_INTERPRETER -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bound
s -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -W
no-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -Wno-invalid-partial-specialization -Wno-typedef-redefinition -Wno-unknown-warning-option -Wno-unused-private-field -Wno-inconsistent-miss
ing-override -Wno-aligned-allocation-unavailable -Wno-c++14-extensions -Wno-constexpr-not-const -Wno-missing-braces -Qunused-arguments -fcolor-diagnostics -Wno-unused-but-set-variable -Wno-maybe-uninitialized
-fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -g0 -Oz -DNDEBUG  -Wl,--exclude-libs,libgcc.a -Wl,--exclude-libs,libatomic.a -static-libstdc++ -Wl,--build-id -Wl,--warn-shared-text
rel -Wl,--fatal-warnings -Wl,--exclude-libs,libunwind.a -Wl,--no-undefined -Qunused-arguments -Wl,-z,noexecstack  -rdynamic -shared -Wl,-soname,libaot_compiler.so -o lib/libaot_compiler.so caffe2/torch/CMakeFi
les/aot_compiler.dir/csrc/jit/mobile/nnc/aot_compiler.cpp.o  -latomic -lm && :
caffe2/torch/CMakeFiles/aot_compiler.dir/csrc/jit/mobile/nnc/aot_compiler.cpp.o:aot_compiler.cpp:function at::from_blob(void*, c10::ArrayRef<long long>, c10::TensorOptions const&): error: undefined reference t
o 'at::TensorMaker::make_tensor()'
.
.
caffe2/torch/CMakeFiles/aot_compiler.dir/csrc/jit/mobile/nnc/aot_compiler.cpp.o:aot_compiler.cpp:function torch::jit::mobile::nnc::Function::Function(): error: undefined reference to 'c10::AnyType::get()'
clang++: error: linker command failed with exit code 1 (use -v to see invocation)
```

After fix:
```
(pytorch)  ~/local/pytorch master
└─ $ ANDROID_NDK=/opt/android_ndk/r20/ BUILD_PYTORCH_MOBILE=1 ANDROID_ABI=armeabi-v7a ./scripts/build_android.sh -DBUILD_BINARY=ON
Build with ANDROID_ABI[armeabi-v7a], ANDROID_NATIVE_API_LEVEL[21]
Bash: GNU bash, version 5.0.11(1)-release (x86_64-redhat-linux-gnu)
Python: 3.9.7 (default, Sep 16 2021, 13:09:58)
[GCC 7.5.0]
Caffe2 path: /data/users/priyaramani/pytorch
Using Android NDK at /opt/android_ndk/r20/
.
.
-- Build files have been written to: /data/users/priyaramani/pytorch/build_android
Will install headers and libs to /data/users/priyaramani/pytorch/build_android/install for further Android project usage.
[2/3] Install the project...
-- Install configuration: "Release"
Installation completed, now you can copy the headers/libs from /data/users/priyaramani/pytorch/build_android/install to your Android project directory.
```

Test Plan: Imported from OSS

Reviewed By: ljk53, axitkhurana

Differential Revision: D31450970

Pulled By: priyaramani

fbshipit-source-id: 87e48033f1db46fef112bae1239a09a2365620d2
2021-10-06 15:57:32 -07:00
4a50b6c490 fix cosine similarity dimensionality check (#66191)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/66086

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66191

Reviewed By: dagitses, malfet

Differential Revision: D31436997

Pulled By: ngimel

fbshipit-source-id: 363556eea4e1696d928ae08320d298451c286b10
2021-10-06 15:44:51 -07:00
05e1476d49 [jit] Fix list copy in MemoryDAG (#65176)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65176

getElements returns a reference, so binding its result to a value type was silently copying the list.
ghstack-source-id: 139745230

Test Plan:
CI

Static runtime startup for ctr_mobile_feed local net reduced from 8.35s to 7.8s

Reviewed By: malfet

Differential Revision: D30983898

fbshipit-source-id: 884bff40f12322633c0fffd45aed5b8bc7498352
2021-10-06 15:39:33 -07:00
fc4836f400 [Fix] Use full name to look for the promoted prim operator table (#66081)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66081

Two fixes:

1. Since operators are always registered with both a name and an overload name, the overload name needs to be included when looking up an operator.
2. Don't promote operators with aliases, because the new registry does not support schemas with aliases.

ghstack-source-id: 139732099

Test Plan: CI

Reviewed By: pavithranrao

Differential Revision: D31382262

fbshipit-source-id: 43c6e6e0c13950a9ce8cf3a70debe0421372d053
2021-10-06 15:35:02 -07:00
7cc121dbcd slow_conv3d grad_input: Avoid dispatch in parallel region (#65757)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65757

See gh-56794

Avoid dispatch inside of parallel_for by:
- Replacing Tensor slicing with TensorAccessor
- Replacing `bmm` and `mm` with direct calls to gemm.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D31257878

Pulled By: ngimel

fbshipit-source-id: e6aad2d5ae7fa432bd27af2b1a8b0dcef1fc6653
2021-10-06 15:08:47 -07:00
480a1a88d6 [DDP] Log iteration in debug mode (#65770)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65770

This logging info is printed out in debug mode; make it log the iteration as well for clarity.
ghstack-source-id: 139838595

Test Plan: CI

Reviewed By: zhaojuanmao, wayi1

Differential Revision: D31222132

fbshipit-source-id: 14519aae1ba0b2a35b4b962e7d1a957c9142c8f8
2021-10-06 14:36:07 -07:00
722f1ccfb8 [DDP][Instrumentation] Profiling range for bucket copy (#65769)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65769

We are seeing some bottlenecks when copying buckets to grads; this profiling range helps make them clearer.
ghstack-source-id: 139838597

Test Plan: CI

Reviewed By: zhaojuanmao, wayi1

Differential Revision: D31217340

fbshipit-source-id: 762a254a3538eb5292b3a53bb5d1211057ecbdbb
2021-10-06 14:34:10 -07:00
84c5970a77 ci: Migrate slow_gradcheck to GHA (#65730)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65730

This should close out the door on migrating all scheduled workflows we have for CircleCI

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

cc ezyang seemethere malfet pytorch/pytorch-dev-infra

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D31225188

Pulled By: seemethere

fbshipit-source-id: 4c49e88ec017edc30e07325dbc613ff54dd164d8
2021-10-06 14:29:14 -07:00
e2be087207 [oss][pytorch] Add quint2x4 dtype (#65545)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65545

Introduce a 2-bit qtensor. The new dtype added for this is c10::quint2x4.

The underlying storage is still uint8_t, so we pack four 2-bit values into a byte while quantizing.

Kernels that use this dtype should be aware of the packing format (four 2-bit values in one byte).
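
A minimal sketch of the packing format (the bit order within the byte is an assumption for illustration, not taken from the kernel code):

```
def pack_quint2x4(vals):
    # Pack four 2-bit values into one uint8; assumes the first value
    # occupies the lowest-order bits.
    assert len(vals) == 4 and all(0 <= v < 4 for v in vals)
    byte = 0
    for i, v in enumerate(vals):
        byte |= v << (2 * i)
    return byte

def unpack_quint2x4(byte):
    # Recover the four 2-bit values from a single packed byte.
    return [(byte >> (2 * i)) & 0b11 for i in range(4)]

assert unpack_quint2x4(pack_quint2x4([1, 0, 3, 2])) == [1, 0, 3, 2]
```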

Test Plan: `buck test mode/dev-asan caffe2/test/:quantization -- test_qtensor`

Reviewed By: supriyar

Differential Revision: D31148141

fbshipit-source-id: 1dc1de719e097adaf93fee47c6d1b8010a3eae6c
2021-10-06 14:22:00 -07:00
252b6f2cba [PyTorch][easy] Remove dead std::set in parseAliasAnnotation (#65712)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65712

No reason for this to be here.
ghstack-source-id: 139743362

Test Plan: fitsships

Reviewed By: dhruvbird

Differential Revision: D31215696

fbshipit-source-id: 238ea6633629831e54847ce82de23571cf476740
2021-10-06 14:20:31 -07:00
90db214d4b support counter-based fused rowwise adagrad (#66177)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66177

As titled, with an additional change to enable the counter for SparseAdagrad.

Test Plan:
buck test //caffe2/caffe2/fb/net_transforms/tests:fuse_sparse_ops_test

Testing with canary packages

baseline: f297789852

counter run: f297789912

Reviewed By: jspark1105

Differential Revision: D30903029

fbshipit-source-id: 3ed89a7da409fd820fd0b44950407c20fa2018a5
2021-10-06 13:50:43 -07:00
6d7fab5929 [Static Runtime][easy] Clone scripts do not use aten::add (#66161)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66161

`aten::add` is not guaranteed to be bit-exact with the JIT interpreter. This was causing non-deterministic test failures on master.

Test Plan: `buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest`

Reviewed By: hlu1

Differential Revision: D31406764

fbshipit-source-id: d968cb1bdb8f33934682ef3712a1341a3aacf18e
2021-10-06 12:37:39 -07:00
9285981de1 Clean up unused model instantiation (#65487)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65487

Test Plan: Imported from OSS

Reviewed By: jingsh

Differential Revision: D31410880

Pulled By: b-koopman

fbshipit-source-id: 09b2d2d899a232e7334c82f00eff0f900e817853
2021-10-06 12:21:56 -07:00
8548928950 Cumsum: acc_ops (#66189)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66189

Added acc_ops for cumsum and unit test

Test Plan: buck test glow/fb/fx/oss_acc_tracer:test_acc_tracer

Reviewed By: 842974287

Differential Revision: D31355244

fbshipit-source-id: 41490d300553b0a5d52cbc4e681bdd0cf990eb42
2021-10-06 12:15:36 -07:00
623ac7eabb slow_conv3d: Avoid dispatch in parallel region (#65737)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65737

See gh-56794

Avoid dispatch inside of parallel_for by:
- Replacing Tensor slicing with TensorAccessor
- Copying bias into the output only once, outside of the parallel region
- Replacing `addmm_` and `baddbmm_` with direct calls to gemm.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D31257874

Pulled By: ngimel

fbshipit-source-id: 20b94daa13082fb1e39eaa8144bfa4c611b61bab
2021-10-06 12:10:55 -07:00
9a0b2acd76 [quant] Remove hypothesis from qtopk (#66158)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66158

qtopk used hypothesis, which created flaky tests. In addition, the generated tests were not representative and would not catch the cases we are interested in.

This diff removes hypothesis from qtopk and merges the qtopk and qtopk_nhwc tests. We now use specific test cases.
ghstack-source-id: 139768865

Test Plan: `buck test mode/dev //caffe2/test:quantization -- test_qtopk`

Reviewed By: jerryzh168

Differential Revision: D31401341

fbshipit-source-id: a8fb37a7221fc43c159f34e28aa4a91ed3506944
2021-10-06 11:42:34 -07:00
6d4d636d66 [GHA] Rectify trigger_action_only flag (#66209)
Summary:
No longer needed, as a PR can be opened/reopened with a specific label.

Fixes https://github.com/pytorch/pytorch/issues/66110

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66209

Reviewed By: seemethere

Differential Revision: D31436292

Pulled By: malfet

fbshipit-source-id: 5b6e0875bec261862017dfe0eb3a5ec57fb8c705
2021-10-06 10:46:10 -07:00
c4ea447eb5 Use src size for memcpy in order to avoid fortify complaints (#65222)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65222

When compiling against the Android SDK with `-D_FORTIFY_SOURCE=2`, the compiler will complain that the `dst` size is larger than the `src` size, due to the function templating using two differently sized objects. There is a `TORCH_CHECK` to ensure we don't go through with these `memcpy`s, but in the interest of making the compiler happy, let's switch the `memcpy` to take `sizeof(src)`.

Test Plan: CI

Reviewed By: bertmaher, lanza

Differential Revision: D30992678

fbshipit-source-id: b3e7aa992a3650e1051abad05be800b684e6332b
2021-10-06 09:05:31 -07:00
bfaaac6392 Ignore register_rds errors (#66185)
Summary:
Network communications are flaky by nature; the test should be marked as
skipped if network ops cannot be completed for some reason.

Fixes https://github.com/pytorch/pytorch/issues/66184

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66185

Reviewed By: seemethere

Differential Revision: D31423193

Pulled By: malfet

fbshipit-source-id: 96c3a123c65913f44ea78b30a03e8e7eda164afe
2021-10-06 08:42:35 -07:00
b8e1999253 [quant] Add op benchmark for GPU FakeQuantizePerChannel with float zero_points (#66183)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66183

Add a GPU benchmark for fakeQuant, similar to #65241
ghstack-source-id: 139810414

Test Plan: https://pxl.cl/1QjJM

Reviewed By: b-koopman

Differential Revision: D31288158

fbshipit-source-id: 65526248b5c7b70f0bc32a86b08f50b4cbc7a83d
2021-10-06 08:07:42 -07:00
9de9733390 Add 1d to 2d conv transform during mobile optimization (#65850)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65850

This step was never added
ghstack-source-id: 139753673

Test Plan: Run optimize_for_mobile on model with conv1d and see that it transforms to conv2d

Reviewed By: kimishpatel

Differential Revision: D31093503

fbshipit-source-id: 11a19f073789c01a9de80f33abbe628005996b66
2021-10-06 07:27:09 -07:00
747a5782e3 [quant][fx] Don't assume bias is a keyword argument (#61647)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61647

`prepare_fx` currently assumes that bias is always a positional argument to
convolutions, and only a keyword argument to other functions. This happens to work
today due to a quirk in how `__torch_function__` is handled for Python
functions, but it shouldn't be considered stable.

Instead, we should support `bias` for both positional and keyword forms.
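
A minimal sketch of what supporting both forms looks like for an FX node (assuming `F.conv2d`'s `(input, weight, bias, ...)` ordering; the helper name is illustrative):

```
def get_conv_bias(node):
    # bias may be passed positionally (third argument) or as a keyword.
    if len(node.args) > 2:
        return node.args[2]
    return node.kwargs.get("bias")
```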

cc jerryzh168 jianyuh raghuramank100 jamesr66a vkuzo

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D31401360

Pulled By: albanD

fbshipit-source-id: 1e2f53d80e2176b870f326dc498e251e2386136e
2021-10-06 07:25:47 -07:00
ab25516054 [PyTorch] Remove unused function in import (#65865)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65865

`operator_str` is not used in `import.cpp`, and it is also defined in `parse_operators.cpp`, so remove it from `import.cpp`.

Test Plan: CI passing

Reviewed By: iseeyuan

Differential Revision: D31293008

fbshipit-source-id: 1c857cbd63c57b8f79c1a068789fc8605605b642
2021-10-06 06:34:51 -07:00
a5895f85be [PyTorch Edge][type] Add type check in compatibility api (#63129)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63129

1. Add an API to get `supported_types` from the runtime, exposed in C++ only.
2. Add an API to get `contained_types` from the model, exposed in both C++ and Python.
3. Add a field `contained_types_` in `type_parser.cpp` to track the contained types when parsing the Python string.
4. Expand the `is_compatible` API to check types. When checking types, it compares the contained type list from the model against the supported type list from the runtime.
5. Expand the compatibility unittest to cover types.
6. Add a unit test in Python to check the type list.
ghstack-source-id: 139826944

Test Plan:
```
buck test mode/dev //caffe2/test/cpp/jit:jit -- --exact 'caffe2/test/cpp/jit:jit - LiteInterpreterTest.GetContainTypes'

buck test mode/dev //caffe2/test/cpp/jit:jit -- --exact 'caffe2/test/cpp/jit:jit - LiteInterpreterTest.isCompatibleSuccess'
buck test mode/dev //caffe2/test/cpp/jit:jit -- --exact 'caffe2/test/cpp/jit:jit - LiteInterpreterTest.isCompatibleFail'

buck test //caffe2/test:mobile
```

Reviewed By: iseeyuan

Differential Revision: D30231419

fbshipit-source-id: 8427f423ec28cc5de56411f15fd960d8595d6947
2021-10-06 02:23:44 -07:00
c75210face [PyTorch Edge][type] Move TypeParser class definition to header file (#65976)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65976

Move the TypeParser class definition to a header file so it can be used from elsewhere. For example, the getContainedTypes() API in this stack can then be moved to other files.
ghstack-source-id: 139826943

Test Plan: CI

Reviewed By: iseeyuan

Differential Revision: D31294254

fbshipit-source-id: 1c532fd69c7f6b44ad2332055d24c95a0fac1846
2021-10-06 02:22:26 -07:00
931352c68d Make handle_torch_function_no_python_arg_parser public (#66054)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66054

I need this function in functorch to support the ability of custom
jitted kernels to invoke torch_function when applicable.

Test Plan: functorch unit tests

Reviewed By: qihqi, ngimel

Differential Revision: D31416599

Pulled By: bertmaher

fbshipit-source-id: 90b57badd6a6b9d505ebfc436869b962b55c66d7
2021-10-06 00:27:10 -07:00
c0b1965f7c Back out "[vulkan] Use push constants instead of SSBOs" (#66169)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66169

Original change: D30368834 (57e5ae5306)

Switching to Push Constants from Uniform Buffers caused some unforeseen memory errors when running Mac unit tests.

We'll switch back for now until we can pinpoint and resolve the issue.

Test Plan:
Build and run `vulkan_api_test`

```
buck build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 //xplat/caffe2:pt_vulkan_api_test_binAndroid\#android-arm64 --show-output
adb push buck-out/gen/xplat/caffe2/pt_vulkan_api_test_binAndroid\#android-arm64 /data/local/tmp/vulkan_api_test
adb shell "/data/local/tmp/vulkan_api_test"
```

Reviewed By: beback4u

Differential Revision: D31409130

fbshipit-source-id: cab1a3330945b50522235db6738406b6037f9c68
2021-10-05 21:28:59 -07:00
8d435877d5 Fix typos at ONNX docs (#66090)
Summary:
This PR fixes small typos at ONNX docs

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66090

Reviewed By: albanD

Differential Revision: D31385765

Pulled By: ezyang

fbshipit-source-id: f4879069a2acf9c8adaa81c26a6a5014634761f5
2021-10-05 21:11:47 -07:00
cbc29acca3 [Codemod][FBSourceBlackLinter] Daily arc lint --take BLACK
Reviewed By: zertosh

Differential Revision: D31423202

fbshipit-source-id: 08d249e8546c0bfe6f1145c0571141b90aad03eb
2021-10-05 20:55:56 -07:00
d1058df885 fix clang-tidy error introduced by #64382 (#65977)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65977

Reviewed By: ngimel

Differential Revision: D31423174

Pulled By: malfet

fbshipit-source-id: 0ea560b9a6ddd6431f70bd3ac10ace68e26ab352
2021-10-05 20:13:13 -07:00
6cdea8239e Precomputing Transposes for frozen linear layers (#65631)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65631

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D31314248

Pulled By: Gamrix

fbshipit-source-id: 85611f3ccfe7b91a183d5d12f7fb9aca3c51acb0
2021-10-05 20:08:32 -07:00
43e26d0086 [deploy] Improve error messaging for create_movable (#65955)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65955

This diff makes sure to give clear error message when user tries to create obj from obj that lives in different session

Test Plan: buck test //caffe2/torch/csrc/deploy:test_deploy

Reviewed By: suo

Differential Revision: D31323045

fbshipit-source-id: e7bd6f76afeb0285847bc11881185a164f80e3f0
2021-10-05 19:49:51 -07:00
3bd26792c0 Skip test_multiple_groups on windows (#66154)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66154

Skips as the test is flaky:
https://github.com/pytorch/pytorch/issues/66059
ghstack-source-id: 139763149

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D31403153

fbshipit-source-id: 7f47f17cee148a708346d6d9454c44a194d13a78
2021-10-05 18:33:23 -07:00
eeabab03e7 [DataParallel] Log API Usage for tracking (#66038)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66038

Will help track workflows for DP deprecation. Tested via standalone DP
script.

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D31356975

fbshipit-source-id: c0a3ac3a1faed794e3362f3f3a19a6fb800587a7
2021-10-05 18:30:23 -07:00
dc26f5eb65 [FX] Specifies a default value when possible for placeholders created from concrete_args (#59569)
Summary:
```python
class Foo(torch.nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, a=None, b=None):
        res = a
        if b is not None:
            res = res + b
        return res

concrete_args = {'b': torch.tensor(5)}
traced = fx.symbolic_trace(Foo(), concrete_args=concrete_args)
```

Gives the following error:

```
  File "<eval_with_key_9>", line 2
    def forward(self, a = None, b_1):
                ^
SyntaxError: non-default argument follows default argument
```

Since https://github.com/pytorch/pytorch/issues/55888, placeholders are also created for concrete arguments. But these placeholders do not have default values even when one was provided for the argument in question, causing the error above.

To solve this, I add a default value when it is available during placeholder creation for concrete arguments.

I also tried to set the default value to the value specified in concrete_args (since in many cases it will actually use this value anyway), but ran into an error because the default value is never defined:

```
def forward(self, a = None, b_1 = _tensor_constant0):
    _tensor_constant0 = self._tensor_constant0
    _tensor_constant1 = self._tensor_constant1
    add = a + _tensor_constant1;  a = _tensor_constant1 = None

NameError: name '_tensor_constant0' is not defined
```
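
After the fix, the generated forward keeps the default that was available on the original signature, so it stays syntactically valid. A sketch of the expected shape of the generated code (not copied from the PR):

```
def forward(self, a = None, b_1 = None):
    ...
```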

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59569

Reviewed By: albanD

Differential Revision: D31385607

Pulled By: Chillee

fbshipit-source-id: 44a8ce28b5eabdb9b4c773e73a68ff0bb9c464cc
2021-10-05 17:45:09 -07:00
83bac89d64 [quant] Add fp32/fp16 zero_point support for GPU fakeQuant (#65836)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65836

Add a GPU implementation of fp32/fp16 zero_point support for fakeQuant, similar to D30975238 (60915eb810)
ghstack-source-id: 139779416

Test Plan:
https://www.internalfb.com/intern/testinfra/testconsole/testrun/281475183488511/

{F667112564}

Reviewed By: b-koopman

Differential Revision: D31091679

fbshipit-source-id: 68fd483e6926c7fd565703c01d8ffb337b75dca5
2021-10-05 17:40:54 -07:00
f062def486 Revert D31260343: [pytorch][PR] Add hash and int128 utils for Lazy Tensor Core
Test Plan: revert-hammer

Differential Revision:
D31260343 (e94fea08d0)

Original commit changeset: 8bb1194188e3

fbshipit-source-id: 3d0d5377d71ed928015bcb2105801be368e38cd8
2021-10-05 17:15:50 -07:00
5e6347ca64 .circleci: Remove migrated distributed configs (#66174)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66174

These configs have already been migrated, so we are going ahead and removing
them.

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D31413579

Pulled By: seemethere

fbshipit-source-id: 8923736d347eb8c8470884be413122c198d1bf20
2021-10-05 16:53:02 -07:00
e94fea08d0 Add hash and int128 utils for Lazy Tensor Core (#65635)
Summary:
These utils are prerequisites for Lazy Node base class.

- set up new torch/csrc/lazy, test/cpp/lazy dirs
- add source files to build_variables.bzl in new lazy_core_sources var
- create new test_lazy binary

Fixes https://github.com/pytorch/pytorch/issues/65636

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65635

Reviewed By: alanwaketan

Differential Revision: D31260343

Pulled By: wconstab

fbshipit-source-id: 8bb1194188e3e77fc42e08a14ba37faed37a9c2e
2021-10-05 16:43:55 -07:00
143c957c2d [nnc] Reduced memory usage of LLVMCodeGen object after code generation is complete (#65373)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65373

Test Plan: Imported from OSS

Reviewed By: bertmaher, hlu1

Differential Revision: D31066974

Pulled By: navahgar

fbshipit-source-id: 0dbe0d1746c50adee90fe5a7cc4a66adba3a229e
2021-10-05 16:27:43 -07:00
68555339d7 test_utils.py: Add another retry to test_download_url_to_file (#66159)
Summary:
Fixes one of the flakiness concerns mentioned https://github.com/pytorch/pytorch/issues/65439#issuecomment-934686485

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66159

Reviewed By: ngimel

Differential Revision: D31406485

Pulled By: janeyx99

fbshipit-source-id: cf7834cdab58360ecef1748075d52969de2e0778
2021-10-05 16:26:20 -07:00
d2021e5e68 ci: Migrate vulkan builds to GHA (#66044)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66044

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D31370889

Pulled By: seemethere

fbshipit-source-id: 399f5f0c184f7856dcddb138c357f1374706e676
2021-10-05 16:11:36 -07:00
7452b65144 Remove unused dump method from VSX vec256 methods (#66085)
Summary:
Follow up after https://github.com/pytorch/pytorch/pull/63533

Probably fixes https://github.com/pytorch/pytorch/issues/65956

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66085

Reviewed By: ngimel

Differential Revision: D31382898

Pulled By: malfet

fbshipit-source-id: f3d97b0f2c7f1207827773ae85e2739f1d54b9c7
2021-10-05 16:05:01 -07:00
6e06cb76ff [JIT] Initialize CUDA context before launching fused kernel (#65064)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65064

The problem appears when nvfuser is triggered from LazyTensor.
Because LT maintains its own thread pool, the thread used for the first-time
compilation initializes the CUDA context properly, but a later
cached execution may use a different thread that does not have
a proper CUDA context.

Test Plan: Imported from OSS

Reviewed By: saketh-are

Differential Revision: D31269691

Pulled By: desertfire

fbshipit-source-id: 384362025c087d61e8b625ff938379df283ef8b2
2021-10-05 16:01:59 -07:00
a5e6b2b2e3 [Static Runtime] Add variadic sigrid_transforms_torch_bind (#63960)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63960

Reviewed By: hlu1

Differential Revision: D30529880

fbshipit-source-id: 1c4be2f9c0944bbe1e1c146989588c96bfd14eda
2021-10-05 16:00:36 -07:00
e7747795c9 [PyTorch Edge] Reduce dispatch table size further for a trimmed build (#66112)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66112

Eliminate Metal and Vulkan Dispatch Keys.

Test Plan: Build + Sandcastle

Differential Revision: D31298307

fbshipit-source-id: 31302fc626382db7997e5058750fa85458c9cbc1
2021-10-05 15:24:07 -07:00
a3bbaf227c Revert D31227448: [pytorch][PR] fixing sorting in stride indices
Test Plan: revert-hammer

Differential Revision:
D31227448 (da0e29edd4)

Original commit changeset: 51e3cd903757

fbshipit-source-id: a752a4df70281aa0eaaeb1afdd88395b08276da8
2021-10-05 14:28:34 -07:00
89b56d630d Create CI sev template (#66163)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66163

Reviewed By: seemethere

Differential Revision: D31407988

Pulled By: suo

fbshipit-source-id: a23b6fc5410ef1f901e2a7aacc2e0c17cb04d083
2021-10-05 13:55:07 -07:00
5883523c1d Remove dtype from torch.Storage and use only torch.ByteStorage (#62030)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62030

Remove dtype tracking from Python Storage interface, remove all the different `<type>Storage` classes except for `ByteStorage`, and update serialization accordingly, while maintaining as much FC/BC as possible

Fixes https://github.com/pytorch/pytorch/issues/47442

* **THE SERIALIZATION FORMAT IS FULLY FC/BC.** We worked very hard to make sure this is the case. We will probably want to break FC at some point to make the serialization structure of tensors make more sense, but not today.
* There is now only a single torch.ByteStorage class. Methods like `Tensor.set_` no longer check that the dtype of storage is appropriate.
* As we no longer know what the dtype of a storage is, we've **removed** the size method from Storage, replacing it with nbytes. This is to help catch otherwise silent errors where you confuse the number of elements with the number of bytes.
* `Storage._new_shared` takes a `nbytes` kwarg and will reject previous positional-only calls. `Storage._new_with_file` and `_set_from_file` require explicit element size arguments.
* It's no longer possible to convert storages to different types using the float/double/etc methods. Instead, do the conversion using a tensor (see the sketch after this list).
* It's no longer possible to allocate a typed storage directly using FloatStorage/DoubleStorage/etc constructors. Instead, construct a tensor and extract its storage. The classes still exist but they are used purely for unpickling.
* The preexisting serialization format stores dtype with storage, and in fact this dtype is used to determine the dtype of the tensor overall.
 To accommodate this case, we introduce a new TypedStorage concept that exists only during unpickling time which is used to temporarily store the dtype so we can construct a tensor. **If you overrode the handling of pickling/unpickling, you MUST add handling for TypedStorage** or your serialization code will degrade to standard file-based serialization.
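
A minimal sketch of the tensor-mediated conversion mentioned in the list above (illustrative values; the exact storage API surface in this change may differ slightly):

```
import torch

t = torch.tensor([1.0, 2.0, 3.0])   # construct a tensor...
s = t.storage()                     # ...and extract its storage
# To "convert" a storage to another dtype, go through a tensor instead of
# calling the removed float()/double()/etc storage methods:
s_half = t.to(torch.float16).storage()
```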

Original pull request: https://github.com/pytorch/pytorch/pull/59671

Reviewed By: soulitzer, ngimel

Differential Revision: D29466819

Pulled By: ezyang

fbshipit-source-id: 4a14e5d3c2b08e06e558683d97f7378a3180b00e
2021-10-05 13:50:34 -07:00
588c1787ba Update link to example pytorch/examples (#66095)
Summary:
`https://github.com/goldsborough/examples/tree/cpp/cpp` -> `https://github.com/pytorch/examples/tree/master/cpp`
As the C++ examples in https://github.com/pytorch/examples are more up to date.

Partially addresses https://github.com/pytorch/pytorch/issues/65388

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66095

Reviewed By: janeyx99

Differential Revision: D31382888

Pulled By: malfet

fbshipit-source-id: 8884c7795386249dea07cbe66783fa1dd963e07c
2021-10-05 12:48:12 -07:00
da0e29edd4 fixing sorting in stride indices (#63940)
Summary:
Updating `computeStrideProps` logic to break ties on stride_indices.

For two dimensions with identical strides, the size-1 dimension should be considered the faster one; otherwise, its stride would be the product of the existing stride and the size of the other dimension.

Note that there is still an inconsistency between eager memory_format and stride_properties in JIT; this is a design issue due to the ambiguity of size-1 strides. One example showing this has been disabled as a failing case in the added cpp test.
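
A small example of the ambiguity (illustrative): a contiguous tensor of shape (3, 1) reports identical strides for both dimensions, so a tie-breaking rule is needed to order them:

```
import torch

t = torch.randn(3, 1)
print(t.stride())  # (1, 1) -- both dimensions report stride 1
```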

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63940

Reviewed By: albanD

Differential Revision: D31227448

Pulled By: dzhulgakov

fbshipit-source-id: 51e3cd903757bef55d3158c057f9444d0cff7d2a
2021-10-05 12:30:41 -07:00
0d020effab [quant] Fix the parts that were missing after initial migration (#66058)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66058

After the initial migration from `torch.quantization` to `torch.ao.quantization`, some of the files did not change.
This happened because the migration was done in parallel, and some of the files were landed while the others were still in the original location.
This is the last fix in the AO migration phase 1, which completely enables the ao.quantization namespace.

Test Plan: `python test/test_quantization.py`

Reviewed By: vkuzo

Differential Revision: D31366066

Pulled By: z-a-f

fbshipit-source-id: bf4a74885be89d098df2d87e685795a2a64026c5
2021-10-05 11:45:37 -07:00
727576e501 [quant] Fixing the hypothesis test for topk (#66057)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66057

The current test creates sets that are too slow to generate.
This will cause either "Filtering too much" or "Timeout" errors in future versions of hypothesis.
This PR preemptively fixes the issue.

Test Plan: `python test/test_quantization.py`

Reviewed By: vkuzo

Differential Revision: D31366065

Pulled By: z-a-f

fbshipit-source-id: deaab4da8ee02a5dee8943cabdd30fc53d894a34
2021-10-05 11:43:56 -07:00
92d0b7e99c [deploy] fix typo in registerModuleSource (#66107)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66107

lol

Test Plan: Imported from OSS

Reviewed By: zdevito

Differential Revision: D31385631

Pulled By: suo

fbshipit-source-id: a3307e2862f7951c160776eb8edb18329c937ed1
2021-10-05 11:15:35 -07:00
458a00bacb Back out "[quant] update fused_obs_fake_quant op to accept output_fake_quant argument" (#66063)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66063

Original commit changeset: bffe776216d0

Test Plan: CI

Reviewed By: vkuzo

Differential Revision: D31347042

fbshipit-source-id: f56f628dc4690187bf284a8f2fda4c6aae10c1d6
2021-10-05 11:02:54 -07:00
2b39b80971 [quantized] Replace conv_p with convolution_op in qnnpack (#65783)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65783

convolution_op makes the conv_param struct redundant, since it contains all the params of conv_param and more. We don't need to pass both structs to qnnpack or hold both in the packed weights; let's just hold convolution_op.

This makes it easier to implement 3dconv since we won't have to template two structs. The conv_param struct is left in existence since tests rely on it to set up the convolution.
ghstack-source-id: 139479651

(Note: this ignores all push blocking failures!)

Test Plan: ci

Reviewed By: kimishpatel

Differential Revision: D30738727

fbshipit-source-id: e6d39644357b99d3b7491ae8a7066bf107eb8b9e
2021-10-05 11:01:26 -07:00
bda3230b62 slow_conv2d grad_weight: call gemm directly (#65726)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65726

This PR isn't strictly necessary since grad_weight doesn't use
parallel_for. However, this does reduce the function overhead and will
make it easier to parallelize in the future.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D31257877

Pulled By: ngimel

fbshipit-source-id: d8ea97cc1f43d8d9dfff355ae27c9d982838b57e
2021-10-05 10:53:22 -07:00
1db78c30c9 Fix LLVM-12 concat_split_op.h error (#66060)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66060

Fixes
```
testTumHistoryAdditionalLaser (caffe2.caffe2.fb.layers.tests.tum_history_test.TestTumHistory) ... caffe2/caffe2/operators/concat_split_op.h:363:74: runtime error: applying non-zero offset 8 to null pointer
    #0 0x7f8f39d29795 in caffe2::ConcatOp<caffe2::CPUContext>::RunOnDevice() caffe2/caffe2/operators/concat_split_op.h:363
    #1 0x7f8f39c4978d in caffe2::Operator<caffe2::CPUContext>::Run(int) caffe2/caffe2/core/operator.h:987
    #2 0x7f8f381fe9c9 in caffe2::SimpleNet::Run() caffe2/caffe2/core/net_simple.cc:67
    #3 0x7f8f38ee488e in caffe2::Workspace::RunNet(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) caffe2/caffe2/core/workspace.cc:289
```

Test Plan: Sandcastle

Reviewed By: dzhulgakov, xush6528

Differential Revision: D31366205

fbshipit-source-id: 566aa519677c9d371189e4b1f81d595732861efc
2021-10-05 10:48:56 -07:00
9c3eb50b7b [PyTorch] Use std::move() in a couple places in function_schema_parser.cpp (#66114)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66114

ghstack-source-id: 139712533

Test Plan: Build

Reviewed By: swolchok

Differential Revision: D31387502

fbshipit-source-id: e850cb7df397a7c5b31df995b23ad6e5c004ac86
2021-10-05 10:44:07 -07:00
aa80f05d2d Remove sync in Embedding caused by unique (#66091)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66091

Reviewed By: albanD

Differential Revision: D31385576

Pulled By: ngimel

fbshipit-source-id: e656d4d9c38b705c71853ca295f977d1cddc61a1
2021-10-05 09:39:42 -07:00
1932bc69e9 Move GHA to ONNX (#65975)
Summary:
- Delete CircleCI ONNX config
- Add sharded ONNX job to the list of generated workflows
- Move ONNX runtime installation from `pytorch-job-specs.yml` to `.jenkins/caffe2/test.sh`
- Limit MKLDNN to AVX2 ISA while running  Caffe2 tests

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65975

Reviewed By: seemethere

Differential Revision: D31327206

Pulled By: malfet

fbshipit-source-id: 15aa53e4481e846c62b4ee2db5c03047d68679a4
2021-10-05 09:31:57 -07:00
df475aa1dc Update Vulkan runner in benchmark binary to handle non-tensor inputs (#66123)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66123

Some models may take in a list of tensors as inputs; thus the bundled inputs will contain `IValue`s of type `c10::List`. For Vulkan models, every tensor in the `IValue` list has to be converted to a Vulkan tensor first, and this case is not currently handled by the Vulkan model wrapper in the benchmark binary.

This diff introduces `IValue` type checking to the input processor of the Vulkan model wrapper, and adds support for Tensor and List types.

Test Plan:
```
# Build the binary
cd ~/fbsource
buck build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 //xplat/caffe2:ptmobile_compareAndroid\#android-arm64 --show-output
# Push it to the device
adb push buck-out/gen/xplat/caffe2/ptmobile_compareAndroid\#android-arm64 /data/local/tmp/compare_models

# Run the benchmark binary
BENCH_CMD="/data/local/tmp/compare_models"
BENCH_CMD+=" --model=$PATH_TO_MODEL"
BENCH_CMD+=" --refmodel=$PATH_TO_REFERENCE_MODEL"
BENCH_CMD+=" --input_type=float --input_dims=$MODEL_INPUT_SIZE"
BENCH_CMD+=" --iter=100"
BENCH_CMD+=" --tolerance 1e-5"
```

Reviewed By: beback4u

Differential Revision: D31276862

fbshipit-source-id: 1d9abf958963da6ecad641202f0458402bee5ced
2021-10-05 07:59:56 -07:00
2a5116e159 [quant][fx2trt] Add quantize_per_channel in acc_ops and acc_ops_converter (#65287)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65287

Test Plan:
python torch/fx/experimental/fx2trt/example/quantized_resnet_test.py

Imported from OSS

Reviewed By: 842974287

Differential Revision: D31038882

fbshipit-source-id: cd20e132ffa85f6fb070e21cd96a9e84dd15fab5
2021-10-05 02:12:00 -07:00
d609957c95 patching graph_for (#55139)
Summary:
Allows an individual DifferentiableGraphOp to display its optimized forward graph. This improves user visibility into graph mutations performed by optimization passes, especially fusion.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55139

Reviewed By: albanD

Differential Revision: D31330909

Pulled By: dzhulgakov

fbshipit-source-id: c745b482fdc34876dc404cbe3bacd99dcf2ac724
2021-10-04 21:50:22 -07:00
ed50fa2513 [Static Runtime] Test isOptimizableContainerType and getAlwaysAliveValues (#65849)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65849

Add tests for some of `StaticModule`'s exposed methods. Both of these are used by the memory planner, so it would be helpful to have some unit tests that ensure our basic invariants don't break.

Test Plan: `buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest`

Reviewed By: hlu1

Differential Revision: D31282901

fbshipit-source-id: e390329f4794e034170507e3a0de0abcfe0ab7b9
2021-10-04 20:46:07 -07:00
4c4525fa5c Compile without -Wno-unused-variable (take 2) (#66041)
Summary:
Delete `-Wno-unused-variable` from top level `CMakeLists.txt`
Still suppress those warnings for tests and `torch_python`

Delete a number of unused variables from caffe2 code
Use `(void)var;` to suppress unused variables in range loops
Use `C10_UNUSED` for global constructors and use `constexpr` instead of `static` for global constants

Do not delete `caffe2::OperatorBase::Output` calls as they have side effects

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66041

Reviewed By: ngimel

Differential Revision: D31360142

Pulled By: malfet

fbshipit-source-id: 6fdfb9f91efdc49ca984a2f2a17ee377d28210c8
2021-10-04 20:39:39 -07:00
6b0aa2958d [FX] Support torch.layout as arg (#66048)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66048

Previously, create_arg would fail if it encountered a non-`None` layout argument. Adding `torch.layout` to the `BaseArgumentTypes` list should be enough to fix that.
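
A minimal sketch of a trace that exercises this (assuming `torch.empty_like` traces as a `call_function` node; the function body is illustrative):

```
import torch
from torch import fx

def f(x):
    # torch.layout values such as torch.strided now pass through create_arg.
    return torch.empty_like(x, layout=torch.strided)

g = fx.symbolic_trace(f)
```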

Test Plan: Added unittest

Reviewed By: jamesr66a

Differential Revision: D31362662

fbshipit-source-id: 20049971e18c17e9c75e50540500c567266daa55
2021-10-04 19:58:08 -07:00
6ea4902cf4 [ao_migration] torch.quantization --> torch.ao.quantization in caffe2/torch/fx (#66096)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66096

codemod -m -d caffe2/torch/fx --extensions py \
    'torch.quantization' \
    'torch.ao.quantization'

Test Plan: test_in_prod

Reviewed By: z-a-f

Differential Revision: D31294195

fbshipit-source-id: 00425844f8160749f68bdbdf0e08cb22c79099c9
2021-10-04 19:57:01 -07:00
de24faec5f Binary building wthout python fix (#66031)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/66030

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66031

Reviewed By: VitalyFedyunin

Differential Revision: D31356243

Pulled By: malfet

fbshipit-source-id: d1537bc65bbba5d6497ecb8db7160a397eca81fd
2021-10-04 18:34:35 -07:00
6eb3a1c831 Run master clang-tidy on PRs (#66104)
Summary:
Make the PR clang-tidy checks a strict superset of the master ones.
Should prevent a situation where [clang-tidy on a PR](https://github.com/pytorch/pytorch/runs/3773346094) was clean but regressed on a [trunk commit](https://github.com/pytorch/pytorch/runs/3773406183?check_suite_focus=true).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66104

Reviewed By: seemethere

Differential Revision: D31384608

Pulled By: malfet

fbshipit-source-id: 397319be3480520d58eab11ec001ad7a9a94d41c
2021-10-04 18:27:38 -07:00
7c758759e3 [PyTorch Edge] Avoid string copying in TypeParser (#64278)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64278

Use c10::string_view and const char* to copy less.
ghstack-source-id: 139468089

Test Plan:
Pixel 3 before: https://www.internalfb.com/intern/aibench/details/132239033718036
Pixel 3 after: https://www.internalfb.com/intern/aibench/details/132239033718036
went from mean of 293 ms to 281 ms.

Reviewed By: dhruvbird

Differential Revision: D30650712

fbshipit-source-id: abad143f2d5cc99a30e8da376c8e37716373032a
2021-10-04 16:10:38 -07:00
69da4b4381 GHA: make obvious when we are running smoke tests to user (#66011)
Summary:
This PR clarifies what's run on PRs by explicitly stating when it runs smoke tests for Windows CUDA, and it changes the logic so that user-defined labels override other workflow logic.

1. Move smoke tests to its own config.

2. Make sure that when a user specifies a ciflow label that is not the default, the workflow runs as if it is on trunk.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66011

Test Plan:
the default on PRs would generate this matrix (default replaced by smoke_tests)
![image](https://user-images.githubusercontent.com/31798555/135672182-64454ea3-ff43-4746-b8e4-09b0b28e9d33.png)
But when retriggered with a label, it looks like (note that there's no smoke_tests config):
![image](https://user-images.githubusercontent.com/31798555/135672601-5aa9a268-bc76-40f1-80c6-62b3fac6601d.png)

Reviewed By: VitalyFedyunin, seemethere

Differential Revision: D31355130

Pulled By: janeyx99

fbshipit-source-id: fed58ade4235b58176e1d1a24101aea0bea83aa4
2021-10-04 07:53:17 -07:00
4cdfceddd2 [Reland] Avoid saving self for softmax and log_softmax (#66018)
Summary:
Reland of https://github.com/pytorch/pytorch/pull/65242

The last attempt of the reland automatically rebased onto stable, which did not yet have the revert commit

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66018

Reviewed By: albanD

Differential Revision: D31348822

Pulled By: soulitzer

fbshipit-source-id: 881d701b404530c1352ac9245bd67264e1652b8a
2021-10-03 21:35:01 -07:00
8f5631b859 Refactor functional api vectorized jacobian to use batched grad parameter (#65566)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65566

This doesn't simplify vectorized jacobian computation, but it is good to consolidate the logic, and it helps us test that logic.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D31236257

Pulled By: soulitzer

fbshipit-source-id: 00ca0aa6519bed5f9ee2c7be4daa8872af5e92cd
2021-10-03 19:55:08 -07:00
73901b099d Add batched_grad parameter to autograd.grad (#65564)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65564

- wraps the call into the engine with vmap if `batched_grad` is `True` (see the usage sketch below)
- improves the comment on the call to engine (somewhat addressing https://github.com/pytorch/pytorch/issues/41659)
- borrows the message from functional.jacobian's vectorized argument concerning usage of the vmap feature
- adds basic test (further testing is done when we replace the usage in vectorized jacobian computation)

TODO:
 - create an issue tracking this
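
A minimal usage sketch with the parameter as named in this PR (shapes are illustrative):

```
import torch

x = torch.randn(3, requires_grad=True)
y = x ** 2
# Five grad_outputs batched along dim 0; with batched_grad=True the engine
# call is wrapped in vmap instead of looping over the batch.
v = torch.randn(5, 3)
(gx,) = torch.autograd.grad(y, x, grad_outputs=v, batched_grad=True)
print(gx.shape)  # torch.Size([5, 3])
```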

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D31236259

Pulled By: soulitzer

fbshipit-source-id: b33e6b26ea98fa9f70c44da08458fc54ba4df0f7
2021-10-03 19:55:06 -07:00
b6d5f1ee70 Allow None to pass through for vmap (#65565)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65565

Does jax allow this?

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D31236258

Pulled By: soulitzer

fbshipit-source-id: 80460b355fc32ecbba8151e1f3179f076a927f9d
2021-10-03 19:53:49 -07:00
89ed9bdaee [Static Runtime] Fix bug of creating output aliases in aten::embedding_bag (#65516)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65516

This change fixes a bug where Static Runtime's `aten::embedding_bag` out-variant implementation creates aliases among its managed output tensors.

Managed output tensors should never alias each other, since writing to one can illegally overwrite another's contents unintentionally; this exact problem was causing the bug at T97393697, making SR return wrong values.

This bug is detected in inline_cvr/remote_ro by a DCHECK, `verify_no_memory_overlap` (introduced by D30211705 (3fb33b38b9)), but wasn't found until now since our testing didn't include running the model in debug mode. Fortunately, this bug is not hitting production, since the aliased outputs are not used there.

This change fixes the root cause from `_embedding_bag_cpu_impl_out`  by replacing alias creation with copying.

Note that this change also includes a fundamental change in Static Runtime's unit testing: `testStaticRuntime` exercises the given graph 3 times:
 1. profile run
 2. run using the profile to allocate managed tensors
 3. reuse the managed tensors -- newly added

Adding step 3 reveals this bug via a new unittest, `EmbeddingBagWithManagedOutput`.

Test Plan:
- Confirmed that the crash experienced by `StaticRuntime.EmbeddingBagWithManagedOutput` disappears with this change (crash paste: P459807248).

- Added `StaticRuntime.EmbeddingBagWithManagedOutput` to detect the same problem in the future.

Reviewed By: hlu1

Differential Revision: D31104345

fbshipit-source-id: 7bddf9cd82b400d18d8ce1bf15e29b815ef9ba8f
2021-10-03 15:10:58 -07:00
40948a935d Fix LLVM-12 UB in generate_proposals_op.cc (#66009)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66009

Fixes
```
test_trace_c10_ops (jit.test_tracer.TestTracer) ... third-party-buck/platform009/build/eigen/include/Eigen/src/Core/Block.h:374:24: runtime error: applying non-zero offset 4 to null pointer
    #0 0x7f5228f72227 in Eigen::internal::BlockImpl_dense<Eigen::Map<Eigen::Array<float, -1, -1, 1, -1, -1>, 0, Eigen::Stride<0, 0> >, -1, -1, false, true>::BlockImpl_dense(Eigen::Map<Eigen::Array<float, -1, -1, 1, -1, -1>, 0, Eigen::Stride<0, 0> >&, long, long, long, long) third-party-buck/platform009/build/eigen/include/Eigen/src/Core/Block.h:374
    #1 0x7f5228f7212c in Eigen::BlockImpl<Eigen::Map<Eigen::Array<float, -1, -1, 1, -1, -1>, 0, Eigen::Stride<0, 0> >, -1, -1, false, Eigen::Dense>::BlockImpl(Eigen::Map<Eigen::Array<float, -1, -1, 1, -1, -1>, 0, Eigen::Stride<0, 0> >&, long, long, long, long) third-party-buck/platform009/build/eigen/include/Eigen/src/Core/Block.h:166
    #2 0x7f5228f720dc in Eigen::Block<Eigen::Map<Eigen::Array<float, -1, -1, 1, -1, -1>, 0, Eigen::Stride<0, 0> >, -1, -1, false>::Block(Eigen::Map<Eigen::Array<float, -1, -1, 1, -1, -1>, 0, Eigen::Stride<0, 0> >&, long, long, long, long) third-party-buck/platform009/build/eigen/include/Eigen/src/Core/Block.h:142
    #3 0x7f5229b0e059 in Eigen::DenseBase<Eigen::Map<Eigen::Array<float, -1, -1, 1, -1, -1>, 0, Eigen::Stride<0, 0> > >::FixedBlockXpr<internal::get_fixed_value<int>::value, internal::get_fixed_value<long>::value>::Type Eigen::DenseBase<Eigen::Map<Eigen::Array<float, -1, -1, 1, -1, -1>, 0, Eigen::Stride<0, 0> > >::block<int, long>(long, long, int, long) third-party-buck/platform009/build/eigen/include/Eigen/src/Core/../plugins/BlockMethods.h:98
    #4 0x7f5229b0c5ca in caffe2::GenerateProposalsOp<caffe2::CPUContext>::RunOnDevice() caffe2/caffe2/operators/generate_proposals_op.cc:348
```
Also cleans up some data type and const issues around the area.

Test Plan: Sandcastle

Reviewed By: xush6528

Differential Revision: D31343046

fbshipit-source-id: fd9096c8e47a0aad529c72fd313f64ca98dcb80b
2021-10-03 12:50:21 -07:00
c7748fc172 Added validation of mode parameter in AveragedModel (#65921)
Summary:
Discussion: https://github.com/pytorch/pytorch/pull/65495#issuecomment-930460469

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65921

Reviewed By: albanD

Differential Revision: D31310105

Pulled By: prabhat00155

fbshipit-source-id: 417691832a7c793744830c11e0ce53e3972d21a3
2021-10-03 08:42:28 -07:00
0fc6bd2e47 [gpu ne eval] disable adam decay unit test for gpu (#66056)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66056

We keep running into this unrelated failure when landing diffs for the GPU inference project,
so we disable this operator's unit test on GPU, because the operator doesn't exist there:

RuntimeError: [enforce fail at operator.cc:277] op. Cannot create operator of type 'SmartDecaySparseAdam' on the device 'CUDA'. Verify that implementation for the corresponding device exist. It might also happen if the binary is not linked with the operator implementation code. If Python frontend is used it might happen if dyndep.InitOpsLibrary call is missing. Operator def: input: "param" input: "mom1" input: "mom2" input: "last_seen" input: "indices" input: "grad" input: "lr" input: "iter" output: "param" output: "mom1" output: "mom2" output: "last_seen" name: "" type: "SmartDecaySparseAdam" arg { name: "beta1" f: 0 } arg { name: "beta2" f: 0.9 } arg { name: "epsilon" f: 1e-05 } device_option { device_type: 1 }

https://www.internalfb.com/intern/testinfra/diagnostics/5910974579962988.562949996565057.1633122845/

Test Plan: sandcastle

Reviewed By: jianyuh

Differential Revision: D31364731

fbshipit-source-id: 7fbd994cbe7f6ca116f5f34506a1ed7f14759bdf
2021-10-03 07:40:23 -07:00
29c0725e8a Back out "[caffe2] fix LLVM-12 nullptr-with-nonzero-offset UBSAN error" (#66055)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66055

Original commit changeset: c31f179f8a7d

Reviewed By: igorsugak

Differential Revision: D31353348

fbshipit-source-id: 73d928e5c938ba604a7f9ea17a6250b57306e88f
2021-10-02 16:46:26 -07:00
7c52963350 [WIP] skip constant folding dequant node (#63991)
Summary:
This PR makes Constant Propagation ignore dequant nodes.

https://github.com/pytorch/pytorch/issues/61092

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63991

Reviewed By: pbelevich

Differential Revision: D31363993

Pulled By: Krovatkin

fbshipit-source-id: 99f7c56a4381aff2cbdf1167508414cf240e9f75
2021-10-02 15:30:43 -07:00
8a307640db selective trt import based whether we have gpu or not (#66045)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66045

Att.

Reviewed By: kflu

Differential Revision: D31357388

fbshipit-source-id: 601affe067e5e4c1f1516dff4ac84fa9cdd27d5e
2021-10-02 06:12:37 -07:00
8b8012a165 [PyTorch Edge] Skip writing version during backport (#65842)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65842

During backport, only parts of the model (like bytecode.pkl) need to be re-written, while the rest of the model stays the same. However, `version` is always re-written when `PyTorchStreamWriter` is destructed.

Change version writing to be optional, and add an API that allows skipping it when closing the writer.
ghstack-source-id: 139580386

Test Plan: buck run papaya/scripts/repro:save_load

Reviewed By: iseeyuan, tugsbayasgalan

Differential Revision: D31262904

fbshipit-source-id: 3b8a5e1aaa610ffb0fe8a616d9ad9d0987c03f23
2021-10-01 21:18:31 -07:00
7941590a51 [JIT] Selectively enable precise alias analysis for TupleConstruct (#66025)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66025

This change adds an option to selectively enable precise alias analysis for `prim::TupleConstruct` (introduced by D30437737 (cd458fe092)) to limit its exposure to `StaticRuntime` only, as of now.

Test Plan: Modified existing unit tests whose behavior depends on D30437737 (cd458fe092).

Reviewed By: eellison

Differential Revision: D31350285

fbshipit-source-id: 3ce777f07f99650d74634481ad0805192dce55c6
2021-10-01 20:42:22 -07:00
e4ee5ca698 Revert D31326599: [pytorch][PR] Compile without -Wno-unused-variable
Test Plan: revert-hammer

Differential Revision:
D31326599 (a6280ab653)

Original commit changeset: 924155f1257a

fbshipit-source-id: b8ee5bc0298637443232f5ee9ec79e51ed256faf
2021-10-01 20:40:47 -07:00
5ef350d7cc Revert D31359010: [pytorch][PR] Fix cang-tidy regressions caused by #65954
Test Plan: revert-hammer

Differential Revision:
D31359010 (c269f471f4)

Original commit changeset: dce4b91a9891

fbshipit-source-id: 085417432b6748d3672b9b7141460f47d1c17a7f
2021-10-01 20:35:35 -07:00
c269f471f4 Fix cang-tidy regressions caused by #65954 (#66040)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66040

Reviewed By: ZolotukhinM

Differential Revision: D31359010

Pulled By: malfet

fbshipit-source-id: dce4b91a98913c8d8c2d8f9ebc49654265239158
2021-10-01 19:50:53 -07:00
ca76e193a3 Fix nll_backward for negative weights (#64572)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64572

Fixes https://github.com/pytorch/pytorch/issues/64256
It also fixes an inconsistent treatment of the case `reduction = "mean"`
when the whole target is equal to `ignore_index`. It now returns `NaN`
in this case, consistently with what it returns when computing the mean
over an empty tensor.

We add tests for all these cases.
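
A small repro of that edge case (illustrative values):

```
import torch
import torch.nn.functional as F

logits = torch.randn(2, 5)
target = torch.tensor([3, 3])          # every target equals ignore_index
loss = F.nll_loss(F.log_softmax(logits, dim=1), target,
                  ignore_index=3, reduction="mean")
print(loss)  # tensor(nan) -- mean over zero contributing elements
```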

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D31116297

Pulled By: albanD

fbshipit-source-id: cc44e79205f5eeabf1efd7d32fe61e26ba701b52
2021-10-01 19:41:51 -07:00
eb3b9fe719 [XROS][ML] System specific adjustments for UTs to work. (#65245)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65245

Building and running c10 and qnnpack tests on XROS.

Notable changes:
- Adding `#if defined(_XROS_)` in a few places not supported by XROS
- Changing Threadpool to an abstract class
ghstack-source-id: 139513579

Test Plan: Run c10 and qnnpack tests on XROS.

Reviewed By: veselinp, iseeyuan

Differential Revision: D30137333

fbshipit-source-id: bb6239b935187fac712834341fe5a8d3377762b1
2021-10-01 18:15:14 -07:00
363ccb257d GELU acc OP (#65957)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65957

Added accelerator ops and a unit test for GELU.

Test Plan: buck test glow/fb/fx/oss_acc_tracer:test_acc_tracer

Reviewed By: 842974287

Differential Revision: D31277083

fbshipit-source-id: f66dd05ef574db58cfa599e3575f95f1ebe82e93
2021-10-01 17:49:53 -07:00
a6280ab653 Compile without -Wno-unused-variable (#65954)
Summary:
Delete `-Wno-unused-variable` from top level `CMakeLists.txt`
Still suppress those warnings for tests and `torch_python`

Delete a number of unused variables from caffe2 code
Use `(void)var;` to suppress unused variable in range loops
Use `C10_UNUSED` for global constructors and use `constexpr` instead of `static` for global constants

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65954

Reviewed By: ngimel

Differential Revision: D31326599

Pulled By: malfet

fbshipit-source-id: 924155f1257a2ba1896c50512f615e45ca1f61f3
2021-10-01 17:40:47 -07:00
10f6294281 Fix shape inference dim_type for Clip, Mean, Div (#65996)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65996

Test Plan:
Facebook
```
buck build caffe2/caffe2/opt:bound_shape_inference_test && ./buck-out/gen/caffe2/caffe2/opt/bound_shape_inference_test --gtest_filter=*Clip*
```
```
buck build caffe2/caffe2/opt:bound_shape_inference_test && ./buck-out/gen/caffe2/caffe2/opt/bound_shape_inference_test --gtest_filter=*Div*
```
```
buck build caffe2/caffe2/opt:bound_shape_inference_test && ./buck-out/gen/caffe2/caffe2/opt/bound_shape_inference_test --gtest_filter=*Mean*
```

Reviewed By: yinghai

Differential Revision: D31121298

fbshipit-source-id: f366d8f4d4d0be159b62bfaafc42ca924c05e022
2021-10-01 17:34:34 -07:00
e1d963e8fc model_dump: Fix memory computation when both constants and data tensors are present (#66006)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66006

Previously, this was resulting in a key collision and a crash.
ghstack-source-id: 139342089

Test Plan: Ran webdriver test locally.

Reviewed By: dhruvbird

Differential Revision: D31281092

fbshipit-source-id: f31311726c681d6d7e0504ff8e84c888af9054f0
2021-10-01 16:31:06 -07:00
23caeb3f71 model_dump: Add a helper to produce html with a single call (#66005)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66005

ghstack-source-id: 139342091

Test Plan: Unit test, and used in a notebook.

Reviewed By: dhruvbird

Differential Revision: D31281091

fbshipit-source-id: 1e4d0713b9796a3d182de9e676c3b3c3b1610d6e
2021-10-01 16:29:43 -07:00
d9a95e66f0 Upload test failures to RDS (#65873)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65873

Test Plan: Imported from OSS

Reviewed By: janeyx99

Differential Revision: D31296520

Pulled By: driazati

fbshipit-source-id: 0bd3fb6b62e49c7177199001fda0e7b124a22ab2
2021-10-01 16:25:51 -07:00
f85d7422bb [fx2trt]add support for torch.tile (#66016)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66016

Add acc_ops.tile and converter for it.
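
For reference, the eager-mode behavior being lowered:

```
import torch

x = torch.tensor([[1, 2], [3, 4]])
y = torch.tile(x, (2, 2))  # repeat twice along each dim -> shape (4, 4)
```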

Test Plan: buck test mode/dev-nosan caffe2/torch/fb/fx2trt:test_tile

Reviewed By: wushirong

Differential Revision: D30587939

fbshipit-source-id: 1e2613cfca486fe54fcc0d38e5c7cdeb7d0ed4a0
2021-10-01 16:06:09 -07:00
060e41eafa Forward fix type hint for DataLoader (#66001)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66001

Test Plan: Imported from OSS

Reviewed By: NivekT

Differential Revision: D31340565

Pulled By: ejguan

fbshipit-source-id: d05ae42ebf93f61d781dc5d81ef0222e24f5acb3
2021-10-01 15:48:45 -07:00
ad889d0b5e Revert D30634700: [pytorch][PR] Fix typo in tensor docs
Test Plan: revert-hammer

Differential Revision:
D30634700 (d937473709)

Original commit changeset: e8952be20966

fbshipit-source-id: b18694e332023abcdf17ec1900b81b00d21f1014
2021-10-01 15:23:38 -07:00
7d22007902 [fx-acc] add acc_op optimization flags and decorator (#65928)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65928

This diff adds a decorator for adding flags to acc_ops. These flags inform graph optimizations that the op is eligible for optimization by some general criteria (e.g. op acts elementwise, op does quantization).

This makes it simpler to expand acc_ops. The user can add an op and add flags to enable optimization without going through all graph opts and trying to determine if the new acc_op is eligible for each graph optimization.

Even though our list of graph opts is small now, we already see the benefit: for `sink_reshape_ops` we had hardcoded 11 pointwise acc_ops, while there are now 24.

Test Plan:
```
buck test mode/opt glow/fb/fx/graph_opts:test_fx_sink
```

```
Parsing buck files: finished in 0.5 sec
Downloaded 0/3 artifacts, 0.00 bytes, 100.0% cache miss (for updated rules)
Building: finished in 37.1 sec (100%) 10279/10279 jobs, 3/10279 updated
  Total time: 37.7 sec
More details at https://www.internalfb.com/intern/buck/build/e13521bb-6142-4960-8cdd-6b5e4780da96
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: 16260a2a-d364-4605-9111-6f2a19317036
Trace available for this run at /tmp/tpx-20210922-124332.623880/trace.log
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/4222124720425564
    ✓ ListingSuccess: glow/fb/fx/graph_opts:test_fx_sink - main (6.038)
    ✓ Pass: glow/fb/fx/graph_opts:test_fx_sink - test_no_sink_concat_below_quantize (glow.fb.fx.graph_opts.tests.test_fx_sink.TestSink) (0.036)
    ✓ Pass: glow/fb/fx/graph_opts:test_fx_sink - test_sink_concat_below_quantize (glow.fb.fx.graph_opts.tests.test_fx_sink.TestSink) (0.048)
    ✓ Pass: glow/fb/fx/graph_opts:test_fx_sink - test_sink_reshape_nodes (glow.fb.fx.graph_opts.tests.test_fx_sink.TestSink) (0.058)
    ✓ Pass: glow/fb/fx/graph_opts:test_fx_sink - test_no_sink (glow.fb.fx.graph_opts.tests.test_fx_sink.TestSink) (0.057)
Summary
  Pass: 4
  ListingSuccess: 1
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/4222124720425564
```

Reviewed By: jfix71

Differential Revision: D31121321

fbshipit-source-id: 6f6e3b8e2d57ea30766fa6bee34ca207cec86f0f
2021-10-01 15:19:35 -07:00
d937473709 Fix typo in tensor docs (#64160)
Summary:
Remove extra character from `torch.qfint32`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64160

Test Plan: Docs

Reviewed By: jerryzh168

Differential Revision: D30634700

Pulled By: axitkhurana

fbshipit-source-id: e8952be20966b9a3f9d62d9957ae255d5d4889bb
2021-10-01 14:57:55 -07:00
8e8695285f Re-generate workflows (#66027)
Summary:
Fix master breakage

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66027

Reviewed By: suo, malfet

Differential Revision: D31353922

Pulled By: driazati

fbshipit-source-id: cdb7f639608999b6ee72f6b1000d7ecbc02efc95
2021-10-01 14:56:51 -07:00
894d296bae Remove usage of GitHub's artifact store in linux jobs (#65875)
Summary:
The docs upload is unnecessary since the docs are hosted in S3 anyway, and the reports are mirrored in S3, which has better upload/download speed and is available as soon as the upload is done rather than once the workflow completes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65875

Reviewed By: seemethere

Differential Revision: D31296500

Pulled By: driazati

fbshipit-source-id: 8c371230d0c8c0eb785702df9ae495de85f60afa
2021-10-01 13:49:44 -07:00
6e8ffd191e Fix typo in name of LayerNormBackwardCUDAKernel (#66000)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66000

Saw this in nvprof and I'm just a little too nitpicky to let it slide!
ghstack-source-id: 139547271

Test Plan: CI

Reviewed By: xiaomengy

Differential Revision: D31340262

fbshipit-source-id: ab48dc99c34a74585e66800b4bbcccc6aabbaff2
2021-10-01 12:28:59 -07:00
ffede499b2 [PyTorch][Static Runtime] Fast path for contiguous to_copy (#65499)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65499

When the tensors in question are contiguous, there is no need to go through dispatch, use TensorIterator, etc.
ghstack-source-id: 139549027

Test Plan:
Ran ptvsc2_predictor_bench for ctr_mobile_feed local net following https://fb.quip.com/q8hBAFGMeaOU (but without the profile and compare_results options).

Before:

I0922 14:00:32.261942 3132627 PyTorchPredictorBenchLib.cpp:312] PyTorch run finished. Milliseconds per iter: 7.18124. Iters per second: 139.252
I0922 14:01:44.865965 3132627 PyTorchPredictorBenchLib.cpp:312] PyTorch run finished. Milliseconds per iter: 7.25314. Iters per second: 137.871
I0922 14:02:56.929602 3132627 PyTorchPredictorBenchLib.cpp:312] PyTorch run finished. Milliseconds per iter: 7.1986. Iters per second: 138.916
I0922 14:04:05.923025 3132627 PyTorchPredictorBenchLib.cpp:312] PyTorch run finished. Milliseconds per iter: 6.89211. Iters per second: 145.093
I0922 14:05:17.953056 3132627 PyTorchPredictorBenchLib.cpp:312] PyTorch run finished. Milliseconds per iter: 7.19577. Iters per second: 138.971

mean: 7.144172, stddev: 0.1283

After:

I0922 13:51:55.233937 3086245 PyTorchPredictorBenchLib.cpp:312] PyTorch run finished. Milliseconds per iter: 6.79709. Iters per second: 147.122
I0922 13:53:03.062682 3086245 PyTorchPredictorBenchLib.cpp:312] PyTorch run finished. Milliseconds per iter: 6.77605. Iters per second: 147.579
I0922 13:54:10.230386 3086245 PyTorchPredictorBenchLib.cpp:312] PyTorch run finished. Milliseconds per iter: 6.70993. Iters per second: 149.033
I0922 13:55:18.403434 3086245 PyTorchPredictorBenchLib.cpp:312] PyTorch run finished. Milliseconds per iter: 6.81044. Iters per second: 146.833
I0922 13:56:26.568646 3086245 PyTorchPredictorBenchLib.cpp:312] PyTorch run finished. Milliseconds per iter: 6.80965. Iters per second: 146.85

mean: 6.800632, stddev: 0.013227

Looks like about a 5.3% improvement.

Reviewed By: hlu1

Differential Revision: D31125492

fbshipit-source-id: 92ab5af242d0a84dcf865323a57b48e8374eb823
2021-10-01 12:13:33 -07:00
7b10a76e05 [PyTorch] Try removing Android strtod implementation (#65713)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65713

This may not be needed anymore.
ghstack-source-id: 139114284

Test Plan: see if it builds

Reviewed By: dhruvbird

Differential Revision: D31216245

fbshipit-source-id: 29c9c013f94070c7713e46027881cb693b144d36
2021-10-01 11:43:15 -07:00
176d3c6fb4 [PyTorch] Fix many Tuple::elements() callsites (#64065)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64065

It is only safe to mutate Tuple elements if you are the sole owner
of the tuple. The most efficient way to do this, then, is
`std::move(*std::move(tupleIValue).toTuple()).elements()` (the
innermost move allows `IValue::toTuple()` to avoid a refcount bump and
the outermost move allows the element vector to be moved out of the
tuple), but many callsites write simply
`tupleIValue.toTuple().elements()`, which incurs many extra refcount
bumps.

ghstack-source-id: 139468088

Test Plan: CI

Reviewed By: ezyang

Differential Revision: D30592621

fbshipit-source-id: e8312de866de09b9ea2a62e5128cbf403ee16f09
2021-10-01 11:36:05 -07:00
f14e5e636d [fx2trt]fix slice tensor converter (#65960)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65960

Fix a bug in the converter and add support for negative dim.

Test Plan: buck test mode/dev-nosan caffe2/torch/fb/fx2trt:test_narrow

Reviewed By: wushirong

Differential Revision: D31310232

fbshipit-source-id: 62887369d830202cae6d63b41747225b12dcf754
2021-10-01 11:29:42 -07:00
21eebc9fd6 [PyTorch][easy] Use copy-and-move instead of copy-and-swap in IValue::operator= (#65826)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65826

Should be marginally more efficient.
ghstack-source-id: 139315050

Test Plan: CI

Reviewed By: ezyang

Differential Revision: D31272489

fbshipit-source-id: 7c309d67a0ec0ada35a5b62497bac374538394a9
2021-10-01 11:16:42 -07:00
592481a5cc [fx][const_fold] Refactor to use base split module to simplify, and correctly handle non-single-Tensor outputs (#65933)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65933

We use `split_module` to split the input model that we want to const fold into const and non-const subgraphs. Previously we were taking the non-const graph and trying to hack it back into the same signature as the input model. However, this was complex and buggy.

Instead, refactor to just keep using the base split module that contains both const and non-const graphs. This means we:
- Inline the non-const graph into the split module
- Remove the const graph from the module and replace it with a getattr that will be run to insert that attr when we `run_folding`

Test Plan: Added test coverage to cover newly supported folding, and updated other tests for new strategy.

Reviewed By: yinghai

Differential Revision: D31293307

fbshipit-source-id: 6e283a8c7222cf07b14e30e74dffc8ae5ee8b55f
2021-10-01 10:26:29 -07:00
34682377b9 [iOS][CI] Update dev certs (#66004)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/65988

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66004

Reviewed By: xta0

Differential Revision: D31340893

Pulled By: malfet

fbshipit-source-id: 3bf0be266e9686a73d62e86c5cf0bebeb0416260
2021-10-01 09:38:49 -07:00
ccf8d48f16 Revert D31317680: [pytorch][PR] Avoid saving self for softmax and log_softmax
Test Plan: revert-hammer

Differential Revision:
D31317680 (5f7cadc7aa)

Original commit changeset: b3b921e06775

fbshipit-source-id: 1bca0672383536a2c21243ceb52349c766a94344
2021-10-01 09:31:44 -07:00
21da6ae9ce suppress mypy error (#66003)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66003

Differential Revision:
D31340874

Test Plan: Imported from OSS

Reviewed By: seemethere

Pulled By: suo

fbshipit-source-id: d9ef0f40625fe5ff21f8a5e044d5a75400367dc2
2021-10-01 09:17:42 -07:00
eac218dbc6 Revert "Port sort kernel to structured kernels. (#62391)" (#65876)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65876

This reverts commit 93852bb2d41d90b6ac660015d79f7474bcebb774.

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D31296329

Pulled By: bdhirsh

fbshipit-source-id: 85eae72f2346d69290f440f5393a7da096a96c6e
2021-10-01 07:50:28 -07:00
5f7cadc7aa Avoid saving self for softmax and log_softmax (#65242)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/64000
 - updates double backward formula to compute grad wrt output instead of self (see the sketch below)
 - ~~In some of the error messages, we still refer to the dtype of the input, even though we are now checking the dtype of the output~~
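
A sketch of why saving the output suffices: the softmax VJP can be written purely in terms of the output `y` (dim handling simplified for illustration):

```
import torch

def softmax_backward(grad_out, y, dim=-1):
    # dL/dx = y * (dL/dy - sum(dL/dy * y))
    return y * (grad_out - (grad_out * y).sum(dim, keepdim=True))

x = torch.randn(3, 5, requires_grad=True)
y = torch.softmax(x, dim=-1)
g = torch.randn_like(y)
auto_grad, = torch.autograd.grad(y, x, g)
assert torch.allclose(auto_grad, softmax_backward(g, y.detach()), atol=1e-6)
```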

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65242

Reviewed By: malfet

Differential Revision: D31317680

Pulled By: soulitzer

fbshipit-source-id: b3b921e06775cfc12e5a97a9ee8d73aec3aac7c3
2021-10-01 07:49:07 -07:00
383c0a3858 Fix internal assert failure for torch.all and torch.any with requires_grad=True (#65714)
Summary:
This PR fixes https://github.com/pytorch/pytorch/issues/58547.
I added an OpInfo-based test that fails on master and passes with the
proposed changes.

cc ezyang albanD zou3519 gqchen pearu nikitaved soulitzer Lezcano Varal7 mruberry

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65714

Reviewed By: saketh-are, mruberry

Differential Revision: D31248307

Pulled By: albanD

fbshipit-source-id: 041eaa9b744c3043f78dd8ae5f457f67c311df4f
2021-10-01 07:32:44 -07:00
53c0d91db9 Make autograd codegen for differentiable outputs safer to use (#65823)
Summary:
This PR makes codegen raise an error when `len(output_differentiability) != len(outputs)`.

The notes in derivatives.yaml say that
> 'output_differentiability' and value a list of the same length as the number of outputs from the forward function.

but this was not enforced in codegen, leading to confusion and unexpected bugs https://github.com/pytorch/pytorch/issues/65061#issuecomment-930271126.

cc ezyang albanD zou3519 gqchen pearu nikitaved soulitzer Lezcano Varal7

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65823

Reviewed By: mrshenli

Differential Revision: D31307312

Pulled By: albanD

fbshipit-source-id: caeb949e9249310dffd237e77871e6d0d784e298
2021-10-01 07:27:57 -07:00
bff8d8fd28 [nnc] Add BufHandle.store to python API (#65213)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65213

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D31328502

Pulled By: bertmaher

fbshipit-source-id: 1f260f68692c3859350587afe021a500672d79f0
2021-10-01 06:59:50 -07:00
8cf047afac [nnc] Add call_with_numel interface for fast CUDA calls (#65213)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65213

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D31319012

Pulled By: bertmaher

fbshipit-source-id: 93fee80f956795470f5a2ce3b33c2ea2f132036f
2021-10-01 06:58:37 -07:00
8595b6eeed Avoid UB when indexing into size-0 tensors (#65878)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65878

If we attempt to compute an offset into an empty tensor, we'd be adding an
offset to a nullptr, which is UB (https://reviews.llvm.org/D67122) even if we
never dereference the resulting pointer.

Since indexing into an empty tensor yields an empty tensor anyways, let's just
return the underlying (null) data ptr in this case.

ghstack-source-id: 139448496

Test Plan:
r-barnes originally pointed this out to me in a failing TE fuser test:
https://www.internalfb.com/intern/testinfra/diagnostics/5910974579561425.281475022329152.1632898053/
```
buck test mode/dev //caffe2/test:jit -- --exact 'caffe2/test:jit - test_unsupported_nn_functional_pad_circular_cpu_float32 (test_jit_fuser_te.TestNNCOpInfoCPU)'
```

But it turns out it's easily triggered by anything that tries to operate on a
slice of a size-0 tensor:
```
def test_pad(self):
    F.pad(torch.ones(0, 3, 3), (1, 2), 'circular')

def test_index(self):
    input = torch.zeros(0, 3, 3)
    out = torch.zeros(0, 3, 6)
    out[..., 1:4] = input[..., 0:3]

def test_add(self):
    torch.ones(0, 2)[:, 1] + torch.ones(0, 1)
```

What's the right place for these sorts of operator corner-case tests? Should
they be (or are they already) part of OpInfo?

Reviewed By: jamesr66a

Differential Revision: D31296914

fbshipit-source-id: 0ef52ad311dceeed985498f8d9390bc6fbaefbfc
2021-10-01 06:55:15 -07:00
fc52f1293e Improve pytorch type hints (Dataloader, trig functions)
Summary:
This is to fix Pyre errors in our applications:
* calling `tensor.cos()` etc.
* creating a data loader with batch sampler that is `List[List[int]]`.

Test Plan: TODO: rebase the diffs and run Pyre.

Reviewed By: ejguan

Differential Revision: D31309564

fbshipit-source-id: 1c6f3070d7570260de170e2fe2153d277b246745
2021-10-01 06:53:57 -07:00
982ef8837b [Static Runtime] Fuse ListUnpack + gather_ranges_to_dense (#65116)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65116

Fuse `fb::gather_ranges_to_dense` with `prim::ListUnpack`.
```
%0 : Tensor[] = fb::gather_ranges_to_dense(...)
%1: Tensor, %2: Tensor, ... = prim::ListUnpack(%0)
```
turns into:
```
%0: Tensor, %1: Tensor, ... = fb::gather_ranges_to_dense(...)
```

Reviewed By: hlu1

Differential Revision: D30973525

fbshipit-source-id: f0349baa1622b697ee2ab652376a24ec0d89e819
2021-10-01 06:49:54 -07:00
227e37dd39 pytorch quantization ao migration phase 2: caffe2/test (#65832)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65832

Renames `torch.quantization` to `torch.ao.quantization` in `caffe2/test`
folder.

```
find caffe2/test/ -type f -name "*.py" -print0 | xargs -0 sed -i "s/torch\.quantization/torch.ao.quantization/g"
HG: manually revert the files testing this migration
hg revert caffe2/test/quantization/ao_migration/common.py
hg revert caffe2/test/quantization/ao_migration/test_ao_migration.py
```

Test Plan: CI

Reviewed By: z-a-f

Differential Revision: D31275754

fbshipit-source-id: 4ed54a74525634feb0f47a26d071102e19c30049
2021-10-01 06:26:30 -07:00
dac35b3592 pytorch quantization ao migration phase 2: torch/jit (#65829)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65829

Renames `torch.quantization` to `torch.ao.quantization` in `torch/jit` folder.

```
find caffe2/torch/jit/ -type f -name "*.py" -print0 | xargs -0 sed -i "s/torch\.quantization/torch.ao.quantization/g"
```

Test Plan: CI

Reviewed By: z-a-f

Differential Revision: D31273365

fbshipit-source-id: 350eb116148d91b967d428b54413caee4fd68438
2021-10-01 06:22:22 -07:00
e3af4be963 pytorch quantization ao migration phase 2: caffe2/benchmark (#65833)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65833

Renames `torch.quantization` to `torch.ao.quantization` in `caffe2/benchmarks`
folder.

```
find caffe2/benchmarks/ -type f -name "*.py" -print0 | xargs -0 sed -i "s/torch\.quantization/torch.ao.quantization/g"
```

Test Plan: CI

Reviewed By: z-a-f

Differential Revision: D31275963

fbshipit-source-id: 8596bf28df5c3ad2c4490ac8abb285d6517c0116
2021-10-01 06:17:36 -07:00
c1447f06a8 [special] special alias for softmax (#62251)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/50345

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62251

Reviewed By: H-Huang

Differential Revision: D31141834

Pulled By: mruberry

fbshipit-source-id: aecaf62af248e9034ef589159ce0fb325c729493
2021-10-01 03:55:32 -07:00
c27b427cd9 [sparsity] Add m-out-of-n support in the WeightNormSparsifier (#65295)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65295

The m-out-of-n is implemented as follows:

1. Compute the blocks that need to be sparsified using the weight-norm criterion
2. Within each block below the threshold find the smallest absolute value elements
3. Zero out only the smallest values within each block

m-out-of-n describes a sparsification scheme where, in a block with "n" elements, only "m" of them are zeroed out.
Block sparsity, with the whole block being all zeros, is a special case of m-out-of-n: if m==n, the whole block is reset.

This echoes the implementation described in https://github.com/pytorch/pytorch/issues/59835,
and also meets the NVIDIA cuSPARSELt requirements.
To support CUDA (2:4) sparsity, one would need to set the sparsity_level to 1.0.
That translates to all blocks of shape 1x4 within a tensor being sparsified with the 2-out-of-4 scheme (see the sketch below).
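
A sketch of configuring 2-out-of-4 sparsity; the constructor argument names are assumptions based on this summary:

```
from torch.ao.sparsity import WeightNormSparsifier

sparsifier = WeightNormSparsifier(
    sparsity_level=1.0,         # consider every block in the tensor
    sparse_block_shape=(1, 4),  # "n" = 4 elements per block
    zeros_per_block=2,          # "m" = 2 zeros per block
)
```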

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D31186828

Pulled By: z-a-f

fbshipit-source-id: 7bd3e2707915b90f4831859781fc6e25f716c618
2021-10-01 03:19:15 -07:00
8b1aa85388 [sparsity] Change API to take FQNs as configuration (#65296)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65296

The original API described in the https://github.com/pytorch/pytorch/issues/59835
assumed that the per-layer configuration would take a module/layer
reference. However, a more useful approach is to refer to the layers
by their fully qualified names (FQN). That allows us to store the
configuration in a file without serializing the models.

We define a layer's FQN as its "path" within a model. For example,
if one can refer to a module using `model.layer0.sublayerX`, the FQN
of sublayerX is `'layer0.sublayerX'`, as sketched below.
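
A minimal helper matching the FQN definition above (illustrative, not the actual diff code):

```
def get_submodule_by_fqn(model, fqn):
    mod = model
    for atom in fqn.split('.'):
        mod = getattr(mod, atom)
    return mod

# get_submodule_by_fqn(model, 'layer0.sublayerX') returns model.layer0.sublayerX
```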

Test Plan:
```
python test/test_ao_sparsity.py -- TestBaseSparsifier
buck test mode/opt //caffe2:test -- TestBaseSparsifier
```

Reviewed By: gchanan

Differential Revision: D31186830

Pulled By: z-a-f

fbshipit-source-id: d8d87f1c054e5c10d470e67837476a11e0a9b1d4
2021-10-01 03:17:31 -07:00
ea0de37d2e [PyTorch] Avoid string construction from const char* and speedup empty string creation if error messages are suppressed (#65939)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65939

This change includes 2 separate optimizations.

1. Provide an overload of `debugString(const char*, ...)` in addition to `debugString(std::string, ...)` for cases where `const char*` is passed in to avoid `std::string` construction in cases where `STRIP_ERROR_MESSAGES` is also defined and the caller is passing in a `const char*`
2. Return `std::string("", 0)` instead of `""` since the former triggers no call to `std::basic_string`'s constructor whereas the latter does. [Godbolt Link](https://godbolt.org/z/oTExed5h8). However, I'm surprised by this since the man page for [std::basic_string](https://en.cppreference.com/w/cpp/string/basic_string/basic_string) clearly states that the constexpr overload is since C++20, and I am building using `-Os -std=c++17`

Godbolt Screenshot:

{F667311023}

ghstack-source-id: 139507542

Test Plan:
CI and local build via:

```
buck build //xplat/caffe2/fb/lite_predictor:lite_predictor
```

Reviewed By: swolchok

Differential Revision: D31312942

fbshipit-source-id: aa24abbfe1c16419f235d037595321982614c5ea
2021-10-01 00:17:21 -07:00
2828ce53fd Added jit log stream changing function and some refactor (#65768)
Summary:
Description:
- Have only added `stdout` and `stderr` as possible options from the Python
  API for now. We may add file-path passing later.
- Put the class `JitLoggingConfig` in the cpp file as none of its methods were being used outside of this file.

Python API:
`torch._C._jit_set_logging_stream('stdout|stderr')`
C++ API:
`::torch::jit::set_jit_logging_output_stream(ostream);`

Testing:
- Tested python API locally.
- Unit test for the C++ API is written

Fixes https://github.com/pytorch/pytorch/issues/54182

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65768

Reviewed By: mrshenli

Differential Revision: D31291739

Pulled By: ZolotukhinM

fbshipit-source-id: eee72edc20488efad78a01c5b0ed8a132886a08d
2021-09-30 23:25:11 -07:00
33c03cb61a [deploy][1/n] Make deploy code conform to PyTorch style. (#65861)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65861

First in a series. This PR changes the code in deploy.h/cpp and
interpreter_impl.h/cpp to be camel case instead of snake case. Starting
with this as it has the most impact on downstream users.

Test Plan: Imported from OSS

Reviewed By: shannonzhu

Differential Revision: D31291183

Pulled By: suo

fbshipit-source-id: ba6f74042947c9a08fb9cb3ad7276d8dbb5b2934
2021-09-30 22:59:47 -07:00
765b6a90f3 [TensorExpr] Move lowerings registration from kernel.cpp to lowerings.cpp. (#65553)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65553

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D31148921

Pulled By: ZolotukhinM

fbshipit-source-id: 772062155043d4be9e9a25f6259b8e4a6cb762f4
2021-09-30 22:56:22 -07:00
015e0079e3 [TensorExpr] Move 'compute*' functions to operators/... (#65552)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65552

This PR is mostly a verbatim move of several functions to different
files. The goal is to have more consistency in what resides where.

With this PR:
* All `compute*` functions defining how a given operator needs to be
lowered to TE IR will reside in `operators/*.{cpp,h}`.
* Auxiliary functions for these functions will reside in
`operators/misc.cpp`. `compute*` functions for ops not belonging
anywhere else can also go to that file.
* `operators/unary.*` is renamed to `operators/pointwise.*` and now
includes functions like `computeTwoOperands`.
* `kernel.*` now contains *only JIT-related* logic and implementations of
`TensorExprKernel` methods.

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D31148923

Pulled By: ZolotukhinM

fbshipit-source-id: e36ad8e779b8d30a33b49ea4ebf6d6a7438989f4
2021-09-30 22:56:20 -07:00
3a0165da49 [TensorExpr] Port NNC lowerings to the new registry mechanism. (#65551)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65551

Previously we had a big switch on Op kind to decide how to lower a given
JIT operator to NNC. This PR changes this switch to a hash table lookup.

Why? This helps us with at least two things:
1) With this approach we can easily check if we know how to handle a
given node in advance - i.e. we can inspect the entire graph and tell
whether it's possible to compile it or not without actually trying to do
that and dying in the middle. This would allow us to, say, provide
user-friendly error messages in the AOT workflow.
2) We can switch to using the schema instead of the op kind to determine the correct
lowering. Unlike the schema, the op kind might be ambiguous (see e.g. #64963)
and using it instead of the schema can lead to bugs.

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D31148926

Pulled By: ZolotukhinM

fbshipit-source-id: ac12684e2126c899426ef5e4cc1e3f70fa01f704
2021-09-30 22:56:18 -07:00
eee9ad0fdd [TensorExpr] Add a skeleton for a registry of NNC lowerings. (#65550)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65550

This PR adds the source files and the class for the registry, subsequent
PRs actually port existing lowerings to this mechanism.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D31148922

Pulled By: ZolotukhinM

fbshipit-source-id: 4c087b22ee898d5a5a18a5d2a4bb795aa2ffd655
2021-09-30 22:56:16 -07:00
d84191fcc6 [TensorExpr] Kernel: make prim::ConstantChunk handled like other ops. (#65549)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65549

Previously it had a special handling, with this change it follows the
same mechanism as other ops.

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D31148924

Pulled By: ZolotukhinM

fbshipit-source-id: 572d8ae5e123e7a0e2a656154d7bd0f73c785a06
2021-09-30 22:55:00 -07:00
a6ad2b41ac [Static Runtime] Make module_ optional in StaticModule (#65882)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65882

`torch::jit::Module` is refcounted. There is no need to wrap it in a `shared_ptr`.

Test Plan:
```
buck run //caffe2/benchmarks/static_runtime:static_runtime_cpptest
```

Reviewed By: mikeiovine

Differential Revision: D31012222

fbshipit-source-id: 74d234bd85423e5ba0e396f24899631354a2c74b
2021-09-30 22:48:49 -07:00
08df4c2b3c slow_conv2d grad_input: avoid dispatch in parallel region (#65725)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65725

See gh-56794

Avoid dispatch inside of parallel_for by:
1. Replacing Tensor slicing with TensorAccessor
2. Call `grad_input.zero_()` only once, outside of the parallel region
3. Replace `at::mm` with a `gemm` call

Test Plan: Imported from OSS

Reviewed By: saketh-are

Differential Revision: D31257876

Pulled By: ngimel

fbshipit-source-id: f2902edeccd161431c1dfb1ab3e165d039ec259d
2021-09-30 22:47:31 -07:00
6502fb89dd Make JIT Aliasing Test Less Brittle (#65493)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65493

Added a last resort: use whatever ATen operator with Tensor outputs exists in the graph as the operator node for checking the alias annotation.

Test Plan: python test/test_ops.py -k test_variant_consistency_jit

Reviewed By: mrshenli

Differential Revision: D31321221

Pulled By: alanwaketan

fbshipit-source-id: f4a5cbfd36bd0867d8c1bf9de9a65365ee7c35d6
2021-09-30 22:43:03 -07:00
4f5ea5983a [QPL] move metadata logging to markerEnd for model run QPL (#65451)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65451

This diff moved metadata logging from marker start to marker end. This should improve perf because we can skip metadata logging when the marker is not sampled (using isMarkerOn).

Test Plan:
Verified metadata are logged: https://fburl.com/scuba/qpl_metrics/pytorch_employee/armjgtyw
https://fburl.com/scuba/qpl_metrics/pytorch_employee/zz36zkr1

Reviewed By: xcheng16

Differential Revision: D31105548

fbshipit-source-id: 0eafaaefecb7e230021616e397e548a2fd2b92e9
2021-09-30 22:12:40 -07:00
2481c06496 [caffe2] fix LLVM-12 nullptr-with-nonzero-offset UBSAN error (#65506)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65506

Test Plan: run a adfinder canary and verify this error is fixed.

Reviewed By: swolchok

Differential Revision: D31130083

fbshipit-source-id: c31f179f8a7de75ed6f6e7ee68b197f2970ddd3d
2021-09-30 21:47:25 -07:00
f6dfac6974 Migrate THCCachingHostAllocator to ATen (#65746)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65746

This also removes the cudaHostAllocator field on THCState, since there
doesn't seem to be an API anywhere for customizing it.

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D31236630

Pulled By: ngimel

fbshipit-source-id: 2a8e756222ae70565e77f8e7139d60ec5be32276
2021-09-30 21:26:38 -07:00
d39790340d [ONNX] Enable export of __xor_ (#64042) (#64581)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64581

* Enable xor

* Update test_pytorch_onnx_onnxruntime.py

* Update symbolic_opset9.py

* Update symbolic_opset9.py

* Update test_pytorch_onnx_onnxruntime.py

* Update symbolic_opset9.py

Test Plan: Imported from OSS

Reviewed By: jansel

Differential Revision: D30919598

Pulled By: malfet

fbshipit-source-id: 044e55d0697da0050f26a6ceccd1517493d7e8a6
2021-09-30 21:09:01 -07:00
e598ba2ef3 [ONNX] Fix inplace fill_ dtype export mismatch (#64233) (#64580)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64580

Append `type_as` after converting `fill_` to `full_like` without a dtype argument.

BowenBao <bowbao@microsoft.com>

Test Plan: Imported from OSS

Reviewed By: jansel

Differential Revision: D30919599

Pulled By: malfet

fbshipit-source-id: f174977ced8f2c991b0615b65ff7c23fecf301c2
2021-09-30 21:08:59 -07:00
89cbe6229d [ONNX] Update doc and error message for indexing export (#64290) (#64579)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64579

Added suggested workarounds into indexing section of onnx export documentation.
Update indexing export warning message with link to documentation.

Test Plan: Imported from OSS

Reviewed By: jansel

Differential Revision: D30919603

Pulled By: malfet

fbshipit-source-id: 7fe65cb5aa7de4f7d93ff05011ba22f5adb27811

Co-authored-by: BowenBao <bowbao@microsoft.com>
2021-09-30 21:08:56 -07:00
d4ff344fae [ONNX] Fix remainder export (#64230) (#64578)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64578

* Fix remainder export for the edge case when the input is negative. The new export relies on the true_divide export.
* Simplified true_divide export. Cleaned up redundant code that is handled by the scalar type analysis pass. Removed the dependency on `onnx::Where`, thus supporting opsets 7 & 8.

Fixes #60179

Test Plan: Imported from OSS

Reviewed By: jansel

Differential Revision: D30919601

Pulled By: malfet

fbshipit-source-id: 0f78621c0ac3bdb6bf4225e049ba5f470dc8ab12

Co-authored-by: BowenBao <bowbao@microsoft.com>
2021-09-30 21:08:54 -07:00
0f0ef4fe64 Add onnx test for batched_nms (#53175) (#64381)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64381

* Added new ONNX test for batched_nms

* Update test according to PR in torchvision

* Update test/onnx/test_pytorch_onnx_onnxruntime.py

Test Plan: Imported from OSS

Reviewed By: jansel

Differential Revision: D30919602

Pulled By: malfet

fbshipit-source-id: edfb5b9f75077429f7f242fd6ac06d962968dfba

Co-authored-by: Bowen Bao <imbowenbao@outlook.com>
2021-09-30 21:08:52 -07:00
7e15f2ddaa [ONNX] Fix gather squeeze axis in constant folding (#63588) (#64379)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64379

* Fix gather squeeze axis in constant folding

* mypy

* fix indent

* address comments

Test Plan: Imported from OSS

Reviewed By: jansel

Differential Revision: D30919604

Pulled By: malfet

fbshipit-source-id: 90edb054491433a0da2fe82324ac7c12f1ef062b
2021-09-30 21:08:50 -07:00
41bdfe3919 [ONNX] Fix cuda test case (#63597) (#64378)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64378

* skip script test for unsupported autocast.
* Fix test case by adding the missing `autocast` and `model.cuda()`.

Test Plan: Imported from OSS

Reviewed By: jansel

Differential Revision: D30919600

Pulled By: malfet

fbshipit-source-id: 3231fc672d97de487d6e4460626df0ba25f212ce

Co-authored-by: BowenBao <bowbao@microsoft.com>
2021-09-30 21:08:48 -07:00
2d61009f4a [ONNX] Fix input sequence for pad op (#60554) (#64377)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64377

* Fix for input primitive sequence

* Test mypy

* Fix for tracing tuples

* Fix for extra inputs

* flake8

* Rebase

* Fix for tracing tuples

Test Plan: Imported from OSS

Reviewed By: jansel

Differential Revision: D30919606

Pulled By: malfet

fbshipit-source-id: a718c4a12cda77b968cb636acd7aa63d7b5ba326
2021-09-30 21:08:45 -07:00
f17ee368b3 Fix empty size constant creation (#63607) (#64376)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64376

Test Plan: Imported from OSS

Reviewed By: jansel

Differential Revision: D30919608

Pulled By: malfet

fbshipit-source-id: 0e789e8470ce0f130148df764ce77f6d4fd0a274
2021-09-30 21:08:43 -07:00
84190dafa8 [ONNX] Update instance_norm implementation and support training (#60538) (#64375)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64375

* Update the instance_norm track_running_stats=True implementation and support the training mode
* Reference: 9baf75c86e/aten/src/ATen/native/Normalization.cpp (L532)
* Fix https://github.com/pytorch/pytorch/issues/53887

Test Plan: Imported from OSS

Reviewed By: jansel

Differential Revision: D30919605

Pulled By: malfet

fbshipit-source-id: 306eb2a1122bb5d90dcb7c18260a3a2057a21c34

Co-authored-by: hwangdeyu <dejack953@outlook.com>
2021-09-30 21:07:26 -07:00
3d6d4f4322 [fx2trt][quant] Add lowering support for per channel quantization in fx2trt (#64787)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64787

This PR added support for lowering per-channel quantization and dequantization operators
in fx2trt. This also extends TensorMeta with extra arguments corresponding to per-channel quantized Tensors.
Initially I was thinking of adding a qparam that can capture everything, but currently we still have some lowering support
for fbgemm ops (which have scale and zero_point in the operator interface). I think we can move everything to qparams
after we deprecate lowering support for fbgemm ops in the future.

Test Plan:
Test for per channel weight:
```
python torch/fx/experimental/fx2trt/example/quantized_resnet_test.py
```

change BC compatibility test expect for TensorMeta
```
python test/test_fx.py TestFXAPIBackwardCompatibility.test_class_member_back_compat --accept
```

Imported from OSS

Reviewed By: jfix71, mrshenli, 842974287

Differential Revision: D30879848

fbshipit-source-id: 76c3804bb1d9343183ae53d9f02c1a3bf6c79e1c
2021-09-30 18:54:14 -07:00
207fefc988 Delete rogue cu102 windows builds (#65961)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65961

Reviewed By: seemethere

Differential Revision: D31325279

Pulled By: malfet

fbshipit-source-id: b8748c0040cdcfb8182eb7c59a3770b7d0681de9
2021-09-30 18:44:02 -07:00
b3da2afebe Clarified difference in behavior of empty_strided and as_strided (#64568)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64568

Fix: #64389

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D31299999

Pulled By: mruberry

fbshipit-source-id: dd538ffa7cc1267ab6472806f4216b170dd0faad
2021-09-30 17:27:59 -07:00
22f36353dc Revert D31137652: [pytorch][PR] Skip failing tests when LAPACK and MAGMA are not available
Test Plan: revert-hammer

Differential Revision:
D31137652 (dd354117ef)

Original commit changeset: c969f75d7cf1

fbshipit-source-id: bc4cde4eeb5d38ac940ebb471abbd8b9009b3aee
2021-09-30 16:08:57 -07:00
6285348f06 Implement n-dimensional hermitian FFTs (#63890)
Summary:
Closes https://github.com/pytorch/pytorch/issues/59127
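
A round-trip sketch of the new n-dimensional Hermitian FFTs, assuming they mirror the 1-D `hfft`/`ihfft` semantics:

```
import torch

t = torch.randn(4, 8)                    # real signal
spec = torch.fft.ihfftn(t)               # half-size complex spectrum
back = torch.fft.hfftn(spec, s=t.shape)  # real output
assert torch.allclose(back, t, atol=1e-6)
```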

cc mruberry peterbell10 walterddr

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63890

Reviewed By: ngimel

Differential Revision: D30761909

Pulled By: mruberry

fbshipit-source-id: 06e1e4dc65726f35c99a74f18b9fa36eb7d694a5
2021-09-30 16:02:28 -07:00
70f9f58a71 Add __module__ to torch.dtype.__dict__ (#65182)
Summary:
torch.dtype.__reduce__ returns a string, which causes Pickle to look
up the object by module and name. In order to find the right module,
Pickle looks for __module__ on the object; if it doesn't find that, it
falls back to searching sys.modules.

Previously, torch.dtype instances did not have a `__module__`
attribute, so pickling dtypes would fall back to a search of
sys.modules.

Instances of normal Python objects have a `__module__` attribute
because normal Python classes have a `__module__` key in their
`__dict__`. Imitate that by populating one in `torch.dtype`.
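
A minimal check of the behavior described above, assuming the attribute is populated with `'torch'`:

```
import pickle
import torch

print(torch.float32.__module__)  # 'torch', now present on the type
assert pickle.loads(pickle.dumps(torch.float32)) is torch.float32
```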

We set the field in `tp_dict` before calling `PyType_Ready` (instead
of afterwards) because of the doc warning against mutating a type's
dictionary once initialized:
https://docs.python.org/3/c-api/typeobj.html#c.PyTypeObject.tp_dict

fixes https://github.com/pytorch/pytorch/issues/65077

 ---

I didn't add any tests because I didn't see any obvious places with similar tests for pickling or dtype objects. Let me know if I missed the right place, or whether I should start one.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65182

Reviewed By: mrshenli

Differential Revision: D31310530

Pulled By: ezyang

fbshipit-source-id: 20cd713ce175a709d6ce47459c3891162ce29d77
2021-09-30 14:58:11 -07:00
38c77539e8 [PyTorch][Edge] Fix inefficiency in objLoaderMobile (#65710)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65710

No need to incur extra refcount bumps, and no need to use a stringstream for what are presumably string keys anyway.
ghstack-source-id: 139325445

Test Plan: CI, reviewers to confirm the keys are supposed to be strings

Reviewed By: dhruvbird

Differential Revision: D31215347

fbshipit-source-id: 82be93cb2e57aefe94edf74d149115cb734112be
2021-09-30 14:53:40 -07:00
8f3983254b [MicroBench] Added a micro benchmark for prefix sum (#65790)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65790

Here are the results of the benchmark:

* ATen - version that calls `at::cumsum`
* NNC - a simple prefix-sum loop implemented in NNC (not vectorized)
* Local - a C++ implementation of the simple prefix-sum loop
* LocalAVX2 - a vectorized C++ implementation of prefix-sum, only using AVX2
* LocalAVX512 - a vectorized C++ implementation of prefix-sum, using AVX512.

The vectorized implementations are from the paper "Parallel Prefix Sum with SIMD" in ADMS' 20.

```
$ OMP_NUM_THREADS=1 ./buck-out/opt/gen/caffe2/benchmarks/cpp/tensorexpr/tensorexpr_bench --benchmark_filter=PrefixSumBench
Run on (36 X 1601 MHz CPU s)
2021-09-28 23:13:12
------------------------------------------------------------------------------------------
Benchmark                                   Time           CPU Iterations UserCounters...
------------------------------------------------------------------------------------------
PrefixSumBench/ATen/64                   1289 ns       1289 ns     543199 GB/s=397.069M/s
PrefixSumBench/ATen/256                  1867 ns       1867 ns     374232 GB/s=1096.8M/s
PrefixSumBench/ATen/1024                 4169 ns       4169 ns     167889 GB/s=1.9649G/s
PrefixSumBench/ATen/4096                14137 ns      14136 ns      49266 GB/s=2.31806G/s
PrefixSumBench/ATen/16384               49887 ns      49883 ns      13988 GB/s=2.6276G/s
PrefixSumBench/ATen/65536              193742 ns     193686 ns       3628 GB/s=2.7069G/s
PrefixSumBench/ATen/262144             764803 ns     764774 ns        917 GB/s=2.74219G/s
PrefixSumBench/ATen/1048576           3040653 ns    3040277 ns        231 GB/s=2.75916G/s
PrefixSumBench/Local/64                   586 ns        586 ns    1197003 GB/s=873.244M/s
PrefixSumBench/Local/256                 1077 ns       1077 ns     646265 GB/s=1.90143G/s
PrefixSumBench/Local/1024                3050 ns       3050 ns     229458 GB/s=2.68579G/s
PrefixSumBench/Local/4096               11910 ns      11910 ns      58953 GB/s=2.75132G/s
PrefixSumBench/Local/16384              43204 ns      43202 ns      16081 GB/s=3.03393G/s
PrefixSumBench/Local/65536             167966 ns     167966 ns       4154 GB/s=3.12139G/s
PrefixSumBench/Local/262144            667631 ns     667613 ns       1048 GB/s=3.14127G/s
PrefixSumBench/Local/1048576          2654785 ns    2654631 ns        264 GB/s=3.15999G/s
PrefixSumBench/NNC/64                     642 ns        642 ns    1095277 GB/s=797.442M/s
PrefixSumBench/NNC/256                   1139 ns       1138 ns     617214 GB/s=1.799G/s
PrefixSumBench/NNC/1024                  3103 ns       3103 ns     225531 GB/s=2.63979G/s
PrefixSumBench/NNC/4096                 12053 ns      12052 ns      58084 GB/s=2.71883G/s
PrefixSumBench/NNC/16384                43227 ns      43225 ns      16192 GB/s=3.03231G/s
PrefixSumBench/NNC/65536               168065 ns     168056 ns       4153 GB/s=3.11972G/s
PrefixSumBench/NNC/262144              668974 ns     668921 ns       1045 GB/s=3.13513G/s
PrefixSumBench/NNC/1048576            2657464 ns    2657341 ns        263 GB/s=3.15677G/s
PrefixSumBench/LocalAVX2/64               523 ns        523 ns    1351308 GB/s=979.537M/s
PrefixSumBench/LocalAVX2/256              755 ns        755 ns     927762 GB/s=2.71159G/s
PrefixSumBench/LocalAVX2/1024            1759 ns       1759 ns     400355 GB/s=4.65609G/s
PrefixSumBench/LocalAVX2/4096            6708 ns       6706 ns     103959 GB/s=4.88649G/s
PrefixSumBench/LocalAVX2/16384          22143 ns      22142 ns      31229 GB/s=5.91951G/s
PrefixSumBench/LocalAVX2/65536          83649 ns      83642 ns       8350 GB/s=6.26828G/s
PrefixSumBench/LocalAVX2/262144        330433 ns     330427 ns       2133 GB/s=6.34679G/s
PrefixSumBench/LocalAVX2/1048576      1302301 ns    1302179 ns        537 GB/s=6.44198G/s
PrefixSumBench/LocalAVX512/64             474 ns        474 ns    1459151 GB/s=1080.8M/s
PrefixSumBench/LocalAVX512/256            576 ns        576 ns    1217442 GB/s=3.55524G/s
PrefixSumBench/LocalAVX512/1024           994 ns        994 ns     703387 GB/s=8.24434G/s
PrefixSumBench/LocalAVX512/4096          3642 ns       3641 ns     190646 GB/s=8.99857G/s
PrefixSumBench/LocalAVX512/16384        10140 ns      10140 ns      68947 GB/s=12.9267G/s
PrefixSumBench/LocalAVX512/65536        35739 ns      35736 ns      19567 GB/s=14.6711G/s
PrefixSumBench/LocalAVX512/262144      156415 ns     156413 ns       4467 GB/s=13.4078G/s
PrefixSumBench/LocalAVX512/1048576     613952 ns     613876 ns       1144 GB/s=13.665G/s
```

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D31253849

Pulled By: navahgar

fbshipit-source-id: f33e7be787c86a09e90babddd66b16e2e0777eb4
2021-09-30 14:44:52 -07:00
24f59fa20b [ci] fix softmax bc check (#65952)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65952

Test Plan: Imported from OSS

Reviewed By: soulitzer

Differential Revision: D31320441

Pulled By: suo

fbshipit-source-id: ddd2ccca523d7ed31b231d924fbd6206525f16cf
2021-09-30 14:40:43 -07:00
d4d3bb91f9 Refactor OperatorSupport related code and fix TRT not supporting int64 dtype (#65848)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65848

This diff includes:

* [fix]: The initialization of `OperatorSupport._support_dict` makes it a class variable, so we need to move its initialization into the constructor.
* Add an abstract class (more of an interface) `OperatorSupportBase`, since `OperatorSupport`'s purpose is too specific.
* [refactor]: what `TRTOperatorSupport` really does is populate an `OperatorSupport._support_dict`, so there really is no reason for subclassing. Remove it, and instead instantiate an `OperatorSupport` with a properly populated `_support_dict`.
* Add a framework for defining simple, basic op-support logic and composing it into more complex checks (see the sketch below):
    1. `create_op_support` wraps a function into an `OperatorSupportBase` instance
    2. `chain` can combine several simple `OperatorSupportBase` checkers into more complex ones
    3. `OpSupports` provides a set of pre-defined, simple `OperatorSupportBase` checkers that can be composed together using `chain`.
        1. Currently the only pre-defined one is `decline_if_input_dtype(..)`, which declares a node unsupported if its args are of a user-specified dtype
* Fix `TRTOperatorSupport` so that it not only looks for registered converters, but also declines a node if its arg is of int64
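
An illustrative composition following the description above; the import path and predicate are assumptions, not the exact code from this diff:

```
import torch
from torch.fx.passes.operator_support import create_op_support, chain, OpSupports

def _is_node_supported(submodules, node):
    # accept everything except a hypothetical denylisted target
    return node.target is not torch.ops.aten.nonzero

supported = chain(
    create_op_support(_is_node_supported),
    OpSupports.decline_if_input_dtype(torch.int64),  # e.g. what TRT needs
)
```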

Test Plan: linter and CI

Reviewed By: 842974287

Differential Revision: D31275525

fbshipit-source-id: bbc02f7ccf4902a7912bb98ba5be2c2fbd53b606
2021-09-30 13:36:26 -07:00
9ae63bd87c Revert D31238123: [pytorch][PR] Avoid saving self for softmax and log_softmax
Test Plan: revert-hammer

Differential Revision:
D31238123 (fb412bdd80)

Original commit changeset: afd319d3676d

fbshipit-source-id: b7980d653a4b8322a225f1dd08c2857ecbe5bc94
2021-09-30 11:34:14 -07:00
541eb1db63 Add cuSPARSE descriptors and update CSR addmm (#60838)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60838

Rewrote `addmm_out_sparse_csr_dense_cuda` implementation using new cusparse descriptors.

`addmm` now works without conversions with both 32-bit and 64-bit indices.
The dense tensors can have a row- or column-major layout. If the dense tensors are a contiguous slice of a larger tensor, the storage is used directly without temporary copies.
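
A sketch of the call path this improves (requires a CUDA build; tensor values are placeholders):

```
import torch

if torch.cuda.is_available():
    crow = torch.tensor([0, 2, 4])
    col = torch.tensor([0, 1, 0, 1])
    val = torch.tensor([1., 2., 3., 4.])
    sp = torch.sparse_csr_tensor(crow, col, val, (2, 2), device='cuda')
    dense = torch.randn(2, 3, device='cuda')
    bias = torch.zeros(2, 3, device='cuda')
    out = torch.addmm(bias, sp, dense)  # CSR @ dense, no index conversion
```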

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D30643191

Pulled By: cpuhrsch

fbshipit-source-id: 5555f5b59b288daa3a3987d322a93dada63b46c8
2021-09-30 11:32:51 -07:00
be00f0207a Update git version for CentOS base dockers (#65703)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/65048

cc jeffdaily sunway513 jithunnair-amd ROCmSupport

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65703

Reviewed By: albanD

Differential Revision: D31245666

Pulled By: janeyx99

fbshipit-source-id: 5431876bf19435eb3fd90a53a3ec94fd66c9210e
2021-09-30 11:26:21 -07:00
8297a16cc0 [ci] try installing libgnutls to fix cert error (#65934)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65934

see: https://github.com/pytorch/pytorch/issues/65931, this was a
suggested remediation on the linked issue

Test Plan: Imported from OSS

Reviewed By: malfet, zhouzhuojie

Differential Revision: D31313040

Pulled By: suo

fbshipit-source-id: a9e2b82a1e879962af768ed3049c73ab77394738
2021-09-30 11:23:17 -07:00
6a30d83596 Move ASAN to GHA (#65846)
Summary:
- Introduce `ciflow/sanitizers` label
- Modify asan pattern in `.jenkins/pytorch/build.sh`
- Produce wheel in `.jenkins/pytorch/build-asan.sh`
- Increase stack size hard limit to 82Mb in test docker containers

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65846

Reviewed By: seemethere

Differential Revision: D31282654

Pulled By: malfet

fbshipit-source-id: f73e692899cc9bbe106ececc26f1fe430dfeae9d
2021-09-30 09:49:52 -07:00
cdbfb2b689 .github: Bump linux and windows gpu max available (#65923)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65923

Still noticing that queues are long particularly for windows GPU
machines, bumping this to compensate

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D31308728

Pulled By: seemethere

fbshipit-source-id: b68c3a76335960def23e1f425ba5b0a219f07e73
2021-09-30 09:38:02 -07:00
928a4bbafb [JIT] Fix compilation unit reference link in constant object upon load (#65784)
Summary:
Follow up to https://github.com/pytorch/pytorch/pull/65442, make sure objects inserted into the graph on load do not hold an owning reference.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65784

Reviewed By: suo

Differential Revision: D31251033

Pulled By: eellison

fbshipit-source-id: 59efe19ce6f70744383de4eebf0f89f79f3eb03a
2021-09-30 09:32:28 -07:00
8130157504 [DataPipe] Fixes an issue where TarArchiveReader closes stream when read into a buffer (#65877)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65877

Fixes #65808

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D31296041

Pulled By: NivekT

fbshipit-source-id: cdcad3a333ae9781d6063678a122a128955b0ff4
2021-09-30 08:46:32 -07:00
7f87ff183d [RFC] [Modular] Include less headers in vararg_functions.cpp (#65672)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65672

`ATen/ATen.h` pulls in the full list of ATen headers, but vararg_functions.cpp only uses two of them. Change it to include less for the minimal runtime.

ghstack-source-id: 139389772

Test Plan: CI

Reviewed By: larryliu0820

Differential Revision: D31198293

fbshipit-source-id: 9794a2696a1b124be7fced2836c633ae899aa5c8
2021-09-30 08:35:28 -07:00
ea776fa034 Update CODEOWNERS for optim (#65773)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65773

Reviewed By: mrshenli

Differential Revision: D31269749

Pulled By: albanD

fbshipit-source-id: 1ec35d2396797b8e97a7122e2b3a9021f8fcf0a0
2021-09-30 08:30:42 -07:00
b777d790ea Convert Sampler back to lazily construction (#63646)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63646

Fixes #63609

Test Plan: Imported from OSS

Reviewed By: NivekT

Differential Revision: D30451774

Pulled By: ejguan

fbshipit-source-id: 550d77494326446d1a42b5da0559e0d384c47413
2021-09-30 07:32:06 -07:00
4666e3f192 [quant] update fused_obs_fake_quant op to accept output_fake_quant argument (#65621)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65621

Add a new attribute to FusedMovingAvgObsFakeQuantize that controls whether the fake-quant operation should be applied at the output of a particular layer. The motivation is to give users additional control over the numerics of the fake_quant operators during training. It defaults to always fake-quantizing the output (True).

Note: We will still observe the tensors as before (only the fake_quant operation is controlled using this flag)

For example
```
input model
x -> fc1 -> fc2 -> non_quantizable_op -> fc3

After fake_quant
x -> fake_quant(x) -> fc1 -> fake_quant(fc1) -> fc2 -> fake_quant(fc2) -> non_quantizable_op -> fake_quant() -> fc3 -> fake_quantize(fc3)

With output_fake_quant disabled at the output of fc2 and fc3 (since their outputs are non-quantizable)
x -> fake_quant(x) -> fc1 -> fake_quant(fc1) -> fc2 -> non_quantizable_op -> fake_quant() -> fc3
```
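
A usage sketch; the attribute name follows this summary's wording and should be treated as an assumption:

```
import torch
from torch.ao.quantization import FusedMovingAvgObsFakeQuantize

fq = FusedMovingAvgObsFakeQuantize()
fq.output_fake_quant = False  # keep observing stats, skip fake-quantizing
y = fq(torch.randn(2, 8))     # y passes through unquantized; min/max still update
```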

Test Plan: ./buck-out/gen/caffe2/test/quantization_fx\#binary.par -r test_disable_output_fake_quant

Reviewed By: jerryzh168

Differential Revision: D31174526

fbshipit-source-id: bffe776216d041fb09133a6fb09bfc2c0bb46b89
2021-09-30 01:08:01 -07:00
6d4b93bd96 [quant] adding memoryless observers for embeddingbag QAT work (#65699)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65699

related to: https://github.com/pytorch/pytorch/pull/65443#discussion_r715132425

The QAT and PAT (pruning aware training) support for embedding bags needs a memoryless observer to work properly. This is necessitated by the pruned/non-pruned weights changing during training, which can significantly shift the quantization parameters.

This PR adds a memoryless flag to the simpler observer classes (not moving average since those explicitly have memory)

In addition to the above, I altered the reset_min_max_vals
function for MinMaxObserver so that it preserves the device of the
existing self.min_val and self.max_val; previously the device was not
preserved, unlike at initialization (which uses factory_kwargs).
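
A sketch, assuming the flag name used in this summary:

```
import torch
from torch.ao.quantization.observer import MinMaxObserver

obs = MinMaxObserver(memoryless=True)
obs(torch.randn(8, 8))  # observe a batch; stats are not carried across steps
```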

Test Plan:
python test/test_quantization.py TestObserver

(added test_memoryless_minmaxobserver, test_memoryless_per_channel_minmaxobserver, test_memoryless_histogramobserver)

Imported from OSS

Reviewed By: supriyar

Differential Revision: D31209773

fbshipit-source-id: 44a63298e44880fbd3576f49ac568e781f3fd79a
2021-09-30 00:55:32 -07:00
de80aff72d Revert D31132861: Make JIT Aliasing Test Less Brittle
Test Plan: revert-hammer

Differential Revision:
D31132861 (9f97c66a7a)

Original commit changeset: 26fc2e6bc77b

fbshipit-source-id: 46be9168179d555be6b6a92b54b2bb84b3f834ed
2021-09-29 23:39:40 -07:00
4176afc4a0 [Static Runtime] Disable SigridTransform + ListUnpack fusion when outputs reachable from graph output (#62697)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62697

Reviewed By: hlu1

Differential Revision: D29979402

fbshipit-source-id: 913e8396a0530ce3617211112a2b1147ef2e9df9
2021-09-29 22:47:48 -07:00
edab202a30 [DataPipe] add deprecation warnings for DataPipes that will solely exist in TorchData (#65827)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65827

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D31272794

Pulled By: NivekT

fbshipit-source-id: 8da8266184b4df050422904cbc5fca6d7c3d2e02
2021-09-29 22:42:22 -07:00
cd458fe092 [JIT] Make output of prim::TupleConstruct alias only with its inputs (#64879)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64879

This change makes the output of `prim::TupleConstruct` alias only with its inputs *when* the created tuple is directly returned from the graph.

The same treatment could be applied to any tuple newly constructed by `prim::TupleConstruct` whose elements do not escape. However, this change focuses on the simplest and most frequent use case: tuples constructed only to be returned from a graph.
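
A small example of the targeted pattern in TorchScript (nothing here is PR-specific):

```python
import torch

@torch.jit.script
def f(x: torch.Tensor, y: torch.Tensor):
    a = x + 1
    b = y * 2
    # the tuple is constructed solely to be returned from the graph, so its
    # output now aliases only with `a` and `b` instead of the wildcard set
    return a, b

print(f.graph)  # prim::TupleConstruct feeds directly into the graph return
```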

Test Plan:
Added
- `AliasMoveForTupleConstructWithSingleUseAsGraphOutput`
- `WildcardAliasForTupleConstructWithUses`

to cover the newly added code.

Reviewed By: eellison

Differential Revision: D30437737

fbshipit-source-id: 417fbc6bc348062e60e7acdddd340d4754d090eb
2021-09-29 21:56:31 -07:00
dd354117ef Skip failing tests when LAPACK and MAGMA are not available (#64930)
Summary:
Skip failing tests when LAPACK and MAGMA are not available for `test_linalg.py` and `test_ops.py`.
Note that there's no CI configuration without LAPACK or MAGMA. I verified locally that the skips now work as expected, but we have no guard against these tests regressing for this situation in the future.
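
A sketch of the skip pattern using the existing device-type decorators (the test body is illustrative):

```python
import torch
from torch.testing._internal.common_device_type import (
    instantiate_device_type_tests, skipCPUIfNoLapack, skipCUDAIfNoMagma)
from torch.testing._internal.common_utils import TestCase, run_tests

class TestLinalgSketch(TestCase):
    @skipCPUIfNoLapack
    @skipCUDAIfNoMagma
    def test_inverse(self, device):
        a = torch.eye(3, device=device)
        self.assertEqual(torch.inverse(a), a)

instantiate_device_type_tests(TestLinalgSketch, globals())

if __name__ == "__main__":
    run_tests()
```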

<details>
  <summary> test_ops.py failures that are fixed</summary>

 ```
 FAILED test/test_ops.py::TestCommonCPU::test_out_linalg_tensorinv_cpu_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestCommonCPU::test_reference_testing_linalg_tensorinv_cpu_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestCommonCPU::test_reference_testing_linalg_tensorinv_cpu_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestCommonCPU::test_variant_consistency_eager_linalg_tensorinv_cpu_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestCommonCPU::test_variant_consistency_eager_linalg_tensorinv_cpu_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestCommonCPU::test_variant_consistency_eager_triangular_solve_cpu_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestCommonCPU::test_variant_consistency_eager_triangular_solve_cpu_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestGradientsCPU::test_fn_grad_linalg_tensorinv_cpu_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestGradientsCPU::test_fn_grad_linalg_tensorinv_cpu_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestGradientsCPU::test_fn_grad_triangular_solve_cpu_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestGradientsCPU::test_fn_grad_triangular_solve_cpu_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestGradientsCPU::test_fn_gradgrad_linalg_tensorinv_cpu_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestGradientsCPU::test_fn_gradgrad_linalg_tensorinv_cpu_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestGradientsCPU::test_fn_gradgrad_triangular_solve_cpu_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestGradientsCPU::test_fn_gradgrad_triangular_solve_cpu_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestGradientsCPU::test_forward_mode_AD_linalg_tensorinv_cpu_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestGradientsCPU::test_forward_mode_AD_linalg_tensorinv_cpu_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestGradientsCPU::test_forward_mode_AD_triangular_solve_cpu_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestGradientsCPU::test_forward_mode_AD_triangular_solve_cpu_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestJitCPU::test_variant_consistency_jit_linalg_tensorinv_cpu_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestJitCPU::test_variant_consistency_jit_triangular_solve_cpu_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestJitCPU::test_variant_consistency_jit_triangular_solve_cpu_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestMathBitsCPU::test_conj_view_linalg_tensorinv_cpu_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestMathBitsCPU::test_conj_view_triangular_solve_cpu_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestMathBitsCPU::test_neg_view_linalg_tensorinv_cpu_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestMathBitsCPU::test_neg_view_triangular_solve_cpu_float64 - RuntimeError: svd: LAPACK library not found in compilation
 ```

</details>

<details>
  <summary> test_linalg.py failures that are fixed</summary>

```
FAILED test/test_linalg.py::TestLinalgCPU::test_norm_dtype_cpu - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCPU::test_norm_matrix_cpu_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCPU::test_norm_matrix_cpu_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCPU::test_nuclear_norm_axes_small_brute_force_old_cpu - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_eigh_hermitian_grad_meta_complex128 - RuntimeError: Calling torch.linalg.eigh or eigvalsh on a CPU tensor requires compiling PyTorch with LAPACK. Please use PyTorch built with LAPACK support.
FAILED test/test_linalg.py::TestLinalgMETA::test_eigh_hermitian_grad_meta_float64 - RuntimeError: Calling torch.linalg.eigh or eigvalsh on a CPU tensor requires compiling PyTorch with LAPACK. Please use PyTorch built with LAPACK support.
FAILED test/test_linalg.py::TestLinalgMETA::test_inverse_meta_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_inverse_meta_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_inverse_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_inverse_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_lu_meta_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_lu_meta_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_lu_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_lu_solve_batched_broadcasting_meta_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_lu_solve_batched_broadcasting_meta_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_lu_solve_batched_broadcasting_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_lu_solve_batched_broadcasting_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_lu_solve_batched_non_contiguous_meta_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_lu_solve_batched_non_contiguous_meta_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_lu_solve_batched_non_contiguous_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_lu_solve_batched_non_contiguous_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_lu_solve_meta_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_lu_solve_meta_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_lu_solve_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_lu_solve_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_batched_broadcasting_meta_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_batched_broadcasting_meta_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_batched_broadcasting_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_batched_broadcasting_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_batched_meta_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_batched_meta_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_batched_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_batched_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_batched_non_contiguous_meta_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_batched_non_contiguous_meta_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_batched_non_contiguous_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_batched_non_contiguous_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_meta_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_meta_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_solve_batched_non_contiguous_meta_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_solve_batched_non_contiguous_meta_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_solve_batched_non_contiguous_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_solve_batched_non_contiguous_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_solve_meta_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_solve_meta_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_solve_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_solve_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_svd_square_col_maj_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_svd_square_col_maj_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_svd_square_meta_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_svd_square_meta_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_svd_square_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_svd_square_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_svd_tall_all_col_maj_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_svd_tall_all_col_maj_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_svd_tall_all_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_svd_tall_all_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_svd_tall_some_col_maj_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_svd_tall_some_col_maj_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_svd_tall_some_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_svd_tall_some_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_inverse_cuda_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_inverse_cuda_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_inverse_cuda_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_inverse_cuda_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_lowrank_cuda_float64 - RuntimeError: Calling torch.lu on a CUDA tensor requires compiling PyTorch with MAGMA. Please rebuild with MAGMA.
FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_square_col_maj_cuda_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_square_col_maj_cuda_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_square_cuda_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_square_cuda_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_square_cuda_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_square_cuda_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_tall_all_col_maj_cuda_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_tall_all_col_maj_cuda_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_tall_all_cuda_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_tall_all_cuda_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_tall_some_col_maj_cuda_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_tall_some_col_maj_cuda_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_tall_some_cuda_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_tall_some_cuda_float64 - RuntimeError: svd: LAPACK library not found in compilation
```
</details>

Fixes https://github.com/pytorch/pytorch/issues/59662

cc mruberry jianyuh nikitaved pearu walterddr IvanYashchuk xwang233 Lezcano

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64930

Reviewed By: H-Huang

Differential Revision: D31137652

Pulled By: mruberry

fbshipit-source-id: c969f75d7cf185765211004a0878e7c8a5d3cbf7
2021-09-29 21:31:14 -07:00
2c29ec2a41 Remove "SciPioneer" from PT Distributed code owners (#65862)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65862

ghstack-source-id: 139378782

Test Plan: N/A

Reviewed By: rohan-varma

Differential Revision: D31291340

fbshipit-source-id: 65d6a82c57dd50d8a4241e9442d73002590989d9
2021-09-29 20:52:01 -07:00
91f8755b0e Revert D31005792: [NCCL] Init dummy NCCL comms in constructor
Test Plan: revert-hammer

Differential Revision:
D31005792 (2b22a5dde2)

Original commit changeset: c2c582dee25a

fbshipit-source-id: d8e962b8aab6fda8a6c013e8577492dff9568c27
2021-09-29 20:46:38 -07:00
5349ea921b Migrate THCIntegerDivider.cuh to ATen (#65745)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65745

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D31257937

fbshipit-source-id: 283693525859b7a77a116df0c227653763911a42
2021-09-29 20:37:41 -07:00
3900509b7d (torchelastic) make --max_restarts explicit in the quickstart and runner docs (#65838)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65838

closes https://github.com/pytorch/pytorch/pull/65675

The default `--max_restarts` for `torch.distributed.run` was changed to `0` from `3` to make things backwards compatible with `torch.distributed.launch`. Since the default `--max_restarts` used to be greater than `0` we never documented passing `--max_restarts` explicitly in any of our example code.

Test Plan: N/A doc change only

Reviewed By: d4l3k

Differential Revision: D31279544

fbshipit-source-id: 98b31e6a158371bc56907552c5c13958446716f9
2021-09-29 19:29:01 -07:00
c7ef620a14 [quant] Add imports to the torch/ao/quantization/__init__.py (#64911)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64911

The import statements that involve `quantize.py` were not added to the module-level __init__ file. Those imports are necessary to mimic the behavior of the old import locations. Otherwise, the user would need to change their import statements to `from torch.ao.quantization.quantize import quantize` (instead of `from torch.ao.quantization import quantize`).

Another change in this diff is that we don't use `__all__` anymore. The all dunder was never used in quantization anyway, and just creates a potential bug when using `from ... import *`.
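
For concreteness, both of the following import styles are expected to work after this change (a sketch of the intended behavior):

```python
# the second line mirrors the old import location's behavior, restored
# here via a re-export in the package-level __init__
from torch.ao.quantization.quantize import quantize  # direct module import
from torch.ao.quantization import quantize           # package-level re-export
```
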
ghstack-source-id: 139342483

Test Plan: `buck test mode/dev //caffe2/test:quantization`

Reviewed By: vkuzo

Differential Revision: D30897663

fbshipit-source-id: a7b4919a191755e3ba690a79ce3362889f416689
2021-09-29 19:08:45 -07:00
fb412bdd80 Avoid saving self for softmax and log_softmax (#65242)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/64000
 - updates double backward formula to compute the grad w.r.t. the output instead of self (see the sketch after this list)
 - ~~In some of the error messages, we still refer to the dtype of the input, even though we are now checking the dtype of the output~~
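
A quick numerical check of the underlying identity for `log_softmax` (a sketch verifying the math, not the autograd formula file itself): the backward can be written in terms of the output `y`, so `self` need not be saved.

```python
import torch

x = torch.randn(3, 5, dtype=torch.double, requires_grad=True)
y = torch.log_softmax(x, dim=1)
g = torch.randn_like(y)

# backward expressed via the output y: dL/dx = g - exp(y) * g.sum(dim)
manual = g - y.exp() * g.sum(dim=1, keepdim=True)
auto, = torch.autograd.grad(y, x, g)
print(torch.allclose(manual, auto))  # True
```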

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65242

Reviewed By: albanD

Differential Revision: D31238123

Pulled By: soulitzer

fbshipit-source-id: afd319d3676d9ef8d81607e0e8c2a3e6d09f68e4
2021-09-29 18:16:12 -07:00
768cfaa8f8 fix typo in _sharded_tensor (#65511)
Summary:
per title

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang gcramer23

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65511

Reviewed By: albanD

Differential Revision: D31239269

Pulled By: cbalioglu

fbshipit-source-id: 602c0bf7ef96a930606d68b15a5b3cadda9d9437
2021-09-29 18:00:47 -07:00
9f97c66a7a Make JIT Aliasing Test Less Brittle (#65493)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65493

Added a last resort: use whatever ATen operator in the graph that has Tensor outputs as the operator node for checking the alias annotation.

Test Plan:
python test/test_ops.py -k test_variant_consistency_jit_linalg_tensorinv
python test/test_ops.py -k test_variant_consistency_jit_nn_functional_normalize

Reviewed By: eellison

Differential Revision: D31132861

Pulled By: alanwaketan

fbshipit-source-id: 26fc2e6bc77be3a296967cf29a3f6ded231302fa
2021-09-29 17:11:04 -07:00
91611fe1d1 Decouple forward AD checks from backward AD in OpInfo tests and gradcheck (#65040)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/64999

- Adds a flag to gradcheck, `check_backward_ad`, that can be used to disable the backward-AD checks (see the sketch after this list)
  - This is a bit bc-breaking in terms of positional args, but I prefer this ordering
- In OpInfo tests for forward ad:
  - set `check_backward_ad` False
- In test_ops treat `supports_autograd` as if it is `supports_backward_ad` (it basically already is)
  - the only modification needed is to no longer skip forward ad tests if `supports_autograd` is false
  - test_dtype, test_variant_consistency, etc behave correctly as-is
  - In a follow-up PR, we can rename it to actually be `supports_backward_ad`
- Testing
  - https://github.com/pytorch/pytorch/pull/65060
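
A minimal sketch of the new flag, using the signature added by this PR (the extra flags are disabled because they only apply to backward mode):

```python
import torch
from torch.autograd import gradcheck

def fn(x):
    return x.sin()

x = torch.randn(4, dtype=torch.double, requires_grad=True)
# exercise only forward-mode AD; all backward-mode checks are skipped
assert gradcheck(fn, (x,),
                 check_forward_ad=True,
                 check_backward_ad=False,
                 check_undefined_grad=False,
                 check_batched_grad=False)
```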

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65040

Reviewed By: albanD

Differential Revision: D31238177

Pulled By: soulitzer

fbshipit-source-id: f068d4cbe7ffb094930b16cddb210583b9b7b2c4
2021-09-29 17:01:34 -07:00
5950240bdf Stop Win+CUDA-10.2 builds (#65649)
Summary:
See https://github.com/pytorch/pytorch/issues/65612 and https://github.com/pytorch/pytorch/issues/25393

Fixes https://github.com/pytorch/pytorch/issues/65648

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65649

Reviewed By: janeyx99

Differential Revision: D31189692

Pulled By: malfet

fbshipit-source-id: 6ec0548d5833f3428d882071d26c357d89b0a9ba
2021-09-29 15:41:23 -07:00
2b22a5dde2 [NCCL] Init dummy NCCL comms in constructor (#65173)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65173

Initializes dummy NCCL communicators in constructor for a basic health
check that communicators can be initialized prior to launching the first
collective.

After successful init, we immediately use `ncclCommAbort` to destroy these
communicators to ensure they don't interfere with regular communicator creation
during collectives.

Test Plan: CI

Reviewed By: pritamdamania87

Differential Revision: D31005792

fbshipit-source-id: c2c582dee25a098361ead6ef03f541e7833c606b
2021-09-29 15:36:54 -07:00
ad85b582da Remove THCDeviceTensor (#65744)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65744

This is just dead code.

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D31257940

fbshipit-source-id: 6c02264106c2dcbadd332f24b95bc9351a04fd9e
2021-09-29 14:54:46 -07:00
20374c991b slow_conv2d_forward: avoid calling dispatcher in parallel region (#65724)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65724

See gh-56794

Avoid dispatch inside of parallel_for by:
1. Replacing Tensor slicing with TensorAccessor
2. Copying the bias into the output only once, outside of the parallel region
3. Replacing `addmm_` with a direct call to gemm

Technically this also adds a new requirement that the output always be
contiguous, but the out-argument version isn't exposed or used
anywhere in the `torch.nn` API, so that should be fine.

Test Plan: Imported from OSS

Reviewed By: saketh-are

Differential Revision: D31257875

Pulled By: ngimel

fbshipit-source-id: 84d2b39e7f65334bdfcc2c4719f93ee3c514ca32
2021-09-29 14:09:32 -07:00
7191dd2613 Update Module docstring for Python 3 (#65748)
Summary:
In Python 3, we can call `super()` without any arguments.

If I understand correctly, Python 2 is no longer supported by PyTorch, so we can change the documentation to be Python-3 only :)
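
A minimal example of the Python-3 style being documented:

```python
import torch.nn as nn

class Model(nn.Module):
    def __init__(self):
        super().__init__()  # Python 3: no need for super(Model, self).__init__()
        self.linear = nn.Linear(4, 2)

    def forward(self, x):
        return self.linear(x)
```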

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65748

Reviewed By: saketh-are

Differential Revision: D31246055

Pulled By: albanD

fbshipit-source-id: 3980def1a556d4bdfa391ea61cb2a65efa20df79
2021-09-29 13:40:15 -07:00
8bf0ba546e ns for fx: add basic testing on cuda (#65593)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65593

Adds test cases that the three Numeric Suite Core APIs work
when the models are on cuda.  In particular:
1. create models and move them to cuda
2. add loggers (if applicable)
3. run data through (if applicable)
4. extract results

It works without code changes because a `Logger` object is
created without any device-specific objects (they only get
added once data is passed through). It's good to have this tested.

Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_extract_weights_cuda
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_add_loggers_cuda
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_add_shadow_loggers_cuda
```

Imported from OSS

Reviewed By: raghuramank100

Differential Revision: D31160897

fbshipit-source-id: 8eacf164d0496baf2830491200ea721c0f32ac92
2021-09-29 13:06:30 -07:00
0dd1b74a5b Migrate THCScanUtils to ATen (#65743)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65743

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D31257938

fbshipit-source-id: 273b22df41bb7f2a0ab605ec1f6322c2937e7472
2021-09-29 12:39:37 -07:00
a84feeeade [PyTorch Edge] Conditionally trim dispatch key set to save heap memory at runtime (#65732)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65732

For certain on-device uses, runtime memory comes at a premium. On-device deployments won't use all the available dispatch keys, so it makes sense to keep only the device-specific ones around for such uses, reducing the heap memory allocated at runtime.

This change keeps just 10 dispatch keys (the ones used on-device), guarded under the `C10_MOBILE_TRIM_DISPATCH_KEYS` macro. It tries to keep the other code paths unaffected, uses `constexpr` for the `array` declaration, and uses simple inline functions so that the compiler can optimize these away for server builds.

Test Plan:
Build and check mobile models end to end.

```
buck build -c "pt.enable_milan_dispatch_keys_trimming"=1 //xplat/caffe2/fb/lite_predictor:lite_predictor
```

Reviewed By: ezyang

Differential Revision: D31185407

fbshipit-source-id: e954765606373dea6ee9466a851dca7684167b0b
2021-09-29 12:20:33 -07:00
7b5d676fa1 .github: Bump linux gpu max limit to 100 (#65831)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65831

We were noticing scaling issues last night due to the lack of
linux.8xlarge.nvidia.gpu machines; even at max capacity we were still
about ~50 queued workflows behind, and this should close that gap.

Also, since these run the longest types of tests, they are the most
likely to overlap with scale messages being processed while available
runners are still maxed out.

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D31275892

Pulled By: seemethere

fbshipit-source-id: b22ceda115b70d7bdd9c4bc207b55ffab50381ef
2021-09-29 12:06:54 -07:00
c975ca4337 [Static Runtime] Simplify out variant overload implementations (#65384)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65384

The following pattern appears frequently in `ops.cpp`:

```
if (!n->matches(schema_1) && !n->matches(schema_2) && ... && !n->matches(schema_n)) {
    LogAndDumpSchema(n);
    return nullptr;
}

return [](ProcessedNode* p_node) {
    if (p_node->Output(0).isNone()) {
        if (p_node->Input(i).isSomeType()) {
            // special logic for schema 1
        } else if (p_node->Input(i).isSomeOtherType()) {
            // special logic for schema 2
        } else if (...) {
            // special logic for schema3
        }
        // and so on
    } else {
        // another complicated type checking chain
    }
};
```

A much cleaner way to implement operator overloads is like this:
```
if (n->matches(schema_1)) {
    return schema_1_impl;
} else if (n->matches(schema_2)) {
    return schema_2_impl;
}
// and so on
```

This has a few advantages:
* Significantly reduces complexity of the out variant implementations, especially for ops with more than 2 overloads. One implementation corresponds to one schema. This makes the implementation more readable/maintainable.
* Adhering to this convention makes it easier to add a new overload. Just add a new `n->matches(...)` case instead of working the schema into existing complicated logic.
* Ops are marginally faster since we don't have to check types at runtime.

Note: there are a few cases where this actually made the code less concise (`aten::div`), so I left those ops untouched.

Thanks for pointing this out in another diff d1jang

Test Plan: `buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest`

Reviewed By: hlu1

Differential Revision: D31072328

fbshipit-source-id: c40a4f7e6a79881e94c9ec49e9008ed75cfc8688
2021-09-29 12:02:11 -07:00
2f712c452e .github: Remove confusing on_pull_request variable (#65731)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65731

It originally had a purpose, but after ciflow was introduced every PR had
on_pull_request set, so it's not as useful as it once was.

Also removes the equally confusing only_build_on_pull_request
variable.

This change should produce no functional changes in our generated workflows

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

cc ezyang seemethere malfet pytorch/pytorch-dev-infra

Test Plan: Imported from OSS

Reviewed By: janeyx99

Differential Revision: D31225398

Pulled By: seemethere

fbshipit-source-id: 7bd8e8175794ab7d09b0632321bf52538435e858
2021-09-29 11:56:13 -07:00
6c2f235d36 common_utils.py: Add ASAN as a platform for which you can disable tests (#65791)
Summary:
Could be useful for the future.

Next steps: document it.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65791

Reviewed By: suo

Differential Revision: D31254115

Pulled By: janeyx99

fbshipit-source-id: 715c18b4505f2be6328aa0be25976116d6956b25
2021-09-29 11:00:03 -07:00
911d01c1de type annotate operator_support (#65136)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65136

Opportunistically add type annotation for operator_support.py

Test Plan: run linter, CI

Reviewed By: yinghai

Differential Revision: D30928464

fbshipit-source-id: 615c75152b9938792f03cdceb2a113bda6ab28c7
2021-09-29 10:38:47 -07:00
085e2f7bdd [ROCm] Changes not to rely on CUDA_VERSION or HIP_VERSION (#65610)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65610

- Replace HIP_PLATFORM_HCC with USE_ROCM
- Dont rely on CUDA_VERSION or HIP_VERSION and use USE_ROCM and ROCM_VERSION.

- In the next PR
   - Will be removing the mapping from CUDA_VERSION to HIP_VERSION and CUDA to HIP in hipify.
   - HIP_PLATFORM_HCC is deprecated, so will add HIP_PLATFORM_AMD to support HIP host code compilation on gcc.

cc jeffdaily sunway513 jithunnair-amd ROCmSupport amathews-amd

Reviewed By: jbschlosser

Differential Revision: D30909053

Pulled By: ezyang

fbshipit-source-id: 224a966ebf1aaec79beccbbd686fdf3d49267e06
2021-09-29 09:55:43 -07:00
9b40eaaaab Revert D31193205: [pytorch][PR] CMake: Limit python include directories to only python libraries
Test Plan: revert-hammer

Differential Revision:
D31193205 (971c57f1d0)

Original commit changeset: 5c1b554a59d0

fbshipit-source-id: 5719b7df987ded6e7e212749a438db947656df87
2021-09-29 09:49:33 -07:00
2670cacfc2 LLVM-12 fix for tensor_new.cpp (#65785)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65785

Fixes offset to nullptr at fbcode/caffe2/torch/csrc/utils/tensor_new.cpp:206

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D31250995

fbshipit-source-id: 56c7761787e732180a2537a8aa4346a39e7399a8
2021-09-29 09:35:18 -07:00
09eb3e661c don't check 0 elements for cat symbolic diff (#65751)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65751

Fixes symbolic script grad formula for cat to correctly handle empty tensors
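
Eager mode already handles this correctly; the fix makes the TorchScript gradient formula match. A quick illustration of the expected behavior:

```python
import torch

a = torch.randn(0, 3, requires_grad=True)  # zero-element tensor
b = torch.randn(2, 3, requires_grad=True)
torch.cat([a, b], dim=0).sum().backward()
print(a.grad.shape, b.grad.shape)  # torch.Size([0, 3]) torch.Size([2, 3])
```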

Test Plan: Existing tests

Reviewed By: eellison

Differential Revision: D31208364

fbshipit-source-id: d676d9abcc033b56076fa946f58f3db50034502d
2021-09-29 09:34:03 -07:00
1d681c1ab2 Migrate THCThrustAllocator to ATen (#65492)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65492

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D31148180

Pulled By: ngimel

fbshipit-source-id: d5e4902036493517ca97c3442713b5e0e79229f9
2021-09-29 09:27:41 -07:00
971c57f1d0 CMake: Limit python include directories to only python libraries (#65654)
Summary:
`include_directories` is old-style CMake which adds the include path to every file being compiled. This instead makes python, numpy and pybind11 into targets that only torch_python and caffe2_pybind_state are linked to. So, python libraries can't be accidentally included elsewhere.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65654

Reviewed By: gchanan

Differential Revision: D31193205

Pulled By: malfet

fbshipit-source-id: 5c1b554a59d0e441a701a04ebb62f0032d38b208
2021-09-29 08:09:08 -07:00
5f7ab7be6f [Static Runtime] concat_add_mul_replacenan_clip retains axis arg (#65741)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65741

This op previously assumed `axis == 1`, causing graphs that would otherwise be valid to return incorrect results after fusing.

Reviewed By: hlu1

Differential Revision: D31234944

fbshipit-source-id: 89885a3b119357698ebd9fd429b009813260a2f4
2021-09-29 08:04:20 -07:00
f63150fd1d [PyTorch Edge] Reduce the cost of computing isIncludedInAlias() (#65735)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65735

Currently, `isIncludedInAlias()` calls `getRuntimeDispatchKeySet()` which creates a new `DispatchKeySet` object from an enumerated list of dispatch keys. `isIncludedInAlias()` then checks if a single dispatch key is part of this set. Instead, just pass in the key one wishes to check. This is marginally faster.

ghstack-source-id: 139281528

Test Plan:
See these 2 AI Bench Runs on the Milan-FFF-11-30 device.

### Before
[AI Bench](https://www.internalfb.com/intern/aibench/details/237302972704466), [Flamegraph](https://interncache-all.fbcdn.net/manifold/aibench/tree/mobile/pt/profiling_reports/speech_transducer_v25_perf_1632804218329.html)

### After
[AI Bench](https://www.internalfb.com/intern/aibench/details/606320012968375), [Flamegraph](https://interncache-all.fbcdn.net/manifold/aibench/tree/mobile/pt/profiling_reports/speech_transducer_v25_perf_1632807348803.html)

Check the flamegraphs, and focus on any kernel-registration code path during library initialization.

Reviewed By: swolchok

Differential Revision: D31228062

fbshipit-source-id: 7a986e3593c30ded7919cd3b564ec579dc97ab5f
2021-09-29 07:40:39 -07:00
aebde1bc2b deprecate device getter from torch.testing namespace (#63844)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63844

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D31141433

Pulled By: mruberry

fbshipit-source-id: a29331278ab99a19e225e2cb357458e3db4f9732
2021-09-29 02:40:52 -07:00
07d5d7b5cc move kernel launch checks from torch.testing to torch.testing._internal.check_kernel_launches (#60862)
Summary:
The fact that these functions are only used in a single test might be a good enough reason to move them to that module.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60862

Reviewed By: H-Huang

Differential Revision: D31141354

Pulled By: mruberry

fbshipit-source-id: 6ce1f721b88620c5f46222ad1b942bc689f0a3e0
2021-09-29 00:39:22 -07:00
0a0564a347 Revert D31206837: [pytorch][PR] *_solve methods: implements forward AD
Test Plan: revert-hammer

Differential Revision:
D31206837 (26e31f76b0)

Original commit changeset: 040beda97442

fbshipit-source-id: f28091327357af9f54f367eda6606240924b93ac
2021-09-28 23:31:16 -07:00
f9c2dc860d make layout check optional in torch.testing.assert_close() (#65419)
Summary:
In case the inputs have a different layout, `assert_close(..., check_layout=False)` converts them to strided before comparison. This is helpful if you just want to compare the values of a sparse COO / CSR tensor against a strided reference.

This keeps BC, since the default `check_layout=True` was the old, hard-coded behavior.
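
A short sketch using the flag added here:

```python
import torch
from torch.testing import assert_close

dense = torch.eye(3)
sparse = dense.to_sparse()

# assert_close(sparse, dense) raises by default because the layouts differ;
# with the check disabled, both are converted to strided and values compared
assert_close(sparse, dense, check_layout=False)
```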

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65419

Reviewed By: H-Huang

Differential Revision: D31133629

Pulled By: mruberry

fbshipit-source-id: ca8918af81fb0e0ba263104836a4c2eeacdfc7e6
2021-09-28 23:23:41 -07:00
8a247fb418 LLVM-12 fix for shm_mutex (#65781)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65781

Fixes
```
stderr: In file included from caffe2/caffe2/contrib/shm_mutex/shm_mutex.cc:1:
caffe2/caffe2/contrib/shm_mutex/shm_mutex.h:334:28: error: anonymous non-C-compatible type given name for linkage purposes by alias declaration; add a tag name here [-Werror,-Wnon-c-typedef-for-linkage]
using TicketStruct = struct : ShmBaseHeader {
                           ^
                            TicketStruct
caffe2/caffe2/contrib/shm_mutex/shm_mutex.h:334:31: note: type is not C-compatible due to this base class
using TicketStruct = struct : ShmBaseHeader {
                              ^~~~~~~~~~~~~
caffe2/caffe2/contrib/shm_mutex/shm_mutex.h:334:7: note: type is given name 'TicketStruct' for linkage purposes by this alias declaration
using TicketStruct = struct : ShmBaseHeader {
      ^
1 error generated.
Cannot execute a rule out of process. On RE worker. Thread: Thread[main,5,main]
Command failed with exit code 1.
```

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D31248938

fbshipit-source-id: 47342fecc72ada9397a1b7bd6fcabfccf988dd3e
2021-09-28 22:51:38 -07:00
4a7a0ea42e Skip flaky ASAN tests (#65792)
Summary:
See https://github.com/pytorch/pytorch/issues/65727

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65792

Reviewed By: janeyx99

Differential Revision: D31254490

Pulled By: malfet

fbshipit-source-id: 76714db30a5566fbab95179236ccdafab22cf551
2021-09-28 22:33:02 -07:00
d528c7f3c0 .github: Move windows back to default directory (#64962)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64962

Moves windows builds / tests back to the default directory. Previously
we had moved them because checkout would sometimes fail due to file
handlers still being open on the working directory.

Moving back to the default directory also has the added bonus of sccache
working again, so here's hoping that this doesn't have any adverse
effects

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

cc peterjc123 mszhanyi skyline75489 nbcsm ezyang seemethere malfet lg20987 pytorch/pytorch-dev-infra

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D31250072

Pulled By: seemethere

fbshipit-source-id: a803bf0e00e1b2b0d63f78600588281622ee0652
2021-09-28 19:41:35 -07:00
ed4491be6f Fix error code checking for Windows build scripts (#57331)
Summary:
The variable `%errorlevel%` is expanded before the whole command line starts executing, so it is useless inside an if-block. Also, let's avoid using `%errorlevel%` because it may be set by users accidentally.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57331

Reviewed By: anjali411

Differential Revision: D28140182

Pulled By: malfet

fbshipit-source-id: a3f21d65623bb25f039805c175e9f3b468bcb548
2021-09-28 19:27:07 -07:00
0d7036fdaf don't leak build time path name to runtime for frozen python modules (#65715)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65715

Here is how we freeze a python module:
- we call python builtin compile method with the source code of the modules and the path. This method returns a python code object
- we call marshal.dumps to serialize the code object to bytes.

The code_object.co_filename actually matches the path passed in to the compile method. We can simply replace that path with a marker
to avoid leaking the build-time path into the runtime.

This works on nested code objects as well:
```
#!/bin/env python3.8
import marshal

code_str = """
print("hello")

class MyCls:
    def __init__(self):
        pass
"""
co = compile(code_str, "<Generated by torch::deploy>", "exec")
cobytes = marshal.dumps(co)
import pdb; pdb.set_trace()
```

Checking `co`:
```
(Pdb) co.co_filename
'<Generated by torch::deploy>'
(Pdb) co.co_consts
('hello', <code object MyCls at 0x7f0e8670bbe0, file "<Generated by torch::deploy>", line 4>, 'MyCls', None)
(Pdb) co.co_consts[1].co_filename
'<Generated by torch::deploy>'
```

Test Plan:
Find the serialized frozen module for torch.nn.modules.linear in the generated bytecode_x.c file. Put the content into /tmp/linear.bytecode

Run the testing script:
```
import marshal
co_bytes = bytes(eval("[{}]".format("".join(open('/tmp/linear.bytecode').readlines()).replace('\n', '').replace('\t', ''))))
co = marshal.loads(co_bytes)
print(co)

```

The output for the paste without the change:
```
<code object <module> at 0x7f39ca7f07c0, file "/data/users/shunting/fbsource/fbcode/buck-out/opt/gen/caffe2/gen_frozen_torchpython_src__srcs/torch/nn/modules/linear.py", line 1>
```

The output for the paste with the change:
```
<code object <module> at 0x7f05a765d710, file "<Generated by torch::deploy>", line 1>
````

Note that the file part is changed as expected.

Reviewed By: suo

Differential Revision: D31214555

fbshipit-source-id: 56958e0a7352f8c30a3377f83209efe7db61f0fb
2021-09-28 19:25:51 -07:00
72b27bde83 [CIFlow] Modify workflow trigger logic (#65733)
Summary:
CIFlow workflows should always run on push events.
On pull requests, a workflow should run if its label conditions are met; if
no `ciflow/` labels are associated with the PR, the workflow is enabled by
default.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65733

Reviewed By: zhouzhuojie

Differential Revision: D31251278

Pulled By: malfet

fbshipit-source-id: 31ce745cb224df7c6fec1682ec4180513e3dadf3
2021-09-28 19:19:49 -07:00
b3c32ad32f .github: Move calculate-docker-image into build (#65789)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65789

These common types of jobs can be moved into build since it's typically
a no-op, could be annoying in the future to debug docker builds but
dedicating an entire ephemeral node to a noop seems like a waste to me

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: malfet, janeyx99

Differential Revision: D31253017

Pulled By: seemethere

fbshipit-source-id: c7b5ea35a57fb1576122df219d387c86e420fd1f
2021-09-28 19:15:24 -07:00
609384c056 [sparsity][doc] Docstring for WeightNormSparsifier (#65294)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65294

This adds the docstring documentation to the WeightNormSparsifier and adds the typehints for the constructor args.
Note, this does not require testing as only the doc is changed.

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D31186827

Pulled By: z-a-f

fbshipit-source-id: c5010c9bba25b074c4cc6c88f251474b758f950d
2021-09-28 14:14:51 -07:00
92ee5cc2e2 [sparsity] Fix for accumulation bug in WeightNormSparsifier (#65293)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65293

This fixes a bug in the WeightNormSparsifier, where the mask is being multiplied by the newly computed mask.
Because the mask elements are binary 0/1, this accumulates the mask over every iteration, eventually collapsing the mask to zero.
This bug accidentally bled through from old versions.
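
A standalone sketch of the bug (illustrative tensors, not the sparsifier's actual code):

```python
import torch

mask = torch.ones(8)
for _ in range(5):
    new_mask = (torch.rand(8) > 0.5).float()  # freshly computed binary mask
    mask = mask * new_mask                    # buggy accumulation: 0s are sticky
print(int(mask.sum().item()))  # keeps shrinking toward 0 across iterations

# the fix replaces the mask instead of accumulating into it:
# mask = new_mask
```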

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D31186829

Pulled By: z-a-f

fbshipit-source-id: 3f5b2c833148ab0bd8084e7410ce398f1252e65e
2021-09-28 14:14:49 -07:00
a90912ecc5 [sparsity] Remove the pack_param from the sparsifier state_dict (#65292)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65292

That was the original design, which we decided to simplify by removing the packing in the sparsifier.
The state of the sparsifier is now saved directly; the old behavior accidentally bled through to the current version.
This change removes the `_pack_params` method and changes the state_dict to include the state directly.
We don't have to change load_state_dict, as it works with either the old or the new format.

The main reason for this PR is the simplification: the original design didn't achieve anything useful by packing the sparsification parameters.

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D31186826

Pulled By: z-a-f

fbshipit-source-id: 4ad72a7e669f048d2f2d269269ee11b63fa169db
2021-09-28 14:12:52 -07:00
c829cb6840 Port min kernel to structured kernels. (#61450)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61450

Tracking issue: #55070

Test Plan: Imported from OSS

Reviewed By: saketh-are

Differential Revision: D29741713

Pulled By: bdhirsh

fbshipit-source-id: 2c107752a90fd39cfb55e08aaf3541bd484a5fc3
2021-09-28 14:03:54 -07:00
c2252b3aa6 Port max kernel to structured kernels. (#61449)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61449

Tracking issue: #55070

Test Plan: Imported from OSS

Reviewed By: saketh-are

Differential Revision: D29741714

Pulled By: bdhirsh

fbshipit-source-id: 6c8c17d20f578ab0af8a969d103a19ccd8d51842
2021-09-28 14:02:26 -07:00
51f1569c77 Add checks for structured in-place operations. (#65686)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65686

Fixes: #57827

This PR introduces the `check_inplace` function. It contains common checks for all
structured in-place operators (e.g. dtype, device, and sizes); the `set_output` method calls
`check_inplace` on in-place specializations of structured kernels. A sketch of the kind of
check being centralized follows the list below.

Besides that, it also:
- adds overlap assertions for both in-place and out-of-place overloads
- removes in-place-operator-specific `TORCH_CHECK`s around the code base
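
For illustration, this is the kind of dtype check involved (existing eager behavior, shown as a sketch):

```python
import torch

out = torch.zeros(3, dtype=torch.long)
src = torch.randn(3)
try:
    out.add_(src)  # in-place ops may not change the output dtype
except RuntimeError as e:
    print(e)  # result type Float can't be cast to the desired output type Long
```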

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D31234063

Pulled By: ezyang

fbshipit-source-id: fa3b45775af7812e07a282e7cae00b68caf0fdb0
2021-09-28 13:21:26 -07:00
93852bb2d4 Port sort kernel to structured kernels. (#62391)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62391

Tracking issue: #55070

Test Plan: Imported from OSS

Reviewed By: bhosmer

Differential Revision: D30903992

Pulled By: bdhirsh

fbshipit-source-id: 52687aa2483c101056825433d39d69c60b829c62
2021-09-28 13:12:35 -07:00
57529d48c4 [quant] Fix applying non-zero offset 1 to null pointer in quantized interpolation (#65570)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65570

Although this is not an issue that could pop up in practice, LLVM-12 reports an error about it if left unchecked.

Test Plan: `buck test mode/dev //caffe2/test:quantization -- --exact 'caffe2/test:quantization - test_empty_batch (quantization.core.test_quantized_op.TestQuantizedOps)'`

Reviewed By: r-barnes

Differential Revision: D31151681

fbshipit-source-id: e039c6aa1687a61ef6774f045744dc9d768d5c80
2021-09-28 12:28:59 -07:00
4752453d27 [Structured Kernels] Port for baddbmm and bmm (#64805)
Summary:
This PR ports `baddbmm` and `bmm` to structured kernels. Both ops are handled in the same PR because they share most of their checks and implementation.

Issue tracker: https://github.com/pytorch/pytorch/issues/55070

cc: ysiraichi ezyang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64805

Reviewed By: gchanan

Differential Revision: D31134454

Pulled By: ezyang

fbshipit-source-id: 3294619834a8cc6a0407aea660c556d3a42b6261
2021-09-28 11:07:31 -07:00
278edb5626 .circleci: Only generate docker configs we need (#65728)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65728

Changes the docker image generation script to only include image build
jobs for images that we actually use within CircleCI

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

cc ezyang seemethere malfet pytorch/pytorch-dev-infra

Test Plan: Imported from OSS

Reviewed By: janeyx99

Differential Revision: D31224674

Pulled By: seemethere

fbshipit-source-id: 64b14e1a4ef82d345ec7b898c4c89d9a9419e4de
2021-09-28 10:38:13 -07:00
145202c45b Define timeout in TestIndividualWorkerQueue (#65742)
Summary:
This test occasionally deadlocks while waiting for the child process to report a result.
As the test is small, the entire test should never take more than 1-2 sec, but to be on the safe side the timeout is set to 5 sec.

Somewhat mitigates https://github.com/pytorch/pytorch/issues/65727

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65742

Reviewed By: janeyx99, ejguan

Differential Revision: D31235116

Pulled By: malfet

fbshipit-source-id: 0cdd2f7295f6f9fcefee954a14352e18fae20696
2021-09-28 10:01:19 -07:00
50edc2679d onnx/test.sh: Run test/onnx in only shard 1 (#65722)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/65458

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65722

Reviewed By: albanD

Differential Revision: D31223236

Pulled By: janeyx99

fbshipit-source-id: 3b648cb940a95866f465b27b8bdc74b06d258140
2021-09-28 08:45:25 -07:00
87cd658c27 Add override to virtual destructor in derived class (#65476)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65476

As suggested by `-Winconsistent-missing-destructor-override`.

Test Plan: CI

Reviewed By: pritamdamania87

Differential Revision: D31115128

fbshipit-source-id: a4e2441c13704c0c46e3e86f7886fca76c40ca39
2021-09-28 08:37:23 -07:00
57e5ae5306 [vulkan] Use push constants instead of SSBOs (#65716)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65716

Currently, we send arguments to shaders by creating and filling a SSBO (Shader Storage Buffer Object). However, we can instead use [push constants](https://www.khronos.org/registry/vulkan/specs/1.2-extensions/man/html/vkCmdPushConstants.html) to send a small amount of uniform data to shaders.

Push constants are slightly more performant than an SSBO and also have the added benefit of not needing to allocate and manage memory for a buffer object, since they update the pipeline data directly.

The downside of using push constants is that there is a maximum size for a push constant block, described by `maxPushConstantsSize` in [VkPhysicalDeviceLimits](https://www.khronos.org/registry/vulkan/specs/1.1/html/vkspec.html#VkPhysicalDeviceLimits). The minimum size guaranteed by the spec is 128 bytes, which is enough for 32 `float`/`int` variables, or 8 `vec4` variables. This should be enough for our purposes.

Currently, the Convolution shaders use the largest uniform block which only uses 22 bytes.

Test Plan:
Run `vulkan_api_test`:

```
buck build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 //xplat/caffe2:pt_vulkan_api_test_binAndroid\#android-arm64 --show-output
adb push buck-out/gen/xplat/caffe2/pt_vulkan_api_test_binAndroid\#android-arm64 /data/local/tmp/vulkan_api_test
adb shell "/data/local/tmp/vulkan_api_test"
```

Reviewed By: beback4u

Differential Revision: D30368834

fbshipit-source-id: 65a42b9da1a9084ba2337b41eaab9b612583c408
2021-09-28 08:32:30 -07:00
e155e7520f MaxUnpooling: parallel_for not always backed by OMP (#65655)
Summary:
Use `c10::optional` + a thread fence instead of `#pragma omp critical` inside the max_unpooling kernels.

Using any OpenMP pragma in an `at::parallel_for` body is wrong, as `at::parallel_for` can
be implemented with native threading primitives such as pthreads.

`c10::optional` is also a much better fit than the pair of
`has_error` and `error_index` variables. Use `std::atomic_thread_fence` to ensure the error_index value is synchronized.

It also fixes ICE reported in https://github.com/pytorch/pytorch/issues/65578

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65655

Reviewed By: ngimel

Differential Revision: D31206501

Pulled By: malfet

fbshipit-source-id: 93df34530e721777b69509cd6c68f5d713fb2b2a
2021-09-28 08:13:58 -07:00
26e31f76b0 *_solve methods: implements forward AD (#65546)
Summary:
This PR adds forward AD for `*_solve` methods.
Additionally, `cholesky_solve` gets OpInfo + a bug fix when wrong leading dimensions could be passed to LAPACK,
and `lu_solve` gets forward AD with 2x`lu_solve` instead of 1x`lu_solve` + 2x`triangular_solve`.
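
A sketch exercising the new forward AD support for `cholesky_solve` (since `A` is constant, the tangent solves the same system as the primal):

```python
import torch
import torch.autograd.forward_ad as fwAD

A = torch.randn(3, 3, dtype=torch.double)
A = A @ A.T + 3 * torch.eye(3, dtype=torch.double)  # make A SPD
L = torch.linalg.cholesky(A)
b = torch.randn(3, 1, dtype=torch.double)
t = torch.randn(3, 1, dtype=torch.double)

with fwAD.dual_level():
    x = torch.cholesky_solve(fwAD.make_dual(b, t), L)
    _, tangent = fwAD.unpack_dual(x)
# A x = b implies A dx = db when A is constant, so the tangent solves A dx = t
print(torch.allclose(tangent, torch.cholesky_solve(t, L)))  # True
```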

cc ezyang albanD zou3519 gqchen pearu nikitaved soulitzer Lezcano Varal7 jianyuh mruberry walterddr IvanYashchuk xwang233

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65546

Reviewed By: gchanan

Differential Revision: D31206837

Pulled By: albanD

fbshipit-source-id: 040beda97442e7a88a9df9abc7bb18313ce55bc3
2021-09-28 06:51:32 -07:00
2ea724b1fd Added option to update parameters using state_dict in AveragedModel (#65495)
Summary:
While implementing [EMA](https://github.com/pytorch/vision/pull/4381) (which extends AveragedModel) in torchvision, update_parameters() from AveragedModel could not be used as it did not handle the state_dict(), so a custom update_parameters() had to be defined in the [EMA class](https://github.com/pytorch/vision/pull/4406). This PR handles that scenario, removing the need for the custom update_parameters() implementation.
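
A sketch of the EMA use case driving this change (the `avg_fn` below implements an exponential moving average via the existing hook; this PR extends `update_parameters()` so the averaging can also be driven by state_dict() entries):

```python
import torch
from torch.optim.swa_utils import AveragedModel

model = torch.nn.Linear(4, 2)
# exponential moving average with decay 0.9 via the avg_fn hook
ema = AveragedModel(
    model, avg_fn=lambda avg, new, num_averaged: 0.9 * avg + 0.1 * new)

for _ in range(10):
    # ... an optimizer step on `model` would go here ...
    ema.update_parameters(model)
```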

Discussion: https://github.com/pytorch/vision/pull/4406#pullrequestreview-753734102

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65495

Reviewed By: datumbox

Differential Revision: D31176742

Pulled By: prabhat00155

fbshipit-source-id: 326d14876018f21cf602bab5eaba344678dbabe2
2021-09-28 03:34:49 -07:00
3324bae5f1 Remove THCTensor.cu and THCTensorCopy.cu copy (#65491)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65491

The only user of any of this code is THCStorage_copy, so I've
migrated that to call `Tensor.copy_` directly.

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D31148183

Pulled By: ngimel

fbshipit-source-id: 92bab71306c84bc481c47a0615ebb811af2c2875
2021-09-27 23:21:45 -07:00
6a99053515 Added sparse-tensor copy logic to dispatcher (#65304)
Summary:
- Only ported copy for sparse tensors to the dispatcher. Everything else is the same
- Duplicated code for named-tensor handling in sparse tensor copy
	- Might change it later to handle named tensors using the dispatcher

Issue https://github.com/pytorch/pytorch/issues/61122

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65304

Reviewed By: gchanan

Differential Revision: D31176720

Pulled By: ezyang

fbshipit-source-id: 56757a3b0fb56c3d05c16dd935428a0cd91ea766
2021-09-27 20:08:27 -07:00
43d47bdcca [tensorexpr] conv2d handle optional bias (#64750)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64750

conv2d bias is optional; it shows up as ArgNone when processing the graph.
This bias is a prim::Constant of NoneType, so we do not know its shape at the moment of constant binding.

This change adds it as a constant zeros Tensor at the moment of graph processing. To do that, `std::vector<TensorExprKernel::ConstantDescr>& constants` and `std::vector<at::Tensor>& constant_tensors` are added to `computeOperandValue`, as it is not in `TensorExprKernel`.
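
A minimal Python sketch of the substitution described above; the helper name and dtype choice are illustrative, not the actual C++ code:

```
import torch

def materialize_conv_bias(weight: torch.Tensor, bias):
    # When the graph carries bias as a NoneType constant, bind a zeros
    # tensor sized by the output channels (weight.shape[0]) instead.
    if bias is None:
        bias = torch.zeros(weight.shape[0], dtype=weight.dtype)
    return bias
```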

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D30842101

Pulled By: IvanKobzarev

fbshipit-source-id: 88020f6934e43fe606f8eae928b7e21b7c3f15f6
2021-09-27 20:00:53 -07:00
31ea4358d8 [tensorexpr] Add Op handling for mobilenetv3 large (#64741)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64741

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D30839110

Pulled By: IvanKobzarev

fbshipit-source-id: d8e89c086c713fbe816dd8c8096cd64c05dc7431
2021-09-27 20:00:51 -07:00
c28e3ffb4b [jit] Shape propagation batch_norm, dropout, quantize, hardswish (#64740)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64740

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D30839111

Pulled By: IvanKobzarev

fbshipit-source-id: c8f477ee05769865c0a23127b7f8a8276f46b54e
2021-09-27 19:59:34 -07:00
46b3fc032a Migrate remainder of THCDeviceUtils.cuh to ATen (#65472)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65472

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D31148181

Pulled By: ngimel

fbshipit-source-id: f777ba85b1cd8cb98b0ceb1756c558dde5862fc2
2021-09-27 19:37:06 -07:00
12137db5e3 Fix the slowdown of _object_to_tensor since 1.9 (#65721)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65721

#Closes: https://github.com/pytorch/pytorch/issues/65696

The bug was introduced in https://github.com/pytorch/pytorch/pull/55861, and it causes a 100X slowdown since 1.9.
ghstack-source-id: 139128267

Test Plan:
Performance test:
```
import time

from torch.distributed.distributed_c10d import _object_to_tensor

start = time.time()
_object_to_tensor("x" * 50_000_000)
print("Time:", time.time() - start)
```

Reviewed By: rohan-varma

Differential Revision: D31219794

fbshipit-source-id: 1abec38f9d51361c1eab6ad5efd87b589322e208
2021-09-27 19:22:10 -07:00
002ff19836 [acc_utils] Fix off by one for model info getter (#65708)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65708

att

Test Plan: added unit test

Reviewed By: khabinov

Differential Revision: D31209992

fbshipit-source-id: c1b4e70bd9705dcfdf3039cb8791149c8646f1d7
2021-09-27 19:01:55 -07:00
63bb7c6dba Refactor AotCompile to return a pair (#65707)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65707

Refactoring aotCompile to return a pair of the compiled function and the LLVM assembly, instead of updating an incoming string with the assembly code.

Testing: Gives expected results when compiled and run
```
(pytorch)  ~/local/pytorch refactor_aot
└─ $ build/bin/aot_model_compiler --model mobilenetv3.pt --model_name=pytorch_dev_mobilenetv3 --model_version=v1 --input_dims="2,2,2"
The compiled model was saved to mobilenetv3.compiled.pt
```

Test Plan: Imported from OSS

Reviewed By: qihqi

Differential Revision: D31220452

Pulled By: priyaramani

fbshipit-source-id: f957c53ba83f876a2e7dbdd4b4571a760b3b6a9a
2021-09-27 18:56:04 -07:00
e9327ed2ce Add nn.function.hardtanh in acc_tracer (#65639)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65639

This op is used by mobilenet v2.

Test Plan:
buck test glow/fb/fx/oss_acc_tracer:test_acc_tracer -- test_hardtanh
buck test glow/fb/fx/acc_tracer:test_acc_shape_inference -- hardtanh
buck test glow/fb/fx/oss_acc_tracer:test_acc_tracer -- test_hardtanh

Reviewed By: yinghai

Differential Revision: D31184297

fbshipit-source-id: 5a04319f6d16fb930372442616e27211107ecc67
2021-09-27 18:40:18 -07:00
6a6ee92e36 [quant] Add op benchmark for CPU FakeQuantizePerChannel with float zero_points (#65241)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65241

Test Plan: Imported from OSS

Reviewed By: jingsh

Differential Revision: D31150087

Pulled By: b-koopman

fbshipit-source-id: a00d4995841eee81305d0007c908473cc3d5a727
2021-09-27 16:01:49 -07:00
7c62b6e973 add deepcopy support to subclasses (#65584)
Summary:
Happy to get any feedback on how to make this code cleaner!

This:
- Fix Tensor attribute deepcopy (BC-breaking?)
- Add a test for Tensor attribute deepcopy
- Fix subclass deepcopy
- Move the subclass serialization tests into their own class so they don't interfere with other serialization test logic
- Add a test for subclass deepcopy

cc ezyang gchanan
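
A minimal check of the intended behavior after this fix; `TaggedTensor` and the attribute are illustrative:

```
import copy
import torch

class TaggedTensor(torch.Tensor):
    pass

t = torch.randn(3).as_subclass(TaggedTensor)
t.tag = "example"  # Python attribute that deepcopy should preserve

t2 = copy.deepcopy(t)
assert type(t2) is TaggedTensor
assert t2.tag == "example"
assert t2.data_ptr() != t.data_ptr()  # storage is copied, not aliased
```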

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65584

Reviewed By: gchanan

Differential Revision: D31206590

Pulled By: albanD

fbshipit-source-id: 74a8f0767f4933b9c941fbea880a8fd1b893ea2f
2021-09-27 14:36:22 -07:00
f5b4e369f6 Sparse SoftMax: Remove unused variables (#65539)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65539

This function doesn't directly use thrust so these are simply unused variables.

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D31193191

Pulled By: malfet

fbshipit-source-id: 231b6a197c9f1bd5a61e46cb858e8eedc85b2818
2021-09-27 13:51:49 -07:00
e1340d4282 [GHA] Small refactors (#65647)
Summary:
Introduce `main` method in generate_ci_workflows
Check that all `ciflow/` labels start with the same prefix
Move `ciflow_should_run` definition to common.yml.j2

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65647

Reviewed By: janeyx99

Differential Revision: D31189537

Pulled By: malfet

fbshipit-source-id: 7cc47f63fb334c57f450034b931ff5bae1c0ed8b
2021-09-27 13:14:49 -07:00
fea32be964 Add HPU type for check_base_legacy_new (#65410)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65410

Reviewed By: H-Huang

Differential Revision: D31143754

Pulled By: malfet

fbshipit-source-id: 32abfbae4f7c09924c7dfa16758d64a2215ec636
2021-09-27 13:13:34 -07:00
82e0bf44c0 Apply linter suggestions to #65137 (#65459)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65459

Just run linter on the change and apply all suggestions

Test Plan: N/A

Reviewed By: seemethere

Differential Revision: D31102960

fbshipit-source-id: 04e1d07935690f2ddbc64533661b3e55379d13b5
2021-09-27 13:07:40 -07:00
811601e19a Upload sccache stats (#65582)
Summary:
This adds some tracking to metrics.pytorch.org for sccache build stats per environment

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65582

Reviewed By: malfet, zhouzhuojie, janeyx99

Differential Revision: D31160761

Pulled By: driazati

fbshipit-source-id: a497918bafbe610a51c92a9139684cd3efe670d3
2021-09-27 12:55:10 -07:00
ea546e20fd [Reland] nn.functional.linear OpInfo (#65498)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65498

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D31171149

Pulled By: zou3519

fbshipit-source-id: badb06af08a772397b0280189385723c0175200b
2021-09-27 12:42:46 -07:00
b91375f741 upgrade windows cuda installer: cu11.1.0 to cu11.1.1 (#65669)
Summary:
Fixes pytorch/vision#4483

Please merge it with https://github.com/pytorch/builder/pull/857

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65669

Reviewed By: gchanan

Differential Revision: D31205107

Pulled By: janeyx99

fbshipit-source-id: 654f0440ad33d2517db95d64df64e14de1233ad7
2021-09-27 12:27:19 -07:00
cd2656a2e5 [package] add some docs describing how to debug dependencies (#65704)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65704

As title.

Test Plan: Imported from OSS

Reviewed By: tugsbayasgalan

Differential Revision: D31209866

Pulled By: suo

fbshipit-source-id: 4c8ec1d5418ea75b71c4b9a498b86f0ef5383544
2021-09-27 12:14:23 -07:00
10d0dbc6d9 Avoid storage access for HPU tensors (#65409)
Summary:
Add is_hpu() methods for Aten tensor and device

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65409

Reviewed By: wconstab, H-Huang

Differential Revision: D31134422

Pulled By: malfet

fbshipit-source-id: 181ebb67dce8e05a0723ef3c82f23e39228841ee
2021-09-27 11:54:30 -07:00
aa5d2a8d86 Remove confusing SHARD_NUMBER resetting logic (#65701)
Summary:
Resetting SHARD_NUMBER was a workaround to differentiate having just one shard from having multiple.

We shouldn't reset SHARD_NUMBER but instead should just pass and use NUM_TEST_SHARDS for clarity and ease of scaling up to more shards.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65701

Reviewed By: driazati

Differential Revision: D31209306

Pulled By: janeyx99

fbshipit-source-id: 3a3504bd47e655d62aa0d9ed2f4657ca34c71c0e
2021-09-27 10:55:00 -07:00
facff2ec65 Update ProcessGroup collective C++ APIs to be non-pure virtual functions (#64943)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64943

Most ProcessGroup collective APIs are pure virtual. As a result, c10d extensions need to override all of them and throw an error for the APIs they don't need. This is too verbose for users. This commit makes those collective APIs virtual, with a default implementation that throws an error. Note that ProcessGroup is still an abstract class, as `getBackendName` is a pure virtual function that all subclasses have to override.
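
A Python analogue of the design change, with illustrative class and method names (the real change is in the C++ ProcessGroup):

```
class ProcessGroupLike:
    # get_backend_name stays "pure virtual": every backend must override it
    def get_backend_name(self) -> str:
        raise NotImplementedError

    # collectives get a throwing default instead of being abstract, so a
    # backend only overrides the ones it actually supports
    def allreduce(self, tensors, opts=None):
        raise RuntimeError(
            f"ProcessGroup {self.get_backend_name()} does not support allreduce"
        )

class MinimalBackend(ProcessGroupLike):
    def get_backend_name(self) -> str:
        return "minimal"
    # no allreduce override needed unless the backend supports it
```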

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang cbalioglu gcramer23

Test Plan: Imported from OSS

Reviewed By: cbalioglu

Differential Revision: D30906866

Pulled By: mrshenli

fbshipit-source-id: c4df8962d60350a44d2df72fd04f9dd6eadb9fa6
2021-09-26 19:19:43 -07:00
cd80bbe5f5 Bug fixes in dataframe_wrapper (#65629)
Summary:
## Description
- Updated functions in `dataframe_wrapper.py` to return values
- Fixed bug in `set_df_wrapper` to update `global default_wrapper`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65629

Reviewed By: ejguan

Differential Revision: D31180110

Pulled By: Nayef211

fbshipit-source-id: a8046e582fd6ce982fcdc89dae4932d0edc83d6b
2021-09-25 21:09:41 -07:00
1c8949c51a [BE] Run Zero test internally (#65519)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65519

Adds buck target so we can run this internally.
ghstack-source-id: 139009957

Test Plan: CI

Reviewed By: SciPioneer

Differential Revision: D31072784

fbshipit-source-id: 7185cc1e6f9df3d79251eb017270471942a9d7dd
2021-09-25 13:26:50 -07:00
f70147b426 [BE] Enable ZeRO test on windows (#65385)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65385

Enables the ZeRO tests to run on windows. Closes
https://github.com/pytorch/pytorch/issues/63086.

Backend == NCCL was used as a proxy for whether we were running on CUDA, but the Windows GPU tests use Gloo. In this case, use Gloo on GPU.

For some reason these tests don't test Gloo on GPU with ZeRO in general (they pick the NCCL backend when a GPU is available), so that behavior is kept for now.
ghstack-source-id: 139003920

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D31071181

fbshipit-source-id: 45a76309ac5e882f5aa6c4b130118a68800754bb
2021-09-25 13:25:40 -07:00
4fe66d962d [Codemod][FBSourceBlackLinter] Daily arc lint --take BLACK
Reviewed By: zertosh

Differential Revision: D31192084

fbshipit-source-id: 25d490783b876253ddd1ad0a70832766ebd33f51
2021-09-25 06:42:19 -07:00
146817c9d0 Add all_paths utility function (#65602)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65602

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D31163681

Pulled By: tugsbayasgalan

fbshipit-source-id: fa0b28b1d3b73efcc7671698a613e695a01cc103
2021-09-25 01:11:20 -07:00
0256c3be50 [TensorExpr] Delete dtype_ field from Let - it should use its var's dtype. (#65634)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65634

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D31182697

Pulled By: ZolotukhinM

fbshipit-source-id: 572ecd74cdf2a671ee98e81f0b3e387f3d9c6202
2021-09-25 00:11:06 -07:00
399214efd6 Revert D31172530: [pytorch][PR] Enable CUPTI for kineto by default on windows
Test Plan: revert-hammer

Differential Revision:
D31172530 (6b60884f12)

Original commit changeset: 2c69ed0282c5

fbshipit-source-id: 649e040a8c44b0f536a8db397b4325309a285934
2021-09-24 19:18:15 -07:00
cda2ee9016 Add nn.function.hardswish in acc_tracer (#65590)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65590

hardswish is used by mobile net v3 oss model.
This diff added hardswish support in acc_tracer

Test Plan:
buck test glow/fb/fx/acc_tracer:test_acc_shape_inference
buck test glow/fb/fx/oss_acc_tracer:test_acc_tracer -- test_hardswish

Reviewed By: 842974287

Differential Revision: D30950061

fbshipit-source-id: cab57b8de5bea3a4d9d2b7d2a410d9afe787d66f
2021-09-24 17:30:39 -07:00
1de8976e85 Add quantized::convtranspose2d (#63914)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63914

Test Plan: Imported from OSS

Reviewed By: dreiss

Differential Revision: D30531889

fbshipit-source-id: a65e389da2722efbc62e3fe1edf503732326350d
2021-09-24 17:07:29 -07:00
ab5eb56983 add qmul (#63913)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63913

Test Plan: Imported from OSS

Reviewed By: dreiss

Differential Revision: D30531890

fbshipit-source-id: 29d88cc61bd1e328cc7ae7a91a2f8d4819803c8d
2021-09-24 17:06:17 -07:00
ece25c453f [PyTorch] Store Argument::alias_info_ on the heap (#64824)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64824

See comment in function_schema.h for explanation. I claim that this is a good tradeoff because the aliasing information seems to be used only in compiler-ish code paths, where performance isn't as critical as actual execution. If performance is important there too, perhaps we should hoist isWrite into the Argument itself since there are several paths that only care about isWrite.
ghstack-source-id: 138958896

Test Plan: CI, profile schema parsing on startup and see much fewer page faults in createArgumentVector.

Reviewed By: suo

Differential Revision: D30860719

fbshipit-source-id: 1d4d2328f2b8e34f5ddf9d82083fd4dd7b7f738f
2021-09-24 17:00:51 -07:00
af7238f214 Rocm4.3.1 nightly (#65624)
Summary:
Depends on pytorch/builder#851.

cc jeffdaily sunway513 jithunnair-amd ROCmSupport

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65624

Reviewed By: zou3519

Differential Revision: D31180780

Pulled By: malfet

fbshipit-source-id: 98a51eb45985ef648108e811d2c02231ec8b3a1f
2021-09-24 16:21:01 -07:00
15724bcc03 [TensorExpr] Re-enable a float16 test. (#65632)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65632

Test Plan: Imported from OSS

Reviewed By: huiguoo

Differential Revision: D31181798

Pulled By: ZolotukhinM

fbshipit-source-id: 1a57d0a878d44f8b73f3c24eef7ba707ce18fb70
2021-09-24 15:15:42 -07:00
0d3bf97fd0 TST Adds test for non-contiguous tensors (#64954)
Summary:
Follow up to https://github.com/pytorch/pytorch/issues/61935

This PR:

1. Adds test for non-contiguous tensors
2. Fixes a bug in `NLLLoss` that was caught by the test.

The reason this was not caught in `common_nn` is that `CriterionTest` overrides `test_cuda` but does not call `test_nonconfig`.

cc albanD mruberry jbschlosser walterddr

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64954

Reviewed By: zou3519

Differential Revision: D31174149

Pulled By: jbschlosser

fbshipit-source-id: a16073e59b40ccc01c82ede016b63a8db2e810f5
2021-09-24 15:05:09 -07:00
a839cec0ad .github: GHA retry docker pull (#65103)
Summary:
This should help alleviate workflows failing due to docker pull timing out, which doesn't happen often, but did happen once in the past day.

Was also reported in https://github.com/pytorch/pytorch/issues/65439

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65103

Reviewed By: driazati

Differential Revision: D31157772

Pulled By: janeyx99

fbshipit-source-id: 7bf556f849b41eeb6dea69d73e5a8e1a40dec514
2021-09-24 14:31:43 -07:00
68e5935498 Remove fgrad_input from slow_conv2d (#64280)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64280

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D30830887

Pulled By: jbschlosser

fbshipit-source-id: 5a3a79ad9d9118177672eabf872f9d9a3313ebe4
2021-09-24 14:27:39 -07:00
71d1d16acb Moving the constant parameter check to a more common file (#64251)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64251

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D31161850

Pulled By: Gamrix

fbshipit-source-id: 5db3e6d52c99c1f40455c601122bb7680a287ae5
2021-09-24 13:54:27 -07:00
640a615150 [easy] [PyTorch Edge] Remove double pragma once directive in the generated code (#65620)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65620

This was bothering me for a while.

ghstack-source-id: 138914860

Test Plan: Sandcastle

Reviewed By: beback4u

Differential Revision: D31162648

fbshipit-source-id: 72c47ea34d40c772bb53da721fcb36365b5dbaf3
2021-09-24 13:14:37 -07:00
57e066e188 TST Adds gradcheck and gradgradcheck to module info (#64444)
Summary:
Follow up to https://github.com/pytorch/pytorch/issues/61935

cc albanD mruberry jbschlosser walterddr

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64444

Reviewed By: pbelevich

Differential Revision: D31174672

Pulled By: jbschlosser

fbshipit-source-id: 86dc3576479974fd0996f06298c09692c07e6b24
2021-09-24 13:10:29 -07:00
6b60884f12 Enable CUPTI for kineto by default on windows (#65608)
Summary:
Retry of https://github.com/pytorch/pytorch/pull/62175

See https://github.com/pytorch/pytorch/pull/62175#issuecomment-926411151 for more information.

malfet gdankel

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65608

Reviewed By: zou3519

Differential Revision: D31172530

Pulled By: gdankel

fbshipit-source-id: 2c69ed0282c54fa6cdb6e604096d0370e230fd66
2021-09-24 13:00:49 -07:00
eca4f14b6c [PyTorch] Add C10_ prefix to MPARK_* macros in variant.h (#65589)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65589

Without this prefix, the include guards interfere with attempts to indirectly include both c10::variant and the original mpark variant in the same translation unit.
ghstack-source-id: 138901838

Test Plan: Temporarily `#include <c10/util/variant.h>` in ivalue.h and buck build //data_preproc/preproc:preproc_adapter_utils mode/no-gpu -- this delayed D31101962 (01720d6a23) from fixing S244170

Reviewed By: bhosmer

Differential Revision: D31159414

fbshipit-source-id: 234c5ed37ca853702bcdf3263e4f185b95ac1d08
2021-09-24 12:57:26 -07:00
7f25c3e666 Update distributed.rst to show that CUDA send/recv on GPU is supported (#65601)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65601

I believe this feature was supported one year ago:
https://github.com/pytorch/pytorch/pull/44921

#Closes: https://github.com/pytorch/pytorch/issues/65525
ghstack-source-id: 138918961

Test Plan: N/A

Reviewed By: pritamdamania87, mingzhe09088

Differential Revision: D31163535

fbshipit-source-id: 9321a0a5137a3e265e2b54bd78730ac28c7acd55
2021-09-24 12:30:10 -07:00
760aefd34d Fix nullptr addition (#65548)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65548

Fixes
caffe2/test:jit - test_unsupported_nn_functional_pad_circular_cpu_float32 (test_jit_fuser_te.TestNNCOpInfoCPU)

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D31148405

fbshipit-source-id: 4c8c693a45229ab4e59b0b0ec5326d3ac114dbaf
2021-09-24 11:43:22 -07:00
c3b09e977a [fx2trt] Refresh execution context across save/load for TRTModule. (#65592)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65592

IExecutionContext might not be safe to serialize; therefore the simplest way to support save/load of TRTModule is to re-populate the execution context upon every load.
ghstack-source-id: 138904770

Test Plan: buck run mode/dev-nosan -c python.package_style=inplace -j 40 deeplearning/trt/fx2trt:acc2trt_test

Reviewed By: zrphercule

Differential Revision: D31070427

fbshipit-source-id: 88c58c6ce50e6dc9383d7f9419b5447cb89a4a3a
2021-09-24 11:36:57 -07:00
1682722152 keep output type after calling SubgraphRewriter (#65453)
Summary:
The JIT **SubgraphRewriter** doesn't keep the output type after rewriting the old graph. For example, in profiling mode the old graph has the old operator's shapes, but after replacing the old operator with a newer one via **SubgraphRewriter**, the tensor shape info is eliminated.

The motivation is that I want to replace the PyTorch convolution with a custom convolution. I first register **aten::_convolution** as a profiled node that records the input and output shapes, and then use graph rewriting to replace it with **aten::conv2d**, whose tensor shape info is eliminated. I want to use the input sizes to do some pre-processing before replacing **aten::conv2d** with the custom convolution.

Before rewrite:
```
graph(%self.1 : __torch__.MyModule,
      %x.1 : Float(2, 3, 20, 20, strides=[1200, 400, 20, 1], requires_grad=0, device=cpu)):
  %7 : int = prim::Constant[value=1](), scope: __module.conv # /home/xiaobinz/miniconda3/envs/pytorch-master/lib/python3.6/site-packages/torch/nn/modules/conv.py:443:0
  %6 : bool = prim::Constant[value=0](), scope: __module.conv # /home/xiaobinz/miniconda3/envs/pytorch-master/lib/python3.6/site-packages/torch/nn/modules/conv.py:443:0
  %5 : bool = prim::Constant[value=1](), scope: __module.conv # /home/xiaobinz/miniconda3/envs/pytorch-master/lib/python3.6/site-packages/torch/nn/modules/conv.py:443:0
  %4 : NoneType = prim::Constant()
  %3 : int[] = prim::Constant[value=[1, 1]]()
  %2 : int[] = prim::Constant[value=[0, 0]]()
  %conv : __torch__.torch.nn.modules.conv.Conv2d = prim::GetAttr[name="conv"](%self.1)
  %z : Float(2, 3, 20, 20, strides=[1200, 400, 20, 1], requires_grad=0, device=cpu) = aten::clone(%x.1, %4) # jit_test.py:22:0
  %weight : Float(3, 3, 1, 1, strides=[3, 1, 1, 1], requires_grad=0, device=cpu) = prim::GetAttr[name="weight"](%conv)
  %x : Float(2, 3, 20, 20, strides=[1200, 400, 20, 1], requires_grad=0, device=cpu) = aten::_convolution(%x.1, %weight, %4, %3, %2, %3, %6, %2, %7, %6, %6, %5, %5), scope: __module.conv # /home/xiaobinz/miniconda3/envs/pytorch-master/lib/python3.6/site-packages/torch/nn/modules/conv.py:443:0
  %16 : Float(2, 3, 20, 20, strides=[1200, 400, 20, 1], requires_grad=0, device=cpu) = aten::add(%x, %z, %7) # jit_test.py:24:0
  return (%16)
```
After rewriting using **aten::conv2d**:
```
graph(%self.1 : __torch__.MyModule,
      %x.1 : Float(2, 3, 20, 20, strides=[1200, 400, 20, 1], requires_grad=0, device=cpu)):
  %7 : int = prim::Constant[value=1](), scope: __module.conv # /home/xiaobinz/miniconda3/envs/pytorch-master/lib/python3.6/site-packages/torch/nn/modules/conv.py:443:0
  %6 : bool = prim::Constant[value=0](), scope: __module.conv # /home/xiaobinz/miniconda3/envs/pytorch-master/lib/python3.6/site-packages/torch/nn/modules/conv.py:443:0
  %5 : bool = prim::Constant[value=1](), scope: __module.conv # /home/xiaobinz/miniconda3/envs/pytorch-master/lib/python3.6/site-packages/torch/nn/modules/conv.py:443:0
  %4 : NoneType = prim::Constant()
  %3 : int[] = prim::Constant[value=[1, 1]]()
  %2 : int[] = prim::Constant[value=[0, 0]]()
  %conv : __torch__.torch.nn.modules.conv.Conv2d = prim::GetAttr[name="conv"](%self.1)
  %z : Float(2, 3, 20, 20, strides=[1200, 400, 20, 1], requires_grad=0, device=cpu) = aten::clone(%x.1, %4) # jit_test.py:22:0
  %weight : Float(3, 3, 1, 1, strides=[3, 1, 1, 1], requires_grad=0, device=cpu) = prim::GetAttr[name="weight"](%conv)
  %18 : Tensor = aten::conv2d(%x.1, %weight, %4, %3, %2, %3, %7)
  %16 : Float(2, 3, 20, 20, strides=[1200, 400, 20, 1], requires_grad=0, device=cpu) = aten::add(%18, %z, %7) # jit_test.py:24:0
  return (%16)
```

Expected result after replacing **aten::_convolution** with **aten::conv2d**:

```
graph(%self.1 : __torch__.MyModule,
      %x.1 : Float(2, 3, 20, 20, strides=[1200, 400, 20, 1], requires_grad=0, device=cpu)):
  %7 : int = prim::Constant[value=1](), scope: __module.conv # /home/xiaobinz/miniconda3/envs/pytorch-master/lib/python3.6/site-packages/torch/nn/modules/conv.py:443:0
  %6 : bool = prim::Constant[value=0](), scope: __module.conv # /home/xiaobinz/miniconda3/envs/pytorch-master/lib/python3.6/site-packages/torch/nn/modules/conv.py:443:0
  %5 : bool = prim::Constant[value=1](), scope: __module.conv # /home/xiaobinz/miniconda3/envs/pytorch-master/lib/python3.6/site-packages/torch/nn/modules/conv.py:443:0
  %4 : NoneType = prim::Constant()
  %3 : int[] = prim::Constant[value=[1, 1]]()
  %2 : int[] = prim::Constant[value=[0, 0]]()
  %conv : __torch__.torch.nn.modules.conv.Conv2d = prim::GetAttr[name="conv"](%self.1)
  %z : Float(2, 3, 20, 20, strides=[1200, 400, 20, 1], requires_grad=0, device=cpu) = aten::clone(%x.1, %4) # jit_test.py:22:0
  %weight : Float(3, 3, 1, 1, strides=[3, 1, 1, 1], requires_grad=0, device=cpu) = prim::GetAttr[name="weight"](%conv)
  %18 : Float(2, 3, 20, 20, strides=[1200, 400, 20, 1], requires_grad=0, device=cpu) = aten::conv2d(%x.1, %weight, %4, %3, %2, %3, %7)
  %16 : Float(2, 3, 20, 20, strides=[1200, 400, 20, 1], requires_grad=0, device=cpu) = aten::add(%18, %z, %7) # jit_test.py:24:0
  return (%16)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65453

Reviewed By: zdevito

Differential Revision: D31162489

Pulled By: ZolotukhinM

fbshipit-source-id: 0d1c1d607cb612df47c64f173d9f4c9e8b1d6c49
2021-09-24 11:07:40 -07:00
f3587f6bfa Remove THC ScalarConvert (#65471)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65471

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D31148182

Pulled By: ngimel

fbshipit-source-id: bbf74e36a3d91a7be3e47199981440c68a2f645f
2021-09-24 10:29:51 -07:00
5b2a7eaa03 [codemod][fbcode/caffe2] Apply all buildifier fixes
Test Plan: Visual inspection. Sandcastle.

Reviewed By: zsol

Differential Revision: D31170304

fbshipit-source-id: ee56312b5262247bb5a2e68a66d51f6cb3a0bf82
2021-09-24 09:03:29 -07:00
b858993c97 Fix engine check for case where grad is a subclass (#65568)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65568

Test Plan: Imported from OSS

Reviewed By: soulitzer

Differential Revision: D31158089

Pulled By: albanD

fbshipit-source-id: 2a77df9b6340107de02a043b57a36cb7ae68df34
2021-09-24 08:41:19 -07:00
e742839f0e Fix autograd engine test in python_dispatch (#65567)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65567

Test Plan: Imported from OSS

Reviewed By: soulitzer

Differential Revision: D31158090

Pulled By: albanD

fbshipit-source-id: 651b78016ad978c7419343554ce7ceffd54aef1b
2021-09-24 08:39:52 -07:00
ef9e560796 [Static Runtime] Add aten::remainder out variant (#64967)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64967

Out variant implementation for `aten::remainder`. Added both scalar and tensor overloads.

Test Plan: `buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest -- Remainder`

Reviewed By: d1jang

Differential Revision: D30915469

fbshipit-source-id: 9f27f18c86d66b11eac0aa4659c7062cb785b7e9
2021-09-24 07:51:39 -07:00
b003b2a9c0 [Static Runtime] Add record functions (#64698)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64698

Reviewed By: hlu1

Differential Revision: D30747191

fbshipit-source-id: 7ded6ea9bd36b5e3343d1efa9f3c92e02ff6d7f8
2021-09-24 07:20:17 -07:00
fd24e1b61f add OpInfo for torch.repeat_interleave (#65455)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65455

Addresses facebookresearch/functorch#103.

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D31111696

Pulled By: zou3519

fbshipit-source-id: 4fa73708fa915cb21adbba9cb8fd2b8f75bcd3e0
2021-09-24 07:16:08 -07:00
d85e12a5bf add OpInfo for torch.argsort (#65454)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65454

Addresses facebookresearch/functorch#103.

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D31111700

Pulled By: zou3519

fbshipit-source-id: ec4babd2fcdcea856ba0ee8db0fd8f42b87269f3
2021-09-24 07:14:41 -07:00
ca66698202 [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D31166199

fbshipit-source-id: 3fb46d64aba5e7c443b70beda77338f2ee63a99e
2021-09-24 02:57:37 -07:00
cc4db35205 [TensorExpr] Break circular dependency of shared pointers in MemDependencyChecker. (#65600)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65600

Previously AccessInfo owned two maps, dependencies_ and dependents_,
which represented an edge in the dependency graph. These maps held
shared pointers, so each edge immediately became a reference cycle,
which resulted in memory leaks. This PR makes one end of each edge a
weak pointer, breaking the cycle.
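
A Python analogue of the fix using `weakref`; the names mirror the summary above, not the actual C++ code:

```
import weakref

class AccessInfo:
    def __init__(self, name):
        self.name = name
        self.dependencies = []  # strong refs: accesses this one depends on
        self.dependents = []    # weak refs: accesses that depend on this one

def add_edge(src: AccessInfo, dst: AccessInfo):
    # src depends on dst. Only one direction holds a strong reference,
    # so the pair no longer keeps itself alive.
    src.dependencies.append(dst)
    dst.dependents.append(weakref.ref(src))
```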

Test Plan: buck test mode/dbgo-asan-ubsan //search/lib/query_expansion/candidate_generator/test:transliteration_expander_test -- --exact 'search/lib/query_expansion/candidate_generator/test:transliteration_expander_test - TransliterationExpander.romanizationByLocaleTest'

Reviewed By: bertmaher

Differential Revision: D31163441

Pulled By: ZolotukhinM

fbshipit-source-id: 9cef921f5c9293f1237144d1ee92e31f3e44c00a
2021-09-23 23:33:36 -07:00
01720d6a23 [JIT] constant object compilation unit ref fix (#65442)
Summary:
// A non owning pointer to a type. When a class get inserted as a constant
// into a graph, if we used a strong pointer we would have a circular reference
// from Object -> CompilationUnit and CompilationUnit -> Graph (which owns the
// Constant Object)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65442

Reviewed By: ezyang

Differential Revision: D31101962

Pulled By: eellison

fbshipit-source-id: f1c1cfbe5a8d16a832cad7ba46e2a57a98670083
2021-09-23 22:43:02 -07:00
f83250fd4e Revert logic in mobile/type_parser.cpp (#65556)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65556

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D31149080

Pulled By: ansley

fbshipit-source-id: d5986d019fc2c47fd45cc10f0397499cc1e81329
2021-09-23 22:26:02 -07:00
20143bf07f [ONNX] Deprecate use_external_data_format param from torch.onnx.export() function. (#62257) (#64382)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64382

* This `use_external_data_format` parameter is used for large models that cannot be exported because of the 2GB protobuf limit.

* When `use_external_data_format` is set to True, the model is exported in the ONNX external data format, in which case some of the model parameters are stored in external binary files and not in the ONNX model file itself.

* This PR marks the parameter as DEPRECATED and checks the model proto size in code instead of relying on the user: if the size is larger than 2GB, then `use_external_data_format = True` is applied automatically.

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D30905265

Pulled By: malfet

fbshipit-source-id: 82b4e17bfa6a8de2bfd700a5282c12f6835603cb

Co-authored-by: hwangdeyu <dejack953@outlook.com>
2021-09-23 22:20:48 -07:00
478d4cf883 [ONNX] Deprecated the example_outputs param from torch.onnx.export() function. (#62815) (#64380)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64380

* `example_outputs` is used to determine the type and shape of the outputs without tracing the execution of the model, and it had to be provided when exporting a ScriptModule or ScriptFunction via the export() function.

* Since we can work out `example_outputs` internally instead of requiring it from the user, we deprecated this argument in the export() function to improve the experience of calling it.

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D30905266

Pulled By: malfet

fbshipit-source-id: d00b00d7d02b365d165028288ad915678caa51f2

Co-authored-by: hwangdeyu <dejack953@outlook.com>
2021-09-23 22:20:46 -07:00
9323ea2195 [ONNX] minor doc improvements and cleanup (#62514) (#64373)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64373

* Fix some bad formatting and clarify things in onnx.rst.
* In `export_to_pretty_string`:
    * Add documentation for previously undocumented args.
    * Document that `f` arg is ignored and mark it deprecated.
    * Update tests to stop setting `f`.
    * Warn if `_retain_param_name` is set.
* Use double quotes for string literals in test_operators.py.

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D30905271

Pulled By: malfet

fbshipit-source-id: 3627eeabf40b9516c4a83cfab424ce537b36e4b3
2021-09-23 22:20:44 -07:00
9965163751 [ONNX] Add supplementary tests and description for custom_opsets param from torch.onnx.export() function. (#62085) (#64372)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64372

The custom_opsets arg of torch.onnx.export() does not need to be removed.

Add some supplementary description and tests for easier understanding.

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D30905269

Pulled By: malfet

fbshipit-source-id: 489fbee0e2c1d6c5405c9bf7dfd85223ed981a44

Co-authored-by: hwangdeyu <dejack953@outlook.com>
2021-09-23 22:20:42 -07:00
fb71ccf0f1 [ONNX] Remove strip_doc_string param from torch.onnx.export() function. (#61712) (#64371)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64371

As of now, the "strip_doc_string" parameter was described as below:

strip_doc_string (bool, default True): do not include the field
doc_string``` from the exported model. Otherwise the field will mention the source code locations for model``.

This is usually useless to users who want to convert a PyTorch model to an ONNX one; the source code locations only provide benefits when the user wants to debug the export process.

To make the export() function friendlier by providing fewer parameters, we combined "strip_doc_string" into the "verbose" parameter. If a user sets verbose to True, it means the user needs some log information for debugging the export process, which is similar to the purpose of the strip_doc_string parameter.

But the usage of these two arguments is opposite: setting verbose to True means we want to print log information to help debug, which means strip_doc_string should be False. This is how we replace strip_doc_string with the verbose argument in this PR.

This PR still keeps it in the torch.onnx.export() function for backward compatibility, while its usage has been combined with the verbose argument.
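
A usage sketch under that behavior; the model and file names are illustrative:

```
import torch

model = torch.nn.Linear(4, 2)   # illustrative
args = (torch.randn(1, 4),)

# debugging the export: verbose=True now also keeps doc_string locations
torch.onnx.export(model, args, "model_debug.onnx", verbose=True)

# regular export: doc_string is stripped by default
torch.onnx.export(model, args, "model.onnx")
```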

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D30905268

Pulled By: malfet

fbshipit-source-id: 2f06eb805c01fe15ff7a1b4f6595c937ba716d60

Co-authored-by: fatcat-z <zhang-ji@outlook.com>
2021-09-23 22:20:40 -07:00
47d1ed60e1 [ONNX] Remove argument _retain_param_name from torch.onnx.export() function. (#61702) (#64370)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64370

As of now, the "_retain_param_name" parameter has no description in PyTorch docs website. According to code, this argument determines if we keep the original parameter names of PyTorch model in the final ONNX graph. If this is False, those original parameter names will be replaced with a series of integers starting from 1.

Since setting numbers as parameter names make no sense to users, we remove this argument from the torch.onnx.export() function to increase user experience of calling this function.

This PR will still keep it in torch.onnx.export() function for backward support while all backend logic has been changed to work as _retain_param_name is set to True.

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D30905270

Pulled By: malfet

fbshipit-source-id: ca60757ca17daaff937e9f08da42596086795f4a

Co-authored-by: fatcat-z <zhang-ji@outlook.com>
2021-09-23 22:18:52 -07:00
bc02255d5e Revert D30721329: [pytorch][PR] Enable CUPTI for kineto by default on windows.
Test Plan: revert-hammer

Differential Revision:
D30721329 (7dbc21bc2b)

Original commit changeset: aa1af47df8cc

fbshipit-source-id: 565d50841e19a45f8798a490aa3aa6b9f69ca404
2021-09-23 22:14:32 -07:00
8c7caedbb8 avoid re-allocation of view_shape for every tensor in torch.meshgrid (#62908)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62908

Reviewed By: mruberry

Differential Revision: D31064165

Pulled By: dagitses

fbshipit-source-id: 3ddc3088e70fc8ef6dcf56ceb67fd20991169af1
2021-09-23 21:41:51 -07:00
963ae25e41 Migrate THCAtomics to ATen (#65470)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65470

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D31148184

Pulled By: ngimel

fbshipit-source-id: aaac3dfb5f2c6f88e9bd922b3a56d0a16a861e17
2021-09-23 19:43:34 -07:00
c73f0e457e Tensor and device is_hpu methods (#65408)
Summary:
Add is_hpu() methods for Aten tensor and device

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65408

Reviewed By: malfet

Differential Revision: D31144227

Pulled By: wconstab

fbshipit-source-id: 115f4df4b8d54e6913dd51af7b6d4cacf6dd43c5
2021-09-23 18:42:45 -07:00
d78b3909e8 Explicitly destory ProcessGroup in allgather_coalesced_async test (#65513)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65513

The error in #65231 means some child threads were destructed before
joined. I added some trace and prints and found that, in the failed
tests, all `assertEqual` are passed, but the `ProcessGroupGloo`
destructor wasn't called in one of the processes. It could be because the
only guarantee Python makes is that garbage collection MAY happen before
the program exits. This commit adds an explicit `destroy_process_group()`
to alleviate the problem.

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang gcramer23

Test Plan: Imported from OSS

Reviewed By: rohan-varma

Differential Revision: D31134174

Pulled By: mrshenli

fbshipit-source-id: 2e42fe93d3f16ce34681b591afc15a6ac0b9fab6
2021-09-23 18:35:08 -07:00
b77c979102 [quant][fx][graphmode] Make FixedQParam ops work for dtypes other than quint8 (#65484)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65484

This PR makes sure we only use FixedQParamFakeQuantize for the quint8 dtype and allows users
to use other dtypes for ops like sigmoid. This is useful for producing reference patterns for
these ops that can be used in other backends like TensorRT.

Test Plan:
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D31120377

fbshipit-source-id: 3b529d588e2b6ff0377a89c181f6237f8f0cc2f5
2021-09-23 18:29:56 -07:00
a2e631b874 Windows GHA: Only upload artifacts if prev steps pass (#65561)
Summary:
Fixes a task in https://github.com/pytorch/pytorch/issues/65439

And removes the Upload to GitHub step as it's redundant with the S3 step.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65561

Reviewed By: seemethere

Differential Revision: D31157685

Pulled By: janeyx99

fbshipit-source-id: cd23113a981eb4467fea3af14d916f6f2445a02b
2021-09-23 17:38:39 -07:00
7dbc21bc2b Enable CUPTI for kineto by default on windows. (#62175)
Summary:
It fixes nothing by itself.

For tracking this PR, please refers to https://github.com/pytorch/kineto/issues/356

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62175

Reviewed By: ezyang

Differential Revision: D30721329

Pulled By: gdankel

fbshipit-source-id: aa1af47df8cc1b6f5ba2194447f62b902a6a9c84
2021-09-23 15:13:47 -07:00
f850d7ef2e [CoreML][OSS] Add Simulator tests (#65076)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65076

ghstack-source-id: 138869950

Create a new conda environment and run:

```
conda create --name coreml python=3.8
conda activate coreml
pip3 install --pre torch torchvision torchaudio -f https://download.pytorch.org/whl/nightly/cpu/torch_nightly.html
pip install coremltools==5.0b5
cd pytorch
git fetch
git checkout gh/xta0/131/head
cd ios/TestApp/benchmark
mkdir ../models
python coreml_backend.py
```

Test the model_coreml.ptl in the HelloWorld example.

Test Plan:
1. CircleCI
2. Pytorch nightly builds

Reviewed By: hanton

Differential Revision: D30912268

fbshipit-source-id: 52b2ed1ad40e5949ee2755bca113119132dfc914
2021-09-23 14:57:01 -07:00
2a0208f4dc fixed comments referring fairscale master branch (#65531)
Summary:
Replace comments referring to the fairscale master branch with the main branch.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65531

Test Plan:
buck build

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang gcramer23

Reviewed By: H-Huang, anj-s

Differential Revision: D31132552

Pulled By: tmarkstrum

fbshipit-source-id: d3ee8920ab5cccad99f640934c21e8eee022e9b9
2021-09-23 14:37:58 -07:00
c015cbabf9 [codemod][fbcode/caffe2] Apply all buildifier fixes
Test Plan: Visual inspection. Sandcastle.

Reviewed By: zsol

Differential Revision: D31144864

fbshipit-source-id: f8e65fec69f88d03048df9edb98969d648eb6981
2021-09-23 14:03:19 -07:00
d07b2cb4ec [fx2trt] update the oss fx2trt exmaple (#65544)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65544

ATT

Test Plan: CI

Reviewed By: mikekgfb

Differential Revision: D31147750

fbshipit-source-id: eacc1c9157a32d6deebbfe9ff2aaae13c434e72b
2021-09-23 13:45:22 -07:00
71704349aa [DDP] Allow await of custom buffer reduction in backward (#64515)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64515

For performance reasons, we would like to ensure that we can await
user collectives as part of custom buffer reduction in parallel with other work.
As a result, add support for returning futures from custom buffer hooks and
awaiting those futures at the end of the backward pass.

Also added some docs to clarify how to use these APIs.
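
A hedged sketch of such a hook; the hook name and signature are assumptions, but getting a future from an async collective uses the public c10d API:

```
import torch
import torch.distributed as dist

def buffer_allreduce_hook(state, buffers: torch.Tensor):
    # hypothetical buffer hook: kick off an async allreduce and hand the
    # backward pass a future it can await alongside gradient work
    work = dist.all_reduce(buffers, async_op=True)
    fut = work.get_future()
    # average once the collective completes
    return fut.then(lambda f: f.value()[0] / dist.get_world_size())
```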
ghstack-source-id: 138793803

Test Plan: I

Reviewed By: zhaojuanmao

Differential Revision: D30757761

fbshipit-source-id: e1a2ead9ca850cb345fbee079cf0614e91bece44
2021-09-23 13:02:53 -07:00
36485d36b6 Docathon: Add docs for nn.functional.*d_max_pool (#63264)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63264

Adding docs to max_pool to resolve docathon issue #60904

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D31071491

Pulled By: Gamrix

fbshipit-source-id: f4f6ec319c62ff1dfaeed8bb6bb0464b9514a7e9
2021-09-23 11:59:50 -07:00
1f0f246fe2 Automated submodule update: FBGEMM (#65360)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).

New submodule commit: 0108d4f552

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65360

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: jspark1105

Differential Revision: D31061552

fbshipit-source-id: 8bce5157a281e38cad5d5d0e9dcd123beda39735
2021-09-23 11:47:15 -07:00
65fbd2c12b [ci] do not continue through error on trunk (#65503)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65503

There are two reasons for this change:
- I don't think trunk jobs should have different behavior than their PR equivalents.
- Continuing through error makes it challenging to figure out what is
actually failing, especially given the poor UX of GitHub Actions when it
comes to reading logs

Example: https://github.com/pytorch/pytorch/runs/3680114581. Here, there
is a failure but the rendered test results tell me everything is
successful. I have no idea how to quickly tell what failed; the log is so long
and terms like "error", "failure", etc. are common enough that searching
it is very difficult.

Differential Revision: D31130478

Test Plan: Imported from OSS

Reviewed By: ezyang

Pulled By: suo

fbshipit-source-id: 15a80475ca4c49644c0f7b779f5c6c2ffeb946a6
2021-09-23 11:36:03 -07:00
7e772e7685 Update link to tutorial on defining NN modules (#65534)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/65527. Please, see my comment in the issue: https://github.com/pytorch/pytorch/issues/65527#issuecomment-925863193. The file was renamed in ce58d5904c (diff-e5ef486bd89eb38de15752211d9437953681b8caa8f44d7c86bb820d13151df2), but the link in this repository was not updated.

It doesn't change the fact that the old link is still working, but I guess this has to be fixed in [pytorch/tutorials](https://github.com/pytorch/tutorials) instead of here.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65534

Reviewed By: soulitzer

Differential Revision: D31144269

Pulled By: H-Huang

fbshipit-source-id: f70744a21113b7dc84510e2992d87f0fed793985
2021-09-23 11:26:50 -07:00
cac7c1a192 [ci] remove auto-label-rocm workflow (#65558)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65558

This will temporarily be replaced by an FB-internal workflow that does
the exact same thing, pending a migration of this workflow to probot.

cc jeffdaily sunway513 jithunnair-amd ROCmSupport

Test Plan: Imported from OSS

Reviewed By: zhouzhuojie, driazati

Differential Revision: D31149105

Pulled By: suo

fbshipit-source-id: 2aa122820ae3b5286774501f5ecfe052bc949dea
2021-09-23 11:15:35 -07:00
c731be8066 [BE] Use DispatchKeySet in check_base_legacy_new (#65535)
Summary:
Refactor:
```
TORCH_CHECK ( key == a ||
              key == b ||
              key == c,
              "expected key to be in ", a, " or ", b , " or ", c,
              " but got ", key);
```
into
```
TORCH_CHECK( key_set.has(key),
            "expected key to be in ", key_set,
            " but got ", key );
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65535

Reviewed By: wconstab

Differential Revision: D31144239

Pulled By: malfet

fbshipit-source-id: 68a053041a38f043e688e491889dd7ee258f3db3
2021-09-23 11:01:23 -07:00
da166d4f12 Add a timeout argument to RPC shutdown() (#65425)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65425

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang gcramer23

Test Plan:
Imported from OSS

   python3 test/distributed/rpc/test_tensorpipe_agent.py -v -k test_wait_all_workers_timeout

Reviewed By: mrshenli

Differential Revision: D31092483

Pulled By: dracifer

fbshipit-source-id: 5b5e9f20b1d6602cf8cde3772678f721dddf0d78
2021-09-23 10:42:58 -07:00
97b535dabd [PyTorch] add fastToString for infer_schema (#64823)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64823

We seem to spend noticable time in vfprintf for this, and the number of arguments is almost always small enough to do this in just a few instructions.
ghstack-source-id: 138623354

Test Plan: Profile schema parsing, saw less time in vfprintf

Reviewed By: ezyang, dhruvbird

Differential Revision: D30860716

fbshipit-source-id: 09ef085cd6f93dc1eaa78790dde918ac60e67450
2021-09-23 10:15:40 -07:00
eb949464d6 [PyTorch] Fix missing moves in SchemaParser::parseArgument (#64839)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64839

Resulted in some extra shared_ptr refcount bumps.
ghstack-source-id: 138623356

Test Plan: CI

Reviewed By: smessmer

Differential Revision: D30875749

fbshipit-source-id: 531f04c453f7410ed3d4ff054217f21a250be8e9
2021-09-23 10:14:22 -07:00
14307f7a56 [Static Runtime] Added logging to dump the model graphs (#65509)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65509

With this change, we can get dumps of the model graphs by setting the env variable `PYTORCH_JIT_LOG_LEVEL=">>impl"` while running the model.

Test Plan: buck test mode/opt-clang //caffe2/benchmarks/static_runtime:static_runtime_cpptest

Reviewed By: mikeiovine

Differential Revision: D31125797

fbshipit-source-id: d8979a4e138047518140e0eaecb46e012891b17c
2021-09-23 10:06:13 -07:00
767a104698 [quant] change observer FQNs generated in prepare step (#65420)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65420

Context: In some FB use cases we need to map observer stats from a train-model checkpoint to the inference model. We observed that some buffer names differ because the intermediate activation tensors
are generated differently across the train and inference models. More details in https://fb.quip.com/PtGcAR0S5CQP

Currently, for each observer (activation_post_process), the FQN of the module inserted is determined based on the FQN of the input tensor it is observing.

In this change, we make the observer FQN include the FQN of the op/module it is observing, together with an "input"/"output" tag, rather than tensor/intermediate op names.

Before
```
def forward(self, x):
    x_activation_post_process_0 = self.x_activation_post_process_0(x);  x = None
    mods1_w = self.mods1.w
    mods1_w_activation_post_process_0 = self.mods1_w_activation_post_process_0(mods1_w);  mods1_w = None
    mods1_b = self.mods1.b
    linear = torch.nn.functional.linear(x_activation_post_process_0, mods1_w_activation_post_process_0, bias = mods1_b);  x_activation_post_process_0 = mods1_w_activation_post_process_0 = mods1_b = None
    linear_activation_post_process_0 = self.linear_activation_post_process_0(linear);  linear = None
    return linear_activation_post_process_0
```

After
```
def forward(self, x):
    mods1_input_activation_post_process_0 = self.mods1_input_activation_post_process_0(x);  x = None
    mods1_w = self.mods1.w
    mods1_w_activation_post_process_0 = self.mods1_w_activation_post_process_0(mods1_w);  mods1_w = None
    mods1_b = self.mods1.b
    linear = torch.nn.functional.linear(mods1_input_activation_post_process_0, mods1_w_activation_post_process_0, bias = mods1_b);  x_activation_post_process_0 = mods1_w_activation_post_process_0 = mods1_b = None
    mods1_output_activation_post_process_0 = self.mods1_output_activation_post_process_0(linear);  linear = None
    return mods1_output_activation_post_process_0
```

Test Plan:
python test/test_quantization.py test_observer_fqn

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D31088652

fbshipit-source-id: 2f1526f578a13000b34cfd30d11f16f402fd3447
2021-09-23 09:08:10 -07:00
a012216b96 [nn] Fold : no batch dim (#64909)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/64907
Reference: https://github.com/pytorch/pytorch/issues/60585

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64909

Reviewed By: cpuhrsch, heitorschueroff

Differential Revision: D30991087

Pulled By: jbschlosser

fbshipit-source-id: 91a37e0b1d51472935ff2308719dfaca931513f3
2021-09-23 08:37:32 -07:00
2a4d5e4c6d [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D31138547

fbshipit-source-id: ba134ae7f057c918eaefdc6310f7663e187e9749
2021-09-23 07:54:32 -07:00
9668a8a82d [DataPipe] Update Docstrings for Tar and ZipArchiveReader (#65500)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65500

cc VitalyFedyunin ejguan

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D31127241

Pulled By: NivekT

fbshipit-source-id: aed41aa192fe55e10ba67beda460fac70f2703c7
2021-09-23 07:20:08 -07:00
7e7be526c9 Add TORCH_SHOW_CPP_STACKTRACES to Contributing.md (#64052)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64052

Reviewed By: ezyang

Differential Revision: D31107779

Pulled By: Chillee

fbshipit-source-id: 2ad8ad40cd48e54fe711863c3c74df884a2e2de7
2021-09-22 22:53:19 -07:00
14949d2922 Add nn.function.hardsigmoid in acc_tracer (#65422)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65422

hardsigmoid is used by mobile net v3 oss model.
This diff added hardsigmoid support in acc_tracer

Test Plan:
buck test glow/fb/fx/acc_tracer:test_acc_shape_inference
buck test glow/fb/fx/oss_acc_tracer:test_acc_tracer -- test_hardsigmoid

Reviewed By: jfix71

Differential Revision: D30950304

fbshipit-source-id: 8fe4b4c6df29c06a73850d32f59321a9311f94f5
2021-09-22 20:57:42 -07:00
5525e9a591 Lock unpickling of source ranges
Summary:
The source is shared across all threads running the torchscript
interpreter, so if several threads encounter errors at once, they will all race
to unpickle the source, leading to memory corruption.
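
A Python sketch of the pattern, using an illustrative stand-in for the interpreter's shared source table:

```
import pickle
import threading

class SharedSource:
    def __init__(self, pickled_bytes: bytes):
        self._pickled = pickled_bytes
        self._source = None
        self._lock = threading.Lock()

    def source(self):
        # Several threads may hit an error path at once; the lock ensures
        # only one of them materializes the shared source.
        with self._lock:
            if self._source is None:
                self._source = pickle.loads(self._pickled)
            return self._source
```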

Test Plan:
Model 217993215_0 is the problematic model; I wasn't able to repro
the crash with requests stored in Hive, but I could easily repro it by adding my
devserver (SMC tier predictor.bertrand) as a shadow tier to the model's tier
(inference_platform.predictor_model.prod.bi.217993215_latest) (i.e., set the
shadow_tier property to predictor.bertrand=1 to proxy 1% of traffic).

With this diff, the ASAN/TSAN errors go away.

Reviewed By: suo

Differential Revision: D31044009

fbshipit-source-id: 56f9ef3880e7cf09f334db71b4256e362b4de965
2021-09-22 20:41:02 -07:00
228141f939 [pytorch] more informative error msg from fbgemm embedding spmdm call (#65186)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65186

The FBGEMM JIT'ed EmbeddingSpMDM kernel just returns false when there's an error, delegating detailed error handling to the caller (since each framework like PyTorch and Caffe2 wants to do error handling differently). Much of the PyTorch code was simply reporting that there was "an" error without pinpointing exactly why it happened. This diff introduces more informative error messages, following what Caffe2 was doing.

Test Plan: CI

Reviewed By: dskhudia

Differential Revision: D31008300

fbshipit-source-id: b8d069af0692dc86dc642b18a9c68f22deaffea3
2021-09-22 20:13:34 -07:00
0ca1102609 [fx2trt] fuse permute + matmul using a pass instead of hardcoding it as a leaf module (#65482)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65482

Currently we hardcode permute + bmm in a module and tag it as a leaf module during tracing. This diff introduces a pass to fuse permute + matmul into a single node.

TODO:
Fusion transformations like this one would share a lot of similar code, such as finding the fusion pattern and replacing the original nodes with the fused node. The current fx subgraph rewriter allows us to specify patterns that we want to replace, but we would need to extend it to allow specifying constraints on nodes' kwargs.

Reviewed By: yinghai

Differential Revision: D31022055

fbshipit-source-id: 13d1f18d79b09d371897ecde840f582ccaf5713a
2021-09-22 18:43:09 -07:00
fccaa4a3c8 [fx2trt] fix transpose unittest (#65481)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65481

Previously we had `acc_ops.transpose`, but after a recent diff `torch.transpose` is mapped to `acc_ops.permute`. Here we clean up the fx2trt unit test for transpose and add support for negative indices in permute.

Reviewed By: wushirong

Differential Revision: D31115280

fbshipit-source-id: 58e689e6dd14181aea5186f3bb5b8745a07d0e51
2021-09-22 18:08:55 -07:00
2f67579864 [ddp] use named_params and named_buffers explicitly (#65181)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65181

This PR changes the sync from `state_dict()` to explicit `named_parameters` and `named_buffers`. The underlying motivation is that `state_dict()` doesn't necessarily equal "params + buffers" in all cases: state_dict is mainly used for checkpointing, while params/buffers are used for training, and we might have cases where params/buffers take a different form from state_dict (i.e., we might want to save the state_dict in small pieces of tensors while in training we want to concat the tensors together for performance reasons).
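
A minimal sketch of the distinction (illustrative only, not the actual reducer code):

```python
import torch

m = torch.nn.BatchNorm1d(4)

# Explicitly enumerate what training actually uses...
module_states = [p for _, p in m.named_parameters()]
module_states += [b for _, b in m.named_buffers()]

# ...instead of relying on state_dict(), whose contents serve checkpointing
# and need not match the training-time params/buffers one-to-one.
checkpoint_states = list(m.state_dict().values())
print(len(module_states), len(checkpoint_states))
```
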
ghstack-source-id: 138701159

Test Plan: wait for ci

Reviewed By: divchenko, rohan-varma

Differential Revision: D31007085

fbshipit-source-id: 4e1c4fbc07110163fb9b09b043ef7b4b75150f18
2021-09-22 17:32:54 -07:00
0eaf081018 [JIT] canonicalize aten::rsub (#65014)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65014

ghstack-source-id: 138656948

Test Plan:
```
(pytorch) [maxren@devvm3115.atn0 ~/pytorch] python3 test/test_jit.py TestPeephole
CUDA not available, skipping tests
monkeytype is not installed. Skipping tests for Profile-Directed Typing
........s......................
----------------------------------------------------------------------
Ran 31 tests in 0.393s

OK (skipped=1)
(pytorch) [maxren@devvm3115.atn0 ~/pytorch] python3 test/test_jit.py TestPeephole.test_normalized_rsub
CUDA not available, skipping tests
monkeytype is not installed. Skipping tests for Profile-Directed Typing
.
----------------------------------------------------------------------
Ran 1 test in 0.015s

OK
```

Reviewed By: eellison

Differential Revision: D30941389

fbshipit-source-id: 03f0416d99090845c9bfb1e5fcf771d5f1d7a050
2021-09-22 17:20:46 -07:00
32f0387ee8 Bug in CosineAnnealingWarmRestarts in optim/lr_scheduler.py (#64758)
Summary:
## 🐛 Bug
'CosineAnnealingWarmRestarts' object has no attribute 'T_cur'.
In the constructor of CosineAnnealingWarmRestarts, we call the constructor of the parent class (_LRScheduler), which in turn calls the step method of CosineAnnealingWarmRestarts.
The called method tries to update the object's attribute 'T_cur', which is not defined yet, so it raises the error.
This only happens when we pass a value of 0 or greater for the last_epoch argument while initializing the 'CosineAnnealingWarmRestarts' object.

![Bug_in_CosineAnnealingWarmRestarts](https://user-images.githubusercontent.com/77477328/132552212-70abc8b5-0357-4c35-90a9-832648bac607.png)
## To Reproduce

Steps to reproduce the behavior:

1. Pass the value zero for the last_epoch argument, OR
2. Pass a positive integer for the last_epoch argument.

## Expected behavior

I only expected the 'CosineAnnealingWarmRestarts' object to be initialized.

## Environment

PyTorch version: 1.9.0+cpu
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.2 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: Could not collect
CMake version: version 3.21.2
Libc version: glibc-2.31
Python version: 3.8.10  [GCC 9.4.0] (64-bit runtime)
Python platform: Linux-5.8.0-59-generic-x86_64-with-glibc2.29
Is CUDA available: False
CUDA runtime version: No CUDA

## Additional context
We can solve this bug by moving the line 'self.T_cur = self.last_epoch' above the 'super(CosineAnnealingWarmRestarts, self).__init__()' line, since that initializes 'self.T_cur' on the object.
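
A minimal repro sketch (hypothetical setup; 'initial_lr' is pre-filled because the parent constructor requires it whenever last_epoch != -1):

```python
import torch

model = torch.nn.Linear(1, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
for group in opt.param_groups:
    group.setdefault('initial_lr', group['lr'])

# last_epoch=0 makes _LRScheduler.__init__ call step(), which touches
# self.T_cur before the subclass constructor has defined it.
try:
    torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(opt, T_0=10, last_epoch=0)
except AttributeError as err:
    print(err)  # ... object has no attribute 'T_cur'
```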

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64758

Reviewed By: ezyang

Differential Revision: D31113694

Pulled By: jbschlosser

fbshipit-source-id: 98c0e292291775895dc3566fda011f2d6696f721
2021-09-22 16:55:14 -07:00
b80bdcc73b Add register_module alias to nn.Module (#65174)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/60397. I'm not sure how aliases are supposed to be implemented, but this is the most basic/direct way, IMO. As a side-effect, this implementation results in a "duplicate" doc entry, inheriting the one from `add_module`:

![monkey-patch](https://user-images.githubusercontent.com/7027770/133693137-8408d8e7-1f4f-436b-b176-57dda9bc3a32.png)

An alternative implementation could be:

```python
def register_module(self, name: str, module: Optional['Module']) -> None:
    r"""Alias for :func:`add_module`."""
    self.add_module(name, module)
```

which results in this documentation:

![image](https://user-images.githubusercontent.com/7027770/133693249-d969a71a-be44-489d-9633-4f38b44ab887.png)

Questions:
1. Should I replicate the tests? There are two for `add_module`: [test_add_module_raises_error_if_attr_exists](873255c6d9/test/test_nn.py (L1420-L1434)) and [test_add_module](873255c6d9/test/test_nn.py (L1837-L1855)).
2. This PR only adds `register_module` to `nn.Module`. There is an `add_module` in [`_RemoteModule`](https://github.com/pytorch/pytorch/blob/master/torch/distributed/nn/api/remote_module.py#L311-L312), which raises `NotSupported`, and there is another one in [`ConcreteModuleTypeBuilder`](873255c6d9/torch/_C/__init__.pyi.in (L468)), which means something else, I think. Should I do anything about them?

cc ngimel SsnL

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65174

Reviewed By: soulitzer

Differential Revision: D31089717

Pulled By: jbschlosser

fbshipit-source-id: abd8d14a434fd8c7efa0bd8c242df56da33491e9
2021-09-22 16:37:28 -07:00
31584d065e [Static Runtime] Added NNC implementation for signed log1p kernel. (#65387)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65387

Added a customized NNC implementation for signed log1p kernel and enabled the fusion pass that adds the fused signed log1p op.
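
For reference, the elementwise semantics being fused, assuming "signed log1p" means sign(x) * log1p(|x|):

```python
import torch

def signed_log1p(x):
    # sign-preserving log1p of the magnitude
    return torch.sign(x) * torch.log1p(torch.abs(x))

print(signed_log1p(torch.tensor([-3.0, 0.0, 3.0])))  # tensor([-1.3863, 0.0000, 1.3863])
```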

Also, added a SR microbenchmark for this kernel which shows the performance improvement.

Without fusion:
```
--------------------------------------------------------------------------------
Benchmark                                         Time           CPU Iterations
--------------------------------------------------------------------------------
BM_signed_log1p/16                             1953 ns       1953 ns     358746
BM_signed_log1p/64                             2049 ns       2049 ns     342145
BM_signed_log1p/512                            3291 ns       3291 ns     214342
BM_signed_log1p/4096                          15559 ns      15559 ns      44420
BM_signed_log1p/32768                        101936 ns     101935 ns       6843
BM_signed_log1p/65536                        194792 ns     194789 ns       3615
```

With NNC fusion:
```
--------------------------------------------------------------------------------
Benchmark                                         Time           CPU Iterations
--------------------------------------------------------------------------------
BM_signed_log1p/16                              369 ns        369 ns    1896179
BM_signed_log1p/64                              497 ns        497 ns    1406995
BM_signed_log1p/512                            1618 ns       1618 ns     430209
BM_signed_log1p/4096                          11327 ns      11326 ns      61463
BM_signed_log1p/32768                         84099 ns      84086 ns       8325
BM_signed_log1p/65536                        166531 ns     166510 ns       4186
```

This clearly shows >15% improvement in performance of this kernel with NNC fusion.

On inline_cvr local model, there is a small improvement in terms of profiled time spent on ops:
  without fusion: `0.9%` (computed by adding the % spent on all the 4 ops involved)
  with NNC fusion: `0.55%`

Test Plan:
`buck test mode/opt-clang //caffe2/benchmarks/static_runtime:static_runtime_cpptest -- SignedLog1p`

Also, did the accuracy test with inline_cvr as described here, https://fb.quip.com/qmdDAJzEmPtf, on the full size model (285298536_1)

```
get 57220 prediction values
get 57220 prediction values
max_error:  0  total:  0
```

Reviewed By: hlu1

Differential Revision: D30609492

fbshipit-source-id: d2e68df580569a30ee61abb0ef18d2c4c56827bd
2021-09-22 15:53:33 -07:00
1c20b98b4b [iOS][CoreML] Check backend availability at runtime. (#65315)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65315

ghstack-source-id: 138703808

Test Plan:
- OSS builds and BUCK builds
- CircleCI

Reviewed By: hanton

Differential Revision: D31048011

fbshipit-source-id: 824a8e32d65de2caf25e41efef2b022ddbb63156
2021-09-22 15:38:53 -07:00
2898ef7549 Minor ScanKernels.cu cleanup (#65350)
Summary:
- Replace THCNumerics with `at::_isnan`
- Replace `contiguous` with `expect_contiguous`
- Don't use `contiguous` on output tensors. Instead skip the copy and
  just create a new empty tensor.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65350

Reviewed By: ezyang

Differential Revision: D31103501

Pulled By: ngimel

fbshipit-source-id: 9030869e28d6c570fad074fd0502076de8e2ab09
2021-09-22 15:34:01 -07:00
5739f77775 [DDP] Refactor and remove sync_params (#64514)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64514

sync_params is a misnomer since we don't actually synchronize
parameters. While removing this I realized
`self._check_and_sync_module_buffers` does almost everything we need it to, so
I just refactored that and made the DDP forward call into it.
ghstack-source-id: 138684982

Test Plan: CI

Reviewed By: zhaojuanmao

Differential Revision: D30751231

fbshipit-source-id: add7c684f5c6c71dad9e9597c7759849fa74f47a
2021-09-22 14:12:51 -07:00
ce5981e431 [DDP] Custom buffer reduction (#64513)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64513

Proposal: https://github.com/pytorch/pytorch/issues/63041
Support custom buffer reduction in DDP via hook
ghstack-source-id: 138655663

Test Plan: CI

Reviewed By: SciPioneer

Differential Revision: D30751152

fbshipit-source-id: 257a9d46bb178d8812d4ea5a4d9c6140b8a1791f
2021-09-22 14:11:35 -07:00
923f06621c Fix Windows ninja builds when MAX_JOBS is specified (#65444)
Summary:
Reported by cloudhan in https://github.com/pytorch/pytorch/pull/64733#issuecomment-924545463

Fixes regression introduced by 047e68235f

cc malfet seemethere

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65444

Reviewed By: dagitses, seemethere

Differential Revision: D31103260

Pulled By: malfet

fbshipit-source-id: 9d5454a64cb8a0b96264119cf16582cc5afed284
2021-09-22 14:04:31 -07:00
cbc3db8274 Create test for builtin tensorrt module in torch deploy (#63819)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63819

ghstack-source-id: 138521664

Test Plan:
buck test mode/dev-nosan caffe2/torch/csrc/deploy:test_deploy_gpu

buck test mode/opt-split-dwarf caffe2/torch/csrc/deploy:test_deploy_gpu

Reviewed By: wconstab

Differential Revision: D30499301

fbshipit-source-id: 0bc165b4ed5be28ebb0becc65f292cf26368692f
2021-09-22 13:42:35 -07:00
72fc53ff27 .github: Add timeout for test step (#65486)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65486

Adding this after observing jobs running for 6+ hours on `pytorch/pytorch-canary`; still trying to debug why they happen there, but this should resolve jobs running forever

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

cc ezyang seemethere malfet pytorch/pytorch-dev-infra

Test Plan: Imported from OSS

Reviewed By: ezyang, malfet, janeyx99

Differential Revision: D31117497

Pulled By: seemethere

fbshipit-source-id: 126a10e844bdef77c2852cc5c392e5f37f130f7e
2021-09-22 13:23:41 -07:00
f24bd43375 Changing type and name of local_used_maps to reflect that it is only one map (#65380)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65380

Fixing bugs that arise when running setup.py develop

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang gcramer23

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D31104844

Pulled By: jaceyca

fbshipit-source-id: acfd4cf316c71177df758ca55b470f51a17f776b
2021-09-22 11:35:33 -07:00
0fe86ac6c6 Fix torch.any documentation (#65310)
Summary:
Currently, the description of torch.any would be parsed like

```
param input
the input tensor.
```

However, it should be

```
Tests if any element in input evaluates to True.
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65310

Reviewed By: ezyang

Differential Revision: D31102918

Pulled By: soulitzer

fbshipit-source-id: 678ade20ba16ad2643639fbd2420c8b36fcd8bd7
2021-09-22 11:24:20 -07:00
a0dea074b2 Remove .data from benchmarks and tensorboard (#65389)
Summary:
Related to https://github.com/pytorch/pytorch/issues/30987 and https://github.com/pytorch/pytorch/issues/33628. Fix the following tasks:

- Remove the use of `.data` in all our internal code:
  - [x] `benchmarks/`
  - [x] `torch/utils/tensorboard/`

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang gcramer23 albanD gchanan

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65389

Reviewed By: soulitzer

Differential Revision: D31093464

Pulled By: albanD

fbshipit-source-id: 3a9c8834fd544a59a1cc2b930ae538fd1d46b232
2021-09-22 11:16:59 -07:00
70a545b21e Add Tensor._make_wrapper_subclass (#65340)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65340

I thought about a few possible ways of doing this.  The main hazard is
that if I create a CPU tensor that doesn't have any real storage, the
moment I actually try to access the data on the tensor I will segfault.
So I don't want to use _make_subclass on a "cpu meta tensor" because
the CPU meta tensor (with no subclass) is radioactive: printing it
will immediately cause a segfault.  So instead, I have to create
the CPU meta tensor AND subclass all in one go, and that means I need
another function for it.  One downside to doing it this way is
I need another overload for explicit strides, and in general it is
difficult to get the view relationships to all work out properly;
tracked at https://github.com/pytorch/pytorch/issues/65339
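
A rough usage sketch (keyword set abbreviated; touching the data of such a tensor is exactly what must be avoided):

```python
import torch

class WrapperTensor(torch.Tensor):
    @staticmethod
    def __new__(cls, elem):
        # Create the storage-less tensor and the subclass in one go, so no
        # bare "radioactive" CPU meta tensor ever exists on its own.
        return torch.Tensor._make_wrapper_subclass(
            cls, elem.size(),
            dtype=elem.dtype, device=elem.device,
            requires_grad=elem.requires_grad)

t = WrapperTensor(torch.randn(2, 3))
print(t.shape, type(t).__name__)  # metadata access is safe; data access is not
```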

Fixes https://github.com/pytorch/pytorch/issues/62972
Fixes https://github.com/pytorch/pytorch/issues/62730

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D31057231

Pulled By: ezyang

fbshipit-source-id: 73522769e093ae8a1bf0c7f7e594659bfb827b28
2021-09-22 11:10:47 -07:00
11ca641491 [docs] Add images to some activation functions (#65415)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/65368. See discussion in the issue.

cc mruberry SsnL jbschlosser soulitzer

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65415

Reviewed By: soulitzer

Differential Revision: D31093303

Pulled By: albanD

fbshipit-source-id: 621c74c7a2aceee95e3d3b708c7f1a1d59e59b93
2021-09-22 11:05:29 -07:00
158393e1a1 Fix autograd engine checks and update InputMetadata (#65235)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65235

1. Updated the legacy type checks in `torch/csrc/autograd/engine.cpp` to individually validate the dtype, device, and layout equality for grad and tensor.
2. Removed device field from `InputMetadata` since it's already stored via storing options. Also, added `dtype()` and `layout()` methods to `InputMetadata`. To make this change, some calls had to be updated due to the change in constructor.
3. To fix https://github.com/pytorch/pytorch/issues/65016:
     a. Added an `is_tensor_subclass` field in `InputMetadata` to skip device checks for grad and tensor when the tensor has the
         python key set on it (tensor subclass).

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D31117318

Pulled By: anjali411

fbshipit-source-id: 825401df98695c48bf9b320be54585f6aff500bd
2021-09-22 11:01:19 -07:00
db4b68b3ac Back out "Eagerly populate python_error::what() when TORCH_SHOW_CPP_STACKTRACES=1"
Summary: Original commit changeset: 9cfda47cafb3

Test Plan: unland

Reviewed By: ezyang

Differential Revision: D31116643

fbshipit-source-id: 631eea446ed48c63ca39281d24163a2eadbe8d12
2021-09-22 10:37:27 -07:00
b3ec88f41f ugh (#65477)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65477

Test Plan: Imported from OSS

Reviewed By: zhouzhuojie

Differential Revision: D31115936

Pulled By: suo

fbshipit-source-id: fb16911a683713fdc2393bfe7150fc29c7d6814f
2021-09-22 10:15:41 -07:00
152f0236c3 Revert D31082693: Fix autograd engine checks and update InputMetadata
Test Plan: revert-hammer

Differential Revision:
D31082693 (9324d682fd)

Original commit changeset: cb551cd438c6

fbshipit-source-id: fc60f86b80fc70058984df6bccbf240d27f5843e
2021-09-22 10:00:08 -07:00
7c9a278895 fix trailing newlines (#65474)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65474

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D31114952

Pulled By: suo

fbshipit-source-id: 3b8cde2098635450c3e22571a401f78e4e54e9e0
2021-09-22 09:48:34 -07:00
508845f2b5 [quant] AO migration of the torch/quantization/quantize_fx.py and torch/quantization/fx/* (#65033)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65033

1. Move the file:
```
hg mv caffe2/torch/quantization/fx caffe2/torch/ao/quantization/fx
hg mv caffe2/torch/quantization/quantize_fx.py caffe2/torch/ao/quantization/quantize_fx.py
```
2. Create new files
```
touch caffe2/torch/quantization/quantize_fx.py
touch caffe2/torch/quantization/fx/__init__.py
```
3. import things in the new files
4. add tests to test/quantization/ao_migration/test_quantization_fx.py
   (this is because we have some fx imports in quantize_fx and fx/*.py)

Test Plan: buck test mode/dev //caffe2/test:quantization

Reviewed By: vkuzo, z-a-f

Differential Revision: D30949749

fbshipit-source-id: 9e5d4d039c8a0a0820bc9040e224f0d2c26886d3
2021-09-22 09:29:15 -07:00
762c2276e1 feed model merge net lower benchmark (#65191)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65191

Test Plan:
run command:
buck run mode/opt -c python.package_style=inplace hpc/new/models/feed/benchmark:feed_lower_benchmark

example output:
Eager, BS: 2048, TFLOP/s: 253.25, Time per iter: 4.49ms, QPS: 456289.25
TensorRT, BS: 2048, TFLOP/s: 162.30, Time per iter: 7.00ms, QPS: 292426.58

Reviewed By: yinghai

Differential Revision: D31010288

fbshipit-source-id: f30b520eca9508439588bcf48476b1b1edfb09af
2021-09-22 09:21:18 -07:00
bcc6e3ab5e add python API to print all operators that have kernels registered to a particular DispatchKey (#63575)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63575

Test Plan: Imported from OSS

Reviewed By: ezyang, Chillee

Differential Revision: D30426919

Pulled By: bdhirsh

fbshipit-source-id: b0e487e48dfe02f7b9d678403f0a2b5bfe146f4e
2021-09-22 09:15:55 -07:00
9324d682fd Fix autograd engine checks and update InputMetadata (#65235)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65235

1. Updated the legacy type checks in `torch/csrc/autograd/engine.cpp` to individually validate the dtype, device, and layout equality for grad and tensor.
2. Removed device field from `InputMetadata` since it's already stored via storing options. Also, added `dtype()` and `layout()` methods to `InputMetadata`. To make this change, some calls had to be updated due to the change in constructor.
3. To fix https://github.com/pytorch/pytorch/issues/65016:
     a. Added an `is_tensor_subclass` field in `InputMetadata` to skip device checks for grad and tensor when the tensor has the
         python key set on it (tensor subclass).

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D31082693

Pulled By: anjali411

fbshipit-source-id: cb551cd438c6ca40b0f18a4d0009e0861cf0fd4e
2021-09-22 07:49:52 -07:00
f90d9b48db test_neg_view: preserve sign of sample input (#63010)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63010

This changes `test_neg_view` to call the operator with the same numeric values as the original sample input.

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D31082824

Pulled By: anjali411

fbshipit-source-id: 7d50f99dc0d1343247e366cbe9b0ca081bd0a9b1
2021-09-22 07:47:42 -07:00
9d17f21e46 Added PandasDataframeWrapper (#65411)
Summary:
- Added `PandasDataframeWrapper` around `pandas` functions to easily drop-and-replace `torcharrow` for Facebook internal use cases
- Updated relevant datapipe/dataframe use sites to use the new `PandasDataframeWrapper` instead of calling `pandas` functions directly

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65411

Reviewed By: VitalyFedyunin, hudeven

Differential Revision: D31087746

Pulled By: Nayef211

fbshipit-source-id: 299901f93a967a5fb8ed99d6db9b8b9203634b8f
2021-09-22 07:42:45 -07:00
3c6d9fd124 Eagerly populate python_error::what() when TORCH_SHOW_CPP_STACKTRACES=1 (#65376)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65376

Let's suppose there's a bug in PyTorch and python_error gets thrown
and never gets caught.  Typically, you'll get a very useless error
message like this:

```
terminate called after throwing an instance of 'python_error'
  what():
  Aborted (core dumped)
```

Now, you'll get:

```
what():  unknown Python error (for more information, try rerunning with TORCH_SHOW_CPP_STACKTRACES=1)
```

and with TORCH_SHOW_CPP_STACKTRACES=1 you'll get:

```
what():  error message from Python object
```

If we're OK with making Python exceptions go even slower, we could
eagerly populate unconditionally.  I'm also not so happy we don't get
a Python backtrace or the Python error name, that's worth improving
(this is a minimal diff to get the discussion going.)
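
A tiny opt-in sketch (the variable just needs to be set in the environment before the error is formatted):

```python
import os
os.environ["TORCH_SHOW_CPP_STACKTRACES"] = "1"  # opt in to eager what() population

import torch  # an uncaught python_error now reports the real Python message
```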

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D31067632

Pulled By: ezyang

fbshipit-source-id: 9cfda47cafb349ee3d6853cdfb0f319073b87bff
2021-09-22 07:12:28 -07:00
2c7df1360a Bump torch version to 1.11 (#65435)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65435

Reviewed By: zhouzhuojie

Differential Revision: D31099045

Pulled By: malfet

fbshipit-source-id: 6ae6ca8a4b652fc51ee3138c800d067e144acbaa
2021-09-22 07:07:16 -07:00
96383ca704 Unify the output pathname of archive reader and extractor (#65424)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65424

This PR is a re-implementation of https://github.com/facebookexternal/torchdata/pull/93
The same PR has landed in torchdata: https://github.com/facebookexternal/torchdata/pull/157

Test Plan: Imported from OSS

Reviewed By: soulitzer

Differential Revision: D31090447

Pulled By: ejguan

fbshipit-source-id: 45af1ad9b24310bebfd6e010f41cff398946ba65
2021-09-22 06:34:29 -07:00
e331beef20 Delete code coverage jobs from CI (#65362)
Summary:
As it does not seem useful to lots of people; see https://fb.workplace.com/groups/1144215345733672/posts/2062909540530910

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65362

Reviewed By: janeyx99, bdhirsh

Differential Revision: D31061945

Pulled By: malfet

fbshipit-source-id: 912ed92cc901a370a40448f1127c3ba43640ac43
2021-09-22 05:38:35 -07:00
127c9402d0 Revert "Revert D30752939: [pytorch][PR] nvfuser update" (#65137)
Summary:
This reverts commit 03389dc851db6f3ca52f9a4455ce2090c64a223d.

Attempt again for PR: https://github.com/pytorch/pytorch/issues/63745
Fixes the windows build failure.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65137

Reviewed By: seemethere, dzhulgakov, heitorschueroff

Differential Revision: D30994556

Pulled By: malfet

fbshipit-source-id: f1925b6c5cc1a1a441a96499667c91e8dfc1b53d
2021-09-22 04:54:51 -07:00
feefc94573 [fx2trt] Use itensor_to_tensor_meta to track the TensorMeta info for ITensor node (#65427)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65427

Previously we added an input_tensor_meta for the dequantize function. This is a bit hacky since it creates a dependency on
the arguments of dequantize: if there are passes that change the input, then we would need to update the tensor meta as well.

Test Plan:
python torch/fx/experimental/fx2trt/example/quantized_resnet_test.py

Imported from OSS

Reviewed By: soulitzer

Differential Revision: D31094274

fbshipit-source-id: 5e40648d3081e2363f3a70bcc9745df4a8190ad3
2021-09-22 00:02:31 -07:00
64d3c7388f [RELAND] Enable ncclAvg for reductions (#62835)
Summary:
Resubmit of https://github.com/pytorch/pytorch/pull/62303.

Reverts the revert, and restores some diffs that were mysteriously missing from the reverted revert. I think some of the diffs I pushed to the original PR raced with its import or landing, such that the original PR's merge didn't pick up all the diffs I wanted. I don't know enough about the landing process to do more than speculate wildly, but hopefully this resubmit sorts things out.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62835

Reviewed By: zhouzhuojie, seemethere, janeyx99, heitorschueroff

Differential Revision: D30999982

Pulled By: malfet

fbshipit-source-id: 1f70ab4055208f1c6a80c9fc9fbe292ce68ecaa9
2021-09-21 18:09:45 -07:00
3f5f721ab3 Pass through allow-list from prepare_qat into propagate_qconfig_ to allow custom mapping and custom QAT module (#65119)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65119

PyTorch quantization: allow prepare_qat to include custom modules by passing allow_list into prepare_qat.

When we are implementing a custom module and a custom mapping for Quantization Aware Training (QAT), we need to add the custom module to the mappings and to the allow_list during prepare_qat. The allow_list needs to be surfaced to propagate_qconfig_.

Test Plan: relying on general unit test

Reviewed By: supriyar

Differential Revision: D30982060

fbshipit-source-id: 1114115b6a3b853238d33d72b5cbaafc60f463e0
2021-09-21 17:15:25 -07:00
158b8bdc8a Cleaning up DDP SPMD in reducer.cpp (#64113)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64113

Since there is only one model replica per process, `replicas`
can be simplified from `std::vector<std::vector<at::Tensor>>` to
`std::vector<at::Tensor>` in the Reducer class.

Test Plan:
All tests are passing
`pytest test/distributed/test_c10d_gloo.py -vs`

Imported from OSS

Reviewed By: mrshenli

Differential Revision: D30615965

fbshipit-source-id: d2ec809d99b788c200b01411333e7dbad1269b51
2021-09-21 16:13:18 -07:00
27faa7a560 [ONNX] Support torch.isfinite export (#64759)
Summary:
Pull Request resolved:  https://github.com/pytorch/pytorch/issues/64754

1. onnx::IsInf is introduced in opset 10 and onnx::IsNaN in opset 9, so isfinite = not(or(isinf, isnan)) requires opset 10.
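
The same decomposition in eager mode, for intuition:

```python
import torch

x = torch.tensor([1.0, float('inf'), float('-inf'), float('nan')])
manual = ~(torch.isinf(x) | torch.isnan(x))
print(torch.equal(manual, torch.isfinite(x)))  # True
```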

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64759

Test Plan: Imported from OSS

Reviewed By: seemethere, bdhirsh

Differential Revision: D31060760

Pulled By: malfet

fbshipit-source-id: 499ecd6cc55ea881b8a57e6a9a4fb38eaaee5242
2021-09-21 15:47:48 -07:00
5aa33770f5 .circleci: Remove Windows workflows from Circle (#64959)
Summary:
Removes Windows CI from Circle

Will go in after https://github.com/pytorch/pytorch/pull/65094

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64959

Reviewed By: soulitzer

Differential Revision: D31095374

Pulled By: janeyx99

fbshipit-source-id: b0d13a59aa8c6e2f85dbd9c343cac395c4e64475
2021-09-21 15:32:24 -07:00
a1216061c1 [DataPipe] Fix deepcopy filehandle for Mapper and in-place modification for IterableWrapper (#65220)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65220

Fixes #65221

- Remove deepcopy from Mapper to support file handles
- Convert `IterableWrapper` to deepcopy the iterable instance within each iterator, to prevent in-place modification (which would yield different data per epoch)
- Convert `IDP` to `IterableWrapper` in test_datapipe.py
- Refine the variable names (avoid using `dp`, which is the module reference)

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D31021886

Pulled By: ejguan

fbshipit-source-id: 72a9eee66c758e2717d591cd0942892bddedc223
2021-09-21 14:29:40 -07:00
73c4bfc30a [ONNX] Add log10 symbolic (#63418) (#64374)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64374

Fixes #61332
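
The symbolic presumably lowers log10 via the standard identity log10(x) = ln(x) / ln(10), since ONNX provides only a natural-log Log op; a quick eager-mode check of that identity:

```python
import math
import torch

x = torch.tensor([1.0, 10.0, 100.0])
print(torch.log(x) / math.log(10.0))  # tensor([0., 1., 2.])
print(torch.log10(x))                 # matches
```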

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D30919609

Pulled By: msaroufim

fbshipit-source-id: f474376bbf7b59677b10565f316384eca59dba43

Co-authored-by: Shubham Bhokare <shubhambhokare@gmail.com>
2021-09-21 13:30:59 -07:00
1fec9cd76b [Fixed] Enable Half, BFloat16, and Complex dtypes for coo-coo sparse matmul [CUDA] (#59980)
Summary:
This PR enables Half, BFloat16, ComplexFloat, and ComplexDouble support for matrix-matrix multiplication of COO sparse matrices.
The change is applied only to CUDA 11+ builds.

`cusparseSpGEMM` also supports `CUDA_C_16F` (complex float16) and `CUDA_C_16BF` (complex bfloat16). PyTorch also supports the complex float16 dtype (`ScalarType::ComplexHalf`), but there is no convenient dispatch, so this dtype is omitted in this PR.
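
A usage sketch of what this enables (assumes a CUDA 11+ build and an available CUDA device, per the summary):

```python
import torch

if torch.cuda.is_available():
    a = torch.randn(8, 8, dtype=torch.complex64).to_sparse().cuda()
    b = torch.randn(8, 8, dtype=torch.complex64).to_sparse().cuda()
    c = torch.sparse.mm(a, b)  # COO @ COO -> sparse COO result
    print(c.dtype, c.is_sparse)
```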

cc nikitaved pearu cpuhrsch IvanYashchuk ezyang anjali411 dylanbespalko mruberry Lezcano

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59980

Reviewed By: ngimel

Differential Revision: D30994115

Pulled By: cpuhrsch

fbshipit-source-id: 4f55b99e8e25079d6273b4edf95ad6fa85aeaf24
2021-09-21 13:03:40 -07:00
8bab468943 Reduce test size for max_pool (#65336)
Summary:
Fixes OOM in slow gradcheck tests.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65336

Reviewed By: malfet

Differential Revision: D31059007

Pulled By: albanD

fbshipit-source-id: 2dd6967d88663558e37f8c0836ad33333c92dfb5
2021-09-21 12:57:02 -07:00
cd813f16bf Add functional api for nn.Module (#61447)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/58839

After discussing with albanD he proposed this simple design.

Let's iterate over the idea here :).

Thanks.

The main thing this PR does is use reparametrization that is reverted at the end of the functional call.
This leaves the original model with its state unchanged; also, in this scenario the module is created without parameters, so this will hard-error if not all parameters are specified when the forward pass is done.

``` python
import torch
import torch.nn.utils._stateless

class MyModule(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.l1 = torch.nn.Linear(1, 1)

    def forward(self, x):
        return self.l1(x)

mod = MyModule()
print('weight before', mod.l1.weight)
x = torch.rand((1, 1))
parameters = {"l1.weight": torch.nn.Parameter(torch.tensor([[1.0]])),
              "l1.bias": torch.nn.Parameter(torch.tensor([0.0]))}
res = torch.nn.utils._stateless.functional_call(mod, parameters, x)
print('Functional call input ', x, ' and result ', res)
print('weight after', mod.l1.weight)
```
Output
```
weight before Parameter containing:
tensor([[-0.4419]], requires_grad=True)

Functional call input tensor([[0.3531]]) and result tensor([[0.3531]], grad_fn=<AddmmBackward>)

weight after Parameter containing:
tensor([[-0.4419]], requires_grad=True)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61447

Reviewed By: soulitzer

Differential Revision: D31082765

Pulled By: albanD

fbshipit-source-id: ba814d0f9162fb39c59989ca9a8efe160405ba76
2021-09-21 12:39:43 -07:00
c245632e2e Use higher timeout for TSAN tests. (#65391)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65391

TSAN tests are much slower than the usual dev/opt mode, about 5-10x
slower.

As a result, for TSAN build mode we use a much higher timeout for distributed
tests.
ghstack-source-id: 138584613

Test Plan: waitforbuildbot

Reviewed By: cbalioglu

Differential Revision: D31076575

fbshipit-source-id: 44a485f07101deac536470ceeff2a52cac4f9e0b
2021-09-21 12:08:27 -07:00
28bfdbb066 OpInfo for nn.functional.batch_norm (#63218)
Summary:
Addresses https://github.com/facebookresearch/functorch/issues/78 and https://github.com/pytorch/pytorch/issues/54261.

* There exists `torch.batch_norm`, but it takes an extra arg, `cudnn_enabled`, that is not present in the functional variant. This is passed from the functional variant to `torch.batch_norm` here: https://github.com/pytorch/pytorch/blob/master/torch/nn/functional.py#L2282. `test_variant_consistency_jit` fails with the following error (when passed an alias):
    ```python
    File "/home/krshrimali/Documents/Projects/Quansight/pytorch/test/test_ops.py", line 457, in _test_consistency_helper
    variant_forward = variant(cloned,
    TypeError: batch_norm() missing 1 required positional arguments: "cudnn_enabled"
    ```
    * I'm not sure of a solution to this, as AFAIK there is no way to pass a lambda wrapper for an alias. Hence, I've skipped adding this as an alias there; see the signature sketch after this list.
    * On second thought, is this even an alias?
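
For reference, a small sketch of the signature mismatch with the public ops (values arbitrary):

```python
import torch
import torch.nn.functional as F

x = torch.randn(2, 3)
mean, var = torch.zeros(3), torch.ones(3)

out_functional = F.batch_norm(x, mean, var)  # no cudnn_enabled parameter
# the underlying op requires the extra positional cudnn_enabled flag:
out_op = torch.batch_norm(x, None, None, mean, var, False, 0.1, 1e-5, False)
print(torch.allclose(out_functional, out_op))  # True
```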

cc: mruberry zou3519 kshitij12345

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63218

Reviewed By: bdhirsh

Differential Revision: D31019785

Pulled By: zou3519

fbshipit-source-id: 2a834d05835da975289efc544a7ad7e98c99438f
2021-09-21 11:35:34 -07:00
9afdf017dc Add force_on_cpu test to win cuda10.2 on GHA (#65094)
Summary:
Part of migrating from Circle.

Once we get a successful force_on_cpu test, we can move it to trunk only.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65094

Reviewed By: seemethere

Differential Revision: D31086289

Pulled By: janeyx99

fbshipit-source-id: e1d135cc844d51f0b243b40efb49edca277d9de8
2021-09-21 11:14:15 -07:00
00b732e98b Remove orphan from cuDNN persistent note (#65160)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/60009.

As the document is properly [included](https://github.com/pytorch/pytorch/blob/master/torch/nn/modules/rnn.py#L799), and [`:orphan:` doesn't need to be used in included documents](https://github.com/sphinx-doc/sphinx/issues/6787#issuecomment-549256840), and no warning is emitted in my local build when removing it, I think it can be removed.

The artifact reported in https://github.com/pytorch/pytorch/issues/60009 can be seen in 3 pages: [torch.nn.RNN](https://pytorch.org/docs/stable/generated/torch.nn.RNN.html#torch.nn.RNN), [torch.nn.LSTM](https://pytorch.org/docs/stable/generated/torch.nn.LSTM.html#torch.nn.LSTM), and [torch.nn.GRU](https://pytorch.org/docs/stable/generated/torch.nn.GRU.html#torch.nn.GRU).

cc ezyang suo

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65160

Reviewed By: bdhirsh

Differential Revision: D31020280

Pulled By: ezyang

fbshipit-source-id: 6c3541e5a856a91cf1ce1d2db4d04f5d13118ee4
2021-09-21 11:09:47 -07:00
c0eb266c02 [Static runtime] Micro-optimization pass on GetLivenessMap (#65175)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65175

More efficient use of map API, more efficient way to insert all pairs of inputs/outputs in liveness map
ghstack-source-id: 138547815

Test Plan: Time to enable static runtime down from ~8.7s to ~8.4s

Reviewed By: mikeiovine

Differential Revision: D30983897

fbshipit-source-id: fa6000bfd0fa0adfcd7c5922199ee32ada8c430e
2021-09-21 10:52:08 -07:00
6d7bc34b67 Make new_empty/new_ones/new_zeros/new_full respect subclass (#65169)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65169

Previously these composite functions created a new tensor
with at::empty (or some other factory function) via TensorOptions,
which doesn't preserve the Python subclass.  Making new_empty a
non-composite op and then routing everyone through it makes it
respect subclass.  We could also make all of these non-composite
but this reduces the number of derivatives.yaml entries I have to
make and allows you to trace the fill calls.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D31003713

Pulled By: ezyang

fbshipit-source-id: 19f906f1404a6b724769c49f48d123f407a561ff
2021-09-21 10:50:48 -07:00
04a5e45aeb [PyTorch] Compare Type pointers before calling operator== in EqualNode (#65352)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65352

This can be a big win if it saves the virtual call to operator== and the cost is tiny.
ghstack-source-id: 138497657

Test Plan: Profiled ptvsc2_predictor_bench startup, inclusive time spent in EqualNode::operator() dropped from 0.8% to negligible

Reviewed By: hlu1

Differential Revision: D30974969

fbshipit-source-id: 9c3af36cffe709dfce477dcc49722536470264a0
2021-09-21 10:46:24 -07:00
88232b4cee Fix ENABLE_RECORD_KERNEL_FUNCTION_DTYPE build (#65370)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65370

Forgot a wrapping 'namespace at' here!  And no contbuilds to test it.
ghstack-source-id: 138565579

Test Plan:
```
buck build --show-output -c pt.disable_per_op_profiling=0 -c pt.enable_record_kernel_dtype=1 -c pt.has_backtraces=1 fbsource//xplat/caffe2/fb/model_tracer:model_tracer
```

Reviewed By: JacobSzwejbka

Differential Revision: D31065923

fbshipit-source-id: ed4563fbd8f3c29f6b10ac8999c9010bd4359c97
2021-09-21 10:42:33 -07:00
3946 changed files with 298383 additions and 111113 deletions

View File

@@ -46,7 +46,7 @@ steps:
curl -k https://s3.amazonaws.com/ossci-windows/sccache.exe --output .\tmp_bin\sccache.exe
curl -k https://s3.amazonaws.com/ossci-windows/sccache-cl.exe --output .\tmp_bin\sccache-cl.exe
copy .\tmp_bin\sccache.exe .\tmp_bin\nvcc.exe
curl -kL https://github.com/peterjc123/randomtemp-rust/releases/download/v0.3/randomtemp.exe --output .\tmp_bin\randomtemp.exe
curl -kL https://github.com/peterjc123/randomtemp-rust/releases/download/v0.4/randomtemp.exe --output .\tmp_bin\randomtemp.exe
displayName: Install sccache and randomtemp
condition: not(eq(variables.CUDA_VERSION, ''))

View File

@@ -120,9 +120,7 @@ steps:
Write-Host "##vso[task.setvariable variable=CMAKE_LIBRARY_PATH;]$(Build.SourcesDirectory)\mkl\lib;$env:CMAKE_LIBRARY_PATH"
Write-Host "##vso[task.setvariable variable=ADDITIONAL_PATH;]$(Build.SourcesDirectory)\tmp_bin"
Write-Host "##vso[task.setvariable variable=SCCACHE_IDLE_TIMEOUT;]1500"
Write-Host "##vso[task.setvariable variable=RANDOMTEMP_EXECUTABLE;]$(Build.SourcesDirectory)\tmp_bin\nvcc.exe"
Write-Host "##vso[task.setvariable variable=CUDA_NVCC_EXECUTABLE;]$(Build.SourcesDirectory)\tmp_bin\randomtemp.exe"
Write-Host "##vso[task.setvariable variable=RANDOMTEMP_BASEDIR;]$(Build.SourcesDirectory)\tmp_bin"
Write-Host "##vso[task.setvariable variable=CMAKE_CUDA_COMPILER_LAUNCHER;]$(Build.SourcesDirectory)/tmp_bin/randomtemp.exe;$(Build.SourcesDirectory)/tmp_bin/sccache.exe"
displayName: Set MKL, sccache and randomtemp environment variables
# View current environment variables

View File

@@ -1,6 +1,7 @@
build --copt=--std=c++14
build --copt=-I.
build --copt=-isystem --copt bazel-out/k8-fastbuild/bin
build --experimental_ui_max_stdouterr_bytes=2048576
# Configuration to disable tty features for environments like CI
build:no-tty --curses no
@@ -11,3 +12,8 @@ build:no-tty --show_progress_rate_limit 10
build:gpu --define=cuda=true
# define a separate build folder for faster switching between configs
build:gpu --platform_suffix=-gpu
# rules_cuda configuration
build:gpu --@rules_cuda//cuda:enable_cuda
build:gpu --@rules_cuda//cuda:cuda_targets=sm_52
build:gpu --@rules_cuda//cuda:compiler=nvcc
build:gpu --repo_env=CUDA_PATH=/usr/local/cuda

View File

@@ -63,7 +63,8 @@ CONFIG_TREE_DATA = OrderedDict(
],
)),
windows=(
[v for v in dimensions.GPU_VERSIONS if v not in dimensions.ROCM_VERSION_LABELS],
# Stop building Win+CU102, see https://github.com/pytorch/pytorch/issues/65648
[v for v in dimensions.GPU_VERSIONS if v not in dimensions.ROCM_VERSION_LABELS and v != "cuda102"],
OrderedDict(
wheel=dimensions.STANDARD_PYTHON_VERSIONS,
conda=dimensions.STANDARD_PYTHON_VERSIONS,

View File

@@ -4,12 +4,13 @@ CUDA_VERSIONS = [
"102",
"111",
"113",
"115",
]
ROCM_VERSIONS = [
"4.0.1",
"4.1",
"4.2",
"4.3.1",
]
ROCM_VERSION_LABELS = ["rocm" + v for v in ROCM_VERSIONS]
@@ -17,7 +18,6 @@ ROCM_VERSION_LABELS = ["rocm" + v for v in ROCM_VERSIONS]
GPU_VERSIONS = [None] + ["cuda" + v for v in CUDA_VERSIONS] + ROCM_VERSION_LABELS
STANDARD_PYTHON_VERSIONS = [
"3.6",
"3.7",
"3.8",
"3.9"

View File

@@ -1,70 +1,7 @@
from cimodel.lib.conf_tree import ConfigNode, X, XImportant
from cimodel.lib.conf_tree import ConfigNode
CONFIG_TREE_DATA = [
("xenial", [
("gcc", [
("5.4", [ # All this subtree rebases to master and then build
("3.6", [
("important", [X(True)]),
]),
]),
# TODO: bring back libtorch test
("7", [X("3.6")]),
]),
("clang", [
("7", [
("3.6", [
("asan", [
(True, [
("shard_test", [XImportant(True)]),
]),
]),
("onnx", [XImportant(True)]),
]),
]),
]),
("cuda", [
("10.2", [
("3.6", [
# Build are needed for slow_gradcheck
('build_only', [X(True)]),
("slow_gradcheck", [
# If you update this slow gradcheck, you should
# also update docker_definitions.py to make sure
# the docker image match the config used here
(True, [
('shard_test', [XImportant(True)]),
]),
]),
# UNCOMMENT THE BELOW TO REENABLE LIBTORCH
# ("libtorch", [
# (True, [
# ('build_only', [X(True)]),
# ]),
# ]),
]),
]),
]),
]),
("bionic", [
("clang", [
("9", [
("3.6", [
("xla", [XImportant(True)]),
("vulkan", [XImportant(True)]),
]),
]),
]),
# @jithunnair-amd believes Jenkins builds are sufficient
# ("rocm", [
# ("3.9", [
# ("3.6", [
# ('build_only', [XImportant(True)]),
# ]),
# ]),
# ]),
]),
]
@@ -145,7 +82,6 @@ class ExperimentalFeatureConfigNode(TreeConfigNode):
"build_only": BuildOnlyConfigNode,
"shard_test": ShardTestConfigNode,
"cuda_gcc_override": CudaGccOverrideConfigNode,
"coverage": CoverageConfigNode,
"pure_torch": PureTorchConfigNode,
"slow_gradcheck": SlowGradcheckConfigNode,
}
@@ -289,14 +225,6 @@ class ShardTestConfigNode(TreeConfigNode):
return ImportantConfigNode
class CoverageConfigNode(TreeConfigNode):
def init2(self, node_name):
self.props["is_coverage"] = node_name
def child_constructor(self):
return ExperimentalFeatureConfigNode
class ImportantConfigNode(TreeConfigNode):
def modify_label(self, label):
return "IMPORTANT=" + str(label)

View File

@@ -239,7 +239,6 @@ def instantiate_configs(only_slow_gradcheck):
compiler_version = fc.find_prop("compiler_version")
is_xla = fc.find_prop("is_xla") or False
is_asan = fc.find_prop("is_asan") or False
is_coverage = fc.find_prop("is_coverage") or False
is_noarch = fc.find_prop("is_noarch") or False
is_onnx = fc.find_prop("is_onnx") or False
is_pure_torch = fc.find_prop("is_pure_torch") or False
@@ -284,10 +283,6 @@ def instantiate_configs(only_slow_gradcheck):
python_version = fc.find_prop("pyver")
parms_list[0] = fc.find_prop("abbreviated_pyver")
if is_coverage:
parms_list_ignored_for_docker_image.append("coverage")
python_version = fc.find_prop("pyver")
if is_noarch:
parms_list_ignored_for_docker_image.append("noarch")
@@ -357,28 +352,6 @@ def instantiate_configs(only_slow_gradcheck):
tags_list=RC_PATTERN)
c.dependent_tests = gen_docs_configs(c)
if (
compiler_name != "clang"
and not rocm_version
and not is_libtorch
and not is_vulkan
and not is_pure_torch
and not is_noarch
and not is_slow_gradcheck
and not only_slow_gradcheck
and not build_only
):
distributed_test = Conf(
c.gen_build_name("") + "distributed",
[],
is_xla=False,
restrict_phases=["test"],
is_libtorch=False,
is_important=True,
parent_build=c,
)
c.dependent_tests.append(distributed_test)
config_list.append(c)
return config_list

View File

@@ -1,119 +0,0 @@
import cimodel.data.simple.util.branch_filters as branch_filters
from cimodel.data.simple.util.docker_constants import (
DOCKER_IMAGE_NDK, DOCKER_REQUIREMENT_NDK
)
import cimodel.lib.miniutils as miniutils
class AndroidJob:
def __init__(self,
variant,
template_name,
is_master_only=True):
self.variant = variant
self.template_name = template_name
self.is_master_only = is_master_only
def gen_tree(self):
base_name_parts = [
"pytorch",
"linux",
"xenial",
"py3",
"clang5",
"android",
"ndk",
"r19c",
] + self.variant + [
"build",
]
full_job_name = "_".join(base_name_parts)
build_env_name = "-".join(base_name_parts)
props_dict = {
"name": full_job_name,
"build_environment": "\"{}\"".format(build_env_name),
"docker_image": "\"{}\"".format(DOCKER_IMAGE_NDK),
"requires": [DOCKER_REQUIREMENT_NDK]
}
if self.is_master_only:
props_dict["filters"] = branch_filters.gen_filter_dict(branch_filters.NON_PR_BRANCH_LIST)
return [{self.template_name: props_dict}]
class AndroidGradleJob:
def __init__(self,
job_name,
template_name,
dependencies,
is_master_only=True,
is_pr_only=False,
extra_props=tuple()):
self.job_name = job_name
self.template_name = template_name
self.dependencies = dependencies
self.is_master_only = is_master_only
self.is_pr_only = is_pr_only
self.extra_props = dict(extra_props)
def gen_tree(self):
props_dict = {
"name": self.job_name,
"requires": self.dependencies,
}
if self.is_master_only:
props_dict["filters"] = branch_filters.gen_filter_dict(branch_filters.NON_PR_BRANCH_LIST)
elif self.is_pr_only:
props_dict["filters"] = branch_filters.gen_filter_dict(branch_filters.PR_BRANCH_LIST)
if self.extra_props:
props_dict.update(self.extra_props)
return [{self.template_name: props_dict}]
WORKFLOW_DATA = [
AndroidJob(["x86_32"], "pytorch_linux_build", is_master_only=False),
AndroidJob(["x86_64"], "pytorch_linux_build"),
AndroidJob(["arm", "v7a"], "pytorch_linux_build"),
AndroidJob(["arm", "v8a"], "pytorch_linux_build"),
AndroidGradleJob(
"pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-build-x86_32",
"pytorch_android_gradle_build-x86_32",
["pytorch_linux_xenial_py3_clang5_android_ndk_r19c_x86_32_build"],
is_master_only=False,
is_pr_only=True),
AndroidGradleJob(
"pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single",
"pytorch_android_gradle_custom_build_single",
[DOCKER_REQUIREMENT_NDK],
is_master_only=False,
is_pr_only=True),
AndroidGradleJob(
"pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit",
"pytorch_android_gradle_custom_build_single",
[DOCKER_REQUIREMENT_NDK],
is_master_only=False,
is_pr_only=True,
extra_props=tuple({
"lite_interpreter": miniutils.quote(str(int(False)))
}.items())),
AndroidGradleJob(
"pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-build",
"pytorch_android_gradle_build",
["pytorch_linux_xenial_py3_clang5_android_ndk_r19c_x86_32_build",
"pytorch_linux_xenial_py3_clang5_android_ndk_r19c_x86_64_build",
"pytorch_linux_xenial_py3_clang5_android_ndk_r19c_arm_v7a_build",
"pytorch_linux_xenial_py3_clang5_android_ndk_r19c_arm_v8a_build"]),
]
def get_workflow_jobs():
return [item.gen_tree() for item in WORKFLOW_DATA]

View File

@@ -120,9 +120,9 @@ WORKFLOW_DATA = [
),
SmoketestJob(
"binary_windows_build",
["wheel", "3.7", "cu102"],
["wheel", "3.7", "cu113"],
None,
"binary_windows_wheel_3_7_cu102_build",
"binary_windows_wheel_3_7_cu113_build",
is_master_only=True,
),
@@ -144,11 +144,11 @@
),
SmoketestJob(
"binary_windows_test",
["wheel", "3.7", "cu102"],
["wheel", "3.7", "cu113"],
None,
"binary_windows_wheel_3_7_cu102_test",
"binary_windows_wheel_3_7_cu113_test",
is_master_only=True,
requires=["binary_windows_wheel_3_7_cu102_build"],
requires=["binary_windows_wheel_3_7_cu113_build"],
extra_props={
"executor": "windows-with-nvidia-gpu",
},

View File

@@ -4,27 +4,8 @@ from cimodel.lib.miniutils import quote
from cimodel.data.simple.util.branch_filters import gen_filter_dict, RC_PATTERN
# TODO: make this generated from a matrix rather than just a static list
# NOTE: All hardcoded docker image builds have been migrated to GHA
IMAGE_NAMES = [
"pytorch-linux-bionic-cuda10.2-cudnn7-py3.9-gcc7",
"pytorch-linux-bionic-py3.6-clang9",
"pytorch-linux-bionic-cuda10.2-cudnn7-py3.6-clang9",
"pytorch-linux-bionic-py3.8-gcc9",
"pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7",
"pytorch-linux-xenial-cuda11.1-cudnn8-py3-gcc7",
"pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7",
"pytorch-linux-xenial-py3-clang5-android-ndk-r19c",
"pytorch-linux-xenial-py3-clang5-asan",
"pytorch-linux-xenial-py3-clang7-asan",
"pytorch-linux-xenial-py3-clang7-onnx",
"pytorch-linux-xenial-py3.8",
"pytorch-linux-xenial-py3.6-clang7",
"pytorch-linux-xenial-py3.6-gcc5.4", # this one is used in doc builds
"pytorch-linux-xenial-py3.6-gcc7.2",
"pytorch-linux-xenial-py3.6-gcc7",
"pytorch-linux-bionic-rocm4.1-py3.6",
"pytorch-linux-bionic-rocm4.2-py3.6",
"pytorch-linux-bionic-rocm4.3.1-py3.6",
]
# This entry should be an element from the list above
@@ -32,10 +13,12 @@ IMAGE_NAMES = [
# pytorch_build_data.py
SLOW_GRADCHECK_IMAGE_NAME = "pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7"
def get_workflow_jobs(only_slow_gradcheck=False):
def get_workflow_jobs(images=IMAGE_NAMES, only_slow_gradcheck=False):
"""Generates a list of docker image build definitions"""
ret = []
for image_name in IMAGE_NAMES:
for image_name in images:
if image_name.startswith('docker-'):
image_name = image_name.lstrip('docker-')
if only_slow_gradcheck and image_name is not SLOW_GRADCHECK_IMAGE_NAME:
continue

View File

@@ -75,6 +75,12 @@ WORKFLOW_DATA = [
IOSJob(XCODE_VERSION, ArchVariant("arm64", "custom"), extra_props={
"op_list": "mobilenetv2.yaml",
"lite_interpreter": miniutils.quote(str(int(True)))}),
IOSJob(XCODE_VERSION, ArchVariant("x86_64", "coreml"), is_org_member_context=False, extra_props={
"use_coreml": miniutils.quote(str(int(True))),
"lite_interpreter": miniutils.quote(str(int(True)))}),
IOSJob(XCODE_VERSION, ArchVariant("arm64", "coreml"), extra_props={
"use_coreml": miniutils.quote(str(int(True))),
"lite_interpreter": miniutils.quote(str(int(True)))}),
]

View File

@@ -4,12 +4,6 @@ PyTorch Mobile PR builds (use linux host toolchain + mobile build options)
import cimodel.lib.miniutils as miniutils
import cimodel.data.simple.util.branch_filters
from cimodel.data.simple.util.docker_constants import (
DOCKER_IMAGE_ASAN,
DOCKER_REQUIREMENT_ASAN,
DOCKER_IMAGE_NDK,
DOCKER_REQUIREMENT_NDK
)
class MobileJob:
@@ -52,33 +46,6 @@ class MobileJob:
WORKFLOW_DATA = [
MobileJob(
DOCKER_IMAGE_ASAN,
[DOCKER_REQUIREMENT_ASAN],
["build"]
),
# Use LLVM-DEV toolchain in android-ndk-r19c docker image
MobileJob(
DOCKER_IMAGE_NDK,
[DOCKER_REQUIREMENT_NDK],
["custom", "build", "dynamic"]
),
MobileJob(
DOCKER_IMAGE_NDK,
[DOCKER_REQUIREMENT_NDK],
["custom", "build", "static"]
),
# Use LLVM-DEV toolchain in android-ndk-r19c docker image
# Most of this CI is already covered by "mobile-custom-build-dynamic" job
MobileJob(
DOCKER_IMAGE_NDK,
[DOCKER_REQUIREMENT_NDK],
["code", "analysis"],
True
),
]

View File

@@ -1,77 +0,0 @@
from cimodel.data.simple.util.docker_constants import (
DOCKER_IMAGE_NDK,
DOCKER_REQUIREMENT_NDK
)
class AndroidNightlyJob:
def __init__(self,
variant,
template_name,
extra_props=None,
with_docker=True,
requires=None,
no_build_suffix=False):
self.variant = variant
self.template_name = template_name
self.extra_props = extra_props or {}
self.with_docker = with_docker
self.requires = requires
self.no_build_suffix = no_build_suffix
def gen_tree(self):
base_name_parts = [
"pytorch",
"linux",
"xenial",
"py3",
"clang5",
"android",
"ndk",
"r19c",
] + self.variant
build_suffix = [] if self.no_build_suffix else ["build"]
full_job_name = "_".join(["nightly"] + base_name_parts + build_suffix)
build_env_name = "-".join(base_name_parts)
props_dict = {
"name": full_job_name,
"requires": self.requires,
"filters": {"branches": {"only": "nightly"}},
}
props_dict.update(self.extra_props)
if self.with_docker:
props_dict["docker_image"] = DOCKER_IMAGE_NDK
props_dict["build_environment"] = build_env_name
return [{self.template_name: props_dict}]
BASE_REQUIRES = [DOCKER_REQUIREMENT_NDK]
WORKFLOW_DATA = [
AndroidNightlyJob(["x86_32"], "pytorch_linux_build", requires=BASE_REQUIRES),
AndroidNightlyJob(["x86_64"], "pytorch_linux_build", requires=BASE_REQUIRES),
AndroidNightlyJob(["arm", "v7a"], "pytorch_linux_build", requires=BASE_REQUIRES),
AndroidNightlyJob(["arm", "v8a"], "pytorch_linux_build", requires=BASE_REQUIRES),
AndroidNightlyJob(["android_gradle"], "pytorch_android_gradle_build",
with_docker=False,
requires=[
"nightly_pytorch_linux_xenial_py3_clang5_android_ndk_r19c_x86_32_build",
"nightly_pytorch_linux_xenial_py3_clang5_android_ndk_r19c_x86_64_build",
"nightly_pytorch_linux_xenial_py3_clang5_android_ndk_r19c_arm_v7a_build",
"nightly_pytorch_linux_xenial_py3_clang5_android_ndk_r19c_arm_v8a_build"]),
AndroidNightlyJob(["x86_32_android_publish_snapshot"], "pytorch_android_publish_snapshot",
extra_props={"context": "org-member"},
with_docker=False,
requires=["nightly_pytorch_linux_xenial_py3_clang5_android_ndk_r19c_android_gradle_build"],
no_build_suffix=True),
]
def get_workflow_jobs():
return [item.gen_tree() for item in WORKFLOW_DATA]

View File

@@ -5,9 +5,11 @@ import cimodel.lib.miniutils as miniutils
class IOSNightlyJob:
def __init__(self,
variant,
is_full_jit=False,
is_upload=False):
self.variant = variant
self.is_full_jit = is_full_jit
self.is_upload = is_upload
def get_phase_name(self):
@@ -17,8 +19,11 @@ class IOSNightlyJob:
extra_name_suffix = [self.get_phase_name()] if self.is_upload else []
extra_name = ["full_jit"] if self.is_full_jit else []
common_name_pieces = [
"ios",
] + extra_name + [
] + ios_definitions.XCODE_VERSION.render_dots_or_parts(with_version_dots) + [
"nightly",
self.variant,
@@ -31,7 +36,8 @@ class IOSNightlyJob:
return "_".join(["pytorch"] + self.get_common_name_pieces(False))
def gen_tree(self):
extra_requires = [x.gen_job_name() for x in BUILD_CONFIGS] if self.is_upload else []
build_configs = BUILD_CONFIGS_FULL_JIT if self.is_full_jit else BUILD_CONFIGS
extra_requires = [x.gen_job_name() for x in build_configs] if self.is_upload else []
props_dict = {
"build_environment": "-".join(["libtorch"] + self.get_common_name_pieces(True)),
@@ -47,6 +53,9 @@ class IOSNightlyJob:
props_dict["use_metal"] = miniutils.quote(str(int(True)))
props_dict["use_coreml"] = miniutils.quote(str(int(True)))
if self.is_full_jit:
props_dict["lite_interpreter"] = miniutils.quote(str(int(False)))
template_name = "_".join([
"binary",
"ios",
@@ -61,9 +70,14 @@ BUILD_CONFIGS = [
IOSNightlyJob("arm64"),
]
BUILD_CONFIGS_FULL_JIT = [
IOSNightlyJob("x86_64", is_full_jit=True),
IOSNightlyJob("arm64", is_full_jit=True),
]
WORKFLOW_DATA = BUILD_CONFIGS + [
IOSNightlyJob("binary", is_upload=True),
WORKFLOW_DATA = BUILD_CONFIGS + BUILD_CONFIGS_FULL_JIT + [
IOSNightlyJob("binary", is_full_jit=False, is_upload=True),
IOSNightlyJob("binary", is_full_jit=True, is_upload=True),
]

View File

@@ -1,160 +0,0 @@
import cimodel.lib.miniutils as miniutils
from cimodel.data.simple.util.branch_filters import gen_filter_dict, RC_PATTERN, NON_PR_BRANCH_LIST
from cimodel.data.simple.util.versions import CudaVersion
class WindowsJob:
def __init__(
self,
test_index,
vscode_spec,
cuda_version,
force_on_cpu=False,
multi_gpu=False,
master_only=False,
nightly_only=False,
master_and_nightly=False
):
self.test_index = test_index
self.vscode_spec = vscode_spec
self.cuda_version = cuda_version
self.force_on_cpu = force_on_cpu
self.multi_gpu = multi_gpu
self.master_only = master_only
self.nightly_only = nightly_only
self.master_and_nightly = master_and_nightly
def gen_tree(self):
base_phase = "build" if self.test_index is None else "test"
numbered_phase = (
base_phase if self.test_index is None else base_phase + str(self.test_index)
)
key_parts = ["pytorch", "windows", base_phase]
if self.multi_gpu:
key_parts.append('multigpu')
key_name = "_".join(key_parts)
cpu_forcing_name_parts = ["on", "cpu"] if self.force_on_cpu else []
target_arch = self.cuda_version.render_dots() if self.cuda_version else "cpu"
python_version = "3.8"
base_name_parts = [
"pytorch",
"windows",
self.vscode_spec.render(),
"py" + python_version.replace(".", ""),
target_arch,
]
prerequisite_jobs = []
if base_phase == "test":
prerequisite_jobs.append("_".join(base_name_parts + ["build"]))
if self.cuda_version:
self.cudnn_version = 8 if self.cuda_version.major == 11 else 7
arch_env_elements = (
["cuda" + str(self.cuda_version.major) + "." + str(self.cuda_version.minor)]
if self.cuda_version
else ["cpu"]
)
build_environment_string = "-".join(
["pytorch", "win"]
+ self.vscode_spec.get_elements()
+ arch_env_elements
+ ["py" + python_version.split(".")[0]]
)
is_running_on_cuda = bool(self.cuda_version) and not self.force_on_cpu
if self.multi_gpu:
props_dict = {"requires": prerequisite_jobs}
else:
props_dict = {
"build_environment": build_environment_string,
"python_version": miniutils.quote(python_version),
"vs_version": miniutils.quote("16.8.6"),
"vc_version": miniutils.quote(self.vscode_spec.dotted_version()),
"vc_year": miniutils.quote(str(self.vscode_spec.year)),
"vc_product": self.vscode_spec.get_product(),
"use_cuda": miniutils.quote(str(int(is_running_on_cuda))),
"requires": prerequisite_jobs,
}
if self.master_only:
props_dict[
"filters"
] = gen_filter_dict()
elif self.nightly_only:
props_dict[
"filters"
] = gen_filter_dict(branches_list=["nightly"], tags_list=RC_PATTERN)
elif self.master_and_nightly:
props_dict[
"filters"
] = gen_filter_dict(branches_list=NON_PR_BRANCH_LIST + ["nightly"], tags_list=RC_PATTERN)
name_parts = base_name_parts + cpu_forcing_name_parts + [numbered_phase]
if not self.multi_gpu:
if base_phase == "test":
test_name = "-".join(["pytorch", "windows", numbered_phase])
props_dict["test_name"] = test_name
if is_running_on_cuda:
props_dict["executor"] = "windows-with-nvidia-gpu"
props_dict["cuda_version"] = (
miniutils.quote(str(self.cuda_version))
if self.cuda_version
else "cpu"
)
props_dict["name"] = "_".join(name_parts)
return [{key_name: props_dict}]
class VcSpec:
def __init__(self, year, version_elements=None, hide_version=False):
self.year = year
self.version_elements = version_elements or []
self.hide_version = hide_version
def get_elements(self):
if self.hide_version:
return [self.prefixed_year()]
return [self.prefixed_year()] + self.version_elements
def get_product(self):
return "BuildTools"
def dotted_version(self):
return ".".join(self.version_elements)
def prefixed_year(self):
return "vs" + str(self.year)
def render(self):
return "_".join(self.get_elements())
_VC2019 = VcSpec(2019)
WORKFLOW_DATA = [
# VS2019 CUDA-10.2
WindowsJob(None, _VC2019, CudaVersion(10, 2), master_only=True),
# VS2019 CUDA-10.2 force on cpu
WindowsJob(1, _VC2019, CudaVersion(10, 2), force_on_cpu=True, master_only=True),
# TODO: This test is disabled due to https://github.com/pytorch/pytorch/issues/59724
# WindowsJob('_azure_multi_gpu', _VC2019, CudaVersion(11, 1), multi_gpu=True, master_and_nightly=True),
]
def get_windows_workflows():
return [item.gen_tree() for item in WORKFLOW_DATA]

.circleci/config.yml (generated): diff suppressed because it is too large.


@ -51,9 +51,9 @@ android {
dependencies {
implementation 'com.android.support:appcompat-v7:28.0.0'
implementation 'androidx.appcompat:appcompat:1.0.0'
implementation 'com.facebook.fbjni:fbjni-java-only:0.0.3'
implementation 'com.facebook.fbjni:fbjni-java-only:0.2.2'
implementation 'com.google.code.findbugs:jsr305:3.0.1'
implementation 'com.facebook.soloader:nativeloader:0.8.0'
implementation 'com.facebook.soloader:nativeloader:0.10.1'
implementation 'junit:junit:' + rootProject.junitVersion
implementation 'androidx.test:core:' + rootProject.coreVersion


@ -82,8 +82,8 @@ case "$image" in
GCC_VERSION=7
# Do not install PROTOBUF, DB, and VISION as a test
;;
pytorch-linux-xenial-py3.6-gcc5.4)
ANACONDA_PYTHON_VERSION=3.6
pytorch-linux-xenial-py3.7-gcc5.4)
ANACONDA_PYTHON_VERSION=3.7
CMAKE_VERSION=3.10.3
GCC_VERSION=5
PROTOBUF=yes
@ -91,14 +91,14 @@ case "$image" in
VISION=yes
KATEX=yes
;;
pytorch-linux-xenial-py3.6-gcc7.2)
ANACONDA_PYTHON_VERSION=3.6
pytorch-linux-xenial-py3.7-gcc7.2)
ANACONDA_PYTHON_VERSION=3.7
CMAKE_VERSION=3.10.3
GCC_VERSION=7
# Do not install PROTOBUF, DB, and VISION as a test
;;
pytorch-linux-xenial-py3.6-gcc7)
ANACONDA_PYTHON_VERSION=3.6
pytorch-linux-xenial-py3.7-gcc7)
ANACONDA_PYTHON_VERSION=3.7
CMAKE_VERSION=3.10.3
GCC_VERSION=7
PROTOBUF=yes
@ -108,7 +108,7 @@ case "$image" in
pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7)
CUDA_VERSION=10.2
CUDNN_VERSION=7
ANACONDA_PYTHON_VERSION=3.6
ANACONDA_PYTHON_VERSION=3.7
CMAKE_VERSION=3.10.3
GCC_VERSION=7
PROTOBUF=yes
@ -119,7 +119,7 @@ case "$image" in
pytorch-linux-xenial-cuda11.1-cudnn8-py3-gcc7)
CUDA_VERSION=11.1
CUDNN_VERSION=8
ANACONDA_PYTHON_VERSION=3.6
ANACONDA_PYTHON_VERSION=3.7
CMAKE_VERSION=3.10.3
GCC_VERSION=7
PROTOBUF=yes
@ -130,7 +130,19 @@ case "$image" in
pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7)
CUDA_VERSION=11.3.0 # Deviating from major.minor to conform to nvidia's Docker image names
CUDNN_VERSION=8
ANACONDA_PYTHON_VERSION=3.6
TENSORRT_VERSION=8.0.1.6
ANACONDA_PYTHON_VERSION=3.7
CMAKE_VERSION=3.10.3
GCC_VERSION=7
PROTOBUF=yes
DB=yes
VISION=yes
KATEX=yes
;;
pytorch-linux-bionic-cuda11.5-cudnn8-py3-gcc7)
CUDA_VERSION=11.5.0
CUDNN_VERSION=8
ANACONDA_PYTHON_VERSION=3.7
CMAKE_VERSION=3.10.3
GCC_VERSION=7
PROTOBUF=yes
@ -139,15 +151,15 @@ case "$image" in
KATEX=yes
;;
pytorch-linux-xenial-py3-clang5-asan)
ANACONDA_PYTHON_VERSION=3.6
ANACONDA_PYTHON_VERSION=3.7
CLANG_VERSION=5.0
CMAKE_VERSION=3.10.3
CMAKE_VERSION=3.13.5
PROTOBUF=yes
DB=yes
VISION=yes
;;
pytorch-linux-xenial-py3-clang7-asan)
ANACONDA_PYTHON_VERSION=3.6
ANACONDA_PYTHON_VERSION=3.7
CLANG_VERSION=7
CMAKE_VERSION=3.10.3
PROTOBUF=yes
@ -155,7 +167,7 @@ case "$image" in
VISION=yes
;;
pytorch-linux-xenial-py3-clang7-onnx)
ANACONDA_PYTHON_VERSION=3.6
ANACONDA_PYTHON_VERSION=3.7
CLANG_VERSION=7
CMAKE_VERSION=3.10.3
PROTOBUF=yes
@ -163,9 +175,9 @@ case "$image" in
VISION=yes
;;
pytorch-linux-xenial-py3-clang5-android-ndk-r19c)
ANACONDA_PYTHON_VERSION=3.6
ANACONDA_PYTHON_VERSION=3.7
CLANG_VERSION=5.0
CMAKE_VERSION=3.10.3
CMAKE_VERSION=3.13.5
LLVMDEV=yes
PROTOBUF=yes
ANDROID=yes
@ -173,16 +185,16 @@ case "$image" in
GRADLE_VERSION=6.8.3
NINJA_VERSION=1.9.0
;;
pytorch-linux-xenial-py3.6-clang7)
ANACONDA_PYTHON_VERSION=3.6
pytorch-linux-xenial-py3.7-clang7)
ANACONDA_PYTHON_VERSION=3.7
CMAKE_VERSION=3.10.3
CLANG_VERSION=7
PROTOBUF=yes
DB=yes
VISION=yes
;;
pytorch-linux-bionic-py3.6-clang9)
ANACONDA_PYTHON_VERSION=3.6
pytorch-linux-bionic-py3.7-clang9)
ANACONDA_PYTHON_VERSION=3.7
CLANG_VERSION=9
PROTOBUF=yes
DB=yes
@ -197,10 +209,10 @@ case "$image" in
DB=yes
VISION=yes
;;
pytorch-linux-bionic-cuda10.2-cudnn7-py3.6-clang9)
pytorch-linux-bionic-cuda10.2-cudnn7-py3.7-clang9)
CUDA_VERSION=10.2
CUDNN_VERSION=7
ANACONDA_PYTHON_VERSION=3.6
ANACONDA_PYTHON_VERSION=3.7
CLANG_VERSION=9
PROTOBUF=yes
DB=yes
@ -215,34 +227,34 @@ case "$image" in
DB=yes
VISION=yes
;;
pytorch-linux-bionic-cuda11.0-cudnn8-py3.6-gcc9)
pytorch-linux-bionic-cuda11.0-cudnn8-py3.7-gcc9)
CUDA_VERSION=11.0
CUDNN_VERSION=8
ANACONDA_PYTHON_VERSION=3.6
ANACONDA_PYTHON_VERSION=3.7
GCC_VERSION=9
PROTOBUF=yes
DB=yes
VISION=yes
ROCM_VERSION=3.9
;;
pytorch-linux-bionic-rocm4.1-py3.6)
ANACONDA_PYTHON_VERSION=3.6
pytorch-linux-bionic-rocm4.1-py3.7)
ANACONDA_PYTHON_VERSION=3.7
GCC_VERSION=9
PROTOBUF=yes
DB=yes
VISION=yes
ROCM_VERSION=4.1
;;
pytorch-linux-bionic-rocm4.2-py3.6)
ANACONDA_PYTHON_VERSION=3.6
pytorch-linux-bionic-rocm4.2-py3.7)
ANACONDA_PYTHON_VERSION=3.7
GCC_VERSION=9
PROTOBUF=yes
DB=yes
VISION=yes
ROCM_VERSION=4.2
;;
pytorch-linux-bionic-rocm4.3.1-py3.6)
ANACONDA_PYTHON_VERSION=3.6
pytorch-linux-bionic-rocm4.3.1-py3.7)
ANACONDA_PYTHON_VERSION=3.7
GCC_VERSION=9
PROTOBUF=yes
DB=yes
@ -294,6 +306,16 @@ fi
tmp_tag=$(basename "$(mktemp -u)" | tr '[:upper:]' '[:lower:]')
# If we are trying to use nvidia cuda image make sure it exists, otherwise use IMAGE from ghcr.io
# this logic currently only exists for ubuntu
if [[ "$image" == *cuda* && ${OS} == "ubuntu" ]]; then
IMAGE_NAME="nvidia/cuda:${CUDA_VERSION}-cudnn${CUDNN_VERSION}-devel-ubuntu${UBUNTU_VERSION}"
if ! DOCKER_CLI_EXPERIMENTAL=enabled docker manifest inspect "${IMAGE_NAME}" >/dev/null 2>/dev/null; then
IMAGE_NAME="ghcr.io/pytorch/nvidia/cuda:${CUDA_VERSION}-devel-ubuntu${UBUNTU_VERSION}"
INSTALL_CUDNN="True"
fi
fi
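
The probe above prefers the official `nvidia/cuda` image and falls back to the ghcr.io mirror (which ships without cudnn, hence `INSTALL_CUDNN`) only when the manifest is missing. A rough Python sketch of the same decision, assuming `docker` is on `PATH`; the function name and return convention are illustrative, not part of the script:

```python
import os
import subprocess

def pick_cuda_base(cuda_version: str, cudnn_version: str, ubuntu_version: str):
    # Probe the official image first; a zero exit code means the tag exists.
    name = f"nvidia/cuda:{cuda_version}-cudnn{cudnn_version}-devel-ubuntu{ubuntu_version}"
    probe = subprocess.run(
        ["docker", "manifest", "inspect", name],
        env={**os.environ, "DOCKER_CLI_EXPERIMENTAL": "enabled"},
        capture_output=True,
    )
    if probe.returncode == 0:
        return name, False   # official image, cudnn already included
    mirror = f"ghcr.io/pytorch/nvidia/cuda:{cuda_version}-devel-ubuntu{ubuntu_version}"
    return mirror, True      # mirror image, INSTALL_CUDNN must be set

# e.g. pick_cuda_base("11.5.0", "8", "18.04")
```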
# Build image
# TODO: build-arg THRIFT is not turned on for any image, remove it once we confirm
# it's no longer needed.
@ -320,6 +342,7 @@ docker build \
--build-arg "GCC_VERSION=${GCC_VERSION}" \
--build-arg "CUDA_VERSION=${CUDA_VERSION}" \
--build-arg "CUDNN_VERSION=${CUDNN_VERSION}" \
--build-arg "TENSORRT_VERSION=${TENSORRT_VERSION}" \
--build-arg "ANDROID=${ANDROID}" \
--build-arg "ANDROID_NDK=${ANDROID_NDK_VERSION}" \
--build-arg "GRADLE_VERSION=${GRADLE_VERSION}" \
@ -329,6 +352,9 @@ docker build \
--build-arg "NINJA_VERSION=${NINJA_VERSION:-}" \
--build-arg "KATEX=${KATEX:-}" \
--build-arg "ROCM_VERSION=${ROCM_VERSION:-}" \
--build-arg "PYTORCH_ROCM_ARCH=${PYTORCH_ROCM_ARCH:-gfx900;gfx906}" \
--build-arg "IMAGE_NAME=${IMAGE_NAME}" \
--build-arg "INSTALL_CUDNN=${INSTALL_CUDNN}" \
-f $(dirname ${DOCKERFILE})/Dockerfile \
-t "$tmp_tag" \
"$@" \
@ -347,6 +373,7 @@ function drun() {
}
if [[ "$OS" == "ubuntu" ]]; then
if !(drun lsb_release -a 2>&1 | grep -qF Ubuntu); then
echo "OS=ubuntu, but:"
drun lsb_release -a


@ -26,11 +26,14 @@ login() {
docker login -u AWS --password-stdin "$1"
}
# Retry on timeouts (can happen on job stampede).
retry login "${registry}"
# Logout on exit
trap "docker logout ${registry}" EXIT
# Only run these steps if not on github actions
if [[ -z "${GITHUB_ACTIONS}" ]]; then
# Retry on timeouts (can happen on job stampede).
retry login "${registry}"
# Logout on exit
trap "docker logout ${registry}" EXIT
fi
# export EC2=1
# export JENKINS=1
@ -45,8 +48,8 @@ trap "docker logout ${registry}" EXIT
docker push "${image}:${tag}"
docker save -o "${IMAGE_NAME}:${tag}.tar" "${image}:${tag}"
if [ -z "${DOCKER_SKIP_S3_UPLOAD:-}" ]; then
trap "rm -rf ${IMAGE_NAME}:${tag}.tar" EXIT
docker save -o "${IMAGE_NAME}:${tag}.tar" "${image}:${tag}"
aws s3 cp "${IMAGE_NAME}:${tag}.tar" "s3://ossci-linux-build/pytorch/base/${IMAGE_NAME}:${tag}.tar" --acl public-read
fi


@ -4,6 +4,10 @@ FROM centos:${CENTOS_VERSION}
ARG CENTOS_VERSION
# Set AMD gpu targets to build for
ARG PYTORCH_ROCM_ARCH
ENV PYTORCH_ROCM_ARCH ${PYTORCH_ROCM_ARCH}
# Install required packages to build Caffe2
# Install common dependencies (so that this step can be cached separately)
@ -11,6 +15,12 @@ ARG EC2
ADD ./common/install_base.sh install_base.sh
RUN bash ./install_base.sh && rm install_base.sh
# Update CentOS git version
RUN yum -y remove git
RUN yum -y remove git-*
RUN yum -y install https://packages.endpoint.com/rhel/7/os/x86_64/endpoint-repo-1.9-1.x86_64.rpm
RUN yum install -y git
# Install devtoolset
ARG DEVTOOLSET_VERSION
ADD ./common/install_devtoolset.sh install_devtoolset.sh
@ -27,7 +37,7 @@ RUN rm install_glibc.sh
ADD ./common/install_user.sh install_user.sh
RUN bash ./install_user.sh && rm install_user.sh
# Install conda and other packages (e.g., numpy, coverage, pytest)
# Install conda and other packages (e.g., numpy, pytest)
ENV PATH /opt/conda/bin:$PATH
ARG ANACONDA_PYTHON_VERSION
ADD ./common/install_conda.sh install_conda.sh


@ -11,8 +11,13 @@ install_ubuntu() {
# "$UBUNTU_VERSION" == "18.04"
if [[ "$UBUNTU_VERSION" == "18.04"* ]]; then
cmake3="cmake=3.10*"
maybe_libiomp_dev="libiomp-dev"
elif [[ "$UBUNTU_VERSION" == "20.04"* ]]; then
cmake3="cmake=3.16*"
maybe_libiomp_dev=""
else
cmake3="cmake=3.5*"
maybe_libiomp_dev="libiomp-dev"
fi
# Install common dependencies
@ -33,7 +38,7 @@ install_ubuntu() {
git \
libatlas-base-dev \
libc6-dbg \
libiomp-dev \
${maybe_libiomp_dev} \
libyaml-dev \
libz-dev \
libjpeg-dev \
@ -44,6 +49,10 @@ install_ubuntu() {
wget \
vim
# Should resolve issues related to various apt package repository cert issues
# see: https://github.com/pytorch/pytorch/issues/65931
apt-get install -y libgnutls30
# Cleanup package manager
apt-get autoclean && apt-get clean
rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
@ -109,10 +118,7 @@ esac
# Install Valgrind separately since the apt-get version is too old.
mkdir valgrind_build && cd valgrind_build
VALGRIND_VERSION=3.16.1
if ! wget http://valgrind.org/downloads/valgrind-${VALGRIND_VERSION}.tar.bz2
then
wget https://sourceware.org/ftp/valgrind/valgrind-${VALGRIND_VERSION}.tar.bz2
fi
wget https://ossci-linux.s3.amazonaws.com/valgrind-${VALGRIND_VERSION}.tar.bz2
tar -xjf valgrind-${VALGRIND_VERSION}.tar.bz2
cd valgrind-${VALGRIND_VERSION}
./configure --prefix=/usr/local


@ -13,7 +13,12 @@ if [ -n "$ANACONDA_PYTHON_VERSION" ]; then
CONDA_FILE="Miniconda2-latest-Linux-x86_64.sh"
;;
3)
CONDA_FILE="Miniconda3-latest-Linux-x86_64.sh"
if [ "$ANACONDA_PYTHON_VERSION" = "3.6" ]; then
# Latest release of Conda that still supports python-3.6
CONDA_FILE="Miniconda3-py37_4.10.3-Linux-x86_64.sh"
else
CONDA_FILE="Miniconda3-latest-Linux-x86_64.sh"
fi
;;
*)
echo "Unsupported ANACONDA_PYTHON_VERSION: $ANACONDA_PYTHON_VERSION"
@ -56,7 +61,9 @@ if [ -n "$ANACONDA_PYTHON_VERSION" ]; then
pushd /opt/conda
# Track latest conda update
as_jenkins conda update -y -n base conda
if [ "$ANACONDA_PYTHON_VERSION" != "3.6" ]; then
as_jenkins conda update -y -n base conda
fi
# Install correct Python version
as_jenkins conda install -y python="$ANACONDA_PYTHON_VERSION"
@ -86,14 +93,10 @@ if [ -n "$ANACONDA_PYTHON_VERSION" ]; then
conda_install numpy=1.18.5 astunparse pyyaml mkl mkl-include setuptools cffi future six dataclasses typing_extensions
fi
if [[ "$CUDA_VERSION" == 10.2* ]]; then
conda_install magma-cuda102 -c pytorch
elif [[ "$CUDA_VERSION" == 11.0* ]]; then
conda_install magma-cuda110 -c pytorch
elif [[ "$CUDA_VERSION" == 11.1* ]]; then
conda_install magma-cuda111 -c pytorch
elif [[ "$CUDA_VERSION" == 11.3* ]]; then
conda_install magma-cuda113 -c pytorch
# Magma package names are concatenation of CUDA major and minor ignoring revision
# I.e. magma-cuda102 package corresponds to CUDA_VERSION=10.2 and CUDA_VERSION=10.2.89
if [ -n "$CUDA_VERSION" ]; then
conda_install magma-cuda$(TMP=${CUDA_VERSION/./};echo ${TMP%.*[0-9]}) -c pytorch
fi
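
The replacement one-liner derives the magma package from `CUDA_VERSION` by dropping the dot and the revision, instead of hard-coding one branch per CUDA release. A minimal Python restatement of that mapping (the function name is ours, for illustration):

```python
def magma_package(cuda_version: str) -> str:
    # Keep only major and minor: 10.2.89 -> magma-cuda102, 11.3 -> magma-cuda113
    major, minor = cuda_version.split(".")[:2]
    return f"magma-cuda{major}{minor}"

assert magma_package("10.2.89") == "magma-cuda102"
assert magma_package("11.3") == "magma-cuda113"
assert magma_package("11.5.0") == "magma-cuda115"
```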
# TODO: This isn't working atm
@ -103,14 +106,12 @@ if [ -n "$ANACONDA_PYTHON_VERSION" ]; then
# TODO: Why is scipy pinned
# Pin MyPy version because new errors are likely to appear with each release
# Pin hypothesis to avoid flakiness: https://github.com/pytorch/pytorch/issues/31136
# Pin coverage so we can use COVERAGE_RCFILE
as_jenkins pip install --progress-bar off pytest \
scipy==$SCIPY_VERSION \
scikit-image \
psutil \
unittest-xml-reporting \
boto3==1.16.34 \
coverage==5.5 \
hypothesis==4.53.2 \
expecttest==0.1.3 \
mypy==0.812 \


@ -0,0 +1,10 @@
#!/bin/bash
sudo apt-get update
# also install ssh to avoid error of:
# --------------------------------------------------------------------------
# The value of the MCA parameter "plm_rsh_agent" was set to a path
# that could not be found:
# plm_rsh_agent: ssh : rsh
sudo apt-get install -y ssh
sudo apt-get update && apt-get install -y --no-install-recommends libcudnn8=8.2.0.53-1+cuda11.3 libcudnn8-dev=8.2.0.53-1+cuda11.3 && apt-mark hold libcudnn8


@ -7,15 +7,18 @@ if [ -n "$GCC_VERSION" ]; then
# Need the official toolchain repo to get alternate packages
add-apt-repository ppa:ubuntu-toolchain-r/test
apt-get update
if [ "$UBUNTU_VERSION" = "16.04" -a "$GCC_VERSION" = "5" ]; then
if [[ "$UBUNTU_VERSION" == "16.04" && "${GCC_VERSION:0:1}" == "5" ]]; then
apt-get install -y g++-5=5.4.0-6ubuntu1~16.04.12
update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-5 50
update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-5 50
update-alternatives --install /usr/bin/gcov gcov /usr/bin/gcov-5 50
else
apt-get install -y g++-$GCC_VERSION
update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-"$GCC_VERSION" 50
update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-"$GCC_VERSION" 50
update-alternatives --install /usr/bin/gcov gcov /usr/bin/gcov-"$GCC_VERSION" 50
fi
update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-"$GCC_VERSION" 50
update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-"$GCC_VERSION" 50
update-alternatives --install /usr/bin/gcov gcov /usr/bin/gcov-"$GCC_VERSION" 50
# Cleanup package manager
apt-get autoclean && apt-get clean


@ -4,7 +4,7 @@ set -ex
OPENSSL=openssl-1.1.1k
wget -q -O "${OPENSSL}.tar.gz" "https://www.openssl.org/source/${OPENSSL}.tar.gz"
wget -q -O "${OPENSSL}.tar.gz" "https://ossci-linux.s3.amazonaws.com/${OPENSSL}.tar.gz"
tar xf "${OPENSSL}.tar.gz"
cd "${OPENSSL}"
./config --prefix=/opt/openssl -d '-Wl,--enable-new-dtags,-rpath,$(LIBRPATH)'


@ -4,22 +4,27 @@ set -ex
install_magma() {
# "install" hipMAGMA into /opt/rocm/magma by copying after build
git clone https://bitbucket.org/icl/magma.git -b magma_ctrl_launch_bounds
git clone https://bitbucket.org/icl/magma.git
pushd magma
# The branch "magma_ctrl_launch_bounds" has a fix on top of the commit below, so the comment is kept for reference.
#git checkout 878b1ce02e9cfe4a829be22c8f911e9c0b6bd88f
# Work around non-ascii characters in certain magma sources; remove this after upstream magma fixes this.
perl -i.bak -pe 's/[^[:ascii:]]//g' sparse/control/magma_zfree.cpp
perl -i.bak -pe 's/[^[:ascii:]]//g' sparse/control/magma_zsolverinfo.cpp
# fix for magma_queue memory leak issue
git checkout c62d700d880c7283b33fb1d615d62fc9c7f7ca21
cp make.inc-examples/make.inc.hip-gcc-mkl make.inc
echo 'LIBDIR += -L$(MKLROOT)/lib' >> make.inc
echo 'LIB += -Wl,--enable-new-dtags -Wl,--rpath,/opt/rocm/lib -Wl,--rpath,$(MKLROOT)/lib -Wl,--rpath,/opt/rocm/magma/lib' >> make.inc
echo 'DEVCCFLAGS += --amdgpu-target=gfx803 --amdgpu-target=gfx900 --amdgpu-target=gfx906 --amdgpu-target=gfx908 --gpu-max-threads-per-block=256' >> make.inc
echo 'DEVCCFLAGS += --gpu-max-threads-per-block=256' >> make.inc
export PATH="${PATH}:/opt/rocm/bin"
if [[ -n "$PYTORCH_ROCM_ARCH" ]]; then
amdgpu_targets=`echo $PYTORCH_ROCM_ARCH | sed 's/;/ /g'`
else
amdgpu_targets=`rocm_agent_enumerator | grep -v gfx000 | sort -u | xargs`
fi
for arch in $amdgpu_targets; do
echo "DEVCCFLAGS += --amdgpu-target=$arch" >> make.inc
done
# hipcc with openmp flag may cause isnan() on __device__ not to be found; depending on context, compiler may attempt to match with host definition
sed -i 's/^FOPENMP/#FOPENMP/g' make.inc
export PATH="${PATH}:/opt/rocm/bin"
make -f make.gen.hipMAGMA -j $(nproc)
make lib/libmagma.so -j $(nproc) MKLROOT=/opt/conda
LANG=C.UTF-8 make lib/libmagma.so -j $(nproc) MKLROOT=/opt/conda
make testing/testing_dgemm -j $(nproc) MKLROOT=/opt/conda
popd
mv magma /opt/rocm
@ -35,6 +40,10 @@ install_ubuntu() {
# gpg-agent is not available by default on 18.04
apt-get install -y --no-install-recommends gpg-agent
fi
if [[ $UBUNTU_VERSION == 20.04 ]]; then
# gpg-agent is not available by default on 20.04
apt-get install -y --no-install-recommends gpg-agent
fi
apt-get install -y kmod
apt-get install -y wget


@ -0,0 +1,7 @@
#!/bin/bash
if [ -n "$TENSORRT_VERSION" ]; then
python3 -m pip install --upgrade setuptools pip
python3 -m pip install nvidia-pyindex
python3 -m pip install nvidia-tensorrt==${TENSORRT_VERSION} --extra-index-url https://pypi.ngc.nvidia.com
fi


@ -1,13 +1,15 @@
ARG UBUNTU_VERSION
ARG CUDA_VERSION
ARG CUDNN_VERSION
ARG IMAGE_NAME
FROM nvidia/cuda:${CUDA_VERSION}-cudnn${CUDNN_VERSION}-devel-ubuntu${UBUNTU_VERSION}
FROM ${IMAGE_NAME}
ARG UBUNTU_VERSION
ARG CUDA_VERSION
ARG CUDNN_VERSION
ENV DEBIAN_FRONTEND noninteractive
# Install common dependencies (so that this step can be cached separately)
@ -24,7 +26,7 @@ ARG KATEX
ADD ./common/install_katex.sh install_katex.sh
RUN bash ./install_katex.sh && rm install_katex.sh
# Install conda and other packages (e.g., numpy, coverage, pytest)
# Install conda and other packages (e.g., numpy, pytest)
ENV PATH /opt/conda/bin:$PATH
ARG ANACONDA_PYTHON_VERSION
ADD ./common/install_conda.sh install_conda.sh
@ -65,6 +67,12 @@ ADD ./common/install_openssl.sh install_openssl.sh
ENV OPENSSL_ROOT_DIR /opt/openssl
RUN bash ./install_openssl.sh
# (optional) Install TensorRT
ARG TENSORRT_VERSION
ADD ./common/install_tensorrt.sh install_tensorrt.sh
RUN if [ -n "${TENSORRT_VERSION}" ]; then bash ./install_tensorrt.sh; fi
RUN rm install_tensorrt.sh
# (optional) Install non-default CMake version
ARG CMAKE_VERSION
ADD ./common/install_cmake.sh install_cmake.sh
@ -75,7 +83,7 @@ RUN rm install_cmake.sh
ADD ./common/install_cache.sh install_cache.sh
ENV PATH /opt/cache/bin:$PATH
RUN bash ./install_cache.sh && rm install_cache.sh
ENV CUDA_NVCC_EXECUTABLE=/opt/cache/lib/nvcc
ENV CMAKE_CUDA_COMPILER_LAUNCHER=/opt/cache/bin/sccache
# Add jni.h for java host build
ADD ./common/install_jni.sh install_jni.sh
@ -94,9 +102,17 @@ ENV BUILD_ENVIRONMENT ${BUILD_ENVIRONMENT}
# AWS specific CUDA build guidance
ENV TORCH_CUDA_ARCH_LIST Maxwell
ENV TORCH_NVCC_FLAGS "-Xfatbin -compress-all"
ENV CUDA_PATH /usr/local/cuda
# Install LLVM dev version (Defined in the pytorch/builder github repository)
COPY --from=pytorch/llvm:9.0.1 /opt/llvm /opt/llvm
# Hack for CUDA 11.5.0 image to install cudnn8 since cudnn8 is not included with CUDA 11.5 image
# Also note cudnn 8.2.0.53 is labeled for cuda 11.3
ARG INSTALL_CUDNN
ADD ./common/install_cudnn8.sh install_cudnn8.sh
RUN if [ -n "${INSTALL_CUDNN}" ]; then bash install_cudnn8.sh; fi
RUN rm install_cudnn8.sh
USER jenkins
CMD ["bash"]


@ -6,6 +6,10 @@ ARG UBUNTU_VERSION
ENV DEBIAN_FRONTEND noninteractive
# Set AMD gpu targets to build for
ARG PYTORCH_ROCM_ARCH
ENV PYTORCH_ROCM_ARCH ${PYTORCH_ROCM_ARCH}
# Install common dependencies (so that this step can be cached separately)
ARG EC2
ADD ./common/install_base.sh install_base.sh
@ -21,7 +25,7 @@ RUN bash ./install_clang.sh && rm install_clang.sh
ADD ./common/install_user.sh install_user.sh
RUN bash ./install_user.sh && rm install_user.sh
# Install conda and other packages (e.g., numpy, coverage, pytest)
# Install conda and other packages (e.g., numpy, pytest)
ENV PATH /opt/conda/bin:$PATH
ARG ANACONDA_PYTHON_VERSION
ADD ./common/install_conda.sh install_conda.sh


@ -33,7 +33,7 @@ ARG KATEX
ADD ./common/install_katex.sh install_katex.sh
RUN bash ./install_katex.sh && rm install_katex.sh
# Install conda and other packages (e.g., numpy, coverage, pytest)
# Install conda and other packages (e.g., numpy, pytest)
ENV PATH /opt/conda/bin:$PATH
ARG ANACONDA_PYTHON_VERSION
ADD ./common/install_conda.sh install_conda.sh


@ -11,17 +11,11 @@ import sys
from collections import namedtuple
import cimodel.data.binary_build_definitions as binary_build_definitions
import cimodel.data.pytorch_build_definitions as pytorch_build_definitions
import cimodel.data.simple.android_definitions
import cimodel.data.simple.binary_smoketest
import cimodel.data.simple.docker_definitions
import cimodel.data.simple.ios_definitions
import cimodel.data.simple.macos_definitions
import cimodel.data.simple.mobile_definitions
import cimodel.data.simple.nightly_android
import cimodel.data.simple.nightly_ios
import cimodel.data.simple.anaconda_prune_defintions
import cimodel.data.windows_build_definitions as windows_build_definitions
import cimodel.lib.miniutils as miniutils
import cimodel.lib.miniyaml as miniyaml
@ -78,15 +72,15 @@ class Header(object):
for line in filter(None, lines):
output_filehandle.write(line + "\n")
def filter_master_only_jobs(items):
def _for_all_items(items, functor) -> None:
if isinstance(items, list):
for item in items:
_for_all_items(item, functor)
if isinstance(items, dict) and len(items) == 1:
item_type, item = next(iter(items.items()))
functor(item_type, item)
def _for_all_items(items, functor) -> None:
if isinstance(items, list):
for item in items:
_for_all_items(item, functor)
if isinstance(items, dict) and len(items) == 1:
item_type, item = next(iter(items.items()))
functor(item_type, item)
def filter_master_only_jobs(items):
def _is_master_item(item):
filters = item.get('filters', None)
branches = filters.get('branches', None) if filters is not None else None
@ -124,24 +118,37 @@ def filter_master_only_jobs(items):
_for_all_items(items, _save_requires_if_master)
return _do_filtering(items)
def generate_required_docker_images(items):
required_docker_images = set()
def _requires_docker_image(item_type, item):
requires = item.get('requires', None)
if not isinstance(requires, list):
return
for requirement in requires:
requirement = requirement.replace('"', '')
if requirement.startswith('docker-'):
required_docker_images.add(requirement)
_for_all_items(items, _requires_docker_image)
return required_docker_images
def gen_build_workflows_tree():
build_workflows_functions = [
cimodel.data.simple.docker_definitions.get_workflow_jobs,
pytorch_build_definitions.get_workflow_jobs,
cimodel.data.simple.macos_definitions.get_workflow_jobs,
cimodel.data.simple.android_definitions.get_workflow_jobs,
cimodel.data.simple.ios_definitions.get_workflow_jobs,
cimodel.data.simple.mobile_definitions.get_workflow_jobs,
cimodel.data.simple.binary_smoketest.get_workflow_jobs,
cimodel.data.simple.nightly_ios.get_workflow_jobs,
cimodel.data.simple.nightly_android.get_workflow_jobs,
cimodel.data.simple.anaconda_prune_defintions.get_workflow_jobs,
windows_build_definitions.get_windows_workflows,
binary_build_definitions.get_post_upload_jobs,
binary_build_definitions.get_binary_smoke_test_jobs,
]
build_jobs = [f() for f in build_workflows_functions]
build_jobs.extend(
cimodel.data.simple.docker_definitions.get_workflow_jobs(
# sort for consistency
sorted(generate_required_docker_images(build_jobs))
)
)
master_build_jobs = filter_master_only_jobs(build_jobs)
binary_build_functions = [
@ -150,11 +157,6 @@ def gen_build_workflows_tree():
binary_build_definitions.get_nightly_uploads,
]
slow_gradcheck_jobs = [
pytorch_build_definitions.get_workflow_jobs,
cimodel.data.simple.docker_definitions.get_workflow_jobs,
]
return {
"workflows": {
"binary_builds": {
@ -169,10 +171,6 @@ def gen_build_workflows_tree():
"when": r"<< pipeline.parameters.run_master_build >>",
"jobs": master_build_jobs,
},
"slow_gradcheck_build": {
"when": r"<< pipeline.parameters.run_slow_gradcheck_build >>",
"jobs": [f(only_slow_gradcheck=True) for f in slow_gradcheck_jobs],
},
}
}
@ -196,7 +194,6 @@ YAML_SOURCES = [
File("job-specs/docker_jobs.yml"),
Header("Workflows"),
Treegen(gen_build_workflows_tree, 0),
File("workflows/workflows-scheduled-ci.yml"),
File("workflows/workflows-ecr-gc.yml"),
File("workflows/workflows-promote.yml"),
]
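
The refactor above hoists `_for_all_items` so that both the master-only filter and the new `generate_required_docker_images` can walk the same nested list/dict workflow tree, and docker build jobs are now generated only for images some job actually requires. A self-contained sketch of the two helpers on a toy tree (the sample job names are made up):

```python
def _for_all_items(items, functor) -> None:
    # Recurse through nested lists; apply functor to every single-entry dict.
    if isinstance(items, list):
        for item in items:
            _for_all_items(item, functor)
    if isinstance(items, dict) and len(items) == 1:
        item_type, item = next(iter(items.items()))
        functor(item_type, item)

def generate_required_docker_images(items):
    required = set()
    def _requires_docker_image(item_type, item):
        requires = item.get("requires", None)
        if not isinstance(requires, list):
            return
        for requirement in requires:
            requirement = requirement.replace('"', '')
            if requirement.startswith("docker-"):
                required.add(requirement)
    _for_all_items(items, _requires_docker_image)
    return required

jobs = [
    {"pytorch_linux_build": {"requires": ['"docker-pytorch-linux-xenial-py3.7-gcc5.4"']}},
    [{"pytorch_linux_test": {"requires": ["pytorch_linux_build"]}}],
]
assert generate_required_docker_images(jobs) == {"docker-pytorch-linux-xenial-py3.7-gcc5.4"}
```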


@ -61,7 +61,7 @@ git --no-pager log --max-count 1
popd
# Clone the Builder master repo
retry git clone -q https://github.com/pytorch/builder.git -b release/1.10 "$BUILDER_ROOT"
retry git clone -q https://github.com/pytorch/builder.git "$BUILDER_ROOT"
pushd "$BUILDER_ROOT"
echo "Using builder from "
git --no-pager log --max-count 1


@ -27,4 +27,4 @@ if ! [ -x "$(command -v xcodebuild)" ]; then
exit 1
fi
PROFILE=PyTorch_CI_2022
ruby ${PROJ_ROOT}/scripts/xcode_build.rb -i ${PROJ_ROOT}/build_ios/install -x ${PROJ_ROOT}/ios/TestApp/TestApp.xcodeproj -p ${IOS_PLATFORM} -c ${PROFILE} -t ${IOS_DEV_TEAM_ID} -f Accelerate,MetalPerformanceShaders,CoreML
ruby ${PROJ_ROOT}/scripts/xcode_build.rb -i ${PROJ_ROOT}/build_ios/install -x ${PROJ_ROOT}/ios/TestApp/TestApp.xcodeproj -p ${IOS_PLATFORM} -c ${PROFILE} -t ${IOS_DEV_TEAM_ID}


@ -23,14 +23,23 @@ do
fi
done
lipo -i ${ZIP_DIR}/install/lib/*.a
echo "BUILD_LITE_INTERPRETER: ${BUILD_LITE_INTERPRETER}"
# copy the umbrella header and license
cp ${PROJ_ROOT}/ios/LibTorch-Lite.h ${ZIP_DIR}/src/
if [ "${BUILD_LITE_INTERPRETER}" == "1" ]; then
cp ${PROJ_ROOT}/ios/LibTorch-Lite.h ${ZIP_DIR}/src/
else
cp ${PROJ_ROOT}/ios/LibTorch.h ${ZIP_DIR}/src/
fi
cp ${PROJ_ROOT}/LICENSE ${ZIP_DIR}/
# zip the library
export DATE="$(date -u +%Y%m%d)"
export IOS_NIGHTLY_BUILD_VERSION="1.10.0.${DATE}"
# libtorch_lite_ios_nightly_1.10.0.20210810.zip
ZIPFILE="libtorch_lite_ios_nightly_${IOS_NIGHTLY_BUILD_VERSION}.zip"
export IOS_NIGHTLY_BUILD_VERSION="1.11.0.${DATE}"
if [ "${BUILD_LITE_INTERPRETER}" == "1" ]; then
# libtorch_lite_ios_nightly_1.11.0.20210810.zip
ZIPFILE="libtorch_lite_ios_nightly_${IOS_NIGHTLY_BUILD_VERSION}.zip"
else
ZIPFILE="libtorch_ios_nightly_build.zip"
fi
cd ${ZIP_DIR}
#for testing
touch version.txt
@ -52,13 +61,15 @@ set +x
# echo "AWS SECRET: ${AWS_SECRET_ACCESS_KEY}"
aws s3 cp ${ZIPFILE} s3://ossci-ios-build/ --acl public-read
# create a new LibTorch-Lite-Nightly.podspec from the template
echo "cp ${PROJ_ROOT}/ios/LibTorch-Lite-Nightly.podspec.template ${PROJ_ROOT}/ios/LibTorch-Lite-Nightly.podspec"
cp ${PROJ_ROOT}/ios/LibTorch-Lite-Nightly.podspec.template ${PROJ_ROOT}/ios/LibTorch-Lite-Nightly.podspec
if [ "${BUILD_LITE_INTERPRETER}" == "1" ]; then
# create a new LibTorch-Lite-Nightly.podspec from the template
echo "cp ${PROJ_ROOT}/ios/LibTorch-Lite-Nightly.podspec.template ${PROJ_ROOT}/ios/LibTorch-Lite-Nightly.podspec"
cp ${PROJ_ROOT}/ios/LibTorch-Lite-Nightly.podspec.template ${PROJ_ROOT}/ios/LibTorch-Lite-Nightly.podspec
# update pod version
sed -i '' -e "s/IOS_NIGHTLY_BUILD_VERSION/${IOS_NIGHTLY_BUILD_VERSION}/g" ${PROJ_ROOT}/ios/LibTorch-Lite-Nightly.podspec
cat ${PROJ_ROOT}/ios/LibTorch-Lite-Nightly.podspec
# update pod version
sed -i '' -e "s/IOS_NIGHTLY_BUILD_VERSION/${IOS_NIGHTLY_BUILD_VERSION}/g" ${PROJ_ROOT}/ios/LibTorch-Lite-Nightly.podspec
cat ${PROJ_ROOT}/ios/LibTorch-Lite-Nightly.podspec
# push the new LibTorch-Lite-Nightly.podspec to CocoaPods
pod trunk push --verbose --allow-warnings --use-libraries --skip-import-validation ${PROJ_ROOT}/ios/LibTorch-Lite-Nightly.podspec
# push the new LibTorch-Lite-Nightly.podspec to CocoaPods
pod trunk push --verbose --allow-warnings --use-libraries --skip-import-validation ${PROJ_ROOT}/ios/LibTorch-Lite-Nightly.podspec
fi
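
For reference, the nightly artifact name chosen earlier in this script is just a date-stamped semver string gated on the interpreter flavor; a small sketch of the naming rule (hypothetical helper, mirroring the exports above):

```python
from datetime import datetime, timezone

def nightly_zip_name(build_lite_interpreter: bool) -> str:
    # export DATE="$(date -u +%Y%m%d)"; IOS_NIGHTLY_BUILD_VERSION="1.11.0.${DATE}"
    date = datetime.now(timezone.utc).strftime("%Y%m%d")
    if build_lite_interpreter:
        return f"libtorch_lite_ios_nightly_1.11.0.{date}.zip"
    return "libtorch_ios_nightly_build.zip"

# e.g. libtorch_lite_ios_nightly_1.11.0.20211228.zip
```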


@ -11,7 +11,7 @@ NUM_CPUS=$(( $(nproc) - 2 ))
# Defaults here for **binary** linux builds so they can be changed in one place
export MAX_JOBS=${MAX_JOBS:-$(( ${NUM_CPUS} > ${MEMORY_LIMIT_MAX_JOBS} ? ${MEMORY_LIMIT_MAX_JOBS} : ${NUM_CPUS} ))}
if [[ "${DESIRED_CUDA}" == "cu111" || "${DESIRED_CUDA}" == "cu113" ]]; then
if [[ "${DESIRED_CUDA}" =~ cu11[0-9] ]]; then
export BUILD_SPLIT_CUDA="ON"
fi
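
The new condition is an unanchored regex rather than an explicit list, so every cu110 through cu119 wheel now enables split-CUDA builds. A quick Python check of which `DESIRED_CUDA` tags match:

```python
import re

pattern = re.compile(r"cu11[0-9]")  # bash: [[ "${DESIRED_CUDA}" =~ cu11[0-9] ]]
for tag in ["cu102", "cu111", "cu113", "cu115"]:
    print(tag, "BUILD_SPLIT_CUDA=ON" if pattern.search(tag) else "off")
# cu102 -> off; cu111, cu113, cu115 -> BUILD_SPLIT_CUDA=ON
```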


@ -30,7 +30,7 @@ if [[ "\$python_nodot" = *39* ]]; then
NUMPY_PIN=">=1.20"
fi
if [[ "$DESIRED_CUDA" == "cu112" ]]; then
if [[ "$DESIRED_CUDA" == "cu112" || "$DESIRED_CUDA" == "cu115" ]]; then
EXTRA_CONDA_FLAGS="-c=conda-forge"
fi


@ -85,7 +85,7 @@ PIP_UPLOAD_FOLDER='nightly/'
# We put this here so that OVERRIDE_PACKAGE_VERSION below can read from it
export DATE="$(date -u +%Y%m%d)"
#TODO: We should be pulling semver version from the base version.txt
BASE_BUILD_VERSION="1.10.0.dev$DATE"
BASE_BUILD_VERSION="1.11.0.dev$DATE"
# Change BASE_BUILD_VERSION to git tag when on a git tag
# Use 'git -C' to make doubly sure we're in the correct directory for checking
# the git tag
@ -148,7 +148,7 @@ if [[ "${BUILD_FOR_SYSTEM:-}" == "windows" ]]; then
fi
export DATE="$DATE"
export NIGHTLIES_DATE_PREAMBLE=1.10.0.dev
export NIGHTLIES_DATE_PREAMBLE=1.11.0.dev
export PYTORCH_BUILD_VERSION="$PYTORCH_BUILD_VERSION"
export PYTORCH_BUILD_NUMBER="$PYTORCH_BUILD_NUMBER"
export OVERRIDE_PACKAGE_VERSION="$PYTORCH_BUILD_VERSION"


@ -65,7 +65,6 @@ cp torch/_utils_internal.py tools/shared
# Generate PyTorch files
time python tools/setup_helpers/generate_code.py \
--declarations-path build/aten/src/ATen/Declarations.yaml \
--native-functions-path aten/src/ATen/native/native_functions.yaml \
--nn-path aten/src/
@ -97,8 +96,12 @@ git status
git config user.email "soumith+bot@pytorch.org"
git config user.name "pytorchbot"
# If there aren't changes, don't make a commit; push is no-op
git commit -m "Generate C++ docs from pytorch/pytorch@$CIRCLE_SHA1" || true
git commit -m "Generate C++ docs from pytorch/pytorch@${GITHUB_SHA}" || true
git status
if [[ "${WITH_PUSH:-}" == true ]]; then
git push -u origin
fi
popd
# =================== The above code **should** be executed inside Docker container ===================


@ -131,8 +131,12 @@ git status
git config user.email "soumith+bot@pytorch.org"
git config user.name "pytorchbot"
# If there aren't changes, don't make a commit; push is no-op
git commit -m "Generate Python docs from pytorch/pytorch@$CIRCLE_SHA1" || true
git commit -m "Generate Python docs from pytorch/pytorch@${GITHUB_SHA}" || true
git status
if [[ "${WITH_PUSH:-}" == true ]]; then
git push -u origin "${branch}"
fi
popd
# =================== The above code **should** be executed inside Docker container ===================


@ -32,7 +32,7 @@ if ! command -v aws >/dev/null; then
fi
if [ -n "${USE_CUDA_DOCKER_RUNTIME:-}" ]; then
DRIVER_FN="NVIDIA-Linux-x86_64-460.39.run"
DRIVER_FN="NVIDIA-Linux-x86_64-495.44.run"
wget "https://s3.amazonaws.com/ossci-linux/nvidia_driver/$DRIVER_FN"
sudo /bin/bash "$DRIVER_FN" -s --no-drm || (sudo cat /var/log/nvidia-installer.log && false)
nvidia-smi


@ -11,13 +11,17 @@ case ${CUDA_VERSION} in
cuda_install_packages="nvcc_10.2 cuobjdump_10.2 nvprune_10.2 cupti_10.2 cublas_10.2 cublas_dev_10.2 cudart_10.2 cufft_10.2 cufft_dev_10.2 curand_10.2 curand_dev_10.2 cusolver_10.2 cusolver_dev_10.2 cusparse_10.2 cusparse_dev_10.2 nvgraph_10.2 nvgraph_dev_10.2 npp_10.2 npp_dev_10.2 nvrtc_10.2 nvrtc_dev_10.2 nvml_dev_10.2"
;;
11.1)
cuda_installer_name="cuda_11.1.0_456.43_win10"
cuda_installer_name="cuda_11.1.1_456.81_win10"
cuda_install_packages="nvcc_11.1 cuobjdump_11.1 nvprune_11.1 nvprof_11.1 cupti_11.1 cublas_11.1 cublas_dev_11.1 cudart_11.1 cufft_11.1 cufft_dev_11.1 curand_11.1 curand_dev_11.1 cusolver_11.1 cusolver_dev_11.1 cusparse_11.1 cusparse_dev_11.1 npp_11.1 npp_dev_11.1 nvrtc_11.1 nvrtc_dev_11.1 nvml_dev_11.1"
;;
11.3)
cuda_installer_name="cuda_11.3.0_465.89_win10"
cuda_install_packages="thrust_11.3 nvcc_11.3 cuobjdump_11.3 nvprune_11.3 nvprof_11.3 cupti_11.3 cublas_11.3 cublas_dev_11.3 cudart_11.3 cufft_11.3 cufft_dev_11.3 curand_11.3 curand_dev_11.3 cusolver_11.3 cusolver_dev_11.3 cusparse_11.3 cusparse_dev_11.3 npp_11.3 npp_dev_11.3 nvrtc_11.3 nvrtc_dev_11.3 nvml_dev_11.3"
;;
11.5)
cuda_installer_name="cuda_11.5.0_496.13_win10"
cuda_install_packages="thrust_11.5 nvcc_11.5 cuobjdump_11.5 nvprune_11.5 nvprof_11.5 cupti_11.5 cublas_11.5 cublas_dev_11.5 cudart_11.5 cufft_11.5 cufft_dev_11.5 curand_11.5 curand_dev_11.5 cusolver_11.5 cusolver_dev_11.5 cusparse_11.5 cusparse_dev_11.5 npp_11.5 npp_dev_11.5 nvrtc_11.5 nvrtc_dev_11.5 nvml_dev_11.5"
;;
*)
echo "CUDA_VERSION $CUDA_VERSION is not supported yet"
exit 1


@ -19,6 +19,9 @@ case ${CUDA_VERSION} in
11.3)
archive_version="v8.2.0.53"
;;
11.5)
archive_version="v8.2.0.53"
;;
*)
echo "CUDA_VERSION: ${CUDA_VERSION} not supported yet"
exit 1


@ -26,24 +26,6 @@ pytorch_params: &pytorch_params
CI_MASTER: << pipeline.parameters.run_master_build >>
resource_class: << parameters.resource_class >>
pytorch_android_params: &pytorch_android_params
parameters:
build_environment:
type: string
default: ""
op_list:
type: string
default: ""
lite_interpreter:
type: string
default: "1"
environment:
BUILD_ENVIRONMENT: pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c"
PYTHON_VERSION: "3.6"
SELECTED_OP_LIST: << parameters.op_list >>
BUILD_LITE_INTERPRETER: << parameters.lite_interpreter >>
pytorch_ios_params: &pytorch_ios_params
parameters:
build_environment:


@ -43,7 +43,8 @@
set -ex
export COMMIT_DOCKER_IMAGE=${DOCKER_IMAGE}:build-${DOCKER_TAG}-${CIRCLE_SHA1}
echo "DOCKER_IMAGE: "${COMMIT_DOCKER_IMAGE}
tag=${CIRCLE_TAG:1:5}
# turn v1.12.0rc3 into 1.12.0
tag=$(echo $CIRCLE_TAG | sed -e 's/v*\([0-9.]*\).*/\1/')
target=${tag:-master}
echo "building for ${target}"
time docker pull ${COMMIT_DOCKER_IMAGE} >/dev/null
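
The new `sed` expression replaces the old fixed-width slice `${CIRCLE_TAG:1:5}`, which could not handle tags like `v1.12.0rc3`. A Python restatement of the same substitution:

```python
import re

def docs_target(circle_tag: str) -> str:
    # sed -e 's/v*\([0-9.]*\).*/\1/': strip a leading "v" and any rc suffix
    tag = re.sub(r"v*([0-9.]*).*", r"\1", circle_tag)
    return tag or "master"   # target=${tag:-master}

assert docs_target("v1.12.0rc3") == "1.12.0"
assert docs_target("v1.10.2") == "1.10.2"
```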
@ -88,6 +89,8 @@
set -ex
export COMMIT_DOCKER_IMAGE=${DOCKER_IMAGE}:build-${DOCKER_TAG}-${CIRCLE_SHA1}
echo "DOCKER_IMAGE: "${COMMIT_DOCKER_IMAGE}
# turn v1.12.0rc3 into 1.12.0
tag=$(echo $CIRCLE_TAG | sed -e 's/v*\([0-9.]*\).*/\1/')
tag=${CIRCLE_TAG:1:5}
target=${tag:-master}
echo "building for ${target}"
@ -210,7 +213,7 @@
command: |
set -ex
source /Users/distiller/workspace/miniconda3/bin/activate
pip install boto3
python3 -m pip install boto3==1.19.12
export IN_CI=1
export JOB_BASE_NAME=$CIRCLE_JOB
@ -413,43 +416,6 @@
path: ~/workspace/build_android_x86_32_artifacts/artifacts.tgz
destination: artifacts.tgz
pytorch_android_gradle_custom_build_single:
<<: *pytorch_android_params
resource_class: large
machine:
image: ubuntu-2004:202104-01
steps:
- checkout
- calculate_docker_image_tag
- setup_linux_system_environment
- checkout
- calculate_docker_image_tag
- setup_ci_environment
- run:
name: pytorch android gradle custom build single architecture (for PR)
no_output_timeout: "1h"
command: |
set -e
# Unlike other gradle jobs, it's not worth building libtorch in a separate CI job and sharing it via docker, because:
# 1) Not shareable: it's custom selective build, which is different from default libtorch mobile build;
# 2) Not parallelizable by architecture: it only builds libtorch for one architecture;
echo "DOCKER_IMAGE: ${DOCKER_IMAGE}:${DOCKER_TAG}"
time docker pull ${DOCKER_IMAGE}:${DOCKER_TAG} >/dev/null
git submodule sync && git submodule update -q --init --recursive --depth 1 --jobs 0
VOLUME_MOUNTS="-v /home/circleci/project/:/var/lib/jenkins/workspace"
export id=$(docker run --env-file "${BASH_ENV}" ${VOLUME_MOUNTS} --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins ${DOCKER_IMAGE}:${DOCKER_TAG})
export COMMAND='((echo "export GRADLE_OFFLINE=1" && echo "export BUILD_LITE_INTERPRETER=${BUILD_LITE_INTERPRETER}" && echo "sudo chown -R jenkins workspace && cd workspace && ./.circleci/scripts/build_android_gradle.sh") | docker exec -u jenkins -i "$id" bash) 2>&1'
echo ${COMMAND} > ./command.sh && unbuffer bash ./command.sh | ts
# Skip docker push as this job is purely for size analysis purpose.
# Result binaries are already in `/home/circleci/project/` as it's mounted instead of copied.
- upload_binary_size_for_android_build:
build_type: custom-build-single
pytorch_ios_build:
<<: *pytorch_ios_params
macos:
@ -518,6 +484,7 @@
echo "IOS_PLATFORM: ${IOS_PLATFORM}"
echo "USE_PYTORCH_METAL": "${USE_METAL}"
echo "BUILD_LITE_INTERPRETER": "${BUILD_LITE_INTERPRETER}"
echo "USE_COREML_DELEGATE": "${USE_COREML_DELEGATE}"
#check the custom build flag
echo "SELECTED_OP_LIST: ${SELECTED_OP_LIST}"
@ -526,6 +493,7 @@
fi
export IOS_ARCH=${IOS_ARCH}
export IOS_PLATFORM=${IOS_PLATFORM}
export USE_COREML_DELEGATE=${USE_COREML_DELEGATE}
if [ ${IOS_PLATFORM} != "SIMULATOR" ]; then
export USE_PYTORCH_METAL=${USE_METAL}
fi
@ -565,20 +533,32 @@
PROJ_ROOT=/Users/distiller/project
source ~/anaconda/bin/activate
# use the pytorch nightly build to generate models
conda install pytorch torchvision -c pytorch-nightly --yes
pip3 install --pre torch torchvision torchaudio -f https://download.pytorch.org/whl/nightly/cpu/torch_nightly.html
# generate models for different backends
cd ${PROJ_ROOT}/ios/TestApp/benchmark
mkdir -p ../models
python trace_model.py
if [ ${USE_COREML_DELEGATE} == 1 ]; then
pip install coremltools==5.0b5
pip install six
python coreml_backend.py
else
python trace_model.py
fi
if [ ${BUILD_LITE_INTERPRETER} == 1 ]; then
echo "Setting up the TestApp for LiteInterpreter"
ruby setup.rb --lite 1
else
echo "Setting up the TestApp for Full JIT"
ruby setup.rb
fi
cd ${PROJ_ROOT}/ios/TestApp
instruments -s -devices
if [ ${BUILD_LITE_INTERPRETER} == 1 ]; then
fastlane scan --only_testing TestAppTests/TestAppTests/testLiteInterpreter
if [ ${USE_COREML_DELEGATE} == 1 ]; then
fastlane scan --only_testing TestAppTests/TestAppTests/testCoreML
else
fastlane scan --only_testing TestAppTests/TestAppTests/testLiteInterpreter
fi
else
fastlane scan --only_testing TestAppTests/TestAppTests/testFullJIT
fi


@ -190,8 +190,6 @@ jobs:
if [[ ${BUILD_ENVIRONMENT} == *"multigpu"* ]]; then
echo ".jenkins/pytorch/multigpu-test.sh" >> docker_commands.sh
elif [[ ${BUILD_ENVIRONMENT} == *onnx* ]]; then
echo "pip install click mock tabulate networkx==2.0" >> docker_commands.sh
echo "pip -q install --user \"file:///var/lib/jenkins/workspace/third_party/onnx#egg=onnx\"" >> docker_commands.sh
echo ".jenkins/caffe2/test.sh" >> docker_commands.sh
else
echo ".jenkins/pytorch/test.sh" >> docker_commands.sh
@ -199,17 +197,6 @@ jobs:
echo "(cat docker_commands.sh | docker exec -u jenkins -i "$id" bash) 2>&1" > command.sh
unbuffer bash command.sh | ts
if [[ ${BUILD_ENVIRONMENT} == *"coverage"* ]]; then
echo "Retrieving C++ coverage report"
docker cp $id:/var/lib/jenkins/workspace/build/coverage.info ./test
fi
if [[ ${BUILD_ENVIRONMENT} == *"coverage"* || ${BUILD_ENVIRONMENT} == *"onnx"* ]]; then
echo "Retrieving Python coverage report"
docker cp $id:/var/lib/jenkins/workspace/test/.coverage ./test
docker cp $id:/var/lib/jenkins/workspace/test/coverage.xml ./test
python3 -mpip install codecov
python3 -mcodecov
fi
- run:
name: Report results
no_output_timeout: "5m"
@ -240,161 +227,3 @@ jobs:
when: always
- store_test_results:
path: test-reports
- store_artifacts:
path: test/.coverage
- store_artifacts:
path: test/coverage.xml
pytorch_windows_build:
<<: *pytorch_windows_params
parameters:
executor:
type: string
default: "windows-xlarge-cpu-with-nvidia-cuda"
build_environment:
type: string
default: ""
test_name:
type: string
default: ""
cuda_version:
type: string
default: "10.1"
python_version:
type: string
default: "3.8"
vs_version:
type: string
default: "16.8.6"
vc_version:
type: string
default: "14.16"
vc_year:
type: string
default: "2019"
vc_product:
type: string
default: "BuildTools"
use_cuda:
type: string
default: ""
executor: <<parameters.executor>>
steps:
- checkout
- run:
name: Install VS2019 toolchain
no_output_timeout: 10m
command: |
powershell .circleci/scripts/vs_install.ps1
- run:
name: Install Cuda
no_output_timeout: 30m
command: |
if [[ "${USE_CUDA}" == "1" ]]; then
.circleci/scripts/windows_cuda_install.sh
fi
- run:
name: Install Cudnn
command : |
if [[ "${USE_CUDA}" == "1" ]]; then
.circleci/scripts/windows_cudnn_install.sh
fi
- run:
name: Build
no_output_timeout: "90m"
command: |
set -e
set +x
export AWS_ACCESS_KEY_ID=${CIRCLECI_AWS_ACCESS_KEY_FOR_WIN_BUILD_V1}
export AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_WIN_BUILD_V1}
set -x
.jenkins/pytorch/win-build.sh
- persist_to_workspace:
root: "C:/w"
paths: build-results
- store_artifacts:
path: C:/w/build-results
pytorch_windows_test:
<<: *pytorch_windows_params
parameters:
executor:
type: string
default: "windows-medium-cpu-with-nvidia-cuda"
build_environment:
type: string
default: ""
test_name:
type: string
default: ""
cuda_version:
type: string
default: "10.1"
python_version:
type: string
default: "3.8"
vs_version:
type: string
default: "16.8.6"
vc_version:
type: string
default: "14.16"
vc_year:
type: string
default: "2019"
vc_product:
type: string
default: "BuildTools"
use_cuda:
type: string
default: ""
executor: <<parameters.executor>>
steps:
- checkout
- attach_workspace:
at: c:/users/circleci/workspace
- run:
name: Install VS2019 toolchain
no_output_timeout: 10m
command: |
powershell .circleci/scripts/vs_install.ps1
- run:
name: Install Cuda
no_output_timeout: 30m
command: |
if [[ "${CUDA_VERSION}" != "cpu" ]]; then
if [[ "${CUDA_VERSION}" != "10" || "${JOB_EXECUTOR}" != "windows-with-nvidia-gpu" ]]; then
.circleci/scripts/windows_cuda_install.sh
fi
fi
- run:
name: Install Cudnn
command : |
if [[ "${CUDA_VERSION}" != "cpu" ]]; then
.circleci/scripts/windows_cudnn_install.sh
fi
- run:
name: Test
no_output_timeout: "30m"
command: |
set -e
export IN_CI=1
set +x
export AWS_ACCESS_KEY_ID=${CIRCLECI_AWS_ACCESS_KEY_FOR_WIN_BUILD_V1}
export AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_WIN_BUILD_V1}
set -x
.jenkins/pytorch/win-test.sh
- run:
name: Report results
no_output_timeout: "5m"
command: |
set -ex
export AWS_ACCESS_KEY_ID=${CIRCLECI_AWS_ACCESS_KEY_FOR_WIN_BUILD_V1}
export AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_WIN_BUILD_V1}
pip install typing_extensions boto3
python -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test
when: always
- store_test_results:
path: test/test-reports
- store_artifacts:
path: test/coverage.xml


@ -1,37 +0,0 @@
# the following clones pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7's tests but enables
# slow tests and sets an environment variable so gradcheck runs with fast_mode=False
slow-gradcheck-scheduled-ci:
triggers:
- schedule:
# runs every 8 hours on the 45th minute
cron: "45 0,8,16 * * *"
filters:
branches:
only:
- master
jobs:
- docker_build_job:
name: "docker-pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7"
image_name: "pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7"
- pytorch_linux_build:
name: periodic_pytorch_xenial_cuda10_2_cudnn7_gcc7_build
requires:
- "docker-pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7"
build_environment: "pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7-build"
docker_image: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7"
- pytorch_linux_test:
name: periodic_pytorch_xenial_cuda10_2_cudnn7_gcc7_old_gradcheck_test1
requires:
- periodic_pytorch_xenial_cuda10_2_cudnn7_gcc7_build
build_environment: "pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7-old-gradcheck-test1"
docker_image: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7"
use_cuda_docker_runtime: "1"
resource_class: gpu.medium
- pytorch_linux_test:
name: periodic_pytorch_xenial_cuda10_2_cudnn7_gcc7_old_gradcheck_test2
requires:
- periodic_pytorch_xenial_cuda10_2_cudnn7_gcc7_build
build_environment: "pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7-old-gradcheck-test2"
docker_image: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7"
use_cuda_docker_runtime: "1"
resource_class: gpu.medium


@ -33,11 +33,12 @@ modernize-*,
-modernize-use-default-member-init,
-modernize-use-using,
-modernize-use-trailing-return-type,
-modernize-use-nodiscard,
performance-*,
-performance-noexcept-move-constructor,
-performance-unnecessary-value-param,
'
HeaderFilterRegex: 'torch/csrc/.*'
HeaderFilterRegex: 'torch/csrc/(?!deploy/interpreter/cpython).*'
AnalyzeTemporaryDtors: false
WarningsAsErrors: '*'
CheckOptions:


@ -16,7 +16,6 @@ per-file-ignores = __init__.py: F401 torch/utils/cpp_extension.py: B950
optional-ascii-coding = True
exclude =
./.git,
./build_code_analyzer,
./build_test_custom_build,
./build,
./caffe2,


@ -1,49 +0,0 @@
---
name: "\U0001F41B Bug Report"
about: Submit a bug report to help us improve PyTorch
---
## 🐛 Bug
<!-- A clear and concise description of what the bug is. -->
## To Reproduce
Steps to reproduce the behavior:
1.
1.
1.
<!-- If you have a code sample, error messages, stack traces, please provide it here as well -->
## Expected behavior
<!-- A clear and concise description of what you expected to happen. -->
## Environment
Please copy and paste the output from our
[environment collection script](https://raw.githubusercontent.com/pytorch/pytorch/master/torch/utils/collect_env.py)
(or fill out the checklist below manually).
You can get the script and run it with:
```
wget https://raw.githubusercontent.com/pytorch/pytorch/master/torch/utils/collect_env.py
# For security purposes, please check the contents of collect_env.py before running it.
python collect_env.py
```
- PyTorch Version (e.g., 1.0):
- OS (e.g., Linux):
- How you installed PyTorch (`conda`, `pip`, source):
- Build command you used (if compiling from source):
- Python version:
- CUDA/cuDNN version:
- GPU models and configuration:
- Any other relevant information:
## Additional context
<!-- Add any other context about the problem here. -->

.github/ISSUE_TEMPLATE/bug-report.yml (new file)

@ -0,0 +1,56 @@
name: 🐛 Bug Report
description: Create a report to help us reproduce and fix the bug
body:
- type: markdown
attributes:
value: >
#### Before submitting a bug, please make sure the issue hasn't been already addressed by searching through [the existing and past issues](https://github.com/pytorch/pytorch/issues?q=is%3Aissue+sort%3Acreated-desc+).
- type: textarea
attributes:
label: 🐛 Describe the bug
description: |
Please provide a clear and concise description of what the bug is.
If relevant, add a minimal example so that we can reproduce the error by running the code. It is very important for the snippet to be as succinct (minimal) as possible, so please take time to trim down any irrelevant code to help us debug efficiently. We are going to copy-paste your code and we expect to get the same result as you did: avoid any external data, and include the relevant imports, etc. For example:
```python
# All necessary imports at the beginning
import torch
# A succinct reproducing example trimmed down to the essential parts:
t = torch.rand(5, 10) # Note: the bug is here, we should pass requires_grad=True
t.sum().backward()
```
If the code is too long (hopefully, it isn't), feel free to put it in a public gist and link it in the issue: https://gist.github.com.
Please also paste or describe the results you observe instead of the expected results. If you observe an error, please paste the error message including the **full** traceback of the exception. It may be relevant to wrap error messages in ```` ```triple quotes blocks``` ````.
placeholder: |
A clear and concise description of what the bug is.
```python
# Sample code to reproduce the problem
```
```
The error message you got, with the full traceback.
```
validations:
required: true
- type: textarea
attributes:
label: Versions
description: |
Please run the following and paste the output below.
```sh
wget https://raw.githubusercontent.com/pytorch/pytorch/master/torch/utils/collect_env.py
# For security purposes, please check the contents of collect_env.py before running it.
python collect_env.py
```
validations:
required: true
- type: markdown
attributes:
value: >
Thanks for contributing 🎉!

.github/ISSUE_TEMPLATE/ci-sev.md (new file)

@ -0,0 +1,39 @@
---
name: "⚠CI SEV"
about: Tracking incidents for PyTorch's CI infra.
---
> NOTE: Remember to label this issue with "`ci: sev`"
## Current Status
*Status could be: preemptive, ongoing, mitigated, closed. Also tell people if they need to take action to fix it (i.e. rebase)*.
## Error looks like
*Provide some way users can tell that this SEV is causing their issue.*
## Incident timeline (all times pacific)
*Include when the incident began, when it was detected, mitigated, root caused, and finally closed.*
<details>
<summary> Click for example </summary>
e.g.
- 10/30 7:27a incident began
- 10/30 8:30a detected by <method>
- 10/30 9:00 pm root caused as…
- 10/30 9:10 pm mitigated by…
- 10/31 10:00 am closed by…
</details>
## User impact
*How does this affect users of PyTorch CI?*
## Root cause
*What was the root cause of this issue?*
## Mitigation
*How did we mitigate the issue?*
## Prevention/followups
*How do we prevent issues like this in the future?*

.github/ISSUE_TEMPLATE/config.yml (new file)

@ -0,0 +1,5 @@
blank_issues_enabled: true
contact_links:
- name: Questions
url: https://discuss.pytorch.org/
about: Ask questions and discuss with other pytorch community members


@ -1,9 +0,0 @@
---
name: "\U0001F4DA Documentation"
about: Report an issue related to https://pytorch.org/docs
---
## 📚 Documentation
<!-- A clear and concise description of what content in https://pytorch.org/docs is an issue. If this has to do with the general https://pytorch.org website, please file an issue at https://github.com/pytorch/pytorch.github.io/issues/new/choose instead. If this has to do with https://pytorch.org/tutorials, please file an issue at https://github.com/pytorch/tutorials/issues/new -->


@ -0,0 +1,20 @@
name: 📚 Documentation
description: Report an issue related to https://pytorch.org/docs/stable/index.html
body:
- type: textarea
attributes:
label: 📚 The doc issue
description: >
A clear and concise description of what content in https://pytorch.org/docs/stable/index.html is an issue. If this has to do with the general https://pytorch.org website, please file an issue at https://github.com/pytorch/pytorch.github.io/issues/new/choose instead. If this has to do with https://pytorch.org/tutorials, please file an issue at https://github.com/pytorch/tutorials/issues/new.
validations:
required: true
- type: textarea
attributes:
label: Suggest a potential alternative/fix
description: >
Tell us how we could improve the documentation in this regard.
- type: markdown
attributes:
value: >
Thanks for contributing 🎉!


@ -1,24 +0,0 @@
---
name: "\U0001F680 Feature Request"
about: Submit a proposal/request for a new PyTorch feature
---
## 🚀 Feature
<!-- A clear and concise description of the feature proposal -->
## Motivation
<!-- Please outline the motivation for the proposal. Is your feature request related to a problem? e.g., I'm always frustrated when [...]. If this is related to another GitHub issue, please link here too -->
## Pitch
<!-- A clear and concise description of what you want to happen. -->
## Alternatives
<!-- A clear and concise description of any alternative solutions or features you've considered, if any. -->
## Additional context
<!-- Add any other context or screenshots about the feature request here. -->


@ -0,0 +1,25 @@
name: 🚀 Feature request
description: Submit a proposal/request for a new pytorch feature
body:
- type: textarea
attributes:
label: 🚀 The feature, motivation and pitch
description: >
A clear and concise description of the feature proposal. Please outline the motivation for the proposal. Is your feature request related to a specific problem? e.g., *"I'm working on X and would like Y to be possible"*. If this is related to another GitHub issue, please link here too.
validations:
required: true
- type: textarea
attributes:
label: Alternatives
description: >
A description of any alternative solutions or features you've considered, if any.
- type: textarea
attributes:
label: Additional context
description: >
Add any other context or screenshots about the feature request.
- type: markdown
attributes:
value: >
Thanks for contributing 🎉!


@ -1,13 +0,0 @@
---
name: "❓Questions/Help/Support"
about: Do you need support? We have resources.
---
## ❓ Questions and Help
### Please note that this issue tracker is not a help form and this issue will be closed.
We have a set of [listed resources available on the website](https://pytorch.org/resources). Our primary means of support is our discussion forum:
- [Discussion Forum](https://discuss.pytorch.org/)


@ -1 +1 @@
Fixes #{issue number}
Fixes #ISSUE_NUMBER


@ -1,5 +1,6 @@
self-hosted-runner:
labels:
- linux.large
- linux.2xlarge
- linux.8xlarge.nvidia.gpu
- linux.16xlarge.nvidia.gpu

.github/generated-ciflow-ruleset.json (generated)

@ -2,100 +2,236 @@
"__comment": "@generated DO NOT EDIT MANUALLY, Generation script: .github/scripts/generate_ci_workflows.py",
"label_rules": {
"ciflow/all": [
"libtorch-linux-xenial-cuda10.2-py3.6-gcc7",
"libtorch-linux-xenial-cuda11.3-py3.6-gcc7",
"caffe2-linux-xenial-py3.7-gcc5.4",
"docker-builds",
"ios-12-5-1-arm64",
"ios-12-5-1-arm64-coreml",
"ios-12-5-1-arm64-custom-ops",
"ios-12-5-1-arm64-full-jit",
"ios-12-5-1-arm64-metal",
"ios-12-5-1-x86-64",
"ios-12-5-1-x86-64-coreml",
"ios-12-5-1-x86-64-full-jit",
"libtorch-linux-xenial-cuda10.2-py3.7-gcc7",
"libtorch-linux-xenial-cuda11.3-py3.7-gcc7",
"linux-bionic-cuda10.2-py3.9-gcc7",
"linux-bionic-py3.6-clang9",
"linux-bionic-py3.8-gcc9-coverage",
"linux-xenial-cuda10.2-py3.6-gcc7",
"linux-xenial-cuda11.3-py3.6-gcc7",
"linux-xenial-py3.6-gcc5.4",
"linux-xenial-py3.6-gcc7-bazel-test",
"parallelnative-linux-xenial-py3.6-gcc5.4",
"periodic-libtorch-linux-xenial-cuda11.1-py3.6-gcc7",
"periodic-linux-xenial-cuda11.1-py3.6-gcc7",
"linux-bionic-py3.7-clang9",
"linux-docs",
"linux-docs-push",
"linux-vulkan-bionic-py3.7-clang9",
"linux-xenial-cuda11.3-py3.7-gcc7",
"linux-xenial-cuda11.3-py3.7-gcc7-bazel-test",
"linux-xenial-py3-clang5-mobile-build",
"linux-xenial-py3-clang5-mobile-custom-build-static",
"linux-xenial-py3.7-clang7-asan",
"linux-xenial-py3.7-clang7-onnx",
"linux-xenial-py3.7-gcc5.4",
"linux-xenial-py3.7-gcc7",
"macos-10-15-py3-arm64",
"macos-10-15-py3-lite-interpreter-x86-64",
"macos-11-py3-x86-64",
"parallelnative-linux-xenial-py3.7-gcc5.4",
"periodic-libtorch-linux-bionic-cuda11.5-py3.7-gcc7",
"periodic-libtorch-linux-xenial-cuda11.1-py3.7-gcc7",
"periodic-linux-bionic-cuda11.5-py3.7-gcc7",
"periodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck",
"periodic-linux-xenial-cuda11.1-py3.7-gcc7-debug",
"periodic-win-vs2019-cuda11.1-py3",
"puretorch-linux-xenial-py3.6-gcc5.4",
"periodic-win-vs2019-cuda11.5-py3",
"pytorch-linux-xenial-py3-clang5-android-ndk-r19c-build",
"pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single",
"pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit",
"win-vs2019-cpu-py3",
"win-vs2019-cuda10.2-py3",
"win-vs2019-cuda11.3-py3"
],
"ciflow/bazel": [
"linux-xenial-py3.6-gcc7-bazel-test"
"ciflow/android": [
"pytorch-linux-xenial-py3-clang5-android-ndk-r19c-build",
"pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single",
"pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit"
],
"ciflow/coverage": [
"linux-bionic-py3.8-gcc9-coverage"
"ciflow/bazel": [
"linux-xenial-cuda11.3-py3.7-gcc7-bazel-test"
],
"ciflow/cpu": [
"linux-bionic-py3.6-clang9",
"linux-bionic-py3.8-gcc9-coverage",
"linux-xenial-py3.6-gcc5.4",
"linux-xenial-py3.6-gcc7-bazel-test",
"parallelnative-linux-xenial-py3.6-gcc5.4",
"puretorch-linux-xenial-py3.6-gcc5.4",
"caffe2-linux-xenial-py3.7-gcc5.4",
"linux-bionic-py3.7-clang9",
"linux-docs",
"linux-docs-push",
"linux-vulkan-bionic-py3.7-clang9",
"linux-xenial-cuda11.3-py3.7-gcc7-bazel-test",
"linux-xenial-py3.7-clang7-asan",
"linux-xenial-py3.7-clang7-onnx",
"linux-xenial-py3.7-gcc5.4",
"linux-xenial-py3.7-gcc7",
"parallelnative-linux-xenial-py3.7-gcc5.4",
"pytorch-linux-xenial-py3-clang5-android-ndk-r19c-build",
"pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single",
"pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit",
"win-vs2019-cpu-py3"
],
"ciflow/cuda": [
"libtorch-linux-xenial-cuda10.2-py3.6-gcc7",
"libtorch-linux-xenial-cuda11.3-py3.6-gcc7",
"libtorch-linux-xenial-cuda10.2-py3.7-gcc7",
"libtorch-linux-xenial-cuda11.3-py3.7-gcc7",
"linux-bionic-cuda10.2-py3.9-gcc7",
"linux-xenial-cuda10.2-py3.6-gcc7",
"linux-xenial-cuda11.3-py3.6-gcc7",
"periodic-libtorch-linux-xenial-cuda11.1-py3.6-gcc7",
"periodic-linux-xenial-cuda11.1-py3.6-gcc7",
"linux-xenial-cuda11.3-py3.7-gcc7",
"periodic-libtorch-linux-bionic-cuda11.5-py3.7-gcc7",
"periodic-libtorch-linux-xenial-cuda11.1-py3.7-gcc7",
"periodic-linux-bionic-cuda11.5-py3.7-gcc7",
"periodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck",
"periodic-linux-xenial-cuda11.1-py3.7-gcc7-debug",
"periodic-win-vs2019-cuda11.1-py3",
"win-vs2019-cuda10.2-py3",
"periodic-win-vs2019-cuda11.5-py3",
"win-vs2019-cuda11.3-py3"
],
"ciflow/default": [
"linux-bionic-py3.6-clang9",
"linux-bionic-py3.8-gcc9-coverage",
"linux-xenial-cuda11.3-py3.6-gcc7",
"linux-xenial-py3.6-gcc5.4",
"linux-xenial-py3.6-gcc7-bazel-test",
"linux-bionic-py3.7-clang9",
"linux-docs",
"linux-vulkan-bionic-py3.7-clang9",
"linux-xenial-cuda11.3-py3.7-gcc7",
"linux-xenial-cuda11.3-py3.7-gcc7-bazel-test",
"linux-xenial-py3-clang5-mobile-build",
"linux-xenial-py3-clang5-mobile-custom-build-static",
"linux-xenial-py3.7-clang7-asan",
"linux-xenial-py3.7-clang7-onnx",
"linux-xenial-py3.7-gcc5.4",
"linux-xenial-py3.7-gcc7",
"pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single",
"pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit",
"win-vs2019-cpu-py3",
"win-vs2019-cuda11.3-py3"
],
"ciflow/docs": [
"linux-docs"
],
"ciflow/ios": [
"ios-12-5-1-arm64",
"ios-12-5-1-arm64-coreml",
"ios-12-5-1-arm64-custom-ops",
"ios-12-5-1-arm64-full-jit",
"ios-12-5-1-arm64-metal",
"ios-12-5-1-x86-64",
"ios-12-5-1-x86-64-coreml",
"ios-12-5-1-x86-64-full-jit"
],
"ciflow/libtorch": [
"libtorch-linux-xenial-cuda10.2-py3.6-gcc7",
"libtorch-linux-xenial-cuda11.3-py3.6-gcc7",
"periodic-libtorch-linux-xenial-cuda11.1-py3.6-gcc7"
"libtorch-linux-xenial-cuda10.2-py3.7-gcc7",
"libtorch-linux-xenial-cuda11.3-py3.7-gcc7",
"periodic-libtorch-linux-bionic-cuda11.5-py3.7-gcc7",
"periodic-libtorch-linux-xenial-cuda11.1-py3.7-gcc7"
],
"ciflow/linux": [
"libtorch-linux-xenial-cuda10.2-py3.6-gcc7",
"libtorch-linux-xenial-cuda11.3-py3.6-gcc7",
"caffe2-linux-xenial-py3.7-gcc5.4",
"libtorch-linux-xenial-cuda10.2-py3.7-gcc7",
"libtorch-linux-xenial-cuda11.3-py3.7-gcc7",
"linux-bionic-cuda10.2-py3.9-gcc7",
"linux-bionic-py3.6-clang9",
"linux-bionic-py3.8-gcc9-coverage",
"linux-xenial-cuda10.2-py3.6-gcc7",
"linux-xenial-cuda11.3-py3.6-gcc7",
"linux-xenial-py3.6-gcc5.4",
"linux-xenial-py3.6-gcc7-bazel-test",
"parallelnative-linux-xenial-py3.6-gcc5.4",
"periodic-libtorch-linux-xenial-cuda11.1-py3.6-gcc7",
"periodic-linux-xenial-cuda11.1-py3.6-gcc7",
"puretorch-linux-xenial-py3.6-gcc5.4"
"linux-bionic-py3.7-clang9",
"linux-docs",
"linux-docs-push",
"linux-vulkan-bionic-py3.7-clang9",
"linux-xenial-cuda11.3-py3.7-gcc7",
"linux-xenial-cuda11.3-py3.7-gcc7-bazel-test",
"linux-xenial-py3-clang5-mobile-build",
"linux-xenial-py3-clang5-mobile-custom-build-static",
"linux-xenial-py3.7-clang7-asan",
"linux-xenial-py3.7-clang7-onnx",
"linux-xenial-py3.7-gcc5.4",
"linux-xenial-py3.7-gcc7",
"parallelnative-linux-xenial-py3.7-gcc5.4",
"periodic-libtorch-linux-bionic-cuda11.5-py3.7-gcc7",
"periodic-libtorch-linux-xenial-cuda11.1-py3.7-gcc7",
"periodic-linux-bionic-cuda11.5-py3.7-gcc7",
"periodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck",
"periodic-linux-xenial-cuda11.1-py3.7-gcc7-debug",
"pytorch-linux-xenial-py3-clang5-android-ndk-r19c-build",
"pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single",
"pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit"
],
"ciflow/macos": [
"ios-12-5-1-arm64",
"ios-12-5-1-arm64-coreml",
"ios-12-5-1-arm64-custom-ops",
"ios-12-5-1-arm64-full-jit",
"ios-12-5-1-arm64-metal",
"ios-12-5-1-x86-64",
"ios-12-5-1-x86-64-coreml",
"ios-12-5-1-x86-64-full-jit",
"macos-10-15-py3-arm64",
"macos-10-15-py3-lite-interpreter-x86-64",
"macos-11-py3-x86-64"
],
"ciflow/mobile": [
"linux-xenial-py3-clang5-mobile-build",
"linux-xenial-py3-clang5-mobile-custom-build-static"
],
"ciflow/noarch": [
"linux-bionic-py3.6-clang9"
"linux-bionic-py3.7-clang9"
],
"ciflow/onnx": [
"linux-xenial-py3.7-clang7-onnx"
],
"ciflow/sanitizers": [
"linux-xenial-py3.7-clang7-asan"
],
"ciflow/scheduled": [
"periodic-libtorch-linux-xenial-cuda11.1-py3.6-gcc7",
"periodic-linux-xenial-cuda11.1-py3.6-gcc7",
"periodic-win-vs2019-cuda11.1-py3"
"linux-docs-push",
"periodic-libtorch-linux-bionic-cuda11.5-py3.7-gcc7",
"periodic-libtorch-linux-xenial-cuda11.1-py3.7-gcc7",
"periodic-linux-bionic-cuda11.5-py3.7-gcc7",
"periodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck",
"periodic-linux-xenial-cuda11.1-py3.7-gcc7-debug",
"periodic-win-vs2019-cuda11.1-py3",
"periodic-win-vs2019-cuda11.5-py3"
],
"ciflow/slow": [
"linux-bionic-cuda10.2-py3.9-gcc7",
"linux-xenial-cuda10.2-py3.6-gcc7"
"periodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck"
],
"ciflow/slow-gradcheck": [
"periodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck"
],
"ciflow/trunk": [
"caffe2-linux-xenial-py3.7-gcc5.4",
"docker-builds",
"ios-12-5-1-arm64",
"ios-12-5-1-arm64-coreml",
"ios-12-5-1-arm64-custom-ops",
"ios-12-5-1-arm64-full-jit",
"ios-12-5-1-arm64-metal",
"ios-12-5-1-x86-64",
"ios-12-5-1-x86-64-coreml",
"ios-12-5-1-x86-64-full-jit",
"libtorch-linux-xenial-cuda10.2-py3.7-gcc7",
"libtorch-linux-xenial-cuda11.3-py3.7-gcc7",
"linux-bionic-cuda10.2-py3.9-gcc7",
"linux-bionic-py3.7-clang9",
"linux-docs",
"linux-vulkan-bionic-py3.7-clang9",
"linux-xenial-cuda11.3-py3.7-gcc7",
"linux-xenial-cuda11.3-py3.7-gcc7-bazel-test",
"linux-xenial-py3-clang5-mobile-build",
"linux-xenial-py3-clang5-mobile-custom-build-static",
"linux-xenial-py3.7-clang7-asan",
"linux-xenial-py3.7-clang7-onnx",
"linux-xenial-py3.7-gcc5.4",
"linux-xenial-py3.7-gcc7",
"macos-10-15-py3-arm64",
"macos-10-15-py3-lite-interpreter-x86-64",
"macos-11-py3-x86-64",
"parallelnative-linux-xenial-py3.7-gcc5.4",
"pytorch-linux-xenial-py3-clang5-android-ndk-r19c-build",
"pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single",
"pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit",
"win-vs2019-cpu-py3",
"win-vs2019-cuda11.3-py3"
],
"ciflow/vulkan": [
"linux-vulkan-bionic-py3.7-clang9"
],
"ciflow/win": [
"periodic-win-vs2019-cuda11.1-py3",
"periodic-win-vs2019-cuda11.5-py3",
"win-vs2019-cpu-py3",
"win-vs2019-cuda10.2-py3",
"win-vs2019-cuda11.3-py3"
],
"ciflow/xla": [
"linux-bionic-py3.6-clang9"
]
},
"version": "v1"

View File

@ -15,23 +15,42 @@
# os: linux
# max_available: 20
# disk_size: 50
# is_ephemeral: true
runner_types:
# mainly used for ciflow-should-run, not made to run any serious tests
linux.large:
instance_type: c5.large
os: linux
disk_size: 10
is_ephemeral: false
linux.2xlarge:
instance_type: c5.2xlarge
os: linux
max_available: 500
disk_size: 150
linux.4xlarge:
instance_type: c5.4xlarge
os: linux
disk_size: 150
linux.8xlarge.nvidia.gpu:
instance_type: g3.8xlarge
os: linux
max_available: 50
max_available: 125
disk_size: 150
is_ephemeral: false
linux.4xlarge.nvidia.gpu:
instance_type: g3.4xlarge
os: linux
max_available: 125
disk_size: 150
is_ephemeral: false
linux.16xlarge.nvidia.gpu:
instance_type: g3.16xlarge
os: linux
max_available: 10
disk_size: 150
is_ephemeral: false
windows.4xlarge:
instance_type: c5d.4xlarge
os: windows
@ -40,5 +59,5 @@ runner_types:
windows.8xlarge.nvidia.gpu:
instance_type: p3.2xlarge
os: windows
max_available: 25
max_available: 50
disk_size: 256

View File

@ -46,11 +46,20 @@ if __name__ == "__main__":
"group": concurrency_key(filename),
"cancel-in-progress": True,
}
if data.get("concurrency", None) != expected:
actual = data.get("concurrency", None)
if actual != expected:
print(
f"'concurrency' incorrect or not found in '{filename.relative_to(REPO_ROOT)}'",
file=sys.stderr,
)
print(
f"expected: {expected}",
file=sys.stderr,
)
print(
f"actual: {actual}",
file=sys.stderr,
)
errors_found = True
if errors_found:

.github/scripts/export_pytorch_labels.py vendored Executable file
View File

@ -0,0 +1,71 @@
#!/usr/bin/env python3
'''
Test ownership was introduced in https://github.com/pytorch/pytorch/issues/66232.
As part of enforcing test ownership, we want to maintain a list of existing PyTorch labels
to verify that owners exist. This script outputs a file containing a list of existing
pytorch/pytorch labels so that the file can be uploaded to S3.
This script assumes the correct env vars are set for AWS permissions.
'''
import boto3 # type: ignore[import]
import json
from functools import lru_cache
from typing import List, Any
from urllib.request import urlopen, Request
# Modified from https://github.com/pytorch/pytorch/blob/b00206d4737d1f1e7a442c9f8a1cadccd272a386/torch/hub.py#L129
def _read_url(url: Any) -> Any:
with urlopen(url) as r:
return r.headers, r.read().decode(r.headers.get_content_charset('utf-8'))
def request_for_labels(url: str) -> Any:
headers = {'Accept': 'application/vnd.github.v3+json'}
return _read_url(Request(url, headers=headers))
def get_last_page(header: Any) -> int:
# Link info looks like: <https://api.github.com/repositories/65600975/labels?per_page=100&page=2>;
# rel="next", <https://api.github.com/repositories/65600975/labels?per_page=100&page=3>; rel="last"
link_info = header['link']
prefix = "&page="
suffix = ">;"
return int(link_info[link_info.rindex(prefix) + len(prefix):link_info.rindex(suffix)])
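# For illustration (header value hypothetical, but in the shape GitHub returns):
#   link = '<https://api.github.com/repositories/65600975/labels?per_page=100&page=2>; rel="next",
#           <https://api.github.com/repositories/65600975/labels?per_page=100&page=3>; rel="last"'
# get_last_page slices between the last "&page=" and the last ">;", yielding 3,
# i.e. three pages of labels to fetch in total.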
def update_labels(labels: List[str], info: str) -> None:
labels_json = json.loads(info)
labels.extend([x["name"] for x in labels_json])
@lru_cache()
def get_pytorch_labels() -> List[str]:
prefix = "https://api.github.com/repos/pytorch/pytorch/labels?per_page=100"
header, info = request_for_labels(prefix + "&page=1")
labels: List[str] = []
update_labels(labels, info)
last_page = get_last_page(header)
assert last_page > 0, "Error reading header info to determine total number of pages of labels"
for page_number in range(2, last_page + 1): # skip page 1
_, info = request_for_labels(prefix + f"&page={page_number}")
update_labels(labels, info)
return labels
def send_labels_to_S3(labels: List[str]) -> None:
labels_file_name = "pytorch_labels.json"
obj = boto3.resource('s3').Object('ossci-metrics', labels_file_name)
obj.put(Body=json.dumps(labels).encode())
def main() -> None:
send_labels_to_S3(get_pytorch_labels())
if __name__ == '__main__':
main()

View File

@ -72,7 +72,6 @@ LIBTORCH_CONTAINER_IMAGES = {
}
FULL_PYTHON_VERSIONS = [
"3.6",
"3.7",
"3.8",
"3.9",

View File

@ -2,7 +2,7 @@
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Dict, Set
from typing import Dict, Set, List, Iterable
import jinja2
import json
@ -11,12 +11,13 @@ import sys
from typing_extensions import Literal
YamlShellBool = Literal["''", 1]
Arch = Literal["windows", "linux"]
Arch = Literal["windows", "linux", "macos"]
DOCKER_REGISTRY = "308535385114.dkr.ecr.us-east-1.amazonaws.com"
GITHUB_DIR = Path(__file__).resolve().parent.parent
WINDOWS_CPU_TEST_RUNNER = "windows.4xlarge"
# contains 1 gpu
WINDOWS_CUDA_TEST_RUNNER = "windows.8xlarge.nvidia.gpu"
WINDOWS_RUNNERS = {
WINDOWS_CPU_TEST_RUNNER,
@ -24,12 +25,21 @@ WINDOWS_RUNNERS = {
}
LINUX_CPU_TEST_RUNNER = "linux.2xlarge"
LINUX_CUDA_TEST_RUNNER = "linux.8xlarge.nvidia.gpu"
# contains 1 gpu
LINUX_CUDA_TEST_RUNNER = "linux.4xlarge.nvidia.gpu"
LINUX_RUNNERS = {
LINUX_CPU_TEST_RUNNER,
LINUX_CUDA_TEST_RUNNER,
}
MACOS_TEST_RUNNER_10_15 = "macos-10.15"
MACOS_TEST_RUNNER_11 = "macos-11"
MACOS_RUNNERS = {
MACOS_TEST_RUNNER_10_15,
MACOS_TEST_RUNNER_11,
}
CUDA_RUNNERS = {
WINDOWS_CUDA_TEST_RUNNER,
LINUX_CUDA_TEST_RUNNER,
@ -41,60 +51,71 @@ CPU_RUNNERS = {
LABEL_CIFLOW_ALL = "ciflow/all"
LABEL_CIFLOW_BAZEL = "ciflow/bazel"
LABEL_CIFLOW_COVERAGE = "ciflow/coverage"
LABEL_CIFLOW_CPU = "ciflow/cpu"
LABEL_CIFLOW_CUDA = "ciflow/cuda"
LABEL_CIFLOW_DOCS = "ciflow/docs"
LABEL_CIFLOW_DEFAULT = "ciflow/default"
LABEL_CIFLOW_LIBTORCH = "ciflow/libtorch"
LABEL_CIFLOW_LINUX = "ciflow/linux"
LABEL_CIFLOW_MOBILE = "ciflow/mobile"
LABEL_CIFLOW_ANDROID = "ciflow/android"
LABEL_CIFLOW_SANITIZERS = "ciflow/sanitizers"
LABEL_CIFLOW_ONNX = "ciflow/onnx"
LABEL_CIFLOW_SCHEDULED = "ciflow/scheduled"
LABEL_CIFLOW_SLOW = "ciflow/slow"
LABEL_CIFLOW_WIN = "ciflow/win"
LABEL_CIFLOW_XLA = "ciflow/xla"
LABEL_CIFLOW_NOARCH = "ciflow/noarch"
LABEL_CIFLOW_VULKAN = "ciflow/vulkan"
LABEL_CIFLOW_PREFIX = "ciflow/"
LABEL_CIFLOW_SLOW_GRADCHECK = "ciflow/slow-gradcheck"
LABEL_CIFLOW_DOCKER = "ciflow/docker"
LABEL_CIFLOW_IOS = "ciflow/ios"
LABEL_CIFLOW_MACOS = "ciflow/macos"
LABEL_CIFLOW_TRUNK = "ciflow/trunk"
@dataclass
class CIFlowConfig:
enabled: bool = False
# For use to enable workflows to run on pytorch/pytorch-canary
run_on_canary: bool = False
labels: Set[str] = field(default_factory=set)
trigger_action: str = 'unassigned'
trigger_actor: str = 'pytorchbot'
root_job_name: str = 'ciflow_should_run'
root_job_condition: str = ''
# trigger_action_only controls if we listen only on the trigger_action of a pull_request.
# If it's False, we listen on all default pull_request actions; this is useful when
# ciflow (via probot) is not automated yet.
trigger_action_only: bool = False
label_conditions: str = ''
def gen_root_job_condition(self) -> None:
# TODO: Make conditions strict
# At the beginning of the ciflow rollout, we keep everything the same as what we have
# Once fully rolled out, we can have strict constraints
# e.g. ADD env.GITHUB_ACTOR == '{self.trigger_actor}
# REMOVE github.event.action !='{self.trigger_action}'
label_conditions = [
f"contains(github.event.pull_request.labels.*.name, '{label}')" for label in sorted(self.labels)]
if self.run_on_canary:
self.root_job_condition = "(github.repository_owner == 'pytorch') && "
# CIFlow conditions:
# - Workflow should always run on push
# - CIFLOW_DEFAULT workflows should run on PRs even if no `ciflow/` labels on PR
# - Otherwise workflow should be scheduled on all qualifying events
label_conditions = [f"contains(github.event.pull_request.labels.*.name, '{label}')" for label in sorted(self.labels)]
self.label_conditions = ' || '.join(label_conditions)
repo_condition = "github.repository_owner == 'pytorch'" if self.run_on_canary else "github.repository == 'pytorch/pytorch'"
push_event = "github.event_name == 'push'"
scheduled_event = "github.event_name == 'schedule'"
pr_updated_event = f"github.event_name == 'pull_request' && github.event.action != '{self.trigger_action}'"
if LABEL_CIFLOW_DEFAULT in self.labels:
run_with_no_labels = f"({pr_updated_event}) && " \
f"!contains(join(github.event.pull_request.labels.*.name), '{LABEL_CIFLOW_PREFIX}')"
else:
self.root_job_condition = "(github.repository == 'pytorch/pytorch') && "
self.root_job_condition += f"((github.event_name != 'pull_request') || (github.event.assignee.login != 'pytorchbot' ) || " \
f"(github.event.action !='{self.trigger_action}') || " \
f"({' || '.join(label_conditions)}))"
run_with_no_labels = "false"
self.root_job_condition = f"${{{{ ({repo_condition}) && (\n" \
f" ({push_event}) ||\n" \
f" ({scheduled_event}) ||\n" \
f" ({self.label_conditions}) ||\n" \
f" ({run_with_no_labels}))\n"\
f" }}}}"
def reset_root_job(self) -> None:
self.root_job_name = ''
self.root_job_condition = ''
def __post_init__(self) -> None:
if not self.enabled:
self.reset_root_job()
return
self.labels.add(LABEL_CIFLOW_ALL)
if LABEL_CIFLOW_SCHEDULED not in self.labels:
self.labels.add(LABEL_CIFLOW_TRUNK)
assert all(label.startswith(LABEL_CIFLOW_PREFIX) for label in self.labels)
self.gen_root_job_condition()
@ -131,28 +152,30 @@ class CIWorkflow:
# Required fields
arch: Arch
build_environment: str
test_runner_type: str
# Optional fields
test_runner_type: str = ''
ciflow_config: CIFlowConfig = field(default_factory=CIFlowConfig)
cuda_version: str = ''
docker_image_base: str = ''
enable_doc_jobs: bool = False
exclude_test: bool = False
is_coverage: bool = False
is_libtorch: bool = False
build_generates_artifacts: bool = True
build_with_debug: bool = False
is_scheduled: str = ''
num_test_shards: int = 1
on_pull_request: bool = False
only_build_on_pull_request: bool = False
only_run_smoke_tests_on_pull_request: bool = False
num_test_shards_on_pull_request: int = -1
distributed_test: bool = True
fx2trt_test: bool = False
timeout_after: int = 240
xcode_version: str = ''
# The following variables will be set as environment variables,
# so it's easier for both shell and Python scripts to consume them; false is represented as the empty string.
enable_jit_legacy_test: YamlShellBool = "''"
enable_distributed_test: YamlShellBool = "''"
enable_fx2trt_test: YamlShellBool = "''"
enable_multigpu_test: YamlShellBool = "''"
enable_nogpu_no_avx_test: YamlShellBool = "''"
enable_nogpu_no_avx2_test: YamlShellBool = "''"
@ -161,23 +184,24 @@ class CIWorkflow:
enable_backwards_compat_test: YamlShellBool = "''"
enable_xla_test: YamlShellBool = "''"
enable_noarch_test: YamlShellBool = "''"
enable_force_on_cpu_test: YamlShellBool = "''"
def __post_init__(self) -> None:
if self.is_libtorch:
if not self.build_generates_artifacts:
self.exclude_test = True
if not self.on_pull_request:
self.only_build_on_pull_request = False
if self.distributed_test:
self.enable_distributed_test = 1
if self.fx2trt_test:
self.enable_fx2trt_test = 1
# If num_test_shards_on_pull_request is not user-defined, default to num_test_shards unless we are
# only running smoke tests on the pull request.
if self.num_test_shards_on_pull_request == -1:
# Don't waste resources on runner spinup and cooldown for another shard if we are only running a few tests
# Don't run the default if we are only running smoke tests
if self.only_run_smoke_tests_on_pull_request:
self.num_test_shards_on_pull_request = 1
self.num_test_shards_on_pull_request = 0
else:
self.num_test_shards_on_pull_request = self.num_test_shards
self.assert_valid()
@ -189,20 +213,27 @@ class CIWorkflow:
if self.arch == 'windows':
assert self.test_runner_type in WINDOWS_RUNNERS, err_message
if self.ciflow_config.enabled:
# make sure if LABEL_CIFLOW_DEFAULT is set, we then need to set trigger_action_only to False
assert self.ciflow_config.trigger_action_only != (LABEL_CIFLOW_DEFAULT in self.ciflow_config.labels)
assert self.on_pull_request
assert LABEL_CIFLOW_ALL in self.ciflow_config.labels
assert LABEL_CIFLOW_ALL in self.ciflow_config.root_job_condition
if self.arch == 'linux':
assert LABEL_CIFLOW_LINUX in self.ciflow_config.labels
if self.arch == 'windows':
assert LABEL_CIFLOW_WIN in self.ciflow_config.labels
if self.test_runner_type in CUDA_RUNNERS:
assert LABEL_CIFLOW_CUDA in self.ciflow_config.labels
if self.test_runner_type in CPU_RUNNERS:
assert LABEL_CIFLOW_CPU in self.ciflow_config.labels
assert LABEL_CIFLOW_ALL in self.ciflow_config.labels
assert LABEL_CIFLOW_ALL in self.ciflow_config.label_conditions
if self.arch == 'linux':
assert LABEL_CIFLOW_LINUX in self.ciflow_config.labels
if self.arch == 'windows':
assert LABEL_CIFLOW_WIN in self.ciflow_config.labels
if self.arch == 'macos':
assert LABEL_CIFLOW_MACOS in self.ciflow_config.labels
# Make sure that jobs with tests have a test_runner_type
if not self.exclude_test:
assert self.test_runner_type != ''
if self.test_runner_type in CUDA_RUNNERS:
assert LABEL_CIFLOW_CUDA in self.ciflow_config.labels
if self.test_runner_type in CPU_RUNNERS and not self.exclude_test:
assert LABEL_CIFLOW_CPU in self.ciflow_config.labels
if self.is_scheduled:
assert LABEL_CIFLOW_DEFAULT not in self.ciflow_config.labels
assert LABEL_CIFLOW_TRUNK not in self.ciflow_config.labels
assert LABEL_CIFLOW_SCHEDULED in self.ciflow_config.labels
if self.build_with_debug:
assert self.build_environment.endswith("-debug")
def generate_workflow_file(self, workflow_template: jinja2.Template) -> None:
output_file_path = GITHUB_DIR / f"workflows/generated-{self.build_environment}.yml"
@ -219,6 +250,30 @@ class CIWorkflow:
output_file.write("\n")
print(output_file_path)
@dataclass
class DockerWorkflow:
build_environment: str
docker_images: List[str]
# Optional fields
ciflow_config: CIFlowConfig = field(default_factory=CIFlowConfig)
cuda_version: str = ''
is_scheduled: str = ''
def generate_workflow_file(self, workflow_template: jinja2.Template) -> None:
output_file_path = GITHUB_DIR / "workflows/generated-docker-builds.yml"
with open(output_file_path, "w") as output_file:
GENERATED = "generated" # Note that please keep the variable GENERATED otherwise phabricator will hide the whole file
output_file.writelines([f"# @{GENERATED} DO NOT EDIT MANUALLY\n"])
try:
content = workflow_template.render(asdict(self))
except Exception as e:
print(f"Failed on template: {workflow_template}", file=sys.stderr)
raise e
output_file.write(content)
if content[-1] != "\n":
output_file.write("\n")
print(output_file_path)
WINDOWS_WORKFLOWS = [
CIWorkflow(
@ -226,41 +281,38 @@ WINDOWS_WORKFLOWS = [
build_environment="win-vs2019-cpu-py3",
cuda_version="cpu",
test_runner_type=WINDOWS_CPU_TEST_RUNNER,
on_pull_request=True,
num_test_shards=2,
ciflow_config=CIFlowConfig(
enabled=True,
run_on_canary=True,
labels={LABEL_CIFLOW_DEFAULT, LABEL_CIFLOW_CPU, LABEL_CIFLOW_WIN}
),
),
CIWorkflow(
arch="windows",
build_environment="win-vs2019-cuda10.2-py3",
cuda_version="10.2",
test_runner_type=WINDOWS_CUDA_TEST_RUNNER,
on_pull_request=True,
num_test_shards=2,
ciflow_config=CIFlowConfig(
enabled=True,
trigger_action_only=True,
labels={LABEL_CIFLOW_CUDA, LABEL_CIFLOW_WIN}
),
),
CIWorkflow(
arch="windows",
build_environment="win-vs2019-cuda11.3-py3",
cuda_version="11.3",
test_runner_type=WINDOWS_CUDA_TEST_RUNNER,
num_test_shards=2,
on_pull_request=True,
only_run_smoke_tests_on_pull_request=True,
enable_force_on_cpu_test=1,
ciflow_config=CIFlowConfig(
enabled=True,
run_on_canary=True,
labels={LABEL_CIFLOW_DEFAULT, LABEL_CIFLOW_CUDA, LABEL_CIFLOW_WIN}
),
),
CIWorkflow(
arch="windows",
build_environment="periodic-win-vs2019-cuda11.5-py3",
cuda_version="11.5",
test_runner_type=WINDOWS_CUDA_TEST_RUNNER,
num_test_shards=2,
enable_force_on_cpu_test=1,
is_scheduled="45 4,10,16,22 * * *",
ciflow_config=CIFlowConfig(
run_on_canary=True,
labels={LABEL_CIFLOW_SCHEDULED, LABEL_CIFLOW_CUDA, LABEL_CIFLOW_WIN}
),
),
CIWorkflow(
arch="windows",
build_environment="periodic-win-vs2019-cuda11.1-py3",
@ -268,10 +320,7 @@ WINDOWS_WORKFLOWS = [
test_runner_type=WINDOWS_CUDA_TEST_RUNNER,
num_test_shards=2,
is_scheduled="45 0,4,8,12,16,20 * * *",
on_pull_request=True,
ciflow_config=CIFlowConfig(
enabled=True,
trigger_action_only=True,
labels={LABEL_CIFLOW_SCHEDULED, LABEL_CIFLOW_WIN, LABEL_CIFLOW_CUDA}
),
),
@ -280,17 +329,50 @@ WINDOWS_WORKFLOWS = [
LINUX_WORKFLOWS = [
CIWorkflow(
arch="linux",
build_environment="linux-xenial-py3.6-gcc5.4",
docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3.6-gcc5.4",
build_environment="linux-xenial-py3.7-gcc5.4",
docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3.7-gcc5.4",
test_runner_type=LINUX_CPU_TEST_RUNNER,
on_pull_request=True,
enable_jit_legacy_test=1,
enable_doc_jobs=True,
enable_docs_test=1,
enable_backwards_compat_test=1,
enable_docs_test=1,
num_test_shards=2,
ciflow_config=CIFlowConfig(
run_on_canary=True,
labels={LABEL_CIFLOW_DEFAULT, LABEL_CIFLOW_LINUX, LABEL_CIFLOW_CPU}
),
),
CIWorkflow(
arch="linux",
build_environment="linux-docs",
docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3.7-gcc5.4",
test_runner_type=LINUX_CPU_TEST_RUNNER,
enable_doc_jobs=True,
exclude_test=True,
ciflow_config=CIFlowConfig(
labels={LABEL_CIFLOW_DEFAULT, LABEL_CIFLOW_DOCS, LABEL_CIFLOW_LINUX, LABEL_CIFLOW_CPU}
),
),
CIWorkflow(
arch="linux",
build_environment="linux-docs-push",
docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3.7-gcc5.4",
test_runner_type=LINUX_CPU_TEST_RUNNER,
enable_doc_jobs=True,
exclude_test=True,
is_scheduled="0 0 * * *", # run pushes only on a nightly schedule
# NOTE: This is purposefully left without LABEL_CIFLOW_DOCS so that you can run
# docs builds on your PR without the fear of anything pushing
ciflow_config=CIFlowConfig(
labels={LABEL_CIFLOW_SCHEDULED, LABEL_CIFLOW_LINUX, LABEL_CIFLOW_CPU}
),
),
CIWorkflow(
arch="linux",
build_environment="linux-xenial-py3.7-gcc7",
docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3.7-gcc7",
test_runner_type=LINUX_CPU_TEST_RUNNER,
num_test_shards=2,
ciflow_config=CIFlowConfig(
enabled=True,
run_on_canary=True,
labels={LABEL_CIFLOW_DEFAULT, LABEL_CIFLOW_LINUX, LABEL_CIFLOW_CPU}
),
@ -301,257 +383,390 @@ LINUX_WORKFLOWS = [
# build_environment="paralleltbb-linux-xenial-py3.6-gcc5.4",
# docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3.6-gcc5.4",
# test_runner_type=LINUX_CPU_TEST_RUNNER,
# # This is a master-only job despite on_pull_request being set to True
# on_pull_request=True,
# ciflow_config=CIFlowConfig(
# enabled=True,
# trigger_action_only=True,
# labels={LABEL_CIFLOW_LINUX, LABEL_CIFLOW_CPU},
# ),
# ),
CIWorkflow(
arch="linux",
build_environment="parallelnative-linux-xenial-py3.6-gcc5.4",
docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3.6-gcc5.4",
build_environment="parallelnative-linux-xenial-py3.7-gcc5.4",
docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3.7-gcc5.4",
test_runner_type=LINUX_CPU_TEST_RUNNER,
# This is a master-only job despite on_pull_request being set to True
on_pull_request=True,
ciflow_config=CIFlowConfig(
enabled=True,
trigger_action_only=True,
labels={LABEL_CIFLOW_LINUX, LABEL_CIFLOW_CPU},
),
),
# Build PyTorch with BUILD_CAFFE2=OFF
# Build PyTorch with BUILD_CAFFE2=ON
CIWorkflow(
arch="linux",
build_environment="puretorch-linux-xenial-py3.6-gcc5.4",
docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3.6-gcc5.4",
build_environment="caffe2-linux-xenial-py3.7-gcc5.4",
docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3.7-gcc5.4",
test_runner_type=LINUX_CPU_TEST_RUNNER,
exclude_test=True,
# This is a master-only job despite on_pull_request being set to True
on_pull_request=True,
ciflow_config=CIFlowConfig(
enabled=True,
trigger_action_only=True,
labels={LABEL_CIFLOW_LINUX, LABEL_CIFLOW_CPU},
),
),
# CIWorkflow(
# arch="linux",
# build_environment="linux-xenial-py3.6-gcc7",
# docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3.6-gcc7",
# test_runner_type=LINUX_CPU_TEST_RUNNER,
# ),
# CIWorkflow(
# arch="linux",
# build_environment="linux-xenial-py3.6-clang5-asan",
# docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3-clang5-asan",
# test_runner_type=LINUX_CPU_TEST_RUNNER,
# ),
# CIWorkflow(
# arch="linux",
# build_environment="linux-xenial-py3.6-clang7-onnx",
# docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3-clang7-onnx",
# test_runner_type=LINUX_CPU_TEST_RUNNER,
# ),
CIWorkflow(
arch="linux",
build_environment="linux-xenial-py3-clang5-mobile-build",
docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3-clang5-asan",
test_runner_type=LINUX_CPU_TEST_RUNNER,
build_generates_artifacts=False,
exclude_test=True,
ciflow_config=CIFlowConfig(
labels={LABEL_CIFLOW_LINUX, LABEL_CIFLOW_MOBILE, LABEL_CIFLOW_DEFAULT},
),
),
CIWorkflow(
arch="linux",
build_environment="linux-xenial-py3-clang5-mobile-custom-build-static",
docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c",
test_runner_type=LINUX_CPU_TEST_RUNNER,
build_generates_artifacts=False,
exclude_test=True,
ciflow_config=CIFlowConfig(
labels={LABEL_CIFLOW_LINUX, LABEL_CIFLOW_MOBILE, LABEL_CIFLOW_DEFAULT},
),
),
CIWorkflow(
arch="linux",
build_environment="linux-xenial-py3.7-clang7-asan",
docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3-clang7-asan",
test_runner_type=LINUX_CPU_TEST_RUNNER,
num_test_shards=3,
distributed_test=False,
ciflow_config=CIFlowConfig(
labels={LABEL_CIFLOW_DEFAULT, LABEL_CIFLOW_LINUX, LABEL_CIFLOW_SANITIZERS, LABEL_CIFLOW_CPU},
),
),
CIWorkflow(
arch="linux",
build_environment="linux-xenial-py3.7-clang7-onnx",
docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3-clang7-onnx",
test_runner_type=LINUX_CPU_TEST_RUNNER,
num_test_shards=2,
distributed_test=False,
ciflow_config=CIFlowConfig(
labels={LABEL_CIFLOW_DEFAULT, LABEL_CIFLOW_LINUX, LABEL_CIFLOW_ONNX, LABEL_CIFLOW_CPU},
),
),
CIWorkflow(
arch="linux",
build_environment="linux-bionic-cuda10.2-py3.9-gcc7",
docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-bionic-cuda10.2-cudnn7-py3.9-gcc7",
test_runner_type=LINUX_CUDA_TEST_RUNNER,
num_test_shards=2,
on_pull_request=True,
ciflow_config=CIFlowConfig(
enabled=True,
run_on_canary=True,
trigger_action_only=True,
labels={LABEL_CIFLOW_SLOW, LABEL_CIFLOW_LINUX, LABEL_CIFLOW_CUDA}
),
),
CIWorkflow(
arch="linux",
build_environment="linux-xenial-cuda10.2-py3.6-gcc7",
docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7",
test_runner_type=LINUX_CUDA_TEST_RUNNER,
enable_jit_legacy_test=1,
enable_multigpu_test=1,
enable_nogpu_no_avx_test=1,
enable_nogpu_no_avx2_test=1,
enable_slow_test=1,
num_test_shards=2,
on_pull_request=True,
ciflow_config=CIFlowConfig(
enabled=True,
trigger_action_only=True,
labels=set([LABEL_CIFLOW_SLOW, LABEL_CIFLOW_LINUX, LABEL_CIFLOW_CUDA]),
run_on_canary=True,
labels={LABEL_CIFLOW_SLOW, LABEL_CIFLOW_LINUX, LABEL_CIFLOW_CUDA}
),
),
CIWorkflow(
arch="linux",
build_environment="libtorch-linux-xenial-cuda10.2-py3.6-gcc7",
build_environment="libtorch-linux-xenial-cuda10.2-py3.7-gcc7",
docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7",
test_runner_type=LINUX_CUDA_TEST_RUNNER,
is_libtorch=True,
on_pull_request=True,
build_generates_artifacts=False,
exclude_test=True,
ciflow_config=CIFlowConfig(
enabled=True,
trigger_action_only=True,
labels=set([LABEL_CIFLOW_LIBTORCH, LABEL_CIFLOW_LINUX, LABEL_CIFLOW_CUDA]),
),
),
CIWorkflow(
arch="linux",
build_environment="linux-xenial-cuda11.3-py3.6-gcc7",
build_environment="periodic-linux-bionic-cuda11.5-py3.7-gcc7",
docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-bionic-cuda11.5-cudnn8-py3-gcc7",
test_runner_type=LINUX_CUDA_TEST_RUNNER,
num_test_shards=2,
is_scheduled="45 4,10,16,22 * * *",
ciflow_config=CIFlowConfig(
labels=set([LABEL_CIFLOW_SCHEDULED, LABEL_CIFLOW_LINUX, LABEL_CIFLOW_CUDA]),
),
),
CIWorkflow(
arch="linux",
build_environment="periodic-libtorch-linux-bionic-cuda11.5-py3.7-gcc7",
docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-bionic-cuda11.5-cudnn8-py3-gcc7",
test_runner_type=LINUX_CUDA_TEST_RUNNER,
build_generates_artifacts=False,
is_scheduled="45 4,10,16,22 * * *",
exclude_test=True,
ciflow_config=CIFlowConfig(
labels=set([LABEL_CIFLOW_SCHEDULED, LABEL_CIFLOW_LIBTORCH, LABEL_CIFLOW_LINUX, LABEL_CIFLOW_CUDA]),
),
),
CIWorkflow(
arch="linux",
build_environment="linux-xenial-cuda11.3-py3.7-gcc7",
docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7",
test_runner_type=LINUX_CUDA_TEST_RUNNER,
num_test_shards=2,
on_pull_request=True,
fx2trt_test=True,
ciflow_config=CIFlowConfig(
enabled=True,
labels=set([LABEL_CIFLOW_DEFAULT, LABEL_CIFLOW_LINUX, LABEL_CIFLOW_CUDA]),
),
),
CIWorkflow(
arch="linux",
build_environment="libtorch-linux-xenial-cuda11.3-py3.6-gcc7",
build_environment="libtorch-linux-xenial-cuda11.3-py3.7-gcc7",
docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7",
test_runner_type=LINUX_CUDA_TEST_RUNNER,
is_libtorch=True,
on_pull_request=True,
build_generates_artifacts=False,
exclude_test=True,
ciflow_config=CIFlowConfig(
enabled=True,
trigger_action_only=True,
labels=set([LABEL_CIFLOW_LIBTORCH, LABEL_CIFLOW_LINUX, LABEL_CIFLOW_CUDA]),
),
),
CIWorkflow(
arch="linux",
build_environment="periodic-linux-xenial-cuda11.1-py3.6-gcc7",
build_environment="periodic-linux-xenial-cuda11.1-py3.7-gcc7-debug",
docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-cuda11.1-cudnn8-py3-gcc7",
test_runner_type=LINUX_CUDA_TEST_RUNNER,
num_test_shards=2,
build_with_debug=True,
is_scheduled="45 0,4,8,12,16,20 * * *",
on_pull_request=True,
ciflow_config=CIFlowConfig(
enabled=True,
trigger_action_only=True,
labels={LABEL_CIFLOW_SCHEDULED, LABEL_CIFLOW_LINUX, LABEL_CIFLOW_CUDA}
),
),
CIWorkflow(
arch="linux",
build_environment="periodic-libtorch-linux-xenial-cuda11.1-py3.6-gcc7",
build_environment="periodic-libtorch-linux-xenial-cuda11.1-py3.7-gcc7",
docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-cuda11.1-cudnn8-py3-gcc7",
test_runner_type=LINUX_CUDA_TEST_RUNNER,
is_libtorch=True,
build_generates_artifacts=False,
exclude_test=True,
is_scheduled="45 0,4,8,12,16,20 * * *",
on_pull_request=True,
ciflow_config=CIFlowConfig(
enabled=True,
trigger_action_only=True,
labels={LABEL_CIFLOW_SCHEDULED, LABEL_CIFLOW_LINUX, LABEL_CIFLOW_LIBTORCH, LABEL_CIFLOW_CUDA},
),
),
CIWorkflow(
arch="linux",
build_environment="linux-bionic-py3.8-gcc9-coverage",
docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-bionic-py3.8-gcc9",
build_environment="linux-bionic-py3.7-clang9",
docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-bionic-py3.7-clang9",
test_runner_type=LINUX_CPU_TEST_RUNNER,
on_pull_request=True,
is_coverage=True,
num_test_shards=2,
ciflow_config=CIFlowConfig(
enabled=True,
labels={LABEL_CIFLOW_DEFAULT, LABEL_CIFLOW_COVERAGE, LABEL_CIFLOW_LINUX, LABEL_CIFLOW_CPU},
),
),
CIWorkflow(
arch="linux",
build_environment="linux-bionic-py3.6-clang9",
docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-bionic-py3.6-clang9",
test_runner_type=LINUX_CPU_TEST_RUNNER,
on_pull_request=True,
num_test_shards=2,
distributed_test=False,
enable_noarch_test=1,
ciflow_config=CIFlowConfig(
enabled=True,
labels={LABEL_CIFLOW_DEFAULT, LABEL_CIFLOW_LINUX, LABEL_CIFLOW_CPU, LABEL_CIFLOW_XLA, LABEL_CIFLOW_NOARCH},
labels={LABEL_CIFLOW_DEFAULT, LABEL_CIFLOW_LINUX, LABEL_CIFLOW_CPU, LABEL_CIFLOW_NOARCH},
),
),
CIWorkflow(
arch="linux",
build_environment="linux-vulkan-bionic-py3.7-clang9",
docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-bionic-py3.7-clang9",
test_runner_type=LINUX_CPU_TEST_RUNNER,
num_test_shards=1,
distributed_test=False,
ciflow_config=CIFlowConfig(
labels={LABEL_CIFLOW_DEFAULT, LABEL_CIFLOW_LINUX, LABEL_CIFLOW_CPU, LABEL_CIFLOW_VULKAN},
),
),
CIWorkflow(
arch="linux",
build_environment="periodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck",
docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7",
test_runner_type=LINUX_CUDA_TEST_RUNNER,
num_test_shards=2,
distributed_test=False,
timeout_after=360,
# Only run this on master 4 times per day since it does take a while
is_scheduled="0 */4 * * *",
ciflow_config=CIFlowConfig(
labels={LABEL_CIFLOW_LINUX, LABEL_CIFLOW_CUDA, LABEL_CIFLOW_SLOW_GRADCHECK, LABEL_CIFLOW_SLOW, LABEL_CIFLOW_SCHEDULED},
),
),
# CIWorkflow(
# arch="linux",
# build_environment="linux-bionic-rocm3.9-py3.6",
# docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-bionic-rocm3.9-py3.6",
# test_runner_type=LINUX_CPU_TEST_RUNNER,
# ),
# CIWorkflow(
# arch="linux",
# build_environment="linux-xenial-py3.6-clang5-android-ndk-r19c-x86_32",
# docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c",
# test_runner_type=LINUX_CPU_TEST_RUNNER,
# ),
# CIWorkflow(
# arch="linux",
# build_environment="linux-xenial-py3.6-clang5-android-ndk-r19c-x86_64",
# docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c",
# test_runner_type=LINUX_CPU_TEST_RUNNER,
# ),
# CIWorkflow(
# arch="linux",
# build_environment="linux-xenial-py3.6-clang5-android-ndk-r19c-arm-v7a",
# docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c",
# test_runner_type=LINUX_CPU_TEST_RUNNER,
# ),
# CIWorkflow(
# arch="linux",
# build_environment="linux-xenial-py3.6-clang5-android-ndk-r19c-arm-v8a",
# docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c",
# test_runner_type=LINUX_CPU_TEST_RUNNER,
# ),
# CIWorkflow(
# arch="linux",
# build_environment="linux-xenial-py3.6-clang5-mobile",
# docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3-clang5-asan",
# test_runner_type=LINUX_CPU_TEST_RUNNER,
# ),
# CIWorkflow(
# arch="linux",
# build_environment="linux-xenial-py3.6-clang5-mobile-custom-dynamic",
# docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c",
# test_runner_type=LINUX_CPU_TEST_RUNNER,
# ),
# CIWorkflow(
# arch="linux",
# build_environment="linux-xenial-py3.6-clang5-mobile-custom-static",
# docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c",
# test_runner_type=LINUX_CPU_TEST_RUNNER,
# ),
# CIWorkflow(
# arch="linux",
# build_environment="linux-xenial-py3.6-clang5-mobile-code-analysis",
# docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c",
# test_runner_type=LINUX_CPU_TEST_RUNNER,
# ),
]
ANDROID_SHORT_WORKFLOWS = [
CIWorkflow(
arch="linux",
build_environment="pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single",
docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c",
test_runner_type=LINUX_CPU_TEST_RUNNER,
exclude_test=True,
ciflow_config=CIFlowConfig(
labels={LABEL_CIFLOW_LINUX, LABEL_CIFLOW_CPU, LABEL_CIFLOW_ANDROID, LABEL_CIFLOW_DEFAULT},
),
),
CIWorkflow(
arch="linux",
build_environment="pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit",
docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c",
test_runner_type=LINUX_CPU_TEST_RUNNER,
exclude_test=True,
ciflow_config=CIFlowConfig(
labels={LABEL_CIFLOW_LINUX, LABEL_CIFLOW_CPU, LABEL_CIFLOW_ANDROID, LABEL_CIFLOW_DEFAULT},
),
),
]
ANDROID_WORKFLOWS = [
CIWorkflow(
arch="linux",
build_environment="pytorch-linux-xenial-py3-clang5-android-ndk-r19c-build",
docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c",
test_runner_type=LINUX_CPU_TEST_RUNNER,
exclude_test=True,
ciflow_config=CIFlowConfig(
labels={LABEL_CIFLOW_LINUX, LABEL_CIFLOW_CPU, LABEL_CIFLOW_ANDROID},
),
),
]
BAZEL_WORKFLOWS = [
CIWorkflow(
arch="linux",
build_environment="linux-xenial-py3.6-gcc7-bazel-test",
docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-bionic-cuda10.2-cudnn7-py3.9-gcc7",
build_environment="linux-xenial-cuda11.3-py3.7-gcc7-bazel-test",
docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7",
test_runner_type=LINUX_CPU_TEST_RUNNER,
on_pull_request=True,
ciflow_config=CIFlowConfig(
enabled=True,
labels={LABEL_CIFLOW_DEFAULT, LABEL_CIFLOW_BAZEL, LABEL_CIFLOW_CPU, LABEL_CIFLOW_LINUX},
),
),
]
if __name__ == "__main__":
IOS_WORKFLOWS = [
CIWorkflow(
arch="macos",
build_environment="ios-12-5-1-arm64",
test_runner_type=MACOS_TEST_RUNNER_10_15,
exclude_test=True,
ciflow_config=CIFlowConfig(
labels={LABEL_CIFLOW_IOS, LABEL_CIFLOW_MACOS},
),
),
CIWorkflow(
arch="macos",
build_environment="ios-12-5-1-arm64-coreml",
test_runner_type=MACOS_TEST_RUNNER_10_15,
exclude_test=True,
ciflow_config=CIFlowConfig(
labels={LABEL_CIFLOW_IOS, LABEL_CIFLOW_MACOS},
),
),
CIWorkflow(
arch="macos",
build_environment="ios-12-5-1-arm64-full-jit",
test_runner_type=MACOS_TEST_RUNNER_10_15,
exclude_test=True,
ciflow_config=CIFlowConfig(
labels={LABEL_CIFLOW_IOS, LABEL_CIFLOW_MACOS},
),
),
CIWorkflow(
arch="macos",
build_environment="ios-12-5-1-arm64-custom-ops",
test_runner_type=MACOS_TEST_RUNNER_10_15,
exclude_test=True,
ciflow_config=CIFlowConfig(
labels={LABEL_CIFLOW_IOS, LABEL_CIFLOW_MACOS},
),
),
CIWorkflow(
arch="macos",
build_environment="ios-12-5-1-arm64-metal",
test_runner_type=MACOS_TEST_RUNNER_10_15,
exclude_test=True,
ciflow_config=CIFlowConfig(
labels={LABEL_CIFLOW_IOS, LABEL_CIFLOW_MACOS},
),
),
CIWorkflow(
arch="macos",
build_environment="ios-12-5-1-x86-64",
test_runner_type=MACOS_TEST_RUNNER_10_15,
exclude_test=True,
ciflow_config=CIFlowConfig(
labels={LABEL_CIFLOW_IOS, LABEL_CIFLOW_MACOS},
),
),
CIWorkflow(
arch="macos",
build_environment="ios-12-5-1-x86-64-coreml",
test_runner_type=MACOS_TEST_RUNNER_10_15,
exclude_test=True,
ciflow_config=CIFlowConfig(
labels={LABEL_CIFLOW_IOS, LABEL_CIFLOW_MACOS},
),
),
CIWorkflow(
arch="macos",
build_environment="ios-12-5-1-x86-64-full-jit",
test_runner_type=MACOS_TEST_RUNNER_10_15,
exclude_test=True,
ciflow_config=CIFlowConfig(
labels={LABEL_CIFLOW_IOS, LABEL_CIFLOW_MACOS},
),
),
]
MACOS_WORKFLOWS = [
# Distributed tests are still run on MacOS, but part of regular shards
CIWorkflow(
arch="macos",
build_environment="macos-11-py3-x86-64",
xcode_version="12.4",
test_runner_type=MACOS_TEST_RUNNER_11,
num_test_shards=2,
distributed_test=False,
ciflow_config=CIFlowConfig(
labels={LABEL_CIFLOW_MACOS},
),
),
CIWorkflow(
arch="macos",
build_environment="macos-10-15-py3-lite-interpreter-x86-64",
xcode_version="12",
test_runner_type=MACOS_TEST_RUNNER_10_15,
exclude_test=True,
build_generates_artifacts=False,
ciflow_config=CIFlowConfig(
labels={LABEL_CIFLOW_MACOS},
),
),
CIWorkflow(
arch="macos",
build_environment="macos-10-15-py3-arm64",
test_runner_type=MACOS_TEST_RUNNER_10_15,
exclude_test=True,
ciflow_config=CIFlowConfig(
labels={LABEL_CIFLOW_MACOS},
),
),
]
DOCKER_IMAGES = {
f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-bionic-cuda10.2-cudnn7-py3.7-clang9", # for pytorch/xla
f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-bionic-rocm4.1-py3.7", # for rocm
f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-bionic-rocm4.2-py3.7", # for rocm
f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-bionic-rocm4.3.1-py3.7", # for rocm
}
DOCKER_IMAGES.update({
workflow.docker_image_base
for workflow in [*LINUX_WORKFLOWS, *BAZEL_WORKFLOWS, *ANDROID_WORKFLOWS]
if workflow.docker_image_base
})
DOCKER_WORKFLOWS = [
DockerWorkflow(
build_environment="docker-builds",
docker_images=sorted(DOCKER_IMAGES),
# Run weekly to ensure they can build
is_scheduled="1 * */7 * *",
),
]
def main() -> None:
jinja_env = jinja2.Environment(
variable_start_string="!{{",
loader=jinja2.FileSystemLoader(str(GITHUB_DIR.joinpath("templates"))),
@ -561,6 +776,11 @@ if __name__ == "__main__":
(jinja_env.get_template("linux_ci_workflow.yml.j2"), LINUX_WORKFLOWS),
(jinja_env.get_template("windows_ci_workflow.yml.j2"), WINDOWS_WORKFLOWS),
(jinja_env.get_template("bazel_ci_workflow.yml.j2"), BAZEL_WORKFLOWS),
(jinja_env.get_template("ios_ci_workflow.yml.j2"), IOS_WORKFLOWS),
(jinja_env.get_template("macos_ci_workflow.yml.j2"), MACOS_WORKFLOWS),
(jinja_env.get_template("docker_builds_ci_workflow.yml.j2"), DOCKER_WORKFLOWS),
(jinja_env.get_template("android_ci_full_workflow.yml.j2"), ANDROID_WORKFLOWS),
(jinja_env.get_template("android_ci_workflow.yml.j2"), ANDROID_SHORT_WORKFLOWS),
]
# Delete the existing generated files first; this should align with the .gitattributes file description.
existing_workflows = GITHUB_DIR.glob("workflows/generated-*")
@ -572,15 +792,14 @@ if __name__ == "__main__":
ciflow_ruleset = CIFlowRuleset()
for template, workflows in template_and_workflows:
# added Iterable check to appease the mypy gods
if not isinstance(workflows, Iterable):
raise Exception(f"How is workflows not iterable? {workflows}")
for workflow in workflows:
workflow.generate_workflow_file(workflow_template=template)
if workflow.ciflow_config.enabled:
ciflow_ruleset.add_label_rule(workflow.ciflow_config.labels, workflow.build_environment)
elif workflow.on_pull_request:
# If ciflow is disabled but still on_pull_request, we can denote
# it as a special label LABEL_CIFLOW_DEFAULT in the ruleset, which will be later
# turned into an actual LABEL_CIFLOW_DEFAULT label in the workflow.
# During the rollout phase, it has the same effect as LABEL_CIFLOW_DEFAULT
ciflow_ruleset.add_label_rule({LABEL_CIFLOW_DEFAULT}, workflow.build_environment)
ciflow_ruleset.add_label_rule(workflow.ciflow_config.labels, workflow.build_environment)
ciflow_ruleset.generate_json()
if __name__ == "__main__":
main()

View File

@ -15,6 +15,9 @@ from typing import Dict
from typing_extensions import TypedDict
BUILD_ENVIRONMENT = os.getenv('BUILD_ENVIRONMENT')
assert BUILD_ENVIRONMENT is not None
class Config(TypedDict):
num_shards: int
runner: str
@ -31,28 +34,63 @@ def get_disabled_issues() -> str:
issue_numbers = [x[4] for x in re.findall(regex, pr_body)]
return ','.join(issue_numbers)
# When the user specifies labels that are NOT ciflow/default, the expectation is
# that the workflows should be triggered as if they are on trunk. For example, when
# ciflow/all is specified, we should run the full test suite for Windows CUDA
# and NOT only the smoke tests.
def run_as_if_on_trunk() -> bool:
ON_PULL_REQUEST = os.getenv('GITHUB_HEAD_REF')
if not ON_PULL_REQUEST:
return True
from pathlib import Path
GITHUB_DIR = Path(__file__).resolve().parent.parent
with open(f'{GITHUB_DIR}/generated-ciflow-ruleset.json') as f:
labels_to_workflows = json.load(f)['label_rules']
pr_labels = json.loads(os.getenv('PR_LABELS', '[]'))
current_workflow_triggered_by_label = False
for label in pr_labels:
if label != 'ciflow/default' and label in labels_to_workflows:
workflows_triggered_by_label = labels_to_workflows[label]
if any([BUILD_ENVIRONMENT in workflow for workflow in workflows_triggered_by_label]):
current_workflow_triggered_by_label = True
break
return current_workflow_triggered_by_label
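# For illustration (data hypothetical): with label_rules == {'ciflow/all': ['win-vs2019-cuda11.3-py3', ...]}
# and PR labels ['ciflow/all'], a job whose BUILD_ENVIRONMENT is 'win-vs2019-cuda11.3-py3' matches a
# non-default ciflow label, so it runs as if on trunk (full shard count rather than smoke tests).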
def main() -> None:
TEST_RUNNER_TYPE = os.getenv('TEST_RUNNER_TYPE')
assert TEST_RUNNER_TYPE is not None
ON_PULL_REQUEST = os.getenv('GITHUB_HEAD_REF')
RUN_SMOKE_TESTS_ONLY_ON_PR = os.getenv('RUN_SMOKE_TESTS_ONLY_ON_PR')
RUN_SMOKE_TESTS = RUN_SMOKE_TESTS_ONLY_ON_PR == "true" and not run_as_if_on_trunk()
NUM_TEST_SHARDS_ON_PULL_REQUEST = os.getenv('NUM_TEST_SHARDS_ON_PULL_REQUEST')
NUM_TEST_SHARDS = int(os.getenv('NUM_TEST_SHARDS', '1'))
if ON_PULL_REQUEST and NUM_TEST_SHARDS_ON_PULL_REQUEST:
NUM_TEST_SHARDS = int(os.getenv('NUM_TEST_SHARDS', '0'))
if not run_as_if_on_trunk() and NUM_TEST_SHARDS_ON_PULL_REQUEST:
NUM_TEST_SHARDS = int(NUM_TEST_SHARDS_ON_PULL_REQUEST)
MULTIGPU_RUNNER_TYPE = os.getenv('MULTIGPU_RUNNER_TYPE')
DISTRIBUTED_GPU_RUNNER_TYPE = os.getenv('DISTRIBUTED_GPU_RUNNER_TYPE', TEST_RUNNER_TYPE)
NOGPU_RUNNER_TYPE = os.getenv('NOGPU_RUNNER_TYPE')
configs: Dict[str, Config] = {}
if os.getenv('ENABLE_JIT_LEGACY_TEST'):
configs['jit_legacy'] = {'num_shards': 1, 'runner': TEST_RUNNER_TYPE}
if MULTIGPU_RUNNER_TYPE is not None and os.getenv('ENABLE_MULTIGPU_TEST'):
configs['multigpu'] = {'num_shards': 1, 'runner': MULTIGPU_RUNNER_TYPE}
if NOGPU_RUNNER_TYPE is not None and os.getenv('ENABLE_NOGPU_NO_AVX_TEST'):
configs['nogpu_NO_AVX'] = {'num_shards': 1, 'runner': NOGPU_RUNNER_TYPE}
if NOGPU_RUNNER_TYPE is not None and os.getenv('ENABLE_NOGPU_NO_AVX2_TEST'):
configs['nogpu_NO_AVX2'] = {'num_shards': 1, 'runner': NOGPU_RUNNER_TYPE}
if NOGPU_RUNNER_TYPE is not None:
if os.getenv('ENABLE_NOGPU_NO_AVX_TEST'):
configs['nogpu_NO_AVX'] = {'num_shards': 1, 'runner': NOGPU_RUNNER_TYPE}
if os.getenv('ENABLE_NOGPU_NO_AVX2_TEST'):
configs['nogpu_NO_AVX2'] = {'num_shards': 1, 'runner': NOGPU_RUNNER_TYPE}
if os.getenv('ENABLE_FORCE_ON_CPU_TEST'):
configs['force_on_cpu'] = {'num_shards': 1, 'runner': NOGPU_RUNNER_TYPE}
if os.getenv('ENABLE_DISTRIBUTED_TEST'):
configs['distributed'] = {'num_shards': 1, 'runner': TEST_RUNNER_TYPE}
configs['distributed'] = {
'num_shards': 1,
'runner': DISTRIBUTED_GPU_RUNNER_TYPE if "cuda" in str(BUILD_ENVIRONMENT) else TEST_RUNNER_TYPE
}
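# For illustration (runner name hypothetical): a CUDA build with ENABLE_DISTRIBUTED_TEST set
# would at this point carry configs == {'distributed': {'num_shards': 1,
# 'runner': DISTRIBUTED_GPU_RUNNER_TYPE}}, falling back to TEST_RUNNER_TYPE for non-CUDA builds.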
if os.getenv('ENABLE_FX2TRT_TEST'):
configs['fx2trt'] = {'num_shards': 1, 'runner': TEST_RUNNER_TYPE}
if os.getenv('ENABLE_SLOW_TEST'):
configs['slow'] = {'num_shards': 1, 'runner': TEST_RUNNER_TYPE}
if os.getenv('ENABLE_DOCS_TEST'):
@ -63,6 +101,8 @@ def main() -> None:
configs['xla'] = {'num_shards': 1, 'runner': TEST_RUNNER_TYPE}
if os.getenv('ENABLE_NOARCH_TEST'):
configs['noarch'] = {'num_shards': 1, 'runner': TEST_RUNNER_TYPE}
if RUN_SMOKE_TESTS:
configs['smoke_tests'] = {'num_shards': 1, 'runner': TEST_RUNNER_TYPE}
matrix = {
'include': [
{

View File

@ -3,7 +3,7 @@
set -eou pipefail
DISTRIBUTION=$(. /etc/os-release;echo $ID$VERSION_ID) \
DRIVER_FN="NVIDIA-Linux-x86_64-460.39.run"
DRIVER_FN="NVIDIA-Linux-x86_64-495.44.run"
YUM_REPO_URL="https://nvidia.github.io/nvidia-docker/${DISTRIBUTION}/nvidia-docker.repo"
install_nvidia_docker2_amzn2() {

.github/scripts/lint_test_ownership.py vendored Executable file
View File

@ -0,0 +1,88 @@
#!/usr/bin/env python3
'''
Test ownership was introduced in https://github.com/pytorch/pytorch/issues/66232.
This lint verifies that every Python test file (any file matching test_*.py or *_test.py in the test folder)
has valid ownership information in a comment header. Valid means:
- The format of the header follows the pattern "# Owner(s): ["list", "of owner", "labels"]"
- Each owner label actually exists in PyTorch
- Each owner label starts with "module: " or "oncall: " or is in ACCEPTABLE_OWNER_LABELS
This file is expected to run in the root directory of pytorch/pytorch.
'''
import boto3 # type: ignore[import]
import botocore # type: ignore[import]
import fnmatch
import json
import sys
from pathlib import Path
from typing import List, Any
# Team/owner labels usually start with "module: " or "oncall: ", but the following are acceptable exceptions
ACCEPTABLE_OWNER_LABELS = ["NNC", "high priority"]
GLOB_EXCEPTIONS = [
"test/run_test.py"
]
PYTORCH_ROOT = Path(__file__).resolve().parent.parent.parent
TEST_DIR = PYTORCH_ROOT / "test"
CURRENT_FILE_NAME = Path(__file__).resolve().relative_to(PYTORCH_ROOT)
S3_RESOURCE_READ_ONLY = boto3.resource("s3", config=botocore.config.Config(signature_version=botocore.UNSIGNED))
def get_all_test_files() -> List[Path]:
test_files = list(TEST_DIR.glob("**/test_*.py"))
test_files.extend(list(TEST_DIR.glob("**/*_test.py")))
return [f for f in test_files if not any([fnmatch.fnmatch(str(f), g) for g in GLOB_EXCEPTIONS])]
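# e.g. run_test.py, listed in GLOB_EXCEPTIONS, is meant to be skipped by the filter above,
# while ordinary test files such as a hypothetical test/test_foo.py are kept for validation.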
def get_pytorch_labels() -> Any:
bucket = S3_RESOURCE_READ_ONLY.Bucket("ossci-metrics")
summaries = bucket.objects.filter(Prefix="pytorch_labels.json")
for summary in summaries:
labels = summary.get()["Body"].read()
return json.loads(labels)
# Returns a string denoting the error invalidating the label OR an empty string if nothing is wrong
def validate_label(label: str, pytorch_labels: List[str]) -> str:
if label not in pytorch_labels:
return f"{label} is not a PyTorch label (please choose from https://github.com/pytorch/pytorch/labels)"
if label.startswith("module:") or label.startswith("oncall:") or label in ACCEPTABLE_OWNER_LABELS:
return ""
return f"{label} is not an acceptable owner (please update to another label or edit ACCEPTABLE_OWNERS_LABELS " \
"in {CURRENT_FILE_NAME}"
# Returns a string denoting the error invalidating the file OR an empty string if nothing is wrong
def validate_file(filename: Path, pytorch_labels: List[str]) -> str:
prefix = "# Owner(s): "
relative_name = Path(filename).relative_to(PYTORCH_ROOT)
with open(filename) as f:
for line in f.readlines():
if line.startswith(prefix):
labels = json.loads(line[len(prefix):])
labels_msgs = [validate_label(label, pytorch_labels) for label in labels]
file_msg = ", ".join([x for x in labels_msgs if x != ""])
return f"{relative_name}: {file_msg}" if file_msg != "" else ""
return f"{relative_name}: missing a comment header with ownership information."
def main() -> None:
test_file_paths = get_all_test_files()
pytorch_labels = get_pytorch_labels()
file_msgs = [validate_file(f, pytorch_labels) for f in test_file_paths]
err_msg = "\n".join([x for x in file_msgs if x != ""])
if err_msg != "":
err_msg = err_msg + "\n\nIf you see files with missing ownership information above, " \
"please add the following line\n\n# Owner(s): [\"<owner: label>\"]\n\nto the top of each test file. " \
"The owner should be an existing pytorch/pytorch label."
print(err_msg)
sys.exit(1)
if __name__ == '__main__':
main()

View File

@ -20,8 +20,6 @@ import subprocess
from typing import List
CUDA_VERSION = "cu102"
PYTHON_VERSION = "3.7"
TORCHBENCH_CONFIG_NAME = "config.yaml"
MAGIC_PREFIX = "RUN_TORCHBENCH:"
MAGIC_TORCHBENCH_PREFIX = "TORCHBENCH_BRANCH:"
@ -45,6 +43,16 @@ def gen_abtest_config(control: str, treatment: str, models: List[str]) -> str:
config = config + "\n"
return config
def setup_gha_env(name: str, val: str) -> None:
fname = os.environ["GITHUB_ENV"]
content = f"{name}={val}\n"
with open(fname, "a") as fo:
fo.write(content)
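# e.g. setup_gha_env('TORCHBENCH_BRANCH', 'main') appends 'TORCHBENCH_BRANCH=main' to the file
# named by $GITHUB_ENV, exporting the variable to subsequent steps of the workflow.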
def find_current_branch(repo_path: str) -> str:
repo = git.Repo(repo_path)
name: str = repo.active_branch.name
return name
def deploy_torchbench_config(output_dir: str, config: str) -> None:
# Create test dir if needed
pathlib.Path(output_dir).mkdir(exist_ok=True)
@ -73,25 +81,18 @@ def extract_models_from_pr(torchbench_path: str, prbody_file: str) -> List[str]:
return []
return model_list
def identify_torchbench_branch(torchbench_path: str, prbody_file: str) -> None:
branch_name: str
def find_torchbench_branch(prbody_file: str) -> str:
branch_name: str = ""
with open(prbody_file, "r") as pf:
lines = map(lambda x: x.strip(), pf.read().splitlines())
magic_lines = list(filter(lambda x: x.startswith(MAGIC_TORCHBENCH_PREFIX), lines))
if magic_lines:
# Only the first magic line will be recognized.
branch_name = magic_lines[0][len(MAGIC_TORCHBENCH_PREFIX):].strip()
# If not specified, directly return without the branch checkout
# If not specified, use main as the default branch
if not branch_name:
return
try:
print(f"Checking out the TorchBench branch: {branch_name} ...")
repo = git.Repo(torchbench_path)
origin = repo.remotes.origin
origin.fetch(branch_name)
repo.create_head(branch_name, origin.refs[branch_name]).checkout()
except git.exc.GitCommandError:
raise RuntimeError(f'{branch_name} doesn\'t exist in the pytorch/benchmark repository. Please double check.')
branch_name = "main"
return branch_name
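# For illustration: a PR body containing the line 'TORCHBENCH_BRANCH:v1.0' (hypothetical)
# yields 'v1.0'; with no magic line present, the function falls back to 'main'.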
def run_torchbench(pytorch_path: str, torchbench_path: str, output_dir: str) -> None:
# Copy system environment so that we will not override
@ -104,28 +105,41 @@ def run_torchbench(pytorch_path: str, torchbench_path: str, output_dir: str) ->
if __name__ == "__main__":
parser = argparse.ArgumentParser(description='Run TorchBench tests based on PR')
parser.add_argument('--pr-num', required=True, type=str, help="The Pull Request number")
parser.add_argument('--pr-base-sha', required=True, type=str, help="The Pull Request base hash")
parser.add_argument('--pr-head-sha', required=True, type=str, help="The Pull Request head hash")
parser.add_argument('--pr-body', required=True, help="The file that contains body of a Pull Request")
parser.add_argument('--pytorch-path', required=True, type=str, help="Path to pytorch repository")
parser.add_argument('--torchbench-path', required=True, type=str, help="Path to TorchBench repository")
subparsers = parser.add_subparsers(dest='command')
# parser for setup the torchbench branch name env
branch_parser = subparsers.add_parser("set-torchbench-branch")
# parser to run the torchbench branch
run_parser = subparsers.add_parser("run")
run_parser.add_argument('--pr-num', required=True, type=str, help="The Pull Request number")
run_parser.add_argument('--pr-base-sha', required=True, type=str, help="The Pull Request base hash")
run_parser.add_argument('--pr-head-sha', required=True, type=str, help="The Pull Request head hash")
run_parser.add_argument('--pytorch-path', required=True, type=str, help="Path to pytorch repository")
run_parser.add_argument('--torchbench-path', required=True, type=str, help="Path to TorchBench repository")
args = parser.parse_args()
output_dir: str = os.path.join(os.environ["HOME"], ".torchbench", "bisection", f"pr{args.pr_num}")
# Identify the specified models and verify the input
models = extract_models_from_pr(args.torchbench_path, args.pr_body)
if not models:
print("Can't parse the model filter from the pr body. Currently we only support allow-list.")
exit(1)
# Identify the specified TorchBench branch, verify that the branch exists, and check out the branch
try:
identify_torchbench_branch(args.torchbench_path, args.pr_body)
except RuntimeError as e:
print(f"Identify TorchBench branch failed: {str(e)}")
exit(1)
print(f"Ready to run TorchBench with benchmark. Result will be saved in the directory: {output_dir}.")
# Run TorchBench with the generated config
torchbench_config = gen_abtest_config(args.pr_base_sha, args.pr_head_sha, models)
deploy_torchbench_config(output_dir, torchbench_config)
run_torchbench(pytorch_path=args.pytorch_path, torchbench_path=args.torchbench_path, output_dir=output_dir)
if args.command == 'set-torchbench-branch':
branch_name = find_torchbench_branch(args.pr_body)
# env name: "TORCHBENCH_BRANCH"
setup_gha_env(MAGIC_TORCHBENCH_PREFIX[:-1], branch_name)
elif args.command == 'run':
output_dir: str = os.path.join(os.environ["HOME"], ".torchbench", "bisection", f"pr{args.pr_num}")
# Identify the specified models and verify the input
models = extract_models_from_pr(args.torchbench_path, args.pr_body)
if not models:
print("Can't parse the model filter from the pr body. Currently we only support allow-list.")
exit(-1)
# Assert that the current branch in args.torchbench_path matches the one specified in the PR body
branch_name = find_torchbench_branch(args.pr_body)
current_branch = find_current_branch(args.torchbench_path)
assert branch_name == current_branch, f"Torchbench repo {args.torchbench_path} is on branch {current_branch}, \
but user specified to run on branch {branch_name}."
print(f"Ready to run TorchBench with benchmark. Result will be saved in the directory: {output_dir}.")
# Run TorchBench with the generated config
torchbench_config = gen_abtest_config(args.pr_base_sha, args.pr_head_sha, models)
deploy_torchbench_config(output_dir, torchbench_config)
run_torchbench(pytorch_path=args.pytorch_path, torchbench_path=args.torchbench_path, output_dir=output_dir)
else:
print(f"The command {args.command} is not supported.")
exit(-1)
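A hypothetical end-to-end invocation of the two subcommands. The script path, PR metadata, and repository paths are all illustrative; the argument split assumes `--pr-body` stays on the top-level parser while the path and SHA flags live under `run`, as the dispatch above suggests:
```
import subprocess

script = ".github/scripts/run_torchbench.py"  # assumed location

# Step 1: export TORCHBENCH_BRANCH for subsequent workflow steps.
subprocess.check_call([
    "python3", script,
    "--pr-body", "pr-body.txt",
    "set-torchbench-branch",
])

# Step 2: run the benchmark once the branch has been checked out.
subprocess.check_call([
    "python3", script,
    "--pr-body", "pr-body.txt",
    "run",
    "--pr-num", "70000",
    "--pr-base-sha", "0000000",
    "--pr-head-sha", "1111111",
    "--pytorch-path", "/tmp/pytorch",
    "--torchbench-path", "/tmp/benchmark",
])
```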


@ -0,0 +1,157 @@
{%- extends "linux_ci_workflow.yml.j2" -%}
{% import 'common_android.yml.j2' as common_android %}
{%- set exclude_test = true -%}
{% block name -%}
# Template is at: .github/templates/android_ci_full_workflow.yml.j2
# Generation script: .github/scripts/generate_ci_workflows.py
name: !{{ build_environment }}
{%- endblock %}
on:
pull_request:
types: [opened, synchronize, reopened, !{{ ciflow_config.trigger_action }}]
{% block build +%}
# building and testing in a single job since bazel runs only a small subset of tests
build-and-test:
runs-on: !{{ test_runner_type }}
env:
JOB_BASE_NAME: !{{ build_environment }}-build-and-test
NUM_TEST_SHARDS: !{{ num_test_shards }}
IS_PROBOT_TRIGGER_EVENT: ${{ (github.event.action == '!{{ ciflow_config.trigger_action }}') && (github.event.assignee.login == '!{{ ciflow_config.trigger_actor }}') }}
LABEL_CONDITIONS: ${{ !{{ ciflow_config.label_conditions }} }}
if: !{{ ciflow_config.root_job_condition }}
steps:
- name: print labels
run: echo "${PR_LABELS}"
!{{ common.setup_ec2_linux() }}
!{{ common.checkout_pytorch("recursive") }}
!{{ common.calculate_docker_image(false) }}
- name: Pull Docker image
run: |
!{{ common.pull_docker("${DOCKER_IMAGE}") }}
- name: Determine shm-size
run: |
shm_size="1g"
case "${BUILD_ENVIRONMENT}" in
*cuda*)
shm_size="2g"
;;
*rocm*)
shm_size="8g"
;;
esac
echo "SHM_SIZE=${shm_size}" >> "${GITHUB_ENV}"
- name: Output disk space left
run: |
sudo df -H
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
!{{ common.parse_ref() }}
!{{ common_android.build_android("pytorch-linux-xenial-py3-clang5-android-ndk-r19c-arm-v7a-build", "arm-v7a") }}
!{{ common_android.build_android("pytorch-linux-xenial-py3-clang5-android-ndk-r19c-arm-v8a-build", "arm-v8a") }}
!{{ common_android.build_android("pytorch-linux-xenial-py3-clang5-android-ndk-r19c-x86_32-build", "x86_32") }}
!{{ common_android.build_android("pytorch-linux-xenial-py3-clang5-android-ndk-r19c-x86_64-build", "x86_64") }}
- name: Build-Final-Artifact
env:
BRANCH: ${{ steps.parse-ref.outputs.branch }}
run: |
set -eux
docker_image_libtorch_android_x86_32="${DOCKER_IMAGE}-x86_32"
docker_image_libtorch_android_x86_64="${DOCKER_IMAGE}-x86_64"
docker_image_libtorch_android_arm_v7a="${DOCKER_IMAGE}-arm-v7a"
docker_image_libtorch_android_arm_v8a="${DOCKER_IMAGE}-arm-v8a"
echo "docker_image_commit: ${DOCKER_IMAGE}"
echo "docker_image_libtorch_android_x86_32: ${docker_image_libtorch_android_x86_32}"
echo "docker_image_libtorch_android_x86_64: ${docker_image_libtorch_android_x86_64}"
echo "docker_image_libtorch_android_arm_v7a: ${docker_image_libtorch_android_arm_v7a}"
echo "docker_image_libtorch_android_arm_v8a: ${docker_image_libtorch_android_arm_v8a}"
# x86_32
time docker pull "${docker_image_libtorch_android_x86_32}" >/dev/null
export id_x86_32
id_x86_32=$(docker run -e GRADLE_OFFLINE=1 --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins "${docker_image_libtorch_android_x86_32}")
# shellcheck disable=SC1105
((echo "sudo chown -R jenkins workspace") | docker exec -u jenkins -i "${id_x86_32}" bash) 2>&1
# arm-v7a
time docker pull "${docker_image_libtorch_android_arm_v7a}" >/dev/null
export id_arm_v7a
id_arm_v7a=$(docker run --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins "${docker_image_libtorch_android_arm_v7a}")
# shellcheck disable=SC1105
((echo "sudo chown -R jenkins workspace") | docker exec -u jenkins -i "${id_arm_v7a}" bash) 2>&1
mkdir -p "${GITHUB_WORKSPACE}/build_android_install_arm_v7a"
docker cp "${id_arm_v7a}:/var/lib/jenkins/workspace/build_android/install" "${GITHUB_WORKSPACE}/build_android_install_arm_v7a"
# x86_64
time docker pull "${docker_image_libtorch_android_x86_64}" >/dev/null
export id_x86_64
id_x86_64=$(docker run --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins "${docker_image_libtorch_android_x86_64}")
# shellcheck disable=SC1105
((echo "sudo chown -R jenkins workspace") | docker exec -u jenkins -i "${id_x86_64}" bash) 2>&1
mkdir -p "${GITHUB_WORKSPACE}/build_android_install_x86_64"
docker cp "${id_x86_64}:/var/lib/jenkins/workspace/build_android/install" "${GITHUB_WORKSPACE}/build_android_install_x86_64"
# arm-v8a
time docker pull "${docker_image_libtorch_android_arm_v8a}" >/dev/null
export id_arm_v8a
id_arm_v8a=$(docker run --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins "${docker_image_libtorch_android_arm_v8a}")
# shellcheck disable=SC1105
((echo "sudo chown -R jenkins workspace") | docker exec -u jenkins -i "$id_arm_v8a" bash) 2>&1
mkdir -p "${GITHUB_WORKSPACE}/build_android_install_arm_v8a"
docker cp "${id_arm_v8a}:/var/lib/jenkins/workspace/build_android/install" "${GITHUB_WORKSPACE}/build_android_install_arm_v8a"
# Putting everything together
docker cp "${GITHUB_WORKSPACE}/build_android_install_arm_v7a" "${id_x86_32}:/var/lib/jenkins/workspace/build_android_install_arm_v7a"
docker cp "${GITHUB_WORKSPACE}/build_android_install_x86_64" "${id_x86_32}:/var/lib/jenkins/workspace/build_android_install_x86_64"
docker cp "${GITHUB_WORKSPACE}/build_android_install_arm_v8a" "${id_x86_32}:/var/lib/jenkins/workspace/build_android_install_arm_v8a"
# run gradle buildRelease
# shellcheck disable=SC1105
((echo "sudo chown -R jenkins workspace && cd workspace && ./.circleci/scripts/build_android_gradle.sh") | docker exec \
-e BUILD_ENVIRONMENT="pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-build" \
-e MAX_JOBS="$(nproc --ignore=2)" \
-e AWS_DEFAULT_REGION \
-e IS_GHA \
-e PR_NUMBER \
-e SHA1 \
-e BRANCH \
-e GITHUB_RUN_ID \
-e SCCACHE_BUCKET \
-e XLA_CLANG_CACHE_S3_BUCKET_NAME \
-e CUSTOM_TEST_ARTIFACT_BUILD_DIR \
-e SKIP_SCCACHE_INITIALIZATION=1 \
-e TORCH_CUDA_ARCH_LIST \
-e PR_LABELS \
-e http_proxy="!{{ common.squid_proxy }}" -e https_proxy="!{{ common.squid_proxy }}" -e no_proxy="!{{ common.squid_no_proxy }}" \
--env-file="/tmp/github_env_${GITHUB_RUN_ID}" \
--user jenkins -i "${id_x86_32}" bash) 2>&1
mkdir -p "${GITHUB_WORKSPACE}/build_android_artifacts"
docker cp "${id_x86_32}:/var/lib/jenkins/workspace/android/artifacts.tgz" "${GITHUB_WORKSPACE}/build_android_artifacts/"
output_image="${DOCKER_IMAGE}-android-x86_32-gradle"
docker commit "${id_x86_32}" "${output_image}"
time docker push "${output_image}"
!{{ common_android.upload_android_binary_size("prebuilt", "${GITHUB_WORKSPACE}/build_android_artifacts/artifacts.tgz") }}
- uses: !{{ common.upload_artifact_s3_action }}
name: Store PyTorch Android Build Artifacts on S3
with:
name: ${{ env.BUILD_ENVIRONMENT }}
retention-days: 14
if-no-files-found: error
path:
build_android_artifacts/artifacts.tgz
!{{ common.teardown_ec2_linux() }}
{%- endblock %}


@ -0,0 +1,103 @@
{%- extends "linux_ci_workflow.yml.j2" -%}
{% import 'common_android.yml.j2' as common_android %}
{%- set exclude_test = true -%}
{% block name -%}
# Template is at: .github/templates/android_ci_workflow.yml.j2
# Generation script: .github/scripts/generate_ci_workflows.py
name: !{{ build_environment }}
{%- endblock %}
on:
pull_request:
types: [opened, synchronize, reopened, !{{ ciflow_config.trigger_action }}]
{% block build +%}
# building and testing in a single job since bazel runs only a small subset of tests
build-and-test:
runs-on: !{{ test_runner_type }}
env:
JOB_BASE_NAME: !{{ build_environment }}-build-and-test
NUM_TEST_SHARDS: !{{ num_test_shards }}
IS_PROBOT_TRIGGER_EVENT: ${{ (github.event.action == '!{{ ciflow_config.trigger_action }}') && (github.event.assignee.login == '!{{ ciflow_config.trigger_actor }}') }}
LABEL_CONDITIONS: ${{ !{{ ciflow_config.label_conditions }} }}
if: !{{ ciflow_config.root_job_condition }}
steps:
- name: print labels
run: echo "${PR_LABELS}"
!{{ common.setup_ec2_linux() }}
!{{ common.checkout_pytorch("recursive") }}
!{{ common.calculate_docker_image(false) }}
- name: Pull Docker image
run: |
!{{ common.pull_docker("${DOCKER_IMAGE}") }}
- name: Determine shm-size
run: |
shm_size="1g"
case "${BUILD_ENVIRONMENT}" in
*cuda*)
shm_size="2g"
;;
*rocm*)
shm_size="8g"
;;
esac
echo "SHM_SIZE=${shm_size}" >> "${GITHUB_ENV}"
- name: Output disk space left
run: |
sudo df -H
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Build
run: |
set -e
# Unlike other gradle jobs, it's not worth building libtorch in a separate CI job and sharing it via docker, because:
# 1) Not shareable: it's a custom selective build, which differs from the default libtorch mobile build;
# 2) Not parallelizable by architecture: it only builds libtorch for one architecture;
echo "DOCKER_IMAGE: ${DOCKER_IMAGE}"
time docker pull "${DOCKER_IMAGE}" >/dev/null
export BUILD_LITE_INTERPRETER
BUILD_LITE_INTERPRETER="1"
if [[ "${BUILD_ENVIRONMENT}" == *"full-jit" ]]; then
BUILD_LITE_INTERPRETER="0"
fi
git submodule sync && git submodule update -q --init --recursive --depth 1 --jobs 0
# shellcheck disable=SC2016
export id
id=$(docker run -e BUILD_ENVIRONMENT \
-e JOB_BASE_NAME \
-e MAX_JOBS="$(nproc --ignore=2)" \
-e SCCACHE_BUCKET \
-e CUSTOM_TEST_ARTIFACT_BUILD_DIR \
-e PR_LABELS \
-e SKIP_SCCACHE_INITIALIZATION=1 \
-e TORCH_CUDA_ARCH_LIST \
-e BUILD_LITE_INTERPRETER \
-e http_proxy="!{{ common.squid_proxy }}" -e https_proxy="!{{ common.squid_proxy }}" -e no_proxy="!{{ common.squid_no_proxy }}" \
--env-file="/tmp/github_env_${GITHUB_RUN_ID}" \
--security-opt seccomp=unconfined \
--cap-add=SYS_PTRACE \
--tty \
--detach \
--user jenkins \
-v "$(pwd):/var/lib/jenkins/workspace" \
-w /var/lib/jenkins "${DOCKER_IMAGE}")
# shellcheck disable=SC2016
export COMMAND
# shellcheck disable=SC2016
COMMAND='((echo "export GRADLE_OFFLINE=1" && echo "export BUILD_LITE_INTERPRETER=${BUILD_LITE_INTERPRETER}" && echo "sudo chown -R jenkins workspace && cd workspace && ./.circleci/scripts/build_android_gradle.sh") | docker exec -u jenkins -i "$id" bash) 2>&1'
echo "${COMMAND}" > ./command.sh && bash ./command.sh
# Skip docker push as this job is purely for size analysis purpose.
# Result binaries are already in `/home/circleci/project/` as it's mounted instead of copied.
!{{ common.parse_ref() }}
!{{ common_android.upload_android_binary_size("custom-build-single", "") }}
!{{ common.teardown_ec2_linux() }}
{%- endblock %}
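The `(echo "cmd") | docker exec -i ... bash` idiom used in both android jobs above feeds a script into a shell inside the detached container over stdin. A rough Python equivalent of the same pattern; the container id and command are illustrative:
```
import subprocess

def exec_in_container(container_id: str, script: str) -> None:
    # Pipe the script into bash running inside the detached container.
    subprocess.run(
        ["docker", "exec", "-u", "jenkins", "-i", container_id, "bash"],
        input=script.encode(),
        check=True,
    )

exec_in_container("abc123", "sudo chown -R jenkins workspace && cd workspace && ./build.sh")
```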


@ -1,4 +1,5 @@
{%- extends "linux_ci_workflow.yml.j2" -%}
{% import 'common_android.yml.j2' as common_android %}
{%- set exclude_test = true -%}
{% block name -%}
# Template is at: .github/templates/bazel_ci_workflow.yml.j2
@ -7,35 +8,28 @@ name: !{{ build_environment }}
{%- endblock %}
on:
{%- if on_pull_request %}
pull_request:
{%- if ciflow_config.enabled %}
{%- if ciflow_config.trigger_action_only %}
types: [!{{ ciflow_config.trigger_action }}]
{%- else %}
types: [opened, synchronize, reopened, !{{ ciflow_config.trigger_action }}]
{%- endif %}
{%- endif %}
{%- else %}
# TODO: Enable pull_request builds when we can verify capacity can be met by auto-scalers
{%- endif %}
{% block build +%}
# building and testing in a single job since bazel runs only a small subset of tests
build-and-test:
runs-on: !{{ test_runner_type }}
needs: [calculate-docker-image, !{{ ciflow_config.root_job_name }}]
env:
DOCKER_IMAGE: ${{ needs.calculate-docker-image.outputs.docker_image }}
JOB_BASE_NAME: !{{ build_environment }}-build-and-test
NUM_TEST_SHARDS: !{{ num_test_shards }}
CONTINUE_THROUGH_ERROR: ${{ github.repository == 'pytorch/pytorch' && (github.event_name == 'push' || github.event_name == 'schedule') }}
IS_PROBOT_TRIGGER_EVENT: ${{ (github.event.action == '!{{ ciflow_config.trigger_action }}') && (github.event.assignee.login == '!{{ ciflow_config.trigger_actor }}') }}
LABEL_CONDITIONS: ${{ !{{ ciflow_config.label_conditions }} }}
if: !{{ ciflow_config.root_job_condition }}
steps:
- name: print labels
run: echo "${PR_LABELS}"
!{{ common.setup_ec2_linux() }}
!{{ common.checkout_pytorch("recursive") }}
- name: Pull docker image
!{{ common.calculate_docker_image(false) }}
- name: Pull Docker image
run: |
docker pull "${DOCKER_IMAGE}"
!{{ common.pull_docker("${DOCKER_IMAGE}") }}
- name: Determine shm-size
run: |
shm_size="1g"
@ -79,23 +73,10 @@ on:
)
docker exec -t "${container_name}" sh -c 'sudo chown -R jenkins . && sudo chown -R jenkins /dev && .jenkins/pytorch/build.sh'
!{{ common.parse_ref() }}
- name: Display and upload binary build size statistics (Click Me)
# temporary hack: set CIRCLE_* vars, until we update
# tools/stats/print_test_stats.py to natively support GitHub Actions
env:
AWS_DEFAULT_REGION: us-east-1
SCRIBE_GRAPHQL_ACCESS_TOKEN: ${{ secrets.SCRIBE_GRAPHQL_ACCESS_TOKEN }}
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
CIRCLE_TAG: ${{ steps.parse-ref.outputs.tag }}
CIRCLE_WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
run: |
COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0)
export COMMIT_TIME
pip3 install requests==2.26
python3 -m tools.stats.upload_binary_size_to_scuba || exit 0
!{{ common_android.upload_android_binary_size("", "") }}
- name: Test
# Time out the test phase after 3.5 hours
timeout-minutes: 210
run: |
# detached container should get cleaned up by teardown_ec2_linux
export SHARD_NUMBER=0
@ -108,10 +89,10 @@ on:
-e GITHUB_ACTIONS \
-e IN_CI \
-e SHARD_NUMBER \
-e NUM_TEST_SHARDS \
-e JOB_BASE_NAME \
-e MAX_JOBS="$(nproc --ignore=2)" \
-e SCCACHE_BUCKET \
-e CONTINUE_THROUGH_ERROR \
-e PR_LABELS \
-e http_proxy="!{{ common.squid_proxy }}" -e https_proxy="!{{ common.squid_proxy }}" -e no_proxy="!{{ common.squid_no_proxy }}" \
--env-file="/tmp/github_env_${GITHUB_RUN_ID}" \
@ -132,6 +113,7 @@ on:
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
!{{ common.upload_test_reports(name='bazel') }}
!{{ common.upload_downloaded_files(name='bazel') }}
!{{ common.upload_test_statistics(build_environment) }}
!{{ common.teardown_ec2_linux() }}
{%- endblock %}


@ -4,6 +4,7 @@
{%- set squid_proxy = "http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -%}
{# squid_no_proxy is a list of common set of fixed domains or IPs that we don't need to proxy. See https://docs.aws.amazon.com/AmazonECS/latest/developerguide/http_proxy_config.html#windows-proxy #}
{%- set squid_no_proxy = "localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" -%}
{%- set timeout_minutes = 240 -%}
{%- macro concurrency(build_environment) -%}
concurrency:
@ -11,6 +12,13 @@ concurrency:
cancel-in-progress: true
{%- endmacro -%}
{%- macro pull_docker(image) -%}
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
retry docker pull "!{{ image }}"
{%- endmacro -%}
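The `retry` helper above re-runs the command up to three times, sleeping 1s and then 2s between attempts. A rough Python equivalent of the same pattern, for illustration:
```
import subprocess
import time

def retry(cmd, attempts=3):
    """Re-run cmd up to `attempts` times, sleeping 1s, then 2s, between tries."""
    for i in range(attempts):
        try:
            subprocess.check_call(cmd)
            return
        except subprocess.CalledProcessError:
            if i == attempts - 1:
                raise
            time.sleep(i + 1)

retry(["docker", "pull", "alpine:latest"])  # image is illustrative
```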
{%- macro display_ec2_information() -%}
- name: Display EC2 information
shell: bash
@ -33,23 +41,23 @@ concurrency:
run: .github/scripts/parse_ref.py
{%- endmacro -%}
{%- macro upload_test_statistics(build_environment) -%}
{%- macro upload_test_statistics(build_environment, when="always()") -%}
- name: Display and upload test statistics (Click Me)
if: always()
if: !{{ when }}
# temporary hack: set CIRCLE_* vars, until we update
# tools/stats/print_test_stats.py to natively support GitHub Actions
env:
AWS_DEFAULT_REGION: us-east-1
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
BRANCH: ${{ steps.parse-ref.outputs.branch }}
JOB_BASE_NAME: !{{ build_environment }}-test
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
CIRCLE_TAG: ${{ steps.parse-ref.outputs.tag }}
CIRCLE_WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
PR_NUMBER: ${{ github.event.pull_request.number }}
SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
TAG: ${{ steps.parse-ref.outputs.tag }}
WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
shell: bash
run: |
python3 -m pip install -r requirements.txt
python3 -m pip install boto3==1.16.34
python3 -m pip install boto3==1.19.12
python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test
{%- endmacro -%}
@ -60,18 +68,13 @@ concurrency:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\")
aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \
--password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com"
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
!{{ pull_docker("${ALPINE_IMAGE}") }}
# Ensure the working directory gets chowned back to the current user
retry docker pull "${ALPINE_IMAGE}"
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
@ -93,8 +96,6 @@ concurrency:
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Chown workspace
if: always()
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
@ -115,14 +116,56 @@ concurrency:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: !{{ submodules }}
- name: Clean PyTorch checkout
run: |
# Remove any artifacts from the previous checkouts
git clean -fxd
{%- endmacro -%}
{%- macro upload_downloaded_files(name, artifact_name="", use_s3=True, when="always()") -%}
- name: Zip JSONs for upload
if: !{{ when }}
env:
{%- if name == 'linux' or name == 'windows' or name == 'macos' %}
FILE_SUFFIX: '${{ github.job }}-${{ matrix.config }}-${{ matrix.shard }}-${{ matrix.num_shards }}-${{ matrix.runner }}'
{%- else %}
FILE_SUFFIX: '!{{ name }}-${{ github.job }}'
{%- endif %}
{%- if name == 'windows' %}
shell: powershell
run: |
# -ir => recursive include all files in pattern
7z a "test-jsons-$Env:FILE_SUFFIX.zip" -ir'!test\*.json'
{%- else %}
run: |
# Remove any previous test jsons if they exist
rm -f test-jsons-*.zip
zip -r "test-jsons-${FILE_SUFFIX}.zip" test -i '*.json'
{%- endif %}
{%- if use_s3 %}
- uses: !{{ upload_artifact_s3_action }}
name: Store Test Downloaded JSONs on S3
{%- else %}
- uses: actions/upload-artifact@v2
name: Store Test Downloaded JSONs on Github
{%- endif %}
if: !{{ when }}
with:
{%- if artifact_name != "" %}
name: !{{ artifact_name }}
{%- endif %}
retention-days: 14
if-no-files-found: warn
path:
test-jsons-*.zip
{%- endmacro -%}
{%- macro upload_test_reports(name) -%}
{%- macro upload_test_reports(name, artifact_name="", use_s3=True) -%}
- name: Zip test reports for upload
if: always()
env:
{%- if name == 'linux' or name == 'windows' %}
{%- if name == 'linux' or name == 'windows' or name == 'macos' %}
FILE_SUFFIX: '${{ github.job }}-${{ matrix.config }}-${{ matrix.shard }}-${{ matrix.num_shards }}-${{ matrix.runner }}'
{%- else %}
FILE_SUFFIX: '!{{ name }}-${{ github.job }}'
@ -138,35 +181,22 @@ concurrency:
rm -f test-reports-*.zip
zip -r "test-reports-${FILE_SUFFIX}.zip" test -i '*.xml'
{%- endif %}
- uses: actions/upload-artifact@v2
name: Store Test Reports
if: always()
with:
{%- if name == 'linux' or name == 'windows' %}
name: test-reports-${{ matrix.config }}
{%- else %}
name: test-reports-!{{ name }}
{%- endif %}
retention-days: 14
if-no-files-found: error
path:
{%- if name == 'windows' %}
pytorch-${{ github.run_id }}/test-reports-*.zip
{%- else %}
test-reports-*.zip
{%- endif %}
{%- if use_s3 %}
- uses: !{{ upload_artifact_s3_action }}
name: Store Test Reports on S3
{%- else %}
- uses: actions/upload-artifact@v2
name: Store Test Reports on Github
{%- endif %}
if: always()
with:
{%- if artifact_name != "" %}
name: !{{ artifact_name }}
{%- endif %}
retention-days: 14
if-no-files-found: error
path:
{%- if name == 'windows' %}
pytorch-${{ github.run_id }}/test-reports-*.zip
{%- else %}
test-reports-*.zip
{%- endif %}
{%- endmacro -%}
{%- macro render_test_results() -%}
@ -184,3 +214,71 @@ concurrency:
run: |
python3 tools/render_junit.py test/
{%- endmacro -%}
{%- macro calculate_docker_image(always_rebuild) -%}
- name: Calculate docker image tag
id: calculate-tag
run: |
DOCKER_TAG=$(git rev-parse HEAD:.circleci/docker)
echo "DOCKER_TAG=${DOCKER_TAG}" >> "${GITHUB_ENV}"
echo "DOCKER_IMAGE=${DOCKER_IMAGE_BASE}:${DOCKER_TAG}" >> "${GITHUB_ENV}"
echo "::set-output name=docker_tag::${DOCKER_TAG}"
echo "::set-output name=docker_image::${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"
- name: Check if image should be built
id: check
env:
BASE_REVISION: ${{ github.event.pull_request.base.sha || github.sha }}
run: |
set -x
{%- if not always_rebuild %}
# Check if image already exists, if it does then skip building it
if docker manifest inspect "${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"; then
exit 0
fi
if [[ "$BASE_REVISION" = "$(git rev-parse HEAD)" ]]; then
# if we're on the base branch then use the parent commit
MERGE_BASE=$(git rev-parse HEAD~)
else
# otherwise we're on a PR, so use the most recent base commit
MERGE_BASE=$(git merge-base HEAD "$BASE_REVISION")
fi
# Covers the case where a previous tag doesn't exist for the tree
# this is only really applicable on trees that don't have `.circleci/docker` at its merge base, i.e. nightly
if ! git rev-parse "$MERGE_BASE:.circleci/docker"; then
echo "Directory '.circleci/docker' not found in commit $MERGE_BASE, you should probably rebase onto a more recent commit"
exit 1
fi
PREVIOUS_DOCKER_TAG=$(git rev-parse "$MERGE_BASE:.circleci/docker")
# If no image exists but the hash is the same as the previous hash then we should error out here
if [[ "${PREVIOUS_DOCKER_TAG}" = "${DOCKER_TAG}" ]]; then
echo "ERROR: Something has gone wrong and the previous image isn't available for the merge-base of your branch"
echo " contact the PyTorch team to restore the original images"
exit 1
fi
{%- endif %}
echo ::set-output name=rebuild::yes
- name: Build and push docker image
if: ${{ steps.check.outputs.rebuild }}
env:
DOCKER_SKIP_S3_UPLOAD: 1
working-directory: .circleci/docker
run: |
export IMAGE_NAME=${DOCKER_IMAGE_BASE#308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/}
./build_docker.sh
{%- endmacro -%}
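The rebuild check in `calculate_docker_image` reduces to: skip the build if an image already exists for the current `.circleci/docker` tree hash, and fail loudly if that hash matches the merge-base's yet the image is missing (the original image was lost). A hedged Python sketch of that decision, with the `git` and `docker` invocations wrapped in small helpers:
```
import subprocess

def git(*args: str) -> str:
    return subprocess.check_output(["git", *args], text=True).strip()

def image_exists(image: str) -> bool:
    return subprocess.run(
        ["docker", "manifest", "inspect", image], capture_output=True
    ).returncode == 0

def should_rebuild(image_base: str, base_revision: str) -> bool:
    tag = git("rev-parse", "HEAD:.circleci/docker")
    if image_exists(f"{image_base}:{tag}"):
        return False  # image for this docker tree is already pushed
    if base_revision == git("rev-parse", "HEAD"):
        merge_base = git("rev-parse", "HEAD~")  # on the base branch: use the parent
    else:
        merge_base = git("merge-base", "HEAD", base_revision)
    previous_tag = git("rev-parse", f"{merge_base}:.circleci/docker")
    if previous_tag == tag:
        raise RuntimeError("previous image missing for an unchanged .circleci/docker tree")
    return True
```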
{%- macro setup_miniconda(python_version) -%}
- name: Setup miniconda
uses: conda-incubator/setup-miniconda@v2
with:
auto-update-conda: true
python-version: !{{ python_version }}
activate-environment: build
{%- endmacro -%}
{%- macro set_xcode_version(xcode_version) -%}
{%- if xcode_version != '' %}
# Set Xcode version to !{{ xcode_version }}
DEVELOPER_DIR: /Applications/Xcode_!{{ xcode_version }}.app/Contents/Developer
{%- endif %}
{%- endmacro -%}

.github/templates/common_android.yml.j2

@ -0,0 +1,81 @@
{% import 'common.yml.j2' as common %}
{%- macro upload_android_binary_size(build_type, artifacts) -%}
- name: Display and upload binary build size statistics (Click Me)
# temporary hack: set CIRCLE_* vars, until we update
# tools/stats/print_test_stats.py to natively support GitHub Actions
env:
AWS_DEFAULT_REGION: us-east-1
SCRIBE_GRAPHQL_ACCESS_TOKEN: ${{ secrets.SCRIBE_GRAPHQL_ACCESS_TOKEN }}
BRANCH: ${{ steps.parse-ref.outputs.branch }}
PR_NUMBER: ${{ github.event.pull_request.number }}
SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
TAG: ${{ steps.parse-ref.outputs.tag }}
WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
run: |
# The artifact file is created inside the docker container and contains the result binaries.
# Unpack it into the project folder; the subsequent script scans the project folder
# to locate the result binaries and report their sizes.
# If no artifact file is provided, the project folder is assumed to have been mounted
# into the container during the build and to already contain the result binaries,
# so this step can be skipped.
export ARTIFACTS=!{{ artifacts }}
if [ -n "${ARTIFACTS}" ]; then
tar xf "${ARTIFACTS}" -C "${GITHUB_WORKSPACE}"
cd "${GITHUB_WORKSPACE}"
fi
COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0)
export COMMIT_TIME
ANDROID_BUILD_TYPE=!{{ build_type }}
export ANDROID_BUILD_TYPE
pip3 install requests==2.26 boto3==1.16.34
python3 -m tools.stats.upload_binary_size_to_scuba "android" || exit 0
{%- endmacro -%}
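The artifact guard in `upload_android_binary_size` only unpacks when a tarball path was passed in; an empty string means the binaries were mounted into the workspace during the build. A rough Python equivalent of that guard:
```
import os
import tarfile

def unpack_artifacts_if_given(artifacts: str, workspace: str) -> None:
    # An empty string means the workspace was mounted during the build
    # and already holds the result binaries, so there is nothing to do.
    if not artifacts:
        return
    with tarfile.open(artifacts) as tf:
        tf.extractall(workspace)
    os.chdir(workspace)
```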
{%- macro build_android(env_name, container_suffix) -%}
- name: Build-!{{ container_suffix }}
env:
BRANCH: ${{ steps.parse-ref.outputs.branch }}
run: |
# detached container should get cleaned up by teardown_ec2_linux
#!/bin/bash -eo pipefail
# Pull Docker image and run build
time docker pull "${DOCKER_IMAGE}" >/dev/null
echo "${DOCKER_IMAGE}"
export container_name
container_name=$(docker run \
-e BUILD_ENVIRONMENT=!{{ env_name }} \
-e JOB_BASE_NAME \
-e MAX_JOBS="$(nproc --ignore=2)" \
-e AWS_DEFAULT_REGION \
-e IS_GHA \
-e PR_NUMBER \
-e SHA1 \
-e BRANCH \
-e GITHUB_RUN_ID \
-e SCCACHE_BUCKET \
-e XLA_CLANG_CACHE_S3_BUCKET_NAME \
-e CUSTOM_TEST_ARTIFACT_BUILD_DIR \
-e SKIP_SCCACHE_INITIALIZATION=1 \
-e TORCH_CUDA_ARCH_LIST \
-e PR_LABELS \
-e http_proxy="!{{ common.squid_proxy }}" -e https_proxy="!{{ common.squid_proxy }}" -e no_proxy="!{{ common.squid_no_proxy }}" \
--env-file="/tmp/github_env_${GITHUB_RUN_ID}" \
--security-opt seccomp=unconfined \
--cap-add=SYS_PTRACE \
--tty \
--detach \
--user jenkins \
-w /var/lib/jenkins/workspace \
"${DOCKER_IMAGE}"
)
git submodule sync && git submodule update -q --init --recursive --depth 1 --jobs 0
docker cp "${GITHUB_WORKSPACE}/." "${container_name}:/var/lib/jenkins/workspace"
# shellcheck disable=SC1105
((echo "sudo chown -R jenkins . && .jenkins/pytorch/build.sh && find ${BUILD_ROOT} -type f -name "*.a" -or -name "*.o" -delete") | docker exec -u jenkins -i "${container_name}" bash) 2>&1
# Copy dist folder back
export COMMIT_DOCKER_IMAGE=${DOCKER_IMAGE}-!{{ container_suffix }}
docker cp "${container_name}:/var/lib/jenkins/workspace/dist" "${GITHUB_WORKSPACE}/." || echo "Dist folder not found"
docker commit "${container_name}" "${COMMIT_DOCKER_IMAGE}"
time docker push "${COMMIT_DOCKER_IMAGE}"
{%- endmacro -%}


@ -0,0 +1,59 @@
{% import 'common.yml.j2' as common %}
{%- block name -%}
# Template is at: .github/templates/docker_builds_ci_workflow.yml.j2
# Generation script: .github/scripts/generate_ci_workflows.py
name: !{{ build_environment }}
{%- endblock %}
on:
workflow_dispatch:
pull_request:
types: [opened, synchronize, reopened]
paths:
- '.circleci/docker/**'
- '.github/workflows/generated-docker-builds.yml'
{%- if is_scheduled %}
schedule:
- cron: !{{ is_scheduled }}
{%- endif %}
!{{ common.concurrency(build_environment) }}
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
AWS_DEFAULT_REGION: us-east-1
jobs:
{% block docker_build +%}
docker-build:
runs-on: linux.2xlarge
timeout-minutes: !{{ common.timeout_minutes }}
strategy:
matrix:
include:
{%- for docker_image in docker_images %}
- docker_image_base: '!{{ docker_image }}'
docker_image_short_name: '!{{ docker_image.split('/')[-1] }}'
{%- endfor %}
env:
DOCKER_IMAGE_BASE: '${{ matrix.docker_image_base }}'
name: docker-build (${{ matrix.docker_image_short_name }})
steps:
!{{ common.setup_ec2_linux() }}
!{{ common.checkout_pytorch("recursive") }}
!{{ common.calculate_docker_image(true) }}
- name: Pull Docker image
run: |
!{{ common.pull_docker("${DOCKER_IMAGE}") }}
!{{ common.parse_ref() }}
!{{ common.teardown_ec2_linux() }}
- name: Hold runner for 2 hours or until ssh sessions have drained
# Always hold for active ssh sessions
if: always()
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Clean up docker images
if: always()
run: |
# Prune all of the docker images
docker system prune -af
{%- endblock %}


@ -0,0 +1,86 @@
{% import 'common.yml.j2' as common %}
{%- block name -%}
# Template is at: .github/templates/ios_ci_workflow.yml.j2
# Generation script: .github/scripts/generate_ci_workflows.py
name: !{{ build_environment }}
{%- endblock %}
on:
pull_request:
types: [opened, synchronize, reopened, !{{ ciflow_config.trigger_action }}]
{%- if is_scheduled %}
schedule:
- cron: !{{ is_scheduled }}
{%- else %}
push:
branches:
- master
- release/*
{%- endif %}
workflow_dispatch:
# For setup-miniconda, see https://github.com/conda-incubator/setup-miniconda/issues/179
defaults:
run:
shell: bash -x -e -l {0}
env:
BUILD_ENVIRONMENT: !{{ build_environment }}
IN_CI: 1
IS_GHA: 1
!{{ common.set_xcode_version(xcode_version) }}
jobs:
{% block build +%}
build:
runs-on: macos-10.15
timeout-minutes: !{{ common.timeout_minutes }}
env:
JOB_BASE_NAME: !{{ build_environment }}-build
IOS_CERT_KEY_2022: ${{ secrets.IOS_CERT_KEY_2022 }}
IOS_SIGN_KEY_2022: ${{ secrets.IOS_SIGN_KEY_2022 }}
IS_PROBOT_TRIGGER_EVENT: ${{ (github.event.action == '!{{ ciflow_config.trigger_action }}') && (github.event.assignee.login == '!{{ ciflow_config.trigger_actor }}') }}
LABEL_CONDITIONS: ${{ !{{ ciflow_config.label_conditions }} }}
PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
if: !{{ ciflow_config.root_job_condition }}
steps:
- name: print labels
run: echo "${PR_LABELS}"
!{{ common.checkout_pytorch("recursive") }}
!{{ common.setup_miniconda("3.8") }}
- name: Install ios / conda Dependencies
run: |
# Install dependencies
brew install libtool
conda install numpy ninja pyyaml mkl mkl-include setuptools cmake cffi requests typing_extensions --yes
- name: Run Fastlane
shell: bash -e {0}
run: |
set -x
cd ios/TestApp
# install fastlane
sudo gem install bundler && bundle install
# install certificates
echo "${IOS_CERT_KEY_2022}" >> cert.txt
base64 --decode cert.txt -o Certificates.p12
rm cert.txt
bundle exec fastlane install_root_cert
bundle exec fastlane install_dev_cert
# install the provisioning profile
PROFILE=PyTorch_CI_2022.mobileprovision
PROVISIONING_PROFILES=~/Library/MobileDevice/Provisioning\ Profiles
mkdir -pv "${PROVISIONING_PROFILES}"
cd "${PROVISIONING_PROFILES}"
echo "${IOS_SIGN_KEY_2022}" >> cert.txt
base64 --decode cert.txt -o ${PROFILE}
rm cert.txt
- name: Build
run: |
export TCLLIBPATH="/usr/local/lib"
python -VV
export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname "$(which conda)")/../"}
scripts/build_ios.sh
{% endblock +%}
!{{ common.concurrency(build_environment) }}
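The certificate-install step above decodes a base64 secret into a `.p12` file before handing it to fastlane. A minimal Python sketch of that decode step; the secret name is taken from the workflow, and the output path mirrors the one used there:
```
import base64
import os
from pathlib import Path

def write_cert(secret_env: str, out_path: str) -> None:
    encoded = os.environ[secret_env]  # e.g. IOS_CERT_KEY_2022
    Path(out_path).write_bytes(base64.b64decode(encoded))

write_cert("IOS_CERT_KEY_2022", "Certificates.p12")
```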


@ -7,18 +7,8 @@ name: !{{ build_environment }}
{%- endblock %}
on:
{%- if on_pull_request %}
pull_request:
{%- if ciflow_config.enabled %}
{%- if ciflow_config.trigger_action_only %}
types: [!{{ ciflow_config.trigger_action }}]
{%- else %}
types: [opened, synchronize, reopened, !{{ ciflow_config.trigger_action }}]
{%- endif %}
{%- endif %}
{%- else %}
# TODO: Enable pull_request builds when we can verify capacity can be met by auto-scalers
{%- endif %}
{%- if is_scheduled %}
schedule:
@ -28,6 +18,7 @@ on:
branches:
- master
- release/*
- fbsync
{%- endif %}
workflow_dispatch:
@ -38,6 +29,7 @@ env:
XLA_CLANG_CACHE_S3_BUCKET_NAME: ossci-compiler-clang-cache-circleci-xla
TORCH_CUDA_ARCH_LIST: 5.2
IN_CI: 1
IS_GHA: 1
# This is used for the phase of adding wheel tests only, will be removed once completed
IN_WHEEL_TEST: 1
# Used for custom_opertor, jit_hooks, custom_backend, see .jenkins/pytorch/build.sh
@ -45,101 +37,52 @@ env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
AWS_DEFAULT_REGION: us-east-1
PR_NUMBER: ${{ github.event.pull_request.number }}
SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
PYTORCH_RETRY_TEST_CASES: 1
{%- if build_with_debug %}
DEBUG: 1
{%- endif %}
!{{ common.concurrency(build_environment) }}
jobs:
{%- if ciflow_config.enabled %}
!{{ ciflow_config.root_job_name }}:
runs-on: ubuntu-18.04
if: ${{ !{{ ciflow_config.root_job_condition }} }}
env:
LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
steps:
- name: noop
run: echo running !{{ ciflow_config.root_job_name }}
- name: print labels
run: echo "${LABELS}"
{%- endif %}
calculate-docker-image:
runs-on: linux.2xlarge
{%- if ciflow_config.enabled %}
needs: [!{{ ciflow_config.root_job_name }}]
{%- endif %}
env:
DOCKER_BUILDKIT: 1
timeout-minutes: 90
outputs:
docker_image: ${{ steps.calculate-tag.outputs.docker_image }}
steps:
!{{ common.setup_ec2_linux() }}
!{{ common.checkout_pytorch("false") }}
- name: Calculate docker image tag
id: calculate-tag
run: |
DOCKER_TAG=$(git rev-parse HEAD:.circleci/docker)
echo "::set-output name=docker_tag::${DOCKER_TAG}"
echo "::set-output name=docker_image::${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"
- name: Check if image should be built
id: check
env:
DOCKER_TAG: ${{ steps.calculate-tag.outputs.docker_tag }}
BASE_REVISION: ${{ github.event.pull_request.base.sha || github.sha }}
run: |
set -x
# Check if image already exists, if it does then skip building it
if docker manifest inspect "${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"; then
exit 0
fi
if [[ "$BASE_REVISION" = "$(git rev-parse HEAD)" ]]; then
# if we're on the base branch then use the parent commit
MERGE_BASE=$(git rev-parse HEAD~)
else
# otherwise we're on a PR, so use the most recent base commit
MERGE_BASE=$(git merge-base HEAD "$BASE_REVISION")
fi
# Covers the case where a previous tag doesn't exist for the tree
# this is only really applicable on trees that don't have `.circleci/docker` at its merge base, i.e. nightly
if ! git rev-parse "$MERGE_BASE:.circleci/docker"; then
echo "Directory '.circleci/docker' not found in commit $MERGE_BASE, you should probably rebase onto a more recent commit"
exit 1
fi
PREVIOUS_DOCKER_TAG=$(git rev-parse "$MERGE_BASE:.circleci/docker")
# If no image exists but the hash is the same as the previous hash then we should error out here
if [[ "${PREVIOUS_DOCKER_TAG}" = "${DOCKER_TAG}" ]]; then
echo "ERROR: Something has gone wrong and the previous image isn't available for the merge-base of your branch"
echo " contact the PyTorch team to restore the original images"
exit 1
fi
echo ::set-output name=rebuild::yes
- name: Build and push docker image
if: ${{ steps.check.outputs.rebuild }}
env:
DOCKER_TAG: ${{ steps.calculate-tag.outputs.docker_tag }}
DOCKER_SKIP_S3_UPLOAD: 1
run: |
export IMAGE_NAME=${DOCKER_IMAGE_BASE#308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/}
cd .circleci/docker && ./build_docker.sh
{% block build +%}
build:
runs-on: linux.2xlarge
needs: [calculate-docker-image, !{{ ciflow_config.root_job_name }}]
timeout-minutes: !{{ common.timeout_minutes }}
if: !{{ ciflow_config.root_job_condition }}
env:
DOCKER_IMAGE: ${{ needs.calculate-docker-image.outputs.docker_image }}
JOB_BASE_NAME: !{{ build_environment }}-build
IS_PROBOT_TRIGGER_EVENT: ${{ (github.event.action == '!{{ ciflow_config.trigger_action }}') && (github.event.assignee.login == '!{{ ciflow_config.trigger_actor }}') }}
LABEL_CONDITIONS: ${{ !{{ ciflow_config.label_conditions }} }}
outputs:
docker_image: ${{ steps.calculate-tag.outputs.docker_image }}
steps:
- name: print labels
run: echo "${PR_LABELS}"
!{{ common.setup_ec2_linux() }}
!{{ common.checkout_pytorch("recursive") }}
- name: Pull docker image
!{{ common.calculate_docker_image(false) }}
- name: Pull Docker image
run: |
docker pull "${DOCKER_IMAGE}"
!{{ common.pull_docker("${DOCKER_IMAGE}") }}
!{{ common.parse_ref() }}
- name: Build
env:
BRANCH: ${{ steps.parse-ref.outputs.branch }}
run: |
# detached container should get cleaned up by teardown_ec2_linux
container_name=$(docker run \
-e BUILD_ENVIRONMENT \
-e JOB_BASE_NAME \
-e MAX_JOBS="$(nproc --ignore=2)" \
-e AWS_DEFAULT_REGION \
-e IS_GHA \
-e PR_NUMBER \
-e SHA1 \
-e BRANCH \
-e GITHUB_RUN_ID \
-e SCCACHE_BUCKET \
-e XLA_CLANG_CACHE_S3_BUCKET_NAME \
-e CUSTOM_TEST_ARTIFACT_BUILD_DIR \
@ -158,19 +101,14 @@ jobs:
"${DOCKER_IMAGE}"
)
docker exec -t "${container_name}" sh -c 'sudo chown -R jenkins . && .jenkins/pytorch/build.sh'
!{{ common.parse_ref() }}
- name: Display and upload binary build size statistics (Click Me)
# temporary hack: set CIRCLE_* vars, until we update
# tools/stats/print_test_stats.py to natively support GitHub Actions
env:
AWS_DEFAULT_REGION: us-east-1
IS_GHA: 1
SCRIBE_GRAPHQL_ACCESS_TOKEN: ${{ secrets.SCRIBE_GRAPHQL_ACCESS_TOKEN }}
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
CIRCLE_TAG: ${{ steps.parse-ref.outputs.tag }}
CIRCLE_WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
BRANCH: ${{ steps.parse-ref.outputs.branch }}
TAG: ${{ steps.parse-ref.outputs.tag }}
WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
run: |
COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0)
export COMMIT_TIME
@ -180,7 +118,7 @@ jobs:
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
{%- if not is_libtorch %}
{%- if build_generates_artifacts %}
- name: Archive artifacts into zip
run: |
zip -1 -r artifacts.zip dist/ build/custom_test_artifacts build/lib build/bin .pytorch-test-times.json
@ -207,14 +145,14 @@ jobs:
{%- if not exclude_test %}
{% block test +%}
generate-test-matrix:
needs: build
runs-on: ubuntu-18.04
{%- if ciflow_config.enabled %}
needs: [!{{ ciflow_config.root_job_name }}]
{%- endif %}
timeout-minutes: !{{ common.timeout_minutes }}
env:
TEST_RUNNER_TYPE: !{{ test_runner_type }}
ENABLE_DISTRIBUTED_TEST: !{{ enable_distributed_test }}
ENABLE_JIT_LEGACY_TEST: !{{ enable_jit_legacy_test }}
ENABLE_FX2TRT_TEST: !{{ enable_fx2trt_test }}
ENABLE_MULTIGPU_TEST: !{{ enable_multigpu_test }}
ENABLE_NOGPU_NO_AVX_TEST: !{{ enable_nogpu_no_avx_test }}
ENABLE_NOGPU_NO_AVX2_TEST: !{{ enable_nogpu_no_avx2_test }}
@ -225,6 +163,7 @@ jobs:
ENABLE_NOARCH_TEST: !{{ enable_noarch_test }}
NUM_TEST_SHARDS: !{{ num_test_shards }}
MULTIGPU_RUNNER_TYPE: linux.16xlarge.nvidia.gpu
DISTRIBUTED_GPU_RUNNER_TYPE: linux.8xlarge.nvidia.gpu
NOGPU_RUNNER_TYPE: linux.2xlarge
PR_BODY: ${{ github.event.pull_request.body }}
outputs:
@ -243,25 +182,25 @@ jobs:
run: .github/scripts/generate_pytorch_test_matrix.py
test:
needs: [calculate-docker-image, build, generate-test-matrix, !{{ ciflow_config.root_job_name }}]
needs: [build, generate-test-matrix]
strategy:
matrix: ${{ fromJson(needs.generate-test-matrix.outputs.matrix) }}
fail-fast: false
runs-on: ${{ matrix.runner }}
timeout-minutes: !{{ common.timeout_minutes }}
env:
DOCKER_IMAGE: ${{ needs.calculate-docker-image.outputs.docker_image }}
DOCKER_IMAGE: ${{ needs.build.outputs.docker_image }}
JOB_BASE_NAME: !{{ build_environment }}-test
TEST_CONFIG: ${{ matrix.config }}
SHARD_NUMBER: ${{ matrix.shard }}
NUM_TEST_SHARDS: ${{ matrix.num_shards }}
PYTORCH_IGNORE_DISABLED_ISSUES: ${{ needs.generate-test-matrix.outputs.ignore-disabled-issues }}
CONTINUE_THROUGH_ERROR: ${{ github.repository == 'pytorch/pytorch' && (github.event_name == 'push' || github.event_name == 'schedule') }}
steps:
!{{ common.setup_ec2_linux() }}
!{{ common.checkout_pytorch("recursive") }}
- name: Pull docker image
- name: Pull Docker image
run: |
docker pull "${DOCKER_IMAGE}"
!{{ common.pull_docker("${DOCKER_IMAGE}") }}
- name: Install nvidia driver, nvidia-docker runtime, set GPU_FLAG
if: ${{ contains(env.BUILD_ENVIRONMENT, 'cuda') && !contains(matrix.config, 'nogpu') }}
run: |
@ -293,24 +232,31 @@ jobs:
- name: Test
env:
PR_NUMBER: ${{ github.event.pull_request.number }}
IS_GHA: 1
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
AWS_DEFAULT_REGION: us-east-1
BRANCH: ${{ steps.parse-ref.outputs.branch }}
# Time out the test phase after !{{ timeout_after }} minutes
timeout-minutes: !{{ timeout_after }}
run: |
set -x
if [[ $TEST_CONFIG == 'multigpu' ]]; then
TEST_COMMAND=.jenkins/pytorch/multigpu-test.sh
elif [[ $BUILD_ENVIRONMENT == *onnx* ]]; then
TEST_COMMAND=.jenkins/caffe2/test.sh
else
TEST_COMMAND=.jenkins/pytorch/test.sh
fi
if [[ $NUM_TEST_SHARDS -ne 2 ]]; then
export SHARD_NUMBER=0
PROXY_ENV=
# NOTE: XLA multiprocessing tests appear to have issues with squid proxy, going to disable for now
# We should investigate whether there's a list of hostnames we can add to no_proxy
# so that we don't have to fully disable squid for XLA tests
if [[ $TEST_CONFIG != 'xla' ]]; then
# shellcheck disable=SC2089
PROXY_ENV="-e http_proxy=!{{ common.squid_proxy }} -e https_proxy=!{{ common.squid_proxy }} -e no_proxy=!{{ common.squid_no_proxy }}"
fi
# detached container should get cleaned up by teardown_ec2_linux
# TODO: Stop building test binaries as part of the build phase
# Used for GPU_FLAG since that doesn't play nice
# shellcheck disable=SC2086
# shellcheck disable=SC2086,SC2090
container_name=$(docker run \
${GPU_FLAG:-} \
-e BUILD_ENVIRONMENT \
@ -319,9 +265,8 @@ jobs:
-e GITHUB_ACTIONS \
-e IN_CI \
-e IS_GHA \
-e CIRCLE_BRANCH \
-e CIRCLE_SHA1 \
-e CIRCLE_PR_NUMBER \
-e BRANCH \
-e SHA1 \
-e AWS_DEFAULT_REGION \
-e IN_WHEEL_TEST \
-e SHARD_NUMBER \
@ -329,15 +274,17 @@ jobs:
-e TEST_CONFIG \
-e NUM_TEST_SHARDS \
-e PYTORCH_IGNORE_DISABLED_ISSUES \
-e PYTORCH_RETRY_TEST_CASES \
-e PR_LABELS \
-e CONTINUE_THROUGH_ERROR \
-e MAX_JOBS="$(nproc --ignore=2)" \
-e SCCACHE_BUCKET \
-e http_proxy="!{{ common.squid_proxy }}" -e https_proxy="!{{ common.squid_proxy }}" -e no_proxy="!{{ common.squid_no_proxy }}" \
-e XLA_CLANG_CACHE_S3_BUCKET_NAME \
${PROXY_ENV} \
--env-file="/tmp/github_env_${GITHUB_RUN_ID}" \
--ulimit stack=10485760:83886080 \
--security-opt seccomp=unconfined \
--cap-add=SYS_PTRACE \
--ipc=host \
--shm-size="${SHM_SIZE}" \
--tty \
--detach \
@ -354,12 +301,7 @@ jobs:
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
!{{ common.render_test_results() }}
{%- if is_coverage %}
- name: Report coverage
run: |
python3 -mpip install codecov==2.1.12
python3 -mcodecov
{%- endif %}
!{{ common.upload_downloaded_files(name='linux') }}
!{{ common.upload_test_reports(name='linux') }}
!{{ common.upload_test_statistics(build_environment) }}
!{{ common.teardown_ec2_linux() }}
@ -368,19 +310,21 @@ jobs:
{%- if enable_doc_jobs %}
build-docs:
runs-on: linux.2xlarge
timeout-minutes: !{{ common.timeout_minutes }}
strategy:
matrix:
docs_type: [cpp, python]
needs: [calculate-docker-image, build, !{{ ciflow_config.root_job_name }}]
needs: [build]
env:
DOCKER_IMAGE: ${{ needs.calculate-docker-image.outputs.docker_image }}
DOCKER_IMAGE: ${{ needs.build.outputs.docker_image }}
DOCS_TYPE: ${{ matrix.docs_type }}
WITH_PUSH: ${{ github.event_name == 'schedule' }}
steps:
!{{ common.setup_ec2_linux() }}
!{{ common.checkout_pytorch("recursive") }}
- name: Pull docker image
- name: Pull Docker image
run: |
docker pull "${DOCKER_IMAGE}"
!{{ common.pull_docker("${DOCKER_IMAGE}") }}
- uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b
name: Download PyTorch Build Artifacts
with:
@ -388,29 +332,44 @@ jobs:
- name: Unzip artifacts
run: |
unzip -o artifacts.zip
{%- if is_scheduled %}
- name: Generate netrc (only for docs-push)
if: ${{ github.event_name == 'schedule' }}
env:
GITHUB_PYTORCHBOT_TOKEN: ${{ secrets.GH_PYTORCHBOT_TOKEN }}
run: |
# set credentials for https pushing
echo "machine github.com" > "${RUNNER_TEMP}/.netrc"
echo "login pytorchbot" >> "${RUNNER_TEMP}/.netrc"
echo "password ${GITHUB_PYTORCHBOT_TOKEN}" >> "${RUNNER_TEMP}/.netrc"
{%- endif %}
- name: Build ${{ matrix.docs_type }} docs
run: |
set -ex
time docker pull "${DOCKER_IMAGE}" > /dev/null
echo "${GITHUB_REF}"
ref=${GITHUB_REF##*/}
target=${ref//v}
# TODO: Set it correctly when workflows are scheduled on tags
target="master"
# detached container should get cleaned up by teardown_ec2_linux
container_name=$(docker run \
-e BUILD_ENVIRONMENT \
-e CUSTOM_TEST_ARTIFACT_BUILD_DIR \
-e IN_CI \
-e MAX_JOBS="$(nproc --ignore=2)" \
-e CIRCLE_SHA1="$GITHUB_SHA" \
-e SHA1="$GITHUB_SHA" \
-e DOCS_VERSION="${target}" \
-e DOCS_TYPE \
-e PR_LABELS \
-e WITH_PUSH \
--env-file="/tmp/github_env_${GITHUB_RUN_ID}" \
--security-opt seccomp=unconfined \
--cap-add=SYS_PTRACE \
--tty \
--detach \
--user jenkins \
{%- if is_scheduled %}
-v "${RUNNER_TEMP}/.netrc":/var/lib/jenkins/.netrc \
{%- endif %}
-v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \
-w /var/lib/jenkins/workspace \
"${DOCKER_IMAGE}"
@ -427,7 +386,7 @@ jobs:
retention-days: 14
s3-bucket: doc-previews
if-no-files-found: error
path: pytorch.github.io/docs/merge/
path: pytorch.github.io/docs/master/
s3-prefix: pytorch/${{ github.event.pull_request.number }}
- uses: !{{ common.upload_artifact_s3_action }}
name: Upload C++ Docs Preview
@ -438,14 +397,4 @@ jobs:
s3-bucket: doc-previews
path: cppdocs/
s3-prefix: pytorch/${{ github.event.pull_request.number }}/cppdocs
- name: Archive artifacts into zip
run: |
zip -r "docs_${DOCS_TYPE}.zip" "${GITHUB_WORKSPACE}/pytorch.github.io" "${GITHUB_WORKSPACE}/cppdocs"
- uses: actions/upload-artifact@v2
name: Store PyTorch Build Artifacts
with:
name: docs_${{ matrix.docs_type }}
path: docs_${{ matrix.docs_type }}.zip
if-no-files-found: error
!{{ common.teardown_ec2_linux() }}
{%- endif -%}


@ -0,0 +1,149 @@
{% import 'common.yml.j2' as common %}
{%- block name -%}
# Template is at: .github/templates/macos_ci_workflow.yml.j2
# Generation script: .github/scripts/generate_ci_workflows.py
name: !{{ build_environment }}
{%- endblock %}
on:
pull_request:
types: [opened, synchronize, reopened, !{{ ciflow_config.trigger_action }}]
{%- if is_scheduled %}
schedule:
- cron: !{{ is_scheduled }}
{%- else %}
push:
branches:
- master
- release/*
- fbsync
{%- endif %}
workflow_dispatch:
# For setup-miniconda, see https://github.com/conda-incubator/setup-miniconda/issues/179
defaults:
run:
shell: bash -e -l {0}
env:
BUILD_ENVIRONMENT: !{{ build_environment }}
COMPACT_JOB_NAME: !{{ build_environment }}
IN_CI: 1
IS_GHA: 1
PYTORCH_RETRY_TEST_CASES: 1
!{{ common.set_xcode_version(xcode_version) }}
jobs:
{% block build +%}
build:
runs-on: !{{ test_runner_type }}
env:
JOB_BASE_NAME: !{{ build_environment }}
# For sccache access (only on non-forked PRs)
AWS_ACCESS_KEY_ID: ${{ secrets.MACOS_SCCACHE_S3_ACCESS_KEY_ID }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.MACOS_SCCACHE_S3_SECRET_ACCESS_KEY }}
IS_PROBOT_TRIGGER_EVENT: ${{ (github.event.action == '!{{ ciflow_config.trigger_action }}') && (github.event.assignee.login == '!{{ ciflow_config.trigger_actor }}') }}
LABEL_CONDITIONS: ${{ !{{ ciflow_config.label_conditions }} }}
PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
if: !{{ ciflow_config.root_job_condition }}
steps:
- name: print labels
run: echo "${PR_LABELS}"
!{{ common.checkout_pytorch("recursive") }}
!{{ common.setup_miniconda("3.8") }}
- name: Install macOS homebrew dependencies
run: |
# Install dependencies
brew install libomp
- name: Install sccache (only for non-forked PRs, and pushes to trunk)
if: ${{ github.event_name == 'push' || github.event.pull_request.head.repo.full_name == github.repository }}
run: |
sudo curl --retry 3 https://s3.amazonaws.com/ossci-macos/sccache_v2.15 --output /usr/local/bin/sccache
sudo chmod +x /usr/local/bin/sccache
echo "SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2" >> "${GITHUB_ENV}"
- name: Build
run: |
echo "CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname "$(which conda)")/../"}" >> "${GITHUB_ENV}"
.jenkins/pytorch/macos-build.sh
{%- if build_generates_artifacts %}
- name: Archive artifacts into zip
run: |
zip -1 -r artifacts.zip dist/
- uses: actions/upload-artifact@v2
name: Store PyTorch Build Artifacts on GHA
with:
name: ${{ env.BUILD_ENVIRONMENT }}
retention-days: 14
if-no-files-found: error
path:
artifacts.zip
{%- endif %}
{% endblock +%}
{%- if not exclude_test %}
{% block test +%}
generate-test-matrix:
needs: build
runs-on: ubuntu-18.04
timeout-minutes: !{{ common.timeout_minutes }}
env:
TEST_RUNNER_TYPE: !{{ test_runner_type }}
ENABLE_DISTRIBUTED_TEST: !{{ enable_distributed_test }}
NUM_TEST_SHARDS: !{{ num_test_shards }}
PR_BODY: ${{ github.event.pull_request.body }}
outputs:
matrix: ${{ steps.set-matrix.outputs.matrix }}
render-matrix: ${{ steps.set-matrix.outputs.render-matrix }}
ignore-disabled-issues: ${{ steps.set-matrix.outputs.ignore-disabled-issues }}
container:
image: python:3.9
steps:
- name: Install dependencies
run: pip install typing-extensions==3.10
- name: Clone pytorch/pytorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
- name: Generating test matrix
id: set-matrix
run: .github/scripts/generate_pytorch_test_matrix.py
test:
needs: [build, generate-test-matrix]
strategy:
matrix: ${{ fromJson(needs.generate-test-matrix.outputs.matrix) }}
fail-fast: false
runs-on: ${{ matrix.runner }}
timeout-minutes: !{{ common.timeout_minutes }}
env:
JOB_BASE_NAME: !{{ build_environment }}-test
TEST_CONFIG: ${{ matrix.config }}
SHARD_NUMBER: ${{ matrix.shard }}
NUM_TEST_SHARDS: ${{ matrix.num_shards }}
PYTORCH_IGNORE_DISABLED_ISSUES: ${{ needs.generate-test-matrix.outputs.ignore-disabled-issues }}
steps:
!{{ common.checkout_pytorch("false") }}
- uses: actions/download-artifact@v2
name: Download PyTorch Build Artifacts from GHA
with:
name: ${{ env.BUILD_ENVIRONMENT }}
path: .
- name: Unzip artifacts
run: |
unzip -o artifacts.zip
!{{ common.setup_miniconda("3.8") }}
- name: Install macOS homebrew dependencies
run: |
# Install dependencies
brew install libomp
!{{ common.parse_ref() }}
- name: Test
run: |
python3 -mpip install dist/*.whl
.jenkins/pytorch/macos-test.sh
!{{ common.render_test_results() }}
!{{ common.upload_downloaded_files(name='macos', artifact_name="test-jsons", use_s3=False) }}
!{{ common.upload_test_reports("macos", artifact_name="test-reports", use_s3=False) }}
!{{ common.upload_test_statistics(build_environment) }}
{% endblock +%}
{%- endif %}
!{{ common.concurrency(build_environment) }}


@ -19,16 +19,8 @@
name: !{{ build_environment }}
on:
{%- if on_pull_request %}
pull_request:
{%- if ciflow_config.enabled %}
{%- if ciflow_config.trigger_action_only %}
types: [!{{ ciflow_config.trigger_action }}]
{%- else %}
types: [opened, synchronize, reopened, !{{ ciflow_config.trigger_action }}]
{%- endif %}
{%- endif %}
{%- endif %}
{%- if is_scheduled %}
schedule:
- cron: !{{ is_scheduled }}
@ -37,16 +29,20 @@ on:
branches:
- master
- release/*
- fbsync
{%- endif %}
workflow_dispatch:
env:
BUILD_ENVIRONMENT: !{{ build_environment }}
BUILD_WHEEL: 1
MAX_JOBS: 8
CUDA_VERSION: "!{{ cuda_version }}"
IN_CI: 1
IS_GHA: 1
INSTALL_WINDOWS_SDK: 1
PYTHON_VERSION: "3.8"
PYTORCH_RETRY_TEST_CASES: 1
PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
SCCACHE_BUCKET: "ossci-compiler-cache"
VC_PRODUCT: "BuildTools"
@ -55,46 +51,38 @@ env:
VC_YEAR: "2019"
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
no_proxy: !{{ common.squid_no_proxy }}
AWS_DEFAULT_REGION: us-east-1
PR_NUMBER: ${{ github.event.pull_request.number }}
SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
{%- if build_with_debug %}
DEBUG: 1
{%- endif %}
{%- if cuda_version != "cpu" %}
TORCH_CUDA_ARCH_LIST: "7.0"
USE_CUDA: 1
{%- endif %}
USE_CUDA: !{{ 1 if cuda_version != "cpu" else 0 }}
!{{ common.concurrency(build_environment) }}
jobs:
{%- if ciflow_config.enabled %}
!{{ ciflow_config.root_job_name }}:
runs-on: ubuntu-18.04
if: ${{ !{{ ciflow_config.root_job_condition }} }}
steps:
- name: noop
run: echo running !{{ ciflow_config.root_job_name }}
{%- endif %}
build:
runs-on: "windows.4xlarge"
defaults:
run:
working-directory: pytorch-${{ github.run_id }}
{%- if ciflow_config.enabled %}
needs: [!{{ ciflow_config.root_job_name }}]
{%- endif %}
timeout-minutes: !{{ common.timeout_minutes }}
env:
JOB_BASE_NAME: !{{ build_environment }}-build
http_proxy: "!{{ common. squid_proxy }}"
https_proxy: "!{{ common.squid_proxy }}"
IS_PROBOT_TRIGGER_EVENT: ${{ (github.event.action == '!{{ ciflow_config.trigger_action }}') && (github.event.assignee.login == '!{{ ciflow_config.trigger_actor }}') }}
LABEL_CONDITIONS: ${{ !{{ ciflow_config.label_conditions }} }}
if: !{{ ciflow_config.root_job_condition }}
steps:
- name: print labels
run: echo "${PR_LABELS}"
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
submodules: recursive
path: pytorch-${{ github.run_id }}
# deep clone, to allow use of git merge-base
fetch-depth: 0
!{{ common.checkout_pytorch("recursive") }}
!{{ common.display_ec2_information() }}
- name: Install Visual Studio 2019 toolchain
shell: powershell
@ -110,25 +98,16 @@ jobs:
run: |
.circleci/scripts/windows_cudnn_install.sh
{%- endif %}
!{{ common.parse_ref() }}
- name: Build
shell: bash
env:
PYTORCH_FINAL_PACKAGE_DIR: /c/${{ github.run_id }}/build-results/
BRANCH: ${{ steps.parse-ref.outputs.branch }}
run: |
.jenkins/pytorch/win-build.sh
# Upload to github so that people can click and download artifacts
- name: Upload artifacts to Github
if: always()
uses: actions/upload-artifact@v2
# Don't fail on upload to GH since it's only for user convenience
continue-on-error: true
with:
retention-days: 14
if-no-files-found: error
name: ${{ env.BUILD_ENVIRONMENT }}
path: C:\${{ github.run_id }}\build-results
- name: Upload artifacts to s3
if: always()
uses: !{{ common.upload_artifact_s3_action }}
with:
retention-days: 14
@ -147,15 +126,17 @@ jobs:
rm -rf ./*
generate-test-matrix:
{%- if ciflow_config.enabled %}
needs: [!{{ ciflow_config.root_job_name }}]
{%- endif %}
needs: build
runs-on: ubuntu-18.04
timeout-minutes: !{{ common.timeout_minutes }}
env:
TEST_RUNNER_TYPE: !{{ test_runner_type }}
NUM_TEST_SHARDS: !{{ num_test_shards }}
NUM_TEST_SHARDS_ON_PULL_REQUEST: !{{ num_test_shards_on_pull_request }}
PR_BODY: ${{ github.event.pull_request.body }}
NOGPU_RUNNER_TYPE: windows.4xlarge
ENABLE_FORCE_ON_CPU_TEST: !{{ enable_force_on_cpu_test }}
RUN_SMOKE_TESTS_ONLY_ON_PR: !{{ only_run_smoke_tests_on_pull_request }}
outputs:
matrix: ${{ steps.set-matrix.outputs.matrix }}
render-matrix: ${{ steps.set-matrix.outputs.render-matrix }}
@ -172,9 +153,7 @@ jobs:
run: .github/scripts/generate_pytorch_test_matrix.py
test:
{%- if only_build_on_pull_request %}
if: ${{ github.event_name == 'push' }}
{%- endif %}
timeout-minutes: !{{ common.timeout_minutes }}
env:
JOB_BASE_NAME: !{{ build_environment }}-test
SHARD_NUMBER: ${{ matrix.shard }}
@ -182,40 +161,31 @@ jobs:
TEST_CONFIG: ${{ matrix.config }}
http_proxy: "!{{ common.squid_proxy }}"
https_proxy: "!{{ common.squid_proxy }}"
RUN_SMOKE_TESTS_ONLY_ON_PR: !{{ only_run_smoke_tests_on_pull_request }}
PYTORCH_IGNORE_DISABLED_ISSUES: ${{ needs.generate-test-matrix.outputs.ignore-disabled-issues }}
CONTINUE_THROUGH_ERROR: ${{ github.repository == 'pytorch/pytorch' && (github.event_name == 'push' || github.event_name == 'schedule') }}
needs: [build, generate-test-matrix, !{{ ciflow_config.root_job_name }}]
needs: [build, generate-test-matrix]
strategy:
matrix: ${{ fromJson(needs.generate-test-matrix.outputs.matrix) }}
fail-fast: false
runs-on: ${{ matrix.runner }}
defaults:
run:
working-directory: pytorch-${{ github.run_id }}
steps:
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
submodules: recursive
path: pytorch-${{ github.run_id }}
# deep clone, to allow use of git merge-base
fetch-depth: 0
!{{ common.display_ec2_information() }}
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
!{{ common.checkout_pytorch("recursive") }}
- name: Install Visual Studio 2019 toolchain
shell: powershell
run: |
.\.circleci\scripts\vs_install.ps1
{%- if cuda_version != "cpu" %}
- name: Install Cuda
if: ${{ matrix.config != 'force_on_cpu' }}
shell: bash
run: |
.circleci/scripts/windows_cuda_install.sh
- name: Install Cudnn
if: ${{ matrix.config != 'force_on_cpu' }}
shell: bash
run: |
.circleci/scripts/windows_cudnn_install.sh
@@ -238,14 +208,11 @@ jobs:
shell: bash
env:
PYTORCH_FINAL_PACKAGE_DIR: /c/${{ github.run_id }}/build-results/
# Time out the test phase after 3.5 hours
timeout-minutes: 210
run: |
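# Anything other than exactly 2 shards falls back to one full run (shard 0);
# PRs can opt into running only the smoke tests via RUN_SMOKE_TESTS_ONLY_ON_PR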
if [[ $NUM_TEST_SHARDS -ne 2 ]]; then
export SHARD_NUMBER=0
fi
if [[ -n $GITHUB_HEAD_REF && "$RUN_SMOKE_TESTS_ONLY_ON_PR" == "true" ]]; then
export RUN_SMOKE_TESTS_ONLY=1
fi
.jenkins/pytorch/win-test.sh
!{{ common.upload_downloaded_files(name='windows') }}
!{{ common.upload_test_reports(name='windows') }}
!{{ common.render_test_results() }}
!{{ wait_and_kill_ssh() }}
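The `!{{ … }}` and `{%- … %}` markers above are Jinja2 delimiters: the generation script keeps the default block syntax but remaps the variable delimiters so templated values cannot collide with GitHub Actions' own `${{ … }}` expressions. A minimal sketch of that configuration, using an inline one-line fragment for illustration (the real setup lives in .github/scripts/generate_ci_workflows.py and may differ in detail):

import jinja2

# Variable delimiters are remapped from {{ ... }} to !{{ ... }} so that
# GitHub Actions' native ${{ ... }} expressions pass through untouched.
env = jinja2.Environment(
    variable_start_string="!{{",
    variable_end_string="}}",
)

# Render a fragment in the style of the template above;
# "win-vs2019-cpu-py3" is a hypothetical build_environment value.
fragment = env.from_string("JOB_BASE_NAME: !{{ build_environment }}-build")
print(fragment.render(build_environment="win-vs2019-cpu-py3"))
# -> JOB_BASE_NAME: win-vs2019-cpu-py3-build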


@@ -1,54 +0,0 @@
name: Label PRs & Issues
on:
issues:
types: [opened, edited]
pull_request_target:
types: [edited, opened, synchronize, reopened]
concurrency:
group: auto-label-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }}
cancel-in-progress: true
jobs:
auto-label-rocm:
if: ${{ github.repository == 'pytorch/pytorch' }}
runs-on: ubuntu-18.04
steps:
- name: Retrieve information
id: vars
env:
EVENT_NAME: ${{ github.event_name }}
PR_TITLE: ${{ github.event.pull_request.title }}
PR_NUMBER: ${{ github.event.pull_request.number }}
ISSUE_TITLE: ${{ github.event.issue.title }}
ISSUE_NUMBER: ${{ github.event.issue.number }}
run: |
set -eux
if [[ "$EVENT_NAME" == "pull_request_target" ]]; then
TITLE="${PR_TITLE}"
ISSUE_NUMBER="${PR_NUMBER}"
else
TITLE="${ISSUE_TITLE}"
# ISSUE_NUMBER is already set
fi
echo ::set-output name=TITLE::"${TITLE}"
echo ::set-output name=ISSUE_NUMBER::"${ISSUE_NUMBER}"
- name: Auto-label ROCm
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
TITLE: ${{ steps.vars.outputs.TITLE }}
ISSUE_NUMBER: ${{ steps.vars.outputs.ISSUE_NUMBER }}
OWNER: ${{ github.repository_owner }}
REPO: ${{ github.event.repository.name }}
run: |
set -eux
if [[ "${TITLE,,}" == *rocm* ]]; then
curl \
-X POST \
-H "Authorization: token ${GITHUB_TOKEN}" \
"https://api.github.com/repos/${OWNER}/${REPO}/issues/${ISSUE_NUMBER}/labels" \
-d '{"labels":["module: rocm"]}'
fi


@@ -95,15 +95,13 @@ jobs:
id: parse-ref
run: .github/scripts/parse_ref.py
- name: Display and upload binary build size statistics (Click Me)
# temporary hack: set CIRCLE_* vars, until we update
# tools/stats/print_test_stats.py to natively support GitHub Actions
env:
SCRIBE_GRAPHQL_ACCESS_TOKEN: ${{ secrets.SCRIBE_GRAPHQL_ACCESS_TOKEN }}
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
CIRCLE_TAG: ${{ steps.parse-ref.outputs.tag }}
CIRCLE_WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
BRANCH: ${{ steps.parse-ref.outputs.branch }}
PR_NUMBER: ${{ github.event.pull_request.number }}
SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
TAG: ${{ steps.parse-ref.outputs.tag }}
WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
run: |
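# %ct is the committer date as a unix timestamp; default to 0 rather than failing the stats upload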
COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0)
export COMMIT_TIME


@@ -94,15 +94,13 @@ jobs:
id: parse-ref
run: .github/scripts/parse_ref.py
- name: Display and upload binary build size statistics (Click Me)
# temporary hack: set CIRCLE_* vars, until we update
# tools/stats/print_test_stats.py to natively support GitHub Actions
env:
SCRIBE_GRAPHQL_ACCESS_TOKEN: ${{ secrets.SCRIBE_GRAPHQL_ACCESS_TOKEN }}
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
CIRCLE_TAG: ${{ steps.parse-ref.outputs.tag }}
CIRCLE_WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
BRANCH: ${{ steps.parse-ref.outputs.branch }}
PR_NUMBER: ${{ github.event.pull_request.number }}
SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
TAG: ${{ steps.parse-ref.outputs.tag }}
WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
run: |
COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0)
export COMMIT_TIME


@@ -93,15 +93,13 @@ jobs:
id: parse-ref
run: .github/scripts/parse_ref.py
- name: Display and upload binary build size statistics (Click Me)
# temporary hack: set CIRCLE_* vars, until we update
# tools/stats/print_test_stats.py to natively support GitHub Actions
env:
SCRIBE_GRAPHQL_ACCESS_TOKEN: ${{ secrets.SCRIBE_GRAPHQL_ACCESS_TOKEN }}
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
CIRCLE_TAG: ${{ steps.parse-ref.outputs.tag }}
CIRCLE_WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
BRANCH: ${{ steps.parse-ref.outputs.branch }}
PR_NUMBER: ${{ github.event.pull_request.number }}
SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
TAG: ${{ steps.parse-ref.outputs.tag }}
WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
run: |
COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0)
export COMMIT_TIME


@@ -1,24 +1,26 @@
# @generated DO NOT EDIT MANUALLY
# Template is at: .github/templates/linux_ci_workflow.yml.j2
# Generation script: .github/scripts/generate_ci_workflows.py
name: puretorch-linux-xenial-py3.6-gcc5.4
name: caffe2-linux-xenial-py3.7-gcc5.4
on:
pull_request:
types: [unassigned]
types: [opened, synchronize, reopened, unassigned]
push:
branches:
- master
- release/*
- fbsync
workflow_dispatch:
env:
BUILD_ENVIRONMENT: puretorch-linux-xenial-py3.6-gcc5.4
DOCKER_IMAGE_BASE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3.6-gcc5.4
BUILD_ENVIRONMENT: caffe2-linux-xenial-py3.7-gcc5.4
DOCKER_IMAGE_BASE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3.7-gcc5.4
SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2
XLA_CLANG_CACHE_S3_BUCKET_NAME: ossci-compiler-clang-cache-circleci-xla
TORCH_CUDA_ARCH_LIST: 5.2
IN_CI: 1
IS_GHA: 1
# This is used for the phase of adding wheel tests only, will be removed once completed
IN_WHEEL_TEST: 1
# Used for custom_operator, jit_hooks, custom_backend, see .jenkins/pytorch/build.sh
@@ -26,31 +28,34 @@ env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
AWS_DEFAULT_REGION: us-east-1
PR_NUMBER: ${{ github.event.pull_request.number }}
SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
PYTORCH_RETRY_TEST_CASES: 1
concurrency:
group: puretorch-linux-xenial-py3.6-gcc5.4-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }}
group: caffe2-linux-xenial-py3.7-gcc5.4-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }}
cancel-in-progress: true
jobs:
ciflow_should_run:
runs-on: ubuntu-18.04
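# Runs unless this is the probot ciflow trigger (pytorchbot unassigned) on a PR without a matching ciflow/* label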
if: ${{ (github.repository == 'pytorch/pytorch') && ((github.event_name != 'pull_request') || (github.event.assignee.login != 'pytorchbot' ) || (github.event.action !='unassigned') || (contains(github.event.pull_request.labels.*.name, 'ciflow/all') || contains(github.event.pull_request.labels.*.name, 'ciflow/cpu') || contains(github.event.pull_request.labels.*.name, 'ciflow/linux'))) }}
env:
LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
steps:
- name: noop
run: echo running ciflow_should_run
- name: print labels
run: echo "${LABELS}"
calculate-docker-image:
build:
runs-on: linux.2xlarge
needs: [ciflow_should_run]
timeout-minutes: 240
if: ${{ (github.repository == 'pytorch/pytorch') && (
(github.event_name == 'push') ||
(github.event_name == 'schedule') ||
(contains(github.event.pull_request.labels.*.name, 'ciflow/all') || contains(github.event.pull_request.labels.*.name, 'ciflow/cpu') || contains(github.event.pull_request.labels.*.name, 'ciflow/linux') || contains(github.event.pull_request.labels.*.name, 'ciflow/trunk')) ||
(false))
}}
env:
DOCKER_BUILDKIT: 1
timeout-minutes: 90
JOB_BASE_NAME: caffe2-linux-xenial-py3.7-gcc5.4-build
IS_PROBOT_TRIGGER_EVENT: ${{ (github.event.action == 'unassigned') && (github.event.assignee.login == 'pytorchbot') }}
LABEL_CONDITIONS: ${{ contains(github.event.pull_request.labels.*.name, 'ciflow/all') || contains(github.event.pull_request.labels.*.name, 'ciflow/cpu') || contains(github.event.pull_request.labels.*.name, 'ciflow/linux') || contains(github.event.pull_request.labels.*.name, 'ciflow/trunk') }}
outputs:
docker_image: ${{ steps.calculate-tag.outputs.docker_image }}
steps:
- name: print labels
run: echo "${PR_LABELS}"
- name: Display EC2 information
shell: bash
run: |
@@ -69,18 +74,16 @@ jobs:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\")
aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \
--password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com"
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
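# retry: try a command up to three times, sleeping 1s and then 2s between attempts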
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
retry docker pull "${ALPINE_IMAGE}"
# Ensure the working directory gets chowned back to the current user
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
@@ -98,17 +101,22 @@ jobs:
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: false
submodules: recursive
- name: Clean PyTorch checkout
run: |
# Remove any artifacts from the previous checkouts
git clean -fxd
- name: Calculate docker image tag
id: calculate-tag
run: |
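# HEAD:.circleci/docker is the git tree hash of the docker directory,
# so the image tag changes only when the docker build inputs change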
DOCKER_TAG=$(git rev-parse HEAD:.circleci/docker)
echo "DOCKER_TAG=${DOCKER_TAG}" >> "${GITHUB_ENV}"
echo "DOCKER_IMAGE=${DOCKER_IMAGE_BASE}:${DOCKER_TAG}" >> "${GITHUB_ENV}"
echo "::set-output name=docker_tag::${DOCKER_TAG}"
echo "::set-output name=docker_image::${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"
- name: Check if image should be built
id: check
env:
DOCKER_TAG: ${{ steps.calculate-tag.outputs.docker_tag }}
BASE_REVISION: ${{ github.event.pull_request.base.sha || github.sha }}
run: |
set -x
@@ -140,77 +148,35 @@ jobs:
- name: Build and push docker image
if: ${{ steps.check.outputs.rebuild }}
env:
DOCKER_TAG: ${{ steps.calculate-tag.outputs.docker_tag }}
DOCKER_SKIP_S3_UPLOAD: 1
working-directory: .circleci/docker
run: |
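# ${VAR#prefix} strips the ECR registry prefix, leaving the bare image name for build_docker.sh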
export IMAGE_NAME=${DOCKER_IMAGE_BASE#308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/}
cd .circleci/docker && ./build_docker.sh
build:
runs-on: linux.2xlarge
needs: [calculate-docker-image, ciflow_should_run]
env:
DOCKER_IMAGE: ${{ needs.calculate-docker-image.outputs.docker_image }}
JOB_BASE_NAME: puretorch-linux-xenial-py3.6-gcc5.4-build
steps:
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Log in to ECR
env:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
./build_docker.sh
- name: Pull Docker image
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
# Ensure the working directory gets chowned back to the current user
retry docker pull "${ALPINE_IMAGE}"
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
rm -rf "${GITHUB_WORKSPACE:?}/*"
rm -f ~/.ssh/authorized_keys
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: recursive
- name: Pull docker image
run: |
docker pull "${DOCKER_IMAGE}"
retry docker pull "${DOCKER_IMAGE}"
- name: Parse ref
id: parse-ref
run: .github/scripts/parse_ref.py
- name: Build
env:
BRANCH: ${{ steps.parse-ref.outputs.branch }}
run: |
# detached container should get cleaned up by teardown_ec2_linux
container_name=$(docker run \
-e BUILD_ENVIRONMENT \
-e JOB_BASE_NAME \
-e MAX_JOBS="$(nproc --ignore=2)" \
-e AWS_DEFAULT_REGION \
-e IS_GHA \
-e PR_NUMBER \
-e SHA1 \
-e BRANCH \
-e GITHUB_RUN_ID \
-e SCCACHE_BUCKET \
-e XLA_CLANG_CACHE_S3_BUCKET_NAME \
-e CUSTOM_TEST_ARTIFACT_BUILD_DIR \
@@ -229,21 +195,14 @@ jobs:
"${DOCKER_IMAGE}"
)
docker exec -t "${container_name}" sh -c 'sudo chown -R jenkins . && .jenkins/pytorch/build.sh'
- name: Parse ref
id: parse-ref
run: .github/scripts/parse_ref.py
- name: Display and upload binary build size statistics (Click Me)
# temporary hack: set CIRCLE_* vars, until we update
# tools/stats/print_test_stats.py to natively support GitHub Actions
env:
AWS_DEFAULT_REGION: us-east-1
IS_GHA: 1
SCRIBE_GRAPHQL_ACCESS_TOKEN: ${{ secrets.SCRIBE_GRAPHQL_ACCESS_TOKEN }}
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
CIRCLE_TAG: ${{ steps.parse-ref.outputs.tag }}
CIRCLE_WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
BRANCH: ${{ steps.parse-ref.outputs.branch }}
TAG: ${{ steps.parse-ref.outputs.tag }}
WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
run: |
COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0)
export COMMIT_TIME
@@ -270,8 +229,6 @@ jobs:
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Chown workspace
if: always()
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .

.github/workflows/generated-docker-builds.yml

@@ -0,0 +1,174 @@
# @generated DO NOT EDIT MANUALLY
# Template is at: .github/templates/docker_builds_ci_workflow.yml.j2
# Generation script: .github/scripts/generate_ci_workflows.py
name: docker-builds
on:
workflow_dispatch:
pull_request:
types: [opened, synchronize, reopened]
paths:
- '.circleci/docker/**'
- '.github/workflows/generated-docker-builds.yml'
schedule:
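# at minute 1 of every hour, on every 7th day of the month (the 1st, 8th, 15th, 22nd, 29th)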
- cron: 1 * */7 * *
concurrency:
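# one group per PR (or per commit on push); cancel-in-progress stops superseded runs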
group: docker-builds-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }}
cancel-in-progress: true
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
AWS_DEFAULT_REGION: us-east-1
jobs:
docker-build:
runs-on: linux.2xlarge
timeout-minutes: 240
strategy:
matrix:
include:
- docker_image_base: '308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-bionic-cuda10.2-cudnn7-py3.7-clang9'
docker_image_short_name: 'pytorch-linux-bionic-cuda10.2-cudnn7-py3.7-clang9'
- docker_image_base: '308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-bionic-cuda10.2-cudnn7-py3.9-gcc7'
docker_image_short_name: 'pytorch-linux-bionic-cuda10.2-cudnn7-py3.9-gcc7'
- docker_image_base: '308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-bionic-cuda11.5-cudnn8-py3-gcc7'
docker_image_short_name: 'pytorch-linux-bionic-cuda11.5-cudnn8-py3-gcc7'
- docker_image_base: '308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-bionic-py3.7-clang9'
docker_image_short_name: 'pytorch-linux-bionic-py3.7-clang9'
- docker_image_base: '308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-bionic-rocm4.1-py3.7'
docker_image_short_name: 'pytorch-linux-bionic-rocm4.1-py3.7'
- docker_image_base: '308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-bionic-rocm4.2-py3.7'
docker_image_short_name: 'pytorch-linux-bionic-rocm4.2-py3.7'
- docker_image_base: '308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-bionic-rocm4.3.1-py3.7'
docker_image_short_name: 'pytorch-linux-bionic-rocm4.3.1-py3.7'
- docker_image_base: '308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7'
docker_image_short_name: 'pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7'
- docker_image_base: '308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda11.1-cudnn8-py3-gcc7'
docker_image_short_name: 'pytorch-linux-xenial-cuda11.1-cudnn8-py3-gcc7'
- docker_image_base: '308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7'
docker_image_short_name: 'pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7'
- docker_image_base: '308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c'
docker_image_short_name: 'pytorch-linux-xenial-py3-clang5-android-ndk-r19c'
- docker_image_base: '308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-asan'
docker_image_short_name: 'pytorch-linux-xenial-py3-clang5-asan'
- docker_image_base: '308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang7-asan'
docker_image_short_name: 'pytorch-linux-xenial-py3-clang7-asan'
- docker_image_base: '308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang7-onnx'
docker_image_short_name: 'pytorch-linux-xenial-py3-clang7-onnx'
- docker_image_base: '308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3.7-gcc5.4'
docker_image_short_name: 'pytorch-linux-xenial-py3.7-gcc5.4'
- docker_image_base: '308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3.7-gcc7'
docker_image_short_name: 'pytorch-linux-xenial-py3.7-gcc7'
env:
DOCKER_IMAGE_BASE: '${{ matrix.docker_image_base }}'
name: docker-build (${{ matrix.docker_image_short_name }})
steps:
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Log in to ECR
env:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\")
aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \
--password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com"
- name: Chown workspace
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
retry docker pull "${ALPINE_IMAGE}"
# Ensure the working directory gets chowned back to the current user
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
rm -rf "${GITHUB_WORKSPACE:?}/*"
rm -f ~/.ssh/authorized_keys
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: recursive
- name: Clean PyTorch checkout
run: |
# Remove any artifacts from the previous checkouts
git clean -fxd
- name: Calculate docker image tag
id: calculate-tag
run: |
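# HEAD:.circleci/docker is the git tree hash of the docker directory,
# so the image tag changes only when the docker build inputs change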
DOCKER_TAG=$(git rev-parse HEAD:.circleci/docker)
echo "DOCKER_TAG=${DOCKER_TAG}" >> "${GITHUB_ENV}"
echo "DOCKER_IMAGE=${DOCKER_IMAGE_BASE}:${DOCKER_TAG}" >> "${GITHUB_ENV}"
echo "::set-output name=docker_tag::${DOCKER_TAG}"
echo "::set-output name=docker_image::${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"
- name: Check if image should be built
id: check
env:
BASE_REVISION: ${{ github.event.pull_request.base.sha || github.sha }}
run: |
set -x
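# this workflow rebuilds unconditionally; set-output publishes rebuild=yes for the next step's if-check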
echo ::set-output name=rebuild::yes
- name: Build and push docker image
if: ${{ steps.check.outputs.rebuild }}
env:
DOCKER_SKIP_S3_UPLOAD: 1
working-directory: .circleci/docker
run: |
export IMAGE_NAME=${DOCKER_IMAGE_BASE#308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/}
./build_docker.sh
- name: Pull Docker image
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
retry docker pull "${DOCKER_IMAGE}"
- name: Parse ref
id: parse-ref
run: .github/scripts/parse_ref.py
- name: Hold runner for 2 hours or until ssh sessions have drained
# Always hold for active ssh sessions
if: always()
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Chown workspace
if: always()
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Kill containers, clean up images
if: always()
run: |
# ignore expansion of "docker ps -q" since it could be empty
# shellcheck disable=SC2046
docker stop $(docker ps -q) || true
# Prune all of the docker images
docker system prune -af


@@ -0,0 +1,98 @@
# @generated DO NOT EDIT MANUALLY
# Template is at: .github/templates/ios_ci_workflow.yml.j2
# Generation script: .github/scripts/generate_ci_workflows.py
name: ios-12-5-1-arm64-coreml
on:
pull_request:
types: [opened, synchronize, reopened, unassigned]
push:
branches:
- master
- release/*
workflow_dispatch:
# For setup-miniconda, see https://github.com/conda-incubator/setup-miniconda/issues/179
defaults:
run:
shell: bash -x -e -l {0}
env:
BUILD_ENVIRONMENT: ios-12-5-1-arm64-coreml
IN_CI: 1
IS_GHA: 1
jobs:
build:
runs-on: macos-10.15
timeout-minutes: 240
env:
JOB_BASE_NAME: ios-12-5-1-arm64-coreml-build
IOS_CERT_KEY_2022: ${{ secrets.IOS_CERT_KEY_2022 }}
IOS_SIGN_KEY_2022: ${{ secrets.IOS_SIGN_KEY_2022 }}
IS_PROBOT_TRIGGER_EVENT: ${{ (github.event.action == 'unassigned') && (github.event.assignee.login == 'pytorchbot') }}
LABEL_CONDITIONS: ${{ contains(github.event.pull_request.labels.*.name, 'ciflow/all') || contains(github.event.pull_request.labels.*.name, 'ciflow/ios') || contains(github.event.pull_request.labels.*.name, 'ciflow/macos') || contains(github.event.pull_request.labels.*.name, 'ciflow/trunk') }}
PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
if: ${{ (github.repository == 'pytorch/pytorch') && (
(github.event_name == 'push') ||
(github.event_name == 'schedule') ||
(contains(github.event.pull_request.labels.*.name, 'ciflow/all') || contains(github.event.pull_request.labels.*.name, 'ciflow/ios') || contains(github.event.pull_request.labels.*.name, 'ciflow/macos') || contains(github.event.pull_request.labels.*.name, 'ciflow/trunk')) ||
(false))
}}
steps:
- name: print labels
run: echo "${PR_LABELS}"
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: recursive
- name: Clean PyTorch checkout
run: |
# Remove any artifacts from the previous checkouts
git clean -fxd
- name: Setup miniconda
uses: conda-incubator/setup-miniconda@v2
with:
auto-update-conda: true
python-version: 3.8
activate-environment: build
- name: Install ios / conda Dependencies
run: |
# Install dependencies
brew install libtool
conda install numpy ninja pyyaml mkl mkl-include setuptools cmake cffi requests typing_extensions --yes
- name: Run Fastlane
shell: bash -e {0}
run: |
set -x
cd ios/TestApp
# install fastlane
sudo gem install bundler && bundle install
# install certificates
echo "${IOS_CERT_KEY_2022}" >> cert.txt
base64 --decode cert.txt -o Certificates.p12
rm cert.txt
bundle exec fastlane install_root_cert
bundle exec fastlane install_dev_cert
# install the provisioning profile
PROFILE=PyTorch_CI_2022.mobileprovision
PROVISIONING_PROFILES=~/Library/MobileDevice/Provisioning\ Profiles
mkdir -pv "${PROVISIONING_PROFILES}"
cd "${PROVISIONING_PROFILES}"
echo "${IOS_SIGN_KEY_2022}" >> cert.txt
base64 --decode cert.txt -o ${PROFILE}
rm cert.txt
- name: Build
run: |
export TCLLIBPATH="/usr/local/lib"
python -VV
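# prefer the active conda env if set; otherwise derive the prefix from the location of the conda binary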
export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname "$(which conda)")/../"}
scripts/build_ios.sh
concurrency:
group: ios-12-5-1-arm64-coreml-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }}
cancel-in-progress: true


@@ -0,0 +1,98 @@
# @generated DO NOT EDIT MANUALLY
# Template is at: .github/templates/ios_ci_workflow.yml.j2
# Generation script: .github/scripts/generate_ci_workflows.py
name: ios-12-5-1-arm64-custom-ops
on:
pull_request:
types: [opened, synchronize, reopened, unassigned]
push:
branches:
- master
- release/*
workflow_dispatch:
# For setup-miniconda, see https://github.com/conda-incubator/setup-miniconda/issues/179
defaults:
run:
shell: bash -x -e -l {0}
env:
BUILD_ENVIRONMENT: ios-12-5-1-arm64-custom-ops
IN_CI: 1
IS_GHA: 1
jobs:
build:
runs-on: macos-10.15
timeout-minutes: 240
env:
JOB_BASE_NAME: ios-12-5-1-arm64-custom-ops-build
IOS_CERT_KEY_2022: ${{ secrets.IOS_CERT_KEY_2022 }}
IOS_SIGN_KEY_2022: ${{ secrets.IOS_SIGN_KEY_2022 }}
IS_PROBOT_TRIGGER_EVENT: ${{ (github.event.action == 'unassigned') && (github.event.assignee.login == 'pytorchbot') }}
LABEL_CONDITIONS: ${{ contains(github.event.pull_request.labels.*.name, 'ciflow/all') || contains(github.event.pull_request.labels.*.name, 'ciflow/ios') || contains(github.event.pull_request.labels.*.name, 'ciflow/macos') || contains(github.event.pull_request.labels.*.name, 'ciflow/trunk') }}
PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
if: ${{ (github.repository == 'pytorch/pytorch') && (
(github.event_name == 'push') ||
(github.event_name == 'schedule') ||
(contains(github.event.pull_request.labels.*.name, 'ciflow/all') || contains(github.event.pull_request.labels.*.name, 'ciflow/ios') || contains(github.event.pull_request.labels.*.name, 'ciflow/macos') || contains(github.event.pull_request.labels.*.name, 'ciflow/trunk')) ||
(false))
}}
steps:
- name: print labels
run: echo "${PR_LABELS}"
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: recursive
- name: Clean PyTorch checkout
run: |
# Remove any artifacts from the previous checkouts
git clean -fxd
- name: Setup miniconda
uses: conda-incubator/setup-miniconda@v2
with:
auto-update-conda: true
python-version: 3.8
activate-environment: build
- name: Install ios / conda Dependencies
run: |
# Install dependencies
brew install libtool
conda install numpy ninja pyyaml mkl mkl-include setuptools cmake cffi requests typing_extensions --yes
- name: Run Fastlane
shell: bash -e {0}
run: |
set -x
cd ios/TestApp
# install fastlane
sudo gem install bundler && bundle install
# install certificates
echo "${IOS_CERT_KEY_2022}" >> cert.txt
base64 --decode cert.txt -o Certificates.p12
rm cert.txt
bundle exec fastlane install_root_cert
bundle exec fastlane install_dev_cert
# install the provisioning profile
PROFILE=PyTorch_CI_2022.mobileprovision
PROVISIONING_PROFILES=~/Library/MobileDevice/Provisioning\ Profiles
mkdir -pv "${PROVISIONING_PROFILES}"
cd "${PROVISIONING_PROFILES}"
echo "${IOS_SIGN_KEY_2022}" >> cert.txt
base64 --decode cert.txt -o ${PROFILE}
rm cert.txt
- name: Build
run: |
export TCLLIBPATH="/usr/local/lib"
python -VV
export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname "$(which conda)")/../"}
scripts/build_ios.sh
concurrency:
group: ios-12-5-1-arm64-custom-ops-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }}
cancel-in-progress: true


@@ -0,0 +1,98 @@
# @generated DO NOT EDIT MANUALLY
# Template is at: .github/templates/ios_ci_workflow.yml.j2
# Generation script: .github/scripts/generate_ci_workflows.py
name: ios-12-5-1-arm64-full-jit
on:
pull_request:
types: [opened, synchronize, reopened, unassigned]
push:
branches:
- master
- release/*
workflow_dispatch:
# For setup-miniconda, see https://github.com/conda-incubator/setup-miniconda/issues/179
defaults:
run:
shell: bash -x -e -l {0}
env:
BUILD_ENVIRONMENT: ios-12-5-1-arm64-full-jit
IN_CI: 1
IS_GHA: 1
jobs:
build:
runs-on: macos-10.15
timeout-minutes: 240
env:
JOB_BASE_NAME: ios-12-5-1-arm64-full-jit-build
IOS_CERT_KEY_2022: ${{ secrets.IOS_CERT_KEY_2022 }}
IOS_SIGN_KEY_2022: ${{ secrets.IOS_SIGN_KEY_2022 }}
IS_PROBOT_TRIGGER_EVENT: ${{ (github.event.action == 'unassigned') && (github.event.assignee.login == 'pytorchbot') }}
LABEL_CONDITIONS: ${{ contains(github.event.pull_request.labels.*.name, 'ciflow/all') || contains(github.event.pull_request.labels.*.name, 'ciflow/ios') || contains(github.event.pull_request.labels.*.name, 'ciflow/macos') || contains(github.event.pull_request.labels.*.name, 'ciflow/trunk') }}
PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
if: ${{ (github.repository == 'pytorch/pytorch') && (
(github.event_name == 'push') ||
(github.event_name == 'schedule') ||
(contains(github.event.pull_request.labels.*.name, 'ciflow/all') || contains(github.event.pull_request.labels.*.name, 'ciflow/ios') || contains(github.event.pull_request.labels.*.name, 'ciflow/macos') || contains(github.event.pull_request.labels.*.name, 'ciflow/trunk')) ||
(false))
}}
steps:
- name: print labels
run: echo "${PR_LABELS}"
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: recursive
- name: Clean PyTorch checkout
run: |
# Remove any artifacts from the previous checkouts
git clean -fxd
- name: Setup miniconda
uses: conda-incubator/setup-miniconda@v2
with:
auto-update-conda: true
python-version: 3.8
activate-environment: build
- name: Install ios / conda Dependencies
run: |
# Install dependencies
brew install libtool
conda install numpy ninja pyyaml mkl mkl-include setuptools cmake cffi requests typing_extensions --yes
- name: Run Fastlane
shell: bash -e {0}
run: |
set -x
cd ios/TestApp
# install fastlane
sudo gem install bundler && bundle install
# install certificates
echo "${IOS_CERT_KEY_2022}" >> cert.txt
base64 --decode cert.txt -o Certificates.p12
rm cert.txt
bundle exec fastlane install_root_cert
bundle exec fastlane install_dev_cert
# install the provisioning profile
PROFILE=PyTorch_CI_2022.mobileprovision
PROVISIONING_PROFILES=~/Library/MobileDevice/Provisioning\ Profiles
mkdir -pv "${PROVISIONING_PROFILES}"
cd "${PROVISIONING_PROFILES}"
echo "${IOS_SIGN_KEY_2022}" >> cert.txt
base64 --decode cert.txt -o ${PROFILE}
rm cert.txt
- name: Build
run: |
export TCLLIBPATH="/usr/local/lib"
python -VV
export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname "$(which conda)")/../"}
scripts/build_ios.sh
concurrency:
group: ios-12-5-1-arm64-full-jit-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }}
cancel-in-progress: true

.github/workflows/generated-ios-12-5-1-arm64-metal.yml

@@ -0,0 +1,98 @@
# @generated DO NOT EDIT MANUALLY
# Template is at: .github/templates/ios_ci_workflow.yml.j2
# Generation script: .github/scripts/generate_ci_workflows.py
name: ios-12-5-1-arm64-metal
on:
pull_request:
types: [opened, synchronize, reopened, unassigned]
push:
branches:
- master
- release/*
workflow_dispatch:
# For setup-miniconda, see https://github.com/conda-incubator/setup-miniconda/issues/179
defaults:
run:
shell: bash -x -e -l {0}
env:
BUILD_ENVIRONMENT: ios-12-5-1-arm64-metal
IN_CI: 1
IS_GHA: 1
jobs:
build:
runs-on: macos-10.15
timeout-minutes: 240
env:
JOB_BASE_NAME: ios-12-5-1-arm64-metal-build
IOS_CERT_KEY_2022: ${{ secrets.IOS_CERT_KEY_2022 }}
IOS_SIGN_KEY_2022: ${{ secrets.IOS_SIGN_KEY_2022 }}
IS_PROBOT_TRIGGER_EVENT: ${{ (github.event.action == 'unassigned') && (github.event.assignee.login == 'pytorchbot') }}
LABEL_CONDITIONS: ${{ contains(github.event.pull_request.labels.*.name, 'ciflow/all') || contains(github.event.pull_request.labels.*.name, 'ciflow/ios') || contains(github.event.pull_request.labels.*.name, 'ciflow/macos') || contains(github.event.pull_request.labels.*.name, 'ciflow/trunk') }}
PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
if: ${{ (github.repository == 'pytorch/pytorch') && (
(github.event_name == 'push') ||
(github.event_name == 'schedule') ||
(contains(github.event.pull_request.labels.*.name, 'ciflow/all') || contains(github.event.pull_request.labels.*.name, 'ciflow/ios') || contains(github.event.pull_request.labels.*.name, 'ciflow/macos') || contains(github.event.pull_request.labels.*.name, 'ciflow/trunk')) ||
(false))
}}
steps:
- name: print labels
run: echo "${PR_LABELS}"
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: recursive
- name: Clean PyTorch checkout
run: |
# Remove any artifacts from the previous checkouts
git clean -fxd
- name: Setup miniconda
uses: conda-incubator/setup-miniconda@v2
with:
auto-update-conda: true
python-version: 3.8
activate-environment: build
- name: Install ios / conda Dependencies
run: |
# Install dependencies
brew install libtool
conda install numpy ninja pyyaml mkl mkl-include setuptools cmake cffi requests typing_extensions --yes
- name: Run Fastlane
shell: bash -e {0}
run: |
set -x
cd ios/TestApp
# install fastlane
sudo gem install bundler && bundle install
# install certificates
echo "${IOS_CERT_KEY_2022}" >> cert.txt
base64 --decode cert.txt -o Certificates.p12
rm cert.txt
bundle exec fastlane install_root_cert
bundle exec fastlane install_dev_cert
# install the provisioning profile
PROFILE=PyTorch_CI_2022.mobileprovision
PROVISIONING_PROFILES=~/Library/MobileDevice/Provisioning\ Profiles
mkdir -pv "${PROVISIONING_PROFILES}"
cd "${PROVISIONING_PROFILES}"
echo "${IOS_SIGN_KEY_2022}" >> cert.txt
base64 --decode cert.txt -o ${PROFILE}
rm cert.txt
- name: Build
run: |
export TCLLIBPATH="/usr/local/lib"
python -VV
export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname "$(which conda)")/../"}
scripts/build_ios.sh
concurrency:
group: ios-12-5-1-arm64-metal-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }}
cancel-in-progress: true

.github/workflows/generated-ios-12-5-1-arm64.yml

@@ -0,0 +1,98 @@
# @generated DO NOT EDIT MANUALLY
# Template is at: .github/templates/ios_ci_workflow.yml.j2
# Generation script: .github/scripts/generate_ci_workflows.py
name: ios-12-5-1-arm64
on:
pull_request:
types: [opened, synchronize, reopened, unassigned]
push:
branches:
- master
- release/*
workflow_dispatch:
# For setup-miniconda, see https://github.com/conda-incubator/setup-miniconda/issues/179
defaults:
run:
shell: bash -x -e -l {0}
env:
BUILD_ENVIRONMENT: ios-12-5-1-arm64
IN_CI: 1
IS_GHA: 1
jobs:
build:
runs-on: macos-10.15
timeout-minutes: 240
env:
JOB_BASE_NAME: ios-12-5-1-arm64-build
IOS_CERT_KEY_2022: ${{ secrets.IOS_CERT_KEY_2022 }}
IOS_SIGN_KEY_2022: ${{ secrets.IOS_SIGN_KEY_2022 }}
IS_PROBOT_TRIGGER_EVENT: ${{ (github.event.action == 'unassigned') && (github.event.assignee.login == 'pytorchbot') }}
LABEL_CONDITIONS: ${{ contains(github.event.pull_request.labels.*.name, 'ciflow/all') || contains(github.event.pull_request.labels.*.name, 'ciflow/ios') || contains(github.event.pull_request.labels.*.name, 'ciflow/macos') || contains(github.event.pull_request.labels.*.name, 'ciflow/trunk') }}
PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
if: ${{ (github.repository == 'pytorch/pytorch') && (
(github.event_name == 'push') ||
(github.event_name == 'schedule') ||
(contains(github.event.pull_request.labels.*.name, 'ciflow/all') || contains(github.event.pull_request.labels.*.name, 'ciflow/ios') || contains(github.event.pull_request.labels.*.name, 'ciflow/macos') || contains(github.event.pull_request.labels.*.name, 'ciflow/trunk')) ||
(false))
}}
steps:
- name: print labels
run: echo "${PR_LABELS}"
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: recursive
- name: Clean PyTorch checkout
run: |
# Remove any artifacts from the previous checkouts
git clean -fxd
- name: Setup miniconda
uses: conda-incubator/setup-miniconda@v2
with:
auto-update-conda: true
python-version: 3.8
activate-environment: build
- name: Install ios / conda Dependencies
run: |
# Install dependencies
brew install libtool
conda install numpy ninja pyyaml mkl mkl-include setuptools cmake cffi requests typing_extensions --yes
- name: Run Fastlane
shell: bash -e {0}
run: |
set -x
cd ios/TestApp
# install fastlane
sudo gem install bundler && bundle install
# install certificates
echo "${IOS_CERT_KEY_2022}" >> cert.txt
base64 --decode cert.txt -o Certificates.p12
rm cert.txt
bundle exec fastlane install_root_cert
bundle exec fastlane install_dev_cert
# install the provisioning profile
PROFILE=PyTorch_CI_2022.mobileprovision
PROVISIONING_PROFILES=~/Library/MobileDevice/Provisioning\ Profiles
mkdir -pv "${PROVISIONING_PROFILES}"
cd "${PROVISIONING_PROFILES}"
echo "${IOS_SIGN_KEY_2022}" >> cert.txt
base64 --decode cert.txt -o ${PROFILE}
rm cert.txt
- name: Build
run: |
export TCLLIBPATH="/usr/local/lib"
python -VV
export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname "$(which conda)")/../"}
scripts/build_ios.sh
concurrency:
group: ios-12-5-1-arm64-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }}
cancel-in-progress: true


@@ -0,0 +1,98 @@
# @generated DO NOT EDIT MANUALLY
# Template is at: .github/templates/ios_ci_workflow.yml.j2
# Generation script: .github/scripts/generate_ci_workflows.py
name: ios-12-5-1-x86-64-coreml
on:
pull_request:
types: [opened, synchronize, reopened, unassigned]
push:
branches:
- master
- release/*
workflow_dispatch:
# For setup-miniconda, see https://github.com/conda-incubator/setup-miniconda/issues/179
defaults:
run:
shell: bash -x -e -l {0}
env:
BUILD_ENVIRONMENT: ios-12-5-1-x86-64-coreml
IN_CI: 1
IS_GHA: 1
jobs:
build:
runs-on: macos-10.15
timeout-minutes: 240
env:
JOB_BASE_NAME: ios-12-5-1-x86-64-coreml-build
IOS_CERT_KEY_2022: ${{ secrets.IOS_CERT_KEY_2022 }}
IOS_SIGN_KEY_2022: ${{ secrets.IOS_SIGN_KEY_2022 }}
IS_PROBOT_TRIGGER_EVENT: ${{ (github.event.action == 'unassigned') && (github.event.assignee.login == 'pytorchbot') }}
LABEL_CONDITIONS: ${{ contains(github.event.pull_request.labels.*.name, 'ciflow/all') || contains(github.event.pull_request.labels.*.name, 'ciflow/ios') || contains(github.event.pull_request.labels.*.name, 'ciflow/macos') || contains(github.event.pull_request.labels.*.name, 'ciflow/trunk') }}
PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
if: ${{ (github.repository == 'pytorch/pytorch') && (
(github.event_name == 'push') ||
(github.event_name == 'schedule') ||
(contains(github.event.pull_request.labels.*.name, 'ciflow/all') || contains(github.event.pull_request.labels.*.name, 'ciflow/ios') || contains(github.event.pull_request.labels.*.name, 'ciflow/macos') || contains(github.event.pull_request.labels.*.name, 'ciflow/trunk')) ||
(false))
}}
steps:
- name: print labels
run: echo "${PR_LABELS}"
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: recursive
- name: Clean PyTorch checkout
run: |
# Remove any artifacts from the previous checkouts
git clean -fxd
- name: Setup miniconda
uses: conda-incubator/setup-miniconda@v2
with:
auto-update-conda: true
python-version: 3.8
activate-environment: build
- name: Install ios / conda Dependencies
run: |
# Install dependencies
brew install libtool
conda install numpy ninja pyyaml mkl mkl-include setuptools cmake cffi requests typing_extensions --yes
- name: Run Fastlane
shell: bash -e {0}
run: |
set -x
cd ios/TestApp
# install fastlane
sudo gem install bundler && bundle install
# install certificates
echo "${IOS_CERT_KEY_2022}" >> cert.txt
base64 --decode cert.txt -o Certificates.p12
rm cert.txt
bundle exec fastlane install_root_cert
bundle exec fastlane install_dev_cert
# install the provisioning profile
PROFILE=PyTorch_CI_2022.mobileprovision
PROVISIONING_PROFILES=~/Library/MobileDevice/Provisioning\ Profiles
mkdir -pv "${PROVISIONING_PROFILES}"
cd "${PROVISIONING_PROFILES}"
echo "${IOS_SIGN_KEY_2022}" >> cert.txt
base64 --decode cert.txt -o ${PROFILE}
rm cert.txt
- name: Build
run: |
export TCLLIBPATH="/usr/local/lib"
python -VV
export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname "$(which conda)")/../"}
scripts/build_ios.sh
concurrency:
group: ios-12-5-1-x86-64-coreml-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }}
cancel-in-progress: true


@@ -0,0 +1,98 @@
# @generated DO NOT EDIT MANUALLY
# Template is at: .github/templates/ios_ci_workflow.yml.j2
# Generation script: .github/scripts/generate_ci_workflows.py
name: ios-12-5-1-x86-64-full-jit
on:
pull_request:
types: [opened, synchronize, reopened, unassigned]
push:
branches:
- master
- release/*
workflow_dispatch:
# For setup-miniconda, see https://github.com/conda-incubator/setup-miniconda/issues/179
defaults:
run:
shell: bash -x -e -l {0}
env:
BUILD_ENVIRONMENT: ios-12-5-1-x86-64-full-jit
IN_CI: 1
IS_GHA: 1
jobs:
build:
runs-on: macos-10.15
timeout-minutes: 240
env:
JOB_BASE_NAME: ios-12-5-1-x86-64-full-jit-build
IOS_CERT_KEY_2022: ${{ secrets.IOS_CERT_KEY_2022 }}
IOS_SIGN_KEY_2022: ${{ secrets.IOS_SIGN_KEY_2022 }}
IS_PROBOT_TRIGGER_EVENT: ${{ (github.event.action == 'unassigned') && (github.event.assignee.login == 'pytorchbot') }}
LABEL_CONDITIONS: ${{ contains(github.event.pull_request.labels.*.name, 'ciflow/all') || contains(github.event.pull_request.labels.*.name, 'ciflow/ios') || contains(github.event.pull_request.labels.*.name, 'ciflow/macos') || contains(github.event.pull_request.labels.*.name, 'ciflow/trunk') }}
PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
if: ${{ (github.repository == 'pytorch/pytorch') && (
(github.event_name == 'push') ||
(github.event_name == 'schedule') ||
(contains(github.event.pull_request.labels.*.name, 'ciflow/all') || contains(github.event.pull_request.labels.*.name, 'ciflow/ios') || contains(github.event.pull_request.labels.*.name, 'ciflow/macos') || contains(github.event.pull_request.labels.*.name, 'ciflow/trunk')) ||
(false))
}}
steps:
- name: print labels
run: echo "${PR_LABELS}"
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: recursive
- name: Clean PyTorch checkout
run: |
# Remove any artifacts from the previous checkouts
git clean -fxd
- name: Setup miniconda
uses: conda-incubator/setup-miniconda@v2
with:
auto-update-conda: true
python-version: 3.8
activate-environment: build
- name: Install ios / conda Dependencies
run: |
# Install dependencies
brew install libtool
conda install numpy ninja pyyaml mkl mkl-include setuptools cmake cffi requests typing_extensions --yes
- name: Run Fastlane
shell: bash -e {0}
run: |
set -x
cd ios/TestApp
# install fastlane
sudo gem install bundler && bundle install
# install certificates
echo "${IOS_CERT_KEY_2022}" >> cert.txt
base64 --decode cert.txt -o Certificates.p12
rm cert.txt
bundle exec fastlane install_root_cert
bundle exec fastlane install_dev_cert
# install the provisioning profile
PROFILE=PyTorch_CI_2022.mobileprovision
PROVISIONING_PROFILES=~/Library/MobileDevice/Provisioning\ Profiles
mkdir -pv "${PROVISIONING_PROFILES}"
cd "${PROVISIONING_PROFILES}"
echo "${IOS_SIGN_KEY_2022}" >> cert.txt
base64 --decode cert.txt -o ${PROFILE}
rm cert.txt
- name: Build
run: |
export TCLLIBPATH="/usr/local/lib"
python -VV
export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname "$(which conda)")/../"}
scripts/build_ios.sh
concurrency:
group: ios-12-5-1-x86-64-full-jit-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }}
cancel-in-progress: true

.github/workflows/generated-ios-12-5-1-x86-64.yml

@@ -0,0 +1,98 @@
# @generated DO NOT EDIT MANUALLY
# Template is at: .github/templates/ios_ci_workflow.yml.j2
# Generation script: .github/scripts/generate_ci_workflows.py
name: ios-12-5-1-x86-64
on:
pull_request:
types: [opened, synchronize, reopened, unassigned]
push:
branches:
- master
- release/*
workflow_dispatch:
# For setup-miniconda, see https://github.com/conda-incubator/setup-miniconda/issues/179
defaults:
run:
shell: bash -x -e -l {0}
env:
BUILD_ENVIRONMENT: ios-12-5-1-x86-64
IN_CI: 1
IS_GHA: 1
jobs:
build:
runs-on: macos-10.15
timeout-minutes: 240
env:
JOB_BASE_NAME: ios-12-5-1-x86-64-build
IOS_CERT_KEY_2022: ${{ secrets.IOS_CERT_KEY_2022 }}
IOS_SIGN_KEY_2022: ${{ secrets.IOS_SIGN_KEY_2022 }}
IS_PROBOT_TRIGGER_EVENT: ${{ (github.event.action == 'unassigned') && (github.event.assignee.login == 'pytorchbot') }}
LABEL_CONDITIONS: ${{ contains(github.event.pull_request.labels.*.name, 'ciflow/all') || contains(github.event.pull_request.labels.*.name, 'ciflow/ios') || contains(github.event.pull_request.labels.*.name, 'ciflow/macos') || contains(github.event.pull_request.labels.*.name, 'ciflow/trunk') }}
PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
if: ${{ (github.repository == 'pytorch/pytorch') && (
(github.event_name == 'push') ||
(github.event_name == 'schedule') ||
(contains(github.event.pull_request.labels.*.name, 'ciflow/all') || contains(github.event.pull_request.labels.*.name, 'ciflow/ios') || contains(github.event.pull_request.labels.*.name, 'ciflow/macos') || contains(github.event.pull_request.labels.*.name, 'ciflow/trunk')) ||
(false))
}}
steps:
- name: print labels
run: echo "${PR_LABELS}"
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: recursive
- name: Clean PyTorch checkout
run: |
# Remove any artifacts from the previous checkouts
git clean -fxd
- name: Setup miniconda
uses: conda-incubator/setup-miniconda@v2
with:
auto-update-conda: true
python-version: 3.8
activate-environment: build
- name: Install ios / conda Dependencies
run: |
# Install dependencies
brew install libtool
conda install numpy ninja pyyaml mkl mkl-include setuptools cmake cffi requests typing_extensions --yes
- name: Run Fastlane
shell: bash -e {0}
run: |
set -x
cd ios/TestApp
# install fastlane
sudo gem install bundler && bundle install
# install certificates
echo "${IOS_CERT_KEY_2022}" >> cert.txt
base64 --decode cert.txt -o Certificates.p12
rm cert.txt
bundle exec fastlane install_root_cert
bundle exec fastlane install_dev_cert
# install the provisioning profile
PROFILE=PyTorch_CI_2022.mobileprovision
PROVISIONING_PROFILES=~/Library/MobileDevice/Provisioning\ Profiles
mkdir -pv "${PROVISIONING_PROFILES}"
cd "${PROVISIONING_PROFILES}"
echo "${IOS_SIGN_KEY_2022}" >> cert.txt
base64 --decode cert.txt -o ${PROFILE}
rm cert.txt
- name: Build
run: |
export TCLLIBPATH="/usr/local/lib"
python -VV
export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname "$(which conda)")/../"}
scripts/build_ios.sh
concurrency:
group: ios-12-5-1-x86-64-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }}
cancel-in-progress: true


@@ -1,24 +1,26 @@
# @generated DO NOT EDIT MANUALLY
# Template is at: .github/templates/linux_ci_workflow.yml.j2
# Generation script: .github/scripts/generate_ci_workflows.py
name: libtorch-linux-xenial-cuda10.2-py3.6-gcc7
name: libtorch-linux-xenial-cuda10.2-py3.7-gcc7
on:
pull_request:
types: [unassigned]
types: [opened, synchronize, reopened, unassigned]
push:
branches:
- master
- release/*
- fbsync
workflow_dispatch:
env:
BUILD_ENVIRONMENT: libtorch-linux-xenial-cuda10.2-py3.6-gcc7
BUILD_ENVIRONMENT: libtorch-linux-xenial-cuda10.2-py3.7-gcc7
DOCKER_IMAGE_BASE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7
SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2
XLA_CLANG_CACHE_S3_BUCKET_NAME: ossci-compiler-clang-cache-circleci-xla
TORCH_CUDA_ARCH_LIST: 5.2
IN_CI: 1
IS_GHA: 1
# This is used for the phase of adding wheel tests only, will be removed once completed
IN_WHEEL_TEST: 1
# Used for custom_operator, jit_hooks, custom_backend, see .jenkins/pytorch/build.sh
@@ -26,31 +28,34 @@ env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
AWS_DEFAULT_REGION: us-east-1
PR_NUMBER: ${{ github.event.pull_request.number }}
SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
PYTORCH_RETRY_TEST_CASES: 1
concurrency:
group: libtorch-linux-xenial-cuda10.2-py3.6-gcc7-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }}
group: libtorch-linux-xenial-cuda10.2-py3.7-gcc7-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }}
cancel-in-progress: true
jobs:
ciflow_should_run:
runs-on: ubuntu-18.04
if: ${{ (github.repository == 'pytorch/pytorch') && ((github.event_name != 'pull_request') || (github.event.assignee.login != 'pytorchbot' ) || (github.event.action !='unassigned') || (contains(github.event.pull_request.labels.*.name, 'ciflow/all') || contains(github.event.pull_request.labels.*.name, 'ciflow/cuda') || contains(github.event.pull_request.labels.*.name, 'ciflow/libtorch') || contains(github.event.pull_request.labels.*.name, 'ciflow/linux'))) }}
env:
LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
steps:
- name: noop
run: echo running ciflow_should_run
- name: print labels
run: echo "${LABELS}"
calculate-docker-image:
build:
runs-on: linux.2xlarge
needs: [ciflow_should_run]
timeout-minutes: 240
if: ${{ (github.repository == 'pytorch/pytorch') && (
(github.event_name == 'push') ||
(github.event_name == 'schedule') ||
(contains(github.event.pull_request.labels.*.name, 'ciflow/all') || contains(github.event.pull_request.labels.*.name, 'ciflow/cuda') || contains(github.event.pull_request.labels.*.name, 'ciflow/libtorch') || contains(github.event.pull_request.labels.*.name, 'ciflow/linux') || contains(github.event.pull_request.labels.*.name, 'ciflow/trunk')) ||
(false))
}}
env:
DOCKER_BUILDKIT: 1
timeout-minutes: 90
JOB_BASE_NAME: libtorch-linux-xenial-cuda10.2-py3.7-gcc7-build
IS_PROBOT_TRIGGER_EVENT: ${{ (github.event.action == 'unassigned') && (github.event.assignee.login == 'pytorchbot') }}
LABEL_CONDITIONS: ${{ contains(github.event.pull_request.labels.*.name, 'ciflow/all') || contains(github.event.pull_request.labels.*.name, 'ciflow/cuda') || contains(github.event.pull_request.labels.*.name, 'ciflow/libtorch') || contains(github.event.pull_request.labels.*.name, 'ciflow/linux') || contains(github.event.pull_request.labels.*.name, 'ciflow/trunk') }}
outputs:
docker_image: ${{ steps.calculate-tag.outputs.docker_image }}
steps:
- name: print labels
run: echo "${PR_LABELS}"
- name: Display EC2 information
shell: bash
run: |
@@ -69,18 +74,16 @@ jobs:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\")
aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \
--password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com"
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
retry docker pull "${ALPINE_IMAGE}"
# Ensure the working directory gets chowned back to the current user
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
@@ -98,17 +101,22 @@ jobs:
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: false
submodules: recursive
- name: Clean PyTorch checkout
run: |
# Remove any artifacts from the previous checkouts
git clean -fxd
- name: Calculate docker image tag
id: calculate-tag
run: |
DOCKER_TAG=$(git rev-parse HEAD:.circleci/docker)
echo "DOCKER_TAG=${DOCKER_TAG}" >> "${GITHUB_ENV}"
echo "DOCKER_IMAGE=${DOCKER_IMAGE_BASE}:${DOCKER_TAG}" >> "${GITHUB_ENV}"
echo "::set-output name=docker_tag::${DOCKER_TAG}"
echo "::set-output name=docker_image::${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"
- name: Check if image should be built
id: check
env:
DOCKER_TAG: ${{ steps.calculate-tag.outputs.docker_tag }}
BASE_REVISION: ${{ github.event.pull_request.base.sha || github.sha }}
run: |
set -x
@@ -140,77 +148,35 @@ jobs:
- name: Build and push docker image
if: ${{ steps.check.outputs.rebuild }}
env:
DOCKER_TAG: ${{ steps.calculate-tag.outputs.docker_tag }}
DOCKER_SKIP_S3_UPLOAD: 1
working-directory: .circleci/docker
run: |
export IMAGE_NAME=${DOCKER_IMAGE_BASE#308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/}
cd .circleci/docker && ./build_docker.sh
build:
runs-on: linux.2xlarge
needs: [calculate-docker-image, ciflow_should_run]
env:
DOCKER_IMAGE: ${{ needs.calculate-docker-image.outputs.docker_image }}
JOB_BASE_NAME: libtorch-linux-xenial-cuda10.2-py3.6-gcc7-build
steps:
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Log in to ECR
env:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
./build_docker.sh
- name: Pull Docker image
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
# Ensure the working directory gets chowned back to the current user
retry docker pull "${ALPINE_IMAGE}"
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
rm -rf "${GITHUB_WORKSPACE:?}/*"
rm -f ~/.ssh/authorized_keys
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: recursive
- name: Pull docker image
run: |
docker pull "${DOCKER_IMAGE}"
retry docker pull "${DOCKER_IMAGE}"
- name: Parse ref
id: parse-ref
run: .github/scripts/parse_ref.py
- name: Build
env:
BRANCH: ${{ steps.parse-ref.outputs.branch }}
run: |
# detached container should get cleaned up by teardown_ec2_linux
container_name=$(docker run \
-e BUILD_ENVIRONMENT \
-e JOB_BASE_NAME \
-e MAX_JOBS="$(nproc --ignore=2)" \
-e AWS_DEFAULT_REGION \
-e IS_GHA \
-e PR_NUMBER \
-e SHA1 \
-e BRANCH \
-e GITHUB_RUN_ID \
-e SCCACHE_BUCKET \
-e XLA_CLANG_CACHE_S3_BUCKET_NAME \
-e CUSTOM_TEST_ARTIFACT_BUILD_DIR \
@@ -229,21 +195,14 @@ jobs:
"${DOCKER_IMAGE}"
)
docker exec -t "${container_name}" sh -c 'sudo chown -R jenkins . && .jenkins/pytorch/build.sh'
- name: Parse ref
id: parse-ref
run: .github/scripts/parse_ref.py
- name: Display and upload binary build size statistics (Click Me)
# temporary hack: set CIRCLE_* vars, until we update
# tools/stats/print_test_stats.py to natively support GitHub Actions
env:
AWS_DEFAULT_REGION: us-east-1
IS_GHA: 1
SCRIBE_GRAPHQL_ACCESS_TOKEN: ${{ secrets.SCRIBE_GRAPHQL_ACCESS_TOKEN }}
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
CIRCLE_TAG: ${{ steps.parse-ref.outputs.tag }}
CIRCLE_WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
BRANCH: ${{ steps.parse-ref.outputs.branch }}
TAG: ${{ steps.parse-ref.outputs.tag }}
WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
run: |
COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0)
export COMMIT_TIME
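The stats-upload step now reads plain `BRANCH`/`TAG`/`WORKFLOW_ID` variables instead of the CIRCLE_* compatibility names, all fed from the `parse-ref` step's outputs. A hypothetical sketch of what a parse-ref style step could look like, assuming it derives branch and tag from `GITHUB_REF` (the real `.github/scripts/parse_ref.py` may differ):

    - name: Parse ref (hypothetical sketch)
      id: parse-ref
      run: |
        # refs/heads/<branch> -> branch output; refs/tags/<tag> -> tag output
        case "${GITHUB_REF}" in
          refs/heads/*) echo "::set-output name=branch::${GITHUB_REF#refs/heads/}" ;;
          refs/tags/*)  echo "::set-output name=tag::${GITHUB_REF#refs/tags/}" ;;
        esac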
@@ -259,8 +218,6 @@ jobs:
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Chown workspace
if: always()
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .


@@ -1,24 +1,26 @@
# @generated DO NOT EDIT MANUALLY
# Template is at: .github/templates/linux_ci_workflow.yml.j2
# Generation script: .github/scripts/generate_ci_workflows.py
name: libtorch-linux-xenial-cuda11.3-py3.6-gcc7
name: libtorch-linux-xenial-cuda11.3-py3.7-gcc7
on:
pull_request:
types: [unassigned]
types: [opened, synchronize, reopened, unassigned]
push:
branches:
- master
- release/*
- fbsync
workflow_dispatch:
env:
BUILD_ENVIRONMENT: libtorch-linux-xenial-cuda11.3-py3.6-gcc7
BUILD_ENVIRONMENT: libtorch-linux-xenial-cuda11.3-py3.7-gcc7
DOCKER_IMAGE_BASE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7
SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2
XLA_CLANG_CACHE_S3_BUCKET_NAME: ossci-compiler-clang-cache-circleci-xla
TORCH_CUDA_ARCH_LIST: 5.2
IN_CI: 1
IS_GHA: 1
# This is used for the phase of adding wheel tests only, will be removed once completed
IN_WHEEL_TEST: 1
# Used for custom_operator, jit_hooks, custom_backend, see .jenkins/pytorch/build.sh
@@ -26,31 +28,34 @@ env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
AWS_DEFAULT_REGION: us-east-1
PR_NUMBER: ${{ github.event.pull_request.number }}
SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
PYTORCH_RETRY_TEST_CASES: 1
concurrency:
group: libtorch-linux-xenial-cuda11.3-py3.6-gcc7-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }}
group: libtorch-linux-xenial-cuda11.3-py3.7-gcc7-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }}
cancel-in-progress: true
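The concurrency group above keys PR runs by PR number and push runs by commit SHA (the `||` fallback), while the trailing `github.event_name == 'workflow_dispatch'` boolean keeps manual dispatches in a separate group so they never cancel CI runs; with `cancel-in-progress: true`, a newer run in the same group supersedes an older one. The same pattern reduced to a generic sketch, with a placeholder workflow name:

    concurrency:
      # PRs share a group per PR number; pushes fall back to the commit SHA.
      # The event-name suffix isolates manual dispatches from CI runs.
      group: my-workflow-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }}
      cancel-in-progress: true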
jobs:
ciflow_should_run:
runs-on: ubuntu-18.04
if: ${{ (github.repository == 'pytorch/pytorch') && ((github.event_name != 'pull_request') || (github.event.assignee.login != 'pytorchbot') || (github.event.action != 'unassigned') || (contains(github.event.pull_request.labels.*.name, 'ciflow/all') || contains(github.event.pull_request.labels.*.name, 'ciflow/cuda') || contains(github.event.pull_request.labels.*.name, 'ciflow/libtorch') || contains(github.event.pull_request.labels.*.name, 'ciflow/linux'))) }}
env:
LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
steps:
- name: noop
run: echo running ciflow_should_run
- name: print labels
run: echo "${LABELS}"
calculate-docker-image:
build:
runs-on: linux.2xlarge
needs: [ciflow_should_run]
timeout-minutes: 240
if: ${{ (github.repository == 'pytorch/pytorch') && (
(github.event_name == 'push') ||
(github.event_name == 'schedule') ||
(contains(github.event.pull_request.labels.*.name, 'ciflow/all') || contains(github.event.pull_request.labels.*.name, 'ciflow/cuda') || contains(github.event.pull_request.labels.*.name, 'ciflow/libtorch') || contains(github.event.pull_request.labels.*.name, 'ciflow/linux') || contains(github.event.pull_request.labels.*.name, 'ciflow/trunk')) ||
(false))
}}
env:
DOCKER_BUILDKIT: 1
timeout-minutes: 90
JOB_BASE_NAME: libtorch-linux-xenial-cuda11.3-py3.7-gcc7-build
IS_PROBOT_TRIGGER_EVENT: ${{ (github.event.action == 'unassigned') && (github.event.assignee.login == 'pytorchbot') }}
LABEL_CONDITIONS: ${{ contains(github.event.pull_request.labels.*.name, 'ciflow/all') || contains(github.event.pull_request.labels.*.name, 'ciflow/cuda') || contains(github.event.pull_request.labels.*.name, 'ciflow/libtorch') || contains(github.event.pull_request.labels.*.name, 'ciflow/linux') || contains(github.event.pull_request.labels.*.name, 'ciflow/trunk') }}
outputs:
docker_image: ${{ steps.calculate-tag.outputs.docker_image }}
steps:
- name: print labels
run: echo "${PR_LABELS}"
- name: Display EC2 information
shell: bash
run: |
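Structurally, the biggest change in these files is that the old `calculate-docker-image` job is folded into `build`, and label gating moves from the separate `ciflow_should_run` job into each job's own `if:` expression (with `LABEL_CONDITIONS` kept in `env:` for visibility). A trimmed sketch of the gating pattern, with an illustrative label:

    build:
      runs-on: linux.2xlarge
      if: ${{ (github.event_name == 'push') ||
          contains(github.event.pull_request.labels.*.name, 'ciflow/linux') }}
      steps:
        - name: print labels
          run: echo "${PR_LABELS}"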
@@ -69,18 +74,16 @@ jobs:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\")
aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \
--password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com"
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
# Ensure the working directory gets chowned back to the current user
retry docker pull "${ALPINE_IMAGE}"
# Ensure the working directory gets chowned back to the current user
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
@@ -98,17 +101,22 @@ jobs:
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: false
submodules: recursive
- name: Clean PyTorch checkout
run: |
# Remove any artifacts from the previous checkouts
git clean -fxd
- name: Calculate docker image tag
id: calculate-tag
run: |
DOCKER_TAG=$(git rev-parse HEAD:.circleci/docker)
echo "DOCKER_TAG=${DOCKER_TAG}" >> "${GITHUB_ENV}"
echo "DOCKER_IMAGE=${DOCKER_IMAGE_BASE}:${DOCKER_TAG}" >> "${GITHUB_ENV}"
echo "::set-output name=docker_tag::${DOCKER_TAG}"
echo "::set-output name=docker_image::${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"
- name: Check if image should be built
id: check
env:
DOCKER_TAG: ${{ steps.calculate-tag.outputs.docker_tag }}
BASE_REVISION: ${{ github.event.pull_request.base.sha || github.sha }}
run: |
set -x
@@ -140,77 +148,35 @@ jobs:
- name: Build and push docker image
if: ${{ steps.check.outputs.rebuild }}
env:
DOCKER_TAG: ${{ steps.calculate-tag.outputs.docker_tag }}
DOCKER_SKIP_S3_UPLOAD: 1
working-directory: .circleci/docker
run: |
export IMAGE_NAME=${DOCKER_IMAGE_BASE#308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/}
cd .circleci/docker && ./build_docker.sh
build:
runs-on: linux.2xlarge
needs: [calculate-docker-image, ciflow_should_run]
env:
DOCKER_IMAGE: ${{ needs.calculate-docker-image.outputs.docker_image }}
JOB_BASE_NAME: libtorch-linux-xenial-cuda11.3-py3.6-gcc7-build
steps:
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Log in to ECR
env:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
./build_docker.sh
- name: Pull Docker image
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
# Ensure the working directory gets chowned back to the current user
retry docker pull "${ALPINE_IMAGE}"
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
rm -rf "${GITHUB_WORKSPACE:?}/*"
rm -f ~/.ssh/authorized_keys
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: recursive
- name: Pull docker image
run: |
docker pull "${DOCKER_IMAGE}"
retry docker pull "${DOCKER_IMAGE}"
- name: Parse ref
id: parse-ref
run: .github/scripts/parse_ref.py
- name: Build
env:
BRANCH: ${{ steps.parse-ref.outputs.branch }}
run: |
# detached container should get cleaned up by teardown_ec2_linux
container_name=$(docker run \
-e BUILD_ENVIRONMENT \
-e JOB_BASE_NAME \
-e MAX_JOBS="$(nproc --ignore=2)" \
-e AWS_DEFAULT_REGION \
-e IS_GHA \
-e PR_NUMBER \
-e SHA1 \
-e BRANCH \
-e GITHUB_RUN_ID \
-e SCCACHE_BUCKET \
-e XLA_CLANG_CACHE_S3_BUCKET_NAME \
-e CUSTOM_TEST_ARTIFACT_BUILD_DIR \
@@ -229,21 +195,14 @@ jobs:
"${DOCKER_IMAGE}"
)
docker exec -t "${container_name}" sh -c 'sudo chown -R jenkins . && .jenkins/pytorch/build.sh'
- name: Parse ref
id: parse-ref
run: .github/scripts/parse_ref.py
- name: Display and upload binary build size statistics (Click Me)
# temporary hack: set CIRCLE_* vars, until we update
# tools/stats/print_test_stats.py to natively support GitHub Actions
env:
AWS_DEFAULT_REGION: us-east-1
IS_GHA: 1
SCRIBE_GRAPHQL_ACCESS_TOKEN: ${{ secrets.SCRIBE_GRAPHQL_ACCESS_TOKEN }}
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
CIRCLE_TAG: ${{ steps.parse-ref.outputs.tag }}
CIRCLE_WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
BRANCH: ${{ steps.parse-ref.outputs.branch }}
TAG: ${{ steps.parse-ref.outputs.tag }}
WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
run: |
COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0)
export COMMIT_TIME
@@ -259,8 +218,6 @@ jobs:
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Chown workspace
if: always()
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .


@@ -5,11 +5,12 @@ name: linux-bionic-cuda10.2-py3.9-gcc7
on:
pull_request:
types: [unassigned]
types: [opened, synchronize, reopened, unassigned]
push:
branches:
- master
- release/*
- fbsync
workflow_dispatch:
env:
@@ -19,6 +20,7 @@ env:
XLA_CLANG_CACHE_S3_BUCKET_NAME: ossci-compiler-clang-cache-circleci-xla
TORCH_CUDA_ARCH_LIST: 5.2
IN_CI: 1
IS_GHA: 1
# This is used for the phase of adding wheel tests only, will be removed once completed
IN_WHEEL_TEST: 1
# Used for custom_operator, jit_hooks, custom_backend, see .jenkins/pytorch/build.sh
@@ -26,31 +28,34 @@ env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
AWS_DEFAULT_REGION: us-east-1
PR_NUMBER: ${{ github.event.pull_request.number }}
SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
PYTORCH_RETRY_TEST_CASES: 1
concurrency:
group: linux-bionic-cuda10.2-py3.9-gcc7-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }}
cancel-in-progress: true
jobs:
ciflow_should_run:
runs-on: ubuntu-18.04
if: ${{ (github.repository_owner == 'pytorch') && ((github.event_name != 'pull_request') || (github.event.assignee.login != 'pytorchbot') || (github.event.action != 'unassigned') || (contains(github.event.pull_request.labels.*.name, 'ciflow/all') || contains(github.event.pull_request.labels.*.name, 'ciflow/cuda') || contains(github.event.pull_request.labels.*.name, 'ciflow/linux') || contains(github.event.pull_request.labels.*.name, 'ciflow/slow'))) }}
env:
LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
steps:
- name: noop
run: echo running ciflow_should_run
- name: print labels
run: echo "${LABELS}"
calculate-docker-image:
build:
runs-on: linux.2xlarge
needs: [ciflow_should_run]
timeout-minutes: 240
if: ${{ (github.repository_owner == 'pytorch') && (
(github.event_name == 'push') ||
(github.event_name == 'schedule') ||
(contains(github.event.pull_request.labels.*.name, 'ciflow/all') || contains(github.event.pull_request.labels.*.name, 'ciflow/cuda') || contains(github.event.pull_request.labels.*.name, 'ciflow/linux') || contains(github.event.pull_request.labels.*.name, 'ciflow/slow') || contains(github.event.pull_request.labels.*.name, 'ciflow/trunk')) ||
(false))
}}
env:
DOCKER_BUILDKIT: 1
timeout-minutes: 90
JOB_BASE_NAME: linux-bionic-cuda10.2-py3.9-gcc7-build
IS_PROBOT_TRIGGER_EVENT: ${{ (github.event.action == 'unassigned') && (github.event.assignee.login == 'pytorchbot') }}
LABEL_CONDITIONS: ${{ contains(github.event.pull_request.labels.*.name, 'ciflow/all') || contains(github.event.pull_request.labels.*.name, 'ciflow/cuda') || contains(github.event.pull_request.labels.*.name, 'ciflow/linux') || contains(github.event.pull_request.labels.*.name, 'ciflow/slow') || contains(github.event.pull_request.labels.*.name, 'ciflow/trunk') }}
outputs:
docker_image: ${{ steps.calculate-tag.outputs.docker_image }}
steps:
- name: print labels
run: echo "${PR_LABELS}"
- name: Display EC2 information
shell: bash
run: |
@@ -69,18 +74,16 @@ jobs:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\")
aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \
--password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com"
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
# Ensure the working directory gets chowned back to the current user
retry docker pull "${ALPINE_IMAGE}"
# Ensure the working directory gets chowned back to the current user
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
@@ -98,17 +101,22 @@ jobs:
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: false
submodules: recursive
- name: Clean PyTorch checkout
run: |
# Remove any artifacts from the previous checkouts
git clean -fxd
- name: Calculate docker image tag
id: calculate-tag
run: |
DOCKER_TAG=$(git rev-parse HEAD:.circleci/docker)
echo "DOCKER_TAG=${DOCKER_TAG}" >> "${GITHUB_ENV}"
echo "DOCKER_IMAGE=${DOCKER_IMAGE_BASE}:${DOCKER_TAG}" >> "${GITHUB_ENV}"
echo "::set-output name=docker_tag::${DOCKER_TAG}"
echo "::set-output name=docker_image::${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"
- name: Check if image should be built
id: check
env:
DOCKER_TAG: ${{ steps.calculate-tag.outputs.docker_tag }}
BASE_REVISION: ${{ github.event.pull_request.base.sha || github.sha }}
run: |
set -x
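Since `calculate-docker-image` no longer exists as its own job, the image name now travels as a job output of `build` (`docker_image: ${{ steps.calculate-tag.outputs.docker_image }}`) and is read downstream as `needs.build.outputs.docker_image`. A minimal producer/consumer sketch of that step-output-to-job-output plumbing, with a placeholder image name:

    jobs:
      build:
        runs-on: ubuntu-18.04
        outputs:
          docker_image: ${{ steps.calculate-tag.outputs.docker_image }}
        steps:
          - id: calculate-tag
            run: echo "::set-output name=docker_image::example.com/pytorch:abc123"
      test:
        needs: build
        runs-on: ubuntu-18.04
        steps:
          - run: echo "testing against ${{ needs.build.outputs.docker_image }}"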
@@ -140,77 +148,35 @@ jobs:
- name: Build and push docker image
if: ${{ steps.check.outputs.rebuild }}
env:
DOCKER_TAG: ${{ steps.calculate-tag.outputs.docker_tag }}
DOCKER_SKIP_S3_UPLOAD: 1
working-directory: .circleci/docker
run: |
export IMAGE_NAME=${DOCKER_IMAGE_BASE#308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/}
cd .circleci/docker && ./build_docker.sh
build:
runs-on: linux.2xlarge
needs: [calculate-docker-image, ciflow_should_run]
env:
DOCKER_IMAGE: ${{ needs.calculate-docker-image.outputs.docker_image }}
JOB_BASE_NAME: linux-bionic-cuda10.2-py3.9-gcc7-build
steps:
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Log in to ECR
env:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
./build_docker.sh
- name: Pull Docker image
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
# Ensure the working directory gets chowned back to the current user
retry docker pull "${ALPINE_IMAGE}"
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
rm -rf "${GITHUB_WORKSPACE:?}/*"
rm -f ~/.ssh/authorized_keys
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: recursive
- name: Pull docker image
run: |
docker pull "${DOCKER_IMAGE}"
retry docker pull "${DOCKER_IMAGE}"
- name: Parse ref
id: parse-ref
run: .github/scripts/parse_ref.py
- name: Build
env:
BRANCH: ${{ steps.parse-ref.outputs.branch }}
run: |
# detached container should get cleaned up by teardown_ec2_linux
container_name=$(docker run \
-e BUILD_ENVIRONMENT \
-e JOB_BASE_NAME \
-e MAX_JOBS="$(nproc --ignore=2)" \
-e AWS_DEFAULT_REGION \
-e IS_GHA \
-e PR_NUMBER \
-e SHA1 \
-e BRANCH \
-e GITHUB_RUN_ID \
-e SCCACHE_BUCKET \
-e XLA_CLANG_CACHE_S3_BUCKET_NAME \
-e CUSTOM_TEST_ARTIFACT_BUILD_DIR \
@@ -229,21 +195,14 @@ jobs:
"${DOCKER_IMAGE}"
)
docker exec -t "${container_name}" sh -c 'sudo chown -R jenkins . && .jenkins/pytorch/build.sh'
- name: Parse ref
id: parse-ref
run: .github/scripts/parse_ref.py
- name: Display and upload binary build size statistics (Click Me)
# temporary hack: set CIRCLE_* vars, until we update
# tools/stats/print_test_stats.py to natively support GitHub Actions
env:
AWS_DEFAULT_REGION: us-east-1
IS_GHA: 1
SCRIBE_GRAPHQL_ACCESS_TOKEN: ${{ secrets.SCRIBE_GRAPHQL_ACCESS_TOKEN }}
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
CIRCLE_TAG: ${{ steps.parse-ref.outputs.tag }}
CIRCLE_WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
BRANCH: ${{ steps.parse-ref.outputs.branch }}
TAG: ${{ steps.parse-ref.outputs.tag }}
WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
run: |
COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0)
export COMMIT_TIME
@@ -270,8 +229,6 @@ jobs:
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Chown workspace
if: always()
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
@@ -294,22 +251,25 @@ jobs:
docker system prune -af
generate-test-matrix:
needs: build
runs-on: ubuntu-18.04
needs: [ciflow_should_run]
timeout-minutes: 240
env:
TEST_RUNNER_TYPE: linux.8xlarge.nvidia.gpu
TEST_RUNNER_TYPE: linux.4xlarge.nvidia.gpu
ENABLE_DISTRIBUTED_TEST: 1
ENABLE_JIT_LEGACY_TEST: ''
ENABLE_MULTIGPU_TEST: ''
ENABLE_NOGPU_NO_AVX_TEST: ''
ENABLE_NOGPU_NO_AVX2_TEST: ''
ENABLE_SLOW_TEST: ''
ENABLE_JIT_LEGACY_TEST: 1
ENABLE_FX2TRT_TEST: ''
ENABLE_MULTIGPU_TEST: 1
ENABLE_NOGPU_NO_AVX_TEST: 1
ENABLE_NOGPU_NO_AVX2_TEST: 1
ENABLE_SLOW_TEST: 1
ENABLE_DOCS_TEST: ''
ENABLE_BACKWARDS_COMPAT_TEST: ''
ENABLE_XLA_TEST: ''
ENABLE_NOARCH_TEST: ''
NUM_TEST_SHARDS: 2
MULTIGPU_RUNNER_TYPE: linux.16xlarge.nvidia.gpu
DISTRIBUTED_GPU_RUNNER_TYPE: linux.8xlarge.nvidia.gpu
NOGPU_RUNNER_TYPE: linux.2xlarge
PR_BODY: ${{ github.event.pull_request.body }}
outputs:
@@ -328,19 +288,19 @@ jobs:
run: .github/scripts/generate_pytorch_test_matrix.py
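The `generate-test-matrix` job exists so the shard/runner combinations can be computed at runtime from the `ENABLE_*` flags and `NUM_TEST_SHARDS`, then fanned out with `fromJson`, as the `test` job below shows. A minimal sketch of the generated-matrix pattern (the JSON shape here is illustrative, not the script's actual output):

    generate-test-matrix:
      runs-on: ubuntu-18.04
      outputs:
        matrix: ${{ steps.set-matrix.outputs.matrix }}
      steps:
        - id: set-matrix
          run: |
            # A real generator derives this JSON from the ENABLE_* env flags
            echo '::set-output name=matrix::{"include":[{"config":"default","shard":1,"num_shards":2,"runner":"linux.4xlarge.nvidia.gpu"},{"config":"default","shard":2,"num_shards":2,"runner":"linux.4xlarge.nvidia.gpu"}]}'
    test:
      needs: generate-test-matrix
      strategy:
        fail-fast: false
        matrix: ${{ fromJson(needs.generate-test-matrix.outputs.matrix) }}
      runs-on: ${{ matrix.runner }}
      steps:
        - run: echo "config=${{ matrix.config }} shard=${{ matrix.shard }}/${{ matrix.num_shards }}"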
test:
needs: [calculate-docker-image, build, generate-test-matrix, ciflow_should_run]
needs: [build, generate-test-matrix]
strategy:
matrix: ${{ fromJson(needs.generate-test-matrix.outputs.matrix) }}
fail-fast: false
runs-on: ${{ matrix.runner }}
timeout-minutes: 240
env:
DOCKER_IMAGE: ${{ needs.calculate-docker-image.outputs.docker_image }}
DOCKER_IMAGE: ${{ needs.build.outputs.docker_image }}
JOB_BASE_NAME: linux-bionic-cuda10.2-py3.9-gcc7-test
TEST_CONFIG: ${{ matrix.config }}
SHARD_NUMBER: ${{ matrix.shard }}
NUM_TEST_SHARDS: ${{ matrix.num_shards }}
PYTORCH_IGNORE_DISABLED_ISSUES: ${{ needs.generate-test-matrix.outputs.ignore-disabled-issues }}
CONTINUE_THROUGH_ERROR: ${{ github.repository == 'pytorch/pytorch' && (github.event_name == 'push' || github.event_name == 'schedule') }}
steps:
- name: Display EC2 information
shell: bash
@@ -360,18 +320,16 @@ jobs:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\")
aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \
--password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com"
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
# Ensure the working directory gets chowned back to the current user
retry docker pull "${ALPINE_IMAGE}"
# Ensure the working directory gets chowned back to the current user
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
@@ -390,9 +348,16 @@ jobs:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: recursive
- name: Pull docker image
- name: Clean PyTorch checkout
run: |
docker pull "${DOCKER_IMAGE}"
# Remove any artifacts from the previous checkouts
git clean -fxd
- name: Pull Docker image
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
retry docker pull "${DOCKER_IMAGE}"
- name: Install nvidia driver, nvidia-docker runtime, set GPU_FLAG
if: ${{ contains(env.BUILD_ENVIRONMENT, 'cuda') && !contains(matrix.config, 'nogpu') }}
run: |
@@ -426,24 +391,31 @@ jobs:
- name: Test
env:
PR_NUMBER: ${{ github.event.pull_request.number }}
IS_GHA: 1
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
AWS_DEFAULT_REGION: us-east-1
BRANCH: ${{ steps.parse-ref.outputs.branch }}
# Time out the test phase after 240 minutes
timeout-minutes: 240
run: |
set -x
if [[ $TEST_CONFIG == 'multigpu' ]]; then
TEST_COMMAND=.jenkins/pytorch/multigpu-test.sh
elif [[ $BUILD_ENVIRONMENT == *onnx* ]]; then
TEST_COMMAND=.jenkins/caffe2/test.sh
else
TEST_COMMAND=.jenkins/pytorch/test.sh
fi
if [[ $NUM_TEST_SHARDS -ne 2 ]]; then
export SHARD_NUMBER=0
PROXY_ENV=
# NOTE: XLA multiprocessing tests appear to have issues with squid proxy, going to disable for now
# We should investigate whether or not there's a list of hostnames we can add to no_proxy to
# make it so that we shouldn't have to fully disable squid for XLA tests
if [[ $TEST_CONFIG != 'xla' ]]; then
# shellcheck disable=SC2089
PROXY_ENV="-e http_proxy=http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128 -e https_proxy=http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128 -e no_proxy=localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock"
fi
# detached container should get cleaned up by teardown_ec2_linux
# TODO: Stop building test binaries as part of the build phase
# Used for GPU_FLAG since that doesn't play nice
# shellcheck disable=SC2086
# shellcheck disable=SC2086,SC2090
container_name=$(docker run \
${GPU_FLAG:-} \
-e BUILD_ENVIRONMENT \
@@ -452,9 +424,8 @@ jobs:
-e GITHUB_ACTIONS \
-e IN_CI \
-e IS_GHA \
-e CIRCLE_BRANCH \
-e CIRCLE_SHA1 \
-e CIRCLE_PR_NUMBER \
-e BRANCH \
-e SHA1 \
-e AWS_DEFAULT_REGION \
-e IN_WHEEL_TEST \
-e SHARD_NUMBER \
@@ -462,15 +433,17 @@ jobs:
-e TEST_CONFIG \
-e NUM_TEST_SHARDS \
-e PYTORCH_IGNORE_DISABLED_ISSUES \
-e PYTORCH_RETRY_TEST_CASES \
-e PR_LABELS \
-e CONTINUE_THROUGH_ERROR \
-e MAX_JOBS="$(nproc --ignore=2)" \
-e SCCACHE_BUCKET \
-e http_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e https_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e no_proxy="localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" \
-e XLA_CLANG_CACHE_S3_BUCKET_NAME \
${PROXY_ENV} \
--env-file="/tmp/github_env_${GITHUB_RUN_ID}" \
--ulimit stack=10485760:83886080 \
--security-opt seccomp=unconfined \
--cap-add=SYS_PTRACE \
--ipc=host \
--shm-size="${SHM_SIZE}" \
--tty \
--detach \
@@ -499,6 +472,22 @@ jobs:
PYTHONIOENCODING: "utf-8"
run: |
python3 tools/render_junit.py test/
- name: Zip JSONs for upload
if: always()
env:
FILE_SUFFIX: '${{ github.job }}-${{ matrix.config }}-${{ matrix.shard }}-${{ matrix.num_shards }}-${{ matrix.runner }}'
run: |
# Remove any previous test jsons if they exist
rm -f test-jsons-*.zip
zip -r "test-jsons-${FILE_SUFFIX}.zip" test -i '*.json'
- uses: seemethere/upload-artifact-s3@v3
name: Store Test Downloaded JSONs on S3
if: always()
with:
retention-days: 14
if-no-files-found: warn
path:
test-jsons-*.zip
- name: Zip test reports for upload
if: always()
env:
@@ -507,15 +496,6 @@ jobs:
# Remove any previous test reports if they exist
rm -f test-reports-*.zip
zip -r "test-reports-${FILE_SUFFIX}.zip" test -i '*.xml'
- uses: actions/upload-artifact@v2
name: Store Test Reports
if: always()
with:
name: test-reports-${{ matrix.config }}
retention-days: 14
if-no-files-found: error
path:
test-reports-*.zip
- uses: seemethere/upload-artifact-s3@v3
name: Store Test Reports on S3
if: always()
@@ -530,16 +510,16 @@ jobs:
# tools/stats/print_test_stats.py to natively support GitHub Actions
env:
AWS_DEFAULT_REGION: us-east-1
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
BRANCH: ${{ steps.parse-ref.outputs.branch }}
JOB_BASE_NAME: linux-bionic-cuda10.2-py3.9-gcc7-test
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
CIRCLE_TAG: ${{ steps.parse-ref.outputs.tag }}
CIRCLE_WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
PR_NUMBER: ${{ github.event.pull_request.number }}
SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
TAG: ${{ steps.parse-ref.outputs.tag }}
WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
shell: bash
run: |
python3 -m pip install -r requirements.txt
python3 -m pip install boto3==1.16.34
python3 -m pip install boto3==1.19.12
python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test
- name: Hold runner for 2 hours or until ssh sessions have drained
# Always hold for active ssh sessions
@@ -547,8 +527,6 @@ jobs:
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Chown workspace
if: always()
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
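A note on the `PROXY_ENV` change in the test step above: building the `-e` flags as one string is what forces the SC2089/SC2090 shellcheck suppressions and the unquoted `${PROXY_ENV}` expansion. A bash-array variant sidesteps both; a sketch under the assumption that the proxy URL arrives in a `PROXY_URL` variable (a placeholder, not a variable the workflow defines):

    - name: Wire up proxy flags (sketch)
      run: |
        proxy_args=()
        # XLA multiprocessing tests misbehave behind the squid proxy, so skip it there
        if [[ "${TEST_CONFIG}" != "xla" ]]; then
          # Each flag stays its own word; no SC2086/SC2089/SC2090 waivers needed
          proxy_args+=(-e "http_proxy=${PROXY_URL}" -e "https_proxy=${PROXY_URL}")
          proxy_args+=(-e "no_proxy=localhost,127.0.0.1,github.com,amazonaws.com")
        fi
        docker run --rm "${proxy_args[@]}" alpine env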


@@ -1,7 +1,7 @@
# @generated DO NOT EDIT MANUALLY
# Template is at: .github/templates/linux_ci_workflow.yml.j2
# Generation script: .github/scripts/generate_ci_workflows.py
name: linux-bionic-py3.6-clang9
name: linux-bionic-py3.7-clang9
on:
pull_request:
@@ -10,15 +10,17 @@ on:
branches:
- master
- release/*
- fbsync
workflow_dispatch:
env:
BUILD_ENVIRONMENT: linux-bionic-py3.6-clang9
DOCKER_IMAGE_BASE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-bionic-py3.6-clang9
BUILD_ENVIRONMENT: linux-bionic-py3.7-clang9
DOCKER_IMAGE_BASE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-bionic-py3.7-clang9
SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2
XLA_CLANG_CACHE_S3_BUCKET_NAME: ossci-compiler-clang-cache-circleci-xla
TORCH_CUDA_ARCH_LIST: 5.2
IN_CI: 1
IS_GHA: 1
# This is used for the phase of adding wheel tests only, will be removed once completed
IN_WHEEL_TEST: 1
# Used for custom_operator, jit_hooks, custom_backend, see .jenkins/pytorch/build.sh
@@ -26,31 +28,34 @@ env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
AWS_DEFAULT_REGION: us-east-1
PR_NUMBER: ${{ github.event.pull_request.number }}
SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
PYTORCH_RETRY_TEST_CASES: 1
concurrency:
group: linux-bionic-py3.6-clang9-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }}
group: linux-bionic-py3.7-clang9-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }}
cancel-in-progress: true
jobs:
ciflow_should_run:
runs-on: ubuntu-18.04
if: ${{ (github.repository == 'pytorch/pytorch') && ((github.event_name != 'pull_request') || (github.event.assignee.login != 'pytorchbot') || (github.event.action != 'unassigned') || (contains(github.event.pull_request.labels.*.name, 'ciflow/all') || contains(github.event.pull_request.labels.*.name, 'ciflow/cpu') || contains(github.event.pull_request.labels.*.name, 'ciflow/default') || contains(github.event.pull_request.labels.*.name, 'ciflow/linux') || contains(github.event.pull_request.labels.*.name, 'ciflow/noarch') || contains(github.event.pull_request.labels.*.name, 'ciflow/xla'))) }}
env:
LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
steps:
- name: noop
run: echo running ciflow_should_run
- name: print labels
run: echo "${LABELS}"
calculate-docker-image:
build:
runs-on: linux.2xlarge
needs: [ciflow_should_run]
timeout-minutes: 240
if: ${{ (github.repository == 'pytorch/pytorch') && (
(github.event_name == 'push') ||
(github.event_name == 'schedule') ||
(contains(github.event.pull_request.labels.*.name, 'ciflow/all') || contains(github.event.pull_request.labels.*.name, 'ciflow/cpu') || contains(github.event.pull_request.labels.*.name, 'ciflow/default') || contains(github.event.pull_request.labels.*.name, 'ciflow/linux') || contains(github.event.pull_request.labels.*.name, 'ciflow/noarch') || contains(github.event.pull_request.labels.*.name, 'ciflow/trunk')) ||
((github.event_name == 'pull_request' && github.event.action != 'unassigned') && !contains(join(github.event.pull_request.labels.*.name), 'ciflow/')))
}}
env:
DOCKER_BUILDKIT: 1
timeout-minutes: 90
JOB_BASE_NAME: linux-bionic-py3.7-clang9-build
IS_PROBOT_TRIGGER_EVENT: ${{ (github.event.action == 'unassigned') && (github.event.assignee.login == 'pytorchbot') }}
LABEL_CONDITIONS: ${{ contains(github.event.pull_request.labels.*.name, 'ciflow/all') || contains(github.event.pull_request.labels.*.name, 'ciflow/cpu') || contains(github.event.pull_request.labels.*.name, 'ciflow/default') || contains(github.event.pull_request.labels.*.name, 'ciflow/linux') || contains(github.event.pull_request.labels.*.name, 'ciflow/noarch') || contains(github.event.pull_request.labels.*.name, 'ciflow/trunk') }}
outputs:
docker_image: ${{ steps.calculate-tag.outputs.docker_image }}
steps:
- name: print labels
run: echo "${PR_LABELS}"
- name: Display EC2 information
shell: bash
run: |
@@ -69,18 +74,16 @@ jobs:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\")
aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \
--password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com"
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
# Ensure the working directory gets chowned back to the current user
retry docker pull "${ALPINE_IMAGE}"
# Ensure the working directory gets chowned back to the current user
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
@@ -98,17 +101,22 @@ jobs:
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: false
submodules: recursive
- name: Clean PyTorch checkout
run: |
# Remove any artifacts from the previous checkouts
git clean -fxd
- name: Calculate docker image tag
id: calculate-tag
run: |
DOCKER_TAG=$(git rev-parse HEAD:.circleci/docker)
echo "DOCKER_TAG=${DOCKER_TAG}" >> "${GITHUB_ENV}"
echo "DOCKER_IMAGE=${DOCKER_IMAGE_BASE}:${DOCKER_TAG}" >> "${GITHUB_ENV}"
echo "::set-output name=docker_tag::${DOCKER_TAG}"
echo "::set-output name=docker_image::${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"
- name: Check if image should be built
id: check
env:
DOCKER_TAG: ${{ steps.calculate-tag.outputs.docker_tag }}
BASE_REVISION: ${{ github.event.pull_request.base.sha || github.sha }}
run: |
set -x
@@ -140,77 +148,35 @@ jobs:
- name: Build and push docker image
if: ${{ steps.check.outputs.rebuild }}
env:
DOCKER_TAG: ${{ steps.calculate-tag.outputs.docker_tag }}
DOCKER_SKIP_S3_UPLOAD: 1
working-directory: .circleci/docker
run: |
export IMAGE_NAME=${DOCKER_IMAGE_BASE#308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/}
cd .circleci/docker && ./build_docker.sh
build:
runs-on: linux.2xlarge
needs: [calculate-docker-image, ciflow_should_run]
env:
DOCKER_IMAGE: ${{ needs.calculate-docker-image.outputs.docker_image }}
JOB_BASE_NAME: linux-bionic-py3.6-clang9-build
steps:
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Log in to ECR
env:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
./build_docker.sh
- name: Pull Docker image
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
# Ensure the working directory gets chowned back to the current user
retry docker pull "${ALPINE_IMAGE}"
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
rm -rf "${GITHUB_WORKSPACE:?}/*"
rm -f ~/.ssh/authorized_keys
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: recursive
- name: Pull docker image
run: |
docker pull "${DOCKER_IMAGE}"
retry docker pull "${DOCKER_IMAGE}"
- name: Parse ref
id: parse-ref
run: .github/scripts/parse_ref.py
- name: Build
env:
BRANCH: ${{ steps.parse-ref.outputs.branch }}
run: |
# detached container should get cleaned up by teardown_ec2_linux
container_name=$(docker run \
-e BUILD_ENVIRONMENT \
-e JOB_BASE_NAME \
-e MAX_JOBS="$(nproc --ignore=2)" \
-e AWS_DEFAULT_REGION \
-e IS_GHA \
-e PR_NUMBER \
-e SHA1 \
-e BRANCH \
-e GITHUB_RUN_ID \
-e SCCACHE_BUCKET \
-e XLA_CLANG_CACHE_S3_BUCKET_NAME \
-e CUSTOM_TEST_ARTIFACT_BUILD_DIR \
@@ -229,21 +195,14 @@ jobs:
"${DOCKER_IMAGE}"
)
docker exec -t "${container_name}" sh -c 'sudo chown -R jenkins . && .jenkins/pytorch/build.sh'
- name: Parse ref
id: parse-ref
run: .github/scripts/parse_ref.py
- name: Display and upload binary build size statistics (Click Me)
# temporary hack: set CIRCLE_* vars, until we update
# tools/stats/print_test_stats.py to natively support GitHub Actions
env:
AWS_DEFAULT_REGION: us-east-1
IS_GHA: 1
SCRIBE_GRAPHQL_ACCESS_TOKEN: ${{ secrets.SCRIBE_GRAPHQL_ACCESS_TOKEN }}
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
CIRCLE_TAG: ${{ steps.parse-ref.outputs.tag }}
CIRCLE_WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
BRANCH: ${{ steps.parse-ref.outputs.branch }}
TAG: ${{ steps.parse-ref.outputs.tag }}
WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
run: |
COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0)
export COMMIT_TIME
@@ -270,8 +229,6 @@ jobs:
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Chown workspace
if: always()
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
@@ -294,12 +251,14 @@ jobs:
docker system prune -af
generate-test-matrix:
needs: build
runs-on: ubuntu-18.04
needs: [ciflow_should_run]
timeout-minutes: 240
env:
TEST_RUNNER_TYPE: linux.2xlarge
ENABLE_DISTRIBUTED_TEST: ''
ENABLE_JIT_LEGACY_TEST: ''
ENABLE_FX2TRT_TEST: ''
ENABLE_MULTIGPU_TEST: ''
ENABLE_NOGPU_NO_AVX_TEST: ''
ENABLE_NOGPU_NO_AVX2_TEST: ''
@@ -310,6 +269,7 @@ jobs:
ENABLE_NOARCH_TEST: 1
NUM_TEST_SHARDS: 2
MULTIGPU_RUNNER_TYPE: linux.16xlarge.nvidia.gpu
DISTRIBUTED_GPU_RUNNER_TYPE: linux.8xlarge.nvidia.gpu
NOGPU_RUNNER_TYPE: linux.2xlarge
PR_BODY: ${{ github.event.pull_request.body }}
outputs:
@@ -328,19 +288,19 @@ jobs:
run: .github/scripts/generate_pytorch_test_matrix.py
test:
needs: [calculate-docker-image, build, generate-test-matrix, ciflow_should_run]
needs: [build, generate-test-matrix]
strategy:
matrix: ${{ fromJson(needs.generate-test-matrix.outputs.matrix) }}
fail-fast: false
runs-on: ${{ matrix.runner }}
timeout-minutes: 240
env:
DOCKER_IMAGE: ${{ needs.calculate-docker-image.outputs.docker_image }}
JOB_BASE_NAME: linux-bionic-py3.6-clang9-test
DOCKER_IMAGE: ${{ needs.build.outputs.docker_image }}
JOB_BASE_NAME: linux-bionic-py3.7-clang9-test
TEST_CONFIG: ${{ matrix.config }}
SHARD_NUMBER: ${{ matrix.shard }}
NUM_TEST_SHARDS: ${{ matrix.num_shards }}
PYTORCH_IGNORE_DISABLED_ISSUES: ${{ needs.generate-test-matrix.outputs.ignore-disabled-issues }}
CONTINUE_THROUGH_ERROR: ${{ github.repository == 'pytorch/pytorch' && (github.event_name == 'push' || github.event_name == 'schedule') }}
steps:
- name: Display EC2 information
shell: bash
@@ -360,18 +320,16 @@ jobs:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\")
aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \
--password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com"
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
# Ensure the working directory gets chowned back to the current user
retry docker pull "${ALPINE_IMAGE}"
# Ensure the working directory gets chowned back to the current user
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
@@ -390,9 +348,16 @@ jobs:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: recursive
- name: Pull docker image
- name: Clean PyTorch checkout
run: |
docker pull "${DOCKER_IMAGE}"
# Remove any artifacts from the previous checkouts
git clean -fxd
- name: Pull Docker image
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
retry docker pull "${DOCKER_IMAGE}"
- name: Install nvidia driver, nvidia-docker runtime, set GPU_FLAG
if: ${{ contains(env.BUILD_ENVIRONMENT, 'cuda') && !contains(matrix.config, 'nogpu') }}
run: |
@@ -426,24 +391,31 @@ jobs:
- name: Test
env:
PR_NUMBER: ${{ github.event.pull_request.number }}
IS_GHA: 1
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
AWS_DEFAULT_REGION: us-east-1
BRANCH: ${{ steps.parse-ref.outputs.branch }}
# Time out the test phase after 240 minutes
timeout-minutes: 240
run: |
set -x
if [[ $TEST_CONFIG == 'multigpu' ]]; then
TEST_COMMAND=.jenkins/pytorch/multigpu-test.sh
elif [[ $BUILD_ENVIRONMENT == *onnx* ]]; then
TEST_COMMAND=.jenkins/caffe2/test.sh
else
TEST_COMMAND=.jenkins/pytorch/test.sh
fi
if [[ $NUM_TEST_SHARDS -ne 2 ]]; then
export SHARD_NUMBER=0
PROXY_ENV=
# NOTE: XLA multiprocessing tests appear to have issues with squid proxy, going to disable for now
# We should investigate whether or not there's a list of hostnames we can add to no_proxy to
# make it so that we shouldn't have to fully disable squid for XLA tests
if [[ $TEST_CONFIG != 'xla' ]]; then
# shellcheck disable=SC2089
PROXY_ENV="-e http_proxy=http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128 -e https_proxy=http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128 -e no_proxy=localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock"
fi
# detached container should get cleaned up by teardown_ec2_linux
# TODO: Stop building test binaries as part of the build phase
# Used for GPU_FLAG since that doesn't play nice
# shellcheck disable=SC2086
# shellcheck disable=SC2086,SC2090
container_name=$(docker run \
${GPU_FLAG:-} \
-e BUILD_ENVIRONMENT \
@@ -452,9 +424,8 @@ jobs:
-e GITHUB_ACTIONS \
-e IN_CI \
-e IS_GHA \
-e CIRCLE_BRANCH \
-e CIRCLE_SHA1 \
-e CIRCLE_PR_NUMBER \
-e BRANCH \
-e SHA1 \
-e AWS_DEFAULT_REGION \
-e IN_WHEEL_TEST \
-e SHARD_NUMBER \
@@ -462,15 +433,17 @@ jobs:
-e TEST_CONFIG \
-e NUM_TEST_SHARDS \
-e PYTORCH_IGNORE_DISABLED_ISSUES \
-e PYTORCH_RETRY_TEST_CASES \
-e PR_LABELS \
-e CONTINUE_THROUGH_ERROR \
-e MAX_JOBS="$(nproc --ignore=2)" \
-e SCCACHE_BUCKET \
-e http_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e https_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e no_proxy="localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" \
-e XLA_CLANG_CACHE_S3_BUCKET_NAME \
${PROXY_ENV} \
--env-file="/tmp/github_env_${GITHUB_RUN_ID}" \
--ulimit stack=10485760:83886080 \
--security-opt seccomp=unconfined \
--cap-add=SYS_PTRACE \
--ipc=host \
--shm-size="${SHM_SIZE}" \
--tty \
--detach \
@@ -499,6 +472,22 @@ jobs:
PYTHONIOENCODING: "utf-8"
run: |
python3 tools/render_junit.py test/
- name: Zip JSONs for upload
if: always()
env:
FILE_SUFFIX: '${{ github.job }}-${{ matrix.config }}-${{ matrix.shard }}-${{ matrix.num_shards }}-${{ matrix.runner }}'
run: |
# Remove any previous test jsons if they exist
rm -f test-jsons-*.zip
zip -r "test-jsons-${FILE_SUFFIX}.zip" test -i '*.json'
- uses: seemethere/upload-artifact-s3@v3
name: Store Test Downloaded JSONs on S3
if: always()
with:
retention-days: 14
if-no-files-found: warn
path:
test-jsons-*.zip
- name: Zip test reports for upload
if: always()
env:
@@ -507,15 +496,6 @@ jobs:
# Remove any previous test reports if they exist
rm -f test-reports-*.zip
zip -r "test-reports-${FILE_SUFFIX}.zip" test -i '*.xml'
- uses: actions/upload-artifact@v2
name: Store Test Reports
if: always()
with:
name: test-reports-${{ matrix.config }}
retention-days: 14
if-no-files-found: error
path:
test-reports-*.zip
- uses: seemethere/upload-artifact-s3@v3
name: Store Test Reports on S3
if: always()
@@ -530,16 +510,16 @@ jobs:
# tools/stats/print_test_stats.py to natively support GitHub Actions
env:
AWS_DEFAULT_REGION: us-east-1
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
JOB_BASE_NAME: linux-bionic-py3.6-clang9-test
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
CIRCLE_TAG: ${{ steps.parse-ref.outputs.tag }}
CIRCLE_WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
BRANCH: ${{ steps.parse-ref.outputs.branch }}
JOB_BASE_NAME: linux-bionic-py3.7-clang9-test
PR_NUMBER: ${{ github.event.pull_request.number }}
SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
TAG: ${{ steps.parse-ref.outputs.tag }}
WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
shell: bash
run: |
python3 -m pip install -r requirements.txt
python3 -m pip install boto3==1.16.34
python3 -m pip install boto3==1.19.12
python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test
- name: Hold runner for 2 hours or until ssh sessions have drained
# Always hold for active ssh sessions
@@ -547,8 +527,6 @@ jobs:
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Chown workspace
if: always()
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
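Across these files the `actions/upload-artifact@v2` copy of the test reports is dropped and only the S3 uploader remains, now joined by a parallel upload of the `*.json` test stats. The surviving pattern, distilled from the steps above (the inputs shown are the ones the workflow itself uses):

    - name: Zip test reports for upload
      if: always()
      env:
        FILE_SUFFIX: '${{ github.job }}-${{ matrix.config }}-${{ matrix.shard }}-${{ matrix.num_shards }}-${{ matrix.runner }}'
      run: |
        rm -f test-reports-*.zip
        zip -r "test-reports-${FILE_SUFFIX}.zip" test -i '*.xml'
    - uses: seemethere/upload-artifact-s3@v3
      name: Store Test Reports on S3
      if: always()
      with:
        retention-days: 14
        if-no-files-found: warn
        path: test-reports-*.zip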

.github/workflows/generated-linux-docs-push.yml

@@ -0,0 +1,384 @@
# @generated DO NOT EDIT MANUALLY
# Template is at: .github/templates/linux_ci_workflow.yml.j2
# Generation script: .github/scripts/generate_ci_workflows.py
name: linux-docs-push
on:
pull_request:
types: [opened, synchronize, reopened, unassigned]
schedule:
- cron: 0 0 * * *
workflow_dispatch:
env:
BUILD_ENVIRONMENT: linux-docs-push
DOCKER_IMAGE_BASE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3.7-gcc5.4
SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2
XLA_CLANG_CACHE_S3_BUCKET_NAME: ossci-compiler-clang-cache-circleci-xla
TORCH_CUDA_ARCH_LIST: 5.2
IN_CI: 1
IS_GHA: 1
# This is used for the phase of adding wheel tests only, will be removed once completed
IN_WHEEL_TEST: 1
# Used for custom_operator, jit_hooks, custom_backend, see .jenkins/pytorch/build.sh
CUSTOM_TEST_ARTIFACT_BUILD_DIR: build/custom_test_artifacts
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
AWS_DEFAULT_REGION: us-east-1
PR_NUMBER: ${{ github.event.pull_request.number }}
SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
PYTORCH_RETRY_TEST_CASES: 1
concurrency:
group: linux-docs-push-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }}
cancel-in-progress: true
jobs:
build:
runs-on: linux.2xlarge
timeout-minutes: 240
if: ${{ (github.repository == 'pytorch/pytorch') && (
(github.event_name == 'push') ||
(github.event_name == 'schedule') ||
(contains(github.event.pull_request.labels.*.name, 'ciflow/all') || contains(github.event.pull_request.labels.*.name, 'ciflow/cpu') || contains(github.event.pull_request.labels.*.name, 'ciflow/linux') || contains(github.event.pull_request.labels.*.name, 'ciflow/scheduled')) ||
(false))
}}
env:
JOB_BASE_NAME: linux-docs-push-build
IS_PROBOT_TRIGGER_EVENT: ${{ (github.event.action == 'unassigned') && (github.event.assignee.login == 'pytorchbot') }}
LABEL_CONDITIONS: ${{ contains(github.event.pull_request.labels.*.name, 'ciflow/all') || contains(github.event.pull_request.labels.*.name, 'ciflow/cpu') || contains(github.event.pull_request.labels.*.name, 'ciflow/linux') || contains(github.event.pull_request.labels.*.name, 'ciflow/scheduled') }}
outputs:
docker_image: ${{ steps.calculate-tag.outputs.docker_image }}
steps:
- name: print labels
run: echo "${PR_LABELS}"
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Log in to ECR
env:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\")
aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \
--password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com"
- name: Chown workspace
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
retry docker pull "${ALPINE_IMAGE}"
# Ensure the working directory gets chowned back to the current user
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
rm -rf "${GITHUB_WORKSPACE:?}/*"
rm -f ~/.ssh/authorized_keys
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: recursive
- name: Clean PyTorch checkout
run: |
# Remove any artifacts from the previous checkouts
git clean -fxd
- name: Calculate docker image tag
id: calculate-tag
run: |
DOCKER_TAG=$(git rev-parse HEAD:.circleci/docker)
echo "DOCKER_TAG=${DOCKER_TAG}" >> "${GITHUB_ENV}"
echo "DOCKER_IMAGE=${DOCKER_IMAGE_BASE}:${DOCKER_TAG}" >> "${GITHUB_ENV}"
echo "::set-output name=docker_tag::${DOCKER_TAG}"
echo "::set-output name=docker_image::${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"
- name: Check if image should be built
id: check
env:
BASE_REVISION: ${{ github.event.pull_request.base.sha || github.sha }}
run: |
set -x
# Check if image already exists, if it does then skip building it
if docker manifest inspect "${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"; then
exit 0
fi
if [[ "$BASE_REVISION" = "$(git rev-parse HEAD)" ]]; then
# if we're on the base branch then use the parent commit
MERGE_BASE=$(git rev-parse HEAD~)
else
# otherwise we're on a PR, so use the most recent base commit
MERGE_BASE=$(git merge-base HEAD "$BASE_REVISION")
fi
# Covers the case where a previous tag doesn't exist for the tree
# this is only really applicable on trees that don't have `.circleci/docker` at its merge base, i.e. nightly
if ! git rev-parse "$MERGE_BASE:.circleci/docker"; then
echo "Directory '.circleci/docker' not found in commit $MERGE_BASE, you should probably rebase onto a more recent commit"
exit 1
fi
PREVIOUS_DOCKER_TAG=$(git rev-parse "$MERGE_BASE:.circleci/docker")
# If no image exists but the hash is the same as the previous hash then we should error out here
if [[ "${PREVIOUS_DOCKER_TAG}" = "${DOCKER_TAG}" ]]; then
echo "ERROR: Something has gone wrong and the previous image isn't available for the merge-base of your branch"
echo " contact the PyTorch team to restore the original images"
exit 1
fi
echo ::set-output name=rebuild::yes
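One subtlety in the check script above: the early `exit 0` when `docker manifest inspect` finds the tag leaves the `rebuild` output unset, and an unset step output renders as an empty string, which is falsy in the downstream `if: ${{ steps.check.outputs.rebuild }}`. The registry probe on its own, as a sketch assuming the ECR login step has already run:

    - name: Check if image should be built (probe only)
      id: check
      run: |
        # Succeeds iff the tag already exists in the registry; exiting early
        # leaves `rebuild` unset so the build-and-push step is skipped.
        if docker manifest inspect "${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"; then
          exit 0
        fi
        echo "::set-output name=rebuild::yes"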
- name: Build and push docker image
if: ${{ steps.check.outputs.rebuild }}
env:
DOCKER_SKIP_S3_UPLOAD: 1
working-directory: .circleci/docker
run: |
export IMAGE_NAME=${DOCKER_IMAGE_BASE#308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/}
./build_docker.sh
- name: Pull Docker image
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
retry docker pull "${DOCKER_IMAGE}"
- name: Parse ref
id: parse-ref
run: .github/scripts/parse_ref.py
- name: Build
env:
BRANCH: ${{ steps.parse-ref.outputs.branch }}
run: |
# detached container should get cleaned up by teardown_ec2_linux
container_name=$(docker run \
-e BUILD_ENVIRONMENT \
-e JOB_BASE_NAME \
-e MAX_JOBS="$(nproc --ignore=2)" \
-e AWS_DEFAULT_REGION \
-e IS_GHA \
-e PR_NUMBER \
-e SHA1 \
-e BRANCH \
-e GITHUB_RUN_ID \
-e SCCACHE_BUCKET \
-e XLA_CLANG_CACHE_S3_BUCKET_NAME \
-e CUSTOM_TEST_ARTIFACT_BUILD_DIR \
-e SKIP_SCCACHE_INITIALIZATION=1 \
-e TORCH_CUDA_ARCH_LIST \
-e PR_LABELS \
-e http_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e https_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e no_proxy="localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" \
--env-file="/tmp/github_env_${GITHUB_RUN_ID}" \
--security-opt seccomp=unconfined \
--cap-add=SYS_PTRACE \
--tty \
--detach \
--user jenkins \
-v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \
-w /var/lib/jenkins/workspace \
"${DOCKER_IMAGE}"
)
docker exec -t "${container_name}" sh -c 'sudo chown -R jenkins . && .jenkins/pytorch/build.sh'
- name: Display and upload binary build size statistics (Click Me)
# temporary hack: set CIRCLE_* vars, until we update
# tools/stats/print_test_stats.py to natively support GitHub Actions
env:
SCRIBE_GRAPHQL_ACCESS_TOKEN: ${{ secrets.SCRIBE_GRAPHQL_ACCESS_TOKEN }}
BRANCH: ${{ steps.parse-ref.outputs.branch }}
TAG: ${{ steps.parse-ref.outputs.tag }}
WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
run: |
COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0)
export COMMIT_TIME
pip3 install requests==2.26 boto3==1.16.34
python3 -m tools.stats.upload_binary_size_to_scuba || exit 0
- name: Chown workspace
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Archive artifacts into zip
run: |
zip -1 -r artifacts.zip dist/ build/custom_test_artifacts build/lib build/bin .pytorch-test-times.json
- uses: seemethere/upload-artifact-s3@v3
name: Store PyTorch Build Artifacts on S3
with:
name: ${{ env.BUILD_ENVIRONMENT }}
retention-days: 14
if-no-files-found: error
path:
artifacts.zip
- name: Hold runner for 2 hours or until ssh sessions have drained
# Always hold for active ssh sessions
if: always()
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Chown workspace
if: always()
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Kill containers, clean up images
if: always()
run: |
# ignore expansion of "docker ps -q" since it could be empty
# shellcheck disable=SC2046
docker stop $(docker ps -q) || true
# Prune all of the docker images
docker system prune -af
- name: Hold runner for 2 hours or until ssh sessions have drained
# Always hold for active ssh sessions
if: always()
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Clean up docker images
if: always()
run: |
# Prune all of the docker images
docker system prune -af
build-docs:
runs-on: linux.2xlarge
timeout-minutes: 240
strategy:
matrix:
docs_type: [cpp, python]
needs: [build]
env:
DOCKER_IMAGE: ${{ needs.build.outputs.docker_image }}
DOCS_TYPE: ${{ matrix.docs_type }}
WITH_PUSH: ${{ github.event_name == 'schedule' }}
steps:
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Log in to ECR
env:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\")
aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \
--password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com"
- name: Chown workspace
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
retry docker pull "${ALPINE_IMAGE}"
# Ensure the working directory gets chowned back to the current user
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
rm -rf "${GITHUB_WORKSPACE:?}/"*
rm -f ~/.ssh/authorized_keys
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: recursive
- name: Clean PyTorch checkout
run: |
# Remove any artifacts from the previous checkouts
git clean -fxd
- name: Pull Docker image
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
retry docker pull "${DOCKER_IMAGE}"
- uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b
name: Download PyTorch Build Artifacts
with:
name: ${{ env.BUILD_ENVIRONMENT }}
- name: Unzip artifacts
run: |
unzip -o artifacts.zip
- name: Generate netrc (only for docs-push)
if: ${{ github.event_name == 'schedule' }}
env:
GITHUB_PYTORCHBOT_TOKEN: ${{ secrets.GH_PYTORCHBOT_TOKEN }}
run: |
# set credentials for https pushing
echo "machine github.com" > "${RUNNER_TEMP}/.netrc"
echo "login pytorchbot" >> "${RUNNER_TEMP}/.netrc"
echo "password ${GITHUB_PYTORCHBOT_TOKEN}" >> "${RUNNER_TEMP}/.netrc"
- name: Build ${{ matrix.docs_type }} docs
run: |
set -ex
time docker pull "${DOCKER_IMAGE}" > /dev/null
echo "${GITHUB_REF}"
# TODO: Set it correctly when workflows are scheduled on tags
target="master"
# detached container should get cleaned up by teardown_ec2_linux
container_name=$(docker run \
-e BUILD_ENVIRONMENT \
-e CUSTOM_TEST_ARTIFACT_BUILD_DIR \
-e IN_CI \
-e MAX_JOBS="$(nproc --ignore=2)" \
-e SHA1="$GITHUB_SHA" \
-e DOCS_VERSION="${target}" \
-e DOCS_TYPE \
-e PR_LABELS \
-e WITH_PUSH \
--env-file="/tmp/github_env_${GITHUB_RUN_ID}" \
--security-opt seccomp=unconfined \
--cap-add=SYS_PTRACE \
--tty \
--detach \
--user jenkins \
-v "${RUNNER_TEMP}/.netrc":/var/lib/jenkins/.netrc \
-v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \
-w /var/lib/jenkins/workspace \
"${DOCKER_IMAGE}"
)
docker exec -t "${container_name}" bash -c "sudo chown -R jenkins . && pip install dist/*.whl && ./.circleci/scripts/${DOCS_TYPE}_doc_push_script.sh"
- name: Chown workspace
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- uses: seemethere/upload-artifact-s3@v3
name: Upload Python Docs Preview
if: ${{ github.event_name == 'pull_request' && matrix.docs_type == 'python' }}
with:
retention-days: 14
s3-bucket: doc-previews
if-no-files-found: error
path: pytorch.github.io/docs/master/
s3-prefix: pytorch/${{ github.event.pull_request.number }}
- uses: seemethere/upload-artifact-s3@v3
name: Upload C++ Docs Preview
if: ${{ github.event_name == 'pull_request' && matrix.docs_type == 'cpp' }}
with:
retention-days: 14
if-no-files-found: error
s3-bucket: doc-previews
path: cppdocs/
s3-prefix: pytorch/${{ github.event.pull_request.number }}/cppdocs

.github/workflows/generated-linux-docs.yml (generated, vendored, new file, 377 lines)
@@ -0,0 +1,377 @@
# @generated DO NOT EDIT MANUALLY
# Template is at: .github/templates/linux_ci_workflow.yml.j2
# Generation script: .github/scripts/generate_ci_workflows.py
name: linux-docs
on:
pull_request:
types: [opened, synchronize, reopened, unassigned]
push:
branches:
- master
- release/*
- fbsync
workflow_dispatch:
env:
BUILD_ENVIRONMENT: linux-docs
DOCKER_IMAGE_BASE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3.7-gcc5.4
SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2
XLA_CLANG_CACHE_S3_BUCKET_NAME: ossci-compiler-clang-cache-circleci-xla
TORCH_CUDA_ARCH_LIST: 5.2
IN_CI: 1
IS_GHA: 1
# This is used only during the phase of adding wheel tests; it will be removed once that is complete
IN_WHEEL_TEST: 1
# Used for custom_operator, jit_hooks, custom_backend, see .jenkins/pytorch/build.sh
CUSTOM_TEST_ARTIFACT_BUILD_DIR: build/custom_test_artifacts
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
AWS_DEFAULT_REGION: us-east-1
PR_NUMBER: ${{ github.event.pull_request.number }}
SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
PYTORCH_RETRY_TEST_CASES: 1
concurrency:
group: linux-docs-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }}
cancel-in-progress: true
jobs:
build:
runs-on: linux.2xlarge
timeout-minutes: 240
if: ${{ (github.repository == 'pytorch/pytorch') && (
(github.event_name == 'push') ||
(github.event_name == 'schedule') ||
(contains(github.event.pull_request.labels.*.name, 'ciflow/all') || contains(github.event.pull_request.labels.*.name, 'ciflow/cpu') || contains(github.event.pull_request.labels.*.name, 'ciflow/default') || contains(github.event.pull_request.labels.*.name, 'ciflow/docs') || contains(github.event.pull_request.labels.*.name, 'ciflow/linux') || contains(github.event.pull_request.labels.*.name, 'ciflow/trunk')) ||
((github.event_name == 'pull_request' && github.event.action != 'unassigned') && !contains(join(github.event.pull_request.labels.*.name), 'ciflow/')))
}}
env:
JOB_BASE_NAME: linux-docs-build
IS_PROBOT_TRIGGER_EVENT: ${{ (github.event.action == 'unassigned') && (github.event.assignee.login == 'pytorchbot') }}
LABEL_CONDITIONS: ${{ contains(github.event.pull_request.labels.*.name, 'ciflow/all') || contains(github.event.pull_request.labels.*.name, 'ciflow/cpu') || contains(github.event.pull_request.labels.*.name, 'ciflow/default') || contains(github.event.pull_request.labels.*.name, 'ciflow/docs') || contains(github.event.pull_request.labels.*.name, 'ciflow/linux') || contains(github.event.pull_request.labels.*.name, 'ciflow/trunk') }}
outputs:
docker_image: ${{ steps.calculate-tag.outputs.docker_image }}
steps:
- name: print labels
run: echo "${PR_LABELS}"
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Log in to ECR
env:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\")
aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \
--password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com"
- name: Chown workspace
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
retry docker pull "${ALPINE_IMAGE}"
# Ensure the working directory gets chowned back to the current user
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
rm -rf "${GITHUB_WORKSPACE:?}/"*
rm -f ~/.ssh/authorized_keys
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: recursive
- name: Clean PyTorch checkout
run: |
# Remove any artifacts from the previous checkouts
git clean -fxd
- name: Calculate docker image tag
id: calculate-tag
run: |
DOCKER_TAG=$(git rev-parse HEAD:.circleci/docker)
echo "DOCKER_TAG=${DOCKER_TAG}" >> "${GITHUB_ENV}"
echo "DOCKER_IMAGE=${DOCKER_IMAGE_BASE}:${DOCKER_TAG}" >> "${GITHUB_ENV}"
echo "::set-output name=docker_tag::${DOCKER_TAG}"
echo "::set-output name=docker_image::${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"
- name: Check if image should be built
id: check
env:
BASE_REVISION: ${{ github.event.pull_request.base.sha || github.sha }}
run: |
set -x
# Check if image already exists, if it does then skip building it
if docker manifest inspect "${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"; then
exit 0
fi
if [[ "$BASE_REVISION" = "$(git rev-parse HEAD)" ]]; then
# if we're on the base branch then use the parent commit
MERGE_BASE=$(git rev-parse HEAD~)
else
# otherwise we're on a PR, so use the most recent base commit
MERGE_BASE=$(git merge-base HEAD "$BASE_REVISION")
fi
# Covers the case where a previous tag doesn't exist for the tree
# this is only really applicable on trees that don't have `.circleci/docker` at its merge base, i.e. nightly
if ! git rev-parse "$MERGE_BASE:.circleci/docker"; then
echo "Directory '.circleci/docker' not found in commit $MERGE_BASE, you should probably rebase onto a more recent commit"
exit 1
fi
PREVIOUS_DOCKER_TAG=$(git rev-parse "$MERGE_BASE:.circleci/docker")
# If no image exists but the hash is the same as the previous hash then we should error out here
if [[ "${PREVIOUS_DOCKER_TAG}" = "${DOCKER_TAG}" ]]; then
echo "ERROR: Something has gone wrong and the previous image isn't available for the merge-base of your branch"
echo " contact the PyTorch team to restore the original images"
exit 1
fi
echo ::set-output name=rebuild::yes
- name: Build and push docker image
if: ${{ steps.check.outputs.rebuild }}
env:
DOCKER_SKIP_S3_UPLOAD: 1
working-directory: .circleci/docker
run: |
export IMAGE_NAME=${DOCKER_IMAGE_BASE#308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/}
./build_docker.sh
- name: Pull Docker image
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
retry docker pull "${DOCKER_IMAGE}"
- name: Parse ref
id: parse-ref
run: .github/scripts/parse_ref.py
- name: Build
env:
BRANCH: ${{ steps.parse-ref.outputs.branch }}
run: |
# detached container should get cleaned up by teardown_ec2_linux
container_name=$(docker run \
-e BUILD_ENVIRONMENT \
-e JOB_BASE_NAME \
-e MAX_JOBS="$(nproc --ignore=2)" \
-e AWS_DEFAULT_REGION \
-e IS_GHA \
-e PR_NUMBER \
-e SHA1 \
-e BRANCH \
-e GITHUB_RUN_ID \
-e SCCACHE_BUCKET \
-e XLA_CLANG_CACHE_S3_BUCKET_NAME \
-e CUSTOM_TEST_ARTIFACT_BUILD_DIR \
-e SKIP_SCCACHE_INITIALIZATION=1 \
-e TORCH_CUDA_ARCH_LIST \
-e PR_LABELS \
-e http_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e https_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e no_proxy="localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" \
--env-file="/tmp/github_env_${GITHUB_RUN_ID}" \
--security-opt seccomp=unconfined \
--cap-add=SYS_PTRACE \
--tty \
--detach \
--user jenkins \
-v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \
-w /var/lib/jenkins/workspace \
"${DOCKER_IMAGE}"
)
docker exec -t "${container_name}" sh -c 'sudo chown -R jenkins . && .jenkins/pytorch/build.sh'
- name: Display and upload binary build size statistics (Click Me)
# temporary hack: set CIRCLE_* vars, until we update
# tools/stats/print_test_stats.py to natively support GitHub Actions
env:
SCRIBE_GRAPHQL_ACCESS_TOKEN: ${{ secrets.SCRIBE_GRAPHQL_ACCESS_TOKEN }}
BRANCH: ${{ steps.parse-ref.outputs.branch }}
TAG: ${{ steps.parse-ref.outputs.tag }}
WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
run: |
COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0)
export COMMIT_TIME
pip3 install requests==2.26 boto3==1.16.34
python3 -m tools.stats.upload_binary_size_to_scuba || exit 0
- name: Chown workspace
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Archive artifacts into zip
run: |
zip -1 -r artifacts.zip dist/ build/custom_test_artifacts build/lib build/bin .pytorch-test-times.json
- uses: seemethere/upload-artifact-s3@v3
name: Store PyTorch Build Artifacts on S3
with:
name: ${{ env.BUILD_ENVIRONMENT }}
retention-days: 14
if-no-files-found: error
path:
artifacts.zip
- name: Hold runner for 2 hours or until ssh sessions have drained
# Always hold for active ssh sessions
if: always()
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Chown workspace
if: always()
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Kill containers, clean up images
if: always()
run: |
# ignore expansion of "docker ps -q" since it could be empty
# shellcheck disable=SC2046
docker stop $(docker ps -q) || true
# Prune all of the docker images
docker system prune -af
- name: Hold runner for 2 hours or until ssh sessions have drained
# Always hold for active ssh sessions
if: always()
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Clean up docker images
if: always()
run: |
# Prune all of the docker images
docker system prune -af
build-docs:
runs-on: linux.2xlarge
timeout-minutes: 240
strategy:
matrix:
docs_type: [cpp, python]
needs: [build]
env:
DOCKER_IMAGE: ${{ needs.build.outputs.docker_image }}
DOCS_TYPE: ${{ matrix.docs_type }}
WITH_PUSH: ${{ github.event_name == 'schedule' }}
steps:
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Log in to ECR
env:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\")
aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \
--password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com"
- name: Chown workspace
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
retry docker pull "${ALPINE_IMAGE}"
# Ensure the working directory gets chowned back to the current user
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
rm -rf "${GITHUB_WORKSPACE:?}/"*
rm -f ~/.ssh/authorized_keys
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: recursive
- name: Clean PyTorch checkout
run: |
# Remove any artifacts from the previous checkouts
git clean -fxd
- name: Pull Docker image
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
retry docker pull "${DOCKER_IMAGE}"
- uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b
name: Download PyTorch Build Artifacts
with:
name: ${{ env.BUILD_ENVIRONMENT }}
- name: Unzip artifacts
run: |
unzip -o artifacts.zip
- name: Build ${{ matrix.docs_type }} docs
run: |
set -ex
time docker pull "${DOCKER_IMAGE}" > /dev/null
echo "${GITHUB_REF}"
# TODO: Set it correctly when workflows are scheduled on tags
target="master"
# detached container should get cleaned up by teardown_ec2_linux
container_name=$(docker run \
-e BUILD_ENVIRONMENT \
-e CUSTOM_TEST_ARTIFACT_BUILD_DIR \
-e IN_CI \
-e MAX_JOBS="$(nproc --ignore=2)" \
-e SHA1="$GITHUB_SHA" \
-e DOCS_VERSION="${target}" \
-e DOCS_TYPE \
-e PR_LABELS \
-e WITH_PUSH \
--env-file="/tmp/github_env_${GITHUB_RUN_ID}" \
--security-opt seccomp=unconfined \
--cap-add=SYS_PTRACE \
--tty \
--detach \
--user jenkins \
-v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \
-w /var/lib/jenkins/workspace \
"${DOCKER_IMAGE}"
)
docker exec -t "${container_name}" bash -c "sudo chown -R jenkins . && pip install dist/*.whl && ./.circleci/scripts/${DOCS_TYPE}_doc_push_script.sh"
- name: Chown workspace
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- uses: seemethere/upload-artifact-s3@v3
name: Upload Python Docs Preview
if: ${{ github.event_name == 'pull_request' && matrix.docs_type == 'python' }}
with:
retention-days: 14
s3-bucket: doc-previews
if-no-files-found: error
path: pytorch.github.io/docs/master/
s3-prefix: pytorch/${{ github.event.pull_request.number }}
- uses: seemethere/upload-artifact-s3@v3
name: Upload C++ Docs Preview
if: ${{ github.event_name == 'pull_request' && matrix.docs_type == 'cpp' }}
with:
retention-days: 14
if-no-files-found: error
s3-bucket: doc-previews
path: cppdocs/
s3-prefix: pytorch/${{ github.event.pull_request.number }}/cppdocs


@@ -1,7 +1,7 @@
# @generated DO NOT EDIT MANUALLY
# Template is at: .github/templates/linux_ci_workflow.yml.j2
# Generation script: .github/scripts/generate_ci_workflows.py
name: linux-bionic-py3.8-gcc9-coverage
name: linux-vulkan-bionic-py3.7-clang9
on:
pull_request:
@@ -10,15 +10,17 @@ on:
branches:
- master
- release/*
- fbsync
workflow_dispatch:
env:
BUILD_ENVIRONMENT: linux-bionic-py3.8-gcc9-coverage
DOCKER_IMAGE_BASE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-bionic-py3.8-gcc9
BUILD_ENVIRONMENT: linux-vulkan-bionic-py3.7-clang9
DOCKER_IMAGE_BASE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-bionic-py3.7-clang9
SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2
XLA_CLANG_CACHE_S3_BUCKET_NAME: ossci-compiler-clang-cache-circleci-xla
TORCH_CUDA_ARCH_LIST: 5.2
IN_CI: 1
IS_GHA: 1
# This is used only during the phase of adding wheel tests; it will be removed once that is complete
IN_WHEEL_TEST: 1
# Used for custom_operator, jit_hooks, custom_backend, see .jenkins/pytorch/build.sh
@@ -26,31 +28,34 @@ env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
AWS_DEFAULT_REGION: us-east-1
PR_NUMBER: ${{ github.event.pull_request.number }}
SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
PYTORCH_RETRY_TEST_CASES: 1
concurrency:
group: linux-bionic-py3.8-gcc9-coverage-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }}
group: linux-vulkan-bionic-py3.7-clang9-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }}
cancel-in-progress: true
jobs:
ciflow_should_run:
runs-on: ubuntu-18.04
if: ${{ (github.repository == 'pytorch/pytorch') && ((github.event_name != 'pull_request') || (github.event.assignee.login != 'pytorchbot' ) || (github.event.action !='unassigned') || (contains(github.event.pull_request.labels.*.name, 'ciflow/all') || contains(github.event.pull_request.labels.*.name, 'ciflow/coverage') || contains(github.event.pull_request.labels.*.name, 'ciflow/cpu') || contains(github.event.pull_request.labels.*.name, 'ciflow/default') || contains(github.event.pull_request.labels.*.name, 'ciflow/linux'))) }}
env:
LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
steps:
- name: noop
run: echo running ciflow_should_run
- name: print labels
run: echo "${LABELS}"
calculate-docker-image:
build:
runs-on: linux.2xlarge
needs: [ciflow_should_run]
timeout-minutes: 240
if: ${{ (github.repository == 'pytorch/pytorch') && (
(github.event_name == 'push') ||
(github.event_name == 'schedule') ||
(contains(github.event.pull_request.labels.*.name, 'ciflow/all') || contains(github.event.pull_request.labels.*.name, 'ciflow/cpu') || contains(github.event.pull_request.labels.*.name, 'ciflow/default') || contains(github.event.pull_request.labels.*.name, 'ciflow/linux') || contains(github.event.pull_request.labels.*.name, 'ciflow/trunk') || contains(github.event.pull_request.labels.*.name, 'ciflow/vulkan')) ||
((github.event_name == 'pull_request' && github.event.action != 'unassigned') && !contains(join(github.event.pull_request.labels.*.name), 'ciflow/')))
}}
env:
DOCKER_BUILDKIT: 1
timeout-minutes: 90
JOB_BASE_NAME: linux-vulkan-bionic-py3.7-clang9-build
IS_PROBOT_TRIGGER_EVENT: ${{ (github.event.action == 'unassigned') && (github.event.assignee.login == 'pytorchbot') }}
LABEL_CONDITIONS: ${{ contains(github.event.pull_request.labels.*.name, 'ciflow/all') || contains(github.event.pull_request.labels.*.name, 'ciflow/cpu') || contains(github.event.pull_request.labels.*.name, 'ciflow/default') || contains(github.event.pull_request.labels.*.name, 'ciflow/linux') || contains(github.event.pull_request.labels.*.name, 'ciflow/trunk') || contains(github.event.pull_request.labels.*.name, 'ciflow/vulkan') }}
outputs:
docker_image: ${{ steps.calculate-tag.outputs.docker_image }}
steps:
- name: print labels
run: echo "${PR_LABELS}"
- name: Display EC2 information
shell: bash
run: |
@@ -69,18 +74,16 @@ jobs:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\")
aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \
--password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com"
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
# Ensure the working directory gets chowned back to the current user
retry docker pull "${ALPINE_IMAGE}"
# Ensure the working directory gets chowned back to the current user
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
@@ -98,17 +101,22 @@ jobs:
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: false
submodules: recursive
- name: Clean PyTorch checkout
run: |
# Remove any artifacts from the previous checkouts
git clean -fxd
- name: Calculate docker image tag
id: calculate-tag
run: |
DOCKER_TAG=$(git rev-parse HEAD:.circleci/docker)
echo "DOCKER_TAG=${DOCKER_TAG}" >> "${GITHUB_ENV}"
echo "DOCKER_IMAGE=${DOCKER_IMAGE_BASE}:${DOCKER_TAG}" >> "${GITHUB_ENV}"
echo "::set-output name=docker_tag::${DOCKER_TAG}"
echo "::set-output name=docker_image::${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"
- name: Check if image should be built
id: check
env:
DOCKER_TAG: ${{ steps.calculate-tag.outputs.docker_tag }}
BASE_REVISION: ${{ github.event.pull_request.base.sha || github.sha }}
run: |
set -x
@@ -140,77 +148,35 @@ jobs:
- name: Build and push docker image
if: ${{ steps.check.outputs.rebuild }}
env:
DOCKER_TAG: ${{ steps.calculate-tag.outputs.docker_tag }}
DOCKER_SKIP_S3_UPLOAD: 1
working-directory: .circleci/docker
run: |
export IMAGE_NAME=${DOCKER_IMAGE_BASE#308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/}
cd .circleci/docker && ./build_docker.sh
build:
runs-on: linux.2xlarge
needs: [calculate-docker-image, ciflow_should_run]
env:
DOCKER_IMAGE: ${{ needs.calculate-docker-image.outputs.docker_image }}
JOB_BASE_NAME: linux-bionic-py3.8-gcc9-coverage-build
steps:
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Log in to ECR
env:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
./build_docker.sh
- name: Pull Docker image
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
# Ensure the working directory gets chowned back to the current user
retry docker pull "${ALPINE_IMAGE}"
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
rm -rf "${GITHUB_WORKSPACE:?}/"*
rm -f ~/.ssh/authorized_keys
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: recursive
- name: Pull docker image
run: |
docker pull "${DOCKER_IMAGE}"
retry docker pull "${DOCKER_IMAGE}"
- name: Parse ref
id: parse-ref
run: .github/scripts/parse_ref.py
- name: Build
env:
BRANCH: ${{ steps.parse-ref.outputs.branch }}
run: |
# detached container should get cleaned up by teardown_ec2_linux
container_name=$(docker run \
-e BUILD_ENVIRONMENT \
-e JOB_BASE_NAME \
-e MAX_JOBS="$(nproc --ignore=2)" \
-e AWS_DEFAULT_REGION \
-e IS_GHA \
-e PR_NUMBER \
-e SHA1 \
-e BRANCH \
-e GITHUB_RUN_ID \
-e SCCACHE_BUCKET \
-e XLA_CLANG_CACHE_S3_BUCKET_NAME \
-e CUSTOM_TEST_ARTIFACT_BUILD_DIR \
@@ -229,21 +195,14 @@ jobs:
"${DOCKER_IMAGE}"
)
docker exec -t "${container_name}" sh -c 'sudo chown -R jenkins . && .jenkins/pytorch/build.sh'
- name: Parse ref
id: parse-ref
run: .github/scripts/parse_ref.py
- name: Display and upload binary build size statistics (Click Me)
# temporary hack: set CIRCLE_* vars, until we update
# tools/stats/print_test_stats.py to natively support GitHub Actions
env:
AWS_DEFAULT_REGION: us-east-1
IS_GHA: 1
SCRIBE_GRAPHQL_ACCESS_TOKEN: ${{ secrets.SCRIBE_GRAPHQL_ACCESS_TOKEN }}
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
CIRCLE_TAG: ${{ steps.parse-ref.outputs.tag }}
CIRCLE_WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
BRANCH: ${{ steps.parse-ref.outputs.branch }}
TAG: ${{ steps.parse-ref.outputs.tag }}
WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
run: |
COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0)
export COMMIT_TIME
@@ -270,8 +229,6 @@ jobs:
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Chown workspace
if: always()
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
@@ -294,12 +251,14 @@ jobs:
docker system prune -af
generate-test-matrix:
needs: build
runs-on: ubuntu-18.04
needs: [ciflow_should_run]
timeout-minutes: 240
env:
TEST_RUNNER_TYPE: linux.2xlarge
ENABLE_DISTRIBUTED_TEST: 1
ENABLE_DISTRIBUTED_TEST: ''
ENABLE_JIT_LEGACY_TEST: ''
ENABLE_FX2TRT_TEST: ''
ENABLE_MULTIGPU_TEST: ''
ENABLE_NOGPU_NO_AVX_TEST: ''
ENABLE_NOGPU_NO_AVX2_TEST: ''
@@ -308,8 +267,9 @@ jobs:
ENABLE_BACKWARDS_COMPAT_TEST: ''
ENABLE_XLA_TEST: ''
ENABLE_NOARCH_TEST: ''
NUM_TEST_SHARDS: 2
NUM_TEST_SHARDS: 1
MULTIGPU_RUNNER_TYPE: linux.16xlarge.nvidia.gpu
DISTRIBUTED_GPU_RUNNER_TYPE: linux.8xlarge.nvidia.gpu
NOGPU_RUNNER_TYPE: linux.2xlarge
PR_BODY: ${{ github.event.pull_request.body }}
outputs:
@@ -328,19 +288,19 @@ jobs:
run: .github/scripts/generate_pytorch_test_matrix.py
test:
needs: [calculate-docker-image, build, generate-test-matrix, ciflow_should_run]
needs: [build, generate-test-matrix]
strategy:
matrix: ${{ fromJson(needs.generate-test-matrix.outputs.matrix) }}
fail-fast: false
runs-on: ${{ matrix.runner }}
timeout-minutes: 240
env:
DOCKER_IMAGE: ${{ needs.calculate-docker-image.outputs.docker_image }}
JOB_BASE_NAME: linux-bionic-py3.8-gcc9-coverage-test
DOCKER_IMAGE: ${{ needs.build.outputs.docker_image }}
JOB_BASE_NAME: linux-vulkan-bionic-py3.7-clang9-test
TEST_CONFIG: ${{ matrix.config }}
SHARD_NUMBER: ${{ matrix.shard }}
NUM_TEST_SHARDS: ${{ matrix.num_shards }}
PYTORCH_IGNORE_DISABLED_ISSUES: ${{ needs.generate-test-matrix.outputs.ignore-disabled-issues }}
CONTINUE_THROUGH_ERROR: ${{ github.repository == 'pytorch/pytorch' && (github.event_name == 'push' || github.event_name == 'schedule') }}
steps:
- name: Display EC2 information
shell: bash
@@ -360,18 +320,16 @@ jobs:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\")
aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \
--password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com"
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
# Ensure the working directory gets chowned back to the current user
retry docker pull "${ALPINE_IMAGE}"
# Ensure the working directory gets chowned back to the current user
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
@@ -390,9 +348,16 @@ jobs:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: recursive
- name: Pull docker image
- name: Clean PyTorch checkout
run: |
docker pull "${DOCKER_IMAGE}"
# Remove any artifacts from the previous checkouts
git clean -fxd
- name: Pull Docker image
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
retry docker pull "${DOCKER_IMAGE}"
- name: Install nvidia driver, nvidia-docker runtime, set GPU_FLAG
if: ${{ contains(env.BUILD_ENVIRONMENT, 'cuda') && !contains(matrix.config, 'nogpu') }}
run: |
@@ -426,24 +391,31 @@ jobs:
- name: Test
env:
PR_NUMBER: ${{ github.event.pull_request.number }}
IS_GHA: 1
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
AWS_DEFAULT_REGION: us-east-1
BRANCH: ${{ steps.parse-ref.outputs.branch }}
# Time out the test phase after 240 minutes
timeout-minutes: 240
run: |
set -x
if [[ $TEST_CONFIG == 'multigpu' ]]; then
TEST_COMMAND=.jenkins/pytorch/multigpu-test.sh
elif [[ $BUILD_ENVIRONMENT == *onnx* ]]; then
TEST_COMMAND=.jenkins/caffe2/test.sh
else
TEST_COMMAND=.jenkins/pytorch/test.sh
fi
if [[ $NUM_TEST_SHARDS -ne 2 ]]; then
export SHARD_NUMBER=0
PROXY_ENV=
# NOTE: XLA multiprocessing tests appear to have issues with squid proxy, going to disable for now
# We should investigate whether or not there's a list of hostnames we can add to no_proxy to
# make it so that we shouldn't have to fully disable squid for XLA tests
if [[ $TEST_CONFIG != 'xla' ]]; then
# shellcheck disable=SC2089
PROXY_ENV="-e http_proxy=http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128 -e https_proxy=http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128 -e no_proxy=localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock"
fi
# detached container should get cleaned up by teardown_ec2_linux
# TODO: Stop building test binaries as part of the build phase
# Used for GPU_FLAG since that doesn't play nice
# shellcheck disable=SC2086
# shellcheck disable=SC2086,SC2090
container_name=$(docker run \
${GPU_FLAG:-} \
-e BUILD_ENVIRONMENT \
@@ -452,9 +424,8 @@ jobs:
-e GITHUB_ACTIONS \
-e IN_CI \
-e IS_GHA \
-e CIRCLE_BRANCH \
-e CIRCLE_SHA1 \
-e CIRCLE_PR_NUMBER \
-e BRANCH \
-e SHA1 \
-e AWS_DEFAULT_REGION \
-e IN_WHEEL_TEST \
-e SHARD_NUMBER \
@@ -462,15 +433,17 @@ jobs:
-e TEST_CONFIG \
-e NUM_TEST_SHARDS \
-e PYTORCH_IGNORE_DISABLED_ISSUES \
-e PYTORCH_RETRY_TEST_CASES \
-e PR_LABELS \
-e CONTINUE_THROUGH_ERROR \
-e MAX_JOBS="$(nproc --ignore=2)" \
-e SCCACHE_BUCKET \
-e http_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e https_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e no_proxy="localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" \
-e XLA_CLANG_CACHE_S3_BUCKET_NAME \
${PROXY_ENV} \
--env-file="/tmp/github_env_${GITHUB_RUN_ID}" \
--ulimit stack=10485760:83886080 \
--security-opt seccomp=unconfined \
--cap-add=SYS_PTRACE \
--ipc=host \
--shm-size="${SHM_SIZE}" \
--tty \
--detach \
@@ -499,10 +472,22 @@ jobs:
PYTHONIOENCODING: "utf-8"
run: |
python3 tools/render_junit.py test/
- name: Report coverage
- name: Zip JSONs for upload
if: always()
env:
FILE_SUFFIX: '${{ github.job }}-${{ matrix.config }}-${{ matrix.shard }}-${{ matrix.num_shards }}-${{ matrix.runner }}'
run: |
python3 -mpip install codecov==2.1.12
python3 -mcodecov
# Remove any previous test jsons if they exist
rm -f test-jsons-*.zip
zip -r "test-jsons-${FILE_SUFFIX}.zip" test -i '*.json'
- uses: seemethere/upload-artifact-s3@v3
name: Store Test Downloaded JSONs on S3
if: always()
with:
retention-days: 14
if-no-files-found: warn
path:
test-jsons-*.zip
- name: Zip test reports for upload
if: always()
env:
@@ -511,15 +496,6 @@ jobs:
# Remove any previous test reports if they exist
rm -f test-reports-*.zip
zip -r "test-reports-${FILE_SUFFIX}.zip" test -i '*.xml'
- uses: actions/upload-artifact@v2
name: Store Test Reports
if: always()
with:
name: test-reports-${{ matrix.config }}
retention-days: 14
if-no-files-found: error
path:
test-reports-*.zip
- uses: seemethere/upload-artifact-s3@v3
name: Store Test Reports on S3
if: always()
@@ -534,16 +510,16 @@ jobs:
# tools/stats/print_test_stats.py to natively support GitHub Actions
env:
AWS_DEFAULT_REGION: us-east-1
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
JOB_BASE_NAME: linux-bionic-py3.8-gcc9-coverage-test
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
CIRCLE_TAG: ${{ steps.parse-ref.outputs.tag }}
CIRCLE_WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
BRANCH: ${{ steps.parse-ref.outputs.branch }}
JOB_BASE_NAME: linux-vulkan-bionic-py3.7-clang9-test
PR_NUMBER: ${{ github.event.pull_request.number }}
SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
TAG: ${{ steps.parse-ref.outputs.tag }}
WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
shell: bash
run: |
python3 -m pip install -r requirements.txt
python3 -m pip install boto3==1.16.34
python3 -m pip install boto3==1.19.12
python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test
- name: Hold runner for 2 hours or until ssh sessions have drained
# Always hold for active ssh sessions
@@ -551,8 +527,6 @@ jobs:
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Chown workspace
if: always()
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .

Some files were not shown because too many files have changed in this diff.