Commit Graph

43123 Commits

Author SHA1 Message Date
4868907cf3 [binaries] fix dump_operator_name binary (#71246)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71246

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D33555962

Pulled By: IvanKobzarev

fbshipit-source-id: 2b386e52fa8e76c877fec5b6b15d99f7d280801f
(cherry picked from commit f6d60fdff68964f77aa46ca2c51327cb66566194)
2022-01-20 17:33:08 +00:00
89c844db9b [torch.distributions] Implement positive-semidefinite constraint (#71375)
Summary:
While implementing https://github.com/pytorch/pytorch/issues/70275, I thought it would be useful to have a `torch.distributions.constraints` entry that checks the positive-semidefiniteness of matrix random variables.
This PR implements it with `torch.linalg.eigvalsh`, unlike `torch.distributions.constraints.positive_definite`, which is implemented with `torch.linalg.cholesky_ex`.
Currently, `torch.linalg.cholesky_ex` only reports the order of the leading minor of a symmetric matrix that is not positive-definite, so positive semi-definiteness cannot be checked by that mechanism.
cc neerajprad
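
A minimal sketch of the difference, using only public `torch.linalg` APIs (the example matrix is illustrative):
```
import torch

A = torch.tensor([[1.0, 0.0], [0.0, 0.0]])  # PSD but singular

# cholesky_ex can only certify strict positive definiteness: `info` holds
# the order of the leading minor that is not positive-definite, so it is
# nonzero here even though A is positive semi-definite.
_, info = torch.linalg.cholesky_ex(A)
print(info.item())  # 2

# eigvalsh can certify positive semi-definiteness directly:
print(torch.linalg.eigvalsh(A))  # tensor([0., 1.]) -- all non-negative
```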

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71375

Reviewed By: H-Huang

Differential Revision: D33663990

Pulled By: neerajprad

fbshipit-source-id: 02cefbb595a1da5e54a239d4f17b33c619416518
(cherry picked from commit 43eaea5bd861714f234e9efc1a7fb571631298f4)
2022-01-20 17:33:08 +00:00
640bfa7e6f Refactor convolution_backward's cudnn cases (#71491)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71491

Changed the Cudnn and CudnnTranspose cases to only make the input
contiguous when it is needed for the grad_weight computation.

Reading the implementations of cudnn_convolution_transpose_backward and
cudnn_convolution_backward gives me confidence that `input` isn't used
for the grad_input computation. However, the memory format logic is so
convoluted that I'm not 100% sure this is correct. All the tests pass,
though, and on request I can directly pass `backend_memory_format` to
{cudnn_convolution_backward, cudnn_convolution_transpose_backward}.

Test Plan: - pytest test/test_nn.py -v -k "conv"

Reviewed By: jbschlosser

Differential Revision: D33664694

Pulled By: zou3519

fbshipit-source-id: 9f4929686fe34f7aaf5331bfa49e98022b9d6c08
(cherry picked from commit 9e2ba0daca88139f7941bcb56bbc23825585d7a2)
2022-01-20 17:33:08 +00:00
06f14c2d63 Refactor convolution_backward's CudaDepthwise3d case (#71490)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71490

Deleted unnecessary .contiguous() calls in convolution_backward. The
CudaDepthwise3d case always hits _depthwise_3d_backward_cuda_out,
which will make arguments contiguous as necessary.

Changed _depthwise_3d_backward_cuda_out
- to make the input contiguous only when we're computing grad_weight
- to make the weight contiguous only when we're computing grad_input

Test Plan: - pytest test/test_nn.py -v -k "conv"

Reviewed By: jbschlosser

Differential Revision: D33664696

Pulled By: zou3519

fbshipit-source-id: d01d4f213e21ef4778de089a158933737b191cdf
(cherry picked from commit c6eb977c94a07f9812567a43b125b453eb5c5051)
2022-01-20 17:33:08 +00:00
17d2a5167e Refactor convolution_backward's CudaDepthwise2d case (#71489)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71489

Deleted unnecessary .contiguous() calls in convolution_backward. The
CudaDepthwise2d case always hits conv_depthwise2d_backward_cuda_out,
which makes the grad_output / self contiguous.

Changed conv_depthwise2d_backward_cuda_out to change `self_` (aka the
image input to convolution) to be contiguous only when we're computing
the grad_weight. This is because when we are computing the grad_input,
we only need the values from the grad_output and the weight.
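
A rough Python analogue of the pattern, sketched with the public `torch.nn.grad` helpers rather than the actual C++ kernel (names and shapes here are illustrative):
```
import torch

def depthwise2d_backward_sketch(grad_output, input, weight, output_mask):
    # Only make a tensor contiguous when the requested gradient reads it.
    grad_input = grad_weight = None
    if output_mask[0]:
        # grad_input only reads grad_output and weight, never `input`.
        grad_input = torch.nn.grad.conv2d_input(
            input.shape, weight.contiguous(), grad_output.contiguous(),
            groups=weight.shape[0])
    if output_mask[1]:
        # grad_weight is the only consumer of `input`, so this is the
        # only place `input` is made contiguous.
        grad_weight = torch.nn.grad.conv2d_weight(
            input.contiguous(), weight.shape, grad_output.contiguous(),
            groups=weight.shape[0])
    return grad_input, grad_weight

g_in, g_w = depthwise2d_backward_sketch(
    torch.randn(1, 3, 8, 8), torch.randn(1, 3, 10, 10),
    torch.randn(3, 1, 3, 3), output_mask=(True, True))
```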

Test Plan: - pytest test/test_nn.py -v -k "conv"

Reviewed By: jbschlosser

Differential Revision: D33664697

Pulled By: zou3519

fbshipit-source-id: 7a755fa8a076809c5490422d69fdf7ed80c8e29a
(cherry picked from commit 862ae63bab74113b3607b1bbc0a82f27992550fe)
2022-01-20 17:33:08 +00:00
42f7afc4cd [BE] Improve gitutils
Inherit `PeekableIterator` from `collections.abc.Iterator`
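
A short sketch of what that buys, assuming a typical peekable-iterator shape (the class body is illustrative, not the actual gitutils code):
```
from collections.abc import Iterator

class PeekableIterator(Iterator):
    # Subclassing collections.abc.Iterator supplies __iter__ for free
    # once __next__ is defined, and makes isinstance checks explicit.
    def __init__(self, iterable):
        self._it = iter(iterable)
        self._peeked = []

    def peek(self):
        if not self._peeked:
            self._peeked.append(next(self._it))
        return self._peeked[0]

    def __next__(self):
        if self._peeked:
            return self._peeked.pop()
        return next(self._it)

it = PeekableIterator("abc")
print(it.peek(), next(it), next(it))  # a a b
```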

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71515
2022-01-20 17:07:06 +00:00
a9f44b22c0 Fix composite compliance problems for linalg.{matrix_power, inv, cholesky} (#69437)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69437

linalg.{inv, cholesky} are problematic because they call .data_ptr().
This makes them not composite compliant (e.g. meta tensors will not run
on them correctly). This PR makes them composite compliant by adding a
new native_functions operator that does error checking,
`_linalg_check_errors(Tensor info, str api_name, bool is_matrix)`,
that is a primitive with respect to autograd.

This PR modifies linalg.inv and linalg.cholesky to call the new error
check function. I also needed to refactor singleCheckErrors and
batchCheckErrors to accept a c10::string_view instead of a
`const char*`; you can convert `const char*` to c10::string_view but not
the other way around because `string_view` does not require null
terminated buffers.

Finally, this PR includes a bugfix in `__torch_dispatch__` for
the composite compliance testing mechanism. Previously,
`__torch_dispatch__` could not handle operators with no returns; this PR
fixes that. No returns in C++ is equivalent to a single None return in
Python.
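
For reference, a sketch of the user-visible error path these checks feed (the exact exception type and message vary across versions):
```
import torch

A = torch.zeros(2, 2)  # singular, so inversion must fail
try:
    torch.linalg.inv(A)
except RuntimeError as e:
    # Raised by the shared error check rather than by each kernel.
    print("linalg error:", e)
```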

Test Plan: - composite compliant tests

Reviewed By: albanD

Differential Revision: D32883666

Pulled By: zou3519

fbshipit-source-id: d5a3f52ebab116c93e1a54a203eacc8f787de7e2
(cherry picked from commit 9e24c9599a043877ab4f289469be55550c996a79)
2022-01-20 16:14:34 +00:00
011fd1d933 [DataPipe] improving DataPipe unit tests (#70215)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70215

A few renamings, formatting changes, and additional tests to make the unit tests better.

cc VitalyFedyunin ejguan NivekT

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D33344610

Pulled By: NivekT

fbshipit-source-id: bb36f7452bdc44964c9ce0650c7ae308ba2c5aa5
(cherry picked from commit 0aae20cb27038b7b3598520db4304a604f1e6799)
2022-01-20 15:49:53 +00:00
9f0c808593 [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D33677079

fbshipit-source-id: 997b73bebdcf83e09138bddc4bce257d0740e874
(cherry picked from commit 620023ad32a9e2c971edf79cd8d9653a987a5aff)
2022-01-20 12:13:18 +00:00
06838ce8b1 fix: do_constant_folding arg when exporting ONNX (#71348)
Summary:
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71348

Reviewed By: H-Huang

Differential Revision: D33662228

Pulled By: msaroufim

fbshipit-source-id: a69c72838b7ff41a2305453ef00666c060ade593
(cherry picked from commit 75dd62b406a655ff9612a340a80a3bd563bd9919)
2022-01-20 05:42:35 +00:00
21b697b646 add flatbuffer_loader and flatbuffer_serializer as BUCK target (#71463)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71463

title

Test Plan: unittest

Reviewed By: zhxchen17

Differential Revision: D33651339

fbshipit-source-id: 4bf325a40e263a441fd86bce560645ad0c1ebb23
(cherry picked from commit 4cb02e62a68f338e3388ad09276ced9b8f4cdcb1)
2022-01-20 04:51:10 +00:00
99df96d800 Add silu and hardsigmoid converter (#71453)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71453

As title

Test Plan: unit test

Reviewed By: frank-wei

Differential Revision: D33646384

fbshipit-source-id: d86326c93e4d6bd59c9152592721f0e6ecf7f6fb
(cherry picked from commit d886380edef3388d60d529100332f9d9564f0913)
2022-01-20 03:16:20 +00:00
80b19c4c8c Enable Python bindings for UntypedStorage (#68945)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68945

This PR enables the Python conversion functions for `Storage` (specifically `UntypedStorage`) and also cleans up some remnants of the deprecated typed storages from `DynamicTypes.cpp`.
ghstack-source-id: 147245110
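
A small usage sketch; the exposed name has shifted across releases (roughly `torch._UntypedStorage` around this change, `torch.UntypedStorage` later), so treat the exact spelling as an assumption:
```
import torch

t = torch.arange(4, dtype=torch.float32)
s = t.untyped_storage()   # raw, typeless byte-level storage
print(s.nbytes())         # 16: four float32 elements
print(s.device)           # cpu
```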

Test Plan: Run the existing unit and integration tests.

Reviewed By: albanD

Differential Revision: D32676505

fbshipit-source-id: 3a3f6db4fb0da5c78dd406c96ab70bdc37015521
(cherry picked from commit d6427b94cf88b078bd228d43cd2afbabf0773b39)
2022-01-20 02:11:34 +00:00
f5b19ba683 Additional unit test for sharded linear. (#70476)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70476

1) Support a single dimension for inputs
2) Test several error cases

Partially addresses https://github.com/pytorch/pytorch/issues/65638
ghstack-source-id: 146307607

Test Plan: waitforbuildbot

Reviewed By: fduwjj

Differential Revision: D33344357

fbshipit-source-id: 4de7a7177452951dbcce76f27441703447609e6f
(cherry picked from commit 96dfded5697e451b54f113f99b6d0da6f6af500d)
2022-01-20 01:23:44 +00:00
a5d5b11252 Add GitHub merge rules (#71514)
Summary:
The following subfolders of the project were identified as ones that can be
merged on GitHub first and then asynchronously merged into the Meta
codebase:
## ONNX exporter
PRs that include only files under `torch/onnx`, `torch/csrc/jit/passes/onnx` and `test/onnx` and are reviewed by garymm
## CUDA fusers
PRs that include only files under `torch/csrc/jit/codegen/fuser/cuda`, `torch/csrc/jit/codegen/cuda` or `benchmarks/cpp/nvfuser` and are reviewed by csarofeen or ngimel
## OSS CI
PRs that include only files under `.circleci`, `.github` and `.jenkins` and are reviewed by either seemethere or myself

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71514

Reviewed By: bigfootjon

Differential Revision: D33673050

Pulled By: malfet

fbshipit-source-id: 21b909d49cb73ff79879b3ea0568e53ef65aa08c
(cherry picked from commit 520226c1bf341fe6a9e1cd42f18da73c43386062)
2022-01-20 01:16:25 +00:00
c59942ac73 [PyTorch] Fix a bunch of structured kernel refcounting (#71140)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71140

Structured kernels need to use the borrowing variants of the TensorIterator build APIs. (I am working on a debug check for this, but it is currently too strict, and relaxing it does not catch these bugs.)
ghstack-source-id: 147191022

Test Plan: CI

Reviewed By: bdhirsh

Differential Revision: D33520003

fbshipit-source-id: 3b0ff9036acdb78ae6fc7489ed0ed487d5ff080f
(cherry picked from commit 80ef4e14e33718a9ad5aaefc218bb773e3b15a5c)
2022-01-20 00:30:43 +00:00
b98e955b24 [flatbuffer] Fix forward flatbuffer type handling with dynamic type. (#71500)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71500

Some places in flatbuffer_loader.cpp needed to be updated to newer API calls following the dynamic type changes.
ghstack-source-id: 147278860

Test Plan:
rebase D33665961
```
[zhxchen17@devbig560.ftw3 /data/users/zhxchen17/fbsource]  buck run fbcode/mode/dbg //arvr/firmware/silicon/turing:test_torch -c turing.min_runtime=1 -c turing.dsp_op=1 -c turing.model_file=test1.ptl -c pt.has_backtraces=1
Action graph will be rebuilt because files have been added or removed.
Downloaded 0/4 artifacts, 0.00 bytes, 100.0% cache miss (for updated rules)
Building: finished in 6.1 sec (100%) 253/253 jobs, 3/253 updated
  Total time: 6.1 sec
BUILD SUCCEEDED
Conv:  input [1, 32, 4, 4] residuals [1] weights [4, 4, 1, 1, 2, 32] nlu_params [4, 128] in_ch 32 out_ch 32 groups 1 kernel  stride  padding  upsample 0 op_type 0 act_type 0
```

Reviewed By: qihqi

Differential Revision: D33668588

fbshipit-source-id: 44163c1bc0ea57e4bd265384a253d6cc7b96ed4a
(cherry picked from commit 746487075e36fe90317b631cb3a839d16fd0723f)
2022-01-20 00:22:35 +00:00
565f78f571 [Pytorch] Speed up LayerNorm 4-5% (#71423)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71423

Replacing this math with a load seems to improve perf.
ghstack-source-id: 147171800

Test Plan: ptvsc2_predictor_bench runs on model from mikeiovine courtesy of mikeiovine

Reviewed By: mikeiovine, xiaomengy

Differential Revision: D33552176

fbshipit-source-id: f21a4cd66c13b9fcb7bcf48f356bdc85e94c4216
(cherry picked from commit 0354fcb9889e7345321fe4dc9e30495a67709a4d)
2022-01-20 00:16:17 +00:00
958f9cf5ff [PyTorch][Static Runtime] Fix extra refcount bumps in layer_norm (#71237)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71237

Noticed these on inspection.
ghstack-source-id: 147171799

Test Plan: CI

Reviewed By: mikeiovine

Differential Revision: D33519799

fbshipit-source-id: 167c63323b345a5822303cecdbbbbb959f66f6e4
(cherry picked from commit 57e8da2d354497d3370906d1ae145288a2fd166b)
2022-01-20 00:16:17 +00:00
811af25963 Fix trivial typo at the doc of torch.lobpcg (#71464)
Summary:
I think `symmetric positive defined generalized eigenvalue problem` should be changed to `symmetric positive definite generalized eigenvalue problem`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71464

Reviewed By: ejguan

Differential Revision: D33660670

Pulled By: H-Huang

fbshipit-source-id: 85dc830ed56a98d8a38bd2843f575f6ce08498cf
(cherry picked from commit dbbef542c04a8dd93514ac7783f4546e5da7ca58)
2022-01-20 00:07:39 +00:00
dc5cda0cca Update min python version to 3.7 in setup.py and mypy configs (#71494)
Summary:
As Python-3.6 has reached EOL

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71494

Reviewed By: atalman

Differential Revision: D33667509

Pulled By: malfet

fbshipit-source-id: ab1f03085cfb9161df77ba5ce373b81f5e7ef3ae
(cherry picked from commit 60343166d97b1eb1649b29a78ad390d39926b642)
2022-01-20 00:03:57 +00:00
06bc6748a1 [acc_ops] Remove usage of kwarg expansion via **locals() for jit scripting support (#71425)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71425

att

Test Plan: CI

Reviewed By: yuhc

Differential Revision: D33639228

fbshipit-source-id: 95edced3b19a531d417538f00f0a555295c8741f
(cherry picked from commit 45455a6edc721a0362a5e775ac2fb52f4f16c84d)
2022-01-19 23:49:50 +00:00
ef4bc3fa2f [distributed] Make rref_proxy._invoke_rpc truly async when needed. (#70206)
Summary:
From https://github.com/pytorch/pytorch/issues/67626: RRefProxy (rref.rpc_async, rref.rpc_sync, rref.remote) currently uses a blocking RPC call to the owner

This is done by chaining async calls. In the sync case we wait on the
resulting Future.
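
A toy sketch of the chaining pattern with `torch.futures` (the helper and its arguments are hypothetical, not the actual rref internals):
```
import torch

def _invoke_sketch(owner_fut, call, is_sync):
    # Chain the real call onto the future that resolves the owner
    # instead of blocking on it; only the sync wrapper waits.
    result_fut = owner_fut.then(lambda f: call(f.value()))
    return result_fut.wait() if is_sync else result_fut

owner_fut = torch.futures.Future()
out = _invoke_sketch(owner_fut, lambda owner: owner + 1, is_sync=False)
owner_fut.set_result(41)
print(out.wait())  # 42
```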

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70206

Test Plan:
I ran rpc_tests using tensorpipe_rpc_agent_test_fixture.py and had to
adjust test_rref_proxy_timeout to the new behavior.

I ran into test_tensorpipe_set_default_timeout failing due to the
timeout being too small. Doesn't look related to this change.
mrshenli
Fixes https://github.com/pytorch/pytorch/issues/67626

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Reviewed By: pritamdamania87

Differential Revision: D33243348

Pulled By: kumpera

fbshipit-source-id: e1e8c34bb3d170407c0a793e2e585357f905d3c6
(cherry picked from commit 1ad5a7ceea17d00872e593650ef50d85bb232cda)
2022-01-19 23:37:15 +00:00
70c9146c40 [nnc] Update block and thread extents in cuda_codegen to use int64_t (#71428)
Summary:
The block and thread extent calculations in `cuda_codegen` should be using `int64_t` instead of `int`. The updated test, `test_dynamic_shapes`, fails without this change.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71428

Reviewed By: samdow

Differential Revision: D33640374

Pulled By: navahgar

fbshipit-source-id: 64c340ad2a9a1fa1fe066cf1c5dfc3b546b7be6d
(cherry picked from commit 6ea546ce116fc05d9d7e225bc29f7fe86be439de)
2022-01-19 23:21:24 +00:00
2dbbb1a921 [fx2trt] Issue warnings instead of error if there's possible const folding opportunities (#71031)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71031

During the conversion stage, we might create some constants when the size op is called and the size is static. Raising an error here causes problems for this case. Generally speaking, it doesn't hurt to allow skipping const folding.

Test Plan:
Test with D33483843 on shufflenet.

Added unit tests.

Reviewed By: wushirong

Differential Revision: D33484183

fbshipit-source-id: 5b32c06297e56965befd7e83fe8ca273e3665cee
(cherry picked from commit e6b79bd3dd626f4b0035b9792a246fc09098d5ef)
2022-01-19 23:16:23 +00:00
61713acb07 Add trymerge workflow (#71488)
Summary:
This one will react to the `repo_dispatch` event sent by PyTorch Probot
when the `pytorchbot merge this` command is issued.

At the moment, the workflow will only attempt to merge PRs that have not
been created from a forked repo and that match the rules defined in
`.github/merge_rules.json`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71488

Reviewed By: bigfootjon

Differential Revision: D33665142

Pulled By: malfet

fbshipit-source-id: e22daa1892523e62d7b7a941960636a6514cb7d7
(cherry picked from commit 92059bab073e2cd6ca6e9f946ffc2f956e22895c)
2022-01-19 23:11:48 +00:00
f45e217c01 Consolidate the overloads of TensorImpl::shallow_copy_and_detach (#68953)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68953

This PR consolidates the almost identical lvalue and rvalue implementations of shallow_copy_and_detach into a single templated function.
ghstack-source-id: 147238376

Test Plan: Run existing unit tests.

Reviewed By: fduwjj

Differential Revision: D32679741

fbshipit-source-id: 89a870335d2e09ffd005c943733a787d20d352f9
(cherry picked from commit 750344c8600e05d4ab593956257c8191919eeef8)
2022-01-19 21:52:13 +00:00
805b7575db test //c10/... without Google libraries in OSS (#70853)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70853

We support both configurations, so we should ensure they both work.
ghstack-source-id: 147170900

Test Plan: This is adding a test to CI.

Reviewed By: malfet

Differential Revision: D33304505

fbshipit-source-id: 7074b6b98d05f60801bb1d74bc9ac1458c768d28
(cherry picked from commit 8e4134b77789a157be5ba3df1d07f9bb308ca3b6)
2022-01-19 20:56:12 +00:00
78e1f9db34 port //c10/macros to common build structure (#70852)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70852

This is the first change that uses a common build file, build.bzl, to
hold most of the build logic.
ghstack-source-id: 147170895

Test Plan: Relying on internal and external CI.

Reviewed By: malfet

Differential Revision: D33299331

fbshipit-source-id: a66afffba6deec76b758dfb39bdf61d747b5bd99
(cherry picked from commit d9163c56f55cfc97c20f5a6d505474d5b8839201)
2022-01-19 20:56:12 +00:00
661d10aab4 use c10/macros/cmake_macros.h in fbcode build (#70851)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70851

This is a step towards OSS/fbcode convergence since OSS uses this file
in both CMake and Bazel.
ghstack-source-id: 147170896

Test Plan: Relying on the extensive CI internal tests for this.

Reviewed By: malfet

Differential Revision: D33299102

fbshipit-source-id: c650dd4755f8d696d5fce81c583d5c73782e3990
(cherry picked from commit 741ca140c82f728e3b349d703a7de239e5bbf13c)
2022-01-19 20:56:12 +00:00
bdeec0c7b6 [fx] add documentation to AccOpProperties (#71450)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71450

att

Test Plan: no test

Reviewed By: jfix71

Differential Revision: D33515471

fbshipit-source-id: ded40ca117f63c971d6c5ed4556932cc71c009ca
(cherry picked from commit a9f66d5921241645191c1df3292dc6e784860165)
2022-01-19 20:50:21 +00:00
7ce6db48e5 add rocm GHA workflow (#68552)
Summary:
cc jeffdaily sunway513 jithunnair-amd ROCmSupport KyleCZH

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68552

Reviewed By: bdhirsh

Differential Revision: D33569551

Pulled By: seemethere

fbshipit-source-id: cc7d68a22ad0eedd4d11eea3cf43a909e5b8616b
(cherry picked from commit 2bb701eb9d2c1ec79bf3f5b3e75cb7ec41fdeb4d)
2022-01-19 20:31:17 +00:00
15e7d18124 [jit][edge] Create convenience wrapper for dynamic type constructors. (#71457)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71457

Today DynamicType is hard to create because we have separate APIs for different types. In this diff we introduce an easier API to create types, like the following:
```
#include <ATen/core/type_factory.h>

auto type = dynT<ListType>(dynT<TensorType>()); // etc...
```
ghstack-source-id: 147211236

Test Plan: CI

Reviewed By: iseeyuan

Differential Revision: D33647746

fbshipit-source-id: c850cf31ae781244eac805906a2fc110ef065a70
(cherry picked from commit 8cfd51d75f010ca6f7f98b7e8ef807ead4d5f8f3)
2022-01-19 20:11:11 +00:00
ac26f8237c Allow disabling nvfuser without CUDA (#71358)
Summary:
On a CPU-only build of pytorch, `torch._C._jit_set_nvfuser_enabled(False)` would throw an error even though it is a no-op. With this fix:
```
>>> torch._C._jit_set_nvfuser_enabled(False)
False
>>> torch._C._jit_set_nvfuser_enabled(True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: Running CUDA fuser is only supported on CUDA builds.
>>>
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71358

Reviewed By: eellison

Differential Revision: D33601135

Pulled By: jansel

fbshipit-source-id: c764df2fa197ce7b4f71e5df0a91cd988766e99c
(cherry picked from commit a801df93210302e918eca7134d3c0a19ac5bae5d)
2022-01-19 20:01:09 +00:00
214f4bf2ff Support sparse.sum on empty sparse tensor (#71091)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71091

Fixes https://github.com/pytorch/pytorch/issues/65394

The masked sum on a full input tensor (of any layout) with an all-true mask is the same as the sum on the strided input tensor (after applying `to_dense` to sparse inputs).
Since masked sum uses `torch.sparse.sum`, for the simplicity of the masked reduction implementations its reduction behavior ought to be defined by the behavior of `torch.sum`. This PR implements the behavioral connection with respect to the directional summation of empty sparse tensors that correspond to all-zero strided tensors.

cc nikitaved pearu cpuhrsch
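
A repro sketch of the now-supported case (shapes are illustrative):
```
import torch

indices = torch.empty(2, 0, dtype=torch.int64)
values = torch.empty(0)
s = torch.sparse_coo_tensor(indices, values, size=(3, 4))  # all-zero sparse

# Directional sum over an empty sparse tensor now matches the sum over
# the corresponding all-zero strided tensor.
print(torch.sparse.sum(s, dim=0).to_dense())  # tensor([0., 0., 0., 0.])
```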

Test Plan: Imported from OSS

Reviewed By: davidberard98

Differential Revision: D33651750

Pulled By: cpuhrsch

fbshipit-source-id: 703891bff88c8da6270b4272f5d2da81688db67d
(cherry picked from commit 53f97e80f7520594e9977ad61a1a727dadade645)
2022-01-19 18:58:08 +00:00
3b589c3497 [DDP Checkpointing] non-reentrant checkpoint tests (#69060)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69060

Saved variable hooks checkpointing was added in https://github.com/pytorch/pytorch/pull/69508; this PR adds some tests for DDP.

Specifically, we can support almost all DDP use cases with this new API, such as dynamic module with find_unused_parameters=True. One case remains to be supported, which is static_graph + non-reentrant based checkpointing. The underlying reason this does not work is https://github.com/pytorch/pytorch/issues/58111.
ghstack-source-id: 147219887
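
For context, a sketch of the API under test using the public checkpoint entry point (the DDP wiring is elided):
```
import torch
from torch.utils.checkpoint import checkpoint

layer = torch.nn.Linear(4, 4)
x = torch.randn(2, 4, requires_grad=True)

# use_reentrant=False selects the saved-variable-hooks implementation,
# which per the summary above composes with find_unused_parameters=True.
out = checkpoint(layer, x, use_reentrant=False)
out.sum().backward()
print(x.grad.shape)  # torch.Size([2, 4])
```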

Test Plan: CI

Reviewed By: zhaojuanmao

Differential Revision: D32712126

fbshipit-source-id: ba5ae9ca77fd8929ee020c7dc97838bae9a1931b
(cherry picked from commit 9c7f93e21728d1627d85c351a21e7c8da832bff7)
2022-01-19 18:09:41 +00:00
75aaa9f92b Remove simd qualifier for pragma omp loop in upsample_nearest_op.h (#71462)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71462

Fixes
```
      6 aienv/aienv_ig_reels_base:caffe2/modules/detectron/upsample_nearest_op.h:65:1: error: loop not vectorized: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Werror,-Wpass-failed=transform-warning]
      6 deep_entity_classification/si_dec_gnn:caffe2/modules/detectron/upsample_nearest_op.h:65:1: error: loop not vectorized: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Werror,-Wpass-failed=transform-warning]
      6 feed_recommendation_infra/multifeed_execution_graph_service_nosan:caffe2/modules/detectron/upsample_nearest_op.h:65:1: error: loop not vectorized: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Werror,-Wpass-failed=transform-warning]
     12 mobile_cv/mobile-vision_experimental:caffe2/modules/detectron/upsample_nearest_op.h:65:1: error: loop not vectorized: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Werror,-Wpass-failed=transform-warning]
     30 mobile_cv/mobile-vision_xraymobilev2_detection_caffe2:caffe2/modules/detectron/upsample_nearest_op.h:65:1: error: loop not vectorized: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Werror,-Wpass-failed=transform-warning]
     42 aienv/aienv:caffe2/modules/detectron/upsample_nearest_op.h:65:1: error: loop not vectorized: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Werror,-Wpass-failed=transform-warning]
    128 feed_recommendation_infra/multifeed_recagg_dev:caffe2/modules/detectron/upsample_nearest_op.h:65:1: error: loop not vectorized: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Werror,-Wpass-failed=transform-warning]
    136 fluent2/fblearner_flow_projects_fluent2_nosan:caffe2/modules/detectron/upsample_nearest_op.h:65:1: error: loop not vectorized: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Werror,-Wpass-failed=transform-warning]
   1338 f6/f6_nosan:caffe2/modules/detectron/upsample_nearest_op.h:65:1: error: loop not vectorized: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Werror,-Wpass-failed=transform-warning]
```

Test Plan: Sandcastle

Reviewed By: luciang

Differential Revision: D33641869

fbshipit-source-id: 8424849cfac5cb0109272dec2086863067bbde66
(cherry picked from commit d18429905c7661486ed8ec0cdcdd7d94b9c62762)
2022-01-19 18:04:10 +00:00
908fd3d78b [fix] composite compliance: quantile and nanquantile (#70894)
Summary:
Reference https://github.com/pytorch/pytorch/issues/69991

Refactored such that only the `out` variant copies the result into `out`; otherwise we just return the result of the composite functions as is.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70894

Reviewed By: samdow

Differential Revision: D33641742

Pulled By: zou3519

fbshipit-source-id: 671be13b31a7fff3afc0b7976706a5ecfc51ccac
(cherry picked from commit e7d5ac9af319be327adc16d2d7048139a4b2ddd3)
2022-01-19 17:54:00 +00:00
a0ada2d22b Back out "[pytorch][PR] Performance and memory improvements to batched torch.linalg.solve" (#71421)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71421

Original commit changeset: 7a0dd443cd0e

Original Phabricator Diff: D33028236 (410e91adee)

Test Plan: PyTorch OSS CI

Reviewed By: ngimel

Differential Revision: D33637628

fbshipit-source-id: 1e81485be202b2f9d6a1ff315279cc099754c2dc
(cherry picked from commit c2d730bfeb2a9e4a3af1442b8d1fe5bf28a95f2b)
2022-01-19 17:26:01 +00:00
8a9243996c Lazy load pandas when importing pytorch (#71316)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/71313

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71316

Reviewed By: wenleix

Differential Revision: D33595043

Pulled By: malfet

fbshipit-source-id: da8c7a7f132696645191d7b7055c4c21970d92c3
(cherry picked from commit 2d4847780a4d26426d2300861069160836130063)
2022-01-19 17:02:50 +00:00
671a0b5376 Move sccache compilation log to its own group (#71444)
Summary:
The sccache compilation log is often misleading.

We can move it to its own group so people don't see it right away.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71444

Reviewed By: atalman

Differential Revision: D33659650

Pulled By: janeyx99

fbshipit-source-id: f22fd21640a8747beeacce8857bbb8281efd76f4
(cherry picked from commit e25970abf99801fc04d4ae15f8f5ffe63dd1dc41)
2022-01-19 16:47:36 +00:00
7ed2a43d26 Adding wheels with py3.10 (#71419)
Summary:
Adding wheels with py3.10

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71419

Reviewed By: janeyx99

Differential Revision: D33657770

Pulled By: atalman

fbshipit-source-id: 5d24f1771991ff07fbfd92d04d3d5211cf53084c
(cherry picked from commit bf2f2624e12821a417a17bd374e13fda5ab69724)
2022-01-19 16:40:39 +00:00
b56ba296b1 Support multiple input dims for sharded linear. (#70266)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70266

Addresses some of the issues mentioned in
https://github.com/pytorch/pytorch/issues/65638. The ShardedLinear
implementation only supports 2D inputs.

On the other hand, `nn.Linear` supports arbitrary dimensions for inputs and
outputs. As a result, in this PR I've added support to ensure that
ShardedLinear supports arbitrary input dims as well.
ghstack-source-id: 147206607
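
The `nn.Linear` behavior being matched, for reference (dense, unsharded illustration):
```
import torch

m = torch.nn.Linear(8, 4)
x = torch.randn(2, 3, 5, 8)  # arbitrary leading batch dimensions
print(m(x).shape)            # torch.Size([2, 3, 5, 4])
```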

Test Plan: waitforbuildbot

Reviewed By: wanchaol

Differential Revision: D33267630

fbshipit-source-id: 0460994c3aa33348b80547d9274206ef90cb29b6
(cherry picked from commit 7c289e1dbf491008e091ed0a49f98f2ebcfb4175)
2022-01-19 08:07:14 +00:00
fbc3b8c1bb [RPC] Fix a few flaky RPC tsan tests (#71460)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71460

When running with TSAN, we use a larger RPC timeout: https://github.com/pytorch/pytorch/blob/master/torch/testing/_internal/dist_utils.py#L68. As a result, the assertions here are invalid.

Tried to fix this by just setting `self.rpc_backend_options.rpc_timeout` to the new timeout, but `rpc_backend_options` is reconstructed every time it is accessed, so this doesn't work: https://github.com/pytorch/pytorch/blob/master/torch/testing/_internal/distributed/rpc/tensorpipe_rpc_agent_test_fixture.py#L15

Just removing the asserts should be fine as they don't really add value to what's being tested.
ghstack-source-id: 147208455

Test Plan: CI

Reviewed By: fduwjj

Differential Revision: D33648421

fbshipit-source-id: 9a5052b1c851fe7f838792d8bdf17d0563b4aa00
(cherry picked from commit 96ddab3433aff88961236d2d64f2b685de1ccc15)
2022-01-19 06:12:43 +00:00
9515213070 [Operator Versioning] Remove version compare as they are decoupled now (#71461)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71461

After the operator versioning work, the version in the model file is used for operator versioning, while bytecode_version is used for bytecode versioning (for the bytecode schema). They are two separate things now and this comparison is not needed.
ghstack-source-id: 147209286

Test Plan: CI

Reviewed By: iseeyuan, tugsbayasgalan

Differential Revision: D33648592

fbshipit-source-id: beaa136a728f88435176a00c07b2d521210f107f
(cherry picked from commit e90e650e1a5134473117eda802d679171e035082)
2022-01-19 04:51:45 +00:00
677fab6d1d Support broadcast_to on sparse COO tensors (#71073)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71073

cc nikitaved pearu cpuhrsch
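
A usage sketch of the new support (values and shapes are illustrative):
```
import torch

s = torch.tensor([[0.0, 1.0], [2.0, 0.0]]).to_sparse()
b = torch.broadcast_to(s, (3, 2, 2))  # now works for sparse COO inputs
print(b.shape, b.is_sparse)           # torch.Size([3, 2, 2]) True
```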

Test Plan: Imported from OSS

Reviewed By: mikaylagawarecki

Differential Revision: D33645744

Pulled By: cpuhrsch

fbshipit-source-id: 4775c9636c4e868022a8c1bbfec93e351d1cf885
(cherry picked from commit 640f21e09a935a1231b99ddd6472b03158bdc283)
2022-01-19 04:33:41 +00:00
9b9b878c89 Fixes jiterator cache macro include + updates CUDA note with cache variables (#71452)
Summary:
Per title.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71452

Reviewed By: ngimel

Differential Revision: D33646495

Pulled By: mruberry

fbshipit-source-id: bbf627e6d7a724a83a3ea2ae9c0f50430f8d578e
(cherry picked from commit d1e72b144aad9607ce53c477b7edfdce17cfd1c0)
2022-01-19 03:45:05 +00:00
125bdb6d51 empty_meta: Add functions that don't depend on Tensor (#70615)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70615

This adds `at::detail::empty_meta` and
`at::detail::empty_strided_meta` to complement the cpu API.
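
These C++ helpers back the Python-visible `meta` device; a quick illustration of the behavior, assuming the public `device="meta"` API:
```
import torch

t = torch.empty(2, 3, device="meta")
print(t.shape, t.dtype, t.device)  # shape/dtype tracked, no real storage
```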

Test Plan: Imported from OSS

Reviewed By: samdow

Differential Revision: D33623678

Pulled By: ngimel

fbshipit-source-id: 59e003116361fb547ec2c633bbc15a7973e21d0e
(cherry picked from commit b4f5836fa106418755381abedf327125bde744ef)
2022-01-19 03:41:20 +00:00
b4a75af758 [fx2trt] Export some options out (#71315)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71315

Add variables in LowerSetting to export options from TRTInterpreter and interpreter.run:
- explicit precision
- int8_mode

Export skip_folding_node_fn options from split_const_subgraphs.

Reviewed By: wushirong

Differential Revision: D33585385

fbshipit-source-id: 3d20b69d255ad97487e462436ae479587a8e2118
(cherry picked from commit f24a279517b16624a02d458e10275d78ec3d5699)
2022-01-19 02:13:31 +00:00
87215ed526 empty_strided: Factor out generic implementation (#70614)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70614

This creates an `empty_strided_generic` function which, similar to
`empty_generic`, is a device-independent tensor constructor. This also
adds `at::detail::empty_strided_cpu` to complement
`at::detail::empty_cpu`.

Test Plan: Imported from OSS

Reviewed By: samdow

Differential Revision: D33623679

Pulled By: ngimel

fbshipit-source-id: 85994e88d664870bf425f398dfcdfc467885c694
(cherry picked from commit 2ff2a89df5752cfad667463aa3c3bffe8479ec9a)
2022-01-19 01:54:16 +00:00