Summary:
While implementing https://github.com/pytorch/pytorch/issues/70275, I thought it would be useful to have a `torch.distributions.constraints` entry that checks the positive semi-definiteness of matrix random variables.
This PR implements it with `torch.linalg.eigvalsh`, unlike `torch.distributions.constraints.positive_definite`, which is implemented with `torch.linalg.cholesky_ex`.
Currently, `torch.linalg.cholesky_ex` only returns the order of the leading minor that is not positive definite in a symmetric matrix, so positive semi-definiteness cannot be checked through that mechanism.
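For illustration, a minimal Python sketch of the check (not necessarily the exact constraint implementation; the tolerance handling is an assumption):
```
import torch

# Sketch only: a symmetric matrix is PSD iff its smallest eigenvalue is >= 0.
# torch.linalg.eigvalsh returns eigenvalues in ascending order, so checking
# the first one suffices (a small tolerance absorbs rounding error).
def is_positive_semidefinite(x: torch.Tensor, atol: float = 1e-8) -> bool:
    symmetric = torch.allclose(x, x.transpose(-2, -1))
    smallest = torch.linalg.eigvalsh(x)[..., 0]
    return symmetric and bool((smallest >= -atol).all())

a = torch.tensor([[1.0, 0.0], [0.0, 0.0]])   # PSD but singular
print(is_positive_semidefinite(a))           # True
print(torch.linalg.cholesky_ex(a).info)      # nonzero: cholesky_ex rejects it
```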
cc neerajprad
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71375
Reviewed By: H-Huang
Differential Revision: D33663990
Pulled By: neerajprad
fbshipit-source-id: 02cefbb595a1da5e54a239d4f17b33c619416518
(cherry picked from commit 43eaea5bd861714f234e9efc1a7fb571631298f4)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71491
Changed the Cudnn and CudnnTranspose cases to only make the input
contiguous when it is needed for the grad_weight computation.
Reading the implementation of cudnn_convolution_transpose_backward and
cudnn_convolution_backward gives me confidence that `input` isn't used
for the grad_input computation. However, the memory format logic is so
convoluted that I'm not 100% sure this is correct. All the tests pass, though,
and on request I can directly pass `backend_memory_format` to
{cudnn_convolution_backward, cudnn_convolution_transpose_backward}.
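As a Python illustration of the pattern (the actual change is in the ATen C++ backwards; `torch.nn.grad` here is just a convenient stand-in):
```
import torch

# Sketch only: grad_input needs just grad_output and weight, so `input` is
# made contiguous only on the grad_weight path.
def conv2d_backward_sketch(grad_output, input, weight, output_mask):
    grad_input = grad_weight = None
    if output_mask[0]:
        grad_input = torch.nn.grad.conv2d_input(input.shape, weight, grad_output)
    if output_mask[1]:
        grad_weight = torch.nn.grad.conv2d_weight(
            input.contiguous(), weight.shape, grad_output
        )
    return grad_input, grad_weight
```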
Test Plan: - pytest test/test_nn.py -v -k "conv"
Reviewed By: jbschlosser
Differential Revision: D33664694
Pulled By: zou3519
fbshipit-source-id: 9f4929686fe34f7aaf5331bfa49e98022b9d6c08
(cherry picked from commit 9e2ba0daca88139f7941bcb56bbc23825585d7a2)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71490
Deleted unnecessary .contiguous() calls in convolution_backward. The
CudaDepthwise3d case always hits _depthwise_3d_backward_cuda_out,
which will make arguments contiguous as necessary.
Changed _depthwise_3d_backward_cuda_out
- to make the input contiguous only when we're computing grad_weight
- to make the weight contiguous only when we're computing grad_input
Test Plan: - pytest test/test_nn.py -v -k "conv"
Reviewed By: jbschlosser
Differential Revision: D33664696
Pulled By: zou3519
fbshipit-source-id: d01d4f213e21ef4778de089a158933737b191cdf
(cherry picked from commit c6eb977c94a07f9812567a43b125b453eb5c5051)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71489
Deleted unnecessary .contiguous() calls in convolution_backward. The
CudaDepthwise2d case always hits conv_depthwise2d_backward_cuda_out,
which makes the grad_output / self contiguous.
Changed conv_depthwise2d_backward_cuda_out to make `self_` (aka the
image input to the convolution) contiguous only when we're computing
the grad_weight. This is because when we are computing the grad_input,
we only need the values from the grad_output and the weight.
Test Plan: - pytest test/test_nn.py -v -k "conv"
Reviewed By: jbschlosser
Differential Revision: D33664697
Pulled By: zou3519
fbshipit-source-id: 7a755fa8a076809c5490422d69fdf7ed80c8e29a
(cherry picked from commit 862ae63bab74113b3607b1bbc0a82f27992550fe)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69437
linalg.{inv, cholesky} are problematic because they call .data_ptr().
This makes them not composite compliant (e.g. meta tensors will not run
through them correctly). This PR makes them composite compliant by adding a
new native_functions operator that does error checking,
`_linalg_check_errors(Tensor info, str api_name, bool is_matrix)`,
that is a primitive with respect to autograd.
This PR modifies linalg.inv and linalg.cholesky to call the new error
check function. I also needed to refactor singleCheckErrors and
batchCheckErrors to accept a c10::string_view instead of a
`const char*`; you can convert a `const char*` to c10::string_view but not
the other way around, because `string_view` does not require null-terminated
buffers.
Finally, there is a bugfix in `__torch_dispatch__` for this PR for
the composite compliant testing mechanism. Previously,
`__torch_dispatch__` could not handle operators with no returns; this PR
fixes that. No returns in C++ is equivalent to a single None return in
Python.
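A simplified Python model of what the new primitive does (the real operator is a native function; the exact messages below are illustrative):
```
import torch

# LAPACK-style routines return an `info` value per matrix: 0 on success,
# nonzero on failure. Centralizing the check in one primitive means callers
# like linalg.inv never need to touch .data_ptr() themselves.
def _linalg_check_errors_sketch(info: torch.Tensor, api_name: str, is_matrix: bool):
    if is_matrix:
        if info.item() != 0:
            raise RuntimeError(f"{api_name}: failed with info = {info.item()}")
    else:
        infos = info.flatten()
        bad = (infos != 0).nonzero()
        if bad.numel() > 0:
            i = int(bad[0])
            raise RuntimeError(
                f"{api_name}: batch element {i} failed with info = {int(infos[i])}"
            )
```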
Test Plan: - composite compliant tests
Reviewed By: albanD
Differential Revision: D32883666
Pulled By: zou3519
fbshipit-source-id: d5a3f52ebab116c93e1a54a203eacc8f787de7e2
(cherry picked from commit 9e24c9599a043877ab4f289469be55550c996a79)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70215
A few renamings, formatting changes, and additional tests to make the unit tests better.
cc VitalyFedyunin ejguan NivekT
Test Plan: Imported from OSS
Reviewed By: ejguan
Differential Revision: D33344610
Pulled By: NivekT
fbshipit-source-id: bb36f7452bdc44964c9ce0650c7ae308ba2c5aa5
(cherry picked from commit 0aae20cb27038b7b3598520db4304a604f1e6799)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71453
As title
Test Plan: unit test
Reviewed By: frank-wei
Differential Revision: D33646384
fbshipit-source-id: d86326c93e4d6bd59c9152592721f0e6ecf7f6fb
(cherry picked from commit d886380edef3388d60d529100332f9d9564f0913)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68945
This PR enables the Python conversion functions for `Storage` (specifically `UntypedStorage`) and also cleans up some remnants of the deprecated typed storages from `DynamicTypes.cpp`.
ghstack-source-id: 147245110
Test Plan: Run the existing unit and integration tests.
Reviewed By: albanD
Differential Revision: D32676505
fbshipit-source-id: 3a3f6db4fb0da5c78dd406c96ab70bdc37015521
(cherry picked from commit d6427b94cf88b078bd228d43cd2afbabf0773b39)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70476
1) Support a single dimension for inputs
2) Test several error cases
Partially addresses https://github.com/pytorch/pytorch/issues/65638
ghstack-source-id: 146307607
Test Plan: waitforbuildbot
Reviewed By: fduwjj
Differential Revision: D33344357
fbshipit-source-id: 4de7a7177452951dbcce76f27441703447609e6f
(cherry picked from commit 96dfded5697e451b54f113f99b6d0da6f6af500d)
Summary:
The following subfolders of the project were identified as ones that can be
merged on GitHub first and then asynchronously merged into the Meta
codebase:
## ONNX exporter
PRs that include only files under `torch/onnx`, `torch/csrc/jit/passes/onnx` and `test/onnx` and are reviewed by garymm
## CUDA fusers
PRs that include only files under `torch/csrc/jit/codegen/fuser/cuda`, `torch/csrc/jit/codegen/cuda` or `benchmarks/cpp/nvfuser` and are reviewed by csarofeen or ngimel
## OSS CI
PRs that include only files under `.circleci`, `.github` and `.jenkins` and are reviewed either by seemethere or myself
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71514
Reviewed By: bigfootjon
Differential Revision: D33673050
Pulled By: malfet
fbshipit-source-id: 21b909d49cb73ff79879b3ea0568e53ef65aa08c
(cherry picked from commit 520226c1bf341fe6a9e1cd42f18da73c43386062)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71140
Structured kernels need to use the borrowing variants of TensorIterator's build APIs. (I am working on a debug check for this, but it is currently too strict, and relaxing it does not catch these bugs.)
ghstack-source-id: 147191022
Test Plan: CI
Reviewed By: bdhirsh
Differential Revision: D33520003
fbshipit-source-id: 3b0ff9036acdb78ae6fc7489ed0ed487d5ff080f
(cherry picked from commit 80ef4e14e33718a9ad5aaefc218bb773e3b15a5c)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71423
Replacing this math with a load seems to improve perf.
ghstack-source-id: 147171800
Test Plan: ptvsc2_predictor_bench runs on a model courtesy of mikeiovine
Reviewed By: mikeiovine, xiaomengy
Differential Revision: D33552176
fbshipit-source-id: f21a4cd66c13b9fcb7bcf48f356bdc85e94c4216
(cherry picked from commit 0354fcb9889e7345321fe4dc9e30495a67709a4d)
Summary:
From https://github.com/pytorch/pytorch/issues/67626: RRefProxy (rref.rpc_async, rref.rpc_sync, rref.remote) currently uses a blocking RPC call to the owner.
This PR replaces that by chaining async calls; in the sync case we wait on the
resulting Future.
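A minimal sketch of the chaining idea using `torch.futures` (the helpers are illustrative, not the actual RPC internals):
```
import torch

# Sketch only: resolve the owner asynchronously, chain the real call with
# .then(), and block on the resulting Future only in the sync wrapper.
def _resolve_owner_async():
    fut = torch.futures.Future()
    fut.set_result("worker0")  # stand-in for the asynchronous owner lookup
    return fut

def rpc_async_sketch(fn, *args):
    return _resolve_owner_async().then(lambda owner_fut: fn(*args))

def rpc_sync_sketch(fn, *args):
    return rpc_async_sketch(fn, *args).wait()  # only the sync case blocks

print(rpc_sync_sketch(lambda x: x * 2, 21))  # 42
```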
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70206
Test Plan:
I ran rpc_tests using tensorpipe_rpc_agent_test_fixture.py and had to
adjust test_rref_proxy_timeout to the new behavior.
I ran into test_tensorpipe_set_default_timeout failing due to the
timeout being too small; it doesn't look related to this change.
mrshenli
Fixes https://github.com/pytorch/pytorch/issues/67626
cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang
Reviewed By: pritamdamania87
Differential Revision: D33243348
Pulled By: kumpera
fbshipit-source-id: e1e8c34bb3d170407c0a793e2e585357f905d3c6
(cherry picked from commit 1ad5a7ceea17d00872e593650ef50d85bb232cda)
Summary:
The block and thread extent calculations in `cuda_codegen` should be using `int64_t` instead of `int`. The updated test, `test_dynamic_shapes`, fails without this change.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71428
Reviewed By: samdow
Differential Revision: D33640374
Pulled By: navahgar
fbshipit-source-id: 64c340ad2a9a1fa1fe066cf1c5dfc3b546b7be6d
(cherry picked from commit 6ea546ce116fc05d9d7e225bc29f7fe86be439de)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71031
During the conversion stage, we might create some constants when the size op is called and the size is static. Raising an error here causes problems for this case. Generally speaking, it doesn't hurt to allow them not to be const-folded.
Test Plan:
Test with D33483843 on shufflenet.
Added unit tests.
Reviewed By: wushirong
Differential Revision: D33484183
fbshipit-source-id: 5b32c06297e56965befd7e83fe8ca273e3665cee
(cherry picked from commit e6b79bd3dd626f4b0035b9792a246fc09098d5ef)
Summary:
This one will react to the `repo_dispatch` event sent by PyTorch Probot
when the `pytorchbot merge this` command is issued.
At the moment, the workflow will only attempt to merge PRs that were not
created from a forked repo and that match the rules defined in
`.github/merge_rules.json`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71488
Reviewed By: bigfootjon
Differential Revision: D33665142
Pulled By: malfet
fbshipit-source-id: e22daa1892523e62d7b7a941960636a6514cb7d7
(cherry picked from commit 92059bab073e2cd6ca6e9f946ffc2f956e22895c)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68953
This PR consolidates the almost identical lvalue and rvalue implementations of shallow_copy_and_detach into a single templated function.
ghstack-source-id: 147238376
Test Plan: Run existing unit tests.
Reviewed By: fduwjj
Differential Revision: D32679741
fbshipit-source-id: 89a870335d2e09ffd005c943733a787d20d352f9
(cherry picked from commit 750344c8600e05d4ab593956257c8191919eeef8)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70853
We support both configurations, so we should ensure they both work.
ghstack-source-id: 147170900
Test Plan: This is adding a test to CI.
Reviewed By: malfet
Differential Revision: D33304505
fbshipit-source-id: 7074b6b98d05f60801bb1d74bc9ac1458c768d28
(cherry picked from commit 8e4134b77789a157be5ba3df1d07f9bb308ca3b6)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70852
This is the first change that uses a common build file, build.bzl, to
hold most of the build logic.
ghstack-source-id: 147170895
Test Plan: Relying on internal and external CI.
Reviewed By: malfet
Differential Revision: D33299331
fbshipit-source-id: a66afffba6deec76b758dfb39bdf61d747b5bd99
(cherry picked from commit d9163c56f55cfc97c20f5a6d505474d5b8839201)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70851
This is a step towards OSS/fbcode convergence since OSS uses this file
in both CMake and Bazel.
ghstack-source-id: 147170896
Test Plan: Relying on the extensive CI internal tests for this.
Reviewed By: malfet
Differential Revision: D33299102
fbshipit-source-id: c650dd4755f8d696d5fce81c583d5c73782e3990
(cherry picked from commit 741ca140c82f728e3b349d703a7de239e5bbf13c)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71450
att
Test Plan: no test
Reviewed By: jfix71
Differential Revision: D33515471
fbshipit-source-id: ded40ca117f63c971d6c5ed4556932cc71c009ca
(cherry picked from commit a9f66d5921241645191c1df3292dc6e784860165)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71457
Today DynamicType is hard to create because we have separate APIs for different types. In this diff we introduce an easier API to create types, like the following:
```
#include <ATen/core/type_factory.h>
auto type = dynT<ListType>(dynT<TensorType>()); // etc...
```
ghstack-source-id: 147211236
Test Plan: CI
Reviewed By: iseeyuan
Differential Revision: D33647746
fbshipit-source-id: c850cf31ae781244eac805906a2fc110ef065a70
(cherry picked from commit 8cfd51d75f010ca6f7f98b7e8ef807ead4d5f8f3)
Summary:
On a CPU-only build of PyTorch, `torch._C._jit_set_nvfuser_enabled(False)` would throw an error even though it is a no-op. With this fix:
```
>>> torch._C._jit_set_nvfuser_enabled(False)
False
>>> torch._C._jit_set_nvfuser_enabled(True)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
RuntimeError: Running CUDA fuser is only supported on CUDA builds.
>>>
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71358
Reviewed By: eellison
Differential Revision: D33601135
Pulled By: jansel
fbshipit-source-id: c764df2fa197ce7b4f71e5df0a91cd988766e99c
(cherry picked from commit a801df93210302e918eca7134d3c0a19ac5bae5d)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71091
Fixes https://github.com/pytorch/pytorch/issues/65394
The masked sum on a full input tensor (of any layout) with an all-true mask is the same as the sum on the strided input tensor (after applying `to_dense` to sparse inputs).
Since masked sum uses `torch.sparse.sum`, then for the simplicity of masked reduction implementations, its reduction behavior ought to be defined by the behavior of `torch.sum`. This PR implements that behavioral connection with respect to dimension-wise summation of empty sparse tensors that correspond to all-zero strided tensors.
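For example (illustrative, assuming the aligned behavior):
```
import torch

# Summing an all-zero sparse tensor along a dimension should agree with
# summing its strided counterpart.
s = torch.zeros(2, 3).to_sparse()
print(torch.sparse.sum(s, dim=0).to_dense())  # tensor([0., 0., 0.])
print(s.to_dense().sum(dim=0))                # tensor([0., 0., 0.])
```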
cc nikitaved pearu cpuhrsch
Test Plan: Imported from OSS
Reviewed By: davidberard98
Differential Revision: D33651750
Pulled By: cpuhrsch
fbshipit-source-id: 703891bff88c8da6270b4272f5d2da81688db67d
(cherry picked from commit 53f97e80f7520594e9977ad61a1a727dadade645)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69060
Saved-variable-hooks checkpointing was added in https://github.com/pytorch/pytorch/pull/69508; this PR adds some tests for DDP.
Specifically, we can support almost all DDP use cases with this new API, such as a dynamic module with find_unused_parameters=True. One case remains to be supported: static_graph + non-reentrant-based checkpointing. The underlying reason this does not work is https://github.com/pytorch/pytorch/issues/58111.
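Usage sketch of the API under test (outside DDP for brevity; `use_reentrant=False` selects the saved-variable-hooks implementation):
```
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

# Non-reentrant (saved-variable-hooks) checkpointing; under DDP this now
# covers dynamic modules with find_unused_parameters=True.
net = nn.Sequential(nn.Linear(4, 4), nn.ReLU(), nn.Linear(4, 4))
x = torch.randn(2, 4, requires_grad=True)
out = checkpoint(net, x, use_reentrant=False)
out.sum().backward()
```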
ghstack-source-id: 147219887
Test Plan: CI
Reviewed By: zhaojuanmao
Differential Revision: D32712126
fbshipit-source-id: ba5ae9ca77fd8929ee020c7dc97838bae9a1931b
(cherry picked from commit 9c7f93e21728d1627d85c351a21e7c8da832bff7)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71462
Fixes
```
6 aienv/aienv_ig_reels_base:caffe2/modules/detectron/upsample_nearest_op.h:65:1: error: loop not vectorized: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Werror,-Wpass-failed=transform-warning]
6 deep_entity_classification/si_dec_gnn:caffe2/modules/detectron/upsample_nearest_op.h:65:1: error: loop not vectorized: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Werror,-Wpass-failed=transform-warning]
6 feed_recommendation_infra/multifeed_execution_graph_service_nosan:caffe2/modules/detectron/upsample_nearest_op.h:65:1: error: loop not vectorized: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Werror,-Wpass-failed=transform-warning]
12 mobile_cv/mobile-vision_experimental:caffe2/modules/detectron/upsample_nearest_op.h:65:1: error: loop not vectorized: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Werror,-Wpass-failed=transform-warning]
30 mobile_cv/mobile-vision_xraymobilev2_detection_caffe2:caffe2/modules/detectron/upsample_nearest_op.h:65:1: error: loop not vectorized: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Werror,-Wpass-failed=transform-warning]
42 aienv/aienv:caffe2/modules/detectron/upsample_nearest_op.h:65:1: error: loop not vectorized: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Werror,-Wpass-failed=transform-warning]
128 feed_recommendation_infra/multifeed_recagg_dev:caffe2/modules/detectron/upsample_nearest_op.h:65:1: error: loop not vectorized: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Werror,-Wpass-failed=transform-warning]
136 fluent2/fblearner_flow_projects_fluent2_nosan:caffe2/modules/detectron/upsample_nearest_op.h:65:1: error: loop not vectorized: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Werror,-Wpass-failed=transform-warning]
1338 f6/f6_nosan:caffe2/modules/detectron/upsample_nearest_op.h:65:1: error: loop not vectorized: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Werror,-Wpass-failed=transform-warning]
```
Test Plan: Sandcastle
Reviewed By: luciang
Differential Revision: D33641869
fbshipit-source-id: 8424849cfac5cb0109272dec2086863067bbde66
(cherry picked from commit d18429905c7661486ed8ec0cdcdd7d94b9c62762)
Summary:
Reference https://github.com/pytorch/pytorch/issues/69991
Refactored so that only the `out` variant copies the result into `out`; otherwise we just return the result of the composite functions as-is.
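A hedged sketch of the pattern with made-up names (not the actual functions touched):
```
import torch

# The composite returns its result directly; only the out= variant copies.
def some_op(a: torch.Tensor) -> torch.Tensor:
    return a * 2  # stand-in for the composite computation

def some_op_out(a: torch.Tensor, out: torch.Tensor) -> torch.Tensor:
    return out.copy_(some_op(a))  # the copy into `out` happens only here

x = torch.arange(3.0)
buf = torch.empty(3)
print(some_op(x))            # tensor([0., 2., 4.]), no extra copy
print(some_op_out(x, buf))   # same values, written into `buf`
```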
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70894
Reviewed By: samdow
Differential Revision: D33641742
Pulled By: zou3519
fbshipit-source-id: 671be13b31a7fff3afc0b7976706a5ecfc51ccac
(cherry picked from commit e7d5ac9af319be327adc16d2d7048139a4b2ddd3)
Summary:
The sccache compilation log is often misleading.
We can move it into its own group so people don't see it right away.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71444
Reviewed By: atalman
Differential Revision: D33659650
Pulled By: janeyx99
fbshipit-source-id: f22fd21640a8747beeacce8857bbb8281efd76f4
(cherry picked from commit e25970abf99801fc04d4ae15f8f5ffe63dd1dc41)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70266
Addresses some of the issues mentioned in
https://github.com/pytorch/pytorch/issues/65638. The ShardedLinear implementation
only supports 2D inputs.
`nn.Linear`, on the other hand, supports arbitrary dimensions for inputs and
outputs. As a result, in this PR I've added support to ensure that
ShardedLinear handles arbitrary input dims as well.
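An illustrative sketch of the usual trick (not the actual ShardedLinear code): flatten the leading dimensions, run the 2D path, then restore the shape.
```
import torch
import torch.nn.functional as F

# Arbitrary leading dims via flatten-to-2D, apply the 2D kernel, reshape back.
def linear_nd(input, weight, bias=None):
    leading = input.shape[:-1]
    out_2d = F.linear(input.reshape(-1, input.shape[-1]), weight, bias)
    return out_2d.reshape(*leading, weight.shape[0])

x = torch.randn(2, 5, 8)      # arbitrary leading dims
w = torch.randn(3, 8)
print(linear_nd(x, w).shape)  # torch.Size([2, 5, 3])
```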
ghstack-source-id: 147206607
Test Plan: waitforbuildbot
Reviewed By: wanchaol
Differential Revision: D33267630
fbshipit-source-id: 0460994c3aa33348b80547d9274206ef90cb29b6
(cherry picked from commit 7c289e1dbf491008e091ed0a49f98f2ebcfb4175)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71461
After the operator versioning work, the version in the model file is used for operator versioning, while bytecode_version is used for bytecode versioning (the bytecode schema). They are two separate things now, and this comparison is no longer needed.
ghstack-source-id: 147209286
Test Plan: CI
Reviewed By: iseeyuan, tugsbayasgalan
Differential Revision: D33648592
fbshipit-source-id: beaa136a728f88435176a00c07b2d521210f107f
(cherry picked from commit e90e650e1a5134473117eda802d679171e035082)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70615
This adds `at::detail::empty_meta` and
`at::detail::empty_strided_meta` to complement the CPU API.
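The user-visible analogue (the PR itself adds the C++ `at::detail` entry points):
```
import torch

# Meta tensors carry shape/dtype/strides but allocate no storage.
t = torch.empty(2, 3, device="meta")
print(t.shape, t.dtype, t.device)  # torch.Size([2, 3]) torch.float32 meta
```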
Test Plan: Imported from OSS
Reviewed By: samdow
Differential Revision: D33623678
Pulled By: ngimel
fbshipit-source-id: 59e003116361fb547ec2c633bbc15a7973e21d0e
(cherry picked from commit b4f5836fa106418755381abedf327125bde744ef)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70614
This creates an `empty_strided_generic` function which, similar to
`empty_generic`, is a device-independent tensor constructor. This also
adds `at::detail::empty_strided_cpu` to complement
`at::detail::empty_cpu`.
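For reference, the user-visible counterpart of these constructors:
```
import torch

# An uninitialized tensor with an explicit (here column-major) stride layout;
# empty_strided_generic is the device-independent path behind this.
t = torch.empty_strided((2, 3), (1, 2))
print(t.shape, t.stride())  # torch.Size([2, 3]) (1, 2)
```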
Test Plan: Imported from OSS
Reviewed By: samdow
Differential Revision: D33623679
Pulled By: ngimel
fbshipit-source-id: 85994e88d664870bf425f398dfcdfc467885c694
(cherry picked from commit 2ff2a89df5752cfad667463aa3c3bffe8479ec9a)